Isolate Annotation from Seqr loading pipeline #9
Plan with @vladsaveliev:
This means:
Next:
Note - the current logic mixes 3 things:
Instead, this will focus on parts 1 and 2, with any references to VQSR removed.
In reply to this comment: gnomAD has indeed generated various Hail Tables containing all possible SNPs, annotated with VEP. Here is the location of some of them:
The VEP 85 and 95 Tables are also available through various cloud providers' open datasets programs and can be downloaded without paying egress fees. More detail about how to access them can be found in this blog post and on the gnomAD downloads page. In my experience, performing a join with these tables to get the annotations is extremely fast and convenient. For example, I use it to get the LOFTEE annotations on pLoF variants.
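The join described above can be sketched in plain Python under a simplified, hypothetical row layout: each reference record is keyed by `(contig, position, ref, alt)` and carries a list of transcript consequences with a LOFTEE `lof` field ("HC" or "LC"). In Hail the lookup is typically written as `mt.annotate_rows(vep=vep_ht[mt.row_key].vep)`. This sketch only mirrors the left-join semantics of that idiom (variants absent from the reference table get a missing value). It does not reproduce the actual gnomAD table schema, and the field names `transcript_consequences`, `gene_symbol` and `lof` are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical variant key: (contig, position, ref allele, alt allele).
VariantKey = Tuple[str, int, str, str]


def annotate_with_reference(
    variants: List[VariantKey],
    reference: Dict[VariantKey, dict],
) -> List[Tuple[VariantKey, Optional[dict]]]:
    """Left join: every input variant is kept; unmatched ones get None."""
    return [(v, reference.get(v)) for v in variants]


def high_confidence_lof_genes(annotation: Optional[dict]) -> List[str]:
    """Gene symbols with a LOFTEE 'HC' consequence (assumed field names)."""
    if annotation is None:
        return []
    return sorted(
        {
            tc["gene_symbol"]
            for tc in annotation.get("transcript_consequences", [])
            if tc.get("lof") == "HC"
        }
    )


if __name__ == "__main__":
    reference = {
        ("chr1", 100, "C", "T"): {
            "transcript_consequences": [
                {"gene_symbol": "GENE1", "lof": "HC"},
                {"gene_symbol": "GENE2", "lof": "LC"},
            ]
        },
    }
    variants = [("chr1", 100, "C", "T"), ("chr1", 200, "G", "A")]
    for key, ann in annotate_with_reference(variants, reference):
        print(key, high_confidence_lof_genes(ann))
```

Because the reference table is keyed on the same fields as the variant rows, each lookup is a direct key match. In Hail's distributed setting, this is what makes such joins cheap.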
@tiboloic do you know whether they have released details of how this was generated? My Google-fu is coming up blank.
@cassimons the best I have found is the description in the supplementary material (page 31) of the gnomAD flagship paper, but it is very succinct. I pasted it below:
Currently, the annotation step (within Dataproc) is not the sole source of annotations.
As the process begins from an MT (MatrixTable) instead of a VCF, some of the annotations on that original MT are carried over instead of being updated. These are:
The source for these annotations:
https://github.com/populationgenomics/hail-elasticsearch-pipelines/tree/main/download_and_create_reference_datasets/v02
https://github.com/populationgenomics/hail-elasticsearch-pipelines/blob/main/download_and_create_reference_datasets/v02/hail_scripts/write_combined_reference_data_ht.py#L39-L48
The annotations are available as a single Hail Table, which is used to annotate the MatrixTable as a join.
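One way to make the Annotation stage the sole source of these fields is to drop any stale copies carried over on the input rows before joining the combined reference table. That way every reference-derived field comes from the current table rather than from the original MT. The sketch below assumes rows and reference records are plain dicts keyed by a hypothetical `(contig, position, ref, alt)` tuple. `REFERENCE_FIELDS` is an illustrative, hypothetical list, not the set of fields defined in `write_combined_reference_data_ht.py`. In Hail, the same effect would typically come from `mt.drop(...)` followed by `mt.annotate_rows(...)` with a keyed lookup such as `ref_ht[mt.row_key]`.

```python
from typing import Dict, Tuple

# Hypothetical variant key: (contig, position, ref allele, alt allele).
VariantKey = Tuple[str, int, str, str]

# Hypothetical names for the reference-derived fields (illustration only).
REFERENCE_FIELDS = ("gnomad_af", "cadd_phred", "clinvar_significance")


def reannotate_rows(
    rows: Dict[VariantKey, dict],
    reference: Dict[VariantKey, dict],
) -> Dict[VariantKey, dict]:
    """Replace carried-over reference fields with values from the reference table."""
    result = {}
    for key, row in rows.items():
        # Drop stale copies so nothing is silently carried over from the input MT.
        fresh = {k: v for k, v in row.items() if k not in REFERENCE_FIELDS}
        ref_record = reference.get(key, {})
        for field in REFERENCE_FIELDS:
            # A variant or field missing from the reference becomes None,
            # rather than keeping the old carried-over value.
            fresh[field] = ref_record.get(field)
        result[key] = fresh
    return result
```

The design choice here is to treat the reference table as authoritative. A field that the reference table lacks ends up missing instead of silently retaining an outdated value from the input MT.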
For now - make sure the Annotation stage can supply these annotations, with as little dependence on external libraries as possible.
Future - obtain all annotations from a single source, so that this bespoke preparation isn't required?