Store multiple versions of variant annotation #830
j-coll added commits that referenced this issue on Apr 24, Apr 26 (2), Apr 27, Apr 30, May 1 (2), May 4, May 8, Jul 19 (4) and Aug 29, 2018.
When analysing data, it is really important to be able to review the results.
New versions of the data contained in CellBase are released no more than 3 or 4 times a year. A new Ensembl version, new population frequencies, new datasets and similar updates change the generated annotation, which may change the result of the analysis.
After determining which set of variants may be interesting for the study of a certain disease, we may want to know the exact annotation that was used to select those variants.
This is why we need to store multiple versions of the variant annotation.
Proposed new operations
Provide a version name for the current annotation, and create a read-only snapshot.
variant annotation-save --project myProject --annotation-id v1
variant annotation-delete --project myProject --annotation-id v1
analysis/variant/annotation/query?annotationId=v1&id=1:1234:A:C
analysis/variant/annotation/query?annotationId=CURRENT&region=1:1000-2000
variant annotation-query --annotation-id v1 --id 1:1234:A:C
variant annotation-query --annotation-id CURRENT --region 1:1000-2000
analysis/variant/annotation/metadata
analysis/variant/annotation/metadata?annotationId=v1
variant annotation-metadata
variant annotation-metadata --annotation-id v1
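As a rough illustration of how a client might assemble the proposed REST calls, here is a minimal sketch. The endpoint paths and parameter names are copied from the list above; the helper function names are hypothetical and not part of any existing client.

```python
from urllib.parse import urlencode

# Endpoint paths as written in the proposal; helper names are hypothetical.
QUERY_PATH = "analysis/variant/annotation/query"
METADATA_PATH = "analysis/variant/annotation/metadata"


def build_url(path, **params):
    """Append a query string to path, skipping parameters that are None."""
    query = {key: value for key, value in params.items() if value is not None}
    if not query:
        return path
    # safe=":" keeps variant ids and regions readable (1:1234:A:C).
    return path + "?" + urlencode(query, safe=":")


def annotation_query_url(annotation_id="CURRENT", variant_id=None, region=None):
    """URL for querying one annotation snapshot by variant id or by region."""
    return build_url(QUERY_PATH, annotationId=annotation_id, id=variant_id, region=region)


def annotation_metadata_url(annotation_id=None):
    """URL for the metadata of all snapshots, or of a single one."""
    return build_url(METADATA_PATH, annotationId=annotation_id)
```

Building the query string with `urlencode` also avoids hand-written `&` separators, which are easy to mangle when the URL passes through HTML.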
New global metadata
This will require keeping track of the existing annotation snapshots in the database. The ProjectMetadata will need to be modified to add information regarding all the studies, i.e.:
See #832
Example:
Storage-MongoDB
Copy annotation
Create a new copy with an aggregation pipeline, writing the output to a separate collection:
db.variants.aggregate([ { $project: {"annotation" :1 } } , { $out : "annot_1" } ] )
The new collection won't have any index (apart from the default "_id"), so only queries by "id" and "region" will be possible
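The pipeline above could be generated per snapshot. The sketch below builds it for a given annotation id and includes an in-memory equivalent of the `$project` stage to show what ends up in the new collection. The `annot_<id>` naming simply extends the `annot_1` example and is an assumption, as are the function names.

```python
def snapshot_pipeline(annotation_id):
    """Aggregation pipeline that copies only the annotation field into its own collection."""
    # "annot_<id>" mirrors the "annot_1" name used above; the naming rule is an assumption.
    return [
        {"$project": {"annotation": 1}},
        {"$out": "annot_" + str(annotation_id)},
    ]


def project_annotation(documents):
    """In-memory equivalent of the $project stage: keep _id and annotation only."""
    return [
        {key: doc[key] for key in ("_id", "annotation") if key in doc}
        for doc in documents
    ]
```

With a driver such as pymongo, the pipeline would presumably be passed to `db.variants.aggregate(snapshot_pipeline(1))`. Note that `$out` replaces the target collection if it already exists, so a snapshot name should be checked before reuse.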
Delete annotation
Deleting a previous annotation will be an easy operation, as it only needs to remove one collection.
Query annotation
The query will be built against that annotation collection, filtering only by the _id.
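To illustrate why id and region queries remain possible with only the default index, here is a sketch that turns both into `_id` filters. It assumes a hypothetical `_id` layout (chromosome, zero-padded position, alleles) whose string order follows genomic position within a chromosome; the real key format is not described in this issue.

```python
POSITION_WIDTH = 10  # assumed zero-padding width, not taken from the issue


def variant_key(chromosome, position, ref, alt):
    """Hypothetical sortable _id: chromosome, zero-padded position, alleles."""
    return f"{chromosome}:{position:0{POSITION_WIDTH}d}:{ref}:{alt}"


def id_filter(chromosome, position, ref, alt):
    """Exact match on a single variant."""
    return {"_id": variant_key(chromosome, position, ref, alt)}


def region_filter(chromosome, start, end):
    """Range scan over _id covering positions start..end (inclusive)."""
    lower = f"{chromosome}:{start:0{POSITION_WIDTH}d}:"
    upper = f"{chromosome}:{end + 1:0{POSITION_WIDTH}d}:"
    return {"_id": {"$gte": lower, "$lt": upper}}


def matches(doc_id, query):
    """In-memory check of the filter, using plain string comparison."""
    condition = query["_id"]
    if isinstance(condition, dict):
        return condition["$gte"] <= doc_id < condition["$lt"]
    return doc_id == condition
```

Both filters touch only `_id`, so they can be served by the default index of the snapshot collection. Any other filter (gene, consequence type and so on) would need a full collection scan.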
Storage-Hadoop
Copy annotation
Create a MR job that copies the column A_FULL into A_1. There is no need to register these columns in Phoenix.
An alternative here would be to copy the column into a different column family, or a new table: <db-name>_annot.
Delete annotation
As when deleting files, this will be a MR job that deletes a whole column.
Query annotation
The query will use the same row-key as for any other variant, selecting only the required column.
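The three Hadoop operations can be modelled in memory to make the per-row work explicit. This is only a sketch: the table is a plain dict of row-key to columns, not an HBase client, and the `A_<id>` column naming extends the `A_1` example as an assumption.

```python
CURRENT_COLUMN = "A_FULL"  # column holding the current annotation, as named above


def snapshot_column(annotation_id):
    """Column name for a snapshot; "A_<id>" is an assumed naming rule."""
    return f"A_{annotation_id}"


def copy_annotation(table, annotation_id):
    """Per-row work of the copy job: duplicate A_FULL under the snapshot column."""
    target = snapshot_column(annotation_id)
    copied = 0
    for columns in table.values():
        if CURRENT_COLUMN in columns:
            columns[target] = columns[CURRENT_COLUMN]
            copied += 1
    return copied


def delete_annotation(table, annotation_id):
    """Per-row work of the delete job: drop the snapshot column everywhere."""
    target = snapshot_column(annotation_id)
    for columns in table.values():
        columns.pop(target, None)


def query_annotation(table, row_key, annotation_id=None):
    """Read one column of one row; None selects the current annotation."""
    column = CURRENT_COLUMN if annotation_id is None else snapshot_column(annotation_id)
    return table.get(row_key, {}).get(column)
```

Because the snapshot lives in the same row as the variant, a query is a single-row read with one column selected, and later re-annotation of `A_FULL` leaves the snapshot column untouched.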
Tasks
VariantAnnotationManager
To do this, the "currentAnnotation" should have an annotation-id assigned.
annotationId < currentAnnotationId, as they are old (i.e. invalid)?
Storage-MongoDB
VariantAnnotationManager
Storage-Hadoop
VariantAnnotationManager
Questions