Store multiple versions of variant annotation #830

Open
13 of 16 tasks
j-coll opened this issue Apr 24, 2018 · 0 comments

When analysing data, it is really important to be able to review the results.
New versions of the data contained in CellBase are released no more than 3 or 4 times a year. Each release (a new Ensembl version, new population frequencies, new datasets...) changes the generated annotation, which may in turn change the result of the analysis.
After determining which set of variants may be interesting for the study of a certain disease, we may want to know the exact annotation that was used to select those variants.
This is why we need to store multiple versions of the variant annotation.

Proposed new operations

  1. Create a snapshot of the current annotation
    Provide a version name for the current annotation, and create a read-only snapshot.
    • CLI command:
      variant annotation-save --project myProject --annotation-id v1
  2. Delete a certain annotation version
    • CLI command:
      variant annotation-delete --project myProject --annotation-id v1
  3. Get the variant annotation from a certain version, given a set of variants or a region.
    • REST endpoint:
      analysis/variant/annotation/query?annotationId=v1&id=1:1234:A:C
      analysis/variant/annotation/query?annotationId=CURRENT&region=1:1000-2000
    • CLI command:
      variant annotation-query --annotation-id v1 --id 1:1234:A:C
      variant annotation-query --annotation-id CURRENT --region 1:1000-2000
  4. Get the variant annotation metadata. By default, get all the annotations.
    • REST endpoint:
      analysis/variant/annotation/metadata
      analysis/variant/annotation/metadata?annotationId=v1
    • CLI command:
      variant annotation-metadata
      variant annotation-metadata --annotation-id v1
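As an illustration of how a client might build requests for the proposed query endpoint, here is a minimal Python sketch. Only the endpoint path and the `annotationId`, `id` and `region` parameters come from the proposal above. The base URL, the helper name `annotation_query_url`, and the rule that exactly one of `id` or `region` is given are assumptions made for this example.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with the address of the actual deployment.
BASE_URL = "http://localhost:8080/opencga/webservices/rest/v1"


def annotation_query_url(annotation_id="CURRENT", variant_id=None, region=None):
    """Build a URL for the proposed analysis/variant/annotation/query endpoint."""
    # Assumption: a query selects either variant IDs or a region, not both.
    if (variant_id is None) == (region is None):
        raise ValueError("provide exactly one of variant_id or region")
    params = {"annotationId": annotation_id}
    if variant_id is not None:
        params["id"] = variant_id
    else:
        params["region"] = region
    # Leave ':' unescaped so the URL reads like the examples in the proposal.
    query = urlencode(params, safe=":-")
    return f"{BASE_URL}/analysis/variant/annotation/query?{query}"


print(annotation_query_url("v1", variant_id="1:1234:A:C"))
print(annotation_query_url("CURRENT", region="1:1000-2000"))
```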

New global metadata

This will require keeping track of the existing annotation snapshots in the database. The ProjectMetadata will need to be modified to add information that applies to all the studies, i.e.:

  • Current annotation version
  • Existing annotation snapshots
See #832
Example:
{
  "species" : "hsapiens",
  "assembly" : "GRCh38",
  "release" : 4,
  "annotation" : {
    "current" : { 
      "id" : 3,
      "name" : "CURRENT",
      "creationDate" : null,
      "annotator" : {  // http://cellbase.clinbioinfosspa.es/cb/webservices/rest/v4/meta/about
        "name": "CellBase (OpenCB)",
        "gitCommit": "ab4e4dd83ff5b337392f33ae6bc7ba33be385807",
        "version": "4.5.3"
      } ,
      "sourceVersion" : [ { ... } ] // http://cellbase.clinbioinfosspa.es/cb/webservices/rest/v4/meta/hsapiens/versions?assembly=grch38
    } ,
    "saved" : [
    {
      "id" : 1,
      "name" : "v1",
      "creationDate" : "2017-11-24T11:34:16.461+0000",
      "annotator" : { 
        "name": "CellBase (OpenCB)",
        "gitCommit": "b118caadb557a77f66bbeb4e81261288b11174f7",
        "version": "4.5.2"
      } ,
      "sourceVersion" : [ { ... } ]
    } , {
      "id" : 2,
      "name" : "v2",
      "creationDate" : "2018-04-02T16:10:44.194+0000",
      "annotator" : {
        "name": "CellBase (OpenCB)",
        "gitCommit": "ab4e4dd83ff5b337392f33ae6bc7ba33be385807",
        "version": "4.5.3"
      } ,
      "sourceVersion" : [ { ... } ]
      "sourceVersion" : [ { ... } ]
    } ]
  }
}
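To make the intended bookkeeping concrete, here is a minimal Python sketch of how the `annotation` block of the metadata could change when a snapshot is saved or deleted. The function names `save_annotation` and `remove_annotation` are hypothetical. The rule that the current annotation moves on to the next numeric id after a save is an assumption, chosen to match the example above, where the saved ids are 1 and 2 and the current id is 3.

```python
from copy import deepcopy
from datetime import datetime, timezone


def save_annotation(annotation, name):
    """Return a copy of the metadata with annotation["current"] frozen as `name`."""
    names = {s["name"] for s in annotation["saved"]}
    if name == "CURRENT" or name in names:
        raise ValueError(f"annotation name already in use: {name}")
    updated = deepcopy(annotation)
    snapshot = deepcopy(updated["current"])
    snapshot["name"] = name
    snapshot["creationDate"] = datetime.now(timezone.utc).isoformat()
    updated["saved"].append(snapshot)
    # Assumption: the live annotation moves on to the next numeric id.
    updated["current"]["id"] = snapshot["id"] + 1
    return updated


def remove_annotation(annotation, name):
    """Return a copy of the metadata without the saved snapshot called `name`."""
    if all(s["name"] != name for s in annotation["saved"]):
        raise KeyError(f"no saved annotation named {name}")
    updated = deepcopy(annotation)
    updated["saved"] = [s for s in updated["saved"] if s["name"] != name]
    return updated


annotation = {
    "current": {"id": 3, "name": "CURRENT", "creationDate": None},
    "saved": [{"id": 1, "name": "v1"}, {"id": 2, "name": "v2"}],
}
after_save = save_annotation(annotation, "v3")
print([s["name"] for s in after_save["saved"]])  # ['v1', 'v2', 'v3']
print(after_save["current"]["id"])               # 4
```

Both functions return a new dictionary instead of mutating the input, so a failed write of the metadata would not leave the in-memory copy half updated.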

Storage-MongoDB

  1. Copy annotation
    Create a new copy with an aggregation pipeline, writing the output to a separate collection:

    db.variants.aggregate([ { $project: {"annotation" :1 } } , { $out : "annot_1" } ] )

    The new collection won't have any index (apart from the default "_id"), so only queries by "id" and "region" will be possible

  2. Delete annotation
    Deleting a previous annotation will be an easy operation, as it should only remove one collection.

  3. Query annotation
    The query will be built against that annotation collection, filtering only by the _id
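The three MongoDB operations above could be sketched in Python roughly as follows. This assumes `db` is a PyMongo `Database` (or anything exposing the same `aggregate`, `find` and `drop_collection` methods), that snapshot collections follow the `annot_<id>` naming used in the pipeline above, and that the caller has already converted variant IDs into whatever `_id` format the `variants` collection uses. All function names are hypothetical.

```python
def snapshot_collection(annotation_id):
    """Name of the collection holding one annotation snapshot, e.g. annot_1."""
    return f"annot_{annotation_id}"


def copy_pipeline(annotation_id):
    """Aggregation pipeline that keeps only the annotation field and writes it out."""
    return [
        {"$project": {"annotation": 1}},
        {"$out": snapshot_collection(annotation_id)},
    ]


def copy_annotation(db, annotation_id):
    # $out replaces the target collection; only the default _id index exists afterwards.
    db["variants"].aggregate(copy_pipeline(annotation_id))


def drop_annotation_snapshot(db, annotation_id):
    # Deleting a snapshot removes exactly one collection.
    db.drop_collection(snapshot_collection(annotation_id))


def query_annotation_snapshot(db, annotation_id, mongo_ids):
    # Filter only by _id, the single index available on the snapshot collection.
    query = {"_id": {"$in": list(mongo_ids)}}
    return db[snapshot_collection(annotation_id)].find(query)
```

A region query would need a range filter on `_id` instead of `$in`, which only works if the `_id` values sort by chromosome and position. That detail depends on the storage engine's id layout and is left out of this sketch.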

Storage-Hadoop

  1. Copy annotation
    Create an MR job that copies the column A_FULL into A_1. There is no need to register these columns in Phoenix.
    An alternative here would be to copy the column into a different column family, or into a new table: <db-name>_annot.

  2. Delete annotation
    Same as when deleting files, it will be an MR job that deletes a whole column.

  3. Query annotation
    The query will be using the same row-key as for any other variant, and selecting only the required column.
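The column layout described above can be illustrated with a small in-memory Python model. This is not a MapReduce job and does not touch HBase: `rows` is a plain dictionary mapping a row key to its columns, and the function names are hypothetical. Only the column names `A_FULL` and `A_<id>` are taken from the text.

```python
CURRENT_COLUMN = "A_FULL"


def snapshot_column(annotation_id):
    """Column holding one annotation snapshot, e.g. A_1."""
    return f"A_{annotation_id}"


def copy_annotation_column(rows, annotation_id):
    """Copy A_FULL into A_<id> for every row that has an annotation."""
    target = snapshot_column(annotation_id)
    for columns in rows.values():
        if CURRENT_COLUMN in columns:
            columns[target] = columns[CURRENT_COLUMN]


def delete_annotation_column(rows, annotation_id):
    """Remove the whole A_<id> column."""
    target = snapshot_column(annotation_id)
    for columns in rows.values():
        columns.pop(target, None)


def query_annotation_column(rows, row_key, annotation_id=None):
    """Read one variant by its usual row key, selecting only the requested column."""
    column = CURRENT_COLUMN if annotation_id is None else snapshot_column(annotation_id)
    return rows.get(row_key, {}).get(column)


rows = {"1:1234:A:C": {"A_FULL": "annotation-json"}, "1:2000:G:T": {}}
copy_annotation_column(rows, 1)
print(query_annotation_column(rows, "1:1234:A:C", 1))  # annotation-json
```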

Tasks

  • Add new methods to the interface VariantAnnotationManager
  • Modify project metadata to include VariantAnnotation metadata
  • Add new REST endpoints and CLI commands
    • limit, skip, include, exclude
    • Improve descriptions
  • Add security checks
  • Decide what to do when the annotation changes.
    • Fail if annotation metadata changes and no "--overwrite-annotation" is provided
  • Use internal auto-incremental counters to assign incremental IDs
  • Save the annotation ID of each VariantAnnotation element. Return as an AdditionalAttribute
    To do this, the "currentAnnotation" should be assigned an annotation ID.
  • PENDING DECISION When annotating, annotate all variants where annotationId < currentAnnotationId, as they are old (i.e. invalid)?
  • PENDING DECISION Increment currentAnnotationId every time that annotator (CellBase) version changes, even if the annotation is not saved, to invalidate existing annotations?
  • Update documentation: Document multiple variant annotation versions #879
Storage-MongoDB
  • Extend new methods from VariantAnnotationManager
  • Allow query annotations
Storage-Hadoop
  • Extend new methods from VariantAnnotationManager
  • Allow query annotations
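The task "Fail if annotation metadata changes and no --overwrite-annotation is provided" and the first pending decision could be sketched as two small checks. Both function names are hypothetical, and `needs_reannotation` encodes only one possible answer to the pending decision (treat any variant annotated under an older annotation ID as stale). Nothing here reflects a decision that has actually been taken.

```python
def check_annotator(current_annotator, new_annotator, overwrite=False):
    """Refuse to mix annotations produced by different annotator versions."""
    changed = current_annotator is not None and current_annotator != new_annotator
    if changed and not overwrite:
        raise RuntimeError(
            "annotator metadata changed; "
            "rerun with --overwrite-annotation to replace the existing annotation"
        )


def needs_reannotation(variant_annotation_id, current_annotation_id):
    """One option for the pending decision: older annotation IDs are stale."""
    if variant_annotation_id is None:  # variant was never annotated
        return True
    return variant_annotation_id < current_annotation_id


old = {"name": "CellBase (OpenCB)", "version": "4.5.2"}
new = {"name": "CellBase (OpenCB)", "version": "4.5.3"}
check_annotator(old, new, overwrite=True)  # allowed because overwrite was requested
print(needs_reannotation(2, 3))  # True
print(needs_reannotation(3, 3))  # False
```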

Questions

  • Do we want to store the custom annotation as well? And the release?
  • How should we name the versions/snapshots? Autoincremental numeric version, user-provided snapshot name, or both?
  • Should we specify in each annotation in the main collection the version of the annotation, so it can be easily re-annotated if the annotation version changes?
  • Which permissions should be checked to access this annotation?
@j-coll j-coll added this to the v1.4.0 milestone Apr 24, 2018
@j-coll j-coll self-assigned this Apr 24, 2018
j-coll added a commit that referenced this issue Aug 15, 2018