This repository has been archived by the owner. It is now read-only.

Variant Annotation API - use cases #226

Open
sarahhunt opened this Issue Jan 14, 2015 · 2 comments

sarahhunt commented Jan 14, 2015

The question of use cases for the variant annotation API came up on yesterday's call so we thought it would be good to create an issue to share ideas.

As well as the approach discussed (querying for variants and their annotations in a dataset/annotation set between two points on a sequence), we would also be interested in using the API to accept submitted variants for which we do not hold the dataset, returning annotations based on the submitted positions/allele changes and the reference data we hold.

It may also be useful to allow some filtering of the response, such as returning only missense variants.
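To make the region query and the response filter concrete, here is a minimal in-memory sketch. It is not the API under discussion: the `AnnotatedVariant` record, its field names, and the `search_annotations` function are all hypothetical, and the effect labels are assumed to be Sequence Ontology terms.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class AnnotatedVariant:
    # Hypothetical record; field names are illustrative only.
    reference_name: str
    start: int            # 0-based, inclusive
    end: int              # 0-based, exclusive
    reference_bases: str
    alternate_bases: str
    effects: Tuple[str, ...] = ()  # assumed Sequence Ontology terms


def search_annotations(
    variants: Iterable[AnnotatedVariant],
    reference_name: str,
    start: int,
    end: int,
    effects: Optional[Iterable[str]] = None,
) -> List[AnnotatedVariant]:
    """Return variants overlapping [start, end) on one sequence.

    If `effects` is given, keep only variants annotated with at least
    one of those effect terms (e.g. only missense variants).
    """
    wanted = set(effects) if effects else None
    hits = []
    for v in variants:
        if v.reference_name != reference_name:
            continue
        # Half-open interval overlap test.
        if v.end <= start or v.start >= end:
            continue
        if wanted is not None and wanted.isdisjoint(v.effects):
            continue
        hits.append(v)
    return hits


# Made-up example records, not real data.
variants = [
    AnnotatedVariant("1", 100, 101, "A", "G", ("missense_variant",)),
    AnnotatedVariant("1", 150, 151, "C", "T", ("synonymous_variant",)),
    AnnotatedVariant("1", 900, 901, "G", "A", ("missense_variant",)),
    AnnotatedVariant("2", 120, 121, "T", "C", ("stop_gained",)),
]

in_region = search_annotations(variants, "1", 50, 200)
missense = search_annotations(variants, "1", 50, 200, effects=["missense_variant"])
```

The same filter could equally be applied server-side or client-side; the sketch only illustrates the shape of the request parameters.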

jacmarjorie commented Feb 15, 2015

The use case I am most interested in at this point would allow for the retrieval of variants that fall on certain genes. For a simple example, I have a VCF file generated via somatic mutation calling. This VCF now exists as a VariantSet in the database, which I would like to query at the gene level to satisfy the following two generic use cases:

  1. Querying a pool of annotated variants without specifying a gene of interest. I have a VariantSet that I would like to use to identify candidate cancer genes. It would be useful to pull down variants within this set that are annotated with all the information needed to generate a MAF-formatted file from the response. This would remove the need to annotate the variants before passing them to a program like MutSig, for example. While I could pull the data down using a POST /variant/search, embedding the Variant Annotations into this format would be useful. Possibly this requires some crossover with the GASearchVariantsResponse?

  2. Querying for variants associated with a specific gene. If I am interested in the mutation profile of a certain gene in a cancer type, then I can query the database for this gene (or gene set) and retrieve some statistics, as in a general query performed in cBioPortal or COSMIC. What is the proportion of mutation types that occur on this gene? What are the indel lengths? What are the calculated substitution/indel ratios for genes? Which mutations are common or different among a set of samples? What is the mutation frequency of this gene, and is it significantly higher than the background mutation rate? And other general distribution and abundance statistics related to variant classification.
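As a rough illustration of the second use case, the sketch below computes a few of these per-gene statistics from already-annotated calls. The dictionary keys (`gene`, `ref`, `alt`, `effect`) and the `summarize_gene` function are hypothetical and not part of any existing response type; the example calls are made up.

```python
from collections import Counter


def summarize_gene(variants, gene):
    """Summarize annotated variants that fall on one gene.

    `variants` is an iterable of dicts with the hypothetical keys
    'gene', 'ref', 'alt' and 'effect'.
    """
    in_gene = [v for v in variants if v["gene"] == gene]
    total = len(in_gene)
    effect_counts = Counter(v["effect"] for v in in_gene)

    # A call is treated as an indel when ref and alt differ in length;
    # everything else (including multi-base changes of equal length)
    # is counted as a substitution.
    indel_lengths = [
        abs(len(v["alt"]) - len(v["ref"]))
        for v in in_gene
        if len(v["alt"]) != len(v["ref"])
    ]
    n_indels = len(indel_lengths)
    n_substitutions = total - n_indels

    return {
        "effect_proportions": {e: c / total for e, c in effect_counts.items()},
        "indel_lengths": indel_lengths,
        # None when the gene has no indels, to avoid dividing by zero.
        "substitution_indel_ratio": (
            n_substitutions / n_indels if n_indels else None
        ),
    }


# Made-up example calls, not real data.
calls = [
    {"gene": "TP53", "ref": "C", "alt": "T", "effect": "missense_variant"},
    {"gene": "TP53", "ref": "G", "alt": "A", "effect": "stop_gained"},
    {"gene": "TP53", "ref": "CAG", "alt": "C", "effect": "frameshift_variant"},
    {"gene": "TP53", "ref": "A", "alt": "G", "effect": "missense_variant"},
    {"gene": "KRAS", "ref": "C", "alt": "A", "effect": "missense_variant"},
]

summary = summarize_gene(calls, "TP53")
```

Comparing mutation frequency against a background rate would need additional inputs (gene length, cohort size, a background model) that are outside this sketch.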

buske commented Feb 18, 2015

Member

In the Matchmaker API, we are also trying to figure out the best way to include gene-level variant information in queries (e.g. specify "stopgain mutation in SRCAP", or even "heterozygous stopgain mutation in SRCAP", without naming a particular variant).
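One possible shape for such a gene-level query is sketched below. This is not the Matchmaker API format: the `GeneLevelQuery` class, its fields, the `matches` function, and the use of `stop_gained` as the effect label are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneLevelQuery:
    # Hypothetical query: only the gene is required; unset fields
    # act as wildcards.
    gene: str
    effect: Optional[str] = None
    zygosity: Optional[str] = None


def matches(query: GeneLevelQuery, observed: dict) -> bool:
    """Check one observed variant (a dict with the hypothetical keys
    'gene', 'effect' and 'zygosity') against a gene-level query."""
    if observed["gene"] != query.gene:
        return False
    if query.effect is not None and observed["effect"] != query.effect:
        return False
    if query.zygosity is not None and observed["zygosity"] != query.zygosity:
        return False
    return True


# Made-up observation, not real data.
observed = {"gene": "SRCAP", "effect": "stop_gained", "zygosity": "heterozygous"}

any_stopgain = GeneLevelQuery(gene="SRCAP", effect="stop_gained")
het_stopgain = GeneLevelQuery(gene="SRCAP", effect="stop_gained", zygosity="heterozygous")
hom_stopgain = GeneLevelQuery(gene="SRCAP", effect="stop_gained", zygosity="homozygous")

results = [matches(q, observed) for q in (any_stopgain, het_stopgain, hom_stopgain)]
```

Leaving optional fields unset keeps the coarser query ("stopgain mutation in SRCAP") and the finer one ("heterozygous stopgain mutation in SRCAP") in the same structure.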
