CNV, STRs, somatic Var Rep Group concept needed? #39

Closed
larrybabb opened this issue Aug 1, 2018 · 6 comments

@larrybabb
Contributor

CNVs, microsatellites and a variety of somatic variant representations have given rise to the notion of defining a variant grouping: a set of variant instances (not necessarily equivalent) which can be used for annotations, assertions, interpretations, evidence collection, etc.

In our modeling to date we have intentionally been focusing on the most atomic representations, rightfully so. However, with the advent of the copy number discussion, we have introduced the notion of providing a range for the quantity of copies for copy number gain variants.

All previous examples (afaik) have focused on defining a very specific instance of a variant (i.e. allele, haplotype, genotype). We sort of got into this realm of a "set" or "group" of instances when discussing PGx haplotypes as defined by CPIC/PharmGKB, but we never really resolved the concern.

Question...
To focus on CNVs and microsatellites for now, what does it mean to specify a range of copy numbers (i.e. from 5 to 20 -or- more than 47)?

Possible answer..
A CNV instance is a specific number of copies of a given region of a chromosome: the region, together with its non-negative copy count, is the instance. So, specifying a "range" of copies is essentially saying that any one of the "instances" in this range belongs to this group.

For example,
If you wanted to specify that a given interpretation is valid for any copy number between 4 and 10 of region 1000 to 2000 on chromosome 1, then you are saying that any specific instance with between 4 and 10 copies would be covered by that interpretation.
Interpretation 1...
Variant Group : NC_000001.10:1000..2000 (4 to 10 copies)
Pathogenicity: Uncertain Significance

Interpretation 2...
Variant Group : NC_000001.10:1000..2000 (>10 copies)
Condition: Condition X
Pathogenicity: Pathogenic

Case 1 specific finding...
Variant found: NC_000001.10:1000..2000 (6 copies)
Result: interp 1 above matches and the assertion may potentially be used to inform the patient's results.

Case 2 specific finding...
Variant found: NC_000001.10:1000..2000 (20 copies)
Result: interp 2 above matches and the assertion may potentially be used to inform the patient's results.

Hopefully, this highlights the distinction between defining "variants" that are "sets" or "groups" versus "instances" and the need to be able to do both in order to collect knowledge and associate it with actual findings.
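The matching described in the example can be sketched in a few lines of Python. This is only an illustration, not a proposed schema: the class and field names are hypothetical, the sketch assumes the accession NC_000001.10 and region 1000..2000 for every record, it treats "4 to 10" as inclusive, and it requires an exact region match (a real model would have to decide how partial overlaps are handled).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CopyNumberInstance:
    """A specific finding: one region with one observed copy count."""
    accession: str
    start: int
    end: int
    copies: int


@dataclass(frozen=True)
class CopyNumberGroup:
    """A set of instances: one region with an inclusive copy-count range."""
    accession: str
    start: int
    end: int
    min_copies: int
    max_copies: Optional[int] = None  # None means no upper bound

    def contains(self, found: CopyNumberInstance) -> bool:
        # Assumption: the finding must cover exactly the same region.
        same_region = (found.accession, found.start, found.end) == (
            self.accession, self.start, self.end)
        above_min = found.copies >= self.min_copies
        below_max = self.max_copies is None or found.copies <= self.max_copies
        return same_region and above_min and below_max


# Interpretation 1: 4 to 10 copies. Interpretation 2: more than 10 copies.
interp_1 = CopyNumberGroup("NC_000001.10", 1000, 2000, min_copies=4, max_copies=10)
interp_2 = CopyNumberGroup("NC_000001.10", 1000, 2000, min_copies=11)

case_1 = CopyNumberInstance("NC_000001.10", 1000, 2000, copies=6)
case_2 = CopyNumberInstance("NC_000001.10", 1000, 2000, copies=20)

print(interp_1.contains(case_1), interp_2.contains(case_2))
```

The group and the instance are deliberately separate types here, which mirrors the separation the issue is asking about.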

This can also be applied to microsatellites, which are short tandem repeats that often get expressed as a range as well, as in the HTT gene for Huntington's disease. See ClinVar NM_002111.6(HTT):c.52CAG(27_35).

Individual assay findings produce a specific count of the tandem repeats and then determine if they fall into the variant group defined by NM_002111.6(HTT):c.52CAG(27_35) or some other group that may have a different interpretation.
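As a rough sketch of that check, the range can be pulled out of the expression and compared with an observed repeat count. The parser below is an assumption made for illustration: it only handles strings shaped exactly like the ClinVar example above, it is not a general HGVS parser, and it treats the 27_35 range as inclusive. The function and type names are hypothetical.

```python
import re
from typing import NamedTuple


class RepeatGroup(NamedTuple):
    reference: str
    position: int
    unit: str
    min_repeats: int
    max_repeats: int


# Matches only expressions shaped like "NM_002111.6(HTT):c.52CAG(27_35)".
_PATTERN = re.compile(
    r"^(?P<ref>[^:]+):c\.(?P<pos>\d+)(?P<unit>[ACGT]+)\((?P<lo>\d+)_(?P<hi>\d+)\)$"
)


def parse_repeat_group(expr: str) -> RepeatGroup:
    m = _PATTERN.match(expr)
    if m is None:
        raise ValueError(f"unsupported repeat expression: {expr!r}")
    return RepeatGroup(m["ref"], int(m["pos"]), m["unit"], int(m["lo"]), int(m["hi"]))


def in_group(group: RepeatGroup, observed_repeats: int) -> bool:
    # Assumption: both ends of the range are inclusive.
    return group.min_repeats <= observed_repeats <= group.max_repeats


group = parse_repeat_group("NM_002111.6(HTT):c.52CAG(27_35)")
print(group.unit, in_group(group, 30), in_group(group, 40))
```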

As we explore variant representations, let's determine if we need to be separating the notion of atomic, specific, instance representations from group or set representations and provide a clean separation, if so.

@larrybabb larrybabb changed the title from "Variant Group concept may be needed" to "CNV, STRs, somatic Var Rep Group concept needed?" Aug 1, 2018
@larrybabb
Contributor Author

Also, bear in mind that while this "group" concept may seem similar to genotype (or haplotype), it is different in that haplotype and genotype represent a "complete" set of variants that must all co-occur. This concept is more of an "OR" than an "AND" of grouped variants.
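The OR/AND distinction can be made concrete with plain sets. This is a minimal sketch with hypothetical variant identifiers ("var_a" and so on); it says nothing about how members would actually be identified or compared.

```python
from typing import FrozenSet


def haplotype_matches(required: FrozenSet[str], observed: FrozenSet[str]) -> bool:
    """AND semantics: every member variant must co-occur in the sample."""
    return required <= observed


def group_matches(members: FrozenSet[str], observed: FrozenSet[str]) -> bool:
    """OR semantics: observing any single member is enough."""
    return not members.isdisjoint(observed)


members = frozenset({"var_a", "var_b", "var_c"})  # hypothetical identifiers
observed = frozenset({"var_b"})

print(haplotype_matches(members, observed), group_matches(members, observed))
```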

@mbaudis
Member

mbaudis commented Aug 1, 2018

@larrybabb I think this moves into the variant annotation area, by mixing the need for variant type representation (do we have a proper name for that?) with variant instance representation.

Maybe we should just separate the ways variant types / equivalencies are represented from the instance (== case-specific) representation, into really different approaches?

So we would have:

  • representation of equivalent genomic variants (use: variant references, ClinVar, dbVar, ClinGen, VICC, BRCAx...)
  • representation of callset specific variants (use: object storage of variant calls from single samples)
  • queries against variant representation (Beacon, "Discovery search API", ...)

For instance:

  • the equivalent representation could define any deletion involving one allele of a gene as an instance of variant equivalence; and any deletion of all alleles as another equivalence
  • a callset specific variant representation would specify a deletion as a class +/- numeric allele count (with boundaries fulfilling the condition of "gene is gone or fewer")
  • a query could bracket values which would represent anything hitting the gene (we do this in Beacon...) with a class "DEL" and/or $lt copy number
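The third bullet can be sketched as a filter over callset-specific variants: anything overlapping the gene, restricted by class and optionally by a "less than" bound on copy number. This is not the Beacon API; the record type, the function, and all coordinates below are hypothetical and only illustrate the shape of such a query.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CallsetVariant:
    """One call from a single sample (callset-specific representation)."""
    start: int
    end: int
    variant_class: str   # e.g. "DEL"
    copy_number: int     # total copies observed across all alleles


def query_region(
    calls: Iterable[CallsetVariant],
    region_start: int,
    region_end: int,
    variant_class: str,
    copy_number_lt: Optional[int] = None,
) -> List[CallsetVariant]:
    """Return calls of the given class that overlap the region at all."""
    hits = []
    for call in calls:
        overlaps = call.start < region_end and call.end > region_start
        class_ok = call.variant_class == variant_class
        count_ok = copy_number_lt is None or call.copy_number < copy_number_lt
        if overlaps and class_ok and count_ok:
            hits.append(call)
    return hits


# Hypothetical gene at 5000..9000 and three hypothetical calls.
calls = [
    CallsetVariant(4000, 6000, "DEL", 1),    # partial overlap, one copy left
    CallsetVariant(4500, 9500, "DEL", 0),    # spans the gene, no copies left
    CallsetVariant(10000, 12000, "DEL", 0),  # outside the gene
]
any_deletion = query_region(calls, 5000, 9000, "DEL")
full_loss = query_region(calls, 5000, 9000, "DEL", copy_number_lt=1)
print(len(any_deletion), len(full_loss))
```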

@larrybabb
Contributor Author

larrybabb commented Aug 2, 2018

Malachi Griffith added a summarization of distinct types of variants found in CiVIC in Var Anno repo issue 13.

@larrybabb
Contributor Author

@mbaudis we will set up a call for this discussion, as it may be too complex to fully separate all the concerns effectively in an issue thread.

But to respond to the three instances at the bottom of your comment above

I’m not sure I agree that the following 2 kinds of variants in bullet 1 are equivalent
A. Deletion of one allele of a gene
B. Deletion of all alleles of a gene

The relationship between these two is a subset/superset relationship (I believe).
In any case the question I’m trying to answer is “How do we represent item B as a variant, when it appears to be a representation of a set of variants?”

The notion of a set or class of variants was recently spotlighted by Malachi on the Var Anno call as types of Var reps that would be needed to support the “subject” attribute of many of the somatic interp types.

I also see the similarity of this pattern with regard to using copy number ranges to define a set of CNVs which all share a common interp.

Finally I would say that I agree with your third bullet with regard to queries needing the ability to query Var class types and/or copy count quantities.

However, we haven’t yet demonstrated how these kinds of qualifying attributes will be bundled with variant concepts needed to build objects that can support the role of Var Anno “subjects”.

@mbaudis
Member

mbaudis commented Aug 3, 2018

@larrybabb I tried to put too much in the sentence (bullet 1); my note was on the "any deletion of one", and the "any deletion of all" as two different types of equivalence.

In imprecise CNV reports (i.e. w/o phasing), the homozygous deletion would "self compose":

#####___________#####
#######_______#######
222221100000001122222

Without full allelic reconstruction it would not be clear how the 0 comes about; it could be

#####_________#######
#######_________#####
222221100000001122222

... and so would be reported as 3 different variants (11, 0000000, 11). So this is a case where we get some meaningful outcome description (yes, there is a homozygous deletion) w/o knowing about the specific alleles.
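The two diagrams can be checked with a small sketch that sums the alleles position by position and then splits the profile into runs. It assumes the ASCII convention used above ("#" for present, "_" for deleted) and two alleles of equal length; the function names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def copy_number_profile(allele_a: str, allele_b: str) -> str:
    """Per-position copy count for two alleles drawn with '#' (present) and '_' (deleted)."""
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must have the same length")
    return "".join(str((a == "#") + (b == "#")) for a, b in zip(allele_a, allele_b))


def segments(profile: str) -> List[Tuple[str, int]]:
    """Run-length encode the profile into (copy count, length) segments."""
    return [(value, len(list(run))) for value, run in groupby(profile)]


# First diagram: one long and one short deletion, nested.
nested = copy_number_profile("#####___________#####", "#######_______#######")
# Second diagram: two equally long deletions, staggered.
staggered = copy_number_profile("#####_________#######", "#######_________#####")

print(nested)
print(nested == staggered)
print(segments(nested))
```

Both allele configurations collapse to the same profile, and the run-length step yields five segments, of which the three non-diploid ones (11, 0000000, 11) are what an unphased CNV report would list.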

See example of array based data here.

But such a (widespread, simplistic) model does not cover the composition of multiple variants, just reports the outcome of this composition.

The problem is that we have to accommodate both; but maybe not necessarily in all scenarios. And maybe really thinking this through could help to reduce complexity for each of those implementations.

@reece reece transferred this issue from ga4gh/variant-representation May 3, 2019
@reece reece added the CNV label May 6, 2019
@ga4gh ga4gh locked and limited conversation to collaborators May 6, 2019
@reece
Member

reece commented May 6, 2019

Please use #46 for a consolidated discussion of CNV requirements.

@reece reece closed this as completed May 6, 2019