Replies: 1 comment
Thanks for this well-thought-out post, Larry. It took me a while to get around to responding to this between PRC panic, but I don't want you to think that we're ignoring this. I agree with your characterization of categorical variation in the opening 4 paragraphs. On track with you there. In fact, I agree with most of what you lay out here, particularly your conclusions:
Yes, 100%.
Quite possibly, yes. I agree!

So digging into the meat of your proposal, if I understand you correctly, you are making two (not necessarily related) proposals here. The first proposal is to explicitly introduce top-down subtypes of

The second proposal is, in essence, to break up the allele constraint into two separate constraints, one which simply handles the sequence location, and another that handles the sequence state. Something sort of like this.

This makes the constraints maximally simple, and is actually precisely what I originally suggested, as seen in

The reason we didn't go this route, if memory serves, was due to wanting to build in compatibility with VRS, where there is already a


The concept of a Categorical Variant originated with the acknowledgement that genomic knowledge associated with variants requires the ability to associate that knowledge with one or more contextual variants (i.e. a specific variant change at a specific location on a specific reference sequence).

A typical case is the knowledge in ClinVar, where pathogenic classifications are associated with a specific change on a genomic variant. In actuality, the rare disease community that uses ClinVar considers all the contextual variants that originate from a single defining allele context. In ClinVar, the policy is to define all variants on the genomic build 38 (GRCh38, hg38) contextual location and change whenever possible. From that defining genomic build 38 context, ClinVar lifts over to build 37 (and sometimes 36) to infer that their contextual variant representations are also associated with the variant classification. Additionally, ClinVar derives all the transcript forms from RefSeq that align with the defining allele to provide additional contextual members of its representation of a ClinVar variant.

This typical case is also used in other resources to provide what we refer to as a Canonical Allele, a pattern or policy for defining the membership of possible contextual variants that satisfy the constraints of that policy.

As Cat-VRS has started to dig into the concept of a Categorical Variant, the idea of creating filters or constraints has become the primary principle behind how the Cat-VRS spec will provide a standard for all types of Categorical Variant needed by the community. Each constraint is analogous to a function that can filter down the membership within a Categorical Variant, so that we can provide a set of ingredients that lets implementers create a recipe resulting in a Categorical Variant design that satisfies their needs. While there may be some standard or common patterns whereby recipes can be prebuilt into out-of-the-box Categorical Variants, the option to custom build is a requirement due to the dynamic nature of how the community needs to represent Categorical Variants for the purpose of associating discoveries and knowledge to share.

So how do we make this flexible design programmatically pragmatic and adoptable?
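To make the "constraint as a filter function" analogy concrete, here is a rough sketch of how it could look in code. Everything in it is hypothetical and for illustration only: `Variant`, `ConstraintFn`, `on_sequence`, `has_state` and `apply_recipe` are made-up names, not Cat-VRS or VRS classes, and the sequence identifiers are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Variant:
    """Hypothetical stand-in for a contextual variant (not a VRS class)."""
    sequence: str  # placeholder reference sequence identifier
    start: int     # assumed 0-based, inclusive
    end: int       # assumed exclusive
    state: str     # residues observed at the location


# A constraint is modelled as a predicate over a single variant.
ConstraintFn = Callable[[Variant], bool]


def on_sequence(sequence: str) -> ConstraintFn:
    """Keep only variants expressed on the given reference sequence."""
    return lambda v: v.sequence == sequence


def has_state(state: str) -> ConstraintFn:
    """Keep only variants whose state equals the given residues."""
    return lambda v: v.state == state


def apply_recipe(candidates: Iterable[Variant],
                 recipe: List[ConstraintFn]) -> List[Variant]:
    """A recipe is a list of constraints; a member must satisfy all of them."""
    return [v for v in candidates if all(c(v) for c in recipe)]


candidates = [
    Variant("seq-A", 10, 11, "T"),
    Variant("seq-A", 10, 11, "G"),
    Variant("seq-B", 10, 11, "T"),
]
members = apply_recipe(candidates, [on_sequence("seq-A"), has_state("T")])
```

Each ingredient stays trivially small, and the recipe is just the list of ingredients that get applied together. Note that the sketch has to be handed `candidates` from somewhere, which is exactly the question of the starting set.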
I suggest that we define how `Constraints`, or functions to filter members, should be applied in a system. In my experience, a system that requires me to constrain or filter a set of data must first have a basis, a set of data on which to apply those constraints. Therefore, I'd like to clarify what the starting set is on which constraints are applied within the Categorical Variant model.

I think the vast majority of Categorical Variants are going to have a known sequence location as a starting point. I will also acknowledge that there may be different kinds of categorical variants for which sequence locations may not make sense. I would start by saying that if a sequence location can be used to set the basis or foundation of membership for a categorical variant, then that categorical variant would only have members that are Molecular Variants, since `SequenceLocation` is tightly coupled with the concept of Molecular Variation. I would need domain experts to assist in defining other kinds of Categorical Variants, but I can imagine there will be others.

Taking this notion of a basis that is foundational to all Categorical Variants and applying it to our current Canonical Allele type of Categorical Variant, I would re-design it along the following lines:

1. Take the `SequenceLocation` from the `DefiningAlleleConstraint` and make it the basis for a subtype of `CategoricalVariant`, let's call it `CategoricalLocationVariant` (or something more appropriate as we get a clearer picture of this design). Within the `CategoricalLocationVariant` there would be a `definingLocation` attribute that would take a `SequenceLocation`. This attribute would basically scope the region in which all eventual members must exist and which all recipe `Constraints` use to filter down the list of members to the final result.
2. Create a version of the `DefiningAlleleConstraint` that is only based on the `state` of the variant at the `definingLocation`, called `AlleleStateConstraint`. It would be a super simple constraint (as all constraints should be IMO) which would specify the state of the residues at the `definingLocation`.
3. Redefine the `CanonicalAllele` recipe to be based on this new `CategoricalLocationVariant` with a single `AlleleStateConstraint`.
4. Decide how the `relations` concept should be handled in this new design. Is it a foundational aspect of how the overarching `CategoricalVariant` behaves, or is it a constraint? One could argue that it isn't really a constraint since it isn't limiting the set of data, but instead defining the scope of variants that can be limited by the constraints. In other words, having both the `definingLocation` and `relations` as part of the `CategoricalLocationVariant` would give the constraints the starting dataset on which to begin constraining.

To test this proposal out we would need to apply it to the other types of `CategoricalVariants` that we've found in practice from our registered implementers. For example, Categorical CNVs, Protein Sequence Expressions, ....

The main idea to focus on to test this design out is to verify whether it makes sense to define subclasses of `CategoricalVariant` that let users define the scope or dataset starting point on which the one or more `Constraints` in the categorical variant recipe are applied to determine the final result set of variants (or members). With this approach the programming would then be able to emulate the functions needed to first define a sequence, cytoband, gene, etc. region that is then used as input to the constraints' functional filters to arrive at the final outcome.

While it is understood that the purpose of a Categorical Variant is not always to precisely define every member in a set, it should always be possible to take one or more variants and test them against a `CategoricalVariant` recipe to determine whether they meet the scope and constraints it defines.

Finally, I would like to promote the idea that all Constraints be designed to be as simple as possible. Contriving complex constraints early on is an indicator that the design may be suboptimal, and would also raise the barrier to adoption for those crafting the functions needed to share in a standard way.
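As a rough sketch of the shape I have in mind (not a spec proposal), the scope-plus-constraints idea and the membership test could look something like the following. The classes are simplified stand-ins I made up for illustration: `SequenceLocation` and `Allele` here are not the real VRS models, `defining_location` corresponds to the `definingLocation` attribute above, and `in_scope` and `matches` are hypothetical method names. The sketch assumes scope means an exact match on the defining location and leaves `relations` out entirely, since how that should work is still an open question.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass(frozen=True)
class SequenceLocation:
    """Simplified stand-in for a sequence location (not the VRS model)."""
    sequence: str  # placeholder reference sequence identifier
    start: int
    end: int


@dataclass(frozen=True)
class Allele:
    """Simplified stand-in for a contextual allele."""
    location: SequenceLocation
    state: str


class Constraint(Protocol):
    """Any object that can say whether an allele satisfies it."""
    def is_satisfied_by(self, allele: Allele) -> bool: ...


@dataclass(frozen=True)
class AlleleStateConstraint:
    """Only checks the residues; the location is handled by the scope."""
    state: str

    def is_satisfied_by(self, allele: Allele) -> bool:
        return allele.state == self.state


@dataclass
class CategoricalLocationVariant:
    """Scope (defining location) plus a recipe of simple constraints."""
    defining_location: SequenceLocation
    constraints: List[Constraint] = field(default_factory=list)

    def in_scope(self, allele: Allele) -> bool:
        # Assumption: scope is an exact location match; `relations`
        # (liftover, transcript projection, ...) is not modelled here.
        return allele.location == self.defining_location

    def matches(self, allele: Allele) -> bool:
        # Membership test: inside the scope and passing every constraint.
        return self.in_scope(allele) and all(
            c.is_satisfied_by(allele) for c in self.constraints
        )


loc = SequenceLocation("seq-A", 100, 101)
canonical_allele = CategoricalLocationVariant(loc, [AlleleStateConstraint("T")])
print(canonical_allele.matches(Allele(loc, "T")))  # True
print(canonical_allele.matches(Allele(loc, "G")))  # False
```

The point of the sketch is the separation of concerns: the subclass owns the starting dataset, each constraint only narrows it, and testing a candidate variant never requires enumerating every member.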