Add support for mito m. in translate_from #362
Meant to make this a draft PR. I'll try again.
@larrybabb I got you
@ahwagner IIRC you had said you had comments that were not submitted. Did you want to review?
A few minor comments included, but my greatest concern is how circular sequences are handled; SequenceLocation requires start <= end, but this is based on sequence linearity assumptions. Should we update the VRS model to allow end < start for circular sequences? If not, how do we plan to handle variants that span the 0 coordinate?
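To make the question concrete, here is a minimal sketch of what allowing end < start could mean on a circular sequence. It is an illustration only: `CircularInterval` and its methods are hypothetical names, not part of the VRS model or vrs-python, and it assumes interbase coordinates and the 16,569 bp length of NC_012920.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CircularInterval:
    """Hypothetical interbase interval on a circular sequence (not a VRS class)."""

    start: int
    end: int
    sequence_length: int

    def __post_init__(self):
        # Both coordinates must fall on the sequence, whatever their order.
        for pos in (self.start, self.end):
            if not 0 <= pos <= self.sequence_length:
                raise ValueError(f"coordinate {pos} is outside the sequence")

    @property
    def spans_origin(self) -> bool:
        # end < start is read as "wraps past the 0 coordinate".
        return self.end < self.start

    @property
    def length(self) -> int:
        if self.spans_origin:
            return (self.sequence_length - self.start) + self.end
        return self.end - self.start

    def linear_parts(self):
        """Split into one or two intervals that each satisfy start <= end."""
        if self.spans_origin:
            return [(self.start, self.sequence_length), (0, self.end)]
        return [(self.start, self.end)]


MT_LENGTH = 16569  # length of NC_012920.1
wrapped = CircularInterval(start=16565, end=3, sequence_length=MT_LENGTH)
print(wrapped.length, wrapped.linear_parts())
```

Under this reading, a location with start <= end keeps its current meaning, so only origin-spanning locations would need the relaxed constraint.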
src/ga4gh/vrs/extras/translator.py
@@ -572,7 +578,7 @@ def _to_hgvs(self, vo, namespace="refseq"):
     if ns.startswith("GRC") and namespace is None:
         continue

-    if not (any(a.startswith(pfx) for pfx in ("NM", "NP", "NC", "NG"))):
+    if not (any(a.startswith(pfx) for pfx in ("NM", "NP", "NC", "NG", "NR", "NW"))):
Do we not want to support X{MRP}_ accession prefixes?
Yes, we should. I just hit these use cases in ClinVar. There aren't many, and the vrs-python seqrepo does have the XM_... accessions available.
src/ga4gh/vrs/extras/translator.py
@@ -580,7 +586,7 @@ def _to_hgvs(self, vo, namespace="refseq"):
     if ns.startswith("GRC") and namespace is None:
         continue

-    if not (any(a.startswith(pfx) for pfx in ("NM", "NP", "NC", "NG"))):
+    if not (any(a.startswith(pfx) for pfx in ("NM", "NP", "NC", "NG", "NR", "NW"))):
What are NW accessions? I found one associated with a linear Zebrafish reference sequence, but have a hard time finding documentation on what this prefix means and how it should be interpreted.
genomic contigs or scaffolds - NT_010718.17, NW_003315950.2
There are 182 HGVS expressions in ClinVar that have NW_ associated expressions. Here are a couple, if it helps to discern what they are...
https://www.ncbi.nlm.nih.gov/clinvar/variation/146167
(patch scaffolding?)
@larrybabb should we also add "NT"?
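For reference, a rough sketch of what the prefix check might look like if the prefixes raised in this thread (NT and the X{MRP}_ model accessions) were added as well. The constant and function names are hypothetical, the list is an assumption rather than what the PR merges, and the underscore is included in each prefix to make the match slightly stricter than the current code.

```python
# Hypothetical prefix list: the diff above only adds "NR" and "NW";
# "NT_", "XM_", "XR_" and "XP_" are the additions discussed in review.
REFSEQ_PREFIXES = (
    "NM_", "NP_", "NC_", "NG_", "NR_", "NW_",
    "NT_", "XM_", "XR_", "XP_",
)


def is_supported_refseq(accession: str) -> bool:
    """Return True if the accession starts with one of the listed prefixes."""
    # str.startswith accepts a tuple, so no any(...) generator is needed.
    return accession.startswith(REFSEQ_PREFIXES)


print(is_supported_refseq("NW_003315950.2"))  # accession quoted earlier in this thread
```

Keeping the prefixes in one tuple would also make it easier to review future additions in a single place.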
My own $0.02 on the circular sequence question. And I'm really just considering human mitochondrial variants here. Supporting circular reference sequences and allowing locations that span the zero point seems like a solution in search of a problem. My reasons for thinking this:
For example, I ran this test in Variant Validator:
And from a human biology standpoint, I would think that indels that disrupt the zero point and replication would be highly unlikely to be compatible with life.
@ehclark I agree completely with your points. I also think we should state this clearly as a policy decision and address it in the future if/when the need is presented by the community and the priority is high enough.
Respectfully, I disagree about this not being a problem. Spanning the mitochondrial origin is something that resources like gnomAD do struggle with. For example, from the gnomAD mitochondrial origin calling documentation:
I also want to disentangle the concern I raised (how do we represent these variants) from the concerns raised by @ehclark (how do you normalize these variants). Whether or not we want to add support for normalization over the origin, the fact is that HGVS expressions (such as the submitted
@ahwagner The main point I am trying to make is that variants spanning the origin point don't actually seem to exist in the real world (at least based on current data), and therefore expending engineering effort and increasing complexity to support them is not worthwhile. That said, I think your point about separating the various concerns here makes sense. To build on your points, it seems there are three basic elements:
Most of the engineering complexity is wrapped up in the normalization element. I interpret your comment to mean that VRS model representation is a must-have, because it is necessary for the model to accurately represent circular sequence locations. And it seems that supporting the representation would not be complicated. I would also note that, as of right now, vrs-python will happily create VRS objects/ids for mitochondrial variants when converting from gnomAD format (which is what the VCF annotator uses), assuming a linear sequence. So if the VRS model changes to support sequence locations with start > end, it would be a breaking change. Getting this resolved would therefore be high on my priority list.
@ahwagner would you mind reviewing this ASAP? I reverted the secondary concern I found regarding the lack of support of
Fixes bug #360
Added missing support for m. mitochondrial translation from hgvs, beacon, gnomad and spdi formats. Also added mapping of NR_ refseq accessions to n. nomenclature.
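As an illustration of the mapping described above, here is a minimal sketch of choosing an HGVS coordinate type from a RefSeq accession. This is not the vrs-python implementation: `hgvs_type_for`, `MITO_ACCESSIONS` and `PREFIX_TO_HGVS_TYPE` are hypothetical names, and treating only NC_012920.1 as mitochondrial is an assumption limited to the human reference.

```python
# Assumption: only the human mitochondrial reference is treated as "m.".
MITO_ACCESSIONS = {"NC_012920.1"}

# RefSeq accession prefix -> HGVS coordinate type (hypothetical table).
PREFIX_TO_HGVS_TYPE = {
    "NC_": "g", "NG_": "g", "NT_": "g", "NW_": "g",  # genomic
    "NM_": "c", "XM_": "c",                          # coding transcripts
    "NR_": "n", "XR_": "n",                          # non-coding transcripts
    "NP_": "p", "XP_": "p",                          # proteins
}


def hgvs_type_for(accession: str) -> str:
    """Return the HGVS type letter ("g", "m", "c", "n" or "p") for an accession."""
    # Check mitochondrial accessions first, since they share the NC_ prefix.
    if accession in MITO_ACCESSIONS:
        return "m"
    for prefix, hgvs_type in PREFIX_TO_HGVS_TYPE.items():
        if accession.startswith(prefix):
            return hgvs_type
    raise ValueError(f"unsupported accession prefix: {accession}")


print(hgvs_type_for("NC_012920.1"), hgvs_type_for("NR_046018.2"))
```

The mitochondrial check has to come before the prefix lookup, because a prefix-only table would label NC_012920.1 as g.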