"No transcripts found" in a region that spans several genes #399

Open
leicray opened this issue Jul 21, 2022 · 12 comments
@leicray
Contributor

leicray commented Jul 21, 2022

Describe the bug
Submission of a large deletion description generates a "No transcripts found that fully overlap the described variation in the genomic sequence" error even though the deleted region spans several genes.

To Reproduce
Steps to reproduce the behaviour:

  1. Enter the variant description NC_000002.11:g.189401615_190055558del
  2. Click on the "GRCh37 (hg19, build19)" radio button
  3. Submit
  4. See error message "No transcripts found that fully overlap the described variation in the genomic sequence"

Expected behaviour
The region spanned by the deletion covers several genes, including GULP1, DIRC1, COL3A1 and COL5A2. Hence, the error message is incorrect.

View in UCSC Genome browser (GRCh37)

@ifokkema
Collaborator

I suppose this may be a duplicate of #333?
That's the issue with not getting any mapping info back for partially intergenic variants. Your deletion falls into this category as well. Since the HVNC can't decide on the nomenclature to use for variants that extend beyond the UTR, VV can't return any mappings for your deletion either.

@leicray
Contributor Author

leicray commented Jul 22, 2022

The equivalent deletion description in the context of GRCh38 is NC_000002.12:g.188536886_189190829del but that too returns the "No transcripts found that fully overlap the described variation in the genomic sequence" error message.
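As a quick consistency check, the GRCh37 and GRCh38 descriptions can be compared by span length. The sketch below is a minimal regex-based parser for simple `g.START_ENDdel` descriptions only (it is not a general HGVS parser, and the function name `deletion_span` is hypothetical). Both descriptions give a 653,944 bp deletion, as expected for equivalent descriptions on the two builds.

```python
import re

# Matches only simple genomic range deletions such as
# "NC_000002.11:g.189401615_190055558del"; not a general HGVS parser.
DEL_PATTERN = re.compile(r"^(?P<ac>NC_\d+\.\d+):g\.(?P<start>\d+)_(?P<end>\d+)del$")


def deletion_span(description: str) -> tuple[str, int, int, int]:
    """Return (accession, start, end, length) for a simple g. range deletion."""
    match = DEL_PATTERN.match(description)
    if match is None:
        raise ValueError(f"unsupported description: {description}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError("start must not exceed end")
    # HGVS positions are 1-based and inclusive, hence the +1.
    return match["ac"], start, end, end - start + 1


for desc in ("NC_000002.11:g.189401615_190055558del",
             "NC_000002.12:g.188536886_189190829del"):
    ac, start, end, length = deletion_span(desc)
    print(f"{ac}: {length:,} bp deleted")
```

The same helper gives 2 bp, 1,002 bp and 11,002 bp for the three NC_000017.10 examples discussed below.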

The GRCh37 variant description NC_000017.10:g.48275363_48275364del validates correctly, and the deletion is shown to be within the COL1A1 gene, as expected.

The GRCh37 variant description NC_000017.10:g.48275363_48276364del also validates.

However, the GRCh37 variant description NC_000017.10:g.48275363_48286364del fails to validate and also returns the "No transcripts found that fully overlap the described variation in the genomic sequence" error message.

The length of NC_000017.10 is 81195210 bp, which means that the start and end points of the deletion are valid positions within the NC_000017.10 reference sequence.

The evidence points to Validator not being able to project large deletions onto transcripts.

@leicray
Contributor Author

leicray commented Jul 22, 2022

You are quite correct; this is a duplicate of #333.

The solution for now is probably to amend the error message, which is somewhat misleading even though the wording is factually correct. A revised error message might be:

"The variant description spans regions that are intergenic. The HGVS variant nomenclature does not allow intergenic region alterations to be described in the context of transcripts."
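The proposed wording implies separating two cases that currently share one message: no transcript overlaps the variant at all, versus transcripts that overlap it without fully containing it (i.e. the variant runs into intergenic sequence). A minimal sketch of that decision follows, assuming transcripts are available as hypothetical `Transcript(tx_id, start, end)` records in the same 1-based genomic coordinates as the variant; it is an illustration only, not VariantValidator's actual code.

```python
from typing import Iterable, NamedTuple, Optional


class Transcript(NamedTuple):
    # Hypothetical record: genomic span of a transcript, 1-based inclusive.
    tx_id: str
    start: int
    end: int


NO_OVERLAP_MSG = ("No transcripts found that fully overlap the described "
                  "variation in the genomic sequence")
INTERGENIC_MSG = ("The variant description spans regions that are intergenic. "
                  "The HGVS variant nomenclature does not allow intergenic "
                  "region alterations to be described in the context of "
                  "transcripts.")


def classify(var_start: int, var_end: int,
             transcripts: Iterable[Transcript]) -> tuple[list[str], list[str]]:
    """Return (containing, overlapping_only) transcript IDs."""
    containing, overlapping_only = [], []
    for tx in transcripts:
        if tx.start <= var_start and var_end <= tx.end:
            # Variant lies entirely within the transcript: mappable.
            containing.append(tx.tx_id)
        elif tx.start <= var_end and var_start <= tx.end:
            # Variant overlaps the transcript but extends beyond it.
            overlapping_only.append(tx.tx_id)
    return containing, overlapping_only


def error_message(var_start: int, var_end: int,
                  transcripts: Iterable[Transcript]) -> Optional[str]:
    """Return None if some transcript fully contains the variant, else a message."""
    containing, overlapping_only = classify(var_start, var_end, transcripts)
    if containing:
        return None
    return INTERGENIC_MSG if overlapping_only else NO_OVERLAP_MSG
```

With a made-up transcript spanning positions 1000 to 5000, a variant at 1200_1300 maps (no message), 4000_6000 gets the intergenic message, and 7000_8000 keeps the original message. Keeping the original text for the no-overlap case is a design choice; a fully intergenic variant could equally get the new wording.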

@ifokkema
Collaborator

I like that solution, for the short term. In the long term, we really need a proper solution. I really feel that, since there is a solution for intronic variants (which also have positions that are not on the transcript), this can be solved as well. Since I wasn't selected for the HVNC, I can't do much except remind you now and then that it needs to be looked at 😉

@leicray
Contributor Author

leicray commented Jul 22, 2022

An additional message could be posted to indicate any genes that fall within the span of the deletion (or duplication):

"The region specified in the variant description includes all, or part, of the following genes: Gene1, Gene2, Gene3, etc."

The gene symbols would be listed in gene-order on the chromosome.
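A minimal sketch of the suggested message, assuming gene annotations are available as hypothetical `Gene(symbol, start, end)` records on the relevant chromosome (1-based, inclusive). Any gene that overlaps the region, fully or in part, is included, and symbols are sorted by start position to give gene order. The symbols and coordinates in the example are placeholders, not real gene positions.

```python
from typing import Iterable, NamedTuple, Optional


class Gene(NamedTuple):
    # Hypothetical annotation record on a single chromosome, 1-based inclusive.
    symbol: str
    start: int
    end: int


def genes_in_region(start: int, end: int, genes: Iterable[Gene]) -> list[str]:
    """Symbols of genes overlapping [start, end], in chromosomal order."""
    hits = [g for g in genes if g.start <= end and start <= g.end]
    hits.sort(key=lambda g: (g.start, g.end))
    return [g.symbol for g in hits]


def region_message(start: int, end: int, genes: Iterable[Gene]) -> Optional[str]:
    """Build the proposed message, or None if no gene overlaps the region."""
    symbols = genes_in_region(start, end, genes)
    if not symbols:
        return None
    return ("The region specified in the variant description includes all, "
            "or part, of the following genes: " + ", ".join(symbols) + ".")


# Placeholder annotations, deliberately out of order to show the sort.
genes = [Gene("GENE_C", 900, 1200), Gene("GENE_A", 100, 300),
         Gene("GENE_B", 400, 600), Gene("GENE_D", 5000, 6000)]
print(region_message(250, 1000, genes))
```

Here GENE_A and GENE_C are only partly inside positions 250 to 1000 and GENE_D lies outside, so the message lists GENE_A, GENE_B, GENE_C in that order.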

@ifokkema
Collaborator

That wouldn't be a replacement for having a proper variant description, but it could be a start for other users, yes. For LOVD, however, we'd need a variant description on a specific transcript to be able to show that there is a mapping to the specified gene(s).

@leicray
Contributor Author

leicray commented Jul 25, 2022

I agree with you that my suggestion would not be a replacement. I am just saying that listing the genes within the dup/del would be better, for now, than saying "No transcripts found that fully overlap the described variation in the genomic sequence".

@ifokkema
Collaborator

That's absolutely true!

@Peter-J-Freeman
Collaborator

Links to #108

@Peter-J-Freeman
Collaborator

I think that this would be a vv_hgvs/VVTA issue. What we would need to do is add a SQL query to vv_hgvs that can search VVTA for all genes that are fully encompassed by a span, or perhaps also for genes that only overlap it?

Linking a feature request in VVTA for John to think about.
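To illustrate the two options raised here, the sketch below uses Python's built-in `sqlite3` with a hypothetical `genes` table; the table and column names are invented and do not reflect the actual VVTA schema or vv_hgvs code. "Fully encompassed" requires the gene to lie entirely within the span, whereas "overlap" also catches genes cut by either breakpoint, so the overlap query always returns a superset of the encompassed one.

```python
import sqlite3

# Hypothetical schema and placeholder rows; the real VVTA tables differ.
# "end_pos" avoids the SQL keyword END as a column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (symbol TEXT, chrom TEXT, start INTEGER, end_pos INTEGER)")
conn.executemany(
    "INSERT INTO genes VALUES (?, ?, ?, ?)",
    [("GENE_A", "2", 100, 300), ("GENE_B", "2", 400, 600),
     ("GENE_C", "2", 900, 1200), ("GENE_D", "17", 400, 600)],
)

# Genes lying entirely inside the span.
ENCOMPASSED = """
    SELECT symbol FROM genes
    WHERE chrom = :chrom AND start >= :start AND end_pos <= :end
    ORDER BY start
"""

# Genes sharing at least one base with the span (includes the above).
OVERLAPPING = """
    SELECT symbol FROM genes
    WHERE chrom = :chrom AND start <= :end AND end_pos >= :start
    ORDER BY start
"""

span = {"chrom": "2", "start": 250, "end": 1000}
encompassed = [row[0] for row in conn.execute(ENCOMPASSED, span)]
overlapping = [row[0] for row in conn.execute(OVERLAPPING, span)]
print(encompassed)
print(overlapping)
```

For the placeholder span on chromosome 2 from 250 to 1000, only GENE_B is fully encompassed, while GENE_A, GENE_B and GENE_C all overlap; GENE_D is excluded because it sits on a different chromosome.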

@Peter-J-Freeman
Collaborator

Once vv_hgvs / VVTA has been updated as necessary, I can plug it into VV.

Peter-J-Freeman self-assigned this Aug 16, 2022
@leicray
Contributor Author

leicray commented Aug 16, 2022

Sounds good.
