bidirectional_promoter_lncRNA and case sensitivity #155

Closed
joaoe opened this issue Jul 20, 2016 · 2 comments

joaoe commented Jul 20, 2016

I have the following VCF file:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT
chr11   119477254   .   A   G   262.77  .   .   .

and am using release 84 of the mouse reference genome:

pyensembl install --release 84 --species mus_musculus

So, a simple test case is:

import pyensembl
genome = pyensembl.genome_for_reference_name("GRCm38")
print genome.transcripts_at_locus(11, 119477254, 119477256)

and I get the exception:

Traceback (most recent call last):
  File "testcase", line 6, in <module>
    print genome.transcripts_at_locus(11, 119477254, 119477256)
  File "/<home>/.local/lib/python2.7/site-packages/pyensembl/genome.py", line 466, in transcripts_at_locus
    for transcript_id in transcript_ids
  File "/<home>/.local/lib/python2.7/site-packages/pyensembl/genome.py", line 832, in transcript_by_id
    require_valid_biotype=("transcript_biotype" in field_names))
  File "/<home>/.local/lib/python2.7/site-packages/pyensembl/transcript.py", line 48, in __init__
    biotype, transcript_id, transcript_name))
ValueError: Invalid biotype 'bidirectional_promoter_lncRNA' for transcript with ID=ENSMUST00000207133, name=RP23-25M3.6-002

Looking at pyensembl/biotypes.py, I see "bidirectional_promoter_lncrna" written entirely in lower case, so is_valid_biotype() fails for 'bidirectional_promoter_lncRNA'. Should the function be case insensitive?
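
To make the question concrete, here is a minimal sketch of what a case-insensitive check could look like. It assumes the valid biotypes are stored as a set of lower-case strings; `VALID_BIOTYPES` and its contents are hypothetical stand-ins, not pyensembl's actual code.

```python
# Hypothetical stand-in for the list kept in pyensembl/biotypes.py;
# only a few lower-case entries are shown for illustration.
VALID_BIOTYPES = {
    "protein_coding",
    "lincrna",
    "bidirectional_promoter_lncrna",
}


def is_valid_biotype(biotype):
    """Return True if the biotype is known, ignoring letter case."""
    return biotype.lower() in VALID_BIOTYPES


# The mixed-case spelling from the GTF now matches the lower-case entry.
print(is_valid_biotype("bidirectional_promoter_lncRNA"))
```

Lower-casing the input before the lookup leaves the stored list unchanged, so only the comparison becomes case insensitive.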

iskandr (Contributor) commented Sep 12, 2016

I assumed that the biotypes would be identical (including case) across different GTF files, but apparently I was wrong. Do you think I should make is_valid_biotype case insensitive?

Alternatively, I can just get rid of all this biotype-checking code and treat the biotype as an unstructured string.

joaoe (Author) commented Sep 12, 2016

I'm not an expert in biotypes, but isn't the biotype what gets checked to decide whether a variant affects a coding transcript? I have no clue how often biotypes are likely to change. Looking at what they mean, I think case insensitivity makes sense. If an unknown biotype does turn up, it's best to just log a warning and ignore the transcript during annotation. That way, pyensembl does not break and can be gracefully updated later.
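
A rough sketch of the warn-and-skip idea, under the assumption that transcripts arrive as (transcript_id, biotype) pairs. The names `KNOWN_BIOTYPES` and `filter_known_biotypes` are made up for illustration and are not part of pyensembl.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical set of recognised biotypes, stored in lower case.
KNOWN_BIOTYPES = {"protein_coding", "bidirectional_promoter_lncrna"}


def filter_known_biotypes(records):
    """Keep (transcript_id, biotype) pairs with a known biotype.

    Unknown biotypes are logged as warnings and skipped instead of
    raising ValueError, so one new biotype does not abort the run.
    """
    kept = []
    for transcript_id, biotype in records:
        if biotype.lower() in KNOWN_BIOTYPES:
            kept.append((transcript_id, biotype))
        else:
            logger.warning(
                "Skipping transcript %s: unknown biotype %r",
                transcript_id,
                biotype,
            )
    return kept


records = [
    ("ENSMUST00000207133", "bidirectional_promoter_lncRNA"),
    ("TX_EXAMPLE", "some_future_biotype"),  # invented ID and biotype
]
print(filter_known_biotypes(records))
```

The trade-off is that skipped transcripts silently drop out of the annotation unless someone reads the log, so the warning should name both the transcript and the offending biotype.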

PS: the Ensembl release number was bumped to 85.
