Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValueError: could not create iterator for region 'himom' #1263

Open
dbolser opened this issue Jan 18, 2024 · 1 comment
Open

ValueError: could not create iterator for region 'himom' #1263

dbolser opened this issue Jan 18, 2024 · 1 comment

Comments

@dbolser
Copy link

dbolser commented Jan 18, 2024

Not sure if this has been discussed before, but the following was unexpected for me:

tbx = pysam.TabixFile("whatever.tab.gz")

tbx.contigs
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', ...]

cons_tbx.fetch('himom')
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "pysam/libctabix.pyx", line 507, in pysam.libctabix.TabixFile.fetch
ValueError: could not create iterator for region 'himom'

This is a surprise because:

$ tabix whatever.tab.gz 1:1000000-1100000 > /dev/null && echo 'kk'
kk

$ tabix whatever.tab.gz himom:1000000-1100000 > /dev/null && echo 'kk'
kk

and:

tbx = pysam.TabixFile("whatever.tab.gz")

len(list(tbx.fetch("1:1000000-1100000")))
100001

len(list(tbx.fetch("1:99991000000-99991100000")))
0

len(list(tbx.fetch("himom:99991000000-99991100000")))
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "pysam/libctabix.pyx", line 507, in pysam.libctabix.TabixFile.fetch
ValueError: could not create iterator for region 'himom:99991000000-99991100000'

If my code handles zero results from a given range, then why should it fall over just because a contig isn't found? The cli version doesn't fall over...

I know this is minor, just thought I'd make sure it had been considered properly...

@dbolser
Copy link
Author

dbolser commented Jan 25, 2024

I just found another example of this that's a bit more serious:

$ tabix genes-sorted.tsv.gz 1:100001-120001 && echo "OK"
1       89295   133566  ENSG00000238009 ENST00000453576 
1       89295   133566  ENSG00000238009 ENST00000466430 
1       89295   133566  ENSG00000238009 ENST00000471248 
1       89295   133566  ENSG00000238009 ENST00000477740 
OK

$ tabix genes-sorted.tsv.gz MT:-100001-120001 && echo "OK"
MT      577     647     ENSG00000210049 ENST00000387314         MT-TF
MT      648     1601    ENSG00000211459 ENST00000389680         MT-RNR1
...
MT      14747   15887   ENSG00000198727 ENST00000361789 ENSP00000354554 MT-CYB
MT      15888   15953   ENSG00000210195 ENST00000387460         MT-TT
MT      15956   16023   ENSG00000210196 ENST00000387461         MT-TP
OK

But:

gene_tbx = pysam.TabixFile("genes-sorted.tsv.gz")

for gene in gene_tbx.fetch("1", 100000, 120000):
    print(gene)

for gene in gene_tbx.fetch("MT", -10000, 10000):
    print(gene)

Gives:

1       89295   133566  ENSG00000238009 ENST00000453576 
1       89295   133566  ENSG00000238009 ENST00000466430 
1       89295   133566  ENSG00000238009 ENST00000471248 
1       89295   133566  ENSG00000238009 ENST00000477740 
Traceback (most recent call last):
  File "/home/danbolser/Work/IUK1/Integration/test_pysam.py", line 11, in <module>
    for gene in gene_tbx.fetch("MT", -10000, 10000):
  File "pysam/libctabix.pyx", line 456, in pysam.libctabix.TabixFile.fetch
ValueError: start out of range (10000)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant