csi flag in tabix_index #995

Closed
FredericBGA opened this issue Feb 24, 2021 · 1 comment

@FredericBGA

I am trying to index a VCF that needs a .csi index.
The file can be downloaded here:
https://usegalaxy.org/u/fredbga/h/example-vcf-gz

I couldn't get the csi flag of the tabix_index function to work.

pip freeze | grep pysam
pysam==0.16.0.1

The csi flag does not work, at least not as I expected, i.e. like the command line: tabix -p vcf -C

>>> import pysam
>>> pysam.tabix_index('example-vcf-gz', index='example-vcf-gz.csi', preset='vcf', force=True, csi=True)
[E::hts_idx_check_range] Region 557988904..557988905 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pysam/libctabix.pyx", line 1035, in pysam.libctabix.tabix_index
OSError: building of index for example-vcf-gz failed
>>>

If I set the min_shift value, it works (a .csi index is created), even without the csi flag:

>>> pysam.tabix_index('example-vcf-gz', index='example-vcf-gz.csi', preset='vcf', force=True, min_shift=14)
'example-vcf-gz'
>>>

Is this the expected behavior?

@jmarshall
Member

This is a bug in pysam.tabix_index's csi argument handling. In this and other non-BCF cases, setting csi=True changed the filename to .csi but still wrote a TBI index (now with a misleading filename extension), which of course does not work for wheat data.
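
To illustrate why a TBI index cannot hold the wheat coordinates from the error message, here is a minimal sketch of the size limit of a binning index. It assumes the usual relation that such an index covers positions up to 2^(min_shift + 3 * n_lvls), with TBI fixed at min_shift = 14 and n_lvls = 5. The function names are hypothetical helpers for illustration and are not part of pysam or htslib.

```python
# Sketch only: hypothetical helpers, not pysam or htslib API.
# Assumption: a binning index covers positions up to 2**(min_shift + 3 * n_lvls).

TBI_MIN_SHIFT = 14  # fixed for TBI
TBI_N_LVLS = 5      # fixed for TBI


def max_indexable_position(min_shift: int, n_lvls: int) -> int:
    """Largest coordinate an index with these parameters can store."""
    return 1 << (min_shift + 3 * n_lvls)


def fits_in_tbi(end: int) -> bool:
    """True if a region ending at `end` fits within the TBI limit."""
    return end <= max_indexable_position(TBI_MIN_SHIFT, TBI_N_LVLS)


def min_csi_levels(end: int, min_shift: int = 14) -> int:
    """Smallest n_lvls for which a CSI index can store position `end`."""
    n_lvls = 0
    while max_indexable_position(min_shift, n_lvls) < end:
        n_lvls += 1
    return n_lvls


# Region end taken from the error message in the report above.
region_end = 557988905
print(fits_in_tbi(region_end))     # region lies beyond 2**29
print(min_csi_levels(region_end))  # levels needed with min_shift = 14
```

Under these assumptions the TBI limit is 2^29 = 536870912, so the reported region falls outside it, and the computed level count agrees with the "n_lvls >= 6" hint in the error message.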

Fixed by making the argument handling more like the tabix(1) command, for which using either -C or -m INT or both makes it write a CSI index. (TBI indices use a hardcoded min_shift of 14.)
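
The rule described above can be sketched roughly as follows. This is an illustration of the decision only, not the actual pysam code: `choose_index_format` is a hypothetical name, and the sketch assumes that "not set" is represented by a non-positive min_shift and that a bare csi flag falls back to a min_shift of 14.

```python
# Sketch only: hypothetical helper mirroring the described tabix(1)-like rule.
# Assumptions: min_shift <= 0 means "not given"; a bare csi flag implies 14.

DEFAULT_MIN_SHIFT = 14


def choose_index_format(csi: bool = False, min_shift: int = -1) -> tuple:
    """Return (index_format, effective_min_shift) for the given arguments."""
    if csi or min_shift > 0:
        # Either option alone (or both together) selects a CSI index.
        effective = min_shift if min_shift > 0 else DEFAULT_MIN_SHIFT
        return ("csi", effective)
    # Neither option given: classic TBI, whose min_shift is hardcoded.
    return ("tbi", DEFAULT_MIN_SHIFT)


print(choose_index_format())                         # neither option
print(choose_index_format(csi=True))                 # like tabix -C
print(choose_index_format(min_shift=14))             # like tabix -m 14
print(choose_index_format(csi=True, min_shift=12))   # both options
```

With this rule, the two calls from the report (csi=True alone, and min_shift=14 alone) would both select a CSI index.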
