SAMFileWriterFactory creates .bai file when writing .cram file #1672

Open
rickymagner opened this issue Jul 18, 2023 · 1 comment

Comments

@rickymagner
Contributor

Description of the issue:

When SAMFileWriterFactory is used to write a .cram file with the "create index" default toggled on, it creates a .bai file for the index rather than a .crai. This means that when running e.g. gatk MergeSamFiles --CREATE_INDEX… with a .cram output, you end up with an output.cram.bai file instead of output.cram.crai.

Your environment:

  • version of htsjdk: 3.0.1
  • version of java: 17
  • which OS: MacOS

Steps to reproduce

Run gatk MergeSamFiles as described above.

Expected behaviour

You should get a .crai file.

Actual behaviour

You get a .bai file.
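
To check which index ended up next to an output, something like the following could be used. This is only a minimal sketch using the Java standard library: `CramIndexCheck`, `indexesBeside`, and `hasMismatchedIndex` are hypothetical names, not part of htsjdk or GATK, and it assumes the index is named by appending the extension to the output filename (as in output.cram.bai above).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of htsjdk or GATK.
final class CramIndexCheck {

    // Returns the index extensions (".bai", ".crai") found beside the given file,
    // assuming the index name is the output file name plus the extension.
    static List<String> indexesBeside(Path output) {
        List<String> found = new ArrayList<>();
        for (String ext : new String[] {".bai", ".crai"}) {
            Path candidate = output.resolveSibling(output.getFileName() + ext);
            if (Files.exists(candidate)) {
                found.add(ext);
            }
        }
        return found;
    }

    // True when a .cram output has a .bai beside it but no .crai,
    // which is the situation described in this issue.
    static boolean hasMismatchedIndex(Path output) {
        boolean isCram = output.getFileName().toString().endsWith(".cram");
        List<String> found = indexesBeside(output);
        return isCram && found.contains(".bai") && !found.contains(".crai");
    }

    public static void main(String[] args) {
        Path output = Path.of(args.length > 0 ? args[0] : "output.cram");
        System.out.println(output + " indexes: " + indexesBeside(output));
        System.out.println("mismatched: " + hasMismatchedIndex(output));
    }
}
```

Note that this only looks at file names; it says nothing about whether the contents of the index are in BAI or CRAI format.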


There are a few very old issues surrounding .crai files in the repo. According to this issue it seems like support was added for this but kept off for reasons discussed here. Perhaps it's too much to resurrect the project of getting these indices sorted out, but at the moment it seems GATK just silently puts out .cram.bai files due to this, which can be pretty confusing. I don't know enough about CRAM vs BAM to know how bad it might be to use one index for the other, but at least GATK seems to work just fine doing random access on CRAMs with the .bai file produced as described above. I'm also not sure whether this issue should be pushed up to GATK or kept down here in htsjdk. At the very least it'd be nice if the library could be updated to use the proper file extension for the index.

@lbergelson
Member

@rickymagner It's actually producing a bai index, not a crai. So it would be equally wrong to rename it to crai. It would be great to fix it to make a crai index but I think it's a bit of a project.
