Bam->CRAM doesn't always preserve cigar elements identically #713
@vadimzalunin Can you let us know if this is expected, and if so, what the nature of the transformations is? Thx.
@cmnbroad Some folks from hts-specs are wondering about this CIGAR string. Can I ask where it came from? Did it show up in a file? Did you come up with it artificially? Thanks.
@yfarjoun It's one of the htsjdk CRAM test files. I assume it was part of the original CRAM check-in, so I'm not sure where it ultimately came from. It looks pretty artisanal/handmade, though.
Preserving a CIGAR string verbatim is impossible in CRAM and has been since the very first draft. Fundamentally, CRAM doesn't store CIGAR strings but edits, and the CIGAR is rebuilt from those edits on decode. You will notice that it also changes CIGARs containing = and X to M. Fixing this would require a new CRAM version (likely a major version bump, as it would be incompatible). I added this to the list of ideas for CRAM v4, in case we ever get around to having a serious stab at it: samtools/hts-specs#144
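To make the behaviour described above concrete, here is a small Java sketch of a *hypothetical* model of what a CRAM round trip appears to do to a CIGAR. It is based only on the transformations mentioned in this thread. First, `=` and `X` become `M`. Second, a zero-length element such as `0M` disappears, so its neighbours become adjacent and merge, which turns `5H0M5H` into `10H`. The class name `CigarRoundTripModel` and the assumption that *any* adjacent identical operators are merged are illustrative choices. This is not htsjdk's CRAM decoder, and real reconstruction may differ in other cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model (not htsjdk code) of how a CIGAR may look after a CRAM round trip. */
public class CigarRoundTripModel {
    // One CIGAR element: a length followed by an operator.
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    public static String normalize(String cigar) {
        if (cigar.equals("*")) {
            return "*"; // unavailable CIGAR stays as-is
        }
        List<Integer> lengths = new ArrayList<>();
        List<Character> ops = new ArrayList<>();
        Matcher m = ELEMENT.matcher(cigar);
        int consumed = 0;
        while (m.find()) {
            if (m.start() != consumed) {
                throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
            }
            consumed = m.end();
            int len = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            // Assumption from the thread: sequence match/mismatch collapse to M.
            if (op == '=' || op == 'X') {
                op = 'M';
            }
            // Assumption: zero-length elements carry no edit and are dropped.
            if (len == 0) {
                continue;
            }
            int last = ops.size() - 1;
            if (last >= 0 && ops.get(last) == op) {
                // Adjacent identical operators merge, e.g. 5H5H -> 10H.
                lengths.set(last, lengths.get(last) + len);
            } else {
                lengths.add(len);
                ops.add(op);
            }
        }
        if (consumed != cigar.length()) {
            throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ops.size(); i++) {
            sb.append(lengths.get(i)).append(ops.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("5H0M5H"));  // 10H
        System.out.println(normalize("3=1X4="));  // 8M
        System.out.println(normalize("10M2I5M")); // 10M2I5M
    }
}
```

Under this model, a CIGAR survives a round trip unchanged exactly when it already has no `=`/`X`, no zero-length elements, and no adjacent repeated operators.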
The file xx#minimal.3.sam (in the src/test/resources/htsjdk/samtools/cram test folder) contains a read with CIGAR "5H0M5H" that, after being round-tripped through CRAM, is restored as "10H". The goal here is to understand and characterize how faithfully CIGAR elements survive a CRAM round trip.
Steps to reproduce
Write the above file to a .cram, then read it back and compare the records with the original.
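The comparison step can be sketched as a plain-Java check over read-name to CIGAR maps. In an htsjdk test, the maps would be filled from each `SAMRecord` via `getReadName()` and `getCigarString()`. The class name `CigarRoundTripCheck` and the sample reads `read1`/`read2` are hypothetical. The sketch also assumes read names are unique; paired reads sharing a name would need a richer key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: reports reads whose CIGAR string changed after a round trip. */
public class CigarRoundTripCheck {
    public static List<String> findChanged(Map<String, String> original,
                                           Map<String, String> roundTripped) {
        List<String> changes = new ArrayList<>();
        for (Map.Entry<String, String> e : original.entrySet()) {
            String restored = roundTripped.get(e.getKey());
            if (restored == null) {
                changes.add(e.getKey() + ": missing after round trip");
            } else if (!restored.equals(e.getValue())) {
                // Exact string comparison, so equivalent-but-different CIGARs are reported.
                changes.add(e.getKey() + ": " + e.getValue() + " -> " + restored);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> original = new LinkedHashMap<>();
        original.put("read1", "5H0M5H");
        original.put("read2", "10M");
        Map<String, String> restored = new LinkedHashMap<>();
        restored.put("read1", "10H");
        restored.put("read2", "10M");
        findChanged(original, restored).forEach(System.out::println); // read1: 5H0M5H -> 10H
    }
}
```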
Expected behaviour
The CIGAR string is preserved.
Actual behaviour
An equivalent but different CIGAR is restored ("10H" instead of "5H0M5H").