Bam->CRAM doesn't always preserve cigar elements identically #713

Open
cmnbroad opened this issue Sep 22, 2016 · 4 comments

@cmnbroad
Collaborator

The file xx#minimal.3.sam (in the src/test/resources/htsjdk/samtools/cram test folder) contains a read with the CIGAR "5H0M5H", which is restored as "10H" after a round trip through CRAM. The task here is to understand and characterize the round-trip fidelity of CIGAR elements in CRAM.
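One way to reason about the "5H0M5H" to "10H" rewrite is as a canonicalization: drop zero-length elements, then merge adjacent elements that share an operator. The sketch below is a hypothetical model of the observed behaviour, not htsjdk's actual CRAM code; `parse_cigar` and `canonicalize` are illustrative names.

```python
import re

# One CIGAR element: a length followed by an operator from the SAM spec.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operator) pairs."""
    elements = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Reject strings containing characters the regex silently skipped.
    if "".join(f"{n}{op}" for n, op in elements) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return elements


def canonicalize(cigar: str) -> str:
    """Hypothetical normalization: drop zero-length ops, merge adjacent equal ops."""
    merged: list[list] = []
    for length, op in parse_cigar(cigar):
        if length == 0:
            continue  # a zero-length element consumes neither read nor reference
        if merged and merged[-1][1] == op:
            merged[-1][0] += length  # e.g. 5H then 5H becomes 10H
        else:
            merged.append([length, op])
    return "".join(f"{n}{op}" for n, op in merged)


print(canonicalize("5H0M5H"))   # 10H
print(canonicalize("10M0I5M"))  # 15M
```

Under this model, any CIGAR whose canonical form differs from itself should not be expected to round-trip unchanged; whether the CRAM decoder follows exactly these rules is an assumption.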

Steps to reproduce

Write the above file to a .cram file, then read it back and compare the records with the original.
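Because the restored CIGAR can differ textually while describing the same alignment, the comparison step can separate exact matches from equivalent ones. A minimal sketch, assuming the records have already been read into (read name, CIGAR) pairs in the same order by whatever tool performs the round trip; `expand` and `compare_cigars` are hypothetical helpers, and folding `=`/`X` into `M` reflects the behaviour described later in this thread.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def expand(cigar, fold_match_ops=True):
    """Expand a CIGAR into one character per operation unit.

    "5H0M5H" and "10H" both expand to ten 'H' characters, so they compare
    equal. With fold_match_ops, '=' and 'X' are treated as 'M'.
    """
    out = []
    for n, op in CIGAR_RE.findall(cigar):
        if fold_match_ops and op in "=X":
            op = "M"
        out.append(op * int(n))
    return "".join(out)


def compare_cigars(original, restored):
    """Classify each read as 'identical', 'equivalent' or 'different'."""
    if len(original) != len(restored):
        raise ValueError("record counts differ")
    report = {}
    for (name, before), (name2, after) in zip(original, restored):
        if name != name2:
            raise ValueError(f"record order differs: {name} vs {name2}")
        if before == after:
            report[name] = "identical"
        elif expand(before) == expand(after):
            report[name] = "equivalent"
        else:
            report[name] = "different"
    return report


original = [("r1", "5H0M5H"), ("r2", "3=1X2="), ("r3", "4M")]
restored = [("r1", "10H"), ("r2", "6M"), ("r3", "3M1I")]
print(compare_cigars(original, restored))
# {'r1': 'equivalent', 'r2': 'equivalent', 'r3': 'different'}
```

Expanding to one character per base is simple rather than efficient; for long reads, comparing the merged run-length form would avoid building large strings.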

Expected behaviour

The CIGAR string is preserved.

Actual behaviour

An equivalent but different cigar is restored.

@cmnbroad
Collaborator Author

@vadimzalunin Can you let us know if this is expected and, if so, what the nature of the transformations is? Thx.

cmnbroad changed the title "Bam->CRAM can alter cigar string" to "Bam->CRAM doesn't always preserve cigar elements indentically" Sep 22, 2016
cmnbroad changed the title "Bam->CRAM doesn't always preserve cigar elements indentically" to "Bam->CRAM doesn't always preserve cigar elements identically" Sep 22, 2016
@yfarjoun
Contributor

@cmnbroad Some folks from hts-specs are wondering about this CIGAR string. Can I ask where it came from? Did it show up in a real file, or did you come up with it artificially?

Thanks.

@cmnbroad
Collaborator Author

@yfarjoun It's one of the htsjdk CRAM test files. I assume it was part of the original CRAM check-in, so I'm not sure where it ultimately came from. It looks pretty artisanal/handmade, though.

@jkbonfield

Preserving this exactly is impossible in CRAM, and has been since the very first draft.

Fundamentally, CRAM doesn't store CIGAR strings; it stores edits relative to the reference. You will notice that it also converts = and X operators in CIGARs to M.
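To illustrate why the original operators cannot survive, here is a toy model of an edit-based encoding: a CIGAR plus read and reference bases is turned into a list of features (substitutions, insertions, deletions, clips), and the CIGAR is later rebuilt from those features alone. This is a simplified, hypothetical sketch loosely inspired by CRAM's read features, not the real CRAM codec or htsjdk's classes; `encode`, `decode` and the feature names are illustrative, and the N and P operators are omitted.

```python
import re

# Operators handled by this toy model (N and P are left out for brevity).
CIGAR_RE = re.compile(r"(\d+)([MIDSH=X])")


def encode(cigar, read, ref):
    """Turn an alignment into (read_pos, kind, value) features.

    Only differences from the reference are recorded, so the distinction
    between 'M', '=' and 'X' is lost, as are zero-length elements.
    """
    features, rpos, gpos = [], 0, 0
    for n, op in CIGAR_RE.findall(cigar):
        n = int(n)
        if n == 0:
            continue
        if op in "M=X":
            # Aligned block: store only mismatching bases as substitutions.
            for i in range(n):
                if read[rpos + i] != ref[gpos + i]:
                    features.append((rpos + i, "sub", read[rpos + i]))
            rpos += n
            gpos += n
        elif op in "IS":
            kind = "ins" if op == "I" else "soft"
            features.append((rpos, kind, read[rpos:rpos + n]))
            rpos += n
        elif op == "D":
            features.append((rpos, "del", n))
            gpos += n
        elif op == "H":
            features.append((rpos, "hard", n))
    return features


def decode(features, read_len):
    """Rebuild a CIGAR from features alone; aligned bases all become 'M'."""
    ops = []

    def emit(n, op):
        # Skip empty runs and merge with the previous element when possible.
        if n == 0:
            return
        if ops and ops[-1][1] == op:
            ops[-1][0] += n
        else:
            ops.append([n, op])

    rpos = 0
    for pos, kind, value in features:
        emit(pos - rpos, "M")  # bases between features align to the reference
        rpos = pos
        if kind == "sub":
            emit(1, "M")
            rpos += 1
        elif kind in ("ins", "soft"):
            emit(len(value), "I" if kind == "ins" else "S")
            rpos += len(value)
        elif kind == "del":
            emit(value, "D")
        elif kind == "hard":
            emit(value, "H")
    emit(read_len - rpos, "M")
    return "".join(f"{n}{op}" for n, op in ops)


print(decode(encode("3=1X2=", "ACGAAC", "ACGTAC"), 6))  # 6M
print(decode(encode("5H0M5H", "", ""), 0))              # 10H
```

In this model the mismatching base at read position 3 is still recoverable from the features, so no sequence information is lost; only the original spelling of the CIGAR is.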

Fixing this would require a new CRAM version (likely a major version bump, as it would be incompatible).

I added this to the list of ideas for CRAM v4, in case we ever get around to having a serious stab at it: samtools/hts-specs#144


4 participants