get_reference_sequence with_seq=True returns different reference for same position? #1180
Comments
I'm not sure what |
The first problem here is that the coordinates in the tuples returned by
which shows the expected unanimous T. I will improve the documentation.
The second problem is that you would expect the answer to be unanimous for (1-based) position 10469 too. By looking at all the tuples returned by
I have also reconstructed these manually from CIGAR, SEQ, and MD for these three reads, and get the same results as pysam. So I am fairly sure pysam is correct for these MD values.
The position marked
A similar discrepancy between the three reads also occurs e.g. at the position marked
So it would appear that the MD values emitted by biscuit, while they are syntactically correct w.r.t. the ‘0’ separators, are not quite correct. It would be interesting to see what
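The manual reconstruction from CIGAR, SEQ, and MD described above can be sketched roughly as follows. This is an illustrative sketch, not pysam's implementation: the function name `reference_from_md` is hypothetical, spliced alignments (CIGAR `N`) are deliberately not handled, and mismatched reference bases are emitted in lowercase to mirror what pysam reports for substitutions.

```python
import re

# CIGAR: runs of <length><op>; MD: match counts, "^"-prefixed deletions, single mismatches.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
MD_RE = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def reference_from_md(seq, cigar, md):
    """Rebuild the reference bases spanned by one alignment (hypothetical helper).

    Assumes no reference skips (CIGAR 'N'); raises ValueError otherwise.
    """
    # Step 1: keep only the query bases that sit in aligned columns (M, =, X).
    aligned = []
    qpos = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            aligned.append(seq[qpos:qpos + n])
            qpos += n
        elif op in "IS":
            qpos += n  # insertions and soft clips consume query bases only
        elif op == "N":
            raise ValueError("reference skips are not handled in this sketch")
        # D, H and P consume no query bases
    aligned = "".join(aligned)

    # Step 2: walk the MD tag, copying matches and substituting reference bases.
    ref = []
    apos = 0
    for match_len, deletion, mismatch in MD_RE.findall(md):
        if match_len:
            n = int(match_len)
            ref.append(aligned[apos:apos + n].upper())
            apos += n
        elif deletion:
            ref.append(deletion.upper())  # deleted reference bases, no query consumed
        else:
            ref.append(mismatch.lower())  # reference base at a substitution
            apos += 1
    if apos != len(aligned):
        raise ValueError("MD tag does not account for every aligned base")
    return "".join(ref)


print(reference_from_md("ACGTACGTAC", "10M", "3T6"))
```

If two reads overlapping the same position yield different reference bases from a reconstruction like this, the disagreement comes from the MD values themselves rather than from the parser.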
Thanks for taking the time to look into this and for such a detailed reply! Here's GRCh38-DAC-U2AF1.fna.gz. For reference, it's generated from:
Via:
I looked at this for BISCUIT specifically (problem 2 from @jmarshall's comment). I'll keep that specific discussion over on BISCUIT's issue, but you can see my comment here: huishenlab/biscuit#37 (comment).
Clarification for #1180. In future, we may consider returning this as a collections.namedtuple, as there's no natural ordering to the elements.
Documentation has been clarified. The remainder of the issue has, I think, been covered by the linked BISCUIT issue.
I'm parsing an alignment of methylated sequencing data from biscuit.
At some regions, get_reference_sequence(with_seq=True) returns different bases for the same position, which I don't understand.
Here's an example run on this tiny .bam file, looking at chr1:10468
However pysam reports both C and T as the reference (in addition to a substitution).
Output:
(This seems a little like #895 -- I wonder if the MD tags are corrupt somehow?)
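One way to check for this kind of disagreement is to tally the reference base each read reports at every position. The sketch below assumes the tuples have the `(query_pos, ref_pos, ref_base)` shape that pysam's `get_aligned_pairs(with_seq=True)` returns, with 0-based `ref_pos`, so 1-based position 10469 is index 10468. The helper names and the file name in the commented usage are hypothetical.

```python
from collections import Counter, defaultdict


def tally_reference_bases(pairs_per_read):
    """Count reported reference bases per 0-based reference position.

    pairs_per_read: iterable of lists of (query_pos, ref_pos, ref_base) tuples.
    """
    tally = defaultdict(Counter)
    for pairs in pairs_per_read:
        for _query_pos, ref_pos, ref_base in pairs:
            if ref_pos is None or ref_base is None:
                continue  # insertions and soft clips have no reference base
            # Substitutions are reported in lowercase; fold case to compare bases.
            tally[ref_pos][ref_base.upper()] += 1
    return tally


def discordant_positions(tally):
    """Return only the positions where reads disagree about the reference base."""
    return {pos: dict(counts) for pos, counts in tally.items() if len(counts) > 1}


# Possible usage with pysam (requires an indexed BAM whose reads carry MD tags):
# import pysam
# with pysam.AlignmentFile("example.bam") as bam:  # hypothetical file name
#     reads = (r.get_aligned_pairs(with_seq=True)
#              for r in bam.fetch("chr1", 10460, 10480))
#     print(discordant_positions(tally_reference_bases(reads)))
```

Any position this reports points at inconsistent MD tags between reads, since the reference base itself cannot differ from read to read.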