Mem should set the mate cigar tag. #138
While I am not in a position to approve this particular PR, I am very much in favor of people having better data, and this PR certainly intends to provide that.
bwamem.c
Outdated
@@ -899,6 +905,7 @@ void mem_aln2sam(const mem_opt_t *opt, const bntseq_t *bns, kstring_t *str, bseq
kputsn("\tNM:i:", 6, str); kputw(p->NM, str);
kputsn("\tMD:Z:", 6, str); kputs((char*)(p->cigar + p->n_cigar), str);
}
if (m->n_cigar) kputsn("\tMC:Z:", 6, str); add_cigar(opt, m, str, which);
Needs { … }
Fixed.
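For readers following the review thread, here is a minimal sketch of why the braces matter. It is not BWA code: `append`, `emit_mc_unbraced`, and `emit_mc_braced` are hypothetical stand-ins that only mirror the shape of the line under review. In C, an `if` without braces guards only the next statement, so the second call runs even when the condition is false.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for appending text to an output buffer. */
static void append(char *buf, size_t cap, const char *s)
{
    assert(buf != NULL && cap > strlen(buf));
    strncat(buf, s, cap - strlen(buf) - 1);
}

/* Same structure as the original one-line form
 *     if (n_cigar) append(tag); append(cigar);
 * written on two lines: only the tag is guarded by the condition. */
static void emit_mc_unbraced(char *buf, size_t cap, int n_cigar, const char *cigar)
{
    if (n_cigar) append(buf, cap, "\tMC:Z:");
    append(buf, cap, cigar); /* always runs, even when n_cigar == 0 */
}

/* Braced form: the tag and its value are emitted together or not at all. */
static void emit_mc_braced(char *buf, size_t cap, int n_cigar, const char *cigar)
{
    if (n_cigar) {
        append(buf, cap, "\tMC:Z:");
        append(buf, cap, cigar);
    }
}
```

With `n_cigar == 0`, the unbraced version still appends the CIGAR text with no tag in front of it, while the braced version leaves the buffer untouched.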
@lh3 any chance you could review this?
@lh3 bump
Thanks!
Testing on macOS, this PR is causing segfaults for 0.7.16. If I revert 3b96dce, the segfaults go away.
@lh3
A lot of downstream tools rely upon having a mate cigar present. To set the mate cigar efficiently, the reads of a pair need to be grouped together, so it is a bit of a pain if the BAM is already coordinate sorted. Tools like SetMateInformation and MergeBamAlignment will add it if not present, but that makes post-processing necessary, and not all folks post-process the output of BWA. Setting the mate cigar within BWA is a better option.

The mate cigar is extremely useful to have, as it allows us to obtain the genomic span of the whole template rather than just that of the given read end (i.e. R1 or R2): with it we can calculate the mate's end position rather than knowing only its start (MPOS). We can also assess the span using either the clipped or unclipped bases, which is useful for many analyses that try to group or compare reads (think duplicate marking-like algorithms).
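To illustrate the calculation described above, here is a rough sketch of deriving the mate's end position, and its unclipped start and end, from MPOS plus the MC:Z value. The type `cigar_span_t` and the functions below are hypothetical helpers written for this example, not part of BWA or any library. It assumes 1-based inclusive coordinates, and counts both soft (S) and hard (H) clips as clipped bases.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical summary of a mate's alignment, derived from its CIGAR. */
typedef struct {
    long ref_len;    /* reference bases consumed (M, D, N, =, X) */
    long lead_clip;  /* S/H bases before the first aligned operation */
    long trail_clip; /* S/H bases after the last aligned operation */
} cigar_span_t;

/* Parse a SAM CIGAR string. Returns 0 on success, -1 if it is "*",
 * empty, or malformed. */
static int cigar_span(const char *cigar, cigar_span_t *out)
{
    int seen_aligned = 0;
    const char *p = cigar;
    assert(cigar != NULL && out != NULL);
    out->ref_len = out->lead_clip = out->trail_clip = 0;
    if (*p == '\0' || *p == '*') return -1;
    while (*p) {
        long len = 0;
        if (!isdigit((unsigned char)*p)) return -1;
        while (isdigit((unsigned char)*p)) len = len * 10 + (*p++ - '0');
        switch (*p++) {
        case 'M': case 'D': case 'N': case '=': case 'X':
            out->ref_len += len; seen_aligned = 1; break;
        case 'I': case 'P':
            seen_aligned = 1; break; /* consume no reference bases */
        case 'S': case 'H':
            if (seen_aligned) out->trail_clip += len;
            else out->lead_clip += len;
            break;
        default:
            return -1; /* unknown operator or missing operator */
        }
    }
    return 0;
}

/* End of the mate's aligned bases, given its start (MPOS). */
static long mate_end(long mpos, const cigar_span_t *s)
{
    return mpos + s->ref_len - 1;
}

/* Start and end of the mate if its clipped bases were aligned too. */
static long mate_unclipped_start(long mpos, const cigar_span_t *s)
{
    return mpos - s->lead_clip;
}

static long mate_unclipped_end(long mpos, const cigar_span_t *s)
{
    return mate_end(mpos, s) + s->trail_clip;
}
```

For example, a mate with MPOS 100 and `MC:Z:5S90M2D5M3S` consumes 97 reference bases, so it ends at 196, with an unclipped span of 95 to 199. Without the MC tag, only the start position 100 is known from the read's own record.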
Thanks for considering this pull request.