Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mem should set the mate cigar tag. #138

Merged
merged 1 commit into from
Jul 30, 2017
Merged

Conversation

nh13
Copy link
Contributor

@nh13 nh13 commented Jun 25, 2017

@lh3

A lot of downstream tools rely upon having a mate cigar present. To set the mate cigar efficiently, we need to have read pairs grouped together, so it is a bit a pain if the BAM is already coordinate sorted. Tools like SetMateInformation and MergeBamAlignment will add these if not present, but not all folks post-process the output of BWA, and so post-processing is necessary. Setting the mate cigar within BWA is a better option.

The mate cigar is extremely useful to have, as it allows us to obtain the genomic span of the template rather than just the given end of a read (i.e. R1 or R2), since we can calculate the end position rather than just the start (MPOS). We can also assess the span using either the clipped or unclipped bases, which is useful for many analyses that try to group or compare reads (think duplicate marking-like algorithms).

Thanks for considering this pull request.

@yfarjoun
Copy link

While not in a position to approve this particular PR. I am very much in favor of people having better data...this PR certainly intends to do that.

bwamem.c Outdated
@@ -899,6 +905,7 @@ void mem_aln2sam(const mem_opt_t *opt, const bntseq_t *bns, kstring_t *str, bseq
kputsn("\tNM:i:", 6, str); kputw(p->NM, str);
kputsn("\tMD:Z:", 6, str); kputs((char*)(p->cigar + p->n_cigar), str);
}
if (m->n_cigar) kputsn("\tMC:Z:", 6, str); add_cigar(opt, m, str, which);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs { … }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@nh13
Copy link
Contributor Author

nh13 commented Jul 3, 2017

@lh3 any chance you could review this?

@nh13
Copy link
Contributor Author

nh13 commented Jul 15, 2017

@lh3 bump

@lh3 lh3 merged commit 950ea8c into lh3:master Jul 30, 2017
@lh3
Copy link
Owner

lh3 commented Jul 30, 2017

Thanks!

@ilovezfs
Copy link

Testing on macOS, this PR is causing segfaults for 0.7.16. If I revert 3b96dce, the segaults go away.

bash-3.2$ cat test.fasta 
>0
AGATGTGCTG
bash-3.2$ sudo lldb ./bwa
(lldb) r index test.fasta
Process 11822 launched: './bwa' (x86_64)
[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.00 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.16-r1180
[main] CMD: ./bwa index test.fasta
[main] Real time: 0.002 sec; CPU: 0.002 sec
Process 11822 exited with status = 0 (0x00000000) 
(lldb) r mem test.fasta test.fasta
Process 11825 launched: './bwa' (x86_64)
[M::bwa_idx_load_from_disk] read 0 ALT contigs
@SQ	SN:0	LN:10
@PG	ID:bwa	PN:bwa	VN:0.7.16-r1180	CL:./bwa mem test.fasta test.fasta
[M::process] read 1 sequences (10 bp)...
bwa was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 11825 stopped
* thread #2: tid = 0x1d2adc, 0x000000010002f3df bwa`mem_aln2sam(opt=0x0000000100200430, bns=0x00000001002000c0, str=<unavailable>, s=<unavailable>, n=1, list=<unavailable>, which=0, m_=<unavailable>) + 5945 at bwamem.c:908, stop reason = EXC_BAD_ACCESS (code=1, address=0x14)
    frame #0: 0x000000010002f3df bwa`mem_aln2sam(opt=0x0000000100200430, bns=0x00000001002000c0, str=<unavailable>, s=<unavailable>, n=1, list=<unavailable>, which=0, m_=<unavailable>) + 5945 at bwamem.c:908 [opt]
   905 			kputsn("\tNM:i:", 6, str); kputw(p->NM, str);
   906 			kputsn("\tMD:Z:", 6, str); kputs((char*)(p->cigar + p->n_cigar), str);
   907 		}
-> 908 		if (m->n_cigar) { kputsn("\tMC:Z:", 6, str); add_cigar(opt, m, str, which); }
   909 		if (p->score >= 0) { kputsn("\tAS:i:", 6, str); kputw(p->score, str); }
   910 		if (p->sub >= 0) { kputsn("\tXS:i:", 6, str); kputw(p->sub, str); }
   911 		if (bwa_rg_id[0]) { kputsn("\tRG:Z:", 6, str); kputs(bwa_rg_id, str); }
(lldb) bt
warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
* thread #2: tid = 0x1d2adc, 0x000000010002f3df bwa`mem_aln2sam(opt=0x0000000100200430, bns=0x00000001002000c0, str=<unavailable>, s=<unavailable>, n=1, list=<unavailable>, which=0, m_=<unavailable>) + 5945 at bwamem.c:908, stop reason = EXC_BAD_ACCESS (code=1, address=0x14)
  * frame #0: 0x000000010002f3df bwa`mem_aln2sam(opt=0x0000000100200430, bns=0x00000001002000c0, str=<unavailable>, s=<unavailable>, n=1, list=<unavailable>, which=0, m_=<unavailable>) + 5945 at bwamem.c:908 [opt]
    frame #1: 0x0000000100030f13 bwa`mem_reg2sam(opt=<unavailable>, bns=<unavailable>, pac=<unavailable>, s=<unavailable>, a=<unavailable>, extra_flag=<unavailable>, m=<unavailable>) + 967 at bwamem.c:1036 [opt]
    frame #2: 0x0000000100031e0f bwa`worker2(data=0x0000700000080ca8, i=<unavailable>, tid=<unavailable>) + 187 at bwamem.c:1195 [opt]
    frame #3: 0x000000010001ccdf bwa`ktf_worker(data=<unavailable>) + 54 at kthread.c:42 [opt]
    frame #4: 0x00007fff888be99d libsystem_pthread.dylib`_pthread_body + 131
    frame #5: 0x00007fff888be91a libsystem_pthread.dylib`_pthread_start + 168
    fr

lh3 added a commit that referenced this pull request Jul 31, 2017
@nh13 nh13 deleted the nh_mem_mate_cigar branch March 15, 2019 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants