Adding a flag to sampe (-d) to disable pair rescuing if too many rescue attempts are made. #12



drio commented Sep 30, 2012

Hi,

We have been seeing a dramatic increase in wall time for the sampe step with some of our libraries.
They are circularized libraries where the distribution of insert sizes is very wide. Because of that,
BWA tries to rescue many pairs, so more Smith-Waterman (SW) computations are performed.

We could disable SW for those libraries, but sometimes they are not properly marked as such. I thought this method would transparently take care of this for "peculiar" libraries while working as usual for the normal ones.

The sampe execution of a "bad" library without the (-d) flag would look like:

[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_paired_sw] 27969 out of 48925 Q17 singletons are mated.
[bwa_paired_sw] 1143 out of 56132 Q17 discordant pairs are fixed.
[bwa_sai2sam_pe_core] time elapses: 6839.43 sec

and with the -d (4000) flag:

[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_paired_sw] Too many rescue attemps, disabling Smith-Waterman for unmapped mates.
[bwa_paired_sw] 1122 out of 1868 Q17 singletons are mated.
[bwa_paired_sw] 95 out of 2133 Q17 discordant pairs are fixed.
[bwa_sai2sam_pe_core] time elapses: 261.08 sec
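
To illustrate the idea behind the cutoff, here is a minimal Python sketch. It is not the patch itself (the actual change is in BWA's C code), and the names `rescue_pairs`, `RescueStats` and `max_attempts` are hypothetical. It assumes pairs are visited in input order and that rescue is switched off once the attempt count reaches the limit; the exact comparison used in the patch may differ.

```python
from dataclasses import dataclass


@dataclass
class RescueStats:
    attempts: int = 0        # SW rescue attempts actually made
    skipped: int = 0         # rescue candidates skipped after the cutoff
    disabled: bool = False   # True once the cutoff has been hit


def rescue_pairs(needs_rescue, max_attempts):
    """Walk pairs in input order and stop attempting SW rescue at the cutoff.

    needs_rescue: iterable of bools, True if the pair is a rescue candidate.
    max_attempts: hypothetical stand-in for the value given to -d.
    """
    stats = RescueStats()
    for candidate in needs_rescue:
        if not candidate:
            continue  # pair needs no rescue
        if stats.disabled:
            stats.skipped += 1  # cutoff already hit, no SW for this pair
            continue
        stats.attempts += 1  # the SW alignment would run here
        if stats.attempts >= max_attempts:
            stats.disabled = True
    return stats


if __name__ == "__main__":
    # Ten candidates with a limit of four: the last six are skipped.
    print(rescue_pairs([True] * 10, max_attempts=4))
```

Note that in this sketch the pairs that still get rescued are simply the ones that come first in the input, so the outcome depends on read order.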

I have done some basic testing and everything looks stable.

Please let me know what you think,
-drd

drio commented Oct 5, 2012

Hi,

Do you have any comments on this?

Thanks,
-drd

This is useful: it makes sampe adaptive to different situations. However, why not just use -s? I would think that this should be an "all or nothing" flag, not something that applies to some alignments that happen to fall at the beginning of a block of reads but not to others at the end. I've found that this issue only occurs when there's a poor reference. And in that case, -s, or possibly penalizing clipping, is the better choice.

drio commented Jul 9, 2014

Thanks.

You want to run bwa in normal mode all the time (no -s): with SW enabled it will recover a few reads, and that is desirable. You don't want to disable SW for all the libraries, only for the bad ones (and still get a decent wall time).

Another alternative would be to use the insert size (IS) distribution metrics to decide when to disable SW (instead of using the number of reads being rescued).
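
A rough Python sketch of what such a check could look like. This is only an illustration: the function name `insert_size_too_wide`, the choice of interquartile range divided by the median as the spread measure, and the `max_rel_spread` threshold are all assumptions, not anything that exists in bwa.

```python
import statistics


def insert_size_too_wide(insert_sizes, max_rel_spread=1.0):
    """Return True if the insert-size distribution looks too wide for SW rescue.

    Spread is measured as the interquartile range divided by the median.
    max_rel_spread is a hypothetical threshold, not a bwa parameter.
    """
    if len(insert_sizes) < 4:
        return False  # too few pairs to estimate quartiles
    q1, median, q3 = statistics.quantiles(insert_sizes, n=4)
    if median <= 0:
        return True  # degenerate distribution, treat as unusable
    return (q3 - q1) / median > max_rel_spread


if __name__ == "__main__":
    tight = [290, 295, 300, 305, 310, 300, 298, 302]
    wide = [100, 500, 2000, 8000, 300, 12000, 50, 6000]
    print(insert_size_too_wide(tight), insert_size_too_wide(wide))
```

Unlike a counter on rescue attempts, a decision based on the whole distribution applies to the library as a unit, so it would not depend on where a pair falls in the input.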
