Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in non directional mode #37

Closed
sylvainforet opened this issue Apr 11, 2016 · 5 comments
Closed

Bug in non directional mode #37

sylvainforet opened this issue Apr 11, 2016 · 5 comments

Comments

@sylvainforet
Copy link

Hello,

I have used Bismark a few times in directional mode, and I have been very happy with the results so far (thanks for that!). However, I believe that I have found a bug in the non-directional mode (I am using bismark_v0.15.0).

Specifically, we are looking at amplicons, and our library contains amplicons targeting the OT and OB strands of the same region. In this case, some amplicons that should be reported as mapping to the OB strand (XG:Z:GA) are reported on the OT strand (XG:Z:CT) and thus the methylation calls are wrong. This problem is likely to affect any type of non-directional library, not just amplicons.

The problem can be seen with the following reference and reads:

Reference:
GTGAAGATTGTCGTGTTTCTTGACTAATTTCCATTGCTTGTTTCATGAGTGCATCCATTAAATCTTGAACATTCTCTGACTTCGTTTCATTTTCGAATATGGTACCAATCTTTACAGACGTTAATGAATTTTTTTTAGCATCATCTGATTTTAATATCTTTGTTTTTTCAATAGTTTTCTTTGCTTTCTTTTTTTGTGATTTTGATTCATTGCTTTCATCACTCGATAATACCATAATATCATCGATTTTTTCTTTTTGTGAAACCATATCGCAAACTTTCGTGATTACATCATCTATTAAAGCTACGACATCATGCATTAATTGCAAATGAGATACATTGCCTAATTTTATTTTCTTTATATTATCTTGAGGTAATTCAGTAGGAGATCTTTTACGTGTTTCAGTTTTTTCTTGTTCATTACGTTGATTTGTTCCAAATCCTGTATTTGTTCTGTCTATAATTTCAATCTCCTCGATACTATTTGAACGTTGAGCAATGGTGACAACATTTCTGTCACTTTCAGTTGTAAATTCCATTTCTTTTCCAATAATTTTATTACTTGTGAAAAGATTTTCTTCCTTAGTTTTCTCATTTAATTTTGAAGTATATTTCATAAAAGGGCCTTTAGCACATTTTTCAAGATGAAATTGAAGCAATTGACCTAAAGATTTTTTCAAGGGTTTTCCTTCGTGCTGCTCAAGATATTTTAATATGCTTGCATGAATTCGATAATG

Forward read:
TTAATTCATTACTTTCATCACTCAATAATACCATAATATCATCAATTTTTTCTTTTTATAAAACCATATCACAAACTTTCATAATTACATCATCTATTAAAACTACGACATCATGCATTAATTACAAATAAAATACATTACCTAATTTTATTTTCTTTATATTATCTTAAAATAATTCAATAAAAAATCTTTTACATATTTCAATTTTTTCTTATTCATTACGTTAATTT

Reverse read:
ATTATTGTTTAACGTTTAAATAGTATCGAGGAGATTGAAATTATAGATAGAATAAATATAGGATTTGGAATAAATTAACGTAATGAATAAGAAAAAATTGAAATATGTAAAAGATTTTTTATTGAATTATTTTAAGATAATATAAAGAAAATAAAATTAGGTAATGTATTTTATTTGTAATTAATGCATGATGTCGTAGTTTTAATAGATGATGTAATTATGAAAGTTTG

bismark command:
bismark --bowtie2 -minins 0 --maxins 1000 --output_dir . --sam --non_directional -N 1 -L 20 -D 20 -R 3 --score_min L,-0.8,-0.8 -p 4 /home/laurawelsh/bio/myBisulf_data/work/mapping/bismark -1 tmp1.fastq -2 tmp2.fastq

I think that I have found a potential solution for this problem.
I'm attaching a diff of my code against bismark_v0.15.0.
The fix is still incomplete: the same thing must be applied to single-end case and to the case where there is ambiguous mapping on the same strand.

Am I missing missing something or is this a real bug?
If it is a real bug, does this fix seem OK?

Thanks,
Sylvain
diff.txt

@FelixKrueger
Copy link
Owner

Dear Sylvain,
Many thanks for reporting this, I was a little swamped with other things but I will take a look at this soon. Kind regards, Felix

From: sylvainforet <notifications@github.commailto:notifications@github.com>
Reply-To: FelixKrueger/Bismark <reply@reply.github.commailto:reply@reply.github.com>
Date: Monday, 11 April 2016 at 08:06
To: FelixKrueger/Bismark <Bismark@noreply.github.commailto:Bismark@noreply.github.com>
Subject: [FelixKrueger/Bismark] Bug in non directional mode (#37)

Hello,

I have used Bismark a few times in directional mode, and I have been very happy with the results so far (thanks for that!). However, I believe that I have found a bug in the non-directional mode (I am using bismark_v0.15.0).

Specifically, we are looking at amplicons, and our library contains amplicons targeting the OT and OB strands of the same region. In this case, some amplicons that should be reported as mapping to the OB strand (XG:Z:GA) are reported on the OT strand (XG:Z:CT) and thus the methylation calls are wrong. This problem is likely to affect any type of non-directional library, not just amplicons.

The problem can be seen with the following reference and reads:

Reference:
GTGAAGATTGTCGTGTTTCTTGACTAATTTCCATTGCTTGTTTCATGAGTGCATCCATTAAATCTTGAACATTCTCTGACTTCGTTTCATTTTCGAATATGGTACCAATCTTTACAGACGTTAATGAATTTTTTTTAGCATCATCTGATTTTAATATCTTTGTTTTTTCAATAGTTTTCTTTGCTTTCTTTTTTTGTGATTTTGATTCATTGCTTTCATCACTCGATAATACCATAATATCATCGATTTTTTCTTTTTGTGAAACCATATCGCAAACTTTCGTGATTACATCATCTATTAAAGCTACGACATCATGCATTAATTGCAAATGAGATACATTGCCTAATTTTATTTTCTTTATATTATCTTGAGGTAATTCAGTAGGAGATCTTTTACGTGTTTCAGTTTTTTCTTGTTCATTACGTTGATTTGTTCCAAATCCTGTATTTGTTCTGTCTATAATTTCAATCTCCTCGATACTATTTGAACGTTGAGCAATGGTGACAACATTTCTGTCACTTTCAGTTGTAAATTCCATTTCTTTTCCAATAATTTTATTACTTGTGAAAAGATTTTCTTCCTTAGTTTTCTCATTTAATTTTGAAGTATATTTCATAAAAGGGCCTTTAGCACATTTTTCAAGATGAAATTGAAGCAATTGACCTAAAGATTTTTTCAAGGGTTTTCCTTCGTGCTGCTCAAGATATTTTAATATGCTTGCATGAATTCGATAATG

Forward read:
TTAATTCATTACTTTCATCACTCAATAATACCATAATATCATCAATTTTTTCTTTTTATAAAACCATATCACAAACTTTCATAATTACATCATCTATTAAAACTACGACATCATGCATTAATTACAAATAAAATACATTACCTAATTTTATTTTCTTTATATTATCTTAAAATAATTCAATAAAAAATCTTTTACATATTTCAATTTTTTCTTATTCATTACGTTAATTT

Reverse read:
ATTATTGTTTAACGTTTAAATAGTATCGAGGAGATTGAAATTATAGATAGAATAAATATAGGATTTGGAATAAATTAACGTAATGAATAAGAAAAAATTGAAATATGTAAAAGATTTTTTATTGAATTATTTTAAGATAATATAAAGAAAATAAAATTAGGTAATGTATTTTATTTGTAATTAATGCATGATGTCGTAGTTTTAATAGATGATGTAATTATGAAAGTTTG

bismark command:
bismark --bowtie2 -minins 0 --maxins 1000 --output_dir . --sam --non_directional -N 1 -L 20 -D 20 -R 3 --score_min L,-0.8,-0.8 -p 4 /home/laurawelsh/bio/myBisulf_data/work/mapping/bismark -1 tmp1.fastq -2 tmp2.fastq

I think that I have found a potential solution for this problem.
I'm attaching a diff of my code against bismark_v0.15.0.
The fix is still incomplete: the same thing must be applied to single-end case and to the case where there is ambiguous mapping on the same strand.

Am I missing missing something or is this a real bug?
If it is a real bug, does this fix seem OK?

Thanks,
Sylvain
diff.txthttps://github.com/FelixKrueger/Bismark/files/212413/diff.txt


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHubhttps://github.com//issues/37
The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT Registered Charity No. 1053902.
The information transmitted in this email is directed only to the addressee. If you received this in error, please contact the sender and delete this email from your system. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Babraham Institute. Full conditions at: www.babraham.ac.ukhttp://www.babraham.ac.uk/terms

@FelixKrueger
Copy link
Owner

I can now confirm that this really is a bug and not a feature.

Using stringent default alignments one gets this (correct) alignment:

@HD VN:1.0 SO:unsorted @SQ SN:Reference LN:736 @PG ID:Bismark VN:v0.15.2 CL:"bismark --maxins 1000 --non_directional --genome ./reference/ -1 fwd.fastq -2 rev.fastq --prefix default_PE" Forward 163 Reference 202 42 230M = 273 301 TTAATTCATTACTTTCATCACTCAATAATACCATAATATCATCAATTTTTTCTTTTTATAAAACCATATCACAAACTTTCATAATTACATCATCTATTAAAACTACGACATCATGCATTAATTACAAATAAAATACATTACCTAATTTTATTTTCTTTATATTATCTTAAAATAATTCAATAAAAAATCTTTTACATATTTCAATTTTTTCTTATTCATTACGTTAATTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:26 MD:Z:2G7G12G19G13G1G10G9G1G18G21G5G1G7G28G1G0G7G2G0G1G9G1G5G9G11G4 XM:Z:..h.......h............z...................z.............h.h..........z.........z.h..................h....Z.......H........h.....h.h.......h............................h.hh.......x..hh.h.........z.h.....x.........h........Z..h.... XR:Z:GA XG:Z:GA Forward 83 Reference 273 42 230M = 202 -301 CAAACTTTCATAATTACATCATCTATTAAAACTACGACATCATGCATTAATTACAAATAAAATACATTACCTAATTTTATTTTCTTTATATTATCTTAAAATAATTCAATAAAAAATCTTTTACATATTTCAATTTTTTCTTATTCATTACGTTAATTTATTCCAAATCCTATATTTATTCTATCTATAATTTCAATCTCCTCGATACTATTTAAACGTTAAACAATAAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:28 MD:Z:9G1G18G21G5G1G7G28G1G0G7G2G0G1G9G1G5G9G11G4G11G5G4G30G6G1G4G0G1 XM:Z:.........z.h..................h....Z.......H........h.....h.h.......h............................h.hh.......x..hh.h.........z.h.....x.........h........Z..h....h...........x.....h....x....................Z.........h...Z..h.h....hh. XR:Z:CT XG:Z:GA

Using very relaxed alignment parameters one gets this (incorrect) alignment with > 50 mismatches to the same position but the OT strand instead of the CTOB strand:

@HD VN:1.0 SO:unsorted @SQ SN:Reference LN:736 @PG ID:Bismark VN:v0.15.2 CL:"bismark --maxins 1000 --non_directional -N 1 --genome ./reference/ -1 fwd.fastq -2 rev.fastq --prefix relaxed_PE --score_min L,-0.8,-0.8" Forward 99 Reference 202 0 230M = 273 301 TTAATTCATTACTTTCATCACTCAATAATACCATAATATCATCAATTTTTTCTTTTTATAAAACCATATCACAAACTTTCATAATTACATCATCTATTAAAACTACGACATCATGCATTAATTACAAATAAAATACATTACCTAATTTTATTTTCTTTATATTATCTTAAAATAATTCAATAAAAAATCTTTTACATATTTCAATTTTTTCTTATTCATTACGTTAATTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:26 MD:Z:2G7G12G19G13G1G10G9G1G18G21G5G1G7G28G1G0G7G2G0G1G9G1G5G9G11G4 XM:Z:......H....H...H..H.H.Z.......HH.......H..Z........H...........HH....Z.H...H...Z.......H..H..H........H..Z..H..H...H........H..........H....HH............H..........H...........X..........H.....Z......X........H.....H....Z........ XR:Z:CT XG:Z:CT Forward 147 Reference 273 0 230M = 202 -301 CAAACTTTCATAATTACATCATCTATTAAAACTACGACATCATGCATTAATTACAAATAAAATACATTACCTAATTTTATTTTCTTTATATTATCTTAAAATAATTCAATAAAAAATCTTTTACATATTTCAATTTTTTCTTATTCATTACGTTAATTTATTCCAAATCCTATATTTATTCTATCTATAATTTCAATCTCCTCGATACTATTTAAACGTTAAACAATAAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:28 MD:Z:9G1G18G21G5G1G7G28G1G0G7G2G0G1G9G1G5G9G11G4G11G5G4G30G6G1G4G0G1 XM:Z:H...H...Z.......H..H..H........H..Z..H..H...H........H..........H....HH............H..........H...........X..........H.....Z......X........H.....H....Z...........HH....HX..........X...H........H...H.HH.Z....H........Z......H...... XR:Z:GA XG:Z:CT

Even though this sort of error should be rare (who allows > 50 mismatches for their alignments ? :)) it certainly needs to be addressed. Working on it...

@sylvainforet
Copy link
Author

Thanks for checking. Yes, admittedly, that's an extreme example, and it should not be too frequent, but it could happen occasionally for more stringent alignments, especially in AT-rich sequences.

@FelixKrueger
Copy link
Owner

I have now looked at your example and changed Bismark for both single-end and paired-end alignments so that the better alignment to the very same position now trumps the poorer one. Fixed in fc61079.

Again many thanks for bringing this to my attention, my hat's off to you not only for finding this in the first place but especially for providing an excellent test case and suggested fix as well! Can you please clone the latest development version and let me know if it now really does what it is supposed to do? Best, Felix

@sylvainforet
Copy link
Author

Hello Felix,
Many thanks for the swift fix!
It seems to work on the rest of my data
All the best,
Sylvain

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants