I'm using cutadapt 2.4 and Python 3.6.8 installed with pip3.
When demultiplexing with linked adapters and paired-end reads, I noticed that cutadapt favors a shorter partial overlap that includes a mismatch over a longer partial overlap without any mismatch.
The full cutadapt command is the following:
The adapters (which include barcodes) are identical except for the last few nucleotides (5 nt). The adapters look like this:
However, the 1a results (file: trimmed-1a_R1.fastq) contain sequences like this:
I marked the interesting regions in bold. As you can see, this read fits 17a better but is assigned to 1a. I know that 1a is not wrong because I allow mismatches, but 17a fits much better because it is a direct hit.
Attached you can find all input and output files.
Hi, sorry that I somehow missed your bug report. Thanks for attaching all the necessary files, this makes it easy to reproduce.
At least one of the problems is that you are encountering issue #394, which was that the
However, I’ll still need to look into this further because even without
The criterion that determines which adapter is the best-matching one is simply the number of matches in the alignment.
When allowing indels in the above example, the problem was that the alignments for 1a and 17a were considered equivalent because they both contain 17 matches. In that case, the rule was simply that the first one found wins. Since 1a was listed before 17a in your FASTA file with adapter sequences, 1a was chosen.
Alignment for 1a:
Alignment for 17a:
I have now fixed this by using the number of errors in the alignment as a tie breaker. That is, if two adapters get the same number of matches in their alignments, the one with the lower number of errors wins. In the above case, this would then correctly prefer 17a over 1a.
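To make the difference between the old and the new rule concrete, here is a minimal sketch in Python. It is not Cutadapt's actual code: the `Match` class and both function names are hypothetical, and the error counts (1 for 1a, 0 for 17a) are illustrative assumptions. Only the 17 matches per alignment and the order of the adapters come from the example above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Match:
    """Hypothetical stand-in for the alignment result of one adapter."""
    adapter: str
    matches: int  # number of matching positions in the alignment
    errors: int   # mismatches plus indels in the alignment


def best_match_first_wins(candidates: Sequence[Match]) -> Optional[Match]:
    """Old rule: most matches wins; on a tie, the first adapter listed wins."""
    best = None
    for m in candidates:
        if best is None or m.matches > best.matches:
            best = m
    return best


def best_match_with_tiebreak(candidates: Sequence[Match]) -> Optional[Match]:
    """New rule: most matches wins; on a tie, fewer errors wins."""
    best = None
    for m in candidates:
        if (
            best is None
            or m.matches > best.matches
            or (m.matches == best.matches and m.errors < best.errors)
        ):
            best = m
    return best


# Adapters in FASTA order: 1a is listed before 17a.
# The error counts are assumed for illustration only.
candidates = [
    Match("1a", matches=17, errors=1),
    Match("17a", matches=17, errors=0),
]

old_choice = best_match_first_wins(candidates)
new_choice = best_match_with_tiebreak(candidates)
print(old_choice.adapter, new_choice.adapter)
```

With these inputs the old rule picks 1a because it is listed first, while the tie breaker picks 17a. If the error counts are also equal, the sketch still falls back to the first adapter listed.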
Thanks for finding this! This part of Cutadapt has not been changed in a long time, so it has behaved this way for a while. By the way, this change applies to every adapter type, not only linked adapters.
Thank you!!! That was a fast fix!
I noticed the "first wins" behaviour when I put duplicates in the adapter file. I will open a separate issue suggesting a warning for this (but I think it has low priority).
I also have some more ideas for improving amplicon-related trimming, which I will post later.
Thank you again!