quality trim statistic error in PE mode #280
I am not the owner, but I believe paired-end mode is not intended to keep the number of bases equal between R1 and R2 reads. Rather, it ensures that read 1, read 2, read 3...read N from the R1 file stay in the same order as read 1, read 2, read 3...read N in the R2 file. It is common for sequences in the R2 file to be of lower quality, especially at the 3' end, and thus more bases will be trimmed from reads in that file.
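Here's where I think the paired-end mode helps: Let's say that read 2 in your R1 file is high quality, but read 2 in your R2 file is of such low quality that it gets thrown out completely. If you weren't using paired-end mode, the R1 file would still contain read 1, read 2, read 3, while the R2 file would contain only read 1, read 3, so the second record in R1 would no longer line up with the second record in R2.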
The mismatched reads will cause problems downstream when it is time to merge the reads.
Whereas, using paired-end mode, both reads of a pair are kept or discarded together, so the two files stay in matching order. It does not matter that, for example, read 1 in the R1 file is 100 bp and read 1 in the R2 file is 75 bp.
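To make the difference concrete, here is a minimal Python sketch of the two situations. It is not Cutadapt's code: the read names, the sequences, the `min_len` threshold, and both helper functions are made up for illustration, and the paired function assumes that a pair is dropped as soon as either read is too short.

```python
# Hypothetical toy data: (read name, sequence left after quality trimming).
R1 = [("read1", "ACGTACGT"), ("read2", "ACGTAC"), ("read3", "ACGT")]
R2 = [("read1", "TTGCA"), ("read2", ""), ("read3", "TTG")]


def filter_single(reads, min_len):
    """Filter one file on its own, ignoring the mate file."""
    return [name for name, seq in reads if len(seq) >= min_len]


def filter_paired(r1, r2, min_len):
    """Filter both files together: keep a pair only if both reads pass.

    Assumption for this sketch: the pair is dropped if either read is too short.
    """
    kept1, kept2 = [], []
    for (name1, seq1), (name2, seq2) in zip(r1, r2):
        if len(seq1) >= min_len and len(seq2) >= min_len:
            kept1.append(name1)
            kept2.append(name2)
    return kept1, kept2


# Processed separately: R2 loses read2, so the files fall out of sync.
single_r1 = filter_single(R1, min_len=1)
single_r2 = filter_single(R2, min_len=1)

# Processed as pairs: read2 is dropped from both files, order stays matched.
paired_r1, paired_r2 = filter_paired(R1, R2, min_len=1)

print(single_r1, single_r2)
print(paired_r1, paired_r2)
```

Note that in the paired result the reads still have different lengths between R1 and R2; only the order and the number of records are kept in sync.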
I hope this helps!
Sorry for the late reply, I took some time off
Thanks @jack1120 for taking the time to reply. What you’re saying is correct. Note though that a read can be trimmed to a length of zero. Cutadapt only throws reads away if an explicit filtering option was used (such as
The more I look at the issue, the more I believe that @hujingchu wanted to report a different problem: 263 bp are trimmed from R1 and 4417 bp are trimmed from R2 when they are processed separately. However, when trimming them jointly in paired-end mode, the report indicates that both R1 and R2 were trimmed by 4680 bp (which is equal to 263 plus 4417). This is indeed wrong.
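As a quick check of the numbers, the sketch below tallies the trimmed bases per file and in total. The `report_lines` helper and its output wording are hypothetical and do not reproduce Cutadapt's actual report format; the point is only that 4680 bp is the combined total, not the value for either file.

```python
def report_lines(trimmed_bp):
    """Build one summary line per file plus a combined total.

    trimmed_bp maps a file label to the number of quality-trimmed bases.
    """
    lines = [f"{label}: {bp} bp quality-trimmed" for label, bp in trimmed_bp.items()]
    lines.append(f"Total: {sum(trimmed_bp.values())} bp quality-trimmed")
    return lines


# Figures taken from the separate single-end runs described above.
lines = report_lines({"R1": 263, "R2": 4417})
for line in lines:
    print(line)
```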
@hujingchu I’ll have a look. I am very sure that the bug is only in the report; the actual trimming is correct.