Quality trimming statistics error in PE mode #280

Closed
hujingchu opened this issue on Dec 27, 2017 · 3 comments

hujingchu commented Dec 27, 2017

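Quality trimming of read 1:
cutadapt -q 10 -o filter.R1.fq.gz raw.R1.fq.gz
[Screenshot: cutadapt report for read 1]

Quality trimming of read 2:
cutadapt -q 10 -o filter.R2.fq.gz raw.R2.fq.gz
[Screenshot: cutadapt report for read 2]

Quality trimming of read 1 and read 2 together in paired-end mode:
[Screenshot: cutadapt report for the paired-end run]

The numbers of quality-trimmed bases do not match.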
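For context on what the "quality-trimmed" line counts: the cutadapt documentation describes -q as using the same algorithm as BWA. The cutoff is subtracted from every quality value, partial sums are formed from the 3' end inward, and the read is cut where that sum is smallest. Below is a minimal Python sketch of that idea, assuming Phred+33 quality strings. `quality_trim_index` and `quality_trimmed_bases` are hypothetical helper names, not cutadapt's API, and the early stop once the running sum turns positive is my understanding of how BWA-style trimming short-circuits. Because the count is computed per read, a paired-end report should show one total for R1 and another for R2.

```python
from typing import Iterable, List


def quality_trim_index(quals: List[int], cutoff: int) -> int:
    """Return how many bases to keep after BWA-style 3' quality trimming."""
    running = 0       # sum of (quality - cutoff) from the 3' end inward
    lowest = 0        # most negative running sum seen so far
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += quals[i] - cutoff
        if running > 0:
            # Assumed BWA-style early stop: the tail scanned so far is good enough.
            break
        if running < lowest:
            lowest = running
            keep = i
    return keep


def quality_trimmed_bases(qual_strings: Iterable[str], cutoff: int = 10,
                          offset: int = 33) -> int:
    """Total bases removed by 3' quality trimming over all reads of one file."""
    total = 0
    for qs in qual_strings:
        quals = [ord(c) - offset for c in qs]  # decode Phred+33 (assumed encoding)
        total += len(quals) - quality_trim_index(quals, cutoff)
    return total
```

For example, `quality_trim_index([42, 40, 26, 27, 8, 7, 11, 4, 2, 3], 10)` returns 4, so six bases are trimmed. A read whose qualities are all below the cutoff gets 0: it is trimmed to length zero, not removed.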

jack1120 commented Jan 8, 2018

Hi hujingchu,

I am not the owner, but I believe paired-end mode is not intended to maintain the same number of bases between R1 and R2 reads, but rather ensures that read 1, read 2, read 3...read N from the R1 file stay in matching order with read 1, read 2, read 3...read N in the R2 file. It is common for sequences in the R2 file to be of lower quality, especially at the 3' end, and thus more bases will be trimmed from reads in that file.

Here's where I think the paired-end mode helps: Let's say that read 2 in your R1 file is high quality, but read 2 in your R2 file is of such low quality that it gets thrown out completely. If you weren't using paired-end mode, your respective files would look like:

R1        R2
read 1    read 1
read 2    read 3
read 3    read N
read N

From read 2 onward, each R1 read is paired with the wrong R2 read, which causes problems downstream, for example when it is time to merge the reads.
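Here is a minimal Python sketch of that failure mode. It models each file as a list of (name, sequence) tuples and uses a hypothetical `keep` predicate standing in for whatever filter removed read 2 from the R2 file; `filter_separately` is an illustrative name, not part of cutadapt.

```python
from typing import Callable, List, Tuple

Read = Tuple[str, str]  # (read name, sequence)


def filter_separately(r1: List[Read], r2: List[Read],
                      keep: Callable[[Read], bool]) -> List[Tuple[str, str]]:
    """Filter R1 and R2 in two independent runs, then pair records by position."""
    out1 = [read for read in r1 if keep(read)]
    out2 = [read for read in r2 if keep(read)]
    # Downstream tools pair the N-th record of each file, so list which names end up together.
    return [(a[0], b[0]) for a, b in zip(out1, out2)]


r1 = [("read1", "ACGTACGT"), ("read2", "ACGTACGT"), ("read3", "ACGTACGT")]
r2 = [("read1", "ACGTAC"), ("read2", ""), ("read3", "ACGTA")]
print(filter_separately(r1, r2, keep=lambda read: len(read[1]) > 0))
# [('read1', 'read1'), ('read2', 'read3')]
```

The second pair joins R1's read 2 with R2's read 3, and R1's read 3 is left without a partner.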

With paired-end mode, in contrast, read 2 is removed from both files, so they look like:

R1        R2
read 1    read 1
read 3    read 3
read N    read N

Thus, they stay in matching order. It does not matter that, for example, read 1 in the R1 file is 100 bp and read 1 in the R2 file is 75 bp.
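In the same simplified model, paired-end filtering decides per pair rather than per read. The sketch below discards a pair when either mate fails the hypothetical `keep` predicate, which I believe corresponds to cutadapt's default `--pair-filter=any`; `filter_pairs` is an illustrative name, not cutadapt's API.

```python
from typing import Callable, List, Tuple

Read = Tuple[str, str]  # (read name, sequence)


def filter_pairs(r1: List[Read], r2: List[Read],
                 keep: Callable[[Read], bool]) -> Tuple[List[Read], List[Read]]:
    """Keep or drop each pair as a unit so both outputs stay in the same order."""
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 must contain the same number of reads")
    out1: List[Read] = []
    out2: List[Read] = []
    for a, b in zip(r1, r2):
        if keep(a) and keep(b):  # drop the whole pair if either mate fails
            out1.append(a)
            out2.append(b)
    return out1, out2


r1 = [("read1", "ACGTACGT"), ("read2", "ACGTACGT"), ("read3", "ACGTACGT")]
r2 = [("read1", "ACGTAC"), ("read2", ""), ("read3", "ACGTA")]
out1, out2 = filter_pairs(r1, r2, keep=lambda read: len(read[1]) > 0)
print([(a[0], b[0]) for a, b in zip(out1, out2)])
# [('read1', 'read1'), ('read3', 'read3')]
```

The surviving mates still differ in length (8 bp vs 6 bp for read 1), which is fine: only their order has to match.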

I hope this helps!

marcelm (Owner) commented Jan 11, 2018

Sorry for the late reply; I took some time off.

Thanks @jack1120 for taking the time to reply. What you’re saying is correct. Note, though, that a read can be trimmed to a length of zero without being removed. Cutadapt only throws reads away if an explicit filtering option was used (such as --minimum-length=20), so keeping the files in matching order matters only in those cases.

The more I look at the issue, the more I believe that @hujingchu wanted to report a different problem: 263 bp are trimmed from R1 and 4417 bp are trimmed from R2 when they are processed separately. However, when trimming them jointly in paired-end mode, the report indicates that both R1 and R2 were trimmed by 4680 bp (which is equal to 263 plus 4417). This is indeed wrong.
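One way to picture the reporting error: if the quality-trimmed base counts of both mates go into a single shared counter, and that counter is printed for both R1 and R2, each line shows the combined total. The sketch below contrasts this with per-mate counters. It illustrates the symptom only; the function names are hypothetical and are not taken from cutadapt's source or from the fix in 6d68103.

```python
from typing import Iterable, Tuple


def report_shared_counter(trimmed_per_pair: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Buggy pattern: one counter for both mates, printed for R1 and R2 alike."""
    shared = 0
    for t1, t2 in trimmed_per_pair:
        shared += t1 + t2
    return shared, shared


def report_per_mate(trimmed_per_pair: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Correct pattern: separate totals for R1 and R2."""
    total_r1 = total_r2 = 0
    for t1, t2 in trimmed_per_pair:
        total_r1 += t1
        total_r2 += t2
    return total_r1, total_r2


# Per-file totals from the issue, fed in as a single pair.
totals = [(263, 4417)]
print(report_shared_counter(totals))  # (4680, 4680)
print(report_per_mate(totals))        # (263, 4417)
```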

@hujingchu I’ll have a look. I am very sure that the bug is only in the report; the actual trimming is correct.

marcelm closed this in 6d68103 on Jan 11, 2018

marcelm (Owner) commented Jan 11, 2018

The problem is now fixed in the master branch. If you need the fix, either install a development version or wait for the next release.
