"Reads written" counts both trimmed and untrimmed reads #128

Closed
marcelm opened this Issue May 28, 2015 · 4 comments

@marcelm
Owner

marcelm commented May 28, 2015

In the summary stats, the "Reads written (passing filters)" figure includes both trimmed and untrimmed reads. This can be confusing if trimmed and untrimmed reads are written to different output files since then the number is the sum of both. See also issue #126.
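The ambiguity can be illustrated with a toy model. This is only a sketch, not cutadapt's actual code: the function `route_reads` and the output names `trimmed` and `untrimmed` are hypothetical, and each read is reduced to a single flag saying whether an adapter was found.

```python
# Toy model of the reporting problem; NOT cutadapt's implementation.
# All names below are hypothetical and chosen for illustration only.
from collections import Counter


def route_reads(has_adapter_flags):
    """Count how many reads would go to each output file."""
    written = Counter()
    for has_adapter in has_adapter_flags:
        # Trimmed and untrimmed reads are sent to different files.
        written["trimmed" if has_adapter else "untrimmed"] += 1
    return written


flags = [True, True, False, True, False]  # made-up example input
per_file = route_reads(flags)

# A single "reads written" figure that sums both files matches
# neither of the two files on disk.
combined = sum(per_file.values())
print(per_file["trimmed"], per_file["untrimmed"], combined)
```

With two output files, a reader comparing the summary against either file would see a mismatch, which is the confusion described above.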

@akshayparopkari

akshayparopkari commented Sep 11, 2015

Hi Marcel,

I am using cutadapt to trim adapter sequences from my paired-end reads and I am running into an error similar to issue #126. I tried both the single-end and the paired-end trimming approach, and the two give different results.

Here are the results from my runs:

This is cutadapt 1.8.3 with Python 2.7.8
Command line parameters: -b TCGATCGGAAKRGTTYGATYNTGGCTCAG -o P1_R1_V1-V3.fastq --discard-untrimmed A01_S1_L001_R1_001.fastq.gz
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 3.54 s (27 us/read; 2.26 M reads/minute).

=== Summary ===

Total reads processed: 133,500
Reads with adapters: 69,984 (52.4%)
Reads written (passing filters): 69,984 (52.4%)

Total basepairs processed: 33,839,299 bp
Total written (filtered): 12,805,845 bp (37.8%)

This is cutadapt 1.8.3 with Python 2.7.8
Command line parameters: -b CGGACTTGATGTACGAACGTNTBACCGCDGCTGCTG -o P1_R2_V1-V3.fastq --discard-untrimmed A01_S1_L001_R2_001.fastq.gz
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 3.97 s (30 us/read; 2.02 M reads/minute).

=== Summary ===

Total reads processed: 133,500
Reads with adapters: 64,335 (48.2%)
Reads written (passing filters): 64,335 (48.2%)

Total basepairs processed: 33,712,055 bp
Total written (filtered): 13,314,911 bp (39.5%)


This is cutadapt 1.8.3 with Python 2.7.8
Command line parameters: -b TCGATCGGAAKRGTTYGATYNTGGCTCAG -B CGGACTTGATGTACGAACGTNTBACCGCDGCTGCTG -o PE_R1.fastq -p PE_R2.fastq --discard-untrimmed A01_S1_L001_R1_001.fastq.gz A01_S1_L001_R2_001.fastq.gz
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 8.09 s (61 us/read; 0.99 M reads/minute).

=== Summary ===

Total read pairs processed: 133,500
Read 1 with adapter: 69,984 (52.4%)
Read 2 with adapter: 64,335 (48.2%)
Pairs written (passing filters): 84,904 (63.6%)

Total basepairs processed: 67,551,354 bp
Read 1: 33,839,299 bp
Read 2: 33,712,055 bp
Total written (filtered): 33,665,686 bp (49.8%)
Read 1: 14,391,373 bp
Read 2: 19,274,313 bp

I understand that the -p option checks whether the files are paired correctly. Also, I am discarding any read that doesn't have the adapter in it. I don't understand why 84,904 read pairs are written to the output when the reads with adapter 1 and adapter 2 number only 69,984 and 64,335, respectively.

Please let me know what you think. Thank you!
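The figures in the paired-end summary can be checked with a little counting. The sketch below uses only the numbers reported above; the variable names are made up for illustration. It shows that 84,904 is too large to be the number of pairs in which both reads contain an adapter. The last step rests on an assumption, not a confirmed explanation: that a pair was written whenever at least one of its reads had an adapter.

```python
# Sanity check on the reported paired-end numbers (values from the logs above).
total_pairs = 133_500
r1_with_adapter = 69_984
r2_with_adapter = 64_335
pairs_written = 84_904

# Pairs with an adapter in BOTH reads cannot outnumber either per-read count.
both_max = min(r1_with_adapter, r2_with_adapter)
both_min = max(0, r1_with_adapter + r2_with_adapter - total_pairs)

# Pairs with an adapter in AT LEAST ONE read lie in this range.
either_min = max(r1_with_adapter, r2_with_adapter)
either_max = min(total_pairs, r1_with_adapter + r2_with_adapter)

# Minimum number of written pairs whose read 1 (or read 2) had no adapter.
untrimmed_r1_written = pairs_written - r1_with_adapter
untrimmed_r2_written = pairs_written - r2_with_adapter

print("written exceeds 'both' maximum:", pairs_written > both_max)
print("written fits 'at least one' range:",
      either_min <= pairs_written <= either_max)
print("written pairs with untrimmed read 1 (at least):", untrimmed_r1_written)
print("written pairs with untrimmed read 2 (at least):", untrimmed_r2_written)

# ASSUMPTION: if pairs were written when at least one read had an adapter,
# inclusion-exclusion gives the implied number of pairs with both adapters.
implied_both = r1_with_adapter + r2_with_adapter - pairs_written
print("implied pairs with both adapters:", implied_both)
```

Whatever the exact rule was, the output must contain pairs in which at least one read was not trimmed, since more pairs were written than there are reads with either adapter.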

@marcelm

Owner

marcelm commented Sep 14, 2015

I think that’s a bug: It seems that reads that should be discarded are actually written to the output file. The statistics are “correct” in the sense that they properly describe how many read pairs have been erroneously written to the output files. I’ll try to come up with a fix.

@marcelm

Owner

marcelm commented Sep 14, 2015

The problem you’re describing is separate from the one this issue is about, so I’ve opened issue #146.

@marcelm

Owner

marcelm commented Aug 27, 2018

It’s been a while, but this problem should finally be fixed: The “Reads written (passing filters):” figure will no longer include the untrimmed reads if --untrimmed-output was used.
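The reporting rule described here could be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the fixing commit: the function name and its parameters are hypothetical, and it ignores every other filter that could remove reads.

```python
# Hypothetical sketch of the reporting rule; NOT the actual cutadapt code.
def reads_written_figure(n_trimmed, n_untrimmed, untrimmed_output_used):
    """Return the value to show as "Reads written (passing filters)".

    Assumes no other filters apply, and that untrimmed reads go to the
    main output unless a separate untrimmed output file was requested.
    """
    if untrimmed_output_used:
        # Untrimmed reads went to their own file, so they are left out.
        return n_trimmed
    # Everything went to one file, so the sum is the right figure.
    return n_trimmed + n_untrimmed


# Made-up counts for illustration.
print(reads_written_figure(700, 300, untrimmed_output_used=True))
print(reads_written_figure(700, 300, untrimmed_output_used=False))
```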

@marcelm marcelm closed this in 6e36d5c Aug 27, 2018
