Cutadapt outputting some untrimmed second reads when demultiplexing paired-end reads ("--untrimmed-paired-output" argument seems not to work) #347

Closed
andrefaure opened this issue Dec 18, 2018 · 7 comments

@andrefaure
commented Dec 18, 2018

Hello,

See biostars post here:
https://www.biostars.org/p/354990/

I am attempting to demultiplex barcoded 100 bp paired-end Illumina short-read sequencing data with cutadapt, following these instructions: https://cutadapt.readthedocs.io/en/stable/guide.html#demultiplexing

I only want to retain read pairs where the barcodes were found and trimmed/removed in both reads of a pair.

However, cutadapt is writing untrimmed second reads to the demultiplexed output files even though I specified the "--untrimmed-paired-output" argument.

Full details of my analysis are as follows (using cutadapt 1.17 installed with pip; Python 3.6.5):

Command-line parameters:

cutadapt -g file:demultiplex_barcode-file_1.fasta -G file:demultiplex_barcode-file_1.fasta -e 0.25 --no-indels --untrimmed-output Input_1.fastq.gz.demultiplex.unknown.fastq --untrimmed-paired-output Input_2.fastq.gz.demultiplex.unknown.fastq -o {name}1.fastq -p {name}2.fastq Input_1.fastq.gz Input_2.fastq.gz

demultiplex_barcode-file_1.fasta:

>Input_Rep1_read
^GCCGAATT
>Input_Rep2_read
^CGGCAATT
>Input_Rep3_read
^GAACGTTC

Before cutadapt demultiplexing (example read1 FASTQ entry in Input_1.fastq.gz):

@D3FCO8P1:231:C49LBACXX:7:1101:2911:2101 1:N:0:
GCCGAATTTGCAGTTTGAACAAAGCAAGAACTTACCCCAAACAATTAGTGGAATTGGCAAAAGAAGAAGACAAAGCCACCCCAAGTTAGATTTCGATCCT
+
CCCFFFFFHHHHHJJJJJJJJJJJJJIJJIIJJJIJJJJJIIIIIJIJGGIIIIJGIGGGIJIIJCGHEHEC@D@CA@C>BDDDDCDCCCCDCDCBD?@>

Before cutadapt demultiplexing (example read2 FASTQ entry in Input_2.fastq.gz):

@D3FCO8P1:231:C49LBACXX:7:1101:2911:2101 2:N:0:
CCGAATTAAAATGTCCAATGTTCCAACCTACAGGATCGAAATCTAACTTGGGGTGGCTTTGTCTTCTTCTTTTGCCAATTCCACTAATTGTTTGGGGTAA
+
CCCFFFFFHHHHHJJJJJJIJJJJJJIJJJIJJJJJJJJIJJJJGHGIJIIJJIIDBE@GGHFFFHDHFFCFFEA6>;@ACCACC;;>>:>CCA@BBDB3

After cutadapt demultiplexing (example read1 FASTQ entry in Input_Rep1_read1.fastq):

@D3FCO8P1:231:C49LBACXX:7:1101:2911:2101 1:N:0:
TGCAGTTTGAACAAAGCAAGAACTTACCCCAAACAATTAGTGGAATTGGCAAAAGAAGAAGACAAAGCCACCCCAAGTTAGATTTCGATCCT
+
HHHHHJJJJJJJJJJJJJIJJIIJJJIJJJJJIIIIIJIJGGIIIIJGIGGGIJIIJCGHEHEC@D@CA@C>BDDDDCDCCCCDCDCBD?@>

After cutadapt demultiplexing (example read2 FASTQ entry in Input_Rep1_read2.fastq):

@D3FCO8P1:231:C49LBACXX:7:1101:2911:2101 2:N:0:
CCGAATTAAAATGTCCAATGTTCCAACCTACAGGATCGAAATCTAACTTGGGGTGGCTTTGTCTTCTTCTTTTGCCAATTCCACTAATTGTTTGGGGTAA
+
CCCFFFFFHHHHHJJJJJJIJJJJJJIJJJIJJJJJJJJIJJJJGHGIJIIJJIIDBE@GGHFFFHDHFFCFFEA6>;@ACCACC;;>>:>CCA@BBDB3

As you can see, the barcode was not matched in the second read but both reads nevertheless still appear in the supposedly trimmed output files (Input_Rep1_read1.fastq and Input_Rep1_read2.fastq).
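
To make the mismatch concrete, here is a minimal Python sketch that checks the example reads against the three barcodes. It is a simplified model and not cutadapt's actual matching code: it assumes an anchored 5' barcode is compared position by position against the start of the read (no indels) and that the error budget is floor(barcode length × error rate), i.e. 2 mismatches for an 8 bp barcode at -e 0.25. The names `hamming_prefix` and `best_match` are hypothetical helpers introduced only for this illustration.

```python
from math import floor

# Barcodes from demultiplex_barcode-file_1.fasta (the leading "^" anchor is
# dropped here because the sketch always compares against the read start).
BARCODES = {
    "Input_Rep1_read": "GCCGAATT",
    "Input_Rep2_read": "CGGCAATT",
    "Input_Rep3_read": "GAACGTTC",
}


def hamming_prefix(read, barcode):
    """Count mismatches between the barcode and the first len(barcode) bases."""
    prefix = read[: len(barcode)]
    if len(prefix) < len(barcode):
        # Read shorter than the barcode: treat every position as a mismatch.
        return len(barcode)
    return sum(a != b for a, b in zip(prefix, barcode))


def best_match(read, barcodes=BARCODES, error_rate=0.25):
    """Return the barcode name with the fewest mismatches, or None if none fits.

    Assumption: the allowed number of mismatches is
    floor(len(barcode) * error_rate), with no insertions or deletions.
    """
    best_name, best_errors = None, None
    for name, barcode in barcodes.items():
        errors = hamming_prefix(read, barcode)
        allowed = floor(len(barcode) * error_rate)
        if errors <= allowed and (best_errors is None or errors < best_errors):
            best_name, best_errors = name, errors
    return best_name


# First bases of the example reads shown above.
READ1 = "GCCGAATTTGCAGTTTGAACAAAGC"
READ2 = "CCGAATTAAAATGTCCAATGTTCCA"

print(best_match(READ1))  # barcode assigned to read 1 under this model
print(best_match(READ2))  # barcode assigned to read 2 under this model
```

Under these assumptions read 1 carries the Input_Rep1_read barcode exactly, while read 2 is missing the leading G, which shifts the rest of the barcode by one base; without indels that exceeds the two allowed mismatches for every barcode in the file.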

I have tried specifying "--pair-filter=any" although this is the default setting. Neither specifying "any" nor "both" makes any difference to this read pair being retained despite the second read being untrimmed.

Any help would be appreciated!

Thanks,

Andre

@marcelm

Owner

commented Dec 18, 2018

Hi, I’ll be away for the next three weeks. If you still have your problem then, please ping me here.

@qifei9

commented Jan 25, 2019

I ran into the same problem.

Perhaps this is the intended behavior, as the documentation states (https://cutadapt.readthedocs.io/en/stable/guide.html#demultiplexing):

Paired-end demultiplexing always uses the adapter matches of the first read to decide where a read should be written. If adapters to be found in read 2 are given (-A/-G), they are detected and removed as normal, but these matches do not influence where the read pair is written.

Therefore, I think currently --pair-filter=(any|both) works in paired-end mode but not in paired-end demultiplexing mode. In paired-end demultiplexing it just ignores that parameter and does not care whether an adapter is found in the 2nd reads or not.

However, I think that making --pair-filter=(any|both) work in paired-end demultiplexing mode would be really helpful.

@andrefaure

Author

commented Jan 25, 2019

> Hi, I’ll be away for the next three weeks. If you still have your problem then, please ping me here.

Hi marcelm, I still have the problem, as does qifei9 above... it would be really helpful to have this functionality. Thanks, Andre

@marcelm

Owner

commented Feb 4, 2019

Hi, and sorry for the delay. Yes, the current behavior is as designed, as @qifei9 pointed out.

So currently, only the adapter matches in R1 are used as the demultiplexing criterion. Additionally, whatever happens in R2 is completely independent of R1. You can trim whichever adapters you want from R2, or none at all; it doesn’t influence the demultiplexing.

So what would the desired functionality actually be? I could think of adding a new option, let’s call it --pair-adapters for now, that would enforce that adapters are always found in matching pairs. The behavior would be this: When you specify a list of adapters for R1, such as with -g file:barcodes1.fasta, you also have to specify a list of adapters to remove from R2, for example with -G file:barcodes2.fasta, and then an adapter from barcodes1.fasta would only be found in R1 if its corresponding partner from barcodes2.fasta also is found in R2 (and vice versa).

I think this would solve the problem. Demultiplexing would still only look at the R1 adapter match, but then you could be sure that it has been found together with a matching adapter in R2. All the pairs where the adapter in R1 does not match the one in R2 would end up in the --untrimmed-(paired-)output files.
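
The difference between the current routing and the proposed one can be sketched in a few lines of Python. This is only a model of the behavior described in this comment, not cutadapt's implementation: `route_pair` is a hypothetical helper, adapters are identified by their position in the R1 and R2 lists, and "untrimmed" stands in for the --untrimmed-(paired-)output files.

```python
from typing import Optional, Sequence


def route_pair(
    r1_match: Optional[int],
    r2_match: Optional[int],
    names: Sequence[str],
    pair_adapters: bool = False,
) -> str:
    """Return the output name for one read pair.

    r1_match / r2_match: index of the adapter found in R1 / R2, or None if
    no adapter was found in that read.
    pair_adapters: models the proposed option, where the i-th R1 adapter only
    counts as found if the i-th R2 adapter was found as well.
    """
    if r1_match is None:
        # No adapter in R1: the pair is untrimmed in both modes.
        return "untrimmed"
    if pair_adapters and r2_match != r1_match:
        # Proposed behavior: a missing or mismatched partner in R2
        # sends the whole pair to the untrimmed files.
        return "untrimmed"
    # Current behavior: only the R1 match decides where the pair goes.
    return names[r1_match]


names = ["Input_Rep1_read", "Input_Rep2_read", "Input_Rep3_read"]

# The pair from the report: barcode 0 found in R1, nothing found in R2.
print(route_pair(0, None, names))                      # current behavior
print(route_pair(0, None, names, pair_adapters=True))  # proposed behavior
```

In this model the pair from the original report is written to the Input_Rep1_read files under the current rules and to the untrimmed files once adapters are required to match in pairs.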

@andrefaure

Author

commented Feb 5, 2019

Thanks for getting back to us about this, Marcel! The functionality with a new option "--pair-adapters" sounds good. Or perhaps co-opting the "--pair-filter" option for this purpose in demultiplexing mode? Either of these would solve the problem... Thanks again!

@marcelm

Owner

commented Mar 14, 2019

I’ve implemented the --pair-adapters option now. The --pair-filter option is orthogonal, so I haven’t touched it.

Please see the documentation here and let me know whether that is what you need:
https://cutadapt.readthedocs.io/en/latest/guide.html#paired-adapters-dual-indices

@andrefaure

Author

commented Mar 14, 2019
