Paired End Demultiplexing #118

Closed
davmlaw opened this Issue Apr 13, 2015 · 5 comments

@davmlaw
Contributor

davmlaw commented Apr 13, 2015

Demultiplexing paired-end files is not currently supported; cutadapt exits with "Demultiplexing not supported for paired-end files, yet."

I started writing this feature, but I think it's not obvious how it should work - so perhaps we can spec out a design, so it can be implemented properly?

Here is a command line I'd like to work:

cutadapt -b file:barcodes.fasta -o trimmed-{name}_R1.fastq.gz --paired-output trimmed-{name}_R2.fastq.gz ${READ1} ${READ2}

So if you specify {name} in the R1 output (demultiplexing), you need it in --paired-output as well (otherwise, an error?).

The easy case is if you are demultiplexing on only 1 end - then you just use the same {name} for both outputs.

If you specify barcodes for both read 1 and read 2 (i.e. -b foo=GATC -A foo=GATC), things get complicated. If both are specified and both match, it makes sense to write the pair out under that name. If they don't agree, what do you do?
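
The cases above can be laid out as a small decision table. This is only a sketch to frame the design discussion, not cutadapt code: the function name classify_pair and the case labels are made up for illustration, and r1_name / r2_name stand for the name of the barcode matched on each read (None if nothing matched).

```python
from typing import Optional


def classify_pair(r1_name: Optional[str], r2_name: Optional[str]) -> str:
    """Classify a read pair by which barcode names matched.

    Hypothetical helper for discussion only; not part of cutadapt.
    """
    if r1_name is None and r2_name is None:
        return "no_match"   # neither read carried a known barcode
    if r2_name is None:
        return "r1_only"    # only read 1 matched
    if r1_name is None:
        return "r2_only"    # only read 2 matched
    if r1_name == r2_name:
        return "agree"      # both matched the same name
    return "conflict"       # both matched, but with different names
```

The "agree" case and the single-read cases have a natural output name; "conflict" is the open question.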

@davmlaw davmlaw changed the title from Paired End Multiplexing to Paired End Demultiplexing Apr 13, 2015

@marcelm

Owner

marcelm commented Apr 13, 2015

I think cutadapt should always write out correctly paired data (the two files should stay in sync). From that it follows that {name} would need to be in both file name patterns, as you said. The other issue is what to do when adapters are given for both the first and the second read. The only solution that makes sense to me right now is to have {name} always refer to the set of adapters that are removed from the first read. That is, even if you provide -A foo=ACGT, the name foo would be ignored.
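
A minimal sketch of that rule, assuming the output name is taken only from the adapter matched on the first read and applied to both file name patterns. This is not the actual implementation: demux_paths is a hypothetical helper, and the "unknown" fallback for pairs whose first read matched nothing is an assumption made for illustration.

```python
from typing import Optional, Tuple


def demux_paths(
    pattern_r1: str,
    pattern_r2: str,
    r1_adapter_name: Optional[str],
    unknown: str = "unknown",  # assumed fallback name, for illustration
) -> Tuple[str, str]:
    """Return the (R1, R2) output paths for one read pair.

    Both patterns must contain {name}; otherwise the two output
    files could not be kept in sync per barcode.
    """
    for pattern in (pattern_r1, pattern_r2):
        if "{name}" not in pattern:
            raise ValueError("{name} must appear in both output patterns")
    # Only the first read decides the name; a match on read 2 is ignored.
    name = r1_adapter_name if r1_adapter_name is not None else unknown
    return (
        pattern_r1.replace("{name}", name),
        pattern_r2.replace("{name}", name),
    )
```

Because both paths are derived from the same name, each pair always lands in matching R1 and R2 files.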

I haven’t had a use case for paired-end demultiplexing myself, so I think the most important question is to ask what is useful for a user.

If you are interested in implementing this, you could also ignore the issues arising when a second set of adapters is given. Cutadapt has two paired-end trimming modes. For backwards compatibility, a "legacy" mode is enabled when none of the uppercase adapter options are given (-A/-B etc). With the command you give above, cutadapt runs in legacy mode (which mainly means that the second read is ignored for length filtering and is unaffected by quality trimming). It seems it would be sufficient for you if demultiplexing worked in legacy mode.

@jrderuiter

jrderuiter commented Jul 3, 2017

Any news on this?

@marcelm

Owner

marcelm commented Jul 3, 2017

It seems @davmlaw did not pursue this further, but I can give it a try this week.

@marcelm marcelm closed this in 53c1ac0 Jul 10, 2017

@marcelm

Owner

marcelm commented Jul 10, 2017

This is now implemented. Please see the updated section in the documentation.

@jrderuiter

jrderuiter commented Jul 10, 2017

This is great, thanks!
