Demultiplexing paired-end reads with dual barcodes and primers #292

Closed
koopkaup opened this issue Feb 21, 2018 · 17 comments

@koopkaup

commented Feb 21, 2018

Hi,
I have Illumina MiSeq reads in the following format:
forward_barcode_sequence forward_primer_sequence read reverse_primer_sequence reverse_barcode

Because each forward barcode is used in combination with reverse barcodes, a sample is identified only by the pair, so searching for the first barcode alone, as cutadapt currently does, does not work. Would it be possible to add an option that searches for both barcodes simultaneously to decide which sample a read belongs to?
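To illustrate the problem: under a combinatorial design, one forward barcode is shared by several samples, so it narrows a read down to a group, and only the (forward, reverse) pair picks out a single sample. The sketch below uses a hypothetical sample sheet with made-up barcode and sample names; it is not cutadapt code.

```python
from typing import List, Optional

# Hypothetical combinatorial design: 2 forward x 2 reverse barcodes -> 4 samples.
SAMPLE_SHEET = {
    ("F1", "R1"): "sample_A",
    ("F1", "R2"): "sample_B",
    ("F2", "R1"): "sample_C",
    ("F2", "R2"): "sample_D",
}


def samples_for_forward(fwd: str) -> List[str]:
    """Samples sharing one forward barcode: all a forward-only search can narrow down to."""
    return sorted(sample for (f, _), sample in SAMPLE_SHEET.items() if f == fwd)


def assign(fwd: str, rev: str) -> Optional[str]:
    """With both barcodes known the sample is unique; None if the pair is not in the sheet."""
    return SAMPLE_SHEET.get((fwd, rev))


if __name__ == "__main__":
    print(samples_for_forward("F1"))  # ['sample_A', 'sample_B']
    print(assign("F1", "R2"))         # sample_B
```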

@marcelm

Owner

commented Feb 21, 2018

You should be able to use a linked adapter for this purpose. Can you try this and let me know whether it works? If it does, I'll make this clearer in the documentation.

@koopkaup

Author

commented Feb 21, 2018

I will try this, but how can I provide linked adapters in a FASTA file, as in this example?
-a file:barcodes.fasta

@marcelm

Owner

commented Feb 21, 2018

Put them in the FASTA file like this:

>adapter1
ACGT...AACCGGTT
>adapter2
TTAAGG...CCAA
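Each record above is a linked adapter: the part before `...` is expected near the start of the read and the part after it further downstream. The Python sketch below parses such a FASTA file and assigns a read with exact string matching. It is a deliberately simplified illustration that assumes error-free reads; it does not reproduce cutadapt's error-tolerant alignment or its anchoring rules, which are described in the cutadapt documentation.

```python
from typing import Dict, Optional, Tuple

# The two records from the comment above.
FASTA = """>adapter1
ACGT...AACCGGTT
>adapter2
TTAAGG...CCAA
"""


def parse_linked_fasta(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each record name to its (front, back) parts, split at '...'."""
    sequences: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line  # allow sequences wrapped over several lines
    linked: Dict[str, Tuple[str, str]] = {}
    for rec, seq in sequences.items():
        front, sep, back = seq.partition("...")
        if not sep:
            raise ValueError(f"{rec}: not a linked adapter (no '...')")
        linked[rec] = (front, back)
    return linked


def match_linked(read: str, adapters: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Exact-match sketch: front part at the read start, back part somewhere after it."""
    for name, (front, back) in adapters.items():
        if read.startswith(front) and back in read[len(front):]:
            return name
    return None


if __name__ == "__main__":
    adapters = parse_linked_fasta(FASTA)
    print(adapters["adapter1"])                         # ('ACGT', 'AACCGGTT')
    print(match_linked("ACGTTTTTTAACCGGTT", adapters))  # adapter1
```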

@marcelm

Owner

commented Feb 21, 2018

Since you use file:, I guess you have more than a few barcodes. In that case, putting linked adapters into the FASTA file as described in my comment above requires listing every possible forward/reverse combination, that is, one entry per forward barcode times reverse barcode. Depending on how many there are, this could be inefficient, and you may be better off running cutadapt in a "nested" way: run it once to demultiplex by the forward barcode, and then run it on each of the output files to demultiplex by the reverse barcode.
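If you do go the single-file route, the combinations do not have to be written by hand. The sketch below generates one linked adapter per forward/reverse pair with `itertools.product`. The barcode names, sequences, and the `F1-R1` record naming are hypothetical, and the reverse barcodes are assumed to be written as they appear at the 3' end of the (merged) read, i.e. already reverse-complemented if necessary.

```python
from itertools import product
from typing import Dict

# Hypothetical barcodes. The reverse barcodes are assumed to be written as they
# appear at the 3' end of the (merged) read, i.e. already reverse-complemented.
FORWARD = {"F1": "ACGTAC", "F2": "TGCATG"}
REVERSE = {"R1": "GGTTAA", "R2": "CCAATT", "R3": "AGAGAG"}


def combinatorial_fasta(forward: Dict[str, str], reverse: Dict[str, str]) -> str:
    """One linked adapter FRONT...BACK per forward/reverse combination."""
    lines = []
    for (fname, fseq), (rname, rseq) in product(forward.items(), reverse.items()):
        lines.append(f">{fname}-{rname}")
        lines.append(f"{fseq}...{rseq}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fasta = combinatorial_fasta(FORWARD, REVERSE)
    print(fasta.count(">"))        # 6 records = 2 forward x 3 reverse
    print(fasta.splitlines()[:2])  # ['>F1-R1', 'ACGTAC...GGTTAA']
```

The record count grows as the product of the two barcode set sizes, which is why the nested approach can be preferable for large designs.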

@marcelm

Owner

commented Feb 21, 2018

Sorry, I should have read the title more carefully. I overlooked that you have paired-end reads: the approach above only works as suggested if you merge the reads before running cutadapt.
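Without merging, the two barcodes sit in different mates. Assuming R1 starts with the forward barcode and R2 starts with the reverse barcode (each read 5'->3' from its own end of the fragment), a read pair can be assigned by combining what is found at the start of each mate. The sketch below shows that idea with exact prefix matching and hypothetical barcodes; it is an illustration of the logic, not cutadapt's implementation.

```python
from typing import Dict, Optional, Tuple

# Hypothetical barcodes, each written 5'->3' as it is read at the start of its mate.
FORWARD = {"F1": "ACGTAC", "F2": "TGCATG"}
REVERSE = {"R1": "GGTTAA", "R2": "CCAATT"}


def first_prefix_match(read: str, barcodes: Dict[str, str]) -> Optional[str]:
    """Name of the first barcode the read starts with (exact match only)."""
    return next((name for name, seq in barcodes.items() if read.startswith(seq)), None)


def assign_pair(r1: str, r2: str) -> Optional[Tuple[str, str]]:
    """Combine the forward barcode found in R1 with the reverse barcode found in R2."""
    fwd = first_prefix_match(r1, FORWARD)
    rev = first_prefix_match(r2, REVERSE)
    if fwd is None or rev is None:
        return None
    return fwd, rev


if __name__ == "__main__":
    print(assign_pair("ACGTACGGATTC", "CCAATTGCTTAA"))  # ('F1', 'R2')
    print(assign_pair("AAAAAAGGATTC", "CCAATTGCTTAA"))  # None
```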

I will consider adding an option to make this easier.

@marcelm

Owner

commented Apr 30, 2018

Can you clarify whether your reads actually look as you describe above, or whether that was a description of the DNA fragment? (forward_barcode_sequence forward_primer_sequence read reverse_primer_sequence reverse_barcode)

@koopkaup

Author

commented May 17, 2018

My reads are in that format.

@DenisGoryunov

commented Jul 4, 2018

Hi,
It seems I have the same kind of data. Please see my recent post on Biostars:
https://www.biostars.org/p/324429/#324738

@marcelm

Owner

commented Jul 4, 2018

Could one of you send me a small part of your dataset (just a couple of reads)? I would also need to know the forward_barcode_sequence, forward_primer_sequence, reverse_primer_sequence and reverse_barcode sequences (I assume these aren't random barcodes). You can send this to me privately at marcel.martin@scilifelab.se, but please mention it here if you do.

However, note that I will not have time to work on this until mid-August at the earliest.

@DenisGoryunov

commented Jul 7, 2018

Hi,
Unfortunately, my data were already demultiplexed by the sequencing facility. But as far as I understand, the task is exactly the same (see the detailed description on Biostars). If you still need my data, please let me know.
This may be useful for understanding dual-index technology:
https://www.drive5.com/usearch/manual/pipe_demux.html

@marcelm

Owner

commented Feb 28, 2019

As a summary for myself: there are two different dual indexing strategies used by Illumina:

  • Combinatorial indexing uses 8 different i5 indices and 12 different i7 indices. Used together, these allow for 8 × 12 = 96 combinations.
  • Non-redundant indexing (unique dual indexes, UDI) uses 96 unique i5 indices and 96 unique i7 indices. Apparently, these are only ever used in pairs, that is, the first i5 index is always used with the first i7 index and so on.
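The difference between the two strategies is the difference between a Cartesian product and a one-to-one pairing. Both yield 96 valid index pairs, but a given pair such as (i5 #1, i7 #2) is a valid sample only under combinatorial indexing. The sketch below uses hypothetical index names in place of real kit sequences.

```python
from itertools import product

# Hypothetical index names; real kits define specific i5/i7 sequences.
i5 = [f"i5_{k:02d}" for k in range(1, 9)]    # 8 i5 indices
i7 = [f"i7_{k:02d}" for k in range(1, 13)]   # 12 i7 indices
combinatorial = set(product(i5, i7))         # every i5 with every i7

udi_i5 = [f"i5_{k:02d}" for k in range(1, 97)]  # 96 i5 indices
udi_i7 = [f"i7_{k:02d}" for k in range(1, 97)]  # 96 i7 indices
udi = set(zip(udi_i5, udi_i7))                  # k-th i5 only with k-th i7

if __name__ == "__main__":
    print(len(combinatorial), len(udi))         # 96 96
    print(("i5_01", "i7_02") in combinatorial)  # True
    print(("i5_01", "i7_02") in udi)            # False
```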

To deal with these two types of data, Cutadapt needs two different options:

  • For combinatorial indexing, an idea could be to allow not only {name} in the demultiplexing file name template, but perhaps something like {name1}{name2}, where {name1} is the name of the adapter (barcode) that was found in R1 and {name2} is the name of the adapter (barcode) that was found in R2.

  • For UDI, the --pair-adapters option suggested in #347 would be necessary.
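The {name1}/{name2} idea amounts to filling a file name template with the barcode name found in R1 and the one found in R2, which gives one pair of output files per combination. The sketch below mimics that substitution with Python's `str.format`; the template strings are illustrative assumptions, not a statement of Cutadapt's exact syntax or behavior.

```python
from itertools import product
from typing import List, Tuple

# Illustrative templates modeled on the proposed {name1}/{name2} placeholders.
TEMPLATE_R1 = "{name1}-{name2}.1.fastq.gz"
TEMPLATE_R2 = "{name1}-{name2}.2.fastq.gz"


def output_paths(name1: str, name2: str) -> Tuple[str, str]:
    """Fill both templates with the barcode found in R1 (name1) and in R2 (name2)."""
    return (TEMPLATE_R1.format(name1=name1, name2=name2),
            TEMPLATE_R2.format(name1=name1, name2=name2))


def all_outputs(r1_names: List[str], r2_names: List[str]) -> List[Tuple[str, str]]:
    """Combinatorial demultiplexing: one output pair per R1/R2 barcode combination."""
    return [output_paths(a, b) for a, b in product(r1_names, r2_names)]


if __name__ == "__main__":
    print(output_paths("F1", "R2"))  # ('F1-R2.1.fastq.gz', 'F1-R2.2.fastq.gz')
    print(len(all_outputs(["F1", "F2"], ["R1", "R2", "R3"])))  # 6
```

For UDI, by contrast, only the matched pairs are meaningful, which is what pairing the adapters enforces.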

@koopkaup

Author

commented May 16, 2019

Have you come up with a solution for the first situation, where the indices are used in multiple combinations (combinatorial indexing)?
We have sequenced samples this way, and I would like to try out your method.

@marcelm

Owner

commented May 16, 2019

My plan is to implement the idea of allowing something like {name1}{name2} in the file name templates, as mentioned above. I don't know when I will have time for this; hopefully this month.

For completeness: The second part, which is the --pair-adapters option, is implemented.

@ArnavGuptaa

commented Jun 5, 2019

Any update regarding combinatorial indexing?

@marcelm

Owner

commented Jun 10, 2019

I’m working on this now, give me a couple more days.

marcelm closed this in c1134aa on Jun 11, 2019

@marcelm

Owner

commented Jun 11, 2019

Hi, this is now implemented ("combinatorial demultiplexing"). Please read the new section in the documentation.

It would be great if someone could test it and let me know whether this is what you need before I make a new release. Just follow the instructions for how to install a development version.

@koopkaup

Author

commented Jun 11, 2019

Thanks! I can try it next week and let you know how it worked.
