similar barcode warning only for 1 bp error rate #463

Closed
duartemolha opened this issue May 28, 2020 · 7 comments

@duartemolha

Hi, I am using cutadapt 2.7 with Python 3.6.7, installed using conda.

My question is about a warning I get when I set the maximum error rate to 0.15 (which allows 1 error in an 8 bp index).
The command I use is:
cutadapt -e 0.15 --overlap 8 --no-indels -g file:indexes_anchor.fa -o selected-anchor-{name}_R1.fastq.gz -p selected-anchor-{name}_R2.fastq.gz --untrimmed-output unallocated-anchor_R1.fastq.gz --untrimmed-paired-output unallocated-anchor_R2.fastq.gz sequences_R1_001.fastq.gz sequences_R2_001.fastq.gz

I get this warning:

Building index of 48 adapters ...
WARNING: Adapters 14 'GACTAGTA' and 30 'GACTACTT' are very similar. At 1 allowed errors, the sequence 'GACTAGTT' cannot be assigned uniquely because the number of matches is 7 compared to both adapters.
Built an index containing 1192 strings.

However, if I increase the error rate to, for example, 0.25 (meaning I accept 2 errors in an 8 bp index), the warning no longer appears. Surely, if I now accept 2 errors instead of 1, I should see more warnings about similar indexes, not fewer.

What am I missing?
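The intuition in the question can be checked with a small brute-force sketch. It assumes, as the log lines suggest ("At 1 allowed errors" for -e 0.15, "At 2 allowed errors" for -e 0.25), that the number of allowed errors is the error rate times the adapter length, rounded down, and it counts only mismatches, as with --no-indels. It enumerates every 8 bp sequence and keeps those that are within the allowed errors of both GACTAGTA and GACTACTT and have the same number of matches to each. The function names are hypothetical; this is an illustration of the ambiguity idea, not how cutadapt builds its index.

```python
import itertools
import math

# Hypothetical illustration only, not cutadapt's implementation.
ADAPTER_A = "GACTAGTA"  # adapter 14 in the warning
ADAPTER_B = "GACTACTT"  # adapter 30 in the warning


def allowed_errors(error_rate: float, length: int) -> int:
    """Errors allowed for an adapter of the given length.

    Assumption: error rate times length, rounded down
    (0.15 * 8 -> 1, 0.25 * 8 -> 2).
    """
    return math.floor(error_rate * length)


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences (no indels)."""
    return sum(x != y for x, y in zip(a, b))


def ambiguous_sequences(a: str, b: str, max_errors: int) -> list:
    """Return every sequence that is within max_errors mismatches of both
    adapters and has the same number of matches to each of them, so that
    neither adapter can be preferred."""
    hits = []
    for bases in itertools.product("ACGT", repeat=len(a)):
        seq = "".join(bases)
        errors_a = mismatches(seq, a)
        errors_b = mismatches(seq, b)
        # Equal error counts mean equal match counts for same-length sequences.
        if errors_a == errors_b <= max_errors:
            hits.append(seq)
    return hits


if __name__ == "__main__":
    for rate in (0.15, 0.25):
        k = allowed_errors(rate, len(ADAPTER_A))
        hits = ambiguous_sequences(ADAPTER_A, ADAPTER_B, k)
        print(f"-e {rate}: {k} allowed errors, {len(hits)} ambiguous sequences")
        print("  GACTAGTT ambiguous:", "GACTAGTT" in hits)
```

The two adapters differ at 2 of 8 positions, so at 1 allowed error the sketch finds 2 ambiguous sequences (GACTAGTT, which has 7 matches to both adapters as in the warning, and GACTACTA), and at 2 allowed errors it finds 42. Under these assumptions, raising the error rate can only add ambiguous sequences, which agrees with the expectation that a higher error rate should produce more similarity warnings, not fewer.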

@marcelm
Owner

marcelm commented May 28, 2020

(Note that --overlap 8 is not needed since you use anchored adapters anyway. They have to occur in full.)

I cannot reproduce this at the moment. I’m testing with this command (on Cutadapt 2.7):

cutadapt --no-indels -e 0.25 -g ^GACTAGTA -g ^GACTACTT -o /dev/null /dev/null

and I do get the warning:

WARNING: Adapters 1 'GACTAGTA' and 2 'GACTACTT' are very similar. At 2 allowed errors, the sequence 'GACTAGTT' cannot be assigned uniquely because the number of matches is 7 compared to both adapters.

The adapter similarity check is only done when an index is also built. Do you actually get the message "Building index of 2 adapters ..." in the second case?

@duartemolha
Author

No, I do not get the "Building index" message in the second case.

@marcelm
Owner

marcelm commented May 29, 2020

So then in the second case, the index isn’t built, which is why you don’t get the warning. Did you change anything else apart from increasing the maximum allowed error rate?

@duartemolha
Author

duartemolha commented Jun 2, 2020 via email

@marcelm
Owner

marcelm commented Jun 2, 2020

Is your indexes_anchor.fa file secret or would you be willing to share it with me? You can send it by e-mail to marcel.martin@scilifelab.se and I will not share it with anyone if that helps.

@duartemolha
Author

I have sent you an email. Thank you for your help!

@marcelm
Owner

marcelm commented Jun 2, 2020

Thanks for the file. However, I still cannot reproduce the problem. It would be very helpful if you could run your two commands again and then copy and paste the full output up to the message "Processing reads on 1 core in paired-end mode ...".

For me, it looks like this (using /dev/null as the input file names, which does not matter because the adapter index is built before the input is read):

$ cutadapt -e 0.15 --overlap 8 --no-indels -g file:indexes_anchor.fa -o "out-{name}_R1.fastq.gz" -p "out-{name}_R2.fastq.gz" --untrimmed-output /dev/null --untrimmed-paired-output /dev/null /dev/null /dev/null
This is cutadapt 2.7 with Python 3.6.7
Command line parameters: -e 0.15 --overlap 8 --no-indels -g file:indexes_anchor.fa -o out-{name}_R1.fastq.gz -p out-{name}_R2.fastq.gz --untrimmed-output /dev/null --untrimmed-paired-output /dev/null /dev/null /dev/null
Building index of 48 adapters ...
WARNING: Adapters 14 'GACTAGTA' and 30 'GACTACTT' are very similar. At 1 allowed errors, the sequence 'GACTAGTT' cannot be assigned uniquely because the number of matches is 7 compared to both adapters.
Built an index containing 1192 strings.
Processing reads on 1 core in paired-end mode ...

Please also write here the output of conda list '(dnaio|xopen)'.

@marcelm marcelm closed this as completed Nov 5, 2020