first adapter in linked pair is incorrectly trimmed from 5' end of read #394

Closed
dmelu opened this issue Aug 10, 2019 · 2 comments

Comments

@dmelu

commented Aug 10, 2019

Using cutadapt 2.1 compiled from source, the command below trims the first adapter of the specified linked pair from the read in file temp1.fq. This behavior seems incorrect because many bases are mismatched near the 3' end of the trimmed adapter. The correct behavior would be to leave the read unchanged. Also, the last two bases of the adapter are not trimmed.
The observed behavior would be correct if option --no-indels were not specified. So the issue seems to be that cutadapt ignores option --no-indels when searching for linked adapters.
Another odd behavior is that the numbers of allowed errors are calculated using a 10% error rate instead of the 20% rate specified by option -e.
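
The mismatch claim can be checked by hand with a short Python sketch. It is only a position-by-position comparison of the anchored 5' adapter against the start of the read, not cutadapt's alignment code. It assumes that with --no-indels an anchored match reduces to counting substitutions, and that the allowed error count is the floor of error rate times adapter length (which is consistent with the "No. of allowed errors" table in the output below). The names `count_mismatches`, `n_mismatch` and `max_errors` are made up for this illustration.

```python
# Sequences copied from the report: the 5' part of the linked adapter and the read.
ADAPTER_5P = "CGGGCAGGCTATGTTTTCTTTCTACTGGATC"
READ = (
    "CGGGCAGGCTATGTTTTCTTCTGCTGGACCCTTTCATACCTGCTTCCACTCTGACCTCCACATAAATTCC"
    "ACCAGCGAGTTTCAGCTTTTTTGCCCAATTC"
)


def count_mismatches(adapter: str, read: str) -> int:
    """Count substitutions between the adapter and the read prefix of the same length.

    No insertions or deletions are considered, which is the assumed meaning
    of --no-indels for an anchored 5' adapter.
    """
    prefix = read[: len(adapter)]
    return sum(a != b for a, b in zip(adapter, prefix))


n_mismatch = count_mismatches(ADAPTER_5P, READ)

# Assumed rule: allowed errors = floor(error_rate * adapter_length).
# Integer arithmetic on the percentage avoids floating-point rounding.
ERROR_RATE_PERCENT = 20  # corresponds to -e 0.2
max_errors = len(ADAPTER_5P) * ERROR_RATE_PERCENT // 100

print(f"adapter length:   {len(ADAPTER_5P)}")
print(f"mismatches:       {n_mismatch}")
print(f"allowed at 20%:   {max_errors}")
print(f"should be trimmed: {n_mismatch <= max_errors}")
```

Under these assumptions the substitution-only comparison exceeds the 20% limit, so the read should be left alone. The 29-base removal with 3 errors shown in the summary is presumably what an alignment that permits indels finds instead.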

cat ~/tmp/temp1.fq
@bug
CGGGCAGGCTATGTTTTCTTCTGCTGGACCCTTTCATACCTGCTTCCACTCTGACCTCCACATAAATTCCACCAGCGAGTTTCAGCTTTTTTGCCCAATTC
+
AAAAFJJ7JFFFJJJJJJJJJJJJJJJJJJAJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJFJ-FJ<AJ)A7)7FJAFFAFJJJJJFJAA<FJ<
cutadapt --no-indels -e 0.2 -y ' {name}' -a '3=^CGGGCAGGCTATGTTTTCTTTCTACTGGATC...AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT' temp1.fq
This is cutadapt 2.1 with Python 3.7.2
Command line parameters: --no-indels -e 0.2 -y  {name} -a 3=^CGGGCAGGCTATGTTTTCTTTCTACTGGATC...AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT temp1.fq
Processing reads on 1 core in single-end mode ...
[8<----------] 00:00:00             1 reads  @   1153.2 µs/read;   0.05 M reads/minute
@bug 3
CCTTTCATACCTGCTTCCACTCTGACCTCCACATAAATTCCACCAGCGAGTTTCAGCTTTTTTGCCCAATTC
+
JAJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJFJ-FJ<AJ)A7)7FJAFFAFJJJJJFJAA<FJ<
Finished in 0.00 s (3591 us/read; 0.02 M reads/minute).

=== Summary ===

Total reads processed:                       1
Reads with adapters:                         1 (100.0%)
Reads written (passing filters):             1 (100.0%)

Total basepairs processed:           101 bp
Total written (filtered):             72 bp (71.3%)

=== Adapter 1 ===

Sequence: CGGGCAGGCTATGTTTTCTTTCTACTGGATC...AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT; Type: linked; Length: 31+33; 5' trimmed: 1 times; 3' trimmed: 0 times

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-31 bp: 3

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-33 bp: 3

Overview of removed sequences at 5' end
length	count	expect	max.err	error counts
29	1	0.0	2	0 0 0 1



Overview of removed sequences at 3' end
length	count	expect	max.err	error counts

@marcelm

Owner

commented Aug 10, 2019

Thanks for catching this! The example made it easy to reproduce the problem.

I’ve fixed this now. Indeed, the command-line options -e, -O and --no-indels were accidentally being ignored for linked adapters.
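
One way to sanity-check a build that contains the fix is to compare the "No. of allowed errors" line in the report against what the requested rate should give. The sketch below is not taken from cutadapt's source. It assumes the allowed count is floor(rate × length), and the function name `allowed_errors_table` and its output layout are chosen here only to mirror the lines in the report above.

```python
def allowed_errors_table(adapter_length: int, rate_percent: int) -> str:
    """Group match lengths 0..adapter_length by their allowed error count.

    Assumption: allowed errors = floor(length * rate), computed with integer
    arithmetic on the percentage to avoid floating-point rounding.
    """
    parts = []
    start = 0
    for length in range(1, adapter_length + 2):
        current = (start * rate_percent) // 100
        # Close the current range when the count changes or the adapter ends.
        if length > adapter_length or (length * rate_percent) // 100 != current:
            parts.append(f"{start}-{length - 1} bp: {current}")
            start = length
    return "; ".join(parts)


# 10% is what the report shows; 20% is what -e 0.2 asked for.
print("10%:", allowed_errors_table(31, 10))
print("20%:", allowed_errors_table(31, 20))
```

With 10% the sketch reproduces the ranges printed in the original report for the 31-base adapter. If -e 0.2 is honored, the final range would be expected to allow 6 errors rather than 3.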

@marcelm marcelm closed this in eb5e089 Aug 10, 2019

@dmelu

Author

commented Aug 11, 2019
