overtrimming of reads #42
Comments
To follow up on this: I tried trimming the example data again without specifying the adapter sequences, and the reads were not trimmed. So, to clarify, the issue is spurious matching of the adapter to the read. I think you need to add the equivalent of the -O parameter in Atropos and Cutadapt. Also note that, with the alternative option described in my original report (using random match probability), Atropos uses a heuristic (copied from SeqPurge) that requires both adapter sequences to match their respective reads when the match length is at or below some threshold (9 bp by default), regardless of the probability. This seems to be necessary to achieve good performance with short adapter matches.
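The two rules mentioned in this comment can be sketched roughly as follows. This is only an illustration in Python, not code from fastp, Atropos, Cutadapt, or SeqPurge: the function names and the way match lengths are passed in are hypothetical, the minimum overlap of 3 is an assumed placeholder value, and only the 9 bp threshold comes from the comment itself.

```python
from typing import Optional, Tuple


def passes_min_overlap(match_len: int, min_overlap: int = 3) -> bool:
    """Accept an adapter match only if it covers at least min_overlap bases.

    min_overlap plays the role of an -O style parameter; 3 is an assumed value.
    """
    return match_len >= min_overlap


def accept_paired_matches(
    len_r1: Optional[int],
    len_r2: Optional[int],
    short_threshold: int = 9,
) -> Tuple[bool, bool]:
    """Decide whether to trim each read of a pair.

    len_r1 and len_r2 are the adapter match lengths found in read 1 and
    read 2, or None when no adapter match was found in that read.
    A match at or below short_threshold is accepted only when the other
    read also has an adapter match; longer matches stand on their own.
    """
    both_matched = len_r1 is not None and len_r2 is not None

    def keep(match_len: Optional[int]) -> bool:
        if match_len is None:
            return False
        if match_len <= short_threshold:
            return both_matched
        return True

    return keep(len_r1), keep(len_r2)
```

For example, a 5 bp match in read 1 with no match in read 2 would be rejected by `accept_paired_matches(5, None)`, which is the kind of short spurious hit the comment is concerned with.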
Thank you, John. It's a good idea to expose a parameter to change this setting in the current implementation. I will implement this in the next release.
When can we expect this new feature to be released? The -O option in Cutadapt is really useful.
I am benchmarking fastp against other read trimmers using the workflow I developed for the Atropos paper (https://github.com/jdidion/atropos/tree/master/paper/workflow). I find that fastp has a high rate of read overtrimming. Example fastq input and output are attached. The command I used is:
fastp \
  -i {fastq1} -I {fastq2} -o {prefix}.1.fq.gz -O {prefix}.2.fq.gz \
  --adapter_sequence {adapter1} --adapter_sequence_r2 {adapter2} \
  --thread {threads} --length_required 25 --disable_quality_filtering
Nearly all of these overtrimming events involve the spurious removal of up to 10 bases from one or both reads.
I suspect this might be due to overzealous alignment of the reads to each other, and could probably be fixed with an option to require a minimum insert overlap before trimming. Another approach (which is offered as an option in Atropos) is to compute the random match probability of each alignment and compare against a user-specified threshold value.
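To make the random match probability idea concrete, here is a minimal Python sketch. It assumes a simple binomial model in which each aligned position matches by chance with probability 0.25 (four equally likely bases); the function names and the 1e-6 threshold are hypothetical and are not taken from Atropos or fastp.

```python
from math import comb


def random_match_probability(matches: int, size: int, p_match: float = 0.25) -> float:
    """Probability that at least `matches` of `size` aligned positions
    agree purely by chance, under a binomial model (an assumption here)."""
    if not 0 <= matches <= size:
        raise ValueError("matches must be between 0 and size")
    return sum(
        comb(size, k) * p_match**k * (1.0 - p_match) ** (size - k)
        for k in range(matches, size + 1)
    )


def accept_match(matches: int, size: int, max_prob: float = 1e-6) -> bool:
    """Accept an alignment only if a chance match is unlikely enough.

    max_prob stands in for the user-specified threshold; 1e-6 is an assumed value.
    """
    return random_match_probability(matches, size) <= max_prob
```

Under this model a perfect 5 bp match has a chance probability of 0.25^5 (about 0.001), so `accept_match(5, 5)` rejects it, while a perfect 20 bp match passes the same threshold.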
example.zip