Convert adapter matches to lowercase instead of trimming #166
Comments
Masking bases that should be trimmed with What is currently not possible is to transform bases to lowercase and I agree it would be useful to have. Have you seen that cutadapt can output an "info file" with the I’ll leave this report open and change the title to remind me that a
Good to know that we can recreate the requested behavior from the info file. But this involves a trade-off: we have to do some extra computation (CPU time, which is not really important), and we have to write the intermediate file and read it back in. That prevents us from piping the results into other tools and doubles the disk I/O needed for the processing, which is quite important to us.
Yes, some extra processing is needed, but wouldn’t you have to do that in any case? Even when cutadapt had a I’m happy to implement this, but I’m just not sure whether it really helps in your case.
Our aligner can handle the situation. For lowercase input bases, it outputs a soft-clipping operation in the SAM file (CIGAR operation S, as in 1S90M10S for the removal of the first base and the last ten bases).
Thanks!
It’s taken a while, but I don’t forget :-)
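To make the 1S90M10S example concrete, here is a minimal Python sketch of how lowercase flanks could map to soft-clipping operations. The function name `soft_clip_cigar` is hypothetical, and the sketch assumes the whole uppercase part aligns as a single match block; a real aligner would also emit insertions, deletions and so on.

```python
def soft_clip_cigar(seq: str) -> str:
    """Return a CIGAR string that soft-clips the lowercase flanks of a read.

    Assumption: the uppercase middle part is reported as one M block.
    """
    n = len(seq)

    # Count lowercase bases at the start of the read.
    left = 0
    while left < n and seq[left].islower():
        left += 1

    # Count lowercase bases at the end, without overlapping the left flank.
    right = 0
    while right < n - left and seq[n - 1 - right].islower():
        right += 1

    matched = n - left - right
    parts = []
    if left:
        parts.append(f"{left}S")
    if matched:
        parts.append(f"{matched}M")
    if right:
        parts.append(f"{right}S")
    return "".join(parts)


# One masked base at the start, 90 kept bases, ten masked bases at the end.
read = "a" + "C" * 90 + "g" * 10
print(soft_clip_cigar(read))
```

For the read built above, this prints 1S90M10S, matching the example in the comment.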
One more idea, for something else that we need.
Following on from the idea of keeping all the information in the output, it would be nice to be able to clip the sequences instead of trimming them, that is, to keep the matched bases in the read but convert them to lowercase, similar to what RepeatMasker does.
Since we need this functionality, we currently have to store the sequence and qualities in the ID line, run the trimming process, and then post-process the output to put back the original sequence and qualities, keeping in uppercase only the bases that survived trimming.
This allows us to regenerate the original FASTQ file from the resulting aligned BAM file, which reduces the space we need to store all the data.
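As an illustration of the requested behavior, here is a minimal Python sketch of masking instead of trimming. The helper names `mask_outside` and `restore_original` are hypothetical and not part of cutadapt; the sketch assumes the kept region is known as a half-open interval `[start, end)` and that the original read was entirely uppercase, so that uppercasing the masked read recovers it.

```python
def mask_outside(seq: str, start: int, end: int) -> str:
    """Lowercase every base outside the kept interval [start, end).

    The read keeps its full length, so the quality string stays valid as is.
    """
    return seq[:start].lower() + seq[start:end].upper() + seq[end:].lower()


def restore_original(masked: str) -> str:
    """Recover the original read, assuming it was all uppercase."""
    return masked.upper()


original = "ACGTACGT"
masked = mask_outside(original, 2, 6)  # keep bases 2..5, mask the rest
print(masked)                          # acGTACgt
print(restore_original(masked) == original)
```

Because no bases are removed, nothing has to be stashed in the ID line: the full sequence and qualities travel through the pipeline unchanged in length.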