trim-*: discard unmatched #10
Comments
Actually, this might not be quite so straightforward --- the cc @ebolyen
@thermokarst Would
Thanks @mikerobeson! I updated my comment above to clarify that I was discussing QIIME 2 types, specifically. Without going into too much detail, QIIME 2 handles a bunch of file validation when reading and saving files. Right now, it is not possible in QIIME 2 to save an artifact of type
Thanks for the update @thermokarst! I completely understand these sorts of issues. Thank you for looking into this. Cheers!
This recently came up on the forum.
Stewing on this problem a bit. One thing about this
Thoughts @mikerobeson, @ebolyen, @nbokulich?
Why not add the parameter to There still is the concern about having an empty output: just raise an error if the output will be empty (after all, users would be consciously specifying that they want to drop any non-matches, so it is better to wait and raise an error [== notification] than to return an empty output that is not useful to them). I do not like the other options as much for these reasons:
Method sprawl just confuses people especially if the methods are largely redundant.
The discarded output would be useless. I could see some cases where it may be useful (e.g., filtering out different marker genes that have primers attached, for those folks mixing ITS + 16S), but by and large this would not be useful. Still, I prefer this out of your three options.
Meh. This breaks the current use case where we use I really think that we can kill 2 birds with one stone here. Your option 2 is most palatable to me, but why not just discard those reads and raise an error if the output is empty?
Yep, let's just discard outright. When the framework supports optional outputs we can start plopping those reads into their own artifact. 🎚️
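For anyone following along, the behavior agreed on above (discard unmatched reads, then raise an error if nothing is left) could be sketched roughly like this. It is only an illustration under stated assumptions: `build_trim_command` and `assert_not_empty` are hypothetical helper names, not the plugin's actual code, and the error message is made up. The `-g`, `-o`, and `--discard-untrimmed` options are real cutadapt flags.

```python
import gzip
from pathlib import Path


def build_trim_command(reads_in, reads_out, adapter, discard_untrimmed=False):
    """Assemble a cutadapt command line as a list of arguments.

    Hypothetical helper for illustration; not taken from the plugin source.
    """
    cmd = ["cutadapt", "-g", adapter, "-o", str(reads_out)]
    if discard_untrimmed:
        # cutadapt flag: drop reads in which no adapter was found.
        cmd.append("--discard-untrimmed")
    cmd.append(str(reads_in))
    return cmd


def assert_not_empty(fastq_gz):
    """Raise ValueError if a gzipped FASTQ file contains no records."""
    with gzip.open(fastq_gz, "rt") as handle:
        # An empty file yields an empty string on the first read.
        if not handle.readline():
            raise ValueError(
                f"No reads were retained in {Path(fastq_gz).name}; "
                "every read was discarded as unmatched."
            )
```

The command list could be handed to `subprocess.run(cmd, check=True)`, with `assert_not_empty` called on each output file afterwards, so the user gets an explicit error instead of an empty artifact.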
Hello everyone. I think @nbokulich hit all the points quite well and I largely agree with them. I'd also echo his question:
As this must not only be addressed here but, if I remember correctly, this "empty file" issue has come up before. Didn't this occur with the ITSxpress plugin? Or was that determined to be another issue? If not, I can still imagine several cases where this "empty file" issue can arise in both of these plugins. Sometimes it can be helpful to have cutadapt to write out the untrimmed sequences, i.e.
This helps to determine how many off-targets or the types of contamination that can be encountered. At least my colleagues tend to ask me for the reads that did not make the "cut". Haha, see what I did there? 😆
Oops! Sorry @thermokarst I did not see your response prior to posting mine. :-)
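As a rough sketch of that idea, assuming the reads without a match are wanted in their own file: cutadapt's `--untrimmed-output` option writes them to a separate path, and the records can then be counted. `build_split_command` and `count_fastq_records` are hypothetical names used only for illustration, and the counter assumes plain four-line FASTQ records.

```python
import gzip


def build_split_command(reads_in, trimmed_out, untrimmed_out, adapter):
    """Assemble a cutadapt command that keeps unmatched reads in a separate file.

    Hypothetical helper for illustration; not taken from the plugin source.
    """
    return [
        "cutadapt",
        "-g", adapter,
        "-o", str(trimmed_out),
        # cutadapt flag: write reads with no adapter match to this file.
        "--untrimmed-output", str(untrimmed_out),
        str(reads_in),
    ]


def count_fastq_records(fastq_gz):
    """Count records in a gzipped FASTQ file, assuming four lines per record."""
    with gzip.open(fastq_gz, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

Counting the records in the untrimmed file gives a quick estimate of how many reads were off-target.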
* ENH: Adds `discard_untrimmed` to filter methods

  Fixes #10
* SQUASH: addressing @nbokulich's demands
This cutadapt parameter controls if unmatched reads should be discarded - this would be pretty useful to wrap (and straightforward).