Thanks for this very useful workflow. To reduce runtimes, can I suggest adding a 'whitelist' rule to preprocessing? This could reduce runtimes considerably in situations where the targets are limited (e.g., only interested in known human viruses).
I think the implementation could be straightforward:
1. Add a config option to specify a FASTA/Q file of sequences to whitelist.
2. In the preprocessing rule files, duplicate the existing `host_removal_mapping` rule to `whitelist_read_mapping` or equivalent.
3. Instead of excluding mapped reads with `samtools view -f 4...`, the duplicated rule would map reads to the whitelist and retain only mapped reads with `samtools view -F 4`.
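To make the difference between the two filters concrete, here is a minimal Python sketch of how `-f 4` and `-F 4` act on the SAM FLAG field. It is an illustration only, not code from the workflow: the `keep_read` helper and the toy read names and flags are hypothetical.

```python
UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped


def keep_read(flag, require=0, exclude=0):
    """Mimic `samtools view -f REQUIRE -F EXCLUDE` for a single FLAG value.

    -f keeps a record only if all REQUIRE bits are set;
    -F drops a record if any EXCLUDE bit is set.
    """
    return (flag & require) == require and (flag & exclude) == 0


# Toy FLAG values (hypothetical reads): 0 and 16 are mapped, 4 and 77 are unmapped.
flags = {"read_a": 0, "read_b": 4, "read_c": 16, "read_d": 77}

# Host removal: keep reads that did NOT map to the host (like -f 4).
host_removed = [name for name, f in flags.items() if keep_read(f, require=UNMAPPED)]

# Whitelisting: keep reads that DID map to the whitelist (like -F 4).
whitelisted = [name for name, f in flags.items() if keep_read(f, exclude=UNMAPPED)]

print(host_removed)
print(whitelisted)
```

The two selections are complementary, which is why the duplicated rule would only need the flag option swapped.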
Thanks for your consideration!
Hi,
This is an interesting suggestion. Do you think having an option to use a custom primary database for the viruses of interest would work? The primary searches do essentially what you're suggesting, but for all viruses, and the secondary multi-kingdom searches weed out the false positives from this reduced pool of sequences.
Yep, absolutely. Depending on how much database prep is needed for the primary database, I could envision situations where providing a FASTA whitelist file would be simpler and wouldn't require modifying the virus database. If the primary database is already just a FASTA file of all viruses, then specifying a custom FASTA file of, say, all human viruses would be great.
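As a rough illustration of the custom-FASTA idea, the sketch below subsets a FASTA stream by a keyword in the header line. It assumes the sequences of interest can be recognized from their headers, which may not hold for a real virus database; the `filter_fasta` function, the keyword, and the toy records are all hypothetical.

```python
import io


def filter_fasta(lines, keyword):
    """Yield the lines of FASTA records whose header contains `keyword`.

    Matching is case-insensitive and looks at header lines (starting with '>') only.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            keep = keyword.lower() in line.lower()
        if keep:
            yield line


# Toy input standing in for a full virus FASTA file (made-up records).
example = io.StringIO(
    ">v1 Human adenovirus\nACGT\n"
    ">v2 Tobacco mosaic virus\nGGCC\n"
    ">v3 Human rhinovirus\nTTAA\n"
)

subset = "".join(filter_fasta(example, "human"))
print(subset)
```

In practice a curated list of accession IDs would likely be more reliable than keyword matching on headers.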