Description
- Samtools & Samblaster
- https://github.com/epigen/atacseq_pipeline/blob/main/workflow/rules/processing.smk
- samtools filtering parameter: make `-F 2316` configurable (see the samtools docs & SAM flags)
- and/or make samblaster duplicate removal configurable, e.g.:
```
# Adding `remove_duplicates` to the `params` section:
...
remove_duplicates = "--removeDups",
...
# Including `{params.remove_duplicates}` in the `samblaster` command:
...
samblaster {params.add_mate_tags} {params.remove_duplicates} 2> "{output.samblaster_log}" | \
...
```
- MACS2
- Provide arguments (and links) for and against keeping duplicates and how to deal with them; show where to change this in the config if necessary.
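To see what the samtools `-F 2316` filter excludes, here is a minimal Python sketch that decodes a FLAG value using the bits from the SAM specification. 2316 = 4 + 8 + 256 + 2048 (unmapped, mate unmapped, secondary, supplementary), so the duplicate bit (1024) is not part of it, which is why duplicates are only flagged. The helpers `decode_flag` and `exclusion_flag` are hypothetical and not part of the pipeline; on the command line, `samtools flags 2316` gives the same decoding.

```python
# FLAG bits as defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

DUPLICATE = 0x400  # 1024: PCR or optical duplicate


def decode_flag(value: int) -> list[str]:
    """Return the names of all FLAG bits set in `value`."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


def exclusion_flag(base: int = 2316, drop_duplicates: bool = False) -> int:
    """Hypothetical helper: value for `samtools view -F`, optionally also excluding duplicates."""
    return (base | DUPLICATE) if drop_duplicates else base


if __name__ == "__main__":
    print(decode_flag(2316))  # ['unmapped', 'mate unmapped', 'secondary', 'supplementary']
    print(exclusion_flag(drop_duplicates=True))  # 3340
```

Under these assumptions, `-F 3340` would be the samtools-side alternative to samblaster's `--removeDups`.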
Regarding more regions: I checked the alignment step (https://github.com/epigen/atacseq_pipeline/blob/main/workflow/rules/processing.smk) and, as you suspect, we flag duplicates but don't remove them. I can't recall a specific argument for or against it, so I cannot claim this was intentional; on the other hand, most of the commands are quite specific and intentional. As far as I understand, and from what I have read, duplicate removal has pros and cons (pro: technical bias is removed; con: potential biological signal is removed; the usual trade-off). My hope is that the signal stays the same, especially with many samples, filtering and normalization.
MACS2 removes duplicates on its own: `macs2 callpeak --keep-dup` (default `1`) keeps at most that many reads at the exact same position and strand. The duplicates MACS2 removes are consistent with those marked by samblaster.
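To illustrate the `--keep-dup` behaviour, here is a simplified Python model (not MACS2's actual code) that keeps at most `max_dup` reads per identical (chromosome, 5' position, strand). `--keep-dup all` corresponds to no cap, and `auto` instead derives the cap from a binomial model. The function name `keep_dup` and the tuple representation of reads are assumptions for illustration, and only single-end tags are modelled.

```python
from collections import Counter
from typing import Iterable

Read = tuple[str, int, str]  # (chromosome, 5' position, strand)


def keep_dup(reads: Iterable[Read], max_dup: int = 1) -> list[Read]:
    """Keep at most `max_dup` reads at the exact same position and strand.

    Simplified model of `macs2 callpeak --keep-dup N` for single-end tags.
    """
    seen = Counter()
    kept = []
    for read in reads:
        seen[read] += 1
        if seen[read] <= max_dup:
            kept.append(read)
    return kept


if __name__ == "__main__":
    reads = [
        ("chr1", 100, "+"),
        ("chr1", 100, "+"),  # duplicate of the first read
        ("chr1", 100, "-"),  # same position, other strand: not a duplicate
        ("chr1", 200, "+"),
    ]
    print(len(keep_dup(reads)))     # 3 (like the default --keep-dup 1)
    print(len(keep_dup(reads, 2)))  # 4
```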