collect mapq stats in the pairs-stats if possible #80
The pairtools select + pairtools stats combination covers this use case; see an example distiller version that implements filters: https://github.com/open2c/distiller-nf/blob/filter_stats/distiller.nf
I actually think it shouldn't be hard to implement, and the output would be less confusing than running select and stats separately: that combination reports fewer total reads for mapq30, for example, and then requires stitching two separate files together to get complete stats in one place. We could either hardcode splitting by a few MAPQ values or support arbitrary filtering criteria (thinking of the distiller filters!). Shall we try to squeeze it into this release?
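A minimal sketch of the single-pass idea, assuming each pair is available as a dict with hypothetical `mapq1`, `mapq2`, and `pair_type` keys and that filters are plain Python predicates standing in for select-style expressions (this is not pairtools code). Every filter accumulates its own counts from the same stream, so the unfiltered and MAPQ-filtered totals end up side by side in one dictionary.

```python
from collections import Counter

# Hypothetical filters: name -> predicate over one pair record.
# They only stand in for select-style filter expressions.
FILTERS = {
    "no_filter": lambda rec: True,
    "mapq30": lambda rec: min(rec["mapq1"], rec["mapq2"]) >= 30,
}


def collect_filtered_stats(records, filters=FILTERS):
    """Count total pairs and pairs per pair_type under each filter, in one pass."""
    # Pre-populate every filter so filters with zero matches still appear.
    stats = {name: {"total": 0, "pair_types": Counter()} for name in filters}
    for rec in records:
        for name, passes in filters.items():
            if passes(rec):
                stats[name]["total"] += 1
                stats[name]["pair_types"][rec["pair_type"]] += 1
    return stats


records = [
    {"mapq1": 60, "mapq2": 60, "pair_type": "UU"},
    {"mapq1": 60, "mapq2": 3, "pair_type": "UU"},
    {"mapq1": 0, "mapq2": 0, "pair_type": "NN"},
]
stats = collect_filtered_stats(records)
print(stats["no_filter"]["total"], stats["mapq30"]["total"])  # prints: 3 1
```

Because `records` is only iterated once, a generator reading a .pairs stream line by line would work the same way.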
Yes, if you see an easy way to do it, then there are no restrictions! The first problem I anticipate is that we have hard-coded formatting of the stats dictionary into either TSV or YAML. With output that varies depending on MAPQ, we'll probably have to invent something more general than what we have right now: pairtools/pairtools/lib/stats.py Line 538 in 40dd81c
The second problem is that we probably want not only
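One way to avoid hard-coding the TSV layout is to flatten an arbitrarily nested stats dictionary into slash-joined keys, so new sections (such as per-filter stats) need no extra formatting code. A minimal sketch, assuming `/` as the separator and plain dict nesting; `flatten_stats` and `to_tsv` are hypothetical helpers, not the current `lib/stats.py` API.

```python
def flatten_stats(stats, sep="/", prefix=""):
    """Flatten a nested stats dict into a list of (key, value) pairs.

    Nested keys are joined with `sep`, e.g. {"a": {"b": 1}} -> [("a/b", 1)].
    Dict subclasses such as Counter are flattened too.
    """
    items = []
    for key, value in stats.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            items.extend(flatten_stats(value, sep=sep, prefix=full_key))
        else:
            items.append((full_key, value))
    return items


def to_tsv(stats, sep="/"):
    """Render a nested stats dict as tab-separated 'key<TAB>value' lines."""
    return "\n".join(f"{key}\t{value}" for key, value in flatten_stats(stats, sep=sep))


example = {"total": 3, "mapq30": {"total": 1, "pair_types": {"UU": 1}}}
print(to_tsv(example))
```

The same nested dict could be dumped to YAML unchanged, so both output formats would follow from one structure instead of two hand-written formatters.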
Formatting for saving is indeed annoying, and I think we should organize the original dictionary the way we want to save it; at least with YAML we could then just dump it as is... Re different filters: I would just use
Some of the fields in the stats dictionary are numpy arrays that have to be formatted before dumping to YAML. Just a warning in case you decide to change it. Re different filters: pairtools/pairtools/cli/select.py Lines 260 to 270 in 40dd81c
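A minimal sketch of that pre-dump conversion, assuming stats values may be numpy arrays or scalars: anything with a `.tolist()` method is converted through it, and dicts, lists, and tuples are walked recursively. The result contains only builtin types, so it could then be handed to, e.g., PyYAML's `yaml.safe_dump`; the usage below relies on the stdlib `array` and `json` modules so it runs without extra dependencies. `to_plain` is a hypothetical helper name.

```python
import json
from array import array


def to_plain(obj):
    """Recursively convert stats values into builtin types for YAML/JSON dumping."""
    if isinstance(obj, dict):
        # Also turns dict subclasses (e.g. Counter) into plain dicts; keys are kept as-is.
        return {k: to_plain(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(v) for v in obj]
    if hasattr(obj, "tolist"):
        # numpy arrays/scalars and array.array all provide .tolist().
        return to_plain(obj.tolist())
    return obj


stats = {"total": 3, "dist_freq": array("q", [5, 2, 0])}
print(json.dumps(to_plain(stats)))  # prints: {"total": 3, "dist_freq": [5, 2, 0]}
```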
This is now implemented with arbitrary filters in
Collect MAPQ stats in our stats if possible. This would be helpful for #78, as there seems to be no easy drop-in mapping quality summary collector...
Maybe check the samtools module, though: https://multiqc.info/docs/#samtools
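As a sketch of what a drop-in MAPQ summary could look like, assuming the pairs carry per-side MAPQ values (named `mapq1`/`mapq2` here) that are non-negative integers as in SAM: each side is counted into the bin whose lower edge is the largest one not exceeding its MAPQ. The bin edges are arbitrary illustrative choices, and `mapq_histogram` is a hypothetical name.

```python
from bisect import bisect_right


def mapq_histogram(records, edges=(0, 1, 10, 30, 60)):
    """Count pair sides per MAPQ bin; `edges` are sorted lower bin edges starting at 0."""
    # Build readable labels such as "1-9" and "60+" from the lower edges.
    labels = [
        f"{lo}-{edges[i + 1] - 1}" if i + 1 < len(edges) else f"{lo}+"
        for i, lo in enumerate(edges)
    ]
    hist = {label: 0 for label in labels}
    for rec in records:
        for side in ("mapq1", "mapq2"):
            # Index of the last edge <= MAPQ; assumes MAPQ >= edges[0].
            # MAPQ 255 ("unavailable" in SAM) lands in the top bin.
            idx = bisect_right(edges, rec[side]) - 1
            hist[labels[idx]] += 1
    return hist


print(mapq_histogram([{"mapq1": 60, "mapq2": 3}, {"mapq1": 0, "mapq2": 30}]))
# prints: {'0-0': 1, '1-9': 1, '10-29': 0, '30-59': 1, '60+': 1}
```

Every bin is pre-populated with zero, so the output always has the same keys, which keeps downstream parsing (e.g. by a MultiQC module) simple.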