Add QC Stats #161
@LeeBergstrand Like I mentioned in #91, this is a nice start toward making QC summary reports. Here are a few observations of mine after testing this code:
If you agree that this PR is a good way to address #91, then we can start to refine this PR, including addressing the comments above. Thanks again!
Sounds good. I'll pursue this.
@jmtsuji This is essentially a preliminary pass at generating QC stats for Rotary.
Caveats
Does this look good enough to merge in for now? @jmtsuji Do you have any comments or suggestions for improvements?
@LeeBergstrand Thanks so much for these updates. Things are a bit crazy on my end at the moment, but I'll take a look at #161 and #173 by the end of this week. Let me know if a code review is urgent, and I can take a look sooner. Thanks!
…empt to allow it to make short QC reports. This would occur even if there were no short reads to process.
@jmtsuji End of the week should be fine.
@LeeBergstrand Thanks for working on these improvements to the QC reports! Although I haven't tested the code yet, I didn't notice any issues when looking through the updated rules. The new code looks like it should work well within the caveats you mentioned.
Regarding the caveats:
- The names of the FastQC files are currently not ordered in the MultiQC output.
We could order them, but that would require a step that adds sequential numbers to the FastQC output names.
I don't think it's a huge priority to have them ordered. We can leave them unordered for this PR.
- This code doesn't use checkpoints or input functions. I found them hard to introduce without an entire rewrite.
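The numbering idea from the first caveat could look roughly like the following. This is a hypothetical helper, not part of Rotary: the stage names are invented for illustration, and it assumes the report lists entries by sample name, so a zero-padded index keeps the pipeline order under a plain alphabetical sort.

```python
# Hypothetical sketch: stage names are illustrative, not Rotary's actual suffixes.
SHORT_READ_STAGES = ["raw", "reformat", "adapter_trim", "quality_trim"]


def ordered_stage_names(sample: str, stages: list[str]) -> list[str]:
    """Prefix each QC stage with a zero-padded index so that an
    alphabetical sort reproduces the order in which the stages ran."""
    width = len(str(len(stages)))  # 1 digit for <10 stages, 2 digits for 10-99, ...
    return [
        f"{sample}_{index:0{width}d}_{stage}"
        for index, stage in enumerate(stages, start=1)
    ]


if __name__ == "__main__":
    names = ordered_stage_names("sample1", SHORT_READ_STAGES)
    print(names)
    # Unprefixed, these would sort as adapter_trim, quality_trim, raw, reformat;
    # with the index prefix, sorting leaves them in pipeline order.
    print(sorted(names) == names)
```

The trade-off is that the prefix changes the FastQC file names, so any downstream rule expecting the unprefixed names would need updating as well.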
My only concern is that rule run_fastqc_short might error out if the user turns off one of the short-read QC steps (e.g., adapter trimming) in the config file, because QC_SHORT_FILE_TYPES is hard-coded to include '_reformat_', '_adapter_trim_', and '_quality_trim_'. Do you agree that this will likely be an issue? (I haven't tested the code to confirm, but I can do so if needed.)
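If the intermediate steps stay in the report, one option is to derive the file-type list from the config instead of hard-coding it. In this sketch the config key names are hypothetical placeholders (Rotary's real toggles may be named differently); the returned list could then feed an expand() call or an input function for rule run_fastqc_short.

```python
# Hypothetical mapping from each intermediate file-type tag to the config
# toggle that enables its QC step; the key names are placeholders.
QC_SHORT_STEPS = [
    ("_reformat_", "qc_short_reformat"),
    ("_adapter_trim_", "qc_short_adapter_trim"),
    ("_quality_trim_", "qc_short_quality_trim"),
]


def qc_short_file_types(config: dict) -> list[str]:
    """Return only the file-type tags whose QC step is enabled.

    A step whose toggle is missing from the config is treated as enabled,
    matching the current behaviour of always including all three tags.
    """
    return [tag for tag, key in QC_SHORT_STEPS if bool(config.get(key, True))]


if __name__ == "__main__":
    # Adapter trimming switched off: its tag is dropped from the list.
    print(qc_short_file_types({"qc_short_adapter_trim": False}))
```

If the config stores toggles as strings such as "False", they would need explicit parsing, since bool("False") is True in Python.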
Assuming the above error occurs: if it looks like it's going to be difficult to address this issue using input functions etc., then I would be OK with just simplifying rule run_fastqc_short to be like rule run_fastqc_long, which only shows the raw reads and the final QC'ed reads. Although seeing the results of intermediate QC steps could be nice for some troubleshooting cases, I don't think it's a huge priority to include these in the report at this stage, especially if doing so will require some major code rewrites.
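For the simpler alternative, the short-read targets only need the raw and final reads per sample. The sketch below assumes hypothetical paired-end file names of the form {sample}_{stage}_{R1|R2}.fastq.gz and a hypothetical qc/fastqc output directory. It approximates FastQC's usual output naming, where reads.fastq.gz produces reads_fastqc.html.

```python
import re

# Hypothetical layout: Rotary's actual paths and read-name patterns may differ.
FASTQC_DIR = "qc/fastqc"
STAGES = ("raw", "final")  # like run_fastqc_long: raw reads and final QC'ed reads only
READS = ("R1", "R2")


def fastqc_report_name(fastq_name: str) -> str:
    """Approximate FastQC's report naming for common FASTQ extensions:
    drop a .gz/.bz2 suffix, then .fastq/.fq, and append _fastqc.html."""
    stem = re.sub(r"\.(gz|bz2)$", "", fastq_name)
    stem = re.sub(r"\.(fastq|fq)$", "", stem)
    return f"{stem}_fastqc.html"


def short_read_fastqc_targets(samples: list[str]) -> list[str]:
    """List the FastQC HTML reports expected for raw and final short reads."""
    targets = []
    for sample in samples:
        for stage in STAGES:
            for read in READS:
                fastq = f"{sample}_{stage}_{read}.fastq.gz"  # hypothetical naming
                targets.append(f"{FASTQC_DIR}/{fastqc_report_name(fastq)}")
    return targets


if __name__ == "__main__":
    print(short_read_fastqc_targets(["sample1"]))
```

Because the list no longer depends on which intermediate steps are enabled, disabling a QC step in the config cannot leave the rule waiting on a missing file.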
Let me know your thoughts, and we can proceed based on that. Thanks!
@LeeBergstrand Approved. Feel free to merge this in. I can run an end-to-end test of the merged code in this branch once we're satisfied with the FastQC reports.
Add assembly stats
@jmtsuji Great!
This can be addressed by changing
Add QC stats to Rotary.