Checkqc module #1552
The MultiQC - Linux / Linux - Python 3.x checks fail in the subtest "Test for missing CSPs" with this output:
I am not sure where the problem is. Can you please advise? Thanks!
In short, it wasn't your fault: it was failing for all MultiQC tests. It's since been fixed; I just pulled in updates from
x-ref test data PR: MultiQC/test-data#218
Also added the missing `ignore_sample` call and renamed the module anchor.
Thanks for this @maleasy, it will be great to get a CheckQC module in! The tool was written by colleagues of mine, so it's been on my radar to add for ages. Maybe @matrulda you could cast your eyes over this PR if you get a few minutes? I've just had a quick stab at getting the CI tests to pass and have moved them on a little, but it looks like they're still catching some issues, such as the module not returning a
Phil
Wow! Cool, I will have a look and try it out 😄 👍
@maleasy Just wanted to give you a sign of life. This week has been packed with meetings; I've started trying the module out and will hopefully have time to finish the review at the start of next week.
Great work and thanks for doing this! 👏 Left some comments that you can have a look at.
multiqc/modules/checkqc/checkqc.py
```python
continue
self.add_data_source(f, sample)
p_undetermined = issue["data"]["percentage_undetermined"]
threshold = issue["data"]["threshold"]
```
CheckQC reports that the percentage undetermined couldn't be computed if yield == 0; in that case `data` only contains `lane` and `percentage_undetermined` (which is `N/A`). It would be nice if that case was handled.
Thanks for the pointer, fixed in ead255e: this case no longer breaks parsing of the CheckQC JSON. I added it to the plot as an empty entry so the user can at least see that something is wrong with the respective sample. I'm not sure how to convey more information, e.g. that the yield is zero, because no legend is shown for a bar of zero length.
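The code from ead255e isn't reproduced in this thread. As a minimal sketch of the kind of guard discussed above, assuming each `UndeterminedPercentageHandler` entry carries a `data` dict whose `percentage_undetermined` is either a number or the string `"N/A"` (as in the JSON examples in this thread), the hypothetical helper `split_undetermined` keeps zero-yield lanes out of the numeric parsing instead of reading a `threshold` key that isn't there:

```python
from typing import Any, Dict, List, Tuple


def split_undetermined(
    issues: List[Dict[str, Any]],
) -> Tuple[Dict[int, float], List[int]]:
    """Separate lanes with a numeric undetermined percentage from zero-yield lanes.

    Hypothetical helper, not the module's actual code. Assumes entries shaped like
    the UndeterminedPercentageHandler JSON quoted in this PR, keyed by lane within
    a single run.
    """
    percentages: Dict[int, float] = {}
    zero_yield_lanes: List[int] = []
    for issue in issues:
        data = issue.get("data", {})
        lane = data.get("lane")
        value = data.get("percentage_undetermined")
        if value == "N/A":
            # Yield was 0: data holds only "lane" and "percentage_undetermined",
            # so there is no "threshold" to read for this entry.
            zero_yield_lanes.append(lane)
            continue
        percentages[lane] = float(value)
    return percentages, zero_yield_lanes
```

Collecting the zero-yield lanes in a separate list leaves the caller free to show them either as empty bars or in their own section.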
I tried it out and it looks like this. I think this is fine, but it would be nice if the message you wrote ("Yield is 0, no undetermined percentage computation possible") was communicated. I'm not super familiar with how MultiQC modules work, so I can't really suggest how best to implement it. Would it be helpful if I sent you the JSON file I used?
In your test, was the yield 0 in all lanes? I had not thought of this possibility. The corresponding JSON file would be helpful, thanks! Below is a plot where only one lane has yield 0 (lane 8 on run hiseq2500_rapidhighoutput_v4_1). I was hoping the legend would include the "Yield is 0" message, but it does not for a value of 0.
I have to work out how best to do this. I was hoping to include it in the bar plot somehow, but a separate plot or section is probably needed.
Yeah, it's basically a failed MiSeq run, so there's only one lane. It's quite clear from the other handlers that something is wrong, so I don't see it as super necessary to fix, just a nice-to-have. I can't upload the file here, so send me an e-mail (you can find my address on my profile) if you want me to send the file. But like I said, I'm also fine with approving this as is.
I created a dummy JSON with an entry like this for testing, I think that should cover it (correct me if I'm wrong):
```json
"UndeterminedPercentageHandler": [
  {
    "type": "error",
    "message": "The percentage of undetermined indexes was to high on lane 8, it was: N/A",
    "data": {
      "lane": 1,
      "percentage_undetermined": "N/A"
    }
  }
],
```
In 1d28323 I added an additional section for cases like this, it contains a table listing the lanes with yield zero.
Yeah, exactly, only the message is a bit different, but it isn't used in the code as far as I can see.
```json
"UndeterminedPercentageHandler": [
  {
    "type": "error",
    "message": "Yield for lane: {} was 0. No undetermined percentage could be computed.",
    "data": {
      "lane": 1,
      "percentage_undetermined": "N/A"
    }
  }
],
```
(Oops, we have a 🐛 {})
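The stray `{}` looks like a message template that was written to the JSON without being filled in. A minimal illustration, assuming that is the cause (CheckQC's own code isn't shown in this thread); `ZERO_YIELD_TEMPLATE` and `zero_yield_message` are hypothetical names:

```python
# Hypothetical illustration of the suspected bug: the template from the JSON above,
# with a "{}" placeholder that must be filled before the message is emitted.
ZERO_YIELD_TEMPLATE = "Yield for lane: {} was 0. No undetermined percentage could be computed."


def zero_yield_message(lane: int) -> str:
    """Fill the lane number into the template via str.format()."""
    # Emitting ZERO_YIELD_TEMPLATE directly (without .format) would leave a literal "{}".
    return ZERO_YIELD_TEMPLATE.format(lane)
```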
- Remove parsing of unused variables
- Fixed unique run names generation
- Read numbers now always in millions
- Handle yield=0 case in UndeterminedPercentageHandler section
multiqc/modules/checkqc/checkqc.py
```python
data = self.checkqc_data["UndeterminedPercentageHandler_ZeroYield"]

pconfig = {"id": "checkqc_zero-yield-table", "table_title": "CheckQC: Lanes with Yield 0", "scale": "Reds"}
```
I think it would be less confusing to set `col1_header` to something other than the default "Sample Name". Maybe "Lane Number"? That's a bit overkill, though, since the other column also specifies the lane. I guess only a list is really needed. Is it possible to make a table with only the first column? I've only made lists in MultiQC reports before by adding an HTML list using the custom content module.
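A sketch of the suggested change, reusing the `pconfig` from the diff above and adding the `col1_header` key named in the comment; the value `"Lane"` is only illustrative, not what was merged:

```python
# Sketch only: the table config from the diff, plus a first-column header so the
# column no longer shows MultiQC's default "Sample Name" label.
pconfig = {
    "id": "checkqc_zero-yield-table",
    "table_title": "CheckQC: Lanes with Yield 0",
    "scale": "Reds",
    "col1_header": "Lane",  # illustrative value suggested in review
}
```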
You are right, the table columns are somewhat redundant. I don't think it is possible to have only the first column, because this column is created automatically by MultiQC from the sample names, and the column title is also assigned by MultiQC automatically. I used the lane/run names for the other columns because I don't have any other useful info to put in the cells.
I can't find any documentation about adding HTML content in a module, so I'm not sure whether that is as easy as with custom content. It would be possible to simply add the list of failed lanes to the section text, separated by commas, and have no plot/table at all.
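A minimal sketch of the comma-separated alternative, plus the HTML-list variant mentioned earlier. `zero_yield_text` and `zero_yield_html_list` are hypothetical helpers, and identifying lanes by `(run, lane)` pairs is an assumption. Either string could then serve as section text, e.g. as the `description` of `self.add_section`:

```python
from html import escape
from typing import List, Tuple


def zero_yield_text(lanes: List[Tuple[str, int]]) -> str:
    """One sentence listing zero-yield lanes, separated by commas."""
    items = ", ".join(f"{run} lane {lane}" for run, lane in lanes)
    return (
        "Yield was 0 for the following lanes, so no undetermined "
        f"percentage could be computed: {items}."
    )


def zero_yield_html_list(lanes: List[Tuple[str, int]]) -> str:
    """The same information as an HTML <ul>; run names are HTML-escaped."""
    items = "".join(f"<li>{escape(run)} lane {lane}</li>" for run, lane in lanes)
    return f"<ul>{items}</ul>"
```

A plain sentence avoids the redundant table columns entirely, while the list form scales better when many lanes fail.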
The changes I tested: matrulda@9051fa6
That's a good solution! I pulled your commit.
Nice, thanks 😃
Thank you all for making this possible 💯 🥳
Great, thank you all!
No problem at all! I've shown this module to my colleagues and we look forward to starting to use it at our facility 🥳
Thanks for this @maleasy! I just skim-read the code and I can't see any issues on that side 👍🏻 (just a minor thing: it was missing a log output when something is found, which I've just added and pushed to the PR). Looking at the report (based only on the file I have in the test data repo), I have a few suggestions for changes:
All fairly minor stuff to polish the output and make it easier for newcomers. When reviewing, I generally try to imagine that I'm a lab scientist who has been emailed a MultiQC report like this and is trying to figure it out from scratch. I hope this is OK! Let me know if you have any thoughts on any of the above 👍🏻
Phil
Hi Phil,
Ok, thanks for letting me know 👍🏻 Maybe I can wrap it up with help from @matrulda before then, depending on how I go with the other PRs that need handling.
Ok, I've just pushed some changes to tweak the report output a little. Hopefully you agree that these are improvements; please let me know if you disagree with anything. Thanks for this!
Looks all good to me 👍🏻
Wow, some serious inbox diving to find this @matrulda! 😂 I hope it's useful!
Initial version of a module for CheckQC (see #1551 (comment))
- `CHANGELOG.md` has been updated
- [...] `--lint` flag)
- `docs/README.md` is updated with link to below
- `docs/modulename.md` is created
- [...] `self.add_section`)