Checkqc module #1552

maleasy · 2021-09-17T12:51:14Z

Initial version of a module for CheckQC (see the comment on #1551).

- This comment contains a description of changes (with reason)
- CHANGELOG.md has been updated

- There is example tool output for tools in the https://github.com/ewels/MultiQC_TestData repository or attached to this PR
- Code is tested and works locally (including with --lint flag)
- docs/README.md is updated with link to below
- docs/modulename.md is created
- Everything that can be represented with a plot instead of a table is a plot
- Report sections have a description and help text (with self.add_section)
- There aren't any huge tables with > 6 columns (explain reasoning if so)
- Each table column has a different colour scale to its neighbour, which relates to the data (eg. if high numbers are bad, they're red)
- Module does not do any significant computational work

maleasy · 2021-09-17T13:04:24Z

The MultiQC - Linux / Linux - Python 3.x checks fail in the subtest "Test for missing CSPs" with this output:

Run python test/print_missing_csp.py --report full_report.html --whitelist CSP.txt
The following scripts are missing from CSP.txt
  'sha256-TUYHIjQsABv4n1G4GdIAz0ZljvuXPEvQ9T20Sok7TvE=' # ////////////////////////////////////////////////// Base JS for MultiQC Reports//
  'sha256-teMLGfDW72TRam2f0Fnj53uchjf3tQR9N0xo+hR2PXU=' # ////////////////////////////////////////////////// MultiQC Table code///////////
  'sha256-AEcY37NW7iIjmlzDOPZvuAorKHxXZxsECPnuSP52fB8=' # ////////////////////////////////////////////////// HighCharts Plotting Code/////
  'sha256-iPIReyQyAoheerXqjtulr8WN7lPMWacMVU4awy5CTJg=' # ////////////////////////////////////////////////// Static MatPlotLib Plots Javas
  'sha256-KGgqQTL/PWbF29mNXwFnepbDYBEnTKdMtejlNhEiXfs=' # ////////////////////////////////////////////////// MultiQC Report Toolbox Code//
  'sha256-RO0nmf6TEJtrp+R4JF8N4UrF+A2Hfkb8BKxyS7nM5go=' # // Return JS code required for plotting a single sample// RSeQC plot. Attempt to
  'sha256-kH3IklvmfvBiZprdCgEZqGuZjhiGEF5V3tHncoRciQI=' # // Javascript for the FastQC MultiQC Mod///////////////// Per Base Sequence Cont

I am not sure where the problem is. Can you please advise?

Thanks!

ewels · 2021-11-15T22:17:37Z

> I am not sure where the problem is. Can you please advise?

In short, it wasn't your fault: the check was failing for all MultiQC tests. It's since been fixed; I just pulled in the updates from master, so hopefully the tests will pass now.
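For context on the CSP output quoted above: each entry is a Content Security Policy hash source, which is the SHA-256 digest of an inline script's text, base64-encoded and wrapped as 'sha256-...'. The sketch below assumes the test hashes the exact UTF-8 text of each inline script, as the CSP hash-source scheme requires; `csp_hash` is a hypothetical helper, not the code in test/print_missing_csp.py.

```python
import base64
import hashlib


def csp_hash(script_text: str) -> str:
    """Return a CSP hash source such as 'sha256-...' for an inline script.

    Hypothetical helper: SHA-256 over the UTF-8 bytes of the script,
    base64-encoded and quoted the way CSP headers/whitelists expect.
    """
    digest = hashlib.sha256(script_text.encode("utf-8")).digest()
    return "'sha256-" + base64.b64encode(digest).decode("ascii") + "'"
```

Because any byte change in a bundled script produces a different hash, an edit to shared report JavaScript would make this check fail for unrelated PRs until the whitelist is regenerated, which is consistent with it failing for every MultiQC test run at the time.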

ewels · 2021-11-15T22:18:55Z

x-ref test data PR: MultiQC/test-data#218

Also added the missing ignore_sample call and renamed the module anchor.

ewels · 2021-11-15T22:40:59Z

Thanks for this @maleasy, it will be great to get a CheckQC module in! The tool was written by colleagues of mine, so it's been on my radar to add for ages. Maybe @matrulda you could cast your eyes over this PR if you get a few minutes?

I've just had a quick stab at getting the CI tests to pass and have moved them on a little. But it looks like they're still catching some issues, such as the module not raising a UserWarning when no files are found (or possibly not filtering out all samples with --ignore-samples). Now that the CSP tests are fixed, hopefully you can push this PR on to get the little green tick @maleasy ✅
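The two behaviours the CI is checking for can be sketched in plain Python: filter out ignored samples first, then raise `UserWarning` if nothing is left, before any report section is added. This is a standalone approximation, not MultiQC code: `filter_ignored` and `prepare_module_data` are hypothetical helpers, and glob matching with `fnmatch` is an assumption about how --ignore-samples patterns behave. In a real module the filtering step would go through MultiQC's own sample-ignoring helper.

```python
import fnmatch


def filter_ignored(samples, ignore_patterns):
    """Drop samples whose name matches any glob pattern (hypothetical helper)."""
    return {
        name: data
        for name, data in samples.items()
        if not any(fnmatch.fnmatch(name, pattern) for pattern in ignore_patterns)
    }


def prepare_module_data(samples, ignore_patterns):
    """Filter first, then bail out before any report content is created."""
    kept = filter_ignored(samples, ignore_patterns)
    if not kept:
        # MultiQC modules signal "nothing to report" by raising UserWarning
        raise UserWarning("No CheckQC results found (or all samples were ignored)")
    return kept
```

Doing the filtering before the emptiness check means that a run in which every sample is ignored produces no section at all, rather than an empty plot.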

Phil

matrulda · 2021-11-16T07:58:41Z

Wow! Cool, I will have a look and try it out 😄 👍

maleasy · 2021-11-16T15:14:30Z

@ewels @matrulda I refactored the code, primarily to make sure that samples matching --ignore-samples are filtered out before any content is added. It passes all tests now.

matrulda · 2021-11-19T14:59:22Z

@maleasy Just wanted to give you a sign of life. This week has been packed with meetings; I've started trying the module out and will hopefully have time to finish the review at the start of next week.

matrulda

Great work and thanks for doing this! 👏 Left some comments that you can have a look at.

multiqc/modules/checkqc/checkqc.py

matrulda · 2021-11-22T15:11:56Z


+                continue
+            self.add_data_source(f, sample)
+            p_undetermined = issue["data"]["percentage_undetermined"]
+            threshold = issue["data"]["threshold"]


CheckQC reports that the percentage undetermined couldn't be computed if yield == 0. In that case, data only contains lane and percentage_undetermined (which is N/A). It would be nice if that case was handled.

Thanks for the pointer; fixed in ead255e. This case no longer breaks parsing of the CheckQC JSON. I added it to the plot as an empty entry so the user can at least see that something is up with the respective sample. I'm not sure how to convey more information (e.g. that the yield is zero), because no legend is shown for a bar of zero length.

I tried it out and it looks like this. I think this is fine, but it would be nice if the message you wrote ("Yield is 0, no undetermined percentage computation possible") was communicated. I'm not super familiar with how MultiQC modules work, so I can't really suggest how best to implement it. Would it be helpful if I sent you the JSON file I used?

In your test, was the yield 0 in all lanes? I had not thought of this possibility. The corresponding JSON file would be helpful, thanks! Below is a plot where only one lane has yield 0 (lane 8 on run hiseq2500_rapidhighoutput_v4_1). I was hoping the legend would include the "Yield is 0" message, but it does not for a value of 0.
I'll have to work out how best to do this. I was hoping to include it in the bar plot somehow, but a separate plot or section is probably needed.

Yeah, it's basically a failed MiSeq run, so only one lane. It's quite clear from the other handlers that something is wrong, so I don't see this as essential to fix, just a nice-to-have. I can't upload the file here, so send me an e-mail (you can find my address on my profile) if you want me to send it. But like I said, I'm also fine with approving this as is.

I created a dummy JSON with an entry like this for testing. I think that should cover it (correct me if I'm wrong):

"UndeterminedPercentageHandler": [ { "type": "error", "message": "The percentage of undetermined indexes was to high on lane 8, it was: N/A", "data": { "lane": 1, "percentage_undetermined": "N/A" } } ],

In 1d28323 I added a separate section for cases like this; it contains a table listing the lanes with zero yield.
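A minimal sketch of that split, assuming the entry shape from the dummy JSON above: a "data" dict with "lane" and "percentage_undetermined", plus "threshold" on normal entries as in the diff excerpt. `split_undetermined` is a hypothetical helper, not the merged code, and it keys by lane only for brevity. Entries reporting "N/A" are collected for the zero-yield table instead of being plotted.

```python
def split_undetermined(issues):
    """Separate plottable UndeterminedPercentageHandler issues from zero-yield ones.

    Assumes each issue looks like:
      {"type": ..., "data": {"lane": ..., "percentage_undetermined": ..., "threshold": ...}}
    where percentage_undetermined is "N/A" (and threshold is absent) when yield is 0.
    """
    plotted, zero_yield = {}, []
    for issue in issues:
        data = issue["data"]
        lane = data["lane"]
        pct = data["percentage_undetermined"]
        if pct == "N/A":
            # Nothing to plot: remember the lane so it can be listed separately
            zero_yield.append(lane)
            continue
        plotted[lane] = {
            "percentage_undetermined": float(pct),
            "threshold": float(data["threshold"]),
        }
    return plotted, zero_yield
```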

Yeah, exactly, only the message is a bit different, but it isn't used in the code as far as I can see.

"UndeterminedPercentageHandler": [ { "type": "error", "message": "Yield for lane: {} was 0. No undetermined percentage could be computed.", "data": { "lane": 1, "percentage_undetermined": "N/A" } } ],

(Oops, we have a 🐛 {})
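The stray {} is what a message template looks like when it is emitted without ever being formatted. We haven't checked the CheckQC source, so this is only an illustration of the likely cause; `TEMPLATE` and `zero_yield_message` are hypothetical names.

```python
TEMPLATE = "Yield for lane: {} was 0. No undetermined percentage could be computed."


def zero_yield_message(lane):
    # Emitting TEMPLATE directly would leave the literal "{}" in the JSON;
    # str.format fills in the lane number.
    return TEMPLATE.format(lane)
```

Since the module reads data["percentage_undetermined"] rather than the message text, this bug does not affect parsing on the MultiQC side.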


multiqc/modules/checkqc/checkqc.py

matrulda · 2021-11-23T15:28:05Z


+        """
+        data = self.checkqc_data["UndeterminedPercentageHandler_ZeroYield"]
+
+        pconfig = {"id": "checkqc_zero-yield-table", "table_title": "CheckQC: Lanes with Yield 0", "scale": "Reds"}


I think it would be less confusing to set col1_header to something other than the default "Sample Name". Maybe "Lane Number"? That's a bit redundant, since the other column also specifies the lane. I guess only a list is really needed. Is it possible to make a table with only the first column? Until now, I've only made lists in MultiQC reports by adding an HTML list with the custom content module.

You are right, the table columns are somewhat redundant. I don't think it is possible to have only the first column, because that column is created automatically by MultiQC from the sample names, and MultiQC also assigns its title automatically. I used the lane/run names for the other columns because I don't have any other useful information to put in the cells.
I can't find documentation about adding HTML content from within a module, so I'm not sure whether that is as easy as with custom content. It would be possible to simply add the list of failed lanes to the section text, separated by commas, and have no plot or table at all.

I would suggest using "run" as the "sample" (line 149) instead of "lane", and setting col1_header="Run". That also removes the need for the special case of a single run. I tried it out:
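The suggestion amounts to keying the table rows by run name, so that MultiQC's automatic first column lists runs, and relabelling that column. A sketch of the data and config, assuming `zero_yield` is a list of (run_name, lane) tuples; `zero_yield_table` and the "lanes" column are hypothetical, while `id` and `table_title` are taken from the diff excerpt above. The resulting dicts would then be handed to MultiQC's table plot together with a headers dict.

```python
def zero_yield_table(zero_yield):
    """Group zero-yield lanes by run for a one-row-per-run table.

    `zero_yield` is assumed to be a list of (run_name, lane) tuples.
    """
    lanes_by_run = {}
    for run, lane in zero_yield:
        lanes_by_run.setdefault(run, []).append(lane)

    # Run names become the row keys, i.e. the table's first ("sample") column
    data = {
        run: {"lanes": ", ".join(str(lane) for lane in sorted(lanes))}
        for run, lanes in lanes_by_run.items()
    }
    pconfig = {
        "id": "checkqc_zero-yield-table",
        "table_title": "CheckQC: Lanes with Yield 0",
        "col1_header": "Run",  # replaces the default "Sample Name" header
    }
    return data, pconfig
```

With one row per run, a single-run report needs no special handling: it is simply a one-row table.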

What do you think?

The changes I tested: matrulda@9051fa6

That's a good solution! I pulled in your commit.

Nice, thanks 😃

matrulda

Looks good to me, approved! ✔️ Again, great work @maleasy !

@ewels I leave the rest to you now. :)

apeltzer · 2021-11-25T09:47:52Z

Thank you all for making this possible 💯 🥳

maleasy · 2021-11-25T09:50:54Z

Great, thank you all!
@matrulda: Thanks for the helpful comments and code bits

matrulda · 2021-11-25T09:57:37Z

No problem at all! I've shown this module to my colleagues and we look forward to using it at our facility 🥳

ewels · 2021-11-26T20:02:54Z

Brilliant stuff - thank you for the extensive review @matrulda and thanks for being responsive and implementing all the changes @maleasy!

I'll try to do one final check myself soon and merge when I get a moment 👍🏻

ewels · 2022-01-15T21:17:21Z

Thanks for this @maleasy! I just skim-read the code and can't see any issues on that side 👍🏻 (just one minor thing: a log message was missing when something is found, which I've just added and pushed to the PR).

Looking at the report (based only on the file I have in the test data repo), I have a few suggestions for changes:

General stats table
- I think that we can remove these columns. They seem to exactly mimic the bar plot underneath, so I'm not sure that they really add anything.
I would love to have two heatmaps showing all samples / lanes with pass / warn / fail for each test type. I'm not sure if this is actually possible (samples with no warnings are not reported, right @matrulda?) and I know some tests are read-specific. But it would be nice to flatten where applicable and have a summary table showing which samples fail across the board (as we do for FastQC).
Too few reads per sample
- Can we sort the sample names alphabetically, please (or maybe by lane first)?
Cluster PF too low
- Can we avoid the PF acronym wherever we're not short on space? E.g. the title and the first category in the chart legend.
Graph sample names
- I wonder if we could drop the axis title and use a more verbose "Lane 3" instead of just "3"? The Lane - Read labels could definitely do with this, e.g. "Lane 3 - R2" instead of "3 - 2". Prefixes and numbers should always be short, so I don't think space is an issue.
Descriptions / help text could be improved
- "Error rate too high" followed by "Some lanes have too high error rate." is not super helpful 😉 What error rate is this? How is it calculated? What does it mean? The same goes for most sections.
- I also think that the descriptions should be rephrased to make it clear that we are only showing the errors / warnings. For example, instead of "Some samples have too few reads" we could have "The following samples reported an error because they did not meet their minimum read threshold." etc.
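The sorting and labelling suggestions could look roughly like this. Both helpers are hypothetical: `lane_label` produces the verbose "Lane 3 - R2" style, and `order_rows` sorts (sample, lane) pairs either by sample name or by lane first. The natural sort, so that "S10" follows "S2", is a choice made here rather than something requested in the review.

```python
import re


def lane_label(lane, read=None):
    """Verbose graph label: "Lane 3" or "Lane 3 - R2" instead of "3" / "3 - 2"."""
    label = f"Lane {lane}"
    if read is not None:
        label += f" - R{read}"
    return label


def natural_key(name):
    """Case-insensitive sort key that compares digit runs as numbers."""
    # re.split with a capture group alternates text / digit chunks,
    # so str and int values always line up position by position.
    return [int(tok) if tok.isdigit() else tok.lower() for tok in re.split(r"(\d+)", name)]


def order_rows(rows, lanes_first=False):
    """Sort (sample_name, lane) pairs by sample, or by lane and then sample."""
    if lanes_first:
        return sorted(rows, key=lambda r: (r[1], natural_key(r[0])))
    return sorted(rows, key=lambda r: (natural_key(r[0]), r[1]))
```

For example, `order_rows([("B", 2), ("A", 3), ("A", 1)], lanes_first=True)` gives lane 1, 2, 3 order regardless of sample name.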

All fairly minor stuff, just to polish the output and make it easier for newcomers. When reviewing, I generally try to imagine that I'm a lab scientist who has been emailed a MultiQC report like this and is trying to figure it out from scratch.

I hope this is ok! Let me know if you have any thoughts on any of the above 👍🏻

Phil

maleasy · 2022-01-17T09:46:47Z

Hi Phil,
I won't be able to work on this in the coming weeks. I'll check out your suggestions when I find the time.

ewels · 2022-01-18T21:48:24Z

Ok, thanks for letting me know 👍🏻 Maybe I can wrap it up with help from @matrulda before then, depending on how I go with the other PRs that need handling.

ewels · 2022-01-25T14:02:28Z

Ok, I've just pushed some changes to try to tweak the report output a little. Hopefully you guys agree that these are improvements, please let me know if you disagree with anything.

Thanks for this!

apeltzer · 2022-01-25T18:41:36Z

Looks all good to me 👍🏻

matrulda · 2023-10-20T11:17:37Z

Hi! (over a year later 😓). I lost track of this, but I'm super happy that it got merged. Thank you very much for all your work @maleasy @ewels @apeltzer 🎉

ewels · 2023-10-22T17:37:05Z

Wow, some serious inbox diving to find this @matrulda! 😂 I hope it's useful!

Christoph Malisi added 2 commits September 17, 2021 13:24

Added module for CheckQC

ec64665

applied black with same version as github action

e419f2b

maleasy mentioned this pull request Sep 17, 2021

Add CheckQC module #1551

Closed

gartician mentioned this pull request Oct 11, 2021

New module: GoPeaks #1562

Merged


Merge branch 'master' into checkqc_module

d1fb2ec

ewels added 3 commits November 15, 2021 23:22

CheckQC: Add DOI, fix changelog

c837702

Add data sources

5bc7f21

Also added missing ignore_sample call and renamed module anchor.

Add code comment to make CI pass for checkqc

395f964

Christoph Malisi added 3 commits November 16, 2021 15:50

no report generated when all samples are ignored

46c39e0

blackened code

70b9a0d

removed debug print statement

849aceb

matrulda reviewed Nov 22, 2021


Christoph Malisi added 3 commits November 23, 2021 08:38

Remove unnecessary comments

ead255e

Remove parsing of unused variables Fixed unique run names generation Read numbers now always in millions Handle yield=0 case in UndeterminedPercentageHandler section

Not capturing unused threshold value when parsing UnidentifiedIndexHandler

ee684b4

Separate section for lanes with zero yield

1d28323

matrulda reviewed Nov 23, 2021


matrulda added 2 commits November 24, 2021 11:07

Use Run as col1 header in checkqc module

9051fa6

Reformat with black

41b7499

matrulda approved these changes Nov 25, 2021


matrulda mentioned this pull request Nov 25, 2021

Feature request - e-mail reports Molmed/checkQC#94

Open

CheckQC: Add log message about samples found

bf14ac3

CheckQC: Minor graph title typo

25fd5d0

ewels added 4 commits January 25, 2022 14:36

Remove General Stats

7732633

Lane sample name prefix for graphs, labelling tweaks

35cb061

CheckQC: Improve section docs a little.

5d36be4

Add comment that passing samples are not shown.

6455030

ewels enabled auto-merge January 25, 2022 14:02

ewels merged commit 350d501 into MultiQC:master Jan 25, 2022
