Feature/picard crosscheck fingerprints #1327

sstadick · 2020-10-23T20:49:18Z

This adds support for picard CrosscheckFingerprints output. A column is added to General Stats for each sample, showing True or False (colored green or red respectively) according to whether all comparisons for that sample return Expected relationships.

There is also a CrosscheckFingerprints table that includes the pairwise comparison, the categorical result, the LOD_SCORE, and the threshold used, to be referenced in the event that a False is observed in the General Statistics.
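As a rough illustration of the per-sample True/False flag described above, here is a minimal sketch. It is not the module's code: the `RESULT` column name, the `EXPECTED_*` category names, and the function name `all_expected_by_sample` are assumptions made for this example.

```python
from collections import defaultdict

# Assumed result categories that count as "expected" (not taken from the PR).
EXPECTED_RESULTS = {"EXPECTED_MATCH", "EXPECTED_MISMATCH"}


def all_expected_by_sample(rows):
    """Return {sample: True} only if every comparison involving it is expected."""
    status = defaultdict(lambda: True)
    for row in rows:
        ok = row["RESULT"] in EXPECTED_RESULTS
        # A comparison counts against both the left and the right sample.
        for sample in (row["LEFT_SAMPLE"], row["RIGHT_SAMPLE"]):
            status[sample] = status[sample] and ok
    return dict(status)


rows = [
    {"LEFT_SAMPLE": "A", "RIGHT_SAMPLE": "A", "RESULT": "EXPECTED_MATCH"},
    {"LEFT_SAMPLE": "A", "RIGHT_SAMPLE": "B", "RESULT": "EXPECTED_MISMATCH"},
    {"LEFT_SAMPLE": "B", "RIGHT_SAMPLE": "C", "RESULT": "UNEXPECTED_MATCH"},
]
flags = all_expected_by_sample(rows)
```

In this toy input, sample A stays True, while B and C become False because they share an unexpected comparison.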

- There is example tool output for tools in the https://github.com/ewels/MultiQC_TestData repository or attached to this PR
- Code is tested and works locally (including with --lint flag)
- docs/README.md is updated with link to below
- docs/modulename.md is created
- Everything that can be represented with a plot instead of a table is a plot
- Report sections have a description and help text (with self.add_section)
- There aren't any huge tables with > 6 columns (explain reasoning if so)
- Each table column has a different colour scale to its neighbour, which relates to the data (eg. if high numbers are bad, they're red)
- Module does not do any significant computational work

For test data see MultiQC/test-data#172

docs/modules/picard.md

multiqc/modules/picard/CrosscheckFingerprints.py

nh13

Any chance you can screenshot what this looks like with MultiQC/test-data#172?


yfarjoun · 2020-10-27T18:45:26Z

Do note that the "Expected" algorithm doesn't actually know if samples are from the same individual, and so the LOD is "expected" to be positive if the sample ID is the same (from the VCF or BAM header) or if the tool was run with "EXPECT_ALL_GROUPS_TO_MATCH=true". I've been toying with the idea of having a concept of "Individual", but that's not for here to discuss...
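The expectation rule described in this comment can be sketched roughly as follows. This is a simplification for illustration only, not Picard's implementation: the function names are hypothetical, and the sign check ignores whatever LOD threshold the tool applies.

```python
def expected_to_match(left_sample, right_sample, expect_all_groups_to_match=False):
    """A match is expected if the sample IDs are equal or everything is expected to match."""
    return expect_all_groups_to_match or left_sample == right_sample


def lod_agrees_with_expectation(lod, expected_match):
    """Simplified check: positive LOD for expected matches, negative otherwise.

    Assumption: the real tool compares against a threshold rather than zero.
    """
    return lod > 0 if expected_match else lod < 0
```

Note that two files from the same individual but with different sample IDs would be treated as an expected mismatch under this rule, which is the limitation the comment points out.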

nh13 · 2020-10-27T22:25:13Z

@sstadick I see that the Sample Name column contains both the left and right sample names. Any chance we could have those also in two separate columns so they're sortable? What do you think?

sstadick · 2020-10-28T01:26:28Z

@nh13 I don't think so. The input data for the table must come as a dictionary with the sample names as the key, so you can't have multiple rows with the same sample name.

Also of note, if someone were to run CrosscheckFingerprints and use ReadGroups as the level of granularity of the checking, I expand the names out to <left-sample>/<left-group> - <right-sample>/<right-group>.
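A minimal sketch of that naming scheme, assuming the group value is simply appended when present (the function name `comparison_label` is hypothetical and not from the PR):

```python
def comparison_label(left_sample, right_sample, left_group=None, right_group=None):
    """Build '<left-sample>/<left-group> - <right-sample>/<right-group>'.

    Falls back to the bare sample name when no group is given.
    """
    left = f"{left_sample}/{left_group}" if left_group else left_sample
    right = f"{right_sample}/{right_group}" if right_group else right_sample
    return f"{left} - {right}"
```

For example, `comparison_label("S1", "S2", "rg1", "rg2")` gives `S1/rg1 - S2/rg2`, and `comparison_label("S1", "S2")` gives `S1 - S2`.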

My overall feeling is that plots.table just isn't made for this and it's a force fit for sure. The main "feature" here is in the general stats table with the red/green color of all expected.

Co-authored-by: Nils Homer <nh13@users.noreply.github.com>

This makes `Sample Name` in the table just be an index and moves the sample names into the {LEFT,RIGHT}_SAMPLE column. Additionally, this adds support for respecting the sample ignores options, as well as renaming the samples based on the renaming rules.
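The change described in this commit message might look roughly like the sketch below: rows are keyed by a running index so that the same sample can appear many times, and the names move into their own columns. The exact-match ignore set and the rule that a row is dropped when either side is ignored are assumptions for this example; MultiQC's own ignore and rename handling is not reproduced here.

```python
def build_table(rows, ignored=()):
    """Key each comparison by an integer index instead of a sample name."""
    ignored = set(ignored)
    kept = [
        r for r in rows
        if r["LEFT_SAMPLE"] not in ignored and r["RIGHT_SAMPLE"] not in ignored
    ]
    table = {}
    for idx, row in enumerate(kept):
        # The index is only a unique row key; names live in sortable columns.
        table[idx] = {
            "LEFT_SAMPLE": row["LEFT_SAMPLE"],
            "RIGHT_SAMPLE": row["RIGHT_SAMPLE"],
            "RESULT": row["RESULT"],
            "LOD_SCORE": row["LOD_SCORE"],
        }
    return table


rows = [
    {"LEFT_SAMPLE": "A", "RIGHT_SAMPLE": "B", "RESULT": "EXPECTED_MISMATCH", "LOD_SCORE": -30.0},
    {"LEFT_SAMPLE": "A", "RIGHT_SAMPLE": "C", "RESULT": "EXPECTED_MISMATCH", "LOD_SCORE": -25.0},
    {"LEFT_SAMPLE": "A", "RIGHT_SAMPLE": "A", "RESULT": "EXPECTED_MATCH", "LOD_SCORE": 40.0},
]
table = build_table(rows, ignored={"B"})
```

Keying by index sidesteps the earlier limitation that a dictionary keyed by sample name cannot hold several rows for the same sample.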

sstadick · 2020-10-29T20:40:30Z

Latest screenshot, run with --ignore-samples 'DNA_SAMPLE1_T_1F_tumor_dna' on the data in the test repo

nh13 · 2020-10-29T20:42:30Z

@sstadick that's pretty!

@ewels I think this is ready to go!

ewels · 2020-11-15T12:54:45Z

Thanks all! Test data merged, will take a look at the PR ASAP.

I think that there was only one sample in the test data PR? If you fancy adding a few more then that makes testing a bit easier, as I can see how a typical report would look. (But not critical.)

Phil

nh13 · 2020-12-02T02:10:30Z

@ewels I think this is ready to go. The only issue I see is that when the sample names are long, the columns will overlap, but I am assuming this is an issue for all tables? If the columns were fit better, or were resizable, I think that would help.

matthdsm · 2021-03-01T12:41:22Z

Hi @ewels,

We're looking forward to using this module, can this be merged?

Thanks
M

ewels · 2021-03-08T00:25:23Z

Hi all,

Thanks for this! I've gone over the code and done a fair bit of refactoring, so please check that it still works as you expect.

I actually found a nasty little bug in the table code during this (maybe the reason why you disabled the colour schemes for LOD?) which was good to fix. I also added a new table config option to treat values as absolute for the bar width which works really nicely here to highlight the extreme LODs properly.

Along with a few other fixes and tweaks, I think that it looks even better now:

Note that the zero-point for this new feature is hardcoded to 0 for now. Surely someone will ask for this to be customisable but that can be rewritten in a future PR if so 😀
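To illustrate the idea of treating values as absolute for the bar width, here is a small sketch of how such a width could be computed with the zero-point fixed at 0. The function name and the percentage scaling are assumptions for this example and are not MultiQC's table code.

```python
def bar_width_percent(value, values, zero_point=0.0):
    """Scale a bar by its distance from the zero-point, relative to the largest distance."""
    max_dist = max(abs(v - zero_point) for v in values)
    if max_dist == 0:
        return 0.0
    return 100.0 * abs(value - zero_point) / max_dist


lods = [-40.0, 10.0, 20.0]
widths = [bar_width_percent(v, lods) for v in lods]
```

With this scaling, a strongly negative LOD gets as wide a bar as a strongly positive one, so both extremes stand out instead of only the largest positive value.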

I'll merge this now but please give it a try and shout if I messed anything up 👍🏻

Phil

sstadick added 2 commits October 23, 2020 13:58

WIP

fbbcb1d

Working

0c2767d

sstadick mentioned this pull request Oct 23, 2020

Test data for picard CrosscheckFingerprints MultiQC/test-data#172

Merged

Small updates

73e43bf

sstadick marked this pull request as ready for review October 23, 2020 21:12

sstadick added 2 commits October 23, 2020 15:15

Remove coloring from LOD scores because it does not really make sense

f46c3ef

Remove auto-highlighting

ee9e002