Feature/multiqc#808

Merged
gongyixiao merged 57 commits into develop from
feature/multiqc
Jul 28, 2020

@anoronh4
Collaborator

@anoronh4 anoronh4 commented Jun 9, 2020

Addressing #807

This PR combines results from each sample/pair/cohort into a multiqc report for each level.
Sample Report includes alfred, fastp, hsmetrics for each sample.
Example juno/work/ccs/noronhaa/tempo_multiqc/testmultiqc_noronhaa/results_withsamplemod_frombammapping/bams/DU874146-T/multiqc/multiqc_report.html

Somatic Report includes conpair results for each pair.
Example /juno/work/ccs/noronhaa/tempo_multiqc/testmultiqc_noronhaa/results_withsamplemod_frombammapping/somatic/DU874145-T__DU874145-N/multiqc/multiqc_report.html

Cohort Report includes alfred, fastp, hsmetrics, conpair for each sample/pair in the cohort
Example /juno/work/ccs/noronhaa/tempo_multiqc/testmultiqc_noronhaa/results_withsamplemod_frombammapping/cohort_level/default_cohort/multiqc_report.html

Values that indicate poor quality are already highlighted (green, yellow, and red for pass, warn, and fail), but further discussion may be needed to settle the pass/fail cutoffs and which metrics are displayed. Alfred reports many metrics; so far I have added only mapping quality as a proof of concept, since we may end up focusing on just a few of the Alfred categories.

Additional notes
A few variables were set as global even though they should have been local. We caught this while testing multiqc and corrected it in commit 803e345.

Anne Marie Noronha added 29 commits May 26, 2020 22:13
…l to *_multiqc_config.yaml. conpair_custom_mqc.yaml was no longer needed
…ls/variables. also added bash statements to parse conpair results in SomaticRunMultiQC
…to allow for more alfred files to be added later with similar pattern
…cohort level QC, and minor fixes to ordering of columns in conpair results.
…he report. this should give the report a "cleaner" look
@anoronh4
Collaborator Author

anoronh4 commented Jun 10, 2020

Updated example files:
Cohort report
/juno/work/ccs/noronhaa/tempo_multiqc/testmultiqc_noronhaa/results_biggersamples/cohort_level/default_cohort/multiqc_report.html
Sample Report
/juno/work/ccs/noronhaa/tempo_multiqc/testmultiqc_noronhaa/results_biggersamples/bams/BRCA_03936-N/multiqc/multiqc_report.html
Somatic Report
/juno/work/ccs/noronhaa/tempo_multiqc/testmultiqc_noronhaa/results_biggersamples/somatic/BRCA_00423-T__BRCA_00423-N/multiqc/multiqc_report.html

Future directions:

  • There are some helpful functions available in multiqc 1.9, but I found some bugs in the v1.9 singularity image that I'm working to address; see "docker image of multiqc:1.9 error with singularity", MultiQC/MultiQC#1221 (comment)
  • Add WGS configurations as well, and handle params.assayType == 'genome' (CollectHsMetrics is not run in that case)
  • Add PASS/FAIL metrics to each sample. This will require running the multiqc report as normal, parsing the resulting general statistics table and annotating it based on cutoffs, then running multiqc a second time with a file containing one column of PASS/FAIL/WARN
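
The two-pass idea in the last bullet could be sketched roughly as follows. This is only an illustration of the "parse the general statistics table, annotate by cutoffs, write a one-column status file" step, not the code in this PR: the metric names (`mean_target_coverage`, `pct_mapped`), the cutoff values, and the helper names are all hypothetical, and it assumes the table is tab-separated with a `Sample` column first. How the resulting file is fed back to multiqc for the second run is left out.

```python
import csv
import io

# Hypothetical cutoffs: metric name -> (warn_below, fail_below).
# Real metric names and thresholds would need to be agreed on.
CUTOFFS = {
    "mean_target_coverage": (100.0, 50.0),
    "pct_mapped": (95.0, 90.0),
}

RANK = {"PASS": 0, "WARN": 1, "FAIL": 2}


def classify(value, warn_below, fail_below):
    """Map one metric value to PASS/WARN/FAIL (higher values are better)."""
    if value < fail_below:
        return "FAIL"
    if value < warn_below:
        return "WARN"
    return "PASS"


def annotate(general_stats_tsv, cutoffs=CUTOFFS):
    """Return {sample: status}, keeping the worst status across all metrics."""
    statuses = {}
    reader = csv.DictReader(io.StringIO(general_stats_tsv), delimiter="\t")
    for row in reader:
        worst = "PASS"
        for metric, (warn_below, fail_below) in cutoffs.items():
            raw = row.get(metric)
            if raw is None or raw == "":
                continue  # metric missing for this sample; skip it
            status = classify(float(raw), warn_below, fail_below)
            if RANK[status] > RANK[worst]:
                worst = status
        statuses[row["Sample"]] = worst
    return statuses


def to_status_table(statuses):
    """Render the statuses as a two-column TSV (sample name, status)."""
    lines = ["Sample\tStatus"]
    lines += [f"{sample}\t{status}" for sample, status in sorted(statuses.items())]
    return "\n".join(lines) + "\n"


# Made-up table standing in for the first multiqc run's general statistics.
example = (
    "Sample\tmean_target_coverage\tpct_mapped\n"
    "S1\t150.0\t99.1\n"
    "S2\t80.0\t96.0\n"
    "S3\t120.0\t85.0\n"
)
statuses = annotate(example)
print(to_status_table(statuses), end="")
```

Taking the worst status across metrics is one possible design choice; a per-metric status column would be an alternative if the report should show which metric triggered the warning.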

@gongyixiao gongyixiao added this to the 1.3.2 milestone Jul 10, 2020
…sample and cohort reports to include alfred rg-aware and alfred rg-unaware, instead of just one of the two
@gongyixiao
Collaborator

In Alfred Mapping Quality, maybe it's good to show the percentage of reads instead of the read count. The advantage of showing percentage will be more obvious in the cohort report.

@gongyixiao This can be done easily. I wonder what else we should display from the Alfred report. A couple of examples include:

# Alignment summary metrics (ME).
# Read length distribution (RL).
# Mean base quality (BQ).
# Base content (BC).
# Mapping quality histogram (MQ).
# Coverage histogram (CO).
# Chromosome mapping statistics (CM).
# Insert size histogram (IS).
# InDel context (IC).
# InDel size (IZ).
# Chromosome GC-content (CG).
# GC-content (GC).
# Avg. target coverage (TC).
# On target rate (OT).

The alfred file is not automatically parsed by multiqc so we have to write the code to parse it.
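
As a rough sketch of what such parsing could look like, including the percentage view suggested above for mapping quality: the code below assumes a tab-separated layout in which the first field of every line is the section code (for example `MQ`) and the first line of each section is its header. The column names `MappingQuality` and `Count`, the `MappedFraction` column in the sample data, and the function names are assumptions made for illustration, not taken from this PR, and should be checked against a real Alfred output file.

```python
import csv
import io


def read_alfred_section(text, tag):
    """Collect the rows of one section from tagged TSV text.

    Assumes the first field of each line is the section code and the
    first line seen for a code is that section's header row.
    """
    header = None
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or fields[0] != tag:
            continue
        if header is None:
            header = fields[1:]
        else:
            rows.append(dict(zip(header, fields[1:])))
    return rows


def mapping_quality_percentages(rows, mq_col="MappingQuality", count_col="Count"):
    """Convert per-quality read counts into percentages of all reads."""
    counts = {}
    for row in rows:
        mq = int(row[mq_col])
        counts[mq] = counts.get(mq, 0) + int(row[count_col])
    total = sum(counts.values())
    if total == 0:
        return {mq: 0.0 for mq in counts}
    return {mq: 100.0 * n / total for mq, n in sorted(counts.items())}


# Made-up data in the assumed layout; only the MQ section is used.
example = (
    "ME\tSample\tMappedFraction\n"
    "ME\tS1\t0.99\n"
    "MQ\tSample\tMappingQuality\tCount\n"
    "MQ\tS1\t0\t50\n"
    "MQ\tS1\t30\t150\n"
    "MQ\tS1\t60\t800\n"
)
mq_rows = read_alfred_section(example, "MQ")
percentages = mapping_quality_percentages(mq_rows)
print(percentages)
```

Reading the header from the file rather than hard-coding column positions keeps the same helper usable for other sections if more Alfred categories are added later.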

This information might be obtainable from qualimap (http://qualimap.bioinfo.cipf.es/). Exploring it here: #166

Since qualimap is a built-in supported tool in multiQC, the idea is to use it for whatever information it can provide and avoid building custom Alfred support in multiQC. Alfred, with custom support, would then be used only for Alfred-specific metrics, such as read-group-aware/ignore mapping quality statistics.

Comment thread pipeline.nf Outdated
Comment thread pipeline.nf Outdated
@gongyixiao gongyixiao self-requested a review July 13, 2020 21:39
@gongyixiao gongyixiao linked an issue Jul 14, 2020 that may be closed by this pull request
@gongyixiao
Collaborator

A few more misc changes:

  • fixed finding pre_multiqc_data
  • fixed incorrect “warn” status
  • raised the resource request from 1.GB * task.attempt to 3.GB * task.attempt
  • removed the sidebar title that was getting cut off
  • I believe I addressed the problem of not finding .MQ.alfred.tsv

@anoronh4 anoronh4 requested a review from gongyixiao July 21, 2020 21:40
@gongyixiao gongyixiao merged commit 5c62c49 into develop Jul 28, 2020
@gongyixiao gongyixiao mentioned this pull request Jul 28, 2020

Development

Successfully merging this pull request may close these issues.

create qc report to combine multiple qc results

2 participants