@ewels ewels released this May 30, 2020 · 58 commits to master since this release

Another massive release - many thanks to all of the contributors! Keep those pull-requests and issues coming!

Dropped official support for Python 2

Python 2 had its official sunset date
on January 1st 2020, meaning that it will no longer be developed by the Python community.
Part of the python.org statement reads:

That means that we will not improve it anymore after that day,
even if someone finds a security problem in it.
You should upgrade to Python 3 as soon as you can.

Very many Python packages no longer support Python 2
and whilst the MultiQC code is currently compatible with both Python 2 and Python 3,
it is increasingly difficult to maintain compatibility with the dependency packages it
uses, such as MatPlotLib, numpy and more.

As of MultiQC version 1.9, Python 2 is no longer officially supported.
Automatic CI tests will no longer run with Python 2 and Python 2 specific workarounds
are no longer guaranteed.

Whilst it may be possible to continue using MultiQC with Python 2 for a short time by
pinning dependencies, MultiQC compatibility for Python 2 will now slowly drift and start
to break. If you haven't already, you need to switch to Python 3 now.

New MultiQC Features

  • Now using GitHub Actions for all CI testing
    • Dropped Travis and AppVeyor, everything is now just on GitHub
    • Still testing on both Linux and Windows, with multiple versions of Python
    • CI tests should now run automatically for anyone who forks the MultiQC repository
  • Linting with --lint now checks line graphs as well as bar graphs
  • New gathered template with no tool name sections (#1119)
  • Added --sample-filters option to add show/hide buttons at the top of the report (#1125)
    • Buttons control the report toolbox Show/Hide tool, filtering your samples
    • Allows reports to be pre-configured based on a supplied list of sample names at report-generation time.
  • Line graphs can now have Log10 buttons (same functionality as bar graphs)
  • Importing and running multiqc in a script is now a little better
    • multiqc.run now returns the report and config as well as the exit code. This means that you can explore the MultiQC run time a little in the Python environment.
    • Much more refactoring is needed to make MultiQC as useful in Python scripts as it could be. Watch this space.
  • If a custom module anchor is set using module_order, it's now used a bit more:
    • Prefixed to module section IDs
    • Appended to files saved in multiqc_data
    • Should help to prevent duplicates requiring -1 suffixes when running a module multiple times
  • New heatmap plot config options xcats_samples and ycats_samples
    • If set to False, the report toolbox options (highlight, rename, show/hide) do not affect that axis.
    • Means that the Show only matching samples report toolbox option works on FastQC Status Checks, for example (#1172)
  • Report header time and analysis paths can now be hidden
    • New config options show_analysis_paths and show_analysis_time (#1113)
  • New search pattern key skip: true to skip specific searches when modules look for a lot of different files (eg. Picard).
  • New --profile-runtime command line option (config.profile_runtime) to give analysis of how long the report takes to be generated
    • Plots of the file search results and durations are added to the end of the MultiQC report as a special module called Run Time
    • A summary of the time taken for the major stages of MultiQC execution is printed to the command line log.
  • New table config option only_defined_headers
    • Defaults to true, set to false to also show any data columns that are not defined as headers
    • Useful as allows table-wide defaults to be set with column-specific overrides
  • New module key allowed for config.extra_fn_clean_exts and config.fn_clean_exts
    • Means you can limit the action of a sample name cleaning pattern to specific MultiQC modules (#905)
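
To illustrate the new module key for sample name cleaning, here is a minimal Python sketch of the idea. It is not MultiQC's own code: the function clean_sample_name is hypothetical, the rule dictionaries are only modelled on the type / pattern / module config keys, and just the truncate and remove behaviours are covered.

```python
def clean_sample_name(name, rules, module):
    """Apply each cleaning rule in order, skipping rules limited to other modules.

    Hypothetical helper for illustration only, not part of MultiQC.
    """
    for rule in rules:
        allowed = rule.get("module")  # None means the rule applies to every module
        if allowed is not None:
            if isinstance(allowed, str):
                allowed = [allowed]
            if module not in allowed:
                continue
        rule_type = rule.get("type", "truncate")
        if rule_type == "truncate":
            # Keep everything before the first occurrence of the pattern
            name = name.split(rule["pattern"], 1)[0]
        elif rule_type == "remove":
            # Delete every occurrence of the pattern
            name = name.replace(rule["pattern"], "")
    return name


rules = [
    {"type": "remove", "pattern": "_sorted", "module": "samtools"},
    {"type": "truncate", "pattern": ".fastq"},
]

for_samtools = clean_sample_name("sampleA_sorted.fastq.gz", rules, "samtools")
for_fastqc = clean_sample_name("sampleA_sorted.fastq.gz", rules, "fastqc")
print(for_samtools, for_fastqc)
```

In this sketch the _sorted rule only fires for the samtools module, so the same filename ends up with a different sample name depending on which module is parsing it.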

New Custom Content features

  • Improve support for HTML files - now just end your HTML filename with _mqc.html
    • Native handling of HTML snippets as files, no MultiQC config or YAML file required.
    • Also with embedded custom content configuration at the start of the file as a HTML comment.
  • Add ability to group custom-content files into report sections
    • Use the new parent_id, parent_name and parent_description config keys to group content together like a regular module (#1008)
  • Custom Content files can now be configured using custom_data, without giving search patterns or data
    • Allows you to set descriptions and nicer titles for images and other 'blunt' data types in reports (#1026)
    • Allows configuration of custom content separately from files themselves (tsv, csv, txt formats) (#1205)

New Modules:

  • DRAGEN
    • Illumina Bio-IT Platform that uses FPGA for secondary NGS analysis
  • iVar
    • Added support for iVar: a computational package that contains functions broadly useful for viral amplicon-based sequencing.
  • Kaiju
    • Fast and sensitive taxonomic classification for metagenomics
  • Kraken
    • K-mer matching tool for taxonomic classification. Module plots bargraph of counts for top-5 hits across each taxa rank. General stats summary.
  • MALT
    • Megan Alignment Tool: Metagenomics alignment tool.
  • miRTop
    • Command line tool to annotate miRNAs with a standard mirna/isomir naming (mirGFF3)
    • Module started by @oneillkza and completed by @FlorianThibord
  • MultiVCFAnalyzer
    • Combining multiple VCF files into one coherent report and format for downstream analysis.
  • Picard - new submodules for QualityByCycleMetrics, QualityScoreDistributionMetrics & QualityYieldMetrics
  • Rockhopper
    • RNA-seq tool for bacteria, includes bar plot showing where features map.
  • Sickle
    • A windowed adaptive trimming tool for FASTQ files using quality
  • Somalier
    • Relatedness checking and QC for BAM/CRAM/VCF for cancer, DNA, BS-Seq, exome, etc.
  • VarScan2
    • Variant calling and somatic mutation/CNV detection for next-generation sequencing data

Module updates:

  • BISCUIT
    • Major rewrite to work with new BISCUIT QC script (BISCUIT v0.3.16+)
      • This change breaks backwards-compatibility with previous BISCUIT versions. If you are unable to upgrade BISCUIT, please use MultiQC v1.8.
    • Fixed error when missing data in log files (#1101)
  • bcl2fastq
    • Samples with multiple library preps (i.e. barcodes) will now be handled correctly (#1094)
  • BUSCO
    • Updated log search pattern to match new format in v4 with auto-lineage detection option (#1163)
  • Cutadapt
    • New bar plot showing the proportion of reads filtered out for different criteria (eg. too short, too many Ns) (#1198)
  • DamageProfiler
    • Removes redundant typo in init name. This makes referring to the module's column consistent with other modules when customising general stats table.
  • DeDup
    • Updates plots to make compatible with 0.12.6
    • Fixes reporting errors - barplot total represents mapped reads, not total reads in BAM file
    • New: Adds 'Post-DeDup Mapped Reads' column to general stats table.
  • FastQC
    • Fixed tooltip text in Sequence Duplication Levels plot (#1092)
    • Handle edge-case where a FastQC report was for an empty file with 0 reads (#1129)
  • FastQ Screen
    • Don't skip plotting % No Hits even if it's 0% (#1126)
    • Refactor parsing code. Avoids error with -0.00 %Unmapped (#1126)
    • New plot for Bisulfite Reads, if data is present
    • Categories in main plot are now sorted by the total read count and hidden if 0 across all samples
  • fgbio
    • New: Plot error rate by read position from ErrorRateByReadPosition
    • GroupReadsByUmi plot can now be toggled to show relative percents (#1147)
  • FLASh
    • Logs not reporting innie and outie uncombined pairs now plot combined pairs instead (#1173)
  • GATK
    • Made parsing for VariantEval more tolerant, so that it will work with output from the tool when run in different modes (#1158)
  • MTNucRatioCalculator
    • Fixed misleading value suffix in general stats table
  • Picard MarkDuplicates
    • Major change - previously, if multiple libraries (read-groups) were found then only the first would be used and all others ignored. Now, values from all libraries are merged and PERCENT_DUPLICATION and ESTIMATED_LIBRARY_SIZE are recalculated. Libraries can be kept as separate samples with a new MultiQC configuration option - picard_config: markdups_merge_multiple_libraries: False
    • Major change - Updated MarkDuplicates bar plot to double the read-pair counts, so that the numbers stack correctly. (#1142)
  • Picard HsMetrics
    • Updated large table to use columns specified in the MultiQC config. See docs. (#831)
  • Picard WgsMetrics
    • Updated parsing code to recognise new java class string (#1114)
  • QualiMap
    • Fixed QualiMap mean coverage calculation (#1082, #1077)
  • RSeQC
    • Support added for output from geneBodyCoverage2.py script (#844)
    • Single sample view in the "Junction saturation" plot now works with the toolbox properly (rename, hide, highlight) (#1133)
  • RNASeQC2
  • Samblaster
    • Improved parsing to handle variable whitespace (#1176)
  • Samtools
    • Removes hardcoding of general stats column names. This allows column names to indicate when a module has been run twice (#1076).
    • Added an observed over expected read count plot for idxstats (#1118)
    • Added additional (by default hidden) column for flagstat that displays the total number of reads in a BAM file
  • sortmerna
    • Fix the bug for the latest sortmerna version 4.2.0 (#1121)
  • sexdeterrmine
    • Added a scatter plot of relative X- vs Y-coverage to the generated report.
  • VerifyBAMID
    • Allow files with column header FREEMIX(alpha) (#1112)
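
For the Picard MarkDuplicates change listed above, the merge-and-recalculate step might look roughly like the sketch below. It assumes the standard Picard metric column names and the usual Picard definition of PERCENT_DUPLICATION, where each read pair counts as two reads. The helper merge_libraries is hypothetical and is not the module's actual code. ESTIMATED_LIBRARY_SIZE is left out, as recalculating it needs more than a simple sum.

```python
COUNT_FIELDS = (
    "UNPAIRED_READS_EXAMINED",
    "READ_PAIRS_EXAMINED",
    "UNPAIRED_READ_DUPLICATES",
    "READ_PAIR_DUPLICATES",
)


def merge_libraries(libraries):
    """Sum the count columns over all libraries, then recompute the duplication rate.

    Hypothetical helper for illustration only.
    """
    merged = {field: sum(lib[field] for lib in libraries) for field in COUNT_FIELDS}
    # Each pair is two reads, so pair counts are doubled before being combined
    # with the unpaired counts.
    duplicates = merged["UNPAIRED_READ_DUPLICATES"] + 2 * merged["READ_PAIR_DUPLICATES"]
    examined = merged["UNPAIRED_READS_EXAMINED"] + 2 * merged["READ_PAIRS_EXAMINED"]
    merged["PERCENT_DUPLICATION"] = duplicates / examined if examined else 0.0
    return merged


libraries = [
    {
        "UNPAIRED_READS_EXAMINED": 100,
        "READ_PAIRS_EXAMINED": 450,
        "UNPAIRED_READ_DUPLICATES": 10,
        "READ_PAIR_DUPLICATES": 45,
    },
    {
        "UNPAIRED_READS_EXAMINED": 0,
        "READ_PAIRS_EXAMINED": 500,
        "UNPAIRED_READ_DUPLICATES": 0,
        "READ_PAIR_DUPLICATES": 50,
    },
]

merged = merge_libraries(libraries)
print(merged)
```

The same doubling of read-pair counts is what lets paired and unpaired categories stack to a sensible total in a bar plot.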

Bug Fixes:

  • Added a new test to check that modules work correctly with --ignore-samples. A lot of them didn't:
    • Mosdepth, conpair, Qualimap BamQC, RNA-SeQC, GATK BaseRecalibrator, SNPsplit, SeqyClean, Jellyfish, hap.py, HOMER, BBMap, DeepTools, HiCExplorer, pycoQC, interop
    • These modules have now all been fixed and --ignore-samples should work as you expect for whatever data you have.
  • Removed use of shutil.copy to avoid problems with working on multiple filesystems (#1130)
  • Made folder naming behaviour of multiqc_plots consistent with multiqc_data
    • Incremental numeric suffixes now added if folder already exists
    • Plots folder properly renamed if using -n/--filename
  • Heatmap plotting function is now compatible with MultiQC toolbox hide and highlight (#1136)
  • Plot config logswitch_active now works as advertised
  • When running MultiQC modules several times, multiple data files are now created instead of overwriting one another (#1175)
  • Fixed minor bug where tables could report negative numbers of columns in their header text
  • Fixed bug where numeric custom content sample names could trigger a TypeError (#1091)
  • Fixed custom content bug where HTML data in a config file would trigger a ValueError (#1071)
  • Replaced deprecated 'warn()' with 'warning()' of the logging module
  • Custom content now supports section_extra config key to add custom HTML after description.
  • Barplots with ymax set now ignore this when you click the Percentages tab.
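
The incremental numeric suffix behaviour for multiqc_plots and multiqc_data can be pictured with a small sketch. The function next_free_path and the _1, _2 suffix format are assumptions made for illustration. This is not the code MultiQC uses.

```python
import os
import tempfile


def next_free_path(path):
    """Return path if it is unused, otherwise path_1, path_2, ... (hypothetical helper)."""
    if not os.path.exists(path):
        return path
    counter = 1
    while os.path.exists(f"{path}_{counter}"):
        counter += 1
    return f"{path}_{counter}"


# Demonstrate inside a throwaway directory so nothing real is touched
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "multiqc_plots")
    first = next_free_path(base)
    os.mkdir(first)
    second = next_free_path(base)
    os.mkdir(second)
    third = next_free_path(base)

names = [os.path.basename(p) for p in (first, second, third)]
print(names)
```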

@ewels ewels released this Nov 20, 2019 · 619 commits to master since this release

A huge release, this one has been a long time coming. Due to @ewels being away on paternity leave for over six months it was very delayed and has been nearly a year in the making! During that time there have been 344 commits with 3,370 lines of code added and 1,194 deletions by 19 contributors. That's a lot of changes.

Highlights include:

  • Finally removing the annoying YAML warning
  • Six new modules, and many large updates to existing modules
  • Code restructuring allowing MultiQC to be imported into Python environments and easier running on Windows
  • Lots of tiny bug fixes all over the place.

Enjoy the update! And I promise I'll try not to make everyone wait so long for the next release...

Full changelog

New Modules:

  • fgbio
    • Process family size count hist data from GroupReadsByUmi
  • biobambam2
    • Added submodule for bamsormadup tool
    • Totally cheating - it uses Picard MarkDuplicates but with a custom search pattern and naming
  • SeqyClean
    • Adds analysis for seqyclean files
  • mtnucratio
    • Added little helper tool to compute mt to nuclear ratios for NGS data.
  • mosdepth
    • fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing
  • SexDetErrmine
    • Relative coverage and error rate of X and Y chromosomes

Module updates:

  • bcl2fastq
    • Added handling of demultiplexing of more than 2 reads
    • Allow bcl2fastq to parse undetermined barcode information in situations when lane indexes do not start at 1
  • BBMap
    • Support for scafstats output marked as not yet implemented in docs
  • DeDup
    • Added handling clusterfactor and JSON logfiles
  • damageprofiler
    • Added writing metrics to data output file.
  • DeepTools
    • Fixed Python3 bug with int() conversion (#1057)
    • Handle varied TES boundary labels in plotProfile (#1011)
    • Fixed bug that prevented running on only plotProfile files when no other deepTools files found.
  • fastp
    • Fix faulty column handling for the after filtering Q30 rate (#936)
  • FastQC
    • When including a FastQC section multiple times in one report, the Per Base Sequence Content heatmaps now behave as you would expect.
    • Added heatmap showing FastQC status checks for every section report across all samples
    • Made sequence content individual plots work after samples have been renamed (#777)
    • Highlighting samples from status - respect chosen highlight colour in the toolbox (#742)
  • FastQ Screen
    • When including a FastQ Screen section multiple times in one report, the plots now behave as you would expect.
  • GATK
    • Refactored BaseRecalibrator code to be more consistent with MultiQC Python style
    • Handle zero count errors in BaseRecalibrator
  • HiC Explorer
    • Fixed bug where module tries to parse QC_table.txt, a new log file in hicexplorer v2.2.
  • HTSeq
    • Fixed bug where module would crash if a sample had zero reads (#1006)
  • LongRanger
    • Added support for the LongRanger Align pipeline.
  • miRTrace
    • Fixed bug where a sample in some plots was missed. (#932)
  • Peddy
    • Fixed bug where sample name cleaning could lead to error. (#1024)
    • All plots (including Het Check and Sex Check) now hidden if no data
  • Picard
    • Modified OxoGMetrics.py so that it will find files created with GATK CollectMultipleMetrics and ConvertSequencingArtifactToOxoG.
  • QoRTs
    • Fixed bug where --dirs broke certain input files. (#821)
  • Qualimap
    • Added in mean coverage computation for general statistics report
    • Now creates tables of collected data in multiqc_data
  • RNA-SeQC
    • Updated broken URL link
  • RSeQC
    • Fixed bug where clicking a single sample in the Junction Saturation plot mislabelled the lines.
    • When including a RSeQC section multiple times in one report, clicking Junction Saturation plot now behaves as you would expect.
    • Fixed bug where exported data in multiqc_rseqc_read_distribution.txt files had incorrect values for _kb fields (#1017)
  • Samtools
    • Utilize in-built read_count_multiplier functionality to plot flagstat results more nicely
  • SnpEff
    • Increased the default summary csv file-size limit from 1MB to 5MB.
  • Stacks
    • Fixed bug so that multi-population sum stats are now parsed correctly (#906)
  • TopHat
    • Fixed bug where TopHat would try to run with files from Bowtie2 or HiSAT2 and crash
  • VCFTools
    • Fixed a bug where tstv_by_qual.py produced invalid json from infinity-values.
  • snpEff
    • Added plot of effects
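
The VCFTools fix above concerns infinity values ending up in JSON. As a general illustration of the problem (not the module's actual fix, which the changelog does not describe), Python's json module writes float("inf") as the bare token Infinity, which strict JSON parsers reject. One possible workaround, sketched with a hypothetical sanitise helper, is to swap non-finite numbers for null before serialising.

```python
import json
import math


def sanitise(obj):
    """Recursively replace inf / -inf / nan with None (hypothetical helper)."""
    if isinstance(obj, dict):
        return {key: sanitise(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [sanitise(value) for value in obj]
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    return obj


data = {"qual": [10, 20, 30], "tstv": [2.1, float("inf"), 1.9]}

raw = json.dumps(data)  # default behaviour: emits the non-standard token Infinity
clean = json.dumps(sanitise(data), allow_nan=False)  # strict mode, raises on non-finite floats

print(raw)
print(clean)
```

Passing allow_nan=False is a handy safety net here, as it raises a ValueError instead of silently writing invalid JSON if a non-finite value slips through.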

New MultiQC Features:

  • Added some installation docs for Windows
  • Added some docs about using MultiQC in bioinformatics pipelines
  • Rewrote Docker image
    • New base image czentye/matplotlib-minimal reduces image size from ~200MB to ~80MB
    • Proper installation method ensures latest version of the code
    • New entrypoint allows easier command-line usage
  • Support opening MultiQC on websites with CSP script-src 'self' with some sha256 exceptions
    • Plot data is no longer intertwined with javascript code so hashes stay the same
  • Made config.report_section_order work for module sub-sections as well as just modules.
  • New config options exclude_modules and run_modules to complement -e and -m cli flags.
  • Command line output is now coloured by default 🌈 (use --no-ansi to turn this off)
  • Better launch compatibility due to code refactoring by @KerstenBreuer and @ewels
    • Windows support for base multiqc command
    • Support for running as a python module: python -m multiqc .
    • Support for running within a script: import multiqc and multiqc.run('/path/to/files')
  • Config option custom_plot_config now works for bargraph category configs as well (#1044)
  • Config table_columns_visible can now be given a module namespace and it will hide all columns from that module (#541)

Bug Fixes:

  • MultiQC now ignores all .md5 files
  • Use SafeLoader for PyYaml load calls, avoiding recent warning messages.
  • Hide multiqc_config_example.yaml in the test directory to stop people from using it without modification.
  • Fixed matplotlib background colour issue (@epakarin - #886)
  • Table rows that are empty due to hidden columns are now properly hidden on page load (#835)
  • Sample name cleaning: All sample names are now truncated to their basename, without a path.
    • This includes for regex and replace (before was only the default truncate).
    • Only affects modules that take sample names from file contents, such as cutadapt.
    • See #897 for discussion.
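
As a rough picture of the sample name cleaning change above: a module that reads a sample name from file contents may see a full path, and after this release the result is cut down to its basename whichever cleaning type is used. The sketch below assumes the cleaning rule is applied first and the path is stripped afterwards. The helper clean_name is hypothetical and the ordering is an assumption, not taken from the MultiQC code.

```python
import posixpath
import re


def clean_name(raw, pattern=None, replacement=""):
    """Apply an optional regex cleaning rule, then drop any leading directories.

    Hypothetical helper for illustration only.
    """
    name = raw
    if pattern is not None:
        name = re.sub(pattern, replacement, name)
    # posixpath keeps the behaviour identical on every operating system
    return posixpath.basename(name)


from_regex = clean_name("/data/run1/sampleA_R1.fastq.gz", r"_R1\.fastq\.gz$")
no_rule = clean_name("/data/run1/sampleB.fastq.gz")
print(from_regex, no_rule)
```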

@ewels ewels released this Dec 21, 2018 · 963 commits to master since this release

An early Christmas present for MultiQC users! 🎅🎁🎄

Many thanks to everyone who has contributed to this release. Happy Christmas and a very happy new year!

New Modules:

  • BISCUIT
    • BISulfite-seq CUI Toolkit
    • Module written by @zwdzwd
  • DamageProfiler
    • A tool to determine ancient DNA misincorporation rates.
    • Module written by @apeltzer
  • FLASh
    • FLASH (Fast Length Adjustment of SHort reads)
    • Module written by @pooranis
  • MinIONQC
    • QC of reads from ONT long-read sequencing
    • Module written by @ManavalanG
  • phantompeakqualtools
    • A tool for informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data.
    • Module written by @chuan-wang
  • Stacks
    • A software for analyzing restriction enzyme-based data (e.g. RAD-seq). Support for Stacks >= 2.1 only.
    • Module written by @remiolsen

Module updates:

  • AdapterRemoval
    • Handle error when zero bases are trimmed. See #838.
  • Bcl2fastq
    • New plot showing the top twenty of undetermined barcodes by lane.
    • Information for R1/R2 is now separated in the General Statistics table.
    • SampleID is now concatenated with SampleName, because in Chromium experiments several samples can have the same SampleName.
  • deepTools
    • New PCA plots from the plotPCA function (written by @chuan-wang)
    • New fragment size distribution plots from bamPEFragmentSize --outRawFragmentLengths (written by @chuan-wang)
    • New correlation heatmaps from the plotCorrelation function (written by @chuan-wang)
    • New sequence distribution profiles around genes, from the plotProfile function (written by @chuan-wang)
    • Reordered sections
  • Fastp
    • Fixed bug in parsing of empty histogram data. See #845.
  • FastQC
    • Refactored Per Base Sequence Content plots to show original underlying data, instead of calculating it from the page contents. Now shows original FastQC base-ranges and fixes 100% GC bug in final few pixels. See #812.
    • When including a FastQC section multiple times in one report, the summary progress bars now behave as you would expect.
  • FastQ Screen
    • Don't hide genomes in the simple plot, even if they have zero unique hits. See #829.
  • InterOp
    • Fixed bug where read counts and base pair yields were not displaying in tables correctly.
    • Number formatting for these fields can now be customised in the same way as with other modules, as described in the docs
  • Picard
    • InsertSizeMetrics: You can now configure to what degree the insert size plot should be smoothed.
    • CollectRnaSeqMetrics: Add warning about missing rRNA annotation.
    • CollectRnaSeqMetrics: Add chart for counts/percentage of reads mapped to the correct strand.
    • Now parses VariantCallingMetrics reports. (Similar to GATK module's VariantEval.)
  • phantompeakqualtools
    • Properly clean sample names
  • Trimmomatic
    • Updated Trimmomatic module documentation to be more helpful
    • New option to use filenames instead of relying on the command line used. See #864.

New MultiQC Features:

  • Embed your custom images with a new Custom Content feature! Just add _mqc to the end of the filename for .png, .jpg or .jpeg files.
  • Documentation for Custom Content reordered to make it a little more sane
  • You can now add or override any config parameter for any MultiQC plot! See the documentation for more info.
  • Allow table_columns_placement config to work with table IDs as well as column namespaces. See #841.
  • Improved visual spacing between grouped bar plots

Bug Fixes:

  • Custom content no longer clobbers col1_header table configs
  • The option --file-list that refers to a text file with file paths to analyse will no longer ignore directory paths
  • Sample name directory prefixes are now added after cleanup.
  • If a module is run multiple times in one report, its CSS and JS files will only be included once (default template)

@ewels ewels released this Aug 4, 2018 · 1234 commits to master since this release

Some of these updates are thanks to the efforts of people who attended the NASPM 2018 MultiQC hackathon session. Thanks to everyone who attended!

New Modules:

  • fastp
    • An ultra-fast all-in-one FASTQ preprocessor (QC, adapters, trimming, filtering, splitting...)
    • Module started by @florianduclot and completed by @ewels
  • hap.py
    • Hap.py is a set of programs based on htslib to benchmark variant calls against gold standard truth datasets
    • Module written by @tsnowlan
  • Long Ranger
    • Works with data from the 10X Genomics Chromium. Performs sample demultiplexing, barcode processing, alignment, quality control, variant calling, phasing, and structural variant calling.
    • Module written by @remiolsen
  • miRTrace
    • A quality control software for small RNA sequencing data.
    • Module written by @chuan-wang

Module updates:

  • BCFtools
  • BBMap
    • Support added for BBDuk kmer-based adapter/contaminant filtering summary stats (@boulund)
  • FastQC
    • New read count plot, split into unique and duplicate reads if possible.
    • Help text added for all sections, mostly copied from the excellent FastQC help.
    • Sequence duplication plot rescaled
  • FastQ Screen
    • Samples in large-sample-number plot are now sorted alphabetically (@hassanfa)
  • MACS2
    • Output is now more tolerant of missing data (no plot if no data)
  • Peddy
    • Background samples now shown in ancestry PCA plot (@roryk)
    • New plot showing sex checks versus het ratios, supporting unknowns (@oyvinev)
  • Picard
    • New submodule to handle ValidateSamFile reports (@cpavanrun)
    • WGSMetrics now add the mean and standard-deviation coverage to the general stats table (hidden) (@cpavanrun)
  • Preseq
    • New config option to plot preseq plots with unique old coverage on the y axis instead of read count
    • Code refactoring by @vladsaveliev
  • QUAST
    • Null values (-) in reports now handled properly. Bargraphs always shown despite varying thresholds. (@vladsaveliev)
  • RNA-SeQC
    • Don't create the report section for Gene Body Coverage if no data is given
  • Samtools
    • Fixed edge case bug where MultiQC could crash if a sample had zero count coverage with idxstats.
    • Adds % proper pairs to general stats table
  • Skewer
    • Read length plot rescaled
  • Tophat
    • Fixed bug where some samples could be given a blank sample name (@lparsons)
  • VerifyBamID
    • Change column header help text for contamination to match percentage output (@chapmanb)

New MultiQC Features:

  • New config option remove_sections to skip specific report sections from modules
  • Add path_filters_exclude to exclude certain files when running modules multiple times. You could previously only include certain files.
  • New exclude_* keys for file search patterns
    • Supply a subset of patterns to exclude otherwise-detected files, by filename or contents
  • Command line options all now use mid-word hyphens (not a mix of hyphens and underscores)
    • Old underscore terms still maintained for backwards compatibility
  • Flag --view-tags now works without requiring an "analysis directory".
  • Removed Python dependency for enum34 (@boulund)
  • Columns can be added to General Stats table for custom content/module.
  • New --ignore-symlinks flag which will ignore symlinked directories and files.
  • New --no-megaqc-upload flag which disables automatically uploading data to MegaQC

Bug Fixes

  • Fix path_filters for top_modules/module_order configuration only selecting if all globs match. It now filters searches that match any glob.
  • Empty sample names from cleaning are now no longer allowed
  • Stop prepend_dirs set in the config from getting clobbered by an unpassed CLI option (@tsnowlan)
  • Modules running multiple times now have multiple sets of columns in the General Statistics table again, instead of overwriting one another.
  • Prevent tables from clobbering sorted row orders.
  • Fix linegraph and scatter plots data conversion (sporadically the incorrect ymax was used to drop data points) (@cpavanrun)
  • Adjusted behavior of ceiling and floor axis limits
  • Adjusted multiple file search patterns to make them more specific
    • Prevents the wrong module from accidentally slurping up output from a different tool. By @cpavanrun (see PR #727)
  • Fixed broken report bar plots when -p/--export-plots was specified (see issue #801)
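
The path_filters fix in the list above boils down to switching from "all globs must match" to "any glob may match". A minimal sketch of the two behaviours, using Python's fnmatch and a hypothetical helper passes_path_filters (not MultiQC's own function), with made-up filenames:

```python
from fnmatch import fnmatch


def passes_path_filters(path, path_filters):
    """Return True if path matches at least one glob, or if no filters are set."""
    if not path_filters:
        return True
    return any(fnmatch(path, pattern) for pattern in path_filters)


filters = ["*_trimmed_fastqc.zip", "*_raw_fastqc.zip"]
files = ["s1_raw_fastqc.zip", "s1_trimmed_fastqc.zip", "s1_other.txt"]

# New behaviour: keep a file when any glob matches
kept_any = [f for f in files if passes_path_filters(f, filters)]

# Old behaviour: a file had to match every glob, which rarely happens
kept_all = [f for f in files if all(fnmatch(f, p) for p in filters)]

print(kept_any)
print(kept_all)
```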

@ewels ewels released this Mar 15, 2018 · 1469 commits to master since this release

New Modules:

  • DeDup - New module!
    • DeDup: Improved Duplicate Removal for merged/collapsed reads in ancient DNA analysis
    • Module written by @apeltzer
  • Clip&Merge - New module!
    • Clip&Merge: Adapter clipping and read merging for ancient DNA analysis
    • Module written by @apeltzer

Module updates:

  • bcl2fastq
    • Catch ZeroDivisionError exceptions when there are 0 reads (@aledj2)
    • Add parsing of TrimmedBases and new General Stats column for % bases trimmed (@matthdsm).
  • BUSCO
    • Fixed configuration bug that made all sample names become 'short'
  • Custom Content
    • Parsed tables now exported to multiqc_data files
  • Cutadapt
    • Refactor parsing code to collect all length trimming plots
  • FastQC
    • Fixed starting y-axis label for GC-content lineplot being incorrect.
  • HiCExplorer
    • Updated to work with v2.0 release.
  • Homer
    • Made parsing of tagInfo.txt file more resilient to variations in file format so that it works with new versions of Homer.
    • Kept order of chromosomes in coverage plot consistent.
  • Peddy
    • Switch Sex error logic to Correct sex for better highlighting (@aledj2)
  • Picard
    • Updated module and search patterns to recognise new output format from Picard version >= 2.16 and GATK output.
  • Qualimap BamQC
    • Fixed bug where start of Genome Fraction could have a step if target is 100% covered.
  • RNA-SeQC
    • Added rRNA alignment stats to summary table (@Rolandde)
  • RSeQC
    • Fixed read distribution plot by adding category for other_intergenic (thanks to @moxgreen)
    • Fixed a dodgy plot title (Read GC content)
  • Supernova
    • Added support for Supernova 2.0 reports. Fixed a TypeError bug when using txt reports only. Also a bug when parsing empty histogram files.

New MultiQC Features:

  • Invalid choices for --module or --exclude now list the available modules alphabetically.
  • Linting now checks for presence in config.module_order and tags.

Bug Fixes

  • Excluding modules now works in combination with using module tags.
  • Fixed edge-case bug where certain combinations of output_fn_name and data_dir_name could trigger a crash
  • Conditional formatting - values are now no longer double-labelled
  • Made config option extra_series work in scatter plots the same way that it works for line plots
  • Locked the matplotlib version to v2.1.0 and below
    • Due to two bugs that appeared in v2.2.0 - will remove this constraint when there's a new release that works again.

@ewels ewels released this Jan 11, 2018 · 1561 commits to master since this release

A slightly earlier-than-expected release due to a new problem with dependency packages that has been breaking MultiQC installations since 2018-01-11.

New Modules:

  • Sargasso
    • Parses output from Sargasso - a tool to separate mixed-species RNA-seq reads according to their species of origin
    • Module written by @hxin
  • VerifyBAMID
    • Parses output from VerifyBAMID - a tool to detect contamination in BAM files.
    • Adds the CHIPMIX and FREEMIX columns to the general statistics table.
    • Module written by @aledj2

Module updates:

  • MACS2
    • Updated to work with output from older versions of MACS2 by @avilella
  • Peddy
    • Add het check plot to suggest potential contamination by @aledj2
  • Picard
    • Picard HsMetrics HS_PENALTY plot now has correct axis labels
    • InsertSizeMetrics switches commas for points if it can't convert floats. Should help some European users.
  • QoRTs
    • Added support for new style of output generated in the v1.3.0 release
  • Qualimap
    • New Error rate column in General Statistics table, added by @Cashalow
      • Hidden by default - customise your MultiQC config to always show this column (see docs)
  • QUAST
    • New option to customise the default display of contig count and length (eg. bp instead of Mbp).
    • See documentation. Written by @ewels and @Cashalow
  • RSeQC
    • Removed normalisation in Junction Saturation plot. Now raw counts instead of % of total junctions.
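
The Picard InsertSizeMetrics change above (switching commas for points when a float can't be converted) can be sketched in a couple of lines. The function parse_metric_float is a hypothetical name and this is only an illustration of the fallback idea, not the module's code.

```python
def parse_metric_float(text):
    """Parse a number, retrying with ',' swapped for '.' if the first attempt fails."""
    try:
        return float(text)
    except ValueError:
        # Locales that use a decimal comma write 251,3 instead of 251.3
        return float(text.replace(",", "."))


with_point = parse_metric_float("251.3")
with_comma = parse_metric_float("251,3")
print(with_point, with_comma)
```

Note that a fallback like this does not handle thousands separators: a value such as 1,234.5 would still raise a ValueError.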

New MultiQC Features:

  • Conditional formatting / highlighting of cell contents in tables
    • If you want to make values that match a criterion stand out more, you can now write custom rules and formatting instructions for tables.
    • For instructions, see the documentation
  • New --lint option which is strict about best-practices for writing new modules
    • Useful when writing new modules and code as it throws warnings
    • Currently only implemented for bar plots and a few other places. More linting coming soon...
  • If MultiQC breaks and shows an error message, it now reports the filename of the last log it found
    • Hopefully this will help with debugging / finding dodgy input data

Bug Fixes

  • Addressed new dependency error with conflicting package requirements
    • There was a conflict between the networkx, colormath and spectra releases.
    • I previously forced certain software versions to get around this, but spectra has now updated with the unfortunate effect of introducing a new dependency clash that halts installation.
  • Fixed newly introduced bug where Custom Content MultiQC config file search patterns had been broken
  • Updated pandoc command used in --pdf to work with new releases of Pandoc
  • Made config table_columns_visible module name key matching case insensitive to make it less frustrating
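
As a small sketch of what the table_columns_visible fix means in practice: the module name key should now match regardless of case. The column name used here (percent_duplicates) is assumed to be a FastQC General Statistics column ID, so treat it as an example rather than a guarantee.

```yaml
# multiqc_config.yaml (sketch)
table_columns_visible:
    fastqc:                        # should now match "FastQC", "fastqc", etc.
        percent_duplicates: False  # assumed column ID, hide this column by default
```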

@ewels ewels released this Nov 3, 2017 · 1672 commits to master since this release

There are 34 merged pull-requests in this release - a fantastic example of how an open source community can develop a tool! Many thanks to everyone involved for their hard work.

Breaking changes - custom search patterns

Only for users with custom search patterns for the bowtie or star modules: you will
need to update your config files - the bowtie search key is now bowtie1, and
star_genecounts is now star/genecounts.

For users with custom modules - search patterns must now conform to the search
pattern naming convention: modulename or modulename/anything (the search pattern
string beginning with the name of your module, anything you like after the first /).
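
For example, an updated config might look roughly like the sketch below. It assumes that custom search patterns live under the sp key, as in the MultiQC docs. The filename patterns and the mymodule name are hypothetical placeholders, not the real defaults.

```yaml
# multiqc_config.yaml (sketch with hypothetical filename patterns)
sp:
    bowtie1:                  # previously: bowtie
        fn: '*_bowtie.log'
    star/genecounts:          # previously: star_genecounts
        fn: '*ReadsPerGene.out.tab'
    mymodule/summary:         # custom module: modulename/anything
        fn: '*_mymodule_summary.txt'
```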

New Modules:

  • 10X Supernova
    • Parses statistics from the de-novo Supernova software.
    • Module written by @remiolsen
  • BBMap
    • Plot metrics from a number of BBMap tools, a suite of DNA/RNA mapping tools and utilities
    • Module written by @boulund and @epruesse
  • deepTools - new module!
    • Parse text output from bamPEFragmentSize, estimateReadFiltering, plotCoverage, plotEnrichment, and plotFingerprint
    • Module written by @dpryan79
  • Homer Tag Directory - new submodule!
  • Illumina InterOp
    • Module to parse metrics from Illumina sequencing runs and demultiplexing, generated by the InterOp package
    • Module written by @matthdsm
  • RSEM - new module!
    • Parse the .cnt file coming from rsem-calculate-expression and plot the read distribution (Unalignable, Unique, Multi ...)
    • Module written by @noirot
  • HiCExplorer
    • New module to parse the log files of hicBuildMatrix.
    • Module written by @joachimwolff

Module updates:

  • AfterQC
    • Handle new output format where JSON summary key changed names.
  • bcl2fastq
    • Clusters per sample plot now has a tab where counts are categorised by lane.
  • GATK
    • New submodule to handle Base Recalibrator stats, written by @winni2k
  • HISAT2
    • Fixed bug where plot title was incorrect if both SE and PE bargraphs were in one report
  • Picard HsMetrics
    • Parsing code can now handle commas for decimal places
  • Preseq
    • Updated odd file-search pattern that limited input files to 500kb
  • QoRTs
    • Added new plots, new helptext and updated the module to produce a lot more output.
  • Qualimap BamQC
    • Fixed edge-case bug where the refactored coverage plot code could raise an error from the range call.
  • Documentation and link fixes for Slamdunk, GATK, bcl2fastq, Adapter Removal, FastQC and main docs
  • Went through all modules and standardised plot titles
    • All plots should now have a title with the format Module name: Plot name

New MultiQC Features:

  • New MultiQC docker image
  • New module_order config options allow modules to be run multiple times
    • Filters mean that a module can be run twice with different sets of files (eg. before and after trimming)
    • Custom module config parameters can be passed to module for each run
  • File search refactored to only search for running modules
    • Makes search much faster when running with lots of files and limited modules
    • For example, if using -m star to only use the STAR module, all other file searches now skipped
  • File search now warns if an unrecognised search type is given
  • MultiQC now saves nearly all parsed data to a structured output file by default
    • See multiqc_data/multiqc_data.json
    • This can be turned off by setting config.data_dump_file: false
  • Standardised verbose logging when no log files are found. Less duplication in code, and logs are easier to read!
  • New documentation section describing how to use MultiQC with Galaxy
  • Using shared_key: 'read_counts' in table header configs now applies relevant defaults
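
As a sketch of the module_order and data_dump_file options described above, a config might look something like this. The name, info and path_filters keys follow the MultiQC docs as I understand them; the filename patterns are hypothetical and need adapting to your own files.

```yaml
# multiqc_config.yaml (sketch)
module_order:
    - fastqc:
        name: 'FastQC (trimmed)'
        info: 'FastQC results after adapter trimming.'
        path_filters:
            - '*_trimmed_fastqc.zip'   # hypothetical filename pattern
    - cutadapt
    - fastqc:
        name: 'FastQC (raw)'
        path_filters:
            - '*_raw_fastqc.zip'       # hypothetical filename pattern

# Turn off the structured multiqc_data/multiqc_data.json output
data_dump_file: false
```

Here FastQC runs twice with different file filters, with Cutadapt in between, so the report shows QC before and after trimming.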

Bug Fixes

  • Installation problem caused by changes in upstream dependencies solved by stricter installation requirements
  • Minor default_dev directory creation bug squashed
  • Don't prepend the directory separator (|) to sample names with -d when there are no subdirs
  • yPlotLines now works even if you don't set width

@ewels ewels released this Aug 16, 2017 · 1881 commits to master since this release

CodeFest 2017 Contributions

We had a fantastic group effort on MultiQC at the 2017 BOSC CodeFest.
Many thanks to those involved!

New Modules:

  • AfterQC - New module!
    • Added parsing of the AfterQC json file data, with a plot of filtered reads.
    • Work by @raonyguimaraes
  • bcl2fastq
    • bcl2fastq can be used to both demultiplex data and convert BCL files to FASTQ file formats for downstream analysis
    • New module parses JSON output from recent versions and summarises some key statistics from the demultiplexing process.
    • Work by @iimog (with a little help from @tbooth and @ewels)
  • leeHom
    • leeHom is a program for the Bayesian reconstruction of ancient DNA
  • VCFTools
    • Added initial support for VCFTools relatedness2
    • Added support for VCFTools TsTv-by-count TsTv-by-qual TsTv-summary
    • Module written by @mwhamgenomics

Module updates:

  • FastQ Screen
    • Gracefully handle missing data from very old FastQ Screen versions.
  • RNA-SeQC
    • Add new transcript-associated reads plot.
  • Picard
    • New submodule to handle output from TargetedPcrMetrics
  • Prokka
    • Added parsing of the # CRISPR arrays data from Prokka when available (@asetGem)
  • Qualimap
    • Some code refactoring to radically improve performance and run times, especially with high coverage datasets.
    • Fixed bug where Cumulative coverage genome fraction plot could be truncated.

New MultiQC Features:

  • New module help text
    • Lots of additional help text was written to make MultiQC report plots easier to interpret.
    • Updated modules:
      • Bowtie
      • Bowtie 2
      • Prokka
      • Qualimap
      • SnpEff
    • Elite team of help-writers:
  • New config option section_comments allows you to add custom comments above specific sections in the report
  • New --tags and --view_tags command line options
    • Modules can now be given tags (keywords) and filtered by those. So running --tags RNA will only run MultiQC modules related to RNA analysis.
    • Work by @Hammarn
  • Back-end configuration options to specify the order of table columns
    • Modules and user configs can set priorities for columns to customise where they are displayed
    • Work by @tbooth
  • Added framework for proper unit testing
    • Previous start on unit tests tidied up, new blank template and tests for the clean_sample_name functionality.
    • Added to Travis and Appveyor for continuous integration testing.
    • Work by @tbooth
  • Bug fixes and refactoring of report configuration saving / loading
    • Discovered and fixed a bug where a report config could only be loaded once
    • Work by @DennisSchwartz
  • Table column row headers (sample names) can now be numeric-only.
  • Improved sample name cleaning functionality
    • Added option regex_keep to clean filenames by keeping the matching part of a pattern
    • Work by @robinandeer
  • Handle error when invalid regexes are given in reports
    • Now have a nice toast error warning you and the invalid regexes are highlighted
    • Previously this just crashed the whole report without any warning
    • Work by @robinandeer
  • Command line option --dirs-depth now sets -d to True (so now works even if -d isn't also specified).
  • New config option config.data_dump_file to export as much data as possible to multiqc_data/multiqc_data.json
  • New code to send exported JSON data to a web server
    • This is in preparation for the upcoming MegaQC project. Stay tuned!
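
A minimal sketch of two of the new config options above, section_comments and the regex_keep sample name cleaning. The section IDs and the pattern are illustrative only. Using extra_fn_clean_exts to append to the default cleaning rules (instead of replacing them) is an assumption based on the MultiQC docs, so check the docs for your version.

```yaml
# multiqc_config.yaml (sketch)
section_comments:
    # Keys are report section IDs; these two are examples
    featurecounts: 'Counts were generated against the Ensembl annotation.'
    star_alignments: 'Samples below 70% uniquely mapped were re-sequenced.'

extra_fn_clean_exts:
    - type: regex_keep
      pattern: '[A-Z]{3}[1-9]{4}[A,B][1-9]'   # keep only the part of the name that matches

# Export as much parsed data as possible to multiqc_data/multiqc_data.json
data_dump_file: true
```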

Bug Fixes:

  • Specifying multiple config files with -c/--config now works as expected
    • Previously this would only read the last specified
  • Fixed table rendering bug that affected Chrome v60 and IE7-11
    • Table cell background bars weren't showing up. Updated CSS to get around this rendering error.
  • HTML ID cleanup now properly cleans strings so that they work with jQuery as expected.
  • Made bar graph sample highlighting work properly again
  • Config custom_logo paths can now be relative to the config file (or absolute as before)
  • Report doesn't keep annoyingly telling you that toolbox changes haven't been applied
    • Now uses more subtle toasts and only when you close the toolbox (not every click).
  • Switching report toolbox options to regex mode now enables the Apply button as it should.
  • Sorting table columns with certain suffixes (eg. 13X) now works properly (numerically)
  • Fixed minor bug in line plot data smoothing (now works with unsorted keys)

@ewels ewels released this Jul 18, 2017 · 2073 commits to master since this release

New Modules:

  • BioBloom Tools
    • Creates Bloom filters for a given reference and then uses them to categorize sequences
  • Conpair
    • Concordance and contamination estimator for tumor–normal pairs
  • Disambiguate
    • Bargraph displaying the percentage of reads aligning to two different reference genomes.
  • Flexbar
    • Flexbar is a tool for flexible barcode and adapter removal.
  • HISAT2
    • New module for the HISAT2 aligner.
    • Made possible by updates to HISAT2 logging by @infphilo (requires --new-summary HISAT2 flag).
  • HOMER
    • Support for summary statistics from the findPeaks tool.
  • Jellyfish
    • Histograms to estimate library complexity and coverage from k-mer content.
    • Module written by @vezzi
  • MACS2
    • Summary of redundant rate from MACS2 peak calling.
  • QoRTs
    • QoRTs is a toolkit for analysis, QC and data management of RNA-Seq datasets.
  • THetA2
    • THetA2 (Tumor Heterogeneity Analysis) estimates tumour purity and clonal / subclonal copy number.

Module updates:

  • BCFtools
    • Option to collapse complementary changes in substitutions plot, useful for non-strand specific experiments (thanks to @vladsaveliev)
  • Bismark
    • M-Bias plots no longer show read 2 for single-end data.
  • Custom Content
    • New option to print raw HTML content to the report.
  • FastQ Screen
    • Fixed edge-case bug where many-sample plot broke if total number of reads was less than the subsample number.
    • Fixed incorrect logic of config option fastqscreen_simpleplot (thanks to @daler)
    • Organisms now alphabetically sorted in fancy plot so that order is nonrandom (thanks to @daler)
    • Fixed bug where %No Hits was missed in logs from recent versions of FastQ Screen.
  • HTSeq Counts
    • Fixed bug so that the module still works when --additional-attr is specified in HTSeq v0.8 and above (thanks to @nalcala)
  • Picard
    • CollectInsertSize: Fixed bug that could make the General Statistics Median Insert Size value incorrect.
    • Fixed error in sample name regex that left trailing ] characters and was generally broken (thanks to @jyh1 for spotting this)
  • Preseq
  • Qualimap
    • Only calculate bases over target coverage for values in General Statistics. Should give a speed increase for very high coverage datasets.
  • QUAST
  • RSeQC
    • Changed default order of sections
    • Added config option to reorder and hide module report sections

New MultiQC features:

  • If a report already exists, execution is no longer halted.
    • _1 is appended to the filename, iterating if this also exists.
    • -f/--force still overwrites existing reports as before
    • Feature written by @Hammarn
  • New ability to run modules multiple times in a single report
    • Each run can be given different configuration options, including filters for input files
    • For example, have FastQC after trimming as well as FastQC before trimming.
    • See the relevant documentation for more instructions.
  • New option to customise the order of report sections
    • This is in addition / alternative to changing the order of module execution
    • Allows one module to have sections in multiple places (eg. Custom Content)
  • Tables have new column options floor, ceiling and minRange.
  • Reports show warning if JavaScript is disabled
  • Config option custom_logo now works with file paths relative to config file directory and cwd.
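
A rough sketch of how the section ordering and custom_logo options above might be used. The report_section_order key name and its order / before / after fields are taken from the MultiQC docs as I remember them; the section and module IDs and the logo path are hypothetical.

```yaml
# multiqc_config.yaml (sketch with hypothetical section IDs)
report_section_order:
    my_custom_section:
        order: -1000              # lower numbers are assumed to sort towards the bottom
    another_section:
        after: 'my_custom_section'
    fastqc:
        before: 'cutadapt'

# Path can be relative to the config file directory or the cwd
custom_logo: 'assets/facility_logo.png'
```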

Bug Fixes:

  • Table headers now sort columns again after scrolling the table
  • Fixed buggy table header tooltips
  • Base clean_s_name function now strips excess whitespace.
  • Line graphs don't smooth lines if not needed (number of points < maximum number allowed)
  • PDF output now respects custom output directory.

@ewels ewels released this May 17, 2017 · 2193 commits to master since this release

Version 1.0! This release has been a long time coming and brings with it some fairly
major improvements in speed, report filesize and report performance. There's also
a bunch of new modules, more options, features and a whole lot of bug fixes.

The version number is being bumped up to 1.0 for a couple of reasons:

  1. MultiQC is now (hopefully) relatively stable. A number of facilities and users
    are now using it in a production setting and it's published. It feels like it
    probably deserves v1 status now somehow.
  2. This update brings some fairly major changes which will break backwards
    compatibility for plugins. As such, semantic versioning suggests a change in
    major version number.

Breaking Changes

For most people, you shouldn't have any problems upgrading. There are two
scenarios where you may need to make changes with this update:

1. You have custom file search patterns

Search patterns have been flattened and may no longer have arbitrary depth.
For example, you may need to change the following:

fastqc:
    data:
        fn: 'fastqc_data.txt'
    zip:
        fn: '*_fastqc.zip'

to this:

fastqc/data:
    fn: 'fastqc_data.txt'
fastqc/zip:
    fn: '*_fastqc.zip'

See the documentation for instructions on how to write the new file search syntax.

See search_patterns.yaml for the new module search keys
and more examples.

2. You have custom plugins / modules / external code

To see what changes need to be applied to your custom plugin code, please see the MultiQC docs.

Module updates:

  • Adapter Removal - new module!
    • AdapterRemoval v2 - rapid adapter trimming, identification, and read merging
  • BUSCO - new module!
    • New module for the BUSCO v2 tool, used for assessing genome assembly and annotation completeness.
  • Cluster Flow - new module!
    • Cluster Flow is a workflow tool for bioinformatics pipelines. The new module parses executed tool commands.
  • RNA-SeQC - new module!
    • New module to parse output from RNA-SeQC, a java program which computes a series
      of quality control metrics for RNA-seq data.
  • goleft indexcov - new module! Thanks to @chapmanb and @brentp
    • goleft indexcov uses the PED and ROC
      data files to create diagnostic plots of coverage per sample, helping to identify sample gender and coverage issues.
  • SortMeRNA - new module! Written by @bschiffthaler
    • New module for SortMeRNA, commonly used for removing rRNA contamination from datasets.
  • BCFtools
    • Fixed bug with display of indels when only one sample
  • Cutadapt
    • Now takes the filename if the sample name is - (stdin). Thanks to @tdido
  • FastQC
    • Data for the Sequence content plot can now be downloaded from reports as a JSON file.
  • FastQ Screen
    • Rewritten plotting method for high sample numbers plot (~ > 20 samples)
    • Now shows counts for single-species hits and bins all multi-species hits
    • Allows plot to show proper percentage view for each sample, much easier to interpret.
  • HTSeq
    • Fix bug where header lines caused module to crash
  • Picard
    • New RrbsSummaryMetrics Submodule!
    • New WgsMetrics Submodule!
    • CollectGcBiasMetrics module now prints summary statistics to multiqc_data if found. Thanks to @ahvigil
  • Preseq
    • Now trims the x axis to the point that meets 90% of min(unique molecules).
      Hopefully prevents ridiculous x axes without sacrificing too much useful information.
    • Can now show estimated depth of coverage instead of less informative molecule counts
      (see details).
    • Plots dots with externally calculated real read counts (see details).
  • Qualimap
    • RNASeq Transcript Profile now has correct axis units. Thanks to @roryk
    • BamQC module now doesn't crash if reports don't have genome gc distributions
  • RSeQC
    • Fixed Python3 error in Junction Saturation code
    • Fixed JS error for Junction Saturation that made the single-sample combined plot only show All Junctions

Core MultiQC updates:

  • Change in module structure and import statements (see details).
  • Module file search has been rewritten (see above changes to configs)
    • Significant improvement in search speed (test dataset runs in approximately half the time)
    • More options for modules to find their logs, eg. filename and contents matching regexes (see the docs)
  • Report plot data is now compressed, significantly reducing report filesizes.
  • New --ignore-samples option to skip samples based on parsed sample name
    • Alternative to filtering by input filename, which doesn't always work
    • Also can use config vars sample_names_ignore (glob patterns) and sample_names_ignore_re (regex patterns).
  • New --sample-names command line option to give file with alternative sample names
    • Allows one-click batch renaming in reports
  • New --cl_config option to supply MultiQC config YAML directly on the command line.
  • New config option to change numeric multiplier in General Stats
    • For example, if reports have few reads, can show Thousands of Reads instead of Millions of Reads
    • Set config options read_count_multiplier, read_count_prefix and read_count_desc
  • Config options decimalPoint_format and thousandsSep_format now apply to tables as well as plots
    • By default, thousands will now be separated with a space and . used for decimal places.
  • Tables now have a maximum-height by default and scroll within this.
    • Speeds up report rendering in the web browser and makes report less stupidly long with lots of samples
    • Button beneath table toggles full length if you want a zoomed-out view
    • Refactored and removed previous code to make the table header "float"
    • Set config.collapse_tables to False to disable table maximum-heights
  • Bar graphs and heatmaps can now be zoomed in on
    • Interactive plots sometimes hide labels due to lack of space. These can now be zoomed in on to see specific samples in more detail.
  • Report plots now load sequentially instead of all at once
    • Prevents the browser from locking up when large reports load
  • Report plot and section HTML IDs are now sanitised and checked for duplicates
  • New template available (called sections) which has faster loading
    • Only shows results from one module at a time
    • Makes big reports load in the browser much more quickly, but requires more clicking
    • Try it out by specifying -t sections
  • Module sections tidied and refactored
    • New helper function self.add_section()
    • Sections hidden in nav if no title (no more need for the hacky self.intro += )
    • Content broken into description, help and plot, with automatic formatting
    • Empty module sections are now skipped in reports. No need to check if a plot function returns None!
    • Changes should be backwards-compatible
  • Report plot data export code refactored
    • Now doesn't export hidden samples (uses HighCharts export-csv plugin)
  • Handle error when git isn't installed on the system.
  • Refactored colouring of table cells
    • Was previously done in the browser using chroma.js
    • Now done at report generation time using the spectra package
    • Should hopefully speed up report rendering time in the web browser, especially for large reports
  • Docs updates (thanks to @varemo)
  • Previously hidden log file .multiqc.log renamed to multiqc.log in multiqc_data
  • Added option to load MultiQC config file from a path specified in the environment variable MULTIQC_CONFIG_PATH
  • New table configuration options
    • sortRows: False prevents table rows from being sorted alphabetically
    • col1_header allows the default first column header to be changed from "Sample Name"
  • Tables no longer show Configure Columns and Plot buttons if they only have a single column
  • Custom content updates
    • New custom_content/order config option to specify order of Custom Content sections
    • Tables now use the header for the first column instead of always having Sample Name
    • JSON + YAML tables now remember order of table columns
    • Many minor bugfixes
  • Line graphs and scatter graphs axis limits
    • If limits are specified, data exceeding this is no longer saved in report
    • Visually identical, but can make report file sizes considerably smaller in some cases
  • Creating multiple plots without a config dict now works (previously just gave grey boxes in report)
  • All changes are now tested on a Windows system, using AppVeyor
  • Fixed rare error where some reports could get empty General Statistics tables when no data present.
  • Fixed minor bug where config option force: true didn't work. Now you don't have to always specify -f!
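
Several of the new config options listed above can be combined in a single config file. The sketch below uses the option names given in this changelog; the values, sample name patterns and Custom Content section IDs are made up for illustration.

```yaml
# multiqc_config.yaml (sketch, values are examples only)

# Skip samples based on the parsed sample name
sample_names_ignore:
    - 'negative_control*'            # glob patterns
sample_names_ignore_re:
    - '^test_run_\d+$'               # regex patterns

# Show Thousands of Reads instead of Millions of Reads in General Stats
read_count_multiplier: 0.001
read_count_prefix: 'K'
read_count_desc: 'thousands'

# Number formatting for tables and plots
decimalPoint_format: ','
thousandsSep_format: '.'

# Disable the default table maximum-height
collapse_tables: false

# Order of Custom Content sections (hypothetical section IDs)
custom_content:
    order:
        - first_cc_section
        - second_cc_section

# Always overwrite existing reports, same as -f
force: true
```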