A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
Report
generated on 2023-11-10, 20:37 UTC
based on data in:
/src/MultiQC_TestData/data/modules/kraken
General Statistics
Showing 12/12 rows and 3/3 columns.

Sample Name | % Peptoclostridium difficile | % Top 5 Species | % Unclassified |
---|---|---|---|
GSR-SWIFT-2021-5-04-FF08511340 | 0.0% | 0.0% | 100.0% |
genus_only.kreport | 0.0% | 1.4% | |
metagenome.kreport | 0.0% | 46.0% | 54.0% |
sample1 | 0.0% | 0.3% | 99.7% |
sample2 | 0.0% | 99.4% | 0.3% |
sample_01 | 82.3% | 82.3% | 15.8% |
sample_02 | 87.3% | 87.3% | 11.7% |
sample_03 | 88.8% | 88.8% | 10.3% |
sample_04 | 89.7% | 89.7% | 9.3% |
test | 0.0% | 100.0% | 0.0% |
test1.kreport | 0.0% | 100.0% | |
test2.kreport | 0.0% | 0.0% | 100.0% |
Kraken
Kraken is a taxonomic classification tool that uses exact k-mer matches to find the lowest common ancestor (LCA) of a given sequence. DOI: 10.1186/gb-2014-15-3-r46.
Top taxa
The number of reads falling into the top 5 taxa across different ranks.
To make this plot, the percentage of each sample assigned to a given taxon is summed across all samples. The counts for the 5 taxa with the highest summed percentage are then plotted for each of the 9 taxonomic ranks. The unclassified count is always shown across all ranks.
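The selection step described above can be sketched as follows. This is a minimal illustration rather than the report generator's actual code: the function name `top_taxa`, the input layout (sample name mapped to per-taxon percentages) and the example values are all assumptions made for the sketch.

```python
from collections import defaultdict


def top_taxa(pct_by_sample, n=5):
    """Return the n taxa with the largest percentage summed over all samples.

    pct_by_sample: {sample_name: {taxon_name: percent_of_sample}} (assumed layout).
    """
    totals = defaultdict(float)
    for taxa in pct_by_sample.values():
        for taxon, pct in taxa.items():
            totals[taxon] += pct
    # Largest summed percentage first; ties are broken by name so the order is stable.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [taxon for taxon, _ in ranked[:n]]


# Hypothetical percentages at a single rank, not taken from this report.
example = {
    "s1": {"A": 50.0, "B": 30.0, "C": 1.0},
    "s2": {"A": 10.0, "C": 2.0, "D": 80.0},
}
print(top_taxa(example, n=2))  # ['D', 'A']
```

Because the ranking uses percentages summed over samples, a taxon that dominates one sample (like `D` here) can outrank one that is moderately present everywhere.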
The total number of reads is approximated by dividing the number of unclassified reads by the percentage of the library that they account for.
Note that this is only an approximation, and that Kraken percentages don't always add up to exactly 100%.
The category "Other" shows the difference between the above total read count and the sum of the read counts in the top 5 taxa shown plus the unclassified reads. This should cover all taxa not in the top 5, +/- any rounding errors.
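The arithmetic behind the total read estimate and the "Other" category can be sketched as below. The function names and the example numbers are hypothetical, and returning `None` when the unclassified percentage is zero is an assumption of this sketch; the report text does not say how that case is handled.

```python
def estimate_total_reads(unclassified_reads, unclassified_pct):
    """Approximate library size from the unclassified read count and its percentage."""
    if unclassified_pct <= 0:
        # Assumption: the estimate is treated as undefined when nothing is unclassified.
        return None
    return unclassified_reads * 100.0 / unclassified_pct


def other_reads(total_reads, top_counts, unclassified_reads):
    """Reads not accounted for by the top taxa shown or by the unclassified reads."""
    return total_reads - sum(top_counts) - unclassified_reads


# Hypothetical sample: 2500 unclassified reads making up 25.0% of the library.
total = estimate_total_reads(2500, 25.0)
other = other_reads(total, [4000, 2000, 500, 300, 200], 2500)
print(total, other)
```

Since the percentages are rounded, the estimate can drift slightly from the true library size, which is why "Other" may be off by a small amount.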
Note that any taxon that does not exactly fit a taxon rank (e.g. - or G2) is ignored.
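A rank filter of this kind might look like the sketch below. The set of nine single-letter codes is an assumption based on the rank codes used in Kraken 2 reports (R, D, K, P, C, O, F, G, S, with U for unclassified); the row layout and taxon names are hypothetical.

```python
# Assumption: the 9 standard ranks, written as Kraken 2 style one-letter rank codes.
STANDARD_RANKS = frozenset("RDKPCOFGS")


def keep_row(rank_code):
    """Keep unclassified rows and rows whose code is exactly one standard rank."""
    return rank_code == "U" or rank_code in STANDARD_RANKS


# Hypothetical (rank_code, taxon_name) rows.
rows = [
    ("U", "unclassified"),
    ("G", "genus_a"),
    ("G2", "subgroup_of_genus_a"),  # intermediate rank: dropped
    ("-", "no_rank_entry"),         # no standard rank: dropped
    ("S", "species_a"),
]
kept = [name for code, name in rows if keep_row(code)]
print(kept)  # ['unclassified', 'genus_a', 'species_a']
```

Using a set of whole codes, rather than a substring check against a string of letters, ensures that codes such as `G2` or an empty string are never accepted by accident.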
Duplication rate of top species
The duplication rate of minimizers falling into the top 5 species.
To make this plot, the minimizer duplication rate is computed for the top 5 most abundant species in all samples.
The minimizer duplication rate is defined as: duplication rate = (total number of minimizers / number of distinct minimizers)
A low coverage combined with a high duplication rate (>> 1) is often a sign of read stacking, which probably indicates a false positive hit.
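The ratio defined above is simple enough to show directly. In this sketch the function name and the example counts are hypothetical, and returning `None` when there are no distinct minimizers is an assumption, since the report does not describe that case.

```python
def minimizer_duplication_rate(total_minimizers, distinct_minimizers):
    """duplication rate = total number of minimizers / number of distinct minimizers."""
    if distinct_minimizers <= 0:
        # Assumption: the rate is treated as undefined without distinct minimizers.
        return None
    return total_minimizers / distinct_minimizers


# Hypothetical species: one with mostly unique minimizers, one heavily stacked.
well_spread = minimizer_duplication_rate(1200, 1000)
stacked = minimizer_duplication_rate(5000, 50)
print(well_spread, stacked)
```

A value close to 1 means most minimizers were seen only once, while a value far above 1 (as in the second example) means the same few minimizers were hit repeatedly.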