Kraken2 report not parsed #1428

rrdavis77 · 2021-05-14T18:40:38Z

Description of bug: MultiQC does not parse Kraken2 reports correctly if the unclassified portion is 100%.

MultiQC Error log: No error log reported. MultiQC outputs no data and no errors for the files with 100% unclassified.

File that triggers the error: GSR-SWIFT-2021-5-04-FF08511340.kraken2.report.txt

MultiQC run details (please complete the following):

Command used to run MultiQC: multiqc .
MultiQC Version: MultiQC v1.10.1
Operating System: Ubuntu 16.04
Python Version: Python 3.9.2
Method of MultiQC installation: conda

Additional context

I came across some samples when running nf-core/viralrecon that align 100% to the viral genome and 0% to human. Kraken2 generates a report for every sample submitted. However, when MultiQC is run on samples that report 100% as unclassified, the regex used to search the report file does not match the 100% value, which has three integer digits and no leading whitespace, so that data is missing from the MultiQC HTML report.
I think the problem (for my current report format) is the regex here: https://github.com/ewels/MultiQC/blob/daee3a80e5dd35edea126221f3902673f2ff663c/multiqc/modules/kraken/kraken.py#L118
I tried adjusting the regex to account for this scenario, allowing zero leading whitespace characters and up to three digits before the decimal point. I tried changing it to:
k2_regex = re.compile(r"^\s{0,2}(\d{1,3}\.\d{1,2})\t(\d+)\t(\d+)\t([\dUDKPCOFGS-]{1,3})\t(\d+)(\s+)(.+)")
but that didn't fix it so I am reaching out to the big guns!
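For anyone reproducing this, here is a minimal sketch of how the two conditions described above (no leading whitespace, three integer digits) affect matching. The `proposed` pattern is the one quoted in this comment. The `strict` pattern and both sample lines are assumptions made up for illustration: `strict` is a hypothetical variant that requires leading whitespace and at most two integer digits, not necessarily the pattern in kraken.py, and the lines are not taken from the attached report.

```python
import re

# Pattern proposed in this thread (copied from the comment above).
proposed = re.compile(
    r"^\s{0,2}(\d{1,3}\.\d{1,2})\t(\d+)\t(\d+)\t([\dUDKPCOFGS-]{1,3})\t(\d+)(\s+)(.+)"
)

# ASSUMPTION: hypothetical stricter variant that requires 1-2 leading
# whitespace characters and at most 2 integer digits. It is used here only
# to illustrate the failure mode, not as a quote of the MultiQC source.
strict = re.compile(
    r"^\s{1,2}(\d{1,2}\.\d{1,2})\t(\d+)\t(\d+)\t([\dUDKPCOFGS-]{1,3})\t(\d+)(\s+)(.+)"
)

# ASSUMPTION: made-up tab-separated report lines in the column order
# percent, clade reads, direct reads, rank code, taxid, name.
full_line = "100.00\t1000\t1000\tU\t0\tunclassified"
partial_line = " 50.00\t500\t500\tU\t0\tunclassified"


def percent(pattern, line):
    """Return the captured percentage string, or None if the line does not match."""
    match = pattern.match(line)
    return match.group(1) if match else None


for name, pattern in (("strict", strict), ("proposed", proposed)):
    for line in (full_line, partial_line):
        print(name, repr(line), "->", percent(pattern, line))
```

In this sketch the proposed pattern does match the made-up 100% line. If the real report has the same layout, the regex change alone may not be the whole story, but that is a guess and needs checking against the attached file.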


ewels · 2021-06-15T21:51:47Z

Many thanks for reporting @rrdavis77, and for the suggested fix!

This has just been implemented by @ErikDanielsson in #1453 and seems to work in my hands now too. It's now merged so should be available in version 1.11dev. Let us know if you still hit any problems with it!

Cheers,

Phil

rrdavis77 · 2021-06-16T00:39:02Z

Not sure if this is an OK way to test, but I copied the new regex code into my conda environment and it still failed to pick those up. I'll try downloading the dev version and see what happens with that. I tested on the file linked above.

ewels added the "bug: module" (Bug in a MultiQC module) label May 14, 2021

ErikDanielsson added a commit to ErikDanielsson/MultiQC_TestData that referenced this issue Jun 15, 2021

Added file to reproduce MultiQC/MultiQC#1428

23bbb97

This was referenced Jun 15, 2021

Fixed Kraken2 reports not being parsed #1453

Merged

Added file to reproduce kraken2 file not parsing MultiQC/test-data#195

Merged

ewels closed this as completed Jun 15, 2021

vladsavelyev pushed a commit to vladsavelyev/MultiQC_TestData that referenced this issue Apr 16, 2022

Added file to reproduce MultiQC/MultiQC#1428

d5eb37d
