MultiQC run details (please complete the following):
Command used to run MultiQC: multiqc .
MultiQC Version: MultiQC v1.10.1
Operating System: Ubuntu 16.04
Python Version: Python 3.9.2
Method of MultiQC installation: conda
Additional context
Came across some samples when running nfcore/viralrecon that align 100% to the viral genome and 0% to human. Kraken2 generates a report for every sample submitted. However, when MultiQC is run on samples that report 100% Unclassified, the regex used to search the report file does not match, because the percentage has three digits and no leading whitespace, so that data is not reported in the MultiQC HTML.
I think the problem is with the regex (for my current report format) that lies here: https://github.com/ewels/MultiQC/blob/daee3a80e5dd35edea126221f3902673f2ff663c/multiqc/modules/kraken/kraken.py#L118
I tried adjusting the regex to account for this scenario. When 100% is reported as unclassified, there is no leading whitespace and the percentage has 3 digits. I tried changing it to:
k2_regex = re.compile(r"^\s{0,2}(\d{1,3}\.\d{1,2})\t(\d+)\t(\d+)\t([\dUDKPCOFGS-]{1,3})\t(\d+)(\s+)(.+)")
but that didn't fix it, so I am reaching out to the big guns!
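A quick way to check a candidate pattern in isolation, outside MultiQC, is to run it against single report lines. The sketch below is illustrative only: the two sample lines are made up (not taken from the attached report), and the `strict` pattern is an assumed stand-in for a regex that requires leading whitespace and at most two integer digits, not necessarily the exact pattern at the linked line.

```python
import re

# Pattern proposed in this report: 0-2 leading spaces, 1-3 integer digits.
proposed = re.compile(
    r"^\s{0,2}(\d{1,3}\.\d{1,2})\t(\d+)\t(\d+)\t([\dUDKPCOFGS-]{1,3})\t(\d+)(\s+)(.+)"
)

# Assumed stricter variant for comparison: requires 1-2 leading spaces
# and allows at most two integer digits, so "100.00" cannot match.
strict = re.compile(
    r"^\s{1,2}(\d{1,2}\.\d{1,2})\t(\d+)\t(\d+)\t([\dUDKPCOFGS-]{1,3})\t(\d+)(\s+)(.+)"
)

# Hypothetical kraken2-style report lines (tab-separated columns).
line_100 = "100.00\t1000\t1000\tU\t0\tunclassified"
line_95 = " 95.20\t952\t952\tU\t0\tunclassified"


def parse_line(pattern, line):
    """Return (percent, rank code, name) if the line matches, else None."""
    m = pattern.match(line)
    if m is None:
        return None
    return float(m.group(1)), m.group(4), m.group(7)


for label, pattern in (("proposed", proposed), ("strict", strict)):
    for line in (line_100, line_95):
        print(label, repr(line), "->", parse_line(pattern, line))
```

If the proposed pattern matches the 100% line here but MultiQC still skips the file, the regex in the parsing step may not be the only place where the file is filtered; that is a guess to investigate, not something this sketch confirms.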
Many thanks for reporting @rrdavis77, and for the suggested fix!
This has just been implemented by @ErikDanielsson in #1453 and seems to work in my hands now too. It's now merged so should be available in version 1.11dev. Let us know if you still hit any problems with it!
Not sure if this is an OK way to test, but I copied the new regex code into my conda environment and it still failed to pick those up. I'll try downloading the dev version and see what happens with that. I tested on the file linked in the issue.
vladsavelyev pushed a commit to vladsavelyev/MultiQC_TestData that referenced this issue on Apr 16, 2022
Description of bug: MultiQC does not parse kraken2 reports correctly if the unclassified portion is 100%.
MultiQC Error log: No error logs. MultiQC outputs no data and no errors for the files with 100% unclassified.
File that triggers the error: GSR-SWIFT-2021-5-04-FF08511340.kraken2.report.txt