Kraken2 report not parsed #1428

Closed
rrdavis77 opened this issue May 14, 2021 · 2 comments
Labels
bug: module Bug in a MultiQC module

Comments

@rrdavis77

Description of bug: MultiQC does not parse Kraken2 reports correctly if the unclassified portion is 100%.

MultiQC Error log: None. MultiQC outputs no data and no errors for the files that are 100% unclassified.

File that triggers the error: GSR-SWIFT-2021-5-04-FF08511340.kraken2.report.txt

MultiQC run details (please complete the following):

  • Command used to run MultiQC: multiqc .
  • MultiQC Version: MultiQC v1.10.1
  • Operating System: Ubuntu 16.04
  • Python Version: Python 3.9.2
  • Method of MultiQC installation: conda

Additional context

I came across some samples when running nf-core/viralrecon that align 100% to the viral genome and 0% to human. Kraken2 generates a report for every sample submitted. However, when MultiQC is run on samples that are reported as 100% unclassified, the regex used to search the report file does not match: the percentage has three digits and no leading whitespace. As a result, that data is missing from the MultiQC HTML report.
I think the problem is with the regex (for my current report format) here: https://github.com/ewels/MultiQC/blob/daee3a80e5dd35edea126221f3902673f2ff663c/multiqc/modules/kraken/kraken.py#L118
I tried adjusting the regex to account for this scenario. When 100% is reported as unclassified, there is no leading whitespace and the percentage has 3 digits before the decimal point. I tried changing it to:
k2_regex = re.compile(r"^\s{0,2}(\d{1,3}\.\d{1,2})\t(\d+)\t(\d+)\t([\dUDKPCOFGS-]{1,3})\t(\d+)(\s+)(.+)")
but that didn't fix it so I am reaching out to the big guns!
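
For anyone wanting to check the pattern outside of MultiQC, here is a minimal sketch that runs the relaxed regex quoted above against two report lines. The sample lines are made up for illustration and assume the usual six-column, tab-separated Kraken2 report layout. `STRICT` is a hypothetical stricter variant (mandatory leading whitespace, at most two integer digits), written only to show the failure mode described here; it is not copied from the MultiQC source. `parse_line` is likewise an illustrative helper, not a MultiQC function.

```python
import re

# Hypothetical stricter variant, for illustration only (NOT taken from MultiQC):
# requires 1-2 leading whitespace characters and at most 2 integer digits.
STRICT = re.compile(
    r"^\s{1,2}(\d{1,2}\.\d{1,2})\t(\d+)\t(\d+)\t([\dUDKPCOFGS-]{1,3})\t(\d+)(\s+)(.+)"
)

# Relaxed pattern as quoted in this issue: leading whitespace is optional
# and up to 3 integer digits are allowed.
RELAXED = re.compile(
    r"^\s{0,2}(\d{1,3}\.\d{1,2})\t(\d+)\t(\d+)\t([\dUDKPCOFGS-]{1,3})\t(\d+)(\s+)(.+)"
)

# Made-up sample lines: percent, clade reads, direct reads, rank, taxid, name.
PARTIAL = " 45.20\t452\t452\tU\t0\tunclassified"
FULL = "100.00\t1000\t1000\tU\t0\tunclassified"


def parse_line(pattern, line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = pattern.search(line)
    if match is None:
        return None
    return {
        "percent": float(match.group(1)),
        "clade_reads": int(match.group(2)),
        "direct_reads": int(match.group(3)),
        "rank": match.group(4),
        "taxid": int(match.group(5)),
        "name": match.group(7).strip(),
    }


if __name__ == "__main__":
    for label, pattern in (("strict", STRICT), ("relaxed", RELAXED)):
        for line in (PARTIAL, FULL):
            print(label, repr(line), "->", parse_line(pattern, line))
```

On these sample lines the stricter variant rejects the `100.00` line while the relaxed pattern accepts both. If the relaxed pattern also matches the real report file, that would suggest the remaining failure lies somewhere other than this regex.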

@ewels ewels added the bug: module Bug in a MultiQC module label May 14, 2021
ErikDanielsson added a commit to ErikDanielsson/MultiQC_TestData that referenced this issue Jun 15, 2021
@ewels
Member

ewels commented Jun 15, 2021

Many thanks for reporting @rrdavis77, and for the suggested fix!

This has just been implemented by @ErikDanielsson in #1453 and seems to work in my hands now too. It's now merged so should be available in version 1.11dev. Let us know if you still hit any problems with it!

Cheers,

Phil

@ewels ewels closed this as completed Jun 15, 2021
@rrdavis77
Author

Not sure if this is an OK way to test, but I copied the new regex code into my conda environment and it still failed to pick those samples up. I'll try downloading the dev version and see what happens with that. I tested on the file linked above.

vladsavelyev pushed a commit to vladsavelyev/MultiQC_TestData that referenced this issue Apr 16, 2022