"Per Base Sequence Content" in FastQC module can't handle NaN values #1246

Closed
matrulda opened this issue Jul 8, 2020 · 3 comments
Labels
bug: module Bug in a MultiQC module

Comments

@matrulda
Contributor

matrulda commented Jul 8, 2020

Description of bug:
I discovered that the FastQC plot "Per Base Sequence Content" was empty. All other plots in the FastQC module were generated.
[Image: per_base_sequence_content]
After troubleshooting a bit, I found that the problem was caused by NaN values in fastqc_data.txt. When replacing the NaN values with numbers, the plots were generated as expected.

Would it be possible to add support for NaN values? Or at least print a warning when this happens?
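
For illustration only, here is a minimal Python sketch of what NaN-tolerant parsing with a warning could look like. It is not MultiQC's actual parser: the function name `parse_per_base_sequence_content`, the `sample` argument, and the choice to map NaN to `None` are all assumptions, as is the tab-separated `#Base G A T C` layout of the module in fastqc_data.txt.

```python
import logging
import math

log = logging.getLogger(__name__)


def parse_per_base_sequence_content(lines, sample="sample"):
    """Parse the 'Per base sequence content' module from fastqc_data.txt lines.

    Hypothetical helper, not part of MultiQC. Returns a dict that maps each
    base-position label to a dict of column name -> float, with NaN values
    replaced by None so that downstream plotting code can skip them.
    """
    data = {}
    headers = None
    in_section = False
    nan_count = 0

    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">>Per base sequence content"):
            in_section = True
            continue
        if not in_section:
            continue
        if line.startswith(">>END_MODULE"):
            break
        if line.startswith("#"):
            # Header row, assumed to look like "#Base\tG\tA\tT\tC"
            headers = line.lstrip("#").split("\t")
            continue

        fields = line.split("\t")
        if headers is None or len(fields) != len(headers):
            continue  # skip blank or malformed rows

        row = {}
        for name, value in zip(headers[1:], fields[1:]):
            number = float(value)  # float("NaN") parses to nan without raising
            if math.isnan(number):
                nan_count += 1
                number = None
            row[name] = number
        data[fields[0]] = row

    if nan_count:
        # Warn instead of silently producing an empty plot
        log.warning(
            "%s: %d NaN value(s) in 'Per base sequence content'", sample, nan_count
        )
    return data


example = [
    ">>Per base sequence content\tfail",
    "#Base\tG\tA\tT\tC",
    "1\t25.0\t25.0\t25.0\t25.0",
    "2\tNaN\tNaN\tNaN\tNaN",
    ">>END_MODULE",
]
parsed = parse_per_base_sequence_content(example, sample="bad_run")
```

Mapping NaN to `None` is only one option; substituting 0 or dropping the affected positions would also keep the plot from coming out empty, each with a different effect on how the gap is displayed.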

MultiQC Error log:
No error at all.


[2020-07-08 15:40:54,327] multiqc                                            [DEBUG  ]  No MultiQC config found: /lupus/sw/bioinfo/MultiQC/1.9/rackham/lib/python3.7/site-packages/multiqc_config.yaml
[2020-07-08 15:40:54,327] multiqc                                            [DEBUG  ]  No MultiQC config found: /home/matildaa/.multiqc_config.yaml
[2020-07-08 15:40:54,327] multiqc                                            [DEBUG  ]  No MultiQC config found: multiqc_config.yaml
[2020-07-08 15:40:54,327] multiqc                                            [DEBUG  ]  Command used: /sw/bioinfo/MultiQC/1.9/irma/bin/multiqc -v -m fastqc .
[2020-07-08 15:40:54,330] multiqc                                            [DEBUG  ]  Could not connect to multiqc.info for version check: <urlopen error [Errno 113] No route to host>
[2020-07-08 15:40:54,330] multiqc                                            [INFO   ]  This is MultiQC v1.9
[2020-07-08 15:40:54,331] multiqc                                            [DEBUG  ]  Command     : /sw/bioinfo/MultiQC/1.9/irma/bin/multiqc -v -m fastqc .
[2020-07-08 15:40:54,331] multiqc                                            [DEBUG  ]  Working dir : /tmp/troubleshooting_fastqc
[2020-07-08 15:40:54,331] multiqc                                            [INFO   ]  Template    : default
[2020-07-08 15:40:54,331] multiqc                                            [DEBUG  ]  Running Python 3.7.2 (default, Mar 14 2019, 18:52:14)  [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
[2020-07-08 15:40:54,331] multiqc                                            [INFO   ]  Searching   : /tmp/troubleshooting_fastqc
[2020-07-08 15:40:54,332] multiqc                                            [INFO   ]  Only using modules fastqc
[2020-07-08 15:40:54,332] multiqc                                            [DEBUG  ]  Analysing modules: fastqc
[2020-07-08 15:40:54,333] multiqc                                            [DEBUG  ]  Using temporary directory for creating report: /scratch/tmpms4c9j77
[2020-07-08 15:40:54,497] multiqc                                            [DEBUG  ]  Ignored 231 search patterns as didn't match running modules.
Searching 2 files..
[2020-07-08 15:40:54,769] multiqc.plots.linegraph                            [DEBUG  ]  Using matplotlib version 3.0.3
[2020-07-08 15:40:54,770] multiqc.plots.bargraph                             [DEBUG  ]  Using matplotlib version 3.0.3
[2020-07-08 15:40:54,876] multiqc.modules.fastqc.fastqc                      [INFO   ]  Found 1 reports
[2020-07-08 15:40:54,936] multiqc                                            [INFO   ]  Compressing plot data
[2020-07-08 15:40:54,970] multiqc                                            [INFO   ]  Report      : multiqc_report.html
[2020-07-08 15:40:54,970] multiqc                                            [INFO   ]  Data        : multiqc_data
[2020-07-08 15:40:54,971] multiqc                                            [DEBUG  ]  Moving data file from '/scratch/tmpms4c9j77/multiqc_data' to '/tmp/troubleshooting_fastqc/multiqc_data'
[2020-07-08 15:40:55,128] multiqc                                            [INFO   ]  MultiQC complete

File that triggers the error:
fastqc_data.txt
Note: I've modified the file a bit: Replacing overrepresented sequences with NNNNNNNNN

MultiQC run details (please complete the following):

  • Command used to run MultiQC: multiqc -v -m fastqc .
  • MultiQC Version: MultiQC v1.9 (same result when running in quay.io/biocontainers/multiqc:1.8--py_1)
  • Operating System: CentOS Linux release 7.8.2003 (Core) (Uppmax cluster Irma)
  • Python Version: 3.7.2
  • Method of MultiQC installation: Uppmax module system

Additional context

@ewels ewels added the bug: core Bug in the main MultiQC code label Jul 8, 2020
@ewels
Member

ewels commented Jul 8, 2020

Thanks for reporting this @matrulda - NaN values are always a pain.

@matrulda
Contributor Author

matrulda commented Jul 8, 2020

Yeah, they are. They probably don't occur that often, just for bad runs like this. But even bad runs deserve plots. 🙏

@ewels ewels added bug: module Bug in a MultiQC module and removed bug: core Bug in the main MultiQC code labels Dec 28, 2020
@ewels ewels closed this as completed in 414fcc5 Mar 31, 2021
@ewels
Member

ewels commented Mar 31, 2021

Hi @matrulda,

Finally got to this - should now be fixed in the dev branch on master 👍🏻 Let me know if it works / you run into any problems.

2 participants