unable to plot "Filtered Reads" for cutadapt, when no read passed filters #1328

Closed
qifei9 opened this issue Oct 26, 2020 · 6 comments
Labels: bug: module (Bug in a MultiQC module)

Comments


qifei9 commented Oct 26, 2020

Description of bug:
I ran MultiQC on a cutadapt log, which says:

=== Summary ===

Total read pairs processed:          9,029,687
  Read 1 with adapter:               8,812,237 (97.6%)
  Read 2 with adapter:               8,609,012 (95.3%)
Pairs written (passing filters):             0 (0.0%)

In the MultiQC report, the Cutadapt -> Filtered Reads section shows "Error - was not able to plot data."

I think a result where no reads pass the filters can itself be informative to the user. It may be caused by the wrong adapter, data, or filters being used in the cutadapt run. It would therefore be good to see this in the MultiQC report, perhaps as a plot or as text stating that no reads passed. Otherwise one may ignore it, assuming it is due to a bug or error within MultiQC.

MultiQC Error log:

log:

[INFO   ]         multiqc : This is MultiQC v1.9
...
[INFO   ]        cutadapt : Found 1 reports
[WARNING]        bargraph : Tried to make bar plot, but had no data
...

multiqc_cutadapt.txt:

Sample  pairs_processed r1_with_adapters        r2_with_adapters        pairs_written   ...
ss 9029687 8812237 8609012 0       ...
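
Until the report handles this case, one way to spot affected samples is to scan multiqc_cutadapt.txt directly. The following is a minimal sketch, not part of MultiQC: the function name samples_with_no_output is hypothetical, and it assumes the file is tab-separated and contains the Sample, pairs_processed and pairs_written columns shown above.

```python
import csv
import io


def samples_with_no_output(tsv_text):
    """Return sample names where pairs were processed but none were written.

    Assumes tab-separated input with the column names seen in the excerpt
    above; any other columns are ignored.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["Sample"]
        for row in reader
        # float() tolerates values written as "0" or "0.0"
        if float(row["pairs_processed"]) > 0 and float(row["pairs_written"]) == 0
    ]


# Example row built from the values quoted in this issue (elided columns omitted).
example = (
    "Sample\tpairs_processed\tr1_with_adapters\tr2_with_adapters\tpairs_written\n"
    "ss\t9029687\t8812237\t8609012\t0\n"
)
print(samples_with_no_output(example))
```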

File that triggers the error:

MultiQC run details (please complete the following):

  • MultiQC Version: 1.9

  • Method of MultiQC installation: snakemake -> singularity -> docker://ewels/multiqc

ewels added the "bug: module" (Bug in a MultiQC module) label on Dec 28, 2020
Member

ewels commented Dec 28, 2020

Thanks for reporting @qifei9 - I agree that this should be handled in a nicer way.

If you have an actual file from cutadapt that I could use for testing that would be better than a truncated excerpt 👍🏻

Phil

Member

ewels commented Jul 2, 2021

Hi @qifei9,

@ErikDanielsson and I have been looking into this today and trying to figure out how to solve the issue. It was made quite a lot harder by the fact that we don't have the full cutadapt log files that you generated, so we had to recreate our own with your snippet and some guessing. Cutadapt log syntax changes over versions, so I also had to do a bit of forensics to try to guess which version you're working with.

The problem here is not just that you have 0 reads passing filters, but more that your log snippet doesn't contain the number of reads in different filter categories. Every cutadapt log I've seen and can generate looks like this:

=== Summary ===

Total read pairs processed:            250,000
  Read 1 with adapter:                 106,082 (42.4%)
  Read 2 with adapter:                 105,259 (42.1%)
Pairs that were too short:               3,850 (1.5%)
Pairs written (passing filters):       246,150 (98.5%)

Note the line with Pairs that were too short that accounts for the filtered reads. These category lines are only missing for me when 100% of reads pass filters. This is why the report is failing, as MultiQC parses these categories. In your case there are no passing reads and no categories, so MultiQC assumes no failing reads either, so the plot is empty.

Instead of throwing warnings in this case (as @ErikDanielsson initially added in #1480), I have added some code that checks for missing categories in the more recent cutadapt log syntax and counts these reads. These are then included in the plot, which should now show filtered reads that are unaccounted for:

[Image: cutadapt_filtered_reads_plot]

(code added in 5e40729 and d546366)
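
The idea described above can be sketched as follows. This is not the code from those commits: the regexes, the filter_breakdown function and the "filtered (unaccounted)" label are illustrative assumptions, built only from the log lines quoted in this thread (the single-end "Reads ..." wording is also assumed). Any reads that are neither written nor listed in a filter category are counted as unaccounted, so the plot always has data.

```python
import re

# Illustrative patterns based on the summary lines quoted in this thread.
PROCESSED = re.compile(r"^Total (?:read pairs|reads) processed:\s+([\d,]+)", re.M)
WRITTEN = re.compile(r"^(?:Pairs|Reads) written \(passing filters\):\s+([\d,]+)", re.M)
CATEGORY = re.compile(r"^(?:Pairs|Reads) (that were [^:]+):\s+([\d,]+)", re.M)


def to_int(text):
    """Convert a count such as '9,029,687' to an integer."""
    return int(text.replace(",", ""))


def filter_breakdown(log_text):
    """Split processed reads into passing, per-category and unaccounted counts."""
    processed = to_int(PROCESSED.search(log_text).group(1))
    written = to_int(WRITTEN.search(log_text).group(1))
    categories = {name: to_int(n) for name, n in CATEGORY.findall(log_text)}
    # Whatever is neither written nor in a reported category is unaccounted for.
    unaccounted = processed - written - sum(categories.values())
    breakdown = {"passing filters": written, **categories}
    if unaccounted > 0:
        breakdown["filtered (unaccounted)"] = unaccounted
    return breakdown


ISSUE_LOG = """\
Total read pairs processed:          9,029,687
  Read 1 with adapter:               8,812,237 (97.6%)
  Read 2 with adapter:               8,609,012 (95.3%)
Pairs written (passing filters):             0 (0.0%)
"""

TYPICAL_LOG = """\
Total read pairs processed:            250,000
  Read 1 with adapter:                 106,082 (42.4%)
  Read 2 with adapter:                 105,259 (42.1%)
Pairs that were too short:               3,850 (1.5%)
Pairs written (passing filters):       246,150 (98.5%)
"""

print(filter_breakdown(ISSUE_LOG))
print(filter_breakdown(TYPICAL_LOG))
```

For the log from this issue, all 9,029,687 pairs end up in the unaccounted category. For the typical log, the categories add up and no extra category is created.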

I know you created this issue quite a long time ago, so I don't have high hopes, but - if you are able to tell us what version of Cutadapt you were using, and ideally what commands you used to run + the full log output file that we can run with MultiQC, that would be fantastic. Then we can confirm that my detective work above is correct and that the fix works.

Many thanks for reporting, and I hope that the fix is useful!

Cheers,

Phil

Member

ewels commented Jul 2, 2021

cc @marcelm just in case you are interested in this! 😉

Contributor

marcelm commented Jul 3, 2021

Cutadapt before 3.1 did not print statistics for the --max-ee and --discard-casava filters (see marcelm/cutadapt#482), so perhaps these were used? Perhaps --discard-trimmed is also involved; I'll have to check. But in any case, it's strange that all reads were filtered.

Member

ewels commented Jul 3, 2021

Right - I should probably revisit the regexes that MultiQC uses to parse the logs too, as I bet I'm missing a bunch of categories:

https://github.com/ewels/MultiQC/blob/7594729cc41e37e66c20eff83af68951faa9c8fd/multiqc/modules/cutadapt/cutadapt.py#L83-L88

Member

ewels commented Jul 3, 2021

..and refactor to avoid the pairs / reads duplication.. 🤔 (this module code is pretty old now)
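
One possible direction for both points, sketched under assumptions rather than taken from the module: instead of one regex per category and per unit, a single pattern could capture every top-level "label: count" summary line, and the Pairs / Reads prefix could be normalised afterwards. The names LINE, UNIT and parse_summary are hypothetical, and the single-end wording ("Total reads processed", "Reads ...") is assumed to mirror the paired-end lines quoted in this thread. Indented sub-lines such as "Read 1 with adapter" are deliberately skipped here.

```python
import re

# Any unindented "<label>: <count>" line; indented sub-lines are not matched.
LINE = re.compile(r"^(?P<label>\S[^:\n]*):\s+(?P<count>[\d,]+)\b", re.M)
# Leading unit word that differs between paired-end and single-end logs.
UNIT = re.compile(r"^(?:Pairs|Reads)\s+")


def parse_summary(log_text):
    """Collect all top-level summary counts under unit-independent keys."""
    stats = {}
    for match in LINE.finditer(log_text):
        label = match.group("label").replace("Total read pairs", "Total reads")
        key = UNIT.sub("reads ", label)
        stats[key] = int(match.group("count").replace(",", ""))
    return stats


PAIRED = """\
=== Summary ===

Total read pairs processed:            250,000
  Read 1 with adapter:                 106,082 (42.4%)
  Read 2 with adapter:                 105,259 (42.1%)
Pairs that were too short:               3,850 (1.5%)
Pairs written (passing filters):       246,150 (98.5%)
"""

print(parse_summary(PAIRED))
```

Because the pattern does not name individual categories, lines for filters that a newer cutadapt version starts reporting would be picked up without a new regex, as long as they follow the same "label: count" layout.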
