MultiQC report is missing fastQC results on the dev branch #1303

Closed
davidecarlson opened this issue May 21, 2024 · 11 comments · Fixed by #1308

@davidecarlson

Description of the bug

When running the rnaseq pipeline on the dev branch, the FastQC results are no longer included in the MultiQC report, even though the FastQC zip files are available in the results. This occurs regardless of whether trimgalore or fastp is used as the trimmer.

Command used and terminal output

nextflow run \
nf-core/rnaseq \
--input ${SAMPLES} \
--outdir ${OUTDIR} \
--save_merged_fastq \
--save_trimmed \
--trimmer fastp \
--remove_ribo_rna \
--fasta ${GENOME} \
--gtf ${GTF} \
--aligner star_salmon \
-r dev \
--save_reference \
-resume \
-c /gpfs/projects/GenomicsCore/nf-core/configs/conf/seawulf.config

Relevant files

seawulf_config.zip
nextflow.log.zip
multiqc_report.html.zip

System information

Nextflow version: 24.02.0-edge
Hardware: HPC
Executor: Slurm
Container: Singularity
OS: Rocky Linux 8.7
nf-core version: dev

@davidecarlson davidecarlson added the bug Something isn't working label May 21, 2024
@davidecarlson
Author

Some more detail: I compared the results from the dev branch with a previous successful run using the latest release version (different data sets). The working directories are structured quite differently.

Release working directory:

.               .command.run    fastqc                        picard                     versions.yml
..              .command.sh     methods_description_mqc.yaml  samtools                   workflow_summary_mqc.yaml
.command.begin  .command.trace  multiqc_config.yml            software_versions_mqc.yml
.command.err    deseq2          multiqc_report_data           sortmerna
.command.log    dupradar        multiqc_report.html           star
.command.out    .exitcode       multiqc_report_plots          trim_log

Dev working directory:

.    117  137  157  177  197  216  236  256  276  296  315  335  355  375  395  414  434  454  474  494  513  533  57  77  97
..   118  138  158  178  198  217  237  257  277  297  316  336  356  376  396  415  435  455  475  495  514  534  58  78  98
1    119  139  159  179  199  218  238  258  278  298  317  337  357  377  397  416  436  456  476  496  515  535  59  79  99
10   12   14   16   18   2    219  239  259  279  299  318  338  358  378  398  417  437  457  477  497  516  536  6   8   .command.begin
100  120  140  160  180  20   22   24   26   28   3    319  339  359  379  399  418  438  458  478  498  517  537  60  80  .command.err
101  121  141  161  181  200  220  240  260  280  30   32   34   36   38   4    419  439  459  479  499  518  538  61  81  .command.log
102  122  142  162  182  201  221  241  261  281  300  320  340  360  380  40   42   44   46   48   5    519  539  62  82  .command.out
103  123  143  163  183  202  222  242  262  282  301  321  341  361  381  400  420  440  460  480  50   52   54   63  83  .command.run
104  124  144  164  184  203  223  243  263  283  302  322  342  362  382  401  421  441  461  481  500  520  540  64  84  .command.sh
105  125  145  165  185  204  224  244  264  284  303  323  343  363  383  402  422  442  462  482  501  521  541  65  85  .command.trace
106  126  146  166  186  205  225  245  265  285  304  324  344  364  384  403  423  443  463  483  502  522  542  66  86  .exitcode
107  127  147  167  187  206  226  246  266  286  305  325  345  365  385  404  424  444  464  484  503  523  543  67  87  multiqc_config.yml
108  128  148  168  188  207  227  247  267  287  306  326  346  366  386  405  425  445  465  485  504  524  544  68  88  multiqc_data
109  129  149  169  189  208  228  248  268  288  307  327  347  367  387  406  426  446  466  486  505  525  545  69  89  multiqc_plots
11   13   15   17   19   209  229  249  269  289  308  328  348  368  388  407  427  447  467  487  506  526  546  7   9   multiqc_report.html
110  130  150  170  190  21   23   25   27   29   309  329  349  369  389  408  428  448  468  488  507  527  547  70  90  versions.yml
111  131  151  171  191  210  230  250  270  290  31   33   35   37   39   409  429  449  469  489  508  528  548  71  91
112  132  152  172  192  211  231  251  271  291  310  330  350  370  390  41   43   45   47   49   509  529  549  72  92
113  133  153  173  193  212  232  252  272  292  311  331  351  371  391  410  430  450  470  490  51   53   55   73  93
114  134  154  174  194  213  233  253  273  293  312  332  352  372  392  411  431  451  471  491  510  530  550  74  94
115  135  155  175  195  214  234  254  274  294  313  333  353  373  393  412  432  452  472  492  511  531  551  75  95
116  136  156  176  196  215  235  255  275  295  314  334  354  374  394  413  433  453  473  493  512  532  56   76  96

Despite the fact that the fastqc zip files are present in the dev working directory:

find . -name "*.zip"
./69/SH52_1_fastqc.zip
./34/SH61_1_fastqc.zip
./37/SH59_2_fastqc.zip
./85/SH55_1_fastqc.zip
./80/SH46_2_fastqc.zip
./88/SH58_2_fastqc.zip
./83/SH45_1_fastqc.zip
./103/SH49_1_fastqc.zip
./7/SH56_2_fastqc.zip
./21/SH45_2_fastqc.zip
./24/SH58_1_fastqc.zip
./27/SH57_2_fastqc.zip
./75/SH51_1_fastqc.zip
./70/SH52_2_fastqc.zip
./78/SH47_2_fastqc.zip
./73/SH50_1_fastqc.zip
./43/SH44_2_fastqc.zip

The MultiQC log doesn't show that they were found:

 cat .command.log

  /// MultiQC 🔍 | v1.21

|           multiqc | MultiQC Version v1.22.1 now available!
|           multiqc | Only using modules: custom_content, fastqc, cutadapt, fastp, sortmerna, star, hisat2, rsem, salmon, kallisto, samtools, picard, preseq, rseqc, qualimap
|           multiqc | Search path : /gpfs/projects/GenomicsCore/nf-core/rnaseq/Futcher-05-24/work/c9/3c7c78f49d00d00096f2c0e664d85e
|            report | Skipping 17 file search patterns
|         searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1252/1252  
|    custom_content | biotype_counts: Found 1 samples (bargraph)
|    custom_content | dupradar: Found 1 samples (linegraph)
|    custom_content | DupInt: Found 21 General Statistics columns
|    custom_content | star_salmon_deseq2_clustering: Found 21 samples (heatmap)
|    custom_content | biotype-gs: Found 21 General Statistics columns
|    custom_content | nf-core-rnaseq-summary: Found 1 sample (html)
|    custom_content | star_salmon_deseq2_pca: Found 1 samples (scatter)
|    custom_content | fail_strand_check: Found 1 samples (table)
|            picard | Found 21 MarkDuplicates reports
|          qualimap | Found 21 RNASeq reports
|             rseqc | Found 21 read_distribution reports
|             rseqc | Found 21 inner_distance reports
|             rseqc | Found 21 read_duplication reports
|             rseqc | Found 21 junction_annotation reports
|             rseqc | Found 21 junction_saturation reports
|             rseqc | Found 21 infer_experiment reports
|             rseqc | Found 21 bam_stat reports
|          samtools | Found 21 stats reports
|          samtools | Found 21 flagstat reports
|          samtools | Found 21 idxstats reports
|         sortmerna | Found 42 logs
|              star | Found 21 reports
|             fastp | Found 21 reports
|           multiqc | Report      : multiqc_report.html
|           multiqc | Data        : multiqc_data
|           multiqc | Plots       : multiqc_plots
|           multiqc | MultiQC complete
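
One way to spot the gap in a log like this is to compare the modules MultiQC was asked to use against the modules that actually logged a "Found ..." line. The following is a minimal sketch, not part of the pipeline or of MultiQC: the function name `silent_modules` and the regular expression are illustrative assumptions, and the log excerpt is abridged from the output above.

```python
import re

# Abridged excerpt of the MultiQC log shown above.
LOG = """\
|           multiqc | Only using modules: custom_content, fastqc, cutadapt, fastp, sortmerna, star, hisat2, rsem, salmon, kallisto, samtools, picard, preseq, rseqc, qualimap
|    custom_content | biotype_counts: Found 1 samples (bargraph)
|            picard | Found 21 MarkDuplicates reports
|          qualimap | Found 21 RNASeq reports
|             rseqc | Found 21 bam_stat reports
|          samtools | Found 21 stats reports
|         sortmerna | Found 42 logs
|              star | Found 21 reports
|             fastp | Found 21 reports
"""

# Assumed line shape: "| <source> | <message>".
LINE = re.compile(r"^\|\s*(?P<source>\S+)\s*\|\s*(?P<message>.*)$")


def silent_modules(log_text):
    """Return requested modules that never logged a 'Found ...' line."""
    requested, reported = [], set()
    for raw in log_text.splitlines():
        match = LINE.match(raw)
        if not match:
            continue
        source, message = match["source"], match["message"]
        if message.startswith("Only using modules:"):
            # Everything after the first colon is the comma-separated module list.
            requested = [m.strip() for m in message.split(":", 1)[1].split(",")]
        elif "Found" in message:
            reported.add(source)
    # Keep the order in which the modules were requested.
    return [m for m in requested if m not in reported]


print(silent_modules(LOG))
```

Some of the modules this lists are expected to be silent (for example hisat2, since the run used star_salmon), so the result is only a starting point; fastqc appearing in it is the symptom reported here.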

I also noticed that the MultiQC version differs between the release (1.19) and dev (1.21) branches. I ran MultiQC version 1.21 manually in the working directory and encountered the same issue: the FastQC files were not located.

So, I'm not currently certain whether the issue is related to the updated MultiQC version or changes to how the working directory files are structured.

Let me know if you need any additional information!
Best,
Dave

@davidecarlson
Author

Another update: I tried downgrading my local version of MultiQC to 1.19 (which is used in the release version of the pipeline, where it seems to be working fine) and ran it again on my working directory. The issue persists - MultiQC did not locate the fastQC files and did not incorporate them into the report.

@MatthiasZepper MatthiasZepper self-assigned this May 21, 2024
@MatthiasZepper MatthiasZepper added this to the 3.15.0 milestone May 21, 2024
@MatthiasZepper
Member

Thanks for reporting and the thorough investigation! I could reproduce the issue and also figured out the reason for the failed inclusion.

MultiQC is run with the custom config file workflows/rnaseq/assets/multiqc/multiqc_config.yml, which expects the FastQC reports to be located in the raw and trim subdirectories of the fastqc folder:

module_order:
  - fastqc:
      name: "FastQC (raw)"
      info: "This section of the report shows FastQC results before adapter trimming."
      path_filters:
        - "./fastqc/raw/*.zip"
  - cutadapt
  - fastp
  - fastqc:
      name: "FastQC (trimmed)"
      info: "This section of the report shows FastQC results after adapter trimming."
      path_filters:
        - "./fastqc/trim/*.zip"

In reality, the publishDir directives publish the raw reports to ${params.outdir}/fastqc and the trimmed reports to ${params.outdir}/${params.trimmer}/fastqc. Hence, neither set is included when the custom MultiQC config is applied, but both are included otherwise, e.g. in a manual rerun after the pipeline has finished.

I will fix that later this week.
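
The path-filter mismatch described above can be illustrated with a small sketch. It assumes that `path_filters` entries behave like shell-style globs, as implemented by Python's `fnmatch`; that is an assumption about MultiQC's matching, not a copy of its code. The helper `sections_for` and the "expected" example paths are hypothetical, while the staged paths are taken from the `find` output earlier in this thread.

```python
from fnmatch import fnmatchcase

# path_filters copied from the multiqc_config.yml excerpt above.
PATH_FILTERS = {
    "FastQC (raw)": ["./fastqc/raw/*.zip"],
    "FastQC (trimmed)": ["./fastqc/trim/*.zip"],
}


def sections_for(path, filters=PATH_FILTERS):
    """Return the report sections whose path_filters accept this path."""
    return [
        name
        for name, patterns in filters.items()
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


# Paths as staged in the dev working directory (from the find output).
staged = ["./69/SH52_1_fastqc.zip", "./37/SH59_2_fastqc.zip"]
# Hypothetical layout that the config expects; file names are illustrative.
expected = ["./fastqc/raw/SH52_1_fastqc.zip", "./fastqc/trim/SH52_1_fastqc.zip"]

for path in staged + expected:
    print(path, "->", sections_for(path) or "no section")
```

Under this assumption, files staged in numbered directories fall through both filters, which would be consistent with the fastqc module reporting nothing in the log.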

@davidecarlson
Author

Thanks a lot, Matthias!

@MatthiasZepper
Member

Just to keep you updated: I think I fixed the bug in this branch, but the testing is somehow cursed at the moment. At first, I had issues with Nextflow not finding the images on our offline cluster, and now the MultiQC reports won't load in my browser at all. (The same is true for reports from the regular pipeline.)

Since it seems to work for you, you are welcome to test the version from my branch. Please also include umitools_dedup_stats = true, because I think that botched MultiQC config was also behind another issue (#1277) that I had not really understood before.

@davidecarlson
Author

Hi Matthias,

I cloned your fork, switched to the MultiQC_FastQC_bug branch and then ran the local version of the pipeline with the following command:

nextflow run \
./rnaseq \
--input ${SAMPLES} \
--outdir ${OUTDIR} \
--save_merged_fastq \
--save_trimmed \
--trimmer fastp \
--remove_ribo_rna \
--fasta ${GENOME} \
--gtf ${GTF} \
--umitools_dedup_stats true \
--aligner star_salmon \
--save_reference \
-c /gpfs/projects/GenomicsCore/nf-core/configs/conf/seawulf.config

Unfortunately, the fastQC results are still not present in the MultiQC report (see attached).

Did I run the test correctly?

Thanks!
Dave

multiqc_report.zip

@MatthiasZepper
Member

MatthiasZepper commented May 29, 2024

Thanks for testing! No, you did not do anything wrong - I did not successfully fix it. The output publishing is not working as intended, and the custom config thus does not apply.

I am working on it again right now, but I won't have much more time for it until next Wednesday.

@davidecarlson
Author

Okay, thanks for the update! Let me know if I can help with any additional testing.

@MatthiasZepper
Member

I have now opened a draft PR, #1308, that currently doesn't fix anything because ... that escalated quickly.

There is at least one MultiQC bug (see also the issue there), potentially even two, involved, and we also have additional issues with the pipeline. So unfortunately, instead of fixing one issue, I discovered that there are four in total.

I described them in detail in the draft PR, so if you feel like experimenting, please give any of it a go, since I won't have time for it until the end of next week.

@ewels
Member

ewels commented Jun 5, 2024

These bugs in MultiQC should be fixed now as of MultiQC v1.22.2 - hopefully the best solution here is to update the MultiQC nf-core module.

@MatthiasZepper
Member

#1308 has just been merged to dev and will be released as part of rnaseq 3.15. I hope this issue is solved by those changes, so I am closing it for now.

edmundmiller added a commit to nf-core/nascent that referenced this issue Oct 14, 2024
edmundmiller added a commit to nf-core/nascent that referenced this issue Oct 20, 2024
edmundmiller added a commit to nf-core/nascent that referenced this issue Oct 20, 2024