Adapter Removal MultiQC module broke #838

Closed
VictorGoitea opened this issue Sep 19, 2018 · 12 comments
Labels
bug: core Bug in the main MultiQC code waiting: example data Needs example data before we can proceed


@VictorGoitea

Description of bug:
MultiQC cannot report the AdapterRemoval information from the ".settings" file.
AdapterRemoval version: 2.2.2

MultiQC Error log:

[INFO   ]         multiqc : This is MultiQC v1.6
[INFO   ]         multiqc : Template    : default
[INFO   ]         multiqc : Searching './'
Searching 591 files..  [####################################]  100%             
[INFO   ]          picard : Found 13 AlignmentSummaryMetrics reports
[INFO   ]          picard : Found 13 InsertSizeMetrics reports
[INFO   ]          picard : Found 13 MarkDuplicates reports
[INFO   ]        samtools : Found 13 stats reports
[INFO   ]        samtools : Found 26 flagstat reports
[INFO   ]        samtools : Found 13 idxstats reports
[INFO   ]            star : Found 13 reports
[ERROR  ]         multiqc : Oops! The 'adapterRemoval' MultiQC module broke... 
  Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues 
  If possible, please include a log file that triggers the error - the last file found was:
    ./Raw/QC_FASTQ/SM3838_S61_R1/QC_AdapterRemoval/SM3838_S61_R1.settings
[SM3838_S61_R1.settings.gz](https://github.com/ewels/MultiQC/files/2396775/SM3838_S61_R1.settings.gz)


============================================================
Module adapterRemoval raised an exception: Traceback (most recent call last):
  File "/data/home/victorg/anaconda3/bin/multiqc", line 440, in multiqc
    output = mod()
  File "/data/home/victorg/anaconda3/lib/python3.6/site-packages/multiqc/modules/adapterRemoval/adapterRemoval.py", line 47, in __init__
    parsed_data = self.parse_settings_file(f)
  File "/data/home/victorg/anaconda3/lib/python3.6/site-packages/multiqc/modules/adapterRemoval/adapterRemoval.py", line 103, in parse_settings_file
    self.set_result_data(settings_data)
  File "/data/home/victorg/anaconda3/lib/python3.6/site-packages/multiqc/modules/adapterRemoval/adapterRemoval.py", line 112, in set_result_data
    self.set_trim_stat(settings_data['Trimming statistics'])
  File "/data/home/victorg/anaconda3/lib/python3.6/site-packages/multiqc/modules/adapterRemoval/adapterRemoval.py", line 174, in set_trim_stat
    self.result_data['percent_aligned'] = round((float(self.result_data['aligned']) * 100.0) / float(self.result_data['total']), 2)
ZeroDivisionError: float division by zero


MultiQC run details (please complete the following):

  • Command used to run MultiQC: multiqc ./
  • MultiQC Version: 1.6
  • Operating System: Ubuntu 16.04.5 LTS (GNU/Linux 4.4.0-134-generic x86_64)
  • Python Version: Python 3.6.3 :: Anaconda, Inc.
  • Method of MultiQC installation: conda


@ewels ewels added bug: core Bug in the main MultiQC code waiting: example data Needs example data before we can proceed labels Sep 28, 2018
@ewels
Member

ewels commented Sep 28, 2018

Hi @VictorGoitea,

Thanks for reporting this. Are you able to upload a file that triggers this error so that I can replicate it at my end please?

Thanks,

Phil

@VictorGoitea
Author

@ewels
Member

ewels commented Oct 2, 2018

Thanks!

@ewels ewels removed the waiting: example data Needs example data before we can proceed label Oct 2, 2018
@VictorGoitea
Author

Hi Ewels, is there any news on this?
I am submitting some new information that might give you some clues.
Below are the results in the settings files from AdapterRemoval runs on different batches of sequences from the same sample. Interestingly, MultiQC works fine with the one on the right but not with the one on the left. Could the 0 values in this line be the problem?

Length Mate1 Mate2 Singleton Discarded All Length Mate1 Mate2 Singleton Discarded All
**0 0 0 0 0 0 0 0 0 0 8 8**

I ask because of this line in the MultiQC traceback:
round((float(self.result_data['aligned']) * 100.0) / float(self.result_data['total']), 2)


AdapterRemoval ver. 2.2.2 AdapterRemoval ver. 2.2.2
Trimming of paired-end reads Trimming of paired-end reads
   
   
[Adapter sequences] [Adapter sequences]
Adapter1[1]: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG Adapter1[1]: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
Adapter2[1]: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT Adapter2[1]: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
   
   
[Adapter trimming] [Adapter trimming]
RNG seed: NA RNG seed: NA
Alignment shift value: 2 Alignment shift value: 2
Global mismatch threshold: 0.333333 Global mismatch threshold: 0.333333
Quality format (input): Phred+33 Quality format (input): Phred+33
Quality score max (input): 41 Quality score max (input): 41
Quality format (output): Phred+33 Quality format (output): Phred+33
Quality score max (output): 41 Quality score max (output): 41
Mate-number separator (input): '/' Mate-number separator (input): '/'
Trimming Ns: No Trimming Ns: No
Trimming Phred scores <= 2: No Trimming Phred scores <= 2: No
Trimming using sliding windows: No Trimming using sliding windows: No
Minimum genomic length: 15 Minimum genomic length: 15
Maximum genomic length: 4294967295 Maximum genomic length: 4294967295
Collapse overlapping reads: No Collapse overlapping reads: No
Minimum overlap (in case of collapse): 11 Minimum overlap (in case of collapse): 11
   
   
[Trimming statistics] [Trimming statistics]
Total number of read pairs: 6797863 Total number of read pairs: 41635818
Number of unaligned read pairs: 5414784 Number of unaligned read pairs: 33178542
Number of well aligned read pairs: 1383079 Number of well aligned read pairs: 8457276
Number of discarded mate 1 reads: 35 Number of discarded mate 1 reads: 288
Number of singleton mate 1 reads: 0 Number of singleton mate 1 reads: 0
Number of discarded mate 2 reads: 35 Number of discarded mate 2 reads: 288
Number of singleton mate 2 reads: 0 Number of singleton mate 2 reads: 0
Number of reads with adapters[1]: 25372 Number of reads with adapters[1]: 159934
Number of retained reads: 13595656 Number of retained reads: 83271060
Number of retained nucleotides: 679550870 Number of retained nucleotides: 4162049294
Average length of retained reads: 49.9829 Average length of retained reads: 49.9819
   
   
[Length distribution] [Length distribution]
Length Mate1 Mate2 Singleton Discarded All Length Mate1 Mate2 Singleton Discarded All
**0 0 0 0 0 0 0 0 0 0 8 8**
1 0 0 0 0 0 1 0 0 0 14 14
2 0 0 0 2 2 2 0 0 0 14 14
3 0 0 0 2 2 3 0 0 0 26 26
4 0 0 0 0 0 4 0 0 0 12 12
5 0 0 0 0 0 5 0 0 0 20 20
6 0 0 0 2 2 6 0 0 0 14 14
7 0 0 0 0 0 7 0 0 0 24 24
8 0 0 0 2 2 8 0 0 0 26 26
9 0 0 0 10 10 9 0 0 0 42 42
10 0 0 0 6 6 10 0 0 0 36 36
11 0 0 0 10 10 11 0 0 0 54 54
12 0 0 0 10 10 12 0 0 0 80 80
13 0 0 0 14 14 13 0 0 0 92 92
14 0 0 0 12 12 14 0 0 0 114 114
15 4 4 0 0 8 15 80 80 0 0 160
16 8 8 0 0 16 16 96 96 0 0 192
17 21 21 0 0 42 17 150 150 0 0 300
18 14 14 0 0 28 18 132 132 0 0 264
19 24 24 0 0 48 19 153 153 0 0 306
20 24 24 0 0 48 20 168 168 0 0 336
21 24 24 0 0 48 21 212 212 0 0 424
22 40 40 0 0 80 22 294 294 0 0 588
23 39 39 0 0 78 23 337 337 0 0 674
24 61 61 0 0 122 24 390 390 0 0 780
25 57 57 0 0 114 25 412 412 0 0 824
26 65 65 0 0 130 26 409 409 0 0 818
27 47 47 0 0 94 27 524 524 0 0 1048
28 91 91 0 0 182 28 542 542 0 0 1084
29 79 79 0 0 158 29 604 604 0 0 1208
30 89 89 0 0 178 30 736 736 0 0 1472
31 105 105 0 0 210 31 877 877 0 0 1754
32 127 127 0 0 254 32 974 974 0 0 1948
33 200 200 0 0 400 33 1391 1391 0 0 2782
34 333 333 0 0 666 34 2187 2187 0 0 4374
35 374 374 0 0 748 35 2578 2578 0 0 5156
36 516 516 0 0 1032 36 3368 3368 0 0 6736
37 751 751 0 0 1502 37 4768 4768 0 0 9536
38 820 820 0 0 1640 38 4700 4700 0 0 9400
39 844 844 0 0 1688 39 5026 5026 0 0 10052
40 824 824 0 0 1648 40 5057 5057 0 0 10114
41 807 807 0 0 1614 41 5040 5040 0 0 10080
42 844 844 0 0 1688 42 4923 4923 0 0 9846
43 835 835 0 0 1670 43 4711 4711 0 0 9422
44 808 808 0 0 1616 44 4996 4996 0 0 9992
45 750 750 0 0 1500 45 4705 4705 0 0 9410
46 765 765 0 0 1530 46 4773 4773 0 0 9546
47 772 772 0 0 1544 47 4747 4747 0 0 9494
48 693 693 0 0 1386 48 4647 4647 0 0 9294
49 796 796 0 0 1592 49 4972 4972 0 0 9944
50 6785177 6785177 0 0 13570354 50 41555851 41555851 0 0 83111702
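
For reference, the expression from the traceback can be evaluated by hand against the two [Trimming statistics] blocks above. This is only a sketch, not MultiQC code: `percent_aligned` is a hypothetical helper that mirrors the traceback line, and mapping `aligned` and `total` to "Number of well aligned read pairs" and "Total number of read pairs" is an assumption based on the names.

```python
def percent_aligned(aligned, total):
    """Mirror of the expression in the traceback (hypothetical helper)."""
    return round((float(aligned) * 100.0) / float(total), 2)


# Values copied from the two reports above.
# Assumption: 'aligned' is "Number of well aligned read pairs" and
# 'total' is "Total number of read pairs".
left = percent_aligned(1383079, 6797863)
right = percent_aligned(8457276, 41635818)
print(left, right)

# A total of zero reproduces the kind of error in the traceback.
try:
    percent_aligned(0, 0)
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```

Both totals above are non-zero, so under this assumption the expression evaluates without error for either report. The exception needs a total of 0.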

@ewels
Member

ewels commented Oct 8, 2018

Hi @VictorGoitea,

Thanks for the reminder. I didn't get very far because I had no errors with the example data you sent. Looking at it again I wonder if you may have sent me the wrong sample? The initial error message lists SM3838_S61_R1 but you sent me SM3836_S60_R1. Anyway, it works fine for me which makes it a little difficult to be sure about any fixes. I tried using the numbers from your recent comment, but MultiQC won't parse that - I think because GitHub changes the whitespace formatting.

But yes - looking at the code I would guess that the error happens when there are zero bases trimmed and MultiQC tries to divide by that number. I've just pushed a change to handle this scenario, but it makes me a little nervous as I don't know whether the error will shift elsewhere.
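
A guard of the kind described could look roughly like the sketch below. This is not the actual change that was pushed: `safe_percent` is a hypothetical helper, and returning 0.0 for a zero denominator is an assumption about what the report should show.

```python
def safe_percent(numerator, denominator, ndigits=2):
    """Return numerator/denominator as a percentage, or 0.0 if the denominator is zero."""
    denominator = float(denominator)
    if denominator == 0.0:
        # Assumption: show 0% for an empty log instead of raising.
        return 0.0
    return round(float(numerator) * 100.0 / denominator, ndigits)


# Stand-in for the module's result dictionary, using the keys seen in the traceback.
result_data = {"aligned": 0, "total": 0}
result_data["percent_aligned"] = safe_percent(
    result_data["aligned"], result_data["total"]
)
print(result_data)
```

The same helper would have to wrap every other percentage computed from the same totals, otherwise the error could simply move to the next division.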

Phil

@ewels
Member

ewels commented Oct 8, 2018

ps. Sorry - meant to say - if you could pull the latest changes and test that would be great! If it fixes the error then I'll close this issue.

@VictorGoitea
Author

No, I'm sorry, I could not solve anything. There were no changes, just different results. I still cannot understand why MultiQC fails with the AdapterRemoval results from that particular batch of sequences. AdapterRemoval itself is not failing: it does not give any error and it generates the corresponding results (the ones I sent). The AdapterRemoval module in MultiQC fails only with this batch of files. I do not know what MultiQC does with the data it parses from AdapterRemoval. Do you see anything in the report on the left, and not in the report on the right, that could make MultiQC fail? How do you interpret the MultiQC traceback? There is specific information there, but I do not understand exactly what it means or what to do with it.

@ewels
Member

ewels commented Oct 9, 2018

The main problem I have is that I can't replicate the error myself.

If you download the file you sent to me above (SM3836_S60_R1.settings.gz) to a new location, in a folder by itself, and run MultiQC does it give an error?

Phil

@VictorGoitea
Author

VictorGoitea commented Oct 15, 2018

Hi Phil,
I checked, and what you say is true. So there is no problem with the file, but rather some problem with the directory tree. If I run MultiQC at this level, /QC_FASTQ, it fails.

[INFO   ]         multiqc : This is MultiQC v1.6
[INFO   ]         multiqc : Template    : default
[INFO   ]         multiqc : Searching './'
Searching 316 files..  [####################################]  100%             
[ERROR  ]         multiqc : Oops! The 'adapterRemoval' MultiQC module broke... 
  Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues 
  If possible, please include a log file that triggers the error - the last file found was:
    ./SM3838_S61_/QC_AdapterRemoval/SM3838_S61_.settings
============================================================
Module adapterRemoval raised an exception: Traceback (most recent call last):
  File "/data/home/victorg/anaconda3/bin/multiqc", line 440, in multiqc
    output = mod()
  File "/data/home/victorg/anaconda3/lib/python3.6/site-packages/multiqc/modules/adapterRemoval/adapterRemoval.py", line 47, in __init__
    parsed_data = self.parse_settings_file(f)
  File "/data/home/victorg/anaconda3/lib/python3.6/site-packages/multiqc/modules/adapterRemoval/adapterRemoval.py", line 103, in parse_settings_file
    self.set_result_data(settings_data)
  File "/data/home/victorg/anaconda3/lib/python3.6/site-packages/multiqc/modules/adapterRemoval/adapterRemoval.py", line 112, in set_result_data
    self.set_trim_stat(settings_data['Trimming statistics'])
  File "/data/home/victorg/anaconda3/lib/python3.6/site-packages/multiqc/modules/adapterRemoval/adapterRemoval.py", line 174, in set_trim_stat
    self.result_data['percent_aligned'] = round((float(self.result_data['aligned']) * 100.0) / float(self.result_data['total']), 2)
ZeroDivisionError: float division by zero
============================================================

In this directory I have several sample folders:

SM3835_S59_       SM3836_S22_L001_R1_001  SM3837_S23_L001_        SM3837_S61_  SM3838_S62_  SM3840_S24_L001_        SM3840_S64_  SM3842_S66_       SM3843_S25_L001_R2_001  SM3844_S68_  SM3846_S70_
SM3836_S22_L001_  SM3836_S60_             SM3837_S23_L001_R1_001  SM3838_S61_  SM3839_S63_  SM3840_S24_L001_R1_001  SM3841_S65_  SM3843_S25_L001_  SM3843_S67_             SM3845_S69_  SM3847_S86_

Each of these folders has 3 subfolders called:

  1. QC_AdapterRemoval
  2. QC_FASTQC
  3. QC_fastq_Screen

which contain metrics files from those programs (the SM# appears as a prefix in each metrics file).

However, if I run MultiQC (multiqc ./) inside a sample folder, for example SM3835_S59_, the AdapterRemoval module does not break and the report is fine:

[INFO   ]         multiqc : This is MultiQC v1.6
[INFO   ]         multiqc : Template    : default
[INFO   ]         multiqc : Searching './'
Searching 15 files..  [####################################]  100%
[INFO   ]  adapterRemoval : Found 1 reports
[INFO   ]    fastq_screen : Found 2 reports
[INFO   ]          fastqc : Found 2 reports
[INFO   ]         multiqc : Compressing plot data
[INFO   ]         multiqc : Report      : multiqc_report.html
[INFO   ]         multiqc : Data        : multiqc_data
[INFO   ]         multiqc : MultiQC complete

Still, the idea is to generate a single report for all these samples. Do you have any idea what is causing this problem?
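
One way to narrow this down across a tree like the one above is to scan every .settings file and flag those whose read total is missing or zero. The sketch below is an assumption-laden helper, not part of MultiQC: `find_suspect_settings` is a hypothetical name, the "Total number of read pairs" label is taken from the reports posted earlier, and the "Total number of reads" alternative for single-end runs is an assumption.

```python
import os
import re

# "read pairs" matches the paired-end reports shown earlier in this thread;
# "reads" is an assumed label for single-end runs.
TOTAL_RE = re.compile(
    r"^Total number of (?:read pairs|reads):\s*(\d+)", re.MULTILINE
)


def find_suspect_settings(root):
    """Yield (path, reason) for .settings files with a missing or zero read total."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".settings"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as handle:
                text = handle.read()
            match = TOTAL_RE.search(text)
            if match is None:
                yield path, "no total found (truncated file?)"
            elif int(match.group(1)) == 0:
                yield path, "total is 0"


if __name__ == "__main__":
    # Run from the directory where MultiQC fails, e.g. the QC_FASTQ level.
    for path, reason in find_suspect_settings("."):
        print(path, "->", reason)
```

Any file it lists is a candidate for the one that trips the module when MultiQC is run at the top level.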

@VictorGoitea
Author

OK, I found the source of the trouble. When I combined the FASTQ reads sequenced in two lanes, I made a typo in a sample number, which created a truncated file for a non-existent sample. That file in turn produced truncated QC files, and this seems to have been what broke the AdapterRemoval module in MultiQC. It might be nice to add a change that keeps the module from breaking on a truncated file, so that it skips such files and continues with the others, reporting what was good instead of reporting nothing for any sample. Thank you for all the support anyway. I think you can close the issue if you want.
Best regards,
Victor.
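
The "skip and continue" behaviour suggested above could be sketched as follows. This is an illustration only, not how the MultiQC module is written: `parse_all` and `toy_parser` are hypothetical names, and the set of exceptions treated as "broken file" is an assumption.

```python
import logging

log = logging.getLogger(__name__)


def parse_all(files, parse_one):
    """Parse each (name, text) pair, skipping files whose parsing raises."""
    parsed = {}
    skipped = []
    for name, text in files:
        try:
            parsed[name] = parse_one(text)
        except (ZeroDivisionError, KeyError, ValueError) as err:
            # Assumption: these errors indicate an empty or truncated log.
            log.warning("Skipping %s: %s", name, err)
            skipped.append(name)
    return parsed, skipped


def toy_parser(text):
    """Stand-in parser: expects 'aligned total' and returns the percentage."""
    aligned, total = (float(x) for x in text.split())
    return round(aligned * 100.0 / total, 2)


parsed, skipped = parse_all(
    [("ok", "1 4"), ("empty", "0 0"), ("truncated", "")], toy_parser
)
print(parsed, skipped)
```

With this pattern one bad file costs a warning rather than the whole AdapterRemoval section of the report.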

@ewels
Member

ewels commented Oct 19, 2018

Sounds good - do you have a set of files that I can use to replicate this? If you can zip a folder that I can run on then I should be able to fix it.

@ewels ewels added the waiting: example data Needs example data before we can proceed label Oct 20, 2018
@ewels
Member

ewels commented Nov 13, 2019

Hi @VictorGoitea,

I'm happy to add support for skipping empty AdapterRemoval logs, but I can't do this without some example files. I'm closing this issue now as it's been over a year without a reply, but if you would still like this added and can send an example I'd be happy to reopen it.

Cheers,

Phil

@ewels ewels closed this as completed Nov 13, 2019