anvi-merge error v3: KeyError: 'total_length' #632

Closed
meren opened this issue Nov 5, 2017 · 5 comments

@meren
Member

meren commented Nov 5, 2017

We've been hearing about the following merge error from some of our users, but we have not been able to reproduce it:

  File "/sw/csi/anvio/3.0.0/el7_py361_anacondaenv/env/bin/anvi-merge", line 50, in <module>
    merger.MultipleRuns(args).merge()
  File "/sw/csi/anvio/3.0.0/el7_py361_anacondaenv/env/lib/python3.5/site-packages/anvio/merger.py", line 284, in merge
    self.sanity_check()
  File "/sw/csi/anvio/3.0.0/el7_py361_anacondaenv/env/lib/python3.5/site-packages/anvio/merger.py", line 160, in sanity_check
    v = set([r[k] for r in list(self.profile_dbs_info_dict.values())])
  File "/sw/csi/anvio/3.0.0/el7_py361_anacondaenv/env/lib/python3.5/site-packages/anvio/merger.py", line 160, in <listcomp>
    v = set([r[k] for r in list(self.profile_dbs_info_dict.values())])
KeyError: 'total_length'

The error is likely caused by trying to merge profile databases generated by an older version of anvi'o using the latest version, v3. We usually track database changes very carefully so that our users never have to re-profile their data, but here we are, for the first time in more than a year.
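To make the failure mode concrete, here is a minimal sketch of how the line in the traceback raises a KeyError as soon as one database lacks a key. The dictionary below is a hypothetical stand-in for self.profile_dbs_info_dict (the paths and values are made up), and the two helper functions are illustrative names, not part of anvi'o:

```python
# Hypothetical stand-in for self.profile_dbs_info_dict: one dict of
# 'self' table values per profile database (paths and values are made up).
profile_dbs_info = {
    "SAMPLE_01/PROFILE.db": {"version": "23", "total_length": "56709"},
    "SAMPLE_02/PROFILE.db": {"version": "23"},  # 'total_length' is missing here
}


def values_for_key(info_dict, key):
    """Mirror the failing line: collect the set of values of `key` across databases."""
    return set([r[key] for r in info_dict.values()])


def dbs_missing_key(info_dict, key):
    """Return the paths of databases whose info lacks `key`, sorted for stable output."""
    return sorted(path for path, info in info_dict.items() if key not in info)


try:
    values_for_key(profile_dbs_info, "total_length")
except KeyError as e:
    # A single database without the key is enough to abort the whole comprehension.
    print("KeyError:", e)
    print("Missing in:", dbs_missing_key(profile_dbs_info, "total_length"))
```

The point of the sketch is that the exception itself does not say which database is the odd one out, so listing the databases that lack the key is the quickest way to narrow it down.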

Is this happening to you? Please help us better understand the problem by answering these questions, and we will provide you with a script that will fix your profile databases:

  • Do you remember which version of anvi'o you used to generate the profile databases you are trying to merge?

  • Can you please run this command on one of the anvi'o profile databases you are merging and include the output in your response: sqlite3 PATH/TO/PROFILE.db 'select * from self;'

@teilk48

teilk48 commented Apr 12, 2018

I am getting this error when merging databases that were created in v4:

Traceback (most recent call last):
  File "/usr/local/bin/anvi-merge", line 49, in <module>
    merger.MultipleRuns(args).merge()
  File "/usr/local/Cellar/anvio/4/libexec/lib/python3.6/site-packages/anvio/merger.py", line 291, in merge
    self.sanity_check()
  File "/usr/local/Cellar/anvio/4/libexec/lib/python3.6/site-packages/anvio/merger.py", line 166, in sanity_check
    v = set([r[k] for r in list(self.profile_dbs_info_dict.values())])
  File "/usr/local/Cellar/anvio/4/libexec/lib/python3.6/site-packages/anvio/merger.py", line 166, in <listcomp>
    v = set([r[k] for r in list(self.profile_dbs_info_dict.values())])
KeyError: 'total_length'

The info outputs for the two profile databases are:

version|23
db_type|profile
anvio|4
sample_id|PBLC1_SORTED
samples|PBLC1_SORTED
merged|0
blank|0
contigs_ordered|0
default_view|single
min_contig_length|2500
SNVs_profiled|1
AA_frequencies_profiled|0
min_coverage_for_variability|10
report_variability_full|0
contigs_db_hash|61641386
description|_No description is provided_
creation_date|1523501845.36192

&

version|23
db_type|profile
anvio|4
sample_id|PPG1_SORTED
samples|PPG1_SORTED
merged|0
blank|0
contigs_ordered|0
default_view|single
min_contig_length|2500
SNVs_profiled|1
AA_frequencies_profiled|0
min_coverage_for_variability|10
report_variability_full|0
contigs_db_hash|61641386
description|_No description is provided_
creation_date|1523501846.4421

@jmtsuji

jmtsuji commented Apr 12, 2018

I am getting the same error. All Anvi'o steps have been run in V4 (in a conda environment), although I've imported my own gene calls (prokka), functional annotations (prokka), and taxonomy information (DIAMOND blastp vs. RefSeq).

Error:

$ anvi-merge ./*/PROFILE.db -o samples_merged -c contigs.db --skip-concoct-binning
Traceback (most recent call last):
  File "/Winnebago/jmtsuji/miniconda2/envs/anvio4/bin/anvi-merge", line 49, in <module>
    merger.MultipleRuns(args).merge()
  File "/Winnebago/jmtsuji/miniconda2/envs/anvio4/lib/python3.6/site-packages/anvio/merger.py", line 291, in merge
    self.sanity_check()
  File "/Winnebago/jmtsuji/miniconda2/envs/anvio4/lib/python3.6/site-packages/anvio/merger.py", line 166, in sanity_check
    v = set([r[k] for r in list(self.profile_dbs_info_dict.values())])
  File "/Winnebago/jmtsuji/miniconda2/envs/anvio4/lib/python3.6/site-packages/anvio/merger.py", line 166, in <listcomp>
    v = set([r[k] for r in list(self.profile_dbs_info_dict.values())])
KeyError: 'total_length'

Database info for mapping profile (1 of 4) created by anvi-profile:

$ sqlite3 PROFILE.db 'select * from self;'
version|23
db_type|profile
anvio|4
sample_id|L227_2013_6m
samples|L227_2013_6m
merged|0
blank|0
contigs_ordered|0
default_view|single
min_contig_length|2500
SNVs_profiled|1
AA_frequencies_profiled|0
min_coverage_for_variability|10
report_variability_full|0
contigs_db_hash|424b1f26
description|_No description is provided_
creation_date|1523464432.42442
num_splits|33509
num_contigs|32964
total_length|203986014
total_reads_mapped|20712414

Profile 2 of 4:

$ sqlite3 PROFILE.db 'select * from self;'
version|23
db_type|profile
anvio|4
sample_id|L227_2013_8m
samples|L227_2013_8m
merged|0
blank|0
contigs_ordered|0
default_view|single
min_contig_length|2500
SNVs_profiled|1
AA_frequencies_profiled|0
min_coverage_for_variability|10
report_variability_full|0
contigs_db_hash|424b1f26
description|_No description is provided_
creation_date|1523474213.24926
num_splits|33509
num_contigs|32964
total_length|203986014
total_reads_mapped|14951356

Profile 3 of 4:

$ sqlite3 PROFILE.db 'select * from self;'
version|23
db_type|profile
anvio|4
sample_id|L227_2014_6m
samples|L227_2014_6m
merged|0
blank|0
contigs_ordered|0
default_view|single
min_contig_length|2500
SNVs_profiled|1
AA_frequencies_profiled|0
min_coverage_for_variability|10
report_variability_full|0
contigs_db_hash|424b1f26
description|_No description is provided_
creation_date|1523482828.85405
num_splits|33509
num_contigs|32964
total_length|203986014
total_reads_mapped|28552719

Profile 4 of 4 (this seems to be where the error is coming from!):

$ sqlite3 PROFILE.db 'select * from self;'
version|23
db_type|profile
anvio|4
sample_id|L227_2014_8m
samples|L227_2014_8m
merged|0
blank|0
contigs_ordered|0
default_view|single
min_contig_length|2500
SNVs_profiled|1
AA_frequencies_profiled|0
min_coverage_for_variability|10
report_variability_full|0
contigs_db_hash|424b1f26
description|_No description is provided_
creation_date|1523463749.79973

Anvi'o version:

$ anvi-merge --version
Anvi'o version ...............................: 4
Profile DB version ...........................: 23
Contigs DB version ...........................: 10
Pan DB version ...............................: 8
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2

Let me know if you need any additional information.

EDIT: I've realized that my error might be coming from legacy files leftover in my directory from when profile 4 of 4 was run via anvi-profile. The sample represented by profile 4 of 4 was the same sample that I was doing initial testing of anvi-profile on. I'm re-running anvi-profile on that sample and will let you know if that changes anything.

@meren
Member Author

meren commented Apr 12, 2018

@teilk48, we recently realized that this happens when there is something wrong with one or more of the single profile databases you're trying to merge. I.e., the anvi-profile step was killed for some reason, and yet there is a PROFILE.db output. Because the file is there, anvi'o thinks everything is OK, when in fact things are not OK.

We certainly need to address this issue by making sure anvi'o can identify broken single profiles.

For instance, a completed profiling run should result in a database that looks like this (values shown for a single profile and for a merged profile):

| key | single | merged |
|---|---|---|
| version | 23 | 23 |
| db_type | profile | profile |
| anvio | 4-master | 4-master |
| merged | 0 | 1 |
| blank | 0 | 0 |
| sample_id | SAMPLE_02 | SAMPLES_MERGED |
| samples | SAMPLE_02 | SAMPLE_01,SAMPLE_02,SAMPLE_03 |
| default_view | single | mean_coverage |
| min_contig_length | 2500 | 2500 |
| SNVs_profiled | 1 | 1 |
| AA_frequencies_profiled | 0 | 0 |
| min_coverage_for_variability | 10 | 10 |
| report_variability_full | 0 | 0 |
| contigs_db_hash | 4998888b | 4998888b |
| creation_date | 1523369666.5261 | 1523369678.07424 |
| num_contigs | 35 | 35 |
| num_splits | 3 | 3 |
| total_length | 56709 | 56709 |
| contigs_ordered | 0 | 1 |
| default_item_order | None | tnf-cov:euclidean:ward |
| available_item_orders | None | tnf:euclidean:ward,tnf-cov:euclidean:ward,cov:euclidean:ward |
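Until anvi'o detects this on its own, a check along these lines could flag incomplete single profiles before merging. This is only a sketch using Python's standard sqlite3 module: it reads the same self table as the sqlite3 command above, and the set of keys it looks for (EXPECTED_KEYS) is an assumption inferred from the outputs in this thread, not an official anvi'o definition. The function names are hypothetical.

```python
import sqlite3

# Assumption: these keys mark a finished anvi-profile run. This is inferred only
# from the outputs in this thread (complete profiles list them, broken ones do not).
EXPECTED_KEYS = ("num_splits", "num_contigs", "total_length")


def read_self_table(db_path):
    """Return the rows of the 'self' table as a {key: value} dict."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("select * from self").fetchall()
    finally:
        conn.close()
    # Each row is expected to hold a key in the first column and its value in the second.
    return {row[0]: row[1] for row in rows}


def missing_keys(db_path, expected=EXPECTED_KEYS):
    """List the expected keys that are absent from the database's 'self' table."""
    info = read_self_table(db_path)
    return [key for key in expected if key not in info]


def report(db_paths):
    """Build one human-readable status line per profile database."""
    lines = []
    for path in db_paths:
        missing = missing_keys(path)
        status = "looks complete" if not missing else "missing: " + ", ".join(missing)
        lines.append(f"{path}: {status}")
    return lines
```

For example, print("\n".join(report(paths))) with a list of PROFILE.db paths would show which single profiles need to be re-profiled. Note that sqlite3.connect creates an empty file when the path does not exist, so the paths should be checked beforehand.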

@meren
Member Author

meren commented Apr 12, 2018

@jmtsuji,

Thank you very much for the extensive insight, Jackson. This is very helpful. I will do something to address this issue now.

@jmtsuji

jmtsuji commented Apr 12, 2018

Glad the bug report was helpful. After re-running anvi-profile (on 4 of 4), the database info looks good:

$ sqlite3 PROFILE.db 'select * from self;'
version|23
db_type|profile
anvio|4
sample_id|L227_2014_8m
samples|L227_2014_8m
merged|0
blank|0
contigs_ordered|0
default_view|single
min_contig_length|2500
SNVs_profiled|1
AA_frequencies_profiled|0
min_coverage_for_variability|10
report_variability_full|0
contigs_db_hash|424b1f26
description|_No description is provided_
creation_date|1523513733.93993
num_splits|33509
num_contigs|32964
total_length|203986014
total_reads_mapped|15130865

...and anvi-merge now works!
