Config Error: difference in unique keys and item number #917

Closed
jvhagey opened this issue Jul 20, 2018 · 9 comments

@jvhagey

jvhagey commented Jul 20, 2018

Hi Anvi'o team!

Seems like we have been talking to y'all a lot these days :), so thanks for making such a cool tool!
I ran into a new bug:
After having issues with my custom HMM database when running anvi-summarize, I decided to remake the whole thing with a smaller database. I am using anvi'o 5.1, installed into a conda environment, on Ubuntu 16.04.4 LTS (GNU/Linux 4.4.0-127-generic x86_64).

anvi-self-test --version gives:

Anvi'o version ...............................: margaret (vunknown)
Profile DB version ...........................: 29
Contigs DB version ...........................: 12
Pan DB version ...............................: 12
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1

After building the contigs database I added COGs and then taxonomy via Kaiju. A custom HMM database is currently running. When I started another run with a different custom HMM set, I got the following error:

Config Error: This is one of the core functions of anvi'o you never want to hear from, but    
              there seems to be something wrong with the table 'hmm_hits' that you are trying 
              to read from. While there are 98945 items in this table, there are only 94879   
              unique keys, which means some of them are going to be overwritten when this     
              function creates a final dictionary of data to return. This may be a programmer 
              error when the data was being inserted into the database, but needs fixin'      
              before we can continue. If you are a user, please get in touch with anvi'o      
              developers about this error. If you are a programmer, you probably did something
              wrong :(    
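The error above describes a specific failure: when table rows are loaded into a dictionary keyed on one column, rows that share a key silently replace each other, so the item count and the unique-key count must match. A minimal Python sketch of that check, assuming (hypothetically) that each row's first field is its key; `rows_to_dict` and the toy rows are illustrative, not anvi'o's actual code:

```python
def rows_to_dict(rows):
    """Build {key: rest_of_row} from rows whose first field is the key.

    Raises ValueError if keys repeat, because later rows would otherwise
    silently overwrite earlier ones in the dictionary.
    """
    keys = [row[0] for row in rows]
    unique_keys = set(keys)
    if len(unique_keys) != len(rows):
        raise ValueError(
            f"{len(rows)} items but only {len(unique_keys)} unique keys; "
            f"{len(rows) - len(unique_keys)} row(s) would be overwritten"
        )
    return {row[0]: row[1:] for row in rows}


# Toy rows: two hits were written with the same key (1).
rows = [
    (0, "Rinke_et_al", "geneA"),
    (1, "Rinke_et_al", "geneB"),
    (1, "Methano_genes", "mcrA"),
]

try:
    rows_to_dict(rows)
except ValueError as err:
    print(err)  # 3 items but only 2 unique keys; 1 row(s) would be overwritten
```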

That doesn't seem good. I noticed that the output of the default HMM run says this:

Psst. Your fancy HMM profile 'Ribosomal_RNAs' speaking
===============================================
Alright! You just called an HMM profile that runs on contigs. Because it is not
working with anvi'o gene calls directly, the resulting hits will need to be
added as 'new gene calls' into the contigs database. This is a new feature, and
if it starts screwing things up for you please let us know. Other than that
you're pretty much golden. Carry on.

Don't know if that is related or not.

Here is the db info:
anvi-db-info fixed.contigsV5_1000.db

DB Info (no touch)
===============================================
Database Path ................................: fixed.contigsV5_1000.db
Description ..................................: [Not found, but it's OK]
Type .........................................: contigs
Version ......................................: 12

DB Info (no touch also)
===============================================
project_name .................................: CDRF_Metagenome_2016
contigs_db_hash ..............................: hash6e00a2e8
split_length .................................: 20000
kmer_size ....................................: 4
num_contigs ..................................: 824665
total_length .................................: 1529965316
num_splits ...................................: 824720
genes_are_called .............................: 1
splits_consider_gene_calls ...................: 1
creation_date ................................: 1532060745.22023
gene_level_taxonomy_source ...................: kaiju
gene_function_sources ........................: COG_FUNCTION,COG_CATEGORY

What should I do about this config error? Thanks for your help!

@meren
Member

meren commented Jul 23, 2018

Crap. This looks bad, and as anvi'o says, it should never have happened.

Based on your message, it seems this only happened because of the custom HMMs. Can you confirm that?

@guyleonard

Watching this; I'm also getting the same error now. Not sure if it's related to my using the BUSCO HMMs or not. It doesn't look like they share gene IDs, but maybe the IDs aren't the issue...?

@jvhagey
Author

jvhagey commented Jul 25, 2018

Yes, @meren
I remade the contigs database and ran the default HMMs plus a custom HMM set containing only one model (so it would run faster), and then I got the error. So I believe it's due to the custom HMMs.

@meren
Member

meren commented Jul 25, 2018

Thank you very much, @jvhagey. This is very helpful. I will look into this this evening!

@meren
Member

meren commented Jul 25, 2018

@jvhagey, I've been running custom HMMs and I can't seem to reproduce this problem :( Would you mind sending your test HMM directory with one model, and listing the exact command lines you used to run into this error?

@jvhagey
Author

jvhagey commented Jul 25, 2018

@meren here ya go!

#making the database
anvi-script-reformat-fasta -o fixed.contigsV5_1000.fa -l 1000 --simplify-names --report-file contigs_1000.reformatA.txt ../../MEGAHIT/C5_Results/final.contigs.fa
anvi-gen-contigs-database -f fixed.contigsV5_1000.fa -o fixed.contigsV5_1000C.db --project-name 'CDRF_Metagenome_2016'

#Adding hmms
anvi-run-hmms -T 25 -c fixed.contigsV5_1000C.db
anvi-run-hmms -T 12 -c fixed.contigsV5_1000C.db --hmm-profile-dir /Methano_genes/

#Here are the HMMs currently in the database:
anvi-delete-hmms -l -c fixed.contigsV5_1000C.db

  • BUSCO_83_Protista [type: singlecopy] [num genes: 83]
  • Rinke_et_al [type: singlecopy] [num genes: 162]
  • Ribosomal_RNAs [type: Ribosomal_RNAs] [num genes: 12]
  • Methano_genes [type: Methanogenesis_Gene_mcrA] [num genes: 1]
  • Campbell_et_al [type: singlecopy] [num genes: 139]

The config error now states: "While there are 94886 items in this table, there are only 94879 unique keys." The difference between these two numbers is 7, which is the same as the number of raw hits the custom HMM found. Looks like unique keys aren't being made for those hits? I can't look at the structure of the SQL tables to dig further, since we don't currently have SQL on our server.

genes.hmm.gz
genes.txt
kind.txt
noise_cutoff_terms.txt
reference.txt
target.txt
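Since a contigs database is an SQLite file, Python's built-in `sqlite3` module can inspect it without a separate SQL client on the server. A minimal sketch, assuming the table is `hmm_hits` (the name from the error message) and, as an unverified assumption, that its first column is the key the error refers to; `inspect_table` and `_quote` are hypothetical helpers. It reports the column layout, the row count, and the number of distinct keys, so the 94886 - 94879 = 7 gap can be checked directly:

```python
import sqlite3
from pathlib import Path


def _quote(name):
    # SQL identifiers cannot be bound as parameters, so quote them by hand.
    return '"' + name.replace('"', '""') + '"'


def inspect_table(db_path, table="hmm_hits"):
    """Return (column_names, total_rows, unique_keys) for `table`.

    ASSUMPTION: the first column is the key used when the table is read
    into a dictionary; check the returned column names before trusting it.
    """
    # Open read-only through a file: URI so the database is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # PRAGMA table_info yields one row per column: (cid, name, type, ...).
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({_quote(table)})")]
        if not columns:
            raise ValueError(f"table {table!r} not found in {db_path}")
        total = conn.execute(f"SELECT COUNT(*) FROM {_quote(table)}").fetchone()[0]
        unique = conn.execute(
            f"SELECT COUNT(DISTINCT {_quote(columns[0])}) FROM {_quote(table)}"
        ).fetchone()[0]
        return columns, total, unique
    finally:
        conn.close()
```

For example, `cols, total, unique = inspect_table("fixed.contigsV5_1000C.db")` on the database from the commands above; if the first-column assumption holds, `total - unique` should equal the gap reported in the error.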

@meren
Member

meren commented Jul 28, 2018

Thanks for sending these, @jvhagey. But no matter what I do, I can't reproduce this error in either master or v4 :/

@jvhagey
Author

jvhagey commented Jul 30, 2018

@meren after some playing around, it seems the issue might be running multiple anvi-run-hmms jobs at the same time. I was running the default HMMs (single-copy genes) at the same time as the custom set when I got the error. When I wait for each HMM set to finish before starting the next, I don't get the error. In the future I will just run one at a time to avoid the issue. Thanks for checking this out.
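A minimal sketch of the workaround described above: start each `anvi-run-hmms` job only after the previous one has exited. It uses only the flags that appear in this thread (`-T`, `-c`, `--hmm-profile-dir`); `build_commands` and `run_one_at_a_time` are hypothetical helpers, and the runner is injectable so the logic can be checked without anvi'o installed:

```python
import subprocess


def build_commands(contigs_db, threads, profile_dirs):
    """Return one anvi-run-hmms argv list per HMM set.

    A `None` entry in profile_dirs means the default HMM collections.
    """
    commands = []
    for profile_dir in profile_dirs:
        cmd = ["anvi-run-hmms", "-T", str(threads), "-c", contigs_db]
        if profile_dir is not None:
            cmd += ["--hmm-profile-dir", profile_dir]
        commands.append(cmd)
    return commands


def run_one_at_a_time(commands, runner=subprocess.run):
    """Run commands strictly in sequence and stop at the first failure.

    subprocess.run blocks until the process exits, and check=True raises
    CalledProcessError on a non-zero exit status, so two jobs never write
    to the contigs database at the same time.
    """
    for cmd in commands:
        runner(cmd, check=True)
```

For example, `run_one_at_a_time(build_commands("fixed.contigsV5_1000C.db", 12, [None, "/Methano_genes/"]))` runs the default HMMs first and the custom set only after they finish.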

@meren
Member

meren commented Jul 30, 2018

This makes a lot of sense! We probably need to think of a better way to check for that. Thank you very much for looking into this further!
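One generic way to "check for that" is an advisory lock held for the whole duration of a write job, so a second job fails fast instead of interleaving its inserts with the first. This is not how anvi'o handles it; it is a minimal POSIX-only sketch using Python's `fcntl.flock`, with a hypothetical lock-file name (`<db>.lock`) and a hypothetical helper `exclusive_db_lock`:

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_db_lock(db_path):
    """Hold an exclusive advisory lock on `<db_path>.lock` (hypothetical name).

    Raises BlockingIOError right away if another process (or another open
    file description in this process) already holds the lock.
    """
    lock_path = db_path + ".lock"
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB: fail immediately instead of waiting for the other job.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Advisory locks only help if every writer takes the same lock before touching the database; a job that skips the lock is not stopped.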


3 participants