[BUG] anvi-estimate-scg-taxonomy crashes when external-genomes file contains contigs-db with old scg names #2225
Comments
I think the real solution is to bump the contigs-db version and write a migration script that simply removes the SCG taxonomy results :/ It will be annoying for anyone who has been using the main branch, but it will be the most reliable way to avoid all future hiccups.
Good call! Do we need multiple migration tasks, or is this enough to write a new migration script?
Your intuition was right: a new migration script along with a version bump was enough. I've made some changes to your version; please test it, and feel free to merge it when you're ready :)
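The migration idea above boils down to two database changes: bump the contigs-db version and wipe the SCG taxonomy results so that anvi-run-scg-taxonomy has to be re-run. Below is a minimal Python sketch of that idea, not the actual anvi'o migration script. It assumes a `self` key/value table containing a `version` key and an `scg_taxonomy` table (the latter two tables appear in the sqlite3 queries in this thread; the `version` key and `value` column name are assumptions). The version strings, the function name `migrate_scg_taxonomy`, and the way the self keys are reset are all hypothetical.

```python
import sqlite3

# Hypothetical version strings: the thread does not say which contigs-db
# version the real migration script moves from or to.
FROM_VERSION = "20"
TO_VERSION = "21"


def migrate_scg_taxonomy(conn: sqlite3.Connection) -> None:
    """Bump the contigs-db version and drop stale SCG taxonomy results.

    Sketch only. Assumes a `self` table with `key`/`value` columns and a
    'version' key, plus an `scg_taxonomy` table.
    """
    row = conn.execute("SELECT value FROM self WHERE key = 'version'").fetchone()
    if row is None or row[0] != FROM_VERSION:
        found = None if row is None else row[0]
        raise ValueError(f"expected contigs-db version {FROM_VERSION}, found {found}")

    with conn:  # commits on success, rolls back if any statement fails
        # Remove every stored SCG hit; users re-run anvi-run-scg-taxonomy afterwards.
        conn.execute("DELETE FROM scg_taxonomy")
        # Reset the self entries describing the previous run (hypothetical reset values).
        conn.execute("UPDATE self SET value = '0' WHERE key = 'scg_taxonomy_was_run'")
        conn.execute("UPDATE self SET value = '' WHERE key = 'scg_taxonomy_database_version'")
        conn.execute("UPDATE self SET value = ? WHERE key = 'version'", (TO_VERSION,))
```

Doing all updates inside one transaction means a failure halfway leaves the contigs-db untouched, and the version check makes a second run refuse to migrate an already-migrated database.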
Thanks for the code updates @meren! Here is the PR: #2226
Here is a successful test I ran with IGD:
cd INFANT-GUT-TUTORIAL
# migrate all external-genomes
for db in additional-files/pangenomics/external-genomes/*.db; do anvi-migrate "$db" --migrate-safely; done
# run scg-taxonomy
anvi-run-scg-taxonomy -c additional-files/pangenomics/external-genomes/Enterococcus_faecalis_6512.db --num-threads 1
# print table updates
query="SELECT * FROM scg_taxonomy;"
$ sqlite3 additional-files/pangenomics/external-genomes/Enterococcus_faecalis_6512.db "$query"
2066|Ribosomal_L1|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
2464|Ribosomal_L13|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
156|Ribosomal_L14|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
153|Ribosomal_L16|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|
174|Ribosomal_L17|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
1556|Ribosomal_L19|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|
633|Ribosomal_L20|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
692|Ribosomal_L21p|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
151|Ribosomal_L22|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
158|Ribosomal_L5|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
149|Ribosomal_L2|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
166|Ribosomal_L27A|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
146|Ribosomal_L3|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
147|Ribosomal_L4|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
172|Ribosomal_S11|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
2321|Ribosomal_S15|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
1400|Ribosomal_S16|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
1818|Ribosomal_S2|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
6|Ribosomal_S6|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
140|Ribosomal_S7|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
160|Ribosomal_S8|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
2463|Ribosomal_S9|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
# Get updated values
db_file="additional-files/pangenomics/external-genomes/Enterococcus_faecalis_6512.db"
key_to_filter="scg_taxonomy_was_run"
query="SELECT * FROM self WHERE key = '$key_to_filter';"
$ sqlite3 "$db_file" "$query"
scg_taxonomy_was_run|1
key_to_filter="scg_taxonomy_database_version"
query="SELECT * FROM self WHERE key = '$key_to_filter';"
$ sqlite3 "$db_file" "$query"
scg_taxonomy_database_version|GTDB: v214.1; Anvi'o: v1
The self values look like they are updating properly :)
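The shell checks above build each SQL string by interpolating `$key_to_filter`. The same lookup can be sketched in Python with a parameterized query, which sidesteps quoting problems if a key ever contains a quote character. The function name `get_self_values` is hypothetical; the `self` table and its key/value layout follow the output shown above, and the column name `value` is an assumption.

```python
import sqlite3
from contextlib import closing


def get_self_values(db_path: str, keys: list[str]) -> dict[str, str]:
    """Return {key: value} for the requested keys in a contigs-db `self` table.

    Uses a parameterized IN (...) query instead of interpolating the key
    into the SQL string.
    """
    placeholders = ", ".join("?" for _ in keys)
    query = f"SELECT key, value FROM self WHERE key IN ({placeholders})"
    # closing() makes sure the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(db_path)) as conn:
        return dict(conn.execute(query, keys).fetchall())
```

For example, calling `get_self_values(db_file, ["scg_taxonomy_was_run", "scg_taxonomy_database_version"])` on the migrated Enterococcus_faecalis_6512.db should return both keys with the values printed above, in a single round trip.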
OK, I found an edge case! Here is an example of the problem:
unzip test_db.zip
cd test_db
anvi-script-gen-genomes-file --input-dir . -o external-genomes.txt
$ anvi-estimate-scg-taxonomy -M external-genomes.txt \
--metagenome-mode \
--scg-name-for-metagenome-mode Ribosomal_L19 \
--raw-output \
-O asdf
Traceback (most recent call last):
File "/Users/mschechter/github/anvio/bin/anvi-estimate-scg-taxonomy", line 104, in <module>
main(args)
File "/Users/mschechter/github/anvio/anvio/terminal.py", line 915, in wrapper
program_method(*args, **kwargs)
File "/Users/mschechter/github/anvio/bin/anvi-estimate-scg-taxonomy", line 39, in main
t.estimate()
File "/Users/mschechter/github/anvio/anvio/taxonomyops/scg.py", line 447, in estimate
scg_taxonomy_super_dict_multi = self.get_scg_taxonomy_super_dict_for_metagenomes()
File "/Users/mschechter/github/anvio/anvio/taxonomyops/scg.py", line 937, in get_scg_taxonomy_super_dict_for_metagenomes
scg_taxonomy_super_dict[metagenome_name] = SCGTaxonomyEstimatorSingle(args, progress=progress_quiet, run=run_quiet).get_items_taxonomy_super_dict()
File "/Users/mschechter/github/anvio/anvio/taxonomyops/scg.py", line 999, in __init__
TaxonomyEstimatorSingle.__init__(self, skip_init=skip_init)
File "/Users/mschechter/github/anvio/anvio/taxonomyops/__init__.py", line 175, in __init__
self.init()
File "/Users/mschechter/github/anvio/anvio/taxonomyops/__init__.py", line 179, in init
self.init_items_data()
File "/Users/mschechter/github/anvio/anvio/taxonomyops/__init__.py", line 269, in init_items_data
self.item_name_to_gene_caller_id_dict[item_gene_name].add(gene_callers_id)
KeyError: 'Ribosomal_L9_C'
I think this problem occurs because, if you run anvi-run-scg-taxonomy on a contigs-db that does not contain any of the SCGs in the new list, the program exits early, as shown here. Here are two possible solutions, and I don't know which way to go given that we already merged the migration script:
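The KeyError comes from a line that adds a gene caller ID to a set stored under an SCG name in `item_name_to_gene_caller_id_dict`. Assuming that dictionary is pre-populated with the current SCG names (the traceback is consistent with this, but the initialization code is not shown here), the failure and a tolerant alternative can be sketched as follows. The function names and the SCG subset are hypothetical, for illustration only.

```python
# Hypothetical subset of the current SCG list; anvi'o keeps the real list internally.
CURRENT_SCGS = ["Ribosomal_L1", "Ribosomal_L19", "Ribosomal_S2"]


def group_hits_strict(hits, known_scgs):
    """Mimic the failing pattern: the dict is pre-filled with known SCG names,
    so an SCG name outside that list raises KeyError."""
    grouped = {name: set() for name in known_scgs}
    for gene_name, gene_callers_id in hits:
        grouped[gene_name].add(gene_callers_id)  # KeyError for unknown names
    return grouped


def group_hits_tolerant(hits, known_scgs):
    """Same grouping, but collect unknown SCG names instead of crashing."""
    grouped = {name: set() for name in known_scgs}
    unknown = set()
    for gene_name, gene_callers_id in hits:
        if gene_name in grouped:
            grouped[gene_name].add(gene_callers_id)
        else:
            unknown.add(gene_name)
    return grouped, unknown
```

Returning the unknown names, rather than silently dropping them, leaves the caller free to warn the user or skip the whole contigs-db.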
This should never have happened, @mschecht. Could this be a contigs-db you migrated with the previous version of the migration script? It has the following SCG in the table, even though Ribosomal_L9_C is not one of the SCGs we're using:
AND even though the version of the DB shows that it is new:
When I manually change the
(So this is not an edge case others will run into; it is just a Frankenstein contigs-db that was updated with an earlier version of the migration script before it was merged to master, OR something similar :))
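The diagnosis above amounts to two checks: the `scg_taxonomy` table holds an SCG name that is not in the current list, while the `scg_taxonomy_database_version` self value claims the new database. A sketch of those checks follows. It assumes the gene name is stored in a column called `gene_name` (inferred from the second field of the table output earlier in the thread) and takes a hypothetical `known_scgs` argument in place of anvi'o's real SCG list; the function names are also hypothetical.

```python
import sqlite3


def find_stale_scg_names(conn: sqlite3.Connection, known_scgs) -> list[str]:
    """Return SCG names stored in scg_taxonomy that are not in known_scgs."""
    known = set(known_scgs)
    rows = conn.execute("SELECT DISTINCT gene_name FROM scg_taxonomy").fetchall()
    return sorted(name for (name,) in rows if name not in known)


def diagnose_contigs_db(conn: sqlite3.Connection, known_scgs):
    """Return (claimed SCG database version, stale SCG names) for one contigs-db."""
    row = conn.execute(
        "SELECT value FROM self WHERE key = 'scg_taxonomy_database_version'"
    ).fetchone()
    claimed_version = row[0] if row else None
    return claimed_version, find_stale_scg_names(conn, known_scgs)
```

A contigs-db that reports the new version string but still returns stale names would be a candidate for this kind of Frankenstein database.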
Thanks for looking into this @meren! It must have been an artifact I introduced while developing. Glad it won't impact anyone!
Short description of the problem
anvi-estimate-scg-taxonomy crashes when external-genomes file contains contigs-db with old scg names.
anvi'o version
System info
macOS 13.2.1 (22D68)
Detailed description of the issue
Hi anvi'o team,
I am running anvi-estimate-scg-taxonomy with an external-genomes file and a specific SCG. Everything works fine until the program runs into a contigs-db that contains SCGs from the older version of the program. I dove into the code, and here is the issue:
Such contigs-dbs make it this far into the code because they have the updated scg-taxonomy self attribute "GTDB: v214.1; Anvi'o: v1" even though anvi-run-scg-taxonomy. I think there should be a sanity check while processing the external-genomes.txt to skip a contigs-db that does not have the target SCG needed for anvi-estimate-scg-taxonomy. I made a quick test below to demonstrate the issue.
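The sanity check proposed above could look roughly like the sketch below: read external-genomes.txt and keep only contigs-dbs whose `scg_taxonomy` table contains the target SCG. It assumes the usual anvi'o external-genomes layout (a tab-delimited file with `name` and `contigs_db_path` columns), a `gene_name` column in `scg_taxonomy`, and relative paths resolved against the current working directory. The function names are hypothetical, and this is not how anvi'o itself implements, or would implement, the check.

```python
import csv
import os
import sqlite3
from contextlib import closing


def has_scg(db_path: str, scg_name: str) -> bool:
    """True if the contigs-db at db_path has an SCG taxonomy hit for scg_name."""
    # sqlite3.connect() would silently create a missing file, so check first.
    if not os.path.exists(db_path):
        return False
    try:
        with closing(sqlite3.connect(db_path)) as conn:
            row = conn.execute(
                "SELECT 1 FROM scg_taxonomy WHERE gene_name = ? LIMIT 1", (scg_name,)
            ).fetchone()
    except sqlite3.DatabaseError:
        # For example: not a SQLite file, or no scg_taxonomy table in this db.
        return False
    return row is not None


def filter_external_genomes(path: str, scg_name: str) -> tuple[list[str], list[str]]:
    """Split genome names from an external-genomes file into (kept, skipped)."""
    kept, skipped = [], []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            target = kept if has_scg(row["contigs_db_path"], scg_name) else skipped
            target.append(row["name"])
    return kept, skipped
```

Reporting the skipped genome names, rather than dropping them silently, would tell users which contigs-dbs need anvi-run-scg-taxonomy again before re-running with `--scg-name-for-metagenome-mode Ribosomal_L19`.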
Cheers,
Matt
Files / commands to reproduce the issue
Here is a link to download these contigs-dbs: https://drive.google.com/drive/folders/1vY59vrhhW69TH-43W0MV8kBMe_Wfy9Jd?usp=sharing