NCBITaxa() error #469

Closed
bowmanjeffs opened this issue Aug 13, 2020 · 12 comments

@bowmanjeffs

Apologies for the duplicate post over on the Google Group, but I thought it might be better to post this as an issue. After working reliably for some time, NCBITaxa() threw the following error. A fresh install, a new computer, etc. didn't solve the problem.

Inserting synonyms: 30000
Traceback (most recent call last):
File "", line 1, in
File "/home/jsbowman/.local/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 110, in __init__
self.update_taxonomy_database(taxdump_file)
File "/home/jsbowman/.local/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 129, in update_taxonomy_database
update_db(self.dbfile)
File "/home/jsbowman/.local/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 760, in update_db
upload_data(dbfile)
File "/home/jsbowman/.local/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 802, in upload_data
db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
sqlite3.IntegrityError: UNIQUE constraint failed: synonym.spname, synonym.taxid

@jj-umn

jj-umn commented Aug 14, 2020

The problem seems to be that spname is declared with COLLATE NOCASE, so synonyms that differ only in case collide on the primary key:
CREATE TABLE synonym (taxid INT,spname VARCHAR(50) COLLATE NOCASE, PRIMARY KEY (spname, taxid));
$ grep -i 'Phyllodactylus Lesueurii' syn.tab
679170 Phyllodactylus Lesueurii
679170 Phyllodactylus lesueurii
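
A minimal reproduction, assuming Python's built-in sqlite3 module and an in-memory database standing in for the real ete3 taxonomy file. The table definition and the two rows are copied from this comment; everything else is illustrative.

```python
import sqlite3

# In-memory stand-in for the ete3 taxonomy database (assumption for this sketch).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE synonym (taxid INT, spname VARCHAR(50) COLLATE NOCASE, "
    "PRIMARY KEY (spname, taxid));"
)

rows = [
    (679170, "Phyllodactylus Lesueurii"),
    (679170, "Phyllodactylus lesueurii"),
]

error = None
try:
    for taxid, spname in rows:
        db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
except sqlite3.IntegrityError as exc:
    # NOCASE makes the two spellings compare equal inside the primary key,
    # so the second insert violates the uniqueness constraint.
    error = exc

print(type(error).__name__, error)
```

The first insert succeeds and the second one raises, which matches the point in the traceback where the update stops.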

@KarolisMatjosaitis

KarolisMatjosaitis commented Aug 17, 2020

Until a PR with a fix comes along, you can edit line 802 as follows:
db.execute("INSERT or IGNORE INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))

Or change line 785 to:
CREATE TABLE synonym (taxid INT,spname VARCHAR(50), PRIMARY KEY (spname, taxid));
Thanks to Jim.

I would guess that the second option is better.
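
For anyone weighing the two options, here is a rough comparison on a throwaway in-memory table. It assumes only the two SQL statements quoted above; the helper `load` and the upper-case test query are made up for this sketch and are not part of ete3.

```python
import sqlite3

ROWS = [
    (679170, "Phyllodactylus Lesueurii"),
    (679170, "Phyllodactylus lesueurii"),
]


def load(create_sql, insert_sql):
    """Create a throwaway in-memory table, insert ROWS, return the connection."""
    db = sqlite3.connect(":memory:")
    db.execute(create_sql)
    db.executemany(insert_sql, ROWS)
    return db


# Option 1: keep COLLATE NOCASE and silently skip rows that collide.
opt1 = load(
    "CREATE TABLE synonym (taxid INT, spname VARCHAR(50) COLLATE NOCASE, "
    "PRIMARY KEY (spname, taxid));",
    "INSERT OR IGNORE INTO synonym (taxid, spname) VALUES (?, ?);",
)
# Option 2: drop COLLATE NOCASE and keep the plain INSERT.
opt2 = load(
    "CREATE TABLE synonym (taxid INT, spname VARCHAR(50), PRIMARY KEY (spname, taxid));",
    "INSERT INTO synonym (taxid, spname) VALUES (?, ?);",
)

count_all = "SELECT COUNT(*) FROM synonym;"
count_match = "SELECT COUNT(*) FROM synonym WHERE spname = ?;"
name = "PHYLLODACTYLUS LESUEURII"  # deliberately in a different case

opt1_rows = opt1.execute(count_all).fetchone()[0]
opt2_rows = opt2.execute(count_all).fetchone()[0]
opt1_hits = opt1.execute(count_match, (name,)).fetchone()[0]
opt2_hits = opt2.execute(count_match, (name,)).fetchone()[0]
print("option 1:", opt1_rows, "row(s),", opt1_hits, "match(es)")
print("option 2:", opt2_rows, "row(s),", opt2_hits, "match(es)")
```

In this sketch option 1 stores one row that still matches the upper-case query, while option 2 stores both rows but no longer matches it, so dropping NOCASE changes how name lookups behave.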

@simone-pignotti

simone-pignotti commented Aug 17, 2020

785
CREATE TABLE synonym (taxid INT,spname VARCHAR(50), PRIMARY KEY (spname, taxid));

Removing COLLATE NOCASE at line 785 fixes the issue for me, thank you!

@bowmanjeffs
Author

Worked for me too, thanks! Will leave this issue open for developers.

@vrmarcelino
Copy link

Hi there!

I am running into the same problem.
Which file can we edit to fix this issue? I can't seem to find those lines in ete.py.

Thanks a lot!

@KarolisMatjosaitis

The exact location depends on your setup, but the file within the package is:
ete3/ncbi_taxonomy/ncbiquery.py

@simone-pignotti

It's the last file in the error trace. In the first post of this issue, that would be:
/home/jsbowman/.local/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py
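
If you don't have a traceback at hand, the path can also be looked up from Python. A small sketch using the standard importlib machinery; `module_source_path` is a hypothetical helper name, not something ete3 provides.

```python
import importlib.util


def module_source_path(module_name):
    """Return the file a module would be loaded from, or None if it is not installed."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when a parent package (for example ete3 itself) cannot be imported.
        return None
    return spec.origin if spec is not None else None


# Prints the full path to ncbiquery.py, or None when ete3 is not installed.
print(module_source_path("ete3.ncbi_taxonomy.ncbiquery"))
```

Note that looking up a dotted name imports the parent package, so this has to run in the same environment that your pipeline uses.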

@vrmarcelino

Thanks for the super quick reply.
Found it, and it worked!
Thanks!

@eliaw-twist

Oops, maybe I should have read this thread before opening PR #471.

@eliaw-twist

eliaw-twist commented Aug 19, 2020

There was an additional problem with duplicate synonyms that differ only in quoting (not just casing), e.g.:

[('92835', '"Aureobacterium ketoreductum"'),
 ('92835', 'Aureobacterium ketoreductum')]

These appear in the syn.tab file being produced.
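
One way such pairs could be collapsed, as a sketch only. Whether ete3 should strip the quotes at all is an assumption here, and `strip_wrapping_quotes` is a hypothetical helper, not an ete3 function.

```python
def strip_wrapping_quotes(name):
    """Remove one pair of double quotes wrapping the whole name, if present."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name


rows = [
    ("92835", '"Aureobacterium ketoreductum"'),
    ("92835", "Aureobacterium ketoreductum"),
]

# dict keys keep insertion order, so the first occurrence of each pair survives.
unique = list(
    dict.fromkeys((taxid, strip_wrapping_quotes(name)) for taxid, name in rows)
)
print(unique)
```

With the two rows above, both normalize to the same (taxid, name) pair, so only one entry remains.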

@Prunoideae

For those who use ete3 in a pipeline and are wary of changing the module file, I have a runtime patch here:

from ete3 import NCBITaxa
try:
    import ast
    import inspect
    import sys
    print("Patching NCBITaxa's base methods. For the reason, see https://github.com/etetoolkit/ete/issues/469.\n")
    code_to_patch = """db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))"""
    patched_code = """db.execute("INSERT OR REPLACE INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))"""

    ncbiquery = sys.modules[NCBITaxa.__module__]
    lines_code = [x.replace(code_to_patch, patched_code)
                  for x in inspect.getsourcelines(ncbiquery.upload_data)[0]]
    # Insert info message to see if patch is really applied
    lines_code.insert(1, "    print('\\nIf this message is shown, the patch was applied successfully!')\n")
    # Insert external import and constants since only this function is patched and recompiled
    lines_code.insert(1, "    import os, sqlite3, sys\n")
    lines_code.insert(1, "    DB_VERSION = 2\n")
    lines_code = "".join(lines_code)

    # Compile and apply the patch
    ast_tree = ast.parse(lines_code)
    patched_function = compile(ast_tree, "<string>", mode="exec")
    mod_dummy = {}
    exec(patched_function, mod_dummy)
    ncbiquery.upload_data = mod_dummy["upload_data"]
except Exception:
    print("Patching failed; taxonomy data downloaded from FTP may fail to load into the ETE3 database!")
finally:
    print("Patch finished.")

Importing NCBITaxa and then running this code should fix the problem when updating the database, because it replaces the faulty statement with the corrected one at runtime. This method is quite tricky and dangerous, but since it changes only this one small piece, I think it's OK to use here as an emergency patch. I tested it on my machine and nothing went wrong.
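
Note that this patch uses INSERT OR REPLACE, whereas the edit suggested earlier in the thread uses INSERT OR IGNORE. A small sketch of the difference on an in-memory table; the table definition is the one quoted in this thread, and `surviving_names` is a made-up helper.

```python
import sqlite3

CREATE = (
    "CREATE TABLE synonym (taxid INT, spname VARCHAR(50) COLLATE NOCASE, "
    "PRIMARY KEY (spname, taxid));"
)
ROWS = [
    (679170, "Phyllodactylus Lesueurii"),
    (679170, "Phyllodactylus lesueurii"),
]


def surviving_names(conflict_clause):
    """Insert ROWS with the given conflict clause and return the stored spellings."""
    db = sqlite3.connect(":memory:")
    db.execute(CREATE)
    db.executemany(
        f"INSERT OR {conflict_clause} INTO synonym (taxid, spname) VALUES (?, ?);",
        ROWS,
    )
    return [row[0] for row in db.execute("SELECT spname FROM synonym;")]


replace_result = surviving_names("REPLACE")  # later row overwrites the earlier one
ignore_result = surviving_names("IGNORE")    # earlier row is kept, later one skipped
print(replace_result)
print(ignore_result)
```

Either way a single row is stored; the only difference is which spelling survives, and lookups on a NOCASE column match both.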

@jhcepas
Member

jhcepas commented Aug 29, 2020

Thanks everyone for reporting, debugging and providing workarounds.
This commit should fix the problem in a more permanent way.

I cannot remove COLLATE NOCASE from the DB definition, because that would affect name searches (currently case insensitive by default), so I basically added a few lines of code to the parsing function to make sure we ignore synonym duplicates.

Hope it helps! Reopen if any further issues are found.
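
The commit itself is not quoted in this thread, so the following is only a sketch of the general idea (dropping case-only duplicates before they reach the NOCASE table), not the actual ete3 code. `dedupe_synonyms` is a hypothetical name.

```python
import sqlite3


def dedupe_synonyms(pairs):
    """Yield (taxid, spname) pairs, skipping names that differ only by case."""
    seen = set()
    for taxid, spname in pairs:
        # lower() folds at least everything that COLLATE NOCASE treats as equal.
        key = (taxid, spname.lower())
        if key in seen:
            continue
        seen.add(key)
        yield taxid, spname


pairs = [
    ("679170", "Phyllodactylus Lesueurii"),
    ("679170", "Phyllodactylus lesueurii"),
    ("92835", "Aureobacterium ketoreductum"),
]
kept = list(dedupe_synonyms(pairs))

# The plain INSERT now succeeds against the unchanged NOCASE table definition.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE synonym (taxid INT, spname VARCHAR(50) COLLATE NOCASE, "
    "PRIMARY KEY (spname, taxid));"
)
db.executemany("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", kept)
print(kept)
```

Filtering at parse time leaves the schema alone, so name searches stay case insensitive.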
