Fast taxtable #87

Merged
crosenth merged 32 commits into master from fast_taxtable
Feb 27, 2017
@crosenth crosenth commented Oct 22, 2016

About a month's worth of work toward two goals: speeding up taxit taxtable and adding functionality to taxit add_nodes. One change is that ncbi_taxonomy.db is now a required positional argument rather than an optional argument for many subcommands, such as new_database, update_taxids and add_nodes. There is also a new RANKS table in ncbi_taxonomy.db that holds a hierarchy of taxonomic ranks.
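To illustrate the idea of a rank hierarchy table, here is a minimal sketch, not the actual schema in ncbi_taxonomy.db. The table name `ranks`, the columns `name` and `height`, and the shortened rank list are assumptions made for the example; the only detail taken from the commits is that ranks are indexed from lowest to highest so they can be sorted later.

```python
import sqlite3

# Hypothetical, shortened rank list ordered lowest to highest.
# The real RANKS table may hold more ranks and use different column names.
RANKS_LOW_TO_HIGH = [
    "species", "genus", "family", "order",
    "class", "phylum", "superkingdom", "root",
]


def build_ranks_table(con, ranks):
    """Create a ranks table where the lowest rank gets index 0."""
    con.execute(
        "CREATE TABLE ranks (name TEXT PRIMARY KEY, height INTEGER NOT NULL)"
    )
    con.executemany(
        "INSERT INTO ranks (name, height) VALUES (?, ?)",
        [(name, index) for index, name in enumerate(ranks)],
    )


def sort_high_to_low(con, ranks):
    """Return the given rank names ordered from highest to lowest rank."""
    placeholders = ",".join("?" * len(ranks))
    rows = con.execute(
        "SELECT name FROM ranks WHERE name IN (%s) ORDER BY height DESC"
        % placeholders,
        list(ranks),
    )
    return [row[0] for row in rows]


con = sqlite3.connect(":memory:")
build_ranks_table(con, RANKS_LOW_TO_HIGH)
ordered = sort_high_to_low(con, ["genus", "phylum", "species"])
```

Storing the order as an integer column lets any query sort lineage columns with a plain `ORDER BY` instead of hard-coding the rank order in application code.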

commit a051e0e
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Fri Sep 23 13:55:14 2016 -0700

    added some function comments

commit 2b38851
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Fri Sep 23 12:55:25 2016 -0700

    created species_is_classified function

commit 5f03f38
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Fri Sep 23 12:54:56 2016 -0700

    making this less loud

commit badac7a
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Wed Sep 21 16:29:02 2016 -0700

    Removed --append-lineage.  Rest of script is MUCH faster using Pandas.

commit b2b9338
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Wed Sep 21 16:27:31 2016 -0700

    ranks indexed from lowest to highest in database table for sorting later

commit 7fdc05d
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Wed Sep 21 16:26:22 2016 -0700

    --from-taxid and --from-taxtable are now --from-id and --from-table

commit 7ba03ec
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Fri Sep 16 16:24:46 2016 -0700

    moved to subcommand taxtable

commit 8be153d
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Fri Sep 16 16:07:13 2016 -0700

    refreshing unittests, data was stale and incomplete

commit abdc319
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Fri Sep 16 14:44:52 2016 -0700

    taxid not valid as of 2014 http://www.ncbi.nlm.nih.gov/taxonomy/?term=477972

commit 7ee2f4d
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Wed Sep 14 14:55:48 2016 -0700

    moving a couple log messages

commit d9819d2
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Wed Sep 14 14:42:36 2016 -0700

    without pruning

commit 0834539
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Wed Sep 14 14:41:08 2016 -0700

    faster way to build ncbi taxonomy.  Includes initial taxonomic pruning.

commit 7d84d67
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Wed Sep 14 14:40:08 2016 -0700

    adding database schema back and removing building of full taxtable

commit 19d4c91
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Fri Sep 2 17:19:28 2016 -0700

    building full taxtable with Pandas and inserting into database

commit 0cb37bd
Author: Chris Rosenthal <crosenth@uw.edu>
Date:   Fri Sep 2 17:19:05 2016 -0700

    clobber ncbi database as well if --clobber
…er with lowest rank having the sql index of 0
… the case in ncbi releases in the fungal kingdom

crosenth commented Nov 8, 2016

Need to generate a report of all the intermediate ranks above species (below_ ranks) to decide if we should just drop them entirely and reset the affected nodes' parent_ids.
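One possible shape for that report and for the parent reset, sketched in plain Python. This is an illustration only: the in-memory `nodes` mapping, the tax_ids, and the helper names are hypothetical, and it assumes the intermediate ranks can be recognized by a `below_` prefix.

```python
from collections import Counter


def is_below_rank(rank):
    """Assumption: intermediate ranks are named with a 'below_' prefix."""
    return rank.startswith("below_")


def report_below_ranks(nodes):
    """Count how many nodes carry each intermediate rank."""
    return Counter(rank for _, rank in nodes.values() if is_below_rank(rank))


def drop_ranks(nodes, should_drop):
    """Remove nodes with unwanted ranks and reattach their children.

    nodes maps tax_id -> (parent_id, rank). Each kept node's parent_id is
    reset to its nearest ancestor that is not dropped.
    """
    def kept_ancestor(tax_id):
        parent_id = nodes[tax_id][0]
        while (parent_id in nodes
               and should_drop(nodes[parent_id][1])
               and nodes[parent_id][0] != parent_id):
            parent_id = nodes[parent_id][0]
        return parent_id

    return {
        tax_id: (kept_ancestor(tax_id), rank)
        for tax_id, (_, rank) in nodes.items()
        if not should_drop(rank)
    }


# Hypothetical toy taxonomy: root -> genus -> below_genus -> species
nodes = {
    "1": ("1", "root"),
    "10": ("1", "genus"),
    "11": ("10", "below_genus"),
    "12": ("11", "species"),
}
report = report_below_ranks(nodes)
pruned = drop_ranks(nodes, is_below_rank)
```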


crosenth commented Nov 8, 2016

Generate some taxtables from this branch and from master, then compare their md5sums to make sure the output is exactly the same.
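A minimal sketch of that comparison using only the standard library. The function names and any file paths passed to them are placeholders for wherever the two taxtables are written.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(path_a, path_b):
    """True when both files are byte-for-byte identical by MD5."""
    return md5sum(path_a) == md5sum(path_b)
```

Note that a byte-level checksum also flags differences in row order or line endings, so a mismatch does not necessarily mean the taxonomic content differs.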

@crosenth crosenth merged commit eb35a89 into master Feb 27, 2017
@crosenth crosenth deleted the fast_taxtable branch February 27, 2017 23:38