Fast taxtable #87
Merged
commit a051e0e
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Fri Sep 23 13:55:14 2016 -0700
    added some function comments

commit 2b38851
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Fri Sep 23 12:55:25 2016 -0700
    created species_is_classified function

commit 5f03f38
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Fri Sep 23 12:54:56 2016 -0700
    making this less loud

commit badac7a
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Wed Sep 21 16:29:02 2016 -0700
    Removed --append-lineage. Rest of script is MUCH faster using Pandas.

commit b2b9338
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Wed Sep 21 16:27:31 2016 -0700
    ranks indexed from lowest to highest in database table for sorting later

commit 7fdc05d
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Wed Sep 21 16:26:22 2016 -0700
    --from-taxid and --from-taxtable are now --from-id and --from-table

commit 7ba03ec
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Fri Sep 16 16:24:46 2016 -0700
    moved to subcommand taxtable

commit 8be153d
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Fri Sep 16 16:07:13 2016 -0700
    refreshing unittests, data was stale and incomplete

commit abdc319
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Fri Sep 16 14:44:52 2016 -0700
    taxid not valid as of 2014 http://www.ncbi.nlm.nih.gov/taxonomy/?term=477972

commit 7ee2f4d
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Wed Sep 14 14:55:48 2016 -0700
    moving a couple log messages

commit d9819d2
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Wed Sep 14 14:42:36 2016 -0700
    without pruning

commit 0834539
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Wed Sep 14 14:41:08 2016 -0700
    faster way to build ncbi taxonomy. Includes initial taxonomic pruning.

commit 7d84d67
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Wed Sep 14 14:40:08 2016 -0700
    adding database schema back and removing building of full taxtable

commit 19d4c91
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Fri Sep 2 17:19:28 2016 -0700
    building full taxtable with Pandas and inserting into database

commit 0cb37bd
Author: Chris Rosenthal <crosenth@uw.edu>
Date: Fri Sep 2 17:19:05 2016 -0700
    clobber ncbi database as well if --clobber
…er with lowest rank having the sql index of 0
… the case in ncbi releases in the fungal kingdom
Need to generate a report of all the intermediate ranks above species (below_ ranks) to decide if we should just drop them entirely and reset the affected nodes' parent_ids.
Generate some taxtables from this branch and from master, then compare their md5sums to make sure the output is exactly the same.
… nodes that depend on other new nodes
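The md5sum comparison mentioned above could be scripted roughly as follows. This is a minimal sketch, not part of the PR: the helper names `md5sum` and `same_output` are hypothetical, and it assumes the two taxtables have already been written to disk by the two branches.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(path_a, path_b):
    """True when both files are byte-for-byte identical according to MD5."""
    return md5sum(path_a) == md5sum(path_b)
```

A byte-level checksum also flags differences in row or column order, so a mismatch does not by itself mean the taxonomic content differs.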
About a month's worth of work toward two goals: speeding up `taxit taxtable` and adding functionality to `taxit add_nodes`. One update is that ncbi_taxonomy.db is now a required positional argument rather than an optional argument for many subcommands, such as new_database, update_taxids and add_nodes. There is also a new RANKS table in ncbi_taxonomy.db which holds a hierarchy of taxonomic ranks.
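To illustrate how a ranks table indexed from lowest to highest can be used for sorting, here is a rough sketch using the standard `sqlite3` module. The table layout, the column names `name` and `height`, the helper functions, and the shortened rank list are all assumptions made for illustration; the actual schema in ncbi_taxonomy.db may differ.

```python
import sqlite3

# Illustrative subset of ranks, ordered lowest to highest (an assumption,
# not the full list stored in the real database).
RANKS_LOW_TO_HIGH = [
    "species", "genus", "family", "order",
    "class", "phylum", "superkingdom", "root",
]


def create_ranks_table(con, ranks):
    """Create a hypothetical ranks table; the lowest rank gets index 0."""
    con.execute(
        "CREATE TABLE ranks ("
        "name TEXT PRIMARY KEY, height INTEGER UNIQUE NOT NULL)"
    )
    con.executemany(
        "INSERT INTO ranks (name, height) VALUES (?, ?)",
        [(rank, index) for index, rank in enumerate(ranks)],
    )


def sort_ranks(con, names, highest_first=True):
    """Return the given rank names ordered by their stored index."""
    placeholders = ",".join("?" for _ in names)
    direction = "DESC" if highest_first else "ASC"
    rows = con.execute(
        f"SELECT name FROM ranks WHERE name IN ({placeholders}) "
        f"ORDER BY height {direction}",
        list(names),
    ).fetchall()
    return [row[0] for row in rows]


con = sqlite3.connect(":memory:")
create_ranks_table(con, RANKS_LOW_TO_HIGH)
print(sort_ranks(con, ["genus", "root", "species"]))
```

Storing the order in the database means code that builds taxtable columns can sort ranks with a query instead of keeping a hard-coded list in several places.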