-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
133 new ranks #137
133 new ranks #137
Conversation
- left section/subsection adjacent to eachother, otherwise order matches suggested in #133
e2eebc5
to
4038a01
Compare
@nhoffman @dhoogest - In addition to Dan's new rank additions I added support for a new rank "clade" that behaves like "no rank". Like "no rank", a "clade" can appear in any rank order and can loop (be a child and parent of itself). A node of rank "clade" will have its rank represented as an underscore prepended to its parent rank. Example taxtable: |
Good solution! |
I think the simplest solution is to treat the 6 rank loops in the exact same way as we've been handling no_rank |
Okay I have a candidate release here. @dhoogest @nhoffman - I created a group of ranks called RANK_LOOPS which all get the same parent rank + '_' treatment . As far as the new ranks that are not "loop ranks" I could not determine a rank order as none occur together in a single lineage and are all children of rank species at their lowest rank. |
ranks[idx] = ranks[idx - 1] + '_' | ||
except ValueError: | ||
return ranks | ||
return ranks | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More micro-optimizations:
- ranks is being copied twice, once in list(ranks) and again in enumerate(ranks[:]) - do you need both?
- RANK_LOOPS is being accessed in the global scope, which is more expensive than as a local variable
- again, best to use a set to test membership
I'd suggest something like (untested):
RANK_LOOPS_SET = set(RANK_LOOPS)
def replace_loops(ranks, replace=RANK_LOOPS_SET):
for idx, r in enumerate(list(ranks)):
if r in replace:
ranks[idx] = ranks[idx - 1] + '_'
return ranks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I was trying to change as little code as possible. I am not sure about list(ranks) maybe an iterator was being passed at one point 89ee1ec?
-
I will update
-
Thanks for the code, will update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates:
- Renamed RANK_LOOPS to UNORDERED_RANKS for clarity
- Global references to RANK_LOOPS has been replaced with local variables except in taxonomy.py (see comment below)
- self.unordered_ranks is a set()
@@ -86,8 +87,7 @@ def __init__(self, engine, NO_RANK='no_rank', schema=None): | |||
ranks = select([ranks_table.c.rank, ranks_table.c.height]).execute().fetchall() | |||
ranks = sorted(ranks, key=lambda x: int(x[1])) # sort by height | |||
self.ranks = [r[0] for r in ranks] # just the ordered ranks | |||
|
|||
self.NO_RANK = NO_RANK | |||
self.unordered_ranks = set(self.ranks) & set(UNORDERED_RANKS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Global taxtastic.ncbi.UNORDERED_RANKS var difficult to avoid here without updating database schema to identify "unordered" ranks. Note this is a continuation of the issue of how to identify ncbi "no_rank"
@@ -164,7 +159,6 @@ def test01(self): | |||
""" | |||
|
|||
executable = None | |||
DEVNULL = open(os.devnull, 'w') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not used and raises warning:
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/dev/null' mode='w' encoding='UTF-8'>
@@ -50,7 +50,7 @@ def action(args): | |||
engine = sqlalchemy.create_engine(args.url, echo=args.verbosity > 2) | |||
tax = Taxonomy(engine, schema=args.schema) | |||
|
|||
records = list(yaml.load_all(args.new_nodes)) | |||
records = list(yaml.load_all(args.new_nodes, Loader=yaml.SafeLoader)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No description provided.