
133 new ranks #137

Merged: 10 commits merged on Jul 23, 2020

@crosenth (Member) commented Jul 1, 2020

No description provided.

Daniel Hoogestraat and others added 2 commits June 29, 2020 15:29
- left section/subsection adjacent to each other; otherwise the order
  matches that suggested in #133
@crosenth self-assigned this Jul 2, 2020
@crosenth (Member, Author) commented Jul 2, 2020

@nhoffman @dhoogest - In addition to Dan's new rank additions, I added support for a new rank "clade" that behaves like "no rank". Like "no rank", a "clade" can appear at any position in the rank order and can loop (a clade can be both the child and the parent of another clade). A node of rank "clade" will have its rank represented as its parent's rank with an underscore appended.

Example taxtable: taxit taxtable --tax-ids 572265 1228987 1149873 191675
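The naming rule described above can be sketched in a few lines. This is an illustrative sketch, not taxtastic's implementation: the function name `name_ranks`, the example lineage, and the choice to leave an unordered rank at the very top of a lineage unchanged are all assumptions.

```python
def name_ranks(lineage_ranks, unordered=frozenset({'clade', 'no_rank'})):
    """Rename unordered ranks in a root-to-leaf list of rank names.

    Hypothetical helper: each rank in `unordered` becomes the (already
    renamed) rank above it plus '_', so a run of clades stacks underscores.
    """
    named = []
    for rank in lineage_ranks:
        if rank in unordered and named:
            named.append(named[-1] + '_')
        else:
            # ordered rank, or an unordered rank with no parent to borrow from
            named.append(rank)
    return named


# Hypothetical lineage with a looping clade at the end.
print(name_ranks(['superkingdom', 'phylum', 'clade', 'class', 'clade', 'clade']))
# ['superkingdom', 'phylum', 'phylum_', 'class', 'class_', 'class__']
```

Because each name is built from the previous, already-renamed entry, consecutive clades in this sketch get distinct names (class_, class__) instead of colliding in a single column.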

@crosenth requested a review from nhoffman July 2, 2020 01:43
@dhoogest (Collaborator) commented Jul 2, 2020


Good solution!

@crosenth (Member, Author) commented Jul 2, 2020

I found another loop in a different rank and decided to do a full count of rank loops:

[Screenshot from 2020-07-01 21-55-42: full count of rank loops]

So my solution will not work.
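One way to do a full count like this is to compare every node's rank with its parent's rank. The sketch below assumes a "rank loop" means a parent and child sharing the same rank, and uses a toy in-memory `nodes` mapping (tax_id to (parent_id, rank)) rather than a real taxonomy database; `count_rank_loops` is a hypothetical name.

```python
from collections import Counter


def count_rank_loops(nodes):
    """Count parent-child edges whose two nodes share the same rank.

    `nodes` maps tax_id -> (parent_id, rank). The root is assumed to be its
    own parent (as in NCBI nodes.dmp), so that self-edge is skipped.
    """
    loops = Counter()
    for tax_id, (parent_id, rank) in nodes.items():
        if parent_id == tax_id:
            continue  # skip the root's self-reference
        parent_rank = nodes[parent_id][1]
        if rank == parent_rank:
            loops[rank] += 1
    return loops


# Hypothetical toy taxonomy, not real NCBI data.
nodes = {
    '1': ('1', 'no_rank'),
    '2': ('1', 'superkingdom'),
    '3': ('2', 'clade'),
    '4': ('3', 'clade'),
    '5': ('4', 'phylum'),
    '6': ('5', 'no_rank'),
    '7': ('6', 'no_rank'),
}
print(count_rank_loops(nodes))
```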

@crosenth (Member, Author) commented Jul 2, 2020

I think the simplest solution is to treat all 6 looping ranks exactly the same way we've been handling no_rank.

@crosenth (Member, Author) commented Jul 6, 2020

Okay, I have a release candidate here. @dhoogest @nhoffman - I created a group of ranks called RANK_LOOPS, all of which get the same parent rank + '_' treatment. As for the new ranks that are not "loop ranks", I could not determine a rank order: none of them occur together in a single lineage, and at their lowest rank they are all children of rank species.
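The rank-order reasoning can be made concrete: two ranks can only be placed relative to each other if at least one lineage contains both. The sketch below uses hypothetical helper names and made-up lineages (`forma` and `varietas` are stand-in rank names, not necessarily the ranks added in this PR); it collects the ancestor/descendant pairs seen in lineages and checks whether a pair was ever observed.

```python
from itertools import combinations


def observed_order(lineages):
    """Collect (ancestor_rank, descendant_rank) pairs seen in any lineage.

    Each lineage is a root-to-leaf list of rank names.
    """
    pairs = set()
    for lineage in lineages:
        # combinations() keeps input order, so `above` precedes `below`
        for above, below in combinations(lineage, 2):
            if above != below:
                pairs.add((above, below))
    return pairs


def comparable(a, b, pairs):
    """True if some lineage places a above b or b above a."""
    return (a, b) in pairs or (b, a) in pairs


# Hypothetical lineages: two ranks that each sit directly under species.
lineages = [
    ['genus', 'species', 'forma'],
    ['genus', 'species', 'varietas'],
]
pairs = observed_order(lineages)
print(comparable('species', 'forma', pairs))   # True
print(comparable('forma', 'varietas', pairs))  # False: never co-occur
```

Ranks that only ever appear as children of species in separate lineages produce no pair with each other, so the data alone cannot order them.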

            ranks[idx] = ranks[idx - 1] + '_'
    except ValueError:
        return ranks
    return ranks

A reviewer (Member) commented on this code:

More micro-optimizations:

  • ranks is being copied twice, once in list(ranks) and again in enumerate(ranks[:]) - do you need both?
  • RANK_LOOPS is looked up in the global scope, which is slower than a local variable lookup
  • again, it's best to use a set to test membership

I'd suggest something like (untested):

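# Build the set once, at import time.
RANK_LOOPS_SET = set(RANK_LOOPS)

# Binding the set as a default argument makes `replace` a local lookup.
def replace_loops(ranks, replace=RANK_LOOPS_SET):
    for idx, r in enumerate(list(ranks)):
        if r in replace:
            # rename to the (possibly already renamed) parent rank plus '_'
            ranks[idx] = ranks[idx - 1] + '_'
    # return after the loop, not inside it, so every rank is checked
    return ranks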
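The set-versus-list point can be checked with a quick `timeit` comparison. This is a rough sketch: the rank names are a made-up sample, not taxtastic's rank list, and absolute timings depend on the machine and on where a matching item sits in the list, so no numbers are claimed here.

```python
import timeit

# Hypothetical sample of rank names; not taxtastic's actual rank list.
RANKS_LIST = ['no_rank', 'clade', 'forma', 'varietas', 'subspecies',
              'species', 'genus', 'family', 'order', 'class']
RANKS_SET = set(RANKS_LIST)


def count_hits(ranks, lookup):
    """Count how many entries of `ranks` are members of `lookup`."""
    return sum(1 for r in ranks if r in lookup)


if __name__ == '__main__':
    sample = ['phylum', 'class', 'species', 'clade', 'clade'] * 1000
    for name, lookup in [('list', RANKS_LIST), ('set', RANKS_SET)]:
        seconds = timeit.timeit(lambda: count_hits(sample, lookup), number=100)
        print(f'{name} membership: {seconds:.4f}s')
```

A list membership test scans items one by one, while a set uses hashing, so the gap grows with the number of ranks being checked against.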

@crosenth (Member, Author) replied:

  • I was trying to change as little code as possible. I am not sure about list(ranks); maybe an iterator was being passed at one point (89ee1ec)?

  • I will update.

  • Thanks for the code, will update.
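On the list(ranks) question: index assignment such as ranks[idx] = ... only works on a sequence, so if a generator or other iterator is ever passed in, it has to be materialized first. A minimal sketch, assuming a hypothetical helper that returns a new list instead of mutating its argument, copies exactly once and accepts any iterable:

```python
def replace_unordered(ranks, unordered=frozenset({'clade', 'no_rank'})):
    """Return a new list in which each unordered rank becomes parent rank + '_'.

    Hypothetical helper. list() makes the single copy, which also turns a
    generator or other iterator into something that supports ranks[idx] = ...
    """
    out = list(ranks)
    # start at 1: index 0 has no parent, and out[-1] would wrap to the end
    for idx in range(1, len(out)):
        if out[idx] in unordered:
            out[idx] = out[idx - 1] + '_'
    return out


# Without list(), item assignment on a generator fails.
try:
    g = (r for r in ['genus'])
    g[0] = 'species'
except TypeError as exc:
    print('TypeError:', exc)

gen = (r for r in ['genus', 'species', 'clade'])
print(replace_unordered(gen))  # ['genus', 'species', 'species_']
```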

@crosenth (Member, Author) commented Jul 9, 2020:

Updates:

  1. Renamed RANK_LOOPS to UNORDERED_RANKS for clarity
  2. Global references to RANK_LOOPS have been replaced with local variables, except in taxonomy.py (see comment below)
  3. self.unordered_ranks is now a set

taxtastic/taxonomy.py (outdated, resolved)
@@ -86,8 +87,7 @@ def __init__(self, engine, NO_RANK='no_rank', schema=None):
ranks = select([ranks_table.c.rank, ranks_table.c.height]).execute().fetchall()
ranks = sorted(ranks, key=lambda x: int(x[1])) # sort by height
self.ranks = [r[0] for r in ranks] # just the ordered ranks

self.NO_RANK = NO_RANK
self.unordered_ranks = set(self.ranks) & set(UNORDERED_RANKS)
@crosenth (Member, Author) commented Jul 9, 2020:

The global taxtastic.ncbi.UNORDERED_RANKS variable is difficult to avoid here without updating the database schema to identify "unordered" ranks. Note that this is a continuation of the existing question of how to identify the NCBI "no_rank" rank.
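A rough sketch of the kind of schema change mentioned above, using an in-memory SQLite database: the ranks table gets a flag column marking unordered ranks, so the code can read them from the database instead of from a module-level global. The `unordered` column is hypothetical; only the rank and height columns appear in the diff above, and the example rows and heights are made up.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical schema: rank and height as in the diff, plus an `unordered` flag.
conn.execute('CREATE TABLE ranks (rank TEXT PRIMARY KEY, height INTEGER, '
             'unordered INTEGER NOT NULL DEFAULT 0)')
conn.executemany('INSERT INTO ranks VALUES (?, ?, ?)', [
    ('root', 0, 0),
    ('genus', 1, 0),
    ('species', 2, 0),
    ('clade', 3, 1),
    ('no_rank', 4, 1),
])

# All ranks sorted by height, analogous to Taxonomy.__init__ above.
ranks = [r for (r,) in conn.execute('SELECT rank FROM ranks ORDER BY height')]
# Unordered ranks read from the database rather than a global list.
unordered_ranks = {r for (r,) in conn.execute(
    'SELECT rank FROM ranks WHERE unordered = 1')}
conn.close()

print(ranks, sorted(unordered_ranks))
```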

@@ -164,7 +159,6 @@ def test01(self):
"""

executable = None
DEVNULL = open(os.devnull, 'w')
@crosenth (Member, Author) commented:

Not used, and it raises a warning:

sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/dev/null' mode='w' encoding='UTF-8'>
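If a test does need to silence a subprocess, the warning can be avoided without leaving a file handle open. A minimal sketch using only the standard library; the child command is a placeholder, not taxtastic's test code:

```python
import os
import subprocess
import sys

# Placeholder child process; a real test would run the command under test.
CMD = [sys.executable, '-c', 'print("hello")']

# Option 1: subprocess opens and closes the null device itself.
subprocess.run(CMD, stdout=subprocess.DEVNULL, check=True)

# Option 2: if a file object is required, a with-block closes it
# deterministically, so it is never left unclosed.
with open(os.devnull, 'w') as devnull:
    subprocess.run(CMD, stdout=devnull, check=True)
```

subprocess.DEVNULL has been available since Python 3.3, so option 1 needs no os import at all.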

@@ -50,7 +50,7 @@ def action(args):
engine = sqlalchemy.create_engine(args.url, echo=args.verbosity > 2)
tax = Taxonomy(engine, schema=args.schema)

records = list(yaml.load_all(args.new_nodes))
records = list(yaml.load_all(args.new_nodes, Loader=yaml.SafeLoader))


3 participants