`parse_tax_data`: incorporate rank info #113

Closed
zachary-foster opened this issue Dec 21, 2017 · 0 comments

@zachary-foster zachary-foster commented Dec 21, 2017

Currently, the Taxon objects created for the taxmap output of parse_tax_data do not include rank information, even though it is often available in the input in some form. This means taxon_ranks() does not work as expected (see grunwaldlab/metacoder#188 and grunwaldlab/metacoder#189).

The challenge here is the diversity of inputs to parse_tax_data and the different ways ranks can be encoded:

  • A character vector of classifications (e.g. Animalia;Chordata;Mammalia;Primates;Hominidae) might have the rank information embedded (e.g. k_Animalia;p_Chordata;c_Mammalia;o_Primates;f_Hominidae), so a new class_key option like "taxon_rank" could be added.
  • A data.frame could have a column like the character vector above, or could have one column per rank, which would mean the ranks are the column names. This would need either a new class_key option or a new option indicating that ranks should be taken from the column names.
  • A list of data.frames, one per classification, would store its rank info in a column, so the name of the column with rank information would need to be passed to a new option.
  • A list of character vectors, one per classification, could store its rank info as the names of the vectors.
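
To make the four cases concrete, here is a minimal base R sketch of each input shape and where the rank information sits in it. It does not call parse_tax_data; the object names, the single-letter prefix pattern, and the column name `rank` are assumptions made up for illustration.

```r
# 1. Character vector with the rank embedded as a prefix on each taxon
embedded <- "k_Animalia;p_Chordata;c_Mammalia;o_Primates"
parts <- strsplit(embedded, ";", fixed = TRUE)[[1]]
ranks_1 <- sub("^([a-z])_.*$", "\\1", parts)  # rank codes before the "_"
names_1 <- sub("^[a-z]_", "", parts)          # taxon names after the "_"

# 2. data.frame with one column per rank: the ranks are the column names
by_column <- data.frame(kingdom = "Animalia", phylum = "Chordata",
                        class = "Mammalia", stringsAsFactors = FALSE)
ranks_2 <- colnames(by_column)

# 3. List of data.frames, one per classification, with a rank column
#    (the column name "rank" is an assumption for this sketch)
per_class_df <- list(
  data.frame(name = c("Animalia", "Chordata"),
             rank = c("kingdom", "phylum"),
             stringsAsFactors = FALSE)
)
ranks_3 <- per_class_df[[1]]$rank

# 4. List of character vectors whose names are the ranks
per_class_chr <- list(c(kingdom = "Animalia", phylum = "Chordata"))
ranks_4 <- names(per_class_chr[[1]])
```

In cases 1 and 3 the rank travels with the data itself, while in cases 2 and 4 it lives in the names, which is why a single option cannot cover all of them.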

This means there are 4 ways to encode rank, and handling all of them would require at least 2 new options and a new class_key value. The class_key value is intuitive and would not clutter the help page, but I hesitate to add 2-3 options just to handle rank.

Currently, I am thinking of doing the following:

  • Add a value to class_key called "taxon_rank" for when the rank and taxon name are together in a vector/column (quite common).
  • Add a TRUE/FALSE option called named_by_rank so users can indicate that column/vector names are ranks (somewhat common).
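
As a rough illustration of how these two behaviors differ, the sketch below uses a hypothetical standalone helper, `sketch_ranks()`, which is not part of the package and is not how parse_tax_data is implemented. It assumes a `rank_name` pattern joined by `;` for the embedded case, and a named character vector or one-row data.frame for the `named_by_rank` case.

```r
# Hypothetical helper (not part of the package): returns taxon names and
# ranks for a single classification.
sketch_ranks <- function(x, named_by_rank = FALSE, sep = ";") {
  if (named_by_rank) {
    # Ranks are the names of the vector, or the column names of a
    # one-row data.frame
    x <- vapply(x, as.character, character(1))
    return(data.frame(taxon_name = unname(x),
                      taxon_rank = names(x),
                      stringsAsFactors = FALSE))
  }
  # Otherwise assume each piece looks like "rank_name" (an assumption)
  parts <- strsplit(x, sep, fixed = TRUE)[[1]]
  data.frame(taxon_name = sub("^[^_]+_", "", parts),
             taxon_rank = sub("_.*$", "", parts),
             stringsAsFactors = FALSE)
}

# Rank embedded with the taxon name
embedded_result <- sketch_ranks("k_Animalia;p_Chordata")

# Rank taken from the vector names
named_result <- sketch_ranks(c(kingdom = "Animalia", phylum = "Chordata"),
                             named_by_rank = TRUE)
```

Both calls return the same two-column shape, so downstream code would not need to know which encoding the input used.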

This will not handle lists of data.frames, but that is the least common input type. Maybe another option called "rank_col" could be added in the future.

@zachary-foster zachary-foster self-assigned this Dec 21, 2017
@zachary-foster zachary-foster mentioned this issue Apr 5, 2018
3 of 3 tasks complete
@zachary-foster zachary-foster added this to the v0.2.1 milestone Apr 10, 2018