Allow names or only taxonomic ids as input to higher level methods? #58

Open
sckott opened this issue Jul 10, 2020 · 2 comments

Comments

@sckott
Owner

sckott commented Jul 10, 2020

One aspect that differs from R taxize is that I didn't want to bring the interactive part to this package. That is, the taxize get_* functions show a prompt when there is more than one result for a taxon name against a given data source, letting the user pick which taxon. BUT, that's not reproducible and requires an interactive session. The various higher level functions in R taxize, like classification(), accept not just ids but also taxonomic names, because they pass names to the get_* functions, which resolve each name to a single taxonomic id before fetching the classification. However, here we don't have the prompt, so I think for higher level methods like Classification/Children we should only allow taxonomic ids as input. Thoughts, @Daniel-Davies?

@Daniel-Davies
Contributor

In a previous project, when I had this issue, I decided to use a "consensus" protocol on the results of the API. That is, from the list of results returned by the API, taking the most commonly occurring value is usually enough to satisfy the query. Taking a classification example: trying GNR with "panthera tigris" returns 11 separate results; for "species", all agree on "panthera tigris". For genus, perhaps 6 results may have "panthera" while 1 has "puma", so we take "panthera". Repeating this for each key gives an approximation of the classification of the entered name across the multiple sources, and it turns out to be reasonably robust.
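A minimal sketch of that "most-common-value-wins" idea, assuming each API result has already been reduced to a dict of rank to name. The function name `consensus_classification` and the sample data are hypothetical; nothing here is part of the package or of the GNR response format.

```python
from collections import Counter


def consensus_classification(results):
    """For each rank, keep the value that occurs most often across results.

    `results` is a list of dicts mapping rank -> name (an assumed shape,
    one dict per data source).
    """
    # Collect ranks in the order they are first seen.
    ranks = []
    for result in results:
        for rank in result:
            if rank not in ranks:
                ranks.append(rank)

    consensus = {}
    for rank in ranks:
        # Compare case-insensitively and skip sources that lack this rank.
        values = [r[rank].lower() for r in results if r.get(rank)]
        if values:
            consensus[rank] = Counter(values).most_common(1)[0][0]
    return consensus


# Hypothetical data: six sources say Panthera, one says Puma.
results = [{"genus": "Panthera", "species": "Panthera tigris"}] * 6 + [
    {"genus": "Puma", "species": "Panthera tigris"}
]
print(consensus_classification(results))
```

A tie between two values is resolved in favor of the one seen first, so the outcome then depends on the order of the sources.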

I think the ID approach is good, since it gives the user the option of determinism, and it definitely needs to be a part of the package. However, if someone is willing to accept the risks, could they also try a "most-common-value-wins" approach? I don't have much training in taxonomy, so I don't know if this is valid...

@sckott
Owner Author

sckott commented Jul 13, 2020

That's a good idea for selecting names. We do that in the R get_* functions: we look for an exact match, and if there is one, we return that match. It could be more complicated than that, of course. So it sounds like, for the Ids class, we should avoid the interactive prompt and take a best-effort approach to returning a single id.
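The exact-match rule could look roughly like the sketch below. It is only an illustration: `pick_id`, the candidate dict shape, and the ids are made up, and this is not the Ids class API.

```python
def pick_id(name, candidates):
    """Return the id of the single exact name match, or None.

    `candidates` is a list of dicts with "name" and "id" keys
    (an assumed shape for the results from one data source).
    """
    wanted = name.strip().lower()
    exact = [c for c in candidates if c["name"].strip().lower() == wanted]
    # Only commit to an id when exactly one candidate matches;
    # anything else is treated as ambiguous rather than prompting.
    if len(exact) == 1:
        return exact[0]["id"]
    return None


# Hypothetical candidates with made-up ids.
candidates = [
    {"name": "Panthera tigris", "id": 101},
    {"name": "Panthera tigris altaica", "id": 102},
]
print(pick_id("panthera tigris", candidates))
```

Returning None when there is no unique match keeps the behaviour reproducible and leaves it to the caller to decide what to do next.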

For the higher level methods (e.g., classification), it sounds like we go with ONLY allowing IDs as inputs, correct? So users have to get IDs first, either using the Ids class or some other method.
