Replace taxize with taxald #226

cboettig · 2018-12-07T02:01:54Z

@hlapp can you review this?

This implements the changes proposed in #224. Unlike the original taxize version, this approach vectorizes the call for id lookup, and also supports multiple authorities instead of only NCBI. The advantages are particularly evident in the primates example in the metadata vignette -- which previously ran very slowly and resolved no names; now almost all names are resolved (against the ITIS taxonomy in this case), and it runs quite quickly.
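To make "vectorizes the call for id lookup" concrete, here is a minimal sketch of the idea, not taxald's actual implementation. The `id_table` data frame, its column names, the `demo:` identifiers, and the `lookup_ids()` helper are all hypothetical and exist only for illustration.

```r
# Hypothetical stand-in for a locally cached authority table.
# The names and "demo:" ids are made up; they are not real taxald data.
id_table <- data.frame(
  name = c("Homo sapiens", "Pan troglodytes", "Gorilla gorilla"),
  id   = c("demo:1", "demo:2", "demo:3"),
  stringsAsFactors = FALSE
)

# Resolve all labels in one vectorized pass instead of one call per label.
lookup_ids <- function(labels, table = id_table) {
  # match() returns NA for labels that are not in the table,
  # so unresolved names come back as NA rather than raising an error.
  table$id[match(labels, table$name)]
}

ids <- lookup_ids(c("Pan troglodytes", "Homo sapiens", "Not a taxon"))
```

With a local table, the whole vector of labels is resolved by a single in-memory match rather than by one network request per name, which is roughly why the vignette example becomes much faster.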

taxald pulls a local copy of the complete id table for the authority on the fly if one has not been installed locally. Technically this can be a bit slow, but as these are compressed tables (coming from AWS S3 at the moment) they download more quickly than some single API calls. A user can install a local copy (e.g. by running taxald::td_create(authority = "all")), but for simplicity and consistency with the old version I haven't documented that here.

taxald isn't on CRAN yet and may change a bit still, though I think this basic behavior should not be impacted. So merging this would mean we would ideally want taxald on CRAN before the next RNeXML release; though as this has always been an opt-in or soft dependency, it's not essential.

This also includes an updated version of the pkgdown build based on taxald, so it deprecates PR #223 (and resolves the travis-build error associated with taxize network errors).

hlapp

See inline comments. I think it'd be nice to at least not overwrite existing otu annotations, and that should be an easy change?

vignettes/metadata.Rmd

hlapp · 2018-12-07T02:19:04Z

R/taxize_nexml.R

         call. = FALSE)
  }
-  get_uid <- getExportedValue("taxize", "get_uid")
+  get_ids <- getExportedValue("taxald", "get_ids")


Wouldn't it be simpler to just write taxald::get_ids?

Yes, but this change is intentional -- taxize::get_uid is the NCBI-specific function (though you probably wouldn't guess that from the name), while taxize::get_ids is a wrapper around a suite of potential authorities (including itis, col, and gbif). The original RNeXML implementation only supported NCBI by hardwiring get_uid, while at the same time promising to support other authorities in future versions (we've always had a type argument, but only supported type="NCBI"). So this PR delivers on the support for some alternate authorities and changes the name to match the corresponding taxize function.

(Side note, but taxize::get_ids() did nothing to standardize the return object provided by wrapping the different authorities, so you couldn't just swap this back to taxize::get_ids(). taxald mimics the input, but always provides a consistent output regardless of the authority source).

I know the change to taxald::get_ids is intentional. I was just saying, isn't

get_ids <- taxald::get_ids

the same as

get_ids <- getExportedValue("taxald", "get_ids")

except the former is shorter and (I find anyway) more readable as to what's happening?

In fact, get_ids only gets used a single time. Is there a good reason to not simply replace

taxa_ids <- get_ids(labels, type, format = "uri", ...)

with

taxa_ids <- taxald::get_ids(labels, type, format = "uri", ...)

But I'm nitpicking here 😄 so feel free to just ignore.

So I am intentionally avoiding the use of an explicit reference to the package (e.g. taxald::) because that would cause R CMD check to throw a WARNING, because taxald is only suggested and not imported. Importing a function from another package using getExportedValue is something of an accepted hack to allow a package to have "soft" dependencies.

We did this with taxize initially because taxize is a huge dependency with lots of additional dependencies and listing it in Imports would make it required for a user who really just wants to do simple parsing of NeXML files and doesn't need this 'bonus' functionality. taxald is a bit lighter, but still not 'core' functionality; and besides, we can't have it in Imports if taxald isn't on CRAN, though we can have it in Suggests.
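For readers unfamiliar with the pattern, a minimal sketch of a "soft" dependency lookup might look like the following. The `soft_import()` helper is a hypothetical name, and `stats::median` is used only so the example runs without installing anything; in this PR the package would be taxald and the function get_ids.

```r
# Resolve a function from an optional (Suggests-only) package at run time.
# `soft_import` is an illustrative helper, not part of RNeXML.
soft_import <- function(pkg, fun) {
  # Fail with a readable message if the optional package is not installed.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(pkg, " must be installed to use this functionality", call. = FALSE)
  }
  # Look the function up by name instead of writing pkg::fun in the source.
  getExportedValue(pkg, fun)
}

# "stats" ships with R, so this runs anywhere.
med <- soft_import("stats", "median")
med(c(1, 3, 10))
```

The lookup happens only when the optional feature is actually called, so users who never touch it do not need the extra package installed.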

hlapp · 2018-12-07T02:24:17Z

R/taxize_nexml.R

+    for(i in 1:length(taxa_ids)){
+      id <- taxa_ids[[i]]
+      if(is.na(id))
+        warning(paste("ID for otu", 


I know it's been a warning previously, but I didn't like that before either. That's because issuing a warning on behalf of the user is assuming that user meant each and every taxon label to be resolvable against some database. Maybe the user already knew that not everything was going to resolve, and now has to use suppressWarnings() to get rid of something they were fully anticipating. I think just passing back the fact that nothing was found is good enough, and I'm not sure why just not adding that meta annotation wouldn't be good enough.

Right, I just left this in for compatibility, but I agree this is a warning that has always been more annoying than helpful. It's a lot less annoying than the previous one because taxald does a better job finding matches ... Ideally I think it might be preferable to collect these warnings in a log and return a single warning at the end? Possibly also include a way to suppress the warning as a function argument. I do feel it wouldn't be great to require inspecting the NeXML / meta elements manually to detect unresolved otu labels...
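A rough sketch of the "collect and warn once" idea, under the assumption that labels and resolved ids are parallel vectors with NA marking an unresolved label. The function name `report_unresolved()` and its `warnings` argument are hypothetical, not the names used in the eventual patch.

```r
# Collect unresolved otu labels and emit at most one warning for all of them.
# `report_unresolved` and `warnings` are illustrative names only.
report_unresolved <- function(labels, taxa_ids, warnings = TRUE) {
  unresolved <- labels[is.na(taxa_ids)]
  if (warnings && length(unresolved) > 0) {
    warning("ID not found for ", length(unresolved), " otu label(s): ",
            paste(unresolved, collapse = ", "),
            ". Consider checking the spelling or alternate classification.",
            call. = FALSE)
  }
  # Return the unresolved labels invisibly so callers can inspect them
  # without digging through the NeXML meta elements.
  invisible(unresolved)
}

# Opting out of the warning still returns which labels failed to resolve.
unresolved <- report_unresolved(c("Homo sapiens", "Not a taxon"),
                                c("demo:1", NA),
                                warnings = FALSE)
```

Returning the unresolved labels addresses both concerns raised above: users who expect misses can silence the warning, and nobody has to inspect meta elements by hand to find them.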

hlapp · 2018-12-07T02:25:44Z

R/taxize_nexml.R

+                      "not found. Consider checking the spelling
+                      or alternate classification"))
+      else 
+        nexml@otus[[j]]@otu[[i]]@meta <- New("ListOfmeta", list(


I know it's been this way before, but note that this will overwrite any meta annotations that the otu elements might have had before. Not really a good idea, and maybe an opportunity to fix that?

Nice catch, that's a good point. You've been playing with S4 more recently than me -- is there a more concise way to fix that than checking if nexml@otus[[j]]@otu[[i]]@meta is empty?

I don't think you need to care whether it's empty – simply combine (c()) the current value (which is of type ListOfmeta) with the new meta, and the generic method for c(ListOfmeta, ...) should catch it and treat it correctly. No?

i.e., the following:

nexml@otus[[j]]@otu[[i]]@meta <- c(nexml@otus[[j]]@otu[[i]]@meta, meta(href = taxa_ids[[i]], rel = "tc:toTaxon"))

> length(c(new("ListOfmeta"), meta("foo:foo", "bar")))
[1] 1
> length(c(new("ListOfmeta", list(meta("baz","bang"))), meta("foo:foo", "bar")))
[1] 2
> class(c(new("ListOfmeta"), meta("foo:foo", "bar")))
[1] "ListOfmeta"
attr(,"package")
[1] "RNeXML"

So that works as I claimed 😄

nice, have now implemented this.

hlapp · 2018-12-07T03:52:43Z

Very nice speedup! Shaved about 40% off of the CI test time. 👍

cboettig · 2018-12-07T06:38:51Z

Thanks for the nice review. I've fixed the vignette (dropping the taxize mention), changed the code to append to existing meta instead of overwriting it, and added an argument to opt out of warnings. Also rebuilt docs. Merge when ready.

cboettig and others added 12 commits November 29, 2018 23:08

add appveyor, rebuild README

7a5aada

Build pkgdown

1804d32

pkgdown site and tighter README

c177ae5

rebuild README.md

71fc86b

Update vignettes/intro.Rmd

afd191e

Co-Authored-By: cboettig <cboettig@gmail.com>

Update vignettes/intro.Rmd

64b5043

Co-Authored-By: cboettig <cboettig@gmail.com>

whoops, more spelling errors.

0b25c2a

I wish `spelling::spell_check_package()` would check README.Rmd too! gets me every time. ropensci/spelling#11

swap out taxize for taxald

cc15ab9

add machine account PAT for travis

e5a99b4

rebuild pkgdown

b4320e9

use modern windows RTOOLS suite

e6b68ec

update wordlist

8578263

hlapp approved these changes Dec 7, 2018

cboettig added 2 commits December 6, 2018 21:11

patches as suggested in review

742c4b0

update docs

144b267

hlapp merged commit a20efa3 into master Dec 8, 2018

This was referenced Dec 8, 2018

Adds comparing simplify = FALSE for metadata table #221

Closed

Replace taxize backend #224

Closed

Clean up README, or use to build website #222

Closed

cboettig deleted the taxald branch January 22, 2019 20:04
