@sckott sckott released this Jan 29, 2019 · 15 commits to master since this release

Compare to previous release

v0.9.4...v0.9.5

DEFUNCT

  • iucn_summary_id() is defunct, use iucn_summary() instead

NEW FEATURES

  • col_downstream() gains parameter extant_only (logical) to optionally keep extant taxa only (#714) thanks @ArielGreiner for the inquiry
  • downstream() gains another db option: WoRMS. You can now set db="worms" to use WoRMS to get taxa downstream from a target taxon. In addition, taxize gains new function worms_downstream(), which is used under the hood in downstream(..., db="worms") (#713) (#715)
  • taxize gains new function id2name() with db options for tol, itis, ncbi, worms, gbif, col, and bold. The function converts taxonomic IDs to names. It's sort of the inverse of the get_*() family of functions. (#712) (#716)
  • tax_rank() gains new parameter rows so that one can pass rows down to get_*() functions

MINOR IMPROVEMENTS

  • synonyms() warning from an internal cbind() call now fixed (#704) (#705) thanks @vijaybarve
  • namespace taxize function calls thrown when notifying users about API keys (e.g., taxize::use_tropicos()) to make it very clear where the functions live (to avoid confusion with usethis) (#724) (#725) thanks @maelle
  • changed iucn_summary() to output the same structure when no match is found as when a match is found so that when output is passed to iucn_status() behavior is the same (#708) thanks @Rekyt
  • skip tax_name() tests on CRAN (#728)
  • httr replaced by crul throughout (#590)
  • most unit tests that make HTTP requests are now cached with vcr, making tests much faster and less prone to errors from remote services being down (#729)
  • EOL: The EOL API underwent major changes, and we've attempted to get things in working order. eol_dataobjects() gains new parameter language. eol_pages() loses iucn, images, videos, sounds, maps, and text parameters, and gains images_per_page, videos_per_page, sounds_per_page, maps_per_page, texts_per_page, and texts_page. Please do let us know if you find any problems with any EOL functions (#717) (#718)
  • As part of EOL changes, the default db value for comm2sci() and sci2comm() is now ncbi instead of eol
  • EUBON base URL now https instead of http
  • A number of get_*() functions changed parameter verbose to messages to not conflict with verbose passed down to crul::HttpClient
  • ping functions: ncbi_ping() reworked to allow use of your API key as a parameter or pulled from your environment; eol_ping() now uses https instead of http, and parses JSON instead of XML.

BUG FIXES

  • get_eolid() was erroring when no results found for a query due to not assigning an internal variable (#701) (#709) thanks for the fix @taddallas
  • get_tolid() was erroring when values were NULL - now replacing all NULL with NA_character_ to make data.table::rbindlist() happy (#710) (#711) thanks @gpli for the fix
  • add additional rows to the rank_ref data.frame of taxonomic ranks: species subgroup, forma, varietas, clade, megacohort, supercohort, cohort, subcohort, infracohort. When a rank has no match in rank_ref, errors can result in many of the downstream functions. The data.frame now has 43 rows. (#720) (#727)
  • fix to downstream() and ncbi_get_taxon_summary(): change in ncbi_get_taxon_summary to break up queries into smaller chunks to avoid HTTP 414 errors ("URI too long") (#727) (#730) thanks for reporting @fischhoff and @benjaminschwetz
  • a number of fixes internally (not user facing) to comply with upcoming R-devel changes for checking length greater than 1 in logical statements (#731)
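
The chunking fix for HTTP 414 errors noted above can be illustrated with a minimal base R sketch. It is not the code used inside ncbi_get_taxon_summary(); the helper name chunk_ids() and the chunk size of 50 are assumptions made for illustration only.

```r
# Hypothetical helper (not part of taxize): split a vector of IDs into
# chunks of at most `size` elements.
chunk_ids <- function(ids, size = 50) {
  split(ids, ceiling(seq_along(ids) / size))
}

ids <- as.character(1:120)
chunks <- chunk_ids(ids, size = 50)

# One comma-separated query string per chunk; each string would go into
# its own request instead of one very long URL.
queries <- vapply(chunks, paste, character(1), collapse = ",")

lengths(chunks)
```

Each element of queries stays short enough to fit in a URL, and the per-chunk results can be combined after all requests finish.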

@sckott sckott released this Jul 24, 2018 · 102 commits to master since this release

NEW FEATURES

  • new contributor: Gaopeng Li
  • taxize gains new functions for helping the user get authentication keys/tokens: use_entrez(), use_eol(), use_iucn() (which uses rredlist::rl_use_iucn() internally), and use_tropicos() (#682) (#691) (#693) By @maelle

MINOR IMPROVEMENTS

  • remove commented out code

BUG FIXES

  • fix tropicos_ping()
  • fixed downstream() and gbif_downstream(): some of the results don't have a canonicalName, so now safely try to get that field (#673)
  • fixed as.uid(), was erroring when passing in a taxon ID (#674) (#675) by @zachary-foster
  • fix in get_boldid() (and by extension classification(..., db = "bold")): was failing when no parent taxon found, just fill in with NA now (#680)
  • fix to synonyms(): was failing for some TSNs for db="itis" (#685)
  • fix to tax_name(): rows arg wasn't being passed on internally (#686)
  • fix to gnr_resolve() and gnr_datasources(): problems were caused by http scheme, switched to use https instead of http (#687)
  • fix to class2tree(): organisms with unique rank lower than non-unique ranks will give extra wrong rows (#689) (#690) thanks @gpli
  • fix in ncbi_get_taxon_summary(): changes in the NCBI API most likely led to HTTP 414 (URI Too Long) errors. We now loop internally for the user. By extension this helps with problems upstream in downstream()/ncbi_downstream()/ncbi_children() (#698)
  • fix in class2tree(): was erroring when name strings contained pound signs (e.g., #) (#699) (#700) thanks @gpli
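
The "safely try to get that field" fix for missing canonicalName values can be sketched in base R as below. The records and the helper get_field() are made up for illustration and are not the package's internal code.

```r
# Made-up records standing in for parsed API results; the second one has
# no canonicalName field.
results <- list(
  list(key = 1L, canonicalName = "Apis mellifera"),
  list(key = 2L)
)

# Hypothetical helper: return the field as character, or NA if it is absent.
get_field <- function(x, field) {
  value <- x[[field]]
  if (is.null(value)) NA_character_ else as.character(value)
}

names_out <- vapply(results, get_field, character(1), field = "canonicalName")
names_out
```

Because every record yields exactly one value, the result lines up with the input records and can be placed in a data.frame column without errors.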

@sckott sckott released this Mar 20, 2018 · 157 commits to master since this release

MINOR IMPROVEMENTS

  • package gains three new authors: Bastian Greshake Tzovaras, Philippe Marchand, and Vinh Tran
  • Don't enforce rate limiting via Sys.sleep for NCBI requests if the user has an API key (#667)
  • Fix to all functions that do NCBI requests to work whether or not a user has an NCBI API key (#668)
  • Increased documentation on authentication, see ?taxize-authentication
  • Further conversion of verbose to messages across the package so that suppressing calls to message() does not conflict with curl options passed in
  • Converted genbank2uid() and ncbi_get_taxon_summary() to use crul instead of httr for HTTP requests

BUG FIXES

  • Fix to get_tolid(): it was missing assignment of the att attribute internally, causing failures in some cases (#663) (#672)
  • Fix to ncbi_children() (and thus children() when requesting NCBI data) to not fail when there is an empty result from the internal call to classification() (#664) thanks @arendsee

@sckott sckott released this Feb 7, 2018 · 177 commits to master since this release

Installation

Stalled on CRAN. Install like

install.packages("taxize", repos = c("http://packages.ropensci.org"))

OR

remotes::install_github("ropensci/taxize")
# OR
devtools::install_github("ropensci/taxize")

NEWS

NEW FEATURES

  • class2tree() gets a major overhaul thanks to @gedankenstuecke and @trvinh (!!). The function now takes unnamed ranks into account when clustering, which fixes a problem where trees were unresolved for many splits because the named taxonomy levels were shared between them. Now it makes full use of the NCBI Taxonomy string, including the unnamed ranks, leading to higher resolution trees that have fewer multifurcations (#611) (#634)
  • Added support throughout package for use of NCBI Entrez API keys - NCBI now strongly encourages their use and you get a higher rate limit when you use one. See ?taxize-authentication for help. Importantly, note that API key names (both R options and environment variables) have changed. They are now the same for R options and env vars: TROPICOS_KEY, EOL_KEY, PLANTMINER_KEY, ENTREZ_KEY. You no longer need an API key for Plantminer. (#640) (#646)
  • New author Zebulun Arendsee (@arendsee)
  • New package dependencies: crul and zoo
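
As a rough illustration of the unified key names described above (the same name for R options and environment variables), the base R sketch below looks a key up in either place. The helper find_key() is hypothetical and is not how taxize resolves keys internally; the key value is a placeholder.

```r
# Hypothetical helper (not taxize internals): look for a key first in R
# options, then in environment variables, using the same name for both.
find_key <- function(name) {
  key <- getOption(name, default = Sys.getenv(name, unset = ""))
  if (!nzchar(key)) stop("no key found for ", name, call. = FALSE)
  key
}

# Placeholder value; a real key would normally live in ~/.Renviron
Sys.setenv(ENTREZ_KEY = "my-placeholder-key")
find_key("ENTREZ_KEY")
```

See ?taxize-authentication for the documented way to supply keys.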

MINOR IMPROVEMENTS

  • In downstream() we now pass on limit and start parameters to gbif_downstream(); we weren't doing that before; the two parameters control pagination (#638)
  • genbank2uid() now returns the correct ID when there are multiple possibilities and invalid IDs no longer make whole batches fail (#642) thanks @zachary-foster
  • children() outputs made more consistent for certain cases when no results found for searches (#648) (#649) thanks @arendsee
  • Improve downstream() by passing ... (additional parameters) down to ncbi_children(), which is used internally. This allows, e.g., use of the ambiguous parameter in ncbi_children() to remove ambiguously named nodes (#653) (#654) thanks @arendsee
  • swapped out use of httr for crul in EOL and Tropicos functions - note that this won't affect you unless you're passing curl options. See package crul for help on curl options. Along with this change, the parameter verbose has changed to messages (for toggling printing of information messages)

DOCUMENTATION

  • Added additional text to the CONTRIBUTING.md file for how to contribute to the test suite (#635)

BUG FIXES

  • Fix to downstream(): passing numeric taxon ids to the function while using db="ncbi" wasn't working (#641) thanks @arendsee
  • Fix to children(): passing numeric taxon ids to the function while using db="worms" wasn't working (#650) (#651) thanks @arendsee
  • synonyms_df() - which attempts to combine many outputs from the synonyms() function - now removes NA/NULL/empty outputs before attempting the combination (#636)
  • Fix to gnr_resolve(): before if preferred_data_sources was used, you would get the preferred data but only a few columns of the response. We now return all fields; however, we only return the preferred data part when that parameter is used (#656)
  • Fixes to children(). It was returning unexpected results for ambiguous taxonomic names (e.g., some insects are returned when searching within Bacteria). It was also failing when one tried to get the children of a root taxon (e.g., the children of the NCBI id 131567). (#639) (#647) fixed via PR (#659) thanks @arendsee and @zachary-foster
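
The synonyms_df() change above (dropping NA/NULL/empty outputs before combining) can be sketched in base R as follows. This uses rbind() rather than data.table::rbindlist() to stay dependency-free, and the input list and the syn_name column are made up for illustration.

```r
# Made-up outputs for five taxa: two usable data.frames, one NA, one NULL,
# and one empty data.frame.
outputs <- list(
  a = data.frame(syn_name = c("x1", "x2"), stringsAsFactors = FALSE),
  b = NA,
  c = NULL,
  d = data.frame(syn_name = character(0), stringsAsFactors = FALSE),
  e = data.frame(syn_name = "y1", stringsAsFactors = FALSE)
)

# Keep only non-empty data.frames
keep <- Filter(function(z) is.data.frame(z) && nrow(z) > 0, outputs)

# Tag each data.frame with the name of its list element, then stack them
tagged <- Map(
  function(df, id) data.frame(.id = id, df, stringsAsFactors = FALSE),
  keep, names(keep)
)
combined <- do.call(rbind, tagged)
combined
```

Filtering first means the combination step only ever sees data.frames with matching columns.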

@sckott sckott released this Sep 25, 2017 · 260 commits to master since this release

Changes to get_*() functions

  • Added separate documentation file for all get* functions
    describing attributes and various exception behaviors
  • Some get*() functions had NaN as default rows parameter
    value. Those all changed to NA
  • Better failure behavior now when non-acceptable rows
    parameter value given
  • Added in all type checks for parameters across get_*() functions
  • Changed behavior across all get_*() functions to behave the
    same when ask = FALSE, rows = 1 and ask = TRUE, rows = 1 as these
    should result in the same outcome. (#627) thanks @zachary-foster !
  • Fixed direct match behavior so that when there's multiple results
    from the data provider, but no direct match, that the functions don't
    give back just NA with no indication that there were multiple matches.
  • Please let me know if any of these changes cause problems for your
    code or package.

NEW FEATURES

  • Change comm2sci() to S3 setup with methods for character, uid,
    and tsn (#621)
  • iucn_status() now has S3 setup with a single method that only handles
    output from the iucn_summary() function.

MINOR IMPROVEMENTS

  • Add required key parameter to fxn iucn_id() (#633)
  • Improve docs for sci2comm(): to indicate how to get non-simplified
    output (which includes what language the common name is from) vs.
    getting simplified output (#623) thanks @glaroc !
  • Fix to sci2comm() to not be case sensitive when looking for matches
    (#625) thanks @glaroc !
  • Two additional columns now returned with eol_search(): link and content
  • Improve docs in eol_search() to describe returned data.frame
  • Fix bold_ping() to use new base URL for their API
  • Improved description of the dataset rank_ref, see ?rank_ref

BUG FIXES

  • Fix to downstream() via fix to rank_ref dataset to include
    "infraspecies" and make "unspecified" and "no rank" equivalent.
    Fix to col_downstream() to properly remove ranks lower than
    allowed. (#620) thanks @cdeterman !
  • iucn_summary: changed to using rredlist package internally.
    sciname param changed to x. iucn_summary_id() now is
    deprecated in favor of iucn_summary(). iucn_summary() now has a
    S3 setup, with methods for character and iucn (#622)
  • Added "cohort" to rank_ref dataset as that rank sometimes used
    at NCBI (from bug reported in ncbi_downstream()) (#626)
  • Fix to sci2comm(), add tryCatch() to internals to catch
    failed requests for specific pageid's (#624) thanks @glaroc !
  • Fix URL for taxa for NBN taxonomic ids retrieved via
    get_nbnid() (#632)
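
The tryCatch() fix for failed per-pageid requests mentioned above follows a common pattern, sketched here in base R. The function fetch_names() is a stand-in that simulates a request (negative IDs fail); it is not a taxize function.

```r
# Stand-in for a remote request: fails for negative page IDs.
fetch_names <- function(pageid) {
  if (pageid < 0) stop("request failed for pageid ", pageid)
  paste("common name for", pageid)
}

# Wrap the request so one failure yields NA instead of aborting the loop.
safe_fetch <- function(pageid) {
  tryCatch(fetch_names(pageid), error = function(e) NA_character_)
}

res <- vapply(c(10, -1, 30), safe_fetch, character(1))
res
```

The failing ID produces NA in its position, so the remaining IDs are still processed.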

@sckott sckott released this Jul 17, 2017 · 281 commits to master since this release

BUG FIXES

  • Remove ape::neworder_phylo object, which is not used anymore in taxize
    (#618) (#619) thanks @ashiklom

@sckott sckott released this Jun 30, 2017 · 288 commits to master since this release

NEW FEATURES

  • New function ncbi_downstream() and now NCBI is an option in
    the function downstream() (#583) thanks for the push @andzandz11
  • New data source: Wiki*, which includes Wikipedia, Wikispecies, and
    Wikidata - you can choose which you'd like to search. Uses new package
    wikitaxa, with contributions from @ezwelty (#317)
  • scrapenames() gains a parameter return_content, a boolean, to
    optionally return the OCR content as a text string with the results. (#614)
    thanks @fgabriel1891
  • New function get_iucn() - to get IUCN Red List ids for taxa. In addition,
    new S3 methods synonyms.iucn and sci2comm.iucn - no other methods could
    be made to work with IUCN Red List ids as they do not share their taxonomic
    classification data (#578) thanks @diogoprov

MINOR IMPROVEMENTS

  • bold now an option in classification() function (#588)
  • fix to NBN to use new base URL (#582) (#597)
  • genbank2uid() can give back more than 1 taxon matched to a given
    Genbank accession number. Now the function can return more than one
    match for each query, e.g., try genbank2uid(id = "AM420293") (#602)
    thanks @sariya
  • had to modify cbind() usage to include ... for method
    consistency (#612)
  • tax_rank() used to be able to do only ncbi and itis. Can now do a
    lot more data sources: ncbi, itis, eol, col, tropicos, gbif, nbn,
    worms, natserv, bold (#587)
  • Added a note to the classification() docs, in a section titled "Lots of results",
    about how to deal with results when there are A LOT of them. (#596)
    thanks @ahhurlbert for raising the issue
  • tnrs() now returns the resulting data.frame in the order of the
    names passed in by the user (#613) thanks @wpetry
  • Changes to gnr_resolve() to now strip out taxonomic names submitted
    by user that are NA, or zero length strings, or are not of class
    character (#606)
  • Added description of the columns of the data.frame output in
    gnr_resolve() (#610) thanks @kamapu
  • Added note in tnrs() docs that the service doesn't provide any
    information about homonyms. (#610) thanks @kamapu
  • Added parvorder to the taxize rank_ref dataset - used by NCBI -
    if taxa were returned with that rank, some functions in taxize were failing
    due to that rank missing in our reference dataset rank_ref (#615)

BUG FIXES

  • Fix to get_colid() via problem in parsing within col_search() (#585)
  • Fix to gbif_downstream (and thus fix in downstream()): there
    were two rows with form in our rank_ref reference dataset of rank names,
    causing > 1 result in some cases, then causing vapply to fail as it's
    expecting length 1 result (#599) thanks @andzandz11
  • Fix genbank2uid(): was failing when getting more than 1 result back,
    works now (#603) and fails better now, giving back warnings/error messages
    that are more informative (see also #602) thanks @sariya
  • Fix to synonyms.tsn(): in some cases a TSN has > 1 accepted name. We
    get accepted names first from the TSN, then look for synonyms, and hadn't
    accounted for > 1 accepted name. Fixed now (#607) thanks @tdjames
  • Fixed bug in sci2comm() - was not dealing internally with passing
    the simplify parameter (#616)
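
The gbif_downstream() failure described above, where a duplicated rank row made vapply() receive more than one value, can be reproduced with a small base R sketch. The reference table here is a simplified, made-up stand-in and does not match the real layout or values of rank_ref.

```r
# Simplified, made-up reference table with "form" listed twice.
ref <- data.frame(
  rankid = c(10L, 20L, 20L),
  rank = c("species", "form", "form"),
  stringsAsFactors = FALSE
)

lookup <- function(r, table) table$rankid[table$rank == r]

# With the duplicate row, "form" yields two values, so vapply() errors
# because it expects exactly one integer per input.
strict <- try(
  vapply(c("species", "form"), lookup, integer(1), table = ref),
  silent = TRUE
)
inherits(strict, "try-error")

# After removing the duplicated rank, each lookup returns one value.
ref_unique <- ref[!duplicated(ref$rank), ]
ids <- vapply(c("species", "form"), lookup, integer(1), table = ref_unique)
ids
```

The strictness of vapply() is what surfaced the duplicate row; sapply() would have silently returned a list instead.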

@sckott sckott released this Jan 18, 2017 · 334 commits to master since this release

taxize 0.8.4

NEW FEATURES

  • Added WoRMS integration via the new worrms package on CRAN.
    Adds functions as.wormsid(), get_wormsid(), get_wormsid_(),
    children.wormsid(), classification.wormsid(), sci2comm.wormsid(),
    comm2sci.wormsid(), and synonyms.wormsid() (#574) (#579)
  • New functions for NatureServe data, including as.natservid,
    get_natservid, get_natservid_, and classification.natservid
    (#126)

BUG FIXES

  • EOL API keys were not passed on to internal functions. fixed now.
    thanks @dschlaep ! (#576)
  • Fix in rankagg() with respect to vegan package to work with
    older and newer versions of vegan - thanks @jarioksa (#580) (#581)

@sckott sckott released this Dec 16, 2016 · 354 commits to master since this release

NEW FEATURES

  • New data source added: Open Tree of Life. New functions for the data source
    added: get_tolid(), get_tolid_(), and as.tolid() (#517)
  • related to above: classification() gains new method for TOL data
  • related to above: lowest_common() gains new method for TOL data
  • Now using ritis package, an external dependency for ITIS taxonomy
    data. Note that a large number of ITIS functions were removed, and are
    now available via the package ritis. However, there are still many
    high level functions for working with ITIS data (see functions prefixed
    with itis_), and get_tsn(), classification.tsn(), and similar
    high level functions remain unchanged. (#525)
  • EUBON has a new API (v1.2). We now interact with that new API version.
    In addition, eubon() fxn is now eubon_search(), although both still
    work - though eubon() will be made defunct in the next version of
    this package. Additional new functions were added: eubon_capabilities(),
    eubon_children(), and eubon_hierarchy() (#567)
  • lowest_common() function gains two new data source options: COL (Catalogue
    of Life) and TOL (Tree of Life) (#505)
  • Added new function synonyms_df() as a slim wrapper around
    data.table::rbindlist() to make it easy to combine many outputs
    from synonyms() for a single data source - there is a lot of heterogeneity
    among data sources in how they report synonyms data, so we don't attempt
    to combine data across sources (#533)

MINOR IMPROVEMENTS

  • Change NCBI URLs to https from http (#571)

BUG FIXES

  • Fixed bug in tax_name() in which when an invalid taxon was searched
    for then classification() returned no data and caused an error.
    Fixed now. (#560) thanks @ljvillanueva for reporting it!
  • Fixed bug in gnr_resolve() in which order of input names to the function
    was not retained. fixed now. (#561) thanks @bomeara for reporting it!
  • Fixed bug in gbif_parse() - data format changed coming back from
    GBIF - needed to replace NULL with NA (#568) thanks @ChrKoenig for
    reporting it!

@sckott sckott released this Jul 23, 2016 · 389 commits to master since this release

NEW FEATURES

  • New vignette: "Strategies for programmatic name cleaning" (#549)

MINOR IMPROVEMENTS

  • get_*() functions now have new attributes to further help the user:
    multiple_matches (logical) indicating whether there were multiple
    matches or not, and pattern_match (logical) indicating whether a
    pattern match was made, or not. (#550) from (#547) discussion,
    thanks @ahhurlbert ! see also (#551)
  • Change all xml2::xml_find_one() to xml2::xml_find_first()
    for new xml2 version (#546)
  • gnr_resolve() now retains user supplied taxa that had no matches -
    this could affect your code, make sure to check your existing code (#558)
  • gnr_resolve() - stop sorting output data.frame, so order of rows
    in output data.frame now same as user input vector/list (#559)
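
The two gnr_resolve() changes above (keeping unmatched names and preserving input order) can be illustrated with match() in base R. The data below is made up, and the column names user_supplied_name and matched_name are assumed here for illustration rather than taken from these notes.

```r
# Names as the user supplied them; the second has no match.
input <- c("Quercus robur", "Notataxon fakeus", "Apis mellifera")

# Made-up resolver output, returned in a different order and without the
# unmatched name.
resolved <- data.frame(
  user_supplied_name = c("Apis mellifera", "Quercus robur"),
  matched_name = c("Apis mellifera Linnaeus, 1758", "Quercus robur L."),
  stringsAsFactors = FALSE
)

# match() gives, for each input name, its row in `resolved` (NA if absent)
idx <- match(input, resolved$user_supplied_name)
out <- data.frame(
  user_supplied_name = input,
  matched_name = resolved$matched_name[idx],
  stringsAsFactors = FALSE
)
out
```

Rows come back in input order, and the unmatched name is retained with NA as its match.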

BUG FIXES

  • Fixed internal fxn sub_rows() inside of most get_*() functions
    to not fail when the data.frame rows were less than that requested by
    the user in rows parameter (#556)
  • Fixed get_gbifid(), as sometimes calls failed because we now
    return numeric IDs but used to return character IDs (#555)
  • Fix to all get_*() functions to call the internal sub_rows()
    function later in the function flow so as not to interfere with
    taxonomic based filtering (e.g., user filtering by a taxonomic rank)
    (#555)
  • Fix to gnr_resolve(), to not fail on parsing when no data
    returned when a preferred data source specified (#557)
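
The sub_rows() fix above (not failing when fewer rows exist than the user requested) can be sketched in base R. The function take_rows() is a hypothetical stand-in and is not the package's internal sub_rows().

```r
# Hypothetical stand-in for row subsetting that tolerates requests for
# rows beyond what the data.frame actually has.
take_rows <- function(df, rows = NA) {
  if (all(is.na(rows))) return(df)          # NA means "keep everything"
  rows <- rows[rows >= 1 & rows <= nrow(df)]  # drop out-of-range requests
  df[rows, , drop = FALSE]
}

matches <- data.frame(
  id = c(11L, 22L, 33L),
  name = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Five rows requested, only three available
take_rows(matches, rows = 1:5)
```

Clamping the requested rows to the available range avoids rows filled with NA and keeps the result a data.frame even when a single row is selected.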