bold 1.3.0

salix-d released this 15 Sep 14:36

NEW FEATURES

New function bold_identify_taxonomy() adds taxonomic information to the output of bold_identify() and replaces bold_identify_parents(). Instead of taking the taxon names from the bold_identify() output, using bold_tax_name() to get the taxonomic IDs and then passing those to bold_tax_id() to get the parent names, it takes the process IDs from the bold_identify() output and passes them to bold_specimens(). This is faster and, more importantly, ensures that the correct taxonomy is returned. The function has fewer arguments, since filtering the result is no longer necessary. Since the result now has only one line per row of input, the output is always in 'wide' format (as when using bold_identify_parents() with wide=TRUE). There is one new argument, taxOnly, which is TRUE by default and returns only the taxonomic data. However, since bold_specimens() also returns other data (habitat, country, image_url, etc.), setting this argument to FALSE also joins that data to the input.
New function bold_tax_id2(), which will eventually replace bold_tax_id(). The main changes are in the format of the output. For the dataTypes 'basic', 'stats', 'images' and 'thirdparty', the output doesn't change. For the dataTypes 'sequencinglabs', 'geo' and 'depository', instead of one (sometimes very) wide data.frame, the result is now in 'long' format, with the columns 'input', 'taxid', 'sequencinglabs|country|depository' and 'count'. With dataTypes 'all', or when more than one dataType is selected, the output is a list containing one data.frame per data type. When includeTree is set to TRUE, the parents' data is appended (with rbind) to the respective data.frame. The function also checks that all arguments are of the correct type and that the chosen dataTypes are valid.
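The difference between the 'wide' and 'long' shapes can be sketched in base R. This is only a hand-built illustration, not the package's internal code: the values, the placeholder country names and the assumption that the old wide output had one column per country are all made up for the example.

```r
# Hypothetical 'wide' result: one row per input, one column per country.
# All values and names here are made up for illustration only.
wide <- data.frame(
  input = c(1, 2),
  taxid = c(1, 2),
  `Country A` = c(10, NA),
  `Country B` = c(5, 7),
  check.names = FALSE
)

# Reshape to 'long': one row per (input, taxid, country) with its count.
value_cols <- setdiff(names(wide), c("input", "taxid"))
long <- do.call(rbind, lapply(value_cols, function(col) {
  data.frame(
    input = wide$input,
    taxid = wide$taxid,
    country = col,
    count = wide[[col]],
    stringsAsFactors = FALSE
  )
}))

# Drop the combinations that had no records in the wide table.
long <- long[!is.na(long$count), ]
rownames(long) <- NULL
long
```

The long shape keeps the number of columns fixed no matter how many countries, labs or depositories are returned, which makes results for several taxa easier to combine.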
The now deprecated bold_tax_id() has the same argument checks as bold_tax_id2() but throws warnings instead of errors so as not to affect existing workflows. Also, any invalid dataTypes are removed to avoid making unnecessary requests.
Similarly, the now deprecated bold_identify_parents() has new argument checks and throws warnings so as not to affect existing workflows.
For bold_tax_id2() and bold_tax_name(), when querying multiple taxa, if one request fails, the loop no longer breaks and instead throws the API error as a warning. The output object also has two new attributes, "errors" and "params", that let you see which errors occurred for which request and which parameters were used for the request.
To make it easy to retrieve these attributes, 3 new functions have been created:
- bold_get_attr() will return a list of the two attributes
- bold_get_errors() will return a list of the errors
- bold_get_params() will return a list of parameters used
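The pattern described above (keep looping when one request fails, report the failure as a warning, and store errors and parameters as attributes) can be sketched in base R. The functions fetch_one() and fetch_many() below are hypothetical stand-ins and not part of the package; the sketch reads the attributes with base R's attr(), whereas the package provides the three helper functions listed above for that purpose.

```r
# Hypothetical stand-in for a single API request; it fails for negative ids.
fetch_one <- function(id) {
  if (id < 0) stop("invalid id: ", id)
  data.frame(input = id, taxid = id)
}

# Query several ids without letting one failure break the loop.
fetch_many <- function(ids) {
  errors <- list()
  results <- list()
  for (id in ids) {
    key <- as.character(id)
    results[[key]] <- tryCatch(
      fetch_one(id),
      error = function(e) {
        # Surface the failure as a warning and remember it for later.
        warning(conditionMessage(e), call. = FALSE)
        errors[[key]] <<- conditionMessage(e)
        NULL
      }
    )
  }
  out <- do.call(rbind, results)
  # Attach what went wrong and what was asked for to the result itself.
  attr(out, "errors") <- errors
  attr(out, "params") <- list(ids = ids)
  out
}

res <- suppressWarnings(fetch_many(c(1, -2, 3)))
attr(res, "errors")  # errors recorded per failed request
attr(res, "params")  # parameters used for the call
```

Storing the errors on the returned object means the successfully fetched rows are kept, and the failed requests can be inspected and retried afterwards.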
bold_specimens() and bold_seqspec() have a new parameter, cleanData, which, when set to TRUE, replaces empty strings ("") with NA and strings containing only duplicated values with their unique value (e.g. "COI-5P|COI-5P|COI-5P" becomes "COI-5P").
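The two cleaning rules can be sketched in base R as follows. The helper clean_values() is hypothetical and written only for illustration; it is not the package's implementation, and it assumes that values are separated by "|" as in the example above.

```r
# Hypothetical helper illustrating the cleaning rules described above.
clean_values <- function(x) {
  # Empty strings become NA.
  x[!is.na(x) & x == ""] <- NA_character_
  # Strings such as "a|a|a" collapse to "a"; mixed values are left as-is.
  vapply(x, function(value) {
    if (is.na(value)) return(NA_character_)
    parts <- unique(strsplit(value, "|", fixed = TRUE)[[1]])
    if (length(parts) == 1L) parts else value
  }, character(1), USE.NAMES = FALSE)
}

cleaned <- clean_values(c("COI-5P|COI-5P|COI-5P", "", "COI-5P|28S", "COI-5P"))
cleaned
```

Here the first element collapses to "COI-5P", the empty string becomes NA, and "COI-5P|28S" is left untouched because its values differ.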
New function bold_read_trace() to replace read_trace(). It can read one or multiple trace files from a boldtrace object or from provided file path(s).
New function b_sepFasta() to use after a call to bold_seqspec() where sepFasta wasn't set to TRUE.

MINOR IMPROVEMENTS

made tests for the new functions
made tests for the bold_trace() function
added tests to existing functions to improve test coverage
added/completed argument checks for every function
bold_specimens() and bold_seqspec() can now also return partial output like bold_seq()
using data.table when possible, removed dplyr and reshape dependencies
using stringi instead of stringr, which removes stringr's other dependencies
added more details to the documentation of some functions

BUG FIXES

changed how HTTP responses are read so they throw warnings and return NAs instead of errors. This prevents a long request from stopping and failing, losing the already fetched data. (#74)
added a note in the documentation of bold_seq(), bold_seqspec() and bold_specimens() that if the taxon doesn't have public records and another parameter is also used, the request will return all data for that other parameter. Users can verify the availability of public records with bold_stats(). A note was also added in bold_tax_name() that the column 'specimenrecords' relates to the records in the taxonomy browser and not in the public data portal. (#76)
fixed output of bold_seq() (#79)
changed the function used to encode to UTF-8 (#81, #86)
contacted BOLD so they would fix their typo in 'depository', which prevented fetching related data with bold_tax_id() (#83). Added a line in the function to change 'depositories' to 'depository' in case people had been using that.
added a check for 'name' in bold_tax_name() to double escape single quotes; otherwise the API doesn't return the data (#84, #85). Since the problem is related to the API, the data that comes back also contains errors, so I added a function to repair the names in 'taxon', 'taxonrep' and 'parentname' in the returned object. The function is also used in pipe_params() (which is used by bold_seq(), bold_seqspec() and bold_specimens()) to repair the taxon parameter in case users pass in results from previous versions.
changed the way the response of bold_seqspec() is read (#87, #88), thanks @cjfields
added a note in bold_stats() documentation to specify that the record counts include all gene markers (#90).

Contributors

cjfields
