
jaypat87/rdataone


dataone: R interface to the DataONE network of data repositories


Provides read and write access to data and metadata from the DataONE network of data repositories, including the KNB Data Repository, Dryad, and the NSF Arctic Data Center. Each DataONE repository implements a consistent repository application programming interface. Users call methods in R to access these remote repository functions, such as methods to query the metadata catalog, get access to metadata for particular data packages, and read the data objects from the data repository using the global identifier for each data object. Users can also insert and update data objects on repositories that support these methods. For more details, see the vignettes.

Installation Notes

Version 2.0 of the dataone R package removes the dependency on rJava and significantly changes the base API to correspond to the published DataONE API. Methods from earlier versions for accessing DataONE are still maintained, and new methods have been added.

The dataone R package requires the R package redland. On Ubuntu, the Redland C libraries must be installed first. On Mac OS X and Windows, installing these libraries is not required.
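Before installing, it can help to check whether redland is already available in your R library. This is a minimal sketch using base R's requireNamespace(); the variable name hasRedland is illustrative only.

```r
# Check whether the redland R package, which dataone requires, is already
# installed; requireNamespace() returns TRUE or FALSE without attaching it.
hasRedland <- requireNamespace("redland", quietly = TRUE)
if (hasRedland) {
  message("redland is installed: ", as.character(packageVersion("redland")))
} else {
  message("redland is not installed; on Ubuntu, install librdf0 and librdf0-dev first")
}
```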

Installing on Mac OS X

On Mac OS X dataone can be installed with the following commands:

install.packages("dataone")
library(dataone)

The dataone R package should be available for use at this point.

Note: if you wish to build the required redland package from source before installing dataone, please see the redland installation instructions.

Installing on Ubuntu

On Ubuntu, install the required Redland C libraries by entering the following commands in a terminal window:

sudo apt-get update
sudo apt-get install librdf0 librdf0-dev

Then install the R packages from the R console:

install.packages("dataone")
library(dataone)

The dataone R package should be available for use at this point.

Installing on Windows

On Windows, the required redland R package is distributed as a binary release, so it is not necessary to install any additional system libraries.

To install the dataone R package from the R console:

install.packages("dataone")
library(dataone)

The dataone R package should be available for use at this point.

Note: if you wish to build the required redland package from source before installing dataone, please see the redland installation instructions.

Quick Start

See the full manual for documentation. Once installed, the package can be loaded and its help page opened in R with:

library(dataone)
help("dataone")

To search for a dataset on the Knowledge Network for Biocomplexity (KNB), a Member Node of the DataONE federation:

cn <- CNode("PROD")
mn <- getMNode(cn, "urn:node:KNB")
mySearchTerms <- list(q="id:doi*hstuar*+AND+abstract:Zostera+AND+keywords:Benthic", 
                      fl="id,title,dateUploaded,abstract,datasource,size")
result <- query(mn, solrQuery=mySearchTerms, as="data.frame")
pid <- result[1,'id']
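The q string above joins field:value terms with "+AND+". The sketch below builds such a string from a named character vector. The helper buildSolrQuery is hypothetical (it is not part of dataone), and it assumes the values need no escaping.

```r
# Hypothetical helper (not part of the dataone package): build a Solr "q"
# string by joining field:value pairs with "+AND+", as in the example above.
buildSolrQuery <- function(terms) {
  stopifnot(is.character(terms), !is.null(names(terms)), all(nzchar(names(terms))))
  paste(paste0(names(terms), ":", terms), collapse = "+AND+")
}

q <- buildSolrQuery(c(id = "doi*hstuar*", abstract = "Zostera", keywords = "Benthic"))
mySearchTerms <- list(q = q, fl = "id,title,dateUploaded,abstract,datasource,size")
```

The resulting mySearchTerms can be passed to query(mn, solrQuery=mySearchTerms, as="data.frame") as shown above. Values containing spaces or Solr special characters would need escaping, which this sketch does not attempt.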

A CSV data object can be downloaded from KNB with the commands:

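cn <- CNode("PROD")
mn <- getMNode(cn, "urn:node:KNB")
# getObject() returns the object's bytes as a raw vector
dataRaw <- getObject(mn, "df35d.443.1")
dataChar <- rawToChar(dataRaw)
theData <- textConnection(dataChar)
df <- read.csv(theData, stringsAsFactors=FALSE)
close(theData)  # read.csv() does not close a connection that was already open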
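The raw-to-data.frame step can be tested offline by simulating the downloaded bytes with charToRaw(). This sketch uses the text argument of read.csv() instead of a textConnection. The helper name rawCsvToDataFrame is hypothetical, and it assumes the object is ASCII or UTF-8 CSV text with a header row.

```r
# Hypothetical helper: parse a CSV object delivered as a raw vector
# (such as the result of getObject()) into a data.frame.
rawCsvToDataFrame <- function(bytes) {
  stopifnot(is.raw(bytes))
  read.csv(text = rawToChar(bytes), stringsAsFactors = FALSE)
}

# Offline check: simulate downloaded bytes instead of calling getObject()
bytes <- charToRaw("site,count\nA,3\nB,5\n")
df <- rawCsvToDataFrame(bytes)
str(df)
```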

Uploading a CSV file to a DataONE Member Node requires authentication. To obtain and set an authentication token:

  • Log in at DataONE: Production or Staging
  • Navigate to the 'My profile' page
  • Then navigate to 'Settings | Authentication Token | Token for DataONE R'
  • Set the token in your R session, but do not save the token in any scripts:
options(dataone_test_token = "eyJh8YwQ12NNaqxuDsJSUzI1NiJ9.eyJzdWIi09awjd67rt7n1AC5vc...rest.of.long.token.here")
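One way to keep the token out of scripts is to store it in an environment variable (for example, in ~/.Renviron) and copy it into the option at run time. This is a sketch under that assumption: the variable name DATAONE_TEST_TOKEN and the helper loadD1Token are hypothetical, and only the option name dataone_test_token comes from the instructions above.

```r
# Hypothetical helper: copy a token from an environment variable into the
# R option used for the STAGING environment. DATAONE_TEST_TOKEN is an
# illustrative variable name, not one that dataone defines.
loadD1Token <- function(envVar = "DATAONE_TEST_TOKEN", option = "dataone_test_token") {
  token <- Sys.getenv(envVar)
  if (!nzchar(token)) {
    warning(envVar, " is not set; no token loaded")
    return(invisible(FALSE))
  }
  opt <- list(token)
  names(opt) <- option
  options(opt)  # equivalent to options(dataone_test_token = token)
  invisible(TRUE)
}
```

Calling loadD1Token() at the start of a session sets the option without the token ever appearing in the script itself.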

Once you have the token loaded, uploading is done with:

library(dataone)
library(datapack)
library(uuid)
d1c <- D1Client("STAGING", "urn:node:mnStageUCSB2")
id <- paste("urn:uuid:", UUIDgenerate(), sep="")
testdf <- data.frame(x=1:10,y=11:20)
csvfile <- paste(tempfile(), ".csv", sep="")
write.csv(testdf, csvfile, row.names=FALSE)
# Build a DataObject containing the csv, and upload it to the Member Node
d1Object <- new("DataObject", id, format="text/csv", filename=csvfile)
uploadDataObject(d1c, d1Object, public=TRUE)

Note that this example uploads a data file to the DataONE test environment ("STAGING") rather than the production environment ("PROD"), to avoid adding test data to the production network. Use "STAGING" (https://search-stage.test.dataone.org) for testing, and "PROD" (https://search.dataone.org) for real data submissions. When switching between STAGING and PROD, the token must come from the matching environment and be set under the matching option name: dataone_test_token for STAGING, and dataone_token for PROD.
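The option-naming rule above can be captured in a small helper so that a token is always stored under the name that matches its environment. The function setD1Token is hypothetical and not part of dataone; it only maps "STAGING" to dataone_test_token and "PROD" to dataone_token, as described above.

```r
# Hypothetical helper: store a token under the option name that matches
# the DataONE environment it was issued for.
setD1Token <- function(token, env = c("STAGING", "PROD")) {
  env <- match.arg(env)  # rejects any environment other than STAGING or PROD
  optionName <- switch(env,
    STAGING = "dataone_test_token",
    PROD    = "dataone_token"
  )
  opt <- list(token)
  names(opt) <- optionName
  options(opt)
  invisible(optionName)
}
```

For example, setD1Token("<token from search.dataone.org>", "PROD") sets dataone_token, while the same call with "STAGING" sets dataone_test_token.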

Acknowledgements

Work on this package was supported by:

  • NSF-ABI grant #1262458 to C. Gries, M. B. Jones, and S. Collins.
  • NSF-DATANET grants #0830944 and #1430508 to W. Michener, M. B. Jones, D. Vieglais, S. Allard and P. Cruse.
  • NSF DIBBS grant #1443062 to T. Habermann and M. B. Jones.
  • NSF-PLR grant #1546024 to M. B. Jones, S. Baker-Yeboah, J. Dozier, M. Schildhauer, and A. Budden.

Additional support was provided for working group collaboration by the National Center for Ecological Analysis and Synthesis, a Center funded by the University of California, Santa Barbara, and the State of California.

