- Author: Matthew B. Jones and Peter Slaughter (NCEAS)
- License: Apache 2
- Package source code on Github
- Submit Bugs and feature requests
The datapack R package provides an abstraction for collating heterogeneous collections of data objects and metadata into a bundle that can be transported and loaded into a single composite file. The methods in this package provide a convenient way to load data from common repositories such as DataONE into the R environment, and to document, serialize, and save data from R to data repositories worldwide.
The datapack R package requires the R package redland. If you are installing on Ubuntu then the Redland C libraries must be installed before the redland and datapack package can be installed. If you are installing on Mac OS X or Windows then installing these libraries is not required.
The following instructions illustrate how to install datapack and its requirements.
Installing on Mac OS X
On Mac OS X datapack can be installed with the following commands:
The datapack R package should be available for use at this point.
Note: if you wish to build the required redland package from source before installing datapack, please see the redland installation instructions.
Installing on Ubuntu
For ubuntu, install the required Redland C libraries by entering the following commands in a terminal window:
sudo apt-get update sudo apt-get install librdf0 librdf0-dev
Then install the R packages from the R console:
The datapack R package should be available for use at this point
Installing on Windows
For windows, the required redland R package is distributed as a binary release, so it is not necessary to install any additional system libraries.
To install the R packages from the R console:
See the full manual for documentation, but once installed, the package can be run in R using:
Create a DataPackage and add metadata and data DataObjects to it:
library(datapack) library(uuid) dp <- new("DataPackage") mdFile <- system.file("extdata/sample-eml.xml", package="datapack") mdId <- paste("urn:uuid:", UUIDgenerate(), sep="") md <- new("DataObject", id=mdId, format="eml://ecoinformatics.org/eml-2.1.0", file=mdFile) addData(dp, md) csvfile <- system.file("extdata/sample-data.csv", package="datapack") sciId <- paste("urn:uuid:", UUIDgenerate(), sep="") sciObj <- new("DataObject", id=sciId, format="text/csv", filename=csvfile) dp <- addData(dp, sciObj) ids <- getIdentifiers(dp)
Add a relationship to the DataPackage that shows that the metadata describes, or "documents", the science data:
dp <- insertRelationship(dp, subjectID=mdId, objectIDs=sciId) relations <- getRelationships(dp)
Create an Resource Description Framework representation of the relationships in the package:
serializationId <- paste("resourceMap", UUIDgenerate(), sep="") filePath <- file.path(sprintf("%s/%s.rdf", tempdir(), serializationId)) status <- serializePackage(dp, filePath, id=serializationId, resolveURI="")
Save the DataPackage to a file, using the BagIt packaging format:
bagitFile <- serializeToBagIt(dp)
Note that the dataone R package can be used to upload a DataPackage to a DataONE Member Node using the uploadDataPackage method. Please see the documentation for the dataone R package, for example: