In assessing the generalizability of a statistical learning algorithm, it is vital to evaluate it on many diverse, feature-rich datasets. This package provides a simple interface to many common benchmark datasets, including the Penn Machine Learning Benchmarks (PMLB; Olson et al., 2017, arXiv:1703.00512), the University of California, Irvine (UCI) Machine Learning Repository, and MNIST (LeCun et al., 1998, doi:10.1109/5.726791), so that users can examine performance across many disparate contexts. We also provide utilities for data cleaning, data preparation, and cross-validation.
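The package's own cross-validation utilities are not named in this README. The following is only a minimal base-R sketch of k-fold cross-validation. It uses a hypothetical helper, `kfold_indices()`, which is not part of slbR, and the built-in `iris` data in place of a benchmark dataset:

```r
# Hypothetical helper (not part of slbR): split row indices 1..n into k folds.
kfold_indices <- function(n, k = 5, seed = 1) {
  stopifnot(k >= 2, k <= n)
  set.seed(seed)  # note: this resets the global random number stream
  # Balanced fold labels 1..k, randomly permuted across the n rows
  fold_id <- sample(rep_len(seq_len(k), n))
  # One vector of held-out (test) row indices per fold
  split(seq_len(n), fold_id)
}

# Usage: estimate the accuracy of a trivial majority-class baseline on iris.
folds <- kfold_indices(nrow(iris), k = 5)
fold_acc <- vapply(folds, function(test) {
  train <- setdiff(seq_len(nrow(iris)), test)
  # "Train": find the most frequent class among the training rows
  majority <- names(which.max(table(iris$Species[train])))
  # "Test": fraction of held-out rows whose class equals that prediction
  mean(as.character(iris$Species[test]) == majority)
}, numeric(1))
mean(fold_acc)
```

Shuffling a balanced vector of fold labels, rather than sampling a fold for each row independently, keeps fold sizes within one observation of each other.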
- R: R package code.
- docs: package documentation, and usage of the slbR package on many real and simulated data examples.
- man: package manual for help in an R session.
- tests: R unit tests written using the testthat package.
- vignettes: R vignettes for R session HTML help pages.
The slbR package requires only a standard computer with enough RAM to support the operations defined by the user. For minimal performance, a computer with about 2 GB of RAM is sufficient. For optimal performance, we recommend a computer with the following specs:
- RAM: 16+ GB
- CPU: 4+ cores, 3.3+ GHz/core
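One quick way to compare a machine against the recommended CPU spec is base R's `parallel::detectCores()`. It counts logical CPUs, which can exceed the number of physical cores on hyper-threaded machines. This sketch checks only the core count, not RAM:

```r
# Number of logical CPUs reported by the OS (may be NA if it cannot be determined)
cores <- parallel::detectCores()
# TRUE when at least the recommended 4 cores are available
meets_core_spec <- !is.na(cores) && cores >= 4
cat("Logical CPUs:", cores, "- meets 4+ core recommendation:", meets_core_spec, "\n")
```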
The runtimes below were generated on a computer with the recommended specs (16 GB RAM, 4 cores @ 3.3 GHz) and a 25 Mbps internet connection.
The development version of the package has been tested on the following systems:

- Linux: Ubuntu 16.04
- Mac OSX:
- Windows:
On Ubuntu 16.04, the latest version of R can be installed by adding the CRAN repository to `apt`:
```bash
# Add the CRAN repository for Ubuntu 16.04 (xenial) to apt's sources
echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" | sudo tee -a /etc/apt/sources.list
# Fetch and register the repository's signing key
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
sudo apt-get update
sudo apt-get install r-base r-base-dev
```
which should install in about 20 seconds.
Before installing slbR, users should install the following packages from an R terminal:
```r
install.packages(c('readr', 'httr'))
```
which will install in about 15 seconds on a recommended machine.
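On a machine that may already have some of these packages, a small variant installs only the missing ones. The `repos` URL below (the CRAN cloud mirror) is an assumption; substitute your preferred mirror:

```r
deps <- c("readr", "httr")
# requireNamespace() returns FALSE, rather than erroring, when a package is absent
have <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
to_install <- deps[!have]
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```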
If you encounter a problem that you believe is caused by software versioning, please open an Issue on GitHub.
From an R session, type:
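```r
# devtools provides install_github(); run install.packages('devtools') first if it is missing
library(devtools)
install_github('neurodata/slb', force = TRUE) # install slbR
```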
The package should take approximately 60 seconds to install on a recommended computer.
As an example, load all classification datasets from the PMLB repository:
```r
library(slbR)
data <- slb.load.datasets(repositories = "pmlb", task = "classification")
```
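This README does not document the structure of the object returned by `slb.load.datasets()`. Running `str(data, max.level = 1)` will show it. If it turns out to be a named list of data frames (an assumption), a hypothetical helper such as `summarize_datasets()` below gives a quick overview of dataset sizes. It is demonstrated here on built-in R datasets so that it runs without downloading anything:

```r
# Hypothetical helper (not part of slbR): one row per dataset with its dimensions.
summarize_datasets <- function(datasets) {
  stopifnot(is.list(datasets), !is.null(names(datasets)))
  data.frame(
    name   = names(datasets),
    n_rows = vapply(datasets, nrow, integer(1)),
    n_cols = vapply(datasets, ncol, integer(1)),
    row.names = NULL
  )
}

summarize_datasets(list(iris = iris, mtcars = mtcars))
```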