The bigvis package provides tools for exploratory data analysis of large datasets (10-100 million observations). The aim is for most operations to take less than 5 seconds on commodity hardware, even for 100,000,000 data points.
Since bigvis is not currently available on CRAN, the easiest way to try it out is to install it from GitHub with devtools:
# install.packages("devtools")
devtools::install_github("hadley/bigvis")
The bigvis package is structured around the following workflow:
condense() to get a compact summary of the data
if the estimates are rough, you might want to smooth them; see rmse_cvs() to figure out a good starting bandwidth
if you're working with counts, you might want to
visualise the results with autoplot() (you'll need to load ggplot2 to use this)
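The first step of this workflow, reducing raw observations to a compact summary, can be illustrated in base R without bigvis. The sketch below assigns each observation to a fixed-width bin and reduces each bin to a count. It is only an illustration of the bin-then-summarise idea, not how bigvis implements it; the helper name `bin_counts`, the bin width, and the simulated data are all assumptions made for the example.

```r
# Illustrative only: fixed-width binning followed by a per-bin count.
# bin_counts() is a hypothetical helper, not a bigvis function.
bin_counts <- function(x, width, origin = min(x)) {
  # 1-based integer bin index for each observation
  idx <- as.integer(floor((x - origin) / width)) + 1L
  n_bins <- max(idx)
  data.frame(
    # Left edge of each bin
    x = origin + (seq_len(n_bins) - 1) * width,
    # Number of observations falling in each bin
    count = tabulate(idx, nbins = n_bins)
  )
}

set.seed(1)
x <- rnorm(1e6)
summary_df <- bin_counts(x, width = 0.1)

# One million points are reduced to one row per bin,
# and no observation is lost in the process
nrow(summary_df)
sum(summary_df$count) == length(x)
```

Every later step (smoothing, standardising, plotting) can then operate on the small summary table rather than on the raw data, which is what makes interactive speeds plausible at this scale.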
Bigvis also provides a number of standard statistics efficiently implemented on weighted/binned data.
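To see why weighted statistics are sufficient once data has been binned, consider computing a mean and variance from bin centres weighted by bin counts. The sketch below uses only base R (`weighted.mean()` from the stats package); the helper `weighted_var` and the example values are hypothetical and are not bigvis functions.

```r
# Hypothetical binned summary: bin centres and the number of observations in each
centres <- c(0.5, 1.5, 2.5, 3.5)
counts  <- c(10, 30, 40, 20)

# Weighted mean: each centre contributes in proportion to its count
w_mean <- weighted.mean(centres, w = counts)

# Weighted (population-style) variance, treating counts as frequency weights.
# weighted_var() is an illustrative helper, not part of bigvis.
weighted_var <- function(x, w) {
  mu <- weighted.mean(x, w)
  sum(w * (x - mu)^2) / sum(w)
}
w_var <- weighted_var(centres, counts)

# The same mean is obtained by expanding the bins back into individual values
all.equal(w_mean, mean(rep(centres, times = counts)))
```

The weighted version touches one value per bin rather than one per observation. Relative to the raw data, the result is approximate, since every observation is treated as if it sat at its bin centre.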
This package wouldn't be possible without:
the fantastic Rcpp package, which makes it amazingly easy to integrate R and C++
JJ Allaire and Carlos Scheidegger who have indefatigably answered my many C++ questions
Revolution Analytics, who generously supported the early development
Yue Hu, who implemented a proof of concept showing that it might be possible to work with this much data in R