Home
Welcome to the iotools wiki!
See DistributedR for a short document on the concepts.
Purpose
iotools provides a set of tools for streaming data through connections, along with conversion tools from raw bytes to R objects and vice versa. It also contains a set of experimental functions that use streaming to run map/reduce or divide/recombine jobs on Hadoop, as well as general chunk-wise processing.
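As a rough illustration of streaming through a connection and converting raw bytes to objects, the sketch below reads a small file chunk by chunk with `chunk.reader` and `read.chunk`, then parses each chunk with `mstrsplit`. The temporary file and its contents are hypothetical sample data, and the pipe separator is an assumption about the input format.

```r
library(iotools)

# Hypothetical sample data: a small pipe-separated file standing in for a real source.
tmp <- tempfile()
writeLines(c("a|x", "b|y", "c|x", "d|y", "e|y"), tmp)

# A character source is interpreted as a file name.
reader <- chunk.reader(tmp)

counts <- integer(0)
# read.chunk() returns a raw vector of complete lines; length 0 means end of input.
while (length(chunk <- read.chunk(reader)) > 0) {
  m <- mstrsplit(chunk, sep = "|")     # raw bytes -> character matrix
  counts <- c(counts, table(m[, 2]))   # per-chunk counts of field 2
}
counts
```

With input this small everything arrives in a single chunk; on larger inputs `counts` would hold one set of partial counts per chunk, which still need to be combined.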
Simple examples
Unique entries of field 2:
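library(iotools)
# map: unique values of column 2 in each chunk; reduce: de-duplicate across chunks
hmr(hinput("/my/data"),
    map=function(x) unique(x[,2]),
    reduce=unique)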
Distribution of field 2:
library(iotools)
hmr(hinput("/my/data"),
    map=function(x) table(x[,2]),
    reduce=function(x) ctapply(as.numeric(x), names(x), sum))
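To make the data flow of the distribution example easier to follow, here is a minimal local sketch in base R that needs neither Hadoop nor iotools. It is a simulation under stated assumptions, not how `hmr` is implemented: the two toy chunks are hypothetical stand-ins for pieces of `/my/data`, the explicit sort mimics the sort-by-key step that Hadoop performs between map and reduce, and base `tapply` is swapped in for `ctapply`.

```r
# Hypothetical toy chunks standing in for pieces of /my/data.
chunks <- list(
  matrix(c("a", "x", "b", "y", "c", "x"), ncol = 2, byrow = TRUE),
  matrix(c("d", "y", "e", "y"), ncol = 2, byrow = TRUE)
)

# Same map function as in the example above.
map <- function(x) table(x[, 2])
# tapply is a base-R stand-in for ctapply here (an assumption of this sketch).
reduce <- function(x) tapply(as.numeric(x), names(x), sum)

# Map step: per-chunk counts, flattened to a named vector (key = name, value = count).
mapped <- unlist(lapply(chunks, function(ch) {
  t <- map(ch)
  setNames(as.vector(t), names(t))
}))

# Shuffle step: bring identical keys together, as the sort between map and reduce would.
mapped <- mapped[order(names(mapped))]

# Reduce step: sum the partial counts for each key.
result <- reduce(mapped)
result
```

The sort matters for the real example because `ctapply` works on contiguous runs of the same key, whereas `tapply` groups by key regardless of order.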