Short tour of parallel and foreach packages, and how to think about scaling data analyses

Beyond Single Core: Parallel Analysis in R

R is a great environment for interactive analysis on your desktop, but when your data needs outgrow your personal computer, it's not clear what to do next.

This is material for a short overview of scalable data analysis in R. The slides can be viewed at .

It covers:

  • How to think about parallelism and scalability in data analysis;
  • The standard parallel package, which absorbed the facilities of the earlier snow and multicore packages, using airline data as an example;
  • The foreach package, using airline data and simple stock data; and
  • A summary of best practices.
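As a rough illustration of the two packages listed above, here is a minimal sketch; it is not taken from the slides. The `square` function and `inputs` vector are hypothetical stand-ins for a real per-chunk analysis step, and the foreach part assumes the foreach and doParallel packages are installed, so it is skipped otherwise.

```r
library(parallel)

# Hypothetical stand-in for a per-chunk analysis step
square <- function(x) x^2
inputs <- 1:8

# snow-style: explicit worker processes; works on all platforms
cl <- makeCluster(2)
res_cluster <- parLapply(cl, inputs, square)
stopCluster(cl)

# multicore-style: forked workers; forking is unavailable on Windows,
# where mc.cores must be 1
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_fork <- mclapply(inputs, square, mc.cores = cores)

# foreach with the doParallel backend (assumed installed; skipped if not)
if (requireNamespace("foreach", quietly = TRUE) &&
    requireNamespace("doParallel", quietly = TRUE)) {
  library(foreach)
  library(doParallel)
  registerDoParallel(cores = 2)
  # .combine = c collapses the per-iteration results into one vector
  res_foreach <- foreach(x = inputs, .combine = c) %dopar% square(x)
  stopImplicitCluster()
}
```

All three approaches return the same values; they differ mainly in how workers are started and in how results are collected (`parLapply` and `mclapply` return lists, while `foreach` combines results as directed by `.combine`).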

Included in the materials, though not in the talk, are some more advanced methods:

  • The bigmemory package for out-of-core computation on large data matrices, with a simple physical sciences example;
  • The Rdsm package for shared memory; and
  • A brief introduction to the powerful pbdR packages for extremely large-scale computation.