A short tour of the parallel and foreach packages, and of how to think about scaling data analyses

Beyond Single Core: Parallel Analysis in R

R is a great environment for interactive analysis on your desktop, but when your data needs outgrow your personal computer, it's not clear what to do next.

This is material for a short overview of scalable data analysis in R. The slides can be viewed at https://ljdursi.github.io/beyond-single-core-R .

It covers:

  • How to think about parallelism and scalability in data analysis;
  • The standard parallel package, including what were formerly the separate snow and multicore packages, using airline data as an example;
  • The foreach package, using airline data and simple stock data;
  • A summary of best practices.
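
To give a flavour of the packages listed above, here is a minimal sketch rather than code taken from the slides. The toy function `slow_square`, the `inputs` vector, and the worker count of 2 are hypothetical stand-ins for a real per-chunk analysis, and the foreach part assumes the foreach and doParallel packages are installed.

```r
library(parallel)
library(foreach)
library(doParallel)

# Hypothetical toy workload standing in for a per-chunk analysis
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

inputs <- 1:8

# Serial baseline for comparison
serial <- lapply(inputs, slow_square)

# snow-style: an explicit cluster of worker processes (works on all platforms)
cl <- makeCluster(2)
res_snow <- parLapply(cl, inputs, slow_square)
stopCluster(cl)

# multicore-style: forked workers; forking is unavailable on Windows,
# so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_mc <- mclapply(inputs, slow_square, mc.cores = cores)

# foreach: the same loop, with results combined into a vector by c()
cl <- makeCluster(2)
registerDoParallel(cl)
res_fe <- foreach(x = inputs, .combine = c) %dopar% {
  x^2
}
stopCluster(cl)
```

All three approaches should give the same values as the serial `lapply`; they differ in how workers are started and in how data reaches them, which is the kind of trade-off the slides discuss.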

Included in the materials, though not in the talk, are some more advanced methods:

  • The bigmemory package for out-of-core computation on large data matrices, with a simple physical sciences example;
  • The Rdsm package for shared memory; and
  • A brief introduction to the powerful pbdR packages for extremely large-scale computation.