Materials for the tutorial, "A survival guide to large-scale data analysis in R."
Large-scale data analysis in R is like the Super-G events in the Winter Olympics: it is about pushing the physical limits of your computer (or compute cluster). My first aim is to show you some techniques for pushing your R computing further. My second aim is to help you make more effective use of memory, the most precious commodity in computing, and to demonstrate how R sometimes uses it poorly. This presentation is intended to be hands-on: bring your laptop, and we will work through the examples together. This git repository contains the source code for running the demos.
---
In this tutorial I attempt to apply elements of the Software Carpentry approach. See also this article. Please also take a look at the Code of Conduct and the license information.
---
To generate PDFs of the slides from the R Markdown source, run

```
make slides.pdf
```

in the docs directory. For this to work, you will need to install the rmarkdown package in R, as well as any additional packages used in slides.Rmd. For more details, see the Makefile.
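If you prefer to work directly in R rather than through make, the equivalent steps can be sketched as follows. This is only a sketch: it assumes slides.Rmd declares a PDF output format (for example, beamer_presentation) in its YAML header, and that any additional packages it uses are already installed.

```r
# Install rmarkdown if it is not already available (run once).
install.packages("rmarkdown")

# Render the slides from the repository root; the output format is
# taken from the YAML header of slides.Rmd, and the PDF is written
# alongside the source file in the docs directory.
rmarkdown::render("docs/slides.Rmd")
```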
---

See also the instructor notes.
These materials were developed by Peter Carbonetto at the University of Chicago. Thank you to Matthew Stephens for his support and guidance. Thanks also to Gao Wang for sharing the Python script to profile memory usage, to David Gerard for sharing code that ultimately improved several of the examples, and to John Blischak, John Novembre and Stefano Allesina for providing great examples to learn from.