
SISBID 2018 Module 2: Reproducible Research

Keith Baggerly and Karl Broman, July 16-18, 2018


This module is part of the Summer Institute in Statistics for Big Data!

Taught by

Keith A. Baggerly


Karl Broman

Cheat Sheets

Karl's Software Carpentry Course

These are from RStudio's list

There are many other sheets there (including some for user contributions and translations), so check it out!

These are from GitHub

Course Syllabus and Lecture Materials

Day 1, Jul 16, 2018

Session 1, 8:30-10

Lecture 0, Basic Intro, Keith, 5-10 min

Introduction to the course, administration, course goals

Definitions - reproduction vs replication

slides, printable version

Lecture 1, Intro and Common Problems, Karl, 40 min

An introduction to reproducible research by way of commonly encountered problems

slides, printable version

Lecture 2, A Train Wreck, Keith, 40 min

A case study describing just how bad things can get, with clinical implications

slides, printable version

Session 2, 10:30-12

Lecture 3, R Markdown and Literate Programming, Karl, 45 min

md lecture notes, Rmd example

Homework part 1, participants, 45 min

Set up the analysis folder, write the preprocessing script in R Markdown, and compile it to HTML / PDF / Word

Session 3, 1:30-3

Lecture 4, R Packages, Keith, 45-60 min (mostly live demo)

slides, printable version

Homework part 2, participants, 30 min

Writing a basic package

Session 4, 3:30-5

Lecture 5, Big Jobs, Karl, 45 min (includes some workalong activities)

Capturing exploratory data analysis, and handling the challenges that arise when data or jobs are big enough to make rerunning unpleasant or infeasible.

slides, printable version, spin example

Lecture 6, Make, Karl, 45 min

A brief introduction to automation with GNU Make

slides, printable version

Day 2, Jul 17, 2018

Session 5, 8:30-10

Lecture 7, Problems with Replication, Keith, 40 min

A review of several factors that can make results hard to replicate (observed again with new samples) as opposed to hard to reproduce (obtained again starting from the same raw data)

slides, printable version

Lecture 8, Git/GitHub 1: Sharing and RR, Keith, 50 min, mostly live

Using Git/GitHub to share and track versioned files and workflows, and using Git/GitHub in a reproducible research (RR) workflow

slides, printable version

Session 6, 10:30-12

Lecture 9, Git/GitHub 2: Branching and Merging, Keith, 45 min

Dealing with concurrent development, with breakage, and with merge conflicts

slides, printable version

Homework, participants, 45 min

Establishing a repo on GitHub and posting your package to it.

This session will be a mixture of lecture and live demo.

Session 7, 1:30-3

Lecture 10, Collaborating with Git, Keith, 45 min

slides, printable version

Homework, participants, 45 min

Working with others, making comments, providing feedback, fixing errors

Session 8, 3:30-5

Homework, participants, 45 min

Add comments and vignettes to your package on GitHub

Lecture 11, Implementing RR at MDACC, Keith, 45 min

A review of ongoing efforts within the biostatistics department at MD Anderson to produce reproducible reports, including how we took a report written a few years ago using a mix of R and Stata and revamped it in R/rmarkdown to emulate not just the results but also the "look and feel" of the original MS Word output. Touches on tables and figures in rmarkdown, references, and reformatting headers.

slides, printable version

Day 3, Jul 18, 2018

Session 9, 8:30-10

Lecture 12, Writing Good Reports, Keith, 45 min

The "non-codeable" parts of reproducibility: increasing the odds that your collaborators will understand what you're trying to do.

slides, printable version

Homework, participants, 45 min

Automating common tasks with templates - report structures, directory structures, and look and feel

Session 10, 10:30-12

Lecture 13, Summary and Wrapup, Karl, 45 min

Maintaining the Mindset

slides, printable version

Final Class Discussion

Evals, participants, 5 min

Previous Years

Lectures from 2016 and 2017.

Videos from 2015:

Session 01, Session 02, Session 03, Session 04, Session 05, Session 06, Session 09, and Session 10 (Sessions 07 and 08 were homework and demos, and were not recorded)

Recommended Reading/Browsing
