Slides for the "Integrating reproducibility into the undergraduate statistics curriculum" talk at JSM 2016 in Chicago, IL
.gitignore
README.md
ugrad_repro_jsm2016.pdf

Integrating reproducibility into the undergraduate statistics curriculum

This repository contains slides for and any additional resources related to the "Integrating reproducibility into the undergraduate statistics curriculum" talk at JSM 2016 in Chicago, IL.

Session information

  • When: Wed, 8/3/2016, 8:30 AM - 10:20 AM
  • Where: CC-W196c
  • Lineup:
    • 8:35 AM - Reproducibility for All and Our Love/Hate Relationship with Spreadsheets — Jennifer Bryan, University of British Columbia
    • 8:55 AM - Steps Toward Reproducible Research — Karl W. Broman, University of Wisconsin - Madison
    • 9:15 AM - Enough with Trickle-Down Reproducibility: Scientists, Open This Gate! Scientists, Tear Down This Wall! — Karthik Ram, University of California at Berkeley
    • 9:35 AM - Integrating Reproducibility into the Undergraduate Statistics Curriculum — Mine Cetinkaya-Rundel, Duke University
    • 9:55 AM - Discussant: Yihui Xie, RStudio
  • Chair: Amelia McNamara (filling in for Ben Baumer)

Abstract

The issue of reproducibility often comes up in the context of published research and the need to accompany such research with the data, analyses, and software/code necessary to recreate the results. As statistics educators who teach data analysis, we should be instilling best practices in students as early as possible. We advocate for teaching data analysis at all levels of the statistics curriculum using a completely reproducible framework so that the new researchers we train have no other workflow than a reproducible one. Additionally, as statisticians we should be marshaling efforts for promoting reproducible data analysis practices in other disciplines as well. While all this might sound like a tall order at first, modern tools for literate programming (e.g. R Markdown) and systems for version control (e.g. GitHub, Open Science Framework) paired with carefully designed curricula that integrate the use of these tools early and often make this goal easier to attain than ever before. In this talk we will share experiences from undergraduate courses and research experiences teaching and practicing reproducible data analysis. We will also discuss collaborative efforts with non-statisticians for developing and promoting the use of a reproducible data analysis protocol.

Talk outline

  • Intro:
    • Two-pronged approach to spreading reproducibility:
      • Convince researchers to change their workflows and adopt reproducible ones
      • Train new researchers who have no other workflow (the focus of this talk)
    • Research / teaching:
      • Reproducibility often comes up in the context of published research and the need to accompany such research with the complete data and analyses, including software/code
      • Educators who teach data analysis should be instilling best practices in students before they set out to do research
  • Share experiences from undergraduate courses that teach data analysis within a reproducible framework
  • 2016/2017 projects: Carry the theme of reproducibility through the entire major with a capstone course and senior thesis that are fully reproducible
  • More comments on toolkit:
    • R / RStudio: Recommended
    • But R is not necessary:
      • Any scripting language might work
      • Though the overhead might be higher in some than in others
  • Pleasant side-effects:
    • For instructor: easy Q&A + easy grading
    • For student: easy collaboration + self promotion
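
As a rough illustration of the toolkit point that R is not strictly necessary, here is a minimal sketch of a reproducible analysis script in Python. It is not taken from the talk or the slides. The function names (`run_analysis`, `render_report`), the seed value, and the simulated data are all hypothetical, chosen only to show the idea: fix the random seed, and generate every reported number from code rather than typing it by hand.

```python
import random
import statistics


def run_analysis(seed: int = 2016, n: int = 100) -> dict:
    """Simulate a sample and summarize it; the fixed seed makes reruns identical."""
    # Hypothetical example data: a local generator avoids hidden global state.
    rng = random.Random(seed)
    sample = [rng.gauss(0, 1) for _ in range(n)]
    return {
        "n": n,
        "mean": round(statistics.mean(sample), 4),
        "sd": round(statistics.stdev(sample), 4),
    }


def render_report(results: dict) -> str:
    """Build the report text from computed results, never from hand-typed numbers."""
    return (
        "# Analysis report\n\n"
        f"Sample size: {results['n']}\n"
        f"Mean: {results['mean']}\n"
        f"Standard deviation: {results['sd']}\n"
    )


if __name__ == "__main__":
    print(render_report(run_analysis()))
```

Running the script twice with the same seed should print the same report. A literate programming tool such as R Markdown provides this kind of guarantee with far less manual wiring, which is one way to read the outline's remark about overhead.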