Workshop materials for reproducible analysis
step00_setup
step01_data
step02_features
step03_models
step04_evaluation
.gitignore
README.Rmd
ReproducibleGLM.Rproj

title: Reproducible logistic regression models
author: Steph Locke (@SteffLocke)
date: `r Sys.Date()`
output:
  rmarkdown::html_document:
    code_folding: show
    number_sections: true
    toc: true
    toc_float: true
    toc_depth: 2

Agenda

  • Analysis workflow
  • Sources of change
  • Accounting for change
  • GLM step-by-step -- Project setup
  • GLM step-by-step -- Data

Sources of change in analysis

Exercise

What sort of things can alter the results of a piece of analysis?

Answers

  • Changes in data
  • Changes in code behaviours
  • Changes in behaviours in dependencies
  • Randomness
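
As a small illustration of the last answer, the sketch below fits the same logistic regression to simulated data under different seeds. The helper `fit_once()` and the simulated data are invented for this example and are not part of the workshop code.

```r
# Hypothetical helper: simulate a small data set and fit a logistic regression.
# The seed controls the simulated data, so it also controls the coefficients.
fit_once <- function(seed) {
  set.seed(seed)
  x <- rnorm(200)
  y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.2 * x))
  coef(glm(y ~ x, family = binomial))
}

same_a <- fit_once(1)
same_b <- fit_once(1)
other  <- fit_once(2)

identical(same_a, same_b)  # same seed: are the estimates the same?
identical(same_a, other)   # different seed: are the estimates the same?
```

Any step that draws random numbers (sampling, cross-validation folds, simulation) behaves the same way unless the seed is fixed.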

Accounting for change

Exercise

What sort of things can we do to prevent changes creeping into our analysis that stop it from being "deterministic"?

Answers

  • Checksums to flag if anything has changed
  • Keeping a separate copy of data
  • Keeping dependencies the same over time
  • Source control
  • Unit testing and validating code
  • set.seed() to make random steps repeatable
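
The checksum idea can be sketched with `tools::md5sum()`, which ships with R. The file and its contents below are stand-ins made up for the example; in a real project the recorded checksum would be stored alongside the raw data.

```r
# Write a small example file to a temporary location (stand-in for raw data).
data_file <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, outcome = c(0, 1, 1)), data_file,
          row.names = FALSE)

# Record the checksum when the data is first received.
recorded <- unname(tools::md5sum(data_file))

# Later: recompute and compare before running the analysis.
unchanged <- identical(unname(tools::md5sum(data_file)), recorded)

# Simulate an unnoticed edit to the file and check again.
write.csv(data.frame(id = 1:3, outcome = c(0, 1, 0)), data_file,
          row.names = FALSE)
changed <- !identical(unname(tools::md5sum(data_file)), recorded)

c(unchanged = unchanged, changed = changed)
```

A failed comparison does not say what changed, only that something did, which is usually enough to stop and investigate before rerunning the models.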

GLM step-by-step -- Project setup

Project checklist

  • Git
  • Project options
    • No .RData or .Rhistory!
    • Insert spaces for tabs
  • Packrat: packrat::init()
  • Folder structure
    • data
    • processeddata
    • analysis
    • outputs
    • docs
  • DESCRIPTION
  • LICENSE
  • .Rbuildignore
  • README.Rmd
  • Makefile
  • .travis.yml
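
One way to scaffold the checklist above is sketched here in base R. It builds the skeleton in a temporary directory so it has no side effects; the directory name is hypothetical, and the files are created empty as placeholders rather than with real contents.

```r
# Create the skeleton in a temporary directory so the sketch has no side effects.
project_dir <- file.path(tempdir(), "glm-project-sketch")  # hypothetical name
folders <- c("data", "processeddata", "analysis", "outputs", "docs")
files <- c("DESCRIPTION", "LICENSE", ".Rbuildignore", "README.Rmd",
           "Makefile", ".travis.yml")

for (folder in folders) {
  dir.create(file.path(project_dir, folder), recursive = TRUE,
             showWarnings = FALSE)
}
created <- file.create(file.path(project_dir, files))

# Dependency snapshotting would be started separately, from the project root:
# packrat::init()

list.files(project_dir, all.files = TRUE, no.. = TRUE)
```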

Travis setup

GitHub setup

GLM step-by-step -- Data

  • Source
  • Verification steps
  • Multiple outputs?
    • Main report
    • Supplementary data quality report
    • Shiny?
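
The verification steps could look something like the sketch below. The function `verify_data()`, the column names, and the example data frames are all assumptions made for illustration; the actual checks depend on the data source.

```r
# Hypothetical checks for an incoming data set with a binary outcome.
verify_data <- function(df, required_cols, outcome_col) {
  c(
    has_columns    = all(required_cols %in% names(df)),
    has_rows       = nrow(df) > 0,
    outcome_binary = outcome_col %in% names(df) &&
      all(df[[outcome_col]] %in% c(0, 1)),
    no_missing     = !anyNA(df)
  )
}

good <- data.frame(age = c(34, 51, 29), outcome = c(0, 1, 1))
bad  <- data.frame(age = c(34, NA, 29), outcome = c(0, 2, 1))

good_checks <- verify_data(good, c("age", "outcome"), "outcome")
bad_checks  <- verify_data(bad,  c("age", "outcome"), "outcome")

good_checks
bad_checks
```

Returning a named vector rather than stopping at the first failure makes it easy to feed the results into a supplementary data quality report.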

GLM step-by-step -- Data processing

  • Cleaning steps
  • Sampling
  • Feature scaling
  • Univariate analysis
  • Bivariate analysis
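
A minimal sketch of the sampling and scaling steps, using simulated data in place of the cleaned data set. The seed, the 70/30 split, and the column names are arbitrary choices for the example, not values taken from the workshop.

```r
set.seed(2017)  # arbitrary seed so the split is repeatable

# Simulated stand-in for cleaned data.
dat <- data.frame(x1 = rnorm(100, mean = 50, sd = 10),
                  x2 = runif(100),
                  y  = rbinom(100, size = 1, prob = 0.4))

# Sampling: hold out 30% of rows for testing.
train_rows <- sample(nrow(dat), size = round(0.7 * nrow(dat)))
train <- dat[train_rows, ]
test  <- dat[-train_rows, ]

# Feature scaling: learn centre and spread from the training rows only,
# then apply the same transformation to both sets.
features <- c("x1", "x2")
centre <- colMeans(train[, features])
spread <- apply(train[, features], 2, sd)
train[, features] <- scale(train[, features], center = centre, scale = spread)
test[, features]  <- scale(test[, features],  center = centre, scale = spread)

# Univariate and bivariate summaries on the training data.
summary(train$x1)
tapply(train$x1, train$y, mean)
```

Scaling the test rows with the training centre and spread keeps information from the held-out data out of the model fitting step.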

GLM step-by-step -- Candidate models

  • Feature selection
  • Various glm* models
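
The sketch below fits a few candidate logistic regressions with `glm()` and compares them. The data is simulated, and the particular set of candidates and the use of AIC and a likelihood ratio test are illustrative assumptions rather than the workshop's prescribed approach.

```r
set.seed(2017)  # arbitrary seed for the simulated data

# Simulated training data: x1 and x2 drive the outcome, x3 is noise.
n <- 300
train <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
train$y <- rbinom(n, size = 1,
                  prob = plogis(-0.3 + 1.0 * train$x1 - 0.8 * train$x2))

# A few candidate logistic regressions of increasing size.
candidates <- list(
  null = glm(y ~ 1,            family = binomial, data = train),
  x1   = glm(y ~ x1,           family = binomial, data = train),
  x1x2 = glm(y ~ x1 + x2,      family = binomial, data = train),
  full = glm(y ~ x1 + x2 + x3, family = binomial, data = train)
)

# Compare candidates on AIC (lower is better).
aic_values <- sapply(candidates, AIC)
aic_values

# Likelihood ratio test: does adding x3 improve on the x1 + x2 model?
anova(candidates$x1x2, candidates$full, test = "Chisq")
```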

GLM step-by-step -- Evaluation

  • Scaling sample
  • Single model evaluation techniques
  • Comparing multiple models
  • Cross-validation
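
Cross-validation can be sketched in base R as follows. The helpers `log_loss()` and `cv_log_loss()` are hypothetical names written for this example, the data is simulated, and log-loss is just one of several metrics that could be used.

```r
set.seed(2017)  # arbitrary seed: fixes both the simulated data and the folds

n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, size = 1,
                prob = plogis(-0.3 + 1.0 * dat$x1 - 0.8 * dat$x2))

# Mean log-loss of predicted probabilities p against 0/1 outcomes y.
log_loss <- function(y, p) {
  p <- pmin(pmax(p, 1e-15), 1 - 1e-15)  # guard against log(0)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# k-fold cross-validation for a model formula; assumes the outcome column is y.
cv_log_loss <- function(model_formula, data, k = 5) {
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  losses <- vapply(seq_len(k), function(i) {
    fit <- glm(model_formula, family = binomial, data = data[fold != i, ])
    p <- predict(fit, newdata = data[fold == i, ], type = "response")
    log_loss(data$y[fold == i], p)
  }, numeric(1))
  mean(losses)
}

cv_null <- cv_log_loss(y ~ 1, dat)
cv_full <- cv_log_loss(y ~ x1 + x2, dat)
c(null = cv_null, full = cv_full)
```

Because the fold assignment is random, the seed is part of the evaluation: changing it changes the reported metric slightly.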

GLM step-by-step -- Model selection

  • Using evaluation metrics to select best model
  • Presenting model
  • In-depth evaluation of best model
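
A rough sketch of picking a model from a metric and presenting it. AIC stands in for whichever evaluation metric is chosen, the data is simulated, and the odds-ratio table with Wald intervals from `confint.default()` is one possible presentation rather than the workshop's own.

```r
set.seed(2017)  # arbitrary seed for the simulated data

n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, size = 1,
                prob = plogis(-0.3 + 1.0 * dat$x1 - 0.8 * dat$x2))

# Fit each candidate formula.
formulas <- list(null = y ~ 1, x1 = y ~ x1, x1x2 = y ~ x1 + x2)
fits <- lapply(formulas, function(f) glm(f, family = binomial, data = dat))

# Select the candidate with the best (lowest) metric value.
metric <- sapply(fits, AIC)
best_name <- names(which.min(metric))
best <- fits[[best_name]]

# Present the chosen model as odds ratios with Wald 95% intervals.
odds_ratios <- exp(cbind(estimate = coef(best), confint.default(best)))
best_name
round(odds_ratios, 2)
```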

GLM step-by-step -- Supplementary materials

  • Data lineage
  • Data quality
  • Feature analysis in-depth
  • Candidate model evaluations
  • Code
  • Reproducibility info
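
For the reproducibility info, a minimal option is to save the output of `sessionInfo()` with the report. The file location and the `repro_info` list below are assumptions for the example; a real project would write into its outputs or docs folder.

```r
# Save session details next to the outputs (a temporary file is used here).
info_file <- tempfile(fileext = ".txt")
writeLines(capture.output(sessionInfo()), info_file)

# A compact record of the R version and attached packages.
repro_info <- list(
  r_version = R.version.string,
  packages  = sapply(sessionInfo()$otherPkgs, function(p) p$Version),
  recorded  = Sys.time()
)

repro_info$r_version
```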