ModernDive 0.5.0


  • "Data wrangling" chapter now comes before "Tidy data" chapter.
  • Improved explanations and examples of geom_histogram(), geom_boxplot(), and "tidy" data
  • Moved residual analysis from regression Chapters 6 & 7 to Chap 11: Inference for regression
  • Reorganized Chap 8 on Sampling
  • All learning check solutions now in Appendix D
  • PDF build re-added (still a work-in-progress)

All content changes

  • Changed title
    • From: "Statistical Inference via Data Science in R"
    • To: "Statistical Inference via Data Science: A moderndive into R and the tidyverse"
  • Chapter 2 - Getting Started
    • Added subsection 2.2.3 "Errors, warnings, and messages" by @andrewheiss
  • Chapter 3 - Data visualization:
    • Added simpler introductory geom_histogram() and geom_boxplot() examples
    • Started reducing the amount of data wrangling previews included in this chapter, in particular joins.
    • Cleaned up conclusion section
    • Added cheatsheet
  • Switched order of "Chap 4 Tidy Data" and "Chap 5 Data Wrangling": Data Wrangling now comes first
  • Chapter 4 - Data wrangling:
    • Added cheatsheet
  • Chapter 5 - Renamed to "Importing and tidy data"
    • Reordered sections: importing then tidying
    • Added fivethirtyeight::drinks example of "hitting the non-tidy wall", then using tidyr::gather()
    • Made Guatemala democracy score a case study.
    • Added discussion on what the tidyverse package is.
    • Moved discussion on normal forms to Ch4: Data Wrangling - joins.
    • Moved discussion on identification vs measurement variables to Ch2: Getting started with data.
  • Chapter 6 - Basic regression:
    • Moved residual analysis to Chapter 11
  • Chapter 7 - Multiple regression:
    • Moved residual analysis to Chapter 11
  • Chapter 8 - Sampling: Major refactoring of presentation/exposition; see below
  • Chapter 11 - Inference for regression:
    • Moved residual analysis from Chapters 6 & 7 here
  • Moved all Learning Check solutions to Appendix D
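
The Chapter 5 item on fivethirtyeight::drinks and tidyr::gather() can be illustrated roughly as follows. This is a minimal sketch, not the book's exact code: it assumes the fivethirtyeight and tidyr packages are installed, and the names drinks_wide, drinks_tidy, type, and servings are chosen here for illustration.

```r
library(tidyr)
library(fivethirtyeight)  # provides the drinks data frame

# Keep the country identifier plus the three per-beverage serving columns.
# In this "wide" layout each beverage type has its own column, which is
# awkward when the type itself should be a variable (the "non-tidy wall").
drinks_wide <- as.data.frame(drinks)[
  , c("country", "beer_servings", "spirit_servings", "wine_servings")
]

# gather() stacks the three serving columns into key/value pairs:
# one row per (country, type) combination.
drinks_tidy <- gather(drinks_wide, key = "type", value = "servings", -country)

head(drinks_tidy)
```

Note that in tidyr 1.0.0 and later, gather() is superseded by pivot_longer(); gather() still works, so the sketch matches the function named in these notes.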

Chapter 8 Sampling Refactoring

Old chapter structure:

  1. Introduction to sampling
    a) Concepts related to sampling
    b) Inference via sampling
  2. Tactile sampling simulation
    a) Using the shovel once
    b) Using the shovel 33 times
  3. Virtual sampling simulation
    a) Using the shovel once
    b) Using shovel 33 times
    c) Using shovel 1000 times
    d) Using different shovels
  4. In real-life sampling: Polls
  5. Conclusion
    a) Central Limit Theorem
    b) What’s to come?
    c) Script of R code

New chapter structure:

  1. Activity: Sampling from a bowl
    a) Question: What proportion of this bowl is red?
    b) Using shovel once
    c) Using shovel 33 times
  2. Computer simulation:
    a) What is a simulation? We just did a "tactile" one by hand, now let's do one using the computer
    b) Using shovel once
    c) Using shovel 33 times
    d) Using shovel 1000 times
    e) Using different shovels
  3. Goal: Study fluctuations due to sampling variation
    a) You probably already knew: Bigger sample size means "better" guess.
    b) Comparing shovels: Role of sample size
  4. Framework: Sampling
    a) Terminology for sampling (population, sample, point estimate, etc)
    b) Statistical concepts: sampling distribution and standard error
    c) Computer's random number generator
  5. Interpretation:
    a) Visual display of differences
  6. Case study: Obama poll
  7. Big picture:
    a) Table of inferential scenarios: Add bowl and Obama poll (both p)
    b) Why does this work? Theoretical result: CLT
    c) There's a formula for that: SE formula that has sqrt(n) at the bottom
    d) Appendix: Normal distribution discussion
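
The ideas in the new structure (using the shovel 1000 times, comparing different shovels, and the SE formula with sqrt(n) at the bottom) can be sketched in base R as below. This is an illustrative sketch only, not the chapter's code: the bowl contents (2400 balls, 900 of them red), the shovel sizes, the seed, and the helper name use_shovel are all assumptions made here.

```r
set.seed(76)  # arbitrary seed so the virtual sampling is reproducible

# Hypothetical bowl: 2400 balls, 900 of them red (assumed for illustration)
bowl <- rep(c("red", "white"), times = c(900, 1500))
p <- mean(bowl == "red")  # true proportion of red balls in this bowl

# One use of a shovel with n slots: draw n balls without replacement
# and return the sample proportion that is red
use_shovel <- function(n) mean(sample(bowl, size = n) == "red")

# Use each of three different shovels 1000 times
shovel_sizes <- c(25, 50, 100)
p_hats <- lapply(shovel_sizes, function(n) replicate(1000, use_shovel(n)))

# Standard error: the standard deviation of the 1000 sample proportions
simulated_se <- sapply(p_hats, sd)

# Formula-based SE with sqrt(n) in the denominator; this version ignores
# the finite population correction, so it is only an approximation here
formula_se <- sqrt(p * (1 - p) / shovel_sizes)

data.frame(n = shovel_sizes, simulated_se, formula_se)
```

Under these assumptions, the simulated standard errors should shrink as the shovel gets bigger and track the formula values closely, which is the "bigger sample size means better guess" point in section 3.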