  ## Linear Regression - Inference


  ## Description of the Data
  These data contain health information on 1,174 pregnant women and their babies.

In [None]:
library(tidyverse)
library(ggformula)
library(mosaic)

theme_set(theme_bw(base_size = 18))

baby <- read_csv("https://raw.githubusercontent.com/lebebr01/statthink/master/data-raw/baby.csv")
head(baby)

 ## Linear Regression Again
 Below is the linear regression model fitted with a single predictor, `gestational_days`, predicting `birth_weight`. As the coefficients show, the estimated slope is 0.47. This is only an estimate of the true effect, however, and the goal is to understand what that effect is in the population. Because the sample contains only 1,174 women and the birth weights of their babies, the sample slope will likely not equal the population slope exactly; some error is involved.

In [None]:
baby_reg <- lm(birth_weight ~ gestational_days, data = baby)
coef(baby_reg)
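
 The point that a sample slope rarely matches the population slope exactly can be illustrated with a small simulation. This is only a sketch using simulated data, not the `baby` data: the population intercept and slope, the predictor and noise settings, and the helper `one_sample_slope()` are all hypothetical values chosen for illustration, and only base R is used.

```r
# Hypothetical simulation (not the baby data): the population slope is set
# by us, so the spread of sample slopes around it can be seen directly.
set.seed(2024)

true_intercept <- -10    # assumed population intercept (arbitrary)
true_slope     <- 0.47   # assumed population slope (arbitrary)
n              <- 1174   # sample size, matching the number of women

# Draw one sample from the assumed population and return its fitted slope.
one_sample_slope <- function() {
  x <- rnorm(n, mean = 279, sd = 16)                         # simulated predictor
  y <- true_intercept + true_slope * x + rnorm(n, sd = 17)   # simulated outcome
  unname(coef(lm(y ~ x))["x"])
}

# Repeat the sampling 200 times to see how much the slope varies.
sample_slopes <- replicate(200, one_sample_slope())

range(sample_slopes)  # smallest and largest sample slope
mean(sample_slopes)   # average of the sample slopes
```

 In practice only one sample is available and the population slope is unknown, which is why a resampling approach such as the bootstrap is useful.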

 ## Estimating Error
 To get a sense of the amount of error in the slope estimate, a bootstrap can be used to show the likely range of slope values. The bootstrap takes the following general steps:
 1. Resample the observed data, with replacement.
 2. Fit the same linear regression model as above.
 3. Save the slope coefficient representing the relationship between birth weight and gestational days.
 4. Repeat steps 1-3 many times.
 5. Explore the distribution of slope estimates from the many resampled data sets.
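
 Step 1 is the core of the bootstrap, so a small sketch of what sampling with replacement does to the rows of a data set may help. The toy data frame `toy` and its columns are hypothetical, and base R's `sample()` is used here only for illustration.

```r
# Toy data (hypothetical), used only to show what resampling does to rows.
set.seed(1)
toy <- data.frame(id = 1:10, x = round(rnorm(10), 2))

# Sample row indices with replacement, keeping the original number of rows.
idx <- sample(nrow(toy), size = nrow(toy), replace = TRUE)
toy_resample <- toy[idx, ]

nrow(toy_resample)                 # same number of rows as toy
table(idx)                         # how many times each row was drawn
setdiff(toy$id, toy_resample$id)   # rows not drawn in this resample
```

 Because rows are drawn with replacement, a resample typically repeats some rows and omits others. For large data sets, roughly 63% of the original rows appear at least once in any one resample.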

 When the bootstrap was done with the classification tree, a function performed these steps once, and that function was then called many times. The function below performs steps 1-3 a single time.

In [None]:
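# Steps 1-3: resample the data, refit the model, return the coefficients.
# The `...` absorbs the iteration index that map_dfr() passes in later.
resample_baby <- function(...) {
  # Step 1: resample rows with replacement, keeping the original sample size
  baby_resample <- baby |>
    slice_sample(n = nrow(baby), replace = TRUE)

  # Steps 2-3: refit the model and return a tidy data frame of coefficients
  lm(birth_weight ~ gestational_days, data = baby_resample) |>
    broom::tidy()
}

resample_baby()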

 Now that a function performs steps 1-3, it can be called many times (step 4), and the distribution of the resulting slope estimates can be explored (step 5).

In [None]:
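# Step 4: run the resampling 10,000 times and stack the results row-wise
baby_coef <- map_dfr(1:10000, resample_baby)

# Step 5: density plot of the bootstrap slope estimates
baby_coef |> 
   filter(term == 'gestational_days') |> 
   gf_density(~ estimate)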

In [None]:
# 5th, 50th, and 95th percentiles of the bootstrap slope estimates
baby_coef |> 
   filter(term == 'gestational_days') |> 
   df_stats(~ estimate, quantile(c(0.05, 0.5, 0.95)))
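
 The 5th and 95th percentiles bound the middle 90% of the bootstrap slopes, which is often read as a 90% percentile interval for the slope. As a rough, self-contained sketch of that summary in base R, the code below bootstraps a simulated data set. The data frame `sim`, its columns, and the helper `boot_slope()` are hypothetical stand-ins, not the `baby` data or the `resample_baby()` function.

```r
# Simulated data (hypothetical), standing in for a real sample.
set.seed(42)
n <- 200
sim <- data.frame(x = rnorm(n))
sim$y <- 2 + 0.5 * sim$x + rnorm(n)

# One bootstrap replicate: resample rows with replacement, refit, keep the slope.
boot_slope <- function(data) {
  rows <- sample(nrow(data), replace = TRUE)
  unname(coef(lm(y ~ x, data = data[rows, ]))["x"])
}

# Repeat the resampling 1,000 times.
boot_slopes <- replicate(1000, boot_slope(sim))

# Spread of the bootstrap slopes, a bootstrap estimate of the standard error.
boot_se <- sd(boot_slopes)

# Middle 90% of the bootstrap slopes (5th and 95th percentiles).
ci_90 <- quantile(boot_slopes, probs = c(0.05, 0.95))

boot_se
ci_90
```

 A narrow interval suggests the slope is estimated precisely; a wide one suggests that different samples could plausibly give quite different slopes.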