# Linear Regression Activity

## Loading R packages

Once the packages have been installed, they still need to be loaded every time you start a new session. This is done with the `library()` function in R, and the loading code will be provided for you in this course. The first code chunk below loads the packages we just installed. Running it is a good way to confirm that the packages were installed correctly on your instance of this server.
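
If you would like to confirm that the packages are available before loading them, a minimal sketch along the following lines may help. It assumes the three package names used in this activity; the object names `required` and `installed` are only illustrative.

```r
# Packages this activity loads with library().
required <- c("tidyverse", "ggformula", "mosaic")

# requireNamespace() returns TRUE when a package is installed and FALSE otherwise,
# without attaching it; quietly = TRUE suppresses start-up messages.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# A named logical vector: one TRUE/FALSE per package.
installed
```

Any package reported as `FALSE` would need to be installed before the `library()` calls will work.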

You will be asked to complete a few specific questions via an ICON survey as part of the complete/incomplete portion of the activities. First, please complete the analysis below as you will be asked to upload this document for proof of completion in addition to answering the specific questions on ICON. 

This activity is due **Monday, December 5th** by 11:59 pm.

## Description of the Data

The data contain 365 observations, one for each day of 2015, each recording the number of births on that day. The data have the following 10 variables.

 + date: Date
 + births: Number of births on date (integer)
 + wday: Day of week (ordered factor)
 + year: Year (integer)
 + month: Month (integer)
 + day_of_year: Day of year (integer)
 + day_of_month: Day of month (integer)
 + day_of_week: Day of week (integer)
 + weekend: Dichotomous attribute indicating whether the day was a weekend or a weekday ("Weekend" or "Weekday").
 + weekend_numeric: Dichotomous attribute; 1 = weekend, 0 = weekday.

 **Remember:** Every time you come back to work on this Activity, you will need to reload the very first code chunk. If you do not, you will get errors. 

In [None]:
library(tidyverse)
library(ggformula)
library(mosaic)

theme_set(theme_bw(base_size = 16))

# Flag Saturdays and Sundays, once as a text label and once as a 0/1 indicator.
Births2015 <- Births2015 %>%
  mutate(
    weekend = ifelse(wday %in% c('Sun', 'Sat'), "Weekend", "Weekday"),
    weekend_numeric = ifelse(wday %in% c('Sun', 'Sat'), 1, 0)
  )

head(Births2015)


## Linear Regression

1. Explore a linear regression that predicts the number of births with the attribute weekend. Adjust the equation below to add the `births` attribute as the outcome in place of "!!" and the `weekend` attribute as the sole predictor in place of "@@".

In [None]:
births_reg <- lm(!! ~ @@, data = Births2015)
coef(births_reg)
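
When the sole predictor is a two-level attribute, the slope reported by `coef()` is the difference between the two group means, and the intercept is the mean of the reference group. The sketch below illustrates this with a small made-up data set. The data frame `toy` and its columns `y` and `group` are hypothetical and are not the births data.

```r
# Hypothetical toy data (not the births data): eight outcomes in two groups.
toy <- data.frame(
  y = c(10, 12, 11, 13, 7, 8, 6, 7),
  group = rep(c("A", "B"), each = 4)
)

# Regression with the two-level attribute as the sole predictor.
# R treats "A" (first alphabetically) as the reference group.
toy_reg <- lm(y ~ group, data = toy)
coef(toy_reg)

# Group means computed directly: A = 11.5, B = 7.
group_means <- tapply(toy$y, toy$group, mean)
group_means

# The slope equals the difference in means: 7 - 11.5 = -4.5.
unname(group_means["B"] - group_means["A"])
```

The same reading applies to any regression with one dichotomous predictor: the intercept describes the reference group and the slope describes how far the other group sits from it on average.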


2. Below we are going to estimate typical values for the slope coefficient through resampling (bootstrapping). Inside the `resample_births` function below, fill in the linear model line of code, making sure it matches your equation from question 1 above. As before, add the `births` attribute as the outcome in place of "!!" and the `weekend` attribute as the sole predictor in place of "@@".

In [None]:
resample_births <- function(...) {
  births_resample <- Births2015 %>%
    sample_n(nrow(Births2015), replace = TRUE)

  births_resample %>%
    lm(!! ~ @@, data = .) %>%
    coef(.) %>%
    .[2] %>%
    data.frame()
}

resample_births()
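
To see what a single resample does, here is a minimal base R sketch on simulated data. The data frame `toy`, its columns `x` and `y`, and the seed are assumptions made for illustration. It uses `sample()` on row numbers, which plays the same role as `sample_n(..., replace = TRUE)` in the function above.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical simulated data: a 0/1 predictor and a numeric outcome.
toy <- data.frame(
  x = rep(c(0, 1), each = 10),
  y = c(rnorm(10, mean = 50, sd = 5), rnorm(10, mean = 40, sd = 5))
)

# Draw as many row numbers as there are rows, with replacement.
rows <- sample(nrow(toy), size = nrow(toy), replace = TRUE)
toy_resample <- toy[rows, ]

# Some rows are drawn more than once and others not at all.
table(rows)

# The slope from the original data and from the resample usually differ a little.
coef(lm(y ~ x, data = toy))[2]
coef(lm(y ~ x, data = toy_resample))[2]
```

Because each resample is a slightly different version of the data, the slope changes from one call to the next. That variation is what the bootstrap uses to describe uncertainty in the slope.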


3. Below is code to run the bootstrap 5,000 times and visualize a density curve of the bootstrapped results. 

*Note: This could take a few minutes to run*. You should receive a density curve when the code has finished running.

In [None]:
set.seed(1)

births_coef <- map_dfr(1:5000, resample_births)
names(births_coef) <- 'slope'

gf_density(~ slope, data = births_coef)
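
The repeat-and-collect step can also be sketched in base R. The example below uses simulated data and 1,000 repetitions so that it runs quickly. The names `toy`, `boot_slope`, and `slopes` are hypothetical, and `replicate()` stands in for `map_dfr()` here.

```r
set.seed(1)

# Hypothetical simulated data: a 0/1 predictor and a numeric outcome.
toy <- data.frame(
  x = rep(c(0, 1), each = 25),
  y = c(rnorm(25, mean = 50, sd = 5), rnorm(25, mean = 40, sd = 5))
)

# One bootstrap step: resample rows with replacement, refit, return the slope.
boot_slope <- function(data) {
  rows <- sample(nrow(data), replace = TRUE)
  unname(coef(lm(y ~ x, data = data[rows, ]))[2])
}

# Repeat the step many times and collect the slopes in a numeric vector.
slopes <- replicate(1000, boot_slope(toy))

# Density curve of the bootstrapped slopes.
plot(density(slopes), main = "Bootstrap distribution of the toy slope")
```

The density curve shows which slope values turn up most often across resamples and how widely they spread.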

4. Summary statistics are computed below, including the mean, standard deviation, 5th percentile, median (50th percentile), and 95th percentile. What do these tell you about typical values in the distribution, and how do they relate to whether the data support the hypothesis articulated above?

In [None]:
births_coef %>%
  df_stats(~ slope, mean, sd, quantile(c(0.05, 0.5, 0.95)))
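
One common way to read these summaries is to treat the 5th and 95th percentiles as the ends of a 90% interval of plausible slopes and to check whether zero falls inside it. The sketch below uses simulated numbers as a stand-in for bootstrapped slopes. The vector `slopes` and its mean and standard deviation are made up for illustration and are not results from the births data.

```r
set.seed(2)

# Hypothetical stand-in for 5,000 bootstrapped slopes (not real results).
slopes <- rnorm(5000, mean = -3, sd = 0.5)

# Center and spread of the simulated distribution.
c(mean = mean(slopes), sd = sd(slopes))

# The 5th and 95th percentiles bracket the middle 90% of the slopes.
interval <- quantile(slopes, probs = c(0.05, 0.95))
interval

# Does the interval contain zero? If not, a slope of zero is not among
# the typical values in this simulated distribution.
zero_inside <- interval[[1]] <= 0 && 0 <= interval[[2]]
zero_inside
```

If zero sits well outside the interval, a slope of zero (no difference between the groups) is not among the typical bootstrapped values. If zero sits inside it, the data are also consistent with no difference.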
