In this lab, you will find a dataset that contains at least two numerical variables, then construct a linear regression model. In this write-up you should:
Part 1:
- Find a dataset with at least two numerical variables.
- Briefly describe what these variables measure.
- Create a scatterplot of these data, including a best fit line.
- Determine the equation for the best fit line for these data.
- Assess the fit of your model: which assumptions/conditions are satisfied, and which are not?
- Describe the predicted value for at least one value of the explanatory variable (this can be computed by hand or using R).
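As a rough sketch of what the Part 1 steps might look like in R, the example below uses the built-in mtcars dataset purely as a stand-in for your own data. The variable choice (mpg against wt), the object names data1 and fit1, and the prediction point of 3 (that is, 3000 lbs) are all assumptions made for illustration, not requirements of the lab.

```r
# Illustration only: mtcars is a built-in R dataset standing in for your data
data1 <- mtcars

# Scatterplot of fuel economy (mpg) against weight (wt, in 1000 lbs)
plot(data1$wt, data1$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Fit the linear model mpg = b0 + b1 * wt and overlay the best fit line
fit1 <- lm(mpg ~ wt, data = data1)
abline(fit1, col = "blue")

# Intercept and slope of the best fit line
coef(fit1)

# Two common checks of the conditions:
# residuals vs. fitted values (look for curvature or changing spread)
plot(fitted(fit1), resid(fit1),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# normal Q-Q plot of the residuals (look for departures from the line)
qqnorm(resid(fit1))
qqline(resid(fit1))

# Predicted mpg for a car with wt = 3 (hypothetical prediction point)
pred1 <- predict(fit1, newdata = data.frame(wt = 3))
pred1
```

The same prediction can be checked by hand by plugging the value into the fitted equation: intercept + slope * 3.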
Part 2:
- Use dplyr's filter command to consider only part of your dataset. You may want to assign the result to a new dataset, e.g.:
data2 <- data1 %>% filter(status == "enrolled")
- Perform the same analysis as in Part 1 with your new dataset.
- Describe whether the results of the new analysis are the same as or different from those of the original analysis.
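One possible way to set up the Part 2 comparison is sketched below. It again uses the built-in mtcars dataset as a stand-in, and the filter condition (cyl == 4) and the names data1, data2, fit1, and fit2 are assumptions chosen for illustration. Your own filter will depend on the variables in your dataset.

```r
library(dplyr)

# Illustration only: mtcars stands in for your own dataset
data1 <- mtcars

# Keep only the four-cylinder cars (hypothetical filter condition)
data2 <- data1 %>% filter(cyl == 4)

# Fit the same model on the full data and on the filtered data
fit1 <- lm(mpg ~ wt, data = data1)
fit2 <- lm(mpg ~ wt, data = data2)

# Side-by-side intercepts and slopes
coef_table <- rbind(full = coef(fit1), subset = coef(fit2))
coef_table

# Side-by-side R-squared values
c(full = summary(fit1)$r.squared, subset = summary(fit2)$r.squared)
```

Differences in the slope, intercept, or R-squared between the two rows are a starting point for describing how the filtered analysis compares with the original.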
Your submission should include a knitted R Markdown (.Rmd) file displaying all of your results. We will discuss how to do this in class.