---
title: "Assignment 7: GLMs (Linear Regressions, ANOVA, & t-tests)"
author: "Student Name"
date: "Spring 2024"
output: pdf_document
geometry: margin=2.54cm
editor_options:
  chunk_output_type: console
---
## OVERVIEW
This exercise accompanies the lessons in Environmental Data Analytics on generalized linear models.
## Directions
1. Rename this file `<FirstLast>_A07_GLMs.Rmd` (replacing `<FirstLast>` with your first and last name).
2. Replace "Student Name" on line 3 (above) with your name.
3. Work through the steps, **creating code and output** that fulfill each instruction.
4. Be sure to **answer the questions** in this assignment document.
5. When you have completed the assignment, **Knit** the text and code into a single PDF file.
## Set up your session
1. Set up your session. Check your working directory. Load the tidyverse, agricolae, and other needed packages. Import the raw NTL-LTER data file for chemistry/physics (`NTL-LTER_Lake_ChemistryPhysics_Raw.csv`). Set date columns to date objects.
2. Build a ggplot theme and set it as your default theme.
```{r setup2}
#1
#2
```
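As a minimal sketch of the two setup steps, the block below converts a text date column to `Date` objects and registers a default plot theme. The data frame `toy_raw`, its values, and the `%m/%d/%y` date format are assumptions for illustration; check the actual file before reusing the format string.

```r
library(ggplot2)

# Confirm where R is looking for files.
getwd()

# Hypothetical stand-in for the raw file; the real columns may differ.
toy_raw <- data.frame(
  sampledate = c("5/27/84", "7/2/84", "7/16/85"),
  temperature_C = c(14.5, 22.1, 23.4)
)

# Assumption: dates are stored as month/day/two-digit-year strings.
toy_raw$sampledate <- as.Date(toy_raw$sampledate, format = "%m/%d/%y")
class(toy_raw$sampledate)

# Build a custom theme on top of a built-in one, then make it the default.
my_theme <- theme_classic(base_size = 14) +
  theme(axis.text = element_text(color = "black"),
        legend.position = "top")
theme_set(my_theme)
```

Every plot created after `theme_set()` picks up `my_theme` unless another theme is added explicitly.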
## Simple regression
Our first research question is: Does mean lake temperature recorded during July change with depth across all lakes?
3. State the null and alternative hypotheses for this question:
> Answer:
H0:
Ha:
4. Wrangle your NTL-LTER dataset with a pipe function so that the records meet the following criteria:
* Only dates in July.
* Only the columns: `lakename`, `year4`, `daynum`, `depth`, `temperature_C`
* Only complete cases (i.e., remove NAs)
5. Visualize the relationship between the two continuous variables with a scatter plot of temperature by depth. Add a smoothed line showing the linear model, and limit temperature values to the range 0 to 35 °C. Make this plot look pretty and easy to read.
```{r scatterplot}
#4
#5
```
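The wrangling and plotting pattern in steps 4 and 5 can be sketched on a small made-up data frame. Everything in `toy` is invented for illustration, and deriving `year4` and `daynum` from the date is an assumption made only so the toy data has those columns.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Made-up records: a mix of months, plus one missing temperature.
toy <- data.frame(
  lakename = rep(c("Lake A", "Lake B"), each = 4),
  sampledate = as.Date(c("2000-06-30", "2000-07-05", "2000-07-05", "2000-08-01",
                         "2001-07-10", "2001-07-10", "2001-07-20", "2001-06-15")),
  depth = c(0, 1, 5, 2, 0, 3, 6, 1),
  temperature_C = c(22, 21, 10, 20, 24, NA, 8, 19)
)

toy_july <- toy %>%
  mutate(year4 = year(sampledate), daynum = yday(sampledate)) %>%
  filter(month(sampledate) == 7) %>%                         # July only
  select(lakename, year4, daynum, depth, temperature_C) %>%  # requested columns
  na.omit()                                                  # complete cases

# Scatter plot of temperature by depth with a fitted straight line.
p <- ggplot(toy_july, aes(x = depth, y = temperature_C)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ylim(0, 35) +
  labs(x = "Depth (m)", y = "Temperature (°C)")
p
```

Note that `ylim()` drops points outside the limits before the line is fitted, which is the behavior the instruction asks for here.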
6. Interpret the figure. What does it suggest with regard to the response of temperature to depth? Does the distribution of points suggest anything about the linearity of this trend?
> Answer:
7. Perform a linear regression to test the relationship and display the results.
```{r linear.regression}
#7
```
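A simple regression of this kind, and where its reported quantities live in the fitted object, might look like the sketch below. The data are invented (a roughly 2 °C drop per meter is built in on purpose), so the numbers say nothing about the real lakes.

```r
# Made-up data: temperature falls about 2 degrees C per meter, plus small noise.
toy <- data.frame(depth = 0:9)
toy$temperature_C <- 25 - 2 * toy$depth +
  c(0.5, -0.5, 0.3, -0.3, 0.2, -0.2, 0.1, -0.1, 0.4, -0.4)

fit <- lm(temperature_C ~ depth, data = toy)
fit_summary <- summary(fit)
fit_summary

# Pieces typically quoted when interpreting the model:
slope      <- coef(fit)["depth"]                               # change per 1 m of depth
r_squared  <- fit_summary$r.squared                            # share of variance explained
resid_df   <- fit_summary$df[2]                                # residual degrees of freedom
slope_pval <- fit_summary$coefficients["depth", "Pr(>|t|)"]    # p-value for the slope
```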
8. Interpret your model results in words. Include how much of the variability in temperature is explained by changes in depth, the degrees of freedom on which this finding is based, and the statistical significance of the result. Also mention how much temperature is predicted to change for every 1 m change in depth.
> Answer:
---
## Multiple regression
Let's tackle a similar question from a different approach. Here, we want to explore what might be the best set of predictors for lake temperature in July across the monitoring period at the North Temperate Lakes LTER.
9. Run an AIC to determine what set of explanatory variables (year4, daynum, depth) is best suited to predict temperature.
10. Run a multiple regression on the recommended set of variables.
```{r temperature.model}
#9
#10
```
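One common way to compare predictor sets by AIC is stepwise selection with `step()` from base R. Using `step()` here is an assumption about the intended method, and the simulated data below (where only depth truly matters) are purely illustrative.

```r
set.seed(42)
n <- 60

# Simulated data: temperature depends on depth only; year4 and daynum are noise.
toy <- data.frame(
  year4 = sample(1990:2000, n, replace = TRUE),
  daynum = sample(182:212, n, replace = TRUE),
  depth = runif(n, 0, 10)
)
toy$temperature_C <- 25 - 2 * toy$depth + rnorm(n, sd = 1)

# Start from the full model; step() drops terms while doing so lowers AIC.
full_model <- lm(temperature_C ~ year4 + daynum + depth, data = toy)
best_model <- step(full_model)

# Refit/inspect the model with the retained predictors.
summary(best_model)
```

The printed trace shows the AIC of the model with each candidate term removed; the row labeled `<none>` is the current model.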
11. What is the final set of explanatory variables that the AIC method suggests we use to predict temperature in our multiple regression? How much of the observed variance does this model explain? Is this an improvement over the model using only depth as the explanatory variable?
> Answer:
---
## Analysis of Variance
12. Now we want to see whether the different lakes have, on average, different temperatures in the month of July. Run an ANOVA test to complete this analysis. (No need to test assumptions of normality or similar variances.) Create two sets of models: one expressed as an ANOVA model and another expressed as a linear model (as done in our lessons).
```{r anova.model}
#12
```
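The two formulations can be sketched side by side on invented data. The lake names and temperatures below are hypothetical; the point is only that `aov()` and `lm()` fit the same one-way model and report the same overall F statistic.

```r
set.seed(7)

# Made-up temperatures for three hypothetical lakes.
toy <- data.frame(
  lakename = factor(rep(c("Lake A", "Lake B", "Lake C"), each = 10)),
  temperature_C = c(rnorm(10, 18, 2), rnorm(10, 20, 2), rnorm(10, 15, 2))
)

# Expressed as an ANOVA model.
fit_aov <- aov(temperature_C ~ lakename, data = toy)
summary(fit_aov)

# Expressed as a linear model; coefficients are differences from the first lake.
fit_lm <- lm(temperature_C ~ lakename, data = toy)
summary(fit_lm)

# The overall F statistic is the same in both outputs.
f_aov <- summary(fit_aov)[[1]][["F value"]][1]
f_lm  <- unname(summary(fit_lm)$fstatistic["value"])
```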
13. Is there a significant difference in mean temperature among the lakes? Report your findings.
> Answer:
14. Create a graph that depicts temperature by depth, with a separate color for each lake. Add a `geom_smooth` layer (`method = "lm"`, `se = FALSE`) for each lake. Make your points 50% transparent. Adjust your y-axis limits to go from 0 to 35 degrees. Clean up your graph to make it pretty.
```{r scatterplot.2}
#14.
```
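Mapping color to the lake inside `aes()` is what gives each lake its own points and its own fitted line. A minimal sketch on simulated data for two hypothetical lakes:

```r
library(ggplot2)

set.seed(3)

# Simulated profiles for two hypothetical lakes with different slopes.
toy <- data.frame(
  lakename = rep(c("Lake A", "Lake B"), each = 20),
  depth = rep(seq(0, 9.5, by = 0.5), times = 2)
)
toy$temperature_C <- ifelse(toy$lakename == "Lake A",
                            24 - 1.5 * toy$depth,
                            26 - 2.0 * toy$depth) + rnorm(40, sd = 1)

p <- ggplot(toy, aes(x = depth, y = temperature_C, color = lakename)) +
  geom_point(alpha = 0.5) +                 # 50% transparent points
  geom_smooth(method = "lm", se = FALSE) +  # one line per lake, via the color grouping
  ylim(0, 35) +
  labs(x = "Depth (m)", y = "Temperature (°C)", color = "Lake")
p
```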
15. Use Tukey's HSD test to determine which lakes have different means.
```{r tukey.test}
#15
```
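A post-hoc comparison of this kind can be sketched with base R's `TukeyHSD()`, which takes a fitted `aov` object. The three lakes and their temperatures are made up; the output has one row per pair of lakes.

```r
set.seed(7)

# Made-up temperatures for three hypothetical lakes.
toy <- data.frame(
  lakename = factor(rep(c("Lake A", "Lake B", "Lake C"), each = 10)),
  temperature_C = c(rnorm(10, 18, 2), rnorm(10, 20, 2), rnorm(10, 15, 2))
)

fit_aov <- aov(temperature_C ~ lakename, data = toy)

# Pairwise differences in means with family-wise adjusted p-values.
tukey <- TukeyHSD(fit_aov)
tukey

# The comparison table is a matrix: difference, confidence bounds, adjusted p-value.
pairs_table <- tukey$lakename
```

Pairs whose adjusted p-value is above the chosen significance level are the ones that cannot be distinguished statistically.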
16. From the findings above, which lakes have the same mean temperature, statistically speaking, as Peter Lake? Does any lake have a mean temperature that is statistically distinct from all the other lakes?
> Answer:
17. If we were just looking at Peter Lake and Paul Lake, what's another test we might explore to see whether they have distinct mean temperatures?
> Answer:
18. Wrangle the July data to include only records for Crampton Lake and Ward Lake. Run a two-sample t-test on these data to determine whether their July temperatures are the same or different. What does the test say? Are the mean temperatures for the lakes equal? Does that match your answer for part 16?
```{r t.test}
```
> Answer:
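The subset-then-test pattern can be sketched as follows. The lake names and values are invented, and the call relies on the `t.test()` default of a Welch test (unequal variances), which is an assumption about which variant is wanted.

```r
set.seed(11)

# Made-up July temperatures for three hypothetical lakes.
toy <- data.frame(
  lakename = rep(c("Lake A", "Lake B", "Lake C"), each = 12),
  temperature_C = c(rnorm(12, 18, 2), rnorm(12, 19, 2), rnorm(12, 14, 2)),
  stringsAsFactors = FALSE
)

# Keep only the two lakes being compared.
two_lakes <- subset(toy, lakename %in% c("Lake A", "Lake B"))

# Two-sample t-test; the formula form needs exactly two groups.
res <- t.test(temperature_C ~ lakename, data = two_lakes)
res

# The group means and the p-value are stored in the result.
group_means <- res$estimate
p_value <- res$p.value
```

A p-value above the chosen significance level means the test does not reject equal means, which can then be checked against the Tukey grouping for the same pair.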