/
likelihood_problem_set.qmd
153 lines (99 loc) · 4.14 KB
/
likelihood_problem_set.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
---
title: "Parameter calibration: likelihood methods"
format:
html:
embed-resources: true
editor: visual
---
## Problem set
You will be asked submit (via Canvas) your rendered (or knitted) html document
```{r}
library(tidyverse)
```
### Part 1
Load dataset
```{r}
d <- read_csv(file = "https://data.ecoforecast.org/neon4cast-targets/phenology/phenology-targets.csv.gz", show_col_types = FALSE)
```
Filter the dataset to only include the site_id BART (Bartlett Experimental Forest in the White Mountains of New Hampshire) and the dates between 2019-01-01 and 2019-07-01. Convert the date to Day of Year (hint: use `lubridate:: yday()` function). Remove rows with gcc_90 equal to NA or gcc_sd equal to 0.
```{r}
bart_2019 <- d %>%
filter(site_id == "BART",
datetime > as_date("2019-01-01"),
datetime < as_date("2019-07-01"),
variable == "gcc_90") %>%
mutate(doy = yday(datetime)) %>%
filter(!is.na(observation),
observation > 0)
```
**Question 1:** How is gcc_90 related to day of year?
**Answer 1:**
```{r}
#Add Answer
```
**Question 2:** Use a histogram to examine the distribution of the gcc_90
**Answer 2:**
```{r}
#Add Answer
```
First create a function called \`pred_logistic' that is your process model. The model is the the logistic curve which ish the equation $$P_1 + P_2 {{exp(P_3 + P_4 x)}\over{1+exp(P_3 + P_4 x)}}$$
**Question 3:** Based on the equation above, write a function that predicts the gcc_90 as a function of the parameters ($P$) and x where x is the DOY. Name that function `pred_logistic`.
**Answer 3:**
```{r}
```
**Question 4:** Write a function that calculates the negative log-likelihood of the data given a set of parameters governing the process and data models. Assume a normal distribution and be sure to estimate the sd in the data model.
**Answer 4:**
```{r}
#Add Answer
```
**Question 5:** Use the `optim` function to find the most likely parameter values. Use the following as starting values `par = c(0.34,0.11,-15,0.11, 1)` where the first four are the theta parameters from your process model and the fifth is the sd of your data model.
**Answer 5:**
```{r}
#Add Answer
```
**Question 6:** Use your optimal parameters in the `pred_logistic` function to predict the data. Save this as the object `predicted`
**Answer 6:**
```{r}
#Add Answer
```
**Question 7:** Calculate the residuals and plot a histogram of the residuals
**Answer 7:**
```{r}
#Add Answer
```
**Question 8:** How does the distribution of the data (Question 2) compare to the distribution of the residuals?
**Answer 8:**
**Question 9:** Predict year 2020 using the process model parameters from the 2019 fit.
```{r}
#Add Answer
```
**Answer 9:**
**Question 10:** Plot the forecast from Question 10 over the data from 2020 (I give the code for getting the 2020 data)
**Answer 10:**
```{r}
bart_2020 <- d %>%
filter(site_id == "BART",
datetime > as_date("2020-01-01"),
datetime < as_date("2020-07-01"),
variable == "gcc_90") %>%
mutate(doy = yday(datetime)) %>%
filter(!is.na(observation),
observation > 0)
```
**Question 11:** Do you think your model from 2019 is reasonable for predicting 2020?
**Answer 11:**
### Part 2
Using the following data
```{r}
df <- read_csv("https://raw.githubusercontent.com/frec-5174/eco4cast-in-R-book/main/data/soil_respiration_module_data.csv", show_col_types = FALSE)
```
It is a dataset that reports soil respiration, soil temperature, and soil moisture over a year at the University of Michigan Biological Station (from Nave, L.E., N. Bader, and J.L. Klug)
The columns correspond to the following
- doy = Day of Year\
- soil_resp: Soil respiration (micromoles CO2 per m2 per second)\
- soil_temp: Soil Temp (deg C) soil_moisture: Soil Moisture (%)\
Use maximum likelihood to estimate the parameters in the model that predicts the relationship between soil temperature and soil respiration using the Q10 function below
$$\theta_1 * \theta_2 ^{{(T - 20)}\over{10}}$$
**Question 12:**
Show all the steps to determine the most likely parameter values, report the parameter values, and plot the data and predictions on the same plot
**Answer 12:**