likelihood_problem_set.qmd

---
title: "Parameter calibration: likelihood methods"
format:
  html:
    embed-resources: true
editor: visual
---

## Problem set

You will be asked submit (via Canvas) your rendered (or knitted) html document

```{r}
library(tidyverse)
```

### Part 1

Load dataset

```{r}
d <- read_csv(file = "https://data.ecoforecast.org/neon4cast-targets/phenology/phenology-targets.csv.gz", show_col_types = FALSE)
```

Filter the dataset to only include the site_id BART (Bartlett Experimental Forest in the White Mountains of New Hampshire) and the dates between 2019-01-01 and 2019-07-01. Convert the date to Day of Year (hint: use `lubridate:: yday()` function). Remove rows with gcc_90 equal to NA or gcc_sd equal to 0.

```{r}
bart_2019 <- d  %>%
  filter(site_id == "BART",
         datetime > as_date("2019-01-01"),
         datetime < as_date("2019-07-01"),
         variable == "gcc_90") %>%
  mutate(doy = yday(datetime)) %>% 
  filter(!is.na(observation),
         observation > 0)
```

**Question 1:** How is gcc_90 related to day of year?

**Answer 1:**

```{r}
#Add Answer
```

**Question 2:** Use a histogram to examine the distribution of the gcc_90

**Answer 2:**

```{r}
#Add Answer
```

First create a function called \`pred_logistic' that is your process model. The model is the the logistic curve which ish the equation $$P_1 + P_2 {{exp(P_3 + P_4 x)}\over{1+exp(P_3 + P_4 x)}}$$

**Question 3:** Based on the equation above, write a function that predicts the gcc_90 as a function of the parameters ($P$) and x where x is the DOY. Name that function `pred_logistic`.

**Answer 3:**

```{r}

```

**Question 4:** Write a function that calculates the negative log-likelihood of the data given a set of parameters governing the process and data models. Assume a normal distribution and be sure to estimate the sd in the data model.

**Answer 4:**

```{r}
#Add Answer
```

**Question 5:** Use the `optim` function to find the most likely parameter values. Use the following as starting values `par = c(0.34,0.11,-15,0.11, 1)` where the first four are the theta parameters from your process model and the fifth is the sd of your data model.

**Answer 5:**

```{r}
#Add Answer
```

**Question 6:** Use your optimal parameters in the `pred_logistic` function to predict the data. Save this as the object `predicted`

**Answer 6:**

```{r}
#Add Answer
```

**Question 7:** Calculate the residuals and plot a histogram of the residuals

**Answer 7:**

```{r}
#Add Answer
```

**Question 8:** How does the distribution of the data (Question 2) compare to the distribution of the residuals?

**Answer 8:**

**Question 9:** Predict year 2020 using the process model parameters from the 2019 fit.

```{r}
#Add Answer
```

**Answer 9:**

**Question 10:** Plot the forecast from Question 10 over the data from 2020 (I give the code for getting the 2020 data)

**Answer 10:**

```{r}
bart_2020 <- d  %>%
  filter(site_id == "BART",
         datetime > as_date("2020-01-01"),
         datetime < as_date("2020-07-01"),
         variable == "gcc_90") %>%
  mutate(doy = yday(datetime)) %>% 
  filter(!is.na(observation),
         observation > 0)
```

**Question 11:** Do you think your model from 2019 is reasonable for predicting 2020?

**Answer 11:**

### Part 2

Using the following data

```{r}
df <- read_csv("https://raw.githubusercontent.com/frec-5174/eco4cast-in-R-book/main/data/soil_respiration_module_data.csv", show_col_types = FALSE)
```

It is a dataset that reports soil respiration, soil temperature, and soil moisture over a year at the University of Michigan Biological Station (from Nave, L.E., N. Bader, and J.L. Klug)

The columns correspond to the following

-   doy = Day of Year\
-   soil_resp: Soil respiration (micromoles CO2 per m2 per second)\
-   soil_temp: Soil Temp (deg C) soil_moisture: Soil Moisture (%)\

Use maximum likelihood to estimate the parameters in the model that predicts the relationship between soil temperature and soil respiration using the Q10 function below

$$\theta_1 * \theta_2 ^{{(T - 20)}\over{10}}$$
**Question 12:** 

Show all the steps to determine the most likely parameter values, report the parameter values, and plot the data and predictions on the same plot

**Answer 12:**