particle_filter_problem_set.qmd

---
title: "Particle filter"
format:
  html:
    embed-resources: true
editor: visual
---

## Problem set

You will need the following packages

```{r}
#| message: false
library(tidyverse)
library(lubridate)
```

### Overview

This exercise involves the following objectives

1.  Modify the particle filter examples to apply to a new model. The model is provided in this assignment
2.  Use out put from prior model run to initialize the particle filter
3.  Run the particle filter without observations to forecast
4.  Run the particle filter to assimilate new observations and forecast
5.  Evaluate how the forecast depends on data assimilation

#### Data

The data for this exercise is aboveground biomass vs. age for a single forest plot. The data has two columns: age (in years) and biomass (in gC/m2)

#### Model

We are predicting the aboveground biomass using the following model

biomass = previous biomass + constant growth - mortality rate \* previous biomass

The constant growth is the parameter `g` below (in units of gC/m2/yr) and mortality rate is the parameter `u`below (proportion of biomass per year). 

### Part 1: Fit model to historical data (Already done for you!)

This step is already done for you.

Here is the data for ages 1 through 50 for the plot (starting in 1950-01-01). It was measured every 5 years.

```{r}
plot_data <- read_csv("https://raw.githubusercontent.com/frec-5174/eco4cast-in-R-book/main/data/PF_data1.csv", show_col_types = FALSE)
```

```{r}
#| warning: false
#| fig-cap: Time-series of forest biomass data
#| label: fig-biomass-timeseries
ggplot(plot_data, aes(x = datetime, y = biomass)) +
  geom_point() +
  labs(x = "age", y = "aboveground biomass (gC/m2)") +
  theme_bw()
```

The following model was used to estimate the posterior distributions of the parameters using a Bayesian framework (i.e., "batch" parameter and state estimation)

```    
sim_dates <- seq(as_date("1950-01-01"), length.out = 50, by = "1 year")
num_particles <- ????
biomass <- array(NA, dim = c(length(sim_dates), num_particles))

biomass[1, ] <- WHAT ARE YOU INITIAL VALUES FOR THE FIRST TIME STEP?

for(t in 2:length(sim_dates)){

  for(i in 1:num_particles){

    biomass_predicted <- biomass[t-1, i] + param$g - param$u * biomass[t-1, i]

    biomass[t, i] <-  biomass_predicted + rnorm(1, mean = 0 , sd = param$sd_add)
  }
}
```

The MCMC chain has posterior distributions for the parameters (`g`, `u`, `sd_add`) and biomass at age 50 (`age50_biomass`)

```{r}
df <- read_csv("https://github.com/frec-5174/eco4cast-in-R-book/raw/main/data/PF_MCMC.csv", show_col_types = FALSE)
```

```{r}
df |> 
  pivot_longer(-c(".draw"), names_to = "parameter_or_state", values_to = "value") |> 
  ggplot() +
  geom_histogram(aes(x = value)) +
  facet_wrap(~parameter_or_state, scales = "free") +
  theme_bw()
```

### Part 2: Forecast using PF

Now you will use the MCMC chain to determine the mean parameter values and the initial condition at age 50 for the particle filter.

Using the lecture material create a particle filter that uses the forest growth model to simulate the aboveground biomass of the forest for age 50 through 70.

#### Step 1: Set up PF

**Question 1**
Following the code in the PF lecture set up the particle filter.

Be sure to:

-   use the mean values for `g`, `u`, and `sd_add` from the MCMC chain as the parameter values
-   use the distribution of the biomass at age 50 in the MCMC chain as your initial state for the PF (e.g., generate `num_particles` samples from the `age50_biomass` column in the csv provided above).
-   The standard deviation for the observations (`sd_data`) is 200.

```{r}
#ADD CODE TO SET UP PF HERE
```

#### Step 2: Run particle filter without assimilating data

Write the code and run the particle filter based on the examples from the lecture.  Do not assimilate any data in this run (e.g, use `y <- rep(NA, num_particles)` for your observations)

```{r}
#ADD PF CODE HERE
```

#### Step 3: Visualize particle filter output

Generate a plot that visualizes the output of the PF (see examples from the lecture). Your plot must have age on the x-axis and biomass on the y-axis with different lines for the particles.

```{r}
# ADD VISUALIZATION CODE HERE
```

#### Step 4: Save PF output

use this code to save your PF output as the object `initial_forecast`

```{r}

```

### Part 3:

Now we have new data!

```{r}
#| warning: false

new_data <- read_csv("https://github.com/frec-5174/eco4cast-in-R-book/raw/main/data/PF_data2.csv", show_col_types = FALSE)

ggplot(new_data, aes(x = datetime, y = biomass)) +
  geom_point() +
  labs(x = "age", y = "aboveground biomass (gC/m2)") +
  theme_bw()
```

#### Step 1: Repeat the PF setup using data assimilation

Using the new data, repeat the PF set up in Part 2 Step 1. You will be starting at age 50 just like above but assimilating the data.

```{r}
#ADD CODE TO SET UP PF HERE
```

#### Step 2: Run particle filter using the new data

Using the new data, run the the PF again. This will be the same code as in Part 2 Step 2 (just copy and paste), except that data will be assimilated.

```{r}
#COPY AND PASTE PF CODE FROM ABOVE
```

#### Step 3: Visualize PF output

Generate a plot that visualizes the output of the PF (see examples from the lecture). Your plot must have age on the x-axis and biomass on the y-axis with different lines for the particles. Your observations from the new data must be on the plot.

```{r}
#ADD VISUALIZATION CODE HERE
```

#### Step 4: Save output

```{r}

```

### Part 4:

Combine the two PF run and evaluate how data assimilation influence the forecast of the last 10 years (age 60 to 70). Produce a plot with the mean and 90% CI for the initial_forecast and assimilated_forecast on the same plot. Include the observations in the plot.

```{r}
#ADD CODE TO COMPARE THE TWO PF OUTPUTS
```

### Part 5:

How did assimilating data influence your forecast for ages 60 to 70? Consider both the mean and uncertainty in your answer.