# Model Tutorial: Baseline Methods

The focus of this research project is the effect of custom loss functions on forecasting wildfire rate of spread. The intent is not to optimize each machine learning model into a state-of-the-art fuel moisture forecasting tool. However, we will compare the models to two baseline methods: a physics-based model that uses a Kalman filter for data assimilation, and a simple climatology method. The comparison ensures that the machine learning methods produce reasonably accurate forecasts, so that conclusions drawn about the effect of the custom loss functions are meaningful. This notebook explains two baseline methods for fuel moisture modeling and demonstrates how to deploy them.

## Climatology

### Description

In meteorology, it is common practice to compare models to a "climatology", a simple statistical average of past weather. Shreck 2023 compare their machine learning models of fuel moisture to...
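For hourly fuel moisture, one simple way to build a climatology is to forecast each hour as the average of past observations at the same hour of day. The sketch below assumes that definition and a gap-free hourly record (missing observations stored as NaN). The function `hourly_climatology` and its parameters are hypothetical, and this is not necessarily the climatology used by Shreck 2023.

```python
import numpy as np

def hourly_climatology(history, n_forecast, period=24):
    """Climatology forecast: each future hour gets the mean of all past
    observations at the same position in the daily cycle.

    history    : 1-D array of hourly fuel moisture, oldest first; assumed
                 gap-free in time, with missing observations stored as NaN
    n_forecast : number of hours to forecast after the end of `history`
    period     : cycle length in hours (24 for a daily cycle)
    """
    history = np.asarray(history, dtype=float)
    n = len(history)
    if n < period:
        raise ValueError("history must cover at least one full cycle")
    # Position of each past observation in the cycle, counted from the first one
    phase = np.arange(n) % period
    # Average of past observations for each phase (NaNs ignored)
    means = np.array([np.nanmean(history[phase == p]) for p in range(period)])
    # Future hours continue the cycle where the history left off
    future_phase = (n + np.arange(n_forecast)) % period
    return means[future_phase]

# Usage on a synthetic series: three identical days with a linear daily cycle
hist = np.tile(np.arange(24.0), 3)
fc = hourly_climatology(hist, n_forecast=30)
```

Averaging by hour of day keeps the diurnal drying and wetting cycle in the baseline, which a single overall mean would flatten.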

## Physics-Based Method

The current fuel moisture model within WRF-SFIRE is a simple ODE based on the physical processes of drying and wetting. The model assimilates data via the Kalman filter, a Bayesian-inspired technique for reconciling a deterministic model with observed data.
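A minimal sketch of this kind of model, assuming a commonly described drying/wetting/rain form: above a rain threshold, moisture relaxes toward saturation; otherwise it dries toward the drying equilibrium `Ed` when above it, wets toward the wetting equilibrium `Ew` when below it, and stays constant in between. All parameter values, the noise variances `Q` and `R`, and the names `ode_step` and `kf_step` are illustrative assumptions. This is a plain scalar extended Kalman filter, not the augmented filter implemented by `run_augmented_kf` in `fmda_models`.

```python
import numpy as np

# Illustrative parameter values (assumptions, not read from WRF-SFIRE or fmda)
T_HOURS = 10.0   # time lag of 10-h dead fuel, hours
S_SAT = 250.0    # saturation moisture under rain, percent
T_RAIN = 14.0    # time constant of rain wetting, hours
R0 = 0.05        # rain threshold, mm/h
RS = 8.0         # rain saturation intensity, mm/h

def ode_step(m, Ed, Ew, rain, dt=1.0):
    """Advance fuel moisture m (percent) by dt hours.

    Returns the new moisture and d(m_new)/d(m) within the current regime,
    which the Kalman filter uses to propagate the variance."""
    if rain > R0:
        # Rain wetting: relax toward saturation, faster for heavier rain
        rate = (1.0 - np.exp(-(rain - R0) / RS)) / T_RAIN
        target = S_SAT
    elif m > Ed:
        rate, target = 1.0 / T_HOURS, Ed   # drying toward Ed
    elif m < Ew:
        rate, target = 1.0 / T_HOURS, Ew   # wetting toward Ew
    else:
        return m, 1.0                      # between Ew and Ed: no change
    decay = np.exp(-rate * dt)
    # Exact solution of dm/dt = rate * (target - m) for a constant target
    return target + (m - target) * decay, decay

def kf_step(m, P, Ed, Ew, rain, obs=None, Q=0.1, R=1.0):
    """One forecast step, plus an analysis step if an observation exists."""
    m_f, F = ode_step(m, Ed, Ew, rain)
    P_f = F * P * F + Q                  # forecast variance
    if obs is None or np.isnan(obs):
        return m_f, P_f                  # no data: pure model forecast
    K = P_f / (P_f + R)                  # Kalman gain, between 0 and 1
    m_a = m_f + K * (obs - m_f)          # pull the forecast toward the data
    P_a = (1.0 - K) * P_f                # analysis variance shrinks
    return m_a, P_a
```

The gain `K` weighs the model against the data: a large forecast variance `P_f` relative to the observation variance `R` moves the estimate closer to the observation.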

## Setup

In [None]:
import sys
sys.path.append('../src')
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import tensorflow as tf
import matplotlib.pyplot as plt
# Local modules
from fmda_models import run_augmented_kf
from metrics import ros, rmse
import reproducibility

## Data Read and Split

In [None]:
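df = pd.read_pickle("../data/rocky_2023_05-09.pkl")
# Drop rows with missing fuel moisture observations
df = df.dropna(subset=['fm'])

# Keep a single station
df = df[df.stid == "LKGC2"]
# Sort chronologically so row position matches time order for the split below
df = df.sort_values("date").reset_index(drop=True)
df.shape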

In [None]:
# Set seed for reproducibility
reproducibility.set_seed(123)

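# Chronological split: first 80% of hours for training, last 20% for testing.
# shuffle=False is required because the Kalman filter run and the train/test
# indices below assume the test set is the final block of the time series.
X_train, X_test, y_train, y_test = train_test_split(
    df[["Ed", "Ew", "rain"]], df['fm'], test_size=.2, shuffle=False
)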

In [None]:
# Format as dictionaries to run through model
dat = {
    'fm' : df['fm'].to_numpy(),
    'Ed' : df["Ed"].to_numpy(),
    'Ew' : df["Ew"].to_numpy(),
    'rain' : df["rain"].to_numpy()
}

In [None]:
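# Run the filter over the full record; h2 is set to the last training index so
# observations are assimilated only during the training period
preds, E = run_augmented_kf(dat, h2=len(y_train)-1, hours=df["fm"].shape[0])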

In [None]:
# Row positions of the training and test hours in the chronologically ordered data
train_inds = np.arange(0, len(y_train))
test_inds = np.arange(len(y_train), df["fm"].shape[0])

In [None]:
plt.plot(df.date, df.fm, label = "Observed FM")
plt.plot(df.date.iloc[train_inds], preds[train_inds], label= "Train")
plt.plot(df.date.iloc[test_inds], preds[test_inds], label= "Test")
plt.axvline(df.date.iloc[len(y_train)], color= 'k', linestyle='dashed')
plt.legend()
plt.grid()

In [None]:
print(f"RMSE Test: {rmse(preds[test_inds], y_test)}")
print(f"RMSE ROS Test: {rmse(ros(preds[test_inds]), ros(y_test))}")