# Model Tutorial: Random Forest

This notebook demonstrates how to train the Random Forest models used in this project and how to generate predictions with them. We first show the basic code, then reproduce the results with a custom class `RF`, which keeps the code consistent across multiple models.

## Model Description

The goal is to forecast fuel moisture from atmospheric observations using machine learning models. The inputs include equilibrium moisture, calculated from relative humidity and surface temperature, collected from RAWS ground-based stations.
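
As an illustration of how equilibrium moisture can be derived from relative humidity and temperature, the sketch below uses the drying and wetting equilibrium equations from the Van Wagner (1987) Fine Fuel Moisture Code. This is an assumption made for illustration: the formula behind the `Ed` and `Ew` columns in this project may use different coefficients or units. The function name `equilibrium_moisture` and the demo values are hypothetical.

```python
import numpy as np

def equilibrium_moisture(rh, temp_c):
    """Return (Ed, Ew) in percent for relative humidity rh (%) and temperature temp_c (deg C).

    Coefficients follow the Van Wagner (1987) FFMC equations. Assumption:
    the project's own formula may differ.
    """
    rh = np.asarray(rh, dtype=float)
    temp_c = np.asarray(temp_c, dtype=float)
    # Temperature correction shared by the drying and wetting equilibria
    temp_term = 0.18 * (21.1 - temp_c) * (1.0 - np.exp(-0.115 * rh))
    ed = 0.942 * rh**0.679 + 11.0 * np.exp((rh - 100.0) / 10.0) + temp_term
    ew = 0.618 * rh**0.753 + 10.0 * np.exp((rh - 100.0) / 10.0) + temp_term
    return ed, ew

# Illustrative inputs only, not project data
rh_demo = np.array([20.0, 50.0, 80.0])
temp_demo = np.array([30.0, 25.0, 15.0])
ed_demo, ew_demo = equilibrium_moisture(rh_demo, temp_demo)
print("Ed:", ed_demo)
print("Ew:", ew_demo)
```

In this formulation the drying equilibrium lies above the wetting equilibrium, and both rise with relative humidity.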

A Random Forest is a machine learning model that maps an input matrix of features to an output vector. It can model regression problems, where the target output vector is a continuous quantity. Each row of the feature matrix holds the meteorological quantities observed at a certain location and time. Each value of the output vector is the fuel moisture observed at the corresponding location and time.

Random Forests are ensemble learners: a collection of tree models is trained with bootstrapping and random subsetting of features, and their predictions are combined to reduce forecast variance.
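
The idea can be sketched by hand with scikit-learn decision trees on synthetic data. This is a minimal illustration, not the project's model: each tree is fit on a bootstrap resample of the rows, `max_features=2` restricts each split to a random subset of features, and the ensemble prediction is the average over trees. The data and all variable names are made up for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic regression data (illustrative only, not the project's data)
X = rng.uniform(-1, 1, size=(300, 4))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.3, size=300)
X_fit, y_fit = X[:200], y[:200]
X_new, y_new = X[200:], y[200:]

tree_preds = []
for i in range(50):
    # Bootstrap: resample the training rows with replacement
    idx = rng.integers(0, len(X_fit), size=len(X_fit))
    # max_features=2: each split considers a random subset of 2 features
    tree = DecisionTreeRegressor(max_features=2, random_state=i)
    tree.fit(X_fit[idx], y_fit[idx])
    tree_preds.append(tree.predict(X_new))

tree_preds = np.array(tree_preds)        # shape (n_trees, n_samples)
ensemble_pred = tree_preds.mean(axis=0)  # average the trees' predictions

# Average per-tree error versus the error of the averaged prediction
mean_tree_mse = np.mean((tree_preds - y_new) ** 2)
ensemble_mse = np.mean((ensemble_pred - y_new) ** 2)
print(f"mean single-tree MSE: {mean_tree_mse:.3f}, ensemble MSE: {ensemble_mse:.3f}")
```

For squared error, the averaged prediction can never do worse than the average of the individual trees' errors, which is the variance reduction the ensemble relies on.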

The final model outputs are time series of fuel moisture predictions. The model accuracy is calculated by comparing predicted fuel moisture to observed fuel moisture *at future times* and *at unobserved locations*.
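
One way to set up such an evaluation is to hold out some stations entirely and to test only on times after a cutoff. The sketch below shows this on a toy table. It is not the project's `train_test_split_spacetime` function, whose behavior is not shown here; the helper name `split_spacetime_sketch`, the column names `stid` and `date`, and the data are all hypothetical.

```python
import pandas as pd

def split_spacetime_sketch(df, test_stations, cutoff, time_col="date", loc_col="stid"):
    """Train on non-held-out stations up to the cutoff; test on held-out stations after it."""
    is_test_loc = df[loc_col].isin(test_stations)
    is_future = df[time_col] > cutoff
    train = df[~is_test_loc & ~is_future]
    test = df[is_test_loc & is_future]
    return train, test

# Toy data: three stations, two observation times each
demo = pd.DataFrame({
    "stid": ["A", "A", "B", "B", "C", "C"],
    "date": pd.to_datetime(["2023-01-01", "2023-02-01"] * 3),
    "fm": [10.0, 12.0, 8.0, 9.0, 15.0, 14.0],
})
train_demo, test_demo = split_spacetime_sketch(
    demo, test_stations=["C"], cutoff=pd.Timestamp("2023-01-15")
)
print(train_demo)
print(test_demo)
```

With this split, no test row shares a station or a time period with the training rows, so the test error reflects both forecasting and spatial generalization.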

## Setup

In [None]:
import sys
# Make the local modules in ../src importable
sys.path.append('../src')
import pandas as pd
import numpy as np
import tensorflow as tf
import yaml
from sklearn.ensemble import RandomForestRegressor
# Local modules
from metrics import ros, rmse
from data_funcs import train_test_split_spacetime
import reproducibility

## Read and Split Data

In [None]:
df = pd.read_pickle("../data/raws_df.pkl")
# Drop rows with missing values in any column used below
df = df.dropna(subset=["fm", "Ed", "Ew", "rain", "hour", "wind", "solar"])

In [None]:
# Set seed for reproducibility
reproducibility.set_seed(123)

# Split into training and test sets across space and time
X_train, X_test, y_train, y_test = train_test_split_spacetime(df)

In [None]:
# Subset to the feature columns used by the model
features = ["Ed", "Ew", "rain", "hour", "wind", "solar"]
X_train = X_train[features]
X_test = X_test[features]

### Model Hyperparameters

In [None]:
with open('params.yaml', 'r') as file:
    params = yaml.safe_load(file)["rf"]

params
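
For readers without the file at hand, the sketch below shows what an `rf` section could look like and how `yaml.safe_load` turns it into a dictionary. The keys are real `RandomForestRegressor` arguments, but the contents are hypothetical: the actual `params.yaml` may use different keys and values.

```python
import yaml

# Hypothetical file contents; the real params.yaml may use other keys and values
example_yaml = """
rf:
  n_estimators: 100
  max_depth: 10
  min_samples_leaf: 5
  random_state: 42
"""

# Parse the YAML text and keep only the Random Forest section
example_params = yaml.safe_load(example_yaml)["rf"]
print(example_params)
```

A dictionary of this form can be unpacked directly into the constructor with `RandomForestRegressor(**example_params)`.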

## Manually Code RF

In [None]:
# Create model instance
reproducibility.set_seed(123)
model0 = RandomForestRegressor(**params)

# fit model
model0.fit(X_train, y_train)

# Predict
preds = model0.predict(X_test)

In [None]:
print("Test RMSE:", rmse(preds, y_test))
print("Test RMSE (ROS):", rmse(ros(preds), ros(y_test)))
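
For reference, root mean squared error can be sketched in a few lines of NumPy. The helper name `rmse_sketch` is hypothetical and stands in for the project's `rmse` function from the `metrics` module, whose implementation is not shown here. The `ros` transformation is project-specific and is not reproduced.

```python
import numpy as np

def rmse_sketch(a, b):
    """Root mean squared error between two equal-length arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

# One prediction is off by 2, the other two are exact
value = rmse_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(value)
```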

## Reproduce using Custom RF Class

We now use a class `RF` that reproduces the code above. The purpose of the class is to give different machine learning models the same methods, which keeps the code concise.

The `RF` class accepts a dictionary of hyperparameters, which can be found in the file `params.yaml`.
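
To make the idea of a shared interface concrete, here is a minimal sketch of what such a wrapper might look like. It is not the project's `RF` class from `fmda_models`: the name `RFSketch`, the return value of `eval`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class RFSketch:
    """Hypothetical wrapper; the project's RF class may differ."""

    def __init__(self, params):
        # params is a dict of RandomForestRegressor keyword arguments
        self.params = params
        self.model = RandomForestRegressor(**params)

    def fit(self, X, y, weights=None):
        # Optional per-sample weights are passed through to scikit-learn
        self.model.fit(X, y, sample_weight=weights)

    def predict(self, X):
        return self.model.predict(X)

    def eval(self, X, y):
        # Report test RMSE (assumed metric; the real class may report more)
        preds = self.predict(X)
        rmse = float(np.sqrt(np.mean((preds - np.asarray(y)) ** 2)))
        print(f"Test RMSE: {rmse}")
        return rmse

# Synthetic data, illustrative only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo[:, 0] + rng.normal(scale=0.1, size=100)

sketch = RFSketch(params={"n_estimators": 20, "random_state": 0})
sketch.fit(X_demo[:80], y_demo[:80])
score = sketch.eval(X_demo[80:], y_demo[80:])
```

Because every model exposes the same `fit`, `predict`, and `eval` methods, the surrounding notebook code does not need to change when the underlying model does.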

In [None]:
import importlib
import fmda_models
importlib.reload(fmda_models)
from fmda_models import RF

In [None]:
# Set seed for reproducibility
reproducibility.set_seed(123)

model = RF(params=params)
model.fit(X_train, y_train)
preds = model.predict(X_test)

In [None]:
model.eval(X_test, y_test)

## Using Weighted Custom Loss

In [None]:
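# Sample weights decay exponentially with observed fuel moisture,
# so drier observations count more during training
weights = tf.exp(tf.multiply(-0.01, y_train))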
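
To see what this weighting does, the sketch below evaluates the same expression, `exp(-0.01 * fm)`, on a few made-up fuel moisture values. It uses NumPy rather than TensorFlow purely for brevity, and the values in `fm_demo` are illustrative.

```python
import numpy as np

# Illustrative fuel moisture values (percent), not project data
fm_demo = np.array([5.0, 10.0, 30.0])

# Same functional form as the training weights: exp(-0.01 * fm)
weights_demo = np.exp(-0.01 * fm_demo)
print(weights_demo)
```

The weights shrink as fuel moisture grows, so errors on dry fuels contribute more to the fitted trees than errors on wet fuels.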

In [None]:
reproducibility.set_seed(123)
# create model instance
model02 = RandomForestRegressor(**params)
# fit model
model02.fit(X_train, y_train, sample_weight=weights)
# Predict
preds = model02.predict(X_test)

In [None]:
print("Test RMSE:", rmse(preds, y_test))
print("Test RMSE (ROS):", rmse(ros(preds), ros(y_test)))

In [None]:
reproducibility.set_seed(123)
model = RF(params=params)
model.fit(X_train, y_train, weights)
model.eval(X_test, y_test)