# Failure Analysis
In this notebook we analyze the errors produced by the following models:
1. Support Vector Regression (SVR)
2. Gradient Boosting Regression
3. Random Forest Regression
4. XGBoost Extreme Gradient Boosting Regression
5. CatBoost Gradient Boosting Regression
6. K-Nearest Neighbors Regression
7. Long Short-Term Memory (LSTM) neural network model

The analysis uses the absolute error of each model's 2021 groundwater depth predictions (measured in feet).

In [1]:
import sys
sys.path.append('..')

In [2]:
import numpy as np
import pandas as pd

from lib.township_range import TownshipRanges
from lib.read_data import read_and_join_output_file
from lib.viz import view_trs_side_by_side
from lib.viz import melt_model_error_df, draw_model_error_distribution, draw_model_error_by_feature, draw_model_error_by_township

## Load the Predictions and Error Data
We first load the full dataset, then the prediction files from the `../assets/predictions/` folder, which contains:
* the 2021 predictions of the best machine learning models (based on 2020 data)
* the 2021 predictions of the deep learning LSTM model (based on 2014-2020 data)
* the models' error measures (MAE, RMSE, etc.)

Based on those data we compute the absolute error for all the models' predictions.

In [3]:
# Load the full dataset
full_df = read_and_join_output_file()

# Loading models' predictions
models_predictions_df = pd.read_csv("../assets/predictions/ml_predictions.csv")
lstm_predictions_df = pd.read_csv("../assets/predictions/lstm_predictions.csv")
lstm_predictions_df.drop(columns=["2021_GSE_GWE"], inplace=True)
models_predictions_df = models_predictions_df.merge(lstm_predictions_df, how="inner", left_on=["TOWNSHIP_RANGE"], right_on=["TOWNSHIP_RANGE"])
models_predictions_df.set_index(keys=["TOWNSHIP_RANGE"], inplace=True)

# Loading models' errors
models_error_metrics_df = pd.read_csv("../assets/predictions/ml_models_errors.csv")
lstm_model_error_metrics_df = pd.read_csv("../assets/predictions/lstm_model_errors.csv")
# The LSTM model has no R^2 score, so we add an empty column
lstm_model_error_metrics_df["R^2"] = ""
models_error_metrics_df = pd.concat([models_error_metrics_df, lstm_model_error_metrics_df], axis=0, ignore_index=True)
models_error_metrics_df.set_index(keys=["MODEL"], inplace=True)

# Computing the error
models_errors_df = models_predictions_df.copy()
model_names = list(models_errors_df.columns)
model_names.remove("2021_GSE_GWE")
models_errors_df = models_errors_df[["2021_GSE_GWE"]].merge(models_errors_df[model_names].sub(models_errors_df["2021_GSE_GWE"], axis=0).abs().add_suffix("_ERROR"), how="inner", left_index=True, right_index=True)
models_errors_df.reset_index(drop=False, inplace=True)
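
The absolute error computation in the cell above is fairly dense, so here is a minimal sketch of the same pattern on toy data. The Township-Range labels, the model names `MODEL_A` and `MODEL_B`, and all values are hypothetical and only illustrate the `sub` / `abs` / `add_suffix` chain.

```python
import pandas as pd

# Toy data: observed 2021 depth and two hypothetical model predictions (feet)
toy_predictions_df = pd.DataFrame(
    {
        "2021_GSE_GWE": [50.0, 120.0, 300.0],
        "MODEL_A": [55.0, 100.0, 310.0],
        "MODEL_B": [40.0, 125.0, 250.0],
    },
    index=pd.Index(["TR_1", "TR_2", "TR_3"], name="TOWNSHIP_RANGE"),
)

toy_model_names = [c for c in toy_predictions_df.columns if c != "2021_GSE_GWE"]

# Subtract the observed value from every prediction column (row by row),
# take the absolute value and tag the resulting columns with "_ERROR"
toy_abs_errors = (
    toy_predictions_df[toy_model_names]
    .sub(toy_predictions_df["2021_GSE_GWE"], axis=0)
    .abs()
    .add_suffix("_ERROR")
)

# Keep the observed value next to the errors, as in the cell above
toy_errors_df = toy_predictions_df[["2021_GSE_GWE"]].join(toy_abs_errors).reset_index()
print(toy_errors_df)
```

The result has one row per Township-Range and one `<MODEL>_ERROR` column per model, which is the layout the rest of the analysis relies on.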

## Results Overview
### Comparing the Models' Error Metrics
The models' error metrics are computed by comparing their 2021 predictions of the `GSE_GWE` (groundwater depth) feature with the observed values.

In [4]:
models_error_metrics_df

MODEL,MAE,MSE,RMSE,R^2
SVR,28.9124,2040.4239,45.1711,0.8783
GradientBoostingRegressor,32.6355,2436.3433,49.3593,0.8547
RandomForestRegressor,31.4379,2612.4553,51.1122,0.8442
XGBRegressor,32.0972,2892.8849,53.7855,0.8275
CatBoostRegressor,32.5387,2920.2615,54.0394,0.8258
KNeighborsRegressor,54.0581,5808.7215,76.215,0.6536
LSTM with 72 Township-Ranges as test data,23.793732,1208.83,34.76823,
LSTM with 2021 data as test data,29.878609,1915.5297,43.766766,


The error metrics above vary considerably from one model to another:
* The mean absolute error (MAE) ranges from about 23.8 to 54.1 feet of groundwater depth
* The root mean square error (RMSE) ranges from about 34.8 to 76.2 feet of groundwater depth
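
For reference, the metrics in the table can be sketched as follows. This uses the usual textbook definitions (with R^2 as the coefficient of determination); the notebook that actually produced the metric files is not shown here, so treat the function name and the toy values as assumptions.

```python
import numpy as np

def regression_error_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    mse = np.mean(residuals ** 2)
    rmse = np.sqrt(mse)  # RMSE is the square root of the MSE
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_total = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - np.sum(residuals ** 2) / ss_total
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R^2": r2}

# Hypothetical observed and predicted depths (feet)
toy_metrics = regression_error_metrics([50.0, 120.0, 300.0], [55.0, 100.0, 310.0])
print(toy_metrics)
```

Because RMSE squares the residuals before averaging, it penalizes a few large errors more heavily than MAE does, which is why the two metrics can rank models differently.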

The best results are obtained with an LSTM model, but the results vary considerably depending on how the model is trained and tested.
1. For the `LSTM with 72 Township-Ranges as test data` model, the training, test and target data were split by both group and time. The model was trained on 406 Township-Ranges with _7 years_ of data between 2014 and 2020 (2021 being the target learned) and tested on 72 Township-Ranges.
2. For the `LSTM with 2021 data as test data` model, the training, test and target data were split purely based on time. The model was trained on _6 years_ of data between 2014 and 2019, with 2020 being the target learned, and tested with the 2015-2020 data as the input to predict the 2021 data.

The first training and testing method gives the best results, although the second method still yields one of the best-performing models.
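
The two splitting strategies can be sketched on a toy array. This is only an illustration under simplifying assumptions: a single feature (the depth itself), 10 hypothetical Township-Ranges instead of 478, and random values. The actual LSTM inputs and preprocessing are not shown in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(2014, 2022)   # 2014..2021 inclusive
n_township_ranges = 10          # toy size (hypothetical)
# depth[i, j] = depth of Township-Range i in year years[j]
depth = rng.uniform(10, 300, size=(n_township_ranges, len(years)))

def year_cols(first, last):
    """Column indices for the years first..last (inclusive)."""
    return np.where((years >= first) & (years <= last))[0]

def year_col(year):
    """Column index of a single year."""
    return int(year - years[0])

# Strategy 1: split by group and time.
# Some Township-Ranges are held out; inputs are 2014-2020, target is 2021.
test_ids = rng.choice(n_township_ranges, size=2, replace=False)
train_ids = np.setdiff1d(np.arange(n_township_ranges), test_ids)
X1_train = depth[np.ix_(train_ids, year_cols(2014, 2020))]
y1_train = depth[train_ids, year_col(2021)]
X1_test = depth[np.ix_(test_ids, year_cols(2014, 2020))]
y1_test = depth[test_ids, year_col(2021)]

# Strategy 2: split purely by time.
# All Township-Ranges are used; the input window slides forward by one year.
X2_train = depth[:, year_cols(2014, 2019)]
y2_train = depth[:, year_col(2020)]
X2_test = depth[:, year_cols(2015, 2020)]
y2_test = depth[:, year_col(2021)]

print(X1_train.shape, X1_test.shape, X2_train.shape, X2_test.shape)
```

In the first strategy the model sees 2021 targets during training but never sees the held-out Township-Ranges; in the second it sees every Township-Range but never a 2021 value, which is the setting comparable to the other models.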

In order to compare the LSTM model with the other machine learning models, the second model, `LSTM with 2021 data as test data`, is used and is referred to below as "LSTM".

For the rest of this analysis we will focus on four of the best-performing models:
* Support Vector Regressor (SVR)
* Random Forest Regression
* CatBoost Gradient Boosting Regression
* Long Short-Term Memory (LSTM) neural network

### Comparing the 2021 Predictions

In [5]:
models_predictions_df.head(15)

TOWNSHIP_RANGE,2021_GSE_GWE,XGBRegressor,SVR,KNeighborsRegressor,GradientBoostingRegressor,CatBoostRegressor,RandomForestRegressor,LSTM
T01N R02E,53.193636,57.088593,54.559845,67.719193,55.314644,58.568106,95.508892,175.1715
T01N R03E,32.676189,32.906597,38.231296,49.818264,46.235237,30.953399,29.534033,7.788065
T01N R04E,16.672857,20.337458,28.362341,60.111441,23.919901,21.090174,26.918781,12.7507
T01N R05E,19.476364,27.651838,29.102047,60.111441,31.077371,23.708366,26.081469,12.80022
T01N R06E,33.198,42.595863,38.353994,76.510942,43.19615,37.940042,85.370448,43.638023
T01N R07E,45.614286,54.8451,48.542082,101.400331,56.040468,52.171719,96.936805,67.217415
T01N R08E,128.276923,116.16146,98.536638,126.502788,124.621597,119.768021,118.235722,122.04813
T01N R09E,137.337692,138.8854,123.027194,133.59432,141.119616,142.654175,133.989761,159.60074
T01N R10E,179.52,192.51915,167.04846,139.09219,222.631861,192.215773,179.637085,139.72223
T01N R11E,107.955,119.69755,94.875475,88.290657,109.166181,115.206957,114.726992,108.109924


### Comparing the Models' Errors

In [6]:
models_errors_df

,TOWNSHIP_RANGE,2021_GSE_GWE,XGBRegressor_ERROR,SVR_ERROR,KNeighborsRegressor_ERROR,GradientBoostingRegressor_ERROR,CatBoostRegressor_ERROR,RandomForestRegressor_ERROR,LSTM_ERROR
0,T01N R02E,53.193636,3.894957,1.366209,14.525556,2.121008,5.374470,42.315256,121.977864
1,T01N R03E,32.676189,0.230408,5.555107,17.142075,13.559048,1.722790,3.142156,24.888124
2,T01N R04E,16.672857,3.664601,11.689483,43.438583,7.247044,4.417317,10.245924,3.922157
3,T01N R05E,19.476364,8.175474,9.625683,40.635077,11.601008,4.232003,6.605106,6.676144
4,T01N R06E,33.198000,9.397863,5.155994,43.312942,9.998150,4.742042,52.172448,10.440023
...,...,...,...,...,...,...,...,...,...
473,T32S R26E,220.866667,23.171867,27.055718,14.193859,19.111954,1.589253,20.351219,34.316297
474,T32S R27E,151.778571,21.336971,49.752804,8.000134,24.642245,38.673318,25.597731,50.817821
475,T32S R28E,174.023077,24.782663,20.824099,18.048013,26.714577,30.508537,22.599171,3.707833
476,T32S R29E,326.627273,10.003767,1.577982,100.158535,2.813672,22.812833,10.554386,19.842027


## Prediction Error Patterns Analysis
### Distribution of Prediction Absolute Errors

In [7]:
draw_model_error_distribution(models_errors_df)

Observations:
* Most models have many *small* prediction errors, between 0 and 40 feet of groundwater depth.
* The K-Neighbors regressor model shows a flatter distribution of prediction errors. The model has fewer low-error predictions and many more larger ones.
* The K-Neighbors regressor and XGBoost regressor models are the two models with the highest prediction errors, above 340 feet of groundwater depth.
* The Random Forest Regressor, Gradient Boosting Regressor and LSTM models are the models with the lowest maximum prediction error at around 140 feet.

All models make some high prediction errors, but the fewer such errors the better. For that reason it seems that the best models, when evaluated on the 2021 groundwater depth values, are:
* Random Forest Regressor
* Gradient Boosting Regressor
* LSTM
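
The observations above are read from the plot. A minimal sketch of how the same two quantities (share of small errors and maximum error) could be tabulated per model is given below; the toy error values, the model names and the 40-foot threshold as a cut-off are assumptions for illustration.

```python
import pandas as pd

# Hypothetical absolute errors (feet) for two models
toy_errors_df = pd.DataFrame(
    {
        "MODEL_A_ERROR": [5.0, 12.0, 38.0, 150.0],
        "MODEL_B_ERROR": [45.0, 60.0, 20.0, 90.0],
    }
)

# One row per model: fraction of predictions under 40 ft and the worst error
error_summary_df = pd.DataFrame(
    {
        "SHARE_BELOW_40_FT": (toy_errors_df < 40).mean(),
        "MAX_ERROR": toy_errors_df.max(),
    }
)
print(error_summary_df)
```

In this toy case `MODEL_A` has more small errors but a larger worst-case error than `MODEL_B`, the kind of trade-off the distribution plot makes visible.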

### Prediction Absolute Error by Groundwater Depth
The first visualization shows the absolute error of each individual prediction, based on the groundwater depth. The second visualization shows the average prediction error per 10 feet of groundwater depth. For this visualization, we binned the predictions by 10 feet of groundwater depth and computed the mean of the absolute errors of the predictions in each bin.
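
The binning step can be sketched as follows. How `draw_model_error_by_feature` bins internally when `binned=True` is not shown here, so this assumes left-closed 10-foot bins obtained by flooring the depth; the data and the `DEPTH_BIN` column name are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical depths and absolute errors (feet)
toy_df = pd.DataFrame(
    {
        "2021_GSE_GWE": [3.0, 8.0, 12.0, 17.0, 25.0],
        "MODEL_A_ERROR": [2.0, 4.0, 10.0, 20.0, 7.0],
    }
)

bin_width = 10
# Lower edge of the 10 ft bin each depth falls into: [0, 10), [10, 20), ...
toy_df["DEPTH_BIN"] = (
    np.floor(toy_df["2021_GSE_GWE"] / bin_width) * bin_width
).astype(int)

# Mean absolute error of the predictions in each bin
binned_mean_error = toy_df.groupby("DEPTH_BIN")["MODEL_A_ERROR"].mean()
print(binned_mean_error)
```

Bins that contain a single Township-Range inherit that one error unchanged, so isolated spikes in the binned plot may reflect one outlier rather than a trend.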

In [8]:
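# The error columns carry the "_ERROR" suffix added when the errors were computed
error_by_gsegwe_df = models_errors_df[["TOWNSHIP_RANGE", "2021_GSE_GWE", "CatBoostRegressor_ERROR", "SVR_ERROR", "RandomForestRegressor_ERROR", "LSTM_ERROR"]].copy()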
draw_model_error_by_feature(error_by_gsegwe_df, x="2021_GSE_GWE", x_title="Ground Water Depth", title="Predictions' Absolute Error by Groundwater Depth")

In [9]:
draw_model_error_by_feature(error_by_gsegwe_df, x="2021_GSE_GWE", x_title="Ground Water Depth",
                            title="Mean of the Prediction Absolute Error per 10 ft. of Groundwater Depth", binned=True)

In this visualization, there are some distinct spikes of large prediction errors in all models, curiously at the same values of the shifted target (groundwater depths of ~30 and ~180 feet). We can also see that some models (the CatBoost and Random Forest regressors) tend to have a higher prediction error at higher values of groundwater depth, while the LSTM and SVR models seem to be the most stable in terms of prediction error across groundwater depths.

In [10]:
melt_model_error_df(error_by_gsegwe_df[(error_by_gsegwe_df["2021_GSE_GWE"] > 20) & (error_by_gsegwe_df["2021_GSE_GWE"] < 30)]).sort_values("ABS_ERROR", ascending=False)[:4]

,TOWNSHIP_RANGE,2021_GSE_GWE,MODEL,ABS_ERROR
52,T10S R21E,27.3,RandomForestRegressor,247.537822
10,T10S R21E,27.3,CatBoostRegressor,231.12338
73,T10S R21E,27.3,SVR,224.995904
31,T10S R21E,27.3,LSTM,186.31331


In [11]:
melt_model_error_df(error_by_gsegwe_df[(error_by_gsegwe_df["2021_GSE_GWE"] > 170) & (error_by_gsegwe_df["2021_GSE_GWE"] < 190)]).sort_values("ABS_ERROR", ascending=False)[:4]

,TOWNSHIP_RANGE,2021_GSE_GWE,MODEL,ABS_ERROR
134,T15S R10E,182.335,SVR,282.464523
96,T15S R10E,182.335,RandomForestRegressor,244.613244
58,T15S R10E,182.335,LSTM,241.1797
20,T15S R10E,182.335,CatBoostRegressor,232.581136
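
The two lookups above filter a depth band, reshape the errors to one row per (Township-Range, model) pair and rank them. The sketch below reproduces that pattern on toy data. `melt_errors` is a hypothetical stand-in for `lib.viz.melt_model_error_df`, whose implementation is not shown; it assumes the helper melts the `_ERROR` columns into `MODEL` / `ABS_ERROR` columns and strips the suffix, as the outputs above suggest.

```python
import pandas as pd

def melt_errors(errors_df):
    """Hypothetical stand-in for melt_model_error_df: wide -> long format."""
    error_columns = [c for c in errors_df.columns if c.endswith("_ERROR")]
    long_df = errors_df.melt(
        id_vars=["TOWNSHIP_RANGE", "2021_GSE_GWE"],
        value_vars=error_columns,
        var_name="MODEL",
        value_name="ABS_ERROR",
    )
    # Turn "MODEL_A_ERROR" into "MODEL_A"
    long_df["MODEL"] = long_df["MODEL"].str.replace("_ERROR$", "", regex=True)
    return long_df

# Hypothetical depths and absolute errors (feet)
toy_errors_df = pd.DataFrame(
    {
        "TOWNSHIP_RANGE": ["TR_1", "TR_2", "TR_3"],
        "2021_GSE_GWE": [25.0, 27.0, 180.0],
        "MODEL_A_ERROR": [5.0, 200.0, 30.0],
        "MODEL_B_ERROR": [8.0, 150.0, 10.0],
    }
)

# Keep the 20-30 ft depth band, then list the two largest errors in it
in_band = (toy_errors_df["2021_GSE_GWE"] > 20) & (toy_errors_df["2021_GSE_GWE"] < 30)
worst_in_band = (
    melt_errors(toy_errors_df[in_band])
    .sort_values("ABS_ERROR", ascending=False)
    .head(2)
)
print(worst_in_band)
```

When the largest errors in a band all belong to the same Township-Range, as in the two tables above, the spike is more likely tied to that location's data than to a weakness of one particular model.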


In [12]:
melt_model_error_df(error_by_gsegwe_df[(error_by_gsegwe_df["2021_GSE_GWE"] > 450) & (error_by_gsegwe_df["2021_GSE_GWE"] < 650)]).sort_values("ABS_ERROR", ascending=False).groupby("MODEL").head(3)

,TOWNSHIP_RANGE,2021_GSE_GWE,MODEL,ABS_ERROR
13,T27S R27E,626.46,CatBoostRegressor,319.343929
11,T25S R28E,625.55,CatBoostRegressor,213.291942
4,T15S R11E,620.98,CatBoostRegressor,200.917252
41,T27S R27E,626.46,RandomForestRegressor,198.023933
32,T15S R11E,620.98,RandomForestRegressor,197.04689
35,T22S R16E,483.03275,RandomForestRegressor,184.079327
54,T26S R27E,622.55625,SVR,182.437368
21,T22S R16E,483.03275,LSTM,164.67283
50,T22S R17E,473.4362,SVR,160.339986
49,T22S R16E,483.03275,SVR,148.325142


In [13]:
draw_model_error_by_township(error_by_gsegwe_df)

The Township-Ranges where the highest absolute errors were made are shown above:
* T15S R10E
* T10S R21E
* T20S R18E
* T22S R16E
* T22S R17E
* T27S R27E
* T22S R28E
* T2&S R28E

## Geographic Distribution of Errors
In order to see if the models are making higher prediction errors in some specific areas of the San Joaquin Valley, we use a small multiples visualization, displaying for each model the Township-Range map of the San Joaquin Valley, with each Township-Range colored according to the size of its prediction error.

In [14]:
township_range = TownshipRanges()
error_township_geo_df = township_range.map_df.merge(
    melt_model_error_df(models_errors_df).sort_values(["MODEL", "ABS_ERROR"], ascending=False),
    how="inner",
    left_on='TOWNSHIP_RANGE',
    right_on='TOWNSHIP_RANGE')
view_trs_side_by_side(error_township_geo_df, feature='MODEL', value='ABS_ERROR', title="Geographical Distribution of Models' Prediction Errors")

Although the size of the errors differs between models, most of the high errors are concentrated on the west and south-east hillsides of the San Joaquin Valley. These areas are the ones with deeper groundwater levels, as can be seen in the `eda/groundwater.ipynb` notebook. This matches the finding from the `Mean of the Prediction Absolute Error per 10 ft. of Groundwater Depth` visualization above: for some models, the deeper the groundwater, the higher the prediction error.

## Prediction Error by Groundwater Variation
Evaluating feature importance with SHAP (refer to the `ml/explainabilit.ipynb` notebook) indicates that the previous depth is the strongest predictor of the future depth. Here we check whether the change in groundwater depth between 2020 and 2021 is correlated with the 2021 prediction errors.

To do so, we first extract the 2020 and 2021 data from the dataset, compute the variation, and then merge it with the prediction errors.

In [15]:
# Extract the 2020 and 2021 data
full_df = full_df[full_df.index.get_level_values(1).isin(['2020', '2021'])]['GSE_GWE']
full_df = full_df.unstack(level=-1)
# Compute the 2020-2021 depth variation
full_df['2020_2021_DEPTH_VARIATION'] = np.abs(full_df['2020'] - full_df['2021'])
full_df.reset_index(inplace=True)
# Merge 2020-2021 groundwater depth variation with model errors
error_by_variation_df = models_errors_df.drop(columns=["2021_GSE_GWE"]).merge(full_df[["TOWNSHIP_RANGE", "2020_2021_DEPTH_VARIATION"]], how="inner", left_on="TOWNSHIP_RANGE", right_on="TOWNSHIP_RANGE")
# Display the distribution of errors
draw_model_error_by_feature(error_by_variation_df, x="2020_2021_DEPTH_VARIATION", x_title="Ground Water Depth Variation Between 2020 and 2021",
                            title="Predictions' Absolute Error by Groundwater Depth Variation Between 2020 and 2021", binned=True)

We can clearly see that, for all models, the bigger the 2020-2021 groundwater variation is, the higher the errors are in the models' 2021 predictions. The K-Nearest Neighbors and LSTM models seem to be less affected by this type of error: their error reaches about 150 feet for high 2020-2021 variation, while the other models roughly range between 200 and 250 feet of prediction error.
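
The relationship is read from the plot above. One way to put a number on it is a rank correlation between the depth variation and each model's absolute error. The sketch below computes a Spearman correlation as a Pearson correlation on ranks, on hypothetical toy values; it is not a result from the actual data.

```python
import pandas as pd

# Hypothetical depth variations and absolute errors (feet)
toy_df = pd.DataFrame(
    {
        "2020_2021_DEPTH_VARIATION": [1.0, 5.0, 20.0, 60.0, 120.0],
        "MODEL_A_ERROR": [4.0, 3.0, 25.0, 70.0, 110.0],
        "MODEL_B_ERROR": [10.0, 12.0, 9.0, 11.0, 10.0],
    }
)
error_columns = [c for c in toy_df.columns if c.endswith("_ERROR")]

# Spearman correlation = Pearson correlation computed on the ranks
ranked_df = toy_df.rank()
rank_correlations = ranked_df[error_columns].corrwith(
    ranked_df["2020_2021_DEPTH_VARIATION"]
)
print(rank_correlations)
```

A value close to 1 (as for the toy `MODEL_A`) means the error grows almost monotonically with the variation, while a value near 0 (as for the toy `MODEL_B`) means the error is largely insensitive to it.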