#### The error analysis below uses the mean absolute error, expressed in feet, the unit in which groundwater depth is measured.
- Failure analysis (10 points)
- Select at least 3 *specific* examples where prediction failed, and analyse possible reasons why. What future improvements might fix the failures? Ideally you should be able to identify different kinds of failure.

The test set is YEAR = 2020.
The last year in the training set is 2019, and it contains CURRENT_DEPTH.
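
To make the train/test split concrete, here is a minimal sketch of how a one-year-ahead target can be built. It assumes a long table with one row per township and year; the toy values and the helper name `add_shifted_target` are hypothetical and are not taken from `lib.supervised_tuning`. The toy `YEAR` column uses integers, whereas the notebook stores years as strings.

```python
import pandas as pd

def add_shifted_target(df: pd.DataFrame) -> pd.DataFrame:
    """Attach next year's depth to each (township, year) row as the target."""
    out = df.sort_values(["TOWNSHIP_RANGE", "YEAR"]).copy()
    # shift(-1) within each township pulls the following year's depth onto the current row
    out["GSE_GWE_SHIFTED"] = out.groupby("TOWNSHIP_RANGE")["GSE_GWE"].shift(-1)
    return out

# Hypothetical toy data: two townships, three years each
toy = pd.DataFrame({
    "TOWNSHIP_RANGE": ["A", "A", "A", "B", "B", "B"],
    "YEAR": [2019, 2020, 2021, 2019, 2020, 2021],
    "GSE_GWE": [10.0, 12.0, 11.0, 100.0, 105.0, 120.0],
})

shifted = add_shifted_target(toy)
train = shifted[shifted["YEAR"] <= 2019]   # features up to 2019, targets up to 2020
test = shifted[shifted["YEAR"] == 2020]    # 2020 features, 2021 depth as target
print(test[["TOWNSHIP_RANGE", "YEAR", "GSE_GWE", "GSE_GWE_SHIFTED"]])
```

With this layout, the 2021 rows have no target and are used only to supply the test labels.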


In [1]:
import sys
sys.path.append('..')



import numpy as np
import pandas as pd
import altair as alt

from lib.viz import  view_trs_side_by_side
from lib.township_range import TownshipRanges
from lib.supervised_tuning import get_model_errors, read_target_shifted_data
from lib.read_data import read_and_join_output_file
from lib.viz import sjv_color_range_17, sjv_color_range_9, chart_model_error_distribution, chart_model_error_by_township, chart_model_error_by_depth, chart_model_depth_diff_error





#### Get the actual data that has not been normalized

In [2]:
full_df = read_and_join_output_file()
full_df = full_df[full_df.index.get_level_values(1).isin(['2020', '2021'])]['GSE_GWE']
full_df = full_df.unstack(level=-1)
full_df['depth_diff'] = np.abs(full_df['2020'] - full_df['2021'])
full_df.reset_index(inplace=True)

Insights into the model

- Feature importance and feature ablation through SHAP are covered in the section called Explainability through SHAP.

- Failure/Error analysis is conducted below

In [3]:
test_model_errors_df, error_df = get_model_errors()

In [4]:
# For the top six models examined in supervised learning, look at the error metrics on the test data
test_model_errors_df.head(2)

Unnamed: 0,Mean Absolute Error,Mean Squared Error,R^2,RMSE
SVR,31.6927,2335.4166,0.8607,48.3261
GradientBoostingRegressor,31.8823,2399.4872,0.8569,48.9846
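
The four columns above are related: RMSE is the square root of the mean squared error (for SVR, the square root of 2335.4166 is approximately 48.3261), and both MAE and RMSE are in feet while MSE is in square feet. The sketch below computes the same four metrics with NumPy on hypothetical values; the function name `regression_metrics` is an illustration and is not how `get_model_errors` is necessarily implemented.

```python
import numpy as np

def regression_metrics(y_true, y_pred) -> dict:
    """Compute MAE, MSE, R^2 and RMSE for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    # R^2 compares the residual sum of squares with the variance around the mean
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "Mean Absolute Error": np.mean(np.abs(residuals)),
        "Mean Squared Error": mse,
        "R^2": 1.0 - ss_res / ss_tot,
        "RMSE": np.sqrt(mse),
    }

# Hypothetical depths in feet, not taken from the test set
metrics = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
print(metrics)
```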


In [5]:
# For the same six models, look at the absolute error for each township range on the test data
error_df.head(2)

Unnamed: 0,TOWNSHIP_RANGE,GSE_GWE_SHIFTED,model_name,absolute_error
0,T01N R02E,53.193636,XGBRegressor_absolute_error,1.944754
1,T01N R03E,32.676189,XGBRegressor_absolute_error,2.261349


### Analyzing the pattern of errors made in the regressors RandomForest and SVR

In [6]:
chart_model_error_distribution(error_df)

### Mean absolute error by target (Groundwater depth value)

In [7]:
model_name_list = [
    "CatBoostRegressor_absolute_error",
    "SVR_absolute_error",
    "RandomForestRegressor_absolute_error",
]
chart_model_error_by_depth(error_df, model_name_list)

> In the above chart, every model shows distinct spikes in error, curiously at the same depths of the shifted target.
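
The aggregation behind an error-by-depth view can be sketched as follows. This is an assumption about the general approach (bin the shifted target, then average the absolute error per model and bin), not the actual implementation of `chart_model_error_by_depth`. The helper name `mean_error_by_depth_bin` and the 50 ft bin width are hypothetical, and depths are assumed to be non-negative; the sample rows are copied from the outputs shown in this notebook.

```python
import numpy as np
import pandas as pd

def mean_error_by_depth_bin(errors: pd.DataFrame, bin_width: float = 50.0) -> pd.DataFrame:
    """Average absolute error per model within fixed-width bins of the shifted target."""
    max_depth = errors["GSE_GWE_SHIFTED"].max()
    edges = np.arange(0.0, max_depth + bin_width, bin_width)
    binned = errors.assign(
        depth_bin=pd.cut(errors["GSE_GWE_SHIFTED"], bins=edges, include_lowest=True)
    )
    return (
        binned.groupby(["model_name", "depth_bin"], observed=True)["absolute_error"]
        .mean()
        .reset_index(name="mean_absolute_error")
    )

# A handful of rows taken from the notebook outputs
sample_errors = pd.DataFrame({
    "TOWNSHIP_RANGE": ["T01N R02E", "T01N R03E", "T15S R10E", "T10S R21E", "T22S R17E"],
    "GSE_GWE_SHIFTED": [53.193636, 32.676189, 182.335, 27.3, 473.4362],
    "model_name": ["XGBRegressor_absolute_error"] * 2 + ["SVR_absolute_error"] * 3,
    "absolute_error": [1.944754, 2.261349, 281.97199, 231.933128, 171.256123],
})
by_bin = mean_error_by_depth_bin(sample_errors)
print(by_bin)
```

A spike that appears in the same bin for every model points to the data in that bin rather than to one model's behaviour.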

In [8]:
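def investigate_spikes(df: pd.DataFrame = error_df, model_name_list: list = model_name_list, min_target: int = 150, max_target: int = 200):
    """For each model, return the row with the largest absolute error whose shifted target lies strictly between min_target and max_target."""
    worst_rows = []
    for model in model_name_list:
        # Filter the DataFrame passed in (not the global error_df) to this model and depth window
        in_window = df[(df['model_name'] == model) & (df['GSE_GWE_SHIFTED'] > min_target) & (df['GSE_GWE_SHIFTED'] < max_target)]
        worst_rows.append(in_window.sort_values(by='absolute_error', ascending=False).head(1))
    return pd.concat(worst_rows, axis=0) if worst_rows else pd.DataFrame()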

In [9]:
investigate_spikes(error_df, model_name_list, min_target=150, max_target= 200)

Unnamed: 0,TOWNSHIP_RANGE,GSE_GWE_SHIFTED,model_name,absolute_error
2160,T15S R10E,182.335,CatBoostRegressor_absolute_error,249.076408
726,T15S R10E,182.335,SVR_absolute_error,281.97199
2638,T15S R10E,182.335,RandomForestRegressor_absolute_error,248.517246


In [10]:
investigate_spikes(error_df, model_name_list, min_target=0, max_target= 50)

Unnamed: 0,TOWNSHIP_RANGE,GSE_GWE_SHIFTED,model_name,absolute_error
2082,T10S R21E,27.3,CatBoostRegressor_absolute_error,221.00926
648,T10S R21E,27.3,SVR_absolute_error,231.933128
2560,T10S R21E,27.3,RandomForestRegressor_absolute_error,246.155369


In [11]:
investigate_spikes(error_df, model_name_list, min_target=450, max_target= 500)

Unnamed: 0,TOWNSHIP_RANGE,GSE_GWE_SHIFTED,model_name,absolute_error
2261,T22S R16E,483.03275,CatBoostRegressor_absolute_error,182.900006
828,T22S R17E,473.4362,SVR_absolute_error,171.256123
2739,T22S R16E,483.03275,RandomForestRegressor_absolute_error,171.289431


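> Townships T15S R10E and T10S R21E produce the largest error for all three models in their depth ranges. In the 450-500 ft range, T22S R16E is the worst case for CatBoost and Random Forest, while the worst case for SVR is the neighbouring T22S R17E.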

In [18]:
spike_df = read_and_join_output_file()
spike_df.reset_index(inplace=True)
spike_townships = spike_df['TOWNSHIP_RANGE'].isin(['T15S R10E', 'T10S R21E'])
history_years = spike_df['YEAR'].isin(['2014', '2015', '2016', '2017', '2018', '2019'])
# Mean depth over 2014-2019 per township (select GSE_GWE so the string YEAR column is not averaged)
print(spike_df[spike_townships & history_years].groupby('TOWNSHIP_RANGE')[['GSE_GWE']].mean())
# Depth recorded in the 2020 test year for the same townships
print(spike_df[spike_townships & (spike_df['YEAR'] == '2020')][['TOWNSHIP_RANGE', 'YEAR', 'GSE_GWE']])

                   GSE_GWE
TOWNSHIP_RANGE            
T10S R21E        15.183333
T15S R10E       248.850000
     TOWNSHIP_RANGE  YEAR  GSE_GWE
1366      T10S R21E  2020   267.65
1990      T15S R10E  2020   483.50
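
Using only the numbers printed above, the sketch below quantifies how far the 2020 reading sits from both the 2014-2019 mean and the 2021 target (`GSE_GWE_SHIFTED`). The column names `jump_vs_history` and `gap_to_target` are hypothetical labels introduced for illustration. One plausible reading, offered as a hypothesis rather than a conclusion, is that the 2020 measurements for these two townships are atypical, so a model that relies heavily on the current depth lands far from the target.

```python
import pandas as pd

# Values copied from the notebook outputs for the two spike townships
spikes = pd.DataFrame({
    "TOWNSHIP_RANGE": ["T10S R21E", "T15S R10E"],
    "mean_2014_2019": [15.183333, 248.85],
    "depth_2020": [267.65, 483.50],
    "target_2021": [27.3, 182.335],
})

# How far the 2020 reading is from the township's recent history
spikes["jump_vs_history"] = (spikes["depth_2020"] - spikes["mean_2014_2019"]).abs()
# How far the 2020 reading is from the value the models must predict
spikes["gap_to_target"] = (spikes["depth_2020"] - spikes["target_2021"]).abs()
print(spikes.round(2))
```

Both gaps exceed 230 ft, which is the same order of magnitude as the absolute errors of roughly 220 to 280 ft reported for these townships.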


### Identify townships with the most error

In [13]:
errors_by_township_df, chart = chart_model_error_by_township(error_df, model_name_list, 20)
chart

The townships with the highest absolute errors are shown above:
- T15S R10E
- T10S R21E
- T27S R27E
- T20S R18E
- T22S R17E
- T22S R28E
- T22S R16E
- T27S R26E
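
A ranking like the one above can be reproduced with a plain groupby. The sketch assumes the ranking is based on the mean absolute error across the selected models, which may differ from what `chart_model_error_by_township` does internally; the helper name `top_error_townships` is hypothetical. The sample rows are the worst-case rows printed earlier for the two highest-ranked townships.

```python
import pandas as pd

def top_error_townships(errors: pd.DataFrame, model_names: list, n: int) -> pd.DataFrame:
    """Rank townships by mean absolute error across the given models."""
    subset = errors[errors["model_name"].isin(model_names)]
    return (
        subset.groupby("TOWNSHIP_RANGE")["absolute_error"]
        .mean()
        .sort_values(ascending=False)
        .head(n)
        .reset_index()
    )

models = [
    "CatBoostRegressor_absolute_error",
    "SVR_absolute_error",
    "RandomForestRegressor_absolute_error",
]
# Rows copied from the investigate_spikes outputs
sample_errors = pd.DataFrame({
    "TOWNSHIP_RANGE": ["T10S R21E"] * 3 + ["T15S R10E"] * 3,
    "model_name": models * 2,
    "absolute_error": [221.00926, 231.933128, 246.155369,
                       249.076408, 281.97199, 248.517246],
})
ranking = top_error_townships(sample_errors, models, n=2)
print(ranking)
```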


### Plotting the error by township range

In [14]:
township_range = TownshipRanges()
township_df = township_range.map_df
model_name_list = list( error_df['model_name'].unique())
errors_by_township_df, _ = chart_model_error_by_township(error_df, model_name_list, error_df.shape[0])
error_township_geo_df = township_df.merge(errors_by_township_df, how="inner", left_on='TOWNSHIP_RANGE', right_on='TOWNSHIP_RANGE')
view_trs_side_by_side(error_township_geo_df, feature= 'model_name', value = 'absolute_error', title = "Error by township range")

Errors are concentrated in areas that have traditionally had a large groundwater depth, as a comparison of the maps above with the groundwater depth maps in the groundwater.ipynb file confirms.

#### Since the model feature importances and SHAP indicated that the previous depth is the strongest predictor of the future depth, check the previous year's depth for the above townships

In [15]:
# The target is shifted by one year: for the 2020 test rows, the target (GSE_GWE_SHIFTED) is the depth recorded in 2021.
# The models therefore use 2020 data to predict the 2021 depth.
# Look at the 2020 (current) depth in these townships.


In [16]:
full_df

YEAR,TOWNSHIP_RANGE,2020,2021,depth_diff
0,T01N R02E,52.196000,53.193636,0.997636
1,T01N R03E,24.418788,32.676189,8.257401
2,T01N R04E,18.961667,16.672857,2.288810
3,T01N R05E,20.336154,19.476364,0.859790
4,T01N R06E,32.380000,33.198000,0.818000
...,...,...,...,...
473,T32S R26E,197.730769,220.866667,23.135897
474,T32S R27E,119.037500,151.778571,32.741071
475,T32S R28E,191.171429,174.023077,17.148352
476,T32S R29E,344.578571,326.627273,17.951299


Plot the difference between current-year depth and target depth against mean absolute error

In [17]:
chart_model_depth_diff_error(error_df, full_df)

The KNN regressor starts with a larger error than the Random Forest or Gradient Boosting regressors, but its error does not increase with the difference between the current depth and the target value. On the contrary, it decreases, while the mean absolute error for Gradient Boosting, Random Forest and Support Vector Machines steadily increases.
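
One way to read `depth_diff` is as the absolute error of a "persistence" baseline that predicts no change from 2020 to 2021. The sketch below joins per-township model errors with `depth_diff` and flags where a model does better than that baseline. The baseline comparison and the column name `beats_persistence` are additions for illustration and are not part of the original analysis; the rows are copied from the outputs shown earlier.

```python
import pandas as pd

# Rows copied from the full_df output
depths = pd.DataFrame({
    "TOWNSHIP_RANGE": ["T01N R02E", "T01N R03E", "T32S R27E"],
    "2020": [52.196000, 24.418788, 119.037500],
    "2021": [53.193636, 32.676189, 151.778571],
})
depths["depth_diff"] = (depths["2020"] - depths["2021"]).abs()

# Rows copied from the error_df output
model_errors = pd.DataFrame({
    "TOWNSHIP_RANGE": ["T01N R02E", "T01N R03E"],
    "model_name": ["XGBRegressor_absolute_error"] * 2,
    "absolute_error": [1.944754, 2.261349],
})

compared = model_errors.merge(
    depths[["TOWNSHIP_RANGE", "depth_diff"]], on="TOWNSHIP_RANGE", how="inner"
)
# True where the model is closer to the 2021 depth than "same as 2020" would be
compared["beats_persistence"] = compared["absolute_error"] < compared["depth_diff"]
print(compared)
```

A model whose error grows in step with `depth_diff` behaves much like this baseline, which is consistent with the previous depth being its dominant feature.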

A deeper dive into the township ranges showing spike in 
