# Results Analysis

The purpose of this notebook is to analyze the results from the RNN training experiments.

Unless otherwise stated, we will refer to the trained RNN as "the RNN". All validation numbers are RMSE.

## Environment Setup

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

from data_funcs import from_json

In [None]:
results = from_json("data/output.json")

## Control Case

Case 11 with param set 0 was the reproducibility case.

In [None]:
pd.DataFrame(results['0']['cases']['case11'])

The RNN outperforms the KF in this case. Note that the prediction RMSE is lower than the training RMSE for the RNN, so there are no signs of overfitting. The KF, by contrast, has very low training error but a prediction error over 3x larger.
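
One way to make the overfitting comparison concrete is the ratio of prediction RMSE to training RMSE. The sketch below is illustrative only: the helper name `generalization_ratio` and the numbers passed to it are hypothetical and are not taken from `data/output.json`.

```python
def generalization_ratio(train_rmse: float, predict_rmse: float) -> float:
    """Return prediction RMSE divided by training RMSE.

    Values near or below 1 suggest the model predicts about as well as it
    fits; values well above 1 are a common symptom of overfitting.
    """
    if train_rmse <= 0:
        raise ValueError("train_rmse must be positive")
    return predict_rmse / train_rmse


# Hypothetical numbers for illustration only (not from the results file).
example_rnn = generalization_ratio(train_rmse=2.0, predict_rmse=1.5)
example_kf = generalization_ratio(train_rmse=0.5, predict_rmse=2.0)
print(f"RNN ratio: {example_rnn:.2f}, KF ratio: {example_kf:.2f}")
```

With the real table, the two arguments would be the `train` and `predict` rows for a given model.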

## Summarise Results

### Param Set Descriptions

In [None]:
for i in range(1, len(results)):
    print('~'*50)
    print(results[str(i)]['params'])

The main differences in these param sets are:

* Activation functions: linear for set 1, tanh for set 2, and sigmoid for set 3.
* Epochs: 1,000 for set 1 versus 10,000 for sets 2 and 3.
* Scaling: 1, 0.8, and 0.8 for sets 1, 2, and 3 respectively.
* Centering: 0, 0, and 0.5 for sets 1, 2, and 3 respectively.

Each param set was run on 7 cases.

### Extract Results

We exclude param set 0, as it was only run on case 11.

In [None]:
for i in range(1, len(results)):
    print('~'*50)
    print(results[str(i)]['cases'].keys())

We summarise the RMSE for the param sets below.

Each case has 9 RMSE values (3 models by 3 time periods):

In [None]:
pd.DataFrame(results[str(1)]['cases']['case10'])

We next build a long-format dataframe with all of the results from the results dictionary. There are 3 param sets, 3 models, 3 time periods, and 7 cases, so we expect a dataframe of $3\cdot3\cdot3\cdot7=189$ rows.

In [None]:
# Collect one long-format frame per (param set, case), then concatenate once.
# This avoids concatenating onto an empty DataFrame, which can leave RMSE as
# an object column and leaves a repeated 0..8 index for every case.
frames = []
for i in range(1, len(results)):
    for case, case_results in results[str(i)]['cases'].items():
        df_temp = pd.DataFrame(case_results).rename_axis("Period").reset_index()
        df_temp['Case'] = case
        df_temp['param_set'] = i
        df_temp = pd.melt(
            df_temp,
            id_vars=['Period', 'Case', 'param_set'],
            value_vars=['Augmented KF', 'RNN initial', 'RNN trained'],
            var_name='Model', value_name='RMSE')
        frames.append(df_temp)

df = pd.concat(frames, ignore_index=True)

df
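
The expected row count can be checked in code rather than by eye. The following is a minimal sketch, assuming the long-format columns `param_set`, `Case`, `Model`, and `Period` used in this notebook. The helper `check_long_format` and the toy table are hypothetical stand-ins, and the RMSE values are placeholders.

```python
from itertools import product

import pandas as pd


def check_long_format(df, n_param_sets, n_models, n_periods, n_cases):
    """Raise AssertionError if the long-format table is incomplete or has duplicates."""
    expected_rows = n_param_sets * n_models * n_periods * n_cases
    assert len(df) == expected_rows, f"expected {expected_rows} rows, got {len(df)}"
    keys = ["param_set", "Case", "Model", "Period"]
    assert not df.duplicated(subset=keys).any(), "duplicate key combinations found"
    return expected_rows


# Toy stand-in for the real table: same key columns, placeholder RMSE values.
toy_cases = [f"toy_case{k}" for k in range(7)]  # hypothetical case names
toy_models = ["Augmented KF", "RNN initial", "RNN trained"]
toy_periods = ["all", "predict", "train"]
toy = pd.DataFrame(
    list(product([1, 2, 3], toy_cases, toy_models, toy_periods)),
    columns=["param_set", "Case", "Model", "Period"],
)
toy["RMSE"] = 0.0  # placeholder, not real results

n_rows = check_long_format(toy, n_param_sets=3, n_models=3, n_periods=3, n_cases=7)
print(n_rows)
```

Calling the same helper on `df` would confirm that no case or param set was dropped during the reshape.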

### Results by Param Set

We exclude the untrained RNN (`RNN initial`) here.

In [None]:
df2 = df[df.Model != 'RNN initial']
sns.boxplot(
    x=df2['param_set'],
    y=df2['RMSE'],
    hue=df2['Period']
).set_title('Results by Param Set')

We print the group means...

In [None]:
x=df2.groupby(['param_set', 'Period']).agg({'RMSE': 'mean'})
pd.DataFrame({
    'Period': ['all', 'predict', 'train'],
    'Set 1': list(x.RMSE[0:3]),
    'Set 2': list(x.RMSE[3:6]),
    'Set 3': list(x.RMSE[6:9])
})
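
The table above relies on the row order produced by `groupby` to slice out each param set. A possible alternative, sketched here on toy data with placeholder RMSE values, is to let `pivot_table` lay the means out by label so that no positional slicing is needed.

```python
import pandas as pd

# Toy long-format data with placeholder RMSE values (not real results).
toy = pd.DataFrame({
    "param_set": [1, 1, 1, 1, 2, 2, 2, 2],
    "Period": ["predict", "predict", "train", "train"] * 2,
    "RMSE": [2.0, 4.0, 1.0, 3.0, 1.0, 2.0, 0.5, 1.5],
})

# One row per period, one column per param set, each cell a group mean.
means = toy.pivot_table(index="Period", columns="param_set",
                        values="RMSE", aggfunc="mean")
means.columns = [f"Set {c}" for c in means.columns]
print(means)
```

Because the cells are matched by label, adding a fourth param set or a new period would not silently shift values into the wrong column.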

Param sets 2 and 3 have similar mean prediction error, and the boxplots show substantial overlap between their distributions.

### Results by Model

Here we look only at results from param set 2, so that each case is counted once rather than three times.

Again we exclude the untrained RNN from the plot as there are extreme values that distort the plot margins.

In [None]:
df2 = df[(df.Model != 'RNN initial') & (df.param_set == 2)]
sns.boxplot(
    x=df2['Model'],
    y=df2['RMSE'],
    hue=df2['Period']
).set_title('Results by Model')

In [None]:
x=df2.groupby(['Model', 'Period']).agg({'RMSE': 'mean'})
pd.DataFrame({
    'Period': ['all', 'predict', 'train'],
    'KF': list(x.RMSE[0:3]),
    'RNN Trained': list(x.RMSE[3:6])
})

The trained RNN has a lower prediction error on average than the KF. 

The augmented Kalman Filter achieves very low training error but a prediction error more than 5x higher, which is a clear sign of overfitting.

In [None]:
df1=df[(df.Model == "Augmented KF") & (df.param_set==2)]
df2=df[(df.Model == "RNN trained") & (df.param_set==2)]

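# Check that the other columns match row-for-row between df1 and df2
# (reset the index first so the comparison is purely positional)
print(df1['Period'].reset_index(drop=True).equals(df2['Period'].reset_index(drop=True)))
print(df1['Case'].reset_index(drop=True).equals(df2['Case'].reset_index(drop=True)))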

In [None]:
# Rename the RMSE columns, then add the RNN RMSE from df2 to df1
df1=df1.rename(columns={"RMSE": "RMSE KF"})
df2=df2.rename(columns={"RMSE": "RMSE RNN"})
# df1.join(df2['RMSE RNN'])
df1['RMSE RNN'] = df2['RMSE RNN'].to_numpy()
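
The positional copy above is only valid when both frames share the same row order. As a hedged alternative, the sketch below pivots a toy long-format table so that the KF and RNN values are paired by `Case` and `Period` labels instead. The toy rows are deliberately shuffled and their RMSE values are placeholders.

```python
import pandas as pd

# Toy long-format rows in shuffled order; RMSE values are placeholders.
toy = pd.DataFrame({
    "Case": ["a", "a", "b", "b", "a", "b", "a", "b"],
    "Period": ["train", "predict", "train", "predict",
               "predict", "train", "train", "predict"],
    "Model": ["Augmented KF", "RNN trained", "RNN trained", "Augmented KF",
              "Augmented KF", "Augmented KF", "RNN trained", "RNN trained"],
    "RMSE": [0.5, 1.5, 2.5, 3.0, 2.0, 1.0, 1.2, 2.8],
})

# One row per (Case, Period), one column per model, paired by label.
wide = (
    toy.pivot(index=["Case", "Period"], columns="Model", values="RMSE")
       .rename(columns={"Augmented KF": "RMSE KF", "RNN trained": "RMSE RNN"})
       .reset_index()
)
print(wide)
```

`DataFrame.pivot` raises an error if a `(Case, Period, Model)` combination appears twice, which doubles as a check that only one param set was selected.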

In [None]:
sns.scatterplot(
    data=df1, 
    x='RMSE KF', 
    y='RMSE RNN', 
    hue='Period')
plt.legend(loc="upper left")
plt.ylim(0,8)
plt.xlim(0,8)
plt.title("RMSE - KF vs RNN (Param Set 2)")
plt.axline((0, 0), slope=1, c='k', linestyle=':', alpha=.6)
plt.text(6,6.2,"equal RMSE",rotation=37, alpha=.6)
plt.text(3,7,"KF Better", alpha=.6)
plt.text(6,1,"RNN Better", alpha=.6)
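
The scatterplot can be complemented by a single number: the share of points that fall below the equal-RMSE line, where the RNN has the lower error. This is a minimal sketch. The helper `share_rnn_better` is a hypothetical name and the inputs are placeholder values rather than results from this notebook.

```python
import numpy as np


def share_rnn_better(rmse_kf, rmse_rnn):
    """Fraction of paired points where the RNN RMSE is strictly lower than the KF RMSE."""
    rmse_kf = np.asarray(rmse_kf, dtype=float)
    rmse_rnn = np.asarray(rmse_rnn, dtype=float)
    if rmse_kf.shape != rmse_rnn.shape:
        raise ValueError("inputs must be paired and have the same length")
    return float(np.mean(rmse_rnn < rmse_kf))


# Placeholder values for illustration only.
share = share_rnn_better([2.0, 0.5, 3.0, 1.0], [1.5, 1.2, 2.8, 1.0])
print(share)
```

On the real data this could be called as `share_rnn_better(df1['RMSE KF'], df1['RMSE RNN'])`, optionally after filtering to a single period.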

## Where the RNN goes wrong



The initial RNN, with physics-initialized weights, has some extreme RMSE values.

In [None]:
df1 = df[df['Model']!= "Augmented KF"]

In [None]:
sns.histplot(df1[df1['Model']=="RNN initial"]['RMSE'])
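
To list which rows sit in the extreme tail of the histogram, one option is Tukey's upper fence (values above Q3 + 1.5 IQR). The threshold is an assumption chosen for illustration, not something specified by the experiments, and the helper name and toy values below are hypothetical.

```python
import pandas as pd


def flag_high_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask for values above Q3 + k * IQR (Tukey's upper fence)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    return values > q3 + k * (q3 - q1)


# Placeholder RMSE values with one obvious extreme.
toy_rmse = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])
mask = flag_high_outliers(toy_rmse)
print(toy_rmse[mask])
```

Applying the mask to the `RNN initial` rows would give the case and param set behind each extreme value, which is a starting point for checking the underlying data.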

The pattern is far from clean and linear, but in general the largest RMSE values after training correspond to the largest errors from the initial, untrained RNN models. We should investigate why these large initial RNN errors exist and whether they are indicative of a data issue or a modeling issue.

In [None]:
plt.scatter(
    df1[df1['Model']=="RNN initial"]['RMSE'],
    df1[df1['Model']=="RNN trained"]['RMSE']
)
plt.xlabel("Initial RMSE")
plt.ylabel("Trained RMSE")
plt.title("RNN RMSE - Initial vs Trained")
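
Since the relationship is not linear, a rank correlation is one way to quantify how strongly a large initial error goes with a large trained error. The sketch below computes a Spearman correlation as the Pearson correlation of ranks, using only pandas. The helper name and the toy values are hypothetical.

```python
import pandas as pd


def rank_correlation(x: pd.Series, y: pd.Series) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return float(x.rank().corr(y.rank()))


# Placeholder values: the trained errors follow the same ordering as the initial ones.
initial = pd.Series([1.0, 5.0, 2.0, 50.0])
trained = pd.Series([0.5, 1.0, 0.8, 3.0])
rho = rank_correlation(initial, trained)
print(rho)
```

The two series must be paired row by row, so with the real table the initial and trained values should first be matched on `Case`, `Period`, and `param_set`.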