# Results Analysis

This notebook analyzes the results of the RNN training experiments.

Unless otherwise stated, we will refer to the trained RNN as "the RNN". All validation numbers are RMSE.

## Environment Setup

In [None]:
import sys
sys.path.append('..')

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

from data_funcs import from_json

In [None]:
results = from_json("../outputs/output_FMR.json")

## Control Case

Case 11 with param set 0 was the reproducibility case.

In [None]:
pd.DataFrame(results['0']['cases']['case11'])

The RNN outperforms the KF in this case. Note that the RNN's prediction RMSE is lower than its training RMSE, so there are no signs of overfitting. The KF, by contrast, has very low training error but a prediction error more than 3x larger.
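
The train-versus-prediction comparison can be made explicit as a ratio per model. The sketch below is illustrative only: the row labels `train` and `predict`, the `KF` column name, and all numbers are placeholders chosen for the example, not values from `output_FMR.json`.

```python
import pandas as pd

# Placeholder table shaped like one case entry: rows are periods, columns are models.
# Labels and values are hypothetical and do not come from the results file.
case_table = pd.DataFrame(
    {"KF": [1.0, 3.5], "RNN trained": [3.0, 2.5]},
    index=["train", "predict"],
)

# A ratio above 1 means the model does worse on prediction than on training.
ratio = case_table.loc["predict"] / case_table.loc["train"]
print(ratio)
```

A ratio well above 1 is the pattern described for the KF, while a ratio at or below 1 matches the RNN behaviour noted above.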

## Summarise Results

### Param Set Descriptions

In [None]:
for i in range(1, len(results)):
    print('~'*50)
    print(f"Param Set {i}")
    print(f"Activation: {results[str(i)]['params']['activation']}")
    print(f"FM Increase: {results[str(i)]['params']['fm_raise_vs_rain']}")

### Extract Results

Param set 0 is excluded from here on, as it was only run on case 11.

As an example of the raw format, here is the RMSE table for a single case under param set 1:

In [None]:
pd.DataFrame(results[str(1)]['cases']['WLCC2_202305010000'])

We next build a long-format dataframe with all of the results from the results dictionary. For each param set and model there are 3 time periods.

In [None]:
frames = []
for i in range(1, len(results)):
    for case, case_results in results[str(i)]['cases'].items():
        # Rows are periods, columns are models
        df_temp = pd.DataFrame(case_results).rename_axis("Period").reset_index()
        # Scalar assignment broadcasts to every row, so no hard-coded row count is needed
        df_temp['Case'] = case
        df_temp['param_set'] = i
        df_temp = pd.melt(df_temp, id_vars=['Period', 'Case', 'param_set'],
                          value_vars=['RNN initial', 'RNN trained'],
                          var_name='Model', value_name='RMSE')
        frames.append(df_temp)

# Concatenate once; growing an initially empty frame can force object dtypes
# and raises a FutureWarning in recent pandas versions
df = pd.concat(frames, ignore_index=True)

df
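
The wide-to-long reshape used for each case can be checked on a toy table. This is a minimal sketch: the period labels and RMSE values are placeholders, and only the two model column names are taken from the cell above.

```python
import pandas as pd

# Hypothetical wide table for one case: one row per period, one column per model.
wide = pd.DataFrame(
    {"RNN initial": [5.0, 6.0, 7.0], "RNN trained": [2.0, 3.0, 4.0]},
    index=["all", "train", "predict"],  # period labels are placeholders
)

# Move the period labels into a column, then stack the model columns into rows.
tidy = wide.rename_axis("Period").reset_index().melt(
    id_vars="Period", var_name="Model", value_name="RMSE"
)
print(tidy)
```

Three periods and two models give six rows, one per (period, model) pair, which is the layout the plotting and grouping steps expect.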

### Results by Param Set

Excluding RNN initial.

In [None]:
df2 = df[df.Model != 'RNN initial']
# Pass the frame once and refer to columns by name
sns.boxplot(
    data=df2,
    x='param_set',
    y='RMSE',
    hue='Period'
).set_title('Results by Param Set')

Print the group means for trained RNNs.

In [None]:
df2 = df[df.Model != 'RNN initial']
# Group by 'param_set' and 'Period' and calculate the mean of 'RMSE'
grouped_df = df2.groupby(['param_set', 'Period'])['RMSE'].mean().reset_index()

# Reshape the DataFrame using pivot_table
pivot_df = grouped_df.pivot_table(index='param_set', columns='Period', values='RMSE').reset_index()

# Flatten the column labels: drop the 'Period' axis name, then prefix each period with 'RMSE_'
pivot_df.columns.name = None
pivot_df.columns = ['param_set'] + [f'RMSE_{period}' for period in pivot_df.columns[1:]]
pivot_df
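
The group-then-pivot steps can also be written as a single `pivot_table` call, since `pivot_table` aggregates on its own. The sketch below runs on a small placeholder frame with the same column names as `df`; the period labels and values are hypothetical.

```python
import pandas as pd

# Hypothetical long-format frame with the same columns as `df`; values are placeholders.
long_df = pd.DataFrame({
    "param_set": [1, 1, 1, 1, 2, 2, 2, 2],
    "Period": ["train", "train", "predict", "predict"] * 2,
    "RMSE": [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 8.0],
})

# Mean RMSE per (param_set, Period), with one column per period.
summary = (
    long_df.pivot_table(index="param_set", columns="Period",
                        values="RMSE", aggfunc="mean")
    .add_prefix("RMSE_")
    .reset_index()
)
summary.columns.name = None  # drop the leftover 'Period' axis name
print(summary)
```

This should produce the same table layout as `pivot_df`, with the period columns in alphabetical order.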