Intro etc


...

TODO Summarise results


---


## Comparison with Baselines

Finally, we can compare the best-performing models (gradient boosted and others) against the best baseline method.  The VAR (Vector Auto-Regression) model from the [baselines notebook](https://github.com/makeyourownmaker/CambridgeTemperatureNotebooks/blob/main/notebooks/cammet_baselines_2021.ipynb) was the best-performing baseline.

The best encoder-decoder model, after 5 training epochs, was `conv2dk2d_28l_48s_16bs_448fm_64f_1ksf_7kst`.  Here I train the same model for 20 epochs.

Some points to note regarding the `plot_forecasts` diagnostic plot:
 * plots use validation data, not test data
 * `plot_forecasts`
   * plots example forecasts with observations and lagged temperatures
      * first row shows examples of the best (near-zero rmse) forecasts
      * second row shows examples of the worst positive rmse forecasts
      * third row shows examples of the worst negative rmse forecasts
      * lagged observations are negative
      * the day of the year on which the forecast begins and the rmse value are displayed above each sub-plot

### Updated VAR model

...

In [None]:
from statsmodels.tsa.api import VAR
from statsmodels.tools.eval_measures import rmse, medianabs


def plot_baseline_metrics(metrics, main_title):
  fig, axs = plt.subplots(1, 2, figsize = (14, 7))
  fig.suptitle(main_title)
  axs = axs.ravel()  # APL ftw!

  methods = metrics.method.unique()

  for method in methods:
    met_df = metrics.query('metric == "rmse" & method == "%s"' % method)
    axs[0].plot(met_df.horizon, met_df.value, color='blue', label='Updated VAR')

  ivar_rmse = np.array([0.39, 0.52, 0.64, 0.75, 0.86, 0.96, 1.06, 1.15, 1.23,
                        1.31, 1.38, 1.45, 1.51, 1.57, 1.63, 1.68, 1.73, 1.77,
                        1.81, 1.85, 1.89, 1.92, 1.96, 1.99, 2.02, 2.05, 2.08,
                        2.1 , 2.13, 2.15, 2.18, 2.2 , 2.22, 2.24, 2.26, 2.28,
                        2.3 , 2.31, 2.33, 2.35, 2.36, 2.38, 2.39, 2.4 , 2.42,
                        2.43, 2.44, 2.45])
  steps = [i for i in range(1, len(ivar_rmse)+1)]
  axs[0].plot(steps, ivar_rmse, color='black', label='Initial VAR')

  axs[0].set_xlabel("horizon - half hour steps")
  axs[0].set_ylabel("rmse")
  # axs[0].legend(methods)


  for method in methods:
    met_df = metrics.query('metric == "mae" & method == "%s"' % method)
    axs[1].plot(met_df.horizon, met_df.value, color='blue', label='Updated VAR')

  ivar_mae = np.array([0.39, 0.49, 0.57, 0.66, 0.74, 0.83, 0.91, 0.98, 1.05,
                       1.12, 1.18, 1.24, 1.29, 1.34, 1.39, 1.43, 1.47, 1.5 ,
                       1.53, 1.56, 1.59, 1.62, 1.64, 1.66, 1.68, 1.7 , 1.72,
                       1.73, 1.75, 1.76, 1.77, 1.78, 1.8 , 1.81, 1.82, 1.83,
                       1.83, 1.84, 1.85, 1.85, 1.86, 1.86, 1.87, 1.87, 1.88,
                       1.88, 1.89, 1.89])
  axs[1].plot(steps, ivar_mae, color='black', label='Initial VAR')

  axs[1].set_xlabel("horizon - half hour steps")
  axs[1].set_ylabel("mae")
  # axs[1].legend(methods)

  plt.legend(bbox_to_anchor=(1.04, 0.5), loc="center left", borderaxespad=0)
  plt.show()


def update_metrics(metrics, test_data, method, get_metrics,
                   model = None,
                   met_cols = ['type', 'method', 'metric', 'horizon', 'value']):
  metrics_h = []

  if method in ['SES', 'HWES']:
    horizons = [i for i in range(4, 49, 4)]
    horizons.insert(0, 1)
  else:
    # horizons = [1, 48]
    horizons = range(1, 49)

  if method in ['VAR']:
    variates = 'multivariate'
  else:
    variates = 'univariate'

  print("h\trmse\tmae")
  for h in horizons:
    if method in ['VAR']:
      rmse_h, mae_h = get_metrics(test_data, h, method, model)
    else:
      rmse_h, mae_h = get_metrics(test_data, h, method)

    metrics_h.append(dict(zip(met_cols, [variates, method, 'rmse', h, rmse_h])))
    metrics_h.append(dict(zip(met_cols, [variates, method,  'mae', h,  mae_h])))

  print("\n")

  metrics_method = pd.DataFrame(metrics_h, columns = met_cols)
  # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
  metrics = pd.concat([metrics, metrics_method])

  return metrics


# rolling_cv with pre-trained model
def var_rolling_cv(data, horizon, method, model):
    lags = model.k_ar  # lag order
    i = lags
    h = horizon
    rmse_roll, mae_roll = [], []
    endo_vars = ['y', 'dew.point', 'humidity', 'pressure']
    exog_vars = ['day.cos.1', 'day.sin.1', 'year.cos.1', 'year.sin.1',
                 'irradiance', 'azimuth_cos', 'za_rad'
                ]

    while (i + h) < len(data):
        obs_df  = data[endo_vars].iloc[i:(i + h)]
        endo_df = data[endo_vars].iloc[(i - lags):i].values
        exog_df = data[exog_vars].iloc[i:(i + h)]

        # y_hat = model.forecast(endo_df, steps = h)
        y_hat = model.forecast(endo_df, exog_future = exog_df, steps = h)
        preds = pd.DataFrame(y_hat, columns = endo_vars)

        rmse_i = rmse(obs_df.y,      preds.y)
        mae_i  = medianabs(obs_df.y, preds.y)
        rmse_roll.append(rmse_i)
        mae_roll.append(mae_i)

        i = i + 1

    print(h, '\t', np.nanmean(rmse_roll).round(3), '\t', np.nanmean(mae_roll).round(3))

    return [np.nanmean(rmse_roll).round(2), np.nanmean(mae_roll).round(2)]

...

In [None]:
# approx. 5 mins

train_df = train_df.asfreq(freq='30min')
valid_df = valid_df.asfreq(freq='30min')
test_df  = test_df.asfreq(freq='30min')

train_df.dropna(inplace=True)

endo_vars = ['y', 'dew.point', 'humidity', 'pressure']
exog_vars = [
            'day.cos.1', 'day.sin.1', 'year.cos.1', 'year.sin.1',
            'irradiance', 'azimuth_cos', 'za_rad'
            ]
endo_df = train_df[endo_vars]
exog_df = train_df[exog_vars]

var_model = VAR(endog = endo_df, exog = exog_df)
# var_model = VAR(endog = endo_df)
MAX_LAGS = 96
lag_order_res = var_model.select_order(MAX_LAGS)
display(lag_order_res.summary())
display(lag_order_res.selected_orders)
print(lag_order_res.selected_orders['bic'])

lag_order_table = lag_order_res.summary().data
headers = lag_order_table.pop(0)
lag_order_df = pd.DataFrame(lag_order_table, columns=headers)
lag_order_df.drop('', axis=1, inplace=True)

# Strip the '*' markers that flag the minimum of each criterion.
# regex=False treats '*' as a literal character, so no FutureWarning is raised.
lag_order_df = pd.concat([lag_order_df[col].str.replace('*', '', regex=False).astype(float)
                         for col in lag_order_df], axis=1)

lag_order_df.loc[1:, ['AIC','BIC','HQIC']].plot()
plt.xlabel('lag')
plt.ylabel('IC')
plt.show()

The lowest BIC value occurs at 51 lags.  I'm going to use `maxlags = 51` because that is where diminishing returns set in.
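
To make the selection rule concrete, here is a minimal sketch of choosing a lag order by minimum BIC. It is not the statsmodels implementation: it fits a univariate AR(p) by ordinary least squares on synthetic data, and the names `ar_bic`, `bic_by_lag` and `best_lag` are hypothetical. All candidate orders are fitted on the same sample (the first `max_lags` points are dropped) so the BIC values are comparable.

```python
import numpy as np


def ar_bic(series, p, max_lags):
    """BIC of an AR(p) model with intercept, fitted by least squares.

    Every candidate order uses the same target sample, series[max_lags:],
    so that BIC values can be compared across different p.
    """
    y = series[max_lags:]
    # Column k holds the series lagged by k steps, aligned with y
    lagged = [series[max_lags - k:len(series) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y))] + lagged)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n = len(y)
    sigma2 = resid @ resid / n
    # Gaussian BIC up to a constant: fit term plus complexity penalty
    return n * np.log(sigma2) + (p + 1) * np.log(n)


# Synthetic AR(2) series (assumption: illustration only, not the Cambridge data)
rng = np.random.default_rng(0)
n_obs = 2000
series = np.zeros(n_obs)
for t in range(2, n_obs):
    series[t] = 0.6 * series[t - 1] - 0.3 * series[t - 2] + rng.normal()

MAX_LAGS = 10
bic_by_lag = {p: ar_bic(series, p, MAX_LAGS) for p in range(1, MAX_LAGS + 1)}
best_lag = min(bic_by_lag, key=bic_by_lag.get)
print(best_lag)
```

The `log(n)` penalty grows with every extra lag, so BIC stops rewarding additional lags once their contribution to the fit becomes small.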

In [None]:


def get_var_backtest(model, data, endo_vars, exog_vars, y_col=Y_COL, horizon=HORIZON):
  lags = model.k_ar  # lag order
  i = lags
  h = horizon
  preds = []

  while (i + h) < len(data):
    if i % 1000 == 0:
      print(i)

    obs_df  = data[endo_vars].iloc[i:(i + h)]
    endo_vals = data[endo_vars].iloc[(i - lags):i].values

    if exog_vars is not None:
      exog_df = data[exog_vars].iloc[i:(i + h)]
      y_hat_lol = model.forecast(endo_vals, exog_future = exog_df, steps = h)
    else:
      y_hat_lol = model.forecast(endo_vals, steps = h)

    y_col_pos = 0  # hardcoding is bad mkay - make function param?
    y_hat_series = pd.Series(data  = [y_hat_l[y_col_pos] for y_hat_l in y_hat_lol],
                             index = obs_df.index,
                             name  = y_col)
    y_hat_ts = TimeSeries.from_series(y_hat_series)
    # y_hat_ts = TimeSeries.from_values(np.array([y_hat_l[y_col_pos] for y_hat_l in y_hat_lol]))
    # y_hat = [y_hat_l[y_col_pos] for y_hat_l in y_hat_lol]

    preds.append(y_hat_ts)
    i = i + 1

  return preds


var_fit = var_model.fit(maxlags = 51, ic = 'bic')
print(var_fit.summary())

main_var_col = 'y'
backtest_var = get_var_backtest(var_fit, valid_df, endo_vars, exog_vars, y_col = main_var_col)
# display(len(backtest_var))
# display(backtest_var[0])
hist_comp_var = get_historic_comparison(backtest_var, valid_df, y_col = main_var_col)
# display(hist_comp_var)
summarise_historic_comparison(hist_comp_var, valid_df, y_col = main_var_col)

title_var = 'VAR ' + main_var_col + '...'
plot_multistep_diagnostics(hist_comp_var, title_var, y_col = main_var_col)


# metric_cols = ['type', 'method', 'metric', 'horizon', 'value']
# metrics = pd.DataFrame([], columns = metric_cols)
# metrics = update_metrics(metrics, valid_df, 'VAR', var_rolling_cv, var_fit)
## metrics = update_metrics(metrics, test_df, 'VAR', var_rolling_cv, var_fit)
# plot_baseline_metrics(metrics, 'Multivariate Baseline Comparison - 2021 valid data')


# 2019 data
# maxlags = 5
# ...
# h	   rmse	   mae
# 1 	 0.39 	 0.39
# 48 	 2.45 	 1.89

# endo_vars = ['y', 'dew.point', 'pressure', 'humidity',]
# maxlags = 52
# h	   rmse	   mae
# 1 	 0.37 	 0.37
# 48 	 2.253 	 1.784
# maxlags = 52 substantially better than maxlags = 9

# endo_vars = ['y', 'dew.point', 'humidity',]
# maxlags = 52
# h	   rmse	   mae
# 1 	 0.37 	 0.37
# 48 	 2.293 	 1.814
# including pressure is beneficial

# endo_vars = ['y', 'dew.point', 'pressure', 'humidity',]
# exog_vars = ['za_rad', 'irradiance', 'azimuth_cos',]
# maxlags = 51
# h	   rmse	   mae
# 1    0.369 	 0.369
# 48   2.163 	 1.729
# exog_vars is beneficial
# 1 hr 28 mins :-(

# endo_vars = ['y', 'dew.point', 'pressure', 'humidity',]
# exog_vars = ['day.cos.1', 'day.sin.1', 'year.cos.1', 'year.sin.1',]
# maxlags = 52
# h	   rmse	   mae
# 1    0.37 	 0.37
# 48   2.133 	 1.68
# Sinusoidal terms better than irradiance etc!

# endo_vars = ['y', 'dew.point', 'pressure', 'humidity',]
# exog_vars = ['day.cos.1', 'day.sin.1', 'year.cos.1', 'year.sin.1', 'irradiance']
# maxlags = 51
# h	   rmse	   mae
# 1    0.369 	 0.369
# 48   2.105 	 1.667
# irradiance worth adding to sinusoidal terms

# endo_vars = ['y', 'dew.point', 'pressure', 'humidity',]
# exog_vars = ['day.cos.1', 'day.sin.1', 'year.cos.1', 'year.sin.1', 'za_rad']
# maxlags = 51
# h	   rmse	   mae
# 1    0.37 	 0.37
# 48   2.134 	 1.679
# za_rad not as beneficial as irradiance

# endo_vars = ['y', 'dew.point', 'pressure', 'humidity',]
# exog_vars = ['day.cos.1', 'day.sin.1', 'year.cos.1', 'year.sin.1', 'azimuth_cos']
# maxlags = 51
# h	   rmse	   mae
# 1    0.37 	 0.37
# 48   2.131 	 1.675
# azimuth_cos more beneficial than za_rad

# endo_vars = ['y', 'dew.point', 'pressure', 'humidity',]
# exog_vars = ['day.cos.1', 'day.sin.1', 'year.cos.1', 'year.sin.1',
#              'irradiance', 'azimuth_cos']
# maxlags = 51
# h	   rmse	   mae
# 1 	 0.368 	 0.368
# 48 	 2.098 	 1.658
# Best model so far

# endo_vars = ['y', 'dew.point', 'pressure', 'humidity',]
# exog_vars = ['day.cos.1', 'day.sin.1', 'year.cos.1', 'year.sin.1',
#              'irradiance', 'azimuth_cos', 'za_rad']
# maxlags = 51
# h	   rmse	   mae
# 1 	 0.368 	 0.368
# 48 	 2.098 	 1.657
# Marginally better with za_rad

# valid_df
# endo_vars = ['y', 'dew.point', 'pressure', 'humidity',]
# exog_vars = ['day.cos.1', 'day.sin.1', 'year.cos.1', 'year.sin.1',
#              'irradiance', 'azimuth_cos', 'za_rad']
# maxlags = 51
# h	   rmse	   mae
# 1 	 0.347 	 0.347
# 48 	 2.012 	 1.581
#

# valid_df
# endo_vars = ['y_des', 'dew.point_des', 'pressure', 'humidity',]
# exog_vars = ['day.cos.1', 'day.sin.1', 'year.cos.1', 'year.sin.1',
#              'irradiance', 'azimuth_cos', 'za_rad']
# maxlags = 53
# h	   rmse	   mae
# 1 	 0.347 	 0.347
# 48 	 2.724   2.132

# valid_df
# endo_vars = ['y_des', 'dew.point_des', 'pressure', 'humidity',]
# exog_vars = ['day.cos.1', 'day.sin.1', 'year.cos.1', 'year.sin.1',
#              'irradiance', 'azimuth_cos', 'za_rad']
# maxlags = 53
# h	   rmse	   mae
# 1 	 0.357 	 0.357
# 48 	 2.712   2.121

# valid_df
# train_df.loc['2016-01-01':,]
# endo_vars = ['y', 'dew.point', 'pressure', 'humidity',]
# exog_vars = ['day.cos.1', 'day.sin.1', 'year.cos.1', 'year.sin.1',
#              'irradiance', 'azimuth_cos', 'za_rad']
# maxlags = 22
# h	   rmse	   mae
# 1 	 0.352 	 0.352
# 48 	 2.926   2.305
# Backtest RMSE 48th: 2.92592
# Backtest MAE 48th:  2.304481
# Radical decrease in maxlags!
# Not a great model


Initial VAR rmse for horizon steps 1 to 48:
```
[0.39, 0.52, 0.64, 0.75, 0.86, 0.96, 1.06, 1.15, 1.23,
 1.31, 1.38, 1.45, 1.51, 1.57, 1.63, 1.68, 1.73, 1.77,
 1.81, 1.85, 1.89, 1.92, 1.96, 1.99, 2.02, 2.05, 2.08,
 2.1 , 2.13, 2.15, 2.18, 2.2 , 2.22, 2.24, 2.26, 2.28,
 2.3 , 2.31, 2.33, 2.35, 2.36, 2.38, 2.39, 2.4 , 2.42,
 2.43, 2.44, 2.45]
```

Initial VAR mae for horizon steps 1 to 48:
```
[0.39, 0.49, 0.57, 0.66, 0.74, 0.83, 0.91, 0.98, 1.05,
 1.12, 1.18, 1.24, 1.29, 1.34, 1.39, 1.43, 1.47, 1.5 ,
 1.53, 1.56, 1.59, 1.62, 1.64, 1.66, 1.68, 1.7 , 1.72,
 1.73, 1.75, 1.76, 1.77, 1.78, 1.8 , 1.81, 1.82, 1.83,
 1.83, 1.84, 1.85, 1.85, 1.86, 1.86, 1.87, 1.87, 1.88,
 1.88, 1.89, 1.89]
```

In [None]:
var_fit.plot()
plt.show()

# var_fit.plot_acorr()
# plt.show()

var_fit.fevd(48).plot()
plt.show()

var_fit.mse(48)

The updated VAR model shows substantial improvement over the initial VAR model.  It would benefit from further validation, including residual plots, QQ plots, residual autocorrelation plots and residual boxplots across the forecast horizon steps.

NOTE: Updated VAR validated on 2021 data; initial VAR validated on 2019 data.

TODO Move VAR baseline to separate notebook

---

**TODO** Plot model diagnostics.


Next, I plot the rmse and mae values of the best model and the VAR model for forecast horizons up to 48 steps (24 hours, as each horizon step is equivalent to 30 minutes).  This plot and the two others show forecasts on the previously unused 2019 "test" data, which differs from the 2018 "validation" data used elsewhere in this notebook.

Some points to note regarding diagnostic plots:
 * once again, on test data not validation data
 * `plot_horizon_metrics`
   * plot rmse and mae values for each individual step-ahead
 * `check_residuals`
   * observations against predictions
   * residuals over time
   * residual distribution
 * `plot_forecasts`
   * see sub-section immediately above for notable points
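
As a rough illustration of the per-step metrics listed above, the sketch below computes rmse and median absolute error for each step-ahead across a set of forecast windows. The internals of `plot_horizon_metrics` are not shown in this section, so this is an assumption about one reasonable way to compute them; `horizon_metrics` and the toy data are hypothetical.

```python
import math
import statistics


def horizon_metrics(observed, predicted):
    """Return (rmse_by_step, mae_by_step) across forecast windows.

    observed, predicted: lists of windows, each a list of H values,
    where position s holds the value for step-ahead s + 1.
    mae here is the median absolute error, as used in this notebook.
    """
    n_steps = len(observed[0])
    rmse_by_step, mae_by_step = [], []
    for step in range(n_steps):
        # Errors for this step-ahead, one per forecast window
        errors = [obs[step] - pred[step]
                  for obs, pred in zip(observed, predicted)]
        rmse_by_step.append(math.sqrt(statistics.fmean(e * e for e in errors)))
        mae_by_step.append(statistics.median(abs(e) for e in errors))
    return rmse_by_step, mae_by_step


# Toy example: 3 forecast windows, 2 steps ahead each
observed = [[10.0, 11.0], [12.0, 13.0], [14.0, 15.0]]
predicted = [[10.0, 10.0], [12.0, 15.0], [14.0, 15.0]]
rmse_by_step, mae_by_step = horizon_metrics(observed, predicted)
print(rmse_by_step, mae_by_step)
```

Note that this aggregates across windows for each individual step, which is different from averaging a whole-window rmse over windows as `var_rolling_cv` does.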

Broadly speaking, these results are very similar to the results from the VAR model.


Diagnostic plots summary:
 * once again, these plots use test data not validation data
 * `plot_horizon_metrics`
   * initially, these results look quite contradictory
   * the rmse plot indicates better forecasts for the VAR method (in orange)
   * the mae plot indicates better forecasts for the Conv2D kernel 2D method (in blue, mis-labelled as LSTM)
 * `check_residuals`
   * the observations against predictions plot indicates
     * predictions are too high at cold temperatures (below 0 C)
     * predictions are too low at hot temperatures (above 25 C)
   * residuals over time
     * no obvious heteroscedasticity
     * no obvious periodicity
       * surprising given observations against predictions plot
   * residual distribution appears to be approximately normal (slightly right-skewed)
     * no obvious sign of fat tails
 * `plot_forecasts`
   * notable lack of noisy observations for the large positive and negative rmse examples

The median absolute error (mae) is less sensitive to outliers than the root mean squared error (rmse).

Therefore, the difference between the rmse and mae plots may be due to the presence of outliers. I have maintained from the start that this data set is quite noisy, and attempts to correct these problems may have unintentionally introduced new issues.
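
A small worked example of that sensitivity, using made-up residuals rather than values from this notebook. The helper names `root_mean_sq_error` and `median_abs_error` are hypothetical.

```python
import math
import statistics


def root_mean_sq_error(errors):
    """Square root of the mean squared error."""
    return math.sqrt(statistics.fmean(e * e for e in errors))


def median_abs_error(errors):
    """Median of the absolute errors."""
    return statistics.median(abs(e) for e in errors)


# Made-up residuals in degrees C (assumption: illustration only)
clean = [0.5, -0.4, 0.3, -0.6, 0.2, -0.1, 0.4, -0.3, 0.1]
# Same residuals with the last one replaced by a single large outlier
with_outlier = clean[:-1] + [8.0]

for name, errs in [("clean", clean), ("with outlier", with_outlier)]:
    print(name,
          round(root_mean_sq_error(errs), 2),
          round(median_abs_error(errs), 2))
```

With these values a single outlier raises the rmse from about 0.36 to about 2.69, while the median absolute error only moves from 0.3 to 0.4. This is consistent with two models ranking differently on the two metrics when a few large errors are present.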

Transformed mean values across the 48 step horizon:
 * rmse of 2.05796
 * mae of 1.17986

---


## Conclusion

The best results from the gradient boosted trees are similar/different to  results from the [best LSTM model](https://github.com/makeyourownmaker/CambridgeTemperatureNotebooks/blob/main/notebooks/lstm_time_series.ipynb).

How and why are they similar/different?

...

The conclusion is separated into the following sections:
 1. What worked
 2. What didn't work
 3. Rejected ideas
 4. Future work
