## 5.6 Evaluation of the best prediction models

### 5.6.1 Modelling and comparing with the benchmark

Now that the optimal settings for the OLS and RF models have been identified, their effectiveness in forecasting detrended and deseasonalized energy consumption can be evaluated. The goal is to account for the variability not explained by the trend or the seasonal components.

OLS, which by design has zero bias, directly minimizes the sum of squared errors between predicted and observed values, which makes it sensitive to outliers. Extreme values in the data can therefore influence the regression coefficients, which could be a problem for MSTL remainders because seasonality may not be perfectly captured at every timestep. However, its simplicity and interpretability remain strengths, especially for linear relationships where the number of predictors is not excessively large. Its low computational cost also allows for a higher retraining frequency, which has been shown to have a significant impact on accuracy.
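The outlier sensitivity described above can be illustrated with a minimal sketch. The data are synthetic and the variable names are assumptions made for illustration; none of this comes from the thesis data set.

```r
# Synthetic linear data (illustrative only, not the thesis data)
set.seed(1)
x <- 1:50
y <- 2 * x + rnorm(50, sd = 1)

# OLS fit on the clean data
fit_clean <- lm(y ~ x)

# Same data with a single extreme value at the last observation
y_out <- y
y_out[50] <- y[50] + 500
fit_out <- lm(y_out ~ x)

# Compare the estimated slopes
slope_clean <- unname(coef(fit_clean)["x"])
slope_out   <- unname(coef(fit_out)["x"])
print(c(clean = slope_clean, with_outlier = slope_out))
```

A single extreme observation out of 50 is enough to shift the estimated slope noticeably, because the squared-error criterion weights large residuals heavily.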

Conversely, RF offers robustness to outliers and the ability to model complex, non-linear relationships among a larger set of variables without requiring transformations or an assumption of linearity. Unlike OLS, RF accepts some bias in exchange for a reduction in variance, using an ensemble of decision trees to improve accuracy and generalizability. This trade-off is particularly beneficial when handling the volatile energy consumption time series. However, achieving an efficient trade-off between bias and variance requires tuning the parameters to the time series, which is extremely demanding in both time and computational power. For this research, the parameters were tuned for one customer group and reused for all customer groups, which is suboptimal.

Both models’ performance will be evaluated on their predictive accuracy and compared to a benchmark. This baseline assumes that the MSTL decomposition perfectly accounts for trend and seasonality, leaving no remainder, meaning that a naive model predicts zero for the detrended and deseasonalized time series. This comparison helps in highlighting the additional variance each model can explain, thereby demonstrating their ability to capture the nuances of energy consumption beyond what can be attributed to predictable patterns alone.
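As a small illustration of the benchmark, the sketch below uses hypothetical remainder values (not thesis data). It shows that a forecast of zero at every timestep has an MSE equal to the mean of the squared remainders.

```r
# Hypothetical remainder values, for illustration only
remainder <- c(120, -80, 15, -200, 60)

# Benchmark: predict zero for the detrended and deseasonalized series
forecast_zero <- rep(0, length(remainder))

# MSE of the benchmark is simply the mean squared remainder
mse_benchmark <- mean((remainder - forecast_zero)^2)
print(mse_benchmark)
```

Any model that captures structure in the remainders should achieve an MSE below this value on the same timesteps.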

### 5.6.2 Comparing OLS with the benchmark model

Earlier in this chapter, different settings of the framework were tested to find the most efficient configuration for OLS. The OLS model performed best with a 1-year training window and the highest possible retraining frequency, which is every timestep.
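The idea of a fixed-length training window with retraining at every timestep can be sketched as follows. This is a simplified illustration on synthetic data with a single predictor. The window length, variable names, and data are assumptions and do not reproduce the framework used in this chapter.

```r
# Synthetic data with one predictor (illustrative only)
set.seed(42)
n <- 120
x <- rnorm(n)
y <- 3 * x + rnorm(n, sd = 0.5)
dat <- data.frame(x = x, y = y)

window <- 100                      # stands in for the 1-year training window
preds  <- rep(NA_real_, n)

for (t in (window + 1):n) {
  train <- dat[(t - window):(t - 1), ]        # most recent `window` observations
  fit   <- lm(y ~ x, data = train)            # refit at every timestep
  preds[t] <- predict(fit, newdata = dat[t, , drop = FALSE])
}

# Out-of-sample MSE over the forecasted timesteps
test_idx    <- (window + 1):n
rolling_mse <- mean((dat$y[test_idx] - preds[test_idx])^2)
print(rolling_mse)
```

Refitting at every step keeps the coefficients aligned with the most recent data, at the cost of one model fit per forecast. For OLS this cost is small.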

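To evaluate the accuracy of the forecasting models, the Mean Squared Error (MSE) is used. It measures the average squared difference between the predicted and the actual values over the out-of-sample timesteps:

$$ MSE = \frac{1}{T-n_{train}} \sum_{t=n_{train}+1}^{T} (R_t-\hat{R}_t)^2 $$

where $R_t$ is the observed remainder, $\hat{R}_t$ is its forecast, $n_{train}$ is the number of training observations, and $T$ is the total number of timesteps.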
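A minimal sketch of the out-of-sample MSE is shown below. The function name `oos_mse` and its arguments are hypothetical and the input values are made up. The sketch averages the squared errors over timesteps $n_{train}+1$ to $T$ only.

```r
# Out-of-sample MSE over timesteps n_train + 1, ..., T
# (function and argument names are illustrative, not from the thesis code)
oos_mse <- function(actual, predicted, n_train) {
  stopifnot(length(actual) == length(predicted), n_train < length(actual))
  idx <- (n_train + 1):length(actual)      # evaluation timesteps
  mean((actual[idx] - predicted[idx])^2)   # average squared error
}

# Made-up values: the first three timesteps are treated as training data
actual    <- c(1, 2, 3, 4, 5, 6)
predicted <- c(1, 2, 3, 5, 5, 4)
result <- oos_mse(actual, predicted, n_train = 3)
print(result)
```

Because `mean()` divides by the number of evaluated timesteps, the result is an average over the forecasted period rather than over the full series.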

In [3]:
# Forecasting remainders with OLS

## Libraries
library(tidyverse)
library(forecast)
library(ggplot2)
library(dplyr)
library(data.table)
library(IRdisplay)
library(progress)

library(foreach)
library(doParallel)

library(caret)
library(randomForest)


In [10]:
##### Setting working directory and loading data #####
base_path <- "C:/Users/madsh/OneDrive/Dokumenter/kandidat/Fællesmappe/Forecasting-energy-consumption"
setwd(base_path)
#data <- read.csv(paste0(base_path,"Data/Combined/Full_data_ecwap.csv"))
# MSTL decomposition results and the one-step-ahead remainder forecasts from OLS and RF
MSTL      <- fread(paste0(base_path,"/Data Cleaning/MSTL_decomp_results.csv"))
R_t_OLS   <- fread(paste0(base_path,"/Data/Results/OLS/R_hat_t/2yTrain_h=1_steps_ahead=1_OLS_R_hat_t.csv"))
R_t_RF    <- fread(paste0(base_path,"/Data/Results/RF/R_hat_t/h=1_steps_ahead=1_ntree=250_RF_R_hat_t.csv"))

In [35]:
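# Observed remainders. R_t_vec is not defined in the cells shown here;
# it is assumed to be the full remainder series.
R_t_vec       <- MSTL$Remainder
R_t_vec_1year <- tail(MSTL$Remainder, n = 8759)    # window paired with the OLS forecasts
R_t_vec_2year <- tail(MSTL$Remainder, n = 17519)   # window paired with the RF forecasts
R_t_0_vec     <- MSTL$Null_Remainder               # benchmark (null) forecast
R_t_OLS_vec   <- R_t_OLS$x
R_t_RF_vec    <- R_t_RF$x

# Forecast errors: benchmark (e1), OLS (e2), RF (e3)
e1 <- R_t_vec - R_t_0_vec
e2 <- R_t_vec_1year - R_t_OLS_vec
e3 <- R_t_vec_2year - R_t_RF_vec

# Mean squared errors
MSE_0   <- mean(e1^2)
MSE_OLS <- mean(e2^2)
MSE_RF  <- mean(e3^2)

# Trim the benchmark errors to the same length as e3 for the DM test
e1 <- tail(e1, n = 17519)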

In [37]:
# Create a data frame (table)
mse_table <- data.frame(
  Model = c("MSE_0", "MSE_OLS"),
  MSE_Value = c(MSE_0, MSE_OLS)
)

# Display the table
print(mse_table)

    Model MSE_Value
1   MSE_0  31514.66
2 MSE_OLS  40865.14


From the table above, the OLS model has an MSE of 40865.14, higher than the 31514.66 reported for the benchmark. This difference suggests that the naïve model performs better than the OLS model, meaning that it would be better not to predict the remainders at all than to use an OLS model. However, the distribution of MSE values for both sets of forecasts is notably right-skewed, indicating the presence of outliers with significantly higher errors that affect the mean more than the median.

Given the skewed nature of the distribution, and to ensure that the lower MSE observed for the benchmark is not a result of randomness, the Diebold-Mariano (DM) test is used. It offers a thorough method for comparing predictive accuracy. By focusing on the difference between the forecast errors from the two models, the DM test evaluates whether there is a statistically significant difference in their performance. For a more detailed description of the DM test, see chapter 3.9.
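To make the idea concrete, the sketch below computes a simplified DM statistic for one-step-ahead forecasts under squared-error loss. It is a base-R illustration on synthetic errors, not the implementation in `forecast::dm.test`, which may apply additional corrections. The function name `dm_stat_h1` is hypothetical.

```r
# Simplified Diebold-Mariano statistic for h = 1 and squared-error loss.
# Base-R sketch only; results may differ slightly from forecast::dm.test.
dm_stat_h1 <- function(e_a, e_b) {
  stopifnot(length(e_a) == length(e_b))
  d     <- e_a^2 - e_b^2                  # loss differential per timestep
  n     <- length(d)
  d_bar <- mean(d)
  var_d_bar <- mean((d - d_bar)^2) / n    # variance of the mean (lag 0 only)
  stat  <- d_bar / sqrt(var_d_bar)
  p_value <- 2 * pt(-abs(stat), df = n - 1)   # two-sided p-value
  list(statistic = stat, p_value = p_value)
}

# Synthetic forecast errors: the second set is clearly more accurate
set.seed(7)
e_small <- rnorm(500, sd = 1)
e_large <- rnorm(500, sd = 2)
res <- dm_stat_h1(e_large, e_small)
print(res)
```

A positive statistic indicates that the first set of errors has the larger average squared loss. The sign therefore depends on the order of the arguments.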

Therefore, to validate the initial findings suggested by the boxplot, a DM test is conducted for all 707 customer groups. This provides statistical backing for any claims regarding the relative performance of the OLS model versus the benchmark, ensuring that the conclusion is not only visually and intuitively appealing but also statistically sound. The results of this test further inform us about the consistency and reliability of the OLS model’s performance in forecasting energy consumption remainders after MSTL decomposition.

In [38]:
# Note: e3 holds the RF forecast errors; e1 (benchmark) was trimmed to the same length above
dm_test <- dm.test(e1, e3, alternative = "two.sided", h = 1, power = 2)

dm_test


	Diebold-Mariano Test

data:  e1e3
DM = 60.43, Forecast horizon = 1, Loss function power = 2, p-value <
2.2e-16
alternative hypothesis: two.sided
