## Model evaluation



### Why Use MAE, RMSE, and MAPE?
In forecasting model evaluation, the goal is to measure how well the model's predictions fit the actual data. To achieve this, error metrics are used to quantify the difference between predictions (`yhat`) and observed values (`y`). Three of the most common metrics for this purpose are:

#### 1. **Mean Absolute Error (MAE)**
   - **Definition**: MAE is the average of the absolute errors between the predictions and actual values. It measures the average magnitude of errors in a set of predictions, regardless of whether the errors are positive or negative.
   - **Formula**: 
     $$ \text{MAE} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|$$
     
   - **Interpretation**: MAE provides an intuitive measure of how much our predictions deviate on average from the actual values. Since it does not square the errors, MAE is less sensitive to outliers, making it useful when we want a robust measure of accuracy that isn't heavily skewed by unexpected spikes.
   - **Applicability in COVID-19 forecasting**: If the priority is having a "typical" error that is easy to interpret in terms of daily case numbers, MAE is a good choice.

#### 2. **Root Mean Squared Error (RMSE)**
   - **Definition**: RMSE is the square root of the average of the squared differences between predicted and actual values. Unlike MAE, RMSE penalizes larger errors more heavily, which can be beneficial when large errors represent a higher risk in decision-making.
   - **Formula**:
     $$ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2}$$
     
   - **Interpretation**: Because errors are squared before averaging, a few large misses dominate the score, so RMSE is more sensitive to outliers than MAE. RMSE is never smaller than MAE on the same data, and the gap between the two widens as the errors become more uneven.
   - **Applicability in COVID-19 forecasting**: If large deviations in case numbers are especially problematic, RMSE is a suitable metric as it "punishes" predictions that stray significantly from actual values.
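The difference in outlier sensitivity between MAE and RMSE can be seen on a small, made-up example. The sketch below uses illustrative numbers (not data from this notebook) and two hypothetical helper functions, `mae` and `rmse`, written directly from the formulas above. Both forecasts have the same MAE, but the one with a single large miss has twice the RMSE.

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error, following the MAE formula above."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return float(np.mean(np.abs(y - yhat)))

def rmse(y, yhat):
    """Root mean squared error, following the RMSE formula above."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

# Illustrative values only.
y = np.array([100.0, 100.0, 100.0, 100.0])
even = np.array([90.0, 110.0, 90.0, 110.0])    # every forecast off by 10
spiky = np.array([100.0, 100.0, 100.0, 60.0])  # one forecast off by 40

mae_even, rmse_even = mae(y, even), rmse(y, even)      # 10.0 and 10.0
mae_spiky, rmse_spiky = mae(y, spiky), rmse(y, spiky)  # 10.0 and 20.0

print(f"even errors : MAE={mae_even:.1f}  RMSE={rmse_even:.1f}")
print(f"one outlier : MAE={mae_spiky:.1f}  RMSE={rmse_spiky:.1f}")
```

When the two metrics are close, errors are spread evenly; a large gap suggests that a few predictions account for most of the error.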

#### 3. **Mean Absolute Percentage Error (MAPE)**
   - **Definition**: MAPE calculates the average of the absolute percentage errors between predictions and actual values, providing an intuitive percentage-based accuracy metric.
   - **Formula**:
     $$ \text{MAPE} = \frac{100}{n} \sum_{i=1}^n \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
     
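   - **Interpretation**: MAPE expresses errors as a percentage of the actual values, making it easy to interpret in terms of relative accuracy. For instance, a MAPE of 5% means that forecasts miss the actual value by 5% on average. However, MAPE is undefined when any actual value is zero and becomes very large when actual values are close to zero, because each error is divided by $y_i$. Note that the evaluation tables in this notebook report MAPE as a fraction (0.088 corresponds to 8.8%), i.e. without the factor of 100.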
   - **Applicability in COVID-19 forecasting**: MAPE is especially useful when it's important to understand the error in relative terms, such as how much the model’s predictions deviate as a percentage of actual cases. This is beneficial for comparing performance over time or across regions with varying case numbers.
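MAPE's sensitivity to small actual values can be illustrated with made-up numbers. In this minimal sketch, `mape` is a hypothetical helper written from the formula above (it is not part of `malib`); it refuses zero actuals rather than returning `inf`, which is one possible design choice among several.

```python
import numpy as np

def mape(y, yhat):
    """MAPE in percent, following the formula above.

    Raises ValueError if any actual value is zero, where MAPE is undefined.
    """
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    if np.any(y == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(100.0 * np.mean(np.abs((y - yhat) / y)))

# The same absolute miss of 10 units, on large and on tiny actual values.
large = mape([1000.0, 2000.0], [990.0, 2010.0])  # 0.75 (percent)
small = mape([1.0, 2.0], [11.0, 12.0])           # 750.0 (percent)

print(f"large actuals: {large:.2f}%")
print(f"small actuals: {small:.2f}%")
```

For series that start near zero (such as the early days of an outbreak), it may be worth reporting MAE or RMSE alongside MAPE for this reason.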

### Why Use All Three Metrics?
MAE, RMSE, and MAPE each provide unique insights into the model’s performance:

- **MAE** gives a straightforward measure of average error, useful for understanding typical deviations in absolute terms, and it stays robust when outliers are present.
- **RMSE** is beneficial if we want the evaluation to be "strict" and heavily penalize larger errors, which is valuable in situations where large deviations are particularly costly or concerning.
- **MAPE** allows us to assess the model’s accuracy in percentage terms, which is helpful for comparing accuracy across datasets of different scales or understanding performance across varying levels of COVID-19 cases.

By using all three metrics, you can achieve a more balanced and comprehensive view of the model's performance, capturing both absolute and relative error perspectives, as well as sensitivity to large deviations.
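A compact way to report the three metrics side by side is sketched below. The function name `summarize_errors` and the demo values are assumptions for illustration; the column names `y` and `yhat` follow the convention used in this notebook. This is not the implementation behind `mmm.daily_evaluation`.

```python
import numpy as np
import pandas as pd

def summarize_errors(frame, y_col="y", yhat_col="yhat"):
    """Return MAE, RMSE, and MAPE (in percent) for one forecast frame."""
    actual = frame[y_col].astype(float)
    err = actual - frame[yhat_col].astype(float)
    return pd.Series({
        "MAE": err.abs().mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        "MAPE": 100.0 * (err / actual).abs().mean(),
    })

# Illustrative values only, not taken from the notebook data.
demo = pd.DataFrame({"y": [100.0, 200.0, 400.0], "yhat": [110.0, 190.0, 360.0]})
summary = summarize_errors(demo)
print(summary.round(3))
```

Here MAE is 20 and RMSE is about 24.5, so the single miss of 40 pulls RMSE above MAE, while MAPE (about 8.3%) expresses the same errors relative to the size of each actual value.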

In [1]:
import numpy as np
import pandas as pd
import malib.models.metrics as mmm
import malib.data.plotting as mdp
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

In [2]:
PATH = "../../data/processed/df_clean_target_confirmed.csv"
PATH_HP_OPTUNA = "../../data/processed/prediction/optuna.csv"
PATH_HP_HYPEROPT = "../../data/processed/prediction/hyperopt.csv"
PATH_HP_PARAM_GRID = "../../data/processed/prediction/params_grid.csv"
PATH_TRAIN = "../../data/processed/train.csv"
PATH_TEST = "../../data/processed/test.csv"

In [3]:
df = pd.read_csv(PATH)
df_opt = pd.read_csv(PATH_HP_OPTUNA)
df_hyp = pd.read_csv(PATH_HP_HYPEROPT)
df_pg = pd.read_csv(PATH_HP_PARAM_GRID)
train = pd.read_csv(PATH_TRAIN)

In [4]:
list_of_models = {"optuna":df_opt,"hyperopt":df_hyp,"params_grid":df_pg}

In [5]:
rr = df_opt.shape[0]

In [6]:
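for key, m in list_of_models.items():
    print(key)
    # Named eval_df so the built-in eval() is not shadowed.
    eval_df = mmm.daily_evaluation(df.tail(rr), m, "ds", "y", "yhat")
    display(eval_df)
    print("Metrics summary")
    # Global setting: every table displayed after this point shows three decimals.
    pd.set_option("display.float_format", lambda x: "%.3f" % x)
    display(eval_df.mean(numeric_only=True))
    # Rename to the column names the plotting helper expects. The .copy() hands it
    # an independent frame, which should avoid pandas' SettingWithCopyWarning.
    forecast = eval_df.rename(columns={"Date": "ds", "predict": "y"})[["ds", "y"]].copy()
    fig = mdp.plot_interactive_forecast(df, train, forecast, "ds", "y")
    fig.show(renderer="iframe")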

optuna


| | Date | true | predict | RMSE | MAE | MAPE | Normalized RMSE | Normalized MAE | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-06-20 | 8805336.0 | 8720314.41 | 85021.59 | 85021.59 | 0.009656 | 0.009656 | 0.009656 | 0.990344 |
| 1 | 2020-06-21 | 8933875.0 | 8846325.26 | 87549.74 | 87549.74 | 0.0098 | 0.0098 | 0.0098 | 0.9902 |
| 2 | 2020-06-22 | 9071733.0 | 8970091.23 | 101641.77 | 101641.77 | 0.011204 | 0.011204 | 0.011204 | 0.988796 |
| 3 | 2020-06-23 | 9237071.0 | 9099658.22 | 137412.78 | 137412.78 | 0.014876 | 0.014876 | 0.014876 | 0.985124 |
| 4 | 2020-06-24 | 9408254.0 | 9231990.93 | 176263.07 | 176263.07 | 0.018735 | 0.018735 | 0.018735 | 0.981265 |
| 5 | 2020-06-25 | 9586141.0 | 9367322.47 | 218818.53 | 218818.53 | 0.022827 | 0.022827 | 0.022827 | 0.977173 |
| 6 | 2020-06-26 | 9777487.0 | 9502945.01 | 274541.99 | 274541.99 | 0.028079 | 0.028079 | 0.028079 | 0.971921 |
| 7 | 2020-06-27 | 9955597.0 | 9631877.25 | 323719.75 | 323719.75 | 0.032516 | 0.032516 | 0.032516 | 0.967484 |
| 8 | 2020-06-28 | 10117227.0 | 9757888.1 | 359338.9 | 359338.9 | 0.035518 | 0.035518 | 0.035518 | 0.964482 |
| 9 | 2020-06-29 | 10275799.0 | 9881654.07 | 394144.93 | 394144.93 | 0.038357 | 0.038357 | 0.038357 | 0.961643 |


Metrics summary


true              12329317.500
predict           11124021.315
RMSE               1205296.185
MAE                1205296.185
MAPE                     0.088
Normalized RMSE          0.088
Normalized MAE           0.088
Accuracy                 0.912
dtype: float64

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  prediction_data["Series"] = "Predictions"


hyperopt


| | Date | true | predict | RMSE | MAE | MAPE | Normalized RMSE | Normalized MAE | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-06-20 | 8805336.0 | 8720811.28 | 84524.72 | 84524.72 | 0.01 | 0.01 | 0.01 | 0.99 |
| 1 | 2020-06-21 | 8933875.0 | 8845628.31 | 88246.69 | 88246.69 | 0.01 | 0.01 | 0.01 | 0.99 |
| 2 | 2020-06-22 | 9071733.0 | 8968951.41 | 102781.59 | 102781.59 | 0.011 | 0.011 | 0.011 | 0.989 |
| 3 | 2020-06-23 | 9237071.0 | 9098808.93 | 138262.07 | 138262.07 | 0.015 | 0.015 | 0.015 | 0.985 |
| 4 | 2020-06-24 | 9408254.0 | 9232185.31 | 176068.69 | 176068.69 | 0.019 | 0.019 | 0.019 | 0.981 |
| 5 | 2020-06-25 | 9586141.0 | 9368464.73 | 217676.27 | 217676.27 | 0.023 | 0.023 | 0.023 | 0.977 |
| 6 | 2020-06-26 | 9777487.0 | 9504338.9 | 273148.1 | 273148.1 | 0.028 | 0.028 | 0.028 | 0.972 |
| 7 | 2020-06-27 | 9955597.0 | 9632509.38 | 323087.62 | 323087.62 | 0.032 | 0.032 | 0.032 | 0.968 |
| 8 | 2020-06-28 | 10117227.0 | 9757326.42 | 359900.58 | 359900.58 | 0.036 | 0.036 | 0.036 | 0.964 |
| 9 | 2020-06-29 | 10275799.0 | 9880649.51 | 395149.49 | 395149.49 | 0.038 | 0.038 | 0.038 | 0.962 |


Metrics summary


true              12329317.500
predict           11124359.848
RMSE               1204957.652
MAE                1204957.652
MAPE                     0.088
Normalized RMSE          0.088
Normalized MAE           0.088
Accuracy                 0.912
dtype: float64






params_grid


| | Date | true | predict | RMSE | MAE | MAPE | Normalized RMSE | Normalized MAE | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-06-20 | 8805336.0 | 8725347.62 | 79988.38 | 79988.38 | 0.009 | 0.009 | 0.009 | 0.991 |
| 1 | 2020-06-21 | 8933875.0 | 8851206.46 | 82668.54 | 82668.54 | 0.009 | 0.009 | 0.009 | 0.991 |
| 2 | 2020-06-22 | 9071733.0 | 8974724.74 | 97008.26 | 97008.26 | 0.011 | 0.011 | 0.011 | 0.989 |
| 3 | 2020-06-23 | 9237071.0 | 9103950.77 | 133120.23 | 133120.23 | 0.014 | 0.014 | 0.014 | 0.986 |
| 4 | 2020-06-24 | 9408254.0 | 9237104.0 | 171150.0 | 171150.0 | 0.018 | 0.018 | 0.018 | 0.982 |
| 5 | 2020-06-25 | 9586141.0 | 9374902.96 | 211238.04 | 211238.04 | 0.022 | 0.022 | 0.022 | 0.978 |
| 6 | 2020-06-26 | 9777487.0 | 9511477.83 | 266009.17 | 266009.17 | 0.027 | 0.027 | 0.027 | 0.973 |
| 7 | 2020-06-27 | 9955597.0 | 9641308.45 | 314288.55 | 314288.55 | 0.032 | 0.032 | 0.032 | 0.968 |
| 8 | 2020-06-28 | 10117227.0 | 9767167.29 | 350059.71 | 350059.71 | 0.035 | 0.035 | 0.035 | 0.965 |
| 9 | 2020-06-29 | 10275799.0 | 9890685.56 | 385113.44 | 385113.44 | 0.037 | 0.037 | 0.037 | 0.963 |


Metrics summary


true              12329317.500
predict           11139513.710
RMSE               1189803.790
MAE                1189803.790
MAPE                     0.087
Normalized RMSE          0.087
Normalized MAE           0.087
Accuracy                 0.913
dtype: float64






### Conclusions

#### 1. Choice of Hyperparameter Tuning Method: Optuna, Param Grid, or Hyperopt

When comparing the performance metrics across Optuna, Param Grid, and Hyperopt, we observe the following:

- **Accuracy**: All three methods achieve a very similar average accuracy of approximately 0.912–0.913, where Accuracy equals 1 − MAPE in every row shown. This suggests that each tuning method is producing models with a comparable level of overall fit. **Param Grid** has a slightly higher accuracy (0.913) than the others (0.912), a difference of 0.001.
  
- **RMSE and MAE**: Param Grid achieves the lowest mean error (1,189,804), followed by Hyperopt (1,204,958) and Optuna (1,205,296). The RMSE and MAE columns are identical because each row evaluates a single day, and the RMSE of one observation is simply its absolute error. The averaged "RMSE" is therefore a mean absolute error, not a root mean squared error computed over the whole horizon. On this measure, **Param Grid** provides the best absolute fit.

- **MAPE and Normalized RMSE/MAE**: The average MAPE and normalized RMSE/MAE for all three methods are also very similar (around 0.087–0.088). The three columns coincide in every row, since each is that day's absolute error divided by that day's true value. **Param Grid** is marginally better (0.087) than the other methods (0.088 for Optuna and Hyperopt).
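In the tables above, the per-day RMSE and MAE columns are identical. The sketch below reproduces this from the ten Optuna rows shown (the `true` and `predict` values are copied from that table) and contrasts the mean of per-day errors with a pooled RMSE computed once over all ten days. The variable names are illustrative, and the ten rows are only the displayed part of the evaluation window, so the numbers will not match the metrics summary.

```python
import numpy as np

# First ten rows of the Optuna evaluation table.
true = np.array([8805336.0, 8933875.0, 9071733.0, 9237071.0, 9408254.0,
                 9586141.0, 9777487.0, 9955597.0, 10117227.0, 10275799.0])
predict = np.array([8720314.41, 8846325.26, 8970091.23, 9099658.22, 9231990.93,
                    9367322.47, 9502945.01, 9631877.25, 9757888.10, 9881654.07])

err = true - predict

# With one observation per day, sqrt(err**2) is just |err|.
per_day_rmse = np.sqrt(err ** 2)
per_day_mae = np.abs(err)

# Averaging per-day values yields a mean absolute error.
mean_of_daily = per_day_mae.mean()

# Pooling squares over the horizon yields a true RMSE.
pooled_rmse = np.sqrt(np.mean(err ** 2))

print("per-day RMSE equals per-day MAE:", np.allclose(per_day_rmse, per_day_mae))
print(f"mean of per-day errors: {mean_of_daily:,.0f}")
print(f"pooled RMSE           : {pooled_rmse:,.0f}")
```

Because the errors grow over the ten days, the pooled RMSE comes out larger than the mean of per-day errors; reporting both would restore the distinction between the two metrics.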

**Recommendation**: Based on these metrics, **Param Grid** is the preferred hyperparameter tuning method for this evaluation window. It provides the lowest mean error and slightly higher accuracy, indicating a marginally better fit. The margin is small (about 1.3% lower mean error than Optuna, and 0.001 in MAPE), so the result is best read as a slight edge rather than a decisive advantage.

---

#### 2. Evaluation of Prophet’s Forecasting Performance

The evaluation metrics give insights into the quality of the Prophet model’s forecast:

- **Overall Fit**: With a **MAPE around 8.7%–8.8%**, the Prophet model shows reasonable accuracy, though it suggests there is still some level of deviation from the actual data. A MAPE below 10% is often considered a sign of a fairly good fit in time series forecasting, indicating that Prophet is generally capturing the trend.

- **RMSE and MAE Values**: The RMSE values around 1,200,000 indicate an average forecast deviation of about 1.2 million units per predicted day (strictly a mean absolute error, since per-day RMSE and MAE coincide). Although these errors are sizable, they may be expected in high-scale time series data (like COVID-19 case numbers) where the absolute values are high. The **normalized RMSE and MAE (~8.7%)** mean that the error averages roughly 8.7% of each day's true value, which provides a clearer interpretation of the model’s performance relative to the data scale.

- **Accuracy**: The average accuracy is close to 91.3% for all three tuned models, so the choice of tuning method has little effect on Prophet's fit. The daily metrics, however, show that the error grows steadily with the forecast horizon: in the rows shown, the relative error rises from about 1% on 2020-06-20 to about 3.8% on 2020-06-29, and the horizon-wide average of about 8.8% implies that it keeps growing beyond the displayed rows. This pattern is typical in time series forecasting because uncertainty accumulates with each step ahead.

**Overall Assessment**: The Prophet model is performing reasonably well in capturing the overall trend, achieving an average error (MAPE) of around 8.7% and an accuracy close to 91.3%. This indicates a fairly good fit to the observed data. However, there is room for improvement if higher accuracy is required, particularly for reducing absolute error (RMSE) on larger-scale predictions. Further fine-tuning or considering additional seasonal adjustments or other modeling approaches could potentially enhance forecast accuracy for more precise applications.
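The growth of error along the forecast horizon noted above can be quantified directly from the daily metrics. The sketch below uses the MAPE column of the ten Optuna rows shown earlier (values copied from that table, as fractions) and computes the average day-to-day increase; the variable names are illustrative, and the displayed rows cover only the start of the evaluation window.

```python
import numpy as np

# MAPE column (as fractions) from the ten Optuna rows displayed above.
daily_mape = np.array([0.009656, 0.0098, 0.011204, 0.014876, 0.018735,
                       0.022827, 0.028079, 0.032516, 0.035518, 0.038357])

# Change in relative error from one forecast day to the next.
daily_change = np.diff(daily_mape)

# True if the error rises on every single day of the displayed horizon.
always_increasing = bool(np.all(daily_change > 0))

# Average increase per day, in percentage points.
avg_increase_pp = 100.0 * daily_change.mean()

print("error rises every day:", always_increasing)
print(f"average increase: {avg_increase_pp:.2f} percentage points per day")
```

A steadily widening error of this kind is one reason to report metrics per horizon day, as the evaluation tables do, rather than as a single average.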