# PhDinDS-M5 Leaderboards

The following table presents the combined leaderboard of (1) the entire M5 competition, (2) the entries of the PhDinDS AIM students, and (3) some M5 benchmarks for comparison.

For new entries, you may:
1. Add a `.csv` file with the same structure and information in the `time_series_handbook/08_WinningestMethods/M5_competition/leaderboards_data` directory and fill out the necessary information (WRMSSE) per Level in the hierarchy.
2. Re-run the code cell below.
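
A minimal sketch of step 1, using only the Python standard library. The entry name `my_entry` and the helper `entry_to_csv` are hypothetical, and the column layout (an unnamed index column, then `Entry`, `Level1` to `Level12`, and `AveWRMSSE`) is inferred from the table on this page rather than from the actual files, so compare it against an existing `.csv` first. In the table, `AveWRMSSE` equals the unweighted mean of the twelve level scores, which the sketch reproduces.

```python
import csv
import io

LEVELS = [f"Level{i}" for i in range(1, 13)]
# Assumed column layout; an empty first header is what pandas shows as "Unnamed: 0".
FIELDS = ["", "Entry"] + LEVELS + ["AveWRMSSE"]


def entry_to_csv(name, wrmsse_per_level):
    """Return CSV text for one leaderboard entry (hypothetical helper)."""
    if len(wrmsse_per_level) != len(LEVELS):
        raise ValueError("expected one WRMSSE value per level (12 in total)")
    row = {"": 0, "Entry": name}
    row.update(zip(LEVELS, wrmsse_per_level))
    # AveWRMSSE is the unweighted mean of the 12 level scores.
    row["AveWRMSSE"] = round(sum(wrmsse_per_level) / len(LEVELS), 6)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerow(row)
    return buffer.getvalue()


# Level scores copied from the phdinds2024 row, used here only as a sanity check.
scores = [0.252662, 0.358375, 0.481902, 0.357163, 0.454832, 0.466121,
          0.569269, 0.581131, 0.707473, 1.283457, 1.198084, 1.147574]
csv_text = entry_to_csv("my_entry", scores)
print(csv_text)
```

Saving `csv_text` to a file inside `leaderboards_data` should then make the entry visible to the code cell.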

Each reported WRMSSE (the M5 accuracy metric) is expected to be well supported by the entry's own directory (e.g. `phdinds2024_entry`), which details the methodology used to obtain the forecasts. Descriptions of the entries are given in the sections below.
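
For readers unfamiliar with the metric, here is a hedged sketch of how a scaled error of this kind is usually computed: the forecast MSE of a series is divided by the in-sample MSE of the one-step naive forecast, and the per-series values are then combined with weights. The function names are hypothetical, and the official metric has further details (for example how the scaling period and the weights are chosen) that this sketch does not attempt to reproduce.

```python
import math


def rmsse(train, actual, forecast):
    """Root mean squared scaled error of a single series.

    The scale is the mean squared one-step naive error on the training data.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(train) < 2:
        raise ValueError("need at least two training observations")
    scale = sum((b - a) ** 2 for a, b in zip(train, train[1:])) / (len(train) - 1)
    if scale == 0:
        raise ValueError("training series is constant, so the scale is zero")
    mse = sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse / scale)


def wrmsse(rmsse_values, weights):
    """Weighted sum of per-series RMSSE values (weights assumed to sum to 1)."""
    return sum(w * r for w, r in zip(weights, rmsse_values))


# Toy data, not taken from M5.
score = rmsse(train=[1, 3, 2, 4], actual=[3, 5], forecast=[3, 2])
combined = wrmsse([score, 0.0], [0.5, 0.5])
```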

Detailed descriptions of the winningest methods, including their code, are in the following repository:
https://github.com/Mcompetitions/M5-methods

In [34]:
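from glob import glob

import pandas as pd
from IPython.display import display

# Load every leaderboard CSV in the data directory and stack them into one frame.
files = sorted(glob('leaderboards_data/*.csv'))
leaderboards = pd.concat([pd.read_csv(file) for file in files], axis=0)
# Rank the entries by average WRMSSE (lower is better).
display(leaderboards.sort_values(by='AveWRMSSE'))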

| Unnamed: 0 | Entry | Level1 | Level2 | Level3 | Level4 | Level5 | Level6 | Level7 | Level8 | Level9 | Level10 | Level11 | Level12 | AveWRMSSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | YJ_STU | 0.199 | 0.31 | 0.4 | 0.277 | 0.365 | 0.39 | 0.474 | 0.48 | 0.573 | 0.966 | 0.929 | 0.884 | 0.52 |
| 1 | Matthias | 0.186 | 0.294 | 0.416 | 0.246 | 0.349 | 0.381 | 0.481 | 0.497 | 0.594 | 1.023 | 0.964 | 0.907 | 0.528 |
| 2 | mf | 0.236 | 0.319 | 0.421 | 0.308 | 0.397 | 0.405 | 0.496 | 0.505 | 0.6 | 0.95 | 0.917 | 0.875 | 0.536 |
| 3 | monsaraida | 0.254 | 0.34 | 0.418 | 0.302 | 0.377 | 0.411 | 0.483 | 0.49 | 0.579 | 0.963 | 0.928 | 0.886 | 0.536 |
| 4 | Alan Lahoud | 0.213 | 0.324 | 0.414 | 0.272 | 0.361 | 0.416 | 0.494 | 0.503 | 0.595 | 0.995 | 0.95 | 0.897 | 0.536 |
| 0 | phdinds2024 | 0.252662 | 0.358375 | 0.481902 | 0.357163 | 0.454832 | 0.466121 | 0.569269 | 0.581131 | 0.707473 | 1.283457 | 1.198084 | 1.147574 | 0.654837 |
| 5 | ES_bu | 0.426 | 0.514 | 0.58 | 0.478 | 0.577 | 0.577 | 0.654 | 0.643 | 0.728 | 1.012 | 0.969 | 0.915 | 0.671 |
| 8 | ARIMA_td | 0.615 | 0.673 | 0.753 | 0.656 | 0.768 | 0.725 | 0.81 | 0.785 | 0.856 | 1.027 | 0.969 | 0.91 | 0.796 |
| 6 | sNaive | 0.56 | 0.673 | 0.718 | 0.623 | 0.708 | 0.76 | 0.829 | 0.801 | 0.888 | 1.223 | 1.205 | 1.176 | 0.847 |
| 9 | ARIMA_bu | 0.829 | 0.85 | 0.87 | 0.844 | 0.905 | 0.882 | 0.932 | 0.893 | 0.938 | 1.048 | 0.981 | 0.917 | 0.908 |


## Brief Description of PhD in DS Entries

**phdinds2024** A LightGBM model was trained for each *Level 9 (store-department)* series, for a total of 70 models producing the base-level forecasts. Both the input structure and the ML parameters were optimized for each series. Specifically, the lookback window and the delay were optimized, along with the choice of whether to use a lookback on the endogenous variable (unit sales), a lookback on the exogenous variables (calendar, sell prices, promotions, holiday data), and a lookahead on the same exogenous variables. Simultaneously, the LightGBM parameters for the `goss` boosting method (`top_rate`, `other_rate`, `tree_learner`, `n_estimators`, `learning_rate`, and `num_leaves`) were optimized for `rmse` on a `regression` objective. No early stopping was employed. *Reconciliation*: The Level 9 forecasts were then used to obtain Levels 1 to 8 through bottom-up (bu) reconciliation and Levels 10 to 12 through top-down (td) reconciliation based on average proportions. For details, see `phdinds2024_entry/phdinds2024_entry.ipynb`.
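
The two reconciliation directions can be sketched as follows. This is an illustration under stated assumptions, not the code of the entry: the helper names, the item identifiers and all numbers are hypothetical, and "average proportions" is read here as the per-period share of each child series averaged over the history, which may differ from what the notebook does.

```python
from collections import defaultdict


def bottom_up(base_forecasts, key):
    """Sum base-level forecasts into the groups given by key(series_id)."""
    totals = defaultdict(float)
    for series_id, value in base_forecasts.items():
        totals[key(series_id)] += value
    return dict(totals)


def top_down(parent_forecast, child_history):
    """Split a parent forecast by each child's average historical share."""
    children = list(child_history)
    n_periods = len(child_history[children[0]])
    share_sums = {child: 0.0 for child in children}
    used = 0
    for t in range(n_periods):
        total = sum(child_history[c][t] for c in children)
        if total == 0:
            continue  # no sales in this period, so no proportion is defined
        used += 1
        for c in children:
            share_sums[c] += child_history[c][t] / total
    if used == 0:
        raise ValueError("no period with positive total sales")
    return {c: parent_forecast * share_sums[c] / used for c in children}


# One-day forecasts for three store-department series (made-up values).
level9 = {("CA_1", "FOODS_1"): 10.0, ("CA_1", "HOBBIES_1"): 5.0, ("TX_1", "FOODS_1"): 8.0}
by_store = bottom_up(level9, key=lambda sid: sid[0])  # store level
total = sum(level9.values())                          # top level

# Split one store-department forecast across two hypothetical items.
history = {"item_a": [2, 1], "item_b": [2, 3]}
by_item = top_down(level9[("TX_1", "FOODS_1")], history)
```

Bottom-up forecasts are coherent by construction, since every upper level is a sum of the base series. The top-down split preserves the parent total because the averaged shares sum to one.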


*Describe new entries here.*

## Brief Description of M5 Winningest Methods

*Descriptions are lifted from Makridakis et al. (2020).*

**1st:** `0.520` *YJ_STU* (senior undergraduate student in South Korea) LightGBM models were trained to produce forecasts for the product-store series using data per store (10 models), store-category (30 models), and store-department (70 models). Two variations of each were considered, recursive and non-recursive, so a total of 220 models were built. Each series was forecast using the average of 6 models, each exploiting a different learning approach and training set. The models were optimized without early stopping, using the negative log-likelihood of the Tweedie distribution as the objective (Zhou et al., 2020). The method was fine-tuned using the last four 28-day-long windows of available data for CV and by measuring both the mean and the standard deviation of the errors produced by the individual models and their combinations. The features used were calendar-related information, special events, promotions, prices, and unit sales data, both in a recursive and a non-recursive format.
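
To make the Tweedie objective concrete, here is a small sketch of the negative log-likelihood as a function of the predicted mean, with the terms that do not depend on the prediction dropped. The variance power of 1.5 is an illustrative assumption, not the value used by the winning submission.

```python
def tweedie_nll(y, mu, power=1.5):
    """Tweedie negative log-likelihood in mu, up to terms independent of mu.

    Valid for 1 < power < 2 (compound Poisson-gamma case) and mu > 0.
    """
    if not 1.0 < power < 2.0:
        raise ValueError("power must lie strictly between 1 and 2")
    if mu <= 0:
        raise ValueError("mu must be positive")
    return -y * mu ** (1.0 - power) / (1.0 - power) + mu ** (2.0 - power) / (2.0 - power)


# The loss is smallest when the predicted mean equals the observation.
losses = {mu: tweedie_nll(2.0, mu) for mu in (1.5, 2.0, 2.5)}
best_mu = min(losses, key=losses.get)
```

Because the loss stays finite for `y = 0`, it suits series with many zero-sales days, which is common at the product-store level.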

**2nd:** `0.528` *Matthias* LightGBM + N-BEATS. This method was also based on an equally weighted
combination of various LightGBM models, but the forecasts were externally adjusted through multipliers according
to the forecasts produced by N-BEATS, a deep-learning NN for time series forecasting (Oreshkin
et al., 2019), for the top five aggregation levels of the dataset. Essentially, LightGBM models were
first trained per store (10 models), and then five different multipliers were used to adjust their forecasts
and properly capture the trend. In this regard, a total of 50 models were built, and each series at the
product-store level of the dataset was forecast using a combination of five different models. The loss
function used was a custom, asymmetric one. The last four 28-day-long windows of available data
were used for CV and model building. The LightGBM models were trained using only some basic
features about calendar effects and prices (past unit sales were not considered), while the N-BEATS
model was based solely on historical unit sales.
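
The idea of adjusting one base forecast with several multipliers and combining the results with equal weights can be sketched as below. The multiplier values and the function name are hypothetical, and how the submission actually derived its multipliers from N-BEATS is not shown here.

```python
def combine_with_multipliers(base_forecast, multipliers):
    """Scale a base forecast by each multiplier, then average with equal weights."""
    if not multipliers:
        raise ValueError("need at least one multiplier")
    adjusted = [[m * value for value in base_forecast] for m in multipliers]
    # zip(*adjusted) groups the adjusted forecasts day by day.
    return [sum(day) / len(adjusted) for day in zip(*adjusted)]


# Five made-up multipliers applied to a two-day base forecast.
combined = combine_with_multipliers([2.0, 4.0], [0.9, 0.95, 1.0, 1.05, 1.1])
```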

**3rd:** `0.536` *mf* LSTM. This method involved an equally weighted
combination of 43 deep-learning NNs (Salinas et al., 2020), each consisting of multiple LSTM layers
that were used to recursively predict the product-store series. Of the models trained, 24 considered
dropout, while the remaining 19 did not. Note that these models originated from just 12 models
and corresponded to the last, more accurate instances observed for these models while training, as
specified through CV (last fourteen 28-day-long windows of available data). Similar to the winner,
the method considered Tweedie regression, but modified to optimize weights based on
sampled predictions instead of actual values. The Adam optimizer was used, with cosine annealing
for the learning rate schedule. The NNs considered a total of 100 features of similar nature to those of
the winning submission (sales data, calendar-related information, prices, promotions, special events,
identifiers, and zero-sales periods).
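
As an illustration of a cosine-annealed learning rate, the sketch below implements the standard single-cycle formula. The step counts and learning rates are made up, and whether the submission used restarts or a warm-up phase is not stated in the description above.

```python
import math


def cosine_annealing(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate at `step` on one cosine cycle from lr_max down to lr_min."""
    if total_steps <= 0 or not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    cosine = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


# Hypothetical schedule over 100 steps.
schedule = [cosine_annealing(s, 100, lr_max=1e-3, lr_min=1e-5) for s in range(101)]
```

The rate starts at `lr_max`, decreases slowly at first, fastest in the middle of the cycle, and flattens out again as it approaches `lr_min`.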


**4th:** `0.536` *monsaraida* Weekly forecasting of the horizon (28 days chopped into weeks) using LGBM trained on store-level data. This method produced forecasts for the
product-store series of the dataset using non-recursive LightGBM models, trained per store (10 models).
However, in contrast to the rest of the methods, each week of the forecasting horizon was forecast
separately using a different model (4 models per store). Thus, a total of 40 models were built to
produce the forecasts. The features used as inputs were similar to those of the winning submission,
with the exception of the recursive ones. Tweedie regression was considered for training the models,
with no early stopping, and no optimization was performed in terms of training parameters. The last
five 28-day-long windows of available data were used for CV.
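
The model layout of this method (one model per store and per week of the horizon) can be sketched as a simple lookup. The store names are placeholders, and the function only shows which model would be responsible for a given forecast day.

```python
HORIZON = 28  # forecast days
WEEK = 7      # days covered by each model

# Placeholder store names; the dataset has 10 stores.
STORES = [f"store_{i}" for i in range(1, 11)]


def week_of(day):
    """Map a horizon day (1..28) to the week (1..4) whose model forecasts it."""
    if not 1 <= day <= HORIZON:
        raise ValueError("day must lie in 1..28")
    return (day - 1) // WEEK + 1


# One (store, week) key per model: 10 stores x 4 weeks.
model_keys = [(store, week) for store in STORES for week in range(1, HORIZON // WEEK + 1)]
```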


**5th:** `0.536` *Alan Lahoud* This method considered recursive LightGBM models,
trained per department (7 models). After producing the forecasts for the product-store series, these
were externally adjusted so that the mean of each of the series at the store-department level was
the same as the one of the previous 28 days. This was done using appropriate multipliers. The models
were trained using Poisson regression with early stopping and validated using a random sample of 500
days. The features used as input were similar to those of the winning submission.
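
The level adjustment can be sketched for a single store-department series as follows. The function names and numbers are hypothetical, and the sketch ignores how the multiplier would be propagated to the underlying product-store forecasts.

```python
def level_multiplier(recent_actuals, forecasts):
    """Multiplier that makes the forecast mean equal the mean of recent actuals."""
    mean_recent = sum(recent_actuals) / len(recent_actuals)
    mean_forecast = sum(forecasts) / len(forecasts)
    if mean_forecast == 0:
        raise ValueError("cannot rescale forecasts whose mean is zero")
    return mean_recent / mean_forecast


def adjust(forecasts, multiplier):
    """Apply the multiplier to every forecast value."""
    return [multiplier * value for value in forecasts]


# Made-up example: the last 28 days averaged 10 units, the raw forecasts 8 units.
recent = [10.0] * 28
raw = [8.0] * 28
m = level_multiplier(recent, raw)
adjusted = adjust(raw, m)
```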

## Brief Description of M5 Benchmarks

*Descriptions are lifted from Makridakis et al. (2020).*

**ES_bu** An algorithm is used to automatically
select the most appropriate exponential smoothing model for forecasting the product-store series of the dataset (level 12),
indicated through information criteria (Hyndman et al., 2002). Then, the rest of
the series (levels 1-11) are predicted using the bottom-up method.

**ARIMA_td** An algorithm is used to automatically select the most appropriate ARIMA model for predicting total
sales (level 1), indicated through information criteria (Hyndman & Khandakar, 2008). Then, the rest
of the series (levels 2-12) are predicted using the top-down method.

**ARIMA_bu** The same algorithm used in ARIMA_td is employed for forecasting the product-store series of the
dataset (level 12). Then, the rest of the series (levels 1-11) are predicted using the bottom-up method.

**sNaive** The forecasts at time t are equal to the last known observation of the
same period, $t - m$, as follows:
$$
\hat{y}_t = y_{t-m}
$$

where m is the frequency of the series. In M5, m is set equal to 7 since the series are daily. Contrary
to the Naive method, sNaive can capture possible seasonal variations. Although sales do not usually
display strong seasonality at low cross-sectional levels, this is very likely at higher aggregation levels.
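
A minimal sketch of sNaive for a 28-day horizon with $m = 7$. Beyond the first week, $y_{t-m}$ is itself a forecast, so the last observed week simply repeats. The function name and the toy series are illustrative.

```python
def snaive(history, horizon, m=7):
    """Seasonal naive forecast: repeat the last m observations over the horizon."""
    if len(history) < m:
        raise ValueError("need at least one full season of history")
    last_season = history[-m:]
    return [last_season[h % m] for h in range(horizon)]


# Two weeks of made-up daily sales; the forecast repeats the second week.
history = list(range(1, 15))
forecast = snaive(history, horizon=28)
```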

**MA** Moving averages are often used in practice to forecast sales (Syntetos &
Boylan, 2005). Forecasts are computed by averaging the last k observations of the series as follows:

$$
\hat{y}_t = \frac{\sum_{i=1}^{k} y_{t-i}}{k}
$$

The order of the MA ranges between 2 and 14 and is specified by minimizing the in-sample MSE of
the method.
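
The order selection can be sketched as below. How the in-sample errors are aligned across different values of $k$ is an assumption here (each $k$ is scored on every point for which it has $k$ preceding observations), and the function names are hypothetical.

```python
def in_sample_mse(series, k):
    """MSE of one-step-ahead moving-average forecasts of order k."""
    errors = [
        (series[t] - sum(series[t - k:t]) / k) ** 2
        for t in range(k, len(series))
    ]
    return sum(errors) / len(errors)


def best_order(series, orders=range(2, 15)):
    """Order between 2 and 14 with the lowest in-sample MSE (first one on ties)."""
    candidates = [k for k in orders if k < len(series)]
    if not candidates:
        raise ValueError("series is too short for the requested orders")
    return min(candidates, key=lambda k: in_sample_mse(series, k))


def ma_forecast(series, k):
    """Forecast the next value as the mean of the last k observations."""
    return sum(series[-k:]) / k


# Made-up alternating series: an even window always averages to 2.
series = [1, 3] * 10
k = best_order(series)
next_value = ma_forecast(series, k)
```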

### References

Hyndman, R. J., Koehler, A. B., Snyder, R. D., & Grose, S. (2002). A state space framework for automatic forecasting using
exponential smoothing methods. *International Journal of Forecasting*, 18, 439-454.

Hyndman, R., & Khandakar, Y. (2008). Automatic time series forecasting: the forecast package for R. *Journal of Statistical
Software*, 26, 1-22.

Syntetos, A. A., Boylan, J. E., & Croston, J. D. (2005). On the categorization of demand patterns. *Journal of the Operational
Research Society*, 56, 495-503.

Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2020). The M5 Accuracy Competition: Results, findings, and conclusions. *International Journal of Forecasting*.

Makridakis, S., & Spiliotis, E. (2021). The M5 Competition and the Future of Human Expertise in Forecasting. *Foresight: The International Journal of Applied Forecasting*, (60).
