# M5 using mlforecast

[mlforecast](https://nixtla.github.io/mlforecast/) is a framework to perform time series forecasting using machine learning models. It abstracts away most of the details and tries to mimic the scikit-learn API.

This notebook is inspired by https://www.kaggle.com/kneroma/m5-first-public-notebook-under-0-50.

In [None]:
%pip install git+https://github.com/Nixtla/mlforecast.git

## Libraries

In [None]:
from pathlib import Path

import lightgbm as lgb
import numpy as np
import pandas as pd
from mlforecast.core import TimeSeries
from mlforecast.forecast import Forecast
from window_ops.rolling import rolling_mean

## Data loading

In [None]:
input_path = Path('../input/m5-preprocess/processed/')

data = pd.read_parquet(input_path / 'sales.parquet')
data

These are all the sales in the dataset. However, due to memory limitations we'll keep only the data from the 350th day onwards.

In [None]:
dates = sorted(data['date'].unique())
data = data[data['date'] >= dates[349]]
data.shape

mlforecast requires a dataframe with an index named **unique_id** which identifies each time series, a column **ds** containing the datestamps and a column **y** with the series values.
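
As a rough illustration of that layout, the toy frame below has the expected index and columns. The ids and values are made up for the example and are not M5 data.

```python
import pandas as pd

# Toy data: two hypothetical series with three daily observations each.
toy = pd.DataFrame(
    {
        'unique_id': ['series_a'] * 3 + ['series_b'] * 3,
        'ds': list(pd.date_range('2020-01-01', periods=3, freq='D')) * 2,
        'y': [1.0, 0.0, 2.0, 5.0, 3.0, 4.0],
    }
).set_index('unique_id')  # unique_id is the index, ds and y stay as columns

print(toy)
```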

In [None]:
data = data.rename(columns={'id': 'unique_id', 'date': 'ds'})
data = data.set_index('unique_id')
data

We also load the metadata used when computing the predictions: the prices and the calendar.

In [None]:
prices = pd.read_parquet(input_path / 'prices.parquet')
prices

In [None]:
cal = pd.read_parquet(input_path / 'calendar.parquet')
cal = cal.rename(columns={'date': 'ds'})
cal.head()

## Forecast setup

There are two inputs needed: a regressor that follows the scikit-learn API and a time series object which defines the features to be computed.

### Model

In [None]:
lgb_params = {
    'objective': 'poisson',
    'metric': 'rmse',
    'learning_rate': 0.075,
    'bagging_freq': 1,
    'bagging_fraction': 0.75,
    'lambda_l2': 0.1,
    'n_estimators': 1200,
    'num_leaves': 128,
    'min_data_in_leaf': 100,
}

model = lgb.LGBMRegressor(**lgb_params)
model

### TimeSeries
This is where we define the features. A brief description of each argument:

* **freq**: frequency of our time series. This is a pandas abbreviation and is used to get the next dates when computing the predictions.
* **lags**: lags that we want to use as features.
* **lag_transforms**: dictionary where the keys are the lags that we want to use and the values are a list of transformations to apply to them. The transformations are defined as `numba` jitted functions. If the function takes more arguments than the input array, these are passed as a tuple `(func, arg1, arg2, ...)`.
* **date_features**: date attributes to use for training. These are computed from the `ds` column and are updated in each timestep.
* **num_threads**: number of threads to use in preprocessing and updates, defaults to all CPUs. Since the transformations are `numba` jitted functions, we can use multithreading to compute our features.
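
To make the difference between **lags** and **lag_transforms** concrete, here is a minimal sketch written with plain pandas rather than mlforecast. It assumes a lag transform is applied to the series after shifting it by the lag, and that no value is produced until the window is full. The column names are illustrative only and are not the names mlforecast generates.

```python
import numpy as np
import pandas as pd

# Toy series: y[t] = t for 20 consecutive days.
y = pd.Series(np.arange(20, dtype=float))

# Plain lag feature: the value observed 7 steps earlier.
lag7 = y.shift(7)

# Lag transform (lag 7, rolling mean with window 7), written with pandas:
# shift by the lag first, then take the rolling mean of the shifted values.
rolling_mean_lag7_w7 = y.shift(7).rolling(window=7).mean()

features = pd.DataFrame(
    {'y': y, 'lag7': lag7, 'rolling_mean_lag7_w7': rolling_mean_lag7_w7}
)

# The first complete row is at position 13 = 7 (lag) + 6 (window - 1).
print(features.iloc[12:15])
```

Every row before position 13 has at least one null feature, which is why so many rows are dropped when nulls are removed after building the features.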

In [None]:
ts = TimeSeries(
    freq='D',
    lags=[7, 28],
    lag_transforms={
        7:  [(rolling_mean, 7), (rolling_mean, 28)],
        28: [(rolling_mean, 7), (rolling_mean, 28)],
    },
    date_features=['year', 'month', 'day', 'dayofweek', 'quarter', 'week'],
    num_threads=4,
)
ts

### Define forecaster
Once we have our model and time series configuration set up, we instantiate a `Forecast` object with them.

In [None]:
fcst = Forecast(model, ts)

## Training

If we only want to preprocess our data and train on all of it, we can just call `Forecast.fit`. In this case we're going to make a train-valid split to get some information on the training, so we call `Forecast.preprocess` instead. This returns the dataframe with all our features and (internally) saves the information needed for the forecasting step. `Forecast.preprocess` takes the following additional arguments:

* **dropna**: whether or not to drop rows with null values after building all the features. Using lags and transformations on the lags generates many rows with `np.nan`s; this flag indicates whether we want to drop them when we're done.
* **keep_last_n**: keep only the last `n` samples from each time series after computing the features. The updates are performed by applying the transformations to the series again and taking only the last value, so this can save memory if you have very long series and your transformations only use a small window. That's the case here, where the series have thousands of data points and our transformations require only 28 (lag) + 27 (window) samples.
* **static_features**: define which features are static. By default all extra columns (other than **ds** and **y**) are considered static and are replicated when building the features for the next timestep. Setting this overrides that behavior and repeats only the ones defined here.
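
The 28 + 27 figure can be checked with a small pandas sketch. This is not mlforecast code: `next_step_feature` is a hypothetical helper that computes the largest feature used here (rolling mean of window 28 over lag 28) for the step right after a series ends, under the assumption that the transform is applied to the shifted series.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
y = pd.Series(rng.poisson(3, size=500).astype(float))  # toy series

lag, window = 28, 28
needed = lag + window - 1  # 28 (lag) + 27 (window) = 55 samples


def next_step_feature(series):
    """Rolling mean (window 28) of lag 28 for the step after `series` ends."""
    # Append a placeholder for the step we want to forecast.
    extended = pd.concat([series, pd.Series([np.nan])], ignore_index=True)
    return extended.shift(lag).rolling(window).mean().iloc[-1]


full = next_step_feature(y)                     # uses the whole history
tail = next_step_feature(y.iloc[-needed:])      # uses only the last 55 samples
short = next_step_feature(y.iloc[-(needed - 1):])  # one sample too few

print(full, tail, short)
```

With 55 samples the feature matches the one computed from the full history, while with 54 the window is incomplete and the result is null.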

In [None]:
%%time
features_df = fcst.preprocess(
    data,
    dropna=True,
    keep_last_n=28+27,
    static_features=['item_id', 'dept_id', 'cat_id', 'store_id', 'state_id'],
)
del data

In [None]:
features_df.columns

The names of the transformations are built using the name of the function and its arguments.

The order is always:
1. Static features
2. Lags
3. Lag transforms
4. Date features

Perform a train-valid split with 95% on train and 5% on valid.

In [None]:
np.random.seed(11)
train_mask = np.random.rand(features_df.shape[0]) < 0.95
train, valid = features_df[train_mask], features_df[~train_mask]
X_train, y_train = train.drop(columns=['ds', 'y']), train.y
X_valid, y_valid = valid.drop(columns=['ds', 'y']), valid.y
del features_df, train, valid

Calling `Forecast.fit` performs the preprocessing step as well. If we've already done that we just call `Forecast.model.fit` instead.

In [None]:
%time fcst.model.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_valid, y_valid)], verbose=20)

## Predictions

By default, the predictions are computed by repeating the static features and updating the transformations and the date features. If you want to do something different you can define your own predict function as explained [here](https://nixtla.github.io/mlforecast/forecast.html#Custom-predictions).

In [None]:
def my_predict_fn(
    model,
    new_x,
    dynamic_dfs,
    features_order,
    alpha,
) -> np.ndarray:
    for df in dynamic_dfs:
        new_x = new_x.merge(df, how='left')
    predictions = model.predict(new_x[features_order])
    return alpha * predictions
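
To see what this function does in isolation, here is a minimal sketch that calls it on toy inputs. Everything other than `my_predict_fn` itself is an assumption made for illustration: `ConstantModel` is a hypothetical stand-in for the fitted regressor, and the toy frames only imitate the kind of columns found in the calendar and prices data.

```python
import numpy as np
import pandas as pd


class ConstantModel:
    """Hypothetical stand-in for a fitted regressor: always predicts 2.0."""

    def predict(self, X):
        return np.full(len(X), 2.0)


def my_predict_fn(model, new_x, dynamic_dfs, features_order, alpha):
    # Same logic as above: merge each dynamic frame on its shared columns,
    # predict with the features in training order and scale by alpha.
    for df in dynamic_dfs:
        new_x = new_x.merge(df, how='left')
    predictions = model.predict(new_x[features_order])
    return alpha * predictions


# Features for one future timestep of two made-up series.
new_x = pd.DataFrame({
    'item_id': ['item_1', 'item_2'],
    'ds': pd.to_datetime(['2020-01-01', '2020-01-01']),
    'lag7': [1.0, 4.0],
})
# Toy dynamic frames standing in for the calendar and the prices.
toy_cal = pd.DataFrame({'ds': pd.to_datetime(['2020-01-01']), 'is_event': [1]})
toy_prices = pd.DataFrame({
    'item_id': ['item_1', 'item_2'],
    'ds': pd.to_datetime(['2020-01-01', '2020-01-01']),
    'sell_price': [9.99, 4.5],
})

out = my_predict_fn(
    ConstantModel(),
    new_x,
    [toy_cal, toy_prices],
    features_order=['lag7', 'is_event', 'sell_price'],
    alpha=1.5,
)
print(out)  # [3. 3.]
```

Since `merge` is called without `on`, each frame is joined on whichever columns it shares with the features: `ds` for the calendar, and `item_id` plus `ds` for the toy prices.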

Calling `Forecast.predict(horizon)` computes the predictions for the next `horizon` steps. We can also provide a custom `predict_fn` like we do in this case, using `my_predict_fn` defined above. This step uses multithreading if `num_threads` was set to a value greater than 1, or if it was left unset and you have more than one CPU (here we have 4).

In [None]:
fcst.ts.num_threads

In [None]:
%%time
alphas = [1.028, 1.023, 1.018]
preds = None
for alpha in alphas:
    alpha_preds = fcst.predict(28, dynamic_dfs=[cal, prices], predict_fn=my_predict_fn, alpha=alpha)
    alpha_preds = alpha_preds.set_index('ds', append=True)
    if preds is None:
        preds = 1 / 3 * alpha_preds
    else:
        preds += 1 / 3 * alpha_preds
preds

## Submission

In [None]:
wide = preds.reset_index().pivot_table(index='unique_id', columns='ds')
wide.columns = [f'F{i+1}' for i in range(28)]
wide.columns.name = None
wide.index.name = 'id'
wide

In [None]:
sample_sub = pd.read_csv(
    '../input/m5-forecasting-accuracy/sample_submission.csv', index_col='id'
)
sample_sub.update(wide)
np.testing.assert_allclose(sample_sub.sum().sum(), preds['y_pred'].sum())
sample_sub.to_csv('submission.csv')