#### Notebook with the functions of the different forecasting models

This notebook contains the functions that define and train forecasting models using the available algorithms,
which are:

  * SARIMAX
  * Prophet
  * LightGBM

The functions included are:

| Function | Description |
| -------- | ----------- |
| `obtain_sarimax`  | defines and trains a SARIMAX model using the given hyperparameters and input time series |
| `obtain_prophet`  | defines and trains a Prophet model using the given hyperparameters and input time series |
| `obtain_lightgbm`  | TBD |
| `refit_generate_forecast`  | re-fits a model using all the available data before the back-testing set. Then the forecast for the back-testing window is generated using this newly retrained model |

###### Definition of functions

In [0]:
# Imports assumed by the functions below; wape() is expected to be defined elsewhere in the project
import pandas as pd
from prophet import Prophet
from statsmodels.tsa.arima.model import ARIMA

def obtain_sarimax(df_train, params, holidays=False):
    """
    Defines and trains a SARIMAX model according to the specified parameters and input time series.

    If "holidays" is set to True, "df_train" is assumed to include a binary column named "holiday" that flags the
    observations corresponding to holidays; this column is used as an external regressor.

    Parameters
    __________
        df_train (pd.DataFrame): Dataset with training time series.
        params (dict): Dictionary with seasonal and non-seasonal order parameters of the model.
        holidays (bool, defaults to False): Flag to indicate whether the dataset contains the holidays regressor or not.

    Returns
    _______
        model (ARIMAResultsWrapper): Object with the SARIMAX model defined by "params" and trained with the given time
            series.
    """
    # Selecting the holidays regressor, if requested
    if holidays:
        exog = df_train["holiday"]
    else:
        exog = None

    # Defining and training SARIMAX model
    model = ARIMA(
        df_train["y"],
        exog=exog,
        order=(params["p"], params["d"], params["q"]),
        seasonal_order=(params["P"], params["D"], params["Q"], params["s"]),
        enforce_invertibility=False,
        enforce_stationarity=False
    ).fit()

    return model
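
The `params` dictionary passed to `obtain_sarimax` is read with the keys `p`, `d`, `q`, `P`, `D`, `Q` and `s`. As a minimal sketch, the helper below (`sarimax_orders`, a hypothetical name that is not part of this notebook) shows how such a dictionary maps onto the non-seasonal and seasonal order tuples. The example values are illustrative assumptions, not tuned hyperparameters.

```python
def sarimax_orders(params):
    """Split a flat hyperparameter dict into (order, seasonal_order) tuples."""
    order = (params["p"], params["d"], params["q"])
    seasonal_order = (params["P"], params["D"], params["Q"], params["s"])
    return order, seasonal_order


# Hypothetical hyperparameters: weekly seasonality (s=7) on a daily series
example_params = {"p": 1, "d": 1, "q": 1, "P": 1, "D": 0, "Q": 1, "s": 7}

order, seasonal_order = sarimax_orders(example_params)
print(order)           # (1, 1, 1)
print(seasonal_order)  # (1, 0, 1, 7)
```

A dictionary missing any of the seven keys would raise a `KeyError`, both in this sketch and in `obtain_sarimax`.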

In [0]:
def obtain_prophet(df_train, params, df_holidays=None):
    """
    Defines and trains a Prophet model according to the specified parameters and input time series.

    If "df_holidays" is not None, it is assumed to contain a DataFrame with the holidays in the format Prophet expects.

    Note that Prophet requires the date column to be labeled "ds" and the series column to be labeled "y". The holidays
    DataFrame follows the same convention for its date column ("ds") and must also include a column named "holiday"
    with the name of each holiday.

    Parameters
    __________
        df_train (pd.DataFrame): Dataset with training time series.
        params (dict): Dictionary with the hyperparameters of the model.
        df_holidays (pd.DataFrame, defaults to None): Dataset with holidays.

    Returns
    _______
        model (prophet.forecaster.Prophet): Object with the Prophet model defined by "params" and trained with the given
            time series.
    """
    # Defining and training Prophet model
    model = Prophet(
        changepoint_range=0.9,
        daily_seasonality=False,
        holidays=df_holidays,
        changepoint_prior_scale=params["changepoint_prior_scale"],
        seasonality_prior_scale=params["seasonality_prior_scale"],
        holidays_prior_scale=params["holidays_prior_scale"],
    ).fit(df_train)

    return model
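
The inputs that `obtain_prophet` expects can be sketched without fitting anything. The block below lists the three hyperparameter keys the function reads and the two columns required in the holidays table. The holiday names and dates, and the helper `missing_keys`, are hypothetical and only serve as an illustration. The hyperparameter values shown are Prophet's defaults.

```python
from datetime import date

# Hyperparameter keys read by obtain_prophet (values are Prophet's defaults)
example_params = {
    "changepoint_prior_scale": 0.05,
    "seasonality_prior_scale": 10.0,
    "holidays_prior_scale": 10.0,
}

# Hypothetical holiday records, one row per holiday occurrence
holiday_records = [
    {"holiday": "new_year", "ds": date(2023, 1, 1)},
    {"holiday": "christmas", "ds": date(2023, 12, 25)},
]

REQUIRED_PARAM_KEYS = {
    "changepoint_prior_scale",
    "seasonality_prior_scale",
    "holidays_prior_scale",
}
REQUIRED_HOLIDAY_KEYS = {"holiday", "ds"}


def missing_keys(record, required):
    """Return the required keys that are absent from a dict, sorted by name."""
    return sorted(set(required) - set(record))


param_gaps = missing_keys(example_params, REQUIRED_PARAM_KEYS)
holiday_gaps = [missing_keys(row, REQUIRED_HOLIDAY_KEYS) for row in holiday_records]
print(param_gaps)    # []
print(holiday_gaps)  # [[], []]
```

Passing `holiday_records` to `pd.DataFrame` would produce a table with the `holiday` and `ds` columns, which could then be supplied as `df_holidays`.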

In [0]:
def refit_generate_forecast(algorithm, params, df_trainval, df_test, holidays=False, df_holidays=None):
    """
    Re-fits a model using all the available data before the back-testing set. Then the forecast for the back-testing
    window is generated using this newly retrained model.

    Given that models built with native time series algorithms such as Prophet and SARIMAX can only be trained once, the
    re-fitting happens by creating a new object using the already known hyperparameters and then training this object
    with the new data.

    The back-testing forecasts generated with the re-trained object are compared against the actual values of the test
    set to measure the final performance of the model.

    Parameters
    __________
        algorithm (str): Algorithm used by the model to re-train.
        params (dict): Dictionary with the hyperparameters of the model to re-train.
        df_trainval (pd.DataFrame): Dataset with data to re-train the model.
        df_test (pd.DataFrame): Back-testing dataset.
        holidays (bool, defaults to False): Flag to indicate whether the holidays are included or not.
        df_holidays (pd.DataFrame, defaults to None): Dataset with holidays.

    Returns
    _______
        df_output (pd.DataFrame): Table with the forecast for the back-testing window.
        test_wape (float): Back-test WAPE.
    """
    # Selecting the holiday inputs required by the chosen algorithm (both default to None)
    exog = None
    df_hlds = None
    if holidays and (algorithm == "sarimax"):
        exog = df_test["holiday"].values
    elif holidays and (algorithm == "prophet"):
        df_hlds = df_holidays.copy()

    # Validating the algorithm
    if algorithm == "sarimax":
        # Re-fitting the model
        model = obtain_sarimax(df_trainval, params, holidays=holidays)

        # Generating forecast for the back-test set
        test_fcsts = model.predict(
            start=len(df_trainval),
            end=len(df_trainval) + len(df_test) - 1,
            exog=exog,
            typ="levels"
        )
        fcst = test_fcsts.values

        # Calculating performance metric
        test_wape = wape(df_test["y"].values, test_fcsts.values)

    elif algorithm == "prophet":
        # Re-fitting the model
        model = obtain_prophet(df_trainval, params, df_holidays=df_hlds)

        # Generating forecast for the back-test set
        test_fcsts = model.predict(df_test)[["ds", "yhat"]]
        fcst = test_fcsts["yhat"].values

        # Calculating performance metric
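        test_wape = wape(df_test["y"].values, test_fcsts["yhat"].values)

    else:
        # Failing early avoids a NameError on "fcst" when the algorithm is not supported
        raise ValueError(f"Unsupported algorithm: {algorithm!r}")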

    # Creating output table
    df_output = pd.DataFrame({"algorithm": algorithm, "ds": df_test["ds"].values, "fcst": fcst})

    return df_output, test_wape
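
`refit_generate_forecast` relies on a `wape` function that is not defined in this notebook. Assuming it follows the common definition of the weighted absolute percentage error (the sum of absolute errors divided by the sum of absolute actual values), a minimal sketch could look like the following. The name `wape_sketch` is hypothetical, and the project's own implementation may differ, for example by returning a percentage instead of a ratio.

```python
def wape_sketch(y_true, y_pred):
    """Weighted absolute percentage error as a ratio: sum|y - yhat| / sum|y|."""
    y_true = list(y_true)
    y_pred = list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    total_actual = sum(abs(actual) for actual in y_true)
    if total_actual == 0:
        raise ValueError("WAPE is undefined when all actual values are zero")

    total_error = sum(abs(actual - fcst) for actual, fcst in zip(y_true, y_pred))
    return total_error / total_actual


# Illustrative values only: absolute errors are 10 + 10 + 30 = 50, actuals sum to 600
actuals = [100.0, 200.0, 300.0]
forecasts = [110.0, 190.0, 330.0]
example_wape = wape_sketch(actuals, forecasts)
print(round(example_wape, 4))  # 0.0833
```

Because the errors are weighted by the total volume of the actual series, periods with small actual values do not dominate the metric the way they would in a plain mean absolute percentage error.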
