# Direct forecasting

[Forecasting with Machine Learning - Course](https://www.trainindata.com/p/forecasting-with-machine-learning)

Load the retail sales data set located in Facebook's Prophet GitHub repository and use **direct forecasting** to predict future sales.

- We want to forecast sales over the next 3 months. 
- Sales are recorded monthly. 
- We assume that we have all the data up to the month before the first point in the forecasting horizon.

We will forecast using scikit-learn in this exercise.

Follow the guidelines below to accomplish this assignment.

## Import required classes and functions

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.linear_model import Lasso
from sklearn.metrics import root_mean_squared_error
from sklearn.multioutput import MultiOutputRegressor
from sklearn.preprocessing import MinMaxScaler

from feature_engine.datetime import DatetimeFeatures
from feature_engine.imputation import DropMissingData
from feature_engine.timeseries.forecasting import (
    LagFeatures,
    WindowFeatures,
)
from feature_engine.pipeline import Pipeline

## Load data

In [2]:
url = "https://raw.githubusercontent.com/facebook/prophet/master/examples/example_retail_sales.csv"
df = pd.read_csv(url)
df.to_csv("example_retail_sales.csv", index=False)

df = pd.read_csv(
    "example_retail_sales.csv",
    parse_dates=["ds"],
    index_col=["ds"],
    nrows=160,
)

df = df.asfreq("MS")

df.head()

ds,y
1992-01-01,146376
1992-02-01,147079
1992-03-01,159336
1992-04-01,163669
1992-05-01,170068


## Create the target variable

In direct forecasting, we train one model per step of the forecasting horizon. Hence, we need to create one target per step.

In [3]:
# The forecasting horizon.
horizon = 3

# Create an empty dataframe for the targets.
y = pd.DataFrame(index=df.index)

# Add each one of the steps in the horizon.
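for h in range(horizon):
    # Shifting the index back by h months labels the value at t + h with t.
    y[f"h_{h}"] = df["y"].shift(periods=-h, freq="MS")

# Display the targets (the last rows contain NaN).
y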
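To make the shape of the targets concrete, here is a minimal sketch on a made-up five-month series (the values and the name `toy` are illustrative, not taken from the sales data). It uses a positional `shift(-h)`, which should give the same alignment as the frequency-based shift on a gap-free monthly index.

```python
import pandas as pd

# Made-up monthly series, used only to show the layout of the targets.
toy = pd.DataFrame(
    {"y": [10, 20, 30, 40, 50]},
    index=pd.date_range("2000-01-01", periods=5, freq="MS"),
)

horizon = 3
targets = pd.DataFrame(index=toy.index)
for h in range(horizon):
    # Row t of column h_{h} holds the value observed at t + h.
    targets[f"h_{h}"] = toy["y"].shift(-h)

# The last horizon - 1 rows have incomplete targets and are dropped.
complete = targets.dropna()
print(complete)
```

Each row is one forecast origin, and each column is the target for one of the three models.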

In [4]:
# Remove nan from target

y.dropna(inplace=True)

# align data to available target values

df = df.loc[y.index]

df.tail()

ds,y
2004-10-01,319726
2004-11-01,324259
2004-12-01,387155
2005-01-01,293261
2005-02-01,295062


## Split data into train and test

Leave data from 2004 onwards in the test set.
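One possible way to do the split is sketched below on a synthetic stand-in series, so the names `df_demo` and `max_lag` are assumptions. Keeping the last `max_lag` months of history in the test input is also an assumption: it lets lag features be computed for the first test month.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with the same monthly index as the aligned sales data.
index = pd.date_range("1992-01-01", "2005-02-01", freq="MS")
df_demo = pd.DataFrame({"y": np.arange(len(index), dtype=float)}, index=index)

split_date = pd.Timestamp("2004-01-01")
max_lag = 12  # hypothetical: the longest lag used later as a feature

# Training data: everything before 2004.
X_train = df_demo.loc[df_demo.index < split_date]

# Test input: 2004 onwards, plus enough history to build the lags.
X_test = df_demo.loc[df_demo.index >= split_date - pd.DateOffset(months=max_lag)]

print(X_train.index.max(), X_test.index.min())
```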

## Set up regression model

We will use Lasso in this assignment.
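A minimal sketch of the model set-up follows. `MultiOutputRegressor` fits one clone of the estimator per target column, which matches the one-model-per-step idea of direct forecasting. The hyperparameters (`alpha`, `max_iter`) and the random demo arrays are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.multioutput import MultiOutputRegressor

# One Lasso per forecasting step; alpha is an arbitrary placeholder value.
model = MultiOutputRegressor(Lasso(alpha=1.0, max_iter=10_000, random_state=0))

# Random data with 4 features and 3 targets, only to show the mechanics.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
Y_demo = rng.normal(size=(50, 3))

model.fit(X_demo, Y_demo)
n_models = len(model.estimators_)
print(n_models)
```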

## Set up a feature engineering pipeline

Set up transformers from feature-engine and / or scikit-learn in a pipeline and test it to make sure the input feature table is the one you need for the forecasts.

We will use feature-engine because we are great fans of the library.

If you prefer pandas, as long as the input feature table is the one you expect, that is also a suitable alternative.
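As an illustration of the pandas route, the sketch below builds lag, window and month features with a hypothetical helper, `make_features`. The choice of lags (1, 2, 3, 12) and a 3-month window is an assumption, and the data is synthetic. Every feature is shifted by at least one month so that only past values are used.

```python
import numpy as np
import pandas as pd


def make_features(data, lags=(1, 2, 3, 12), window=3):
    """Build a feature table from past values of the column `y` (hypothetical helper)."""
    out = pd.DataFrame(index=data.index)
    for lag in lags:
        out[f"y_lag_{lag}"] = data["y"].shift(lag)
    # Rolling mean over the previous `window` months, excluding the current one.
    out[f"y_window_{window}_mean"] = data["y"].shift(1).rolling(window).mean()
    out["month"] = data.index.month
    # Drop the initial rows for which the lags are not available.
    return out.dropna()


# Synthetic stand-in for the sales data.
index = pd.date_range("1992-01-01", "2005-02-01", freq="MS")
df_demo = pd.DataFrame({"y": np.arange(len(index), dtype=float)}, index=index)

features = make_features(df_demo)
print(features.head())
```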

## Test pipeline over test set

Ensure that the returned input feature table is suitable to forecast from `2004-01-01` onwards.
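A quick check of this kind could look as follows. The sketch assumes the test input starts 12 months before `2004-01-01` and that the longest lag is 12. Both are assumptions, and the data is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic test input that includes 12 months of history before 2004.
index = pd.date_range("2003-01-01", "2005-02-01", freq="MS")
X_test_demo = pd.DataFrame({"y": np.arange(len(index), dtype=float)}, index=index)

# Two illustrative lag features; rows without a full history are dropped.
features = pd.DataFrame(
    {
        "y_lag_1": X_test_demo["y"].shift(1),
        "y_lag_12": X_test_demo["y"].shift(12),
    }
).dropna()

# The first row with complete features should be the first test month.
first_forecast_point = features.index.min()
print(first_forecast_point)
```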

## Train a direct forecaster

Now that we know that the pipeline works, we can train the forecaster.

You can take the feature table and target returned up to here to train the Lasso. 

Or, as we will do, you can add the Lasso within the pipeline.
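The sketch below shows one way the pieces could fit together. It is not the course solution: it uses a synthetic series, pandas lag features instead of feature-engine transformers, and scikit-learn's own `Pipeline`. The lags and the Lasso hyperparameters are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic monthly series (trend plus yearly seasonality).
index = pd.date_range("1992-01-01", "2005-04-01", freq="MS")
t = np.arange(len(index))
sales = pd.Series(150_000 + 1_000 * t + 20_000 * np.sin(2 * np.pi * t / 12), index=index)

# One target column per step of the horizon, and lag features as inputs.
horizon = 3
Y = pd.concat({f"h_{h}": sales.shift(-h) for h in range(horizon)}, axis=1).dropna()
X = pd.concat({f"lag_{lag}": sales.shift(lag) for lag in (1, 2, 3, 12)}, axis=1).dropna()

# Keep only the timestamps with both complete features and complete targets.
common = X.index.intersection(Y.index)
X, Y = X.loc[common], Y.loc[common]

# Train on data before 2004.
train = X.index < pd.Timestamp("2004-01-01")

forecaster = Pipeline(
    [
        ("scaler", MinMaxScaler()),
        ("lasso", MultiOutputRegressor(Lasso(alpha=1.0, max_iter=100_000))),
    ]
)
forecaster.fit(X[train], Y[train])

# One row per forecast origin in the test set, one column per step.
preds = forecaster.predict(X[~train])
print(preds.shape)
```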

## Forecast 3 months of sales

We'll start by forecasting 3 months of sales, starting at every single point of the test set.

This is the equivalent of backtesting without refitting the model. See section 6 for more details.
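Each row of the prediction table is a separate 3-month forecast made from one origin. The sketch below shows how a row can be mapped back to calendar months. It assumes the first target column refers to the row's own timestamp. The prediction values are placeholders and `row_to_series` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Placeholder predictions: one row per forecast origin, one column per step.
origins = pd.date_range("2004-01-01", periods=4, freq="MS")
preds = pd.DataFrame(
    np.array(
        [
            [100.0, 110.0, 120.0],
            [111.0, 121.0, 131.0],
            [122.0, 132.0, 142.0],
            [133.0, 143.0, 153.0],
        ]
    ),
    index=origins,
    columns=["h_0", "h_1", "h_2"],
)


def row_to_series(preds, origin):
    """Return the forecast made at `origin`, indexed by the forecasted months."""
    values = preds.loc[origin].to_numpy()
    dates = pd.date_range(origin, periods=len(values), freq="MS")
    return pd.Series(values, index=dates, name="forecast")


first_forecast = row_to_series(preds, origins[0])
print(first_forecast)
```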

## Plot predictions vs actuals

Pick the first row of predictions and plot them against the real sales.
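A plotting sketch follows. The numbers are placeholders rather than real sales or model output, and the non-interactive `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values for the three forecasted months.
dates = pd.date_range("2004-01-01", periods=3, freq="MS")
actual = pd.Series([100.0, 110.0, 120.0], index=dates)
forecast = pd.Series([103.0, 106.0, 120.0], index=dates)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(actual.index, actual.values, marker="o", label="actual")
ax.plot(forecast.index, forecast.values, marker="o", label="forecast")
ax.set_xlabel("month")
ax.set_ylabel("sales")
ax.set_title("Direct forecast vs actual sales")
ax.legend()
fig.tight_layout()
```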

## Determine the RMSE

Pick the first row of predictions and calculate the RMSE against the real sales.
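For a horizon of $H$ steps, the RMSE is $\sqrt{\frac{1}{H}\sum_{h=1}^{H}(y_h - \hat{y}_h)^2}$. The sketch below computes it with NumPy on placeholder numbers. `root_mean_squared_error` from `sklearn.metrics`, imported at the top, should return the same value.

```python
import numpy as np

# Placeholder actuals and forecasts for one 3-step forecast.
actual = np.array([100.0, 110.0, 120.0])
forecast = np.array([103.0, 106.0, 120.0])

# Errors are -3, 4 and 0, so the mean squared error is 25 / 3.
rmse = float(np.sqrt(np.mean((actual - forecast) ** 2)))
print(round(rmse, 4))
```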

## Forecast next 3 months of sales

Predict the first 3 months of sales right after the end of the test set.

That is, starting on `2005-03-01`, the first month after the last test-set timestamp (`2005-02-01`).
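Forecasting past the last observed month needs a feature row for a timestamp that has no target yet. One way to build it with pandas is sketched below. It assumes monthly data ending on `2005-02-01` and lags of 1, 2, 3 and 12, and uses a synthetic history. The resulting row would then be passed to the trained forecaster's `predict` method.

```python
import numpy as np
import pandas as pd

# Synthetic history ending at the last available month.
index = pd.date_range("2003-01-01", "2005-02-01", freq="MS")
history = pd.Series(np.arange(len(index), dtype=float), index=index, name="y")

# The forecast origin is the first month after the end of the history.
origin = history.index[-1] + pd.offsets.MonthBegin(1)

# Append an empty row for the origin so the lags can be computed for it.
extended = history.reindex(history.index.append(pd.DatetimeIndex([origin])))

lags = (1, 2, 3, 12)  # hypothetical choice of lags
X_future = pd.DataFrame(
    {f"y_lag_{lag}": extended.shift(lag) for lag in lags}
).loc[[origin]]

# Calendar months covered by the 3-step forecast.
forecast_dates = pd.date_range(origin, periods=3, freq="MS")
print(X_future)
print(forecast_dates)
```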