In [1]:
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg

By default, when the `predict` method is called on a trained forecaster, predictions start right after the last training observation.

In [4]:
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])

# Data preprocessing
# ==============================================================================
data['date'] = pd.to_datetime(data['date'], format='%Y/%m/%d')
data = data.set_index('date')
data = data.asfreq('MS')
data_train = data.loc[:'2005-01-01']
print(data_train.tail().to_markdown())

| date                |       y |
|:--------------------|--------:|
| 2004-09-01 00:00:00 | 1.13443 |
| 2004-10-01 00:00:00 | 1.18101 |
| 2004-11-01 00:00:00 | 1.21604 |
| 2004-12-01 00:00:00 | 1.25724 |
| 2005-01-01 00:00:00 | 1.17069 |


In [7]:
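# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(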
                    regressor = RandomForestRegressor(random_state=123),
                    lags = 5
                )

forecaster.fit(y=data_train['y'])

In [8]:
# Predict
# ==============================================================================
forecaster.predict(steps=3)

2005-02-01    0.927480
2005-03-01    0.756215
2005-04-01    0.692595
Freq: MS, Name: pred, dtype: float64


As expected, predictions follow directly from the end of training data.

If the training sample is relatively small, or if the best possible forecasts are required, the forecaster should be retrained on all available data before making predictions. However, if that strategy is infeasible (for example, because the training set is very large), it is very useful to be able to generate predictions without retraining the model each time.

With skforecast, it is possible to generate predictions that start later than the end of the training data by using the `last_window` argument. When `last_window` is provided, the forecaster uses this data to generate the lags needed as predictors.

In [9]:
# Predict using last_window
# ==============================================================================
forecaster.predict(steps=3, last_window=data['y'].tail(5))

2008-07-01    0.803853
2008-08-01    0.870858
2008-09-01    0.905003
Freq: MS, Name: pred, dtype: float64

Since the provided `last_window` contains values from 2008-02-01 to 2008-06-01, the forecaster is able to create the needed lags and predict the next 3 steps.
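
To make the role of `last_window` more concrete, the following is a minimal sketch of recursive multi-step prediction seeded from a window of recent values. It illustrates the general idea only and is not skforecast's internal code: the function `recursive_forecast`, the callable `predict_one` and the toy `mean_model` are hypothetical names introduced for this example.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    predict_one: Callable[[List[float]], float],
    last_window: Sequence[float],
    lags: Sequence[int],
    steps: int,
) -> List[float]:
    """Predict `steps` values recursively, seeding the lags from `last_window`.

    Hypothetical helper for illustration; not part of skforecast.
    """
    history = list(last_window)
    if len(history) < max(lags):
        raise ValueError("last_window is shorter than the maximum lag")

    predictions = []
    for _ in range(steps):
        # Lag k is the value observed k steps before the one being predicted.
        predictors = [history[-lag] for lag in lags]
        y_hat = predict_one(predictors)
        predictions.append(y_hat)
        # Each prediction is appended so it can act as a lag in later steps.
        history.append(y_hat)
    return predictions


def mean_model(predictors: List[float]) -> float:
    """Toy stand-in for a fitted regressor: the mean of the lag values."""
    return sum(predictors) / len(predictors)


window = [1.0, 2.0, 3.0, 4.0, 5.0]
preds = recursive_forecast(mean_model, window, lags=[1, 2, 3, 4, 5], steps=3)
print(preds)
```

In this sketch the model itself is never refitted. Only the values used to build the lags change, which is why predictions can start from any point for which a recent window is available.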


> **⚠ WARNING:**  
> It is important to note that the length of `last_window` must be enough to include the maximum lag used by the forecaster. For example, if the forecaster uses lags 1, 24 and 48, `last_window` must include the last 48 values of the series.
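
The length requirement can be checked before calling `predict`. The sketch below assumes only that `last_window` supports `len()` (as lists and pandas Series do). The helpers `required_window_length` and `check_last_window` are hypothetical names introduced here and are not part of skforecast.

```python
from typing import Sequence


def required_window_length(lags: Sequence[int]) -> int:
    """Minimum number of observations needed to build every lag."""
    # The oldest value needed is the one `max(lags)` steps back.
    return max(lags)


def check_last_window(last_window, lags: Sequence[int]) -> None:
    """Raise ValueError if `last_window` cannot cover the maximum lag."""
    needed = required_window_length(lags)
    if len(last_window) < needed:
        raise ValueError(
            f"last_window has {len(last_window)} values, "
            f"but at least {needed} are needed for lags {list(lags)}"
        )


lags = [1, 24, 48]
print(required_window_length(lags))

# A window with 48 values passes the check; a shorter one would raise.
check_last_window(list(range(48)), lags)
```

Note that the requirement depends on the largest lag, not on the number of lags or on their sum.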

