# Equivalent Pipelines in sktime and skforecast

[Sktime](https://www.sktime.org), a well-known forecasting library, provides functionality to apply transformations to both the target variable and exogenous variables using two distinct classes:

- `TransformedTargetForecaster`: Applies the specified transformations to the target series.

- `ForecastingPipeline`: Applies the specified transformations to the exogenous variables before passing them to the forecaster.

Similarly, [Skforecast](https://www.skforecast.org) supports transformations for both the target variable and exogenous variables through the following forecaster arguments:

- `transformer_y`: Applies the specified transformations (single transformer or a sklearn pipeline with multiple transformers) to the target variable.

- `transformer_exog`: Applies the specified transformations (single transformer or a sklearn pipeline with multiple transformers) to the exogenous variables.

- `transformer_series`: Equivalent to `transformer_y` in multi-series forecasters.
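
Conceptually, a target transformer is fitted on the training series, the model is trained and predicts in the transformed scale, and the predictions are mapped back with the inverse transformation. The following plain-Python sketch illustrates that round trip; it is not the implementation of either library. The helper name `forecast_with_target_transform` and the placeholder "model" (the mean of the last three scaled values) are assumptions made only for illustration.

```python
from statistics import mean, pstdev


def forecast_with_target_transform(y, steps, window=3):
    """Hypothetical helper: scale y, forecast in the scaled space, undo the scaling."""
    # 1. Fit the transformer on the training series (population std, as StandardScaler uses).
    mu, sigma = mean(y), pstdev(y)
    # 2. Transform the target.
    z = [(value - mu) / sigma for value in y]
    # 3. Placeholder model: the mean of the last `window` scaled values, repeated.
    z_hat = [mean(z[-window:])] * steps
    # 4. Inverse-transform the predictions back to the original scale.
    return [value * sigma + mu for value in z_hat]


predictions = forecast_with_target_transform([10.0, 12.0, 14.0, 16.0, 18.0], steps=2)
print(predictions)
```

Because the predictions are inverse-transformed, they come back in the units of the original series rather than in the scaled units the model was trained on.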

This document provides a side-by-side comparison of equivalent **Sktime** and **Skforecast** code for applying transformations to the target variable and exogenous variables.


**Without exogenous variables**

<table>

<tr>
    <td style="text-align: center;"><strong>skforecast</strong></td>
    <td style="text-align: center;"><strong>sktime</strong></td>
</tr>

<tr>
<td style="vertical-align: top;">

```python
from skforecast.recursive import ForecasterRecursive
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

forecaster = ForecasterRecursive(
                regressor     = Ridge(random_state=951),
                lags          = 15,
                transformer_y = StandardScaler(),
             )
forecaster.fit(y=y)
predictions = forecaster.predict(steps=10)
predictions
```

</td>

<td style="vertical-align: top;">

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sktime.transformations.series.adapt import TabularToSeriesAdaptor
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import (
    make_reduction,
    TransformedTargetForecaster,
)

regressor = make_reduction(Ridge(random_state=951), window_length=15, strategy="recursive")
forecaster = TransformedTargetForecaster(
    steps=[
        ("scaler", TabularToSeriesAdaptor(StandardScaler())),
        ("regressor", regressor),
    ]
)
forecaster.fit(y=y)
fh = ForecastingHorizon(np.arange(1, 11), is_relative=True)
predictions = forecaster.predict(fh=fh)
predictions
```
</td>

</tr>

</table>
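
Both snippets above use a recursive strategy: a single one-step-ahead model is applied repeatedly, and each prediction is fed back as the most recent lag for the next step. The sketch below shows only that feedback loop in plain Python. It assumes an already fitted linear model described by hypothetical `coefs` and `intercept` values, and the function name `recursive_forecast` is likewise made up for this example.

```python
def recursive_forecast(history, coefs, intercept, steps):
    """Hypothetical helper: multi-step forecast by feeding predictions back as lags.

    coefs[0] multiplies lag_1 (the most recent value), coefs[1] multiplies lag_2, etc.
    """
    window = list(history[-len(coefs):])  # last observed values, oldest first
    predictions = []
    for _ in range(steps):
        lags = window[::-1]  # reorder so that lag_1 comes first
        y_hat = intercept + sum(c * v for c, v in zip(coefs, lags))
        predictions.append(y_hat)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [y_hat]
    return predictions


# Toy model y_t = 2 * y_{t-1} - y_{t-2}, which continues a straight line.
print(recursive_forecast([1, 2, 3], coefs=[2, -1], intercept=0, steps=3))
```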

**With exogenous variables**

<table>

<tr>
    <td style="text-align: center;"><strong>skforecast</strong></td>
    <td style="text-align: center;"><strong>sktime</strong></td>
</tr>

<tr>
<td style="vertical-align: top;">

```python
from skforecast.recursive import ForecasterRecursive
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sktime.transformations.series.boxcox import BoxCoxTransformer

forecaster = ForecasterRecursive(
                regressor        = Ridge(random_state=951),
                lags             = 15,
                transformer_y    = BoxCoxTransformer(),
                transformer_exog = StandardScaler(),
             )
forecaster.fit(y=y, exog=exog)
predictions = forecaster.predict(steps=10, exog=exog_test)
predictions
```

</td>

<td style="vertical-align: top;">

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sktime.transformations.series.boxcox import BoxCoxTransformer
from sktime.transformations.series.adapt import TabularToSeriesAdaptor
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import (
    make_reduction,
    TransformedTargetForecaster,
    ForecastingPipeline,
)

regressor = make_reduction(Ridge(random_state=951), window_length=15, strategy="recursive")
pipe_y = TransformedTargetForecaster(
    steps=[
        ("boxcox", BoxCoxTransformer()),
        ("regressor", regressor),
    ]
)
pipe_X = ForecastingPipeline(
    steps=[
        ("scaler", TabularToSeriesAdaptor(StandardScaler())),
        ("forecaster", pipe_y),
    ]
)
pipe_X.fit(y=y, X=exog)
fh = ForecastingHorizon(np.arange(1, 11), is_relative=True)
predictions = pipe_X.predict(fh=fh, X=exog_test)
predictions
```
</td>

</tr>

</table>

<div class="admonition note" name="html-admonition" style="background: rgba(255,145,0,.1); padding-top: 0px; padding-bottom: 6px; border-radius: 8px; border-left: 8px solid #ff9100; border-color: #ff9100; padding-left: 10px; padding-right: 10px">

<p class="title">
    <i style="font-size: 18px; color:#ff9100; border-color: #ff1744;"></i>
    <b style="color: #ff9100;"> <span style="color: #ff9100;">&#9888;</span> Warning</b>
</p>

<p>When working with exogenous variables, both libraries apply the same transformations. However, the results differ because <strong>sktime</strong> incorporates the lagged values of the exogenous variables into the underlying training matrices, whereas <strong>skforecast</strong> does not. For example, if 3 lagged values are used and two exogenous variables are included, the underlying training matrices are as follows:</p>

<ul>
  <li><strong>skforecast</strong>: <code>lag_1</code>, <code>lag_2</code>, <code>lag_3</code>, <code>exog_1</code>, <code>exog_2</code></li>
  <li><strong>sktime</strong>: <code>lag_1</code>, <code>lag_2</code>, <code>lag_3</code>, <code>exog_1_lag_1</code>, <code>exog_1_lag_2</code>, <code>exog_1_lag_3</code>, <code>exog_2_lag_1</code>, <code>exog_2_lag_2</code>, <code>exog_2_lag_3</code></li>
</ul>


</div>
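
The two column layouts described in the warning can be generated for any number of lags and exogenous variables. The sketch below only builds the column names, following the naming used in the warning; the function names are hypothetical and neither library exposes them.

```python
def skforecast_style_columns(n_lags, exog_names):
    """Lags of the target plus each exogenous variable once (hypothetical helper)."""
    return [f"lag_{i}" for i in range(1, n_lags + 1)] + list(exog_names)


def sktime_style_columns(n_lags, exog_names):
    """Lags of the target plus the same number of lags per exogenous variable."""
    columns = [f"lag_{i}" for i in range(1, n_lags + 1)]
    for name in exog_names:
        columns += [f"{name}_lag_{i}" for i in range(1, n_lags + 1)]
    return columns


print(skforecast_style_columns(3, ["exog_1", "exog_2"]))
print(sktime_style_columns(3, ["exog_1", "exog_2"]))
```

With `n_lags` lags and `k` exogenous variables, the first layout has `n_lags + k` columns and the second has `n_lags * (k + 1)`, so the gap between the two widens as either number grows.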

In [9]:
# Libraries
# ======================================================================================
import pandas as pd
import numpy as np
from skforecast.datasets import fetch_dataset
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sktime.forecasting.compose import (
    make_reduction,
    TransformedTargetForecaster,
    ForecastingPipeline,
)
from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.boxcox import BoxCoxTransformer
from sktime.transformations.series.adapt import TabularToSeriesAdaptor
from sktime.forecasting.base import ForecastingHorizon
from skforecast.recursive import ForecasterRecursive

In [10]:
# Data
# ======================================================================================
data = fetch_dataset(name='fuel_consumption')
data = data.rename(columns={'Gasolinas': 'litters'})
data = data.rename_axis('date')
data = data.loc[:'1990-01-01 00:00:00']
data = data[['litters']]
data['month'] = data.index.month
data['year'] = data.index.year
display(data.head(4))

fuel_consumption
----------------
Monthly fuel consumption in Spain from 1969-01-01 to 2022-08-01.
Obtained from Corporación de Reservas Estratégicas de Productos Petrolíferos and
Corporación de Derecho Público tutelada por el Ministerio para la Transición
Ecológica y el Reto Demográfico. https://www.cores.es/es/estadisticas
Shape of the dataset: (644, 5)


| date | litters | month | year |
|---|---|---|---|
| 1969-01-01 | 166875.2129 | 1 | 1969 |
| 1969-02-01 | 155466.8105 | 2 | 1969 |
| 1969-03-01 | 184983.6699 | 3 | 1969 |
| 1969-04-01 | 202319.8164 | 4 | 1969 |


In [11]:
# Train-test dates
# ======================================================================================
end_train = '1980-01-01 23:59:59'
data_train = data.loc[:end_train]
data_test  = data.loc[end_train:]

## Sktime

In [12]:
# Sktime pipeline
# ======================================================================================
regressor = make_reduction(Ridge(), window_length=15, strategy="recursive")
pipe_y = TransformedTargetForecaster(
    steps=[
        ("boxcox", BoxCoxTransformer()),
        ("regressor", regressor),
    ]
)
pipe_X = ForecastingPipeline(
    steps=[
        ("scaler", TabularToSeriesAdaptor(StandardScaler())),
        ("forecaster", pipe_y),
    ]
)
pipe_X.fit(y=data_train['litters'], X=data_train[['month', 'year']])
fh = ForecastingHorizon(np.arange(1, len(data_test) + 1), is_relative=True)
predictions_sktime = pipe_X.predict(fh=fh, X=data_test[['month', 'year']])
predictions_sktime

1980-02-01    430096.815068
1980-03-01    472406.420587
1980-04-01    509203.559184
1980-05-01    495910.509282
1980-06-01    518548.672893
                  ...      
1989-09-01    820033.569581
1989-10-01    801291.145367
1989-11-01    756075.962331
1989-12-01    795345.389792
1990-01-01    746317.734572
Freq: MS, Name: litters, Length: 120, dtype: float64

## Skforecast

In [13]:
# Skforecast with transformations
# ======================================================================================
forecaster = ForecasterRecursive(
    regressor=Ridge(),
    lags = 15,
    transformer_y=BoxCoxTransformer(),
    transformer_exog=StandardScaler(),
    differentiation=1  # note: the sktime pipeline above has no differencing step
)
forecaster.fit(y=data_train['litters'], exog=data_train[['month', 'year']])
predictions_skforecast = forecaster.predict(steps=len(data_test), exog=data_test[['month', 'year']])
predictions_skforecast.head()

1980-02-01    423397.123996
1980-03-01    486824.513879
1980-04-01    520290.931131
1980-05-01    501044.779487
1980-06-01    525962.041524
Freq: MS, Name: pred, dtype: float64

In [14]:
results = pd.DataFrame({
    'sktime': predictions_sktime,
    'skforecast': predictions_skforecast,
})
results

|  | sktime | skforecast |
|---|---|---|
| 1980-02-01 | 430096.815068 | 423397.123996 |
| 1980-03-01 | 472406.420587 | 486824.513879 |
| 1980-04-01 | 509203.559184 | 520290.931131 |
| 1980-05-01 | 495910.509282 | 501044.779487 |
| 1980-06-01 | 518548.672893 | 525962.041524 |
| ... | ... | ... |
| 1989-09-01 | 820033.569581 | 557023.492560 |
| 1989-10-01 | 801291.145367 | 538724.233130 |
| 1989-11-01 | 756075.962331 | 570326.998173 |
| 1989-12-01 | 795345.389792 | 530129.452420 |
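
To put a number on how far the two forecasts drift apart, the relative gap can be computed for the first and last rows displayed above. This is a small arithmetic sketch using values copied from the results table; the helper name `relative_gap` is an assumption for the example.

```python
# Values copied from the first and last displayed rows of the results table:
# date -> (sktime prediction, skforecast prediction)
rows = {
    "1980-02-01": (430096.815068, 423397.123996),
    "1989-12-01": (795345.389792, 530129.452420),
}


def relative_gap(sktime_value, skforecast_value):
    """Absolute difference relative to the skforecast prediction."""
    return abs(sktime_value - skforecast_value) / abs(skforecast_value)


gaps = {date: relative_gap(*values) for date, values in rows.items()}
for date, gap in gaps.items():
    print(f"{date}: {gap:.1%}")
```

The gap is small at the start of the horizon and much larger at its end.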


## Equivalent transformations

The following cells check that each sktime transformer and its scikit-learn or skforecast counterpart produce the same output:

In [15]:
# Box-Cox transformation
# ======================================================================================
from sktime.transformations.series.boxcox import BoxCoxTransformer
from sklearn.preprocessing import PowerTransformer

transformer_sktime = BoxCoxTransformer()
y_hat_sktime = transformer_sktime.fit_transform(data_train['litters'])

transformer_skforecast = PowerTransformer(method='box-cox', standardize=False)
y_hat_skforecast = transformer_skforecast.fit_transform(data_train[['litters']]).flatten()

np.testing.assert_allclose(y_hat_sktime, y_hat_skforecast)
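
For reference, the Box-Cox transformation itself is a simple formula: `(y**λ - 1) / λ` for `λ != 0` and `log(y)` for `λ == 0`. The sketch below implements it and its inverse in plain Python with a fixed `λ` chosen by hand; in the cell above, both transformers estimate `λ` from the data instead. The function names are hypothetical.

```python
import math


def boxcox(values, lmbda):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if lmbda == 0:
        return [math.log(v) for v in values]
    return [(v ** lmbda - 1) / lmbda for v in values]


def inv_boxcox(values, lmbda):
    """Inverse of `boxcox` for the same lambda."""
    if lmbda == 0:
        return [math.exp(v) for v in values]
    return [(v * lmbda + 1) ** (1 / lmbda) for v in values]


original = [1.0, 4.0, 9.0]
transformed = boxcox(original, lmbda=0.5)
restored = inv_boxcox(transformed, lmbda=0.5)
print(transformed, restored)
```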

In [16]:
# Differencing
# ======================================================================================
from sktime.transformations.series.difference import Differencer
from skforecast.preprocessing import TimeSeriesDifferentiator

transformer_sktime = Differencer(lags=1)
y_hat_sktime = transformer_sktime.fit_transform(data_train['litters'])[1:]

transformer_skforecast = TimeSeriesDifferentiator(order=1)
y_hat_skforecast = transformer_skforecast.fit_transform(data_train['litters'].to_numpy())[1:]

np.testing.assert_allclose(y_hat_sktime, y_hat_skforecast)
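
First-order differencing replaces each value with its change from the previous one, so the first observation has no predecessor; this is why the comparison above starts at the second element. A minimal plain-Python sketch of the operation and its inverse follows, with hypothetical function names.

```python
from itertools import accumulate


def difference(values):
    """First-order differences: one element shorter than the input."""
    return [current - previous for previous, current in zip(values, values[1:])]


def undo_difference(first_value, diffs):
    """Rebuild the original series from its first value and its differences."""
    return list(accumulate(diffs, initial=first_value))


series = [3, 5, 4, 8]
diffs = difference(series)
print(diffs)
print(undo_difference(series[0], diffs))
```

Undoing the differencing requires the first value of the original series, which the differenced series alone does not carry.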

In [17]:
# Log transformation
# ======================================================================================
from sklearn.preprocessing import FunctionTransformer
from sktime.transformations.series.boxcox import LogTransformer

transformer_sktime = LogTransformer(offset=1)
y_hat_sktime = transformer_sktime.fit_transform(data_train['litters'])

transformer_skforecast = FunctionTransformer(func=np.log1p, inverse_func=np.expm1, validate=True)
y_hat_skforecast = transformer_skforecast.fit_transform(data_train[['litters']]).flatten()

np.testing.assert_allclose(y_hat_sktime, y_hat_skforecast)



## Grid search

In [15]:
from sklearn.preprocessing import PowerTransformer, RobustScaler
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.transformations.series.adapt import TabularToSeriesAdaptor
from sktime.transformations.series.detrend import Deseasonalizer
from sktime.forecasting.model_selection import ForecastingGridSearchCV
from sktime.performance_metrics.forecasting import MeanSquaredError
from sktime.forecasting.model_selection import SlidingWindowSplitter

forecaster = make_reduction(Ridge(), window_length=15, strategy="recursive")

pipe = TransformedTargetForecaster(
    steps=[
        ("deseasonalizer", Deseasonalizer()),
        ("power", TabularToSeriesAdaptor(PowerTransformer())),
        ("scaler", TabularToSeriesAdaptor(RobustScaler())),
        ("forecaster", forecaster),
    ]
)

# Using dunder notation to access inner objects/params as in sklearn
param_grid = {
    "deseasonalizer__model": ["multiplicative", "additive"],
    "power__transformer__method": ["yeo-johnson", "box-cox"],
    "power__transformer__standardize": [True, False],
    "forecaster__estimator__alpha": [0.1, 0.5, 1.0]
}

# Cross-validation setup
cv = SlidingWindowSplitter(fh=10, window_length=len(data_train), step_length=10)

# Grid search setup
gscv = ForecastingGridSearchCV(
    forecaster=pipe,
    param_grid=param_grid,
    cv=cv,
    verbose=1,
    scoring=MeanSquaredError(square_root=True)
)

# Fit and inspect the cross-validation results
gscv.fit(data['litters'])
gscv.cv_results_.head()

Fitting 12 folds for each of 24 candidates, totalling 288 fits
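
The 24 candidates in the log line are the Cartesian product of the grid values (2 × 2 × 2 × 3). The sketch below enumerates them with `itertools.product`; the number of folds (12) is taken from the log line rather than computed here.

```python
from itertools import product

param_grid = {
    "deseasonalizer__model": ["multiplicative", "additive"],
    "power__transformer__method": ["yeo-johnson", "box-cox"],
    "power__transformer__standardize": [True, False],
    "forecaster__estimator__alpha": [0.1, 0.5, 1.0],
}

# One dict per candidate: every combination of one value per parameter.
candidates = [dict(zip(param_grid, combo)) for combo in product(*param_grid.values())]

n_folds = 12  # taken from the log line, not derived in this sketch
total_fits = len(candidates) * n_folds
print(len(candidates), total_fits)
```

Each additional parameter value multiplies the number of fits, which is worth keeping in mind when a search takes too long to finish.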



## Reproducible example of the issue

In [16]:
rng = np.random.RandomState(951)
y = pd.Series(rng.normal(size=20), index=pd.date_range("2020-01-01", periods=20))
exog = pd.DataFrame(rng.normal(10, 10, size=(20, 2)), index=y.index, columns=["exog1", "exog2"])
exog_test = pd.DataFrame(rng.normal(10, 10, size=(5, 2)), index=pd.date_range("2020-01-21", periods=5), columns=["exog1", "exog2"])

In [17]:
# Sktime vs skforecast without exogenous variables
# ======================================================================================
from sklearn.linear_model import LinearRegression

forecaster_sktime = make_reduction(LinearRegression(), window_length=3, strategy="recursive")
forecaster_sktime.fit(y=y)
predictions_sktime = forecaster_sktime.predict(fh=ForecastingHorizon([1, 2, 3]))
forecaster_skforecast = ForecasterRecursive(
    regressor=LinearRegression(),
    lags=3
)
forecaster_skforecast.fit(y=y)
predictions_skforecast = forecaster_skforecast.predict(steps=3)

np.testing.assert_allclose(predictions_sktime, predictions_skforecast)

In [18]:
# Sktime vs skforecast with exogenous variables
# ======================================================================================
forecaster_sktime = make_reduction(LinearRegression(), window_length=3, strategy="recursive")
forecaster_sktime.fit(y=y, X=exog)
predictions_sktime = forecaster_sktime.predict(fh=ForecastingHorizon([1, 2, 3]), X=exog_test)
forecaster_skforecast = ForecasterRecursive(
    regressor=LinearRegression(),
    lags=3
)
forecaster_skforecast.fit(y=y, exog=exog)
predictions_skforecast = forecaster_skforecast.predict(steps=3, exog=exog_test)

# This assertion fails because the two libraries build different training matrices
np.testing.assert_allclose(predictions_sktime, predictions_skforecast)

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 3 / 3 (100%)
Max absolute difference: 1.77123741
Max relative difference: 12.71464862
 x: array([ 0.253213, -1.359257,  2.242614])
 y: array([0.319955, 0.116031, 0.471376])

In [23]:
# Training matrices skforecast
# ======================================================================================
X_train, y_train = forecaster_skforecast.create_train_X_y(y=y, exog=exog)
X_train

|  | lag_1 | lag_2 | lag_3 | exog1 | exog2 |
|---|---|---|---|---|---|
| 2020-01-04 | -1.295017 | 0.105323 | -0.593468 | -10.646148 | 6.218617 |
| 2020-01-05 | 1.026185 | -1.295017 | 0.105323 | 9.175485 | 28.116487 |
| 2020-01-06 | 0.304132 | 1.026185 | -1.295017 | 9.472782 | 12.998803 |
| 2020-01-07 | -1.382771 | 0.304132 | 1.026185 | 9.301041 | 5.197434 |
| 2020-01-08 | 0.834075 | -1.382771 | 0.304132 | 2.511584 | 13.38125 |
| 2020-01-09 | 0.370802 | 0.834075 | -1.382771 | 0.159088 | 0.417967 |
| 2020-01-10 | 0.552028 | 0.370802 | 0.834075 | 8.779067 | 13.225236 |
| 2020-01-11 | -0.577722 | 0.552028 | 0.370802 | -13.78326 | 26.646572 |
| 2020-01-12 | -0.647968 | -0.577722 | 0.552028 | 9.142057 | 15.396234 |
| 2020-01-13 | -1.630815 | -0.647968 | -0.577722 | 3.234469 | 16.391182 |


In [28]:
pd.concat([y, exog], axis=1)

|  | 0 | exog1 | exog2 |
|---|---|---|---|
| 2020-01-01 | -0.593468 | 16.408317 | 1.16912 |
| 2020-01-02 | 0.105323 | 10.900575 | 17.133099 |
| 2020-01-03 | -1.295017 | -5.669573 | 5.729836 |
| 2020-01-04 | 1.026185 | -10.646148 | 6.218617 |
| 2020-01-05 | 0.304132 | 9.175485 | 28.116487 |
| 2020-01-06 | -1.382771 | 9.472782 | 12.998803 |
| 2020-01-07 | 0.834075 | 9.301041 | 5.197434 |
| 2020-01-08 | 0.370802 | 2.511584 | 13.38125 |
| 2020-01-09 | 0.552028 | 0.159088 | 0.417967 |
| 2020-01-10 | -0.577722 | 8.779067 | 13.225236 |


In [27]:
# Training matrices sktime
# ======================================================================================
# Inspect the transformed data (`_transform` is a private sktime method and may change between versions)
y_train, X_train = forecaster_sktime._transform(y, exog)
pd.DataFrame(X_train)


|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.593468 | 0.105323 | -1.295017 | 16.408317 | 10.900575 | -5.669573 | 1.16912 | 17.133099 | 5.729836 |
| 1 | 0.105323 | -1.295017 | 1.026185 | 10.900575 | -5.669573 | -10.646148 | 17.133099 | 5.729836 | 6.218617 |
| 2 | -1.295017 | 1.026185 | 0.304132 | -5.669573 | -10.646148 | 9.175485 | 5.729836 | 6.218617 | 28.116487 |
| 3 | 1.026185 | 0.304132 | -1.382771 | -10.646148 | 9.175485 | 9.472782 | 6.218617 | 28.116487 | 12.998803 |
| 4 | 0.304132 | -1.382771 | 0.834075 | 9.175485 | 9.472782 | 9.301041 | 28.116487 | 12.998803 | 5.197434 |
| 5 | -1.382771 | 0.834075 | 0.370802 | 9.472782 | 9.301041 | 2.511584 | 12.998803 | 5.197434 | 13.38125 |
| 6 | 0.834075 | 0.370802 | 0.552028 | 9.301041 | 2.511584 | 0.159088 | 5.197434 | 13.38125 | 0.417967 |
| 7 | 0.370802 | 0.552028 | -0.577722 | 2.511584 | 0.159088 | 8.779067 | 13.38125 | 0.417967 | 13.225236 |
| 8 | 0.552028 | -0.577722 | -0.647968 | 0.159088 | 8.779067 | -13.78326 | 0.417967 | 13.225236 | 26.646572 |
| 9 | -0.577722 | -0.647968 | -1.630815 | 8.779067 | -13.78326 | 9.142057 | 13.225236 | 26.646572 | 15.396234 |
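
The first row of each printed training matrix can be rebuilt by hand from the first rows of the `pd.concat([y, exog], axis=1)` output. The sketch below does so in plain Python with values copied from that output. The function names are hypothetical, and the sketch assumes that row 0 of the sktime matrix is paired with the target at 2020-01-04 (position 3), as is the first skforecast row.

```python
# First five rows copied from the printed y / exog table.
y = [-0.593468, 0.105323, -1.295017, 1.026185, 0.304132]
exog1 = [16.408317, 10.900575, -5.669573, -10.646148, 9.175485]
exog2 = [1.16912, 17.133099, 5.729836, 6.218617, 28.116487]
n_lags = 3


def skforecast_style_row(t):
    """Lags of y (most recent first) plus the exogenous values at time t."""
    return [y[t - k] for k in range(1, n_lags + 1)] + [exog1[t], exog2[t]]


def sktime_style_row(t):
    """Windows t-3..t-1 (oldest first) of y and of each exogenous variable."""
    start = t - n_lags
    return y[start:t] + exog1[start:t] + exog2[start:t]


print(skforecast_style_row(3))  # compare with row 2020-01-04 of the skforecast matrix
print(sktime_style_row(3))      # compare with row 0 of the sktime matrix
```

In the skforecast-style row the model sees the exogenous values of the date being predicted, whereas in the sktime-style row it sees only the exogenous values of the three preceding dates.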
