[BUG] pmdarima estimators break when `X` contains more indices than the forecasting horizon #3657

d4nielmeyer · 2022-10-27T13:47:46Z

Describe the bug
I have a dataset with 1 endogenous and 5 exogenous variables and would like to use cross-validation (via ExpandingWindowSplitter) to evaluate several AutoARIMA models. Specifically, I intended to use initial_window = 3, step_length = 1 and fh = [1, 2, 3] (as illustrated below).

| * * * x x x - - - |
| * * * * x x x - - |
| * * * * * x x x - |
| * * * * * * x x x |
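
For readers who want to check the window layout in the diagram by hand, here is a minimal, library-free sketch of an expanding-window split. The function name `expanding_window_splits` and the 9-observation series are assumptions made for illustration only. This is not sktime's implementation of ExpandingWindowSplitter, just a reproduction of the index pattern drawn above.

```python
def expanding_window_splits(n_obs, initial_window, step_length, fh):
    """Yield (train, test) integer positions for an expanding window.

    Hypothetical helper for illustration; not part of sktime.
    """
    max_h = max(fh)
    train_end = initial_window  # exclusive end of the training window
    # Stop as soon as the furthest horizon step would fall outside the series.
    while (train_end - 1) + max_h < n_obs:
        cutoff = train_end - 1  # last training position
        train = list(range(train_end))
        test = [cutoff + h for h in fh]  # positions to forecast
        yield train, test
        train_end += step_length


# 9 observations, as in the diagram: 4 splits, each with a 3-step test window.
splits = list(
    expanding_window_splits(n_obs=9, initial_window=3, step_length=1, fh=[1, 2, 3])
)
for train, test in splits:
    print("train:", train, "test:", test)
```

In this sketch the first split is train [0, 1, 2] and test [3, 4, 5], and the training window then grows by one position per split.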

X (exogenous variables) has shape (16, 5), while y (endogenous variable) has shape (16, ). After running temporal cross-validation, the first split gives train and test indices of length 3 each -> train [0, 1, 2] and test [3, 4, 5], which is what I would expect. However, when the data is split into X/y train/test within the temporal CV, I get y_train of shape (3, ), y_test (3, ) and X_train (3, 5), but X_test (6, 5). Since the 6 rows of X_test do not match n_periods=3, X_test is rejected by the predict method (called inside evaluate). As a consequence I get the following error:

ValueError('X array dims (n_rows) != n_periods')

To Reproduce

To reproduce the error I used a simple, but similar dataset from the sktime library:

from sktime.datasets import load_longley
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.forecasting.arima import AutoARIMA
from sktime.performance_metrics.forecasting import MeanAbsoluteError
import numpy as np

y, X = load_longley()

forecaster = AutoARIMA()
cv = ExpandingWindowSplitter(initial_window=3, step_length=1, fh=np.arange(1, 4))
loss = MeanAbsoluteError()

results = evaluate(forecaster=forecaster, y=y, X=X, cv=cv, error_score='raise', scoring=loss)

Expected behavior
I am well aware that:

If you fit with exogenous, you must predict with exogenous
When you are predicting with exogenous, your dimensions must match.

But to some extent I would expect (S)ARIMAX models to be able to handle inputs at inference time in one of the following ways:
A)
input:
------------------ y(t-3)
------------------ y(t-2)
------------------ y(t-1)
x1(t) x2(t) x3(t)

output:
y(t)
y(t+1)
y(t+2)
y(t+3)

or B):
input:
x1(t-3) x2(t-3) x3(t-3) y(t-3)
x1(t-2) x2(t-2) x3(t-2) y(t-2)
x1(t-1) x2(t-1) x3(t-1) y(t-1)

output:
y(t)
y(t+1)
y(t+2)
y(t+3)

Versions
System:
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:01:00) [Clang 13.0.1 ]
executable: /Users/dmr/miniforge3/envs/explain/bin/python
machine: macOS-12.6-arm64-arm-64bit

Python dependencies:
pip: 22.3
setuptools: 65.4.1
sklearn: 1.1.2
sktime: 0.13.4
statsmodels: 0.13.2
numpy: 1.23.3
scipy: 1.8.1
pandas: 1.4.4
matplotlib: 3.6.0
joblib: 1.2.0
numba: 0.56.2
pmdarima: 1.8.5
tsfresh: 0.17.0

Thanks for any comments/ advice!

fkiraly · 2022-10-29T18:33:14Z

bug confirmed on python 3.8.12, windows, current main

fkiraly · 2022-10-29T19:05:46Z

diagnosed the error: the problem is not evaluate, but the pmdarima interfaces, which break when an X covering more indices than the forecasting horizon is passed (which evaluate does).
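
To make the diagnosis concrete, here is a rough, library-free sketch of restricting X to the forecasting horizon before prediction. The helper name `slice_exog_to_horizon` and the dict-of-rows representation of X are assumptions for illustration. The sketch also assumes that the 6 rows reported for the first split are positions 0 to 5. It is not the actual fix applied in sktime.

```python
def slice_exog_to_horizon(X_rows, cutoff, fh):
    """Keep only the rows of X at positions cutoff + h, for h in fh.

    X_rows maps an integer position to one row of exogenous values.
    Hypothetical helper for illustration; not part of sktime or pmdarima.
    """
    wanted = [cutoff + h for h in fh]
    missing = [i for i in wanted if i not in X_rows]
    if missing:
        raise ValueError(f"X has no rows for horizon positions {missing}")
    return [X_rows[i] for i in wanted]


# First split: training positions 0..2, so the cutoff is 2 and fh = [1, 2, 3].
# Assumption: the X handed to predict covers positions 0..5 (6 rows, 5 columns).
X_passed = {i: [float(i)] * 5 for i in range(6)}
X_pred = slice_exog_to_horizon(X_passed, cutoff=2, fh=[1, 2, 3])

# After slicing, the number of rows equals the number of forecast periods.
print(len(X_passed), "->", len(X_pred))
```

With the rows cut down to positions 3, 4 and 5, the row count matches the three horizon steps, which is the condition the `n_rows != n_periods` check enforces.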

fkiraly · 2022-10-29T19:11:46Z

FYI, see #3657 for a discussion of the deeper issue. Comments or suggestions appreciated.

fkiraly · 2022-11-11T16:59:29Z

I think this is another instance of this bug: #3763

…s than forecasting horizon (#3667)

Fixes #3657. The bug was caused by `pmdarima` models breaking when the `X` passed was strictly larger than the indices in the forecasting horizon. The example code in #3657 has been added as a test (with minor generalization to cover more estimators). In the future, we should probably also add test scenarios where `X` is strictly larger than the forecasting horizon.

Depends on:
* #4474, which fixes a bug that was masked by #3657
* #4483 for slicing `X`

d4nielmeyer added the bug Something isn't working label Oct 27, 2022

MatthewMiddlehurst added the module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting label Oct 27, 2022

fkiraly added this to Needs triage & validation in Bugfixing via automation Oct 29, 2022

fkiraly moved this from Needs triage & validation to Reproduced/confirmed in Bugfixing Oct 29, 2022

fkiraly changed the title ~~[BUG] AutoARIMA: Evaluate-method does not accept shape of exogenous variables after ExpandingWindowSplitting~~ [BUG] pmdarima estimators break when X contains more indices than the forecasting horizon Oct 29, 2022

This was referenced Oct 29, 2022

& SzymonStolarski [BUG] fix pmdarima interfaces breaking for X containing more indices than forecasting horizon #3667

Merged

[ENH] forecasting: consider testing and/or treating the case where X indices are strictly larger than y/fh indices #3668

Open

fkiraly moved this from Reproduced/confirmed to Under review in Bugfixing Oct 29, 2022

fkiraly mentioned this issue Oct 30, 2022

[BUG] statsmodels interfaces break for X containing more indices than forecasting horizon #3670

Closed

fkiraly mentioned this issue Nov 18, 2022

[BUG] 'evaluate' method cannot handle TransformedTargetForecaster where the time series data is transformed into several features #3795

Closed

fkiraly closed this as completed in #3667 Apr 21, 2023

Bugfixing automation moved this from Under review to Fixed/resolved Apr 21, 2023
