[BUG] pmdarima estimators break when X contains more indices than the forecasting horizon #3657

Closed
d4nielmeyer opened this issue Oct 27, 2022 · 4 comments · Fixed by #3667
Labels
bug Something isn't working module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting
d4nielmeyer commented Oct 27, 2022

Describe the bug
I have a dataset with 1 endogenous and 5 exogenous variables and would like to use cross-validation (via ExpandingWindowSplitter) to evaluate several AutoARIMA models. Specifically, I intended to use initial_window = 3, step_length = 1 and fh = [1, 2, 3], as illustrated below.

| * * * x x x - - - |
| * * * * x x x - - |
| * * * * * x x x - |
| * * * * * * x x x |
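
The split pattern in the diagram can be sketched in plain NumPy. This is only an illustration: `expanding_window_splits` is a hypothetical helper, not sktime's implementation, and the series length of 16 is taken from the shapes reported in this issue.

```python
import numpy as np


def expanding_window_splits(n_timepoints, initial_window, step_length, fh):
    """Yield (train, test) integer positions for an expanding window.

    Hypothetical helper for illustration only; not sktime's code.
    """
    fh = np.asarray(fh)
    train_end = initial_window  # exclusive end of the training window
    # Stop once the furthest horizon step would fall outside the series.
    while train_end - 1 + fh.max() < n_timepoints:
        train = np.arange(train_end)
        test = train_end - 1 + fh  # positions relative to the last train point
        yield train, test
        train_end += step_length


splits = list(
    expanding_window_splits(16, initial_window=3, step_length=1, fh=[1, 2, 3])
)
first_train, first_test = splits[0]
print(first_train, first_test)  # [0 1 2] [3 4 5]
```

Each split has exactly three test positions, so three rows of X would be expected at prediction time.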

X (exogenous variables) has shape (16, 5), and y (endogenous variable) has shape (16,). For the first split, the temporal cross-validation gives train indices [0, 1, 2] and test indices [3, 4, 5], each of length 3, which is what I would expect. However, when the data is split into train and test sets within the temporal CV, I get y_train of shape (3,), y_test of shape (3,), X_train of shape (3, 5), but X_test of shape (6, 5). Because X_test has n_rows=6, which does not match n_periods=3, the predict method (called inside evaluate) rejects it, and I get the following error:

ValueError('X array dims (n_rows) != n_periods')
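
The mismatch can be illustrated without any forecasting library. The sketch below uses a hypothetical stand-in, `check_exog_rows`, for the kind of row-count check that produces this message; the name and structure are assumptions and not pmdarima's actual code. The shapes are the ones reported above.

```python
import numpy as np


def check_exog_rows(X, n_periods):
    """Raise if X does not have exactly one row per forecast step.

    Hypothetical stand-in for illustration; not pmdarima's code.
    """
    n_rows = np.asarray(X).shape[0]
    if n_rows != n_periods:
        raise ValueError("X array dims (n_rows) != n_periods")
    return n_rows


fh = np.arange(1, 4)       # three steps ahead
X_test = np.zeros((6, 5))  # six rows, as observed in this report

try:
    check_exog_rows(X_test, n_periods=len(fh))
    outcome = "accepted"
except ValueError as err:
    outcome = str(err)

print(outcome)
```

Under this assumed check, any three-row slice of `X_test` passes, which suggests the failure comes from the number of rows handed to predict rather than from their content.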

To Reproduce

To reproduce the error I used a simple, but similar dataset from the sktime library:

from sktime.datasets import load_longley
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.forecasting.arima import AutoARIMA
from sktime.performance_metrics.forecasting import MeanAbsoluteError
import numpy as np

# y is the endogenous series, X holds the exogenous variables
y, X = load_longley()

forecaster = AutoARIMA()
# expanding window: 3 initial training points, forecast 1 to 3 steps ahead
cv = ExpandingWindowSplitter(initial_window=3, step_length=1, fh=np.arange(1, 4))
loss = MeanAbsoluteError()

# error_score='raise' surfaces the ValueError instead of recording it as a score
results = evaluate(forecaster=forecaster, y=y, X=X, cv=cv, error_score='raise', scoring=loss)

Expected behavior
I am aware that:

  • If you fit with exogenous, you must predict with exogenous
  • When you are predicting with exogenous, your dimensions must match.

Still, I would expect (S)ARIMAX models to be able to handle inputs at inference time in one of the following ways:
A)
input:
------------------ y(t-3)
------------------ y(t-2)
------------------ y(t-1)
x1(t) x2(t) x3(t)

output:
y(t)
y(t+1)
y(t+2)
y(t+3)

or B):
input:
x1(t-3) x2(t-3) x3(t-3) y(t-3)
x1(t-2) x2(t-2) x3(t-2) y(t-2)
x1(t-1) x2(t-1) x3(t-1) y(t-1)

output:
y(t)
y(t+1)
y(t+2)
y(t+3)

Versions
System:
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:01:00) [Clang 13.0.1 ]
executable: /Users/dmr/miniforge3/envs/explain/bin/python
machine: macOS-12.6-arm64-arm-64bit

Python dependencies:
pip: 22.3
setuptools: 65.4.1
sklearn: 1.1.2
sktime: 0.13.4
statsmodels: 0.13.2
numpy: 1.23.3
scipy: 1.8.1
pandas: 1.4.4
matplotlib: 3.6.0
joblib: 1.2.0
numba: 0.56.2
pmdarima: 1.8.5
tsfresh: 0.17.0

Thanks for any comments/advice!

@d4nielmeyer d4nielmeyer added the bug Something isn't working label Oct 27, 2022
@MatthewMiddlehurst MatthewMiddlehurst added the module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting label Oct 27, 2022
@fkiraly fkiraly added this to Needs triage & validation in Bugfixing via automation Oct 29, 2022
Collaborator

fkiraly commented Oct 29, 2022

Bug confirmed on Python 3.8.12, Windows, current main.

@fkiraly fkiraly moved this from Needs triage & validation to Reproduced/confirmed in Bugfixing Oct 29, 2022
@fkiraly fkiraly changed the title [BUG] AutoARIMA: Evaluate-method does not accept shape of exogenous variables after ExpandingWindowSplitting [BUG] pmdarima estimators break when X contains more indices than the forecasting horizon Oct 29, 2022

fkiraly commented Oct 29, 2022

Diagnosed the error: the problem is not evaluate but the pmdarima interfaces, which break when the X passed contains more indices than the forecasting horizon (which is what evaluate passes).
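
One way an interface could guard against this is to restrict X to the horizon's indices before handing it to the underlying model. The sketch below assumes a pandas-indexed X; `slice_X_to_horizon` is a hypothetical helper and not necessarily how the eventual fix is implemented.

```python
import numpy as np
import pandas as pd


def slice_X_to_horizon(X, fh_index):
    """Keep only the rows of X whose index appears in the forecasting horizon.

    Hypothetical helper for illustration; rows outside the horizon are dropped.
    """
    return X.loc[X.index.isin(fh_index)]


# Toy exogenous data with the same shape as in the report: 16 rows, 5 columns.
X = pd.DataFrame(np.arange(80).reshape(16, 5))

cutoff = 2                       # last training position in the first split
fh = np.arange(1, 4)             # forecast 1 to 3 steps ahead
fh_index = pd.Index(cutoff + fh)

X_passed = X.iloc[3:9]           # 6 rows: more indices than the horizon covers
X_sliced = slice_X_to_horizon(X_passed, fh_index)

print(X_passed.shape, X_sliced.shape)  # (6, 5) (3, 5)
```

After slicing, the row count equals the number of horizon steps, which is the condition the error message checks.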


fkiraly commented Oct 29, 2022

FYI, see #3657 for a discussion of the deeper issue. Comments or suggestions appreciated.


fkiraly commented Nov 11, 2022

I think this is another instance of this bug: #3763

fkiraly added a commit that referenced this issue Apr 21, 2023
…s than forecasting horizon (#3667)

Fixes #3657.

The bug was caused by `pmdarima` models breaking when the `X` passed was
strictly larger than the indices in the forecasting horizon.

The example code in #3657 has been added as a test (with minor
generalization to cover more estimators).

In the future, we should probably also add test scenarios where `X` is
strictly larger than the forecasting horizon.

Depends on:
* #4474, which fixes a bug that was
masked by #3657
* #4483 for slicing `X`
Bugfixing automation moved this from Under review to Fixed/resolved Apr 21, 2023