Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

& SzymonStolarski [BUG] fix pmdarima interfaces breaking for X containing more indices than forecasting horizon #3667

Merged
merged 19 commits into from Apr 21, 2023

Conversation

fkiraly
Copy link
Collaborator

@fkiraly fkiraly commented Oct 29, 2022

Fixes #3657.

The bug was caused by pmdarima models breaking when the X passed was strictly larger than the indices in the forecasting horizon.

The example code in #3657 has been added as a test (with minor generalization to cover more estimators).

In the future, we should probably also add test scenarios where X is strictly larger than the forecasting horizon.

Depends on:

@fkiraly fkiraly added module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting bugfix Fixes a known bug or removes unintended behavior labels Oct 29, 2022
@fkiraly fkiraly changed the title [BUG] fix pmdarima interfaces for X containing more indices than forecasting horizon [BUG] fix pmdarima and statsmodels interfaces for X containing more indices than forecasting horizon Oct 29, 2022
@fkiraly
Copy link
Collaborator Author

fkiraly commented Oct 29, 2022

strange: this test fails under 3.10 but not under 3.8

from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.utils.estimator_checks import check_estimator

check_estimator(TransformedTargetForecaster, fixtures_to_run="test_predict_time_index_with_X[TransformedTargetForecaster-1-y:2cols-fh=1-range-int-True]", return_exceptions=False)

@fkiraly fkiraly changed the title [BUG] fix pmdarima and statsmodels interfaces for X containing more indices than forecasting horizon [BUG] fix pmdarima interfaces for X containing more indices than forecasting horizon Oct 29, 2022
@fkiraly
Copy link
Collaborator Author

fkiraly commented Oct 29, 2022

this seems to be related to statsmodels, not sure what is happening there.

Reverting the change to the statsmodels adapter and moving to separate issue for investigation.

@fkiraly fkiraly changed the title [BUG] fix pmdarima interfaces for X containing more indices than forecasting horizon [BUG] pmdarima interfaces breaking for X containing more indices than forecasting horizon Oct 30, 2022
@fkiraly fkiraly changed the title [BUG] pmdarima interfaces breaking for X containing more indices than forecasting horizon [BUG] fix pmdarima interfaces breaking for X containing more indices than forecasting horizon Oct 30, 2022
@fkiraly fkiraly marked this pull request as draft October 30, 2022 12:27
@SzymonStolarski
Copy link

Hi @fkiraly,

It's been quite some time since this has been created, however I was wondering if it would be possible to go back to this PR and check the issue with the tests in order to fix #3657?

I've been also playing around with the change in the pmdarima adapter proposed in this PR and it looks like the evaluate funtion works fine. One thing, however, that I noticed is the fact that this change loses the possibility of quickly passing X assumptions without PeriodIndex in predict, e.g.:

y = load_airline()
X = pd.DataFrame({
    'exo_1': y*5,
})

forecaster = ARIMA()
forecaster.fit(y=y, X=X) # Here we are passing data the standard way with `PeriodIndex`

# Here we would like to check the predictions passing quickly exo assumptions
forecaster.predict(fh=[1,2,3], X=pd.Series([123, 213, 321]) 

Will throw an error KeyError: "None of [PeriodIndex(['1961-01', '1961-02', '1961-03'], dtype='period[M]')] are in the [index]".

I assume predict would need to check if data passed in X contains PeriodIndex and based on that decide if:

# PeriodIndex case
fh_abs = fh.to_absolute(self.cutoff).to_pandas()
if X is not None:
    X = X.loc[fh_abs]

or

# No PeriodIndex case
fh_idx = fh.to_indexer(self.cutoff)
if X is not None:
    X = X.loc[fh_idx]

@fkiraly
Copy link
Collaborator Author

fkiraly commented Apr 14, 2023

I assume predict would need to check

That might be a fix? PR with attempted fixes are appreciated - even if they fail, we learn sth.
Apologies, @SzymonStolarski, for not working on this - the "obvious" fix didn't work and failures were python version dependent.

Maybe we should have a look how this behaves after the pandas 2 compatibility upgrade.

Also FYI @yarnabrina, who has recently been working on the _StatsModelsAdapter, perhaps he can spot sth. Return objects are highly idiosyncratic and not subject to a unified interface in statsmodels (or pmdarima which depends heavily on statsmodels).

@fkiraly fkiraly marked this pull request as ready for review April 16, 2023 14:55
fkiraly added a commit that referenced this pull request Apr 21, 2023
This PR adds two arguments to `get_slice` to allow exclusive/inclusive
bounding at either and of the slice.

The default of the two new arguments leads to current behaviour, so no
deprecation is necessary.

This will be useful as a utility, and also for fixing
#3667
@fkiraly fkiraly merged commit 41c1baa into main Apr 21, 2023
22 checks passed
@fkiraly fkiraly deleted the evaluate-bigger-X branch April 21, 2023 20:40
@fkiraly fkiraly changed the title [BUG] fix pmdarima interfaces breaking for X containing more indices than forecasting horizon & SzymonStolarski [BUG] fix pmdarima interfaces breaking for X containing more indices than forecasting horizon Apr 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Fixes a known bug or removes unintended behavior module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] pmdarima estimators break when X contains more indices than the forecasting horizon
2 participants