Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Add skforecast forecaster autoreg adapter #5447

Conversation

yarnabrina
Copy link
Collaborator

@yarnabrina yarnabrina commented Oct 17, 2023

Adds a new compose forecaster SkforecastForecasterAutoreg, which is comparable to make_reduction with strategy='recursive' and uses ForecasterAutoreg from skforecast package in the backend.

TODO's for future (may or may not be in scope for this PR):

  1. support for ForecasterAutoregDirect which is comparable to strategy='direct'
    1. add a strategy parameter similar to make_reduction and accordingly make the call in _create_forecaster
    2. add a new SkforecastForecasterAutoregDirect forecaster
  2. support for ForecasterAutoregMultiSeries which is comparable to a single global model for all series in panel/hierarchical dataset with series identifier is a feature itself
  3. support for ForecasterAutoregMultiVariate which is comparable to multiple global models for all series in panel/hierarchical dataset, without series identifier is a feature but different targets
  4. switch to predict_quantiles from predict_bootstrapping + numpy.quantile in _predict_quantiles (ref. support more than 2 percentiles to be passed for predict_interval JoaquinAmatRodrigo/skforecast#572)
  5. support feature importance
  6. support missing values based on regressor - how?
  7. support passing bootstrapping arguments - during init?
  8. fix test errors - MUST for this PR

@yarnabrina yarnabrina added interfacing algorithms Interfacing existing algorithms/estimators from third party packages module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting enhancement Adding new functionality labels Oct 17, 2023
@JavierEscobarOrtiz
Copy link

Hello @yarnabrina

Regarding the ForecasterAutoregMultiVariate TODO, the model will only be able to predict one series at a time, even if it is learning for all of them. So the exog variables need to be related only to the target series.

Adding more series to the training matrix causes it to grow horizontally (adding more columns). The exog variables will be related to all series. However, since you are only predicting your target series, the exogenous information needs to be related to that series.

forecaster_multivariate_train_matrix_diagram

https://skforecast.org/latest/user_guides/dependent-multi-series-multivariate-forecasting

@yarnabrina yarnabrina force-pushed the add-skforecast-ForecasterAutoreg-adapter branch from 545a30e to 107df7f Compare October 22, 2023 05:55
@fkiraly
Copy link
Collaborator

fkiraly commented Oct 25, 2023

am I interpreting the multiple issues raised in skforecast correctly that running it through the sktime test framework dredges up various bugs in skforecast that we either have to patch on the sktime side, or directly fix in skforecast?

@yarnabrina
Copy link
Collaborator Author

Yes I'm expecting a quick bugfix and enhancements in skforecast from @JavierEscobarOrtiz. These seem major enough issues to me so it's worth fixing there directly instead of a sktime patch, but I don't know how long that may take though, but I see a new branch there for v11 already.

Other than those, there may be a few changes needed in sktime as well, or some more checks to satisfy the tests. I am trying to isolate exactly what's causing those, not clear so far.

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 26, 2023

my interpretation is that there are at least two issues:

  • NaN are created somewhere in the pipeline, sklearn does not like this
  • the predict_interval returns are non-conformant, as variable names are coerced to string even if they are integer. This is likely a side effect of sklearn under the hood of skforecast.

@yarnabrina
Copy link
Collaborator Author

as variable names are coerced to string even if they are integer

This is probably not from sklearn or skforecast. I am doing this before passing to fit or predict using _coerce_column_names static method myself. Before I added this, there were a lot more failures, and then I noted that make_reduction also uses this with _coerce_col_str function.

(There can still be side effect though, I am not saying that does not exist. But I'm doing myself as well.)

NaN are created somewhere in the pipeline, sklearn does not like this

This is probably the index issue I created in skforecast. If I am not wrong, sktime adds no restriction on values of indices whatsoever as long as they are date/time or integers and montonous (please correct me), but as of now it seems skforecast needs them to start from 0 if it's not range index. @JavierEscobarOrtiz confirmed it's a bug, but in the time it's fixed in skforecast, do you have any suggestion to handle in sktime, if you faced it before for some other estimators?

@fkiraly
Copy link
Collaborator

fkiraly commented Oct 27, 2023

do you have any suggestion to handle in sktime, if you faced it before for some other estimators?

The canonical solution if an interfaced estimator returns garbage indices is to overwrite them with the expected indices - as long as we are really certain that the forecasts are what we want (bad indices can indicate bad forecasts...)

The code is templateable, for _predict it is in lines 175-178 of the forecasting_supersimple extension template.

I was considering to add sth similar for _predict_interval etc, but we don't have it.
Rows are the same, columns should use self._y_index and from_product to form the expected multi-index (alpha-s and coverage-s should retain their original order).

@yarnabrina yarnabrina force-pushed the add-skforecast-ForecasterAutoreg-adapter branch from c4ba9b7 to 0135c5b Compare December 9, 2023 05:27
@yarnabrina yarnabrina force-pushed the add-skforecast-ForecasterAutoreg-adapter branch from 0135c5b to 8a583f4 Compare February 9, 2024 05:21
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can spot two problems remaining:

  • skforecast cannot deal with Index typed index, expecting RangeIndex or data index type. This fails the input check with an uninformative message (new_index accessed before assigned)
  • requested/returned data in the probabilistic case seem non-matching

I would put both down to issues with non-contiguous fh. I would suggest to address these by requesting contiguous fh, then subsetting the result.

@yarnabrina
Copy link
Collaborator Author

I have these tags set already specifically for those reasons:

"enforce_index_type": [pandas.DatetimeIndex, pandas.RangeIndex]
"capability:insample": False
"capability:pred_int": True
"capability:pred_int:insample": False

I thought the first will deal with index issue, and the rest with probabilistic predictions. Can you please tell what do these tags mean then?


Also, I don't know what data is being passed in the test, but this works for me locally. I am predict with fh where values are not-consecutive, and probablistic predictions also appeared to be in correct format. Am I overlooking something?

>>> 
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sktime.datasets import load_longley
>>> from sktime.forecasting.compose import SkforecastAutoreg
>>> 
>>> y, X = load_longley()
>>> 
>>> y = y.reset_index(drop=True)
>>> X = X.reset_index(drop=True)
>>> 
>>> y_train = y.head(n=12)
>>> 
>>> X_train = X.head(n=12)
>>> X_test = X.tail(n=4)
>>> 
>>> forecaster = SkforecastAutoreg(RandomForestRegressor(), [2, 4])
>>> 
>>> forecaster.fit(y_train, X=X_train)
SkforecastAutoreg(lags=[2, 4], regressor=RandomForestRegressor())
>>> 
>>> forecaster.predict(fh=[1, 4], X=X_test)
12    67405.08
15    67487.88
Name: TOTEMP, dtype: float64
>>> 
>>> forecaster.predict_interval(fh=[3, 2], X=X_test, coverage=[0.9, 0.95])
      TOTEMP                              
        0.90                0.95          
       lower     upper     lower     upper
13  66817.64  66817.64  66817.64  66817.64
14  66817.64  66817.64  66817.64  66817.64
>>> 
>>> forecaster.predict_quantiles(fh=range(1, 5), X=X_test, alpha=[0.8, 0.3, 0.2, 0.7])
      TOTEMP                              
         0.8       0.3       0.2       0.7
12  67859.08  67859.08  67859.08  67859.08
13  67859.08  67859.08  67859.08  67859.08
14  67859.08  67859.08  67859.08  67859.08
15  67941.88  67941.88  67941.88  67941.88
>>> 

The predictions themselves are super suspicious though - why are these constant for a given horizon?

@fkiraly
Copy link
Collaborator

fkiraly commented Feb 9, 2024

I have these tags set already specifically for those reasons:

"enforce_index_type": [pandas.DatetimeIndex, pandas.RangeIndex]
"capability:insample": False
"capability:pred_int": True
"capability:pred_int:insample": False

I thought the first will deal with index issue, and the rest with probabilistic predictions. Can you please tell what do these tags mean then?

  • enforce_index_type: originally meant to coerce index types (and restore them in predict, this is unfortunatey not linked to any logic in the base class right now
  • capability:insample user facing for lookup - there is no logic that would raise an error
  • capability:pred_int: user facing, also controls error message if predict_interval etc are called, and tests
  • capability:pred_int:insample: user facing, also controls some tests

@fkiraly
Copy link
Collaborator

fkiraly commented Feb 9, 2024

Also, I don't know what data is being passed in the test, but this works for me locally. I am predict with fh where values are not-consecutive, and probablistic predictions also appeared to be in correct format. Am I overlooking something?

My suspicion is, the index ends up Index (with object dtype) and not MultiIndex as it should - you can get those from _get_columns if you want

@fkiraly
Copy link
Collaborator

fkiraly commented Feb 9, 2024

The predictions themselves are super suspicious though - why are these constant for a given horizon?

Though suspicion is cast on a third party, no?

@Abhay-Lejith
Copy link
Contributor

@fkiraly @yarnabrina Once I finish other tasks and want to start working on this, do I make another pr? Otherwise I think I will need push access to @yarnabrina's fork for my commits to be visible here right?

@yarnabrina
Copy link
Collaborator Author

@fkiraly @yarnabrina Once I finish other tasks and want to start working on this, do I make another pr? Otherwise I think I will need push access to @yarnabrina's fork for my commits to be visible here right?

Yes, a separate PR seems better. What you can do is similar to what @fnhirwa did with darts PR -> in a new branch of your fork, cherry pick commits from my fork, and continue thereupon.

@fkiraly
Copy link
Collaborator

fkiraly commented Jun 30, 2024

superseded by #6531

@fkiraly fkiraly closed this Jun 30, 2024
fkiraly pushed a commit that referenced this pull request Jul 2, 2024
Continuing with work done in #5447.

#### What does this implement/fix? Explain your changes.

An adapter for the skforecast `ForecasterAutoreg` .

#### Does your contribution introduce a new dependency? If yes, which
one?
skforecast
@yarnabrina yarnabrina deleted the add-skforecast-ForecasterAutoreg-adapter branch July 3, 2024 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality interfacing algorithms Interfacing existing algorithms/estimators from third party packages module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

4 participants