[ENH] Access the actual features (including lagged y) used as input for the model inside a reduction forecaster #5644

hliebert · 2023-12-19T14:36:02Z

I would like to access the actual X features used as input for the model inside a reducer. It appears these are not currently stored (unless I've missed something).

Would it be possible to store them (or provide a method to recreate them)? One prominent use case is computing Shapley values, which requires the actual feature values.

A simple example is given below. _y and _X are stored for the RecursiveTabularRegressionForecaster, but not the actual input passed to the nested estimator.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sktime.datasets import load_longley
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import make_reduction

y, X = load_longley()
horizon = ForecastingHorizon(np.arange(1, 4), is_relative=True)

random_forest = RandomForestRegressor()
reduction_forecaster = make_reduction(
    random_forest, window_length=5, strategy="recursive"
)

reduction_forecaster.fit(y, X, fh=horizon)
reduction_forecaster.predict(fh=horizon, X=X)

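# Only the raw training data is stored on the forecaster,
# not the tabular features passed to random_forest.
reduction_forecaster._X
reduction_forecaster._y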
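For illustration, the kind of tabular input a window-based reducer hands to the nested estimator can be rebuilt by hand. The sketch below is only an assumption about the layout: `make_lagged_table` is a hypothetical helper, not part of sktime, and the column order and the alignment of the exogenous rows with the target may differ from what the reducer does internally.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_lagged_table(y, window_length, X=None):
    """Hypothetical helper: tabulate a series into (features, target) pairs.

    Row i holds y[i : i + window_length] as features and
    y[i + window_length] as the target. If a 2D X is given, the exogenous
    row aligned with the target is appended to the features.
    This is an illustration, not sktime's internal layout.
    """
    y = np.asarray(y, dtype=float)
    # All windows of length window_length; the last one has no target, so drop it.
    windows = sliding_window_view(y, window_length)[:-1]
    target = y[window_length:]
    if X is not None:
        X = np.asarray(X, dtype=float)
        windows = np.hstack([windows, X[window_length:]])
    return windows, target


features, target = make_lagged_table(np.arange(8.0), window_length=5)
print(features)
print(target)
```

A matrix like `features` is what an explainer would need alongside the fitted nested estimator.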


yarnabrina · 2023-12-19T15:01:16Z

Just out of curiosity, may I ask how you are planning to use Shapley values here?

For example, strategy="direct" will use multiple models, and with strategy="recursive" the "actuals" will vary across forecast horizons, if I understand correctly: fh=1 will use only the last window_length training observations, fh=2 will use the last window_length - 1 training observations plus the prediction for fh=1, and so on.
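The recursive case described here can be made concrete with a small sketch. It is a plain NumPy illustration under simplifying assumptions (no exogenous X, a toy one-step "model"), and `recursive_inputs_by_horizon` is a hypothetical helper rather than anything sktime provides.

```python
import numpy as np


def recursive_inputs_by_horizon(y_train, predict_one, window_length, n_steps):
    """Hypothetical helper: collect the feature row used at each horizon.

    predict_one maps a 1D window of length window_length to one forecast.
    Returns a dict {horizon: feature row} and the list of forecasts.
    """
    window = np.asarray(y_train, dtype=float)[-window_length:]
    inputs, forecasts = {}, []
    for h in range(1, n_steps + 1):
        inputs[h] = window.copy()
        y_hat = float(predict_one(window))
        forecasts.append(y_hat)
        # Slide the window: drop the oldest value, append the new forecast.
        window = np.append(window[1:], y_hat)
    return inputs, forecasts


# Toy "model": forecast the mean of the current window.
inputs, forecasts = recursive_inputs_by_horizon(
    y_train=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    predict_one=np.mean,
    window_length=3,
    n_steps=3,
)
for h, row in inputs.items():
    print(h, row)
```

Only the row for horizon 1 consists purely of observed values; every later row mixes observations with earlier forecasts, which is why the choice of "actuals" is ambiguous.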

So, what would you pass as the "actual" values to shap?

(As said, this is not related to your issue; I'm asking out of curiosity, as I am also planning to have some explainability component for my office work.)

hliebert · 2023-12-19T15:15:19Z

In my current application I care mostly about the first few horizons (1-3); later horizons are increasingly less relevant. For now I'm planning to simply look at the values for the first few horizons separately.

I'm aware of the issues with features varying by horizon in the recursive case, and I'm not sure what the best solution is. Maybe offering a method to recreate the input for a given horizon is easier than storing it?

fkiraly · 2023-12-22T14:43:52Z

Related: this request by @yarnabrina to access the internal data in a two-step exogenous forecast:
#5598

I wonder whether there should be a programmatic way to access internal preprocessed data.

fkiraly · 2023-12-22T14:45:26Z

From a design perspective, we could "dump" the formatted data in a _X-like argument, although we should be cautious as this can blow up the pickle size etc.

A "nicer" way would be to also allow the forecaster to act as a transformer, which is now possible with the object_type tag that can have multiple types. This was introduced to allow polymorphism for the graphical pipeline, FYI @benHeid.

There used to be a transform method, so I wonder whether this can simply be reactivated.

fkiraly · 2023-12-22T14:46:25Z

On a related note, there is a transformer (not much used afaik) which also addresses the issue, the ReducerTransform, which could be used for Shapley values.

hliebert · 2023-12-29T12:31:27Z

> From a design perspective, we could "dump" the formatted data in a _X-like argument, although we should be cautious as this can blow up the pickle size etc.
>
> A "nicer" way would be to also allow the forecaster to act as a transformer, which is now possible with the object_type tag that can have multiple types. This was introduced to allow polymorphism for the graphical pipeline, FYI @benHeid.
>
> There used to be a transform method, so I wonder whether this can simply be reactivated.

I'd be fine with either, although I'm not sure how "dumping" would look like with the recursive reducer.

hliebert · 2023-12-29T12:45:25Z

> On a related note, there is a transformer (not much used afaik) which also addresses the issue, the ReducerTransform, which could be used for Shapley values.

Thanks for pointing this out, I'll have a look. Does this mean the API reference on the homepage is incomplete? I've looked through the list of transformers, and this one isn't listed.

fkiraly · 2024-01-02T12:44:43Z

> Does this mean the API reference on the homepage is incomplete

Yes, sorry.

The issue is, we are not sure how to test "estimator is not present on API reference page".
It should be picked up by the all_estimators utility though, since that is programmatic and crawls the package.

I've added it, along with the new direct reducer prototype, to the API reference:
#5690
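The programmatic lookup mentioned above can be sketched as follows. `is_registered` is a hypothetical convenience wrapper; it assumes that `all_estimators`, called with default arguments, returns a list of (name, class) tuples.

```python
def is_registered(name, estimator_type):
    """Return True if all_estimators finds an estimator called `name`."""
    # Imported lazily so the helper can be defined without sktime installed.
    from sktime.registry import all_estimators

    # With default arguments, the result is a list of (name, class) tuples.
    found = all_estimators(estimator_type)
    return any(est_name == name for est_name, _ in found)
```

Calling `is_registered("ReducerTransform", "transformer")` would then check whether the transformer is discoverable, independently of whether it is listed on the API reference page.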

fkiraly · 2024-01-02T12:45:17Z

> although I'm not sure how "dumping" would look like with the recursive reducer.

I think that is precisely @yarnabrina's question, i.e., what should it even do.

hliebert · 2024-01-04T18:37:02Z

> > although I'm not sure how "dumping" would look like with the recursive reducer.
>
> I think that is precisely @yarnabrina's question, i.e., what should it even do.

Maybe just dump _X in a dictionary by horizon? Or provide a method that takes horizon as argument and returns X after fitting.

Thanks for updating the docs!

hliebert added the enhancement Adding new functionality label Dec 19, 2023

fkiraly added the module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting label Dec 19, 2023

fkiraly added this to rework in reduction forecasters issues Feb 20, 2024
