
[ENH] Get feature importance or coef from reduction forecaster #3849

Open
iki77 opened this issue Nov 29, 2022 · 6 comments
Labels: enhancement (Adding new functionality), feature request (New feature or request)

Comments

iki77 commented Nov 29, 2022

Is your feature request related to a problem? Please describe.
There is no way to find out the feature importances (non-linear models) or coefficients (linear models) from a reduction forecaster.

Describe the solution you'd like
Implement feature_importances_ and coef_ on the reduction forecaster, and on ForecastingPipeline when it wraps a reduction forecaster.

iki77 added the enhancement (Adding new functionality) label on Nov 29, 2022
fkiraly added the feature request (New feature or request) label on Nov 29, 2022
fkiraly (Collaborator) commented Nov 29, 2022

Have you tried the get_fitted_params method? That should produce feature importances or coefficients for the reduction forecaster.

If it does not, code that calls forecaster.get_fitted_params() after fitting the forecaster on sktime dummy data would be appreciated.
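
For reference, a minimal sketch of such a call (the dataset and regressor here are illustrative choices, not from this thread):

from sklearn.ensemble import RandomForestRegressor
from sktime.datasets import load_airline
from sktime.forecasting.compose import make_reduction

# illustrative dummy data; any sktime-compatible series would do
y = load_airline()
forecaster = make_reduction(
    RandomForestRegressor(random_state=0),
    window_length=12,
    strategy="recursive",
)
forecaster.fit(y, fh=[1, 2, 3])
# inspect which fitted parameters (e.g. feature importances) are exposed
print(forecaster.get_fitted_params())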

iki77 (Author) commented Nov 29, 2022

Thanks, I can get the feature importances, but it is not easy for a beginner:

  • Reduction forecaster with regressor as estimator:
    forecaster.get_fitted_params()["estimator"].feature_importances_
  • Reduction forecaster with regressor pipeline as estimator:
    forecaster.get_fitted_params()["estimator"].named_steps["regressor"].feature_importances_
  • Forecasting pipeline with reduction forecaster:
    forecaster_pipe.get_fitted_params()["forecaster"].get_fitted_params()["estimator"].named_steps["regressor"].feature_importances_

It would be helpful if the reduction forecaster and forecasting pipeline had feature_importances_ & coef_ attributes that function as shortcuts.
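
Until then, a small helper can hide that drilling. A rough sketch, not part of sktime, assuming the get_fitted_params layout shown above:

from sklearn.pipeline import Pipeline

def get_reduction_importances(forecaster):
    # works on a fitted reduction forecaster; returns feature_importances_
    # or coef_ of the inner sklearn estimator, unwrapping a Pipeline if needed
    est = forecaster.get_fitted_params()["estimator"]
    if isinstance(est, Pipeline):
        est = est.steps[-1][1]  # final step of the wrapped sklearn pipeline
    for attr in ("feature_importances_", "coef_"):
        if hasattr(est, attr):
            return getattr(est, attr)
    return None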

fkiraly (Collaborator) commented Nov 29, 2022

Ok, does it not show up under the key "estimator__feature_importances" if you use version 0.14.0?
In the second case, under the key "estimator__named_steps__regressor__feature_importances"?

If not, it is a bug; please report it with the full code that produces the first example.

The second one might not be solvable though, as sklearn does not have a get_fitted_params interface.

iki77 (Author) commented Nov 30, 2022

Only case 1 works; here is the code:

import pandas as pd

from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sktime.forecasting.compose import make_reduction
from sktime.forecasting.compose import ForecastingPipeline
from sktime.transformations.series.summarize import WindowSummarizer
from sktime.utils._testing.hierarchical import _make_hierarchical  # generates the dummy data below

y = _make_hierarchical(n_columns=3, random_state=42)
X = y.iloc[:, :-1]
y = y.iloc[:, -1:]

regressor = RandomForestRegressor(random_state=42)
regressor_pipe = Pipeline(
    steps=[("regressor", regressor)]
)

# CASE 1 - WORKS
forecaster_1 = make_reduction(
    regressor,
    scitype="tabular-regressor",
    transformers=[WindowSummarizer(n_jobs=1)],
    window_length=None,
    strategy="recursive",
    pooling="global",
)
forecaster_1.fit(y=y, X=X)
forecaster_1.get_fitted_params()["estimator__feature_importances"] 

# CASE 2 - DOES NOT WORK
forecaster_2 = make_reduction(
    regressor_pipe,
    scitype="tabular-regressor",
    transformers=[WindowSummarizer(n_jobs=1)],
    window_length=None,
    strategy="recursive",
    pooling="global",
)
forecaster_2.fit(y=y, X=X)
forecaster_2.get_fitted_params()["estimator__named_steps__regressor__feature_importances"] # DOES NOT EXIST

# CASE 3 - DOES NOT WORK
forecaster_pipe = ForecastingPipeline(
    steps=[
        ("exog_dynamics", WindowSummarizer(n_jobs=1, target_cols=X.columns.tolist())),
        ("forecaster", forecaster_2),
    ]
)
forecaster_pipe.fit(y=y, X=X)

fkiraly (Collaborator) commented Dec 1, 2022

Ah!

Thanks for the example.

I think this is an instance of an incomplete feature, i.e., implementing get_fitted_params for the forecasting pipelines here: item 5 on the list in
#1497

In general, the special composites with get_params overrides also need get_fitted_params overrides; ordinary composites should be covered by the base class already.

I'll have a look at it; it should not be too difficult to add given the existing default implementations.
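
For illustration, such an override might look roughly like this. A sketch only, assuming the composite keeps its fitted steps as (name, estimator) pairs in a steps_ attribute and that each step supports get_fitted_params; this is not the actual sktime implementation:

def _get_fitted_params(self):
    # collect fitted params of each fitted step under "stepname__key" keys,
    # mirroring the prefixing convention that get_params uses for nesting
    fitted_params = {}
    for name, est in self.steps_:
        for key, value in est.get_fitted_params().items():
            fitted_params[f"{name}__{key}"] = value
    return fitted_params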

fkiraly (Collaborator) commented Dec 1, 2022

Example 3 is addressed by this: #3863

Example 2 is more difficult, since fitted parameters are not passed on within sklearn, and the "underscore at the end" convention for fitted attributes is not applied consistently within sklearn itself - e.g., it's called named_steps rather than named_steps_.

To fix this, one would have to write an estimator crawler specifically for scikit-learn - it might be easier to integrate a coherent get_fitted_params interface with sktime? Here's the package for it: https://github.com/sktime/skbase ...
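
A rough sketch of what such a crawler could look like (illustrative only, not existing sktime or skbase code):

from sklearn.pipeline import Pipeline

def crawl_fitted_params(est, prefix=""):
    # recursively collect public trailing-underscore (fitted) attributes of a
    # fitted sklearn object, descending into Pipeline steps
    params = {}
    if isinstance(est, Pipeline):
        for name, step in est.named_steps.items():
            params.update(crawl_fitted_params(step, prefix=f"{prefix}{name}__"))
        return params
    for attr in dir(est):
        if attr.endswith("_") and not attr.startswith("_"):
            try:
                params[prefix + attr.rstrip("_")] = getattr(est, attr)
            except AttributeError:
                continue  # declared as a property but not set on this estimator
    return params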
