
[MRG] Adds Permutation Importance #13146

Merged
merged 98 commits into from Jul 17, 2019

Conversation

@thomasjpfan (Member) commented Feb 12, 2019

Reference Issues/PRs

Resolves #11187

What does this implement/fix? Explain your changes.

Adds permutation importance to a model_inspection module.

TODO

  • Initial implementation.
  • Add example demonstrating the differences between permutation importance and feature_importances_ when using trees.
  • Add to user guide.
  • Support pandas dataframes.
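The technique being added is small enough to sketch in plain NumPy. The function and variable names below are illustrative only, not the module's actual API: for each feature, shuffle that column and record how much the score drops relative to the baseline.

```python
import numpy as np

def permutation_importance_sketch(predict, score, X, y, n_repeats=5, seed=0):
    """Minimal sketch of permutation importance: shuffle one column at a
    time and measure the resulting drop in the score."""
    rng = np.random.RandomState(seed)
    baseline = score(y, predict(X))
    importances = np.empty((X.shape[1], n_repeats))
    for j in range(X.shape[1]):
        for r in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])   # break the feature/target relationship
            importances[j, r] = baseline - score(y, predict(X_perm))
    return importances

# Toy "model" that only uses feature 0.
X = np.random.RandomState(42).normal(size=(200, 3))
y = 3 * X[:, 0]
imp = permutation_importance_sketch(
    predict=lambda X: 3 * X[:, 0],
    score=lambda y_true, y_pred: -np.mean((y_true - y_pred) ** 2),  # neg. MSE
    X=X, y=y)
# Feature 0 gets a large positive importance; features 1 and 2 get exactly 0,
# because the predictor never reads them.
```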
thomasjpfan added 3 commits Feb 8, 2019
thomasjpfan added 4 commits Feb 12, 2019
@jnothman (Member) left a comment

Do we want to provide a meta-estimator giving feature_importances_ for use where that's expected?

Please also consider looking at eli5 for feature parity, and perhaps for testing ideas.

thomasjpfan added 2 commits Feb 13, 2019
@jnothman (Member) left a comment

Hmmmm... By conducting cross validation over multiple splits, this determines feature importance for a class of model, rather than a specific model. If we are trying to inspect a specific model, surely we should not be fitting cv-many different models, but merely assessing the importance of features to prediction accuracy for the given model.

for column in columns:
    with _permute_column(X_test, column, random_state) as X_perm:
        feature_score = scoring(estimator, X_perm, y_test)
        permutation_importance_scores.append(baseline_score - feature_score)

@jnothman (Member), Feb 13, 2019:

What does it mean when this value is negative? Do we need to clip in that case??

@thomasjpfan (Author, Member), Feb 14, 2019:

Negative means that the model performed better with the feature permuted. This could mean that the feature should be dropped.

There is a paragraph about this in https://explained.ai/rf-importance/index.html at Figure 3(a)
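The sign flip is easy to reproduce by hand. In this deliberately constructed toy (all names illustrative), the model has learned the wrong sign for its only feature, so one particular permutation of that column actually improves the score and the importance comes out negative:

```python
import numpy as np

x0 = np.array([1.0, 2.0, 3.0, 4.0])
y = -x0                               # true target
predict = lambda X: X[:, 0]           # model wrongly predicts +x0

X = x0[:, None]
neg_mse = lambda y_true, y_pred: -np.mean((y_true - y_pred) ** 2)

baseline = neg_mse(y, predict(X))     # -30.0
X_perm = X[::-1].copy()               # one particular permutation of the column
permuted = neg_mse(y, predict(X_perm))  # -25.0: better than the baseline!

importance = baseline - permuted      # -5.0, i.e. negative
```

Clipping such values to zero would hide exactly this signal, which is why the discussion above leans toward reporting them as-is.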

@ogrisel (Member), Feb 18, 2019:

Interesting. I think both the docstring and the user guide should explain the meaning of negative importance.

@thomasjpfan (Author, Member) commented Feb 14, 2019

Hmmmm... By conducting cross validation over multiple splits, this determines feature importance for a class of model, rather than a specific model.

This is correct. I will add a prefit option to inspect a specific model (turning off the cross validation).

The CV mode isn't inspecting the model; it is using multiple models to find the importance of the features. It is "inspecting the data". If the scope of the inspection module covers both data and model inspection, then this CV feature could be kept in.

@jnothman (Member) commented Feb 14, 2019

@ogrisel (Member) commented Feb 18, 2019

+1 for focusing first on a tool used for the single (fitted) model inspection use case. Here are alternative implementations:

Then we could think of a tool for automated feature selection using a nested cross-validation loop that can be used in Pipeline as the SelectFromModel does. However, to me, it's less of a priority.

@ogrisel (Member) left a comment

Because it's so cheap to resample the individual predictions (on the permuted validation set), we should take advantage of this to recompute the mean score on many resampled predictions (bootstrap estimates of the importance). I think it's very important that the default behavior of this tool makes it natural to get bootstrap confidence intervals on the feature importance (e.g. a 2.5%-97.5% percentile interval in addition to the median importance across resampled importances).

Also, the feature importance plot in the example should use horizontal box/whisker plots to highlight the uncertainty of these feature importance estimates:

https://matplotlib.org/gallery/pyplots/boxplot_demo_pyplot.html#sphx-glr-gallery-pyplots-boxplot-demo-pyplot-py
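The interval computation being proposed is a one-liner; a sketch with purely illustrative numbers standing in for the per-repeat importance estimates:

```python
import numpy as np

rng = np.random.RandomState(0)
# Pretend these are the importance estimates for one feature across
# many permutation/bootstrap repeats.
scores = rng.normal(loc=0.3, scale=0.05, size=1000)

low, median, high = np.percentile(scores, [2.5, 50.0, 97.5])
# Flag the feature as "not predictive (given the others)" when 0 falls
# inside the [low, high] interval.
predictive = not (low <= 0.0 <= high)
```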

@ogrisel (Member) commented Feb 18, 2019

We could even lower the opacity of feature boxplots where 0 is inside the 2.5%-97.5% range to highlight that those features are not predictive (given the others).

@ogrisel (Member) commented Feb 18, 2019

Here are other interesting references that I have not carefully read yet:

@thomasjpfan (Author, Member) commented Feb 18, 2019

@ogrisel Thank you for all the suggestions! I will focus this PR on inspecting a single fitted model and tune the API to make it easy to get bootstrap results.

thomasjpfan added 3 commits Feb 19, 2019
@jnothman (Member) commented Feb 19, 2019

@thomasjpfan (Author, Member) commented Feb 20, 2019

It’s a prefix I use to mean “REFACTOR”.

@jnothman (Member) commented Feb 20, 2019

scores : array, shape (n_features, bootstrap_samples)
    Permutation importance scores.
"""

@amueller (Member), Mar 5, 2019:

Needs a reference - and a user guide!

thomasjpfan added 2 commits Mar 14, 2019
@amueller (Member) commented Jul 9, 2019

I like returning bunches, though it means we need to support both __getattr__ and __getitem__ during any deprecation cycle if we decide to change it. Though I guess that shouldn't be that big an issue.

What I like about cross_validate and grid search resulting in dicts is that they can easily be turned into dataframes. I'm not sure that makes sense here.

Previously we had been very careful not to add classes, even though things like cv_results_ could benefit from more logic. The plotting infrastructure that @thomasjpfan is working on would add lots of classes so if we go that route (which I hope) this might be a bit of a change of policy?

thomasjpfan added 2 commits Jul 9, 2019
@amueller (Member) left a comment

Can you please check my and Guillaume's suggestions and address the remaining comments? I'd really like to merge this.

Permutation feature importance is a model inspection technique that can be used
for any `fitted` `estimator` when the data is rectangular. This is especially
useful for non-linear or opaque `estimators`. The permutation feature
importance is defined to be the decrease in a model score when the feature

@amueller (Member), Jul 16, 2019:
Suggested change:
- importance is defined to be the decrease in a model score when the feature
+ importance is defined to be the decrease in a model score when a single feature
useful for non-linear or opaque `estimators`. The permutation feature
importance is defined to be the decrease in a model score when the feature
value is randomly shuffled [1]_. This procedure breaks the relationship between
the feature and the target, thus the drop in the model score is analogous to

@amueller (Member), Jul 16, 2019:
Suggested change:
- the feature and the target, thus the drop in the model score is analogous to
+ the feature and the target, thus the drop in the model score is indicative of
always important to evaluate the predictive power of a model using a held-out
set (or better with cross-validation) prior to computing importances.

Relation to feature importance in trees

@amueller (Member), Jul 16, 2019:
Suggested change:
- Relation to feature importance in trees
+ Relation to impurity-based importance in trees
---------------------------------------

Tree based models provides a different measure of own feature importances based
on the mean decrease in the splitting criterion. This gives importance to

@amueller (Member), Jul 16, 2019:
Suggested change:
- on the mean decrease in the splitting criterion. This gives importance to
+ on the mean decrease in impurity (MDI, impurity meaning the splitting criterion). This gives importance to
on the mean decrease in the splitting criterion. This gives importance to
features that may not be predictive on unseen data. The permutation feature
importance avoids this issue, since it can be applied to unseen data.
Furthermore, the tree importance computed based on the impurity decrease of

@amueller (Member), Jul 16, 2019:
Suggested change:
- Furthermore, the tree importance computed based on the impurity decrease of
+ Furthermore, impurity-based feature importance for trees

(need some more rewrite in the next line)

feature_names = []
for col, cats in zip(categorical_columns, ohe.categories_):
    for cat in cats:
        feature_names.append("{}_{}".format(col, cat))

@amueller (Member), Jul 16, 2019:

Not addressed?

In this example, we compute the permutation importance on the Wisconsin
breast cancer dataset using :func:`~sklearn.inspection.permutation_importance`.
The :class:`~sklearn.ensemble.RandomForestClassifier` can easily get about 97%
accuracy on a test dataset with a unsurprising tree impurity based feature

@amueller (Member), Jul 16, 2019:

I don't get this sentence. What's unsurprising? Maybe just remove this part?

plt.show()

##############################################################################
# Next, we pick a threshold to group our features into clusters and choose a

@amueller (Member), Jul 16, 2019:
Suggested change:
- # Next, we pick a threshold to group our features into clusters and choose a
+ # Next, we manually pick a threshold by visual inspection of the dendrogram to group our features into clusters and choose a
X /= X_std

lr = LinearRegression().fit(X, y)
expected_importances = 2 * lr.coef_**2

@amueller (Member), Jul 16, 2019:

Please add a comment that this can be computed in closed form
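The closed form referenced here can be checked numerically. For a linear model with standardized, independent features, shuffling feature j replaces x_j with an independent copy x'_j, so the extra squared error is E[(c_j * (x_j - x'_j))**2] = 2 * c_j**2 * Var(x_j) = 2 * c_j**2. A sketch of that check, with an arbitrary illustrative coefficient:

```python
import numpy as np

rng = np.random.RandomState(0)
n = 100_000
x = rng.normal(size=n)    # a standardized feature (variance ~ 1)
c = 1.7                   # an arbitrary linear coefficient

# Permuting x is equivalent to drawing an independent copy of it, so the
# MSE increase is E[(c*(x - x'))**2] = 2 * c**2 * Var(x) = 2 * c**2 here.
mse_increase = np.mean((c * (x - rng.permutation(x))) ** 2)
```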

def test_permutation_importance_linear_regresssion():
    X, y = make_regression(n_samples=500, n_features=10, random_state=0)

    y -= y.mean()

@amueller (Member), Jul 16, 2019:

    X = scale(X)
    y = scale(y)
thomasjpfan added 3 commits Jul 16, 2019
@amueller (Member) left a comment

lgtm

@amueller (Member) commented Jul 17, 2019

my browser is working great for me these days..

@amueller (Member) commented Jul 17, 2019

I think there were only nitpicks after @ogrisel's approval, so merging.

@amueller amueller merged commit d1c52f4 into scikit-learn:master Jul 17, 2019
17 checks passed
- LGTM analysis: C/C++: no code changes detected
- LGTM analysis: JavaScript: no code changes detected
- LGTM analysis: Python: 1 new alert
- ci/circleci: deploy: passed
- ci/circleci: doc: passed
- ci/circleci: doc-min-dependencies: passed
- ci/circleci: lint: passed
- codecov/patch: 100% of diff hit (target 96.85%)
- codecov/project: absolute coverage decreased by 0.19% but relative coverage increased by 3.14% compared to c0c5313
- scikit-learn.scikit-learn Build #20190716.43 succeeded
- scikit-learn.scikit-learn (Linux py35_conda_openblas) succeeded
- scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) succeeded
- scikit-learn.scikit-learn (Linux pylatest_conda_mkl_pandas) succeeded
- scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) succeeded
- scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) succeeded
- scikit-learn.scikit-learn (Windows py37_conda_mkl) succeeded
- scikit-learn.scikit-learn (macOS pylatest_conda_mkl) succeeded
@parrt commented Jul 17, 2019

Hooray! Great work guys! :) @jph00, check it out.

@jnothman (Member) commented Jul 23, 2019

The things that happen while you're on the ski slopes. Congrats, @thomasjpfan!

@Akanxa01 commented Jul 26, 2019

Hi Everyone,

Thanks for improving the usability of feature selection through ML. I have been trying to use `from sklearn.inspection import permutation_importance`, but it throws an error: `ImportError: cannot import name 'permutation_importance'`.
What am I missing?
Any help would be really appreciated!
PS: Python version: 3.7.3 and sklearn version: 0.21.2

@jnothman (Member) commented Jul 26, 2019

@Akanxa01 commented Jul 26, 2019

Thanks for your response @jnothman, I am planning to use it for a critical project. Is it safe to use the nightly-build yet and has it been tested for all the bugs? If not, I'll wait to use it for my next project.

@NicolasHug (Contributor) commented Jul 26, 2019

has it been tested for all the bugs

It has been tested for all the bugs... that we encountered so far.

After a major version release, users may find edge-case bugs that we couldn't catch.

@jnothman (Member) commented Jul 30, 2019

This doesn't have a what's new entry!!

@jnothman (Member) commented Jul 30, 2019

Added what's new in 9a6f05e

@jnothman (Member) commented Jul 30, 2019

Feel free to tweak it.

jnothman added a commit that referenced this pull request Jul 30, 2019
@kool7d commented Aug 22, 2019

This permutation importance is giving me only zeroes no matter how I choose the settings. Everything else works fine, including the default importance.

@amueller (Member) commented Aug 22, 2019

@kool7d please open an issue with code to reproduce the issue. It's likely that you have strongly correlated or uninformative features. Saying that "default importances work fine" means that they don't detect the issue.

@iba3 commented Nov 1, 2019

Do the X and y arguments of this function take into account the transformations done within a pipeline when a Pipeline is passed as the estimator?
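For later readers: yes, assuming the API as released in scikit-learn 0.22. The function permutes columns of the raw X and then calls the estimator on the permuted data, so a fitted Pipeline re-applies its transformations each time. A sketch with an illustrative dataset and pipeline:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 2))
y = 5 * X[:, 0] + rng.normal(scale=0.1, size=300)

# Fit the whole pipeline, then pass it as the estimator: every scoring call
# on permuted data runs through the fitted StandardScaler first.
pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)
result = permutation_importance(pipe, X, y, n_repeats=5, random_state=0)
# result.importances_mean is indexed by the *raw* input columns of X,
# not by the transformed feature space.
```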
