
Add permutation based feature importance? #11187

Open
amueller opened this Issue Jun 2, 2018 · 17 comments

amueller (Member) commented Jun 2, 2018

I think adding permutation based feature importances would be cool:
https://link.springer.com/article/10.1186%2F1471-2105-8-25

There is a python package that does it for our random forests with bagging:
https://github.com/parrt/random-forest-importances

But I'd rather see a generic permutation-based importance score with cross-validation or a hold-out set. I think this would be a great analysis tool.

One easy way to implement it would be to provide a function, for plotting etc. But ideally we'd be able to use it in feature selection, I think, so we'd need to create a meta-estimator PermutationImportanceCV that only provides feature_importances_, so it can be wrapped with SelectFromModel?
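To make the idea concrete, here is a minimal sketch of the generic permutation scheme on a held-out set. The function name and signature are illustrative only, not a proposed API; it assumes any fitted scikit-learn-style estimator with a `score` method.

```python
import numpy as np

def permutation_importance_holdout(estimator, X_test, y_test,
                                   n_repeats=5, random_state=0):
    """Mean drop in score when each column of X_test is shuffled."""
    rng = np.random.RandomState(random_state)
    baseline = estimator.score(X_test, y_test)
    importances = np.zeros(X_test.shape[1])
    for j in range(X_test.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            rng.shuffle(X_perm[:, j])  # permute column j, leave the rest intact
            scores.append(estimator.score(X_perm, y_test))
        importances[j] = baseline - np.mean(scores)
    return importances
```

Because the model is never refit, this works with any already-trained estimator; averaging over `n_repeats` shuffles reduces the variance of the estimate.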

amueller (Member Author) commented Jun 2, 2018

Also see this blog post:
http://parrt.cs.usfca.edu/doc/rf-importance/index.html

I'm not suggesting we change our RF feature importances, but having a more expensive but possibly higher quality alternative would be great.

glemaitre (Contributor) commented Jun 4, 2018

> I'm not suggesting we change our RF feature importances, but having a more expensive but possibly higher quality alternative would be great.

It was something we thought about with @ogrisel.

Moreover, the implementation described in the blog post is model-agnostic, if I remember right.

amueller (Member Author) commented Jun 4, 2018

Their implementation is not model agnostic, but the general algorithm is. And I'd be happy to upstream from eli5 given that this seems a relatively natural and accepted method.

VineethKanaparthi commented Jun 16, 2018

I am waiting for this method to be included. We need reliable results more than anything, even if it's not fast; we'd rather be slow than wrong. I implemented it myself for my model (for a competition).

datajanko (Contributor) commented Jul 20, 2018

I think PermutationImportanceCV is a good approach. I also see the advantage of a uniform interface to all functions. The question is, wouldn't a keyword argument on the random forest classes suffice, to select a different feature importance algorithm? We also provide multiple impurity measures, so why not maintain multiple importance measures?

datajanko (Contributor) commented Aug 1, 2018

The rfpimp package allows treating multiple features as one. I think it would make sense to include this as well (maybe later). What are your thoughts on that? I think it would require changes in SelectFromModel. But this also immediately raises the question of whether one could be more flexible when splitting features in trees: by defining some kind of feature group, one could enforce that e.g. latitude and longitude always appear together in splits.

datajanko (Contributor) commented Aug 3, 2018

So instead of permuting, I tried an additional approach presented in the articles: dropping columns.

To show some code: The FeatureImportanceCV could roughly look like this:

```python
import numpy as np
from sklearn.base import BaseEstimator, MetaEstimatorMixin, clone
from sklearn.model_selection import KFold

class FeatureImportanceCV(BaseEstimator, MetaEstimatorMixin):

    def __init__(self, estimator, cv=None):
        self.estimator = estimator
        self.cv = cv

    def fit(self, X, y):
        cv = self.cv if self.cv is not None else KFold(3)
        fold_importances = np.zeros((cv.get_n_splits(), X.shape[1]))
        for i, (train, test) in enumerate(cv.split(X, y)):
            # baseline: score with all features present
            baseline = clone(self.estimator).fit(X[train], y[train]).score(X[test], y[test])
            # score after dropping each feature in turn
            dropped = [clone(self.estimator)
                       .fit(np.delete(X[train], j, axis=1), y[train])
                       .score(np.delete(X[test], j, axis=1), y[test])
                       for j in range(X.shape[1])]
            fold_importances[i, :] = baseline - np.asarray(dropped)
        self.feature_importances_ = fold_importances.mean(axis=0)
        return self
```

Would this be the way to go? Of course the list comprehension should be replaced by something less dense. Permuting can be integrated here easily as well (except for maybe introducing more .copy() calls).

What should the class be capable of? Or am I doing something completely wrong, just from a first look at it?

I was able to run this on my machine with:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=10, n_features=5, n_clusters_per_class=1, n_informative=3)
ficv = FeatureImportanceCV(LogisticRegression())
ficv.fit(X, y)
ficv.feature_importances_
# array([0.38888889, 0.        , 0.19444444, 0.        , 0.11111111])
```

Additionally, the call to SelectFromModel works as well:

```python
from sklearn.feature_selection import SelectFromModel

sfm = SelectFromModel(ficv).fit(X, y)
sfm.transform(X)
```

which chooses a subset of the data:

```
array([[-0.58617384, -1.30229985],
       [-0.52552405, -1.07458527],
       [ 0.94055998, -0.91529262],
       [-1.04839412, -0.2912939 ],
       [ 0.22273055, -1.36454126],
       [ 1.32253391, -1.68018815],
       [-0.13443558, -0.66608385],
       [ 0.29920044,  0.44704055],
       [ 1.94086301, -3.04870667],
       [ 0.4591779 ,  0.22705557]])
```
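A rough sketch of the permutation variant mentioned above, fitting only once per fold and re-scoring with each test-fold column shuffled; the function name and defaults are illustrative, not a proposed API:

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold

def permutation_fold_importances(estimator, X, y, cv=None, random_state=0):
    """Fit once per fold; score drop when each test-fold column is shuffled."""
    rng = np.random.RandomState(random_state)
    if cv is None:
        cv = KFold(3)
    imps = np.zeros((cv.get_n_splits(), X.shape[1]))
    for i, (train, test) in enumerate(cv.split(X, y)):
        est = clone(estimator).fit(X[train], y[train])  # single fit per fold
        baseline = est.score(X[test], y[test])
        for j in range(X.shape[1]):
            X_perm = X[test].copy()
            rng.shuffle(X_perm[:, j])  # disturb only column j
            imps[i, j] = baseline - est.score(X_perm, y[test])
    return imps.mean(axis=0)
```

Unlike the drop-column version, this trains one model per fold rather than one per fold per feature, which is where the cost difference discussed below comes from.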
parrt commented Sep 2, 2018

I do a drop-column thing too in rfpimp. It gives slightly different results but takes longer due to all the retraining.

datajanko (Contributor) commented Sep 3, 2018

Compared to what? I think doing the random permutations and re-fitting should have the same complexity. You should only see performance gains when computing the feature importance out-of-bag. But that is what you meant, right?

parrt commented Sep 3, 2018

Well, isn't it the case that training a model is vastly more expensive than using the model? It is always that way in my experience, except for kNN. If you have 50 features you have to retrain the model 50 times for drop col importance, which could take overnight to train even once. Permuting and running predictions should be many times faster.

datajanko (Contributor) commented Sep 3, 2018

Of course it is, and I'd really like to see the feature directly in sklearn for random forests. However, this feature importance approach might also be useful for other classifiers. Using for example dask (and dask-ml), one might be able to scale this quite easily.

parrt commented Sep 3, 2018

I was just referring to your comment on complexity. Training complexity is very different from the complexity of running samples through, but now you are agreeing the cost is higher? Just turning on a few extra cores won't change the complexity. Whatever, drop-column feature importance is still useful, which is why I implemented it as well. Yep, drop-column and permutation importance work with any model.

datajanko (Contributor) commented Sep 3, 2018

I think there is a slight misunderstanding: instead of dropping a column, you could also just permute it and train again with the permuted column. This has almost the same complexity as just dropping the column. I understood your package to also provide that option (but maybe I just missed a small detail in your really nice remarks concerning the feature importances).

However, one could of course also just sample a specific proportion of the data and score before and after permuting the columns on that sample (one might even want to fit on a smaller set, but if I recall correctly, those ideas are already mentioned in the details of your package).
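The subsample idea could be sketched as follows, for an already-fitted estimator and with no retraining; the name, signature, and defaults are illustrative only:

```python
import numpy as np

def subsample_permutation_drop(estimator, X, y, feature, frac=0.5, random_state=0):
    """Score a random subsample before and after permuting one column."""
    rng = np.random.RandomState(random_state)
    idx = rng.choice(len(X), size=max(1, int(frac * len(X))), replace=False)
    X_sub, y_sub = X[idx].copy(), y[idx]
    before = estimator.score(X_sub, y_sub)
    rng.shuffle(X_sub[:, feature])  # permute the chosen column in place
    after = estimator.score(X_sub, y_sub)
    return before - after
```

Scoring on a fraction of the data trades some variance in the estimate for a proportional speed-up, which matters when prediction itself is expensive.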

parrt commented Sep 3, 2018

Ok, I think we are in agreement! :)
