Feature selection based on permutation importance #15075

Open
qinhanmin2014 opened this issue Sep 24, 2019 · 8 comments

Comments

@qinhanmin2014
Member

We now support permutation importance, so it may be reasonable to support a feature selection method based on it (e.g., removing features whose permutation importance is < 0).
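A minimal sketch of the proposed rule, assuming scikit-learn >= 0.22 (where `sklearn.inspection.permutation_importance` is available). The helper name `select_by_permutation_importance`, the 25% validation split and the synthetic data are illustrative assumptions, not an existing API. It fits the model on a training split, computes permutation importance on a held-out split, and keeps the features whose mean importance is above the threshold.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def select_by_permutation_importance(estimator, X, y, threshold=0.0,
                                     n_repeats=10, random_state=0):
    """Hypothetical helper: return a boolean feature mask and the importances."""
    # Fit on one split, measure importance on data the model has not seen.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.25, random_state=random_state)
    estimator.fit(X_train, y_train)
    result = permutation_importance(estimator, X_val, y_val,
                                    n_repeats=n_repeats,
                                    random_state=random_state)
    # Keep features whose mean importance is strictly above the threshold.
    mask = result.importances_mean > threshold
    return mask, result.importances_mean


# Synthetic data: only features 0 and 1 carry signal.
rng = np.random.RandomState(0)
X = rng.normal(size=(400, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.normal(size=400)

mask, importances = select_by_permutation_importance(LinearRegression(), X, y)
X_selected = X[:, mask]
print(mask, importances.round(3))
```

A strict `>` drops features the model ignores entirely (importance exactly 0) as well; use `>=` to follow the "remove features whose importance < 0" rule literally.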

@amueller
Member

Yes, I think so. There is discussion about this in #9606.
I'm not aware of any literature on this, though?

@qinhanmin2014
Member Author

Ah, I saw @thomasjpfan's comment in #9606.

> I'm not aware of any literature on this, though?

No, but
(1) It's straightforward and reasonable IMO
(2) eli5 discusses it in their documentation: https://eli5.readthedocs.io/en/latest/blackbox/permutation_importance.html#feature-selection
They reuse SelectFromModel by introducing a PermutationImportance class that stores the permutation importances as feature_importances_. I think this implementation is awkward and we can do better.
(3) It's not difficult to find related posts on Kaggle, e.g., https://www.kaggle.com/c/ieee-fraud-detection/discussion/108575#latest-636647

@cailurus

cailurus commented Oct 2, 2019

Feature selection by permutation importance is common in Kaggle and other tabular-data competitions. This
method has been introduced on 《ESL 2nd》page 593. https://web.stanford.edu/~hastie/Papers/ESLII.pdf

@qinhanmin2014
Member Author

> This method has been introduced on 《ESL 2nd》page 593.

I don't think ESL mentions feature selection based on permutation importance. It only mentions permutation importance for random forests (computed on the out-of-bag samples), and that's actually different from the permutation importance we've implemented.

@qinhanmin2014
Member Author

Maybe we can go ahead and discuss how to implement it.
Thomas prefers to integrate it into SelectFromModel, but I think that will be difficult, because we sometimes fit the model on the training set and calculate the permutation importance on the validation set.
Perhaps we can introduce a new class (e.g., SelectFromPermutationImportance, or a shorter name).
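One possible shape for such a class, sketched under the assumption that it follows the usual scikit-learn selector pattern (`SelectorMixin` plus `BaseEstimator`, implementing `_get_support_mask`). `SelectFromPermutationImportance` and its parameters (`threshold`, `validation_fraction`, `n_repeats`, `scoring`) are hypothetical and not part of scikit-learn. To keep the standard `fit(X, y)` signature, it splits off its own validation set internally, so the model is fit on one part and the importances are measured on the other.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.feature_selection import SelectorMixin
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


class SelectFromPermutationImportance(SelectorMixin, BaseEstimator):
    """Hypothetical selector (NOT a scikit-learn class)."""

    def __init__(self, estimator, threshold=0.0, validation_fraction=0.25,
                 n_repeats=5, scoring=None, random_state=None):
        self.estimator = estimator
        self.threshold = threshold
        self.validation_fraction = validation_fraction
        self.n_repeats = n_repeats
        self.scoring = scoring
        self.random_state = random_state

    def fit(self, X, y):
        # Train on one split, compute permutation importance on the other.
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=self.validation_fraction,
            random_state=self.random_state)
        self.estimator_ = clone(self.estimator).fit(X_train, y_train)
        result = permutation_importance(
            self.estimator_, X_val, y_val, scoring=self.scoring,
            n_repeats=self.n_repeats, random_state=self.random_state)
        self.importances_ = result.importances_mean
        # Recorded so SelectorMixin.transform can check the input width.
        self.n_features_in_ = np.asarray(X).shape[1]
        return self

    def _get_support_mask(self):
        # Keep features whose mean permutation importance exceeds threshold.
        return self.importances_ > self.threshold


rng = np.random.RandomState(0)
X = rng.normal(size=(400, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.normal(size=400)

selector = SelectFromPermutationImportance(LinearRegression(), random_state=0)
X_new = selector.fit_transform(X, y)
support = selector.get_support()
print(support)
```

An alternative design would accept a prefit estimator and compute the importances on whatever data is passed to `fit`, leaving the train/validation split to the user.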

@ysunmi0427
Contributor

How's it going? If no one is working on it, I would like to start on this issue.

I agree with @qinhanmin2014 that we need to create a new class. In my opinion, the new class should derive from SelectFromModel, reusing everything in the original class except the importance calculation.

@jnothman
Member

jnothman commented Jan 27, 2020 via email

@MaxwellLZH
Contributor

Hi, I've opened a PR that adds an importance_type parameter to feature_selection.SelectFromModel, which can be set to "permutation" to use permutation importance for feature selection. Any advice or feedback would be really appreciated. Thank you!
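For comparison, a rough approximation with the existing API, assuming scikit-learn >= 0.24, where `SelectFromModel` accepts a callable `importance_getter` that receives the fitted estimator. This is not the PR's `importance_type` parameter. The validation data has to be captured in a closure (`permutation_getter` is a hypothetical helper), which shows the awkwardness discussed above: the selector itself never sees the validation set.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(400, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.normal(size=400)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)


def permutation_getter(fitted_estimator):
    # Hypothetical helper: importances measured on the captured validation set.
    result = permutation_importance(fitted_estimator, X_val, y_val,
                                    n_repeats=10, random_state=0)
    # Clip negatives to 0 (and use a tiny positive threshold below) so that
    # negatively important features are dropped even if SelectFromModel
    # compares absolute values of the scores.
    return np.clip(result.importances_mean, 0.0, None)


selector = SelectFromModel(LinearRegression(),
                           importance_getter=permutation_getter,
                           threshold=1e-8)
selector.fit(X_train, y_train)  # fits a clone of LinearRegression on X_train
support = selector.get_support()
X_val_selected = selector.transform(X_val)
print(support)
```

The getter may be re-evaluated each time the support mask is computed (for example on every `transform` call), so a fixed `random_state` keeps the selection stable.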

cmarmo removed the help wanted label Jul 2, 2022

8 participants