Avoid calculating feature importance for multiple times in SelectFromModel #15169

Open
qinhanmin2014 opened this issue Oct 10, 2019 · 6 comments


Currently, SelectFromModel calculates feature importance during transform. If users fit the selector on the training set and then transform both the training set and the test set, feature importance is calculated multiple times. Calculating feature importance can be time-consuming (e.g., for a large xgboost model), so I think we should find a way to avoid this.
I don't have a solution. We can't calculate and store feature importance during fit, because with prefit=True we allow users to call transform directly without fitting. We can't store it during transform either, because we're not supposed to add or modify attributes outside of fit.
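
To make the cost concrete, here is a minimal sketch (not from the issue; the dataset and estimator choices are arbitrary) of the pattern described above: the selector is fit once, but each transform call re-reads the estimator's feature importances.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train = X[:150], X[150:], y[:150]

selector = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0))
selector.fit(X_train, y_train)

# Both calls below re-derive the importances from the fitted estimator;
# with a large model (e.g. xgboost) this repeated work can be expensive.
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
```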

qinhanmin2014 changed the title from "Avoid Calculating feature importance for multiple times in SelectFromModel" to "Avoid calculating feature importance for multiple times in SelectFromModel" on Oct 10, 2019

jnothman commented Oct 10, 2019 via email

@qinhanmin2014 (Member, Author)

> so in theory it is the estimator's responsibility to memoise.

Are there any examples of this in scikit-learn? I agree it's a good idea, but aren't we only allowed to add/modify attributes in fit?
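
For illustration, a rough sketch of what estimator-side memoisation could look like (this is not an existing scikit-learn pattern; the class and the `_cached_importances_` attribute are made up). Note that it caches outside of fit, which is exactly the convention questioned above.

```python
import numpy as np


class MemoisingEstimator:
    """Toy estimator whose feature_importances_ is expensive but cached."""

    def fit(self, X, y):
        X = np.asarray(X)
        # Stand-in for real fitting work.
        self.coef_ = np.linalg.lstsq(X, y, rcond=None)[0]
        return self

    @property
    def feature_importances_(self):
        # Compute once, then reuse the cached value on later accesses.
        if not hasattr(self, "_cached_importances_"):
            self._cached_importances_ = np.abs(self.coef_)  # pretend this is slow
        return self._cached_importances_
```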


jnothman commented Oct 10, 2019 via email

@qinhanmin2014 (Member, Author)

> I thought you were talking about the expense of retrieving feature_importances_

Yes, so we should avoid retrieving feature_importances_ multiple times. For classes that implement it with @property, feature importance has to be recalculated every time the attribute is accessed.

> That's the attribute I mean which can be implemented as an expensive property... What expense are you referring to?

Sorry, I can't understand this part.
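
To make the @property point concrete, a small sketch (timings are illustrative only): RandomForestClassifier.feature_importances_ is a property that re-averages the per-tree importances, so every access pays the full cost again.

```python
import timeit

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=50, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# Each access to the property re-averages the importances of all 500 trees.
print(timeit.timeit(lambda: rf.feature_importances_, number=10))
print(timeit.timeit(lambda: rf.feature_importances_, number=10))  # same cost again
```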


jnothman commented Oct 10, 2019 via email

@qinhanmin2014 (Member, Author)

> in theory it is the estimator's responsibility to memoise.

@jnothman do you think it's possible to store feature importance after we calculate it? That seems like a good idea.
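
A hypothetical selector-side workaround, for illustration only (the class and the `_importances_cache_` name are made up, and the threshold handling is simplified relative to the real SelectFromModel): cache the importances the first time the support mask is computed, so later transform calls reuse them. It stores an attribute outside of fit, which is the convention concern discussed above.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel


class CachedSelectFromModel(SelectFromModel):
    """Hypothetical variant that reads feature_importances_ only once."""

    def _get_support_mask(self):
        if getattr(self, "_importances_cache_", None) is None:
            est = getattr(self, "estimator_", self.estimator)
            # Read the (possibly expensive) importances exactly once.
            self._importances_cache_ = np.asarray(est.feature_importances_)
        importances = self._importances_cache_
        # Simplified threshold handling: only the default "mean" behaviour.
        threshold = np.mean(importances) if self.threshold is None else self.threshold
        return importances >= threshold
```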
