Avoid calculating feature importance for multiple times in SelectFromModel #15169
The convention in Python is that a property/attribute/descriptor should not be expensive to `__get__`, so in theory it is the estimator's responsibility to memoise.
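For illustration, a minimal sketch of what estimator-side memoisation could look like; `SlowImportanceEstimator` and its dummy computation are hypothetical, not scikit-learn API:

```python
from functools import cached_property  # Python >= 3.8

import numpy as np


class SlowImportanceEstimator:
    """Hypothetical estimator whose importances are costly to derive."""

    def fit(self, X, y):
        self.n_features_ = X.shape[1]
        return self

    @cached_property
    def feature_importances_(self):
        # The expensive computation runs on first access only;
        # later accesses return the cached array.
        return np.full(self.n_features_, 1.0 / self.n_features_)
```

With this pattern, SelectFromModel could keep reading `estimator.feature_importances_` in `transform` without paying the cost more than once.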
Any examples in scikit-learn? I agree that it's a good idea, but we can only add/modify attributes in `fit`.
I thought you were talking about the expense of retrieving `feature_importances_` ... That's the attribute I mean, which can be implemented as an expensive property. What expense are you referring to?
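For example, `feature_importances_` on scikit-learn's forest ensembles is a property that aggregates per-tree importances on every access, so each retrieval repeats the work. A small demonstration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 10)
y = (X[:, 0] > 0.5).astype(int)

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Each access walks all 300 trees again; nothing is cached on the instance.
imp_a = forest.feature_importances_
imp_b = forest.feature_importances_
assert np.allclose(imp_a, imp_b)  # same values, but computed twice
```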
Yes, so we should avoid retrieving `feature_importances_` multiple times. For those classes who use ...

Sorry, I can't understand this part.
Oh right, I forgot. I raised a related discussion in #7491.
@jnothman do you think it's possible to store feature importance after we calculate it? This seems like a good idea.
Currently, in SelectFromModel, we calculate feature importance during `transform`. If users do feature selection based on the training set and then transform both the training set and the testing set, they need to calculate feature importance multiple times. Calculating feature importance is sometimes time-consuming (e.g., for a large xgboost model), so I think we should figure out a way to avoid this.

I don't have a solution. We can't calculate and store feature importance during `fit` because when `prefit=True`, we allow users to call `transform` directly. We can't store feature importance during `transform` because we can't add/modify attributes during `transform`.
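Until this is resolved, a user-side workaround is to retrieve the importances once and mask columns manually. This sketch mimics SelectFromModel's default "mean" threshold but skips its other options (`prefit`, `max_features`, L1-specific defaults):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Retrieve the importances once, instead of once per transform() call.
importances = model.feature_importances_
mask = importances >= importances.mean()  # default "mean" threshold

# Apply the same precomputed mask to both splits.
X_train_sel = X_train[:, mask]
X_test_sel = X_test[:, mask]
```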