Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid hasattr(est, expensive-to-calculate-attribute) #7491

Closed
jnothman opened this issue Sep 26, 2016 · 13 comments · Fixed by #11698
Closed

Avoid hasattr(est, expensive-to-calculate-attribute) #7491

jnothman opened this issue Sep 26, 2016 · 13 comments · Fixed by #11698
Labels
help wanted Moderate Anything that requires some knowledge of conventions and best practices Sprint

Comments

@jnothman
Copy link
Member

#7481 raises the issue that checking the presence of some attributes can be expensive and #7487 fixed this for feature_importances_. We should identify what other attributes (particularly underscore-suffixed) are currently being checked with hasattr in the codebase and identify any that are risky by way of being defined as properties that are or may be expensive to compute.

grep hackers welcome.

@jnothman jnothman added Moderate Anything that requires some knowledge of conventions and best practices Need Contributor labels Sep 26, 2016
@jnothman
Copy link
Member Author

ping @fukatani in case you're interested.

@fukatani
Copy link
Contributor

fukatani commented Sep 26, 2016

OK! I'm interested in this issue, I'll work on it!

I think @property is not appropriate for expensive method primarily.
I'll also check it.

@jnothman
Copy link
Member Author

It's not. But some things can't be changed that easily!

On 26 September 2016 at 22:07, fukatani notifications@github.com wrote:

OK! I'm interested in this issue, I'll work it!

I think @Property is not appropriate for expensive method primarily.
I'll also check it.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#7491 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEz69J7yp_OoYdVqctx8tcxr-wX6AaOks5qt7VogaJpZM4KGZHj
.

@fukatani
Copy link
Contributor

fukatani commented Oct 9, 2016

I'm sorry for slowly response.
First I extracted all hasattr calling in scikit learn.
And calling in test code is removed.

Result is here.
https://docs.google.com/spreadsheets/d/1PYXnP5vf4ahEuXV-fUG-EuAEc1pjJ_z5KngcXQDp-jg/pubhtml?gid=0&single=true

@fukatani
Copy link
Contributor

fukatani commented Oct 9, 2016

Next, I extract all attributes which @property is defined.
And I also check these attributes existence is checked by hasattr.

https://docs.google.com/spreadsheets/d/1PYXnP5vf4ahEuXV-fUG-EuAEc1pjJ_z5KngcXQDp-jg/pubhtml?gid=1304236032&single=true

@fukatani
Copy link
Contributor

fukatani commented Oct 9, 2016

So next, we have to check @property calculation cost which attribute is checked by hasattr.
Listed here.

  • multiclass.py(379) coef_
  • pipeline.py(510) classes_
  • cluster\hierarchical.py(839) fit_predict
  • ensemble\voting_classifier.py(213) predict_proba
  • linear_model\ridge.py(797) classes_
  • linear_model\ridge.py(1320) classes_
  • linear_model\stochastic_gradient.py(726) predict_proba
  • linear_model\stochastic_gradient.py(806) predict_log_proba
  • neural_network\multilayer_perceptron.py(629) partial_fit
  • neural_network\multilayer_perceptron.py(952) partial_fit
  • svm\base.py(482) coef_
  • svm\base.py(588) predict_proba
  • svm\base.py(627) predict_log_proba

@jnothman
Copy link
Member Author

jnothman commented Oct 9, 2016

classes_ should be trivial. predict* and partial_fit must be designed in a way that they're cheap, as they're only expensive once they've got arguments.

It's worth checking the coef_s but they're likely to not be outrageously expensive.

@jnothman
Copy link
Member Author

jnothman commented Oct 9, 2016

thanks

@fukatani
Copy link
Contributor

fukatani commented Oct 9, 2016

I got it.

@adrinjalali
Copy link
Member

Thinking about this issue and going through some of the instances of hasattr(..., 'classes_'), I've got a question: is the task to change all such instances of hasattr into getattr regardless of where they are, or should it be only if we know the attribute is a @property?

For instance, forest.py has such an instance of a call to hasattr, but there the classes_ is not a @property so it's okay and it's cheap. However, it'll be easier to just change all such calls just in case.

@jnothman
Copy link
Member Author

jnothman commented Jul 27, 2018 via email

@adrinjalali
Copy link
Member

In that case, I guess there isn't much to do (see #11698)

@jnothman
Copy link
Member Author

Thanks for following this up, @adrinjalali

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Moderate Anything that requires some knowledge of conventions and best practices Sprint
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants