
__repr__ not that helpful #6323

Closed
amueller opened this issue Feb 9, 2016 · 13 comments · Fixed by #11705


@amueller (Member) commented Feb 9, 2016

Maybe this should be an enhancement proposal...
So I think our current __repr__ is not that helpful.
Most construction parameters are default parameters that are never seen by a user, so reporting them is basically noise.
We don't report other important things, though, like whether the model was fitted at all, what the training score is, or the training time.
I know R has a very different approach, and I'm not sure their approach is good. But I think our current approach is pretty suboptimal.
I think the __repr__ has become more important because of the popularity of jupyter notebook. If I run fit, I get the __repr__ back. And it is just noise.

A slight improvement might be to just print the construction parameters that are not set to the default value. But we could also think about something more helpful, and maybe something more model specific. GridSearchCV for example could report the best score, and the best parameters found etc.
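The changed-parameters idea can be sketched like this; the `Estimator` class below is a toy stand-in, not scikit-learn's actual base class:

```python
import inspect

class Estimator:
    """Toy stand-in for a scikit-learn estimator (hypothetical example)."""

    def __init__(self, n_components=None, tol=0.0, whiten=False):
        self.n_components = n_components
        self.tol = tol
        self.whiten = whiten

    def __repr__(self):
        # Compare current attribute values against the __init__ defaults
        # and print only the parameters the user actually changed.
        sig = inspect.signature(self.__init__)
        changed = {
            name: getattr(self, name)
            for name, param in sig.parameters.items()
            if getattr(self, name) != param.default
        }
        args = ", ".join(f"{k}={v!r}" for k, v in sorted(changed.items()))
        return f"{type(self).__name__}({args})"

print(repr(Estimator(n_components=2)))  # Estimator(n_components=2)
print(repr(Estimator()))                # Estimator()
```

The output stays a valid Python expression that recreates the estimator, just without the default noise.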

I have been thinking about this for a while, but this is somewhat inspired by looking at #5299. Why does adding a faster solver change the __repr__ of PCA? That seems really weird to me. PCA is one of the simplest and most commonly used methods in ML, in particular in courses. Now people will constantly see something about tol and number of iterations that is probably not relevant for them at all.

@betatim (Contributor) commented Feb 10, 2016

I like the idea of removing parameters with default values.

I do like that __repr__ produces a valid python expression that I can use to "recreate" the estimator. I think this is worth keeping. Can we use __str__ for a more human output?

In the notebook context you could register formatters (can't find a better link :-/) for estimators for "rich" displaying.
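For reference, registering such a formatter in IPython looks roughly like this; the toy `Estimator` class and the HTML helper are illustrative, but `for_type` on the `text/html` formatter is the actual IPython hook:

```python
class Estimator:
    """Toy stand-in for an estimator base class (hypothetical)."""
    def __init__(self, tol=0.0):
        self.tol = tol

def estimator_to_html(est):
    # Render the estimator's parameters as a small HTML table.
    rows = "".join(
        f"<tr><td>{k}</td><td>{v!r}</td></tr>"
        for k, v in sorted(vars(est).items())
    )
    return f"<table><caption>{type(est).__name__}</caption>{rows}</table>"

try:
    # Registration only makes sense inside IPython/Jupyter; a plain
    # Python interpreter skips this block.
    from IPython import get_ipython
    ip = get_ipython()
    if ip is not None:
        ip.display_formatter.formatters["text/html"].for_type(
            Estimator, estimator_to_html
        )
except ImportError:
    pass
```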

@vighneshbirodkar (Contributor) commented Feb 10, 2016

@amueller
We can overload __repr__ in all the base classes, which will use inspect to look at which arguments are not at their default values. But I am not sure how we can keep track of fitting time without changing all the derived fit functions.
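For the fitting-time part, one hypothetical way to avoid touching every derived `fit` is to wrap it from a mixin via `__init_subclass__` (a sketch, not scikit-learn code; all names are made up):

```python
import time

class FitTimerMixin:
    """Wrap every subclass's fit() from the base class, so derived
    estimators need no changes to record their fitting time."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if "fit" in cls.__dict__:
            original_fit = cls.__dict__["fit"]

            def timed_fit(self, *args, _orig=original_fit, **kw):
                start = time.perf_counter()
                result = _orig(self, *args, **kw)
                self.fit_time_ = time.perf_counter() - start
                return result

            cls.fit = timed_fit

class Model(FitTimerMixin):
    def fit(self, X, y=None):
        self.coef_ = 0.0  # pretend to learn something
        return self

m = Model().fit([[1.0], [2.0]])
print(f"fitted in {m.fit_time_:.6f}s")
```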

@amueller (Member, Author) commented Oct 7, 2016

Someone want to add a PR for removing the default values?

I feel like having in there whether the model is fitted or not would be nice, but I'm not entirely sure how to achieve that. We currently have the check_is_fitted in predict but that requires data to call. We could move that to a private method that can be called without data but that seems like a big change.

I'm not sure how to incorporate the fit status into the repr while still providing a useful python expression.
We could work on custom display code...

@amueller amueller added this to the 0.19 milestone Oct 7, 2016
@jnothman (Member) commented Oct 8, 2016

I've been wondering about custom HTML display code for a little while...


@GaelVaroquaux (Member) commented Oct 8, 2016

> I like the idea of removing parameters with default values.

The drawback is that they are then harder to discover.

> I do like that __repr__ produces a valid python expression that I can use to "recreate" the estimator. I think this is worth keeping. Can we use __str__ for a more human output?

+1. It also matches the description of what __repr__ and __str__ should be.

It's good to keep in mind that the __repr__ of an estimator is what's going to be used in something like warnings or error messages. Hence it should not be too verbose (which goes in the direction of removing default arguments, and hence contradicts my position above :$).

@amueller (Member, Author) commented Oct 8, 2016

Actually I might take a stab at this this weekend (if I ever finish my talk).
I think my first draft would allow the user to configure if they want the full parameters or the changed parameters or additional info or not.
We can indeed also use __repr__ vs __str__ vs jupyter html to distinguish these....

@amueller (Member, Author) commented Oct 8, 2016

Hm, ok: maybe __repr__ shows the changed parameters and __str__ the additional info, for a first try.
For all classes I would include in the additional info (if they have it): classes_, n_outputs_, n_features (maybe add n_features everywhere, possibly via a property).
And if any of these are set, we know it was fit; if not, it was not, and the __str__ is just the __repr__.

Additionally, each model can add its own stuff in __str__, like the depth of trees, the number of non-zero coefficients in sparse models, the number of support vectors, etc. This might be something we let the user disable? Anyhow, I'll draft that today or tomorrow and then we can see if we like it.
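A minimal sketch of that __repr__/__str__ split, with a toy class and illustrative fitted attributes:

```python
class Estimator:
    """Toy sketch: __repr__ shows the construction parameters,
    __str__ appends fitted attributes when they exist."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.n_features_ = len(X[0])
        return self

    def __repr__(self):
        return f"{type(self).__name__}(alpha={self.alpha!r})"

    def __str__(self):
        # Attributes with a trailing underscore only exist after fit(),
        # so their presence doubles as a fitted/not-fitted indicator.
        fitted = {k: v for k, v in vars(self).items() if k.endswith("_")}
        if not fitted:
            return repr(self)  # unfitted: __str__ is just the __repr__
        extra = ", ".join(f"{k}={v!r}" for k, v in sorted(fitted.items()))
        return f"{self!r} [fitted: {extra}]"

est = Estimator(alpha=0.5)
print(str(est))                      # Estimator(alpha=0.5)
print(str(est.fit([[0, 1]], [1])))   # appends classes_ and n_features_
```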

@GaelVaroquaux (Member) commented Oct 8, 2016

@amueller (Member, Author) commented Oct 8, 2016

That's why I was thinking about making it optional and maybe off by default. sklearn.set_print(verbose=True)?
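A sketch of what such a hypothetical sklearn.set_print toggle could look like; the name and semantics are just the ones floated above, not a real API:

```python
import inspect

_config = {"verbose_repr": False}  # module-level switch, off by default

def set_print(verbose=True):
    """Hypothetical sketch of sklearn.set_print(verbose=...)."""
    _config["verbose_repr"] = verbose

class Estimator:
    """Toy estimator whose repr honours the global print option."""

    def __init__(self, tol=1e-3, max_iter=100):
        self.tol = tol
        self.max_iter = max_iter

    def __repr__(self):
        sig = inspect.signature(self.__init__)
        # Verbose: show everything; otherwise only non-default params.
        params = {
            n: getattr(self, n)
            for n, p in sig.parameters.items()
            if _config["verbose_repr"] or getattr(self, n) != p.default
        }
        args = ", ".join(f"{k}={v!r}" for k, v in params.items())
        return f"{type(self).__name__}({args})"

est = Estimator(max_iter=500)
print(repr(est))         # Estimator(max_iter=500)
set_print(verbose=True)
print(repr(est))         # Estimator(tol=0.001, max_iter=500)
```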

@GaelVaroquaux (Member) commented Oct 8, 2016

Something like this sounds about right. It's also consistent with things in pandas and numpy.

@jnothman (Member) commented Oct 8, 2016

True that it's consistent with pandas and numpy.


@jnothman (Member) commented Oct 9, 2016

I'd be interested in experimenting with a jupyter-friendly HTML rendering of estimators, with the benefit of that being more flexible:

  • interactivity that disadvantages neither the beginner nor the expert;
  • visual rather than textual markup;
  • no effect on doctests;
  • less likely to upset people if enhanced or changed from version to version.

After mocking that up, or even merging it, we can then see what is worth porting to repr strings via options and changing defaults. I might try to mock something up, but would welcome someone else's attempt as my time is very limited atm, and I'd like to focus on other features.

Initially I envision something closely replicating the repr structure, but with:

  • the list of unchanged params initially collapsed and openable;
  • perhaps even the changed params collapsed initially;
  • fittedness shown by an icon which, upon clicking, lists the fitted attributes; perhaps each of these can be inspected in the GUI;
  • complex parameter values perhaps best collapsed too;
  • maybe specialised visualisations for Pipeline and FeatureUnion so that a mix of them shows their structure clearly (e.g. a vertical grid for Pipeline and a horizontal grid for FeatureUnion).
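A bare-bones mock-up of the idea, using the `_repr_html_` hook that Jupyter already looks for when displaying an object; the classes and CSS class names here are purely illustrative:

```python
class Step:
    """Minimal stand-in for a single estimator in the HTML mock-up."""
    def __init__(self, name):
        self.name = name

    def _repr_html_(self):
        # Jupyter calls _repr_html_ automatically for rich display.
        return f"<div class='sk-est'>{self.name}</div>"

class Pipeline:
    """Sketch: steps rendered as a vertical grid, as suggested above."""
    def __init__(self, steps):
        self.steps = steps

    def _repr_html_(self):
        rows = "".join(f"<div>{s._repr_html_()}</div>" for s in self.steps)
        return f"<div class='sk-pipeline'>{rows}</div>"

html = Pipeline([Step("scaler"), Step("pca"), Step("svc")])._repr_html_()
```

In a notebook, evaluating the `Pipeline` object would render this markup instead of the plain repr, and CSS could lay a FeatureUnion out horizontally in the same way.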
@amueller (Member, Author) commented Oct 9, 2016

That sounds interesting but also somewhat fragile. And as I said, we currently don't even have a way to see whether something was fitted or not. I guess you could check if there are any trailing-underscore attributes?
I think going in small steps is good. And I agree, jupyter-based changes are less likely to break anyone's code. My changes are all optional. We could limit them to _repr_pretty_ if you want.
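The trailing-underscore heuristic can be sketched like this (illustrative code; it relies on scikit-learn's convention of reserving trailing underscores for attributes set during fit):

```python
def is_fitted(estimator):
    """Heuristic: an estimator counts as fitted if it has any public
    attribute ending in a single trailing underscore."""
    return any(
        k.endswith("_") and not k.endswith("__") and not k.startswith("_")
        for k in vars(estimator)
    )

class Model:
    """Toy estimator following the trailing-underscore convention."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X):
        self.coef_ = [0.0] * len(X[0])  # set only during fit
        return self

m = Model()
print(is_fitted(m))                 # False
print(is_fitted(m.fit([[1, 2]])))   # True
```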
