-
-
Notifications
You must be signed in to change notification settings - Fork 25.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG + 2] Classifier and regressor tags #4418
[MRG + 2] Classifier and regressor tags #4418
Conversation
@mblondel for using ranking metrics on regressors, that is only possible with ground-truth being binary, right? Is that your use case? |
|
306d788
to
bc10d8f
Compare
|
|
For ndcg (not yet in scikit learn) any real is fine.
|
|
Should be good now. |
|
In which extend does it break backward compatibility with pickled model? |
If you unpickle a class that has a new property added, the unpickled object will have the class attribute. So |
LGTM. Since it's a core design issue, it would be nice to have other opinions. random pings @larsmans @ogrisel @agramfort @jnothman Thanks for tackling this @amueller! |
Thanks for the review :) I agree we should have more opinions, but I think it is a pretty clear-cut improvement. |
Argh, I need to write some docs though |
@@ -266,6 +266,7 @@ def __repr__(self): | |||
############################################################################### | |||
class ClassifierMixin(object): | |||
"""Mixin class for all classifiers in scikit-learn.""" | |||
estimator_type = "classifier" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should make this variable private: people not implementing an estimator should not be looking at it / modifying it. Only people coding a certain class should be touching it. This is pretty much the definition of an class-level private variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the point of this tag system is to let third-party developers implement classes without inheriting from our base classes. Private variables can be changed. On the other hand, a third-party developer doesn't want his code to break because we changed variable names. We need to make this tag part of our API and commit to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the point of this tag system is to let third-party developers implement
classes without inheriting from our base classes. Private variables can be
changed.On the other hand, a third-party developer doesn't want his code to
break because we changed variable names. We need to make this tag part
of our API and commit to it.
Agreed, but it's going to make tab completion ugly, and it's going to
worry/confuse our users.
In a sense, this is the same thing as an add function, which is a
private variable. So in Python land, typing-related class information are
usually coded why private attributes, I would say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the __ in add is just a way to emphasize that this is a special
method rather than a private one. If the goal is to not mess w/ tab
completion I am +1 with the _ prefix but the variable should really be
documented as part of our public API (i.e. we won't change it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On 03/24/2015 08:49 PM, Mathieu Blondel wrote:
I think the __ in add is just a way to emphasize that this is a
special method rather than a private one. If the goal is to not mess
w/ tab completion I am +1 with the _ prefix but the variable should
really be documented as part of our public API (i.e. we won't change it).
+1
I documented it in the dev docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an explanation of the |
|
|
@landscape-bot doesn't like me accessing private attributes ^^ |
I'm not sure where @mblondel's comment went. I agree the point is to make it accessible to third-party developers without inheritance. But I'm not sure why this means it can't be private. |
:func:`cross_validation.cross_val_score` defaults to being stratified when used | ||
on a classifier, but not otherwise. Similarly, scorers for average precision | ||
that take a continuous prediction need to call ``decision_function`` for classifiers, | ||
but ``predict`` for regressors. This destinction between classifiers and regressors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
distinction
Still +1 for merge on my side. |
|
||
def is_classifier(estimator): | ||
"""Returns True if the given estimator is (probably) a classifier.""" | ||
estimator = _get_sub_estimator(estimator) | ||
return isinstance(estimator, ClassifierMixin) | ||
return getattr(estimator, "_estimator_type", None) == "classifier" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we enforce that all estimators should have a _estimator_type
tag?
def is_classifier(estimator):
"""Returns True if the given estimator is a classifier."""
if not hasattr(estimator, "_estimator_type"):
raise ValueError("The given estimator instance does not have a _estimator_type tag.")
return estimator._estimator_type.lower() == "classifier"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would not work with user defined estimators
currently, but this would be helpful in framing a generic estimator test framework as wished for in #3810
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then this should be in the test framework, not the code. So people that want to be strict can run their tests, but people that don't care can still run their sloppy but sklearn compatible code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense... Thanks for the comment :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
@@ -1075,6 +1076,7 @@ def _decision_function(self, X): | |||
predict_stages(self.estimators_, X, self.learning_rate, score) | |||
return score | |||
|
|||
@deprecated(" and will be removed in 0.19") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also staged_decision_function.
(Ping @pprett )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering about that, but you are right, it should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
Shall we merge? |
I'll rebase (master changed). Maybe @ogrisel @GaelVaroquaux or @agramfort (that's what you get for commenting on issues) want to have a look. |
c9a9f34
to
d89c215
Compare
|
|
||
Parameters | ||
---------- | ||
X : array-like, shape = [n_samples, n_features] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
X : array-like, shape (n_samples, n_features)
can you go over the docstrings to make sure we use this convention here too?
besides docstring nitpicks LGTM |
|
fixed the docstrings. |
ping @ogrisel @GaelVaroquaux maybe check if anyone at the sprint opposes, otherwise merge? |
is implemented using the ``_estimator_type`` attribute, which takes a string value. | ||
It should be ``"classifier"`` for classifiers and ``"regressor"`` for regressors, | ||
to work as expected. Inheriting from ``ClassifierMixin`` or ``RegressorMixin`` will | ||
set the attribute automatically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add 'clusterer' here, as I see it is used in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I defined it, but didn't use it. I can add it to the docs or remove it from the mixin, I wasn't sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the docs are updated to add clusterer, +1 to merge. |
Done. Thanks for the reviews :) merging. |
[MRG + 2] Classifier and regressor tags
|
Merged #4418.
Hurray! Thanks
|
This improved
is_classifier
by making it not depend on inheritance and introducesis_regressor
.It also fixes #2588, using ranking scorers on regressors.
Todo:
estimator_type
in the dev docsdecision_function
from all regressors.