[MRG + 2] Classifier and regressor tags #4418

amueller · 2015-03-19T20:49:06Z

This improves is_classifier by making it independent of inheritance, and introduces is_regressor.
It also fixes #2588, using ranking scorers on regressors.
Todo:

Describe the estimator_type in the dev docs.
Remove decision_function from all regressors.
Use the tag to decide whether to call predict or decision_function in multi_class.

amueller · 2015-03-19T20:49:36Z

@mblondel for using ranking metrics on regressors, that is only possible with ground-truth being binary, right? Is that your use case?

landscape-bot · 2015-03-19T20:57:31Z

Repository health increased by 0.00% when pulling 306d788 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

No new problems were introduced.
2 problems were fixed (including 0 errors and 0 code smells).

landscape-bot · 2015-03-19T21:05:34Z

Repository health increased by 0.00% when pulling bc10d8f on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

1 new problem was found (including 0 errors and 1 code smell).
2 problems were fixed (including 0 errors and 0 code smells).

landscape-bot · 2015-03-19T22:09:43Z

Repository health increased by 0.00% when pulling a39d237 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

1 new problem was found (including 0 errors and 1 code smell).
2 problems were fixed (including 0 errors and 0 code smells).

mblondel · 2015-03-19T23:05:59Z

For NDCG (not yet in scikit-learn), any real value is fine.

landscape-bot · 2015-03-20T18:50:34Z

Repository health increased by 0.00% when pulling ef712e2 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

1 new problem was found (including 0 errors and 1 code smell).
2 problems were fixed (including 0 errors and 0 code smells).

amueller · 2015-03-20T18:51:39Z

Should be good now.

landscape-bot · 2015-03-20T20:30:36Z

Repository health increased by 0.00% when pulling 6e50c77 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

1 new problem was found (including 0 errors and 1 code smell).
2 problems were fixed (including 0 errors and 0 code smells).

arjoly · 2015-03-21T09:44:17Z

To what extent does it break backward compatibility with pickled models?

amueller · 2015-03-23T15:06:20Z

If you unpickle a class that has a new property added, the unpickled object will have the class attribute. So is_classifier will work on "old" objects.
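The pickling behaviour described above can be checked with a small experiment. This is only a sketch: `ToyClassifier` and the throwaway module `toy_estimators` are hypothetical names invented for the illustration, and the "new release" is simulated by adding the class attribute after the instance has been pickled.

```python
import pickle
import sys
import types

# Register a throwaway module so pickle can locate the class by name,
# however this snippet is executed. Both names are hypothetical.
toy_module = types.ModuleType("toy_estimators")
sys.modules["toy_estimators"] = toy_module


def _init(self, alpha=1.0):
    self.alpha = alpha


ToyClassifier = type(
    "ToyClassifier",
    (object,),
    {"__module__": "toy_estimators", "__init__": _init},
)
toy_module.ToyClassifier = ToyClassifier

# "Old" release: pickle an instance before the tag exists on the class.
old_pickle = pickle.dumps(ToyClassifier(alpha=0.5))
tag_existed_before = hasattr(ToyClassifier, "_estimator_type")

# "New" release: the class gains the tag as a class attribute.
ToyClassifier._estimator_type = "classifier"

# Pickle stores the instance state plus a reference to the class, so the
# restored object resolves the tag through the current class definition.
restored = pickle.loads(old_pickle)
```

The tag never enters the instance `__dict__`; it is found by ordinary attribute lookup on the class, which is why objects pickled before the change still pick it up.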

mblondel · 2015-03-24T06:55:17Z

LGTM. Since it's a core design issue, it would be nice to have other opinions. random pings @larsmans @ogrisel @agramfort @jnothman

Thanks for tackling this @amueller!

amueller · 2015-03-24T15:01:29Z

Thanks for the review :) I agree we should have more opinions, but I think it is a pretty clear-cut improvement.

amueller · 2015-03-24T15:01:56Z

Argh, I need to write some docs though

GaelVaroquaux · 2015-03-24T15:05:39Z

sklearn/base.py

@@ -266,6 +266,7 @@ def __repr__(self):
 ###############################################################################
 class ClassifierMixin(object):
    """Mixin class for all classifiers in scikit-learn."""
+    estimator_type = "classifier"


I think that we should make this variable private: people not implementing an estimator should not be looking at it / modifying it. Only people coding a certain class should be touching it. This is pretty much the definition of a class-level private variable.

But the point of this tag system is to let third-party developers implement classes without inheriting from our base classes. Private variables can be changed. On the other hand, a third-party developer doesn't want his code to break because we changed variable names. We need to make this tag part of our API and commit to it.


Agreed, but it's going to make tab completion ugly, and it's going to
worry/confuse our users.

In a sense, this is the same thing as an __add__ function, which is a
private variable. So in Python land, typing-related class information is
usually coded with private attributes, I would say.

I think the __ in __add__ is just a way to emphasize that this is a special
method rather than a private one. If the goal is to not mess w/ tab
completion I am +1 with the _ prefix but the variable should really be
documented as part of our public API (i.e. we won't change it).

+1
I documented it in the dev docs.

the variable should really be documented as part of our public API (i.e. we won't change it).

Agreed.
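To make the third-party use case from this thread concrete, here is a minimal sketch of an estimator that carries the tag without inheriting from any scikit-learn base class. The two helper functions are local stand-ins written for this example (a plain `getattr` comparison, assumed to match the intent of the PR), and `ThirdPartyClassifier` is a hypothetical class.

```python
def is_classifier(estimator):
    # Local stand-in: compare the tag, defaulting to None when it is absent.
    return getattr(estimator, "_estimator_type", None) == "classifier"


def is_regressor(estimator):
    # Assumed to mirror is_classifier with the "regressor" tag value.
    return getattr(estimator, "_estimator_type", None) == "regressor"


class ThirdPartyClassifier(object):
    """Hypothetical estimator that does not inherit from any mixin."""

    # The single-underscore tag is the whole contract discussed above.
    _estimator_type = "classifier"

    def fit(self, X, y):
        # Remember the most frequent label (toy majority-class model).
        labels = list(y)
        self.majority_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


clf = ThirdPartyClassifier().fit([[0], [1], [2]], [0, 1, 1])
```

Because the check relies only on an attribute name, renaming that attribute later would silently break such classes, which is the argument made above for documenting it as public API despite the underscore.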

amueller · 2015-03-24T15:17:48Z

I added an explanation of the estimator_type attribute to the "Roll your own estimator" docs because this is when people need to know about it.
Is is_classifier documented anywhere? Should it also go in the developer docs? If so, where?

landscape-bot · 2015-03-24T15:26:32Z

Repository health increased by 0.00% when pulling 06f3b75 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

1 new problem was found (including 0 errors and 1 code smell).
2 problems were fixed (including 0 errors and 0 code smells).

landscape-bot · 2015-03-24T15:45:43Z

Repository health decreased by 0.01% when pulling 4db5b73 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

3 new problems were found (including 0 errors and 3 code smells).
2 problems were fixed (including 0 errors and 0 code smells).

coveralls · 2015-03-24T16:06:45Z

Coverage increased (+0.03%) to 95.11% when pulling 4db5b73 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

amueller · 2015-03-24T16:12:01Z

@landscape-bot doesn't like me accessing private attributes ^^

amueller · 2015-03-24T16:21:00Z

I'm not sure where @mblondel's comment went. I agree the point is to make it accessible to third-party developers without inheritance. But I'm not sure why this means it can't be private.

mblondel · 2015-03-25T02:48:17Z

doc/developers/index.rst

+:func:`cross_validation.cross_val_score` defaults to being stratified when used
+on a classifier, but not otherwise. Similarly, scorers for average precision
+that take a continuous prediction need to call ``decision_function`` for classifiers,
+but ``predict`` for regressors. This destinction between classifiers and regressors


distinction
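The dispatch described in the doc excerpt above (continuous scorers calling `decision_function` on classifiers but `predict` on regressors) could look roughly like this. `continuous_prediction` and both toy estimators are hypothetical names for illustration, not the scorer code in scikit-learn.

```python
def is_classifier(estimator):
    # Local stand-in for the tag-based check.
    return getattr(estimator, "_estimator_type", None) == "classifier"


def continuous_prediction(estimator, X):
    """Hypothetical helper: return a continuous score for each sample."""
    if is_classifier(estimator):
        # Classifiers expose real-valued scores via decision_function.
        return estimator.decision_function(X)
    # For regressors, predict already returns a continuous value.
    return estimator.predict(X)


class ToyClassifier(object):
    _estimator_type = "classifier"

    def decision_function(self, X):
        # Signed distance from a fixed threshold at 0.5.
        return [x - 0.5 for x in X]

    def predict(self, X):
        return [int(score > 0) for score in self.decision_function(X)]


class ToyRegressor(object):
    _estimator_type = "regressor"

    def predict(self, X):
        return [2.0 * x for x in X]


clf_scores = continuous_prediction(ToyClassifier(), [0.0, 1.0])
reg_scores = continuous_prediction(ToyRegressor(), [0.0, 1.0])
```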

mblondel · 2015-03-25T02:49:52Z

Still +1 for merge on my side.

raghavrv · 2015-03-25T18:51:40Z

sklearn/base.py


 def is_classifier(estimator):
    """Returns True if the given estimator is (probably) a classifier."""
-    estimator = _get_sub_estimator(estimator)
-    return isinstance(estimator, ClassifierMixin)
+    return getattr(estimator, "_estimator_type", None) == "classifier"


Could we enforce that all estimators should have a _estimator_type tag?

def is_classifier(estimator):
    """Returns True if the given estimator is a classifier."""
    if not hasattr(estimator, "_estimator_type"):
        raise ValueError("The given estimator instance does not have a _estimator_type tag.")
    return estimator._estimator_type.lower() == "classifier"

This would not work with user-defined estimators currently, but it would be helpful in framing a generic estimator test framework, as wished for in #3810.

But then this should be in the test framework, not the code. So people that want to be strict can run their tests, but people that don't care can still run their sloppy but sklearn compatible code.

That makes sense... Thanks for the comment :)

But then this should be in the test framework, not the code. So people that want to be strict can run their tests, but people that don't care can still run their sloppy but sklearn compatible code.

I am not sure that we want to enforce that choice, but if we do, I agree with Andy that it should be in the test framework.
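One way to read the outcome of this thread: the library check stays lenient, and strictness lives in an opt-in test. The sketch below assumes that split. `check_has_estimator_type` is a hypothetical name, not an existing scikit-learn check, and the allowed tag values are taken from those mentioned in this PR.

```python
def is_classifier(estimator):
    # Lenient library-side check: untagged estimators are simply "not a
    # classifier" and nothing is raised.
    return getattr(estimator, "_estimator_type", None) == "classifier"


def check_has_estimator_type(estimator):
    """Hypothetical opt-in test: fail loudly when the tag is missing."""
    allowed = ("classifier", "regressor", "clusterer")
    tag = getattr(estimator, "_estimator_type", None)
    if tag not in allowed:
        raise AssertionError(
            "%s has _estimator_type=%r, expected one of %r"
            % (type(estimator).__name__, tag, allowed)
        )


class SloppyEstimator(object):
    """Works with the lenient check, fails the strict test."""

    def fit(self, X, y):
        return self


sloppy = SloppyEstimator()
lenient_result = is_classifier(sloppy)

try:
    check_has_estimator_type(sloppy)
    strict_failed = False
except AssertionError:
    strict_failed = True
```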

landscape-bot · 2015-03-25T20:49:06Z

Repository health decreased by 0.01% when pulling 2041386 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

3 new problems were found (including 0 errors and 3 code smells).
2 problems were fixed (including 0 errors and 0 code smells).

coveralls · 2015-03-25T21:02:21Z

Coverage increased (+0.03%) to 95.11% when pulling 2041386 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

arjoly · 2015-03-26T14:53:14Z

sklearn/ensemble/gradient_boosting.py

@@ -1075,6 +1076,7 @@ def _decision_function(self, X):
        predict_stages(self.estimators_, X, self.learning_rate, score)
        return score

+    @deprecated(" and will be removed in 0.19")


There is also staged_decision_function.

(Ping @pprett )

I was wondering about that, but you are right, it should be removed.
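For readers unfamiliar with the pattern in the diff above, a deprecation decorator of this kind can be sketched as follows. This is a simplified stand-in written for illustration, not scikit-learn's `deprecated` utility; it only mimics the idea of appending the extra message to a warning.

```python
import functools
import warnings


def deprecated(extra=""):
    """Simplified stand-in: warn on every call to the wrapped function."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                "Function %s is deprecated%s" % (func.__name__, extra),
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


class ToyRegressor(object):
    def predict(self, X):
        return [2.0 * x for x in X]

    @deprecated(" and will be removed in 0.19")
    def decision_function(self, X):
        # Kept only for backward compatibility; forwards to predict.
        return self.predict(X)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    output = ToyRegressor().decision_function([1.0])
```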

landscape-bot · 2015-03-26T15:38:07Z

Repository health decreased by 0.01% when pulling c9a9f34 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

4 new problems were found (including 0 errors and 4 code smells).
3 problems were fixed (including 0 errors and 1 code smell).

mblondel · 2015-03-31T06:27:13Z

Shall we merge?

amueller · 2015-03-31T23:57:34Z

I'll rebase (master changed). Maybe @ogrisel @GaelVaroquaux or @agramfort (that's what you get for commenting on issues) want to have a look.

landscape-bot · 2015-04-01T00:05:25Z

Repository health decreased by 0.01% when pulling d89c215 on amueller:classifier_regressor_tags into e2dfd23 on scikit-learn:master.

4 new problems were found (including 0 errors and 4 code smells).
3 problems were fixed (including 0 errors and 1 code smell).

agramfort · 2015-04-01T08:27:23Z

sklearn/svm/base.py

+
+        Parameters
+        ----------
+        X : array-like, shape = [n_samples, n_features]


X : array-like, shape (n_samples, n_features)

can you go over the docstrings to make sure we use this convention here too?
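For reference, a docstring following the requested shape convention might look like the sketch below. The function is a toy written for this example, and the `Returns` entry simply extends the same convention by assumption.

```python
def decision_function(X):
    """Toy scoring function used only to illustrate the docstring style.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Input samples.

    Returns
    -------
    scores : list, shape (n_samples,)
        Sum of the features of each sample.
    """
    return [sum(row) for row in X]
```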

agramfort · 2015-04-01T08:27:58Z

besides docstring nitpicks LGTM

landscape-bot · 2015-04-01T14:39:06Z

Repository health decreased by 0.01% when pulling 188fb11 on amueller:classifier_regressor_tags into e2dfd23 on scikit-learn:master.

4 new problems were found (including 0 errors and 4 code smells).
3 problems were fixed (including 0 errors and 1 code smell).

amueller · 2015-04-01T14:49:20Z

fixed the docstrings.

amueller · 2015-04-02T14:50:35Z

ping @ogrisel @GaelVaroquaux maybe check if anyone at the sprint opposes, otherwise merge?

GaelVaroquaux · 2015-04-02T14:52:44Z

doc/developers/index.rst

+is implemented using the ``_estimator_type`` attribute, which takes a string value.
+It should be ``"classifier"`` for classifiers and ``"regressor"`` for regressors,
+to work as expected. Inheriting from ``ClassifierMixin`` or ``RegressorMixin`` will
+set the attribute automatically.


You should add 'clusterer' here, as I see it is used in the code.

I defined it, but didn't use it. I can add it to the docs or remove it from the mixin, I wasn't sure.

I defined it, but didn't use it. I can add it to the docs or remove it from the mixin, I wasn't sure.

I think that it is useful to specify it. People may use it later, and fixing the term is good.
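The "set automatically by inheriting" behaviour in the doc excerpt above amounts to a class attribute on each mixin. A minimal sketch, using local stand-in mixins (the `Toy*` names are hypothetical, chosen so as not to be mistaken for the real scikit-learn classes) and the three tag values discussed in this review:

```python
class ToyClassifierMixin(object):
    _estimator_type = "classifier"


class ToyRegressorMixin(object):
    _estimator_type = "regressor"


class ToyClustererMixin(object):
    _estimator_type = "clusterer"


class MyKMeans(ToyClustererMixin):
    """Inherits the tag from the mixin; nothing to declare here."""


class Untagged(object):
    """No mixin and no attribute: the tag is simply absent."""


def estimator_type(estimator):
    # Returns the tag, or None when the estimator does not define one.
    return getattr(estimator, "_estimator_type", None)
```

Fixing the string for clusterers now, even if no code consumes it yet, means later consumers and third-party classes agree on the spelling.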

GaelVaroquaux · 2015-04-02T15:10:26Z

Once the docs are updated to add clusterer, +1 to merge.

amueller · 2015-04-02T15:15:47Z

Done. Thanks for the reviews :) merging.

[MRG + 2] Classifier and regressor tags

landscape-bot · 2015-04-02T15:20:06Z

Repository health decreased by 0.01% when pulling acb21bb on amueller:classifier_regressor_tags into e2dfd23 on scikit-learn:master.

4 new problems were found (including 0 errors and 4 code smells).
3 problems were fixed (including 0 errors and 1 code smell).

GaelVaroquaux · 2015-04-02T17:07:50Z

Merged #4418.

Hurray! Thanks

amueller force-pushed the classifier_regressor_tags branch from 306d788 to bc10d8f Compare March 19, 2015 21:01

amueller changed the title ~~[WIP] Classifier and regressor tags~~ [MRG] Classifier and regressor tags Mar 20, 2015

amueller changed the title ~~[MRG] Classifier and regressor tags~~ [MRG + 1] Classifier and regressor tags Mar 24, 2015

GaelVaroquaux reviewed Mar 24, 2015

mblondel reviewed Mar 25, 2015

raghavrv reviewed Mar 25, 2015

arjoly reviewed Mar 26, 2015

Add tags to classifiers and regressors to identify them as such.

d89c215

amueller force-pushed the classifier_regressor_tags branch from c9a9f34 to d89c215 Compare April 1, 2015 00:00

agramfort reviewed Apr 1, 2015

COSMIT use consistent shape description in docstring.

188fb11

amueller changed the title ~~[MRG + 1] Classifier and regressor tags~~ [MRG + 2] Classifier and regressor tags Apr 1, 2015

GaelVaroquaux reviewed Apr 2, 2015

DOC adding clusterer tag to dev docs.

acb21bb

amueller added a commit that referenced this pull request Apr 2, 2015

Merge pull request #4418 from amueller/classifier_regressor_tags

aae305c

[MRG + 2] Classifier and regressor tags

amueller merged commit aae305c into scikit-learn:master Apr 2, 2015

amueller deleted the classifier_regressor_tags branch April 2, 2015 15:21

amueller mentioned this pull request Apr 8, 2015

[MRG+1] Common test refactoring #4550

Merged

3 tasks

amueller mentioned this pull request Jun 4, 2015

[MRG] Fix deprecation of decision function in SGD #4818

Merged
