[MRG+1] Fixes #7578 added check_decision_proba_consistency in estimator_checks #8253

Merged
merged 32 commits into from
Mar 7, 2017

Conversation

@shubham0704 (Author) commented Jan 31, 2017

Reference Issue

Fix #7578

What does this implement/fix? Explain your changes.

It adds the test function requested in the issue, which checks whether the outputs of predict_proba and decision_function are perfectly rank-correlated.

Any other comments?

I need to understand the testing part of this function. I have done the pep8 linting and pyflakes, but received one error from the nose check stating set_testing_parameters() takes exactly 1 argument (0 given), with error=1. Also, where is the best place to yield this function? I did not make that change because I was unsure.

@@ -114,7 +116,7 @@ def _yield_classifier_checks(name, Classifier):
yield check_classifiers_regression_target
if (name not in ["MultinomialNB", "LabelPropagation", "LabelSpreading"]
# TODO some complication with -1 label
and name not in ["DecisionTreeClassifier",
@lesteve (Member) commented Feb 1, 2017


You need to put the and on the previous line to make flake8 happy. The error from Travis is (it gives you a hint as to what to do):

./sklearn/utils/estimator_checks.py:119:9: W503 line break before binary operator
        and name not in ["DecisionTreeClassifier",
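For reference, a minimal sketch of the layout flake8 expects here (the yielded check is a placeholder; the diff elides it):

if (name not in ["MultinomialNB", "LabelPropagation", "LabelSpreading"] and
        # TODO some complication with -1 label
        name not in ["DecisionTreeClassifier",
                     "ExtraTreeClassifier"]):
    # W503 is satisfied because each line break now comes *after* the
    # binary operator "and", never before it.
    yield some_check  # placeholder for whichever check this condition guards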

@lesteve lesteve changed the title [WEP] Fixes #7578 added check_rank_corr in estimator_checks [WIP] Fixes #7578 added check_rank_corr in estimator_checks Feb 1, 2017
@jnothman (Member) commented Feb 2, 2017

spearmanr internally performs rankdata followed by corrcoef. I think rankdata (or a stable argsort) followed by a test for equality should suffice and be more efficient.
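A minimal sketch of the suggested approach (the helper and sample values are illustrative, not the PR's code):

import numpy as np
from scipy.stats import rankdata

def same_ranking(a, b):
    # spearmanr would rank both arrays and then compute a correlation
    # coefficient; comparing the ranks directly skips the second step.
    return np.array_equal(rankdata(a), rankdata(b))

# Illustrative scores: decision_function values and positive-class
# probabilities that order the samples identically (including the tie).
dec = np.array([-1.2, 0.3, 2.5, 0.3])
proba = np.array([0.15, 0.55, 0.92, 0.55])
assert same_ranking(dec, proba)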



@ignore_warnings(category=DeprecationWarning)
def check_rank_corr(name, Estimator):
Member:

perhaps call it check_decision_proba_consistency.

predict_proba methods has outputs with perfect rank correlation.
"""

X, Y = make_multilabel_classification(n_classes=2, n_labels=1,
Member:

Why are we using multilabel data? Why not just binary?

try:
classif = OneVsRestClassifier(estimator)
classif.fit(X, Y)
a = classif.predict_proba([i for i in range(20)])
Member:

Usually the input would be 2d. Why is it 1d? For test data, you can just generate something random and uniform, or draw from a similar distribution to the training data.
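A sketch of what that could look like (shapes and the RandomState seed here are arbitrary):

import numpy as np

rng = np.random.RandomState(0)
X_train = rng.uniform(size=(30, 4))   # 2d: (n_samples, n_features)
# Test data drawn from the same distribution, also 2d; a 1d list such as
# [i for i in range(20)] would be treated as a single sample or rejected.
X_test = rng.uniform(size=(20, 4))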


if hasattr(estimator, "predict_proba"):
try:
classif = OneVsRestClassifier(estimator)
Member:

why are we doing OvR?

a = classif.predict_proba([i for i in range(20)])
b = classif.decision_function([i for i in range(20)])
assert_equal(
rankdata(a, method='average'), rankdata(b, method='average'))
Member:

method shouldn't matter as long as tied values have tied ranks. But if we're working with non-binary classification, we need to do this comparison column-wise. Use assert_array_equal rather than assert_equal.
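A sketch of the column-wise comparison (a hypothetical helper, not the PR's code):

import numpy as np
from numpy.testing import assert_array_equal
from scipy.stats import rankdata

def assert_ranks_match_columnwise(a, b):
    # Each column holds the scores for one class, so ranks must be
    # compared per column; the tie-breaking `method` is irrelevant as
    # long as the same one is used for both arrays.
    for col in range(a.shape[1]):
        assert_array_equal(rankdata(a[:, col]), rankdata(b[:, col]))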

assert_equal(
rankdata(a, method='average'), rankdata(b, method='average'))

except ValueError:
Member:

What is this meant to catch? The try block should wrap the smallest scope that we want to excuse; otherwise this test can pass when all estimators raise a ValueError because the test itself is broken.
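A self-contained sketch of the narrower scoping the reviewer is asking for (the estimator choice is arbitrary):

from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=50, centers=2, random_state=0)
est = LogisticRegression()

# fit() stays OUTSIDE the try block: if fitting raised ValueError for
# every estimator because the test itself were broken, an enclosing
# except clause would swallow it and the check would pass vacuously.
est.fit(X, y)
try:
    proba = est.predict_proba(X)  # only the call we might want to excuse
except ValueError as exc:
    print("predict_proba raised:", exc)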


if hasattr(estimator, "decision_function"):

if hasattr(estimator, "predict_proba"):
Member:

use and

@shubham0704 (Author)

@jnothman Travis CI fails even though no errors were found.

@lesteve (Member) commented Feb 3, 2017

@jnothman Travis CI fails even though no errors were found.

Yeah, I have opened an issue with Travis about that (it is under investigation):
travis-ci/travis-ci#7264

@jnothman (Member) commented Feb 6, 2017

You seem to have mixed up a number of different patches in your PR. You should be using a branch in your fork to avoid this, not master.

@jnothman (Member) commented Feb 6, 2017

You can keep this one on master, but you need to revert or otherwise remove your changes pertaining to other issues.

@shubham0704 shubham0704 changed the title [WIP] Fixes #7578 added check_rank_corr in estimator_checks [MRG] Fixes #7578 added check_decision_proba_consistency in estimator_checks Feb 7, 2017
@shubham0704 (Author)

@jnothman anything else needed?

@@ -56,8 +56,9 @@

BOSTON = None
CROSS_DECOMPOSITION = ['PLSCanonical', 'PLSRegression', 'CCA', 'PLSSVD']
MULTI_OUTPUT = ['CCA', 'DecisionTreeRegressor', 'ElasticNet',
'ExtraTreeRegressor', 'ExtraTreesRegressor', 'GaussianProcess',
MULTI_OUTPUT = ['CCA', 'DecisionTreeClassifier', 'DecisionTreeRegressor',
Member:

all others here are regressors. What makes you sure it's appropriate to include multioutput classifiers here?

Author:

In the function check_supervised_y_2d, the line inside the warning section, estimator.fit(X, y[:, np.newaxis]), does not give any warnings for the classifiers I included, therefore I added them to the MULTI_OUTPUT list. Otherwise it would give me an error that `expected 1 DataConversionWarning, got: `. I checked the sklearn documentation for DecisionTreeClassifier; it says y can accept [n_samples, n_outputs].
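A sketch of the behaviour being described, assuming the check works roughly as outlined (estimator and shapes are illustrative):

import warnings
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier

X, y = make_blobs(n_samples=30, centers=2, random_state=0)

# check_supervised_y_2d fits with a column-vector y and expects a
# DataConversionWarning from estimators that silently ravel it. A
# multi-output estimator like DecisionTreeClassifier accepts y of shape
# (n_samples, n_outputs) natively, so no warning is raised, which is
# why it has to be listed in MULTI_OUTPUT.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    DecisionTreeClassifier().fit(X, y[:, np.newaxis])
print(len(caught), "warning(s) raised")  # expected: 0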

Member:

Is this change relevant to the rest of the PR? Perhaps it should be a separate PR.

Author:

Sure, making changes then.

Author:

Nosetests fail if I do not include them. Can one PR reference two issues? Otherwise this will not pass. Maybe I will open a new issue and reference both it and this one from this PR. What do you say?

Member:

Could you be more specific what fails if you do not include them?

Author:

[screenshot: nosetests error output]

These are the errors when I do not include them.

@@ -113,12 +114,12 @@ def _yield_classifier_checks(name, Classifier):
# basic consistency testing
yield check_classifiers_train
yield check_classifiers_regression_target
if (name not in ["MultinomialNB", "LabelPropagation", "LabelSpreading"]
if (name not in ["MultinomialNB", "LabelPropagation", "LabelSpreading"]):
Member:

please don't add these parentheses

# TODO some complication with -1 label
and name not in ["DecisionTreeClassifier",
"ExtraTreeClassifier"]):
if (name not in ["DecisionTreeClassifier", "ExtraTreeClassifier"]):
Member:

please don't add these parentheses

@@ -8,6 +8,7 @@

set -e


Member:

?

Author:

This was added by mistake during the days when Travis went down, when I foolishly tried to make Travis work in order to get my PR to pass tests, as this was my first one. Will correct it.

return (p.name != 'self'
and p.kind != p.VAR_KEYWORD
and p.kind != p.VAR_POSITIONAL)
return (p.name != 'self' and p.kind != p.VAR_KEYWORD and
Member:

why change this?

travis.log Outdated
@@ -0,0 +1,87 @@
Command line:
Member:

please remove this file.


@ignore_warnings(category=DeprecationWarning)
def check_decision_proba_consistency(name, Estimator):
"""
Member:

We don't usually add docstrings to checks, because nose doesn't play nicely.
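A sketch of the convention: nose displays a test's docstring in place of its name, so common checks describe themselves with a comment instead (the import path is the one used at the time of this PR; newer versions moved it to sklearn.utils._testing):

from sklearn.utils.testing import ignore_warnings

@ignore_warnings(category=DeprecationWarning)
def check_decision_proba_consistency(name, Estimator):
    # Check that decision_function and predict_proba outputs have
    # perfect rank correlation. (A comment, not a docstring, so nose
    # keeps reporting the check's name.)
    pass  # body elided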

predict_proba methods has outputs with perfect rank correlation.
"""
rnd = np.random.RandomState(0)
X_train = (3*rnd.uniform(size=(10, 4))).astype(int)
Member:

we usually use integer features, or binary, in common tests in case estimators can't deal with real-valued features.

Author:

Absolutely. I added .astype(int).

if (hasattr(estimator, "decision_function") and
hasattr(estimator, "predict_proba")):

estimator.fit(X_train, y)
Member:

Nothing seems to be entering this case: I've modified it to say assert False, but nothing fails in sklearn/tests/test_common.py.

Member:

Ah. You've put this in as a regressor check.

@@ -162,6 +163,7 @@ def _yield_regressor_checks(name, Regressor):
yield check_regressors_no_decision_function
yield check_supervised_y_2d
yield check_supervised_y_no_nan
yield check_decision_proba_consistency
Member:

This should be in ...classifier_checks, not regressors.
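A sketch of the intended placement (surrounding checks elided; stand-ins are used so the snippet runs):

def check_classifiers_train(name, Classifier):
    pass  # stand-in for an existing common check

def check_decision_proba_consistency(name, Classifier):
    pass  # stand-in for the check added in this PR

def _yield_classifier_checks(name, Classifier):
    # ... existing classifier checks ...
    yield check_classifiers_train
    # The new check compares decision_function with predict_proba, which
    # only classifiers have, so it belongs here, not in the regressor list.
    yield check_decision_proba_consistency

for check in _yield_classifier_checks("SomeClassifier", object):
    print(check.__name__)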

@shubham0704 (Author)

Awesome, that's great, I get what you are trying to say: with likelihood(point belongs to A) / likelihood(point belongs to B), if both are around 0.5, say 0.6/0.4, the point can belong to either side, so the values won't peak so much. Thanks a lot. It should definitely work. I will make changes and update.

@shubham0704 (Author) commented Feb 17, 2017

@jnothman I did not exactly take the points in the middle; I just brought the cluster centres nearer so that they kind of overlap. The other thing I considered was to draw ellipses around both blobs and take all the points outside them for the test set, but this worked. Is it fine or should I improve it?

# TODO some complication with -1 label
and name not in ["DecisionTreeClassifier",
"ExtraTreeClassifier"]):
if name not in ["DecisionTreeClassifier", "ExtraTreeClassifier"]:
Member:

Why did you change this from and to a separate if? That's what creates the errors in your screenshot.

Author:

Making changes.

centers = [(2, 2), (4, 4)]
X, y = make_blobs(n_samples=100, random_state=0, n_features=4,
centers=centers, cluster_std=1.0, shuffle=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
Member:

With this approach, the probabilities are again going to be very peaked around 0 and 1, since the blobs are more or less linearly separable, encouraging numerical precision errors etc. For the test set, I'd just use np.random.randn() + 3 or something.
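A sketch of the suggested test set (the feature count matches the two-dimensional centres above; the exact offset is a judgment call):

import numpy as np

rng = np.random.RandomState(0)
# Points around (3, 3) sit between the blob centres (2, 2) and (4, 4),
# so predicted probabilities stay away from the saturated 0/1 region
# where rank comparisons get numerically fragile.
X_test = rng.randn(20, 2) + 3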

@jnothman (Member) left a comment

LGTM

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
random_state=0)

X_test = np.random.randn(20, 2)+4
Member:

please insert spaces around +

Author:

Thanks a lot for all the reviews on the PR. Learnt a lot. Making changes.

@jnothman jnothman changed the title [MRG] Fixes #7578 added check_decision_proba_consistency in estimator_checks [MRG+1] Fixes #7578 added check_decision_proba_consistency in estimator_checks Feb 22, 2017
@jnothman (Member)

Please add an entry in what's new. Put it in API changes to say "Estimators with both x and y are now required ..."

@shubham0704 (Author) commented Feb 22, 2017

Got my Networks exam now; will surely do it by evening.

@lesteve (Member) commented Feb 23, 2017

@shubham0704 please use "Fix #issueNumber" in your PR description; this way the associated issue gets closed automatically when the PR is merged. For more details, look at this. I have edited your description, but please remember to do it next time.

@shubham0704 (Author)

Sure. Thanks @lesteve.

@shubham0704 (Author)

[RFC] - request for close :)
Note: this is on master, so I have to use ad-hoc methods to address the other issues.
Thanks

return (p.name != 'self'
and p.kind != p.VAR_KEYWORD
and p.kind != p.VAR_POSITIONAL)
return (p.name != 'self' and
Member:

For next time, try not to change things that are not related to your PR. This adds noise into the diff and makes it harder for the review to be efficient.

Author:

Sure @lesteve. Thanks a lot.

@lesteve lesteve merged commit 02c705e into scikit-learn:master Mar 7, 2017
@lesteve (Member) commented Mar 7, 2017

LGTM, merging, thanks a lot!
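For readers following along, a rough reconstruction of the final check, pieced together from the review thread above; the merged code may differ in details:

import numpy as np
from numpy.testing import assert_array_equal
from scipy.stats import rankdata
from sklearn.datasets import make_blobs

def check_decision_proba_consistency(name, Estimator):
    # Overlapping blobs keep probabilities away from 0/1 saturation.
    centers = [(2, 2), (4, 4)]
    X, y = make_blobs(n_samples=100, random_state=0, centers=centers,
                      cluster_std=1.0, shuffle=True)
    X_test = np.random.randn(20, 2) + 4
    estimator = Estimator()
    if (hasattr(estimator, "decision_function") and
            hasattr(estimator, "predict_proba")):
        estimator.fit(X, y)
        a = estimator.predict_proba(X_test)[:, 1]  # positive-class column
        b = estimator.decision_function(X_test)
        # For a binary problem, both outputs must rank the test points
        # identically.
        assert_array_equal(rankdata(a), rankdata(b))

# Example usage under this sketch:
from sklearn.linear_model import LogisticRegression
check_decision_proba_consistency("LogisticRegression", LogisticRegression)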

@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
herilalaina pushed a commit to herilalaina/scikit-learn that referenced this pull request Mar 26, 2017
massich pushed a commit to massich/scikit-learn that referenced this pull request Apr 26, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
Successfully merging this pull request may close these issues.

Common test: predict_proba as a monotonic transformation of decision_function