[MRG+1] Change CV defaults to 5 #11557
Conversation
I see this is still WIP, but it could be worth mentioning that you will have to decorate the tests which use the cv default.
Looks great so far. A couple minor comments.
doc/modules/cross_validation.rst
Outdated
@@ -498,7 +499,7 @@ two slightly unbalanced classes::

    >>> from sklearn.model_selection import StratifiedKFold

    >>> X = np.ones(10)
    >>> X = np.ones(10)
This looks strange.
doc/whats_new/v0.20.rst
Outdated
- The default number of cross-validation folds ``cv`` and the default number of
  splits ``n_splits`` in the :class:`model_selection.KFold`-like splitters will change
  from 3 to 5 in 0.22 to account for good practice in the community.
"to account for good practice in the community." => "as 3-fold has a lot of variance".
sklearn/model_selection/_split.py
Outdated
@@ -49,6 +49,17 @@
             'check_cv']


NSPLIT_WARNING = (
    "You should specify a value for 'n_splits' instead of relying on the "
    "default value. Note that this default value of 3 is deprecated in "
Instead of "Note...", I would say "This default value will change from 3 to 5 in version 0.22."
I canceled the travis build as @aboucaud is pushing a new version soon.
LGTM.
+1 for merge.
sklearn/model_selection/_split.py
Outdated
@@ -406,8 +420,11 @@ class KFold(_BaseKFold):
    RepeatedKFold: Repeats K-Fold n times.
    """

-    def __init__(self, n_splits=3, shuffle=False,
+    def __init__(self, n_splits=None, shuffle=False,
I thought we're gonna use 'warn' from now on?
You want to replace all None by "warn"? Fine with me.
@amueller for n_splits only or cv as well?
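To illustrate the sentinel question being discussed, here is a toy sketch (not the real KFold) of how a 'warn' string default can flag "the user did not pass anything" while leaving every other value untouched:

    import warnings


    class ToyKFold:
        # Toy illustration of the 'warn' sentinel pattern; the real splitter
        # has more parameters and validation.
        def __init__(self, n_splits='warn', shuffle=False, random_state=None):
            if n_splits == 'warn':
                warnings.warn(
                    "You should specify a value for 'n_splits' instead of "
                    "relying on the default value. The default value will "
                    "change from 3 to 5 in version 0.22.", FutureWarning)
                n_splits = 3
            self.n_splits = n_splits
            self.shuffle = shuffle
            self.random_state = random_state

For cv, None already has a documented meaning (fall back to the default strategy), so a distinct string sentinel avoids overloading it.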
looks good apart from None as sentinel vs 'warn'.
lgtm
We'll merge when travis is ready.
test errors :-/
doc/whats_new/v0.20.rst
Outdated
  splits ``n_splits`` in the :class:`model_selection.KFold`-like splitters will change
  from 3 to 5 in 0.22 as 3-fold has a lot of variance.
  :issue:`11129` by :user:`Alexandre Boucaud <aboucaud>`.
should be the number of the PR, not the issue, right?
dunno, you tell me sprint master.
confirmed
Off to bed, will finish this tomorrow. Most of the work should be behind us now.
Green ✌️! I still did not address properly @amueller's comment since I only added it partially. The difficulty is to separate the cases in which it was about cv or n_splits. I could try to unify the warning messages (below) to catch a better part of the message in the tests:

    NSPLIT_WARNING = (
        "You should specify a value for 'n_splits' instead of relying on the "
        "default value. The default value will change from 3 to 5 "
        "in version 0.22.")

    CV_WARNING = (
        "You should specify a value for 'cv' instead of relying on the "
        "default value. The default value will change from 3 to 5 "
        "in version 0.22.")

WDYT? @GaelVaroquaux @amueller
Is skipping all the doctests the right way to make travis green?
When I merged, many doctests were failing; I agree it is probably not a good thing. Many of these failing tests used the default value for cv. I just end up having so many modifications in this PR that cannot be properly checked or reviewed that I would be in favor of having a following PR address the doctests, using # doctest: +SKIP as an anchor.
sklearn/feature_selection/rfe.py
Outdated
@@ -312,6 +312,10 @@ class RFECV(RFE, MetaEstimatorMixin):
        Refer :ref:`User Guide <cross_validation>` for the various
        cross-validation strategies that can be used here.

        .. deprecated:: 0.20
Can you make this versionchanged instead of deprecated? (because the keyword itself is not deprecated)
done.
We should add a line in the contributing.rst then to specify that.
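For reference, a sketch of how the directive could sit in a cv parameter docstring; the class and wording are illustrative, not the exact text of the PR:

    class SomeCVEstimator:
        """Toy estimator, only to show where the directive goes.

        Parameters
        ----------
        cv : int, cross-validation generator or an iterable, optional
            Determines the cross-validation splitting strategy.

            .. versionchanged:: 0.20
                The default value of ``cv`` will change from 3-fold to
                5-fold in version 0.22.
        """

        def __init__(self, cv='warn'):
            self.cv = cv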
Guillaume: if you set it to 5 manually in the doc examples, is it then still needed to skip?
> I just end up having so many modifications in this PR that cannot be properly
> checked or reviewed that I would be in favor of having a following PR address
> the doctests, using # doctest: +SKIP as an anchor.

No, this is really bad. We need to have tests passing, not skipped. This
is our guarantee of the quality of the codebase.
Ok, I'm on it
@GaelVaroquaux can you interrupt the build on the first commit to let the last one build
It restarts automatically each time you push
doc/modules/model_evaluation.rst
Outdated
@@ -99,10 +99,10 @@ Usage examples:
    >>> iris = datasets.load_iris()
    >>> X, y = iris.data, iris.target
    >>> clf = svm.SVC(gamma='scale', random_state=0)
-   >>> cross_val_score(clf, X, y, scoring='recall_macro')  # doctest: +ELLIPSIS
+   >>> cross_val_score(clf, X, y, scoring='recall_macro')  # doctest: +SKIP
still skipping this one
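One way to drop the skip, following the suggestion above, is to pass cv=5 explicitly so the snippet keeps executing; a sketch modeled on the example in this diff (output elided with +ELLIPSIS rather than hard-coded):

    >>> from sklearn import datasets, svm
    >>> from sklearn.model_selection import cross_val_score
    >>> iris = datasets.load_iris()
    >>> clf = svm.SVC(gamma='scale', random_state=0)
    >>> cross_val_score(clf, iris.data, iris.target, cv=5,
    ...                 scoring='recall_macro')  # doctest: +ELLIPSIS
    array([...])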
doc/modules/model_evaluation.rst
Outdated
@@ -150,7 +150,8 @@ the :func:`fbeta_score` function::
    >>> ftwo_scorer = make_scorer(fbeta_score, beta=2)
    >>> from sklearn.model_selection import GridSearchCV
    >>> from sklearn.svm import LinearSVC
-   >>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)
+   >>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]},
+   ...                     scoring=ftwo_scorer)  # doctest: +SKIP
same
doc/modules/model_evaluation.rst
Outdated
    >>> # Getting the test set true positive scores
-   >>> print(cv_results['test_tp'])  # doctest: +NORMALIZE_WHITESPACE
+   >>> print(cv_results['test_tp'])  # doctest: +SKIP
same
I saw a few changes in the doctest pragmas that didn't look right.
Aside from that, +1 for merge.
doc/modules/ensemble.rst
Outdated
-   >>> scores = cross_val_score(clf, iris.data, iris.target)
-   >>> scores.mean()  # doctest: +ELLIPSIS
+   >>> scores = cross_val_score(clf, iris.data, iris.target, cv=5)
+   >>> scores.mean()
I am very surprised by the fact that # doctest: +ELLIPSIS was removed.
doc/modules/learning_curve.rst
Outdated
    array([[0.93..., 0.94..., 0.92..., 0.91..., 0.92...],
           [0.93..., 0.94..., 0.92..., 0.91..., 0.92...],
           [0.51..., 0.52..., 0.49..., 0.47..., 0.49...]])
    >>> valid_scores  # doctest: +ELLIPSIS
Here, I think that keeping "+NORMALIZE_WHITESPACE" would be a good idea.
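As a reminder of what the two pragmas do: +ELLIPSIS lets ... in the expected output match any text (handy for long floats), and +NORMALIZE_WHITESPACE ignores differences in spacing and line breaks. A generic illustration, not taken from this PR:

    >>> 1 / 3  # doctest: +ELLIPSIS
    0.33...
    >>> print({'train_score': 1, 'test_score': 2})  # doctest: +NORMALIZE_WHITESPACE
    {'train_score': 1,
     'test_score': 2}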
Why did you remove many # doctest: +NORMALIZE_WHITESPACE and +ELLIPSIS pragmas?
@GaelVaroquaux alex made the requested changes. I think it's good to go now.
LGTM. Merging
Ohhh yeahhh!
Reference Issues/PRs
Fixes #11129 and takes over stalled PR #11139
What does this implement/fix? Explain your changes.
Add a warning for models that do not specify an explicit value for cv or n_splits, to prepare for a future deprecation of the current default of 3 and an update of the default value to 5.
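A sketch of the intended user-facing effect, assuming a FutureWarning carrying the messages above; on releases from 0.22 onward the default is already 5 and no warning is needed:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    clf = SVC(gamma='scale')

    # Relying on the default cv (3 folds in 0.20) triggers the new warning.
    scores_default = cross_val_score(clf, X, y)

    # Passing cv explicitly silences it and is forward-compatible with 0.22.
    scores_explicit = cross_val_score(clf, X, y, cv=5)
    print(scores_explicit.mean())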