New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Rename CV params n_{folds,iter} to n_splits #7187

Merged
merged 17 commits into from Aug 16, 2016

Conversation

Projects
None yet
6 participants
@yenchenlin
Contributor

yenchenlin commented Aug 13, 2016

Since now all CV splitters have get_n_splits(), we can rename n_{folds,iter} to n_splits in order to avoid confusion.

More detailed discussion can be found in #7169 .

  • Rename n_iter to n_splits
  • Rename n_ folds to n_splits
@@ -134,7 +134,7 @@ def test_cross_validator_with_default_params():
n_unique_labels = 4
n_folds = 2
p = 2
n_iter = 10 # (the default value)
n_splits= 10 # (the default value)

This comment has been minimized.

@jnothman
@@ -653,15 +653,15 @@ def test_label_shuffle_split():
for l in labels:
X = y = np.ones(len(l))
n_iter = 6
n_splits= 6

This comment has been minimized.

@jnothman
@jnothman

This comment has been minimized.

Member

jnothman commented Aug 13, 2016

One question is whether we should be doing any deprecation... I think not, in this case.

yenchenlin added some commits Aug 13, 2016

@jnothman

This comment has been minimized.

Member

jnothman commented Aug 14, 2016

Test failures.

@yenchenlin

This comment has been minimized.

Contributor

yenchenlin commented Aug 14, 2016

Oh yeah, I submit this PR to ask question:

Should I modify old splitter in sklearn/cross_validation.py?
This line seems to use old splitter, and therefore results in error.

Update: Never mind, I was trapped 👋 .

@jnothman

This comment has been minimized.

Member

jnothman commented Aug 14, 2016

More test failures, though!

@jnothman

This comment has been minimized.

Member

jnothman commented Aug 14, 2016

Remaining failures are uses of n_folds in doc/tutorial/statistical_inference/model_selection.rst

yenchenlin added some commits Aug 14, 2016

@yenchenlin yenchenlin changed the title from [WIP] Rename CV params n_{folds,iter} to n_splits to [MRG] Rename CV params n_{folds,iter} to n_splits Aug 14, 2016

@yenchenlin

This comment has been minimized.

Contributor

yenchenlin commented Aug 14, 2016

@jnothman @MechCoder CI is smiling 😄 , I think it's ready to be reviewed!

@@ -128,7 +128,7 @@
# To get nice curve, we need a large number of iterations to
# reduce the variance
grid = GridSearchCV(clf, refit=False, param_grid=param_grid,
cv=ShuffleSplit(train_size=train_size, n_iter=250,
cv=ShuffleSplit(train_size=train_size, n_splits=250,

This comment has been minimized.

@jnothman
test_size=1. / n_folds,
random_state=0)
train_counts = [0] * n_samples
test_counts = [0] * n_samples
n_splits = 0
splits_cnt = 0

This comment has been minimized.

@raghavrv

raghavrv Aug 14, 2016

Member

n_splits_test maybe?

This comment has been minimized.

@jnothman

jnothman Aug 14, 2016

Member

Or n_splits_actual

@jnothman

This comment has been minimized.

Member

jnothman commented Aug 14, 2016

Multiple PEP8 line length violations...

"train_size=None)")
ps_repr = "PredefinedSplit(test_fold=array([1, 1, 2, 2]))"
n_splits = [n_samples, comb(n_samples, p), n_folds, n_folds,
n_unique_labels, comb(n_unique_labels, p), n_iter, 2]
splits_cnts = [n_samples, comb(n_samples, p), n_splits, n_splits,

This comment has been minimized.

@jnothman

jnothman Aug 14, 2016

Member

expected_n_splits or n_splits_expected might be better.

This comment has been minimized.

@jnothman

jnothman Aug 14, 2016

Member

The issue is that cnt, apart from being an abbreviation we should try to avoid for clarity, means the same as n. So splits_cnt means the same as n_splits and so the naming tells us little of what it contains.

Another option is renaming the new n_splits to n_splits_param

This comment has been minimized.

@raghavrv

raghavrv Aug 14, 2016

Member

I agree!

This comment has been minimized.

@yenchenlin

yenchenlin Aug 15, 2016

Contributor

Thanks for the elaboration, it's useful for me in the future 😄

@@ -562,7 +563,7 @@ def test_stratified_shuffle_split_even():
# Test the StratifiedShuffleSplit, indices are drawn with a
# equal chance
n_folds = 5
n_iter = 1000
n_splits = 1000

This comment has been minimized.

@jnothman

jnothman Aug 14, 2016

Member

again, this could be n_splits_param

This comment has been minimized.

@yenchenlin

yenchenlin Aug 15, 2016

Contributor

After modified this line, I guess n_splits can remain the same name?

This comment has been minimized.

@raghavrv

raghavrv Aug 15, 2016

Member

Yes. Leave it as such...

@@ -566,34 +566,34 @@ def _make_test_folds(self, X, y=None, labels=None):
unique_y, y_inversed = np.unique(y, return_inverse=True)
y_counts = bincount(y_inversed)
min_labels = np.min(y_counts)
if np.all(self.n_folds > y_counts):
if np.all(self.n_splits > y_counts):
raise ValueError("All the n_labels for individual classes"
" are less than %d folds."

This comment has been minimized.

@jnothman

jnothman Aug 14, 2016

Member

maybe should be less than n_splits=%d, although this whole error message could do with better wording.

This comment has been minimized.

@yenchenlin

yenchenlin Aug 15, 2016

Contributor

Thanks! I also fixed some other error messages.

@jnothman

This comment has been minimized.

Member

jnothman commented Aug 14, 2016

Otherwise LGTM

@jnothman

This comment has been minimized.

Member

jnothman commented Aug 14, 2016

Please draft a change to what's new, both in the Model Selection Enhancements section and in, perhaps, API Changes.

@raghavrv

This comment has been minimized.

Member

raghavrv commented Aug 14, 2016

Yes this should go into model selection enhancements section. Otherwise with a whatsnew and reviews addressed this LGTM too.... Let's merge this! Thanks for the PR @yenchenlin

yenchenlin added some commits Aug 15, 2016

@@ -353,6 +363,11 @@ API changes summary
(`#6697 <https://github.com/scikit-learn/scikit-learn/pull/6697>`_) by
`Raghav R V`_.
- The parameters `n_iter` or `n_folds` in old CV splittersare are deprecated

This comment has been minimized.

@jnothman

jnothman Aug 15, 2016

Member

double backticks

This comment has been minimized.

@jnothman

jnothman Aug 15, 2016

Member

*"splitters are"

This comment has been minimized.

@jnothman

jnothman Aug 15, 2016

Member

Not "deprecated" as this usually refers to something that we still support.

@yenchenlin

This comment has been minimized.

Contributor

yenchenlin commented Aug 16, 2016

Comments addressed, please have a look!

@jnothman

This comment has been minimized.

Member

jnothman commented Aug 16, 2016

LGTM

@jnothman jnothman changed the title from [MRG] Rename CV params n_{folds,iter} to n_splits to [MRG+1] Rename CV params n_{folds,iter} to n_splits Aug 16, 2016

@@ -62,6 +62,16 @@ Model Selection Enhancements and API Changes
the corresponding parameter is not applicable. Additionally a list of all
the parameter dicts are stored at ``results_['params']``.
- **Renaming of ``n_folds`` and ``n_iter`` to ``n_splits``**
Some parameter names have changed:

This comment has been minimized.

@raghavrv

raghavrv Aug 16, 2016

Member

You could remove this line I think. You already have a heading...

This comment has been minimized.

@jnothman

jnothman Aug 16, 2016

Member

I think this line is good.

This comment has been minimized.

@raghavrv

raghavrv Aug 16, 2016

Member

I thought after the below suggested change, this is not needed...

This comment has been minimized.

@jnothman

jnothman Aug 16, 2016

Member

Perhaps...? I'm unconvinced that this series of comments markedly contributes to the quality of the text in terms of convention, clarity or information structure.

Some parameter names have changed:
the ``n_folds`` parameter in :class:`model_selection.KFold`,
:class:`model_selection.LabelKFold`, and
:class:`model_selection.StratifiedKFold` is now ``n_splits``;

This comment has been minimized.

@raghavrv

raghavrv Aug 16, 2016

Member

is now renamed to

This comment has been minimized.

@raghavrv

raghavrv Aug 16, 2016

Member

Full stop and not a semi colon

the ``n_folds`` parameter in :class:`model_selection.KFold`,
:class:`model_selection.LabelKFold`, and
:class:`model_selection.StratifiedKFold` is now ``n_splits``;
the ``n_iter`` in :class:`model_selection.ShuffleSplit`,

This comment has been minimized.

@raghavrv

This comment has been minimized.

@raghavrv

raghavrv Aug 16, 2016

Member

n_iter parameter

@@ -62,6 +62,16 @@ Model Selection Enhancements and API Changes
the corresponding parameter is not applicable. Additionally a list of all
the parameter dicts are stored at ``results_['params']``.
- **Renaming of ``n_folds`` and ``n_iter`` to ``n_splits``**

This comment has been minimized.

@raghavrv

raghavrv Aug 16, 2016

Member

Parameters n_folds and n_iter renamed to n_splits?

- **Renaming of ``n_folds`` and ``n_iter`` to ``n_splits``**
Some parameter names have changed:
the ``n_folds`` parameter in :class:`model_selection.KFold`,

This comment has been minimized.

@raghavrv
:class:`model_selection.StratifiedKFold` is now ``n_splits``;
the ``n_iter`` in :class:`model_selection.ShuffleSplit`,
:class:`model_selection.LabelShuffleSplit`,
and :class:`model_selection.StratifiedShuffleSplit` is now ``n_splits``, too.

This comment has been minimized.

@raghavrv

raghavrv Aug 16, 2016

Member

renamed to

- The parameters ``n_iter`` or ``n_folds`` in old CV splitters are replaced
by the new parameter ``n_splits`` since it can provide a consistent
and unambiguous interface to represent the number of train-test splits.
By `YenChen Lin`_.

This comment has been minimized.

@raghavrv

raghavrv Aug 16, 2016

Member

Add a link to the PR. It kind of gives a back reference from whatsnew to the PR / Issue...

@@ -616,7 +617,7 @@ def test_stratified_shuffle_split_overlap_train_test_bug():
y = [0, 1, 2, 3] * 3 + [4, 5] * 5
X = np.ones_like(y)
splits = StratifiedShuffleSplit(n_iter=1,

This comment has been minimized.

@raghavrv

raghavrv Aug 16, 2016

Member

splits --> sss

(This is a bit unrelated to the PR... But since you are touching the code, you might as well change it...)

This comment has been minimized.

@yenchenlin

yenchenlin Aug 16, 2016

Contributor

Thanks for pointing out!

@@ -136,7 +136,8 @@ def _is_training_data(self, X):
X = np.ones((10, 2))
X_sparse = coo_matrix(X)
y = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
# The number of samples per class needs to be > n_folds, for StratifiedKFold(3)
# The number of samples per class needs to be > n_splits,
# for StratifiedKFold(3)

This comment has been minimized.

@raghavrv

raghavrv Aug 16, 2016

Member

pedantic nitpick: for StratifiedKFold(n_splits=3) to be explicit...

@raghavrv

This comment has been minimized.

Member

raghavrv commented Aug 16, 2016

Feel free to add a +1 once the comments are addressed. Ping @jnothman for merge once done...

@raghavrv

This comment has been minimized.

Member

raghavrv commented Aug 16, 2016

@jnothman Would you like to merge it as such? LGTM...

yenchenlin added some commits Aug 16, 2016

@yenchenlin

This comment has been minimized.

Contributor

yenchenlin commented Aug 16, 2016

@raghavrv great thanks a lot for your review!
@jnothman comments addressed!

@agramfort

This comment has been minimized.

Member

agramfort commented Aug 16, 2016

Lgtm

@jnothman merge if you are happy

@MechCoder MechCoder merged commit 42120e5 into scikit-learn:master Aug 16, 2016

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@MechCoder

This comment has been minimized.

Member

MechCoder commented Aug 16, 2016

(Y)

@amueller

This comment has been minimized.

Member

amueller commented Aug 18, 2016

aaaand my book broke ^^ glad it's not in print yet lol. I should do that release some time soom.

@jnothman

This comment has been minimized.

Member

jnothman commented Aug 18, 2016

hahahahaha. Glad you wrote it for the version where everything changes.

@amueller

This comment has been minimized.

Member

amueller commented Aug 19, 2016

Well that's why I knew I couldn't do it for 0.17. But It would be great to release 0.18 within the next month (which I think is doable).

@amueller

This comment has been minimized.

Member

amueller commented Aug 22, 2016

Shouldn't the default implementation of get_n_splits be return n_splits?

@jnothman

This comment has been minimized.

Member

jnothman commented Aug 22, 2016

I think it is better explicitly implemented...?

TomDLT added a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016

[MRG+1] Rename CV params n_{folds,iter} to n_splits (scikit-learn#7187)
* Rename n_iter to n_splits

* Fix bug

* Fix examples

* Add spaces

* Rename n_folds to n_splits

* Fix error

* Fix doc

* Fix doc

* Fix example

* Rename variables name

* PEP8

* Fix error message

* Add whats_new

* Fix test

* Fix doc

* Fix doc

* Make test clear
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment