[MRG] FIX reimplement StratifiedKFold to avoid variation in test size #14704
Conversation
This is the failure I haven't fixed:

```
def test_calibration():
    """Test calibration objects with isotonic and sigmoid"""
    n_samples = 100
    X, y = make_classification(n_samples=2 * n_samples, n_features=6,
                               random_state=42)
    sample_weight = np.random.RandomState(seed=42).uniform(size=y.size)

    X -= X.min()  # MultinomialNB only allows positive X

    # split train and test
    X_train, y_train, sw_train = \
        X[:n_samples], y[:n_samples], sample_weight[:n_samples]
    X_test, y_test = X[n_samples:], y[n_samples:]

    # Naive-Bayes
    clf = MultinomialNB().fit(X_train, y_train, sample_weight=sw_train)
    prob_pos_clf = clf.predict_proba(X_test)[:, 1]

    pc_clf = CalibratedClassifierCV(clf, cv=y.size + 1)
    assert_raises(ValueError, pc_clf.fit, X, y)

    # Naive Bayes with calibration
    for this_X_train, this_X_test in [(X_train, X_test),
                                      (sparse.csr_matrix(X_train),
                                       sparse.csr_matrix(X_test))]:
        for method in ['isotonic', 'sigmoid']:
            pc_clf = CalibratedClassifierCV(clf, method=method, cv=2)
            # Note that this fit overwrites the fit on the entire training
            # set
            pc_clf.fit(this_X_train, y_train, sample_weight=sw_train)
            prob_pos_pc_clf = pc_clf.predict_proba(this_X_test)[:, 1]

            # Check that brier score has improved after calibration
            assert (brier_score_loss(y_test, prob_pos_clf) >
                    brier_score_loss(y_test, prob_pos_pc_clf))

            # Check invariance against relabeling [0, 1] -> [1, 2]
            pc_clf.fit(this_X_train, y_train + 1, sample_weight=sw_train)
            prob_pos_pc_clf_relabeled = pc_clf.predict_proba(this_X_test)[:, 1]
            assert_array_almost_equal(prob_pos_pc_clf,
                                      prob_pos_pc_clf_relabeled)

            # Check invariance against relabeling [0, 1] -> [-1, 1]
            pc_clf.fit(this_X_train, 2 * y_train - 1, sample_weight=sw_train)
            prob_pos_pc_clf_relabeled = pc_clf.predict_proba(this_X_test)[:, 1]
            assert_array_almost_equal(prob_pos_pc_clf,
                                      prob_pos_pc_clf_relabeled)

            # Check invariance against relabeling [0, 1] -> [1, 0]
            pc_clf.fit(this_X_train, (y_train + 1) % 2,
                       sample_weight=sw_train)
            prob_pos_pc_clf_relabeled = \
                pc_clf.predict_proba(this_X_test)[:, 1]
            if method == "sigmoid":
                assert_array_almost_equal(prob_pos_pc_clf,
>                                         1 - prob_pos_pc_clf_relabeled)
E               AssertionError:
E               Arrays are not almost equal to 6 decimals
E
E               Mismatch: 100%
E               Max absolute difference: 0.058605
E               Max relative difference: 0.17545084
E                x: array([0.639171, 0.284055, 0.583672, 0.699171, 0.473917, 0.334347,
E                          0.252862, 0.320488, 0.77633 , 0.41827 , 0.430533, 0.707298,
E                          0.259467, 0.829195, 0.070182, 0.793064, 0.276862, 0.56796 ,...
E                y: array([0.647982, 0.328991, 0.565925, 0.713699, 0.492425, 0.326041,
E                          0.248765, 0.312138, 0.767384, 0.44601 , 0.407218, 0.719722,
E                          0.314678, 0.825227, 0.076316, 0.799289, 0.307708, 0.595152,...

X          = array([[1.55552559, 2.69252392, 3.45217209, 3.20990507, 3.54706644,
                     4.39080293],
                    [1.46335164, 2.5219545...4072529,
                     3.85817712],
                    [1.93819423, 2.94549552, 3.979749  , 4.12309001, 3.11853784,
                     4.08188049]])
X_test     = array([[3.7431524 , 3.26830574 ...
```

It's not obvious to me what assertions there should be dependent on CV splitter... Is the test just brittle?
Ah... I see. The incumbent StratifiedKFold is invariant to class label. The new one is not. Invariance to class labels seems a pretty useful property. And it seems something worth testing more directly in test_split.py if we want to keep it. One not very satisfying solution would be making the "encoding" dependent not on the lexical ordering of the labels, but on the order in which they appear in the data. That is, rather than

```python
_, y_encoded = np.unique(y, return_inverse=True)
```

use

```python
_, y_idx, y_inv = np.unique(y, return_index=True, return_inverse=True)
_, class_perm = np.unique(y_idx, return_inverse=True)
y_encoded = class_perm[y_inv]
```

WDYT?
sounds good but calling unique twice to achieve that seems a bit odd?
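To make the proposal above concrete, here is a small standalone sketch (the example labels are made up) contrasting the plain lexical encoding with the proposed order-of-appearance encoding:

```python
import numpy as np

# Hypothetical labels whose order of first appearance ("b" before "a")
# differs from their lexical order.
y = np.array(["b", "a", "b", "a", "a"])

# Lexical encoding: "a" -> 0, "b" -> 1, regardless of where they appear.
_, y_lexical = np.unique(y, return_inverse=True)
print(y_lexical)   # [1 0 1 0 0]

# Order-of-appearance encoding, as proposed above: classes are numbered
# by the position of their first occurrence in y.
_, y_idx, y_inv = np.unique(y, return_index=True, return_inverse=True)
_, class_perm = np.unique(y_idx, return_inverse=True)
y_encoded = class_perm[y_inv]
print(y_encoded)   # [0 1 0 1 1]
```

With this encoding, swapping the class labels leaves the encoded array unchanged, which is what makes the splitter invariant to relabelling.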
sklearn/model_selection/_split.py
```python
test_folds = np.empty(len(y), dtype='i')
for i in range(n_classes):
    folds_for_class = np.arange(self.n_splits).repeat(allocation[:, i])
```
can you maybe add a comment to this? I got it now but it took me a minute ;)
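For readers skimming the diff, a short illustration of what that `repeat` line computes (the numbers are made up; `allocation[:, i]` is assumed to hold, for class `i`, how many of its samples each fold should receive):

```python
import numpy as np

n_splits = 3
# Hypothetical allocation for one class with 7 samples, split 3 + 2 + 2
# across the folds.
allocation_for_class = np.array([3, 2, 2])

# Repeat each fold index by its allocation: one fold label per sample
# of this class, in order.
folds_for_class = np.arange(n_splits).repeat(allocation_for_class)
print(folds_for_class)   # [0 0 0 1 1 2 2]
```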
some related tests are failing, probably because they need to be updated? It's a bit weird that the tests rely on implementation details. Is there a test that, where KFold would already be stratified, the results are identical (at least without shuffling)?
otherwise this looks good :)
There is a more heuristic test of the data order dependency:
Yes, the code is pretty hard to follow! I tried to make it easier by not using my count-based solution above!
Ping @wdevazelhes due to #10155. What do you think of the new implementation?
Happy with those comments, @amueller?
I had a look and I think it's really nice!
I think we can change the implementation directly because the difference is often small
```
@@ -368,24 +369,56 @@ def test_stratified_kfold_no_shuffle():
         list(StratifiedKFold(2).split(X, y1)),
         list(StratifiedKFold(2).split(X, y2)))

+    # Check equivalence to KFold
```
why this test? this seems like a coincidence?
@amueller requested it. Not a coincidence. A matter of ensuring dependencies are maintained
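For context, a sketch of the kind of equivalence being checked (the data and fold count here are illustrative choices, not the test added in this PR): when the classes alternate so that plain KFold already happens to stratify the data perfectly, StratifiedKFold should yield the same splits.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Illustrative data: classes alternate, so each contiguous KFold test
# block already has a perfect 50/50 class balance.
X = np.ones(20)
y = np.tile([0, 1], 10)

kf = KFold(n_splits=5)
skf = StratifiedKFold(n_splits=5)

for (_, kf_test), (_, skf_test) in zip(kf.split(X, y), skf.split(X, y)):
    # Both splitters should select identical test indices in this setup.
    assert np.array_equal(kf_test, skf_test)
```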
```
@@ -210,7 +210,7 @@ def check_hyperparameter_searcher_with_fit_params(klass, **klass_kwargs):
                          "Expected fit parameter(s) ['eggs'] not seen.",
                          searcher.fit, X, y, spam=np.ones(10))
     assert_raise_message(AssertionError,
-                         "Fit parameter spam has length 1; expected 4.",
+                         "Fit parameter spam has length 1; expected",
```
what's happening here?
The fit_params had been indexed according to the cv
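A small illustration (hypothetical data and parameter name) of why the exact number was dropped from the expected message: once a per-sample fit parameter is indexed by each split's training indices, its expected length is the training-fold size, which can differ from fold to fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 4 + [1] * 6)
X = np.ones((10, 1))
spam = np.arange(10)   # one fit-parameter value per sample

for train, _ in StratifiedKFold(3).split(X, y):
    # Each fold's fit sees len(train) values, and the training-fold
    # sizes vary (here 6, 7, 7), so no single number can be hard-coded.
    print(len(spam[train]))
```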
```
-def test_stratified_kfold_ratios():
+@pytest.mark.parametrize('shuffle', [False, True])
+@pytest.mark.parametrize('k', [4, 5, 6, 7, 8, 9, 10])
```
reduce number of k maybe?
I figure it's a pretty quick test, and I found that some implementation bugs appeared only for some k...
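As a rough sketch of the ratio property these parametrizations exercise (the data, tolerance, and test name below are assumptions, not the PR's): each test fold should receive either the floor or the ceiling of `n_c / k` samples of every class `c`.

```python
import numpy as np
import pytest
from sklearn.model_selection import StratifiedKFold


@pytest.mark.parametrize('shuffle', [False, True])
@pytest.mark.parametrize('k', [4, 5, 6, 7, 8, 9, 10])
def test_stratified_kfold_ratios_sketch(k, shuffle):
    # Imbalanced illustrative data: 40 / 25 / 12 samples per class.
    y = np.array([0] * 40 + [1] * 25 + [2] * 12)
    X = np.ones(len(y))
    class_counts = np.bincount(y)

    cv = StratifiedKFold(k, shuffle=shuffle,
                         random_state=0 if shuffle else None)
    for _, test in cv.split(X, y):
        fold_counts = np.bincount(y[test], minlength=3)
        for c, n_c in enumerate(class_counts):
            # Each fold gets n_c / k samples of class c, rounded down or up.
            assert abs(fold_counts[c] - n_c / k) < 1
```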
* Generate test sets such that all contain the same distribution of
  classes, or as close as possible.
* Be invariant to class label: relabelling ``y = ["Happy", "Sad"]`` to
maybe more straightforward to use y=[0, 1]?
I think the translation is less clear to the reader if I use [0,1]
oh yeah!
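To illustrate the label-invariance bullet quoted above (the example data is an assumption, not from the PR), relabelling the classes should not change which indices land in each test set:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.ones(10)
y_words = np.array(["Happy"] * 6 + ["Sad"] * 4)
y_binary = np.array([1] * 6 + [0] * 4)   # same grouping, different labels

cv = StratifiedKFold(n_splits=2)
for (_, test_w), (_, test_b) in zip(cv.split(X, y_words),
                                    cv.split(X, y_binary)):
    # The generated test indices should be identical under relabelling.
    assert np.array_equal(test_w, test_b)
```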
It is exciting to have this out of the way! I'm a little worried it will annoy people that results have changed...
fixes #14673

TODO:

* test_split.py passes here and fails in master