
[MRG] FIX reimplement StratifiedKFold to avoid variation in test size #14704

Merged: 13 commits merged into scikit-learn:master on Aug 24, 2019

Conversation

@jnothman (Member) commented Aug 21, 2019

fixes #14673

TODO:

  • test_split.py passes here and fails in master
  • rest of test suite passes
  • add test for invariance to class label
  • determine if this breaks too much and needs a slower change (how can I describe which cases are unchanged)
  • update docs as necessary
  • update what's new
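
For context on the issue being fixed (#14673 reports that fold sizes could become very unequal), here is a minimal sketch, an editor's illustration rather than part of the PR: with two classes of 11 samples and 5 folds, the incumbent implementation runs KFold within each class, so the per-class remainders all stack on the first fold, while this PR spreads them across folds.

import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 11 + [1] * 11)
X = np.zeros((len(y), 1))  # features don't influence the split

# Test-fold sizes: [6, 4, 4, 4, 4] under the incumbent implementation,
# [5, 5, 4, 4, 4] with this PR's allocation.
print([len(test) for _, test in StratifiedKFold(n_splits=5).split(X, y)])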

@jnothman (Member, Author) commented Aug 21, 2019

This is the failure I haven't fixed:

def test_calibration():
        """Test calibration objects with isotonic and sigmoid"""
        n_samples = 100
        X, y = make_classification(n_samples=2 * n_samples, n_features=6,
                                   random_state=42)
        sample_weight = np.random.RandomState(seed=42).uniform(size=y.size)
    
        X -= X.min()  # MultinomialNB only allows positive X
    
        # split train and test
        X_train, y_train, sw_train = \
            X[:n_samples], y[:n_samples], sample_weight[:n_samples]
        X_test, y_test = X[n_samples:], y[n_samples:]
    
        # Naive-Bayes
        clf = MultinomialNB().fit(X_train, y_train, sample_weight=sw_train)
        prob_pos_clf = clf.predict_proba(X_test)[:, 1]
    
        pc_clf = CalibratedClassifierCV(clf, cv=y.size + 1)
        assert_raises(ValueError, pc_clf.fit, X, y)
    
        # Naive Bayes with calibration
        for this_X_train, this_X_test in [(X_train, X_test),
                                          (sparse.csr_matrix(X_train),
                                           sparse.csr_matrix(X_test))]:
            for method in ['isotonic', 'sigmoid']:
                pc_clf = CalibratedClassifierCV(clf, method=method, cv=2)
                # Note that this fit overwrites the fit on the entire training
                # set
                pc_clf.fit(this_X_train, y_train, sample_weight=sw_train)
                prob_pos_pc_clf = pc_clf.predict_proba(this_X_test)[:, 1]
    
                # Check that brier score has improved after calibration
                assert (brier_score_loss(y_test, prob_pos_clf) >
                               brier_score_loss(y_test, prob_pos_pc_clf))
    
                # Check invariance against relabeling [0, 1] -> [1, 2]
                pc_clf.fit(this_X_train, y_train + 1, sample_weight=sw_train)
                prob_pos_pc_clf_relabeled = pc_clf.predict_proba(this_X_test)[:, 1]
                assert_array_almost_equal(prob_pos_pc_clf,
                                          prob_pos_pc_clf_relabeled)
    
                # Check invariance against relabeling [0, 1] -> [-1, 1]
                pc_clf.fit(this_X_train, 2 * y_train - 1, sample_weight=sw_train)
                prob_pos_pc_clf_relabeled = pc_clf.predict_proba(this_X_test)[:, 1]
                assert_array_almost_equal(prob_pos_pc_clf,
                                          prob_pos_pc_clf_relabeled)
    
                # Check invariance against relabeling [0, 1] -> [1, 0]
                pc_clf.fit(this_X_train, (y_train + 1) % 2,
                           sample_weight=sw_train)
                prob_pos_pc_clf_relabeled = \
                    pc_clf.predict_proba(this_X_test)[:, 1]
                if method == "sigmoid":
                    assert_array_almost_equal(prob_pos_pc_clf,
>                                             1 - prob_pos_pc_clf_relabeled)
E                   AssertionError: 
E                   Arrays are not almost equal to 6 decimals
E                   
E                   Mismatch: 100%
E                   Max absolute difference: 0.058605
E                   Max relative difference: 0.17545084
E                    x: array([0.639171, 0.284055, 0.583672, 0.699171, 0.473917, 0.334347,
E                          0.252862, 0.320488, 0.77633 , 0.41827 , 0.430533, 0.707298,
E                          0.259467, 0.829195, 0.070182, 0.793064, 0.276862, 0.56796 ,...
E                    y: array([0.647982, 0.328991, 0.565925, 0.713699, 0.492425, 0.326041,
E                          0.248765, 0.312138, 0.767384, 0.44601 , 0.407218, 0.719722,
E                          0.314678, 0.825227, 0.076316, 0.799289, 0.307708, 0.595152,...

X          = array([[1.55552559, 2.69252392, 3.45217209, 3.20990507, 3.54706644,
        4.39080293],
       [1.46335164, 2.5219545...4072529,
        3.85817712],
       [1.93819423, 2.94549552, 3.979749  , 4.12309001, 3.11853784,
        4.08188049]])
X_test     = array([[3.7431524 , 3.26830574

It's not obvious to me which of these assertions should depend on the CV splitter... Is the test just brittle?

@jnothman (Member, Author) commented Aug 21, 2019

Ah... I see. The incumbent StratifiedKFold is invariant to class label. The new one is not.

Invariance to class labels seems a pretty useful property. And it seems something worth testing more directly in test_split.py if we want to keep it.

One not very satisfying solution would be making the "encoding" dependent not on the lexical ordering of the labels, but on the order in which they appear in the data. That is, rather than

_, y_encoded = np.unique(y, return_inverse=True)

use

# y_inv encodes y by lexicographic class order; y_idx is each class's first position in y
_, y_idx, y_inv = np.unique(y, return_index=True, return_inverse=True)
# rank the first-occurrence positions to renumber classes by order of appearance
_, class_perm = np.unique(y_idx, return_inverse=True)
y_encoded = class_perm[y_inv]

WDYT?
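
For concreteness, a small standalone sketch (an editor's illustration, not part of the patch) contrasting the two encodings on a toy y:

import numpy as np

y = np.array(["b", "a", "b", "a", "b"])

# Lexicographic encoding: classes numbered by sorted label order.
_, y_encoded = np.unique(y, return_inverse=True)
print(y_encoded)  # [1 0 1 0 1] -- "a" gets 0 because it sorts first

# Order-of-appearance encoding: classes numbered by first occurrence.
_, y_idx, y_inv = np.unique(y, return_index=True, return_inverse=True)
_, class_perm = np.unique(y_idx, return_inverse=True)
print(class_perm[y_inv])  # [0 1 0 1 0] -- "b" gets 0 because it appears first

Relabelling a class (say "a" -> "z") changes the first encoding but not the second, which is what makes the resulting splits invariant to class labels.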

@amueller (Member) commented:

Sounds good, but calling unique twice to achieve that seems a bit odd?


test_folds = np.empty(len(y), dtype='i')
for i in range(n_classes):
    # fold id j is repeated allocation[j, i] times: one fold assignment
    # per sample of class i, in order
    folds_for_class = np.arange(self.n_splits).repeat(allocation[:, i])
A reviewer (Member) commented on this diff:

can you maybe add a comment to this? I got it now but it took me a minute ;)
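
For readers following along, a small standalone illustration (editor's sketch; allocation here is a stand-in for the array the PR computes, where allocation[j, i] is the number of class-i samples assigned to fold j):

import numpy as np

n_splits = 3
# Say class 0 has 7 samples, split across 3 folds as 3 + 2 + 2.
allocation = np.array([[3], [2], [2]])

# Fold id j is repeated allocation[j, 0] times, yielding one fold id
# per sample of the class, in order.
folds_for_class = np.arange(n_splits).repeat(allocation[:, 0])
print(folds_for_class)  # [0 0 0 1 1 2 2]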

@amueller (Member) commented:

Some related tests are failing, probably because they need to be updated? It's a bit weird that the tests rely on implementation details.

Is there a test that when KFold would already be stratified, the results are identical (at least without shuffling)?
If we have [1, 2, 3, 1, 2, 3, 1, 2, 3] and do three-fold, then KFold and StratifiedKFold should agree, right?
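
A quick sketch of that check (editor's illustration; the PR does add an equivalence test along these lines, see the test_split.py diff further down):

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.zeros((9, 1))
y = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])

# With classes laid out round-robin, stratification should be a no-op:
kf_tests = [test.tolist() for _, test in KFold(3).split(X, y)]
skf_tests = [test.tolist() for _, test in StratifiedKFold(3).split(X, y)]
assert kf_tests == skf_tests  # each test set is one contiguous block of 3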

@amueller (Member) commented:

otherwise this looks good :)

@jnothman (Member, Author) commented:

Is there a test that if KFold would be stratified then the results are identical (at least without shuffling)?

There is a more heuristic test of the data-order dependency: test_kfold_can_detect_dependent_samples_on_digits. But no explicit test of KFold/StratifiedKFold equivalence that I've noticed yet...

@jnothman (Member, Author) commented Aug 21, 2019 via email

@jnothman (Member, Author) commented:

Ping @wdevazelhes due to #10155. What do you think of the new implementation?

@jnothman jnothman changed the title [WIP] FIX reimplement StratifiedKFold to avoid variation in test size [MRG] FIX reimplement StratifiedKFold to avoid variation in test size Aug 21, 2019
@jnothman (Member, Author) commented:

Happy with those comments, @amueller?

@wdevazelhes (Contributor) commented:

Ping @wdevazelhes due to #10155. What do you think of the new implementation?

I had a look and I think it's really nice!

@qinhanmin2014 (Member) left a review:

I think we can change the implementation directly, because the difference is often small.

sklearn/model_selection/_split.py (outdated review thread, resolved)
@@ -368,24 +369,56 @@ def test_stratified_kfold_no_shuffle():
         list(StratifiedKFold(2).split(X, y1)),
         list(StratifiedKFold(2).split(X, y2)))

# Check equivalence to KFold
A reviewer (Member) commented:

why this test? this seems like a coincidence?

@jnothman (Member, Author) replied:

@amueller requested it. Not a coincidence; it's a matter of ensuring the dependencies are maintained.

@@ -210,7 +210,7 @@ def check_hyperparameter_searcher_with_fit_params(klass, **klass_kwargs):
                          "Expected fit parameter(s) ['eggs'] not seen.",
                          searcher.fit, X, y, spam=np.ones(10))
     assert_raise_message(AssertionError,
-                         "Fit parameter spam has length 1; expected 4.",
+                         "Fit parameter spam has length 1; expected",
A reviewer (Member) commented:

what's happening here?

@jnothman (Member, Author) replied:

The fit_params had been indexed according to the CV split, so the expected length in the message depends on the fold sizes, which this change affects.


@pytest.mark.parametrize('shuffle', [False, True])
@pytest.mark.parametrize('k', [4, 5, 6, 7, 8, 9, 10])
def test_stratified_kfold_ratios(k, shuffle):
A reviewer (Member) commented:

reduce the number of k values maybe?

@jnothman (Member, Author) replied:

I figure it's a pretty quick test, and I found some implementation bugs that appeared only for certain k...


* Generate test sets such that all contain the same distribution of
  classes, or as close as possible.
* Be invariant to class label: relabelling ``y = ["Happy", "Sad"]`` to
  ``y = [1, 0]`` should not change the indices generated.
A reviewer (Member) commented:

maybe more straightforward to use y=[0, 1]?

@jnothman (Member, Author) replied:

I think the translation is less clear to the reader if I use [0, 1].
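
As an aside, a small sketch (editor's illustration, not from the PR) exercising the invariance property those doc bullets describe:

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.zeros((6, 1))
y_str = np.array(["Happy", "Sad"] * 3)
y_num = np.array([1, 0] * 3)  # the same labelling under different names

splits_str = [test.tolist() for _, test in StratifiedKFold(3).split(X, y_str)]
splits_num = [test.tolist() for _, test in StratifiedKFold(3).split(X, y_num)]
assert splits_str == splits_num  # relabelling does not change the indices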

@qinhanmin2014 merged commit 3de368d into scikit-learn:master on Aug 24, 2019
@amueller (Member) commented:

oh yeah!

@jnothman (Member, Author) commented Aug 24, 2019 via email

Linked issue: StratifiedKFold makes fold-sizes very unequal (#14673)