
[MRG+1] test method subset invariance (Fixes #10420) #10428

Merged
merged 7 commits into from Jan 11, 2018

Conversation

Johayon
Contributor

@Johayon Johayon commented Jan 8, 2018

Fixes #10420

  • added a test for all estimators checking whether any of the methods {predict, predict_proba, decision_function, score_samples, transform} produces a different result when applied to all the data at once versus a subset (here, each element one by one).

  • the test currently fails in 4 cases

  1. SVC with decision_function (SVC and OneVsOneClassifier decision_function inconsistent on sub-sample #9174)
  2. SparsePCA with transform (SparsePCA inconsistent on sub-sample #10431)
  3. MiniBatchSparsePCA with transform (SparsePCA inconsistent on sub-sample #10431)
  4. BernoulliRBM with score_samples (due to stochasticity; can't easily fix)
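The invariance being tested can be illustrated with a minimal standalone sketch (simplified from the actual common test in this PR; the helper name `check_subset_invariance` and the toy data are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

def check_subset_invariance(estimator, method, X, atol=1e-7):
    """Apply `method` to the whole array and to each row one by one;
    the results should match (the property tested by this PR)."""
    func = getattr(estimator, method)
    result_full = func(X)
    result_one_by_one = [func(x.reshape(1, -1)) for x in X]
    np.testing.assert_allclose(np.ravel(result_full),
                               np.ravel(result_one_by_one),
                               atol=atol)

# Toy data; any fitted estimator is checked the same way.
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
y = [0, 0, 1, 1]
clf = SVC(kernel="linear").fit(X, y)
check_subset_invariance(clf, "predict", X)  # raises if not invariant
```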

@jnothman
Member

jnothman commented Jan 8, 2018 via email

@jnothman
Member

jnothman commented Jan 8, 2018 via email

@Johayon
Copy link
Contributor Author

Johayon commented Jan 8, 2018

I have created the issue for SparsePCA.
The test seems to fail on Birch in the Python 3.4 build because the absolute tolerance is too low. Should I raise the tolerance to 1e-7 or 1e-6? (I do not see the failure on my build.)

@jnothman jnothman left a comment

Check if an increased tolerance works. It's still a bit surprising that that should be necessary.

res_all = res_all[0]
res_one = list(map(lambda x: x[0], res_one))
# TODO remove cases when corrected
if [name, method] in [['SVC', 'decision_function'],
Member

use tuples, like if (name, method) in [('SVC', 'decision_function'), ...]

tuples are intended for struct-like objects where each field means a different thing; lists are usually for homogeneous semantics
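For illustration, the suggested style reads each pair as one struct-like value (the variable names here stand in for the loop variables in the check):

```python
# Hypothetical values standing in for the loop variables in the test.
name, method = 'SVC', 'decision_function'

# Each tuple groups heterogeneous fields: (estimator name, method name).
known_failures = [('SVC', 'decision_function'),
                  ('SparsePCA', 'transform')]

# Membership tests then read naturally as "is this pair a known failure?"
assert (name, method) in known_failures
assert ('PCA', 'transform') not in known_failures
```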

@jnothman jnothman left a comment

This should probably not be marked WIP anymore :P

@jnothman jnothman changed the title [WIP] test method subset invariance (Fixes #10420) [MRG+1] test method subset invariance (Fixes #10420) Jan 9, 2018
@jnothman
Member

Please add an entry to the change log at doc/whats_new/v0.20.rst under "Changes to estimator checks". Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:
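A plausible entry (the wording here is only illustrative; :issue: and :user: are the Sphinx roles used throughout scikit-learn's changelog) might look like:

```rst
- Add invariance tests verifying that ``predict``, ``transform``,
  ``decision_function``, ``score_samples`` and ``predict_proba`` give
  the same results whether applied to the whole dataset or to a subset
  of it. :issue:`10428` by :user:`Johayon`.
```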

@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_methods_subset_invariance(name, estimator_orig):
# check that method gives invariant results if applied
# one by one or on all elements together.
Member

I would slightly rephrase the comment, replacing "one by one" with something like "if applied on mini-batches or on the whole set"

if hasattr(estimator, method):
msg = ("{method} of {name} is not invariant when applied "
"to a subset.").format(method=method, name=name)
func = getattr(estimator, method)
Member

I would personally make a small private function to compute and unpack the data.

def _apply_func(func, X):
    result_full = func(X)
    n_features = X.shape[1]
    result_by_batch = [func(batch.reshape(1, n_features))
                       for batch in X]
    # func can output a tuple (e.g. score_samples)
    if isinstance(result_full, tuple):
        result_full = result_full[0]
        result_by_batch = [x[0] for x in result_by_batch]

    return np.ravel(result_full), np.ravel(result_by_batch)

def check_methods_subset_invariance(name, estimator_orig):
    ...
    result_full, result_by_batch = _apply_func(
        getattr(estimator, method), X)
    ...
    assert_allclose(result_full, result_by_batch,
                    atol=1e-7, err_msg=msg)

@glemaitre
Member

LGTM apart from a change in a comment and some coding style.

for method in ["predict", "transform", "decision_function",
"score_samples", "predict_proba"]:

msg = ("{method} of {name} is not invariant when applied "
Member

we can actually move this message just before the assert_allclose.

Contributor Author

@glemaitre do you want to put it outside the for block and format it inside the assert_allclose and SkipTest?

Member

Oh, my bad, I did not see the occurrence in SkipTest. Good as it is.

raise SkipTest(msg)

if hasattr(estimator, method):
result_full, result_by_batch = _apply_func(getattr(estimator,
Member

It will be easier to read as :

result_full, result_by_batch = _apply_func(
    getattr(estimator, method), X)

@glemaitre
Member

@Johayon 2 small nitpicks and this is good to be merged once CI is green :)

@glemaitre glemaitre merged commit 4a9034a into scikit-learn:master Jan 11, 2018
@glemaitre
Member

@Johayon Thanks!!!

Successfully merging this pull request may close these issues.

Add common test to ensure all(predict(X[mask]) == predict(X)[mask])
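The property from the linked issue can be demonstrated directly with a boolean mask (a hedged sketch; LogisticRegression and the toy data are illustrative, and any fitted estimator is expected to behave the same way):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)

mask = np.array([True, False, True, False])
# Predicting on a masked subset should equal masking the full predictions.
assert np.array_equal(clf.predict(X[mask]), clf.predict(X)[mask])
```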