Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] test method subset invariance (Fixes #10420) #10428

Merged
merged 7 commits into from Jan 11, 2018

Conversation

Projects
None yet
3 participants
@Johayon
Copy link
Contributor

Johayon commented Jan 8, 2018

Fixes #10420

  • added a test for all estimators to check if any of the methods {predict, predict_proba, decision_function, score_samples, transform} produces a different result if applied on all the data or a subset (in this case all the elements one by one).

  • the test currently fails in 4 cases

  1. SVC with decision_function (#9174)
  2. SparsePCA with transform (#10431)
  3. MiniBatchSparsePCA with transform (#10431)
  4. BernoulliRBM with score_samples (due to stochasticity; can't easily fix)
@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jan 8, 2018

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jan 8, 2018

Johayon added some commits Jan 8, 2018

@Johayon

This comment has been minimized.

Copy link
Contributor Author

Johayon commented Jan 8, 2018

I have created the issue for SparsePCA.
The test seems to fail on Birch on the build for python 3.4 because the absolute tolerance is too low, should I put a higher tolerance 1e-7 or 1e-6 ? (I do not have any issue on my build)

@jnothman
Copy link
Member

jnothman left a comment

Check if an increased tolerance works. It's still a bit surprising that that should be necessary.

res_all = res_all[0]
res_one = list(map(lambda x: x[0], res_one))
# TODO remove cases when corrected
if [name, method] in [['SVC', 'decision_function'],

This comment has been minimized.

Copy link
@jnothman

jnothman Jan 9, 2018

Member

use tuples. Like if name, method in [('SVC', 'decision_function'), ...]

tuples are intended for struct-like objects where each field means a different thing. Arrays are usually for homogeneous semantics

@jnothman
Copy link
Member

jnothman left a comment

This should probably not be marked WIP anymore :P

@jnothman jnothman changed the title [WIP] test method subset invariance (Fixes #10420) [MRG+1] test method subset invariance (Fixes #10420) Jan 9, 2018

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jan 10, 2018

Please add an entry to the change log at doc/whats_new/v0.20.rst under "Changes to estimator checks". Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def check_methods_subset_invariance(name, estimator_orig):
# check that method gives invariant results if applied
# one by one or on all elements together.

This comment has been minimized.

Copy link
@glemaitre

glemaitre Jan 10, 2018

Contributor

I would slightly rephrase the comment by replacing one by one by something like "if applied on mini bathes or the whole set"

if hasattr(estimator, method):
msg = ("{method} of {name} is not invariant when applied "
"to a subset.").format(method=method, name=name)
func = getattr(estimator, method)

This comment has been minimized.

Copy link
@glemaitre

glemaitre Jan 10, 2018

Contributor

I would personally make a small private function to compute and unpack the data.

def _apply_func(func, X):
    result_full = func(X)
    n_features = X.shape[1]
    result_by_batch = [func(batch.reshape(1, n_features))
                       for batch in X]
    # func can output tuple (e.g. score_samples)
    if type(res_all) == tuple:
        result_full = result_full[0]
        result_by_batch = list(map(lambda x: x[0], result_by_batch))

    return np.ravel(result_full), np.ravel(result_by_batch)

def check_methods_subset_invariance(name, estimator_orig):
    ...
    result_full, result_by_batch = _apply_func(get_attr(estimator, method))
    ...
    assert_allclose(results_full, results_by_batch,
                    atol=1e-7, err_msg=msg)
@glemaitre

This comment has been minimized.

Copy link
Contributor

glemaitre commented Jan 10, 2018

LGTM apart of a change in a comment and some coding style.

for method in ["predict", "transform", "decision_function",
"score_samples", "predict_proba"]:

msg = ("{method} of {name} is not invariant when applied "

This comment has been minimized.

Copy link
@glemaitre

glemaitre Jan 10, 2018

Contributor

we can actually move this message before the assert_close in fact.

This comment has been minimized.

Copy link
@Johayon

Johayon Jan 10, 2018

Author Contributor

@glemaitre you want to put it outside the block for and format it inside the assert_close and SkipTest ?

This comment has been minimized.

Copy link
@glemaitre

glemaitre Jan 11, 2018

Contributor

oh my bad I did not see the occurrence in SkipTest. Good as it is

raise SkipTest(msg)

if hasattr(estimator, method):
result_full, result_by_batch = _apply_func(getattr(estimator,

This comment has been minimized.

Copy link
@glemaitre

glemaitre Jan 10, 2018

Contributor

It will be easier to read as :

result_full, result_by_batch = _apply_func(
    getattr(estimator, method), X)
@glemaitre

This comment has been minimized.

Copy link
Contributor

glemaitre commented Jan 10, 2018

@Johayon 2 small nitpicks and this is good to be merged once it will be green :)

@glemaitre glemaitre merged commit 4a9034a into scikit-learn:master Jan 11, 2018

0 of 5 checks passed

ci/circleci: python2 Your tests are queued behind your running builds
Details
ci/circleci: python3 Your tests are queued behind your running builds
Details
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
lgtm analysis: Python Running analyses for revisions
Details
@glemaitre

This comment has been minimized.

Copy link
Contributor

glemaitre commented Jan 11, 2018

@Johayon Thanks!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.