Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] ENH Disassemble check estimator #14381

Merged
merged 47 commits into from Aug 28, 2019

Conversation

@thomasjpfan
Copy link
Member

thomasjpfan commented Jul 16, 2019

Reference Issues/PRs

Fixes #11622
Alternative to #13843

What does this implement/fix? Explain your changes.

Libraries will be able to write the following to run their tests:

from itertools import chain

@pytest.mark.parameterize('estimator, check',
    chain.from_iterable(check_estimator(est, generate_only=False)
                        for est in estimators))
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)

Any other comments?

The name argument of the checks are set by check_estimator.

thomasjpfan added 2 commits Jul 16, 2019
Copy link
Member

rth left a comment

Thanks! Please add a what's new entry as well.

generate_only : bool, optional (default=True)
When `True`, checks are evaluated when `check_estimator` is called.
When `False`, `check_estimator` generates the checks and the estimator.

This comment has been minimized.

Copy link
@rth

rth Jul 16, 2019

Member

Please add versionadded

@@ -98,17 +100,15 @@ def _rename_partial(val):

@pytest.mark.parametrize(
"estimator, check",
_generate_checks_per_estimator(_yield_all_checks,

This comment has been minimized.

Copy link
@rth

rth Jul 16, 2019

Member

We can remove this function now?

@pytest.mark.parameterize(
'estimator, check',
chain.from_iterable(check_estimator(est, generate_only=False)
for est in estimators))

This comment has been minimized.

Copy link
@rth

rth Jul 16, 2019

Member

One issues with this is that because estimator is an instance and not a class it's not going to be rendered nicely in the list of tests. That's why we use id=_rename_partial. I guess there is no easy way of fixing this for users...

This comment has been minimized.

Copy link
@thomasjpfan

thomasjpfan Jul 16, 2019

Author Member

Without our _rename_partial, pytest names everything estimator213-check41. It could be slightly better if we yield the name as well from check_estimator, which will result in ARDRegression-estimator211-check21.

This comment has been minimized.

Copy link
@amueller

amueller Jul 19, 2019

Member

I think we should really try to have the tests have sensible names. This is super useless otherwise I think.

This comment has been minimized.

Copy link
@amueller

amueller Jul 19, 2019

Member

Why not fix the fixme in _rename_partial and move it to estimator_checks and make it public and use it here? That solves the problem, right?

This comment has been minimized.

Copy link
@thomasjpfan

thomasjpfan Jul 20, 2019

Author Member

Using print_changed_only adds newlines sometimes, which leads to pytest names with newlines in them.

Making _rename_partial is a good idea, it would need another name, maybe nicer_check_estimator_id? 🤔

This comment has been minimized.

Copy link
@amueller

amueller Jul 22, 2019

Member

how about using print_changed_only and remove the newlines? Also: if you're testing stuff that's that deeply nested there's no way to give it a good name.

thomasjpfan added 3 commits Jul 16, 2019
WIP
@thomasjpfan thomasjpfan changed the title [MRG] ENH Disassemble check estimator [WIP] ENH Disassemble check estimator Jul 16, 2019
@thomasjpfan thomasjpfan changed the title [WIP] ENH Disassemble check estimator [MRG] ENH Disassemble check estimator Jul 16, 2019
@thomasjpfan

This comment has been minimized.

Copy link
Member Author

thomasjpfan commented Jul 16, 2019

We can not use yield in check_estimator because python will automatically conver the function to a generator. This happens even if the yield is guarded by an if.

This PR was updated to return a list of checks, rather than a generator.

@rth
rth approved these changes Jul 19, 2019
Copy link
Member

rth left a comment

Thanks @thomasjpfan !

from itertools import chain
@pytest.mark.parameterize(

This comment has been minimized.

Copy link
@amueller

amueller Jul 19, 2019

Member
Suggested change
@pytest.mark.parameterize(
@pytest.mark.parametrize(
@amueller

This comment has been minimized.

Copy link
Member

amueller commented Jul 19, 2019

looks good a part from typos and setting the ids.

thomasjpfan added 4 commits Jul 20, 2019
…_estimator
@@ -265,7 +265,31 @@ def _yield_all_checks(name, estimator):
yield check_fit_idempotent


def check_estimator(Estimator):
def readable_check_estimator_ids(val):

This comment has been minimized.

Copy link
@amueller

amueller Jul 22, 2019

Member

maybe just set_check_estimator_ids or make_check_estimator_ids?

@@ -265,7 +265,31 @@ def _yield_all_checks(name, estimator):
yield check_fit_idempotent


def check_estimator(Estimator):
def readable_check_estimator_ids(val):
"""Create readable pytest ids for `check_estimator` when

This comment has been minimized.

Copy link
@amueller

amueller Jul 22, 2019

Member

It's also used internally without reference to check_estimator.

How about

"""Create pytest ids for checks.

Returns string representations for pytest tests, to be used as id.
Use together with ``check_estimator(..., generate_only=True)``
"""

Though that makes documenting val possibly more difficult. But on the other hand you could just directly document what val needs to be instead of defining it implicitly.

Also: missing Returns section.

Copy link
Member

amueller left a comment

It would be nice if you actually tested the code that you recommend others will use ;)

from itertools import chain
import pytest
from sklearn.utils.estimator_check import check_estimator

This comment has been minimized.

Copy link
@amueller

amueller Jul 22, 2019

Member
Suggested change
from sklearn.utils.estimator_check import check_estimator
from sklearn.utils.estimator_checks import check_estimator
from itertools import chain
import pytest
from sklearn.utils.estimator_check import check_estimator
from sklearn.utils.estimator_check import readable_check_estimator_ids

This comment has been minimized.

Copy link
@amueller

amueller Jul 22, 2019

Member
Suggested change
from sklearn.utils.estimator_check import readable_check_estimator_ids
from sklearn.utils.estimator_checks import readable_check_estimator_ids
thomasjpfan added 4 commits Jul 22, 2019
pass


@pytest.mark.parametrize("val, expected", [

This comment has been minimized.

Copy link
@amueller

amueller Jul 22, 2019

Member

this seems to contradict the statement below ;)

@amueller

This comment has been minimized.

Copy link
Member

amueller commented Jul 22, 2019

Looks good apart from the pytest soft dependency issue and the name of the function.
I would replace "readable", or just remove it, because it doesn't really add any information.

Copy link
Member

jnothman left a comment

WDYT of adding this syntactic sugar?

def parametrize_with_checks(estimators):
    if hasattr(estimators, 'fit'):
        estimators = [estimators]
    return pytest.mark.parametrize(
        ['estimator', 'check'],
        check_estimator(estimator, generate_only=True),
        ids=set_check_estimator_ids)

Then

from sklearn.utils.estimator_checks import parametrize_with_checks
from sklearn.linear_model import LogisticRegression

@parametrize_with_checks(LogisticRegression)
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)

Then we could make set_check_estimator_ids private.

@rth

This comment has been minimized.

Copy link
Member

rth commented Aug 22, 2019

WDYT of adding this syntactic sugar?

+1 though maybe with a itertools.chain somewhere in parametrize_with_checks as check_estimator still doesn't support lists of estimators I think?

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Aug 22, 2019

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Aug 22, 2019

@rth

This comment has been minimized.

Copy link
Member

rth commented Aug 22, 2019

Having this interface actually means that we probably don't need the
awkward thing where check_estimator generates the estimator as well as the
checks.

parametrize_with_checks is highly interlinked with pytest (and depends on pytest), while check_estimator(estimator, generate_only=True), is much more general and could used for other applications.

For instance to programmatically generate a list of checks for a given application, manually skip a few selected ones and run the rest with pytest.parametrize. That's an important application for contrib projects. I think it would be worth keeping it.

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Aug 22, 2019

@thomasjpfan

This comment has been minimized.

Copy link
Member Author

thomasjpfan commented Aug 23, 2019

Currently we still need the estimator passed back because of how we do set_checking_parameters 😅

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Aug 24, 2019

Currently we still need the estimator passed back because of how we do set_checking_parameters 😅

I don't actually think we mean the same thing...

But yes, it's easier to have check_estimator generate pairs also to be unambiguous in the case of classes vs instances, etc.

@thomasjpfan

This comment has been minimized.

Copy link
Member Author

thomasjpfan commented Aug 26, 2019

Overall I like having the syntactic sugar, but the hard dependency on pytest is a little unsettling.

My comment was about whether it should generate (estimator, check) pairs or
just check.

@jnothman Can you expand on this?

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Aug 26, 2019

…_estimator
@thomasjpfan

This comment has been minimized.

Copy link
Member Author

thomasjpfan commented Aug 26, 2019

Thank you for the clarification! As you said, the concern with only generating the checks is that it becomes harder to distinguish the class checks and the instance checks.

I will move forward with the parametrize_with_checks decorator, we would need to document how it names the tests since set_check_estimator_ids will become private.

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Aug 26, 2019

thomasjpfan added 4 commits Aug 27, 2019
…_estimator
Copy link
Member

jnothman left a comment

It might be nice to mention the new decorator in the developer guide.

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Aug 27, 2019

@rth let's merge?

@rth
rth approved these changes Aug 28, 2019
Copy link
Member

rth left a comment

Thanks again @thomasjpfan !

@rth rth merged commit d9a12aa into scikit-learn:master Aug 28, 2019
18 checks passed
18 checks passed
LGTM analysis: C/C++ No code changes detected
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python 1 new alert
Details
ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: doc Your tests passed on CircleCI!
Details
ci/circleci: doc artifact Link to 0/doc/_changed.html
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
codecov/patch 98.14% of diff hit (target 96.9%)
Details
codecov/project 96.91% (+<.01%) compared to dd3b80e
Details
scikit-learn.scikit-learn Build #20190827.32 succeeded
Details
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
Details
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) Linux py35_ubuntu_atlas succeeded
Details
scikit-learn.scikit-learn (Linux pylatest_conda_mkl_pandas) Linux pylatest_conda_mkl_pandas succeeded
Details
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) Linux32 py35_ubuntu_atlas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) Windows py35_pip_openblas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py37_conda_mkl) Windows py37_conda_mkl succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) macOS pylatest_conda_mkl succeeded
Details
@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Aug 28, 2019

@amueller

This comment has been minimized.

Copy link
Member

amueller commented Sep 3, 2019

Awesome!

@amueller amueller removed this from PR phase in Andy's pets Sep 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.