Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Estimator Tags #10264

Closed
wants to merge 140 commits into from
Closed

[WIP] Estimator Tags #10264

wants to merge 140 commits into from

Conversation

GKjohns
Copy link
Contributor

@GKjohns GKjohns commented Dec 6, 2017

Reference Issues/PRs

Original PR: #8022

Fixes #6599, #7737, #6554, #6715

Also see #6510

What does this implement/fix? Explain your changes.

Taken over from @amueller. Implements descriptive tags for estimators e.g. 'stateless': False, 'deterministic': True

Any other comments?

See #8022 for more details

…later add more checks for nd-data (as in isotonic) if we like to.
amueller and others added 23 commits June 9, 2017 19:32
# Conflicts:
#	sklearn/feature_extraction/image.py
#	sklearn/tests/test_common.py
#	sklearn/utils/estimator_checks.py
# Conflicts:
#	sklearn/tests/test_common.py
#	sklearn/utils/estimator_checks.py
@GKjohns
Copy link
Contributor Author

GKjohns commented Dec 6, 2017

Many of the tests are failing after merging new changes after 5 months.

Many of these failures are due to checks being run on estimators that they shouldn't be. As an example, _check_transformer is attempting to check that AdditiveChi2Sampler.fit() raises a ValueError when the number of features doesn't match the number of features in training. This shouldn't happen, as AdditiveChi2Sampler has stateless tag set to true and should skip the test altogether. Currently checking to see why this is still happening.

@jnothman
Copy link
Member

jnothman commented Dec 6, 2017

This pull request introduces 3 alerts - view on lgtm.com

new alerts:

  • 2 for Variable defined multiple times
  • 1 for Conflicting attributes in base classes

Comment posted by lgtm.com

@amueller
Copy link
Member

amueller commented Dec 6, 2017

maybe instead of running the whole suite try running check_estimator on a single estimator like DummyClassifier or AdditiveChi2Sampler.

@GKjohns
Copy link
Contributor Author

GKjohns commented Dec 15, 2017

Prof. @amueller does it make sense that I'm getting 9 failures on the current master branch (not this one)? I'm beginning to think that the issue lies with the testing framework change, as you mentioned earlier

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GKjohns, the test failures result from other changes to estimator_checks (and elsewhere) that you merged in without care, as exemplified in my comment there. In particular, #9588 added more helpful error messages to assertions in estimator_checks, and these changes may not have merged trivially.

@@ -20,3 +20,5282 @@ Previous Releases
Version 0.14 <whats_new/v0.14.rst>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file was not correctly merged. Make sure this file is not touched in the diff, only doc/whats_new/v0.20.rst

parameters to ``__init__`` in the ``_required_parameters`` class attribute,
which is a list or tuple. If ``_required_parameters`` is only ``["estimator"]`` or ``["base_estimator"]``, then the
estimator will be instantiated with an instance of
``LinearDiscriminantAnalysis`` (or ``RidgeRegression`` if the estimator is a regressor) in the tests.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems strange to assume that if the metaestimator is a regressor, the estimator passed into it is.

Copy link
Member

@amueller amueller Apr 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you're saying is that this should be another tag for the meta-estimator? "requires-base-estimator being "regressor", "classifier", "transformer", "any"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose so


In addition to the tags, estimators are also need to declare any non-optional
parameters to ``__init__`` in the ``_required_parameters`` class attribute,
which is a list or tuple. If ``_required_parameters`` is only ``["estimator"]`` or ``["base_estimator"]``, then the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely this can be almost always inferred from signature. What does having it listed add?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

being explicit about it. This way we can detect if a parameter was left without a default by accident. Right now we require all estimators to be default-constructible. This PR will allow testing non-default-constructible estimators. But we still want to enforce that most estimators are default-constructible.

# raises error on malformed input for transform
if hasattr(X, 'T') and not _safe_tags(transformer, "stateless"):
# If it's not an array, it does not have a 'T' property
assert_raises(ValueError, transformer.transform, X.T)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the assertion that's failing. The one at the end of this function is the one that's failing on stateless transformers, though it is a repeat of this with more detail.

@amueller
Copy link
Member

closing as I continued in the original PR

@amueller amueller closed this Aug 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement estimator tags
4 participants