[MRG + 1] BUG: get_param in FeatureUnion disjoint for shallow/deep params #4467

Merged
merged 1 commit into scikit-learn:master from giantoak:param_invariance on Apr 19, 2015

Conversation

samzhang111

  • Previously, the params of a FeatureUnion or Pipeline differed between
    deep=False and deep=True, making it impossible to set shallow params
    during grid search (issue #4461: Searching across FeatureUnion
    transformer_weights in a GridSearchCV).
  • Added a test enforcing that the deep=True results are a superset of the
    deep=False results.
  • Updated the deep get_params output in pipeline estimators to include the
    shallow params.

TST: Test estimators for get_params invariance (issue #4465)

  • Estimators that support get_params should be consistent:
    get_params(deep=False) should be a subset of get_params(deep=True)
    (see the sketch below).
  • Implemented the test over all "normal" and "other" estimators provided
    by all_estimators.
  • Ignoring grid_search estimators for now.
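
A minimal sketch of the invariance being enforced (illustrative only; the step name 'clf' is arbitrary, and this is not the PR's actual test):

from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipe = Pipeline([('clf', LinearSVC())])
shallow = pipe.get_params(deep=False)
deep = pipe.get_params(deep=True)
# After this fix, every shallow param (including 'steps') also appears
# among the deep params, so GridSearchCV can set either.
assert set(shallow) <= set(deep)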

@landscape-bot

Code Health
Repository health increased by 0.00% when pulling 4e3e8a0 on giantoak:param_invariance into c247141 on scikit-learn:master.

@@ -377,3 +378,15 @@ def test_transformer_n_iter():

    if hasattr(estimator, "max_iter") and name not in external_solver:
        yield check_transformer_n_iter, name, estimator

def test_get_params_invariance():
Member

Can you add this yield to test_all_estimators, please?

Author

Is this in addition to where it already is?

  • test_all_estimators doesn't call all_estimators with the include_other
    flag, so it excludes Pipeline and FeatureUnion.
  • Right now the test isn't written to handle other meta-estimators, so
    should I change the test, or should it go in test_non_meta_estimators?

Thanks!

Member

Sorry, I overlooked that. We should try to unify tests as much as possible, but I guess for a regression test the current variant is fine.

Member

Doing include_other=True in test_all_estimators, and allowing the other estimators in this test, would be nice too, but we can leave that for the future.
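
For context, a hedged sketch of the yielded-check pattern being discussed (reconstructed from the comments above, not the PR's exact code; all_estimators lived in sklearn.utils.testing at the time, and include_other is the flag mentioned above):

from sklearn.utils.testing import all_estimators

def check_get_params_invariance(name, Estimator):
    # The invariance: shallow params must be a subset of deep params.
    estimator = Estimator()
    shallow = estimator.get_params(deep=False)
    deep = estimator.get_params(deep=True)
    assert all(item in deep.items() for item in shallow.items())

def test_get_params_invariance():
    # Yield one check per estimator, like the other tests in this module.
    for name, Estimator in all_estimators(include_other=True):
        if hasattr(Estimator, 'get_params'):
            yield check_get_params_invariance, name, Estimator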

@amueller
Member

Thanks for the PR. Travis is unhappy.

        pass

    def fit(self, X, y):
        pass
Member

fit should probably return self, even if not strictly needed now.
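
Concretely, an sklearn-idiomatic version of the suggestion might look like this (a minimal sketch; the class name T and the a/b constructor params are assumptions, not the PR's exact mock):

from sklearn.base import BaseEstimator

class T(BaseEstimator):
    # Minimal mock estimator for the test.
    def __init__(self, a=None, b=None):
        self.a = a
        self.b = b

    def fit(self, X, y):
        # Returning self keeps the estimator chainable, e.g.
        # T().fit(X, y).get_params(), matching sklearn convention.
        return self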

@amueller
Member

amueller commented Apr 1, 2015

have you tried investigating the travis error?

@samzhang111
Author

It looks like I created a regression on line 90 of sklearn.tests.test_pipeline.test_pipeline_init. I'll look into it later today. Thanks.

By your comment "We should try to unify tests as much as possible, but I guess for a regression test the current variant is fine," do you mean it's fine to leave the test as is (aside from fixing the regression and adding return self to the mock estimator)?

@amueller
Member

amueller commented Apr 1, 2015

Yeah, it's fine to leave the tests as is.

@jnothman
Member

jnothman commented Apr 2, 2015

Note that by making steps available as a Pipeline param too, it's possible a user will want to do something like:

GridSearchCV(..., param_grid=[
    {'steps': [[('a', LinearSVC())]],
     'a__C': [1, 10, 100]},
    {'steps': [[('a', DecisionTreeClassifier())]],
     'a__max_depth': [1, 5, 10]},
])

I don't think the solution here ensures the correct ordering when setting steps and sub-estimator parameters, which leaves the semantics undefined; whether it results in something usable or an error likely depends on dictionary iteration order (not that I've tested), hence #1769. I don't think this is a fatal flaw in this patch, but you should at least be aware of it.
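
To make the ordering concern concrete, a hedged sketch (the step name 'a' is illustrative; as noted above, the outcome was untested at the time):

from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

pipe = Pipeline([('a', DecisionTreeClassifier())])
# The concern: if 'a__C' were applied before 'steps' swaps in the LinearSVC,
# this call would raise, since DecisionTreeClassifier has no C parameter;
# if 'steps' is applied first, it succeeds. Which one happens was left to
# dictionary iteration order.
pipe.set_params(steps=[('a', LinearSVC())], a__C=10)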

@samzhang111
Author

@jnothman Any motion toward merging #1769? I think that patch is a superset of this one. The tests here should still hold, though.

- Previously, the params of a FeatureUnion or Pipeline differed between
  deep=False and deep=True, making it impossible to set shallow params
  during grid search (issue scikit-learn#4461)
- Added a test enforcing that the deep=True results are a superset of the
  deep=False results.
- Updated the deep get_params output in pipeline estimators to include the
  shallow params.

TST: Test estimators for get_params invariance

- Estimators that support get_params should be consistent:
  get_params(deep=False) should be a subset of get_params(deep=True)
- Implemented the test over all "normal" and "other" estimators provided by
  all_estimators
- Ignoring grid_search estimators for now

TST: Amended init_pipeline test to compare only non-estimator params
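
A hedged sketch of what comparing only non-estimator params could look like in the amended init test (the step name 'svc' and the final assertion are assumptions, not the PR's exact code):

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([('svc', SVC())])
params = pipe.get_params(deep=True)
# The estimator entries ('steps' and the step itself) don't compare equal
# across separately constructed instances, so drop them before comparing.
for key in ('steps', 'svc'):
    params.pop(key)
assert params['svc__C'] == 1.0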
@landscape-bot

Code Health
Repository health increased by 0.00% when pulling edf60c0 on giantoak:param_invariance into 6560a8e on scikit-learn:master.

@amueller
Member

amueller commented Apr 5, 2015

This looks good to me.

@amueller changed the title from "BUG: get_param in FeatureUnion disjoint for shallow/deep params" to "[MRG + 1] BUG: get_param in FeatureUnion disjoint for shallow/deep params" Apr 5, 2015
@@ -377,3 +378,15 @@ def test_transformer_n_iter():

    if hasattr(estimator, "max_iter") and name not in external_solver:
        yield check_transformer_n_iter, name, estimator
Member

pep8: two blank lines between functions.

@agramfort
Member

Besides that, LGTM, but it interacts with #4550.

Let's revamp this PR after #4550 is merged.

@amueller
Member

#4550 only has your +1, so we could merge this one first.

amueller added a commit that referenced this pull request Apr 19, 2015
[MRG + 1] BUG: get_param in FeatureUnion disjoint for shallow/deep params
@amueller merged commit 932726f into scikit-learn:master Apr 19, 2015
@amueller
Member

merging, thanks :)
