Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] fix _BaseComposition._set_params with nested parameters #9945

Merged
merged 12 commits into from Oct 18, 2017

Conversation

@amueller
Copy link
Member

@amueller amueller commented Oct 17, 2017

Fixes #9944.

This seemed the simplest fix.

This kind of means that the protection in _BaseComposition._set_params is not enough.
We could make that method work by grouping the things in BaseEstimator.set_params according to their prefixes and call set_params only once per prefix. That seems like a slightly cleaner solution, but not sure it's worth the effort.

This will go away in Python3.6 as iteration is guaranteed to be ordered then.

@jnothman
Copy link
Member

@jnothman jnothman commented Oct 17, 2017

@jnothman
Copy link
Member

@jnothman jnothman commented Oct 17, 2017

@jnothman
Copy link
Member

@jnothman jnothman commented Oct 18, 2017

I pushed a more extensive fix to your branch. I hope you don't mind.

@codecov
Copy link

@codecov codecov bot commented Oct 18, 2017

Codecov Report

Merging #9945 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9945      +/-   ##
==========================================
+ Coverage   96.17%   96.17%   +<.01%     
==========================================
  Files         336      336              
  Lines       62533    62540       +7     
==========================================
+ Hits        60138    60145       +7     
  Misses       2395     2395
Impacted Files Coverage Δ
sklearn/tests/test_pipeline.py 99.64% <100%> (ø) ⬆️
sklearn/feature_selection/base.py 94.82% <0%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c6483c...dd1b792. Read the comment docs.

Copy link
Member

@jnothman jnothman left a comment

@amueller, let me know if you find this use of groupby illegible. Using defaultdict may be neater.

@jnothman jnothman force-pushed the amueller:sorted_param_setting branch from b2e5a7b to 5129522 Oct 18, 2017
@jnothman jnothman force-pushed the amueller:sorted_param_setting branch from 5129522 to 1c07283 Oct 18, 2017
@jnothman jnothman added this to the 0.19.1 milestone Oct 18, 2017
@jnothman jnothman force-pushed the amueller:sorted_param_setting branch from 28a3235 to b7b4464 Oct 18, 2017
jnothman added 2 commits Oct 18, 2017
You can't determine even local parameters with only class name
Copy link
Member

@lesteve lesteve left a comment

Not an expert on the Pipeline code, but this looks good to me, with a small comment.

I double-checked that the tests were failing on master Python 3.6.

('b', DummyRegressor())
]))
])
params = {

This comment has been minimized.

@lesteve

lesteve Oct 18, 2017
Member

Should we use a collections.OrderedDict to make this test more future-proof?

This comment has been minimized.

@jnothman

jnothman Oct 18, 2017
Member

Shrug. I think it distorts the purpose of the test somewhat if we only test with an OrderedDict.

This comment has been minimized.

@lesteve

lesteve Oct 18, 2017
Member

I think this is a bit clearer (without some of the subtlety of python version-specific dict ordering):

estimator.set_params(a__b__alpha=0.001, a__b=Lasso())
@lesteve lesteve changed the title MRG iterate over parameters in sorted order in set_params [MRG+1] iterate over parameters in sorted order in set_params Oct 18, 2017
@lesteve lesteve changed the title [MRG+1] iterate over parameters in sorted order in set_params [MRG+1] fix _BaseComposition._set_params with nested parameters Oct 18, 2017
@lesteve
Copy link
Member

@lesteve lesteve commented Oct 18, 2017

I rename the PR title because it did not seem appropriate any more. Feel free to use a better one.

jnothman and others added 2 commits Oct 18, 2017
@lesteve
Copy link
Member

@lesteve lesteve commented Oct 18, 2017

Pushed a change that should fix the flake8 error.

@amueller
Copy link
Member Author

@amueller amueller commented Oct 18, 2017

@jnothman you understood the problem, I guess, because you pushed the other fix? As I said, that fix might be a bit cleaner.

Just for the record:
The problem is that these are two pipelines inside each other. The logic for handling this is inside the _BaseComposition._set_params. The outer pipeline does that, but then iterates over the parameters it delegates in an arbitrary order. That means on the inner pipeline set_params is called multiple times, with one parameter each time. That means this loop in the outer pipeline determines the sequence that the parameters are set, the inner pipeline has no control.
@jnothman's fix (which was my second suggested fix) is to pass all parameters to the inner pipeline at once, so that it can handle the order of the setting itself, which allows the logic in _BaseComposition._set_params to kick in.
My fix was to order the setting of the parameters, and in lexical ordering any prefix of a string comes before the string.

@amueller
Copy link
Member Author

@amueller amueller commented Oct 18, 2017

LGTM. This behavior is more natural.

@jnothman
Copy link
Member

@jnothman jnothman commented Oct 18, 2017

Your fix is not sufficient if, say, the user sets steps=[('a', Lasso())], a__alpha=1

@jnothman
Copy link
Member

@jnothman jnothman commented Oct 18, 2017

I'd missed your original comment about this potential solution!

@jnothman
Copy link
Member

@jnothman jnothman commented Oct 18, 2017

I'm happy to merge when Travis says my new assertion is okay.

@jnothman jnothman merged commit 75763cf into scikit-learn:master Oct 18, 2017
1 of 4 checks passed
1 of 4 checks passed
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
lgtm analysis: Python Running analyses for revisions
Details
ci/circleci Your tests passed on CircleCI!
Details
jnothman added a commit to jnothman/scikit-learn that referenced this pull request Oct 18, 2017
@marcus-voss
Copy link

@marcus-voss marcus-voss commented Oct 25, 2017

@jnothman
Copy link
Member

@jnothman jnothman commented Oct 25, 2017

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Oct 25, 2017
Issue where estimator is changed as well as its parameter: scikit-learn#9945 (comment)
@marcus-voss
Copy link

@marcus-voss marcus-voss commented Oct 25, 2017

Thanks for that very quick fix! Indeed your fix does also work for my original SO minimal example. To close the issue there: Would you want to add an answer there referring to your fix, then I could accept that and close this issue there?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

4 participants