New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] fix _BaseComposition._set_params with nested parameters #9945

Merged
merged 12 commits into from Oct 18, 2017

Conversation

Projects
None yet
4 participants
@amueller
Member

amueller commented Oct 17, 2017

Fixes #9944.

This seemed the simplest fix.

This kind of means that the protection in _BaseComposition._set_params is not enough.
We could make that method work by grouping the things in BaseEstimator.set_params according to their prefixes and call set_params only once per prefix. That seems like a slightly cleaner solution, but not sure it's worth the effort.

This will go away in Python3.6 as iteration is guaranteed to be ordered then.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 17, 2017

Member
Member

jnothman commented Oct 17, 2017

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 17, 2017

Member
Member

jnothman commented Oct 17, 2017

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 18, 2017

Member

I pushed a more extensive fix to your branch. I hope you don't mind.

Member

jnothman commented Oct 18, 2017

I pushed a more extensive fix to your branch. I hope you don't mind.

@codecov

This comment has been minimized.

Show comment
Hide comment
@codecov

codecov bot Oct 18, 2017

Codecov Report

Merging #9945 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9945      +/-   ##
==========================================
+ Coverage   96.17%   96.17%   +<.01%     
==========================================
  Files         336      336              
  Lines       62533    62540       +7     
==========================================
+ Hits        60138    60145       +7     
  Misses       2395     2395
Impacted Files Coverage Δ
sklearn/tests/test_pipeline.py 99.64% <100%> (ø) ⬆️
sklearn/feature_selection/base.py 94.82% <0%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c6483c...dd1b792. Read the comment docs.

codecov bot commented Oct 18, 2017

Codecov Report

Merging #9945 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9945      +/-   ##
==========================================
+ Coverage   96.17%   96.17%   +<.01%     
==========================================
  Files         336      336              
  Lines       62533    62540       +7     
==========================================
+ Hits        60138    60145       +7     
  Misses       2395     2395
Impacted Files Coverage Δ
sklearn/tests/test_pipeline.py 99.64% <100%> (ø) ⬆️
sklearn/feature_selection/base.py 94.82% <0%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c6483c...dd1b792. Read the comment docs.

@jnothman

@amueller, let me know if you find this use of groupby illegible. Using defaultdict may be neater.

jnothman added some commits Oct 18, 2017

@jnothman jnothman added this to the 0.19.1 milestone Oct 18, 2017

jnothman added some commits Oct 18, 2017

Unify error messages
You can't determine even local parameters with only class name
@lesteve

Not an expert on the Pipeline code, but this looks good to me, with a small comment.

I double-checked that the tests were failing on master Python 3.6.

Show outdated Hide outdated sklearn/tests/test_pipeline.py Outdated

@lesteve lesteve changed the title from MRG iterate over parameters in sorted order in ``set_params`` to [MRG+1] iterate over parameters in sorted order in ``set_params`` Oct 18, 2017

@lesteve lesteve changed the title from [MRG+1] iterate over parameters in sorted order in ``set_params`` to [MRG+1] fix _BaseComposition._set_params with nested parameters Oct 18, 2017

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Oct 18, 2017

Member

I rename the PR title because it did not seem appropriate any more. Feel free to use a better one.

Member

lesteve commented Oct 18, 2017

I rename the PR title because it did not seem appropriate any more. Feel free to use a better one.

jnothman and others added some commits Oct 18, 2017

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Oct 18, 2017

Member

Pushed a change that should fix the flake8 error.

Member

lesteve commented Oct 18, 2017

Pushed a change that should fix the flake8 error.

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Oct 18, 2017

Member

@jnothman you understood the problem, I guess, because you pushed the other fix? As I said, that fix might be a bit cleaner.

Just for the record:
The problem is that these are two pipelines inside each other. The logic for handling this is inside the _BaseComposition._set_params. The outer pipeline does that, but then iterates over the parameters it delegates in an arbitrary order. That means on the inner pipeline set_params is called multiple times, with one parameter each time. That means this loop in the outer pipeline determines the sequence that the parameters are set, the inner pipeline has no control.
@jnothman's fix (which was my second suggested fix) is to pass all parameters to the inner pipeline at once, so that it can handle the order of the setting itself, which allows the logic in _BaseComposition._set_params to kick in.
My fix was to order the setting of the parameters, and in lexical ordering any prefix of a string comes before the string.

Member

amueller commented Oct 18, 2017

@jnothman you understood the problem, I guess, because you pushed the other fix? As I said, that fix might be a bit cleaner.

Just for the record:
The problem is that these are two pipelines inside each other. The logic for handling this is inside the _BaseComposition._set_params. The outer pipeline does that, but then iterates over the parameters it delegates in an arbitrary order. That means on the inner pipeline set_params is called multiple times, with one parameter each time. That means this loop in the outer pipeline determines the sequence that the parameters are set, the inner pipeline has no control.
@jnothman's fix (which was my second suggested fix) is to pass all parameters to the inner pipeline at once, so that it can handle the order of the setting itself, which allows the logic in _BaseComposition._set_params to kick in.
My fix was to order the setting of the parameters, and in lexical ordering any prefix of a string comes before the string.

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Oct 18, 2017

Member

LGTM. This behavior is more natural.

Member

amueller commented Oct 18, 2017

LGTM. This behavior is more natural.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 18, 2017

Member

Your fix is not sufficient if, say, the user sets steps=[('a', Lasso())], a__alpha=1

Member

jnothman commented Oct 18, 2017

Your fix is not sufficient if, say, the user sets steps=[('a', Lasso())], a__alpha=1

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 18, 2017

Member

I'd missed your original comment about this potential solution!

Member

jnothman commented Oct 18, 2017

I'd missed your original comment about this potential solution!

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 18, 2017

Member

I'm happy to merge when Travis says my new assertion is okay.

Member

jnothman commented Oct 18, 2017

I'm happy to merge when Travis says my new assertion is okay.

@jnothman jnothman merged commit 75763cf into scikit-learn:master Oct 18, 2017

1 of 4 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
lgtm analysis: Python Running analyses for revisions
Details
ci/circleci Your tests passed on CircleCI!
Details

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Oct 18, 2017

@marcus-voss

This comment has been minimized.

Show comment
Hide comment
@marcus-voss

marcus-voss commented Oct 25, 2017

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 25, 2017

Member
Member

jnothman commented Oct 25, 2017

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Oct 25, 2017

FIX bug in nested set_params usage
Issue where estimator is changed as well as its parameter: scikit-learn#9945 (comment)
@marcus-voss

This comment has been minimized.

Show comment
Hide comment
@marcus-voss

marcus-voss Oct 25, 2017

Thanks for that very quick fix! Indeed your fix does also work for my original SO minimal example. To close the issue there: Would you want to add an answer there referring to your fix, then I could accept that and close this issue there?

marcus-voss commented Oct 25, 2017

Thanks for that very quick fix! Indeed your fix does also work for my original SO minimal example. To close the issue there: Would you want to add an answer there referring to your fix, then I could accept that and close this issue there?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment