Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Change named_steps to Bunch object #8586

Merged
merged 13 commits into from Mar 30, 2017

Conversation

@herilalaina
Copy link
Contributor

@herilalaina herilalaina commented Mar 14, 2017

Reference Issue

Fix #8481

What does this implement/fix? Explain your changes.

This PR changes pipeline.named_steps into bunch object instead of dictionary.

Any other comments?

Usage example

> clf = svm.SVC(kernel='linear')
> anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])
> anova_svm.named_steps.anova
SelectKBest(k=5, score_func=<function f_regression at 0x7f1d984d8620>)

To do

  • Update source code
  • Test
  • Documentation
@codecov
Copy link

@codecov codecov bot commented Mar 14, 2017

Codecov Report

Merging #8586 into master will increase coverage by <.01%.
The diff coverage is 99.56%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8586      +/-   ##
==========================================
+ Coverage   95.49%   95.49%   +<.01%     
==========================================
  Files         342      342              
  Lines       61072    61088      +16     
==========================================
+ Hits        58318    58334      +16     
  Misses       2754     2754
Impacted Files Coverage Δ
sklearn/feature_selection/univariate_selection.py 99.46% <ø> (ø) ⬆️
sklearn/linear_model/coordinate_descent.py 96.94% <ø> (ø) ⬆️
sklearn/ensemble/forest.py 98.16% <ø> (ø) ⬆️
sklearn/metrics/classification.py 97.77% <ø> (ø) ⬆️
sklearn/mixture/tests/test_gaussian_mixture.py 99.63% <ø> (ø) ⬆️
sklearn/model_selection/_validation.py 98.71% <ø> (ø) ⬆️
sklearn/tree/tree.py 98.41% <ø> (ø) ⬆️
sklearn/calibration.py 98.87% <ø> (ø) ⬆️
sklearn/neighbors/approximate.py 98.95% <ø> (ø) ⬆️
sklearn/mixture/gaussian_mixture.py 100% <ø> (ø) ⬆️
... and 62 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 134c9b7...00e67c9. Read the comment docs.

@jnothman
Copy link
Member

@jnothman jnothman commented Mar 14, 2017

@herilalaina herilalaina changed the title Change named_steps to Bunch object [WIP] Change named_steps to Bunch object Mar 15, 2017
@herilalaina herilalaina changed the title [WIP] Change named_steps to Bunch object [MRG] Change named_steps to Bunch object Mar 17, 2017
Copy link
Member

@jnothman jnothman left a comment

You should also test what happens when step names conflict with attributes of dict such as values.

@@ -20,6 +20,7 @@
from .externals import six
from .utils import tosequence
from .utils.metaestimators import if_delegate_has_method
from .datasets.base import Bunch

This comment has been minimized.

@jnothman

jnothman Mar 20, 2017
Member

I don't think we want a dependency on datasets. Move Bunch to utils.

@@ -122,7 +123,7 @@ class Pipeline(_BasePipeline):
Attributes
----------
named_steps : dict
named_steps : bunch object
Read-only attribute to access any step parameter by user given name.
Keys are step names and values are steps parameters.

This comment has been minimized.

@jnothman

jnothman Mar 20, 2017
Member

Comment on what a bunch is: a dict with attribute access.

@lesteve
Copy link
Member

@lesteve lesteve commented Mar 22, 2017

Travis is failing because you have some PEP8 violations, please invest some time to configure on-the-fly flake8 checks inside your editor of choice. This allows you to spot these kind of problems while editing.

@herilalaina
Copy link
Contributor Author

@herilalaina herilalaina commented Mar 22, 2017

Hi @jnothman @lesteve , thank you for your comment.
Actually, the Bunch object didn't work when step names conflict with attribute of dict like values, items ..
So I decide to override __getattribute__ on Bunch class.

@jnothman
Copy link
Member

@jnothman jnothman commented Mar 22, 2017

@herilalaina
Copy link
Contributor Author

@herilalaina herilalaina commented Mar 22, 2017

To confirm what you said. I will remove the __getattribute__ in the Bunch class then change test_pipeline.py (line 524) with

pipeline = Pipeline([('values', transf), ("items", mult2)])
assert_true(pipeline.named_steps.values is not transf)
assert_true(pipeline.named_steps.items is not mult2)
Copy link
Member

@jnothman jnothman left a comment

I still think this deserves a brief mention in doc/modules/pipeline.rst, where it says "and as a dict in named_steps", with something like:

Attributes of named_steps map to keys, enabling tab completion in interactive environments:
    >>> pipe.named_steps.reduce_dim is pipe.named_steps['reduce_dim']
    True

Otherwise LGTM. Please add an entry to what's new.

assert_true(pipeline.named_steps.mult is mult2)

# Test bunch with conflict attribute of dict
pipeline = Pipeline([('values', transf), ("items", mult2)])

This comment has been minimized.

@jnothman

jnothman Mar 23, 2017
Member

Seeing as I could imagine a bad implementation where a Bunch was only used if the step names did not conflict, I would rather a test that included conflicting and non-conflicting names together. That is, include these steps all in the above pipeline.

@@ -247,6 +247,12 @@ API changes summary
needed for the perplexity calculation. :issue:`7954` by
:user:`Gary Foreman <garyForeman>`.

This comment has been minimized.

@jnothman

jnothman Mar 24, 2017
Member

I think this should mention pipeline

This comment has been minimized.

@herilalaina

herilalaina Mar 24, 2017
Author Contributor

Hi, I tried to organize into section each entry in API change summary. This resulted the conflict. I resolved it but appveyor test still failed (without merge). Should I remove my last commit ?

This comment has been minimized.

@jnothman

jnothman Mar 25, 2017
Member

Don't worry about sections. We'll do that at release

This comment has been minimized.

@herilalaina

herilalaina Mar 26, 2017
Author Contributor

Ok, thank you. I cleaned it.

@jnothman jnothman changed the title [MRG] Change named_steps to Bunch object [MRG+1] Change named_steps to Bunch object Mar 24, 2017
@herilalaina herilalaina force-pushed the herilalaina:autocomplete-ipython branch 3 times, most recently from 4fb4dad to 7b25de2 Mar 26, 2017
@TomDLT
TomDLT approved these changes Mar 30, 2017
@TomDLT TomDLT merged commit fb5a498 into scikit-learn:master Mar 30, 2017
5 checks passed
5 checks passed
ci/circleci Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 95.49%)
Details
codecov/project 95.49% (+<.01%) compared to 134c9b7
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@amueller
Copy link
Member

@amueller amueller commented Mar 30, 2017

Awesome, thanks!

@@ -270,6 +270,12 @@ API changes summary
needed for the perplexity calculation. :issue:`7954` by
:user:`Gary Foreman <garyForeman>`.

- Replace attribute ``named_steps`` ``dict`` to :class:`sklearn.utils.Bunch`
in :class:`sklearn.pipeline.Pipeline` to enable tab completion in interactive
environment. In the case conflict value on ``named_steps`` and ``dict``

This comment has been minimized.

@amueller

amueller Mar 30, 2017
Member

I'm not sure I understand this sentence.

massich added a commit to massich/scikit-learn that referenced this pull request Apr 26, 2017
* Change named_steps to Bunch object

* Update named_steps attribute documentation

* Add test for named steps bunch object

* Delete whitespace in test_pipeline

* Update test_pipeline.py

* Add comment for named_steps usage

* Move dataset/Bunch to utils

* Fix to PEP8 format

* Add __getattribute method to Bunch class, Fix pep8 bug

* Remove __getattribute__, update test_pipeline

* Update test with conflict and non-conflict named_steps

* Add reference to class Pipeline
@herilalaina herilalaina deleted the herilalaina:autocomplete-ipython branch Apr 30, 2017
Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
* Change named_steps to Bunch object

* Update named_steps attribute documentation

* Add test for named steps bunch object

* Delete whitespace in test_pipeline

* Update test_pipeline.py

* Add comment for named_steps usage

* Move dataset/Bunch to utils

* Fix to PEP8 format

* Add __getattribute method to Bunch class, Fix pep8 bug

* Remove __getattribute__, update test_pipeline

* Update test with conflict and non-conflict named_steps

* Add reference to class Pipeline
NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
* Change named_steps to Bunch object

* Update named_steps attribute documentation

* Add test for named steps bunch object

* Delete whitespace in test_pipeline

* Update test_pipeline.py

* Add comment for named_steps usage

* Move dataset/Bunch to utils

* Fix to PEP8 format

* Add __getattribute method to Bunch class, Fix pep8 bug

* Remove __getattribute__, update test_pipeline

* Update test with conflict and non-conflict named_steps

* Add reference to class Pipeline
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
* Change named_steps to Bunch object

* Update named_steps attribute documentation

* Add test for named steps bunch object

* Delete whitespace in test_pipeline

* Update test_pipeline.py

* Add comment for named_steps usage

* Move dataset/Bunch to utils

* Fix to PEP8 format

* Add __getattribute method to Bunch class, Fix pep8 bug

* Remove __getattribute__, update test_pipeline

* Update test with conflict and non-conflict named_steps

* Add reference to class Pipeline
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
* Change named_steps to Bunch object

* Update named_steps attribute documentation

* Add test for named steps bunch object

* Delete whitespace in test_pipeline

* Update test_pipeline.py

* Add comment for named_steps usage

* Move dataset/Bunch to utils

* Fix to PEP8 format

* Add __getattribute method to Bunch class, Fix pep8 bug

* Remove __getattribute__, update test_pipeline

* Update test with conflict and non-conflict named_steps

* Add reference to class Pipeline
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
* Change named_steps to Bunch object

* Update named_steps attribute documentation

* Add test for named steps bunch object

* Delete whitespace in test_pipeline

* Update test_pipeline.py

* Add comment for named_steps usage

* Move dataset/Bunch to utils

* Fix to PEP8 format

* Add __getattribute method to Bunch class, Fix pep8 bug

* Remove __getattribute__, update test_pipeline

* Update test with conflict and non-conflict named_steps

* Add reference to class Pipeline
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

5 participants