Skip to content

Conversation

amueller
Copy link
Member

@amueller amueller commented Aug 1, 2019

Related to what was brought up in #8710:
I don't think we explicitly describe how to do grid-searches on meta-estimators, and Pipeline actually works different from other meta-estimators (it uses the step names, not the parameter names).

@codecov
Copy link

codecov bot commented Aug 2, 2019

Codecov Report

Merging #14548 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #14548      +/-   ##
==========================================
- Coverage   96.66%   96.65%   -0.02%     
==========================================
  Files         397      397              
  Lines       73015    73015              
  Branches     7980     7980              
==========================================
- Hits        70583    70575       -8     
- Misses       2419     2427       +8     
  Partials       13       13
Impacted Files Coverage Δ
sklearn/_build_utils/__init__.py 46.15% <0%> (-5.13%) ⬇️
sklearn/svm/classes.py 97.93% <0%> (-1.04%) ⬇️
sklearn/metrics/pairwise.py 97.25% <0%> (-0.83%) ⬇️
sklearn/metrics/cluster/supervised.py 97.95% <0%> (-0.69%) ⬇️
sklearn/tree/export.py 95.09% <0%> (-0.28%) ⬇️
sklearn/cluster/tests/test_k_means.py 99.79% <0%> (-0.21%) ⬇️
sklearn/manifold/mds.py 94.11% <0%> (+0.98%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 53f76d1...3f4558e. Read the comment docs.

>>> search = GridSearchCV(pipe, param_grid, cv=5)
>>> search.fit(X, y)
GridSearchCV(cv=5,
estimator=Pipeline(steps=[('select', SelectKBest()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite wide. Does doing something like this hide too much?

GridSearchCV(cv=5,
             estimator=Pipeline(steps=[...]),
             param_grid={'model__base_estimator__max_depth': [2, 4, 6, 8],
                         'select__k': [1, 2]})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice to format this differently but pytests doesn't allow me to.
We could change our repr lol.
I can also hide more with ellipsis but I think that's not super nice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IGNORE_WHITESPACE doesn't allow changes in line breaks?? Who knew?

More practically, maybe just fit on the previous line and don't show the repr?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it allows replacing any whitespace by any other whitespace. There's no whitespace here to replace :-/

Co-Authored-By: Thomas J Fan <thomasjpfan@gmail.com>

Here, ``<estimator>`` is the parameter name of the nested estimator,
in this case ``base_estimator``.
The `Pipeline` class has a slightly different notation, as explained in :ref:`pipeline`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not Pipeline alone... ColumnTransformer, VotingClassifier, etc, do this too... It's more like "If, like a Pipeline, the meta-estimator is constructed as a collection of other estimators, ..."

The other thing that's confusing about this is that, according to get_params, the step names are parameter names of the Pipeline. They're just not constructor params...

>>> search = GridSearchCV(pipe, param_grid, cv=5)
>>> search.fit(X, y)
GridSearchCV(cv=5,
estimator=Pipeline(steps=[('select', SelectKBest()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IGNORE_WHITESPACE doesn't allow changes in line breaks?? Who knew?

More practically, maybe just fit on the previous line and don't show the repr?

@jnothman
Copy link
Member

jnothman commented Aug 3, 2019

Good that you've identified this... it's very hard to make sure there aren't gaps in the user guide when instead of fixing it, people go off writing books that cover the same material :D j/k

amueller and others added 3 commits August 5, 2019 11:13

Here, ``<estimator>`` is the parameter name of the nested estimator,
in this case ``base_estimator``.
If the meta-estimator is constructed as a collection of estimators as in `Pipeline`, then
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jnothman better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pipeline should resolve to the Pipeline estimator doc, not to the user guide.

@amueller
Copy link
Member Author

amueller commented Aug 5, 2019

@jnothman if I had realized there was a gap here I would have fixed it before writing the book ;)

>>> search.fit(X, y)
GridSearchCV(cv=5,
estimator=CalibratedClassifierCV(base_estimator=RandomForestClassifier(n_estimators=10)),
param_grid={'base_estimator__max_depth': [2, 4, 6, 8]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of using ellipsis, which passes pytest.

  GridSearchCV(cv=5,
               estimator=CalibratedClassifierCV(...),
               param_grid={'base_estimator__max_depth': [2, 4, 6, 8]})

Depends on if we prefer this:

Screen Shot 2019-08-05 at 4 54 51 PM

or this:

Screen Shot 2019-08-05 at 4 55 40 PM

-----------------------------------------
`GridSearchCV` and `RandomizedSearchCV` allow searching over parameters of
composite or nested estimators such as `Pipeline`, `ColumnTransformer`,
`VotingClasssifier` or `CalibratedClassifierCV`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link resolution here is kinda funny:
links

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is VotingClassifier not resolved? Did I misspell it? Also: we can fix that later :P

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pipeline should resolve to the Pipeline estimator doc, not to the user guide.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Isn't the user guide better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, not when all the others links are estimators and when the link is actually a title section that often doesn't make sense in the context they're used in.

Copy link
Member Author

@amueller amueller Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should fix our title handles then? hm not sure, I don't have a strong opinion. I can use pipeline.Pipeline that should work, right? or use :ref:pipeline.Pipeline

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I also think they all should link to comparable pages, i.e. the API page in this case.

:class:`ensemble.VotingClassifier`

should work for VotingClassifier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a typo ;)

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad magic link resolution :(

Other than that LGTM.

-----------------------------------------
`GridSearchCV` and `RandomizedSearchCV` allow searching over parameters of
composite or nested estimators such as `Pipeline`, `ColumnTransformer`,
`VotingClasssifier` or `CalibratedClassifierCV`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pipeline should resolve to the Pipeline estimator doc, not to the user guide.


Here, ``<estimator>`` is the parameter name of the nested estimator,
in this case ``base_estimator``.
If the meta-estimator is constructed as a collection of estimators as in `Pipeline`, then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pipeline should resolve to the Pipeline estimator doc, not to the user guide.

amueller and others added 2 commits August 7, 2019 15:32
Co-Authored-By: Nicolas Hug <contact@nicolas-hug.com>
Co-Authored-By: Nicolas Hug <contact@nicolas-hug.com>
@adrinjalali
Copy link
Member

Pipeline still points to the user guide, doesn't it?

…it-learn into metaestimator_search_steps

# Conflicts:
#	doc/modules/grid_search.rst
Composite estimators and parameter spaces
-----------------------------------------
`GridSearchCV` and `RandomizedSearchCV` allow searching over parameters of
composite or nested estimators such as `Pipeline`, `ColumnTransformer`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipeline.Pipeline here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other lgtm

Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
@adrinjalali adrinjalali merged commit ec2ea1b into scikit-learn:master Aug 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants