
Commit ec2ea1b

amueller authored and jnothman committed
DOC add a more complex example to gridsearch for nested parameters (#14548)
* add a more complex example to gridsearch for nested parameters
* slight formatting fixes
* normalize whitespace doesn't do what I thought it does
* more whitespace yay
* Update doc/modules/compose.rst
  Co-Authored-By: Thomas J Fan <thomasjpfan@gmail.com>
* Update doc/modules/grid_search.rst
  Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
* reformulate according to Joel's suggestions
* typo
* ellipsis
* Update doc/modules/grid_search.rst
  Co-Authored-By: Nicolas Hug <contact@nicolas-hug.com>
* Update doc/modules/grid_search.rst
  Co-Authored-By: Nicolas Hug <contact@nicolas-hug.com>
* change the link to point to api docs for pipeline
* one more explicit use of pipeline module for linking to the API docs
* Update doc/modules/grid_search.rst
  Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
1 parent 3eacf94 commit ec2ea1b

2 files changed: +46 -7 lines changed

doc/modules/compose.rst

Lines changed: 7 additions & 4 deletions
@@ -101,6 +101,9 @@ permitted). This is convenient for performing only some of the transformations
 >>> pipe[-1:]
 Pipeline(steps=[('clf', SVC())])
 
+
+.. _pipeline_nested_parameters:
+
 Nested parameters
 .................
 
@@ -128,7 +131,7 @@ ignored by setting them to ``'passthrough'``::
 
 The estimators of the pipeline can be retrieved by index:
 
->>> pipe[0]
+>>> pipe[0]
 PCA()
 
 or by name::
@@ -147,7 +150,7 @@ or by name::
 
 .. topic:: See also:
 
-* :ref:`grid_search`
+* :ref:`composite_grid_search`
 
 
 Notes
@@ -369,7 +372,7 @@ Like ``Pipeline``, individual steps may be replaced using ``set_params``,
 and ignored by setting to ``'drop'``::
 
 >>> combined.set_params(kernel_pca='drop')
-FeatureUnion(transformer_list=[('linear_pca', PCA()),
+FeatureUnion(transformer_list=[('linear_pca', PCA()),
 ('kernel_pca', 'drop')])
 
 .. topic:: Examples:
@@ -420,7 +423,7 @@ preprocessing or a specific feature extraction method::
 
 For this data, we might want to encode the ``'city'`` column as a categorical
 variable using :class:`preprocessing.OneHotEncoder
-<sklearn.preprocessing.OneHotEncoder>` but apply a
+<sklearn.preprocessing.OneHotEncoder>` but apply a
 :class:`feature_extraction.text.CountVectorizer
 <sklearn.feature_extraction.text.CountVectorizer>` to the ``'title'`` column.
 As we might use multiple feature extraction methods on the same column, we give
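
The compose.rst hunks above touch the passages on retrieving pipeline steps by index or name and on dropping a `FeatureUnion` transformer with `set_params`. A minimal sketch of those operations follows; the step name `reduce_dim` and the choice of `KernelPCA` for the second transformer are assumptions made for illustration, not taken from the diff.

```python
from sklearn.decomposition import PCA, KernelPCA
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

# 'reduce_dim' is a hypothetical step name; 'clf' matches the diff context.
pipe = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])

# A step can be retrieved by position or by name; both return the same object.
first_step = pipe[0]
same_step = pipe["reduce_dim"]

# Nested parameters are addressed as <step name>__<parameter>.
pipe.set_params(clf__C=10)

# A FeatureUnion transformer is ignored once it is set to 'drop'.
combined = FeatureUnion([("linear_pca", PCA()), ("kernel_pca", KernelPCA())])
combined.set_params(kernel_pca="drop")
```

The same `<step name>__<parameter>` keys are what the new `composite_grid_search` section in grid_search.rst passes to `GridSearchCV`.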

doc/modules/grid_search.rst

Lines changed: 39 additions & 3 deletions
@@ -43,7 +43,7 @@ Note that it is common that a small subset of those parameters can have a large
 impact on the predictive or computation performance of the model while others
 can be left to their default values. It is recommended to read the docstring of
 the estimator class to get a finer understanding of their expected behavior,
-possibly by reading the enclosed reference to the literature.
+possibly by reading the enclosed reference to the literature.
 
 Exhaustive Grid Search
 ======================
@@ -192,11 +192,47 @@ result in an error when using multiple metrics.
 See :ref:`sphx_glr_auto_examples_model_selection_plot_multi_metric_evaluation.py`
 for an example usage.
 
+.. _composite_grid_search:
+
 Composite estimators and parameter spaces
 -----------------------------------------
+`GridSearchCV` and `RandomizedSearchCV` allow searching over parameters of
+composite or nested estimators such as `pipeline.Pipeline`,
+`ColumnTransformer`, `VotingClassifier` or `CalibratedClassifierCV`
+using a dedicated ``<estimator>__<parameter>`` syntax::
+
+>>> from sklearn.model_selection import GridSearchCV
+>>> from sklearn.calibration import CalibratedClassifierCV
+>>> from sklearn.ensemble import RandomForestClassifier
+>>> from sklearn.datasets import make_moons
+>>> X, y = make_moons()
+>>> calibrated_forest = CalibratedClassifierCV(
+... base_estimator=RandomForestClassifier(n_estimators=10))
+>>> param_grid = {
+... 'base_estimator__max_depth': [2, 4, 6, 8]}
+>>> search = GridSearchCV(calibrated_forest, param_grid, cv=5)
+>>> search.fit(X, y)
+GridSearchCV(cv=5,
+estimator=CalibratedClassifierCV(...),
+param_grid={'base_estimator__max_depth': [2, 4, 6, 8]})
+
+Here, ``<estimator>`` is the parameter name of the nested estimator,
+in this case ``base_estimator``.
+If the meta-estimator is constructed as a collection of estimators as in
+`pipeline.Pipeline`, then ``<estimator>`` refers to the name of the estimator,
+see :ref:`pipeline_nested_parameters`. In practice, there can be several
+levels of nesting::
+
+>>> from sklearn.pipeline import Pipeline
+>>> from sklearn.feature_selection import SelectKBest
+>>> pipe = Pipeline([
+... ('select', SelectKBest()),
+... ('model', calibrated_forest)])
+>>> param_grid = {
+... 'select__k': [1, 2],
+... 'model__base_estimator__max_depth': [2, 4, 6, 8]}
+>>> search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
 
-:ref:`pipeline` describes building composite estimators whose
-parameter space can be searched with these tools.
 
 Model selection: development and evaluation
 -------------------------------------------
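
The nested keys added in this hunk can also be discovered rather than typed by hand. A minimal sketch follows, assuming a scikit-learn installation: it passes the forest to `CalibratedClassifierCV` positionally and reads the key from `get_params()`, because the keyword is `base_estimator` at the time of this commit while later releases name it `estimator`. The `random_state` values, the smaller grid and `cv=3` are illustrative choices, not part of the commit.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_moons(random_state=0)

# Passing the nested estimator positionally keeps the sketch independent of
# the constructor keyword (``base_estimator`` in this commit).
calibrated_forest = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=10, random_state=0))

pipe = Pipeline([
    ("select", SelectKBest()),
    ("model", calibrated_forest),
])

# get_params() lists every searchable name, including doubly nested ones
# of the form <step>__<nested estimator parameter>__<parameter>.
depth_keys = sorted(
    name for name in pipe.get_params() if name.endswith("__max_depth"))

param_grid = {
    "select__k": [1, 2],
    depth_keys[0]: [2, 4],
}
search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(sorted(search.best_params_))
```

Reading the key from `get_params()` trades a little explicitness for robustness: a misspelled or renamed nested parameter shows up as a missing key before the search starts.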
