DOC Fix dropdown-related warnings (#27418)
Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
3 people authored and jeremiedbb committed Sep 20, 2023
1 parent 91163a6 commit 9f7cba3
Showing 2 changed files with 68 additions and 61 deletions.
120 changes: 65 additions & 55 deletions doc/modules/compose.rst
@@ -54,9 +54,8 @@ The last estimator may be any type (transformer, classifier, etc.).
Usage
-----

Build a pipeline
................

The :class:`Pipeline` is built using a list of ``(key, value)`` pairs, where
the ``key`` is a string containing the name you want to give this step and ``value``
@@ -70,6 +69,10 @@ is an estimator object::
>>> pipe
Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])
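
For reference, a minimal sketch of one way to build the ``pipe`` object shown
above (the step names ``'reduce_dim'`` and ``'clf'`` are arbitrary labels)::

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.decomposition import PCA
>>> from sklearn.svm import SVC
>>> pipe = Pipeline([('reduce_dim', PCA()), ('clf', SVC())])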

|details-start|
**Shorthand version using :func:`make_pipeline`**
|details-split|

The utility function :func:`make_pipeline` is a shorthand
for constructing pipelines;
it takes a variable number of estimators and returns a pipeline,
@@ -81,14 +84,26 @@ filling in the names automatically::
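
>>> # a minimal sketch, assuming only the default naming scheme
>>> # (step names are the lowercased estimator class names)
>>> from sklearn.pipeline import make_pipeline
>>> make_pipeline(PCA(), SVC())
Pipeline(steps=[('pca', PCA()), ('svc', SVC())])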

|details-end|

Access pipeline steps
.....................

The estimators of a pipeline are stored as a list in the ``steps`` attribute.
A sub-pipeline can be extracted using the slicing notation commonly used
for Python Sequences such as lists or strings (although only a step of 1 is
permitted). This is convenient for performing only some of the transformations
(or their inverse):

>>> pipe[:1]
Pipeline(steps=[('reduce_dim', PCA())])
>>> pipe[-1:]
Pipeline(steps=[('clf', SVC())])
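
For instance, once the pipeline has been fitted, the transformer-only part can
be applied on its own (a sketch that uses the iris data purely for
illustration)::

>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> pipe.fit(X, y)
Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])
>>> pipe[:-1].transform(X).shape
(150, 4)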

|details-start|
**Accessing a step by name or position**
|details-split|


A specific step can also be accessed by its position or by its name, by
indexing the pipeline (with ``[idx]`` or ``['step_name']``)::

>>> pipe.steps[0]
('reduce_dim', PCA())
@@ -97,36 +112,61 @@
>>> pipe['reduce_dim']
PCA()

`Pipeline`'s `named_steps` attribute allows accessing steps by name with tab
completion in interactive environments::

>>> pipe.named_steps.reduce_dim is pipe['reduce_dim']
True
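
``named_steps`` also supports dictionary-style access by step name (a small
sketch)::

>>> pipe.named_steps['reduce_dim']
PCA()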

|details-end|

Tracking feature names in a pipeline
....................................

To enable model inspection, :class:`~sklearn.pipeline.Pipeline` has a
``get_feature_names_out()`` method, just like all transformers. You can use
pipeline slicing to get the feature names going into each step::

.. _pipeline_nested_parameters:
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectKBest
>>> iris = load_iris()
>>> pipe = Pipeline(steps=[
... ('select', SelectKBest(k=2)),
... ('clf', LogisticRegression())])
>>> pipe.fit(iris.data, iris.target)
Pipeline(steps=[('select', SelectKBest(...)), ('clf', LogisticRegression(...))])
>>> pipe[:-1].get_feature_names_out()
array(['x2', 'x3'], ...)

|details-start|
**Customize feature names**
|details-split|

You can also provide custom feature names for the input data using
``get_feature_names_out``::

>>> pipe[:-1].get_feature_names_out(iris.feature_names)
array(['petal length (cm)', 'petal width (cm)'], ...)

|details-end|

.. _pipeline_nested_parameters:

Access to nested parameters
...........................

It is common to adjust the parameters of an estimator within a pipeline. Such a
parameter is nested because it belongs to a particular sub-step. Parameters of the
estimators in the pipeline are accessible using the ``<estimator>__<parameter>``
syntax::

>>> pipe.set_params(clf__C=10)
Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC(C=10))])
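
The same ``__`` convention applies when reading a nested parameter back, for
instance with ``get_params`` (a quick sketch)::

>>> pipe.get_params()['clf__C']
10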

|details-start|
**When does it matter?**
|details-split|

This is particularly important for doing grid searches::

>>> from sklearn.model_selection import GridSearchCV
@@ -143,36 +183,11 @@ ignored by setting them to ``'passthrough'``::
... clf__C=[0.1, 10, 100])
>>> grid_search = GridSearchCV(pipe, param_grid=param_grid)
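
Once fitted, the search reports its best combination under the same nested
parameter names (a sketch, assuming some training data ``X`` and ``y`` are
available)::

>>> grid_search.fit(X, y)                 # doctest: +SKIP
>>> 'clf__C' in grid_search.best_params_  # doctest: +SKIP
True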

.. topic:: See Also:

 * :ref:`composite_grid_search`

|details-end|

.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_compose_plot_compare_reduction.py`
* :ref:`sphx_glr_auto_examples_miscellaneous_plot_pipeline_display.py`


.. _pipeline_cache:

Expand Down
9 changes: 3 additions & 6 deletions doc/modules/feature_extraction.rst
@@ -225,7 +225,7 @@ it is advisable to use a power of two as the ``n_features`` parameter;
otherwise the features will not be mapped evenly to the columns.
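
For instance, with :class:`FeatureHasher` (an illustrative sketch; ``2**18`` is
just one arbitrary power of two)::

>>> from sklearn.feature_extraction import FeatureHasher
>>> hasher = FeatureHasher(n_features=2**18)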

.. topic:: References:

* `MurmurHash3 <https://github.com/aappleby/smhasher>`_.

|details-end|
@@ -398,9 +398,8 @@ last document::

.. _stop_words:

Using stop words
----------------

Stop words are words like "and", "the", "him", which are presumed to be
uninformative in representing the content of a text, and which may be
@@ -431,8 +430,6 @@ identify and warn about some kinds of inconsistencies.
In *Proc. Workshop for NLP Open Source Software*.

.. _tfidf:

Tf–idf term weighting
