Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Adds Plotting API to Partial Dependence #14646

Merged
merged 36 commits into from Sep 20, 2019

Conversation

@thomasjpfan
Copy link
Member

thomasjpfan commented Aug 13, 2019

Reference Issues/PRs

Fixes #14596

What does this implement/fix? Explain your changes.

  1. Adds PartialDependenceDisplay for plot_partial_dependence.
  2. Deprecates the fig option. The new API can take a single axes and place all the plots in that ax.
  3. If a list of axes are given, it must match the number of partial dependence plots. If it does, the partial dependence plots will be place into those axes.
  4. The example, plot_partial_dependence_visualization_api.py, is used to demonstrate how plot_partial_dependence how to use this axes based API.

Any other comments?

This PR removes the custom padding between subplots because the padding depends on the size of the figure. When using matplotlib's constrained_layout:

fig, ax = plt.subplots(constrained_layout=True)
disp = plot_partial_dependence(..., ax=ax)

The labels of the plots do not overlap. This can not be used in our examples because we are support matplotlib 1.5.1 (which does not have constrained_layout).

thomasjpfan added 7 commits Aug 8, 2019
WIP
…ial_dep
@amueller

This comment has been minimized.

Copy link
Member

amueller commented Aug 13, 2019

could we use tight layout in the examples?

@amueller

This comment has been minimized.

Copy link
Member

amueller commented Aug 13, 2019

I think the possibility of providing a single axes and stealing or providing the right number of axes and reusing should be explained in the dev docs where you explained the display API

Copy link
Contributor

glemaitre left a comment

It generally looks good.

examples/inspection/plot_partial_dependence.py Outdated Show resolved Hide resolved
The :func:`~sklearn.inspection.plot_partial_dependence` function returns a
:class:`~sklearn.inspection.PartialDependenceDisplay` object that can be used
for plotting without needing to recalculate the partial dependence. In this
example we should how to plot partial dependence plots and quickly customize

This comment has been minimized.

Copy link
@glemaitre

glemaitre Aug 14, 2019

Contributor
Suggested change
example we should how to plot partial dependence plots and quickly customize
example, we show how to plot partial dependence plots and how to quickly customize
:class:`~sklearn.inspection.PartialDependenceDisplay` object that can be used
for plotting without needing to recalculate the partial dependence. In this
example we should how to plot partial dependence plots and quickly customize
tht plot with the Visualization API.

This comment has been minimized.

Copy link
@glemaitre

glemaitre Aug 14, 2019

Contributor
Suggested change
tht plot with the Visualization API.
the plot with the Visualization API.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1,
random_state=0)

hgbr = make_pipeline(StandardScaler(), HistGradientBoostingRegressor())

This comment has been minimized.

Copy link
@glemaitre

glemaitre Aug 14, 2019

Contributor

There is no need to use the StandardScaler. It will also speed-up the partial-dependence computation since we will use the recursive method included in the HistGradientBoostingRegressor which is not available when using a Pipeline.

# curves.
#
# One way to plot the curves is to place them in the same figure, with the
# curves of each model on each row. First, we create a figure two axes with two

This comment has been minimized.

Copy link
@glemaitre

glemaitre Aug 14, 2019

Contributor

"a figure two axes" -> is it missing a "with"?

For two-way partial dependence plots.
ax : Matplotlib axes, list of Matplotlib axes or None, (default=None)
By default,

This comment has been minimized.

Copy link
@glemaitre

glemaitre Aug 14, 2019

Contributor

Missing the rest of the sentence ;)

sklearn/inspection/partial_dependence.py Show resolved Hide resolved
sklearn/inspection/partial_dependence.py Show resolved Hide resolved
from sklearn.inspection import plot_partial_dependence


boston = load_boston()

This comment has been minimized.

Copy link
@glemaitre

glemaitre Aug 14, 2019

Contributor

it would be better to name it BOSTON.

Maybe this is time to use a fixture for the data here instead?

# Test partial dependence plot function on multi-output input.
X, y = multioutput_regression_data
clf = LinearRegression()
clf.fit(X, y)

This comment has been minimized.

Copy link
@glemaitre

glemaitre Aug 14, 2019

Contributor

are you using clf?

Copy link
Contributor

glemaitre left a comment

Ups but not to be approved yet :) wrong button

@@ -1724,4 +1724,39 @@ attributes:
return viz.plot(ax=ax, name=name, **kwargs)
```

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member
Suggested change
```
the axes on the grid. Furthermore, the matplotlib Artists are stored in
`lines_` and `contours_` where the key is the position on the grid. When a list
of axes is passsed in, the `axes_`, `lines_`, and `contours_` keys is single
int corresponding to the position on the passed in list of axes.

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member

Can you link to the advanced partial dependence example here?

@@ -0,0 +1,110 @@
"""
=========================================
Partial Dependence with Visualization API

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member

Advanced plotting with partial dependence?

Any partial dependence plot uses the visualization API by definition, right?

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member

Love the example btw!

# curves.
#
# One way to plot the curves is to place them in the same figure, with the
# curves of each model on each row. First, we create a figure two axes with two

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member
Suggested change
# curves of each model on each row. First, we create a figure two axes with two
# curves of each model on each row. First, we create a figure with two axes within two
Copy link
Member

amueller left a comment

We might need to think about the data structures for self.axes_ a bit more. Or at least you should clarify.

# :func:`~sklearn.inspection.PartialDependenceDisplay.plot` functions of
# `hgbr_disp` and `mlp_disp`. The plot funciton will plot the two curves in the
# space by the passed in axes. The resulting plot places the histogram gradient
# boosting partial dependence curves on top of the multi-layer perceptron

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member

"places the gbrt plots in the first row and the mlp plots in the second row"?

# Another way to compare the curves is to plot them on top of each other. Here
# we create a figure with one row and two columns. The axes are passed into the
# :func:`~sklearn.inspection.PartialDependenceDisplay.plot` function as list,
# which will plot the partial dependence curves of each model on the same axes.

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member

Maybe add a sentence saying that the length of axes must be equal to the number of features we plot?

----------
bounding_ax_ : matplotlib Axes or None
If `ax` is an axes or None, the `bounding_ax_` is the axes where the
grid of partial dependence plots are drawn. If `ax` is list of axes,

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member
Suggested change
grid of partial dependence plots are drawn. If `ax` is list of axes,
grid of partial dependence plots are drawn. If `ax` is a list of axes,
and jth column. If `ax` is a list of axes, `axes_[i]` is the ith item
in `ax`.
lines_ : matplotlib Artists

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member

is this a dict? a list? a numpy array?

grid of partial dependence plots are drawn. If `ax` is list of axes,
`bounding_ax_` is None.
axes_ : dict of matplotlib Axes

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member

how can a dict have [i, j]?

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member

It's a little weird that the shape of the attributes depends on whether we passed a list of axes or not, but I don't see a nice way around that right now.

given, it is treated as a bounding axes and a grid of partial
depdendence plots will be drawn on that top of it. If a list of
axes are passed, the partial dependence plots will be drawn on
those axes. By default, a single bounding axes is created and

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member

remove single here?

those axes. By default, a single bounding axes is created and
treated as the single axes case.
n_cols : int, default=3

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member

Honestly, I hate the way this is currently done and we should be doing what dabl is doing or https://github.com/matplotlib/grid-strategy is doing but that's for another day lol.

A figure object onto which the plots will be drawn, after the
figure has been cleared. By default, a new one is created.
.. deprecated:: 0.22

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member

I agree. We need it for backward-compatibility though. We could make it private. I'm not sure if that's less weird?

if fig is None:
_, ax = plt.subplots()
else:
ax = fig.add_subplot(111)

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member

I hate the matlab syntax. I would use either add_subplot(1, 1, 1) or gca?

# which will plot the partial dependence curves of each model on the same axes.

# sphinx_gallery_thumbnail_number = 4
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

This comment has been minimized.

Copy link
@amueller

amueller Aug 14, 2019

Member

I feel like it would be more natural to just call hgbr_disp with the default arguments and use the generated axes to pass to mlp_disp.
Right now the API might require us to do a ravel or something? But that's something that we should support. Maybe we should be using OrderedDicts?

thomasjpfan added 5 commits Aug 14, 2019
…ial_dep
@thomasjpfan

This comment has been minimized.

Copy link
Member Author

thomasjpfan commented Aug 16, 2019

thomasjpfan added 4 commits Aug 16, 2019
@@ -23,16 +23,16 @@ class PartialDependenceDisplay:
Parameters
----------
pd_results : list of (ndarray, ndarray)
pd_results : (ndarray, ndarray) list

This comment has been minimized.

Copy link
@amueller

amueller Aug 16, 2019

Member

list of tuples?

@thomasjpfan

This comment has been minimized.

Copy link
Member Author

thomasjpfan commented Aug 16, 2019

If a user passes in a list or an ndarray of one dimension, axes_ will be one dimension and the len(features)==ax.shape[0] can be checked.

Copy link
Contributor

NicolasHug left a comment

Great work!

Made a bunch of comments. I only looked at the docstrings and the examples so far.

  • I don't think we really need the _plot folder, why not just have partial_dependence_display.py? If we must have _plot then it would make sense to me that plot_partial_dependence() is there. On top of that we now have 3 files named partial_dependence.py :/

  • About the deciles: (I'm literally discovering that now, shame on me), would you mind adding a small comment in the docstring of plot_partial_depence, e.g.
    "The deciles of the feature values will be shown with tick marks on the x-axis "

  • With the current API, is it easy to extend it in the future by allowing e.g. 3d plots instead of contours for 2way pdps?


When a single axes is passed in, that axes defines a space for the multiple
axes to be placed. In this case, matplotlib's
`gridspec.GridSpecFromSubplotSpec` can be used to split up the space::

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Sep 3, 2019

Contributor

I'm a bit confused. Is this a suggestion of what the internals of plot_* could/should be? If so, let's say it ;)

By default, the `ax` keyworld in `plot` is `None`. In this case, the single
axes is created and the gridspec api is used to create the regions to plot in.

For example, :func:`~sklearn.inspection.plot_partial_dependence` plots multiple

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Sep 3, 2019

Contributor

I'd suggest "See for example [...] which plots multiple..."

axes is created and the gridspec api is used to create the regions to plot in.

For example, :func:`~sklearn.inspection.plot_partial_dependence` plots multiple
lines and contours using this API. The axes that is passed in or created that

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Sep 3, 2019

Contributor

The axes that is passed in or created that defines the space

That sounds a bit odd to me

the grid. Positions that are not used are set to `None`. Furthermore, the
matplotlib Artists are stored in `lines_` and `contours_` where the key is the
position on the grid. When a list of axes is passsed in, the `axes_`, `lines_`,
and `contours_` keys is single int corresponding to the position on the passed

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Sep 3, 2019

Contributor

is single int -> "are integers" ?

on -> of ??

I find that whole sentence a bit hard to read.

and `contours_` keys is single int corresponding to the position on the passed
in list of axes.

Read more in :ref:`sphx_glr_auto_examples_plot_roc_curve_visualization_api.py`

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Sep 3, 2019

Contributor

should this be in the above section?

# Plotting partial dependence of the two models independently
# ===========================================================
#
# Next, we plot a partial dependence curves for features "LSTAT" and "RM" for

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Sep 3, 2019

Contributor
Suggested change
# Next, we plot a partial dependence curves for features "LSTAT" and "RM" for
# Next, we plot a partial dependence curves for features "LSTAT" and "RM"

##############################################################################
# The partial depdendence curves can be plotted for the multi-layer perceptron.
# In this case `line_kw` was passed to

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Sep 3, 2019

Contributor
Suggested change
# In this case `line_kw` was passed to
# In this case `line_kw` is passed to
##############################################################################
# The partial depdendence curves can be plotted for the multi-layer perceptron.
# In this case `line_kw` was passed to
# `~sklearn.inspection.plot_partial_dependence` to change the color of the

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Sep 3, 2019

Contributor

does not link

# The partial depdendence curves can be plotted for the multi-layer perceptron.
# In this case `line_kw` was passed to
# `~sklearn.inspection.plot_partial_dependence` to change the color of the
# curve and `n_cols` was set to 1 to set the number of columns to 1.

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Sep 3, 2019

Contributor
Suggested change
# curve and `n_cols` was set to 1 to set the number of columns to 1.
# curve and `n_cols` is set to 1.
#
# One way to plot the curves is to place them in the same figure, with the
# curves of each model on each row. First, we create a figure with two axes
# within two rows and one column. The two axes are passed to the

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Sep 3, 2019

Contributor

I like the example but it's pretty condensed and it combines multiple functionalities at once so it's hard to understand how things interact independently.

For example it's a bit confusing at first how you can end up with a 2 by 2 grid while you only asked for 1 column.

I think it's missing a very simple plot with just 1 feature where you combine the curves of the 2 models. Then

This comment has been minimized.

Copy link
@thomasjpfan

thomasjpfan Sep 5, 2019

Author Member

I added a "Plotting partial dependence for one feature" section at the end. With the current API, there are two ways to plot a single feature:

Option 1

tree_disp = plot_partial_dependence(tree, X, ["LSTAT"],
                                    feature_names=feature_names)
mlp_disp = plot_partial_dependence(mlp, X, ["LSTAT"],
                                   feature_names=feature_names,
                                   ax=tree_disp.axes_, line_kw={"c": "red"})

Option 2

_, ax = plt.subplots()
tree_disp = plot_partial_dependence(tree, X, ["LSTAT"],
                                    feature_names=feature_names, ax=[ax])
mlp_disp = plot_partial_dependence(mlp, X, ["LSTAT"],
                                   feature_names=feature_names,
                                   ax=[ax], line_kw={"c": "red"})

For the example, I went with option 1. The "nicer" way to do this is:

Possible option that is being disallowed.

_, ax = plt.subplots()
tree_disp = plot_partial_dependence(tree, X, ["LSTAT"],
                                    feature_names=feature_names, ax=ax)
mlp_disp = plot_partial_dependence(mlp, X, ["LSTAT"],
                                   feature_names=feature_names,
                                   ax=ax, line_kw={"c": "red"})

The first call will call ax.set_visible(False) to denote that the space has been used. The second call will see that the axes is not visible and raise an error. We can technically support this "single feature" and "single axes" case, which I think will add another layer of complexity to the API i.e. "If ax is a single axes and len(features) == 1, then we behave differently"

A figure object onto which the plots will be drawn, after the
figure has been cleared. By default, a new one is created.
.. deprecated:: 0.22

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Sep 4, 2019

Contributor

Should we just not publicly document it then?


Scikit-learn defines a simple API for creating visualizations for machine
learning. The key features of this API is to run calculations once and to have
the flexibility to adjust the visualizations after the fact.

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Sep 4, 2019

Contributor

To explain the purpose of this doc we can add something like

This section is intended for developers who whish to develop or maintain plotting tools. For usage, users should refer to the user guide <link>

@thomasjpfan

This comment has been minimized.

Copy link
Member Author

thomasjpfan commented Sep 4, 2019

I don't think we really need the _plot folder,

I have trying to keep using this _plot folder to place the display and plot function into its namespace and only import it in sklearn.inspection. This can not be done here because plot_partial_dependence is already defined publicly in sklearn.inspeciton.partial_dependence. So for only partial_dependence it can be justified to create a _partial_dependence_display.py file.

@NicolasHug

This comment has been minimized.

Copy link
Contributor

NicolasHug commented Sep 4, 2019

I won't argue more but I find that having a module with a single file isn't necessary (esp. since it creates other kinds of inconsistencies), and having 3 different files named partial_dependence.py is confusing.

thomasjpfan added 3 commits Sep 4, 2019
@thomasjpfan

This comment has been minimized.

Copy link
Member Author

thomasjpfan commented Sep 5, 2019

With the current API, is it easy to extend it in the future by allowing e.g. 3d plots instead of contours for 2way pdps?

I have been avoiding 3-d plots. They... add another dimension to plotting. I think current plot_partial_dependence API is kind of bloated with its 2-way partial dependence, I would not want to add more to it.

thomasjpfan added 6 commits Sep 5, 2019
…ial_dep
thomasjpfan added 3 commits Sep 9, 2019
…ial_dep
@glemaitre glemaitre added this to TO REVIEW in Guillaume's pet Sep 16, 2019
sklearn/inspection/partial_dependence.py Show resolved Hide resolved
sklearn/inspection/partial_dependence.py Show resolved Hide resolved
A figure object onto which the plots will be drawn, after the
figure has been cleared. By default, a new one is created.
.. deprecated:: 0.22

This comment has been minimized.

Copy link
@glemaitre

glemaitre Sep 16, 2019

Contributor
Suggested change
.. deprecated:: 0.22
.. deprecated:: 0.22
``fig`` will be removed in 0.24.
Dict with keywords passed to the `matplotlib.pyplot.contourf`
call for two-way partial dependence plots.
fig : Matplotlib figure object, optional (default=None)

This comment has been minimized.

Copy link
@glemaitre

glemaitre Sep 16, 2019

Contributor
Suggested change
fig : Matplotlib figure object, optional (default=None)
fig : Matplotlib figure object, default=None

This comment has been minimized.

Copy link
@glemaitre

glemaitre Sep 16, 2019

Contributor

Is there a way to not introduce a parameter which is directly deprecated.

This comment has been minimized.

Copy link
@thomasjpfan

thomasjpfan Sep 19, 2019

Author Member

Updated PR to not need to add a fig just to deprecate it.

:class:`~sklearn.inspection.PartialDependenceDisplay` object that can be used
for plotting without needing to recalculate the partial dependence. In this
example, we show how to plot partial dependence plots and how to quickly
customize the plot with the Visualization API.

This comment has been minimized.

Copy link
@glemaitre

glemaitre Sep 16, 2019

Contributor
Suggested change
customize the plot with the Visualization API.
customize the plot with the visualization API.
For two-way partial dependence plots.
ax : Matplotlib axes or array-like of Matplotlib axes, default=None
- If a single axes is passed in, it is treated as a bounding axes

This comment has been minimized.

Copy link
@glemaitre

glemaitre Sep 16, 2019

Contributor
Suggested change
- If a single axes is passed in, it is treated as a bounding axes
- If a single axis is passed in, it is treated as a bounding axes
and a grid of partial depedendence plots will be drawn within
these bounds. The `n_cols` parameter controls the number of
columns in the grid.
- If a array-like of axes are passed in, the partial dependence

This comment has been minimized.

Copy link
@glemaitre

glemaitre Sep 16, 2019

Contributor
Suggested change
- If a array-like of axes are passed in, the partial dependence
- If an array-like of axes are passed in, the partial dependence
columns in the grid.
- If a array-like of axes are passed in, the partial dependence
plots will be drawn directly into these axes.
- If `None`, a figure and a bounding axes is created and treated

This comment has been minimized.

Copy link
@glemaitre

glemaitre Sep 16, 2019

Contributor
Suggested change
- If `None`, a figure and a bounding axes is created and treated
- If `None`, a figure and a bounding axis is created and treated
- If a array-like of axes are passed in, the partial dependence
plots will be drawn directly into these axes.
- If `None`, a figure and a bounding axes is created and treated
as the single axes case.

This comment has been minimized.

Copy link
@glemaitre

glemaitre Sep 16, 2019

Contributor
Suggested change
as the single axes case.
as the single axis case.
display = PartialDependenceDisplay(pd_results, features, feature_names,
target_idx, pdp_lim, deciles)
return display.plot(ax=ax, n_cols=n_cols, line_kw=line_kw,
contour_kw=contour_kw, fig=fig)

This comment has been minimized.

Copy link
@glemaitre

glemaitre Sep 16, 2019

Contributor

Regarding the introduction of fig

@glemaitre glemaitre moved this from TO REVIEW to REVIEWED AND WAITING FOR CHANGES in Guillaume's pet Sep 16, 2019
thomasjpfan added 2 commits Sep 19, 2019
…ial_dep
@glemaitre glemaitre merged commit 97185ec into scikit-learn:master Sep 20, 2019
13 of 16 checks passed
13 of 16 checks passed
codecov/patch CI failed.
Details
codecov/project CI failed.
Details
ci/circleci: doc Your tests failed on CircleCI
Details
LGTM analysis: C/C++ No code changes detected
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python 1 new alert
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
scikit-learn.scikit-learn Build #20190919.54 succeeded
Details
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
Details
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) Linux py35_ubuntu_atlas succeeded
Details
scikit-learn.scikit-learn (Linux pylatest_pip_openblas_pandas) Linux pylatest_pip_openblas_pandas succeeded
Details
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) Linux32 py35_ubuntu_atlas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) Windows py35_pip_openblas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py37_conda_mkl) Windows py37_conda_mkl succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) macOS pylatest_conda_mkl succeeded
Details
@glemaitre

This comment has been minimized.

Copy link
Contributor

glemaitre commented Sep 20, 2019

LGTM Thanks @thomasjpfan

@glemaitre glemaitre moved this from REVIEWED AND WAITING FOR CHANGES to MERGED in Guillaume's pet Sep 20, 2019
@jnothman jnothman mentioned this pull request Oct 28, 2019
0 of 8 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
4 participants
You can’t perform that action at this time.