[MRG] ENH Add support for dataframe in PDP #14028

Merged 110 commits on Oct 31, 2019
Commits
33868d1
TST add test to ensure support of pipeline in PDP
glemaitre Jun 5, 2019
f2035fe
EHN add support for dataframe in PDP
glemaitre Jun 5, 2019
133c116
revert to brute method for pipeline
glemaitre Jun 6, 2019
79156f3
refactor common part with columntransformer
glemaitre Jun 6, 2019
59cb6f5
fix
glemaitre Jun 6, 2019
cb4b00b
TST check the support of different types for features
glemaitre Jun 6, 2019
db0b589
Merge remote-tracking branch 'origin/master' into pdp_dataframe
glemaitre Jul 18, 2019
c04dcba
problem merge
glemaitre Jul 18, 2019
0326a88
PEP8
glemaitre Jul 19, 2019
2f0f690
issue merge
glemaitre Jul 19, 2019
33e655d
iter
glemaitre Jul 19, 2019
0194717
fix
glemaitre Jul 22, 2019
72ee546
PEP8
glemaitre Jul 22, 2019
db25ee6
update docstring
glemaitre Jul 23, 2019
60b8f59
whats new
glemaitre Jul 23, 2019
c01385c
EHN add support for scalar, slice and mask in safe_indexing axis=0
glemaitre Jul 25, 2019
0e5c037
DOC
glemaitre Jul 25, 2019
f5e08c4
FIX behaviour when passing None
glemaitre Jul 25, 2019
bb4db91
PEP8
glemaitre Jul 25, 2019
8cd74db
address thomas comments
glemaitre Jul 29, 2019
9878ef1
Merge remote-tracking branch 'glemaitre/is/consistent_safe_indexing' …
glemaitre Jul 29, 2019
2f6a0bd
Merge remote-tracking branch 'origin/master' into is/consistent_safe_…
glemaitre Jul 29, 2019
075dd80
debug
glemaitre Jul 29, 2019
d0f8d60
FIX change boolean array-likes indexing in old NumPy version
glemaitre Jul 29, 2019
f95a228
change indexing
glemaitre Jul 29, 2019
1c81803
add regression test in utils
glemaitre Jul 30, 2019
c8009a2
fix
glemaitre Jul 30, 2019
a80b33d
add test in column transformer
glemaitre Jul 30, 2019
56a6759
Merge remote-tracking branch 'origin/master' into is/mask_indexing
glemaitre Aug 1, 2019
9fb045d
raise error if axis not 0 or 1
glemaitre Aug 1, 2019
0d46f7f
Merge branch 'is/mask_indexing' into is/consistent_safe_indexing
glemaitre Aug 1, 2019
5dcf34f
itert
glemaitre Aug 1, 2019
70f0e02
iter
glemaitre Aug 1, 2019
7127b5a
refactor
glemaitre Aug 1, 2019
2f96882
PEP8 comments
glemaitre Aug 1, 2019
619fb05
iter
glemaitre Aug 1, 2019
b7539bd
style
glemaitre Aug 1, 2019
18fba6c
make check_is_fitted not take attributes
amueller Aug 1, 2019
e034ed8
cleanup, remove any_or_all
amueller Aug 1, 2019
1dc9258
fix LOF, birch, mixtures
amueller Aug 1, 2019
92d1aaf
iter
glemaitre Aug 1, 2019
d6034ea
remove unused method
amueller Aug 1, 2019
b1918e8
address different comments
glemaitre Aug 2, 2019
fe29402
Merge branch 'is/mask_indexing' into is/consistent_safe_indexing
glemaitre Aug 2, 2019
6322f99
iter
glemaitre Aug 2, 2019
5fbbd92
Merge remote-tracking branch 'origin/master' into pdp_dataframe
glemaitre Aug 2, 2019
d7f6810
Merge branch 'is/consistent_safe_indexing' into pdp_dataframe
glemaitre Aug 2, 2019
e478e20
iter
glemaitre Aug 2, 2019
4d4cc2d
update error message
glemaitre Aug 2, 2019
050932a
Merge remote-tracking branch 'origin/master' into is/consistent_safe_…
glemaitre Aug 2, 2019
801584f
Merge branch 'is/consistent_safe_indexing' into pdp_dataframe
glemaitre Aug 2, 2019
3cb95ac
fix partial dependence function
amueller Aug 2, 2019
4d3a8b4
make change backward-compatible
amueller Aug 2, 2019
1181982
also allow private fitted attributes
amueller Aug 2, 2019
7ed876d
slight refactoring in CountVectorizer to mess less with the vocabulary
amueller Aug 2, 2019
8701cc0
added regression test for not being able to call inverse_transform be…
amueller Aug 2, 2019
be4a90f
add special check for classes
amueller Aug 2, 2019
b62933d
address comments
glemaitre Aug 5, 2019
7e33027
more functions to fix
amueller Aug 5, 2019
e3f96bf
Merge remote-tracking branch 'amueller/anything_fitted' into pdp_data…
glemaitre Aug 5, 2019
34586e5
Merge remote-tracking branch 'origin/master' into pdp_dataframe
glemaitre Aug 5, 2019
82fbc6f
address almost all comments
glemaitre Aug 5, 2019
18c8b55
PEP8
glemaitre Aug 5, 2019
665ae92
Merge remote-tracking branch 'origin/master' into pdp_dataframe
glemaitre Sep 12, 2019
1587bfe
fix merge conflict error
glemaitre Sep 12, 2019
8a887ca
handle pipeline in partial dependence function
glemaitre Sep 12, 2019
b6e6a44
drop support for negative int indexing
glemaitre Sep 12, 2019
1da0b31
Merge remote-tracking branch 'origin/master' into pdp_dataframe
glemaitre Sep 13, 2019
9dbfea5
TST check dataframe are supported in plot_partial_dependence
glemaitre Sep 13, 2019
33865c8
Update sklearn/inspection/partial_dependence.py
glemaitre Sep 16, 2019
3cf6d75
DOC Add missing attributes to SVC and NuSVC (#14930)
kwinata Sep 13, 2019
34c8250
DOC Remove GraphViz mention in plot_tree docstring (#14973)
lesteve Sep 13, 2019
27de857
MAINT filter deprecation warnings triggered by all_estimators (#14691)
ogrisel Sep 13, 2019
0f9d481
MNT Deprecate enforce_estimator_tags_y (#14945)
NicolasHug Sep 13, 2019
9e0b7d2
DOC Adds more docstring standards (#14744)
thomasjpfan Sep 13, 2019
2db5c0d
DOC Add example for GroupShuffleSplit (#14906)
JesperDramsch Sep 13, 2019
334fe5a
DOC add missing attributes to OneVsRestClassifier (#14783)
catajara Sep 13, 2019
6f4509a
TST Adjusts rtol for test_lda_predict (#14978)
minggli Sep 14, 2019
c20e312
DOC Change default dataset for `plot_johnson_lindenstrauss_bound.py` …
andreanr Sep 14, 2019
9b65ed7
MNT deprecate outputs_2d_ attribute of dummy estimators (#14933)
NicolasHug Sep 15, 2019
e19a9d7
[MRG] Make k_means use KMeans instead (#14985)
NicolasHug Sep 16, 2019
c289e27
EHN update lobpcg from scipy master (#14971)
glemaitre Sep 16, 2019
27bfcc8
FIX implement repr for RepeatedKFold and RepeatedStratifiedKFold (#14…
DrGFreeman Sep 16, 2019
a846bad
address comments from Nicolas
glemaitre Sep 16, 2019
e06d0c6
Merge remote-tracking branch 'origin/master' into pdp_dataframe
glemaitre Sep 16, 2019
7e979d4
Merge remote-tracking branch 'origin/master' into pdp_dataframe
glemaitre Sep 20, 2019
09e5899
support indices in tuple in safe_indexing
glemaitre Sep 20, 2019
56455ee
PEP8
glemaitre Sep 23, 2019
85e0b7f
Merge remote-tracking branch 'origin/master' into pdp_dataframe
glemaitre Oct 21, 2019
c8c2a08
reviews
glemaitre Oct 21, 2019
4d427aa
safe_indexing is private
glemaitre Oct 21, 2019
c0469f7
Merge remote-tracking branch 'origin/master' into pdp_dataframe
glemaitre Oct 23, 2019
b7c6844
fix comments
glemaitre Oct 23, 2019
099ec56
Merge remote-tracking branch 'origin/master' into pdp_dataframe
glemaitre Oct 23, 2019
2579ef7
iter
glemaitre Oct 24, 2019
13d39f8
Merge remote-tracking branch 'origin/master' into pdp_dataframe
glemaitre Oct 24, 2019
a5777ad
reduce list of estimator to check for fitness
glemaitre Oct 24, 2019
0aa3cd9
remove unused import
glemaitre Oct 24, 2019
dc56f7b
fix
glemaitre Oct 24, 2019
e1de4a4
address thomas comments
glemaitre Oct 24, 2019
53cdf4a
remove support for slice
glemaitre Oct 28, 2019
09c7e7f
Merge remote-tracking branch 'glemaitre/pdp_dataframe' into pdp_dataf…
glemaitre Oct 28, 2019
42284d6
Merge remote-tracking branch 'origin/master' into pdp_dataframe
glemaitre Oct 28, 2019
8dee997
Merge remote-tracking branch 'glemaitre/pdp_dataframe' into pdp_dataf…
glemaitre Oct 28, 2019
1162ca7
Merge remote-tracking branch 'origin/master' into pdp_dataframe
glemaitre Oct 30, 2019
fa9f04a
add accept_slice to _determine_key_dtype
glemaitre Oct 30, 2019
f7f7096
docstring
glemaitre Oct 30, 2019
8029cf4
docstring
glemaitre Oct 30, 2019
a187e0c
docstring
glemaitre Oct 30, 2019
46aea93
update example
glemaitre Oct 30, 2019
4 changes: 4 additions & 0 deletions doc/whats_new/v0.22.rst
@@ -370,6 +370,10 @@ Changelog
   :class:`ensemble.HistGradientBoostingRegressor`. :pr:`13769` by
   `Nicolas Hug`_.

+- |Enhancement| :func:`inspection.partial_dependence` accepts pandas DataFrame
+  and :class:`pipeline.Pipeline` containing :class:`compose.ColumnTransformer`.
+  :pr:`14028` by :user:`Guillaume Lemaitre <glemaitre>`.
+
 :mod:`sklearn.kernel_approximation`
 ...................................

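As a rough illustration of what DataFrame support means for callers, the sketch below maps features given as column names (or tuples of names for two-way plots) to positional indices. The helper `resolve_feature_indices` is hypothetical and written in plain Python; it is not scikit-learn's internal implementation, which this changelog entry does not describe. Rejecting negative integers is a choice made for this sketch.

```python
def resolve_feature_indices(features, column_names):
    """Map feature specs (names, ints, or tuples of them) to column indices.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    positions = {name: idx for idx, name in enumerate(column_names)}

    def to_index(feature):
        # Strings are looked up by column name.
        if isinstance(feature, str):
            if feature not in positions:
                raise ValueError(f"Unknown feature name: {feature!r}")
            return positions[feature]
        # Plain integers are treated as positions (negative values rejected).
        if isinstance(feature, int) and not isinstance(feature, bool):
            if not 0 <= feature < len(column_names):
                raise ValueError(f"Feature index out of range: {feature}")
            return feature
        raise TypeError(f"Unsupported feature spec: {feature!r}")

    resolved = []
    for spec in features:
        if isinstance(spec, tuple):
            # A tuple requests a two-way partial dependence.
            resolved.append(tuple(to_index(f) for f in spec))
        else:
            resolved.append(to_index(spec))
    return resolved


# Column names of the California housing dataset used in the example script.
columns = ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms',
           'Population', 'AveOccup', 'Latitude', 'Longitude']
by_name = ['MedInc', 'AveOccup', 'HouseAge', 'AveRooms',
           ('AveOccup', 'HouseAge')]
indices = resolve_feature_indices(by_name, columns)
print(indices)
```

With these column names, the name-based feature list resolves to the same positions as the integer list `[0, 5, 1, 2, (5, 1)]` that the example script used before this change.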
24 changes: 14 additions & 10 deletions examples/inspection/plot_partial_dependence.py
@@ -30,6 +30,7 @@

 from time import time
 import numpy as np
+import pandas as pd
 import matplotlib.pyplot as plt
 from mpl_toolkits.mplot3d import Axes3D

@@ -54,8 +55,8 @@
 # (here the average target, by default)

 cal_housing = fetch_california_housing()
-names = cal_housing.feature_names
-X, y = cal_housing.data, cal_housing.target
+X = pd.DataFrame(cal_housing.data, columns=cal_housing.feature_names)
+y = cal_housing.target

 y -= y.mean()

@@ -104,8 +105,9 @@
 tic = time()
 # We don't compute the 2-way PDP (5, 1) here, because it is a lot slower
 # with the brute method.
-features = [0, 5, 1, 2]
-plot_partial_dependence(est, X_train, features, feature_names=names,
+features = ['MedInc', 'AveOccup', 'HouseAge', 'AveRooms']
+plot_partial_dependence(est, X_train, features,
+                        feature_names=X_train.columns.tolist(),
                         n_jobs=3, grid_resolution=20)
 print("done in {:.3f}s".format(time() - tic))
 fig = plt.gcf()
@@ -143,8 +145,10 @@

 print('Computing partial dependence plots...')
 tic = time()
-features = [0, 5, 1, 2, (5, 1)]
-plot_partial_dependence(est, X_train, features, feature_names=names,
+features = ['MedInc', 'AveOccup', 'HouseAge', 'AveRooms',
+            ('AveOccup', 'HouseAge')]
+plot_partial_dependence(est, X_train, features,
+                        feature_names=X_train.columns.tolist(),
                         n_jobs=3, grid_resolution=20)
 print("done in {:.3f}s".format(time() - tic))
 fig = plt.gcf()
@@ -192,16 +196,16 @@

 fig = plt.figure()

-target_feature = (1, 5)
-pdp, axes = partial_dependence(est, X_train, target_feature,
+features = ('AveOccup', 'HouseAge')
+pdp, axes = partial_dependence(est, X_train, features=features,
                                grid_resolution=20)
 XX, YY = np.meshgrid(axes[0], axes[1])
 Z = pdp[0].T
 ax = Axes3D(fig)
 surf = ax.plot_surface(XX, YY, Z, rstride=1, cstride=1,
                        cmap=plt.cm.BuPu, edgecolor='k')
-ax.set_xlabel(names[target_feature[0]])
-ax.set_ylabel(names[target_feature[1]])
+ax.set_xlabel(features[0])
+ax.set_ylabel(features[1])
 ax.set_zlabel('Partial dependence')
 # pretty init view
 ax.view_init(elev=22, azim=122)
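The comment in the example above notes that two-way partial dependence is much slower with the brute method. A minimal plain-Python sketch of a brute-force computation suggests why: the model is re-evaluated on the whole dataset once per grid combination. The function `brute_partial_dependence` and the toy model below are assumptions made for illustration; they are not the scikit-learn implementation.

```python
from itertools import product


def brute_partial_dependence(predict, rows, feature_idx, grids):
    """Average predictions over `rows` for every combination of grid values.

    predict: callable taking a list of rows and returning a list of floats.
    rows: list of lists (samples x features).
    feature_idx: column positions of the features being varied.
    grids: one list of candidate values per varied feature.
    Returns a dict keyed by tuples of grid positions.
    """
    sizes = [len(grid) for grid in grids]
    averaged = {}
    for combo in product(*(range(n) for n in sizes)):
        modified = []
        for row in rows:
            new_row = list(row)
            # Overwrite the varied features with the current grid values.
            for col, grid, pos in zip(feature_idx, grids, combo):
                new_row[col] = grid[pos]
            modified.append(new_row)
        # One full pass over the data per grid combination.
        predictions = predict(modified)
        averaged[combo] = sum(predictions) / len(predictions)
    return averaged


def toy_predict(rows):
    # Hypothetical stand-in for a fitted estimator's predict method.
    return [2 * r[0] + r[1] * r[2] for r in rows]


rows = [[1, 2, 3], [4, 5, 6]]
grids = [[0, 1], [10, 20, 30]]
averaged = brute_partial_dependence(toy_predict, rows, (0, 2), grids)

# table[i][j]: i indexes the first feature's grid, j the second's.
table = [[averaged[(i, j)] for j in range(len(grids[1]))]
         for i in range(len(grids[0]))]
print(table)
```

The number of passes over the data is the product of the grid sizes, so with `grid_resolution=20` a two-way computation needs 400 passes where a one-way computation needs 20. The table is indexed first-feature-first, while `np.meshgrid(axes[0], axes[1])` with its default indexing returns arrays of shape `(len(axes[1]), len(axes[0]))`, which is consistent with the example transposing `pdp[0]` before plotting.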
13 changes: 7 additions & 6 deletions examples/plot_partial_dependence_visualization_api.py
@@ -15,6 +15,7 @@
 """
 print(__doc__)

+import pandas as pd
 import matplotlib.pyplot as plt
 from sklearn.datasets import load_boston
 from sklearn.neural_network import MLPRegressor
@@ -32,8 +33,8 @@
 # housing price dataset.

 boston = load_boston()
-X, y = boston.data, boston.target
-feature_names = boston.feature_names
+X = pd.DataFrame(boston.data, columns=boston.feature_names)
+y = boston.target

 tree = DecisionTreeRegressor()
 mlp = make_pipeline(StandardScaler(),
@@ -55,7 +56,7 @@
 fig, ax = plt.subplots(figsize=(12, 6))
 ax.set_title("Decision Tree")
 tree_disp = plot_partial_dependence(tree, X, ["LSTAT", "RM"],
-                                    feature_names=feature_names, ax=ax)
+                                    feature_names=X.columns.tolist(), ax=ax)

 ##############################################################################
 # The partial depdendence curves can be plotted for the multi-layer perceptron.
@@ -65,7 +66,7 @@
 fig, ax = plt.subplots(figsize=(12, 6))
 ax.set_title("Multi-layer Perceptron")
 mlp_disp = plot_partial_dependence(mlp, X, ["LSTAT", "RM"],
-                                   feature_names=feature_names, ax=ax,
+                                   feature_names=X.columns.tolist(), ax=ax,
                                    line_kw={"c": "red"})

 ##############################################################################
@@ -134,7 +135,7 @@
 # the same axes. In this case, `tree_disp.axes_` is passed into the second
 # plot function.
 tree_disp = plot_partial_dependence(tree, X, ["LSTAT"],
-                                    feature_names=feature_names)
+                                    feature_names=X.columns.tolist())
 mlp_disp = plot_partial_dependence(mlp, X, ["LSTAT"],
-                                   feature_names=feature_names,
+                                   feature_names=X.columns.tolist(),
                                    ax=tree_disp.axes_, line_kw={"c": "red"})