ENH get column names by default in PDP when passing data… #15429

Merged
merged 8 commits into scikit-learn:master on Nov 7, 2019

Contributor

glemaitre commented Nov 1, 2019

Follow-up of #14028.
Partially addresses #14969.

This removes the need to specify feature_names when X is a pandas DataFrame: X.columns.tolist() is used by default.
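A minimal sketch of the default described above, not the code merged in this PR. The function name `resolve_feature_names` is hypothetical, and the fallback to stringified column indices for non-DataFrame input is an assumption made for illustration.

```python
def resolve_feature_names(X, feature_names=None):
    """Pick feature names for a partial dependence plot (illustrative only).

    `resolve_feature_names` is a hypothetical helper, not a scikit-learn API.
    """
    # Explicitly passed names always win.
    if feature_names is not None:
        return list(feature_names)
    # Duck-type a pandas DataFrame the way the PR does: via the `loc` attribute.
    if hasattr(X, "loc"):
        return X.columns.tolist()
    # Assumed fallback for plain 2D array-likes: stringified column indices.
    n_features = len(X[0])
    return [str(i) for i in range(n_features)]


# A nested list has no column names, so indices are used instead.
print(resolve_feature_names([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))  # ['0', '1', '2']
```

With a DataFrame whose columns are `["age", "income"]`, the same call would return `["age", "income"]` without `feature_names` being passed.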

Contributor

NicolasHug left a comment

Mostly looks good

doc/whats_new/v0.22.rst (outdated, resolved)
@NicolasHug NicolasHug added this to the 0.22 milestone Nov 1, 2019
@glemaitre glemaitre changed the title ENH get column names by default in PDP when passing dataframe [MRG] ENH get column names by default in PDP when passing dataframe Nov 4, 2019
Member

adrinjalali left a comment

Is there a way to check if the correct feature names are used in the plot?

if not (hasattr(X, '__array__') or sparse.issparse(X)):
    X = check_array(X, force_all_finite='allow-nan', dtype=np.object)
Comment on lines +598 to +599

adrinjalali Nov 4, 2019

Member

I feel like at some point this should be inside the check_array. Also, why not pass accept_sparse to check_array and not check it here?

glemaitre Nov 4, 2019

Author Contributor

> I feel like at some point this should be inside the check_array

Agreed. I think that @jorisvandenbossche intended to open a PR on this a while ago.

> Also, why not pass accept_sparse to check_array and not check it here?

I think that the idea was to delegate the check to the underlying pipeline.
In fact, I am not sure that we need to do any checking at all. If we only need to get a column, _safe_indexing should be smart enough to deal with a list.

glemaitre Nov 4, 2019

Author Contributor

Actually, the next line, n_features=X.shape[0], would fail on a list, so we need something other than a list.
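A short sketch of why a plain list needs converting before its shape is read. It uses `np.asarray` as a simplified stand-in for `check_array`, omits the sparse-matrix check from the diff, and the name `ensure_has_shape` is hypothetical.

```python
import numpy as np


def ensure_has_shape(X):
    """Return X untouched if it is already array-like, else an object ndarray.

    Hypothetical helper: np.asarray stands in for scikit-learn's check_array,
    and the sparse-matrix branch of the reviewed code is left out.
    """
    if hasattr(X, "__array__"):
        # ndarrays and DataFrames expose __array__ and already have .shape.
        return X
    # A nested list has no .shape; dtype=object keeps mixed column types intact.
    return np.asarray(X, dtype=object)


X_list = [[1, "a"], [2, "b"], [3, "c"]]
print(hasattr(X_list, "shape"))        # False
print(ensure_has_shape(X_list).shape)  # (3, 2)
```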

if hasattr(X, "loc"):
    # get the column names for a pandas dataframe
    feature_names = X.columns.tolist()
Comment on lines +604 to +606

adrinjalali Nov 4, 2019

Member

Should this not explicitly check for columns instead?

glemaitre Nov 4, 2019

Author Contributor

Until now, we have always duck-typed dataframes this way.

NicolasHug Nov 4, 2019

Contributor

Maybe it's time to have an is_dataframe helper.

adrinjalali Nov 4, 2019

Member

We can leave it for another PR, though. I'm happy with it as is for this PR.
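One possible shape for the helper suggested in this thread, as a sketch only. The name `is_dataframe` comes from the comment above, but this body is an assumption: it combines the existing `loc` duck-typing with the explicit `columns` check raised in review, and it is not an existing scikit-learn function.

```python
def is_dataframe(X):
    """Return True if X looks like a pandas DataFrame (duck-typed).

    Hypothetical helper: avoids importing pandas by checking for the two
    attributes discussed in the review, `loc` and `columns`.
    """
    return hasattr(X, "loc") and hasattr(X, "columns")


# Plain Python containers expose neither attribute.
print(is_dataframe([[1, 2], [3, 4]]))  # False
```

Checking both attributes keeps the call site honest: code that goes on to read `X.columns` no longer relies on `loc` alone implying that `columns` exists.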

Contributor

NicolasHug left a comment

LGTM, thanks @glemaitre

> Is there a way to check if the correct feature names are used in the plot?

That'd be nice, I think @thomasjpfan would know that?

Contributor Author

glemaitre commented Nov 4, 2019

> That'd be nice, I think @thomasjpfan would know that?

This is already checked on lines 131-132 of test_plot_partial_dependence.py.

Contributor Author

glemaitre commented Nov 4, 2019

Actually, my change to this test creates a dataframe, so we no longer test with numpy arrays. We should test both cases there.

Contributor Author

glemaitre commented Nov 4, 2019

I added back the test where the input data is a numpy array and feature_names is given.

Member

thomasjpfan commented Nov 7, 2019

Merged with master to get fix for CI. Will merge when green.

@thomasjpfan thomasjpfan changed the title [MRG] ENH get column names by default in PDP when passing dataframe ENH get column names by default in PDP when passing data… Nov 7, 2019
@thomasjpfan thomasjpfan merged commit 2e881f5 into scikit-learn:master Nov 7, 2019
20 checks passed
LGTM analysis: C/C++: No code changes detected
LGTM analysis: JavaScript: No code changes detected
LGTM analysis: Python: No new or fixed alerts
ci/circleci: deploy: Your tests passed on CircleCI!
ci/circleci: doc: Your tests passed on CircleCI!
ci/circleci: doc artifact: Link to 0/doc/_changed.html
ci/circleci: doc-min-dependencies: Your tests passed on CircleCI!
ci/circleci: lint: Your tests passed on CircleCI!
codecov/patch: 100% of diff hit (target 97.16%)
codecov/project: 97.16% (+<.01%) compared to 2558ccc
scikit-learn.scikit-learn: Build #20191107.54 succeeded
scikit-learn.scikit-learn (Linting): succeeded
scikit-learn.scikit-learn (Linux py35_conda_openblas): succeeded
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas): succeeded
scikit-learn.scikit-learn (Linux pylatest_pip_openblas_pandas): succeeded
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit): succeeded
scikit-learn.scikit-learn (Linux_Runs pylatest_conda_mkl): succeeded
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit): succeeded
scikit-learn.scikit-learn (Windows py37_conda_mkl): succeeded
scikit-learn.scikit-learn (macOS pylatest_conda_mkl): succeeded
Member

thomasjpfan commented Nov 7, 2019

Thank you @glemaitre !
