Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] EHN add parameter axis in safe_indexing to slice rows and columns #14035

Merged
merged 18 commits into from Jul 2, 2019

Conversation

@glemaitre
Copy link
Contributor

glemaitre commented Jun 6, 2019

Allows to index columns as well as rows using the code from ColumnTransformer.
Added some tests for the supported type.

This PR should simplify the review of #14028

NB: the functions have been moved and renamed but not modified.

@glemaitre glemaitre changed the title EHN add parameter axis in safe_indexing to slice rows and columns [MRG] EHN add parameter axis in safe_indexing to slice rows and columns Jun 6, 2019
@thomasjpfan thomasjpfan self-requested a review Jun 7, 2019
@thomasjpfan

This comment has been minimized.

Copy link
Member

thomasjpfan commented Jun 7, 2019

Thank you for the PR! This would be useful for permutation importance too! #13146

@glemaitre

This comment has been minimized.

Copy link
Contributor Author

glemaitre commented Jun 7, 2019

@NicolasHug This PR would need to be in for the PDP (and the permutation importance then ;)
I still need to cover the raise but you can have a look at the naming and if you are fine with the naming

Copy link
Contributor

NicolasHug left a comment

A few comments but mostly LGTM.

_check_key_type isn't tested anywhere?

sklearn/utils/__init__.py Show resolved Hide resolved
sklearn/utils/__init__.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
sklearn/utils/__init__.py Outdated Show resolved Hide resolved
sklearn/utils/__init__.py Show resolved Hide resolved
@glemaitre

This comment has been minimized.

Copy link
Contributor Author

glemaitre commented Jun 11, 2019

_check_key_type isn't tested anywhere?

This is tested through _get_column

@NicolasHug

This comment has been minimized.

Copy link
Contributor

NicolasHug commented Jun 11, 2019

This is tested through _get_column

Only very indirectly then. I won't push it more because it wasn't even tested before but I don't think that's great. Unit tests can also be used to illustrate the behavior of a function.

@glemaitre

This comment has been minimized.

Copy link
Contributor Author

glemaitre commented Jun 11, 2019

Only very indirectly then. I won't push it more because it wasn't even tested before but I don't think that's great. Unit tests can also be used to illustrate the behavior of a function.

Good point

glemaitre added 2 commits Jun 11, 2019
@thomasjpfan

This comment has been minimized.

Copy link
Member

thomasjpfan commented Jun 11, 2019

Looks like the pandas tests are not being reported again to codecov again 🤔

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jun 11, 2019

@glemaitre

This comment has been minimized.

Copy link
Contributor Author

glemaitre commented Jun 12, 2019

Is there any sense in just using safe_indexing(X.T, mask)?

Do you mean to use the same indexing but with a transposed matrix in all cases?

My intent was to merge the code from the ColumnTransformer which is generic to get columns and can be reused in several places (PDP, permutation importance). So the supported types are different from the current safe_indexing for the rows.

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jun 13, 2019

sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
sklearn/utils/__init__.py Outdated Show resolved Hide resolved
sklearn/utils/__init__.py Show resolved Hide resolved
sklearn/utils/__init__.py Outdated Show resolved Hide resolved
Copy link
Member

jnothman left a comment

I thought in ColumnTransformer we only needed to handle callables at fit time and not actually when indexing. I am surprised we need callable support here.

sklearn/utils/__init__.py Show resolved Hide resolved
sklearn/utils/__init__.py Outdated Show resolved Hide resolved
glemaitre added 2 commits Jun 20, 2019
glemaitre added 2 commits Jun 20, 2019
Copy link
Contributor

NicolasHug left a comment

Minor comments + some potential bug (?)

Apart from that LGTM

sklearn/utils/__init__.py Outdated Show resolved Hide resolved
sklearn/utils/__init__.py Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
Copy link
Contributor

NicolasHug left a comment

I would propose the following docstring which IMHO is clearer. LGTM anyway.

    Parameters
    ----------
    X : array-like, sparse-matrix, list, pandas.DataFrame, pandas.Series
        Data from which to sample rows, items or columns.
    indices : array-like
        - When ``axis=0``, indices need to be an array of integer.
        - When ``axis=1``, indices can be one of:
            - integer: output is 1D, unless `X` is sparse.
            - container: lists, slices, boolean masks: output is 2D.
              Supported data types for containers:
                - integer or boolean (positional): supported for
                  arrays, sparse matrices and dataframes
                - string (key-based): only supported for dataframes. No keys
                  other than strings are allowed.
    axis : int, default=0
        The axis along which `X` will be subsampled. ``axis=0`` will select
        rows while ``axis=1`` will select columns.
Copy link
Member

jnothman left a comment

With regards to your improved docstring, @NicolasHug, what about the scalar string?

@NicolasHug

This comment has been minimized.

Copy link
Contributor

NicolasHug commented Jun 26, 2019

I understand that scalar is just another name for integer, in this context?

@glemaitre

This comment has been minimized.

Copy link
Contributor Author

glemaitre commented Jul 1, 2019

I updated the documentation with @NicolasHug proposal.

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jul 1, 2019

Can we confirm the behaviour with a single string?

@glemaitre

This comment has been minimized.

Copy link
Contributor Author

glemaitre commented Jul 2, 2019

scalar string will return a pandas series

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jul 2, 2019

Considering that the documentation is the only point of contention here, and not the specification, I'll merge. Thanks @glemaitre

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jul 2, 2019

I should add that I'm okay with the most recent docstring update.

@jnothman jnothman merged commit 5f37ec1 into scikit-learn:master Jul 2, 2019
16 checks passed
16 checks passed
LGTM analysis: C/C++ No code changes detected
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python 8 new alerts
Details
ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: doc Your tests passed on CircleCI!
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
codecov/patch 96.96% of diff hit (target 96.75%)
Details
codecov/project 96.82% (+0.06%) compared to 06f6626
Details
scikit-learn.scikit-learn Build #20190702.33 succeeded
Details
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
Details
scikit-learn.scikit-learn (Linux py35_np_atlas) Linux py35_np_atlas succeeded
Details
scikit-learn.scikit-learn (Linux pylatest_conda) Linux pylatest_conda succeeded
Details
scikit-learn.scikit-learn (Windows py35_32) Windows py35_32 succeeded
Details
scikit-learn.scikit-learn (Windows py37_64) Windows py37_64 succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda) macOS pylatest_conda succeeded
Details
koenvandevelde added a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.