Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] EHN add parameter axis in safe_indexing to slice rows and columns #14035

Merged
merged 18 commits into from Jul 2, 2019

Conversation

glemaitre
Copy link
Contributor

@glemaitre glemaitre commented Jun 6, 2019

Allows to index columns as well as rows using the code from ColumnTransformer.
Added some tests for the supported type.

This PR should simplify the review of #14028

NB: the functions have been moved and renamed but not modified.

@glemaitre glemaitre changed the title EHN add parameter axis in safe_indexing to slice rows and columns [MRG] EHN add parameter axis in safe_indexing to slice rows and columns Jun 6, 2019
@thomasjpfan thomasjpfan self-requested a review Jun 7, 2019
@thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Jun 7, 2019

Thank you for the PR! This would be useful for permutation importance too! #13146

@glemaitre
Copy link
Contributor Author

@glemaitre glemaitre commented Jun 7, 2019

@NicolasHug This PR would need to be in for the PDP (and the permutation importance then ;)
I still need to cover the raise but you can have a look at the naming and if you are fine with the naming

Copy link
Member

@NicolasHug NicolasHug left a comment

A few comments but mostly LGTM.

_check_key_type isn't tested anywhere?

sklearn/utils/__init__.py Show resolved Hide resolved
sklearn/utils/__init__.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
sklearn/utils/__init__.py Outdated Show resolved Hide resolved
sklearn/utils/__init__.py Show resolved Hide resolved
@glemaitre
Copy link
Contributor Author

@glemaitre glemaitre commented Jun 11, 2019

_check_key_type isn't tested anywhere?

This is tested through _get_column

@NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented Jun 11, 2019

This is tested through _get_column

Only very indirectly then. I won't push it more because it wasn't even tested before but I don't think that's great. Unit tests can also be used to illustrate the behavior of a function.

@glemaitre
Copy link
Contributor Author

@glemaitre glemaitre commented Jun 11, 2019

Only very indirectly then. I won't push it more because it wasn't even tested before but I don't think that's great. Unit tests can also be used to illustrate the behavior of a function.

Good point

@thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Jun 11, 2019

Looks like the pandas tests are not being reported again to codecov again 🤔

@jnothman
Copy link
Member

@jnothman jnothman commented Jun 11, 2019

@glemaitre
Copy link
Contributor Author

@glemaitre glemaitre commented Jun 12, 2019

Is there any sense in just using safe_indexing(X.T, mask)?

Do you mean to use the same indexing but with a transposed matrix in all cases?

My intent was to merge the code from the ColumnTransformer which is generic to get columns and can be reused in several places (PDP, permutation importance). So the supported types are different from the current safe_indexing for the rows.

@jnothman
Copy link
Member

@jnothman jnothman commented Jun 13, 2019

sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
sklearn/utils/__init__.py Outdated Show resolved Hide resolved
sklearn/utils/__init__.py Show resolved Hide resolved
sklearn/utils/__init__.py Outdated Show resolved Hide resolved
Copy link
Member

@jnothman jnothman left a comment

I thought in ColumnTransformer we only needed to handle callables at fit time and not actually when indexing. I am surprised we need callable support here.

sklearn/utils/__init__.py Show resolved Hide resolved
sklearn/utils/__init__.py Outdated Show resolved Hide resolved
Copy link
Member

@NicolasHug NicolasHug left a comment

Minor comments + some potential bug (?)

Apart from that LGTM

sklearn/utils/__init__.py Outdated Show resolved Hide resolved
sklearn/utils/__init__.py Show resolved Hide resolved
sklearn/utils/tests/test_utils.py Outdated Show resolved Hide resolved
Copy link
Member

@NicolasHug NicolasHug left a comment

I would propose the following docstring which IMHO is clearer. LGTM anyway.

    Parameters
    ----------
    X : array-like, sparse-matrix, list, pandas.DataFrame, pandas.Series
        Data from which to sample rows, items or columns.
    indices : array-like
        - When ``axis=0``, indices need to be an array of integer.
        - When ``axis=1``, indices can be one of:
            - integer: output is 1D, unless `X` is sparse.
            - container: lists, slices, boolean masks: output is 2D.
              Supported data types for containers:
                - integer or boolean (positional): supported for
                  arrays, sparse matrices and dataframes
                - string (key-based): only supported for dataframes. No keys
                  other than strings are allowed.
    axis : int, default=0
        The axis along which `X` will be subsampled. ``axis=0`` will select
        rows while ``axis=1`` will select columns.

Copy link
Member

@jnothman jnothman left a comment

With regards to your improved docstring, @NicolasHug, what about the scalar string?

@NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented Jun 26, 2019

I understand that scalar is just another name for integer, in this context?

@glemaitre
Copy link
Contributor Author

@glemaitre glemaitre commented Jul 1, 2019

I updated the documentation with @NicolasHug proposal.

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 1, 2019

Can we confirm the behaviour with a single string?

@glemaitre
Copy link
Contributor Author

@glemaitre glemaitre commented Jul 2, 2019

scalar string will return a pandas series

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 2, 2019

Considering that the documentation is the only point of contention here, and not the specification, I'll merge. Thanks @glemaitre

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 2, 2019

I should add that I'm okay with the most recent docstring update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants