Add support for DataFrame#loc[..., NDArray[bool]] #862

Merged
Dr-Irv merged 1 commit into pandas-dev:main from dataframe-loc-boolean-mask on Feb 12, 2024

Conversation

skatsuta
Contributor

@skatsuta skatsuta commented Feb 11, 2024

  • Tests added: Please use assert_type() to assert the type of any return value

Problem

Currently, only list[bool] or Series[bool] can be used as a boolean vector for column masking in DataFrame#loc.
For example, Pyright reports the following error when using NDArray[bool] for column masking:

# test.py
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
df.loc[:, df.columns.isin(["col1"])]

$ pyright test.py
/Users/skatsuta/src/github.com/SmartDriveInc/mobility-risk-score/ml/test.py
  /Users/skatsuta/src/github.com/SmartDriveInc/mobility-risk-score/ml/test.py:4:1 - error: No overloads for "__getitem__" match the provided arguments (reportCallIssue)
  /Users/skatsuta/src/github.com/SmartDriveInc/mobility-risk-score/ml/test.py:4:1 - error: Argument of type "tuple[slice, np_ndarray_bool]" cannot be assigned to parameter "idx" of type "tuple[Scalar, slice]" in function "__getitem__"
    "tuple[slice, np_ndarray_bool]" is incompatible with "tuple[Scalar, slice]"
      Tuple entry 1 is incorrect type
        Type "slice" cannot be assigned to type "Scalar"
          "slice" is incompatible with "str"
          "slice" is incompatible with "bytes"
          "slice" is incompatible with "date"
          "slice" is incompatible with "datetime"
          "slice" is incompatible with "timedelta" (reportArgumentType)
2 errors, 0 warnings, 0 informations

However, such usage is perfectly valid from a pandas perspective.
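To illustrate that claim, here is a minimal runtime sketch (not part of the PR itself) showing that the boolean NumPy array returned by `df.columns.isin(...)` selects the same columns as the equivalent `list[bool]` and `Series[bool]` masks. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Index.isin returns a NumPy array of booleans, one entry per column label.
mask = df.columns.isin(["col1"])
assert isinstance(mask, np.ndarray) and mask.dtype == bool

# The same column mask expressed in the three forms discussed above.
by_array = df.loc[:, mask]
by_list = df.loc[:, mask.tolist()]
by_series = df.loc[:, pd.Series(mask, index=df.columns)]

# All three select only "col1" at runtime.
assert by_array.equals(by_list)
assert by_array.equals(by_series)
assert list(by_array.columns) == ["col1"]
```

The type error reported by Pyright therefore reflects a gap in the stubs rather than a problem in the calling code.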

Solution

Allow MaskType to be used as a boolean vector for column masking in DataFrame#loc.
MaskType contains np_ndarray_bool in addition to Series[bool], so the above code will no longer generate a type error.

In turn, the explicit Series[bool] entry is removed from the type annotation, since MaskType already includes it.

Test cases for the new usage are also added.
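A test along these lines could look like the sketch below. It is an illustration rather than the exact test added in this PR: the function name is hypothetical, and the expected static type `pd.DataFrame` is an assumption about what the matching overload returns.

```python
import pandas as pd

try:
    from typing import assert_type  # available in Python 3.11+
except ImportError:
    from typing_extensions import assert_type  # fallback for older versions


def test_loc_column_mask_ndarray() -> None:
    # Hypothetical test name; mirrors the example from the problem description.
    df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
    mask = df.columns.isin(["col1"])  # NumPy boolean array

    # A type checker verifies the declared type statically; at runtime
    # assert_type simply returns its first argument unchanged.
    result = assert_type(df.loc[:, mask], pd.DataFrame)

    assert isinstance(result, pd.DataFrame)
    assert list(result.columns) == ["col1"]
```

Running the function under pytest covers the runtime behavior, while running Pyright or mypy over the same file covers the annotation.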

@skatsuta skatsuta marked this pull request as ready for review February 11, 2024 14:01
@skatsuta skatsuta changed the title from "ENH: Add support for DataFrame#loc[..., NDArray[bool]]" to "Add support for DataFrame#loc[..., NDArray[bool]]" Feb 11, 2024
Collaborator

@Dr-Irv Dr-Irv left a comment

Thanks @skatsuta

I created an issue #868 to see if we can do a similar change elsewhere.

@Dr-Irv Dr-Irv merged commit 6fd6145 into pandas-dev:main Feb 12, 2024
13 checks passed
@skatsuta skatsuta deleted the dataframe-loc-boolean-mask branch February 13, 2024 14:38