ENH: Enable indexing with nullable Boolean #31591
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,6 +36,11 @@ For example: | |
ser["2014"] | ||
ser.loc["May 2015"] | ||
|
||
Indexing with Nullable Boolean Arrays | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Previously indexing with a nullable Boolean array would raise a ``ValueError``, however this is now permitted with ``NA`` being treated as ``False``. (:issue:`31503`) | ||
Review comment: Add an example of the before and the after (style this after similar notes in 1.0.0).
Reply: Note that it raised only when there were missing values. |
||
|
||
.. _whatsnew_110.enhancements.other: | ||
|
||
Other enhancements | ||
|
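The behaviour described in the whatsnew entry above can be sketched as a short example. This is an illustration only, assuming a pandas version that includes this change; the variable names are made up for the example and are not part of the PR.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({"a": [1, 2, 3]})

# Nullable boolean mask with a missing value in the last position.
mask = pd.array([True, False, pd.NA], dtype="boolean")

# With this change, NA is treated as False instead of raising a ValueError,
# so only the first row is selected in both cases.
series_result = s[mask]
frame_result = df[mask]

print(series_result.tolist())
print(frame_result["a"].tolist())
```

As the review comment notes, a nullable boolean mask without missing values already worked before this change; only masks containing ``NA`` raised.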
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,6 +10,7 @@ | |
from pandas.core.dtypes.common import ( | ||
is_array_like, | ||
is_bool_dtype, | ||
is_extension_array_dtype, | ||
is_integer_dtype, | ||
is_list_like, | ||
) | ||
|
@@ -333,14 +334,11 @@ def check_array_indexer(array: AnyArrayLike, indexer: Any) -> Any: | |
... | ||
IndexError: Boolean index has wrong length: 3 instead of 2. | ||
|
||
A ValueError is raised when the mask cannot be converted to | ||
a bool-dtype ndarray. | ||
NA values are treated as False. | ||
Review comment: NA values in a boolean array. In other contexts, they should still raise. |
||
|
||
>>> mask = pd.array([True, pd.NA]) | ||
>>> pd.api.indexers.check_array_indexer(arr, mask) | ||
Traceback (most recent call last): | ||
... | ||
ValueError: Cannot mask with a boolean indexer containing NA values | ||
array([ True, False]) | ||
|
||
A numpy boolean mask will get passed through (if the length is correct): | ||
|
||
|
@@ -392,10 +390,17 @@ def check_array_indexer(array: AnyArrayLike, indexer: Any) -> Any: | |
|
||
dtype = indexer.dtype | ||
if is_bool_dtype(dtype): | ||
try: | ||
indexer = np.asarray(indexer, dtype=bool) | ||
except ValueError: | ||
|
||
raise ValueError("Cannot mask with a boolean indexer containing NA values") | ||
if is_extension_array_dtype(dtype): | ||
|
||
indexer = indexer.to_numpy(dtype=bool, na_value=False) | ||
else: | ||
try: | ||
|
||
indexer = np.asarray(indexer, dtype=bool) | ||
except ValueError: | ||
msg = ( | ||
"Cannot mask with a non-ExtensionArray boolean indexer " | ||
"containing missing values" | ||
) | ||
raise ValueError(msg) | ||
|
||
# GH26658 | ||
if len(indexer) != len(array): | ||
|
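A minimal sketch of the conversion that the new extension-array branch relies on, assuming a pandas version that includes this change. The call `to_numpy(dtype=bool, na_value=False)` is taken from the diff; `arr`, `mask`, `converted`, and `checked` are illustrative names only.

```python
import pandas as pd

arr = pd.array([1, 2])
mask = pd.array([True, pd.NA], dtype="boolean")

# The extension-array branch converts the nullable mask to a plain
# NumPy boolean array, filling missing values with False.
converted = mask.to_numpy(dtype=bool, na_value=False)

# The public helper performs the same conversion and also validates
# that the mask length matches the array being indexed.
checked = pd.api.indexers.check_array_indexer(arr, mask)

print(converted)
print(checked)
```

Passing `na_value=False` is what makes the cast to `bool` safe here: without a fill value, a masked array containing ``NA`` cannot be represented as a NumPy boolean array.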
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,6 +8,8 @@ | |
from pandas.util._decorators import Appender | ||
|
||
from pandas.core.dtypes.common import ( | ||
is_bool_dtype, | ||
is_extension_array_dtype, | ||
is_float, | ||
is_integer, | ||
is_iterator, | ||
|
@@ -2222,6 +2224,8 @@ def check_bool_indexer(index: Index, key) -> np.ndarray: | |
"the indexed object do not match)." | ||
) | ||
result = result.astype(bool)._values | ||
elif is_extension_array_dtype(key) and is_bool_dtype(key): | ||
Review comment: Why do we have this at all anymore? Is check_array_indexer not sufficient?
Reply: Yeah, good point, there's a lot of duplicate code happening. Will work on cleaning this up. |
||
result = key.to_numpy(dtype=bool, na_value=False) | ||
else: | ||
# key might be sparse / object-dtype bool, check_array_indexer needs bool array | ||
Review comment: The comment here seems to indicate that this was also done for sparse data (and not only object dtype data). But no tests are failing? This might need some checking if the comment was outdated (I think it was only recently added).
Reply: Can you check this comment? (about the sparse from the code comment) |
||
result = np.asarray(result, dtype=bool) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -929,3 +929,23 @@ def test_diff(): | |
result = s.diff() | ||
expected = pd.Series(expected) | ||
tm.assert_series_equal(result, expected) | ||
|
||
|
||
def test_nullable_boolean_mask_series(): | ||
Review comment: Should these tests be removed in favor of the modified tests? May be a bit redundant with the others checking for the same / similar behavior.
Reply: If they are truly redundant. |
||
s = pd.Series([1, 2, 3]) | ||
mask = pd.array([True, True, None], dtype="boolean") | ||
|
||
result = s[mask] | ||
expected = s.iloc[:2] | ||
|
||
tm.assert_series_equal(result, expected) | ||
|
||
|
||
def test_nullable_boolean_mask_frame(): | ||
df = pd.DataFrame({"a": [1, 2, 3]}) | ||
mask = pd.array([True, True, None], dtype="boolean") | ||
|
||
result = df[mask] | ||
expected = df.iloc[:2, :] | ||
|
||
tm.assert_frame_equal(result, expected) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -158,21 +158,21 @@ def test_getitem_boolean_array_mask(self, data): | |
result = pd.Series(data)[mask] | ||
self.assert_series_equal(result, expected) | ||
|
||
def test_getitem_boolean_array_mask_raises(self, data): | ||
|
||
def test_getitem_boolean_na_treated_as_false(self, data): | ||
mask = pd.array(np.zeros(data.shape, dtype="bool"), dtype="boolean") | ||
Review comment: Can you take here something with True's as well? (now it will give an empty result)
Reply: It would be good to run the test also for both a boolean array and a list as mask (to ensure the list works).
Reply: Looks like the list input may not be working properly, will work on fixing that.
Reply: @jorisvandenbossche So I think the issue with list masks containing bools and pd.NA was that Made an update there to recognize pd.NA and also updated the test; hopefully CI will still pass. The assumption that boolean indexers are ones that can be cast as numpy boolean arrays seems to happen in a lot of places (e.g., https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/base.py#L4147) so I could see this causing problems.
Reply: Hmm, the "old" code of still accepting object dtype makes this a bit more complex indeed. Maybe instead of casting to a numpy array, we could use
Reply: I think that should theoretically work in combination with the right change to |
||
mask[:2] = pd.NA | ||
|
||
msg = ( | ||
"Cannot mask with a boolean indexer containing NA values|" | ||
"cannot mask with array containing NA / NaN values" | ||
) | ||
with pytest.raises(ValueError, match=msg): | ||
data[mask] | ||
result = data[mask] | ||
expected = data[mask.fillna(False)] | ||
|
||
self.assert_extension_array_equal(result, expected) | ||
|
||
s = pd.Series(data) | ||
|
||
with pytest.raises(ValueError): | ||
s[mask] | ||
result = s[mask] | ||
expected = s[mask.fillna(False)] | ||
|
||
tm.assert_series_equal(result, expected) | ||
|
||
@pytest.mark.parametrize( | ||
"idx", | ||
|
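The equivalence that `test_getitem_boolean_na_treated_as_false` checks can be sketched outside the test suite. This is a hedged illustration, assuming a pandas version that includes this change; the `Int64` data and the mask layout (which includes some `True` values, as the reviewer asked) are made up for the example.

```python
import pandas as pd

data = pd.array([10, 20, 30, 40], dtype="Int64")

# Start from an all-False nullable mask, then add missing and True values.
mask = pd.array([False, False, False, False], dtype="boolean")
mask[:2] = pd.NA
mask[2:] = True

# Indexing with the NA-containing mask should match indexing with the
# same mask after the missing values are filled with False.
result = data[mask]
expected = data[mask.fillna(False)]

print(result.tolist())
print(expected.tolist())
```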
Review comment: ok on moving to 1.0.2