ENH: Enable indexing with nullable Boolean #31591

Merged
merged 70 commits into from
Feb 22, 2020

Conversation

dsaxton
Member

@dsaxton dsaxton commented Feb 3, 2020

closes #31503

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

I think from the discussion in #31503 that this is something people want to allow.
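
For context, a minimal sketch of the behaviour this PR enables, assuming a pandas version that includes the change. The Series values are made up for illustration. A nullable Boolean mask containing pd.NA can be passed directly as an indexer, and the NA positions are treated as False:

```python
import pandas as pd

# Illustrative data, not taken from the PR
s = pd.Series([10, 20, 30])
mask = pd.array([True, False, pd.NA], dtype="boolean")

# The NA entry in the mask is treated as False, so only the first row is kept
selected = s[mask]
print(selected.tolist())

# .loc accepts the same kind of mask
selected_loc = s.loc[mask]
print(selected_loc.tolist())
```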

Contributor

@TomAugspurger TomAugspurger left a comment

Thanks for working on this. I think the tests you've deleted should all be updated instead.

doc/source/whatsnew/v1.1.0.rst (outdated)
pandas/core/indexing.py (outdated)
pandas/tests/extension/base/getitem.py
pandas/tests/indexing/test_na_indexing.py
@TomAugspurger added the Indexing, NA - MaskedArrays, and Missing-data labels Feb 3, 2020
@TomAugspurger added this to the 1.1 milestone Feb 3, 2020
Daniel Saxton added 8 commits February 3, 2020 17:35
pandas/core/indexers.py (outdated)
Member

@jorisvandenbossche jorisvandenbossche left a comment

Some small comments, thanks for working on this!

@@ -124,20 +125,17 @@ def test_setitem_mask_raises(self, data, box_in_series):
with pytest.raises(IndexError, match="wrong length"):
data[mask] = data[0]

def test_setitem_mask_boolean_array_raises(self, data, box_in_series):
# missing values in mask
def test_setitem_mask_boolean_array_with_na(self, data, box_in_series):
Member

Isn't this test duplicating the test_setitem_mask above?

Member Author

Somewhat, yes

Member

You can simply remove it then, I think?
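
As a rough illustration of the case these setitem tests exercise (a sketch with made-up data, not the test code from the PR), assigning through a nullable Boolean mask only touches positions where the mask is True. Positions where the mask is NA are left alone, assuming a pandas version that includes this change:

```python
import pandas as pd

# Illustrative nullable integer array and a mask with one missing value
arr = pd.array([1, 2, 3, 4], dtype="Int64")
mask = pd.array([True, False, pd.NA, False], dtype="boolean")

# Only the position where the mask is True is overwritten;
# the NA position behaves like False
arr[mask] = 0

values = [int(x) for x in arr]
print(values)
```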

@@ -30,13 +31,6 @@ will raise a ``ValueError``.
mask = pd.array([True, False, pd.NA], dtype="boolean")
s[mask]

The missing values will need to be explicitly filled with True or False prior
to using the array as a mask.
Member

Can you keep this example but reword it as something like "if you want different behaviour, you can fill manually with fillna(True)" ?
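
A minimal sketch of what such a reworded docs example could look like, assuming a pandas version that includes this change. The data is illustrative. Filling the mask explicitly lets the caller choose whether NA positions are selected or dropped:

```python
import pandas as pd

s = pd.Series([10, 20, 30])
mask = pd.array([True, False, pd.NA], dtype="boolean")

# Treat missing mask values as True: the NA position is selected
keep_na = s[mask.fillna(True)]

# Treat missing mask values as False: the NA position is dropped
drop_na = s[mask.fillna(False)]

print(keep_na.tolist())
print(drop_na.tolist())
```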

@@ -47,6 +47,33 @@ Bug fixes

.. ---------------------------------------------------------------------------

Indexing with Nullable Boolean Arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Member

Can you move this above bug fixes section, and make this of the same level? (so using ~~~ for underline)
(it's not a bug fix but a deliberate API change)

@@ -2207,10 +2208,12 @@ def check_bool_indexer(index: Index, key) -> np.ndarray:
"the indexed object do not match)."
)
result = result.astype(bool)._values
- else:
+ elif is_object_dtype(key):
# key might be sparse / object-dtype bool, check_array_indexer needs bool array
Member

Can you check this comment? (about the sparse from the code comment)

@dsaxton
Member Author

dsaxton commented Feb 19, 2020

@jorisvandenbossche It seems that check_array_indexer works okay with sparse input, so maybe the comment is outdated?

In [1]: import numpy as np
   ...: import pandas as pd
   ...: from pandas.core.indexing import check_bool_indexer
   ...: from pandas.core.indexers import check_array_indexer
   ...:
   ...: arr = pd.arrays.SparseArray([True, False])
   ...: idx = np.array([1, 2])
   ...:
   ...: check_array_indexer(idx, arr)
   ...:
Out[1]: array([ True, False])

Contributor

@jreback jreback left a comment

@jorisvandenbossche
Member

It seems that check_array_indexer works okay with sparse input, so maybe the comment is outdated?

That seems so then, yes. Last question: can you update the comment then?
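
For reference, the object-dtype case that the code comment refers to can be sketched as follows. The data is made up, and the `np.asarray(..., dtype=bool)` call is an assumption about the kind of conversion that branch performs, not a quote from the PR. An object-dtype array holding only Booleans is still accepted as a mask, but it has to be converted to a real bool array before it can be used for indexing:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30])

# An object-dtype array that happens to contain only Booleans
key = np.array([True, False, True], dtype=object)

# Explicit conversion to a proper bool ndarray (illustrative)
as_bool = np.asarray(key, dtype=bool)

# Indexing with either form selects the same rows
from_object = s[key]
from_bool = s[as_bool]
print(from_object.tolist(), from_bool.tolist())
```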

@jreback jreback merged commit b9bcdc3 into pandas-dev:master Feb 22, 2020
@jreback
Contributor

jreback commented Feb 22, 2020

thanks @dsaxton

very nice!

@jreback
Contributor

jreback commented Feb 22, 2020

@dsaxton

hmm not sure why this didn’t backport automatically
can u push a PR

@dsaxton
Member Author

dsaxton commented Feb 23, 2020

@dsaxton

hmm not sure why this didn’t backport automatically
can u push a PR

I'm not too familiar with how backporting works, do I just put a PR against the 1.0.x branch instead of master?

@jreback
Contributor

jreback commented Feb 23, 2020

Yes: check out the 1.0.x branch, cherry-pick the commit, then push to a new remote branch. Before you do that, let me see if I can get the bot to do it.

@jreback
Contributor

jreback commented Feb 23, 2020

@meeseeksdev backport to 1.0.x

@lumberbot-app

lumberbot-app bot commented Feb 23, 2020

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
$ git checkout 1.0.x
$ git pull
  2. Cherry pick the first parent branch of this PR on top of the older branch:
$ git cherry-pick -m1 b9bcdc30765e88718c792b27ab9f3e27054f9fc7
  3. You will likely have some merge/cherry-pick conflicts here, fix them and commit:
$ git commit -am 'Backport PR #31591: ENH: Enable indexing with nullable Boolean'
  4. Push to a named branch:
$ git push YOURFORK 1.0.x:auto-backport-of-pr-31591-on-1.0.x
  5. Create a PR against branch 1.0.x, I would have named this PR:

"Backport PR #31591 on branch 1.0.x"

And apply the correct labels and milestones.

Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

If these instructions are inaccurate, feel free to suggest an improvement.

@jreback
Contributor

jreback commented Feb 23, 2020

@dsaxton ok instructions above

@dsaxton
Member Author

dsaxton commented Feb 23, 2020

@dsaxton ok instructions above

Created this PR: #32192

Labels
Indexing, Missing-data, NA - MaskedArrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: query / boolean selection with nullable dtypes with NAs
4 participants