
Clarify is_bool_indexer for Extension dtypes #22326

Closed
TomAugspurger opened this issue Aug 13, 2018 · 4 comments

3 participants
@TomAugspurger
Contributor

commented Aug 13, 2018

What do we want here?

In [1]: import pandas as pd

In [2]: pd.core.common.is_bool_indexer(pd.Categorical([True, True]))
Out[2]: False

Working around this in #22325.
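Below is a minimal sketch of one possible check. It assumes the goal is to treat a Categorical whose categories are all booleans, with no missing values, as a boolean mask. The helper name `is_boolean_mask_like` is hypothetical: it is not pandas' `is_bool_indexer` and does not reproduce it. It only uses the public `pandas.api.types.infer_dtype` and `is_bool_dtype` functions.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype, is_bool_dtype


def is_boolean_mask_like(key) -> bool:
    """Hypothetical helper: True for boolean arrays and for Categoricals
    whose categories are all booleans (and which hold no missing values)."""
    if isinstance(key, pd.Categorical):
        # A mask must be fully True/False, so reject missing values (code -1).
        if (key.codes == -1).any():
            return False
        # Look through the categorical to the type of its categories.
        return infer_dtype(key.categories, skipna=False) == "boolean"
    # Anything else: coerce to an ndarray and check for a real bool dtype.
    return is_bool_dtype(np.asarray(key).dtype)
```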

@TomAugspurger

Contributor Author

commented Aug 13, 2018

Likewise for is_bool_dtype.

TomAugspurger referenced this issue Aug 13, 2018 in the merged pull request "SparseArray is an ExtensionArray" #22325 (4 of 4 tasks complete).
@TomAugspurger

Contributor Author

commented Aug 21, 2018

This manifests as a failure in .loc when indexing with a CategoricalIndex holding booleans:

In [3]: pd.Series([1, 2, 3]).loc[pd.Categorical([True, False, True])]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-0532517e2922> in <module>()
----> 1 pd.Series([1, 2, 3]).loc[pd.Categorical([True, False, True])]

~/sandbox/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1500
   1501             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1502             return self._getitem_axis(maybe_callable, axis=axis)
   1503
   1504     def _is_scalar_access(self, key):

~/sandbox/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1902                     raise ValueError('Cannot index with multidimensional key')
   1903
-> 1904                 return self._getitem_iterable(key, axis=axis)
   1905
   1906             # nested tuple slicing

~/sandbox/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1203             # A collection of keys
   1204             keyarr, indexer = self._get_listlike_indexer(key, axis,
-> 1205                                                          raise_missing=False)
   1206             return self.obj._reindex_with_indexers({axis: [keyarr, indexer]},
   1207                                                    copy=True, allow_dups=True)

~/sandbox/pandas/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1159         self._validate_read_indexer(keyarr, indexer,
   1160                                     o._get_axis_number(axis),
-> 1161                                     raise_missing=raise_missing)
   1162         return keyarr, indexer
   1163

~/sandbox/pandas/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1244                 raise KeyError(
   1245                     u"None of [{key}] are in the [{axis}]".format(
-> 1246                         key=key, axis=self.obj._get_axis_name(axis)))
   1247
   1248             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Index([True, False, True], dtype='object')] are in the [index]"

That should return Series([1, 3], index=[0, 2]), the same result as indexing with the boolean mask [True, False, True].
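Until .loc recognizes such keys, one workaround is to convert the Categorical key to a plain NumPy boolean array before indexing. This is a sketch that assumes every value in the key is a non-missing boolean. The helper name `as_bool_mask` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def as_bool_mask(key: pd.Categorical) -> np.ndarray:
    """Hypothetical helper: materialize a boolean Categorical as an ndarray[bool]."""
    if key.isna().any():
        raise ValueError("cannot build a boolean mask from missing values")
    # np.asarray decodes codes back to category values; astype(bool) makes
    # the result a real bool array even if the categories are object dtype.
    return np.asarray(key).astype(bool)


s = pd.Series([1, 2, 3])
key = pd.Categorical([True, False, True])
result = s.loc[as_bool_mask(key)]  # values 1 and 3 at index labels 0 and 2
```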

@jorisvandenbossche

Member

commented Aug 22, 2018

Should we have something on the dtype similar to what we have for numeric data (the recently added ExtensionDtype._is_numeric), to indicate that a dtype is considered boolean?

Otherwise, those inspection functions would need to know how to inspect each dtype (for a categorical, by checking the dtype of the categories).

I am a bit worried, though, about a possible proliferation of such attributes.

(That said, a categorical with boolean categories doesn't sound that useful. We could also say that boolean indexing requires an actual boolean dtype.)
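Here is a minimal, self-contained sketch of the dtype-level flag idea, modelled loosely on `_is_numeric`. The classes `BaseDtypeSketch`, `BoolDtypeSketch` and `CategoricalDtypeSketch`, the `_is_boolean` attribute, and `is_bool_dtype_sketch` are all hypothetical stand-ins, not pandas classes. The point is that the inspection function only reads a flag and never needs to know how a particular dtype stores its values.

```python
class BaseDtypeSketch:
    """Hypothetical stand-in for an extension dtype base class."""
    # Default: a dtype is not boolean unless it says so.
    _is_boolean = False


class BoolDtypeSketch(BaseDtypeSketch):
    # A dedicated boolean dtype simply sets the flag.
    _is_boolean = True


class CategoricalDtypeSketch(BaseDtypeSketch):
    def __init__(self, categories):
        self.categories = list(categories)

    @property
    def _is_boolean(self):
        # The categorical decides for itself by looking at its categories,
        # so callers never need categorical-specific logic.
        return bool(self.categories) and all(
            isinstance(c, bool) for c in self.categories
        )


def is_bool_dtype_sketch(dtype) -> bool:
    # The inspection function just reads the flag (False if absent).
    return getattr(dtype, "_is_boolean", False)
```

The trade-off raised above still applies: every such flag is one more attribute that extension dtype authors have to know about.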

@TomAugspurger

Contributor Author

commented Aug 22, 2018

I suppose this is what the .kind attribute is for?

Right now Categorical.dtype.kind is always O:

In [8]: pd.Categorical([True, False]).dtype.kind
Out[8]: 'O'

If we changed that to return .categories.dtype.kind, we wouldn't need an ._is_boolean attribute, though we would need to implement a BooleanIndex.
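Below is a sketch of the `.kind`-based alternative. It assumes a categorical's kind is derived from its categories. `proposed_kind` is a hypothetical helper, not a pandas API. It round-trips the categories through a Python list so that boolean categories report the NumPy kind `'b'` even in pandas versions that store them in an object-dtype Index.

```python
import numpy as np
import pandas as pd


def proposed_kind(cat: pd.Categorical) -> str:
    """Hypothetical: report the NumPy kind of the categories instead of 'O'."""
    # tolist() yields plain Python scalars, so NumPy re-infers the dtype.
    return np.asarray(cat.categories.tolist()).dtype.kind


cat = pd.Categorical([True, False])
current = cat.dtype.kind        # 'O', as shown above
proposed = proposed_kind(cat)   # 'b': the kind of the boolean categories
is_boolean = proposed == "b"    # what a kind-based check could test
```

For string categories this sketch yields NumPy's `'U'` rather than pandas' usual `'O'`. A real implementation would have to settle details like this.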

TomAugspurger added five commits to TomAugspurger/pandas that referenced this issue Sep 11, 2018

jreback added this to the 0.24.0 milestone Sep 13, 2018

TomAugspurger added a commit that referenced this issue Sep 20, 2018

Sup3rGeo added a commit to Sup3rGeo/pandas that referenced this issue Oct 1, 2018
