Bool Ops: One bug hides another #22092

Closed
jbrockmendel opened this issue Jul 28, 2018 · 2 comments · Fixed by #22293
@jbrockmendel
Member

Boolean ops (&, |, ^) are less well-tested than most of the others. Going through them, several raise for unexpected reasons:

ser = pd.Series([True, False, True])
idx = pd.Index([False, True, True])

Starting with ser & idx, we'd expect to get Series([False, False, True]). Instead we get a ValueError because we accidentally go down the path that treats the Index as a scalar:

>>> ser & idx
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 1481, in wrapper
    res_values = na_op(self.values, other)
  File "pandas/core/ops.py", line 1439, in na_op
    if not isna(y):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
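
For reference, the expected element-wise result can be sketched by handing the Series a plain boolean ndarray instead of the Index. This is only a workaround sketch, not the fix that was eventually merged; the names result and expected are illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.Series([True, False, True])
idx = pd.Index([False, True, True])

# Convert the Index to a boolean ndarray so the op never has to
# decide how to treat an Index operand.
result = ser & np.asarray(idx, dtype=bool)

# The element-wise result described above.
expected = pd.Series([False, False, True])
print(result.equals(expected))
```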

OK, that's fixed easily enough. What about the reversed operation idx & ser? That means something entirely different, since Index.__and__ is an alias for Index.intersection, so we'd expect either a Series or Index containing both False and True. Instead we get a ValueError:

>>> idx & ser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/indexes/base.py", line 2658, in __and__
    return self.intersection(other)
  File "pandas/core/indexes/base.py", line 2823, in intersection
    if self.is_monotonic and other.is_monotonic:
  File "pandas/core/indexes/base.py", line 1407, in is_monotonic
    return self.is_monotonic_increasing
  File "pandas/core/indexes/base.py", line 1424, in is_monotonic_increasing
    return self._engine.is_monotonic_increasing
  File "pandas/_libs/index.pyx", line 214, in pandas._libs.index.IndexEngine.is_monotonic_increasing.__get__
    self._do_monotonic_check()
  File "pandas/_libs/index.pyx", line 228, in pandas._libs.index.IndexEngine._do_monotonic_check
    values = self._get_index_values()
  File "pandas/_libs/index.pyx", line 244, in pandas._libs.index.IndexEngine._get_index_values
    return self.vgetter()
  File "pandas/core/indexes/base.py", line 1847, in <lambda>
    return self._engine_type(lambda: self._ndarray_values, len(self))
  File "pandas/core/base.py", line 752, in _ndarray_values
    if is_extension_array_dtype(self):
  File "pandas/core/dtypes/common.py", line 1718, in is_extension_array_dtype
    arr_or_dtype = pandas_dtype(arr_or_dtype)
  File "pandas/core/dtypes/common.py", line 2026, in pandas_dtype
    if dtype in [object, np.object_, 'object', 'O']:
  File "pandas/core/generic.py", line 1726, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The comparison dtype in [object, np.object_, 'object', 'O'] raises when dtype is array-like. If we change that condition to is_hashable(dtype) and dtype in [object, np.object_, 'object', 'O'], we get Index([False, True, True]), which is what we expected from the Series op, but not from the Index op.
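
A minimal sketch of that guard in isolation, assuming pandas.api.types.is_hashable as the hashability check. The helper name looks_like_object_spec and the list name OBJECT_SPECS are hypothetical and do not appear in pandas; this is not the code in pandas_dtype itself.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_hashable

# The same candidates used in the membership test quoted above.
OBJECT_SPECS = [object, np.object_, "object", "O"]


def looks_like_object_spec(dtype):
    """Hypothetical helper: True only for hashable values found in OBJECT_SPECS."""
    # is_hashable short-circuits for array-likes such as Series, so the
    # `in` test (which would compare element-wise and then need a single
    # truth value) is never reached for them.
    return is_hashable(dtype) and dtype in OBJECT_SPECS


ser = pd.Series([True, False, True])

print(looks_like_object_spec(object))
print(looks_like_object_spec("O"))
print(looks_like_object_spec("int64"))
print(looks_like_object_spec(ser))
```

Because a Series is not hashable, the guard returns False for it without ever evaluating the ambiguous membership test.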

gfyoung added the Bug, Dtype Conversions, and Algos labels Jul 30, 2018
@gfyoung
Member

gfyoung commented Jul 30, 2018

Weird behavior = surprise features 🙂

@makbigc
Contributor

makbigc commented Aug 12, 2018

2nd bug:
The & operation between an Index of Boolean values and a Series of Boolean values is expected to be element-wise, while the Index.intersection method is set-wise. If an element-wise logical operation is added, should it go through a new route rather than Index.intersection?

Moreover, there is a bug in idx & idx:

In [23]: idx1 = pd.Index([True, True, False, False])

In [24]: idx2 = pd.Index([True, False, True, False])

In [25]: idx1 & idx2
Out[25]: Index([True, True, False, False], dtype='object')

The same happens with the ^ and | operators.
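
To make the element-wise reading concrete, here is a sketch that computes the three operations on the underlying boolean arrays and wraps the results back into an Index. Going through NumPy is an assumption made for illustration, not how pandas routes these operators; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

idx1 = pd.Index([True, True, False, False])
idx2 = pd.Index([True, False, True, False])

# Work on plain boolean ndarrays so the operators are element-wise
# rather than set-wise (Index.intersection and friends).
a = np.asarray(idx1, dtype=bool)
b = np.asarray(idx2, dtype=bool)

elementwise_and = pd.Index(a & b)
elementwise_or = pd.Index(a | b)
elementwise_xor = pd.Index(a ^ b)

print(list(elementwise_and))  # [True, False, False, False]
print(list(elementwise_or))   # [True, True, True, False]
print(list(elementwise_xor))  # [False, True, True, False]
```

Compare the first line of output with Out[25] above, which returns idx1 unchanged.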

jreback added this to the 0.24.0 milestone Aug 22, 2018
jbrockmendel pushed a commit that referenced this issue Sep 18, 2018
* Fix bug #GH22092

* Update v0.24.0.txt

* Update v0.24.0.txt

* Update ops.py

* Update test_operators.py

* Update v0.24.0.txt

* Update test_operators.py
aeltanawy pushed a commit to aeltanawy/pandas that referenced this issue Sep 20, 2018
…-dev#22293)

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this issue Oct 1, 2018
…-dev#22293)
