
Bitwise operation weirdness #9016

Closed
benkuhn opened this issue Dec 5, 2014 · 10 comments
Labels
API Design Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations


@benkuhn

benkuhn commented Dec 5, 2014

It seems that for integers x and y, pd.Series([x]) | pd.Series([y]) returns pd.Series([x | y]).astype(bool). That is a reasonable semantic, but pd.Series([x]) & pd.Series([y]) seems to return pd.Series([(x & y) % 2]) == 1, which is much weirder. Is there a justification for this? I couldn't find one in the documentation (& is hard to search for!), so it may be a bug.
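To make the difference concrete, here is a small plain-Python sketch (no pandas involved) of the semantics described above, evaluated for x = 4 and y = 15. The helper names are hypothetical, and and_lowest_bit simply encodes the behavior the report observed. The grouping matters: in Python, x & y % 2 without parentheses parses as x & (y % 2).

```python
def or_then_bool(x: int, y: int) -> bool:
    # Bitwise OR, then coerce the whole result to bool (nonzero -> True).
    return bool(x | y)


def and_then_bool(x: int, y: int) -> bool:
    # Bitwise AND, then coerce the whole result to bool (the intuitive semantic).
    return bool(x & y)


def and_lowest_bit(x: int, y: int) -> bool:
    # The observed '&' behavior: only the lowest-order bit of x & y survives.
    return (x & y) % 2 == 1


x, y = 4, 15
print(or_then_bool(x, y))    # True  (4 | 15 == 15)
print(and_then_bool(x, y))   # True  (4 & 15 == 4)
print(and_lowest_bit(x, y))  # False (4 is even)
```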

@jreback
Contributor

jreback commented Dec 5, 2014

Can you show a full copy-pastable example, plus the output of pd.show_versions()?

These are the bitwise operators (and integers will get coerced).

@benkuhn
Author

benkuhn commented Dec 5, 2014

Here:

In [43]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Darwin
OS-release: 13.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.1
nose: 1.3.4
Cython: 0.20.2
numpy: 1.9.0
scipy: 0.14.0
statsmodels: 0.6.1
IPython: 2.2.0
sphinx: None
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.7
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None

In [44]: pd.Series(range(10)) | pd.Series([2] * 10)
Out[44]:
0    True
1    True
2    True
3    True
4    True
5    True
6    True
7    True
8    True
9    True
dtype: bool

In [45]: pd.Series(range(10)) & pd.Series([15] * 10)
Out[45]:
0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
8    False
9     True
dtype: bool

It seems that | does a standard bitwise OR before coercing to bool, but & does a standard bitwise AND and then coerces only the lowest-order bit to bool. I'd expect the last result to be nine Trues (only 0 should be False).
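For comparison, a minimal NumPy sketch of the result expected here: take the full bitwise AND first, and only then coerce the entire integer to bool, so every nonzero value becomes True.

```python
import numpy as np

x = np.arange(10)
y = np.full(10, 15)

# Full bitwise AND first: 15 is 0b1111, so every value in 0..9 passes through unchanged.
anded = x & y
# Then coerce the whole integer to bool: only 0 maps to False.
expected = anded.astype(bool)
print(expected)
```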

@jreback
Contributor

jreback commented Dec 5, 2014

NumPy does this (and that's what Series is doing):

In [2]: np.arange(10) & 1            
Out[2]: array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

In [3]: np.arange(10) | 1
Out[3]: array([1, 1, 3, 3, 5, 5, 7, 7, 9, 9])

Why are you doing this? These are boolean comparison ops, not bitwise ops.
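As context for the "boolean comparison ops" remark: the usual pandas use of & and | is to combine boolean masks produced by comparisons, where they act as element-wise logical AND/OR. A minimal sketch; the comparisons must be parenthesized because & and | bind more tightly than < and > in Python.

```python
import pandas as pd

s = pd.Series(range(10))

# Each comparison yields a bool Series; & combines them element-wise (logical AND).
between = (s > 2) & (s < 7)
# | combines two masks element-wise (logical OR).
outside = (s < 2) | (s > 7)

print(s[between].tolist())
print(s[outside].tolist())
```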

@benkuhn
Author

benkuhn commented Dec 5, 2014

Sorry, I edited the post while you were viewing it. With 1 on the right-hand side I would expect that behavior. With 15, however, I wouldn't, and NumPy meets that expectation (while pandas doesn't):

In [49]: pd.Series(range(10)) & pd.Series([15] * 10)
Out[49]:
0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
8    False
9     True
dtype: bool

In [50]: np.arange(10) & (np.array([15] * 10))
Out[50]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

We weren't actually trying to do this, but we had an errant type conversion somewhere else that caused strange behavior due to weak typing in pandas, and I found this while investigating what other strange behaviors might result.
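One hypothesis consistent with both pandas outputs above (an assumption, not a reading of the pandas 0.14.1 source): if the right-hand operand were coerced to bool before the bitwise operation, both 15 and 2 would become True, i.e. 1. Then & reduces to x & 1 (the lowest-order bit) and | reduces to x | 1 (never zero). A NumPy sketch of that hypothetical coercion:

```python
import numpy as np

x = np.arange(10)

# Hypothetical: coerce the right operand to bool first (15 -> True, 2 -> True),
# then use it as the integer 1 in the bitwise operation.
rhs_and = np.full(10, 15).astype(bool).astype(x.dtype)
rhs_or = np.full(10, 2).astype(bool).astype(x.dtype)

and_result = (x & rhs_and).astype(bool)  # x & 1: keeps only the lowest bit
or_result = (x | rhs_or).astype(bool)    # x | 1: never zero, so always True

print(and_result)
print(or_result)
```

Under this assumption the sketch reproduces Out[45] (alternating False/True) and Out[44] (all True).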

@shoyer
Member

shoyer commented Dec 6, 2014

I can reproduce this on master. I agree that we should probably stick to what numpy does and not coerce to booleans.

I'm not sure I'd call this a bug exactly, but it does seem like an undesirable inconsistency.

@jreback jreback added API Design Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations labels Dec 6, 2014
@jreback jreback added this to the 0.16.0 milestone Dec 6, 2014
@tvyomkesh
Contributor

I tried a quick-and-dirty hack and got back the desired behavior; the tests pass too. I'm sharing the snippet to gather expert comments and advice. I'd be happy to work on a pull request if this looks like the right direction. Thanks.

diff --git a/pandas/core/ops.py b/pandas/core/ops.py
index 46429cc..59be639 100644
--- a/pandas/core/ops.py
+++ b/pandas/core/ops.py
@@ -657,10 +657,16 @@ def _bool_method_SERIES(op, name, str_rep):
         if isinstance(other, pd.Series):
             name = _maybe_match_name(self, other)

-            other = other.reindex_like(self).fillna(False)
-            return self._constructor(na_op(self.values, other.values),
+            if other.dtype == self.dtype and self.dtype == 'int64':
+                other = other.reindex_like(self).fillna(False)
+                return self._constructor(na_op(self.values, other.values),
                                      index=self.index,
                                      name=name).fillna(False)
+            else:
+                other = other.reindex_like(self).fillna(False).astype(bool)
+                return self._constructor(na_op(self.values, other.values),
+                                     index=self.index,
+                                     name=name).fillna(False).astype(bool)
         elif isinstance(other, pd.DataFrame):
             return NotImplemented
         else:

Some outputs after applying the above fix:

In [2]: a = Series([True, False, True], list('bca'))
In [3]: b = Series([])
In [6]: a & b
Out[6]:
b    False
c    False
a    False
dtype: bool

In [7]: c = Series(range(10))
In [8]: d = Series([15] * 10)
In [11]: c & d
Out[11]:
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64

Thanks,

@jreback
Contributor

jreback commented Jan 20, 2015

This should be handled in na_op, with the fill value determined by dtype.
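A rough sketch of what "fill value determined by dtype" could look like, written as a standalone function on the public Series API rather than inside pandas' internal na_op (the helper name bitwise_op and its structure are hypothetical). It aligns the right operand to the left's index, fills missing labels with 0 when both sides are integers and with False otherwise, and treats the operation as logical only in the non-integer case. The usage lines mirror the two sessions shown earlier in the thread.

```python
import operator

import pandas as pd
from pandas.api.types import is_integer_dtype


def bitwise_op(left, right, op=operator.and_):
    """Hypothetical helper: a bitwise op whose fill value depends on dtype."""
    both_int = is_integer_dtype(left.dtype) and is_integer_dtype(right.dtype)
    # Missing labels are filled with 0 for int/int, otherwise with False.
    fill = 0 if both_int else False
    aligned = right.reindex(left.index, fill_value=fill)
    if both_int:
        # Integer inputs keep NumPy's integer semantics (no bool coercion).
        return pd.Series(op(left.values, aligned.values), index=left.index)
    # Boolean or mixed inputs are treated as logical operations.
    result = op(left.values.astype(bool), aligned.values.astype(bool))
    return pd.Series(result, index=left.index)


c = pd.Series(range(10))
d = pd.Series([15] * 10)
print(bitwise_op(c, d).tolist())  # integer result, as NumPy would give

a = pd.Series([True, False, True], index=list("bca"))
b = pd.Series([], dtype=bool)
print(bitwise_op(a, b).tolist())  # missing labels filled with False
```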

@tvyomkesh
Contributor

@jreback Thanks. I will look into this and try to work on a PR.

@tvyomkesh
Contributor

@jreback For now, I have made the changes in wrapper() itself instead of na_op(), because it looked to me like wrapper() controls the input and output fill values and dtype. Once wrapper() takes care of this, na_op() seems to do the right thing by itself, so I did not have to change it to get the expected behavior. Happy to revise this if the design needs to change. Thanks.

@jreback
Contributor

jreback commented Feb 5, 2015

closed by #9338

@jreback jreback closed this as completed Feb 5, 2015

4 participants