I'm using numpy 1.6.2, pandas 0.9.1, and Python 2.7.2. I see strange behavior when using numpy.logical_and() depending on how I create a Series object. For example:
>>> import numpy
>>> import pandas
>>> series = pandas.Series([1, 2, 3])
>>> x = pandas.Series([True]).reindex_like(series).fillna(True)
>>> y = pandas.Series(True, index=series.index)
>>> numpy.logical_and(series, y)
>>> numpy.logical_and(series, x)
Traceback (most recent call last):
File "<ipython-input-10-e2050a2015bf>", line 1, in <module>
What is the difference between x and y here that is causing the AttributeError?
Also, I originally posted this as a question on stackoverflow. There are comments saying this works with pandas 0.9.0 and numpy 1.8. I haven't verified this for myself yet. However, my scenario is using the most recent stable releases of both projects.
x is object dtype and y is bool dtype. This works if you do numpy.logical_and(series, x.astype(bool))
reindex_like is going to introduce a bunch of NaNs, so that's going to convert the Series to object dtype
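A minimal sketch of the cast suggested above. It builds the object-dtype Series directly instead of going through reindex_like and fillna, since the dtype those calls produce may differ across pandas versions. The values are illustrative.

```python
import numpy
import pandas

series = pandas.Series([1, 2, 3])

# Stand-in for x: boolean values stored with object dtype.
x = pandas.Series([True, True, True], index=series.index, dtype=object)
y = pandas.Series(True, index=series.index)

print(x.dtype)  # object
print(y.dtype)  # bool

# Casting to bool first hands the ufunc a plain boolean array.
result = numpy.logical_and(series, x.astype(bool))
print(result.tolist())
```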
maybe we should call maybe_convert_objects at the end of fillna?
Did something change recently to change this in 0.9.1? I've been told that the snippet above works in 0.9.0. I don't know how to get this exact version to test for myself though.
As a side note, is a Series automatically object dtype any time it has NaN? So any time I call fillna() on a Series, will it implicitly convert the type?
Right now fillna does NOT convert the type, but reindex_like does. Because NaN is a float, after reindex_like the Series becomes mixed type, so it gets converted to object dtype
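To see the dtype change described here, a small sketch that inspects the Series before and after reindex_like. The name flags is only for illustration, and the dtype after a later fillna is left unchecked because it may depend on the pandas version.

```python
import pandas

series = pandas.Series([1, 2, 3])
flags = pandas.Series([True])

# Labels 1 and 2 are missing from flags, so they come back as NaN.
reindexed = flags.reindex_like(series)

print(flags.dtype)      # bool
print(reindexed.dtype)  # object
print(int(reindexed.isnull().sum()))  # 2
```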
To be clear: this is a wart due to pandas's "best efforts" implementation of missing data using NumPy. I would expect the same code to fail on 0.9.0
Is there a reason why series & y is not an option? That should work
series & y
Oh, I see what you mean. So is this technically a bug then?
The reason I'm not using series & y is because I'm actually taking a dictionary of operations to perform and combining them myself. I might be approaching the problem in the wrong way (I'm trying to replace legacy custom code with some Pandas functionality).
You can find more information about my exact situation on this stackoverflow post.
Maybe this subtle issue should be mentioned in the docs for reindex_like()?
Also, how can I get 0.9.0 and test this? The person responding on my stackoverflow post claimed this worked with pandas 0.9.0 AND numpy 1.8. I'm not sure what changed in numpy between 1.6.2 and 1.8, so the difference might come from numpy and this might not be 'broken' in pandas 0.9.0.
Shouldn't you use operator.and_ instead of numpy.logical_and which goes through NumPy's ufunc machinery (and fails)?
Yes, I can use operator.and_. I was only using numpy.logical_and because I assumed that the numpy version would be faster and possibly more efficient. Maybe this is not necessarily the case?
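For the dictionary-of-operations case mentioned earlier, one possible sketch using operator.and_ with functools.reduce. The dictionary keys and conditions are hypothetical and are not taken from the original code.

```python
import operator
from functools import reduce

import pandas

series = pandas.Series([1, 2, 3])

# Hypothetical dictionary of named conditions, each a boolean Series.
conditions = {
    "greater_than_one": series > 1,
    "less_than_three": series < 3,
}

# operator.and_ is the function form of `&`, so this folds the masks
# together with the Series' own `&` instead of a direct NumPy ufunc call.
combined = reduce(operator.and_, conditions.values())
print(combined.tolist())  # [False, True, False]
```

This makes no claim about speed relative to numpy.logical_and; it only shows that the masks can be combined without calling the ufunc directly.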
Should I close this? I guess it's not actually a bug, just a subtle side-effect of reindex_like(). Maybe should just be noted in docs and closed?
Yeah let's close the issue. If you get energetic and want to add a caveat in the docs about using ufuncs on boolean arrays that have had missing data, go for it. Maybe on the gotchas page