Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

BUG: partial boolean indexing fails with dtype = int #2746

Closed
jreback opened this Issue Jan 24, 2013 · 7 comments

Comments

Projects
None yet
2 participants
Contributor

jreback commented Jan 24, 2013

this fails because we are putmasking on a dtype=int ndarray
need to cast if this is the case (in series.where)

import pandas
df = pandas.DataFrame(index=[1,2])
df['test'] = [1,2]
df['test'][[True, False]] = pandas.Series([0],index=[1])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-7738933c68bb> in <module>()
----> 1 df['test'][[True,False]] = pandas.Series([0],index=[1])

/mnt/home/jreback/pandas/pandas/core/series.pyc in __setitem__(self, key, value)
    677         if _is_bool_indexer(key):
    678             key = self._check_bool_indexer(key)
--> 679             self.where(~key,value,inplace=True)
    680         else:
    681             self._set_with(key, value)

/mnt/home/jreback/pandas/pandas/core/series.pyc in where(self, cond, other, inplace)
    630             raise ValueError('Length of replacements must equal series length')
    631 
--> 632         np.putmask(ser, ~cond, other)
    633 
    634         return None if inplace else ser

TypeError: array cannot be safely cast to required type

Contributor

jreback commented Feb 15, 2013

@stephenwlin we didn't close this for some reason....any thoughts?

Contributor

stephenwlin commented Feb 15, 2013

ok i can take a look

Contributor

stephenwlin commented Feb 19, 2013

@jreback I thought about this and there's probably a one or two line fix that can handle this particular case...but I think the most general solution here might be to implement a function called _maybe_unify which takes a list (or tuple, maybe...maybe either or) of dtypes and returns the most general dtype that can hold all of them.

This would probably be useful in a lot of other places where similar unifications are required, like joins or sets (I think a lot of this logic is duplicated in different places right now and not complete either, mostly only handling ints and floats cases). For example, I'm fairly sure that right now, __setitem__ on DataFrame (either directly or through the ix property) either doesn't even allow assigning values that don't fit in the existing column types, or downcasts them implicit to the existing types, rather than upcasting appropriately to fit them which is probably the right thing to do instead: implementing such upcasts would be a lot easier with such a utility function.

(Possibly _maybe_promote could even be rewritten to be in terms of _maybe_unify, since most of what the former is doing is basically a restricted form of the latter...however, there's also some extra work in computing/modifying a fill_value so it might take some thought to keep it clean and while preserving that.)

The one thing I can think of that might be kind of tricky would be the case of datetime64[ns]: right now, they don't upcast correctly to objects, so you can't unify them with other types: instead, you want to allow other objects (like dates, string, etc.) to implicit cast to datetime64[ns] on assignment, if possible, as well as convert NaN to NaT. This is currently handled as a special case in __setitem__ I think...maybe it can be done with a more general solution that doesn't introduce local hacks.

What do you think? Overkill?

Contributor

stephenwlin commented Feb 19, 2013

(On second thought, might not be worth it to handle the >2 dtypes case...I don't think it happens anywhere practically)

Contributor

jreback commented Feb 28, 2013

I tried to fix this, and my solution would work, but have to change the data and dtype of a series inplace is currently not possible, in np 1.7 would work though http://docs.scipy.org/doc/numpy/reference/generated/numpy.copyto.html#numpy.copyto

(once series is refactored to have a _data as its internals, then this is easy)

Contributor

jreback commented Feb 28, 2013

got it....in place updating

ser.dtype = result.dtype ser[:] = result

@jreback jreback added a commit that referenced this issue Mar 1, 2013

@jreback jreback Merge pull request #2950 from jreback/intna_2746
BUG: in-place conversion of integer series to float (on putmasking), GH #2746.
325d119

@jreback jreback closed this Mar 1, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment