BUG: replace not converting dtypes #3907

Closed
jreback opened this Issue Jun 14, 2013 · 27 comments

Comments

Projects
None yet
2 participants
Contributor

jreback commented Jun 14, 2013

I believe replace should do a convert_objects(copy=False) after replacement to provide dtype soft-conversion

In [1]: df = DataFrame([['foo','bar','bah'],['bar','foo','bah']])

In [2]: df
Out[2]: 
     0    1    2
0  foo  bar  bah
1  bar  foo  bah

In [3]: m = { 'foo' : 1, 'bar' : 2, 'bah' : 3 }

In [5]: df.replace(m)
Out[5]: 
   0  1  2
0  1  3  2
1  2  1  3

In [6]: df.replace(m).dtypes
Out[6]: 
0    object
1    object
2    object
dtype: object

In [8]: df.replace(m).convert_objects().dtypes
Out[8]: 
0    int64
1    int64
2    int64
dtype: object
Contributor

jreback commented Jun 14, 2013

@cpcloud for you!

cpcloud was assigned Jun 14, 2013

Member

cpcloud commented Jun 14, 2013

there's an infer_types param that does this already...should it default to True?

Contributor

jreback commented Jun 14, 2013

yes...didn't realize it was False default (and I would put copy=False)

Member

cpcloud commented Jun 14, 2013

i thought copy didn't do anything in internals.py, changed recently?

Member

cpcloud commented Jun 14, 2013

yep docs said true but i think forgot to change when i finally submitted the original pr

Member

cpcloud commented Jun 14, 2013

ah i c copy is doing something ...

Contributor

jreback commented Jun 14, 2013

actually this is tricky....because if say nothing changes then you don't need to copy, otherwise I guess you do.....maye just leave copy==True (if needs conversion, IOW there are ANY object blocks);
acutally this could be an option to convert, e.g. convert='needed' ?

Contributor

jreback commented Jun 14, 2013

Because I believe you copy a block in replace (if needed), so no point in copying twice...

Contributor

jreback commented Jun 14, 2013

then again, say you don't actually replace anything....should STILL copy I guess

Member

cpcloud commented Jun 14, 2013

i copy if inplace=True (the default)...

Contributor

jreback commented Jun 14, 2013

ok...then its easy, I would drop infer_types; and always convert_objects, passing copy=not inplace

Member

cpcloud commented Jun 14, 2013

sounds good

Member

cpcloud commented Jun 14, 2013

btw is there an iterable_but_not_string(obj) function somewhere?

Member

cpcloud commented Jun 14, 2013

easy enough for me to put one in common.py if not...

Member

cpcloud commented Jun 14, 2013

would do in a separate PR

Contributor

jreback commented Jun 14, 2013

com.is_list_like

Member

cpcloud commented Jun 14, 2013

thanks

Member

cpcloud commented Jun 14, 2013

problem is that now convert clobbers strings in object arrays with nans when u originally replace nans with numbers..

Member

cpcloud commented Jun 14, 2013

i.e., mixed_frame.fillna(value=10.0) and mixed_frame.replace(nan, 10.0) should be equivalent

Member

cpcloud commented Jun 14, 2013

nvm fixed it

Member

cpcloud commented Jun 14, 2013

@jreback i see that is_list_like has a check for hasattr(arg, 'len'). Should that be hasattr(arg, '__len__')?

Member

cpcloud commented Jun 14, 2013

also _is_sequence looks redundant with is_list_like should I remove that as well?

Contributor

jreback commented Jun 14, 2013

in theory yes, but I'll be it works anyhow (and python may do some translation on that)

Contributor

jreback commented Jun 14, 2013

check where things are used....I think _is_sequence is slightly different

Member

cpcloud commented Jun 14, 2013

hm, hasattr([], 'len') != hasattr([], '__len__') so prolly should chnage that one

Member

cpcloud commented Jun 14, 2013

hm or just removed entirely

jreback closed this in #3909 Jun 15, 2013

Contributor

jreback commented Jun 15, 2013

thanks for the fix

cpcloud was unassigned by wesm Oct 12, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment