Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: replace not converting dtypes #3907

Closed
jreback opened this issue Jun 14, 2013 · 27 comments · Fixed by #3909
Closed

BUG: replace not converting dtypes #3907

jreback opened this issue Jun 14, 2013 · 27 comments · Fixed by #3909
Labels
Dtype Conversions Unexpected or buggy dtype conversions
Milestone

Comments

@jreback
Copy link
Contributor

jreback commented Jun 14, 2013

I believe replace should do a convert_objects(copy=False) after replacement to provide dtype soft-conversion

In [1]: df = DataFrame([['foo','bar','bah'],['bar','foo','bah']])

In [2]: df
Out[2]: 
     0    1    2
0  foo  bar  bah
1  bar  foo  bah

In [3]: m = { 'foo' : 1, 'bar' : 2, 'bah' : 3 }

In [5]: df.replace(m)
Out[5]: 
   0  1  2
0  1  3  2
1  2  1  3

In [6]: df.replace(m).dtypes
Out[6]: 
0    object
1    object
2    object
dtype: object

In [8]: df.replace(m).convert_objects().dtypes
Out[8]: 
0    int64
1    int64
2    int64
dtype: object
@jreback
Copy link
Contributor Author

jreback commented Jun 14, 2013

@cpcloud for you!

@ghost ghost assigned cpcloud Jun 14, 2013
@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

there's an infer_types param that does this already...should it default to True?

@jreback
Copy link
Contributor Author

jreback commented Jun 14, 2013

yes...didn't realize it was False default (and I would put copy=False)

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

i thought copy didn't do anything in internals.py, changed recently?

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

yep docs said true but i think forgot to change when i finally submitted the original pr

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

ah i c copy is doing something ...

@jreback
Copy link
Contributor Author

jreback commented Jun 14, 2013

actually this is tricky....because if say nothing changes then you don't need to copy, otherwise I guess you do.....maye just leave copy==True (if needs conversion, IOW there are ANY object blocks);
acutally this could be an option to convert, e.g. convert='needed' ?

@jreback
Copy link
Contributor Author

jreback commented Jun 14, 2013

Because I believe you copy a block in replace (if needed), so no point in copying twice...

@jreback
Copy link
Contributor Author

jreback commented Jun 14, 2013

then again, say you don't actually replace anything....should STILL copy I guess

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

i copy if inplace=True (the default)...

@jreback
Copy link
Contributor Author

jreback commented Jun 14, 2013

ok...then its easy, I would drop infer_types; and always convert_objects, passing copy=not inplace

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

sounds good

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

btw is there an iterable_but_not_string(obj) function somewhere?

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

easy enough for me to put one in common.py if not...

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

would do in a separate PR

@jreback
Copy link
Contributor Author

jreback commented Jun 14, 2013

com.is_list_like

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

thanks

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

problem is that now convert clobbers strings in object arrays with nans when u originally replace nans with numbers..

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

i.e., mixed_frame.fillna(value=10.0) and mixed_frame.replace(nan, 10.0) should be equivalent

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

nvm fixed it

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

@jreback i see that is_list_like has a check for hasattr(arg, 'len'). Should that be hasattr(arg, '__len__')?

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

also _is_sequence looks redundant with is_list_like should I remove that as well?

@jreback
Copy link
Contributor Author

jreback commented Jun 14, 2013

in theory yes, but I'll be it works anyhow (and python may do some translation on that)

@jreback
Copy link
Contributor Author

jreback commented Jun 14, 2013

check where things are used....I think _is_sequence is slightly different

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

hm, hasattr([], 'len') != hasattr([], '__len__') so prolly should chnage that one

@cpcloud
Copy link
Member

cpcloud commented Jun 14, 2013

hm or just removed entirely

@jreback
Copy link
Contributor Author

jreback commented Jun 15, 2013

thanks for the fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants