_reduce promotes numerics to strings when axis=0 and mixed type. #6806

Closed
dalejung opened this issue Apr 4, 2014 · 3 comments · Fixed by #6814
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations

@dalejung
Contributor

dalejung commented Apr 4, 2014

import pandas as pd
import numpy as np

df = pd.DataFrame({'num':[1,2,3], 'num2': np.random.randn(3), 'strings':list('abc')})
df.sum().values # array(['6', '-1.47818918058', 'abc'], dtype=object)

The issue is that the axis=0 path tries to coerce the results back to the column dtypes, and building an np.array from the mixed results promotes the numerics to strings.
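To see the promotion in isolation, here is a minimal sketch using plain NumPy. The values are illustrative stand-ins for the per-column results, not the random ones above. Without an explicit dtype, np.array picks a common type for a mix of numbers and strings, and that common type is a string dtype; passing dtype=object keeps each element as it is.

```python
import numpy as np

# Illustrative reduction results, one per column (not the values from the report)
mixed = [6, -1.478, 'abc']

promoted = np.array(mixed)            # NumPy picks a common string dtype
kept = np.array(mixed, dtype=object)  # each element keeps its Python type

print(promoted.dtype.kind)            # 'U' on Python 3 (unicode strings)
print([type(x).__name__ for x in kept])
```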

Note that:

df.T.sum(axis=1).values # array([6, -1.4781891805780178, 'abc'], dtype=object)

Maybe com._coerce_to_dtypes should just return the list?

@jreback
Contributor

jreback commented Apr 4, 2014

I think we could just make the returned array object dtype.

I hate doing that though, so maybe just returning a list is better, as I think Series construction will handle it better.
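As a rough check of the "return a list" idea, the sketch below builds a Series directly from a plain Python list of mixed results. The values and index labels are illustrative, and this only shows the public constructor, not the internal code path under discussion.

```python
import pandas as pd

# Illustrative per-column results, passed as a plain list (no np.array step)
results = [6, -1.478, 'abc']
s = pd.Series(results, index=['num', 'num2', 'strings'])

print(s.dtype)  # object
print(type(s['num']).__name__, type(s['num2']).__name__)
```

With a list as input, the numeric entries are stored as numbers inside the object-dtype Series rather than being turned into strings first.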

@jreback jreback added this to the 0.14.0 milestone Apr 5, 2014
@dalejung
Contributor Author

dalejung commented Apr 5, 2014

So thinking about this more.

In [24]: df = pd.DataFrame({
   ....:     'string_data': ['a', 'b', 'c', 'd', 'e'],
   ....:     'bool_data': [True, True, False, False, False],
   ....:     'int_data': [10, 20, 30, 40, 50],
   ....: })

In [25]: df.sum()
Out[25]:
bool_data       True
int_data         150
string_data    abcde
dtype: object

In [26]: df.T.sum(axis=1)
Out[26]:
bool_data          2
int_data         150
string_data    abcde
dtype: object

In [27]: np.sum([True,True])
Out[27]: 2

Note that the bool_data is converted back to its original dtype for axis=0. Should df.op() be equivalent to df.T.op(axis=1)?
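One possible reading of Out[25], sketched with plain NumPy rather than pandas internals: summing a boolean column counts its True values, and casting that count back to bool collapses any nonzero count to True.

```python
import numpy as np

bool_col = np.array([True, True, False, False, False])

total = bool_col.sum()   # counts the True values
print(int(total))        # 2
print(bool(total))       # True: the count cast back to the column's dtype
```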

@jreback
Contributor

jreback commented Apr 6, 2014

I think the coercion for bool data should only convert back to bool if the result is 1/0, else leave it as int (that's why it's True).

It is evaluating fine.

It then tries to coerce to float64 (kind of a legacy thing); if that fails, it will coerce item by item (which is relatively new; I added it because timedelta arithmetic was not coercing properly).
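The flow described in this comment can be sketched roughly as follows. Both function names are hypothetical and this is not the actual pandas implementation; it only illustrates the float64 attempt, the item-by-item fallback, and the proposed rule of restoring bool only for 0/1 results.

```python
import numpy as np

def coerce_item_by_item(results, dtypes):
    """Hypothetical helper: coerce each result toward its column's dtype."""
    coerced = []
    for value, dtype in zip(results, dtypes):
        kind = np.dtype(dtype).kind
        if kind == 'b':
            # Proposed rule: restore bool only when the result is 0 or 1
            value = bool(value) if value in (0, 1) else int(value)
        elif kind in 'iu':
            value = int(value)
        elif kind == 'f':
            value = float(value)
        # Other dtypes (e.g. object) are left untouched in this sketch
        coerced.append(value)
    return coerced  # a plain list, so no NumPy type promotion happens

def coerce_results(results, dtypes):
    """Hypothetical wrapper: try float64 first, then fall back per item."""
    try:
        return np.asarray(results, dtype='float64')
    except (ValueError, TypeError):
        return coerce_item_by_item(results, dtypes)

# Results for bool_data, int_data, string_data from the example above
print(coerce_results([2, 150, 'abcde'], [bool, int, object]))
```

Under this rule the bool_data result stays 2 instead of becoming True, which matches what df.T.sum(axis=1) reports.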

dalejung added a commit to dalejung/pandas that referenced this issue Apr 6, 2014
case.
BUG: Remove creating np.array. This allows us to use our own logic for
promoting dtypes. pandas-dev#6806

BUG: Only convert 0/1 ints to bool. pandas-dev#6806

DOC: added release notes
jeffreystarr pushed a commit to jeffreystarr/pandas that referenced this issue Apr 28, 2014
case.
BUG: Remove creating np.array. This allows us to use our own logic for
promoting dtypes. pandas-dev#6806

BUG: Only convert 0/1 ints to bool. pandas-dev#6806

DOC: added release notes