Pandas numeric_only behavior in full reduce in Python 2 #83

Closed
simon-mo opened this issue Sep 24, 2018 · 0 comments
Labels
pandas 🤔 Weird Behaviors of Pandas

Comments

@simon-mo
Collaborator

Pandas full-reduce-like operations (e.g. max, min, mean, ...) take a numeric_only argument. By default, it will:

  • Try to operate on full axis if possible
  • If the first option raises an error, operate on numeric values only.
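
A rough model of the two steps above in plain Python, for illustration only. `reduce_with_fallback` is a hypothetical helper, not pandas code, and it assumes the full reduction signals failure with a `TypeError`, as Python 3 does for mixed int/str comparisons:

```python
import numbers


def reduce_with_fallback(values, func):
    """Hypothetical helper: try the full reduction, then fall back to numeric values."""
    try:
        # Step 1: operate on every value.
        return func(values)
    except TypeError:
        # Step 2: the full reduction failed, so keep only numeric values.
        numeric = [v for v in values if isinstance(v, numbers.Number)]
        return func(numeric)


result = reduce_with_fallback([1, 2, 3, "a"], max)
```

The fallback only triggers when the first attempt raises, which is the assumption that breaks down in the Python 2 case described further down.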

In Modin, we decided on the following behavior:

numeric_only = True if axis else kwargs.get("numeric_only", False)

because of the asynchronous nature of our computation model.
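
The rule above, wrapped in a small function so the cases can be checked. `resolve_numeric_only` is a made-up name for this sketch and is not Modin's actual API:

```python
def resolve_numeric_only(axis, **kwargs):
    # Hypothetical wrapper around the one-line rule quoted in this issue.
    # A truthy axis (e.g. axis=1, reducing over rows) always forces numeric_only=True;
    # otherwise the caller's value is used, defaulting to False.
    return True if axis else kwargs.get("numeric_only", False)


over_rows = resolve_numeric_only(1)                          # True
over_columns = resolve_numeric_only(0)                       # False
over_columns_numeric = resolve_numeric_only(0, numeric_only=True)  # True
```

Note that with `axis=1` an explicit `numeric_only=False` from the caller is ignored under this rule.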

However, this leads to a difference from pandas in Python 2. In Python 2:

In [1]: max([1,2,3,'a'])
Out[1]: 'a'
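
For readers on Python 3, where that call raises `TypeError`, the Python 2 result can be approximated with a sort key. This is only a sketch: it assumes the CPython 2 rule that numbers order before strings, and `py2_like_key` is a hypothetical helper:

```python
def py2_like_key(value):
    # Strings sort after numbers; values within the same group compare normally.
    return (isinstance(value, str), value)


largest = max([1, 2, 3, "a"], key=py2_like_key)
print(largest)  # prints: a
```

Because the mixed comparison succeeds instead of raising, a "try the full axis first" strategy never reaches its numeric-only fallback.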

In a mixed type dataframe:

   col1  col2  col3 col4
0     1     4   8.0    a
1     2     5   9.4    b
2     3     6  10.1    c
3     4     7  11.3    d

taking max over rows will lead to

0    a
1    b
2    c
3    d
dtype: object
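
For comparison, a plain-Python sketch of what a numeric-only row maximum gives on the same data, i.e. the result when `numeric_only` is forced to `True` for `axis=1`. The list-of-rows layout and `numeric_row_max` are assumptions for illustration; no pandas or Modin call is involved:

```python
import numbers

# The four rows of the mixed-type dataframe shown above (col1..col4).
rows = [
    [1, 4, 8.0, "a"],
    [2, 5, 9.4, "b"],
    [3, 6, 10.1, "c"],
    [4, 7, 11.3, "d"],
]


def numeric_row_max(row):
    # Ignore non-numeric cells (col4 here) before taking the maximum.
    numeric = [v for v in row if isinstance(v, numbers.Number)]
    return max(numeric)


row_maxima = [numeric_row_max(r) for r in rows]
print(row_maxima)  # [8.0, 9.4, 10.1, 11.3]
```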

This is not expected behavior; therefore, we choose not to follow pandas behavior in this situation.

@simon-mo simon-mo added the pandas 🤔 Weird Behaviors of Pandas label Sep 24, 2018
dchigarev pushed a commit to dchigarev/modin that referenced this issue Aug 25, 2020