BUG/EA: groupby on an EA should return the EA type #23227

Closed
jreback opened this issue Oct 18, 2018 · 4 comments

@jreback
Contributor

commented Oct 18, 2018

In [1]: df = pd.DataFrame({'Int': pd.Series([1, 2, 3], dtype='Int64'), 'A': [1, 2, 1]})
   ...: df
   ...: 
Out[1]: 
  Int  A
0   1  1
1   2  2
2   3  1

In [2]: df.groupby('A').Int.sum()
Out[2]: 
A
1    4
2    2
Name: Int, dtype: int64

Out[2] should have dtype Int64; in other words, the aggregation result needs to be passed through the EA's _from_sequence.
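
To make the request concrete, here is a minimal sketch of what "passing the result through _from_sequence" could look like from the outside. The helper name `cast_back_to_ea` is hypothetical and is not part of pandas; it only illustrates the idea of rebuilding the aggregated values with the original extension dtype, and says nothing about where the real fix belongs inside groupby. Newer pandas versions may already return Int64 here, in which case the cast is a no-op.

```python
import pandas as pd

df = pd.DataFrame({"Int": pd.Series([1, 2, 3], dtype="Int64"), "A": [1, 2, 1]})


def cast_back_to_ea(result, original):
    """Hypothetical helper (not a pandas API): rebuild `result` using the
    extension dtype of `original` via the EA's `_from_sequence`."""
    # Look up the array class that backs the extension dtype.
    ea_type = original.dtype.construct_array_type()
    # Hand the aggregated scalars back to the EA constructor.
    values = ea_type._from_sequence(list(result), dtype=original.dtype)
    return pd.Series(values, index=result.index, name=result.name)


raw = df.groupby("A")["Int"].sum()       # int64 at the time of this report
fixed = cast_back_to_ea(raw, df["Int"])  # same values, dtype Int64
print(fixed.dtype)
```

This only works when the aggregated values are valid for the original dtype, which holds for sum on integers but not for every reduction.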

@5hirish

Contributor

commented Oct 23, 2018

@jreback Hi, I'm working on this issue. From a code walkthrough I found that in the method _python_agg_general (https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L834), at result, counts = self.grouper.agg_series(obj, f), obj has dtype Int64 but result has dtype int64. I then checked agg_series, and it looks like a Cython converter casts the result from object to int64 after computing the aggregation. So I'm a bit lost as to where I should call _from_sequence after checking whether the data is an ExtensionArray.

@jreback

Contributor Author

commented Oct 23, 2018

so look at _wrap_aggregated_output

there are a number of paths here though

@5hirish 5hirish referenced this issue Oct 24, 2018

Merged

BUG: GroupBy return EA dtype #23318

4 of 4 tasks complete
@5hirish

Contributor

commented Oct 25, 2018

@jreback

  1. Tests under pandas/tests/arrays/, such as test_integer.py/test_preserve_dtypes and test_integer.py/test_reduce_to_float, are failing with:
AssertionError: Attributes are different
Attribute "dtype" are different
     [left]:  Int64      <-- result
     [right]: float64  <-- expected

The input data for these tests uses the custom dtype Int64, and the tests expect the result to be converted to int64 or float64. Does pandas treat Int64 as int64? I ask because Int64 satisfies both of the following conditions: if is_extension_array_dtype(dtype): and if numeric_only and is_numeric_dtype(dtype) or not numeric_only

  2. Tests under pandas/tests/sparse/, such as test_groupby.py/test_groupby_includes_fill_value, are failing with:
AssertionError: Attributes are different
Attribute "dtype" are different
    [left]:  Sparse[float64, nan]   <-- result
    [right]: float64                       <-- expected
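
As a quick check on the dtype question above, the following sketch confirms that Int64 passes both predicates, and shows why a blanket cast back to the original dtype cannot work for every reduction: float results such as a mean do not fit into Int64. The helper `can_cast_back` is hypothetical, written only for this illustration, and is not how pandas decides the result dtype.

```python
import pandas as pd
from pandas.api.types import is_extension_array_dtype, is_numeric_dtype

dtype = pd.Series([1, 2, 3], dtype="Int64").dtype

# Int64 is an extension dtype and is also reported as numeric.
is_ea = is_extension_array_dtype(dtype)
is_num = is_numeric_dtype(dtype)
print(is_ea, is_num)


def can_cast_back(values, dtype):
    """Hypothetical check (not a pandas API): can `values` be stored
    in an array of the given dtype without an unsafe cast?"""
    try:
        pd.array(values, dtype=dtype)
    except (TypeError, ValueError):
        return False
    return True


sum_ok = can_cast_back([4, 2], dtype)       # integer results from a sum
mean_ok = can_cast_back([1.5, 2.0], dtype)  # non-integral floats from a mean
print(sum_ok, mean_ok)
```

Under this reading, a test like test_reduce_to_float expects float64 because the reduction's values are no longer representable in the original dtype, so any cast back would need a fallback for that case.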
@TomAugspurger

Contributor

commented Oct 25, 2018

@5hirish probably best to keep the conversation in the PR now that it's open.
