BUG: GroupBy return EA dtype #23318

5hirish · 2018-10-24T17:13:19Z

closes BUG/EA: groupby on an EA should return the EA type #23227
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2018-10-24T17:13:21Z

Hello @5hirish! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/groupby/groupby.py !
There are no PEP8 issues in the file pandas/tests/groupby/test_grouping.py !

5hirish · 2018-10-24T20:13:43Z

Ran the tests locally with pytest pandas/tests/groupby/test_grouping.py -v and pytest pandas/tests/groupby/ -v; both were successful.

5hirish · 2018-10-25T12:53:07Z

@jreback

Tests run with pytest pandas/tests/arrays/, such as test_integer.py/test_preserve_dtypes and test_integer.py/test_reduce_to_float, are failing because:

AssertionError: Attributes are different
Attribute "dtype" are different
     [left]:  Int64      <-- result
     [right]: float64  <-- expected

The test input data for the above tests uses the custom dtype Int64, and the tests expect it to be converted to int64 or float64. Does pandas consider Int64 as int64? I ask because Int64 satisfies both of the following conditions: if is_extension_array_dtype(dtype): and if numeric_only and is_numeric_dtype(dtype) or not numeric_only

Tests under pandas/tests/sparse/ like test_groupby.py/test_groupby_includes_fill_value are failing because:

AssertionError: Attributes are different           
Attribute "dtype" are different
    [left]:  Sparse[float64, nan]   <-- result
    [right]: float64                       <-- expected

TomAugspurger · 2018-10-25T14:08:50Z

For failures like the first one, you'll need to update expected to have the new dtype.

Does pandas consider Int64 as int64

No. Int64 and int64 are different. Int64 is our integer-NA dtype.
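
As a small illustration of the distinction (a sketch only, assuming a pandas version that provides pd.array and the public pandas.api.types helpers; the variable names are made up and nothing here is taken from the PR diff), the nullable Int64 dtype is an extension dtype that also counts as numeric, which is why it satisfies both conditions quoted in the question while remaining distinct from NumPy's int64:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype, is_numeric_dtype

# Nullable integer array (capital "I"); it can hold missing values.
nullable = pd.array([1, None, 3], dtype="Int64")
# Plain NumPy integer array (lowercase "i"); it cannot hold missing values.
plain = np.array([1, 2, 3], dtype="int64")

# Int64 passes both checks mentioned in the question.
nullable_is_ea = is_extension_array_dtype(nullable.dtype)
nullable_is_numeric = is_numeric_dtype(nullable.dtype)

# int64 is numeric but is not an extension dtype.
plain_is_ea = is_extension_array_dtype(plain.dtype)

# The two dtypes have different names and are not interchangeable.
dtype_names = (str(nullable.dtype), str(plain.dtype))
has_missing = bool(nullable.isna().any())
```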

You'll need to update the Sparse tests as well: call to_sparse on the expected.

pandas/tests/groupby/test_grouping.py

pandas/core/groupby/groupby.py

5hirish · 2018-10-27T18:35:15Z

All tests and checks have passed except for one, where I got the following error even though all tests passed: The command "ci/code_checks.sh" exited with 1. Please review and let me know.

TomAugspurger · 2018-10-29T20:24:36Z

isort error: https://travis-ci.org/pandas-dev/pandas/jobs/447147983#L2772

Make sure to merge master. LMK if you need help fixing it.

jreback · 2018-10-30T12:37:02Z

pandas/tests/arrays/test_integer.py

-        "B": np.array([1.0, 3.0]),
-        "C": np.array([1, 3], dtype="int64")
-    }, index=pd.Index(['a', 'b'], name='A'))
-    tm.assert_frame_equal(result, expected)


fully construct the return frame as in the existing example

Do you mean this
a)

result = df.groupby("A").op()
assert result.dtypes['C'].name == 'Int64'

b) or do you mean that I should construct the expected as:

expected = pd.DataFrame({
    "B": np.array([1.0, 3.0]),
    "C": np.array([1, 3], dtype="Int64")
}, index=pd.Index(['a', 'b'], name='A'))
tm.assert_frame_equal(result, expected)

Because in b) the expected constructed dataframe converts Int64 to int64.

Because in b) the expected constructed dataframe converts Int64 to int64

The "Int64" (capital "I") is a pandas thing. You need

"C": integer_array([1, 3], dtype="Int64")

the 2nd. we want to construct the resulting frame and compare

Why was this marked resolved? Am I missing the expected and the tm.assert_frame_equal?

No, you are correct, I missed that one...
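
Option b) from this thread can be sketched roughly as follows. This is a minimal illustration, not the test that was merged: the input frame is hypothetical, sum stands in for the parametrized op, pd.array is used in place of integer_array on the assumption that the installed pandas provides it, and pandas.testing is the public home of assert_frame_equal.

```python
import pandas as pd
import pandas.testing as tm  # public location of assert_frame_equal

# Hypothetical input: column "C" uses the nullable Int64 extension dtype.
df = pd.DataFrame({
    "A": ["a", "b", "b"],
    "C": pd.array([1, 1, 2], dtype="Int64"),
})

# Aggregate per group; the extension dtype should be preserved.
result = df.groupby("A").sum()

# Fully construct the expected frame, with "C" built as an Int64
# extension array rather than a NumPy int64 array.
expected = pd.DataFrame(
    {"C": pd.array([1, 3], dtype="Int64")},
    index=pd.Index(["a", "b"], name="A"),
)

# Compare the whole frame, including dtypes and index.
tm.assert_frame_equal(result, expected)
```

Comparing against a fully constructed frame checks values, index, and dtype together, whereas option a) only checks the dtype name of one column.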

pandas/tests/arrays/test_integer.py

TomAugspurger

This will need a release note in whatsnew/0.24.0.txt

TomAugspurger · 2018-10-31T19:05:51Z

pandas/tests/arrays/test_integer.py

Because in b) the expected constructed dataframe converts Int64 to int64

The "Int64" (capital "I") is a pandas thing. You need

"C": integer_array([1, 3], dtype="Int64")

pandas/tests/arrays/test_integer.py

pandas/tests/test_resample.py

5hirish · 2018-11-01T14:13:38Z

Ran isort on pandas/core/groupby/groupby.py locally and fixed the imports, but the error is still there.

TomAugspurger · 2018-11-01T22:26:38Z

@5hirish I think you needed to merge master before running isort. Pretty sure Travis tests a merge commit, which may explain the difference.

I merged master and pushed a change. So you'll want to pull before making additional changes.

doc/source/whatsnew/v0.24.0.txt

codecov · 2018-11-01T23:20:21Z

Codecov Report

Merging #23318 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23318      +/-   ##
==========================================
+ Coverage   92.22%   92.23%   +<.01%     
==========================================
  Files         161      161              
  Lines       51191    51202      +11     
==========================================
+ Hits        47210    47225      +15     
+ Misses       3981     3977       -4

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.61% <100%> (ø)` | ⬆️ |
| #single | `42.26% <14.28%> (ø)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/groupby/groupby.py | `96.51% <100%> (+0.02%)` | ⬆️ |
| pandas/core/arrays/timedeltas.py | `93.75% <0%> (-0.56%)` | ⬇️ |
| pandas/util/testing.py | `86.64% <0%> (-0.2%)` | ⬇️ |
| pandas/core/indexes/base.py | `96.46% <0%> (-0.16%)` | ⬇️ |
| pandas/core/arrays/datetimelike.py | `96.1% <0%> (-0.06%)` | ⬇️ |
| pandas/core/indexes/datetimes.py | `96.41% <0%> (-0.03%)` | ⬇️ |
| pandas/core/indexes/datetimelike.py | `98.01% <0%> (-0.01%)` | ⬇️ |
| pandas/core/indexes/api.py | `99% <0%> (ø)` | ⬆️ |
| pandas/core/frame.py | `97.03% <0%> (ø)` | ⬆️ |
| pandas/core/generic.py | `96.81% <0%> (ø)` | ⬆️ |

... and 6 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 93aba79...2487fd1.

doc/source/whatsnew/v0.24.0.txt

jreback · 2018-11-02T13:25:41Z

pandas/tests/arrays/test_integer.py

the 2nd. we want to construct the resulting frame and compare

pandas/tests/sparse/test_groupby.py

doc/source/whatsnew/v0.24.0.txt

TomAugspurger · 2018-11-05T12:49:35Z

pandas/tests/arrays/test_integer.py

Why was this marked resolved? Am I missing the expected and the tm.assert_frame_equal?

pandas/tests/sparse/test_groupby.py

pandas/tests/test_resample.py

TomAugspurger

LGTM. @jreback any other comments?

jreback · 2018-11-06T14:48:50Z

thanks @5hirish

5hirish added 2 commits October 24, 2018 20:05

Added EA check and passed through _from_sequence

09e8472

Added test case to check extension arrays dtype

27ce828

BUG: GH23227 Fixed

28b45f2

jreback requested changes Oct 26, 2018

pandas/tests/groupby/test_grouping.py (outdated, resolved)

pandas/core/groupby/groupby.py (outdated, resolved)

jreback added the Groupby and ExtensionArray labels Oct 26, 2018

5hirish added 4 commits October 27, 2018 19:49

Fixed arrays tests to check dtypes

9243612

Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…

9502ba4

…g_agg_ea_dtype

Fixed sparse test cases

14666a1

Fixed test case to re-sample categorical data for timedelta

44287de

jreback requested changes Oct 30, 2018

TomAugspurger reviewed Oct 31, 2018

test case dataframe assertion added

d54c743

5hirish and others added 3 commits November 1, 2018 19:57

Added whatsnew in v0.24.0

05eb70c

Merge remote-tracking branch 'upstream/master' into 5hirish-bug_agg_e…

8ac749d

…a_dtype

Fixed isort

c1d8416

TomAugspurger reviewed Nov 1, 2018

doc/source/whatsnew/v0.24.0.txt (outdated, resolved)

Updated whatsnew for 0.24.0

fc5d2f2

5hirish force-pushed the bug_agg_ea_dtype branch from 8a9a36e to fc5d2f2 November 2, 2018 09:11

jreback requested changes Nov 2, 2018

Updated whatsnew for 0.24.0, applied suggestions to sparse/test_group

2859c70

TomAugspurger reviewed Nov 5, 2018

5hirish added 2 commits November 5, 2018 19:11

Updated whatsnew for 0.24.0, applied suggestions to sparse/test_group…

07ffcc7

…by and arrays/test_integer

Applied suggestions to sparse/test_groupby

2487fd1

TomAugspurger approved these changes Nov 6, 2018

jreback added this to the 0.24.0 milestone Nov 6, 2018

jreback approved these changes Nov 6, 2018

jreback merged commit 6bf6cd2 into pandas-dev:master Nov 6, 2018

jschendel mentioned this pull request Nov 9, 2018

TST: Test failure on 32bit for TestSparseGroupBy.test_aggfuncs #23605

Closed

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

BUG: GroupBy return EA dtype (pandas-dev#23318)

6efd331

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

BUG: GroupBy return EA dtype (pandas-dev#23318)

0e76af9

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

BUG: GroupBy return EA dtype (pandas-dev#23318)

d4e999f

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

BUG: GroupBy return EA dtype (pandas-dev#23318)

20aff1a
