Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: GroupBy return EA dtype #23318

Merged
merged 15 commits into from Nov 6, 2018
Merged

Conversation

5hirish
Copy link
Contributor

@5hirish 5hirish commented Oct 24, 2018

@pep8speaks
Copy link

Hello @5hirish! Thanks for submitting the PR.

@5hirish
Copy link
Contributor Author

5hirish commented Oct 24, 2018

Ran tests locally pytest pandas/tests/groupby/test_grouping.py -v and pytest pandas/tests/groupby/ -v; successfull

@5hirish
Copy link
Contributor Author

5hirish commented Oct 25, 2018

@jreback

  1. Tests under pytest pandas/tests/arrays/ like test_integer.py/test_preserve_dtypes, test_integer.py/test_reduce_to_float are failing because:
AssertionError: Attributes are different
Attribute "dtype" are different
     [left]:  Int64      <-- result
     [right]: float64  <-- expected

there is custom datatype Int64 in the test input data of the above tests and test expects it to be converted to int64 or float64. Does pandas consider Int64 as int64, cause Int64 satisfies both the following conditions: if is_extension_array_dtype(dtype): and if numeric_only and is_numeric_dtype(dtype) or not numeric_only ?

  1. Tests under pandas/tests/sparse/ like test_groupby.py/test_groupby_includes_fill_value are failing because:
AssertionError: Attributes are different           
Attribute "dtype" are different
    [left]:  Sparse[float64, nan]   <-- result
    [right]: float64                       <-- expected

@TomAugspurger
Copy link
Contributor

For failures like 1, you'll need to update expected to have the new type.

Does pandas consider Int64 as int64

No. Int64 and int64 are different. Int64 is our integer-NA dtype.

You'll need to update the Sparse tests as well. you'll need to do to_sparse on the expected.

pandas/tests/groupby/test_grouping.py Outdated Show resolved Hide resolved
pandas/core/groupby/groupby.py Outdated Show resolved Hide resolved
@jreback jreback added Groupby ExtensionArray Extending pandas with custom dtypes or arrays. labels Oct 26, 2018
@5hirish
Copy link
Contributor Author

5hirish commented Oct 27, 2018

All tests and checks have passed except for one where I got this error although all tests have passed The command "ci/code_checks.sh" exited with 1. please review and let me know.

@TomAugspurger
Copy link
Contributor

isort error: https://travis-ci.org/pandas-dev/pandas/jobs/447147983#L2772

Make sure to merge master. LMK if you need help fixing it.

"B": np.array([1.0, 3.0]),
"C": np.array([1, 3], dtype="int64")
}, index=pd.Index(['a', 'b'], name='A'))
tm.assert_frame_equal(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fully construct the return frame as in the existing example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean this
a)

result = df.groupby("A").op()
assert result.dtypes['C'].name == 'Int64'

b) or do you mean that I should construct the expected as:

expected = pd.DataFrame({	
        "B": np.array([1.0, 3.0]),
        "C": np.array([1, 3], dtype="Int64")
    }, index=pd.Index(['a', 'b'], name='A'))
tm.assert_frame_equal(result, expected)

Because in b) the expected constructed dataframe converts Int64 to int64.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in b) the expected constructed dataframe converts Int64 to int64

The "Int64" (capital "I") is a pandas thing. You need

"C": integer_array([1, 3], dtype="Int64")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the 2nd. we want to construct the resulting frame and compare

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this marked resolved? Am I missing the expected and the tm.assert_frame_equal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you are correct, I missed that one...

pandas/tests/arrays/test_integer.py Show resolved Hide resolved
pandas/tests/arrays/test_integer.py Outdated Show resolved Hide resolved
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need a release note in whatsnew/0.24.0.txt

"B": np.array([1.0, 3.0]),
"C": np.array([1, 3], dtype="int64")
}, index=pd.Index(['a', 'b'], name='A'))
tm.assert_frame_equal(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in b) the expected constructed dataframe converts Int64 to int64

The "Int64" (capital "I") is a pandas thing. You need

"C": integer_array([1, 3], dtype="Int64")

pandas/tests/arrays/test_integer.py Outdated Show resolved Hide resolved
pandas/tests/test_resample.py Show resolved Hide resolved
@5hirish
Copy link
Contributor Author

5hirish commented Nov 1, 2018

Did isort pandas/core/groupby/groupby.py locally, fixed imports, still error.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Nov 1, 2018

@5hirish I think you needed to merge master before running isort. Pretty sure travis tests a merge commit, which may have explained the difference.

I merged master and pushed a change. So you'll want to pull before making additional changes.

@codecov
Copy link

codecov bot commented Nov 1, 2018

Codecov Report

Merging #23318 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23318      +/-   ##
==========================================
+ Coverage   92.22%   92.23%   +<.01%     
==========================================
  Files         161      161              
  Lines       51191    51202      +11     
==========================================
+ Hits        47210    47225      +15     
+ Misses       3981     3977       -4
Flag Coverage Δ
#multiple 90.61% <100%> (ø) ⬆️
#single 42.26% <14.28%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/groupby/groupby.py 96.51% <100%> (+0.02%) ⬆️
pandas/core/arrays/timedeltas.py 93.75% <0%> (-0.56%) ⬇️
pandas/util/testing.py 86.64% <0%> (-0.2%) ⬇️
pandas/core/indexes/base.py 96.46% <0%> (-0.16%) ⬇️
pandas/core/arrays/datetimelike.py 96.1% <0%> (-0.06%) ⬇️
pandas/core/indexes/datetimes.py 96.41% <0%> (-0.03%) ⬇️
pandas/core/indexes/datetimelike.py 98.01% <0%> (-0.01%) ⬇️
pandas/core/indexes/api.py 99% <0%> (ø) ⬆️
pandas/core/frame.py 97.03% <0%> (ø) ⬆️
pandas/core/generic.py 96.81% <0%> (ø) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93aba79...2487fd1. Read the comment docs.

doc/source/whatsnew/v0.24.0.txt Outdated Show resolved Hide resolved
doc/source/whatsnew/v0.24.0.txt Outdated Show resolved Hide resolved
"B": np.array([1.0, 3.0]),
"C": np.array([1, 3], dtype="int64")
}, index=pd.Index(['a', 'b'], name='A'))
tm.assert_frame_equal(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the 2nd. we want to construct the resulting frame and compare

pandas/tests/sparse/test_groupby.py Outdated Show resolved Hide resolved
pandas/tests/sparse/test_groupby.py Show resolved Hide resolved
doc/source/whatsnew/v0.24.0.txt Outdated Show resolved Hide resolved
doc/source/whatsnew/v0.24.0.txt Outdated Show resolved Hide resolved
"B": np.array([1.0, 3.0]),
"C": np.array([1, 3], dtype="int64")
}, index=pd.Index(['a', 'b'], name='A'))
tm.assert_frame_equal(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this marked resolved? Am I missing the expected and the tm.assert_frame_equal?

pandas/tests/sparse/test_groupby.py Outdated Show resolved Hide resolved
pandas/tests/test_resample.py Show resolved Hide resolved
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @jreback any other comments?

@jreback jreback added this to the 0.24.0 milestone Nov 6, 2018
@jreback jreback merged commit 6bf6cd2 into pandas-dev:master Nov 6, 2018
@jreback
Copy link
Contributor

jreback commented Nov 6, 2018

thanks @5hirish

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG/EA: groupby on an EA should return the EA type
4 participants