Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add groupby & reduce support to EA #22762

Merged
merged 23 commits into from Oct 12, 2018

Conversation

Projects
None yet
5 participants
@jreback
Copy link
Contributor

commented Sep 19, 2018

closes #21789
closes #22346
xref #22865

@jreback jreback added this to the 0.24.0 milestone Sep 19, 2018

@pep8speaks

This comment has been minimized.

Copy link

commented Sep 19, 2018

Hello @jreback! Thanks for updating the PR.

Comment last updated on October 06, 2018 at 14:08 Hours UTC
@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Sep 19, 2018

Show resolved Hide resolved pandas/core/arrays/integer.py
Show resolved Hide resolved pandas/core/arrays/integer.py
Show resolved Hide resolved pandas/tests/extension/base/reduce.py
Show resolved Hide resolved pandas/tests/extension/base/reduce.py Outdated

@jreback jreback force-pushed the jreback:groupby branch from 4e2b5d6 to 43e2176 Sep 19, 2018

@codecov

This comment has been minimized.

Copy link

commented Sep 19, 2018

Codecov Report

Merging #22762 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22762      +/-   ##
==========================================
+ Coverage    92.2%    92.2%   +<.01%     
==========================================
  Files         169      169              
  Lines       50924    50944      +20     
==========================================
+ Hits        46952    46972      +20     
  Misses       3972     3972
Flag Coverage Δ
#multiple 90.62% <100%> (ø) ⬆️
#single 42.29% <32%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/base.py 95.94% <100%> (+0.05%) ⬆️
pandas/core/series.py 93.76% <100%> (+0.01%) ⬆️
pandas/core/arrays/integer.py 94.9% <100%> (+0.31%) ⬆️
pandas/core/arrays/categorical.py 95.62% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c8ce3d0...aeaf5f3. Read the comment docs.

@jorisvandenbossche
Copy link
Member

left a comment

From a quick look: I have the feeling this is exposing too much internals of pandas to EA (eg this basically makes all our nanops implementations public).
We should maybe rethink the _reduce interface a bit? (I assume now this is basically identical to what we already have defined on series/dataframe?)

Show resolved Hide resolved pandas/core/arrays/base.py Outdated

@jreback jreback referenced this pull request Sep 23, 2018

Merged

TST/CLN: Fixturize frame/test_analytics #22733

3 of 3 tasks complete
@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Sep 23, 2018

@jorisvandenbossche simplified

@jreback jreback force-pushed the jreback:groupby branch from 2955ae8 to 8103b8c Sep 23, 2018

@@ -708,6 +712,25 @@ def _add_comparison_ops(cls):
cls.__le__ = cls._create_comparison_method(operator.le)
cls.__ge__ = cls._create_comparison_method(operator.ge)

def _reduce(self, name, skipna=True, **kwargs):

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Sep 26, 2018

Contributor

What sort of things go into kwargs? Is it things like ddof?

This comment has been minimized.

Copy link
@jreback

jreback Sep 28, 2018

Author Contributor

yes, they are specific to the reducer

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Sep 27, 2018

In the meeting, we discussed what error should be raised when an ExtensionArray doesn't implement a specific reduction (e.g. StringArray won't implement mean). On this branch I think a NotImplementedError will be raised. Perhaps a TypeError would be more appropriate?

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Sep 28, 2018

@TomAugspurger

In the meeting, we discussed what error should be raised when an ExtensionArray doesn't implement a specific reduction (e.g. StringArray won't implement mean). On this branch I think a NotImplementedError will be raised. Perhaps a TypeError would be more appropriate?

so we have a base class _reduce which raises AbstractMethodError(). so you are suggesting that we have an overriden _reduce? or just generically change that to TypeError?

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2018

I'm not really sure what's cleanest.

If a class implements just some reductions, changing what we do in the base class doesn't won't make a difference. At that point I think it's up to the EA to raise the correct exception for methods they don't support.

For classes that don't implement any reductions, we could either raise a TypeError, or put each call in a

try:
    values._reduce(...)
except NotImplementedError:
    raise TypeError(...)

but doing that everywhere seems error prone.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2018

So my earlier comment "On this branch I think a NotImplementedError" may have been incorrect.

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Sep 28, 2018

or just generically change that to TypeError?

I would do that, because in the base scenario (the EA does not support reductions), users should see a TypeError and not an AbstractMethodError. And if we already raise that in the base implementation, an EA author does not need to do that.

@jreback jreback force-pushed the jreback:groupby branch from 5a83940 to cb0499c Oct 1, 2018

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Oct 1, 2018

ok updated to generically raise TypeError on EA's if a reduction operation is attempted and not overriden.

Show resolved Hide resolved doc/source/whatsnew/v0.24.0.txt
Show resolved Hide resolved pandas/core/arrays/base.py Outdated
Show resolved Hide resolved pandas/core/arrays/base.py Outdated
Show resolved Hide resolved pandas/core/arrays/base.py Outdated
Show resolved Hide resolved pandas/core/arrays/base.py
Show resolved Hide resolved pandas/core/arrays/integer.py Outdated
Show resolved Hide resolved pandas/core/dtypes/common.py Outdated
Show resolved Hide resolved pandas/core/series.py
Show resolved Hide resolved pandas/core/arrays/integer.py Outdated
Show resolved Hide resolved pandas/tests/extension/decimal/array.py Outdated

@jreback jreback force-pushed the jreback:groupby branch from cb0499c to c68b9b5 Oct 3, 2018

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Oct 3, 2018

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Oct 5, 2018

@jreback can you answer to some of my remaining comments?

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Oct 5, 2018

@jorisvandenbossche

You could also mask the data for now before passing to nanops ?

can't see to reply above, but that is not possible. you change behavior by pre-filtering, the nan's need to be handled by the nanops routines which already do the special cases (e.g. sum and such), and its just much simpler.

@TomAugspurger TomAugspurger referenced this pull request Oct 11, 2018

Merged

SparseArray is an ExtensionArray #22325

4 of 4 tasks complete

@jreback jreback force-pushed the jreback:groupby branch from 0c80cbf to 2b3d96f Oct 11, 2018

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Oct 11, 2018

@jorisvandenbossche addressed comments & removed some xfailed tests.

jorisvandenbossche added some commits Oct 12, 2018

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Oct 12, 2018

That test looks good. +1 to merge.

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Oct 12, 2018

@jorisvandenbossche your commits all look fine.

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Oct 12, 2018

I opened #23106 for the return type issue

@jorisvandenbossche jorisvandenbossche merged commit 08e2752 into pandas-dev:master Oct 12, 2018

6 checks passed

ci/circleci: py27_compat Your tests passed on CircleCI!
Details
ci/circleci: py35_ascii Your tests passed on CircleCI!
Details
ci/circleci: py36_locale Your tests passed on CircleCI!
Details
ci/circleci: py36_locale_slow Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20181012.12 succeeded
Details
@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Oct 12, 2018

@jreback Thanks!

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Oct 12, 2018

yeah the return type seems easy but ran into some issues.

brute4s99 added a commit to brute4s99/pandas that referenced this pull request Nov 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.