Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add groupby & reduce support to EA #22762

Merged
merged 23 commits into from
Oct 12, 2018

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Sep 19, 2018

closes #21789
closes #22346
xref #22865

@jreback jreback added Numeric Operations Arithmetic, Comparison, and Logical operations ExtensionArray Extending pandas with custom dtypes or arrays. labels Sep 19, 2018
@jreback jreback added this to the 0.24.0 milestone Sep 19, 2018
@pep8speaks
Copy link

pep8speaks commented Sep 19, 2018

Hello @jreback! Thanks for updating the PR.

Comment last updated on October 06, 2018 at 14:08 Hours UTC

@jreback
Copy link
Contributor Author

jreback commented Sep 19, 2018

pandas/core/arrays/integer.py Show resolved Hide resolved
pandas/core/arrays/integer.py Show resolved Hide resolved
pandas/tests/extension/base/reduce.py Show resolved Hide resolved
pandas/tests/extension/base/reduce.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Sep 19, 2018

Codecov Report

Merging #22762 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22762      +/-   ##
==========================================
+ Coverage    92.2%    92.2%   +<.01%     
==========================================
  Files         169      169              
  Lines       50924    50944      +20     
==========================================
+ Hits        46952    46972      +20     
  Misses       3972     3972
Flag Coverage Δ
#multiple 90.62% <100%> (ø) ⬆️
#single 42.29% <32%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/base.py 95.94% <100%> (+0.05%) ⬆️
pandas/core/series.py 93.76% <100%> (+0.01%) ⬆️
pandas/core/arrays/integer.py 94.9% <100%> (+0.31%) ⬆️
pandas/core/arrays/categorical.py 95.62% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c8ce3d0...aeaf5f3. Read the comment docs.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick look: I have the feeling this is exposing too much internals of pandas to EA (eg this basically makes all our nanops implementations public).
We should maybe rethink the _reduce interface a bit? (I assume now this is basically identical to what we already have defined on series/dataframe?)

pandas/core/arrays/base.py Outdated Show resolved Hide resolved
@jreback
Copy link
Contributor Author

jreback commented Sep 23, 2018

@jorisvandenbossche simplified

@@ -708,6 +712,25 @@ def _add_comparison_ops(cls):
cls.__le__ = cls._create_comparison_method(operator.le)
cls.__ge__ = cls._create_comparison_method(operator.ge)

def _reduce(self, name, skipna=True, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What sort of things go into kwargs? Is it things like ddof?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, they are specific to the reducer

@TomAugspurger
Copy link
Contributor

In the meeting, we discussed what error should be raised when an ExtensionArray doesn't implement a specific reduction (e.g. StringArray won't implement mean). On this branch I think a NotImplementedError will be raised. Perhaps a TypeError would be more appropriate?

@jreback
Copy link
Contributor Author

jreback commented Sep 28, 2018

@TomAugspurger

In the meeting, we discussed what error should be raised when an ExtensionArray doesn't implement a specific reduction (e.g. StringArray won't implement mean). On this branch I think a NotImplementedError will be raised. Perhaps a TypeError would be more appropriate?

so we have a base class _reduce which raises AbstractMethodError(). so you are suggesting that we have an overriden _reduce? or just generically change that to TypeError?

@TomAugspurger
Copy link
Contributor

I'm not really sure what's cleanest.

If a class implements just some reductions, changing what we do in the base class doesn't won't make a difference. At that point I think it's up to the EA to raise the correct exception for methods they don't support.

For classes that don't implement any reductions, we could either raise a TypeError, or put each call in a

try:
    values._reduce(...)
except NotImplementedError:
    raise TypeError(...)

but doing that everywhere seems error prone.

@TomAugspurger
Copy link
Contributor

So my earlier comment "On this branch I think a NotImplementedError" may have been incorrect.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Sep 28, 2018

or just generically change that to TypeError?

I would do that, because in the base scenario (the EA does not support reductions), users should see a TypeError and not an AbstractMethodError. And if we already raise that in the base implementation, an EA author does not need to do that.

@jreback
Copy link
Contributor Author

jreback commented Oct 1, 2018

ok updated to generically raise TypeError on EA's if a reduction operation is attempted and not overriden.

doc/source/whatsnew/v0.24.0.txt Show resolved Hide resolved
pandas/core/arrays/base.py Outdated Show resolved Hide resolved
pandas/core/arrays/base.py Outdated Show resolved Hide resolved
pandas/core/arrays/base.py Outdated Show resolved Hide resolved
pandas/core/arrays/base.py Show resolved Hide resolved
pandas/core/arrays/integer.py Outdated Show resolved Hide resolved
pandas/core/dtypes/common.py Outdated Show resolved Hide resolved
pandas/core/series.py Show resolved Hide resolved
pandas/core/arrays/integer.py Outdated Show resolved Hide resolved
pandas/tests/extension/decimal/array.py Outdated Show resolved Hide resolved
@jreback
Copy link
Contributor Author

jreback commented Oct 3, 2018

@jorisvandenbossche updated

@jorisvandenbossche
Copy link
Member

@jreback can you answer to some of my remaining comments?

@jreback
Copy link
Contributor Author

jreback commented Oct 5, 2018

@jorisvandenbossche

You could also mask the data for now before passing to nanops ?

can't see to reply above, but that is not possible. you change behavior by pre-filtering, the nan's need to be handled by the nanops routines which already do the special cases (e.g. sum and such), and its just much simpler.

@jreback
Copy link
Contributor Author

jreback commented Oct 11, 2018

@jorisvandenbossche addressed comments & removed some xfailed tests.

@TomAugspurger
Copy link
Contributor

That test looks good. +1 to merge.

@jreback
Copy link
Contributor Author

jreback commented Oct 12, 2018

@jorisvandenbossche your commits all look fine.

@jorisvandenbossche
Copy link
Member

I opened #23106 for the return type issue

@jorisvandenbossche jorisvandenbossche merged commit 08e2752 into pandas-dev:master Oct 12, 2018
@jorisvandenbossche
Copy link
Member

@jreback Thanks!

@jreback
Copy link
Contributor Author

jreback commented Oct 12, 2018

yeah the return type seems easy but ran into some issues.

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reductions for ExtensionArray ENH: add reduce op (and groupby reduce) support to EA
5 participants