-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix Summarize not always producing a scalar value (Pandas backend) #2410
Conversation
|
Here is a rundown of the
In Ibis we are using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. +1
ibis/pandas/aggcontext.py
Outdated
| @@ -252,19 +252,52 @@ def agg(self, grouped_data, function, *args, **kwargs): | |||
| pass | |||
|
|
|||
|
|
|||
| def make_applied_function(function, args=None, kwargs=None): | |||
| def wrap_for_apply(function, args, kwargs): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type these as much as possible (same for wrap_for_agg)
| 2) Treat the function as a N->1 aggregation function (i.e. calls the | ||
| function once on the entire Series) | ||
| Pandas `agg` will use behavior #1 unless an error is raised when doing so. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a Parameters section
ibis/pandas/aggcontext.py
Outdated
| raises a TypeError otherwise. When Pandas `agg` is attempting to use | ||
| behavior #1 but sees the TypeError, it will fall back to behavior #2. | ||
| """ | ||
| assert callable(function), 'function {} is not callable'.format(function) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use f-strings
This PR is to fix a bug with the
Summarizeaggregation context class in the Pandas backend.Problem
Currently,
Summarize.aggmakes pretty straightforward use ofpandas.Series.aggto produce its result.However,
pandas.Series.aggseems to have a quirk where if a single function is passed topandas.Series.agg, it will behave exactly likepandas.Series.apply(in fact, it will callpandas.Series.apply) and produce aSeriesresult, unless an exception is caught while doing this, in which case it will treat the function as an aggregation function (and produce a scalar result).Because of this quirk,
Summarize.aggwill have unexpected results (i.e., will sometimes produce aSeriesrather than a scalar value) simply depending on whether the function passed toSummarize.agghappens to not raise an error when passed topandas.Series.apply.Solution
Currently (before this PR),
Summarize.aggwould wrap the function passed to it. This PR essentially adds logic in that wrapper function to detect if the function is being passed topandas.Series.apply, and raise aTypeErrorif so (which will forcepandas.Series.aggto not behave likepandas.Series.apply).Example
Code
Output
Before
After
Testing
Created a few tests for
Summarizeinibis/pandas/tests/test_aggcontext.py:test_summarize_single_seriestest_summarize_single_seriesgroupbytest_summarize_multiple_series<-- one test parameter for this fails before this PR