FEAT: Add support for destruct for aggregation with no groupby #2511
Conversation
elif np.isscalar(result):
elif isinstance(result, pd.Series):
    return result.rename(result_name)
else:
can you make this one an elif
Good point. Updated this function to be more restrictive.
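For readers following along, here is a minimal sketch of what the more restrictive branching in coerce_to_output could look like. Only the Series branch and the scalar-tuple promotion come from the diff in this thread; the remaining branches and the result_name handling are illustrative assumptions, not the merged code.

# Sketch only: the Series branch and the tuple promotion mirror the diff;
# the DataFrame/scalar branches and result_name handling are illustrative.
import numpy as np
import pandas as pd

def coerce_to_output(result, result_name=None):
    if isinstance(result, pd.DataFrame):
        return result
    elif isinstance(result, pd.Series):
        return result.rename(result_name)
    elif isinstance(result, (tuple, list)):
        # destruct case: one scalar per output field -> single-row DataFrame
        return pd.concat([pd.Series([v]) for v in result], axis=1)
    elif np.isscalar(result):
        return pd.Series([result], name=result_name)
    else:
        raise ValueError(f"cannot coerce {type(result)} to output")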
    result = pd.concat(data, axis=1)
else:
    # Promote scalar to Series
    result = pd.concat([pd.Series([v]) for v in data], axis=1)
does this have the right name? IIUC this is a scalar returned for an aggregation. but why does this hit here?
The names are set below, in line 116.
This is used to return multiple fields (destruct) from an aggregation, in which case the user will return something like
return 1.0, 2.0
from the UDF, so we coerce the output to the proper shape for it to work with the logic in:
https://github.com/ibis-project/ibis/blob/master/ibis/backends/pandas/execution/generic.py#L436
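As a concrete illustration of that promotion (the variable names here are only for the example), a tuple of scalars becomes a single-row DataFrame whose columns are named afterwards:

import pandas as pd

data = (1.0, 2.0)  # what a destruct UDF returns for a no-groupby aggregation
result = pd.concat([pd.Series([v]) for v in data], axis=1)
print(result)
#      0    1
# 0  1.0  2.0
# The destruct field names are assigned later (line 116 of the diff).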
Also, this is not inefficient, because the path is only hit when doing an aggregation without groupby, so it runs only once for the entire aggregation node.
ok this sounds fine
@@ -94,11 +94,6 @@ def zscore(series):
    return (series - series.mean()) / series.std()


@udf.elementwise([], dt.int64)
This was an oversight from the vectorized UDF changes. We no longer support returning a scalar from an elementwise UDF.
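For context, a hedged sketch of what a vectorized elementwise UDF looks like instead; the import path and decorator arguments are as I recall them for this era of ibis and may differ between releases:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf  # import path assumed; it has moved between releases

@udf.elementwise(input_type=[dt.double], output_type=dt.double)
def add_one(series):
    # receives and returns a whole pandas Series, never a bare scalar
    return series + 1.0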
thanks @icexelloss
What is the change
Follow-up of #2487.
Added support for destruct aggregations without a groupby.
The change itself is straightforward. The main change is in coerce_to_output, where I added support for coercing a tuple of scalars, e.g. (1.0, 2.0), to a single-row DataFrame, which is what we use for other destruct columns.
How is this tested
Added test_reduction_udf_no_groupby.
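A hypothetical end-to-end usage sketch of the new behaviour; the table, column names, and the exact dt.Struct and destructure() signatures are assumptions based on the destruct support from #2487, not code taken from this PR:

import pandas as pd
import ibis
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf  # import path assumed

@udf.reduction(
    input_type=[dt.double],
    output_type=dt.Struct(['mean', 'std'], [dt.double, dt.double]),
)
def mean_and_std(v):
    # the UDF returns a tuple of scalars; coerce_to_output turns it into
    # a single-row DataFrame so the destruct fields can be split out
    return v.mean(), v.std()

con = ibis.pandas.connect({'df': pd.DataFrame({'a': [1.0, 2.0, 3.0]})})
t = con.table('df')
# aggregation with destruct and no groupby, exercised by test_reduction_udf_no_groupby
expr = t.aggregate(mean_and_std(t['a']).destructure())
print(expr.execute())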