
FEAT: Add support for destruct for aggregation with no groupby #2511

Merged (3 commits) · Nov 2, 2020

Conversation

@icexelloss (Contributor) commented Nov 2, 2020

What is the change

Follow-up of #2487.

Added support for aggregation without groupby with destruct.

The change itself is straightforward. The main change is in coerce_to_output, where I added support for coercing a tuple of scalars, e.g., (1.0, 2.0), to a single-row DataFrame, which is what we use for other destruct columns.
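For illustration, here is a minimal plain-pandas sketch of the coercion described above. The function name and structure are hypothetical and only approximate what coerce_to_output does; they are not the actual ibis implementation.

```python
import numpy as np
import pandas as pd

def coerce_tuple_of_scalars(data):
    # Hypothetical sketch: a tuple of scalars, e.g. (1.0, 2.0),
    # is promoted to a single-row DataFrame.
    if all(np.isscalar(v) for v in data):
        return pd.concat([pd.Series([v]) for v in data], axis=1)
    # Otherwise assume a tuple of Series (the existing destruct path).
    return pd.concat(list(data), axis=1)

df = coerce_tuple_of_scalars((1.0, 2.0))
# df is a one-row DataFrame with two columns
```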

How is this tested

Added test_reduction_udf_no_groupby

@jreback jreback added this to the Next Feature Release milestone Nov 2, 2020
elif np.isscalar(result):
elif isinstance(result, pd.Series):
return result.rename(result_name)
else:
Contributor


can you make this one an elif

Contributor Author


Good point. Updated this function to be more restrictive.

result = pd.concat(data, axis=1)
else:
# Promote scalar to Series
result = pd.concat([pd.Series([v]) for v in data], axis=1)
Contributor


Does this have the right name? IIUC this is a scalar returned from an aggregation, but why does it hit here?

Contributor Author


The names are set below in line 116.

This is used to return multiple fields (destruct) from an aggregation. In that case, the user will return something like

return 1.0, 2.0

from the UDF, so we coerce the output to the proper shape in order for it to work with the logic in:

https://github.com/ibis-project/ibis/blob/master/ibis/backends/pandas/execution/generic.py#L436
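Concretely, a reduction UDF that destructs into multiple fields produces a tuple of scalars when there is no groupby; the coercion turns that tuple into a one-row frame so it lines up with the grouped case. A plain-pandas sketch (the function name is hypothetical, and this is not the actual ibis code):

```python
import pandas as pd

def mean_and_range(series):
    # A reduction that destructs into two output fields; with no
    # groupby it returns a plain tuple of scalars.
    return series.mean(), series.max() - series.min()

s = pd.Series([1.0, 2.0, 3.0, 4.0])
result = mean_and_range(s)                                   # (2.5, 3.0)
frame = pd.concat([pd.Series([v]) for v in result], axis=1)  # one row
frame.columns = ["mean", "range"]  # names are assigned later, as noted above
```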

Contributor Author


Also, this is not inefficient, because this path is only hit when aggregating without groupby, so it runs only once for the entire aggregation node.

Contributor


OK, this sounds fine.

@@ -94,11 +94,6 @@ def zscore(series):
return (series - series.mean()) / series.std()


@udf.elementwise([], dt.int64)
Contributor Author


This was an oversight from the vectorized UDF changes. We no longer support returning a scalar from an elementwise UDF.
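The vectorized contract means an elementwise UDF maps a Series to a Series of the same length; a scalar return no longer fits. A plain-pandas illustration of that contract (hypothetical function, not ibis code):

```python
import pandas as pd

def add_one(series):
    # Elementwise under the vectorized contract: the whole input Series
    # comes in, and a Series of the same length must come back.
    return series + 1

s = pd.Series([1, 2, 3])
out = add_one(s)
# out is a Series of the same length as s, not a scalar
```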

@jreback jreback added analytics Higher-level data analysis API related issues or PRs pandas The pandas backend labels Nov 2, 2020
@jreback jreback merged commit db7536b into ibis-project:master Nov 2, 2020
@jreback

jreback commented Nov 2, 2020

thanks @icexelloss
