-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arbitrary operation #1309
Arbitrary operation #1309
Conversation
ibis/expr/api.py
Outdated
| log10 = _unary_op('log10', _ops.Log10) | ||
| ln = _unary_op('ln', _ops.Ln) | ||
| sign = _unary_op('sign', _ops.Sign) | ||
| sqrt = _unary_op('sqrt', _ops.Sqrt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't remove these, since they are a breaking API change. If you want to deprecate them, open an issue about it and we can discuss there.
|
@cpcloud reverted, forgot the asterisk import |
|
Can you implement this on a couple of backends (maybe bigquery and clickhouse)? |
|
@kszucs can you rebase? maybe add a clickhouse and an impala implementation and test to the backend test suite then I'll merge |
|
I simply cannot make it work with bigquery: If You don't insist on bigquery implementation then it's ready for review. |
ibis/pandas/execution/generic.py
Outdated
| 'Arbitrary {} is not supported'.format(op.how)) | ||
|
|
||
| data = data[mask] if mask is not None else data | ||
| return data.iloc[index] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to use context here, so that arbitrary can be used as a window function. There should be a bunch of examples of how to use it in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Series first and last methods are not working here https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.first.html
Am I using the wrong signature? I thought it was for table.column.reducer() scenario.
@execute_node.register(ops.Arbitrary, pd.Series, (pd.Series, type(None)))There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case I'm thinking of is something like:
t.a - t.a.arbitrary()where the result of t.a.arbitrary() needs to be projected over the length of t.a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to not use first and last
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cpcloud could You give me a full expression I should test against it?
ibis/pandas/execution/generic.py
Outdated
| def execute_arbitrary_series_groupby(op, data, _, context=None, **kwargs): | ||
| if op.how not in {'first', 'last'}: | ||
| raise com.OperationNotDefinedError( | ||
| 'Arbitrary {} is not supported'.format(op.how)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use {!r} here so the how is called out in the error message.
ibis/expr/api.py
Outdated
| @@ -463,6 +463,23 @@ def group_concat(arg, sep=',', where=None): | |||
| return _ops.GroupConcat(arg, sep, where).to_expr() | |||
|
|
|||
|
|
|||
| def arbitrary(arg, where=None, how='first'): | |||
| """ | |||
| Selects the first encountered value. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably read something like: "Selects the first non-null value in a column" so that users know what happens when NULL values are encountered.
| ---------- | ||
| arg : array expression | ||
| where: bool, default None | ||
| how : {'first', 'last', 'heavy'}, default 'first' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document what heavy means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cpcloud done.
|
@cpcloud rebased |
|
Cool, merging on green. Don't worry about the warning |
|
Merging, this passed here: https://ci.appveyor.com/project/cpcloud/ibis-xh5g1/build/1.0.1516 |
Relates to #1230