Arbitrary operation #1309

Closed
kszucs wants to merge 6 commits into base: master from kszucs:arbitrary

2 participants
@kszucs
Member

kszucs commented Jan 28, 2018

Relates to #1230

log10 = _unary_op('log10', _ops.Log10)
ln = _unary_op('ln', _ops.Ln)
sign = _unary_op('sign', _ops.Sign)
sqrt = _unary_op('sqrt', _ops.Sqrt)

@cpcloud

cpcloud Jan 28, 2018

Member

Don't remove these, since removing them is a breaking API change. If you want to deprecate them, open an issue about it and we can discuss it there.

@cpcloud cpcloud self-requested a review Jan 28, 2018

@cpcloud cpcloud self-assigned this Jan 28, 2018

@cpcloud cpcloud added this to the 0.13 milestone Jan 28, 2018

@kszucs

Member

kszucs commented Jan 28, 2018

@cpcloud Reverted; I had forgotten about the asterisk import.

@cpcloud

Member

cpcloud commented Jan 29, 2018

Can you implement this on a couple of backends (maybe bigquery and clickhouse)?

@kszucs

Member

kszucs commented Jan 30, 2018

@cpcloud Sure, but I'd like to rebase it on top of #1292

@cpcloud

Member

cpcloud commented Feb 5, 2018

@kszucs Can you rebase? Maybe add a ClickHouse and an Impala implementation, plus a test in the backend test suite, and then I'll merge.

@kszucs kszucs force-pushed the kszucs:arbitrary branch from b60c3fb to 1c6c06c Feb 6, 2018

@kszucs

Member

kszucs commented Feb 6, 2018

I simply cannot make it work with BigQuery:
Function not found: FIRST
Function not found: LAST

If you don't insist on a BigQuery implementation, then it's ready for review.

            'Arbitrary {} is not supported'.format(op.how))
    data = data[mask] if mask is not None else data
    return data.iloc[index]

@cpcloud

cpcloud Feb 6, 2018

Member

I think you need to use context here, so that arbitrary can be used as a window function. There should be a bunch of examples of how to use it in this file.

@kszucs

kszucs Feb 6, 2018

Member

The Series first and last methods do not work here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.first.html

Am I using the wrong signature? I thought this one was for the table.column.reducer() scenario.

@execute_node.register(ops.Arbitrary, pd.Series, (pd.Series, type(None)))

@cpcloud

cpcloud Feb 6, 2018

Member

The case I'm thinking of is something like:

t.a - t.a.arbitrary()

where the result of t.a.arbitrary() needs to be projected over the length of t.a
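
The projection described above can be sketched in plain pandas, outside of ibis. This is only an illustration under assumptions: the frame `df`, its column names, and the helper `arbitrary_first` are hypothetical and are not taken from the PR. A scalar reduction broadcasts over the column when subtracted from it, and in the grouped case `transform` returns one value per original row.

```python
import pandas as pd

# Hypothetical data; the column names are chosen only for this sketch.
df = pd.DataFrame({"key": ["x", "x", "y", "y"], "a": [3.0, 5.0, 10.0, 4.0]})


def arbitrary_first(series):
    """Hypothetical stand-in for arbitrary(how='first'): take the first value."""
    return series.iloc[0]


# Ungrouped: the scalar result is broadcast over the length of df["a"].
ungrouped = df["a"] - arbitrary_first(df["a"])

# Grouped: transform() projects each group's first value back onto its rows.
grouped = df["a"] - df.groupby("key")["a"].transform("first")
```

In both cases the result has the same length and index as `df["a"]`, which is the property the window-function case depends on.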

@cpcloud

cpcloud Feb 6, 2018

Member

It's fine to not use first and last

@kszucs

kszucs Feb 6, 2018

Member

@cpcloud Could you give me a full expression that I should test it against?

def execute_arbitrary_series_groupby(op, data, _, context=None, **kwargs):
    if op.how not in {'first', 'last'}:
        raise com.OperationNotDefinedError(
            'Arbitrary {} is not supported'.format(op.how))

@cpcloud

cpcloud Feb 6, 2018

Member

Let's use {!r} here so the how is called out in the error message.
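
For reference, a small sketch of the difference the `!r` conversion makes. The value `'heavy'` is used only as an example of an unsupported `how`; the message text is the one from the snippet under review.

```python
how = 'heavy'  # example value, standing in for op.how

# Without a conversion flag, str() is used and the value blends into the text.
plain = 'Arbitrary {} is not supported'.format(how)

# With !r, repr() is used, so the string value appears in quotes.
quoted = 'Arbitrary {!r} is not supported'.format(how)

print(plain)   # Arbitrary heavy is not supported
print(quoted)  # Arbitrary 'heavy' is not supported
```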

@@ -463,6 +463,23 @@ def group_concat(arg, sep=',', where=None):
    return _ops.GroupConcat(arg, sep, where).to_expr()


def arbitrary(arg, where=None, how='first'):
    """
    Selects the first encountered value.

@cpcloud

cpcloud Feb 6, 2018

Member

This should probably read something like: "Selects the first non-null value in a column" so that users know what happens when NULL values are encountered.
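
To make the suggested wording concrete, here is a minimal pandas sketch of "first non-null value in a column" semantics. It assumes that nulls are skipped and that an optional boolean mask plays the role of `where`. The helper `first_non_null` is hypothetical and is not the implementation in this PR.

```python
import pandas as pd


def first_non_null(series, mask=None):
    """Return the first non-null value, optionally restricted by a boolean mask.

    Hypothetical helper for illustration; returns None if nothing qualifies.
    """
    data = series[mask] if mask is not None else series
    data = data.dropna()  # skip NULL/NaN entries
    return data.iloc[0] if len(data) else None


s = pd.Series([None, 2.0, None, 4.0])

first = first_non_null(s)                # leading null is skipped
filtered = first_non_null(s, mask=s > 3)  # only rows where the mask is True
empty = first_non_null(pd.Series([None, None], dtype="float64"))
```

Here `first` is 2.0 rather than a null, `filtered` is 4.0, and an all-null column yields `None`.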

----------
arg : array expression
where: bool, default None
how : {'first', 'last', 'heavy'}, default 'first'

@cpcloud

cpcloud Feb 7, 2018

Member

Can you document what heavy means?

@kszucs
@cpcloud

cpcloud approved these changes Feb 9, 2018

@kszucs kszucs force-pushed the kszucs:arbitrary branch from 6c6d3d7 to 427e82c Feb 9, 2018

@kszucs

Member

kszucs commented Feb 9, 2018

@cpcloud rebased

@cpcloud

Member

cpcloud commented Feb 9, 2018

Cool, merging on green. Don't worry about the warning.

@cpcloud

Member

cpcloud commented Feb 9, 2018
