Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arbitrary operation #1309

Closed
wants to merge 6 commits into from
Closed

Arbitrary operation #1309

wants to merge 6 commits into from

Conversation

kszucs
Copy link
Member

@kszucs kszucs commented Jan 28, 2018

Relates to #1230

ibis/expr/api.py Outdated
log10 = _unary_op('log10', _ops.Log10)
ln = _unary_op('ln', _ops.Ln)
sign = _unary_op('sign', _ops.Sign)
sqrt = _unary_op('sqrt', _ops.Sqrt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't remove these, since they are a breaking API change. If you want to deprecate them, open an issue about it and we can discuss there.

@cpcloud cpcloud self-requested a review January 28, 2018 16:47
@cpcloud cpcloud self-assigned this Jan 28, 2018
@cpcloud cpcloud added feature Features or general enhancements expressions Issues or PRs related to the expression API labels Jan 28, 2018
@cpcloud cpcloud added this to the 0.13 milestone Jan 28, 2018
@kszucs
Copy link
Member Author

kszucs commented Jan 28, 2018

@cpcloud reverted, forgot the asterisk import

@cpcloud
Copy link
Member

cpcloud commented Jan 29, 2018

Can you implement this on a couple of backends (maybe bigquery and clickhouse)?

@kszucs
Copy link
Member Author

kszucs commented Jan 30, 2018

@cpcloud Sure, but I'd like to rebase it on top of #1292

@cpcloud
Copy link
Member

cpcloud commented Feb 5, 2018

@kszucs can you rebase? maybe add a clickhouse and an impala implementation and test to the backend test suite then I'll merge

@kszucs
Copy link
Member Author

kszucs commented Feb 6, 2018

I simply cannot make it work with bigquery:
Function not found: FIRST
Function not found: LAST

If You don't insist on bigquery implementation then it's ready for review.

'Arbitrary {} is not supported'.format(op.how))

data = data[mask] if mask is not None else data
return data.iloc[index]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to use context here, so that arbitrary can be used as a window function. There should be a bunch of examples of how to use it in this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Series first and last methods are not working here https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.first.html

Am I using the wrong signature? I thought it was for table.column.reducer() scenario.

@execute_node.register(ops.Arbitrary, pd.Series, (pd.Series, type(None)))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case I'm thinking of is something like:

t.a - t.a.arbitrary()

where the result of t.a.arbitrary() needs to be projected over the length of t.a

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to not use first and last

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpcloud could You give me a full expression I should test against it?

def execute_arbitrary_series_groupby(op, data, _, context=None, **kwargs):
if op.how not in {'first', 'last'}:
raise com.OperationNotDefinedError(
'Arbitrary {} is not supported'.format(op.how))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use {!r} here so the how is called out in the error message.

ibis/expr/api.py Outdated
@@ -463,6 +463,23 @@ def group_concat(arg, sep=',', where=None):
return _ops.GroupConcat(arg, sep, where).to_expr()


def arbitrary(arg, where=None, how='first'):
"""
Selects the first encountered value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably read something like: "Selects the first non-null value in a column" so that users know what happens when NULL values are encountered.

----------
arg : array expression
where: bool, default None
how : {'first', 'last', 'heavy'}, default 'first'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document what heavy means?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpcloud done.

@kszucs
Copy link
Member Author

kszucs commented Feb 9, 2018

@cpcloud rebased

@cpcloud
Copy link
Member

cpcloud commented Feb 9, 2018

Cool, merging on green. Don't worry about the warning

@cpcloud
Copy link
Member

cpcloud commented Feb 9, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
expressions Issues or PRs related to the expression API feature Features or general enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants