
bug(exprs): chain mutate() for window op and non window op throws error for pyspark backend #2453

Closed
LeeTZ opened this issue Oct 6, 2020 · 1 comment · Fixed by #4005
Labels
bug Incorrect behavior inside of ibis pyspark The Apache PySpark backend
Comments

@LeeTZ
Contributor

LeeTZ commented Oct 6, 2020

Currently, chaining mutate() calls that mix a window op with a non-window op fails on the pyspark backend:

import ibis
import pandas as pd

table = client.table('time_indexed_table')
context = (
    pd.Timestamp('20170102 07:00:00', tz='UTC'),
    pd.Timestamp('20170105', tz='UTC'),
)
window = ibis.trailing_window(
    preceding=ibis.interval(hours=1), order_by='time', group_by='key'
)
result_pd = (
    table.mutate(
        count_1h=table['value'].count().over(window),
    )
    .mutate(count=table['value'].count())
    .execute(timecontext=context)
)

It fails with an AnalysisException:

pyspark.sql.utils.AnalysisException: "grouping expressions sequence is empty, and 'time_indexed_table.`time`' is not an aggregate function. Wrap '(count(time_indexed_table.`value`) AS `count_yoyo`)' in windowing function(s) or wrap 'time_indexed_table.`time`' in first() (or first_value) if you don't care which value you get.;;
Project [time#19, key#20L, value#21, count#52L, count_yoyo#53L]
+- Project [time#19, key#20L, value#21, count_yoyo#53L, _w0#54L, count#52L, count#52L]
   +- Window [count(value#21) windowspecdefinition(key#20L, _w0#54L ASC NULLS FIRST, specifiedwindowframe(RangeFrame, currentrow$(), 3600)) AS count#52L], [key#20L], [_w0#54L ASC NULLS FIRST]
      +- Aggregate [time#19, key#20L, value#21, count(value#21) AS count_yoyo#53L, cast(time#19 as bigint) AS _w0#54L]
         +- Filter ((time#19 >= 1483340400000000) && (time#19 < 1483405200000000))
            +- SubqueryAlias `time_indexed_table`
               +- LogicalRDD [time#19, key#20L, value#21], false
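For reference, here is a plain-Python sketch of what the two mutate() calls are expected to compute: count_1h is a per-row count of same-key rows in the trailing one-hour window, while count is a count over the whole table. The data below is hypothetical; this is only an illustration of the intended semantics, not of the ibis/pyspark implementation.

```python
from datetime import datetime, timedelta
from collections import defaultdict

# Hypothetical rows mimicking 'time_indexed_table': (time, key, value).
rows = [
    (datetime(2017, 1, 2, 7, 0), "a", 1.0),
    (datetime(2017, 1, 2, 7, 15), "b", 4.0),
    (datetime(2017, 1, 2, 7, 30), "a", 2.0),
    (datetime(2017, 1, 2, 8, 45), "a", 3.0),
]

# Non-window op: count over the entire table (same value on every row).
total = len(rows)

# Index timestamps by key for the windowed count.
by_key = defaultdict(list)
for t, k, _ in rows:
    by_key[k].append(t)

result = []
for t, k, v in sorted(rows):
    # Window op: count of same-key rows within the trailing hour,
    # including the current row (preceding=1h, group_by='key').
    count_1h = sum(1 for u in by_key[k] if t - timedelta(hours=1) <= u <= t)
    result.append((t, k, v, count_1h, total))
```

With these rows, count_1h is 1, 1, 2, 1 (the 7:30 "a" row sees both 7:00 and 7:30), while count is 4 everywhere.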
@cpcloud cpcloud changed the title BUG: chain mutate() for window op and non window op throws error for pyspark backend bug(exprs): chain mutate() for window op and non window op throws error for pyspark backend Dec 29, 2021
@cpcloud cpcloud added the bug Incorrect behavior inside of ibis label Dec 29, 2021
@cpcloud cpcloud added the pyspark The Apache PySpark backend label Jan 10, 2022
@cpcloud cpcloud added this to the 3.x milestone Apr 19, 2022
@cpcloud
Member

cpcloud commented Apr 27, 2022

@LeeTZ Can you still reproduce this?
