Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug(bigquery): rank with windowed order by with multiple columns lists columns in the wrong order (regression in 7.2.0, working in 7.1.0) #7940

Closed
1 task done
tswast opened this issue Jan 8, 2024 · 4 comments · Fixed by #7943
Labels
bigquery The BigQuery backend bug Incorrect behavior inside of ibis

Comments

@tswast
Copy link
Contributor

tswast commented Jan 8, 2024

What happened?

import ibis
bq = ibis.bigquery.connect(project_id="swast-scratch")
import pandas
df = pandas.DataFrame({"bool_col": [True, False, False, None, True]})
table = ibis.memtable(df)
window = ibis.window(
    order_by=[
        ibis.asc(table["bool_col"].isnull()),
        ibis.asc(table["bool_col"]),
    ],
)
rank = table["bool_col"].rank().over(window)
expr = table.select([
    rank, table["bool_col"]
])
bq.execute(expr)
print(bq.compile(expr))

This works fine in 7.1.0:

In [2]: bq.execute(expr)
Out[2]: 
   MinRank() bool_col
0          0    False
1          4     None
2          0    False
3          2     True
4          2     True

In [3]: print(bq.compile(expr))
SELECT
  (
    rank() OVER (ORDER BY t0.`bool_col` IS NULL ASC, t0.`bool_col` ASC) - 1
  ) AS `MinRank`,
  t0.`bool_col`
FROM swast-scratch._63cfa399614a54153cc386c27d6c0c6fdb249f9e._ibis_pandas_memtable_45ft72ov35gvleackhzyaenloi AS t0

In 7.2.0, the IS NULL column expression appears second, which changes where NULL values appear in the ranking:

In [3]: bq.execute(expr)
Out[3]: 
   MinRank() bool_col
0          3     True
1          1    False
2          1    False
3          0     None
4          3     True
SELECT
  (
    -- IS NULL column expression should appear first. :-(
    rank() OVER (ORDER BY t0.`bool_col` ASC, t0.`bool_col` IS NULL ASC) - 1
  ) AS `MinRank`,
  t0.`bool_col`
FROM swast-scratch._63cfa399614a54153cc386c27d6c0c6fdb249f9e.ibis_pandas_memtable_ctis2cnl65dplgtuqh7eqozqwy AS t0

What version of ibis are you using?

7.2.0

What backend(s) are you using, if any?

BigQuery

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@cpcloud
Copy link
Member

cpcloud commented Jan 8, 2024

@tswast Thanks for the issue!

I'm working on a fix that hopefully will be up shortly!

kszucs pushed a commit that referenced this issue Jan 9, 2024
…ubsequent expressions are duplicated (#7943)

## Description of changes

This PR fixes an issue where duplicate window expressions were given the
position of the new (duplicated value)
of the expression, rather than preserving the original order.

This was because we were iterating over the new keys first and putting
those expressions into a `dict`.

BY iterating over the old keys first, we preserve the original order.

## Issues closed

Closes #7940.
@tswast
Copy link
Contributor Author

tswast commented Jan 9, 2024

Thanks @cpcloud !

@tswast
Copy link
Contributor Author

tswast commented Feb 26, 2024

Looks like this might still be broken in ibis 8.0.0. The nlargest/nsmallest tests are still failing in googleapis/python-bigquery-dataframes#277

@tswast
Copy link
Contributor Author

tswast commented Feb 27, 2024

Looks like this might still be broken in ibis 8.0.0. The nlargest/nsmallest tests are still failing in googleapis/python-bigquery-dataframes#277

Nevermind, we were doing column.rank() instead of ibis.rank() for some reason, which was overriding the window a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery The BigQuery backend bug Incorrect behavior inside of ibis
Projects
Archived in project
2 participants