I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl

if __name__ == "__main__":
    df = pl.DataFrame(
        {"region": [1, 1, 2]},
    )
    # print(df)
    ctx = pl.SQLContext(register_globals=False, eager_execution=True)
    ctx.register("df", df)
    print(ctx.execute("select region, count(*) from df group by region having count(*) > 1"))
To my surprise, this prints both regions (1 and 2), even though region 2 has only one element.
Ahh... we require the having clause to reference named select columns here as we apply the constraint as a post-aggregation step (this is in line with MySQL's take on the "having" clause, though we prefer targeting PostgreSQL behaviour wherever possible).
This formulation will work correctly:
select region, count(*) as n from df group by region having n > 1
Just ran the following on a mysql (mysql/8.3.0_1) instance:
with data as (
    select 1 as region
    union all select 1 as region
    union all select 2 as region
)
select region, count(*) as region_count
from data
group by region
having count(*) > 1;
+--------+--------------+
| region | region_count |
+--------+--------------+
|      1 |            2 |
+--------+--------------+
1 row in set (0.00 sec)
Log output
No response
Issue description
The having clause is not honored.
Expected behavior
Only region 1 is printed
Installed versions