Correctness issue: Having clause not honored in SQL statement #15950

timvw · 2024-04-29T09:31:09Z

Checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.

Reproducible example

if __name__ == "__main__":

    df = pl.DataFrame(
        {'region': [1, 1, 2]},
    )
    #print(df)

    ctx = pl.SQLContext(register_globals=False, eager_execution=True)
    ctx.register("df", df)
    print(ctx.execute("select region, count(*) from df group by region having count(*) > 1"))

To my surprise this prints out both regions (1 and 2.. Even though region 2 has only 1 element)

shape: (2, 2)
┌────────┬─────┐
│ region ┆ len │
│ ---    ┆ --- │
│ i64    ┆ u32 │
╞════════╪═════╡
│ 1      ┆ 2   │
│ 2      ┆ 1   │
└────────┴─────┘

Log output

No response

Issue description

The having clause is not honored.

Expected behavior

Only region 1 is printed

Installed versions

--------Version info---------
Polars:               0.20.22
Index type:           UInt32
Platform:             macOS-14.4.1-arm64-arm-64bit
Python:               3.9.15 (main, Apr 20 2024, 22:00:10) 
[Clang 15.0.0 (clang-1500.3.9.4)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              <not installed>
pydantic:             2.7.0
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.29
xlsx2csv:             <not installed>
xlsxwriter:           3.2.0

The text was updated successfully, but these errors were encountered:

alexander-beedie · 2024-04-29T14:00:50Z

Ahh... we require the having clause to reference named select columns here as we apply the constraint as a post-aggregation step (this is in line with MySQL's take on the "having" clause, though we prefer targeting PostgreSQL behaviour wherever possible).

This formulation will work correctly:

select region, count(*) as n from df group by region having n > 1

# ┌────────┬─────┐
# │ region ┆ n   │
# │ ---    ┆ --- │
# │ i64    ┆ u32 │
# ╞════════╪═════╡
# │ 1      ┆ 2   │
# └────────┴─────┘

I'll have to see if we can improve this (either by raising a suitable error, or automatically substituting to achieve the same effect) 🤔

timvw · 2024-04-29T17:20:16Z

I assume that adding this to the documentation would be helpful as well

alexander-beedie · 2024-04-30T06:34:02Z

I assume that adding this to the documentation would be helpful as well

Yup; we also need a total overhaul of our SQL documentation :))

timvw · 2024-04-30T07:41:11Z

Just ran the following on a mysql (mysql/8.3.0_1) instance:

with data as (
  select 1 as region
  union all
  select 1 as region
  union all 
  select 2 as region
)
select region, count(*) as region_count
from data
group by region
having count(*) > 1;

+--------+--------------+
| region | region_count |
+--------+--------------+
|      1 |            2 |
+--------+--------------+
1 row in set (0.00 sec)

timvw added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Apr 29, 2024

alexander-beedie added A-sql Area: Polars SQL functionality and removed needs triage Awaiting prioritization by a maintainer labels Apr 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Correctness issue: Having clause not honored in SQL statement #15950

Correctness issue: Having clause not honored in SQL statement #15950

timvw commented Apr 29, 2024

alexander-beedie commented Apr 29, 2024 •

edited

timvw commented Apr 29, 2024

alexander-beedie commented Apr 30, 2024

timvw commented Apr 30, 2024

Correctness issue: Having clause not honored in SQL statement #15950

Correctness issue: Having clause not honored in SQL statement #15950

Comments

timvw commented Apr 29, 2024

Checks

Reproducible example

Log output

Issue description

Expected behavior

Installed versions

alexander-beedie commented Apr 29, 2024 • edited

timvw commented Apr 29, 2024

alexander-beedie commented Apr 30, 2024

timvw commented Apr 30, 2024

alexander-beedie commented Apr 29, 2024 •

edited