regression 0.20.17: ComputeError: filter predicate was not of type boolean with predicate_pushdown enabled #15442

Closed
2 tasks done
robinvd opened this issue Apr 2, 2024 · 4 comments · Fixed by #15522
Labels
A-optimizer Area: plan optimization bug Something isn't working P-high Priority: high python Related to Python Polars

Comments

@robinvd

robinvd commented Apr 2, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame(
    {
        "x": [206.0],
        "y": [225.0],
    }
)
center = pl.struct(
    x=pl.col("x"),
    y=pl.col("y"),
)

left = 0
right = 1000
in_x = (left < center.struct.field("x")) & (center.struct.field("x") <= right)

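# Eager filter, and lazy filter without predicate pushdown: both work
print(df.filter(in_x))
print(df.lazy().filter(in_x).collect(predicate_pushdown=False))
# Lazy filter with predicate pushdown (the default) raises
# polars.exceptions.ComputeError: filter predicate was not of type boolean
print(df.lazy().filter(in_x).collect())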

Log output

dataframe filtered
dataframe filtered
Traceback (most recent call last):
  File "/home/robin/prog/work/processingtools/run_20-18_bug.py", line 49, in <module>
    print(df.lazy().filter(in_x).collect())
  File "/home/robin/prog/work/processingtools/venv/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 1943, in collect
    return wrap_df(ldf.collect())
polars.exceptions.ComputeError: filter predicate was not of type boolean

Issue description

This only happens with predicate_pushdown=True (the default for collect()) on polars==0.20.18.

EDIT: It also happens on 0.20.17, but works on 0.20.16.

Expected behavior

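The lazy query with predicate pushdown should work as in previous versions and return the same filtered DataFrame as the eager filter.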

Installed versions

--------Version info---------
Polars:               0.20.18
Index type:           UInt32
Platform:             Linux-6.5.0-26-generic-x86_64-with-glibc2.35
Python:               3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2023.5.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.1
nest_asyncio:         1.5.8
numpy:                1.24.4
openpyxl:             <not installed>
pandas:               2.0.2
pyarrow:              14.0.2
pydantic:             1.10.13
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@robinvd robinvd added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Apr 2, 2024
@robinvd robinvd changed the title from "regression 0.20.18: ComputeError: filter predicate was not of type boolean with predicate_pushdown enabled" to "regression 0.20.17: ComputeError: filter predicate was not of type boolean with predicate_pushdown enabled" Apr 2, 2024
@cmdlineluser
Contributor

Can reproduce.

While debugging, I noticed that it also works when the bounds are wrapped in pl.lit:

in_x = (pl.lit(left) < center.struct.field("x")) & (center.struct.field("x") <= pl.lit(right))

df.lazy().filter(in_x).collect()

# shape: (1, 2)
# ┌───────┬───────┐
# │ x     ┆ y     │
# │ ---   ┆ ---   │
# │ f64   ┆ f64   │
# ╞═══════╪═══════╡
# │ 206.0 ┆ 225.0 │
# └───────┴───────┘

@robinvd
Author

robinvd commented Apr 2, 2024

bisect:

f50ff2ebb2e76c73c300fb7e12ac20e4b22c6024 is the first bad commit
commit f50ff2ebb2e76c73c300fb7e12ac20e4b22c6024
Author: Ritchie Vink <ritchie46@gmail.com>
Date:   Sat Mar 23 11:16:47 2024 +0100

    feat: Add IR for expressions. (#15168)

#15168

But this is a rather large MR and I'm not familiar with the codebase, so that is where my debugging ends.

@reswqa
Collaborator

reswqa commented Apr 3, 2024

struct.field does not follow the leftmost-name principle, which is somewhat problematic under the new Expr framework.

I think this should be fixed after #15430 (it's not confirmed, but I have a feeling it's probably the same cause), and @ritchie46 is looking at that one.

Update: It is actually not the same cause, but both are caused by the introduction of ExprIR.

@reswqa reswqa added P-medium Priority: medium A-optimizer Area: plan optimization and removed needs triage Awaiting prioritization by a maintainer labels Apr 3, 2024
@reswqa reswqa self-assigned this Apr 7, 2024
@reswqa reswqa added P-high Priority: high and removed P-medium Priority: medium labels Apr 8, 2024
@reswqa reswqa removed their assignment Apr 8, 2024
@robinvd
Author

robinvd commented Apr 8, 2024

Confirmed fixed on main. Thx
