Lazy processing throws error when equivalent nonlazy code does not #14382

william-chu-github · 2024-02-08T22:26:23Z

Checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars

polars.Config.set_verbose(True)

DF = polars.DataFrame({
  "b": [1, 6, 8, 7]
})
DF2 = polars.DataFrame({
  "a": [1, 2, 4, 4],  "b": [True, True, True, True]
})
  
DF3 = DF2.lazy().\
  select(
    *["a", "b"], polars.lit("b").alias("b_name"),
    DF.get_column("b").alias("b_old")
  ).\
  filter(polars.col("b") == False).drop("b").collect()

Log output

File "C:\Program Files\Python38\lib\site-packages\polars\lazyframe\frame.py", line 1940, in collect
    return wrap_df(ldf.collect())

ComputeError: column 'b_old' not available in schema Schema:
name: a, data type: Int64
name: b, data type: Boolean

Issue description

Some lazy processing throws exceptions when the equivalent nonlazy code runs fine.

In the reproducible example above, if I remove the lazy processing, the code returns a dataframe with zero rows, as expected:

DF3 = DF2.\
  select(
    *["a", "b"], polars.lit("b").alias("b_name"),
    DF.get_column("b").alias("b_old")
  ).\
  filter(polars.col("b") == False).drop("b")

If I take the lazy code but do not process the filtering code or anything after, it runs fine:

DF3_nofilter = DF2.lazy().\
  select(
    *["a", "b"], polars.lit("b").alias("b_name"),
    DF.get_column("b").alias("b_old")
  ).collect()

DF3_nofilter is a dataframe in which b_old definitely exists (printed below), so I don't know why the original code complains that b_old isn't in the schema. More surprisingly, I don't see why it asks about b_old at all, since the filtering and dropping operations mention only b, not b_old.

shape: (4, 4)
┌─────┬──────┬────────┬───────┐
│ a   ┆ b    ┆ b_name ┆ b_old │
│ --- ┆ ---  ┆ ---    ┆ ---   │
│ i64 ┆ bool ┆ str    ┆ i64   │
╞═════╪══════╪════════╪═══════╡
│ 1   ┆ true ┆ b      ┆ 1     │
│ 2   ┆ true ┆ b      ┆ 6     │
│ 4   ┆ true ┆ b      ┆ 8     │
│ 4   ┆ true ┆ b      ┆ 7     │
└─────┴──────┴────────┴───────┘

Expected behavior

Output of the lazy and nonlazy processing should be the same, and the lazy processing should not throw an exception.

Installed versions

--------Version info---------
Polars:               0.20.7
Index type:           UInt32
Platform:             Windows-10-10.0.19041-SP0
Python:               3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.2.0
connectorx:           <not installed>
deltalake:            <not installed>
fsspec:               2023.4.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.6.2
numpy:                1.22.3
openpyxl:             <not installed>
pandas:               2.0.2
pyarrow:              11.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           1.4.45
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>


cmdlineluser · 2024-02-08T22:46:32Z

Just for debugging purposes: it runs with projection_pushdown disabled.

.collect(projection_pushdown=False)
# shape: (0, 3)
# ┌─────┬────────┬───────┐
# │ a   ┆ b_name ┆ b_old │
# │ --- ┆ ---    ┆ ---   │
# │ i64 ┆ str    ┆ i64   │
# ╞═════╪════════╪═══════╡
# └─────┴────────┴───────┘

I'm not sure if they are all the same issue, but there are a few similar problems currently open:

william-chu-github added the labels bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), and python (Related to Python Polars) on Feb 8, 2024

ritchie46 added the label P-high (Priority: high) and removed the label needs triage (Awaiting prioritization by a maintainer) on Feb 8, 2024

ritchie46 mentioned this issue Feb 12, 2024

fix: remove literal Series from projection state #14437

Merged

ritchie46 closed this as completed in #14437 Feb 12, 2024

AndriiG13 mentioned this issue Mar 5, 2024

Add Polars Builtin Check Tests unionai-oss/pandera#1518

Merged
