Incorrect ColumnNotFound panic, which occurs only for LazyFrames #16474

Closed
2 tasks done
kevinli1993 opened this issue May 24, 2024 · 2 comments
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@kevinli1993

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

ds = pl.DataFrame(dict(
    ident = ['X', 'D', 'B', 'D', 'A', 'V', 'M', 'M', 'W', 'C'],
    x = [14.55, 4.79, -203.57, 151.92, 1.02, -106.23, -4.01, -39.06, -15.31, 195.12],
    y = [72.64, 42.4, -174.43, -213.49, -132.56, 4.14, 44.36, -120.12, -69.31, -182.67],
    z = [97.54, 96.53, 18.0, 16.93, 141.23, -125.27, 69.95, 194.77, 37.53, 43.02],
    w = [0, 0, 0, 0, 1, 0, 1, 1, 1, 0]
))

ds = ds.lazy()

ds = (
    ds
    .with_columns(pl.col("x").truediv(100).alias("XX"))
    .with_columns((pl.col("y") - pl.col("XX")).abs().alias("YY"))
    .with_columns((pl.col("y") - pl.col("z")).abs().alias("ZZ"))
)

filtering_ds = (
    ds
    .filter(pl.col("w").eq(0))
    .select(pl.col("ident"), pl.lit(True).alias("found"))
    .unique()
)

# This will crash!
ds.join(filtering_ds, on="ident", how="left").collect()


# This will work.
A = ds.collect()
B = filtering_ds.collect()
A.join(B, on="ident", how="left")

Log output

found multiple sources; run comm_subplan_elim
thread '<unnamed>' panicked at crates/polars-plan/src/utils.rs:371:79:
called `Result::unwrap()` on an `Err` value: ColumnNotFound(ErrString("ZZ"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "..../asdf.py", line 28, in <module>
    ds.join(filtering_ds, on="ident", how="left").collect()
  File "..../python3.12/site-packages/polars/lazyframe/frame.py", line 1837, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: ColumnNotFound(ErrString("ZZ"))



--- here is the result with RUST_BACKTRACE=full ---
thread '<unnamed>' panicked at crates/polars-plan/src/utils.rs:371:79:
called `Result::unwrap()` on an `Err` value: ColumnNotFound(ErrString("ZZ"))
stack backtrace:
   0:        0x10ca9524c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h0d2dc83dc1a2d180
   1:        0x10ab380d8 - core::fmt::write::hafaf83c1f13acb2d
   2:        0x10ca6e794 - std::io::Write::write_fmt::h99a5e7783791b568
   3:        0x10ca98a40 - std::sys_common::backtrace::print::hdef7d6c5b6479962
   4:        0x10ca98358 - std::panicking::default_hook::{{closure}}::hd93f15c5fd187f2b
   5:        0x10ca99cf4 - std::panicking::rust_panic_with_hook::h5ec40fc780130b7e
   6:        0x10ca98d58 - std::panicking::begin_panic_handler::{{closure}}::hf0062a47097d69cb
   7:        0x10ca98cc0 - std::sys_common::backtrace::__rust_end_short_backtrace::hac3ff190b7c839f5
   8:        0x10ca98cb4 - _rust_begin_unwind
   9:        0x10cbf2514 - core::panicking::panic_fmt::h7ad8ed088f78f191
  10:        0x10cbf2864 - core::result::unwrap_failed::hd6ce9f195c9c6386
  11:        0x10c661a7c - polars_plan::utils::expr_irs_to_schema::hed11efd314d0ed7f
  12:        0x10c6db67c - polars_plan::logical_plan::builder_ir::IRBuilder::project::h41ce0860157111a9
  13:        0x10c70b12c - polars_plan::logical_plan::optimizer::projection_pushdown::ProjectionPushDown::finish_node::h9fa78021255f2866
  14:        0x10c7031a8 - polars_plan::logical_plan::optimizer::projection_pushdown::projection::process_projection::he5aff3a468de5017
  15:        0x10c6fd8e4 - polars_plan::logical_plan::optimizer::projection_pushdown::ProjectionPushDown::push_down::{{closure}}::h1321b269310b0be8
  16:        0x10c6fbda4 - polars_plan::logical_plan::optimizer::projection_pushdown::ProjectionPushDown::push_down::h6d28299197aa65a5
  17:        0x10c7047ac - polars_plan::logical_plan::optimizer::projection_pushdown::ProjectionPushDown::pushdown_and_assign::h27dc35079bad0302
  18:        0x10c703184 - polars_plan::logical_plan::optimizer::projection_pushdown::projection::process_projection::he5aff3a468de5017
  19:        0x10c6fd8e4 - polars_plan::logical_plan::optimizer::projection_pushdown::ProjectionPushDown::push_down::{{closure}}::h1321b269310b0be8
  20:        0x10c6fbda4 - polars_plan::logical_plan::optimizer::projection_pushdown::ProjectionPushDown::push_down::h6d28299197aa65a5
  21:        0x10c6fbbac - polars_plan::logical_plan::optimizer::projection_pushdown::ProjectionPushDown::optimize::h8b4880f6a562e96e
  22:        0x10c716e14 - polars_plan::logical_plan::optimizer::cache_states::set_cache_states::hdb939b11449a5a9d
  23:        0x10c7124dc - polars_plan::logical_plan::optimizer::optimize::h215c5fd39715abf6
  24:        0x10b8fe030 - polars_lazy::frame::LazyFrame::optimize_with_scratch::h81c39270ae1a2a85
  25:        0x10b976b44 - polars_lazy::frame::LazyFrame::collect::hf6a4525f2adaf417
  26:        0x10aa25594 - polars::lazyframe::PyLazyFrame::__pymethod_collect__::h9775d9deb0a33ddb
  27:        0x10a431ba8 - pyo3::impl_::trampoline::trampoline::h18f3c6ddaad7db28
  28:        0x10aa37298 - polars::lazyframe::_::__INVENTORY::trampoline::h753d29a406a94190
  29:        0x1055d4ffc - _method_vectorcall_VARARGS_KEYWORDS.llvm.4634403487862285446
  30:        0x10552638c - __PyEval_EvalFrameDefault
  31:        0x1055b1a1c - _PyEval_EvalCode
  32:        0x1055fc7fc - _run_eval_code_obj.llvm.17966673262612444252
  33:        0x1054ad79c - _run_mod.llvm.17966673262612444252
  34:        0x10554d99c - _pyrun_file
  35:        0x1054e26ac - __PyRun_SimpleFileObject
  36:        0x1055f8ee0 - __PyRun_AnyFileObject
  37:        0x1054ee5a4 - _pymain_run_file_obj
  38:        0x105560e30 - _pymain_run_file
  39:        0x105624108 - _Py_RunMain
  40:        0x1054ee328 - _Py_BytesMain

Issue description

Joining two LazyFrames panics with a ColumnNotFound error for the column "ZZ". However, if I .collect() both frames first and join the resulting DataFrames, the query works, which shows that "ZZ" does exist in the first frame.

Expected behavior

ds.join(filtering_ds, on="ident", how="left").collect()

does not crash and gives the same (up to ordering of rows) result as

A = ds.collect()
B = filtering_ds.collect()
A.join(B, on="ident", how="left")

Installed versions

--------Version info---------
Polars:               0.20.29
Index type:           UInt32
Platform:             macOS-14.4.1-arm64-arm-64bit
Python:               3.12.3 (main, Apr 12 2024, 17:16:04) [Clang 15.0.0 (clang-1500.1.0.2.5)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.5.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              16.1.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

kevinli1993 added the bug, needs triage, and python labels on May 24, 2024

@cmdlineluser
Contributor

@kevinli1993
Author

Thanks @cmdlineluser (I'm very impressed, btw!). I can confirm this panic does not occur on a build from main.
