Index out of bounds panic caused by specific combination of window functions #17049

Closed
2 tasks done
postatum opened this issue Jun 18, 2024 · 2 comments
Labels
A-panic Area: code that results in panic exceptions bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@postatum

postatum commented Jun 18, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

# Create test df
df = pl.DataFrame(
    {
        "age": [[x * y for x in range(100)] for y in range(100)],
        "id": [x for x in range(100)],
        "other": [x for x in range(100)],
    }
)
# Extend df with itself to have >1 chunks
df2 = df.extend(df)

# Run operations which start the problem
df3 = (
    df2.with_columns(
        pl.col("other").over("id", mapping_strategy="join").alias("other"),
        non_null_count=pl.col("other").count().over("id"),
    )
    .filter(pl.col("id").rank(method="ordinal").over(pl.col("id")) == 1)
)

# Run command which throws an error
df3.filter(pl.col("non_null_count") > 1)

Log output

dataframe filtered
thread '<unnamed>' panicked at crates/polars-core/src/series/mod.rs:213:42:
index out of bounds: the len is 1 but the index is 1
stack backtrace:
   0:        0x1165fd22c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h5b162cab46f344a5
   1:        0x113e8d71b - core::fmt::write::h4a73583a3886d3b0
   2:        0x1165ce8e2 - std::io::Write::write_fmt::h8846f8d604484bad
   3:        0x116601731 - std::sys_common::backtrace::print::h7eceb11702f657b6
   4:        0x116600ebc - std::panicking::default_hook::{{closure}}::he179a4d2e5ce811d
   5:        0x116602e45 - std::panicking::rust_panic_with_hook::hbfe888ce2af6ee0d
   6:        0x116601a42 - std::panicking::begin_panic_handler::{{closure}}::h2461a6874e053e43
   7:        0x116601999 - std::sys_common::backtrace::__rust_end_short_backtrace::h1c49106eba8c7b96
   8:        0x116601986 - _rust_begin_unwind
   9:        0x11677f5b2 - core::panicking::panic_fmt::ha4b3f782c24c0530
  10:        0x11677f636 - core::panicking::panic_bounds_check::h92ee9ff9209a1b91
  11:        0x114b33927 - polars_core::series::Series::select_chunk::h9c252054c6300484
  12:        0x11503f68b - <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter::heb9c5fbe5f79530f
  13:        0x1151dc0b5 - <polars_lazy::physical_plan::executors::filter::FilterExec as polars_lazy::physical_plan::executors::executor::Executor>::execute::{{closure}}::h232f4558b6ea7fa1
  14:        0x1151dbbf6 - <polars_lazy::physical_plan::executors::filter::FilterExec as polars_lazy::physical_plan::executors::executor::Executor>::execute::h4593abdde8170cbc
  15:        0x11502d2ca - polars_lazy::frame::LazyFrame::collect::hc4dc245c44bac139
  16:        0x113d0892a - polars::lazyframe::PyLazyFrame::__pymethod_collect__::h9e06c421f3a29efa
  17:        0x11353526b - pyo3::impl_::trampoline::trampoline::h6090baf2882c9518
  18:        0x113d1ea98 - polars::lazyframe::_::__INVENTORY::trampoline::ha27f82bd914b0bf3
  19:        0x10da060f7 - _method_vectorcall_VARARGS_KEYWORDS
  20:        0x10dadb9b4 - _call_function
  21:        0x10dad779a - __PyEval_EvalFrameDefault
  22:        0x10dad21e1 - __PyEval_Vector
  23:        0x10d9ff49f - _method_vectorcall
  24:        0x10dadb9b4 - _call_function
  25:        0x10dad8581 - __PyEval_EvalFrameDefault
  26:        0x10dad21e1 - __PyEval_Vector
  27:        0x10dadb9b4 - _call_function
  28:        0x10dad779a - __PyEval_EvalFrameDefault
  29:        0x10dad21e1 - __PyEval_Vector
  30:        0x10dad2138 - _PyEval_EvalCode
  31:        0x10db1f923 - _run_mod
  32:        0x10db1da2e - __PyRun_SimpleFileObject
  33:        0x10db1d49b - __PyRun_AnyFileObject
  34:        0x10db3bf51 - _Py_RunMain
  35:        0x10db3c3a0 - _pymain_main
  36:        0x10db3c3fb - _Py_BytesMain
Traceback (most recent call last):
  File "/Users/artem/Downloads/error.py", line 28, in <module>
    df3.filter(pl.col("non_null_count") > 1)
  File "/Users/artem/.virtualenvs/proj1/lib/python3.10/site-packages/polars/dataframe/frame.py", line 4268, in filter
    return self.lazy().filter(*predicates, **constraints).collect(_eager=True)
  File "/Users/artem/.virtualenvs/proj1/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 1967, in collect
    return wrap_df(ldf.collect(callback))
pyo3_runtime.PanicException: index out of bounds: the len is 1 but the index is 1

Issue description

After applying a specific combination of window functions to a dataframe, filtering that dataframe by any column panics with an index-out-of-bounds error. Calling "DataFrame.rechunk()" before the last filter works around the issue.

This started happening in version 0.20.22. It happens with both the "polars" and "polars-lts-cpu" packages.

This may be irrelevant, but if you inspect "n_chunks" of the columns before the last filter, you'll notice that the most recently added columns have only 1 chunk, while the dataframe reports more than 1 chunk.

Expected behavior

The error is not thrown.

Installed versions

--------Version info---------
Polars:               0.20.31
Index type:           UInt32
Platform:             macOS-14.5-x86_64-i386-64bit
Python:               3.10.12 (main, Mar  4 2024, 12:35:22) [Clang 15.0.0 (clang-1500.0.40.1)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                1.23.5
openpyxl:             <not installed>
pandas:               1.5.3
pyarrow:              8.0.0
pydantic:             1.10.15
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           1.4.37
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           3.2.0
@postatum postatum added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Jun 18, 2024
@coastalwhite coastalwhite added the A-panic Area: code that results in panic exceptions label Jun 18, 2024
@cmdlineluser
Contributor

I think this is another form of #16830 which is fixed on main.

Perhaps you can test the latest pre-release version to confirm.

>>> pl.__version__
... df3.filter(pl.col("non_null_count") > 1)

# '1.0.0-beta.1'
# shape: (100, 4)
# ┌─────────────────┬─────┬───────────┬────────────────┐
# │ age             ┆ id  ┆ other     ┆ non_null_count │
# │ ---             ┆ --- ┆ ---       ┆ ---            │
# │ list[i64]       ┆ i64 ┆ list[i64] ┆ u32            │
# ╞═════════════════╪═════╪═══════════╪════════════════╡
# │ [0, 0, … 0]     ┆ 0   ┆ [0, 0]    ┆ 2              │
# │ [0, 1, … 99]    ┆ 1   ┆ [1, 1]    ┆ 2              │
# │ [0, 2, … 198]   ┆ 2   ┆ [2, 2]    ┆ 2              │
# │ [0, 3, … 297]   ┆ 3   ┆ [3, 3]    ┆ 2              │
# │ [0, 4, … 396]   ┆ 4   ┆ [4, 4]    ┆ 2              │
# │ …               ┆ …   ┆ …         ┆ …              │
# │ [0, 95, … 9405] ┆ 95  ┆ [95, 95]  ┆ 2              │
# │ [0, 96, … 9504] ┆ 96  ┆ [96, 96]  ┆ 2              │
# │ [0, 97, … 9603] ┆ 97  ┆ [97, 97]  ┆ 2              │
# │ [0, 98, … 9702] ┆ 98  ┆ [98, 98]  ┆ 2              │
# │ [0, 99, … 9801] ┆ 99  ┆ [99, 99]  ┆ 2              │
# └─────────────────┴─────┴───────────┴────────────────┘

@postatum
Author

postatum commented Jun 18, 2024

@cmdlineluser I've just tried this on 1.0.0-beta.1 and the issue seems to be resolved. I swear the trace used to end in the chunks.rs file, which is why I couldn't find a similar issue. Thanks!

3 participants