Unexpected `index out of bounds` error for specific dataset and set of operations #16830

maxzw · 2024-06-09T08:48:53Z

Checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.

Reproducible example

With data of shape (872, 10):
repro_data.txt

Note: not all columns have operations performed on them, but they apparently need to be present for the error to occur!

import polars as pl

df = pl.DataFrame(
    data=data,
    schema={
        "group1": pl.Int16,
        "group2": pl.Int32,
        "val1": pl.Boolean,
        "val2": pl.Float64,
        "val3": pl.Boolean,
        "val4": pl.Float64,
        "val5": pl.Float64,
        "val6": pl.Int32,
        "val7": pl.Float64,
        "val8": pl.Float64,
    },
)

df = df.filter(pl.col("val1") | pl.col("val3"))
df = df.with_columns(pl.col("val4").max().over("group1", "group2").fill_null(0).alias("val4"))
df = df.filter(pl.col("val4") > pl.col("val7").sum().over("group1", "group2"))
df.with_columns(pl.col("val4").floor())

Log output

thread '<unnamed>' panicked at crates/polars-core/src/series/mod.rs:213:42:
index out of bounds: the len is 1 but the index is 1
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic_bounds_check
   3: polars_core::series::Series::select_chunk
   4: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
   5: polars_lazy::physical_plan::executors::stack::StackExec::execute_impl
   6: <polars_lazy::physical_plan::executors::stack::StackExec as polars_lazy::physical_plan::executors::executor::Executor>::execute
   7: polars_lazy::frame::LazyFrame::collect
   8: polars::lazyframe::PyLazyFrame::__pymethod_collect__
   9: pyo3::impl_::trampoline::trampoline
  10: polars::lazyframe::_::__INVENTORY::trampoline
  11: _method_vectorcall_VARARGS_KEYWORDS
  12: _call_function
  13: __PyEval_EvalFrameDefault
  14: __PyEval_Vector
  15: _method_vectorcall
  16: _call_function
  17: __PyEval_EvalFrameDefault
  18: __PyEval_Vector
  19: _call_function
  20: __PyEval_EvalFrameDefault
  21: __PyEval_Vector
  22: _builtin_exec
  23: _cfunction_vectorcall_FASTCALL
  24: _call_function
  25: __PyEval_EvalFrameDefault
  26: _gen_send_ex2
  27: __PyEval_EvalFrameDefault
  28: _gen_send_ex2
  29: __PyEval_EvalFrameDefault
  30: _gen_send_ex2
  31: _gen_send
  32: _method_vectorcall_O
  33: _call_function
  34: __PyEval_EvalFrameDefault
  35: __PyEval_Vector
  36: _call_function
  37: __PyEval_EvalFrameDefault
  38: __PyEval_Vector
  39: _call_function
  40: __PyEval_EvalFrameDefault
  41: __PyEval_Vector
  42: _method_vectorcall
  43: _PyVectorcall_Call
  44: __PyEval_EvalFrameDefault
  45: __PyEval_Vector
  46: _method_vectorcall
  47: _call_function
  48: __PyEval_EvalFrameDefault
  49: _gen_send_ex2
  50: __PyEval_EvalFrameDefault
  51: _gen_send_ex2
  52: __PyEval_EvalFrameDefault
  53: _gen_send_ex2
  54: __PyEval_EvalFrameDefault
  55: _gen_send_ex2
  56: __PyEval_EvalFrameDefault
  57: _gen_send_ex2
  58: __PyEval_EvalFrameDefault
  59: _gen_send_ex2
  60: _task_step_impl
  61: _task_step
  62: _task_wakeup
  63: _cfunction_vectorcall_O
  64: __PyObject_VectorcallTstate.4665
  65: _context_run
  66: _cfunction_vectorcall_FASTCALL_KEYWORDS
  67: __PyEval_EvalFrameDefault
  68: __PyEval_Vector
  69: _call_function
  70: __PyEval_EvalFrameDefault
  71: __PyEval_Vector
  72: _call_function
  73: __PyEval_EvalFrameDefault
  74: __PyEval_Vector
  75: _call_function
  76: __PyEval_EvalFrameDefault
  77: __PyEval_Vector
  78: _call_function
  79: __PyEval_EvalFrameDefault
  80: __PyEval_Vector
  81: _call_function
  82: __PyEval_EvalFrameDefault
  83: __PyEval_Vector
  84: _method_vectorcall
  85: _call_function
  86: __PyEval_EvalFrameDefault
  87: __PyEval_Vector
  88: _builtin_exec
  89: _cfunction_vectorcall_FASTCALL
  90: _call_function
  91: __PyEval_EvalFrameDefault
  92: __PyEval_Vector
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Issue description

The operations above panic with an unexpected `index out of bounds` error.

Converting to pandas and back anywhere within these operations resolves the issue:

df = df.filter(pl.col("val1") | pl.col("val3"))

df = pl.from_pandas(df.to_pandas(), schema_overrides=df.schema)

df = df.with_columns(pl.col("val4").max().over("group1", "group2").fill_null(0).alias("val4"))
df = df.filter(pl.col("val4") > pl.col("val7").sum().over("group1", "group2"))
df.with_columns(pl.col("val4").floor())

Expected behavior

I expect this error not to occur.

Installed versions

--------Version info---------
Polars:               0.20.31
Index type:           UInt32
Platform:             macOS-14.5-arm64-arm-64bit
Python:               3.10.0 (default, Mar  3 2022, 03:54:28) [Clang 12.0.0 ]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.2.1
connectorx:           <not installed>
deltalake:            0.17.4
fastexcel:            <not installed>
fsspec:               2023.9.0
gevent:               <not installed>
hvplot:               0.9.2
matplotlib:           3.7.2
nest_asyncio:         1.6.0
numpy:                1.26.2
openpyxl:             <not installed>
pandas:               2.1.2
pyarrow:              14.0.1
pydantic:             2.7.1
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.20
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>


maxzw · 2024-06-09T08:50:51Z

@ritchie46 here is a repro for #16605

cmdlineluser · 2024-06-09T10:04:55Z

Can reproduce.

If it is of use for debugging: It does not seem to happen using the Lazy API.

df = df.lazy()
df = df.filter(pl.col("val1") | pl.col("val3"))
df = df.with_columns(pl.col("val4").max().over("group1", "group2").fill_null(0).alias("val4"))
df = df.filter(pl.col("val4") > pl.col("val7").sum().over("group1", "group2"))
df.with_columns(pl.col("val4").floor()).collect()

# shape: (9, 10)
# ┌────────┬────────┬──────┬──────┬───┬───────┬──────┬──────────┬───────────┐
# │ group1 ┆ group2 ┆ val1 ┆ val2 ┆ … ┆ val5  ┆ val6 ┆ val7     ┆ val8      │
# │ ---    ┆ ---    ┆ ---  ┆ ---  ┆   ┆ ---   ┆ ---  ┆ ---      ┆ ---       │
# │ i64    ┆ i64    ┆ bool ┆ f64  ┆   ┆ f64   ┆ i64  ┆ f64      ┆ f64       │
# ╞════════╪════════╪══════╪══════╪═══╪═══════╪══════╪══════════╪═══════════╡
# │ 1001   ┆ 100004 ┆ true ┆ null ┆ … ┆ 87.0  ┆ 0    ┆ 2.705119 ┆ 40.904418 │
# │ 1001   ┆ 100007 ┆ true ┆ null ┆ … ┆ 173.0 ┆ 0    ┆ 2.6165   ┆ 34.486    │
# │ 1001   ┆ 100009 ┆ true ┆ null ┆ … ┆ 211.0 ┆ 0    ┆ 4.458603 ┆ 77.95037  │
# │ 1001   ┆ 100010 ┆ true ┆ null ┆ … ┆ 178.0 ┆ 0    ┆ 2.3165   ┆ 37.77     │
# │ 1001   ┆ 100011 ┆ true ┆ null ┆ … ┆ 174.0 ┆ 0    ┆ 5.548593 ┆ 71.207139 │
# │ 1001   ┆ 100012 ┆ true ┆ null ┆ … ┆ 196.0 ┆ 0    ┆ 2.1685   ┆ 32.888    │
# │ 1001   ┆ 100015 ┆ true ┆ null ┆ … ┆ 89.0  ┆ 0    ┆ 2.400406 ┆ 39.732588 │
# │ 1003   ┆ 100008 ┆ true ┆ null ┆ … ┆ 238.0 ┆ 0    ┆ 4.913397 ┆ 93.076396 │
# │ 1003   ┆ 100013 ┆ true ┆ null ┆ … ┆ 101.5 ┆ 0    ┆ 2.254043 ┆ 45.486928 │
# └────────┴────────┴──────┴──────┴───┴───────┴──────┴──────────┴───────────┘

stinodego · 2024-06-09T20:43:49Z

I cannot reproduce this 🤔

Elvynzs · 2024-06-10T08:42:28Z

Surprisingly I cannot reproduce using the given data/code, however I have the same issue.

I will try to find the time to make a minimal repro code for my case.

Elvynzs · 2024-06-10T10:11:31Z

Here it is. I was able to cut out a lot of the initial code:

import polars as pl
import numpy as np

df = pl.DataFrame(
    {
        "index_1": np.repeat(np.arange(100), 10),
        "index_2": np.repeat(np.arange(100), 10),
    }
)
df = pl.concat([df[0:500], df[500:]])
df = df.filter(df["index_1"] == 0)
df = df.with_columns(index_2=pl.Series(values=[0] * 10))
df.set_sorted("index_2")  # Also crashes on write_parquet and some other operations

It crashes for me (Windows 11).

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[9], line 8
      6 df = df.filter(df["index_1"] == 0)
      7 df = df.with_columns(index_2 = pl.Series(values=[0]*10))
----> 8 df.set_sorted("index_2")

File C:\...\polars\dataframe\frame.py:10674, in DataFrame.set_sorted(self, column, descending, *more_columns)
  10653 def set_sorted(
  10654     self,
  10655     column: str | Iterable[str],
  10656     *more_columns: str,
  10657     descending: bool = False,
  10658 ) -> DataFrame:
  10659     """
  10660     Indicate that one or multiple columns are sorted.
  10661 
   (...)
  10669         Whether the columns are sorted in descending order.
  10670     """
  10671     return (
  10672         self.lazy()
  10673         .set_sorted(column, *more_columns, descending=descending)
> 10674         .collect(_eager=True)
  10675     )

File C:\...\polars\lazyframe\frame.py:1967, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, no_optimization, streaming, background, _eager, **_kwargs)
   1964 # Only for testing purposes atm.
   1965 callback = _kwargs.get("post_opt_callback")
-> 1967 return wrap_df(ldf.collect(callback))

PanicException: index out of bounds: the len is 1 but the index is 1

stinodego · 2024-06-10T10:23:51Z

That one I can reproduce, thanks!

ritchie46 · 2024-06-10T10:25:17Z

Taking a look.

cmdlineluser · 2024-06-10T10:26:50Z

Perhaps the original issue could be platform specific? I can reproduce it on macOS (same as @maxzw).

@Elvynzs I can reproduce your example also.

It seems it may be a little different, and have to do with your use of Series.

Changing the filter to use expressions makes the example run for me:

df.filter(pl.col("index_1") == 0)

maxzw · 2024-06-10T15:18:39Z

@Elvynzs I'm not sure your issue is the same as the one in the description, but I'll check if the fix also works for mine 😃

cmdlineluser · 2024-06-10T17:49:15Z

The original issue no longer reproduces for me thanks to #16852

maxzw · 2024-06-11T11:28:36Z

I can confirm as well! Thanks @ritchie46! 💯

maxzw added the bug, needs triage, and python labels Jun 9, 2024

maxzw mentioned this issue Jun 9, 2024

Random encounters of pyo3_runtime.PanicException: index out of bounds after version 0.20.20 #16605

Closed

2 tasks

stinodego added the A-panic label Jun 9, 2024

stinodego added the P-medium label and removed the needs triage label Jun 10, 2024

ritchie46 self-assigned this Jun 10, 2024

ritchie46 mentioned this issue Jun 10, 2024

fix: Fix should_rechunk check #16852

Merged

ritchie46 closed this as completed in #16852 Jun 10, 2024

c-peters added the accepted label Jun 16, 2024

This was referenced Jun 18, 2024

Index out of bounds for unequal number of chunks across columns #17036

Closed

Index out of bounds panic caused by specific combination of window functions #17049

Closed

HCelion mentioned this issue Jun 20, 2024

Platform Dependent pyo3_runtime.PanicException #17089

Closed

2 tasks
