shift_and_fill on list column with over raises PanicException #4310

Closed
cbilot opened this issue Aug 8, 2022 · 2 comments · Fixed by #4314 or #4390

cbilot commented Aug 8, 2022

What language are you using?

Python

Have you tried latest version of polars?

yes

What version of polars are you using?

0.13.62

What operating system are you using polars on?

Linux Mint 20.3

What language version are you using

python 3.10.4

Describe your bug.

Using shift_and_fill on a column of type list along with over causes a PanicException.

I've also included an MWE that combines shift and over on a list column and can cause a segmentation fault. (I'm guessing the two problems are related.)

What are the steps to reproduce the behavior?

Let's start with this data:

import polars as pl

df = pl.DataFrame(
    {
        "col_int": [1, 1, 2, 2],
        "col_list": [[1], [1], [2], [2]],
    }
)
df
shape: (4, 2)
┌─────────┬───────────┐
│ col_int ┆ col_list  │
│ ---     ┆ ---       │
│ i64     ┆ list[i64] │
╞═════════╪═══════════╡
│ 1       ┆ [1]       │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1       ┆ [1]       │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       │
└─────────┴───────────┘

Running this query raises a PanicException.

(
    df
    .with_column(
        pl.col('col_list').shift_and_fill(1, [])
        .over('col_int')
        .alias('list_shifted')
    )
)
thread '<unnamed>' panicked at 'implementation error, cannot get ref List(Null) from Int64', /github/workspace/polars/polars-core/src/series/mod.rs:945:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/home/corey/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 4144, in with_column
    return self.with_columns([column])
  File "/home/corey/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 5308, in with_columns
    self.lazy()
  File "/home/corey/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/lazy_frame.py", line 652, in collect
    return self._dataframe_class._from_pydf(ldf.collect())
pyo3_runtime.PanicException: implementation error, cannot get ref List(Null) from Int64

If we use shift_and_fill without over, we get no error.

(
    df
    .with_column(
        pl.col('col_list').shift_and_fill(1, [])
        .alias('list_shifted')
    )
)
shape: (4, 3)
┌─────────┬───────────┬──────────────┐
│ col_int ┆ col_list  ┆ list_shifted │
│ ---     ┆ ---       ┆ ---          │
│ i64     ┆ list[i64] ┆ list[i64]    │
╞═════════╪═══════════╪══════════════╡
│ 1       ┆ [1]       ┆ []           │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1       ┆ [1]       ┆ [1]          │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       ┆ [1]          │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       ┆ [2]          │
└─────────┴───────────┴──────────────┘

Different Fill Values

Other fill values in shift_and_fill (None and [0] below) raise the same PanicException.

>>> (
...     df
...     .with_column(
...         pl.col('col_list').shift_and_fill(1, None)
...         .over('col_int')
...         .alias('list_shifted')
...     )
... 
... )
thread '<unnamed>' panicked at 'implementation error, cannot get ref List(Null) from Int64', /github/workspace/polars/polars-core/src/series/mod.rs:945:13
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/home/corey/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 4144, in with_column
    return self.with_columns([column])
  File "/home/corey/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 5308, in with_columns
    self.lazy()
  File "/home/corey/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/lazy_frame.py", line 652, in collect
    return self._dataframe_class._from_pydf(ldf.collect())
pyo3_runtime.PanicException: implementation error, cannot get ref List(Null) from Int64

>>> (
...     df
...     .with_column(
...         pl.col('col_list').shift_and_fill(1, [0])
...         .over('col_int')
...         .alias('list_shifted')
...     )
... 
... )
thread '<unnamed>' panicked at 'implementation error, cannot get ref List(Null) from Int64', /github/workspace/polars/polars-core/src/series/mod.rs:945:13
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/home/corey/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 4144, in with_column
    return self.with_columns([column])
  File "/home/corey/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 5308, in with_columns
    self.lazy()
  File "/home/corey/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/lazy_frame.py", line 652, in collect
    return self._dataframe_class._from_pydf(ldf.collect())
pyo3_runtime.PanicException: implementation error, cannot get ref List(Null) from Int64

shift can cause a seg fault

This query using shift instead of shift_and_fill does not lead to an error.

(
    df
    .with_column(
        pl.col('col_list').shift(1)
        .over('col_int')
        .alias('list_shifted')
    )
)
shape: (4, 3)
┌─────────┬───────────┬──────────────┐
│ col_int ┆ col_list  ┆ list_shifted │
│ ---     ┆ ---       ┆ ---          │
│ i64     ┆ list[i64] ┆ list[i64]    │
╞═════════╪═══════════╪══════════════╡
│ 1       ┆ [1]       ┆ null         │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1       ┆ [1]       ┆ [1]          │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       ┆ null         │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       ┆ [2]          │
└─────────┴───────────┴──────────────┘
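For reference, the per-group semantics that the table above shows can be sketched in plain Python. This is only a model of the expected result, not how polars implements shift or over; the helper name shift_over and its parameters are hypothetical and exist only for this illustration.

```python
import copy
from collections import defaultdict
from typing import Any, Hashable, Sequence


def shift_over(
    values: Sequence[Any],
    groups: Sequence[Hashable],
    periods: int = 1,
    fill: Any = None,
) -> list[Any]:
    """Shift `values` forward by `periods` rows within each group.

    Hypothetical reference model of `shift(...).over(...)`; only
    non-negative `periods` are handled.
    """
    if periods < 0:
        raise ValueError("this sketch only handles periods >= 0")

    # Collect the row positions that belong to each group, in row order.
    positions: dict[Hashable, list[int]] = defaultdict(list)
    for row, key in enumerate(groups):
        positions[key].append(row)

    # Start with the fill value everywhere (copied so slots do not alias).
    out: list[Any] = [copy.deepcopy(fill) for _ in values]
    for rows in positions.values():
        # The i-th row of a group receives the value of that group's (i - periods)-th row.
        for offset in range(periods, len(rows)):
            out[rows[offset]] = values[rows[offset - periods]]
    return out


col_int = [1, 1, 2, 2]
col_list = [[1], [1], [2], [2]]

print(shift_over(col_list, col_int))           # [None, [1], None, [2]]
print(shift_over(col_list, col_int, fill=[]))  # [[], [1], [], [2]]
```

The first call reproduces the list_shifted column above. The second is the flat list[i64] shape that shift_and_fill(1, []) with over would be expected to return if it did not panic.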

However, a query like the following, which includes a sort and a comparison between list columns, can sometimes cause a segmentation fault. (Sometimes it just raises an error.)

>>> (
...     df
...     .sort('col_int')
...     .with_column(
...         (pl.col('col_list').shift(1) != pl.col('col_list'))
...         .over('col_int')
...         .alias('list_shifted')
...     )
... 
... )
Fatal Python error: Segmentation fault

Current thread 0x00007f41bcc48740 (most recent call first):
  File "/home/corey/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/lazy_frame.py", line 652 in collect
  File "/home/corey/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 5308 in with_columns
  File "/home/corey/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 4144 in with_column
  File "<stdin>", line 4 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pyarrow.lib, pyarrow._hdfsio, pyarrow._compute, pyarrow._parquet, pyarrow._fs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.strptime, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pandas._libs.ops, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, pyarrow._csv, pyarrow._feather (total: 64)
Segmentation fault (core dumped)
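Because a segmentation fault kills the whole interpreter, one way to reproduce it safely is to run the query in a child process and inspect the exit status. The sketch below uses only the standard library; run_isolated, describe, and QUERY are hypothetical names, and QUERY uses the with_column API exactly as it appears in this report (0.13.62), which may not exist in other polars versions.

```python
import subprocess
import sys

# The query from this report, kept as a string so it only runs in the child.
QUERY = """
import polars as pl

df = pl.DataFrame({"col_int": [1, 1, 2, 2], "col_list": [[1], [1], [2], [2]]})
print(
    df.sort("col_int").with_column(
        (pl.col("col_list").shift(1) != pl.col("col_list"))
        .over("col_int")
        .alias("list_shifted")
    )
)
"""


def run_isolated(code: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh Python interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe(result: subprocess.CompletedProcess) -> str:
    """Summarize how the child process ended."""
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # On POSIX a negative return code means the child was killed by a signal.
        return f"killed by signal {-result.returncode}"
    return f"exited with status {result.returncode}"
```

Calling describe(run_isolated(QUERY)) several times would then distinguish the intermittent crash (on Linux, SIGSEGV is signal 11) from the runs that only raise an exception, without taking down the calling session.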

Other Notes

The above is based on a Stack Overflow question.

cbilot added the bug (Something isn't working) label on Aug 8, 2022

cbilot commented Aug 9, 2022

Unfortunately, the fix in 0.14.0 has introduced another bug. When some of the examples above are rerun, shifting a list column with over now returns a list of lists (list[list[i64]]) instead of list[i64].

(
    df
    .with_column(
        pl.col('col_list').shift(1)
        .over('col_int')
        .alias('list_shifted')
    )
)
shape: (4, 3)
┌─────────┬───────────┬─────────────────┐
│ col_int ┆ col_list  ┆ list_shifted    │
│ ---     ┆ ---       ┆ ---             │
│ i64     ┆ list[i64] ┆ list[list[i64]] │
╞═════════╪═══════════╪═════════════════╡
│ 1       ┆ [1]       ┆ [null, [1]]     │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1       ┆ [1]       ┆ [null, [1]]     │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       ┆ [null, [2]]     │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       ┆ [null, [2]]     │
└─────────┴───────────┴─────────────────┘
(
    df
    .with_column(
        pl.col('col_list').shift_and_fill(1, [])
        .over('col_int')
        .alias('list_shifted')
    )
)
shape: (4, 3)
┌─────────┬───────────┬─────────────────┐
│ col_int ┆ col_list  ┆ list_shifted    │
│ ---     ┆ ---       ┆ ---             │
│ i64     ┆ list[i64] ┆ list[list[i64]] │
╞═════════╪═══════════╪═════════════════╡
│ 1       ┆ [1]       ┆ [[], [1]]       │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1       ┆ [1]       ┆ [[], [1]]       │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       ┆ [[], [2]]       │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       ┆ [[], [2]]       │
└─────────┴───────────┴─────────────────┘
>>> pl.__version__
'0.14.0'
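One way to read the 0.14.0 output is that every row receives its group's entire shifted list, rather than only its own element of it. The plain-Python sketch below models both shapes for comparison. It describes the observed output only and makes no claim about polars internals; all function names are hypothetical.

```python
col_int = [1, 1, 2, 2]
col_list = [[1], [1], [2], [2]]


def shifted_groups(values, groups, fill=None):
    """Map each group key to that group's values shifted forward by one row."""
    per_group = {}
    for key, value in zip(groups, values):
        per_group.setdefault(key, []).append(value)
    return {key: [fill] + vals[:-1] for key, vals in per_group.items()}


def broadcast_whole_group(values, groups, fill=None):
    """Shape seen in 0.14.0: each row gets its group's whole shifted list."""
    shifted = shifted_groups(values, groups, fill)
    return [shifted[key] for key in groups]


def scatter_back_to_rows(values, groups, fill=None):
    """Expected shape: each row gets only its own element of the shifted group."""
    shifted = shifted_groups(values, groups, fill)
    cursor = {key: 0 for key in shifted}  # next unread element per group
    out = []
    for key in groups:
        out.append(shifted[key][cursor[key]])
        cursor[key] += 1
    return out


print(broadcast_whole_group(col_list, col_int))
# [[None, [1]], [None, [1]], [None, [2]], [None, [2]]]
print(scatter_back_to_rows(col_list, col_int))
# [None, [1], None, [2]]
```

The first result matches the list[list[i64]] column reported for 0.14.0; the second matches the list[i64] column that 0.13.62 returned for shift(1) with over.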

Edit

I should have noted that shift and shift_and_fill continue to work correctly without the over.

(
    df
    .with_column(
        pl.col('col_list').shift(1)
        .alias('list_shifted')
    )
)
shape: (4, 3)
┌─────────┬───────────┬──────────────┐
│ col_int ┆ col_list  ┆ list_shifted │
│ ---     ┆ ---       ┆ ---          │
│ i64     ┆ list[i64] ┆ list[i64]    │
╞═════════╪═══════════╪══════════════╡
│ 1       ┆ [1]       ┆ null         │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1       ┆ [1]       ┆ [1]          │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       ┆ [1]          │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       ┆ [2]          │
└─────────┴───────────┴──────────────┘
(
    df
    .with_column(
        pl.col('col_list').shift_and_fill(1, [])
        .alias('list_shifted')
    )
)
shape: (4, 3)
┌─────────┬───────────┬──────────────┐
│ col_int ┆ col_list  ┆ list_shifted │
│ ---     ┆ ---       ┆ ---          │
│ i64     ┆ list[i64] ┆ list[i64]    │
╞═════════╪═══════════╪══════════════╡
│ 1       ┆ [1]       ┆ []           │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1       ┆ [1]       ┆ [1]          │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       ┆ [1]          │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2       ┆ [2]       ┆ [2]          │
└─────────┴───────────┴──────────────┘

I discovered this as I was about to update the Stack Overflow question.


cbilot commented Aug 14, 2022

For the record: the Stack Overflow question has been updated to indicate that this is resolved.
