scan_parquet segfault on main #14714

Closed
2 tasks done
cmdlineluser opened this issue Feb 27, 2024 · 1 comment · Fixed by #14724
Labels: accepted (Ready for implementation), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@cmdlineluser
Contributor

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

pl.scan_parquet("foo.parquet").collect()

Log output

thread 'thread '<unnamed>' panicked at /Users/user/git/polars/crates/polars-utils/src/slice.rs:<unnamed>' panicked at /Users/user/git/polars/crates/polars-utils/src/slice.rs:92:18:
range end index 37 out of range for slice of length 16
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
92:18:
range end index 178 out of range for slice of length 16
thread '<unnamed>' panicked at /Users/user/git/polars/crates/polars-utils/src/slice.rs:92:18:
range end index 124 out of range for slice of length 16

Issue description

The full dataset is from: https://github.com/pypi-data/data/releases/download/2024-02-22-03-05/index-15.parquet (1.3GB)

Reading that file gives either a compute utf8 error or a segfault; which one occurs is non-deterministic.

import polars as pl

pl.scan_parquet("index-15.parquet").head(100_000).collect()
# utf8 error or segfault
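
Because the failure is non-deterministic and a segfault kills the interpreter, it can help to run the repro several times in child processes and record how each one ends. The following is only a sketch under stated assumptions: `run_repro`, `describe`, and `REPRO` are hypothetical names introduced here, not part of Polars, and the negative return code for a signal applies to POSIX platforms.

```python
import subprocess
import sys

# The repro from this report as a one-liner for `python -c`.
# Assumes index-15.parquet is in the current working directory.
REPRO = (
    "import polars as pl; "
    'pl.scan_parquet("index-15.parquet").head(100_000).collect()'
)


def run_repro(snippet, attempts=5):
    """Run `snippet` in fresh Python processes and return each exit code."""
    codes = []
    for _ in range(attempts):
        proc = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,  # keep panic output out of the parent's stderr
            text=True,
        )
        codes.append(proc.returncode)
    return codes


def describe(code):
    """Turn an exit code into a short label (POSIX: negative means a signal)."""
    if code == 0:
        return "ok"
    if code < 0:
        return f"killed by signal {-code}"
    return f"exited with status {code}"
```

Calling `[describe(c) for c in run_repro(REPRO)]` would then give one label per attempt. On POSIX systems a segfault normally shows up as a signal, while an uncaught Python exception shows up as a non-zero exit status.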

Using 0.20.10, I extracted a single row as a minimal repro and attached it as foo.parquet.zip.

Reading it produces range index errors and thread panics.

Expected behavior

Read file without error.

Installed versions

--------Version info---------
Polars:               0.20.11
Index type:           UInt32
Platform:             macOS-13.6.1-arm64-arm-64bit
Python:               3.11.6 (main, Nov  2 2023, 04:39:43) [Clang 14.0.3 (clang-1403.0.22.14.1)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           0.3.2
deltalake:            <not installed>
fsspec:               2023.6.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.2
numpy:                1.26.2
openpyxl:             <not installed>
pandas:               2.0.3
pyarrow:              12.0.1
pydantic:             2.5.2
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             0.8.1
xlsxwriter:           3.1.9
cmdlineluser added the bug, needs triage, and python labels on Feb 27, 2024
@ritchie46
Member

Found the culprit.

c-peters added the accepted label on Mar 4, 2024

3 participants