The query crashes when there are no rows with colA == "foo". Without the filter, everything works.
Log output
In [2]: pl.scan_parquet('/tmp/demo1/inventory/**/*.parquet').filter(pl.col.hostname=="foo").select(['namespace', 'hostname', 'timestamp']).collect()
thread '<unnamed>' panicked at crates/polars-lazy/src/physical_plan/executors/scan/parquet.rs:305:37:
index out of bounds: the len is 0 but the index is 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
PanicException Traceback (most recent call last)
Cell In[2], line 1
----> 1 pl.scan_parquet('/tmp/demo1/inventory/**/*.parquet').filter(pl.col.hostname=="foo").select(['namespace', 'hostname', 'timestamp']).collect()
File ~/work/stardust/enterprise/.venv/lib/python3.11/site-packages/polars/utils/deprecation.py:100, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
95 @wraps(function)
96 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
97 _rename_keyword_argument(
98 old_name, new_name, kwargs, function.__name__, version
99 )
--> 100 return function(*args, **kwargs)
File ~/work/stardust/enterprise/.venv/lib/python3.11/site-packages/polars/lazyframe/frame.py:1788, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, no_optimization, streaming, _eager)
1775 comm_subplan_elim = False
1777 ldf = self._ldf.optimization_toggle(
1778 type_coercion,
1779 predicate_pushdown,
(...)
1786 _eager,
1787 )
-> 1788 return wrap_df(ldf.collect())
PanicException: index out of bounds: the len is 0 but the index is 0
Issue description
In a deeply nested parquet folder, if I call pl.scan_parquet on the top-level directory and follow it with a filter that doesn't select any rows, collect() crashes with the exception reported in the log. If the filter selects any rows, or if no filter is applied, everything works.
Expected behavior
An empty DataFrame should be returned, not a panic exception.
I have a public repository that you can use to reproduce this: https://github.com/netenglabs/suzieq/tree/develop/tests/data/parquet
From the root of that repository, you can run pl.scan_parquet('tests/data/parquet/inventory/**/*.parquet').filter(pl.col.namespace=="foo").select(['namespace', 'hostname', 'timestamp']).collect()
Checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Reproducible example
not a fully reproducible example
Installed versions