Streaming Left Join Fails in Recent Version #14863
Thank you for the issue report. Can you make a minimal reproducible example? Remove all code that doesn't influence the result, and ideally reproduce it from an in-memory table. If that isn't possible, provide the code that generates the file. Currently a file is missing and I cannot reproduce the issue.
I'll work on putting together an example that will compile for you.
parquet_files.tgz
Seems to be happening because …

Minimal repro:

```python
import polars as pl

pl.LazyFrame(schema=["a", "b", "c"]).cast(pl.String).sink_parquet("empty.parquet")

pl.LazyFrame({"a": ["uno"]}).join(
    pl.scan_parquet("empty.parquet"),
    on="a"
).collect()
# shape: (0, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ str ┆ str ┆ str │
# ╞═════╪═════╪═════╡
# └─────┴─────┴─────┘

pl.LazyFrame({"a": ["uno"]}).join(
    pl.scan_parquet("empty.parquet"),
    on="a"
).collect(streaming=True)
# PanicException: called `Option::unwrap()` on a `None` value
```
Thanks @cmdlineluser!
I would like to take this on, please.
@ritchie46 @tzehaoo, has either of you had a chance to look deeper into this? Could it be related to the updated Parquet file handling library that was introduced?
Reproducible example
Log output
Issue description
The above code works flawlessly in Polars v0.34.2. While upgrading our Polars version to the recent v0.38.1, I noticed this issue, so I went back and upgraded one version at a time to find where it was introduced. It looks like it was introduced in v0.35.4, when changes were made to joins. Specifically, the issue is caused by this left join, which uses multiple columns...
If I comment that line out, the query sinks correctly. Note that the Parquet file being left-joined is empty most of the time or has only a few rows in it.
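For comparison, here is a plain-Python sketch of what a left join on multiple key columns should return when the right-hand side is empty: every left row survives, and the right-only columns are filled with nulls. This is a hypothetical hash-join illustration, not Polars' implementation or API. `left_join`, `right_cols`, and the dict-per-row representation are assumptions made for the example. It assumes null keys never match, which is Polars' default, and it ignores column-name collisions on non-key columns, which Polars resolves with a suffix rather than by overwriting.

```python
from typing import Any

Row = dict[str, Any]


def left_join(
    left: list[Row],
    right: list[Row],
    on: list[str],
    right_cols: list[str],
) -> list[Row]:
    """Hypothetical reference left join on multiple key columns (not Polars API).

    `right_cols` is passed explicitly because an empty `right` has no rows
    from which its columns could be inferred.
    """
    # Hash the right side by the tuple of key values; rows with a null key are
    # skipped because null keys never match (mirroring Polars' default).
    index: dict[tuple, list[Row]] = {}
    for rrow in right:
        key = tuple(rrow[c] for c in on)
        if any(v is None for v in key):
            continue
        index.setdefault(key, []).append(rrow)

    # Right-only columns that get appended to each left row.
    extra = [c for c in right_cols if c not in on]

    out: list[Row] = []
    for lrow in left:
        key = tuple(lrow[c] for c in on)
        matches = index.get(key, [])
        if matches:
            # One output row per matching right row.
            for rrow in matches:
                out.append({**lrow, **{c: rrow[c] for c in extra}})
        else:
            # No match (always the case for an empty right side):
            # keep the left row and fill the right-only columns with None.
            out.append({**lrow, **{c: None for c in extra}})
    return out


left = [{"a": "uno", "b": "x", "v": 1}]
print(left_join(left, [], on=["a", "b"], right_cols=["a", "b", "c"]))
# [{'a': 'uno', 'b': 'x', 'v': 1, 'c': None}]
```

With an empty right side the hash index stays empty, so the result has exactly as many rows as the left input. A streaming and a non-streaming engine should both agree on that shape.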
Expected behavior
I would expect this to work correctly, as it did in v0.34.2.
Installed versions
features = ["lazy", "streaming", "csv", "polars-io", "parquet", "performant", "strings", "concat_str", "ipc"]