LazyFrame drop complains other existing column is not present incorrectly #12722
Comments
Just to confirm, this also happens on It seems to trigger when It appears to be using the original schema from
df = pl.select(A = pl.struct(B = pl.lit('C'))).with_row_count()
df2 = df.lazy().select(pl.col('A').struct['B'])
print(df2.schema)
# OrderedDict([('B', Utf8)])
df2.drop('aaa').collect()
# ComputeError: column 'B' not available in schema Schema:
# name: row_nr, data type: UInt32
# name: A, data type: Struct([Field { name: "B", dtype: Utf8 }])
#
# ^^^ this is `df.schema`
It seems to be related to
(df2.drop('Operation_Type')
.collect(projection_pushdown=False)
)
# shape: (1, 5)
# ┌────────────────────────────┬────────────────────────────┬──────────────┬───────────────────────────────────┬───────────┐
# │ ntpTime ┆ serviceTime ┆ hostname ┆ GUFI ┆ Flight_Id │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ datetime[μs] ┆ datetime[μs] ┆ str ┆ str ┆ i64 │
# ╞════════════════════════════╪════════════════════════════╪══════════════╪═══════════════════════════════════╪═══════════╡
# │ 2023-10-23 12:35:23.769956 ┆ 2023-10-23 12:35:23.769956 ┆ vc-fs06-tduc ┆ 00000000-0000-4000-A000-00000000… ┆ 9001 │
# └────────────────────────────┴────────────────────────────┴──────────────┴───────────────────────────────────┴───────────┘
I got a similar issue when using When I collect the resultant df, the struct fields exist, but if I try to select the struct field while in a lazy context and then collect afterwards, it doesn't seem to work.
As someone who encountered a similar issue, I've managed to simplify it down to:
example = pl.DataFrame({"f": {"a": [0, 1, 2]}})
(example.lazy()
.select(pl.col("f").struct.field("a"))
.select(pl.col("a"))
).collect()
Results in:
ComputeError: column 'a' not available in schema Schema:
name: f, data type: Struct([Field { name: "a", dtype: Int64 }])
A very simple solution I've found is to explicitly alias the extracted field:
example = pl.DataFrame({"f": {"a": [0, 1, 2]}})
(example.lazy()
.select(pl.col("f").struct.field("a").alias("a"))
.select(pl.col("a"))
).collect()
This works as expected.
Hi @stinodego, I believe this issue can be closed. None of the three reproducible examples raise anymore. |
Checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Log output
Issue description
This looks like the same error as reported in #1644, which, if it is indeed the same issue, was fixed in an earlier build.
Expected behavior
This works if I remove the .lazy_frame and the .fetch().
Installed versions