I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
When reading a parquet dataframe lazily and selecting some transformed Float32 column, the data type gets incorrectly reported as being Float64. After collecting the result however, it's correctly reported as being Float32 again.
Reproducible example
import polars as pl
import numpy as np

df = pl.DataFrame({
    "x": np.array([1, 1, 1], dtype="int32"),
    "y": np.array([1, 2, 3], dtype="float32"),
}).write_parquet("test.parquet")

df = pl.read_parquet("test.parquet").lazy()
print(df.schema)
# {'x': Int32, 'y': Float32}

print(df.select(pl.col("y")).schema)
# {'y': Float32}

# THIS FAILS! Data type should be reported as Float32
print(df.select(-pl.col("y")).schema)
# {'literal': Float64}

# After collecting the dataset, the type is correct
print(df.select(-pl.col("y")).collect().schema)
# {'literal': Float32}
This has nothing to do with Parquet. A more minimal example:
import polars as pl
from polars.testing import assert_frame_equal

lf = pl.LazyFrame({"a": [1.0, 2.0]}, schema={"a": pl.Float32})
result = lf.select(-pl.col("a"))
expected = pl.LazyFrame({"a": [-1.0, -2.0]}, schema={"a": pl.Float32})
assert_frame_equal(result, expected)
# AssertionError: LazyFrames are different (dtypes do not match)
# [left]: {'a': Float64}
# [right]: {'a': Float32}
There are two issues here:
1. We calculate the negation by doing pl.lit(0) - expr, which casts to the supertype of expr and Int32. We should implement negation directly to avoid this.
2. The supertype in this case is Float64. So the schema is correct, but the result of the minus operation is wrong. This should be fixed.
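The two points above can be illustrated with a toy model of schema resolution. This is only a sketch: the `DType` enum, the reduced supertype table, and the helpers `supertype`, `negate_via_literal`, and `negate_directly` are hypothetical names made up for illustration and are not Polars internals. It assumes, as described above, that the literal 0 is typed Int32 and that the supertype of Int32 and Float32 is Float64.

```python
from enum import Enum


class DType(Enum):
    INT32 = "Int32"
    FLOAT32 = "Float32"
    FLOAT64 = "Float64"


# Hypothetical, heavily reduced supertype table (not Polars' actual rules).
# Int32 combined with Float32 widens to Float64, presumably because the
# 24-bit significand of Float32 cannot represent every Int32 value exactly.
_SUPERTYPE = {
    frozenset({DType.INT32, DType.FLOAT32}): DType.FLOAT64,
    frozenset({DType.INT32, DType.FLOAT64}): DType.FLOAT64,
    frozenset({DType.FLOAT32, DType.FLOAT64}): DType.FLOAT64,
}


def supertype(left: DType, right: DType) -> DType:
    """Return the common dtype both operands would be cast to."""
    if left == right:
        return left
    return _SUPERTYPE[frozenset({left, right})]


def negate_via_literal(dtype: DType) -> DType:
    """Schema of `lit(0) - expr`, assuming the literal 0 is typed Int32."""
    return supertype(DType.INT32, dtype)


def negate_directly(dtype: DType) -> DType:
    """Schema of a dedicated negation: the input dtype is preserved."""
    return dtype


print(negate_via_literal(DType.FLOAT32).value)  # Float64
print(negate_directly(DType.FLOAT32).value)     # Float32
```

Under these assumptions, rewriting negation as a subtraction from a literal is what pulls the Int32 literal into the type resolution, while a dedicated negation never consults the supertype table and leaves Float32 untouched.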