Inconsistent implicit type casting when using max_horizontal and mean_horizontal #15951

Closed
2 tasks done
NicolasMuellerQC opened this issue Apr 29, 2024 · 2 comments
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)


NicolasMuellerQC commented Apr 29, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame(data=dict(x=[1, 2, 3]))
print(
    df.with_columns(
        (1 - 0.5 * pl.col("x")).alias("y"),
        pl.max_horizontal(0, 1 - 0.5 * pl.col("x")).alias("z"),
        pl.max_horizontal(0.0, 1 - 0.5 * pl.col("x")).alias("w"),
        pl.max_horizontal(0, 1.0 - 0.5 * pl.col("x")).alias("r"),
    )
)

# Output:
# shape: (3, 5)
# ┌─────┬──────┬─────┬─────┬─────┐
# │ x   ┆ y    ┆ z   ┆ w   ┆ r   │
# │ --- ┆ ---  ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ f64  ┆ i32 ┆ f64 ┆ f64 │
# ╞═════╪══════╪═════╪═════╪═════╡
# │ 1   ┆ 0.5  ┆ 1   ┆ 1.0 ┆ 0.5 │
# │ 2   ┆ 0.0  ┆ 0   ┆ 0.0 ┆ 0.0 │
# │ 3   ┆ -0.5 ┆ 0   ┆ 0.0 ┆ 0.0 │
# └─────┴──────┴─────┴─────┴─────┘


print(
    df.with_columns(
        (1 - 0.5 * pl.col("x")).alias("y"),
        pl.mean_horizontal(0, 1 - 0.5 * pl.col("x")).alias("z"),
        pl.mean_horizontal(0.0, 1 - 0.5 * pl.col("x")).alias("w"),
        pl.mean_horizontal(0, 1.0 - 0.5 * pl.col("x")).alias("r"),
    )
)
# Output:
# shape: (3, 5)
# ┌─────┬──────┬─────┬─────┬───────┐
# │ x   ┆ y    ┆ z   ┆ w   ┆ r     │
# │ --- ┆ ---  ┆ --- ┆ --- ┆ ---   │
# │ i64 ┆ f64  ┆ f64 ┆ f64 ┆ f64   │
# ╞═════╪══════╪═════╪═════╪═══════╡
# │ 1   ┆ 0.5  ┆ 0.5 ┆ 0.5 ┆ 0.25  │
# │ 2   ┆ 0.0  ┆ 0.0 ┆ 0.0 ┆ 0.0   │
# │ 3   ┆ -0.5 ┆ 0.0 ┆ 0.0 ┆ -0.25 │
# └─────┴──────┴─────┴─────┴───────┘

Log output

No output.

Issue description

The implicit type casting when using max_horizontal is inconsistent:

  • The expression (1 - 0.5 * pl.col("x")).alias("y") on the df has type f64 and correctly calculates e.g. 1-0.5*1 = 0.5 in the first row.
  • The expression pl.max_horizontal(0, 1 - 0.5 * pl.col("x")).alias("z") has type i32 even though the second argument has type f64 (by point one). I think operations involving i32 and f64 should be cast to f64 if there is any implicit casting.
  • The expression pl.max_horizontal(0.0, 1 - 0.5 * pl.col("x")).alias("w") has type f64, but it looks like the second argument was cast to i32 before the max aggregation, even though it has type f64 (by point one).
  • The expression pl.max_horizontal(0, 1.0 - 0.5 * pl.col("x")).alias("r") has type f64 and the output is what I would expect.

The same problem applies to mean_horizontal.

Expected behavior

I would expect the columns "z", "w", "r" to be equal in both cases. The column "r" is correct.
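
To make the expected values concrete, here is a minimal pure-Python sketch (no Polars involved) of what "z", "w" and "r" should contain if the integer literal 0 is promoted to a float before the aggregation. The variable names are illustrative only and are not part of any Polars API.

```python
# Reference values computed with plain Python floats, assuming the
# literal 0 is promoted to 0.0 before the horizontal aggregation.
xs = [1, 2, 3]

# Column "y": 1 - 0.5 * x
y = [1 - 0.5 * x for x in xs]

# Expected result of max_horizontal(0, y) for each row
expected_max = [max(0.0, v) for v in y]

# Expected result of mean_horizontal(0, y) for each row
expected_mean = [(0.0 + v) / 2 for v in y]

print(expected_max)
print(expected_mean)
```

These values match column "r" in both outputs above, which is the column reported as correct.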

Installed versions

--------Version info---------
Polars:               0.20.23
Index type:           UInt32
Platform:             macOS-14.4.1-arm64-arm-64bit
Python:               3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.2.1
connectorx:           0.3.2
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.3.1
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.4
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             3.1.2
pandas:               2.2.2
pyarrow:              15.0.2
pydantic:             1.10.14
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.29
xlsx2csv:             <not installed>
xlsxwriter:           3.1.9

@TobiasDummschat
Contributor

This seems to be fixed in 0.20.24. Now all added columns have type f64.

@NicolasMuellerQC
Author

NicolasMuellerQC commented May 7, 2024

Thanks, I can confirm that it's fixed in 0.20.24!
