No error thrown when using `clip` on int column with float bounds #11385

nicolas-mng · 2023-09-28T13:43:10Z

Description

Hi,
I have this situation occurring to me and I think that polars throwing an error would have helped me debug.

# Case 1 with an int64 polars DataFrame
# Trying to clip a column between min_clip and max_clip, both being float values
import polars as pl
import numpy as np

df = pl.DataFrame({"column": np.arange(100, dtype=np.int64)})
min_clip, max_clip = 0.2, 20
df = df.with_columns(
                [
                    pl.lit(min_clip).alias("min_clip"),  # just here to show the actual values and types
                    pl.lit(max_clip).alias("max_clip"),
                    (pl.col("column").clip(min_clip, max_clip)).alias("column_clipped"),
                ])

┌───────────┬───────────────────┬───────────────────┬───────────────────┐
│ column      ┆ min_clip       ┆ max_clip      ┆ column_clipped│
│ ---       ┆ ---               ┆ ---               ┆ ---               │
│ i64       ┆ f64               ┆ i32               ┆ i64               │
╞═══════════╪═══════════════════╪═══════════════════╪═══════════════════╡
│ 0         ┆ 0.2               ┆ 20                ┆ 0 <- the min clipped value here is wrong, it should be 0.2
│ 1         ┆ 0.2               ┆ 20                ┆ 1                 │
│ 2         ┆ 0.2               ┆ 20                ┆ 2                 │
│ 3         ┆ 0.2               ┆ 20                ┆ 3                 │
│ …         ┆ …                 ┆ …                 ┆ …                 │
│ 96        ┆ 0.2               ┆ 20                ┆ 20                │
│ 97        ┆ 0.2               ┆ 20                ┆ 20                │
│ 98        ┆ 0.2               ┆ 20                ┆ 20                │
│ 99        ┆ 0.2               ┆ 20                ┆ 20                │
└───────────┴───────────────────┴───────────────────┴───────────────────┘

# Case 2 with a float32 polars DataFrame
df = pl.DataFrame({"column": np.arange(100, dtype=np.float32)})
┌───────────┬───────────────────┬───────────────────┬───────────────────┐
│ column       ┆ min_clip      ┆ max_clip      ┆ column_clipped│
│ ---       ┆ ---               ┆ ---               ┆ ---               │
│ f32       ┆ f64               ┆ i32               ┆ f32               │
╞═══════════╪═══════════════════╪═══════════════════╪═══════════════════╡
│ 0.0       ┆ 0.2               ┆ 20                ┆ 0.2  <- it now works│
│ 1.0       ┆ 0.2               ┆ 20                ┆ 1.0               │
│ 2.0       ┆ 0.2               ┆ 20                ┆ 2.0               │
│ 3.0       ┆ 0.2               ┆ 20                ┆ 3.0               │
│ …         ┆ …                 ┆ …                 ┆ …                 │
│ 96.0      ┆ 0.2               ┆ 20                ┆ 20.0              │
│ 97.0      ┆ 0.2               ┆ 20                ┆ 20.0              │
│ 98.0      ┆ 0.2               ┆ 20                ┆ 20.0              │
│ 99.0      ┆ 0.2               ┆ 20                ┆ 20.0              │
└───────────┴───────────────────┴───────────────────┴───────────────────┘

The error must be happening because of various casting of ints and floats and adding a .cast(pl.Float32) seems to fix the issue. But in my actual use case, this operation was hidden within a more convoluted DataFrame operation so it was harder to track the origin of my bug. Maybe having polars throw a warning here would help? Something about boundaries not respected because of type mismatch?

Thanks for the work on polars!

The text was updated successfully, but these errors were encountered:

alexander-beedie · 2023-09-28T14:09:04Z

Could you fix the test case?
Currently doesn't run ;)

nicolas-mng · 2023-09-28T14:36:07Z

Oops sorry, it now works :)

stinodego · 2023-10-02T07:37:39Z

Not sure what the right way to go is here. We could have clip always return a float, but that feels inefficient...

nicolas-mng · 2023-10-04T08:29:23Z

I guess, either autocasting to the higher of the precisions or throwing a type mismatch error would be two legitimate options. I'm worried now that a simple warning would be too quiet and I cannot think of too many use-cases where having values wrongly clipped is ok?

s-banach · 2024-01-11T19:46:19Z

I would assume x.clip(0.2, 20) clips all integers below 0.2 and above 20.
So I would assume 0 gets clipped up to 1 in an integer column.
Just got caught by surprise with this one today.

s-banach · 2024-01-12T18:12:36Z

Similar issue, no warning with Series.scatter(), scattering float values into an int column.

stinodego · 2024-01-12T20:55:23Z

I guess we should strict cast the inputs to the type of the column, rather than regular cast. That would raise an error with bad inputs and require you to cast the column manually first.

EDIT: Actually, we should use supertypes here.

EDIT2: Actually, we should not upcast the original column...

nicolas-mng added the enhancement New feature or an improvement of an existing feature label Sep 28, 2023

stinodego added accepted Ready for implementation and removed accepted Ready for implementation labels Oct 2, 2023

stinodego added accepted Ready for implementation bug Something isn't working P-medium Priority: medium and removed enhancement New feature or an improvement of an existing feature labels Jan 12, 2024

stinodego self-assigned this Jan 12, 2024

stinodego added python Related to Python Polars and removed accepted Ready for implementation labels Jan 12, 2024

stinodego mentioned this issue Jan 14, 2024

fix: Properly handle type conversions in clip #13713

Closed

stinodego mentioned this issue Feb 11, 2024

Unexpected behavior in clip #8090

Closed

2 tasks

stinodego changed the title ~~No warning thrown when using pl.clip on int column~~ No error thrown when using clip on int column with float bounds Feb 11, 2024

stinodego mentioned this issue Feb 11, 2024

fix: Error on some invalid clip inputs #14416

Merged

stinodego removed their assignment Jun 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

No error thrown when using `clip` on int column with float bounds #11385

No error thrown when using `clip` on int column with float bounds #11385

nicolas-mng commented Sep 28, 2023 •

edited

Loading

alexander-beedie commented Sep 28, 2023 •

edited

Loading

nicolas-mng commented Sep 28, 2023

stinodego commented Oct 2, 2023

nicolas-mng commented Oct 4, 2023

s-banach commented Jan 11, 2024 •

edited

Loading

s-banach commented Jan 12, 2024

stinodego commented Jan 12, 2024 •

edited

Loading

No error thrown when using clip on int column with float bounds #11385

No error thrown when using clip on int column with float bounds #11385

Comments

nicolas-mng commented Sep 28, 2023 • edited Loading

Description

alexander-beedie commented Sep 28, 2023 • edited Loading

nicolas-mng commented Sep 28, 2023

stinodego commented Oct 2, 2023

nicolas-mng commented Oct 4, 2023

s-banach commented Jan 11, 2024 • edited Loading

s-banach commented Jan 12, 2024

stinodego commented Jan 12, 2024 • edited Loading

No error thrown when using `clip` on int column with float bounds #11385

No error thrown when using `clip` on int column with float bounds #11385

nicolas-mng commented Sep 28, 2023 •

edited

Loading

alexander-beedie commented Sep 28, 2023 •

edited

Loading

s-banach commented Jan 11, 2024 •

edited

Loading

stinodego commented Jan 12, 2024 •

edited

Loading