
list.set_* on Categoricals/Enums gives confusing error messages #13912

Closed
2 tasks done
shenker opened this issue Jan 22, 2024 · 0 comments · Fixed by #14110
Labels: A-dtype-categorical (Area: categorical data type) · accepted (Ready for implementation) · bug (Something isn't working) · needs triage (Awaiting prioritization by a maintainer) · python (Related to Python Polars)

shenker (Contributor) commented Jan 22, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

Seen in polars-u64-idx 0.20.5 and 0.20.6-rc.1. Related: #11730, #11735.

Passing a Python list of strings to `.list.set_*` does not do what I'd expect, which is to cast the argument automatically to `pl.List(pl.Categorical)` so that it matches the dtype of the expression it operates on. The error messages are also confusing, which makes it hard to figure out what is going on. Explicitly passing `dtype=pl.List(pl.Categorical)` to `pl.lit` works as expected.
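The cast the report expects can be pictured with a toy model, assuming that categorical values are compared through integer codes: a plain list of strings only lines up with the column if it is encoded through the same string-to-code mapping (the kind of "rev map" the first error message mentions). `StringCache`, `coerce_to_column`, and the specific code numbers below are hypothetical illustrations, not Polars internals; `pl.StringCache()` in the repro serves a loosely similar purpose by making categoricals created inside it share one mapping.

```python
class StringCache:
    """Toy string cache (hypothetical): gives each distinct string a stable integer code."""

    def __init__(self) -> None:
        self.codes: dict[str, int] = {}
        self.rev_map: list[str] = []  # code -> original string

    def encode(self, value: str) -> int:
        # Hand out the next free code the first time a string is seen.
        if value not in self.codes:
            self.codes[value] = len(self.rev_map)
            self.rev_map.append(value)
        return self.codes[value]


def coerce_to_column(cache: StringCache, values: list[str]) -> list[int]:
    # Hypothetical implicit cast: encode plain strings through the column's
    # own cache so both operands of the set operation share one code space.
    return [cache.encode(v) for v in values]


column_cache = StringCache()
row = [column_cache.encode(v) for v in ["x", "y", "z"]]  # codes [0, 1, 2]

# An unrelated cache that has already seen another string assigns different codes.
other_cache = StringCache()
other_cache.encode("q")
wrong = [other_cache.encode(v) for v in ["x", "y"]]  # codes [1, 2]
right = coerce_to_column(column_cache, ["x", "y"])  # codes [0, 1]

# Intersect on codes, then decode through the column's rev map.
wrong_result = [column_cache.rev_map[c] for c in sorted(set(row) & set(wrong))]
right_result = [column_cache.rev_map[c] for c in sorted(set(row) & set(right))]
print(wrong_result)  # ['y', 'z']: mismatched codes pick the wrong values
print(right_result)  # ['x', 'y']: the intended intersection
```

In this model, encoding the argument through the column's own mapping is what makes the comparison meaningful, which is why an explicit `pl.List(pl.Categorical)` literal created under the same string cache behaves as intended.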

```python
import polars as pl

with pl.StringCache():
    df = pl.DataFrame(
        {
            "a": [
                ["x", "y", "z"],
                ["y", "z"],
                ["y", "z", "x1"],
                ["y", "z", "x2"],
                ["y", "z", "x3"],
            ]
        },
        schema=dict(a=pl.List(pl.Categorical)),
    )
    df.with_columns(pl.col("a").list.set_intersection(["x", "y"]).alias("b"))
```

prints

```
InvalidOperationError: casting to a non-enum variant with rev map is not supported for the user
```

Passing the list through `pl.lit` with a scalar `dtype=pl.Categorical` instead:

```python
import polars as pl

with pl.StringCache():
    df = pl.DataFrame(
        {
            "a": [
                ["x", "y", "z"],
                ["y", "z"],
                ["y", "z", "x1"],
                ["y", "z", "x2"],
                ["y", "z", "x3"],
            ]
        },
        schema=dict(a=pl.List(pl.Categorical)),
    )
    df.with_columns(
        pl.col("a")
        .list.set_intersection(pl.lit(["x", "y"], dtype=pl.Categorical))
        .alias("b")
    )
```

prints

```
PanicException: called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("validity mask length must match the number of values"))
```

This works:

```python
import polars as pl

with pl.StringCache():
    df = pl.DataFrame(
        {
            "a": [
                ["x", "y", "z"],
                ["y", "z"],
                ["y", "z", "x1"],
                ["y", "z", "x2"],
                ["y", "z", "x3"],
            ]
        },
        schema=dict(a=pl.List(pl.Categorical)),
    )
    print(
        df.with_columns(
            pl.col("a")
            .list.set_intersection(
                pl.lit(["x", "y", "x1"], dtype=pl.List(pl.Categorical))
            )
            .alias("b")
        )
    )
```

prints

```
┌──────────────────┬─────────────┐
│ a                ┆ b           │
│ ---              ┆ ---         │
│ list[cat]        ┆ list[cat]   │
╞══════════════════╪═════════════╡
│ ["x", "y", "z"]  ┆ ["x", "y"]  │
│ ["y", "z"]       ┆ ["y"]       │
│ ["y", "z", "x1"] ┆ ["y", "x1"] │
│ ["y", "z", "x2"] ┆ ["y"]       │
│ ["y", "z", "x3"] ┆ ["y"]       │
└──────────────────┴─────────────┘
```
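As a sanity check on the working output, the row-wise result can be modeled in plain Python. This sketch assumes `list.set_intersection` keeps each left-hand element that also appears in the right-hand list, in left-hand order and without duplicates; the ordering is inferred from the table above rather than from documentation, and `row_intersection` is a hypothetical helper, not a Polars function.

```python
def row_intersection(left: list[str], right: list[str]) -> list[str]:
    # Keep elements of `left` that also occur in `right`, preserving the
    # order of `left`; dict.fromkeys drops duplicates while keeping order.
    wanted = set(right)
    return [v for v in dict.fromkeys(left) if v in wanted]


column_a = [
    ["x", "y", "z"],
    ["y", "z"],
    ["y", "z", "x1"],
    ["y", "z", "x2"],
    ["y", "z", "x3"],
]
# The same literal list is broadcast against every row.
literal = ["x", "y", "x1"]
column_b = [row_intersection(row, literal) for row in column_a]
for a, b in zip(column_a, column_b):
    print(a, b)
```

The model works on plain strings and says nothing about dtypes; it only shows which values each row should keep once both operands agree on a type.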

Log output

No response

Issue description

n/a

Expected behavior

n/a

Installed versions

polars_u64_idx-0.20.6rc1

shenker added the bug, needs triage, and python labels on Jan 22, 2024
shenker changed the title from "list.set_* on Categoricals/Enums broken" to "Unintuitive behavior of list.set_* on Categoricals/Enums should be documented" on Jan 22, 2024
shenker changed the title from "Unintuitive behavior of list.set_* on Categoricals/Enums should be documented" to "list.set_* on Categoricals/Enums gives confusing error messages" on Jan 22, 2024
stinodego added the A-dtype-categorical label on Jan 23, 2024
c-peters self-assigned this on Jan 29, 2024
c-peters added the accepted label on Jan 29, 2024
Projects
Archived in project