Inconsistent behavior of `pl.element().replace()`: Works for list of integers, crashes for list of strings #15074

moritzwilksch · 2024-03-14T18:20:20Z

Checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df_int = pl.DataFrame({"a": [[1, 2, 4]]})
df_str = pl.DataFrame({"a": [["1", "2", "4"]]})

print(df_int.select(pl.col("a").list.eval(pl.element().replace({1: [11, 111], 2: [22]}))))  # works
# ┌────────────────────────┐
# │ a                      │
# │ ---                    │
# │ list[list[i64]]        │
# ╞════════════════════════╡
# │ [[11, 111], [22], [4]] │
# └────────────────────────┘

# errors
print(df_str.select(pl.col("a").list.eval(pl.element().replace({"1": ["11", "111"], "2": ["22"]}))))
# ComputeError: cannot cast List type (inner: 'String', to: 'String')

# different error with explicit literal
print(df_str.select(pl.col("a").list.eval(pl.element().replace({"1": pl.lit(["11", "111"])}))))
# ComputeError: cannot cast 'Object' type

Log output

No response

Issue description

Replacing elements inside a list column with `.list.eval(pl.element().replace({key: list_of_values}))` works for data of type list[int] but errors for data of type list[str].

Expected behavior

The results should be identical apart from the datatype.
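
To make "identical apart from the datatype" concrete, here is a minimal plain-Python model of the result shape seen in the working integer example. It is only an illustration of the expected output, not how Polars implements `replace`; the helper name `replace_with_lists` is hypothetical, and the wrapping of unmatched elements in a one-item list is inferred from the integer output above (where 4 becomes [4]).

```python
from typing import Hashable, Mapping, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def replace_with_lists(values: Sequence[T], mapping: Mapping[T, list]) -> list:
    """Model of the observed result shape (hypothetical helper, not a Polars API).

    Each element found in `mapping` becomes its replacement list; every other
    element is wrapped in a one-item list so all results share one list type.
    """
    return [list(mapping[v]) if v in mapping else [v] for v in values]


# Same inputs as df_int and df_str in the reproducible example.
ints = replace_with_lists([1, 2, 4], {1: [11, 111], 2: [22]})
strs = replace_with_lists(["1", "2", "4"], {"1": ["11", "111"], "2": ["22"]})

print(ints)  # [[11, 111], [22], [4]]
print(strs)  # [['11', '111'], ['22'], ['4']]
```

Under this model the string case yields the same nesting as the integer case, which is the behavior the issue expects from Polars.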

Installed versions

--------Version info---------
Polars:               0.20.15
Index type:           UInt32
Platform:             macOS-14.4-arm64-arm-64bit
Python:               3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:01:35) [Clang 16.0.6 ]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.0
pyarrow:              15.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

cmdlineluser · 2024-03-14T18:41:02Z

The int version working seems a bit odd, because 4 is being "replaced" by [4].

I did notice that forcing the default value is possible:

df_str.select(
   pl.col("a").list.eval(
      pl.element().replace({"1": ["11", "111"], "2": ["22"]}, default=pl.concat_list(pl.element()))
   )
)

# shape: (1, 1)
# ┌────────────────────────────────┐
# │ a                              │
# │ ---                            │
# │ list[list[str]]                │
# ╞════════════════════════════════╡
# │ [["11", "111"], ["22"], ["4"]] │
# └────────────────────────────────┘

moritzwilksch · 2024-03-14T18:44:15Z

Ahh, I figured that happens to force a consistent datatype. Neat workaround with the default though!

ThomasMAhern · 2024-04-01T10:53:18Z

I too am hoping for this to be resolved!

cmdlineluser · 2024-04-01T15:53:55Z

@ThomasMAhern What do you need resolved?

It seems that both examples should raise here and the fact that the int case works is a "bug"?

EDIT: Ah, the return_dtype docs say: "the data type is determined automatically based on the other inputs" - so perhaps I am mistaken.

Just experimenting further, something else seems broken:

df_int.with_columns(
   a1 = pl.col("a").list.eval(pl.element().replace("1", [["a", "b"]]))
)

# shape: (1, 2)
# ┌───────────┬────────────────────────────┐
# │ a         ┆ a1                         │
# │ ---       ┆ ---                        │
# │ list[i64] ┆ list[list[str]]            │
# ╞═══════════╪════════════════════════════╡
# │ [1, 2, 4] ┆ [["a", "b"], ["2"], ["4"]] │
# └───────────┴────────────────────────────┘

1 is being cast to "1" then the replace succeeds?

ThomasMAhern · 2024-04-02T11:51:10Z

It seems I'm able to get the expected output if I specify `return_dtype=pl.List(pl.String)`.

moritzwilksch added the bug, needs triage, and python labels on Mar 14, 2024
