`map` with struct `return_dtype` errors if return is single dict with correct keys & `None` #10398

desmond-dsouza · 2023-08-09T18:51:10Z

Checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

S = pl.Struct([pl.Field("x", pl.Int64)])

df = pl.DataFrame({"a": [1]})

def f_id(ds: pl.Series) -> pl.Series:
    return pl.Series("r", [{"x": d["a"] + 1} for d in ds])

def f_null(ds: pl.Series) -> pl.Series:
    return pl.Series("r", [{"x": None} for d in ds])

# return_dtype works, f_id returns dicts with correct keys & Ints
df.select(pl.struct("a").map(f_id, return_dtype=S))

# return_dtype crashes, f_null returns dicts with correct keys & None
df.select(pl.struct("a").map(f_null, return_dtype=S))

Issue description

I believe both map calls should work, since there is an explicit return_dtype and the shape and key order of the dicts match that dtype. Changing the return value to just None, without the enclosing dict, still results in a SchemaError.

Expected behavior

No SchemaError

Installed versions

In [29]: pl.show_versions()
--------Version info---------
Polars:              0.18.11
Index type:          UInt32
Platform:            macOS-11.7.8-x86_64-i386-64bit
Python:              3.11.3 (v3.11.3:f3909b8bc8, Apr  4 2023, 20:12:10) [Clang 13.0.0 (clang-1300.0.29.30)]

----Optional dependencies----
adbc_driver_sqlite:  <not installed>
cloudpickle:         2.2.1
connectorx:          <not installed>
deltalake:           <not installed>
fsspec:              2023.6.0
matplotlib:          3.7.2
numpy:               1.25.0
pandas:              2.0.3
pyarrow:             12.0.1
pydantic:            1.10.11
sqlalchemy:          1.4.49
xlsx2csv:            <not installed>
xlsxwriter:          <not installed>


alexander-beedie · 2023-08-10T08:18:12Z

@orlp, @ritchie46, a quick coffee-break triage for you:

I think the comparison check here needs to be slightly more forgiving than ==, handling Null fields in nested dtypes (we handle the flat/scalar dtype case fine). In the example above, the comparison fails because...

Struct([Field {name: "x",dtype: Null}]) !=
Struct([Field {name: "x",dtype: Int64}])

...which is true, but should still pass the check as all-null data is acceptable for any dtype in this context. Perhaps we already have such a comparison function hiding somewhere? If not, guess we need one :)

DeflateAwning · 2023-09-23T15:28:35Z

I encountered this bug. What would it take to get it fixed?

CanglongCl · 2024-04-07T05:13:16Z

I'd argue this is correct behavior, since the dtype of pl.Series("r", [{"x": None} for d in ds]) is inferred as entirely null (Struct({'x': Null})).

The solution is to tell the Series which dtype you want, e.g. pl.Series("r", [{"x": None} for d in ds], dtype=S).

DeflateAwning · 2024-04-07T06:16:01Z

Tbh, I've lost track of what this error means. I'd appreciate it if someone could provide a minimal reproducible example.

Would be happy to take a stab at fixing it.

cmdlineluser · 2024-04-07T11:48:53Z

@DeflateAwning

When using a flat/scalar null, there is no error and it is "upcast":

df = pl.DataFrame({"a": 1})

df.with_columns(
    pl.all().map_elements(lambda x: None, return_dtype=pl.Float64)
)

# shape: (1, 1)
# ┌──────┐
# │ a    │
# │ ---  │
# │ f64  │ # <- dtype `null` "upcast" to `f64` as per return_dtype
# ╞══════╡
# │ null │
# └──────┘

But with a list/dict it errors instead:

df.with_columns(
    pl.all().map_elements(
        lambda x: {"x": None}, return_dtype=pl.Struct({"x": pl.Float64})
    )
)

# SchemaError: expected output type ...

The expectation appears to be that the inner null would upcast:

s = pl.Series([{"x": None}])

s.dtype
# Struct({'x': Null})

s.cast(pl.Struct({"x": pl.Float64})).dtype
# Struct({'x': Float64})

Side note: I've also just noticed that on a Series there is no error, but the inner dtype remains null:

pl.Series(["a"]).map_elements(
    lambda x: {"x": None}, return_dtype=pl.Struct({"x": pl.Float64})
).dtype

# Struct({'x': Null})

DeflateAwning · 2024-04-07T22:49:44Z

Any idea where to start looking in the codebase?

stinodego · 2024-04-16T22:42:31Z

Should be fixed by #15699

desmond-dsouza added the bug (Something isn't working) and python (Related to Python Polars) labels Aug 9, 2023

alexander-beedie added the accepted (Ready for implementation) label Aug 10, 2023

ritchie46 changed the title ~~map with struct return_dtype crashes if return is single dict with correct keys & None~~ map with struct return_dtype errors if return is single dict with correct keys & None Aug 10, 2023

shenker mentioned this issue Nov 6, 2023 in #12271: map_rows with dict return value: BindingsError: "Could not determine output type" (Open)

stinodego added the P-medium (Priority: medium) label and removed the accepted (Ready for implementation) label Jan 12, 2024

cmdlineluser mentioned this issue Feb 29, 2024 in #14723: Bug Report - SchemaError with apply Function in Polars 0.20.3 (Closed)

cmdlineluser mentioned this issue Apr 16, 2024 in #15699: fix: allow null dtypes in UDFs if they match the schema (Merged)

stinodego closed this as completed Apr 16, 2024
