It is not possible to concatenate arrays of different data types. #15946
Comments
Also note the example frame was generated by polars using
So if this expression can produce a singleton array with a null value (i.e.
It seems there is no "supertype" inference happening from the
You can modify the example to return a Series instead with the dtype:
return pl.Series(output, dtype=pl.Struct({'id': pl.String, 'probability': pl.Float64}))
As for canonical, there is
df.with_columns(
    pl.col("frequencies").list.eval(
        pl.struct(
            id = pl.element().struct["id"],
            probability = pl.element().struct["count"] / pl.element().struct["count"].sum()
        )
    )
)
# shape: (4, 2)
# ┌──────────┬─────────────────────────────────────────────────────────┐
# │ group_id ┆ frequencies                                             │
# │ ---      ┆ ---                                                     │
# │ i64      ┆ list[struct[2]]                                         │
# ╞══════════╪═════════════════════════════════════════════════════════╡
# │ 0        ┆ [{null,0.5}, {"a",0.5}]                                 │
# │ 1        ┆ [{"b",0.8}, {null,0.2}]                                 │
# │ 2        ┆ [{null,0.625}, {"a",0.1875}, {"b",0.125}, {"c",0.0625}] │
# │ 3        ┆ [{null,1.0}]                                            │
# └──────────┴─────────────────────────────────────────────────────────┘
Although I imagine there is a simpler approach without using
Thanks @cmdlineluser - I was aware of .list.eval() but I misunderstood the meaning of
Either workaround works; should I close this?
Yeah, the element usage is a bit confusing in some cases.
I'm not sure - it does look like your original example should work. At the very least, I would think the PanicException needs to be fixed.
Checks
Reproducible example
Log output
Issue description
I'm sure this isn't the "canonical" way of achieving what I want, but it's a POC which I intend on rewriting later. I have grouped data; each group contains a frequencies dict that maps keys to counts, and I'm trying to convert this to an equivalent dict of probabilities. Each struct has an id and a count key, and the value associated with id may be null; this is the source of the error. When there is only 1 item in the array (i.e. group_id==3), the error above is thrown.
In fact simply returning the input as output (from the ufunc) will also trigger the error.
If you comment out the last row from the example (i.e. {"group_id":3...) then there's no error.
Expected behavior
I think this should work, even if it's not the recommended way of doing this?
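To make the expected result concrete, the per-group conversion described in the issue can be sketched in plain Python, independent of polars. The function name `to_probabilities` and the sample input are hypothetical and used only for illustration; this is not the original ufunc from the reproducer.

```python
from typing import Optional, TypedDict


class Frequency(TypedDict):
    id: Optional[str]  # the id may be null
    count: int


def to_probabilities(frequencies: list[Frequency]) -> list[dict]:
    """Convert a list of {id, count} items to {id, probability} items."""
    total = sum(item["count"] for item in frequencies)
    if total == 0:
        raise ValueError("counts must sum to a positive number")
    return [
        {"id": item["id"], "probability": item["count"] / total}
        for item in frequencies
    ]


# A singleton list with a null id, the shape that triggers the error.
print(to_probabilities([{"id": None, "count": 5}]))
```

For a single-item list the only sensible result is a probability of 1.0 for that item, regardless of whether its id is null.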
Installed versions