Support Enum inputs in lit #16668
Comments
Do you happen to know what version this worked on? Going backwards, I get a different error (TypeError: invalid literal value: 'State.VIC'), but it fails on every version I've tried.
I'm not even sure if this is actually a bug.
We do not automatically convert from python Enum types to polars Enum Series; you can put in a feature request for this. Note that the dtype of your dataframe is simply a string:
>>> data
shape: (6, 1)
┌─────────────────┐
│ state           │
│ ---             │
│ str             │
╞═════════════════╡
│ victoria        │
│ victoria        │
│ victoria        │
│ new south wales │
│ new south wales │
│ new south wales │
└─────────────────┘
So tl;dr you are not actually performing enum filtering, but trying to filter a string column based on a python Enum object, which polars does not recognize.
Sorry guys, I just re-tested on previous versions and found I missed something key. The enum class was a string enum, not just an enum.
from enum import Enum
import polars as pl

class State(str, Enum):  # NOTE: missed the `str` before the enum
    VIC = "victoria"
    NSW = "new south wales"

data = pl.DataFrame({
    'state': [State.VIC] * 3 + [State.NSW] * 3
})
print(data)
prints out: (note: correctly converts the 'enum' to a string)
Filtering on 0.20.31:
However, back in 0.20.25:
This string enum also works in other libraries, so there's precedent for it to also work in polars:
assert State.VIC == "victoria"
my_dict = {}
my_dict[State.VIC] = 'foo'
print(my_dict['victoria'])  # prints out 'foo'
# Or in pandas
import pandas as pd
data_pd = pd.DataFrame({'state': [State.VIC] * 3 + [State.NSW] * 3})
print(data_pd.loc[lambda df: df['state'] == State.VIC])
# Prints:
#        state
# 0  State.VIC
# 1  State.VIC
# 2  State.VIC
@cmdlineluser Sorry for the changes. Did you see my update, or would you rather I close this and re-raise the issue?
I can reproduce that change in behaviour.
import polars as pl
from enum import Enum

class State(str, Enum): VIC = "victoria"

pl.select(
    version = pl.lit(pl.__version__),
    enum = pl.lit(State.VIC)
)
shape: (1, 2)
┌─────────┬──────────┐
│ version ┆ enum     │
│ ---     ┆ ---      │
│ str     ┆ str      │
╞═════════╪══════════╡
│ 0.20.25 ┆ victoria │
└─────────┴──────────┘
shape: (1, 2)
┌─────────┬───────────┐
│ version ┆ enum      │
│ ---     ┆ ---       │
│ str     ┆ str       │
╞═════════╪═══════════╡
│ 0.20.26 ┆ State.VIC │
└─────────┴───────────┘
Perhaps @orlp / @mcrumiller can confirm and advise on how to proceed.
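The two results reported for 0.20.25 and 0.20.26 line up with two different string forms of a str-mixin enum member: `victoria` is the member's value, while `State.VIC` is what `str()` returns. A minimal standard-library sketch of that distinction follows. It only illustrates the Python side and makes no claim about how `pl.lit` is implemented internally.

```python
from enum import Enum


class State(str, Enum):
    VIC = "victoria"


member = State.VIC

# The member is a str instance and compares equal to its underlying string.
is_str_instance = isinstance(member, str)
equals_value = member == "victoria"

# .value is the plain underlying string (the form seen on 0.20.25).
as_value = member.value

# str() uses Enum.__str__, giving "ClassName.MEMBER" (the form seen on 0.20.26).
as_str = str(member)

print(is_str_instance, equals_value, as_value, as_str)
```

So code that treats the member as a `str` and code that calls `str()` on it can disagree, even though both start from the same object.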
Actually, since this is strictly on the Python side, I'll defer this to @stinodego.
So this really is a feature request to handle Python We can create |
Checks
Reproducible example
Issue description
In a previous version (0.20.25), you could filter a string column on an enum and there were no issues. However, it now filters incorrectly:
Expected behavior
Using the value of the enum works correctly.
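One way to apply this consistently is to unwrap enum members before they reach a polars expression. The helper below is a hypothetical sketch (the name `enum_value` is an assumption and is not part of polars or the standard library). It returns `.value` for enum members and passes everything else through unchanged.

```python
from enum import Enum
from typing import Any


class State(str, Enum):
    VIC = "victoria"
    NSW = "new south wales"


def enum_value(obj: Any) -> Any:
    """Hypothetical helper: return the underlying value of an Enum member."""
    return obj.value if isinstance(obj, Enum) else obj


# Enum members are unwrapped to their plain values; other objects pass through.
unwrapped = enum_value(State.VIC)
untouched = enum_value("victoria")
print(type(unwrapped).__name__, unwrapped)
```

With polars, the comparison could be written as `pl.col("state") == enum_value(State.VIC)`, which is equivalent to `pl.col("state") == State.VIC.value`.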
outputs:
Installed versions