Support Enum inputs in lit #16668

Closed
2 tasks done
Andre-Medina opened this issue Jun 3, 2024 · 8 comments · Fixed by #16858
Labels
A-input-parsing Area: parsing input arguments accepted Ready for implementation enhancement New feature or an improvement of an existing feature python Related to Python Polars

Comments

@Andre-Medina

Andre-Medina commented Jun 3, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

from enum import Enum
import polars as pl

class State(str, Enum):  # NOTE: updated to a `str` enum
    
    VIC = "victoria"
    NSW = "new south wales"


data = pl.DataFrame({
    'state': [State.VIC.value] * 3 + [State.NSW.value] * 3  
})

print(data)

data.filter(pl.col('state') == State.VIC)

Log output

No response

Issue description

In a previous version (0.20.25), you could filter a string column on an enum without any issues. However, the same filter now returns no rows:

shape: (0, 1)
┌───────┐
│ state │
│ ---   │
│ str   │
╞═══════╡
└───────┘

Expected behavior

Using the value of the enum works correctly.

data.filter(pl.col('state') == State.VIC.value)

outputs:

shape: (3, 1)
┌──────────┐
│ state    │
│ ---      │
│ str      │
╞══════════╡
│ victoria │
│ victoria │
│ victoria │
└──────────┘

Installed versions

--------Version info---------
Polars:               0.20.31
Index type:           UInt32
Platform:             Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.36
Python:               3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          3.0.0
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.2.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.3
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             3.1.2
pandas:               2.2.0
pyarrow:              15.0.0
pydantic:             2.6.1
pyiceberg:            <not installed>
pyxlsb:               1.0.10
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             0.8.1
xlsxwriter:           <not installed>
@Andre-Medina Andre-Medina added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Jun 3, 2024
@cmdlineluser
Contributor

Do you happen to know what version this worked on?

Going backwards, I get a different error on 0.20.21

TypeError: invalid literal value: 'State.VIC'

But it fails on every version I've tried.

@orlp
Collaborator

orlp commented Jun 3, 2024

I'm not even sure if this is actually a bug.

@mcrumiller
Contributor

mcrumiller commented Jun 3, 2024

We do not automatically convert from python Enum types to polars Enum Series; you can put in a feature request for this. Note that the dtype of your dataframe is simply a string:

>>> data
shape: (6, 1)
┌─────────────────┐
│ state           │
│ ---             │
│ str             │
╞═════════════════╡
│ victoria        │
│ victoria        │
│ victoria        │
│ new south wales │
│ new south wales │
│ new south wales │
└─────────────────┘

type(State.VIC) is <enum 'State'>, so polars is trying to filter a string column based on an object, and doesn't like it. State.VIC.value is a string, and so the filter works.

So tl;dr you are not actually performing enum filtering, but trying to filter a string column based on a python Enum object, which polars does not recognize.
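
To make the distinction concrete, here is a minimal sketch using only the standard library. `PlainState` and `StrState` are hypothetical names used for illustration only; the sketch shows how Python itself treats the two kinds of member and makes no claim about Polars internals.

```python
from enum import Enum


class PlainState(Enum):
    VIC = "victoria"


class StrState(str, Enum):
    VIC = "victoria"


# A plain Enum member is an opaque object, not a string.
print(isinstance(PlainState.VIC, str))  # False
print(PlainState.VIC == "victoria")     # False

# With the `str` mix-in, the member is itself a string and equals its value.
print(isinstance(StrState.VIC, str))    # True
print(StrState.VIC == "victoria")       # True

# In both cases `.value` is the plain underlying string.
print(type(PlainState.VIC.value) is str)  # True
print(type(StrState.VIC.value) is str)    # True
```

In either case, passing `.value` hands over an ordinary `str`, which is why the `.value` form of the filter works.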

@Andre-Medina
Author

Andre-Medina commented Jun 4, 2024

Sorry guys, I just re-tested on previous versions and found I missed something key. The enum class was a string enum, not just an enum.

from enum import Enum
import polars as pl

class State(str, Enum):  # NOTE: missed the `str` before the enum
    
    VIC = "victoria"
    NSW = "new south wales"

data = pl.DataFrame({
    'state': [State.VIC] * 3 + [State.NSW] * 3  
})

print(data)

prints out: (note: correctly converts the 'enum' to a string)

shape: (6, 1)
┌─────────────────┐
│ state           │
│ ---             │
│ str             │
╞═════════════════╡
│ victoria        │
│ victoria        │
│ victoria        │
│ new south wales │
│ new south wales │
│ new south wales │
└─────────────────┘

Filtering on 0.20.31:

>>> print(data.filter(pl.col('state') == State.VIC)) # Does not filter
shape: (0, 1)
┌───────┐
│ state │
│ ---   │
│ str   │
╞═══════╡
└───────┘
>>> print(data.filter(pl.col('state') == State.VIC.value)) # Adding the .value, filters correctly
shape: (3, 1)
┌──────────┐
│ state    │
│ ---      │
│ str      │
╞══════════╡
│ victoria │
│ victoria │
│ victoria │
└──────────┘

However, back in 0.20.25:

>>> print(data.filter(pl.col('state') == State.VIC)) # With or without the .value, filters correctly 
shape: (3, 1)
┌──────────┐
│ state    │
│ ---      │
│ str      │
╞══════════╡
│ victoria │
│ victoria │
│ victoria │
└──────────┘

This string enum also works in other libraries, so there's precedent for it to also work in polars:

assert State.VIC == "victoria"
my_dict = {}
my_dict[State.VIC] = 'foo'
print(my_dict['victoria']) # prints out 'foo'

# Or in pandas
import pandas as pd
data_pd = pd.DataFrame({'state': [State.VIC] * 3 + [State.NSW] * 3})
print(data_pd.loc[lambda df: df['state'] == State.VIC])
# Prints:
#        state
# 0  State.VIC
# 1  State.VIC
# 2  State.VIC

@Andre-Medina
Author

@cmdlineluser sorry for the changes. Did you see my update, or would you prefer that I close this and re-raise the issue?

@cmdlineluser
Contributor

I can reproduce that change in behaviour.

import polars as pl
from enum import Enum

class State(str, Enum): VIC = "victoria"

pl.select(
    version = pl.lit(pl.__version__), 
    enum = pl.lit(State.VIC)
)    
shape: (1, 2)
┌─────────┬──────────┐
│ version ┆ enum     │
│ ---     ┆ ---      │
│ str     ┆ str      │
╞═════════╪══════════╡
│ 0.20.25 ┆ victoria │
└─────────┴──────────┘
shape: (1, 2)
┌─────────┬───────────┐
│ version ┆ enum      │
│ ---     ┆ ---       │
│ str     ┆ str       │
╞═════════╪═══════════╡
│ 0.20.26 ┆ State.VIC │
└─────────┴───────────┘

Perhaps @orlp / @mcrumiller can confirm and advise on how to proceed.
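
The two outputs line up with two different string views of the same member. The sketch below only shows what plain Python returns for each; that the change between 0.20.25 and 0.20.26 corresponds to switching from one to the other is an assumption, not something confirmed in this thread.

```python
from enum import Enum


class State(str, Enum):
    VIC = "victoria"


# The underlying value matches the 0.20.25 output.
print(State.VIC.value)  # victoria

# str() of a member of a `str`-mixin Enum matches the 0.20.26 output.
print(str(State.VIC))   # State.VIC
```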

@orlp
Collaborator

orlp commented Jun 10, 2024

Actually, since this is strictly on the Python side, I'll defer this to @stinodego.

@stinodego stinodego changed the title from "Enum filtering has broken." to "Support Enum inputs in lit" Jun 10, 2024
@stinodego stinodego added enhancement New feature or an improvement of an existing feature and removed bug Something isn't working needs triage Awaiting prioritization by a maintainer labels Jun 10, 2024
@stinodego
Member

So this really is a feature request to handle Python Enum types in the lit function.

We can create Enum types out of those; I think that's the most elegant approach.
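
Until such support exists in `lit`, a user-side workaround is to unwrap enum members before handing them to Polars. The sketch below is not the change proposed above or the fix in #16858; `unwrap_enum` is a hypothetical helper name, and the block deliberately uses only the standard library.

```python
from enum import Enum
from typing import Any


def unwrap_enum(value: Any) -> Any:
    """Return the underlying value of an Enum member; pass anything else through."""
    return value.value if isinstance(value, Enum) else value


class State(str, Enum):
    VIC = "victoria"
    NSW = "new south wales"


print(unwrap_enum(State.VIC))   # victoria
print(unwrap_enum("victoria"))  # victoria
```

With Polars installed, the comparison from the report would then read `pl.col('state') == unwrap_enum(State.VIC)`, which is equivalent to the `.value` form that the issue shows working on 0.20.31.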

@stinodego stinodego self-assigned this Jun 10, 2024
@stinodego stinodego added the A-input-parsing Area: parsing input arguments label Jun 10, 2024
@c-peters c-peters added the accepted Ready for implementation label Jun 16, 2024