Support for pl.List('*') #15489

Open
bobir01 opened this issue Apr 5, 2024 · 4 comments
Labels
A-input-parsing Area: parsing input arguments A-selectors Area: column selectors bug Something isn't working P-low Priority: low python Related to Python Polars

Comments

@bobir01

bobir01 commented Apr 5, 2024

Description

I noticed that when selecting from a DataFrame it is not possible to select columns with the by_dtype selector when the column has a nested dtype (NESTED_DTYPES), such as pl.List.
Why is there no way to select by dtype at the outer level, with something like pl.List('*') or pl.List(pl.Any)? (I know neither of these exists.)

I don't know much about Rust, but even at the Python level there is a naive workaround :)

[(key, val) for key, val in df.schema.items() if val == pl.List]
############ OUTPUT #############
[('barcode_list', List(Binary)),
 ('front_category_list', List(Binary)),
 ('substitution_product_list', List(Binary)),
 ('image_list', List(Binary))]

I figured out that, in order to select pl.List columns with the selector, we have to specify the inner dtype, like:
pl.List(pl.String)

Could someone tell me whether dtype selection (is_dtype) would be handled at the Python level or the Rust level?

Thank you for your time!

@bobir01 bobir01 added the enhancement New feature or an improvement of an existing feature label Apr 5, 2024
@bobir01 bobir01 changed the title Add support for pl.List('*') Support for pl.List('*') Apr 5, 2024
@stinodego stinodego added the A-selectors Area: column selectors label Apr 5, 2024
@stinodego
Member

stinodego commented Apr 5, 2024

Passing pl.List to the by_dtype selector should work. I'll classify this as a bug. Thanks for the report!

The difficulty here is that, on the Rust side, we don't really have the notion of a generic uninstantiated List type.

@stinodego stinodego added P-low Priority: low A-input-parsing Area: parsing input arguments bug Something isn't working python Related to Python Polars and removed enhancement New feature or an improvement of an existing feature labels Apr 5, 2024
@bobir01
Author

bobir01 commented Apr 5, 2024

@stinodego, this issue also happens with Datetime ;(

@cmdlineluser
Contributor

For datetimes specifically, there is the dedicated .datetime() selector.

cs.by_dtype(pl.Datetime).meta.serialize()
# {'DtypeColumn':[{'Datetime':['Microseconds',null]}]}

cs.datetime().meta.serialize()
# {'DtypeColumn': [{'Datetime': ['Milliseconds', '*']},
#  {'Datetime': ['Milliseconds', None]},
#  {'Datetime': ['Microseconds', '*']},
#  {'Datetime': ['Microseconds', None]},
#  {'Datetime': ['Nanoseconds', '*']},
#  {'Datetime': ['Nanoseconds', None]}]}

It uses * to wildcard timezones, and None for the no timezone case.

It also has to do it for each possible time_unit.
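
To make the expansion concrete, here is a rough pure-Python sketch of how the six entries in the serialized output above can be enumerated. It is only an illustration of the combinatorics, not how Polars actually builds the selector; the names TIME_UNITS, TIME_ZONES and expand_datetime_variants are hypothetical.

```python
from itertools import product

# Unit names as they appear in the serialized selector output above.
TIME_UNITS = ("Milliseconds", "Microseconds", "Nanoseconds")
# "*" wildcards any timezone; None covers timezone-naive columns.
TIME_ZONES = ("*", None)


def expand_datetime_variants() -> list[dict]:
    """Enumerate every (time_unit, time_zone) pair a generic Datetime must cover."""
    return [
        {"Datetime": [unit, tz]}
        for unit, tz in product(TIME_UNITS, TIME_ZONES)
    ]


variants = expand_datetime_variants()
for variant in variants:
    print(variant)
```

A nested dtype such as List has no such finite set of inner dtypes to enumerate, which is presumably why the same trick does not carry over directly.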

The chat in #13683 may be relevant.

@bobir01
Author

bobir01 commented Apr 11, 2024

For datetimes specifically, there is the dedicated .datetime() selector

@stinodego Ohh, I was using pl.Datetime, so I must have missed this. Thank you, I didn't know about cs.datetime().


3 participants