feat(rust): Rust functions for typed JsonPath implementation #5140
Thanks for moving this to `polars-ops`. What are your thoughts on exposing this API? In `polars-lazy` we always need to know the output dtype of a transformation. Any idea how to determine that statically?
Great question! I wanted to raise that for the next PR, and have been thinking about it after looking through the Expr APIs. Here are my thoughts per method:
What do you think?
It sounds good for most of them! I am only not entirely sure about … But this is something we can explore along the way. I think we should find some design with which we can make the …
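One possible answer to the static-dtype question, sketched here as an assumption rather than the design the thread settled on: an operation that always yields strings can report `Utf8` up front, while the typed extract operations are only static when the caller supplies the expected dtype. `DType`, `JsonOp` and `static_output_dtype` are hypothetical names for illustration, not polars APIs.

```rust
// Hypothetical stand-in for a column dtype; polars' real DataType is far richer.
#[derive(Debug, Clone, PartialEq)]
enum DType {
    Utf8,
    Int64,
    List(Box<DType>),
}

// Hypothetical lazy-expression view of the operations in this PR.
#[allow(dead_code)] // `path` is carried along but not needed to resolve the dtype
enum JsonOp {
    /// Parse each JSON string into typed values, optionally with a caller-supplied dtype.
    Extract { dtype: Option<DType> },
    /// Select with a JsonPath; the result is always JSON text (a Utf8 column).
    PathSelect { path: String },
    /// Select with a JsonPath, then parse the selection into typed values.
    PathExtract { path: String, dtype: Option<DType> },
}

/// Returns the output dtype if it is known without looking at the data.
fn static_output_dtype(op: &JsonOp) -> Option<DType> {
    match op {
        // Selection produces strings regardless of the input contents.
        JsonOp::PathSelect { .. } => Some(DType::Utf8),
        // Typed extraction is only static when the caller states the dtype.
        JsonOp::Extract { dtype } | JsonOp::PathExtract { dtype, .. } => dtype.clone(),
    }
}

fn main() {
    let ops = [
        JsonOp::PathSelect { path: "$.a".to_string() },
        JsonOp::Extract { dtype: Some(DType::List(Box::new(DType::Int64))) },
        JsonOp::PathExtract { path: "$.b".to_string(), dtype: None },
    ];
    for op in &ops {
        match static_output_dtype(op) {
            Some(dt) => println!("known statically: {:?}", dt),
            None => println!("needs data-dependent inference"),
        }
    }
}
```

Under this assumption, a lazy query planner would either reject an extract call that has no dtype or fall back to scanning the data to infer one; which of those is acceptable is the open design question in this thread.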
This PR introduces 4 new methods on `Utf8Chunked`, allowing JSON values within the string arrays to be parsed to appropriate types. The current `json_path_match` only allows for `str` return types and does not handle nested types well. This PR replaces #3413 and introduces only the core Rust functions needed. I'll follow up with another PR that includes the `Expr` implementations to allow these to be used in regular and lazy DataFrames, including Python support for these features.

New methods:
- `Utf8Chunked.json_infer`: returns the `DataType` for the JSON fields in the array
- `Utf8Chunked.json_extract`: returns a `Series` with the appropriate types for the JSON values in the array
- `Utf8Chunked.json_path_select`: returns a `Utf8Chunked` array, selecting based on the provided JsonPath
- `Utf8Chunked.json_path_extract`: returns a `Series` with the appropriate type after selecting based on a JsonPath

Notes:
- … `json_path_match`. I would suggest deprecating that method if these are adopted.
- … `select_json` to more elegantly handle the differences between the number of elements returned. Otherwise it is a replica of `extract_json`.
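The idea behind `json_infer` and `json_extract`, producing one `DataType` for a whole column of JSON values, can be sketched as folding a "supertype" over the per-row types. This is a minimal, std-only sketch assuming the strings have already been parsed into a hypothetical `Json` enum. The names `Json`, `DType`, `supertype` and `infer_column` are illustrative, and the promotion rules (null joins anything, `Int64` with `Float64` widens to `Float64`, any other conflict falls back to `Utf8`) are assumptions, not the rules this PR implements.

```rust
// Hypothetical, already-parsed JSON value (a real implementation parses the strings first).
#[allow(dead_code)] // payloads are irrelevant for dtype inference
#[derive(Debug)]
enum Json {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
    Array(Vec<Json>),
}

// Hypothetical inferred column dtype.
#[derive(Debug, Clone, PartialEq)]
enum DType {
    Null,
    Boolean,
    Int64,
    Float64,
    Utf8,
    List(Box<DType>),
}

/// Smallest dtype able to hold values of both `a` and `b` (assumed promotion rules).
fn supertype(a: DType, b: DType) -> DType {
    use DType::*;
    match (a, b) {
        (x, y) if x == y => x,
        // A null row does not constrain the column type.
        (Null, x) | (x, Null) => x,
        (Int64, Float64) | (Float64, Int64) => Float64,
        // Nested lists are unified element-wise.
        (List(x), List(y)) => List(Box::new(supertype(*x, *y))),
        // Incompatible types: fall back to keeping the raw text.
        _ => Utf8,
    }
}

/// Dtype of a single value; arrays recurse into their elements.
fn infer_value(v: &Json) -> DType {
    match v {
        Json::Null => DType::Null,
        Json::Bool(_) => DType::Boolean,
        Json::Int(_) => DType::Int64,
        Json::Float(_) => DType::Float64,
        Json::Str(_) => DType::Utf8,
        Json::Array(items) => DType::List(Box::new(infer_column(items))),
    }
}

/// One dtype for a whole column, folding supertypes over all rows.
fn infer_column(values: &[Json]) -> DType {
    values.iter().map(infer_value).fold(DType::Null, supertype)
}

fn main() {
    let rows = vec![Json::Int(1), Json::Null, Json::Float(2.5)];
    println!("{:?}", infer_column(&rows)); // prints Float64

    let lists = vec![
        Json::Array(vec![Json::Int(1), Json::Int(2)]),
        Json::Array(vec![Json::Null, Json::Float(0.5)]),
    ];
    println!("{:?}", infer_column(&lists)); // prints List(Float64)
}
```

Folding from `Null` means an all-null or empty column infers as `Null`, which a typed extract would then have to cast or reject.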