Add initial support for struct dtype #756
I went with this typespec for a couple of reasons:
- `:struct`, to use the same name used in Polars
- `String.t()` instead of `atom()` for the field name: since we can import large JSON datasets with potentially random keys, it felt prudent to use a string here to avoid leaking atoms

The trade-offs for these choices are:
- `:struct` already has a meaning in Elixir. We could name it `:map` instead, but we would lose the 1:1 naming parity with Polars.

Please let me know if you have any other ideas regarding the dtype name and how to represent it.
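To make the representation concrete, here is a minimal Rust sketch of a recursive dtype tree in which struct fields are keyed by owned strings rather than interned symbols. The `DType` enum and its `render` method are hypothetical and only illustrate the idea in this comment; they are not Explorer's or Polars' real types. Fields are kept as ordered `(name, dtype)` pairs because Arrow struct fields have a defined order.

```rust
// Hypothetical, simplified dtype tree: not Explorer's or Polars' real types.
#[derive(Debug, Clone, PartialEq)]
enum DType {
    Integer,
    Float(u8), // Float(32) loosely mirrors Explorer's {:f, 32}
    Utf8,
    List(Box<DType>),
    // Field names are owned Strings, so arbitrary keys from external data
    // (e.g. JSON) never have to become interned symbols. Fields are kept as
    // ordered pairs because Arrow struct fields have a defined order.
    Struct(Vec<(String, DType)>),
}

impl DType {
    // Renders the dtype as a human-readable string, recursing into children.
    fn render(&self) -> String {
        match self {
            DType::Integer => "integer".to_string(),
            DType::Float(bits) => format!("f{bits}"),
            DType::Utf8 => "string".to_string(),
            DType::List(inner) => format!("list<{}>", inner.render()),
            DType::Struct(fields) => {
                let parts: Vec<String> = fields
                    .iter()
                    .map(|(name, dtype)| format!("{name}: {}", dtype.render()))
                    .collect();
                format!("struct<{}>", parts.join(", "))
            }
        }
    }
}

fn main() {
    // A list of structs, each struct having three string-keyed fields.
    let dtype = DType::List(Box::new(DType::Struct(vec![
        ("id".to_string(), DType::Integer),
        ("name".to_string(), DType::Utf8),
        ("score".to_string(), DType::Float(32)),
    ])));
    // prints: list<struct<id: integer, name: string, score: f32>>
    println!("{}", dtype.render());
}
```

The string-key choice matters on the BEAM specifically: atoms live in an atom table that is never garbage collected, so turning arbitrary JSON keys into atoms could grow it without bound, while strings are ordinary collectable data.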
Hi! I really like this PR and think this missing functionality would be very helpful.
That said, I'm hesitant about the term "struct" since, as you say, it's overloaded. What do you think about `DF.from_rows`? Then it would pair with `DF.to_rows`, just like `Series.from_list`/`Series.to_list`.
So this is not importing data from Elixir structs, but literally having a struct column inside Explorer. You could even have lists of structs. :)
`{:f, 32}` even though there's no real equivalent in Elixir.
Yeah, I was a bit iffy about naming it `struct` as well because of the pre-existing meaning within Elixir, but naming it the more Elixir-appropriate `map` would also mean people used to Polars would have to keep this "translation" in mind when using Explorer.
Feels like a "pick your poison" type of choice 😅
I think we could solve the naming issue by documenting it (like I mentioned in my other comment). Since we are following Arrow data structs, we could just document that. WDYT?
@philss Yeah, if struct aligns with Polars and Arrow then it's the right call 👍 And documenting the name in case of potential confusion is def the way to go.
I've added the extra bits to the docs here: e565494
WDYT?
With the struct type being added, the name didn't seem right anymore. Hope I got the intention right here. 😅
I followed a similar approach to the list type implementation and used `:any` to denote "a struct of any shape". I'm not 100% happy with it, though, so please let me know if you have any suggestions.
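One way to read `:any` here is as a wildcard in a dtype pattern: a check against "a struct of any shape" accepts every struct dtype regardless of its fields, while concrete dtypes must match exactly. The Rust sketch below models that with hypothetical `DType` and `Pattern` types and an `accepts` function; it is not how Explorer implements the check, only an illustration of the wildcard idea, including how it composes with lists (a list of structs of any shape).

```rust
// Hypothetical concrete dtypes, as stored on a series.
#[derive(Debug, PartialEq)]
enum DType {
    Float(u8),
    Utf8,
    List(Box<DType>),
    Struct(Vec<(String, DType)>),
}

// Hypothetical dtype patterns, as a function might declare what it accepts.
enum Pattern {
    Exact(DType),          // must match the dtype exactly, fields included
    ListOf(Box<Pattern>),  // a list whose inner dtype matches the inner pattern
    StructAny,             // plays the role of `:any`: a struct of any shape
}

fn accepts(pattern: &Pattern, dtype: &DType) -> bool {
    match (pattern, dtype) {
        (Pattern::Exact(expected), actual) => expected == actual,
        (Pattern::ListOf(inner), DType::List(actual_inner)) => accepts(inner, actual_inner),
        // The wildcard ignores field names and field dtypes entirely.
        (Pattern::StructAny, DType::Struct(_)) => true,
        _ => false,
    }
}

fn main() {
    let point = DType::Struct(vec![
        ("x".to_string(), DType::Float(64)),
        ("y".to_string(), DType::Float(64)),
    ]);
    let tags = DType::List(Box::new(DType::Utf8));

    println!("{}", accepts(&Pattern::StructAny, &point)); // true
    println!("{}", accepts(&Pattern::StructAny, &tags)); // false
    println!(
        "{}",
        accepts(&Pattern::ListOf(Box::new(Pattern::Exact(DType::Utf8))), &tags)
    ); // true
}
```

Keeping the wildcard in a separate pattern type means a real series can never carry an "any-shaped" struct dtype; the looseness exists only on the checking side.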
This feels kinda inefficient (it's going to wrap/run through `Enum.map`/unwrap each value of the list of maps), but I was not too worried about it since I don't see `Series.from_list` as being the main way people will interact with this dtype. I feel like building a struct series inside a dataframe, or reading a struct series from an external file, would be far more common.
Let me know if you disagree, though.
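For a sense of the per-value work involved, here is a minimal Rust sketch of turning a list of row maps into one column per field, which is the layout Arrow uses for struct arrays (one child array per field). `rows_to_columns` is a hypothetical helper: values are simplified to `f64`, missing keys become `None` (similar to nulls), and Explorer's actual path through `Enum.map` and the native layer is not modeled.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical helper: converts row maps into (field name, column) pairs.
fn rows_to_columns(rows: &[HashMap<String, f64>]) -> Vec<(String, Vec<Option<f64>>)> {
    // First pass: collect every field name seen in any row (sorted for determinism).
    let fields: BTreeSet<&String> = rows.iter().flat_map(|row| row.keys()).collect();

    // Then, for each field, walk every row again to build that field's column.
    fields
        .into_iter()
        .map(|field| {
            let column: Vec<Option<f64>> = rows.iter().map(|row| row.get(field).copied()).collect();
            (field.clone(), column)
        })
        .collect()
}

fn main() {
    let rows = vec![
        HashMap::from([("a".to_string(), 1.0), ("b".to_string(), 2.0)]),
        HashMap::from([("a".to_string(), 3.0)]),
    ];
    for (name, values) in rows_to_columns(&rows) {
        // prints "a: [Some(1.0), Some(3.0)]" then "b: [Some(2.0), None]"
        println!("{name}: {values:?}");
    }
}
```

Every field triggers another pass over all rows, so the cost grows with rows × fields. That is acceptable for a convenience constructor, as the comment argues, but it is not the path to optimize for bulk data.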
This function looks largely undocumented, but it is part of the public API of Polars: https://docs.rs/polars/latest/polars/datatypes/enum.AnyValue.html#method._iter_struct_av
The Python implementation deals with struct values through it as well.
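Conceptually, converting a struct value for Elixir means iterating its field values in field order and pairing each with its field name to build something like a string-keyed map. The sketch below models that with a hypothetical `Value` enum and hypothetical `iter_struct_values`/`struct_to_map` helpers; it is not Polars' `AnyValue` or `_iter_struct_av`, whose exact behavior should be checked against the linked docs.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a dynamically typed cell value.
// This is NOT Polars' `AnyValue`; it only models the shape of the problem.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Str(String),
    // A struct value: field values stored positionally, in field order.
    Struct(Vec<Value>),
}

// Iterates the field values of a struct value (yields nothing for non-structs).
fn iter_struct_values(value: &Value) -> impl Iterator<Item = &Value> {
    let fields: &[Value] = match value {
        Value::Struct(values) => values.as_slice(),
        _ => &[],
    };
    fields.iter()
}

// Pairs each field name with its positional value, producing a string-keyed map.
fn struct_to_map(names: &[&str], value: &Value) -> BTreeMap<String, Value> {
    names
        .iter()
        .zip(iter_struct_values(value))
        .map(|(name, v)| (name.to_string(), v.clone()))
        .collect()
}

fn main() {
    let row = Value::Struct(vec![Value::Int(1), Value::Str("ok".to_string()), Value::Null]);
    let map = struct_to_map(&["id", "status", "note"], &row);
    println!("{map:?}");
}
```

Keeping names and values separate mirrors the columnar model: field names belong to the dtype and are shared by every row, while each row only carries positional values.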
I see. Maybe it's an unstable API, since normally it would not have the `_` at the beginning. We can ask them later whether this should be used, or if there is an alternative.
Yeah. I tried dealing with the enum entry directly as well, but it quickly required me to do a bunch of unsafe operations 😅
That function is still performing those unsafe calls, but I trust their unsafe Rust more than mine 😂