[ARROW-210] Add support for large_list and large_string PyArrow DataTypes #191
LGTM!
expected = Table.from_pydict(
    {"_id": [1, 2], "data": self.expected_times},
    ArrowSchema([("_id", int32()), ("data", timestamp("ms", tz=tz))]),
"""Test behavior of setting tzinfo CodecOptions in Collection.with_options.
What's the context for these datetime test changes?
I expanded the test. The original wasn't testing what it appeared to. Subtleties between timestamps and datetimes.
@@ -86,3 +86,22 @@ def __eq__(self, other):
        if isinstance(other, type(self)):
            return self.typemap == other.typemap
        return False

    @classmethod
    def from_arrow(cls, aschema: pa.Schema):
Should we make the constructor accept pa.Schema rather than introduce a new API method?
Same same but different. This was quick. I time-boxed my work, and stopped myself short of bigger refactoring. I would love to discuss.
The difference is that we try to avoid adding new public methods unless there's a clear need.
We created https://jira.mongodb.org/browse/ARROW-220 after discussing further.
Adding support for two additional Arrow DataTypes. large_list and large_string appear in the Table.schema when one calls Polars to_arrow. This is a small extension of our ListBuilder and StringBuilder classes and additional tests,