Skip to content

Conversation

caseyclements
Copy link
Contributor

Adding support for two additional Arrow DataTypes: large_list and large_string appear in the Table.schema when one calls Polars to_arrow.

This is a small extension of our ListBuilder and StringBuilder classes and additional tests,

@caseyclements caseyclements requested a review from a team as a code owner January 30, 2024 23:54
@caseyclements caseyclements requested review from Jibola and removed request for a team January 30, 2024 23:54
Copy link
Member

@blink1073 blink1073 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@blink1073 blink1073 changed the title [Arrow-210] Add support for large_list and large_string PyArrow DataTypes [ARROW-210] Add support for large_list and large_string PyArrow DataTypes Jan 31, 2024
@caseyclements caseyclements merged commit 7f4e298 into mongodb-labs:main Jan 31, 2024
@caseyclements caseyclements deleted the ARROW-210-large-list branch January 31, 2024 17:20
expected = Table.from_pydict(
{"_id": [1, 2], "data": self.expected_times},
ArrowSchema([("_id", int32()), ("data", timestamp("ms", tz=tz))]),
"""Test behavior of setting tzinfo CodecOptions in Collection.with_options.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the context for these datetime test changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expanded the test. The original wasn't testing what it appeared to. Subleties between timestamps and datetimes.

@@ -86,3 +86,22 @@ def __eq__(self, other):
if isinstance(other, type(self)):
return self.typemap == other.typemap
return False

@classmethod
def from_arrow(cls, aschema: pa.Schema):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make the constructor accept pa.Schema rather than introduce a new api method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same same but different. This was quick. I time-boxed my work, and stopped myself short of bigger refactoring. I would love to discuss.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is that we try to avoid adding new public methods unless there's a clear need.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We created https://jira.mongodb.org/browse/ARROW-220 after discussing further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants