Skip to content

feat(python): Make polars not copy data when importing from arrow #7084

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 22, 2023
Merged

feat(python): Make polars not copy data when importing from arrow #7084

merged 2 commits into from
Feb 22, 2023

Conversation

datapythonista
Copy link
Contributor

Closes #7007

Seems like when importing an Arrow array with one chunk, Polars is making a copy of the data by unnecessarily using combine_chunks(), when it should use the chunk directly and not copy data.

I didn't fine any test for the array_to_pyseries to write my test checking that memory is not being copied in the same file, so I used test_constructors.py which seemed the most appropriate. Let me know if it should live elsewhere.

@@ -140,7 +140,7 @@ def arrow_to_pyseries(name: str, values: pa.Array, rechunk: bool = True) -> PySe
elif array.num_chunks == 0:
pys = PySeries.from_arrow(name, pa.array([], array.type))
else:
pys = PySeries.from_arrow(name, array.combine_chunks())
pys = PySeries.from_arrow(name, array.chunks[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do this only if array.num_chunks == 1 end leave the else branch as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to do that, but just to make sure we're in the same page, the existing if has next conditions:

  • array.num_chunks > 1
  • array.num_chunks == 0
  • else

Assuming num_chunks will always be an integer number greater or equal to zero, adding array.num_chunks == 1 will make that the else code is never executed.

Another option worth considering is having array.num_chunks == 0 and array.num_chunks == 1 and then have the rest as else instead of the current array.num_chunks > 1.

As said, happy to implement what you say, but just want to make sure we're understanding things the same way. Thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, my bad.

I didn't expand the lines above. You are right. 👍

@ritchie46 ritchie46 changed the title Make polars not copy data when importing from arrow feat(python): Make polars not copy data when importing from arrow Feb 22, 2023
@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Feb 22, 2023
@ritchie46
Copy link
Member

Thanks!

@ritchie46 ritchie46 merged commit d36e898 into pola-rs:master Feb 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Series round trip to pandas with arrow types copies data
2 participants