Allow construction of PySeries from memory buffers #10718

Closed
stinodego opened this issue Aug 24, 2023 · 0 comments · Fixed by #13323
Labels
A-interchange (Area: Python dataframe interchange protocol) · accepted (Ready for implementation) · enhancement (New feature or an improvement of an existing feature)

stinodego commented Aug 24, 2023

Blocker for #10701

We're trying to create a Polars DataFrame from an interchange object, which is essentially a description of where the data lives in memory.

I would propose the following functionality, which should operate in a zero-copy manner.

1. PySeries.from_buffer (DONE)

Args

  • dtype: physical data type
  • pointer: memory address of the start of the buffer, as an integer
  • length: number of elements in the buffer
  • offset: optional offset, in bits, from the start of the buffer
  • base: object that owns the buffer, typically some unknown/foreign object; it must be kept alive for as long as the resulting Series references its memory
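To illustrate what these arguments describe, here is a minimal stdlib-only sketch that wraps foreign memory as a ctypes `int16` array without copying. It is not Polars code: `ForeignBuffer` and `int16_view` are hypothetical names, and the sketch assumes, following the argument list above, that `offset` is given in bits and is byte-aligned for a fixed-width type.

```python
import ctypes
from dataclasses import dataclass


@dataclass
class ForeignBuffer:
    """Hypothetical container: a zero-copy view plus the object that owns the memory."""

    values: ctypes.Array  # view over the foreign memory (no copy was made)
    base: object  # owner; holding a reference keeps the memory from being freed


def int16_view(pointer: int, length: int, offset: int, base: object) -> ForeignBuffer:
    """Wrap `length` int16 values starting `offset` bits after `pointer`."""
    if offset % 8 != 0:
        # Assumption: fixed-width buffers only use byte-aligned bit offsets
        raise ValueError("offset must be a whole number of bytes for int16 data")
    array_type = ctypes.c_int16 * length
    # from_address creates a view over existing memory; it does not copy
    # and does not keep the owner alive, which is why `base` is stored too
    view = array_type.from_address(pointer + offset // 8)
    return ForeignBuffer(values=view, base=base)


owner = (ctypes.c_int16 * 4)(10, 20, 30, 40)
buf = int16_view(ctypes.addressof(owner), length=3, offset=16, base=owner)
print(list(buf.values))  # [20, 30, 40]

owner[1] = 99  # mutate the owner; the view sees it because the memory is shared
print(buf.values[0])  # 99
```

Storing `base` next to the view mirrors the role of the `base` argument: a raw pointer on its own gives no guarantee that the memory stays valid.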

Example

```python
import polars as pl
from polars.polars import PySeries
from polars.utils._wrap import wrap_s
from polars.testing import assert_series_equal

s = pl.Series([1, 2, 3], dtype=pl.Int16)
# e.g. (0, 3, 139905829208096); the pointer differs on every run
offset, length, pointer = s._s.get_ptr()

# base=s keeps the original Series alive while `result` borrows its memory
result = wrap_s(PySeries.from_buffer(pl.Int16, pointer, length, offset, base=s))

assert_series_equal(result, s)
```

2. PySeries.from_buffers

Args

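  • dtype: logical data type of the result (e.g. pl.Utf8)
  • data_buffer: PySeries containing the physical representation of the data
  • validity_buffer: optional Boolean PySeries used as the validity bitmask (False marks a null)
  • offsets_buffer: optional PySeries of type Int64 holding the offsets into the data buffer for variable-length types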
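A validity buffer stored as a bitmask packs eight slots into each byte, so slicing into one is naturally expressed as a bit offset. The sketch below assumes Arrow's least-significant-bit-first ordering; `is_valid` and `unpack_validity` are hypothetical helpers, not part of the proposed API.

```python
from typing import List


def is_valid(bitmap: bytes, i: int, offset: int = 0) -> bool:
    """Return True if slot `i` is valid, reading bit `offset + i` LSB-first."""
    bit = offset + i
    return ((bitmap[bit // 8] >> (bit % 8)) & 1) == 1


def unpack_validity(bitmap: bytes, length: int, offset: int = 0) -> List[bool]:
    """Expand `length` packed bits, starting at bit `offset`, into booleans."""
    return [is_valid(bitmap, i, offset) for i in range(length)]


# [True, True, False, True] packed LSB-first: bits 0, 1 and 3 set -> 0b00001011
bitmap = bytes([0b00001011])
print(unpack_validity(bitmap, 4))  # [True, True, False, True]
print(unpack_validity(bitmap, 3, offset=1))  # [True, False, True]
```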

Example

```python
import polars as pl
from polars.polars import PySeries
from polars.utils._wrap import wrap_s
from polars.testing import assert_series_equal

# UTF-8 bytes of "a", "bc" and "éâç", concatenated
data = pl.Series([97, 98, 99, 195, 169, 195, 162, 195, 167], dtype=pl.UInt8)
# The third value is null
validity = pl.Series([True, True, False, True])
# Value i spans data[offsets[i]:offsets[i + 1]]
offsets = pl.Series([0, 1, 3, 3, 9], dtype=pl.Int64)

result = wrap_s(PySeries.from_buffers(pl.Utf8, data._s, validity._s, offsets._s))

expected = pl.Series(["a", "bc", None, "éâç"])
assert_series_equal(result, expected)
```
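The buffers in this example follow an Arrow-style variable-length layout: value i occupies the bytes between `offsets[i]` and `offsets[i + 1]`, and is null when its validity entry is False. A pure-Python sketch of that decoding, independent of Polars (`decode_utf8_buffers` is a hypothetical helper):

```python
from typing import List, Optional, Sequence


def decode_utf8_buffers(
    data: bytes,
    validity: Optional[Sequence[bool]],
    offsets: Sequence[int],
) -> List[Optional[str]]:
    """Rebuild string values from data/validity/offsets buffers."""
    n = len(offsets) - 1  # n values need n + 1 offsets
    out: List[Optional[str]] = []
    for i in range(n):
        if validity is not None and not validity[i]:
            out.append(None)  # null slot; its byte range is ignored
            continue
        start, end = offsets[i], offsets[i + 1]
        out.append(data[start:end].decode("utf-8"))
    return out


data = bytes([97, 98, 99, 195, 169, 195, 162, 195, 167])
validity = [True, True, False, True]
offsets = [0, 1, 3, 3, 9]
print(decode_utf8_buffers(data, validity, offsets))  # ['a', 'bc', None, 'éâç']
```

Note that the null slot still has an (empty) offset range, so the offsets buffer always holds one more entry than there are values.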
