feat: add byteorder argument to `to_buffers` #2095

agoose77 · 2023-01-09T20:24:06Z

Fixes #2067

agoose77 · 2023-01-09T20:25:14Z

src/awkward/_do.py

@@ -76,6 +76,7 @@ def to_buffers(
    form_key: str | None = "node{id}",
    id_start: Integral = 0,
    backend: Backend = None,
+    byteorder: Literal["<", ">"] = "<",


This makes little-endian the default. We could instead default to the native endianness here.

agoose77 · 2023-01-09T20:25:46Z

src/awkward/_util.py

+def to_byteorder(array, byteorder):
+    assert byteorder in "<>"
+    if byteorder != native_byteorder:
+        return array.byteswap(inplace=False)


Rather than changing the dtype's endianness, we keep the native-endian dtype and swap only the bytes. The swap must not happen in place, or `to_buffers` would corrupt the array it is exporting.
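To make the distinction concrete, here is a minimal NumPy sketch of the idea, not the code in this PR. The `native_byteorder` derivation, the `ValueError`, and the fall-through `return array` are assumptions added so the snippet runs on its own. It shows that the swapped copy keeps the native dtype, that the source array is left untouched, and that applying the swap twice restores the original values (so the operation is not idempotent).

```python
import sys

import numpy as np

# Assumption: one way to derive a native byte-order marker.
native_byteorder = "<" if sys.byteorder == "little" else ">"


def to_byteorder(array, byteorder):
    """Return `array` with its bytes laid out in `byteorder`.

    The dtype is left native; only the underlying bytes are swapped,
    and always into a new array rather than in place.
    """
    if byteorder not in ("<", ">"):
        raise ValueError(f"byteorder must be '<' or '>', not {byteorder!r}")
    if byteorder != native_byteorder:
        return array.byteswap(inplace=False)
    # Assumption: nothing to do when the requested order is already native.
    return array


original = np.array([1, 2, 3], dtype=np.int32)
foreign = ">" if native_byteorder == "<" else "<"

swapped = to_byteorder(original, foreign)

# Bytes match what a foreign-endian dtype would hold for the same values.
foreign_dtype = np.dtype(np.int32).newbyteorder(foreign)
same_bytes = swapped.tobytes() == original.astype(foreign_dtype).tobytes()

# Swapping a second time undoes the first swap.
round_trip = to_byteorder(swapped, foreign)
```

Because `swapped` still claims a native dtype, reading its values directly gives byte-reversed integers; only a consumer that knows the declared byte order can interpret the buffer correctly.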

agoose77 · 2023-01-09T20:26:28Z

src/awkward/forms/form.py

@@ -426,6 +426,7 @@ def length_zero_array(
            container={"": b"\x00\x00\x00\x00\x00\x00\x00\x00"},
            buffer_key="",
            backend=backend,
+            byteorder=ak._util.native_byteorder,


Anything using ArrayBuilder is working with native buffers.

agoose77 · 2023-01-09T20:26:46Z

src/awkward/operations/ak_from_buffers.py

@@ -16,7 +16,8 @@ def from_buffers(
    container,
    buffer_key="{form_key}-{attribute}",
    *,
-    backend: str = "cpu",
+    backend="cpu",


Remove the annotation, since the high-level API is untyped for now.

agoose77 · 2023-01-09T20:28:10Z

@jpivarski since writing this PR, I'm more torn as to whether it would be better to use the native endianness by default and make only pickle use little-endian. What do you think?

codecov · 2023-01-09T20:42:16Z

Codecov Report

Merging #2095 (a993e62) into main (42404f2) will increase coverage by 0.00%.
The diff coverage is 95.52%.

Additional details and impacted files

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/awkward/_connect/rdataframe/from_rdataframe.py | `0.00% <ø> (ø)` | |
| src/awkward/forms/form.py | `85.65% <ø> (ø)` | |
| src/awkward/highlevel.py | `76.21% <ø> (ø)` | |
| src/awkward/operations/ak_from_avro_file.py | `72.22% <ø> (ø)` | |
| src/awkward/operations/ak_from_iter.py | `94.44% <ø> (ø)` | |
| src/awkward/operations/ak_to_json.py | `83.56% <0.00%> (ø)` | |
| src/awkward/operations/ak_to_list.py | `76.92% <0.00%> (ø)` | |
| src/awkward/typing.py | `88.88% <ø> (ø)` | |
| src/awkward/operations/ak_from_buffers.py | `89.39% <94.44%> (-0.37%)` | ⬇️ |
| src/awkward/_do.py | `84.30% <100.00%> (ø)` | |
| ... and 16 more | | |

jpivarski

The buffers within an Awkward Array must be native-endian because we just didn't implement support for wrong-endian arrays (the way that NumPy did, for instance).

So the only choice to be made is whether `to_buffers`/`from_buffers` should:

- default to returning (`to_buffers`) or accepting (`from_buffers`) little-endian arrays, which means that everything is no-translation and no-copy on little-endian machines, but always-translate and always-copy on big-endian machines;
- default to native-endian, which puts the problem of keeping track of endianness through I/O on users.

I'd rather make big-endian machines (which are rare) work harder than make users work harder, so I'm in favor of this PR as-is. This includes the pickle format: we can declare that to always be little-endian (to_buffers/from_buffers defaults).
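The trade-off can be illustrated with a small NumPy sketch. This is not Awkward's implementation: `dump_buffer` and `load_buffer` are hypothetical helpers that only mimic the "little-endian by default" convention. On a little-endian machine neither helper copies; on a big-endian machine both swap into a new array. Either way the serialized bytes are identical, so a reader never needs to know which machine wrote them.

```python
import struct
import sys

import numpy as np

# Assumption: one way to derive a native byte-order marker.
native_byteorder = "<" if sys.byteorder == "little" else ">"


def dump_buffer(array, byteorder="<"):
    """Hypothetical helper: serialize a native-endian array in `byteorder`."""
    if byteorder != native_byteorder:
        # Copy, so the caller's array is never modified.
        array = array.byteswap(inplace=False)
    return array.tobytes()


def load_buffer(data, dtype, byteorder="<"):
    """Hypothetical helper: read bytes written in `byteorder` as a native array."""
    array = np.frombuffer(data, dtype=dtype)
    if byteorder != native_byteorder:
        array = array.byteswap(inplace=False)
    return array


values = np.array([1.5, -2.0, 3.25], dtype=np.float64)

payload = dump_buffer(values)                 # little-endian by default
restored = load_buffer(payload, np.float64)   # same default on the way back

# The payload matches an explicitly little-endian packing on any machine.
expected = struct.pack("<3d", 1.5, -2.0, 3.25)
```

With a native-endian default instead, `payload` would differ between machines, and the byte order would have to be recorded alongside the data by whoever writes it.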

It looks like you're doing the right thing with copying buffers that need to be changed.

agoose77 · 2023-01-10T08:07:03Z

I've explicitly declared the byteorder for pickling too, so that we're never in doubt, and we don't accidentally regress. I also missed a case of from_buffers in from_rdataframe which I've fixed.

agoose77 added 5 commits January 9, 2023 19:28

feat: add byteorder argument to to_buffers

9745d6c

feat: add byteorder to from_buffers

22e62d5

fix: pass new byteorder argument

978b5fb

fix: actually swap bytes rather than dtype

393e4e1

test: ensure byteorder works

a6b8f34

agoose77 requested a review from jpivarski January 9, 2023 20:28

fix: import Literal from typing_extensions

12ed2ca

agoose77 temporarily deployed to docs-preview January 9, 2023 20:42 — with GitHub Actions Inactive

refactor: rename function to_byteorder

a72b053

This function is _not_ idempotent.

agoose77 temporarily deployed to docs-preview January 9, 2023 20:56 — with GitHub Actions Inactive

docs: clarify docstrings

bcc9e6f

agoose77 temporarily deployed to docs-preview January 9, 2023 21:48 — with GitHub Actions Inactive

jpivarski approved these changes Jan 10, 2023

agoose77 added 2 commits January 10, 2023 08:05

fix: caught missing from_buffers byteorder case

b095790

refactor: explicitly pass byteorder even when it matches default

440d081

Merge branch 'main' into agoose77/feat-to-buffers-byteorder

a993e62

agoose77 temporarily deployed to docs-preview January 10, 2023 08:15 — with GitHub Actions Inactive

agoose77 merged commit 4db99b6 into main Jan 10, 2023

agoose77 deleted the agoose77/feat-to-buffers-byteorder branch January 10, 2023 10:17
