Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add byteorder argument to to_buffers #2095

Merged
merged 11 commits into from
Jan 10, 2023

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Jan 9, 2023

Fixes #2067

@@ -76,6 +76,7 @@ def to_buffers(
form_key: str | None = "node{id}",
id_start: Integral = 0,
backend: Backend = None,
byteorder: Literal["<", ">"] = "<",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a decision for the default to be little-endian. We could also use the native endianness here.

def to_byteorder(array, byteorder):
assert byteorder in "<>"
if byteorder != native_byteorder:
return array.byteswap(inplace=False)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than swapping the dtype endianness, we always use native endianness and just swap the bytes. We don't want to do this in-place, or to_buffers would break the array.

@@ -426,6 +426,7 @@ def length_zero_array(
container={"": b"\x00\x00\x00\x00\x00\x00\x00\x00"},
buffer_key="",
backend=backend,
byteorder=ak._util.native_byteorder,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything using ArrayBuilder is working with native buffers

@@ -16,7 +16,8 @@ def from_buffers(
container,
buffer_key="{form_key}-{attribute}",
*,
backend: str = "cpu",
backend="cpu",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove annotation as for now HL api is untyped

@agoose77
Copy link
Collaborator Author

agoose77 commented Jan 9, 2023

@jpivarski since writing this PR I'm more torn as to whether it would be better use the native endianness by default, and only make pickle use little-endian. What do you think here?

@codecov
Copy link

codecov bot commented Jan 9, 2023

Codecov Report

Merging #2095 (a993e62) into main (42404f2) will increase coverage by 0.00%.
The diff coverage is 95.52%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/_connect/rdataframe/from_rdataframe.py 0.00% <ø> (ø)
src/awkward/forms/form.py 85.65% <ø> (ø)
src/awkward/highlevel.py 76.21% <ø> (ø)
src/awkward/operations/ak_from_avro_file.py 72.22% <ø> (ø)
src/awkward/operations/ak_from_iter.py 94.44% <ø> (ø)
src/awkward/operations/ak_to_json.py 83.56% <0.00%> (ø)
src/awkward/operations/ak_to_list.py 76.92% <0.00%> (ø)
src/awkward/typing.py 88.88% <ø> (ø)
src/awkward/operations/ak_from_buffers.py 89.39% <94.44%> (-0.37%) ⬇️
src/awkward/_do.py 84.30% <100.00%> (ø)
... and 16 more

@agoose77 agoose77 temporarily deployed to docs-preview January 9, 2023 20:42 — with GitHub Actions Inactive
This function is _not_ idempotent.
@agoose77 agoose77 temporarily deployed to docs-preview January 9, 2023 20:56 — with GitHub Actions Inactive
@agoose77 agoose77 temporarily deployed to docs-preview January 9, 2023 21:48 — with GitHub Actions Inactive
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buffers within an Awkward Array must be native-endian because we just didn't implement support for wrong-endian arrays (the way that NumPy did, for instance).

So the only choice to be made is whether to_buffers/from_buffers should

  • default to returning (to_buffers) or accepting (from_buffers) little-endian arrays, which means that everything is no-translation and no-copy on little-endian machines, but always-translate and always-copy on big-endian machines;
  • default to native-endian, which puts the problem of keeping track of endianness through I/O on users.

I'd rather make big-endian machines (which are rare) work harder than make users work harder, so I'm in favor of this PR as-is. This includes the pickle format: we can declare that to always be little-endian (to_buffers/from_buffers defaults).

It looks like you're doing the right thing with copying buffers that need to be changed.

@agoose77
Copy link
Collaborator Author

I've explicitly declared the byteorder for pickling too, so that we're never in doubt, and we don't accidentally regress. I also missed a case of from_buffers in from_rdataframe which I've fixed.

@agoose77 agoose77 temporarily deployed to docs-preview January 10, 2023 08:15 — with GitHub Actions Inactive
@agoose77 agoose77 merged commit 4db99b6 into main Jan 10, 2023
@agoose77 agoose77 deleted the agoose77/feat-to-buffers-byteorder branch January 10, 2023 10:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ak.to_buffers doesn't declare endianness
2 participants