feat: add byteorder argument to to_buffers
#2095
@@ -76,6 +76,7 @@ def to_buffers(
    form_key: str | None = "node{id}",
    id_start: Integral = 0,
    backend: Backend = None,
    byteorder: Literal["<", ">"] = "<",
This makes little-endian the default. We could instead default to the native endianness here.
def to_byteorder(array, byteorder):
    assert byteorder in "<>"
    if byteorder != native_byteorder:
        return array.byteswap(inplace=False)
Rather than swapping the dtype endianness, we always use native endianness and just swap the bytes. We don't want to do this in-place, or `to_buffers` would break the array.
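For illustration, a minimal sketch of that distinction, assuming NumPy. The `to_byteorder` below mirrors the hunk above but adds a fall-through `return` and an explicit check, both of which are assumptions here, not necessarily what the PR contains; the sample values and the names `non_native` and `relabelled` are arbitrary.

```python
import sys

import numpy as np

# Assumption: a module-level constant like the one the diff refers to.
native_byteorder = "<" if sys.byteorder == "little" else ">"


def to_byteorder(array, byteorder):
    """Return `array` with its bytes in `byteorder`, keeping a native dtype."""
    if byteorder not in ("<", ">"):
        raise ValueError(f"byteorder must be '<' or '>', not {byteorder!r}")
    if byteorder != native_byteorder:
        # inplace=False returns a swapped copy and leaves the input untouched.
        return array.byteswap(inplace=False)
    return array


original = np.array([1, 2, 3], dtype=np.int32)
non_native = ">" if native_byteorder == "<" else "<"

# Bytes are swapped, but the dtype still claims native endianness.
swapped = to_byteorder(original, non_native)

# The alternative not taken: convert to a dtype that records the byte order.
relabelled = original.astype(original.dtype.newbyteorder(non_native))
```

Both `swapped` and `relabelled` hold the same raw bytes. The difference is that `relabelled` still reads back as 1, 2, 3 because its dtype carries the byte order, while `swapped` only makes sense to a consumer that was told the byte order separately. `original` is unchanged in either case.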
@@ -426,6 +426,7 @@ def length_zero_array(
    container={"": b"\x00\x00\x00\x00\x00\x00\x00\x00"},
    buffer_key="",
    backend=backend,
    byteorder=ak._util.native_byteorder,
Anything using `ArrayBuilder` is working with native buffers.
@@ -16,7 +16,8 @@ def from_buffers(
     container,
     buffer_key="{form_key}-{attribute}",
     *,
-    backend: str = "cpu",
+    backend="cpu",
Remove the annotation, since the high-level API is untyped for now.
@jpivarski since writing this PR I'm more torn as to whether it would be better to use the native endianness by default, and only make …
This function is _not_ idempotent.
The buffers within an Awkward Array must be native-endian because we just didn't implement support for wrong-endian arrays (the way that NumPy did, for instance).
So the only choice to be made is whether `to_buffers`/`from_buffers` should

- default to returning (`to_buffers`) or accepting (`from_buffers`) little-endian arrays, which means that everything is no-translation and no-copy on little-endian machines, but always-translate and always-copy on big-endian machines;
- default to native-endian, which puts the problem of keeping track of endianness through I/O on users.

I'd rather make big-endian machines (which are rare) work harder than make users work harder, so I'm in favor of this PR as-is. This includes the pickle format: we can declare that to always be little-endian (`to_buffers`/`from_buffers` defaults).
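To make the trade-off concrete, here is a rough sketch of the first option, assuming NumPy. The helpers `dump_little_endian` and `load_little_endian` are hypothetical names for illustration only and are not part of Awkward Array: the serialized bytes are always little-endian, so little-endian machines skip the swap and big-endian machines pay for a swapped copy in each direction.

```python
import sys

import numpy as np

NATIVE = "<" if sys.byteorder == "little" else ">"


def dump_little_endian(array):
    """Hypothetical writer: always emits little-endian bytes."""
    if NATIVE != "<":
        # Only big-endian machines pay for the extra swapped copy.
        array = array.byteswap(inplace=False)
    return array.tobytes()


def load_little_endian(data, dtype):
    """Hypothetical reader: takes little-endian bytes, returns a native array."""
    array = np.frombuffer(data, dtype=dtype)
    if NATIVE != "<":
        array = array.byteswap(inplace=False)
    return array


values = np.array([1, 256, 65536], dtype=np.int64)
payload = dump_little_endian(values)
restored = load_little_endian(payload, np.int64)
```

With this convention the payload is identical whichever machine wrote it, so nothing about endianness has to be tracked alongside the data.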
It looks like you're doing the right thing with copying buffers that need to be changed.
I've explicitly declared the byteorder for pickling too, so that we're never in doubt and we don't accidentally regress. I also missed a case of …
Fixes #2067