feat: unpickle arrays made in Awkward v1 (as v2). #2604
 ak.nan_to_num(
-    ak.Array([1, 2, 3], backend="jax"), nan=ak.Array([1, 2, 3], backend="jax")
+    ak.Array([1.1, 2.2, 3.3], backend="jax"),
+    nan=ak.Array([1.1, 2.2, 3.3], backend="jax"),
 )
This change had nothing to do with the PR. A new version of JAX complains that integers are not an inexact type, so I made them floats.
We should let @zbhatti test this before merging.
@zbhatti, to try this out in your environment, you should follow the instructions here: https://github.com/scikit-hep/awkward#installation-for-developers (but with this branch
e488f9e to e0ff0f6
…kle-from-awkward1
8ddb857 to 5321178
This is apparently easier than I'd anticipated, thanks for tackling it @jpivarski! We could create shim classes in
I've added support for partitioned arrays, which I think ticks all boxes?
# If length is a sequence, we have awkward1
if isinstance(length, Sequence):
Nice! I'm glad we have this way to identify an Awkward 1 partitioned array.
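A minimal sketch of the check discussed in this review comment, assuming (this is not confirmed by the diff excerpt) that a partitioned Awkward 1 pickle stores one length per partition while an unpartitioned array stores a single integer. The function name `describe_length` is hypothetical and exists only for illustration.

```python
from collections.abc import Sequence


def describe_length(length):
    """Return (is_partitioned, total_length) for a pickled length field."""
    if isinstance(length, Sequence):
        # Assumed Awkward 1 layout: one length per partition.
        return True, sum(length)
    # A single integer: one unpartitioned array.
    return False, length


print(describe_length([3, 2]))
print(describe_length(5))
```

Because a plain integer is never a `Sequence`, the `isinstance` test is enough to separate the two cases without inspecting any version metadata.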
That's great! To fix CI, we'll need to merge in #2617. It looks like the last successful test (5b1677c) passed the sdist tests. #2604 (comment) should have been a separate PR, but we can just wait for this one to get merged, since I think that's pretty soon. We haven't heard from @zbhatti about whether this works in his case. We can wait a bit longer, but then move ahead anyway and if there are any specific problems with unpickling his files, we'll address them in another PR.
@jpivarski, as I'm working with him, I think it is fine to merge whenever you're ready, and he can follow up with additional questions.
This will make @zbhatti's life easier.
Now it's possible to move arrays all the way from Awkward v0 to v1 to v2, although they need to be pickled and unpickled in each step.
With the awkward0 library and awkward<2 (v1) installed, use pickle to load it as an awkward0 array, then use the ak.from_awkward0 (v1) function to convert it into a v1 array.
With awkward>=2 (v2) installed, use pickle to load it as a v2 array.
This PR makes the last step possible.
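The v1-to-v2 step might look like the sketch below. It uses only the standard-library pickle module; the helper names and the file name `array-v1.pkl` are placeholders, not part of this PR. The two calls in the trailing comments would have to run in two different environments, so they are shown as comments rather than executed.

```python
import pickle


def dump_pickle(obj, path):
    """Write obj to path with pickle (run where the array was created)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Read an object back; pickle imports the classes named in the file."""
    with open(path, "rb") as f:
        return pickle.load(f)


# In an environment with awkward<2 (v1):
#     dump_pickle(v1_array, "array-v1.pkl")
# In an environment with awkward>=2 (v2) that includes this PR:
#     v2_array = load_pickle("array-v1.pkl")
```

No explicit `import awkward` is needed on the loading side, since pickle imports whichever modules the file refers to.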
@agoose77, we had talked about the possibility of doing this, and weren't sure if it was a good idea to reintroduce ak._ext. In v1, that was a compiled extension with all the C++ code in it. In this PR, it's an ordinary Python module providing synonyms for the Form classes, since this is where pickle will look for them. The downside of this is that _ext now has a very different function, and it's not in any way "external." However, it's also hidden with an underscore.
The second part to get this to work is for all of the Form subclasses to be unpicklable from their v1 formats and their v2 formats. Fortunately, the v1 formats are always tuples and the v2 formats are always dicts, so they're easily distinguished.
This PR adds a __setstate__ to each Form subclass (no __getstate__; we want the default v2 behavior of returning self.__dict__). If it's a tuple (v1), each subclass has to follow a different prescription to unpickle it, as all of the v1 Form subclasses were pickled in different ways.
If a v1 array had multiple partitions, only the first partition will be loaded (with no error). Something ought to be done about that; an error message at least (probably in ak.Array.__setstate__). A v1 array can't be pickled as virtual because it gets ak.packed (v1).
In this PR, only the Form subclasses were modified. Nothing had to be done other than that (in ak.from_buffers or ak.Array.__setstate__).
There are tests for array types that aren't eliminated by ak.packed (v1). The untestable ones are simple extrapolations from the tested ones. It'll be fine!
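To make the two mechanisms concrete, here is a toy, standard-library-only sketch: a shim module that re-exports a class under an old import path, and a __setstate__ that accepts either a tuple or a dict. Nothing here is the real Awkward code. The module name `legacy_ext`, the toy `NumpyForm`, and the `(primitive, parameters)` tuple layout are all invented for illustration, and for brevity both payloads refer to the class by the old path.

```python
import pickle
import sys
import types


class NumpyForm:
    """Toy stand-in for one Form subclass (not the real Awkward class)."""

    def __init__(self, primitive, parameters=None):
        self.primitive = primitive
        self.parameters = parameters if parameters is not None else {}

    # No __getstate__: normal pickling falls back to the instance __dict__.
    def __setstate__(self, state):
        if isinstance(state, dict):
            # New-style state: restore the attributes directly.
            self.__dict__.update(state)
        else:
            # Old-style state: a tuple whose layout is specific to this class.
            # The (primitive, parameters) order is invented for this sketch.
            primitive, parameters = state
            self.__init__(primitive, parameters)


def pickle_with_state(state):
    """Stand-in for the writer: pickle a 'legacy_ext.NumpyForm' with `state`."""
    writer_cls = type(
        "NumpyForm",
        (),
        {"__module__": "legacy_ext", "__getstate__": lambda self: state},
    )
    writer_module = types.ModuleType("legacy_ext")
    writer_module.NumpyForm = writer_cls
    sys.modules["legacy_ext"] = writer_module
    try:
        return pickle.dumps(writer_cls())
    finally:
        del sys.modules["legacy_ext"]


def install_shim():
    """Make the old import path resolve to the current class."""
    shim = types.ModuleType("legacy_ext")
    shim.NumpyForm = NumpyForm
    sys.modules["legacy_ext"] = shim


old_payload = pickle_with_state(("float64", {"units": "GeV"}))
new_payload = pickle_with_state({"primitive": "int64", "parameters": {}})

install_shim()
from_tuple = pickle.loads(old_payload)  # takes the tuple branch
from_dict = pickle.loads(new_payload)   # takes the dict branch
print(type(from_tuple).__name__, from_tuple.primitive, from_dict.primitive)
```

The shim is needed because a pickle stores only the module path and class name, so the loader has to find something at that path. The type check in __setstate__ then decides which layout to decode.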