Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: unpickle arrays made in Awkward v1 (as v2). #2604

Merged
merged 10 commits into from
Aug 7, 2023

Conversation

jpivarski
Copy link
Member

This will make @zbhatti's life easier.

Now it's possible to move arrays all the way from Awkward v0 to v1 to v2, although they need to be pickled and unpickled in each step.

  1. In an environment with Awkward 0.x, pickle the array.
  2. In an environment (possibly the same) with the awkward0 library and awkward<2 (v1), use pickle to load it as an awkward0 array, then use the ak.from_awkward0 (v1) function to convert it into a v1 array.
  3. In the same environment, save the v1 array with pickle.
  4. In a different environment with awkward>=2 (v2), use pickle to load it as a v2 array.

This PR makes the last step possible.

@agoose77, we had talked about the possibility of doing this, and weren't sure if it was a good idea to reintroduce ak._ext. In v1, that was a compiled extension with all the C++ code in it. In this PR, it's an ordinary Python module providing synonyms for the Form classes, since this is where pickle will look for them. The downside of this is that _ext now has a very different function, and it's not in any way "external." However, it's also hidden with an underscore.

The second part to get this to work is for all of the Form subclasses to be unpicklable from their v1 formats and their v2 formats. Fortunately, the v1 formats are always tuples and the v2 formats are always dicts, so they're easily distinguished.

This PR adds a __setstate__ to each Form subclass (no __getstate__; we want the default v2 behavior of returning self.__dict__). If it's a tuple (v1), each subclass has to follow a different prescription to unpickle it, as all of the v1 Form subclasses were pickled in different ways.

If a v1 array had multiple partitions, only the first partition will be loaded (with no error). Something ought to be done about that; an error message at least (probably in ak.Array.__setstate__). A v1 array can't be pickled as virtual because it gets ak.packed (v1).

In this PR, only the Form subclasses were modified. Nothing had to be done other than that (in ak.from_buffers or ak.Array.__setstate__).

There are tests for array types that aren't eliminated by ak.packed (v1). The untestable ones are simple extrapolations from the tested ones. It'll be fine!

Comment on lines 13 to 16
ak.nan_to_num(
ak.Array([1, 2, 3], backend="jax"), nan=ak.Array([1, 2, 3], backend="jax")
ak.Array([1.1, 2.2, 3.3], backend="jax"),
nan=ak.Array([1.1, 2.2, 3.3], backend="jax"),
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change had nothing to do with the PR. A new version of JAX complains that integers are not an inexact type, so I made them floats.

@codecov
Copy link

codecov bot commented Aug 3, 2023

Codecov Report

Merging #2604 (50b7abd) into main (7802765) will decrease coverage by 0.06%.
The diff coverage is 71.72%.

Additional details and impacted files
Files Changed Coverage Δ
src/awkward/forms/bitmaskedform.py 82.52% <42.85%> (-2.90%) ⬇️
src/awkward/forms/indexedform.py 78.50% <42.85%> (-2.50%) ⬇️
src/awkward/forms/listform.py 77.14% <42.85%> (-2.45%) ⬇️
src/awkward/forms/numpyform.py 83.13% <50.00%> (-12.71%) ⬇️
src/awkward/highlevel.py 76.82% <75.00%> (-0.14%) ⬇️
src/awkward/forms/bytemaskedform.py 84.53% <85.71%> (+0.09%) ⬆️
src/awkward/__init__.py 97.14% <100.00%> (+0.08%) ⬆️
src/awkward/_ext.py 100.00% <100.00%> (ø)
src/awkward/forms/emptyform.py 86.02% <100.00%> (+1.13%) ⬆️
src/awkward/forms/indexedoptionform.py 88.77% <100.00%> (+0.86%) ⬆️
... and 5 more

@jpivarski
Copy link
Member Author

We should let @zbhatti test this before merging.

@jpivarski jpivarski temporarily deployed to docs-preview August 3, 2023 02:44 — with GitHub Actions Inactive
pyproject.toml Outdated Show resolved Hide resolved
@jpivarski jpivarski temporarily deployed to docs-preview August 3, 2023 02:56 — with GitHub Actions Inactive
@matthewfeickert
Copy link
Member

matthewfeickert commented Aug 3, 2023

We should let @zbhatti test this before merging.

@zbhatti in your environment to try this out you should follow the instructions here: https://github.com/scikit-hep/awkward#installation-for-developers (but with this branch jpivarski/allow-unpickle-from-awkward1 checked out)

pyproject.toml Outdated Show resolved Hide resolved
@henryiii henryiii temporarily deployed to docs-preview August 3, 2023 16:51 — with GitHub Actions Inactive
@agoose77 agoose77 force-pushed the jpivarski/allow-unpickle-from-awkward1 branch from e488f9e to e0ff0f6 Compare August 4, 2023 08:55
@agoose77 agoose77 force-pushed the jpivarski/allow-unpickle-from-awkward1 branch from 8ddb857 to 5321178 Compare August 4, 2023 09:02
@agoose77 agoose77 temporarily deployed to docs-preview August 4, 2023 09:11 — with GitHub Actions Inactive
@agoose77 agoose77 temporarily deployed to docs-preview August 4, 2023 09:22 — with GitHub Actions Inactive
@agoose77 agoose77 temporarily deployed to docs-preview August 4, 2023 10:24 — with GitHub Actions Inactive
@agoose77
Copy link
Collaborator

agoose77 commented Aug 4, 2023

This is apparently easier than I'd anticipated, thanks for tackling it @jpivarski!

We could create shim classes in _ext to avoid mixing v1 logic into v2 forms, but I'm not that worried about it, so let's keep it as-is.

I've added support for partitioned arrays, which I think ticks all boxes?

@agoose77 agoose77 temporarily deployed to docs-preview August 4, 2023 10:36 — with GitHub Actions Inactive
Comment on lines +1447 to +1448
# If length is a sequence, we have awkward1
if isinstance(length, Sequence):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I'm glad we have this way to identify an Awkward 1 partitioned array.

@jpivarski
Copy link
Member Author

I've added support for partitioned arrays, which I think ticks all boxes?

That's great!

To fix CI, we'll need to merge in #2617.

It looks like the last successful test (5b1677c) passed the sdist tests.

#2604 (comment) should have been a separate PR, but we can just wait for this one to get merged, since I think that's pretty soon.

We haven't heard from @zbhatti about whether this works in his case. We can wait a bit longer, but then move ahead anyway and if there are any specific problems with unpickling his files, we'll address them in another PR.

@jpivarski jpivarski temporarily deployed to docs-preview August 7, 2023 17:52 — with GitHub Actions Inactive
@agoose77 agoose77 temporarily deployed to docs-preview August 7, 2023 18:58 — with GitHub Actions Inactive
@matthewfeickert
Copy link
Member

We can wait a bit longer, but then move ahead anyway and if there are any specific problems with unpickling his files, we'll address them in another PR.

@jpivarski as I'm working with him I think it is fine to merge whenever you're ready and he can follow up with additional questions.

@jpivarski jpivarski enabled auto-merge (squash) August 7, 2023 21:26
@jpivarski jpivarski temporarily deployed to docs-preview August 7, 2023 21:41 — with GitHub Actions Inactive
@jpivarski jpivarski merged commit 7f8817e into main Aug 7, 2023
32 checks passed
@jpivarski jpivarski deleted the jpivarski/allow-unpickle-from-awkward1 branch August 7, 2023 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants