fix: use `object.__new__(ak.Array)` for pickling constructor #2113
You just started this, but remember we want to future-proof pickle-loading by replacing the state unpacking at awkward/src/awkward/highlevel.py Line 1448 in e350396 and awkward/src/awkward/highlevel.py Line 2090 in e350396 with

    form, length, container, behavior, *args = state  # including `at` for ak.Record

or

    form = state[0]
    length = state[1]
    container = state[2]
    behavior = state[3]
    # including `at` for ak.Record

(which Python versions accept the starred-unpacking syntax?). This is not the time to introduce a version number, but making as many old versions as possible (starting with this one) insensitive to extra arguments would make it possible to add a version number at any time.
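The tolerant unpacking suggested above can be sketched in isolation. This is only an illustration: `unpack_state` and the placeholder state values are hypothetical and are not taken from `highlevel.py`.

```python
def unpack_state(state):
    """Read the first four pickled fields and ignore anything after them."""
    form, length, container, behavior, *_ = state
    return form, length, container, behavior


# A state tuple with today's four fields (placeholder values).
current_state = ("form-json", 3, {"node0-data": b"\x00"}, None)

# A state tuple as some later release might write it, with an extra
# trailing field such as a version marker (purely hypothetical).
future_state = (*current_state, {"version": 2})

# An old reader written this way loads both layouts identically.
assert unpack_state(current_state) == unpack_state(future_state)
```

A strict `form, length, container, behavior = state` would raise `ValueError` on the longer tuple, which is what makes adding a field later a breaking change for old readers.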
You read my mind! (I stopped for food) Yes, a version number is useful for backward compatibility, whilst this is useful for forwards compatibility.
Take the first `N` state values
There might still be issues pickling `behavior`s with lambda functions in them, but that's to be expected. For that reason, the standard `behavior`s all have names in the Awkward package (and in Vector).
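The difference between named and anonymous functions can be seen with plain dicts standing in for a behavior mapping. The keys and functions below are illustrative assumptions, not real Awkward behaviors.

```python
import math
import pickle

# A function with an importable name is pickled by reference to that name.
named = {"sqrt": math.sqrt}

# A lambda has no importable name, so pickle cannot record a reference to it.
anonymous = {"square": lambda x: x * x}

restored = pickle.loads(pickle.dumps(named))

try:
    pickle.dumps(anonymous)
except (pickle.PicklingError, AttributeError):
    # The exact exception type depends on the Python version and on
    # where the lambda was defined.
    lambda_failed = True
else:
    lambda_failed = False
```

Because only a name is stored, the named function must also be importable under the same name in the session that unpickles it.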
I always get confused about the differences between `__reduce__` and `__getstate__`, and I prefer `__getstate__` because it is symmetric with `__setstate__`. But presumably, you have a reason for switching to `__reduce__` in this one, perhaps to be able to control the `__new__` method that gets used.
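For readers who share that confusion, here is a minimal sketch of the two protocols side by side. The classes `ViaState` and `ViaReduce` and the helper `make_via_reduce` are hypothetical names for illustration only; they are not Awkward code.

```python
import pickle


class ViaState:
    """Uses __getstate__/__setstate__; pickle decides how to re-create the object."""

    def __init__(self, value):
        self.value = value

    def __getstate__(self):
        return {"value": self.value}

    def __setstate__(self, state):
        self.value = state["value"]


def make_via_reduce():
    # The author picks this callable, so the author controls which
    # class (and which __new__) is used at load time.
    return object.__new__(ViaReduce)


class ViaReduce:
    """Uses __reduce__ to name the reconstruction callable explicitly."""

    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        # (callable, args for the callable, state passed to __setstate__)
        return make_via_reduce, (), {"value": self.value}

    def __setstate__(self, state):
        self.value = state["value"]


state_payload = pickle.dumps(ViaState(1))
reduce_payload = pickle.dumps(ViaReduce(2))
```

With `__getstate__` alone, the bytestream records the object's own class and pickle re-creates an instance of it before calling `__setstate__`. With `__reduce__`, the bytestream records whatever callable the method returns, and `__setstate__` is still called with the third element of the tuple.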
As for future-proofing the `state` unpacking, I just verified that the `*_` syntax was introduced with Python 3.0 (PEP 3132), so it's definitely safe to use. Although our tests would have revealed it if it wasn't, since we test every Python version we support.
I'll set this to auto-merge!
Good Practice™!
We could also have written

    def _array_new():
        return object.__new__(Array)

but I can't see any motivation for it.
Yep!
When pickling an Awkward Array, the use of pickle's `__getstate__` interface means that pickle stores a reference to `type(array)` in the ensuing bytestream. For arrays with a behavior class, this might not be available at unpickle time, e.g. if the author adds their class in `ak.behavior`, it will not be found in a new session during unpickling, unless the same registration is performed. Whilst this is a somewhat sensible constraint, we should not needlessly force this upon the user.

Conventionally, arrays can have `__name__`s that aren't found in their respective behaviours; these aren't currently errors, so we shouldn't fail to unpickle them. We can fix this situation by removing the reference to the `type(array)` in favour of `object.__new__(ak.Array)`, so that only the `behavior` member references the class.

Note that arrays with local behaviors (`ak.Array(..., behavior=...)`) will always fail to unpickle in the event that the references inside the behaviour can't be resolved!

Fixes #2106
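
The effect described above can be reproduced with plain Python classes. This is a sketch under stated assumptions: `Array`, `Plain`, `SpecialArray` and `SpecialPlain` are hypothetical stand-ins, and the real `ak.Array` state and behavior lookup are not modelled here.

```python
import pickle


class Plain:
    """Pickled with the default machinery, which records type(self)."""

    def __init__(self, data):
        self.data = data


class Array:
    """Stand-in for ak.Array (hypothetical, not the real implementation)."""

    def __init__(self, data):
        self.data = data

    def __reduce__(self):
        # Name the base class explicitly instead of type(self), so the
        # bytestream never refers to a subclass.
        return object.__new__, (Array,), {"data": self.data}


class SpecialPlain(Plain):
    pass


class SpecialArray(Array):
    pass


default_payload = pickle.dumps(SpecialPlain([1, 2, 3]))
reduced_payload = pickle.dumps(SpecialArray([1, 2, 3]))

# The default payload names the subclass; the __reduce__ payload does not.
subclass_in_default = b"SpecialPlain" in default_payload
subclass_in_reduced = b"SpecialArray" in reduced_payload

restored = pickle.loads(reduced_payload)
```

In this sketch the subclass is simply dropped on load and `restored` is a plain `Array`. In the change described above, the `behavior` member remains the only part of the state that references the class.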