Add pickler hook for the user to customize the serialization of user-defined functions and types #80081
Comments
Pickler objects provide a dispatch_table attribute, where the user can specify custom reduction callbacks on a per-type basis. Especially, it is not possible to define custom saving methods for functions and classes, which pickle always serializes by reference to their qualified name. The aforementioned failures exist on purpose in the standard library (as a way to keep pickle streams portable), but they force libraries such as cloudpickle, which need to pickle dynamically defined functions and classes by value, to subclass the slow pure-Python pickler. While prototyping with Antoine Pitrou, we came to the conclusion that a hook on the Pickler class, letting the user override the reduction of arbitrary objects (including those normally pickled by reference), would address these use cases.

(*) dynamic modules are modules that cannot be imported by name as traditional file-backed modules can.
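For context, the hook discussed here eventually shipped as Pickler.reducer_override in Python 3.8. A minimal sketch of how such a hook is used (the MyClass type and the doubled-argument reduction are illustrative, not from this issue):

```python
import io
import pickle

class MyClass:
    def __init__(self, x):
        self.x = x

class MyPickler(pickle.Pickler):
    # reducer_override is consulted for every object, including
    # functions and classes that dispatch_table cannot intercept.
    def reducer_override(self, obj):
        if isinstance(obj, MyClass):
            # Return a custom reduce value for this type.
            return (MyClass, (obj.x * 2,))
        # Fall back to the regular reduction behavior.
        return NotImplemented

buf = io.BytesIO()
MyPickler(buf).dump(MyClass(21))
restored = pickle.loads(buf.getvalue())
print(restored.x)  # 42
```

Returning NotImplemented (rather than raising) is what lets the default machinery handle every other object.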
FYI, I've removed the duplicate message :-) Also adding Serhiy as cc.
Adding such a hook would make it possible to reimplement cloudpickle.CloudPickler by deriving from the fast _pickle.Pickler class (instead of the slow pickle._Pickler, as done currently). This would mean rewriting most of the CloudPickler methods to rely only on a save_reduce-style design instead of directly calling pickle._Pickler.write and pickle._Pickler.save. This is tedious but doable.

There is however a blocker with the current way closures are set: when we pickle a dynamically defined function (e.g. a lambda, a nested function, or a function in __main__), we currently use a direct call to memoize (https://github.com/cloudpipe/cloudpickle/blob/v0.7.0/cloudpickle/cloudpickle.py#L594) so as to be able to refer to the function itself in its own closure without causing an infinite loop in CloudPickler.dump. This also makes it possible to pickle mutually recursive functions.

The easiest way to avoid calling memoize explicitly would be to pass the full __closure__ attribute in the state dict of the reduce call. Indeed, save_reduce calls memoize automatically after saving the reconstructor and its args, but prior to saving the state: https://github.com/python/cpython/blob/v3.7.2/Modules/_pickle.c#L3903-L3931. It would therefore be possible to pass a (state, slotstate) tuple with the closure in slotstate, so that it could be reconstructed at unpickling time with a setattr: https://github.com/python/cpython/blob/v3.7.2/Modules/_pickle.c#L6258-L6272

However, it is currently not possible to setattr __closure__. We can only set individual closure cell contents (which is not compatible with the setattr state trick described above).

To summarize, we would need to implement a setter for the __closure__ attribute of functions and methods to make it natural to reimplement the CloudPickler by inheriting from _pickle.Pickler using the hook described in this issue.
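The __closure__ restriction mentioned above is easy to observe: in CPython the attribute itself is read-only, while the contents of individual cells became writable (via cell_contents) in Python 3.7. A small illustration, with made-up function names:

```python
import types

def make_adder(n):
    def add(x):
        return x + n
    return add

add5 = make_adder(5)

# The __closure__ attribute is read-only: the tuple of cells
# cannot be reassigned wholesale, so a setattr-based state
# trick cannot restore it.
try:
    add5.__closure__ = (types.CellType(10),)
except AttributeError as exc:
    print("cannot set __closure__:", exc)

# Individual cells, however, can be mutated in place (3.7+):
add5.__closure__[0].cell_contents = 10
print(add5(1))  # 11
```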
Update: instead of changing the permissions on some attributes of function objects (globals and __closure__), we added an optional argument called state_setter to save_reduce. It expects a callable that will be saved inside the object's pickle string and called when setting the state of the object, instead of the default logic in load_build.

We also tested the cloudpickle package against these patches (see cloudpipe/cloudpickle#253). The tests run fine, and we observe a 10-30x speedup for real-life use cases. We are starting to converge on the implementation :)
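The state_setter mechanism described above is reachable from Python code as the optional sixth item of a __reduce__ tuple (Python 3.8+). A minimal sketch, where the Point class and set_state callable are hypothetical names of my own:

```python
import pickle

def set_state(obj, state):
    # Called at unpickling time in place of the default
    # load_build logic (and with priority over __setstate__).
    obj.__dict__.update(state)
    obj.restored = True

class Point:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

    def __reduce__(self):
        # 6-item reduce tuple: (callable, args, state,
        # listitems, dictitems, state_setter).
        return (Point, (), {"x": self.x, "y": self.y},
                None, None, set_state)

p = pickle.loads(pickle.dumps(Point(1, 2)))
print(p.x, p.y, p.restored)  # 1 2 True
```

Because the setter is itself pickled by reference, it must be an importable top-level callable, just like the reconstructor.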
Both PRs are now merged. Thank you Pierre!
Sorry, I was going through hard times, so I missed this issue. Adding the 6th item breaks the pickle protocol and the code which expects reduce values to be tuples of two to five items.