
bpo-28053: Complete and fix custom reducers in multiprocessing. #9959


Closed
wants to merge 13 commits into master from bpo28053

Conversation

pablogsal
Member

@pablogsal pablogsal commented Oct 18, 2018

This PR tries to complete and fix the implementation of the custom reducer classes in multiprocessing.

Important

I have marked the PR as DO-NOT-MERGE because I still have several doubts about the previously implemented API, regarding the AbstractReducer base class, the methods the user needs to implement, and how the rest of the library interacts with multiprocessing.reducer. For example:

  1. I am not sure multiprocessing.reducer.dumps and multiprocessing.reducer.register are needed outside the ForkingPickler class, or how that interacts with the ABC.

  2. I am not sure the AbstractReducer is implemented completely (no methods are marked as abstract).

This PR is a draft implementation of the complete API, tests, and documentation so we can discuss how best to implement this.

https://bugs.python.org/issue28053

Member

@tirkarthi tirkarthi left a comment


Just a few minor typos I found while reading the PR.

Edit: Sorry, I just read at the end that it's a draft implementation. Feel free to ignore these if needed.

@pablogsal pablogsal force-pushed the bpo28053 branch 2 times, most recently from 11763cf to 0745094 Compare October 19, 2018 20:26
@pitrou
Member

pitrou commented Nov 6, 2018

Thanks @pablogsal for attacking this issue :-)

@1st1 1st1 removed their request for review November 6, 2018 20:57
@pablogsal pablogsal force-pushed the bpo28053 branch 3 times, most recently from 6ee544d to e27684c Compare December 9, 2018 19:16
@pablogsal
Member Author

pablogsal commented Dec 9, 2018

@pitrou Thank you very much for the review!

I have simplified the API. Now setting a custom reducer looks like this:

    import multiprocessing
    from multiprocessing.reduction import AbstractReducer, ForkingPickler

    class ForkingPicklerProtocol2(ForkingPickler):
        @classmethod
        def dumps(cls, obj, pickle_protocol=2):
            return super().dumps(obj, protocol=pickle_protocol)

    class PickleProtocol2Reducer(AbstractReducer):
        def get_pickler_class(self):
            return ForkingPicklerProtocol2

    multiprocessing.set_reducer(PickleProtocol2Reducer)

I am making the interface a bit stricter: multiprocessing.set_reducer() must be called with a subclass of AbstractReducer, and get_pickler_class must return a subclass of pickle.Pickler. This way, the constructor and the rest of the methods needed by the multiprocessing reduction machinery are guaranteed to be there. I have added some new tests that check this behaviour.
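
For reference, a minimal sketch of what that stricter validation could look like (a hypothetical helper body; set_reducer and get_pickler_class here follow this PR's draft API, which never shipped in this form):

    import pickle
    from multiprocessing.reduction import AbstractReducer

    def set_reducer(reducer_class):
        # Hypothetical sketch: reject anything that is not an
        # AbstractReducer subclass.
        if not (isinstance(reducer_class, type)
                and issubclass(reducer_class, AbstractReducer)):
            raise TypeError("expected a subclass of AbstractReducer")
        pickler_class = reducer_class().get_pickler_class()
        # The pickler must be a real pickle.Pickler subclass so the
        # reduction machinery can rely on its constructor and methods.
        if not (isinstance(pickler_class, type)
                and issubclass(pickler_class, pickle.Pickler)):
            raise TypeError("get_pickler_class() must return a "
                            "pickle.Pickler subclass")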

@pablogsal pablogsal force-pushed the bpo28053 branch 2 times, most recently from 8a68606 to 5e2fe4f Compare December 10, 2018 23:02
@pitrou
Member

pitrou commented Feb 6, 2019

@pablogsal Do you need another review on this?

@pablogsal pablogsal force-pushed the bpo28053 branch 5 times, most recently from a16e157 to 856716c Compare May 27, 2019 17:44
@pablogsal
Member Author

@pitrou It took me a while, but I have stabilized all tests and fixed some details on Windows. I have also added Listener and Client to the context so they can also benefit from custom reducers. Please check my previous comment regarding some details.

This patch is already very big and very complex, and when errors happen they are extremely obscure or platform-dependent, so I apologize in advance if I miss something obvious; I have too many spinning plates.

Could you take another look?

@pablogsal pablogsal requested a review from pitrou May 27, 2019 18:03
@pablogsal pablogsal force-pushed the bpo28053 branch 3 times, most recently from 715733f to 7ee45e1 Compare May 27, 2019 20:01
Member

@pitrou pitrou left a comment


Thanks for the update. It seems there are test failures on all 3 CI platforms...


Defaults to :meth:`pickle.Pickler.dump`

.. classmethod:: loads(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict")
Member

Are the optional arguments required? Does multiprocessing ever pass them explicitly?

Member

Also, the method / classmethod asymmetry is weird and doesn't help designing an implementation. Do you think that can be fixed (one way or the other)?

Member Author

@pablogsal pablogsal May 27, 2019


Yes, I find this very weird as well. We can make load() a regular method, but that would require instantiating a Pickler() object for no reason (AbstractPickler must inherit from Pickler to make dump work correctly). It would help with the asymmetry, though.

What do you think?

Member Author

Notice that to instantiate the Pickler class we need to provide a dummy file-like object (probably a BytesIO instance). I find that suboptimal as well.

Member Author

@pablogsal pablogsal May 27, 2019


The other possibility is making dump a regular method. In that case, we would need to create a Pickler instance and copy and update the dispatch table onto it every time it is called.
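
A rough sketch of that per-call cost (hypothetical names; this only illustrates the pattern being discussed, mirroring how ForkingPickler builds its dispatch table):

    import copyreg
    import io
    import pickle

    _extra_reducers = {}  # custom reducers registered at the class level

    def dump_per_call(obj, protocol=None):
        # What an instance-method dump would have to repeat on every call:
        # build a fresh Pickler, then copy and update the dispatch table.
        buf = io.BytesIO()
        p = pickle.Pickler(buf, protocol)
        p.dispatch_table = copyreg.dispatch_table.copy()
        p.dispatch_table.update(_extra_reducers)
        p.dump(obj)
        return buf.getbuffer()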


.. method:: get_pickler_class()

This method must return a subclass of :class:`pickler.Pickler` to be used by
Member

This does not make the relationship between pickler.Pickler and AbstractPickler very clear.

Member Author

That should be AbstractPickler

@@ -1187,6 +1187,81 @@ For example:
the data in the pipe is likely to become corrupted, because it may become
impossible to be sure where the message boundaries lie.

Custom Reduction
~~~~~~~~~~~~~~~~
Member

You'll need some versionadded directive at some point.

Member Author

Should we add this? Technically this PR fixes the previous implementation, although since the old one was broken, one could argue that we are adding the feature.

@@ -51,14 +51,35 @@ def dumps(cls, obj, protocol=None):
cls(buf, protocol).dump(obj)
return buf.getbuffer()

loads = pickle.loads
@classmethod
def loads(cls, bytes_object, *, fix_imports=True,
Member

Uh... I hadn't noticed these were class methods...

Member Author

The problem is that loads does not need to instantiate a Pickler class, so it was designed here as a class method.

Would you prefer it to be a regular method that does the same (defers the call to pickle.loads)?
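
For illustration, a minimal sketch of the regular-method variant being asked about (hypothetical class name; it still only defers to pickle.loads):

    import pickle

    class SketchForkingPickler(pickle.Pickler):
        def loads(self, data, *, fix_imports=True,
                  encoding="ASCII", errors="strict"):
            # Bound to an instance instead of the class, but otherwise
            # identical: just defer to pickle.loads.
            return pickle.loads(data, fix_imports=fix_imports,
                                encoding=encoding, errors=errors)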



def loads(s, *, fix_imports=True, encoding="ASCII", errors="strict"):
return ForkingPickler.loads(s, fix_imports=fix_imports,
Member

By the way, I see that sharedctypes is still using _ForkingPickler directly. Should it be fixed as well?


@classmethod
def _put_and_get_in_queue(cls, queue, parent_can_continue):
parent_can_continue.set()
Member

If you want to re-use the event afterwards, you have to clear it at some point. But I'm not sure all this synchronization is necessary (the queue already synchronizes for you).
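
For reference, the behaviour being pointed out: a multiprocessing.Event stays set until it is explicitly cleared, so it cannot gate a second round on its own:

    import multiprocessing

    ev = multiprocessing.Event()
    ev.set()
    ev.wait()   # returns immediately because the flag is still set
    ev.clear()  # required before wait() can block again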

p = self.custom_ctx.Process(target=self._put_and_get_in_queue,
args=(queue, parent_can_continue))
p.start()
parent_can_continue.wait()
Member

You shouldn't need this wait, AFAICT.

parent_can_continue.set()
queue.put("Something")
queue.get(timeout=TIMEOUT2)
close_queue(queue)
Member

Is it useful to close the queue explicitly?

Member Author

@pablogsal pablogsal May 27, 2019


Yep, there were a bunch of race conditions in the CI related to this process not finishing. It seems to be related to the queue thread doing something. I did not dig deeper, but as soon as I added the extra synchronization the failures went away.

I will increase the timeouts and try to remove this to see what happens.

element = queue.get(timeout=TIMEOUT3)
self.assertEqual(element, "Something")
queue.put("Other_Something")
parent_can_continue.wait()
Member

Not sure you need this, either (and I don't think you have to close the queue, unless it helps test something?).

Member Author

@pablogsal pablogsal May 27, 2019


Check the previous comment.

I will try to increase the timeouts and remove the event to see whether the failures still appear.



@unittest.skipUnless(HAS_REDUCTION, "test needs multiprocessing.reduction")
def test_queue_custom_reducer_over_default_context(self):
Member

Same comments as above about events and queues.

@pablogsal pablogsal force-pushed the bpo28053 branch 6 times, most recently from a6734a0 to f20e142 Compare May 27, 2019 23:20
@pablogsal
Member Author

pablogsal commented May 28, 2019

@pitrou There are some failures on Windows that I am investigating, but I have found a problem.

In multiprocessing/context.py there is no way of passing the current context down to the Process class. The Process class is an attribute of DefaultContext. If we change that attribute to a function that injects the context, any class inheriting from multiprocessing.Process will break, because Process would no longer be a class (it would be a function).

This is the only exception: every other class is defined in BaseContext as a method that injects the context down, but the Process class is not. Passing the context to the process is necessary because it calls dump (check, for example, multiprocessing/popen_spawn_win32.Popen.__init__).

I don't know how to solve this, but basically tests like test_queue_custom_reducer_over_custom_context are not possible, as there is no way for the call to Process to pass down the context unless you do something ugly like:

self.custom_ctx.Process(..., ctx=self.custom_ctx)

Trying to do something like:

class DefaultContext(BaseContext):
    @property
    def Process(self):
        class _Process(Process):
            _ctx = self.get_context()
        return _Process

fails because _Process will not be picklable downstream. So I am out of ideas.

As you are more used to the architecture of the multiprocessing module, do you see a way of solving this?

If you don't see a way, I'm afraid that custom reducers per context cannot be implemented because of the way multiprocessing is architected.
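
A minimal demonstration of why the snippet above fails (the _Process name here just stands in for the class created inside the property): pickle serializes classes by qualified name, and a class defined in a local scope cannot be looked up that way:

    import pickle

    def make_process_class():
        class _Process:  # defined locally, like the class in the property
            pass
        return _Process

    try:
        pickle.dumps(make_process_class())
    except (pickle.PicklingError, AttributeError) as exc:
        # The exact exception type varies between the C and pure-Python
        # picklers, but the local class is rejected either way.
        print(exc)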

@pierreglaser
Contributor

pierreglaser commented Jul 19, 2019

@pablogsal thanks a lot for putting this PR together, this is great work. This feature is very promising.

Regarding the issue concerning the mp.Process class, it is definitely tricky. I think we should leverage the fact that we only want this context injection into the Process class when the user explicitly asks for custom reduction behavior.
As a result, we can make Process a property of Context objects and have the property return a different object depending on whether the user asked for custom reduction, which is a new feature (so there is no backward-compatibility issue), or not.

See proof of concept below

    @property
    def Process(self):
        if not self._custom_reduction_enabled:
            # Ensure backward compatibility by returning a class when no
            # custom reducer was specified
            return _Process  # ForkProcess for ForkContext, Process for BaseContext etc.
        else:
            return self.process_factory

    def process_factory(self, *args, **kwargs):
        p = Process(*args, **kwargs)
        p._ctx = self.get_context()
        return p

More complete implementation here:
https://github.com/pablogsal/cpython/compare/bpo28053...pierreglaser:inject-custom-context-into-process-cls?expand=1

Gist showing a usage example and its behavior:
https://gist.github.com/pierreglaser/ed4f9f9e784a3571cfdc8b969d32085f

What do you think?

EDIT: Another question is whether or not we consider custom Context classes to be part of the public API. The only place I can think of where Context classes are mentioned in the Python docs is here:

Alternatively, you can use :func:`get_context` to obtain a context
object. Context objects have the same API as the multiprocessing
module, and allow one to use multiple start methods in the same
program.

If we want to enable set_reducer with custom Context classes that have a Process attribute, most probably pointing to a Process class, then my snippet above will break -- how to handle this case is still an open question.

@pierreglaser
Contributor

pierreglaser commented Jul 29, 2019

Another topic that appears many times in this PR is the strange AbstractPickler.loads classmethod and the load/dump asymmetry for the Pickler subclasses returned by get_pickler_class.

A good way, IMO, to re-establish the symmetry here would be to use actual Unpickler classes to load pickled bytes, instead of dummy loads methods on Pickler subclasses.

Thus, we could add an optional get_unpickler_class to the AbstractReducer API. The returned class would have to implement a load instance method. This way, the symmetry is re-established and we do not create unnecessary Pickler instances at load time.
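
A hedged sketch of that idea (every name below follows this thread's draft API; none of it shipped in the stdlib):

    import io
    import pickle

    class SketchUnpickler(pickle.Unpickler):
        """Unpickler counterpart: load() is already a true instance method."""

    class SketchReducer:
        def get_pickler_class(self):
            return pickle.Pickler

        def get_unpickler_class(self):
            # Optional hook: the Unpickler subclass to use for loading.
            return SketchUnpickler

    # Loading becomes symmetric with dumping, with no throwaway Pickler:
    reducer = SketchReducer()
    payload = pickle.dumps({"answer": 42})
    obj = reducer.get_unpickler_class()(io.BytesIO(payload)).load()
    assert obj == {"answer": 42}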
