-
-
Notifications
You must be signed in to change notification settings - Fork 30k
-
-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multiprocessing's overlapped PipeConnection on Windows #56537
Comments
There are some problems with the new Windows overlapped implementation
-- The unit tests attached pass on Unix. On Windows versions up to 3.2, |
The attached patch hopefully fixes problems (1)-(5), but I have never used overlapped I/O before. test_pipe_poll.py passes with these changes. |
Hello, Thanks for the patch.
Sounds ok to me.
There is infrastructure in _multiprocessing to handle Ctrl-C. Look for
Could you make the tests part of Lib/test/test_multiprocessing.py? Also, what is the rationale for the following change:
+ elif timeout == 0.0 and nleft != 0: |
If PeekNamedPipe() returns (navail, nleft) there are 3 cases:
The check is a shortcut for case 3, although it probably never
I will look in to it. |
pipe_interruptible.patch is a patch to support to making poll() interruptible. It applies on top of pipe_poll_2.patch. I am not sure what the guarantees are for when KeyboardInterrupt will be raised. I would have done it a bit differently if I knew a good way to test whether the current thread is the main one. Maybe there should be something like PyThread_is_main() and/or threading.ismain(). |
Hmm, it seems to me that it should be done in _poll() instead. Otherwise, recv() will not be interruptible, will it? Also, after looking at this again, it seems sigint_event would be better provided (at the C level) by the Python core, e.g. as _PyOS_SigintEvent. The time module already uses a similar mechanism. By setting the event in the sigint handler itself, rather than in an auxiliary HandlerRoutine callback function, it should also ensure that, when the event is set, the "tripped" bit in signalmodule.c has been set, meaning we wouldn't need to Sleep() anymore. (I also wonder why the event isn't auto-reset. Otherwise, there's a race condition - which is probably harmless in most cases, but still.) |
Or maybe WaitForMultipleObjects() should be changed to also wait on sigint_event if called by the main thread.
If it is an auto-reset event then we must worry about a non-main thread stealing the event. Maybe _PyOS_SigintEvent() should return NULL if called by a non-main thread, and be documented with a loud usage warning. |
Indeed, this may be even better.
You are right, we need a manual reset *or* we must ensure that every |
On second thoughts, even using an auto-reset event, resetting the event before waiting is unavoidable. Otherwise you are liable to catch an event that was set a long time ago. |
sigint_event.patch is a patch to make The patch also adds functions _PyOS_SigintEvent and _PyOS_IsMainThread _PyOS_SigintEvent returns a manual reset event (cast to void*) which _PyOS_IsMainThread returns 0 or 1 according to whether the current The time and _multiprocessing modules have been updated to use these Note that WaitForMultipleObjects has a bWaitAll parameter. When this |
I have noticed a few more problems.
|
Here is an updated patch (pipe_poll_fix.patch) which should be applied on top of sigint_event.patch. It fixes the problems with PipeConnection.poll() and Queue.empty() and makes PipeListener.accept() use overlapped I/O. This should make all the pipe releated blocking functions/methods interruptible on Windows. |
New changeset 08953a04b2e6 by Antoine Pitrou in branch 'default': |
I've re-added the fast uncontended path in semaphore.c and committed sigint_event.patch. Now I'm gonna take a look at pipe_poll_fix.patch... |
I have the feeling that if we have to call GetLastError() at the Python |
There are three "expected" error conditions: ERROR_OPERATION_ABORTED: operation stopped by CancelIo(Ex)() In the win32 api you need GetLastError() to distinguish between these cases, but maybe we can expose something better.
If WaitForMultipleObjects() returns WAIT_OBJECT_0 + 0 then, as you say, there is no need to call ov.cancel(), but it is harmless to call it if the operation has already completed. The cases aren't really mutually exclusive: if you call ov.cancel() you *must* still do ov.GetOverlappedResult(True) to check for the race where the operation completes after the wait times out, but before ov.cancel() can stop it. (Your original implementation had that bug -- see point (5) in my original bug report.) |
It seems to me that ERROR_OPERATION_ABORTED is a "true" error, and so
Ah, right. Thanks for the explanation. |
I guess so, although we do expect it whenever poll() times out. What exception would be appropriate? BlockingIOError? TimeoutError? |
I would say either an OSError with the appropriate winerror, or a |
Quite honestly I don't like the way that polling a pipe reads a partial message from the pipe. If at all possible, polling should not modify the pipe. I think the cleanest thing would be to switch to byte oriented pipes on Windows and create PipeIO which subclasses RawIOBase. (Since socket handles are really just overlapped file handles, PipeIO works for them too as long as closesocket() is used instead of CloseHandle().) On Unix FileIO would be used instead. Then Connection can just be a thin wrapper around a file object. Polling on Windows can then be done by creating a wrapper for an overlapped object which represents a zero length read, and can be used with WaitForMultipleObjects(). This lets us implement a select-like wait() function in Python which works with both sockets and Connection objects. Attached is an extension implementing PipeIO (and the overlapped wrapper), a Python module implementing Connection and wait(), and a test. |
I have done an updated patch. (It does *not* switch to using bytes oriented pipes as I suggested in the previous message.) The patch also adds a wait() function with signature wait(object_list, timeout=None) for polling multiple objects at once. On Unix it is just a wrapper for select.select(object_list, [], [], timeout) except that it retries when it gets EINTR. wait() works with "connected" sockets too, although on Windows it does not work for "listening" sockets. The patch removes SentinelReady and changes concurrent.futures to use wait() instead. Polling is now done by issuing zero length overlapped reads. This means that the pipe is not modified except possibly if a zero length message is removed. I changed ReadFile(), WriteFile() and GetOverlappedResult() to return pairs, the second entry of which is zero or an "expected" error code. (Unexpected errors still raise an exception.) This avoids the need to ever use GetLastError(). |
Hmm, I tried to apply the latest patch to the default branch and it failed. |
I will do an updated patch against a "public" changeset. (I usually use a patch for the project files to disable those extensions which my windows setup cannot compile.) |
Updated patch against 2822765e48a7. |
Updated patch addressing Antoine's comments. |
New changeset 75c27daa592e by Antoine Pitrou in branch 'default': |
Committed, thanks! |
One of our buildbots seems to have recurring issues with the timeout issues. I don't know if the machine is very loaded: ====================================================================== Traceback (most recent call last):
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\test_multiprocessing.py", line 2617, in test_wait_integer
self.assertLess(delta, expected + 2)
AssertionError: 8.592355012893677 not less than 7 ====================================================================== Traceback (most recent call last):
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\test_multiprocessing.py", line 2590, in test_wait_timeout
self.assertLess(delta, expected + 0.5)
AssertionError: 1.532202959060669 not less than 1.5 http://www.python.org/dev/buildbot/all/builders/x86%20XP-4%203.x |
New changeset bdad874195d6 by Victor Stinner in branch '3.4': New changeset 2e4692a762d5 by Victor Stinner in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: