Enforce a finite bound on the time gap between signal receipt and signal handler execution. #19481
Manually verified that this allows for the desired simplifications in #19465, but I'm still working on unit tests. Just wanted to get eyes on this. @lidizheng
I encountered the same problem. I personally prefer setting an infinite timeout to checking periodically, because IMO the latter solution might have greater overhead while spinning.
@lidizheng If we want to be responsive to handlers, some sort of spinning is going to be necessary. It's just a question of which layer it's going to be happening in. This is the code path that is exercised when you supply a timeout to

Edit: I stand corrected. That second link I posted is for those platforms that do not have their own
cf0d5d6 to 88f7865
LGTM to the design! Good work!
Please take a look at failed test cases.
spin_cb()

def wait(wait_fn, wait_complete_fn, timeout=None, spin_cb=None):
Optional: if we move this function to the Cython layer, we get a free performance boost that reduces the overhead introduced by this spin-wait mechanism.
Also, should we add a skip condition if the
@lidizheng I considered it, but I'm wondering if checking the TID on every call to
@lidizheng Thinking about it a little bit more, I don't think it makes sense to only add this behavior to the main thread. Suppose we're on some other thread and the application blocks indefinitely in a C-level function while holding the GIL (e.g. waiter.acquire()). Since it has the GIL, the main thread still will not be able to execute the signal handler.
So I went ahead and disabled the gevent tests for this PR, as I'm bumping up against #18980. On a separate branch, I've determined the root cause and have a fix that makes this test pass under gevent (though perhaps not the ideal fix). My plan is to merge this to master and then remove the test from the blocklist in the follow-up PR that addresses #18980.
That makes sense. Thank you for thinking through this optimization.
…On Wed, Jul 3, 2019 at 13:59 Richard Belleville ***@***.***> wrote:
@lidizheng <https://github.com/lidizheng> Thinking about it a little bit
more, I don't think it makes sense to only add this behavior to the main
thread. Suppose we're on some other thread and the application blocks
indefinitely in a C-level function while holding the GIL (e.g.
waiter.acquire()). Since it has the GIL, the main thread *still* will not
be able to execute the signal handler.
@lidizheng PTALA.
…dler Execution

Previously, signal handlers were only given a chance to run upon receipt of an entry in the RPC stream. Since there is no time bound on how long that might take, there can be an arbitrarily long time gap between receipt of the signal and the execution of the application's signal handlers.

Signal handlers are only run on the main thread. The CPython implementation takes great care to ensure that the main thread does not block for an arbitrarily long period between signal checks. Our indefinite blocking was due to wait() invocations on condition variables without a timeout.

This changes all usages of wait() in the channel implementation to use a wrapper that is responsive to signals even while waiting on an RPC. A test has been added to verify this. Tests are currently disabled under gevent due to grpc#18980, but a fix for that has been found and should be merged shortly.
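For readers following along, here is a minimal sketch of what a signal-responsive wait wrapper could look like. It reuses the `wait(wait_fn, wait_complete_fn, timeout=None, spin_cb=None)` signature visible in the review context, but the body, the `MAX_SPIN_SECONDS` constant and its value, and the usage names (`condition`, `state`, `finish`, `count_spin`) are assumptions made for illustration and may differ from the code in this PR. The idea is to replace one unbounded wait with a loop of bounded waits, so the calling thread returns to the interpreter at a known maximum interval.

```python
import threading
import time

# Hypothetical upper bound on a single blocking call. The value is an
# assumption for illustration, not taken from the PR.
MAX_SPIN_SECONDS = 0.1


def wait(wait_fn, wait_complete_fn, timeout=None, spin_cb=None):
    """Block until wait_complete_fn() is true or `timeout` seconds elapse.

    wait_fn(timeout=...) is called repeatedly with a bounded timeout, so the
    calling thread never blocks longer than MAX_SPIN_SECONDS in one call.
    Returns True if the overall wait timed out, False otherwise.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not wait_complete_fn():
        if deadline is None:
            wait_slice = MAX_SPIN_SECONDS
        else:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return True
            wait_slice = min(remaining, MAX_SPIN_SECONDS)
        wait_fn(timeout=wait_slice)
        if spin_cb is not None:
            spin_cb()  # invoked after every bounded wait
    return False


# Usage: wait on a condition variable that another thread notifies later.
condition = threading.Condition()
state = {"done": False, "spins": 0}


def finish():
    with condition:
        state["done"] = True
        condition.notify_all()


def count_spin():
    state["spins"] += 1


timer = threading.Timer(0.3, finish)
timer.start()
with condition:
    timed_out = wait(condition.wait, lambda: state["done"], spin_cb=count_spin)
timer.join()
```

The trade-off raised earlier in the thread applies to this sketch as well: a shorter slice tightens the bound on handler latency but costs more wakeups while the wait is idle.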
8a3f89a to af1b09f
While drafting #19465, I found that the following simple snippet of code did not work:
Signal handlers were only given a chance to run upon receipt of an entry in the RPC stream. Since there is no time bound on how long that might take, there can be an arbitrarily long time gap between receipt of the signal and the execution of the application's signal handlers.
It turns out that this issue was not limited to streaming RPCs. Unary RPCs exhibited the same property.
Signal handlers are only run on the main thread. The CPython implementation takes great care to ensure that the main thread does not block for an arbitrarily long period between signal checks.
Our indefinite blocking was due to wait() invocations on condition variables without a timeout.

Related: