absl: Potential Mutex deadlock - when using signal handler #24884
I have the same issue.
This is reproducible at HEAD. It appears to happen because the custom shutdown callback calls the server's Shutdown() on the server's wait thread, which already holds its mutex. Shutdown() also tries to acquire the same mutex, which is why Abseil complains. (This isn't reproducible on Mac, though.) @vjpai Do you think this needs a fix? And is there a way to handle shutdown via a signal properly?
This is interesting. AFAIK the actual CV waiting process should release the mutex (since holding it would block any further progress anyway), but in general signal handlers can run on any thread, and if the signal arrives between entering CV::wait and the moment the mutex is released, we could still see this deadlock. Maybe we need to investigate mutex-free shutdown or waiting.
A workaround is to have the signal handler do the actual work in a new detached thread, so I'm comfortable calling this a P2 or even P3.
Note that in general, even for libc, you can't use arbitrary library functions inside signal handlers. For example, Linux supports only a specific list of async-signal-safe functions: http://manpages.ubuntu.com/manpages/bionic/man7/signal-safety.7.html . You should assume that no gRPC function is async-signal-safe. If we find one that is, we can document that at some point. Otherwise, if you want to do real gRPC work in response to signals, a safe way is to start a "signal handler thread" at the beginning of main that you join just before exit, and use signal masks to make sure your signals are delivered only to that thread (being careful to perform only activities that are at least thread-safe with respect to other gRPC activity at the time). Since this is by no means a gRPC-specific issue, I'd recommend closing. @veblush I'll let you decide on closure.
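A minimal sketch of that "signal handler thread" pattern, assuming a POSIX system (it relies on `pthread_sigmask` and `sigwait`). The helper names `BlockShutdownSignals` and `StartSignalThread` are hypothetical, and the `on_signal` callback stands in for whatever shutdown work you need (for example `server->Shutdown()`). Because `sigwait` returns in an ordinary thread rather than inside a handler, that callback is not limited to async-signal-safe functions.

```cpp
#include <cassert>
#include <csignal>
#include <functional>
#include <pthread.h>
#include <signal.h>
#include <thread>

// Hypothetical helper: blocks SIGINT and SIGTERM in the calling thread.
// Call it at the very start of main(), before any other thread (including
// gRPC's internal threads) is created, so every thread inherits the mask
// and the signals can only be picked up by sigwait().
inline sigset_t BlockShutdownSignals() {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGINT);
  sigaddset(&set, SIGTERM);
  int rc = pthread_sigmask(SIG_BLOCK, &set, nullptr);
  assert(rc == 0);
  (void)rc;
  return set;
}

// Hypothetical helper: starts the dedicated signal thread. It waits for one
// of the signals in `set`, then runs `on_signal` in normal thread context,
// where calling something like server->Shutdown() is not restricted by
// async-signal-safety rules.
inline std::thread StartSignalThread(sigset_t set,
                                     std::function<void(int)> on_signal) {
  return std::thread([set, on_signal = std::move(on_signal)] {
    int sig = 0;
    if (sigwait(&set, &sig) == 0) on_signal(sig);
  });
}
```

In `main`, call `BlockShutdownSignals()` before building the server, start the thread with a callback such as `[&](int) { server->Shutdown(); }`, call `server->Wait()`, and join the thread before returning. If the server can also stop for other reasons, the signal thread is still parked in `sigwait`; one option is to wake it yourself with `pthread_kill(t.native_handle(), SIGTERM)` and have the callback tolerate that case.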
Closing this since calling gRPC functions in the signal handler is discouraged.
I'm hit by this as well. I've had exactly this code (^C -> signal handler -> _server->Shutdown();) for a very long time. It worked up to and including gRPC 1.32.0, and we're stuck on that version because of this bug. It seems I've misunderstood something important, and that I'm not alone. Hence I have two important questions:
Your SIGINT handler should just set a global variable; let's call it sigint_received. When you enter main, spawn a thread that does nothing but check the value of sigint_received (perhaps sleeping for 1 second between checks) and does your shutdown processing once it becomes true. You can also check for any other exit conditions you want in that thread. Then remember to join that thread before exiting main. Or, if possible, just do all that work in main itself, if main doesn't do any blocking operations. The only thing the signal handler does is set a variable, which is async-signal-safe, and you then synchronize with the main thread through that check and the join.
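A minimal sketch of this polling approach in C++17. One detail the comment leaves open is the variable's type: here it is assumed to be a lock-free `std::atomic<bool>`, which may be stored from a signal handler and, unlike a plain `bool`, does not race with the watcher thread's reads. The names `g_sigint_received`, `OnSigint` and `StartSigintWatcher` are hypothetical, and `shutdown` stands in for `server->Shutdown()`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <csignal>
#include <functional>
#include <thread>

// Hypothetical flag: set by the signal handler, read by the watcher thread.
std::atomic<bool> g_sigint_received{false};
static_assert(std::atomic<bool>::is_always_lock_free,
              "the handler relies on a lock-free atomic store");

// The handler does nothing but set the flag (no gRPC calls, no locks).
void OnSigint(int) { g_sigint_received.store(true); }

// Spawns a thread that checks the flag every `period` and, once it is set,
// runs `shutdown` in ordinary thread context.
std::thread StartSigintWatcher(
    std::function<void()> shutdown,
    std::chrono::milliseconds period = std::chrono::milliseconds(1000)) {
  return std::thread([shutdown = std::move(shutdown), period] {
    while (!g_sigint_received.load()) {
      std::this_thread::sleep_for(period);
    }
    shutdown();
  });
}
```

In `main`, install the handler with `std::signal(SIGINT, OnSigint)`, start the watcher with `[&] { server->Shutdown(); }`, call `server->Wait()`, then join the watcher. As written, the watcher exits only after SIGINT, so any other exit condition has to be folded into the loop.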
Note that if directly calling Shutdown from a signal handler ever worked, it was mere coincidence. Even libc guarantees async-signal-safety for only a fraction of its functions, and we haven't made that guarantee for any of ours.
For the record, here is what I do, and it works with recent grpc (1.43.0).
@vjpai @orzel Running a sort of infinite while loop seems very inefficient. The task can easily be achieved with condition variables. Here is my more efficient approach...
Hi @MohammadArik, thanks for your input. I'm not familiar with condition variables. But according to cppreference, you should also acquire a mutex in … So I have this now, and it kind of seems to work (just did a few quick tests):
(and you can remove the
Hello @orzel! Nice to see your response. It's actually not necessary; it could have been done without a mutex, since there is no scope for a data race. The mutex is there only because condition_variable's wait method requires a unique_lock, so it isn't required...
Oh, you think? That's because
I haven't responded to this thread despite being tagged, since I'm no longer on gRPC, but I feel strongly about data races, so I wanted to put this out there. The suggestion to drop the mutex is not correct, because the resulting code snippet has a race between the write of
And, to clarify that I'm not just in the doom-and-gloom business, you can solve some of this using
Yes, the atomic variable with wait is better. I missed it. Thanks @vjpai
Well, my workaround is:

```cpp
std::signal(SIGINT, [](int) {
  // Run Shutdown() on a separate thread; .get() blocks the handler until it returns.
  std::async(std::launch::async, [] {
    if (_server) _server->Shutdown();
  }).get();
});
_server->Wait();
```

Not pretty, but less code. Call
What version of gRPC and what language are you using?
1.33.2 / C++
What operating system (Linux, Windows,...) and version?
Ubuntu 18.04
What runtime / compiler are you using (e.g. python version or version of gcc)
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
What did you do?
Tried to implement shutdown of a grpc::Server with a signal handler when pressing ctrl+c (SIGINT).
See repro case below.
What did you expect to see?
Normal shutdown and exit.
What did you see instead?
When built with dbg
When built with opt
Anything else we should know about your project / environment?
Works as expected on grpc 1.28.1 which was the previous version used before upgrading to 1.33.2.