http_server deadlocks on shutdown #1598
If CancelIoEx fails, it does not call the read/write callbacks, so the AIOs are never finished, leading to a deadlock when closing a TCP server.
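For context, the Win32 cancellation semantics in play: CancelIoEx returns FALSE with GetLastError() == ERROR_NOT_FOUND when it finds no outstanding I/O to cancel, and in that case no completion packet is queued on behalf of the cancel. A minimal sketch of the failure mode (the `handle`/`ov` names are illustrative, not nng internals):

```c
#include <windows.h>
#include <stdio.h>

/* Sketch: cancelling overlapped I/O on a handle.  If the I/O already
 * completed (or was never started), CancelIoEx fails with
 * ERROR_NOT_FOUND and nothing new is queued to the completion port.
 * Any code that waits for a cancellation-driven callback to finish
 * the pending AIO will then wait forever. */
static void cancel_pending_io(HANDLE handle, OVERLAPPED *ov)
{
    if (!CancelIoEx(handle, ov)) {
        if (GetLastError() == ERROR_NOT_FOUND) {
            /* No matching I/O: either it never started or its
             * completion is already queued/dispatched.  The read/write
             * callback will not run on our behalf, so the AIO must be
             * finished by some other path. */
            printf("CancelIoEx: nothing to cancel (ERROR_NOT_FOUND)\n");
        }
    }
    /* On success, the cancelled I/O completes through the normal
     * completion path with ERROR_OPERATION_ABORTED, and the callback
     * finishes the AIO. */
}
```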
Ok, I have tracked down the issue. I have added a PR solving this (#1599), but I have only focused on ...
Thanks for your work tracking this down. I'm concerned that you may be observing behavior that is not the root cause, and addressing that symptom may expose us to further races/corruption; please see my comments on that PR. If I have time I'll try to dig in a bit more deeply as well, although I'm not really a Windows expert. Thanks again for your efforts here.
@gdamore This is exactly why I wanted to share the code with you before making any other changes, as I didn't really know how to handle the error.
I really don't know. The documentation for CancelIoEx ...
In this particular case, yes, the error is ERROR_NOT_FOUND. The code I showed above is the complete code leading to this issue. Maybe a better solution would be to check for ERROR_NOT_FOUND before clearing all the AIOs, instead of doing it on any error...
> If CancelIoEx fails, it does not call the read/write callbacks, so the AIOs are never finished, leading to a deadlock when closing a TCP server.
@gdamore So I am completely lost here...
It strikes me that there can be a race here. CancelIoEx is getting called while the callback itself is pending/dispatched, so from the I/O perspective there is nothing left to cancel. Really, CancelIoEx shouldn't fail, and if it does, it should be fine, because it should indicate that the I/O is already being completed. That was the original design intent, I believe. I'll need some time to figure this out. Teardown is always the hard part of these problems.
> If CancelIoEx fails, it does not call the read/write callbacks, so the AIOs are never finished, leading to an infinite wait when closing a TCP server.
I think so, but it fails while the completion has not been called. I have created a PR with a different workaround, which should be safer than before: it pushes a 0-byte completion packet to the pending AIO, but only if CancelIoEx fails with ERROR_NOT_FOUND while there are still AIOs pending.
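For illustration, the zero-byte completion trick on Windows is done with PostQueuedCompletionStatus. A minimal sketch of the workaround described above (the `port`/`key`/`ov`/`aio_still_pending` names are hypothetical stand-ins for nng's internals, not the actual PR code):

```c
#include <windows.h>

/* Sketch of the workaround: if CancelIoEx reports ERROR_NOT_FOUND but
 * an AIO is still pending, manually post a zero-byte completion packet
 * so the normal completion callback runs and finishes the AIO. */
static void cancel_or_kick(HANDLE file, HANDLE port, ULONG_PTR key,
                           OVERLAPPED *ov, BOOL aio_still_pending)
{
    if (!CancelIoEx(file, ov) &&
        (GetLastError() == ERROR_NOT_FOUND) && aio_still_pending) {
        /* Nothing for the kernel to cancel, yet the AIO was never
         * completed: inject a 0-byte packet.  The completion handler
         * dequeues it via GetQueuedCompletionStatus and can then
         * finish the AIO instead of leaving it stuck forever. */
        PostQueuedCompletionStatus(port, 0, key, ov);
    }
}
```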
You were completely right. It wasn't related to CancelIoEx; the root cause was somewhere else, in the setsockopt function call. The PR solves this issue, and now I can close the HTTP server without any further problems. Thanks for your help and support!
**Describe the bug**
This is probably not a bug but an issue in my code, yet I cannot figure out what should be done. So, before calling it a bug, I would like to know whether the shutdown code is correct or I missed some other function.
**Expected behavior**
After receiving a particular GET query, the server must shut down.
**Actual Behavior**
The server deadlocks in `nng_http_server_stop`.
**Environment Details**
**Additional context**
The issue is a lot more visible when using the VPN client AstrillVPN. With it, the deadlock happens about 80% of the time, even if the VPN is not enabled (just being installed).
**To Reproduce**
The following code deadlocks.
All error handling has been removed for clarity. No errors are thrown.
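A minimal sketch of the setup described (an nng HTTP server whose handler triggers shutdown after a particular GET), assuming the nng 1.x supplemental HTTP API; this is an illustration of the pattern, not the reporter's original code:

```c
#include <stdbool.h>

#include <nng/nng.h>
#include <nng/supplemental/http/http.h>

/* Set by the handler when the magic GET arrives; polled by main().
 * A plain flag plus polling keeps the sketch short; real code would
 * use nng_mtx/nng_cv. */
static volatile bool stop_requested = false;

/* Handler for GET /stop: reply 200 and request shutdown. */
static void stop_handler(nng_aio *aio)
{
    nng_http_res *res;

    if (nng_http_res_alloc(&res) != 0) {
        nng_aio_finish(aio, NNG_ENOMEM);
        return;
    }
    nng_http_res_copy_data(res, "bye\n", 4);
    nng_aio_set_output(aio, 0, res);
    stop_requested = true;
    nng_aio_finish(aio, 0);
}

int main(void)
{
    nng_url          *url;
    nng_http_server  *server;
    nng_http_handler *h;

    /* Error handling elided, as in the original report. */
    nng_url_parse(&url, "http://127.0.0.1:8080");
    nng_http_server_hold(&server, url);
    nng_http_handler_alloc(&h, "/stop", stop_handler);
    nng_http_handler_set_method(h, "GET");
    nng_http_server_add_handler(server, h);
    nng_http_server_start(server);

    while (!stop_requested) {
        nng_msleep(100);
    }

    /* This is where the reported deadlock occurs. */
    nng_http_server_stop(server);
    nng_http_server_release(server);
    nng_url_free(url);
    return (0);
}
```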
The code deadlocks when calling `nng_http_server_stop()`. It hangs waiting for all connections to be closed (in `http_server_stop()`, at the line `nni_cv_wait(&s->cv);`), while the reaper thread that is trying to close the connections is waiting for completion of the task on `conn->rd_aio` (in `nni_http_conn_fini()`, at the line `nni_aio_stop(conn->rd_aio);`).
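To make the wait cycle easier to see, this is the shape of the deadlock as described above (a paraphrase of the cited nng internals, not the actual source):

```c
/* Deadlock shape (paraphrased):
 *
 *   main thread                           reaper thread
 *   -----------                           -------------
 *   nng_http_server_stop()
 *     -> http_server_stop()               nni_http_conn_fini()
 *          nni_cv_wait(&s->cv)              nni_aio_stop(conn->rd_aio)
 *          (waits until every               (waits for rd_aio to
 *           connection is reaped)            finish)
 *
 * The cv is never signaled because the reaper never finishes reaping:
 * rd_aio never completes once the cancellation path drops it, so each
 * thread waits on the other forever. */
```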
Thank you for any input you can provide, as I am out of ideas about what to do here...
Best regards,
Ruben