New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add timeout for closing connections at the graceful shutdown #1108

Merged
merged 7 commits into from Feb 23, 2017

Conversation

Projects
None yet
2 participants
@deweerdt
Member

deweerdt commented Oct 20, 2016

On graceful shutdown, after the second GOAWAY message has been written,
it should be possible for use to safely close the connection, without
necessarily waiting for all streams to be processed.

Close the connection at the end of the graceful shutdown
On graceful shutdown, after the second GOAWAY message has been written,
it should be possible for use to safely close the connection, without
necessarily waiting for all streams to be processed.
@deweerdt

This comment has been minimized.

Show comment
Hide comment
@deweerdt

deweerdt Oct 20, 2016

Member

Both the current behavior and what this PR suggests are within what the RFC permits, i think.

Maybe we need a config knob? One downside of the current behavior is that a never ending download might pin an H2O instance forever. Having the connection close makes server stops more predictable in terms of timing.

Member

deweerdt commented Oct 20, 2016

Both the current behavior and what this PR suggests are within what the RFC permits, i think.

Maybe we need a config knob? One downside of the current behavior is that a never ending download might pin an H2O instance forever. Having the connection close makes server stops more predictable in terms of timing.

@kazuho

This comment has been minimized.

Show comment
Hide comment
@kazuho

kazuho Oct 21, 2016

Member

Thank you for the PR.

Even after the second GOAWAY, some responses might still be in-flight (the specification discuss the possibility of losing requests, not responses).

In such case, it would be desirable to postpone closing the connection until those responses finish. Otherwise, the clients would be left with partial responses, and since they are partial, they would never get retried.

So if this is an issue, I would suggest adding a timer that starts after sending the second GOAWAY and wait until then, or, add a function to server-starter (assuming that you use it to control H2O) that sends SIGKILL when the server fails to stop after some time after SIGTERM has been delivered.

Member

kazuho commented Oct 21, 2016

Thank you for the PR.

Even after the second GOAWAY, some responses might still be in-flight (the specification discuss the possibility of losing requests, not responses).

In such case, it would be desirable to postpone closing the connection until those responses finish. Otherwise, the clients would be left with partial responses, and since they are partial, they would never get retried.

So if this is an issue, I would suggest adding a timer that starts after sending the second GOAWAY and wait until then, or, add a function to server-starter (assuming that you use it to control H2O) that sends SIGKILL when the server fails to stop after some time after SIGTERM has been delivered.

Add a second timeout callback, in which we close the connection.
This gives extra time to clients after we've sent the second GOAWAY
frame.
@deweerdt

This comment has been minimized.

Show comment
Hide comment
@deweerdt

deweerdt Oct 27, 2016

Member

@kazuho, thanks for your comments, I've implemented your suggestion to have a second timer in 00126d7. I believe that would be easier to troubleshoot in the long run, being a bit more predictable in behavior.

Member

deweerdt commented Oct 27, 2016

@kazuho, thanks for your comments, I've implemented your suggestion to have a second timer in 00126d7. I believe that would be easier to troubleshoot in the long run, being a bit more predictable in behavior.

@kazuho

This comment has been minimized.

Show comment
Hide comment
@kazuho

kazuho Oct 28, 2016

Member

@deweerdt Thank you for the update!

I believe that would be easier to troubleshoot in the long run, being a bit more predictable in behavior.

Sounds reasonable.

The PR looks good to me; the only thing I'm a bit worried is that just waiting for one second might not be enough (for example for servers serving large files). Could we make it a configuration variable?

Member

kazuho commented Oct 28, 2016

@deweerdt Thank you for the update!

I believe that would be easier to troubleshoot in the long run, being a bit more predictable in behavior.

Sounds reasonable.

The PR looks good to me; the only thing I'm a bit worried is that just waiting for one second might not be enough (for example for servers serving large files). Could we make it a configuration variable?

deweerdt added some commits Oct 28, 2016

Add a 'http2-graceful-shutdown-timeout' config paramter that allows to
configure how long H2O should wait for connections to close when doing a
graceful shutdown
@deweerdt

This comment has been minimized.

Show comment
Hide comment
@deweerdt

deweerdt Oct 29, 2016

Member

@kazuho, thank you for your feedback. I've added a config item to make the timeout configurable in 7429037. It defaults to 0 which makes it inactive (H2O behaves as it used to).

Member

deweerdt commented Oct 29, 2016

@kazuho, thank you for your feedback. I've added a config item to make the timeout configurable in 7429037. It defaults to 0 which makes it inactive (H2O behaves as it used to).

@kazuho

This comment has been minimized.

Show comment
Hide comment
@kazuho

kazuho Nov 7, 2016

Member

(just in case you missed) CI seems to be failing due to t/00unit.libuv.t receiving a SIGSEGV:

�[31mt/00unit.libuv.t                                   (Wstat: 139 Tests: 22 Failed: 0)�[0m
�[31m  Non-zero wait status: 139�[0m
Member

kazuho commented Nov 7, 2016

(just in case you missed) CI seems to be failing due to t/00unit.libuv.t receiving a SIGSEGV:

�[31mt/00unit.libuv.t                                   (Wstat: 139 Tests: 22 Failed: 0)�[0m
�[31m  Non-zero wait status: 139�[0m
Add missing h2o_timeout_dispose call, this would result in invalid reads
in libuv because the lib would still reference a free'd context.
@deweerdt

This comment has been minimized.

Show comment
Hide comment
@deweerdt

deweerdt Nov 7, 2016

Member

@kazuho I had missed it, thanks for pointing this out. I've added the missing h2o_timeout_dispose call in fe0931f.

Member

deweerdt commented Nov 7, 2016

@kazuho I had missed it, thanks for pointing this out. I've added the missing h2o_timeout_dispose call in fe0931f.

@kazuho

Thank you for the update. I'm glad to see the CI passing now.

Re-reading the PR, I have one concern. Please let me know what you think.

Show outdated Hide outdated lib/http2/connection.c
for (node = ctx->http2._conns.next; node != &ctx->http2._conns; node = next) {
h2o_http2_conn_t *conn = H2O_STRUCT_FROM_MEMBER(h2o_http2_conn_t, _conns, node);
next = node->next;
if (conn->state < H2O_HTTP2_CONN_STATE_IS_CLOSING) {

This comment has been minimized.

@kazuho

kazuho Nov 10, 2016

Member

This if would mean that if there is a pending write, then closing the socket (and therefore the exit of the server) will be deferred until the write completes.

Is this what we intend, or should we better close a socket with a pending write immediately as well?

@kazuho

kazuho Nov 10, 2016

Member

This if would mean that if there is a pending write, then closing the socket (and therefore the exit of the server) will be deferred until the write completes.

Is this what we intend, or should we better close a socket with a pending write immediately as well?

This comment has been minimized.

@deweerdt

deweerdt Nov 10, 2016

Member

That was indeed my intention, but my assumption was that this would happen relatively fast (next scheduler round), could it be delayed much longer?

@deweerdt

deweerdt Nov 10, 2016

Member

That was indeed my intention, but my assumption was that this would happen relatively fast (next scheduler round), could it be delayed much longer?

This comment has been minimized.

@kazuho

kazuho Nov 30, 2016

Member

If there is a pending write and the client never sends ACKs, then the write would stall for fairly long time without the write completion callback getting called. Eventually the idle timeout will kick in, but until then, I believe that the connection would be kept open.

@kazuho

kazuho Nov 30, 2016

Member

If there is a pending write and the client never sends ACKs, then the write would stall for fairly long time without the write completion callback getting called. Eventually the idle timeout will kick in, but until then, I believe that the connection would be kept open.

This comment has been minimized.

@deweerdt

deweerdt Nov 30, 2016

Member

Ah, I see. Fixing this. Thanks!

@deweerdt

deweerdt Nov 30, 2016

Member

Ah, I see. Fixing this. Thanks!

@deweerdt

This comment has been minimized.

Show comment
Hide comment
@deweerdt

deweerdt Dec 12, 2016

Member

@kazuho I believe d5198ac has all suggested changes, plus test coverage

Member

deweerdt commented Dec 12, 2016

@kazuho I believe d5198ac has all suggested changes, plus test coverage

@kazuho kazuho changed the title from Close the connection at the end of the graceful shutdown to add timeout for closing connections at the graceful shutdown Feb 23, 2017

@kazuho kazuho merged commit 80d3595 into h2o:master Feb 23, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@kazuho

This comment has been minimized.

Show comment
Hide comment
@kazuho

kazuho Feb 23, 2017

Member

Thank you for the changes. Merged to master.

Member

kazuho commented Feb 23, 2017

Thank you for the changes. Merged to master.

kazuho added a commit that referenced this pull request Feb 23, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment