Add a way for a user to initiate connection shutdown. #3715
Awesome! We look forward to more contributions from @databricks ❤️.
Codecov Report
@@ Coverage Diff @@
## master #3715 +/- ##
============================================
+ Coverage 73.35% 73.36% +0.01%
- Complexity 14613 14653 +40
============================================
Files 1283 1285 +2
Lines 56160 56272 +112
Branches 7181 7185 +4
============================================
+ Hits 41194 41282 +88
- Misses 11323 11338 +15
- Partials 3643 3652 +9
I see one test failure, which may or may not be a flaky one:
Could you check whether your changes affected it?
I think it's related to my change. It's flaky and doesn't happen every time, but I was able to reproduce it locally by running many tests in a loop on this branch, and I cannot reproduce it on the master branch. Investigating.
Thanks! Left some minor comments.
 *
 * @param durationMicros the drain duration. {@code 0} or negative value disables the drain.
 */
public ServerBuilder connectionDrainDurationMicros(long durationMicros) {
Considering network delay, microsecond precision for draining in-flight requests does not seem effective.
How about using milliseconds for the connection drain duration?
https://datatracker.ietf.org/doc/html/rfc7540#section-6.8 suggests that the drain duration can be as low as the round-trip time. Round-trip time inside a datacenter is < 1 ms, around a few hundred microseconds. If the client is connected via localhost (e.g. a sidecar process), it's going to be a few microseconds. I chose microsecond precision to give users the ability to configure values in those lower ranges if they need to.
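To make the precision argument concrete, here is a minimal sketch in plain Java. `DrainPrecisionSketch` and its helper methods are hypothetical names used only for illustration, and the 300 microsecond value is just an example of "a few hundred microseconds", not a measurement. It shows that such a value cannot be expressed as a whole number of milliseconds.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Armeria.
final class DrainPrecisionSketch {
    // What a millis-only long parameter would receive for a microsecond value.
    static long toWholeMillis(long micros) {
        return TimeUnit.MICROSECONDS.toMillis(micros); // truncates toward zero
    }

    // Converts a Duration to microseconds without passing through milliseconds.
    static long toMicros(Duration duration) {
        return TimeUnit.NANOSECONDS.toMicros(duration.toNanos());
    }

    public static void main(String[] args) {
        long exampleRttMicros = 300; // illustrative intra-datacenter round trip
        System.out.println(toWholeMillis(exampleRttMicros));       // prints 0
        System.out.println(toMicros(Duration.ofNanos(300_000)));   // prints 300
    }
}
```

With a millisecond-granularity `long`, the example value collapses to 0, which the Javadoc under review treats as "drain disabled".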
+1 Makes sense. Let's keep it as it is.
I'm somewhat concerned about using micros for a long parameter because this is probably the only place we use micros instead of millis. Would it be a bad idea to accept a millis value here? A user can use Duration for micros precision.
For this builder method, its name contains Micros, so there should be little room for confusion. If you think that providing an overload with Millis is valuable, I can add that. But I think there's value in keeping the Micros overload too, to make it explicit that the server builder supports that level of precision for the connection drain duration.
gracefulConnectionShutdownHandler.updateDrainDuration(
        ((InitiateConnectionShutdown) evt).drainDurationMicros());
ctx.channel().close();
- Add return after closing a channel? Suggested change:
  gracefulConnectionShutdownHandler.updateDrainDuration(
          ((InitiateConnectionShutdown) evt).drainDurationMicros());
  ctx.channel().close();
  return;
- I imagine that InitiateConnectionShutdown events could be triggered simultaneously by a throttling decorator. It would be inefficient if ctx.channel().close() were called repeatedly from multiple requests. Could we reschedule the ongoing drainFuture using gracefulConnectionShutdownHandler.updateDrainDuration() and call ctx.channel().close() only once?
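The "reschedule the drain, close only once" idea in the second point could look roughly like the sketch below. This is a hypothetical, Netty-free model: `DrainOnceSketch`, `initiateShutdown` and `closeFuture` are illustrative names, a `Runnable` stands in for `ctx.channel().close()`, and a plain `ScheduledExecutorService` stands in for the event loop. It is not the handler implemented in this PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; this is not the GracefulConnectionShutdownHandler from the PR.
final class DrainOnceSketch {
    private final ScheduledExecutorService executor;
    private final Runnable closeConnection; // stands in for ctx.channel().close()
    private final CompletableFuture<Void> closeFuture = new CompletableFuture<>();
    private ScheduledFuture<?> drainFuture;
    private boolean closed;

    DrainOnceSketch(ScheduledExecutorService executor, Runnable closeConnection) {
        this.executor = executor;
        this.closeConnection = closeConnection;
    }

    // May be called once per request; each call only (re)schedules the pending drain.
    synchronized void initiateShutdown(long drainDurationMicros) {
        if (closed) {
            return; // already closed, nothing left to schedule
        }
        if (drainFuture != null) {
            drainFuture.cancel(false); // drop the previous deadline
        }
        drainFuture = executor.schedule(this::closeOnce,
                                        Math.max(0, drainDurationMicros), TimeUnit.MICROSECONDS);
    }

    // Runs when a drain deadline fires; the flag guarantees a single close.
    private synchronized void closeOnce() {
        if (closed) {
            return;
        }
        closed = true;
        closeConnection.run();
        closeFuture.complete(null);
    }

    CompletableFuture<Void> closeFuture() {
        return closeFuture;
    }

    public static void main(String[] args) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger closeCalls = new AtomicInteger();
        DrainOnceSketch handler = new DrainOnceSketch(executor, closeCalls::incrementAndGet);
        for (int i = 0; i < 3; i++) {
            handler.initiateShutdown(10_000); // three requests ask for shutdown at about the same time
        }
        handler.closeFuture().join();
        System.out.println("close() calls: " + closeCalls.get());
        executor.shutdown();
    }
}
```

The design choice here is that repeated shutdown requests only move the deadline; the close itself is guarded by a flag, so it runs at most once no matter how many requests trigger the event.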
Didn't realize that calling ctx.channel().close() multiple times is expensive, fixed.
@trustin re: #3715 (comment) I think it should be fixed by 1431a68, I'm not able to reproduce it anymore.
core/src/main/java/com/linecorp/armeria/server/DefaultServiceRequestContext.java
}

@Override
public CompletableFuture<Void> initiateConnectionShutdown(long drainDurationMicros) {
I'm somewhat concerned about using micros for a long parameter because this is probably the only place we use micros instead of millis. Would it be a bad idea to accept a millis value here? A user can use Duration for micros precision.
Yeah, this one might be confusing, but in this case I think your earlier argument about GC pressure applies as well. What if a user needs micros precision, their server handles O(million) connections, and they use this API? Constructing and discarding Duration objects for each of the connections will cause high GC pressure.
What do you think about making the method names more descriptive? E.g. initiateConnectionShutdownWithDrainDurationMicros(long) for custom micros, initiateConnectionShutdownWithDrainDuration(Duration) for a custom duration, and initiateConnectionShutdown() for the default? Sort of similar to the server builder API #3715 (comment).
core/src/test/java/com/linecorp/armeria/server/InitiateConnectionShutdownTest.java
"/goaway_async?duration=1",
"/goaway_async?duration=100",
})
void initiateConnectionShutdownHttp1(String path) throws Exception {
How about making sure all test methods in this class check if the connection is actually closed by the server, even if the client doesn't?
Done, PTAL. I found that Armeria doesn't seem to propagate config.idleTimeoutMillis during the HTTP/2 upgrade, so it always seems to be set to the Netty default. Added https://github.com/line/armeria/pull/3715/files#diff-e055c2b1d925141bd5d258b22992c79d706c678dd614399284e4b5e54ea22ac3R42, please let me know if that looks reasonable to you.
Note: discussed offline with Trustin. For the HTTP/2 server connection handler we can completely disable Netty's internal gracefulShutdownTimeoutMillis because the drain implemented in this PR is a better way to handle graceful shutdown. This also decouples the keep-alive configuration (idleTimeoutMillis) from the graceful shutdown configuration (this PR).
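For readers unfamiliar with the drain being discussed, the commit messages in this PR describe a two-step GOAWAY sequence: a first GOAWAY with stream ID 2^31-1 and NO_ERROR, then a second GOAWAY carrying the latest stream ID once the drain duration has passed. The sketch below models only that ordering. `GoAwaySequenceSketch` and all of its members are hypothetical names; it uses no Netty or Armeria classes and should not be read as the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two-step GOAWAY sequence; it uses no Netty or Armeria classes.
final class GoAwaySequenceSketch {
    static final int MAX_STREAM_ID = Integer.MAX_VALUE; // 2^31-1
    static final long NO_ERROR = 0x0;                   // error code defined in RFC 7540

    static final class GoAwayFrame {
        final int lastStreamId;
        final long errorCode;

        GoAwayFrame(int lastStreamId, long errorCode) {
            this.lastStreamId = lastStreamId;
            this.errorCode = errorCode;
        }
    }

    private final List<GoAwayFrame> sentFrames = new ArrayList<>();
    private int lastAcceptedStreamId;
    private boolean draining;
    private boolean drained;

    // Returns true if the stream is accepted; streams arriving after the drain ended are refused.
    boolean onNewStream(int streamId) {
        if (drained) {
            return false;
        }
        lastAcceptedStreamId = Math.max(lastAcceptedStreamId, streamId);
        return true;
    }

    // First GOAWAY: announces the shutdown while streams already in flight are still accepted.
    void startDrain() {
        if (!draining) {
            draining = true;
            sentFrames.add(new GoAwayFrame(MAX_STREAM_ID, NO_ERROR));
        }
    }

    // Second GOAWAY: sent when the drain duration elapses, with the real last stream ID.
    void endDrain() {
        if (draining && !drained) {
            drained = true;
            sentFrames.add(new GoAwayFrame(lastAcceptedStreamId, NO_ERROR));
        }
    }

    List<GoAwayFrame> sentFrames() {
        return sentFrames;
    }

    public static void main(String[] args) {
        GoAwaySequenceSketch connection = new GoAwaySequenceSketch();
        connection.onNewStream(1);
        connection.startDrain();
        connection.onNewStream(3); // raced with the first GOAWAY, still accepted
        connection.endDrain();
        for (GoAwayFrame frame : connection.sentFrames()) {
            System.out.println("GOAWAY lastStreamId=" + frame.lastStreamId);
        }
    }
}
```

The window between the two frames is the drain duration: it gives a client time to see the first GOAWAY and stop opening streams before the server commits to a final last stream ID.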
Motivation:
line#3516. A user sometimes needs to send a GOAWAY frame manually when they want to make a client establish a new connection explicitly. Armeria currently doesn't provide an API for this.
Modifications:
- Add a CompletableFuture<Void> initiateConnectionShutdown() method to the ServiceRequestContext interface. The returned CompletableFuture is completed when the channel is closed. Mark it as @UnstableApi to allow follow-up modification, see *Future work* below, e.g. adding method parameters for the grace period between the first and second GOAWAY frames described there.
- Implement the method for DefaultServiceRequestContext. Calling the method for HTTP/2 requests results in sending a GOAWAY frame with stream ID 2^31-1 and the NO_ERROR code. This signals to the client that a shutdown is imminent and that initiating further requests is prohibited, see https://datatracker.ietf.org/doc/html/rfc7540#section-6.8. HTTP/1 is currently not supported; if called for an HTTP/1 request, the CompletableFuture completes exceptionally with UnsupportedOperationException.
- Implement the method for ServiceRequestContextWrapper.
- Add documentation to AbstractHttp2ConnectionHandler.goAway which mentions that flushing is the responsibility of the caller. Initially, I forgot to add the `ctx.flush()` and spent some time debugging unexpected behavior; hopefully the extra documentation will help future readers.
Result:
- Provided an API for sending a GOAWAY frame from a request handler.
Future work:
- HTTP/1 support: calling initiateConnectionShutdown() would result in sending a `Connection: close` header.
- Support for sending a second GOAWAY frame with the latest stream ID after the grace period, and rejecting incoming streams with higher IDs using an RST_STREAM frame with error code REFUSED_STREAM, see https://datatracker.ietf.org/doc/html/rfc7540#section-8.1.4.
… once we settle on implementation.
…eConnectionShutdown. HTTP/2 sends GOAWAY frames, HTTP/1 sends Connection: close. Add grace period argument to the initiateConnectionShutdown to allow multiple GOAWAY frames to be sent as described in https://datatracker.ietf.org/doc/html/rfc7540#section-8.1.4. Pass ChannelHandlerContext to the DefaultServiceRequestContext. Add FakeChannelHandlerContext private class to ServiceRequestContextBuilder - use-case similar to FakeChannel in AbstractRequestContextBuilder.
…tdown event to trigger graceful connection drain before shutdown.
…wn. Move graceful connection shutdown logic to the close() method of the handler.
…that doesn't take grace period and doesn't override current grace period value.
…eriod end in HTTP/2 handler.
Co-authored-by: Trustin Lee <t@motd.kr>
…rConnectionHandlerBuilder. Tune idleTimeoutMillis setting in tests to avoid flaky failures.
Co-authored-by: Ikhun Um <ih.pert@gmail.com>
…for HTTP/1. Update Http2ServerConnectionHandler build process - disable gracefulShutdownTimeoutMillis since it's replaced with drain functionality introduced in this PR.
…Test - CI machines are not very powerful and small timeouts cause various kinds of flakiness.
…ction alive by sending periodic requests, then verify that eventually it closes even without idle connection. I think the source of flakiness is previous test design where code inside untilAsserted was attempting to create requests **around the moment** when keep alive handler is trying to close connection due to max-age. It's inherently prone to race conditions because untilAsserted sleeps between polls and may schedule **after** the connection was closed, effectively opening another connection.
…nce Netty's HTTP/2 server connection graceful shutdown was disabled.
Mostly nits. Looking great otherwise!
core/src/main/java/com/linecorp/armeria/server/ServiceRequestContext.java
 * @see #initiateConnectionShutdown(long)
 */
@UnstableApi
CompletableFuture<Void> initiateConnectionShutdown(Duration drainDuration);
Make this default?
See #3715 (comment)
Actually, that comment is not relevant for this method since it only converts Duration to long and can be implemented as a default. PTAL.
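A default method of the kind described here might look like the following sketch. `ShutdownContextSketch` and `ShutdownContextWrapperSketch` are hypothetical stand-ins for `ServiceRequestContext` and `ServiceRequestContextWrapper`; only the two method signatures are taken from the snippets quoted in this review, and the conversion shown is an assumption about how the `Duration` overload could delegate.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical interface mirroring the shape under review; not Armeria's ServiceRequestContext.
interface ShutdownContextSketch {
    // The primitive overload stays abstract; implementations decide how to drain.
    CompletableFuture<Void> initiateConnectionShutdown(long drainDurationMicros);

    // The Duration overload only converts units, so it can live in the interface.
    default CompletableFuture<Void> initiateConnectionShutdown(Duration drainDuration) {
        return initiateConnectionShutdown(TimeUnit.NANOSECONDS.toMicros(drainDuration.toNanos()));
    }
}

// A wrapper then forwards only the abstract method and inherits the Duration overload.
final class ShutdownContextWrapperSketch implements ShutdownContextSketch {
    private final ShutdownContextSketch delegate;

    ShutdownContextWrapperSketch(ShutdownContextSketch delegate) {
        this.delegate = delegate;
    }

    @Override
    public CompletableFuture<Void> initiateConnectionShutdown(long drainDurationMicros) {
        return delegate.initiateConnectionShutdown(drainDurationMicros);
    }

    public static void main(String[] args) {
        // The lambda plays the role of a concrete context and just reports the value it receives.
        ShutdownContextSketch base = micros -> {
            System.out.println("drain micros: " + micros);
            return CompletableFuture.completedFuture(null);
        };
        new ShutdownContextWrapperSketch(base)
                .initiateConnectionShutdown(Duration.ofMillis(2))
                .join();
    }
}
```

Because the default method funnels into the `long` overload, a wrapper that overrides only that overload still sees every call, whichever signature the caller used.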
@Override
public CompletableFuture<Void> initiateConnectionShutdown(Duration drainDuration) {
    return delegate().initiateConnectionShutdown(drainDuration);
}

@Override
public CompletableFuture<Void> initiateConnectionShutdown() {
    return delegate().initiateConnectionShutdown();
}
We could remove this if we make them default methods. :-)
See #3715 (comment)
Co-authored-by: Trustin Lee <t@motd.kr> Co-authored-by: Ikhun Um <ih.pert@gmail.com>
…inDuration)` a default method. Rolled back suggested change with default `public CompletableFuture<Void> initiateConnectionShutdown()` since it changes behavior.
Recent checks failed with the following memory leak
Note |
@alexc-db Re: leak: That's weird. We don't merge the base branch at all in PR runs. Please run |
For the record: https://github.com/line/armeria/runs/3224114462. The leak report and stack trace can be found inside the reports-JVM-15 artifact.
Thanks a lot for your patience. High quality code FTW 😄
🎉 🎉 🎉
Thanks a lot for your hard work, @alexc-db!
Awesome work! @alexc-db
Motivation:
#3516. A user sometimes needs to send a GOAWAY frame manually when they want to make a client establish a new connection explicitly. Armeria currently doesn't provide an API for this.
Modifications:
- Add CompletableFuture<Void> initiateConnectionShutdown() and CompletableFuture<Void> initiateConnectionShutdown(Duration gracePeriod) methods to DefaultServiceRequestContext. The CompletableFuture is completed when the channel is closed. Mark as @UnstableApi to allow follow-up modification. Calling this method sends an InitiateConnectionShutdown event using the ChannelPipeline.fireUserEventTriggered API. The event is handled by the respective protocol handler.
- […] KeepAliveHandler. This adds a "Connection: close" header to the last response and closes the connection after the response is done.
- […] ctx.flush() and spent some time debugging unexpected behavior, hopefully extra documentation will help future readers.
Result: