Propagate Connection Closed Information up to top-level (fix #465) #545
Conversation
I think you'd be better off using the state machine to work out the state, instead of adding additional variables outside of the state machine. Why don't you just check for the disconnected state in `LambdaRuntimeClient.nextInvocation`, and if it is set, throw a `LambdaRuntimeError(code: .connectionToControlPlaneLost)`?
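To make the suggestion concrete, here is a minimal sketch of that guard. The case and error names mirror the discussion, but the surrounding scaffolding is hypothetical and much simpler than the real library types:

```swift
// Stand-in types for illustration; the real library's types are richer.
enum ConnectionState {
    case disconnected
    case connected
}

struct LambdaRuntimeError: Error {
    enum Code { case connectionToControlPlaneLost }
    let code: Code
}

struct Invocation {}

final class LambdaRuntimeClient {
    var connectionState: ConnectionState = .connected

    func nextInvocation() async throws -> Invocation {
        // The suggested guard: fail fast when the state machine already
        // knows the connection to the control plane is gone.
        if case .disconnected = connectionState {
            throw LambdaRuntimeError(code: .connectionToControlPlaneLost)
        }
        // ... fetch the next invocation from the control plane ...
        return Invocation()
    }
}
```

As the rest of the thread explains, this naive guard is not sufficient on its own: the client legitimately starts out in a disconnected state.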
@adam-fowler This was already implemented (not by me). But …

That doesn't work in this context because I need to propagate the error from a callback inside …. Throwing errors is not an option either, because the only place where I can trap the ….

You are checking for the existence of the ….

Because ….

Can you not add … to the top of …?

Nope, I have to think more about it. The first time we arrive in this method, we are in a disconnected state :-)
Here is a new version.
I further simplified by removing an extra state I introduced in the previous commit that is no longer necessary. The core changes are 23 lines of code.
```swift
// resume any pending continuation on the handler
if case .connected(_, let handler) = runtimeClient.connectionState {
    if case .connected(_, let lambdaState) = handler.state {
        if case .waitingForNextInvocation(let continuation) = lambdaState {
            continuation.resume(throwing: LambdaRuntimeError(code: .connectionToControlPlaneLost))
        }
    }
}
```
This shouldn't be necessary, as `LambdaChannelHandler.channelInactive` will have resumed the continuation (with error `ioOnClosedChannel`). None of the tests trigger this code either.
`LambdaChannelHandler.channelInactive` is indeed called when the server closes a connection, but not in the state `waitingForNextInvocation`. When the server closes the connection, the state in the `LambdaChannelHandler` is `.connected` with `LambdaState.idle`.
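A simplified, hypothetical model of the two nested state machines being described (the real enums carry more associated values):

```swift
// Inner state machine, owned by the channel handler.
enum LambdaState {
    case idle
    case waitingForNextInvocation  // holds a continuation in the real code
}

// Outer state of the handler itself; the real .connected case
// carries more fields than shown here.
final class LambdaChannelHandler {
    enum State {
        case disconnected
        case connected(LambdaState)
    }
    var state: State = .connected(.idle)
}

// The situation observed when the server closes the connection:
// the handler is still .connected and the inner state is .idle,
// so channelInactive finds no pending continuation to resume.
```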
To reproduce what I observe:
- in one terminal: `MAX_INVOCATIONS=3 MODE=json PORT=7777 LOG_LEVEL=trace swift run MockServer`
- from another terminal, on the `sebsto/shutdown_on_lost_connection` branch: `cd Examples/HelloJSON ; LAMBDA_USE_LOCAL_DEPS=../.. LOG_LEVEL=trace AWS_LAMBDA_RUNTIME_API=127.0.0.1:7777 swift run`
The MockServer will start and will shut down after three invocations. The runtime will pull and process three events from the MockServer, and will then receive a connection-closed event when trying to fetch the next event. You will see that sometimes the runtime catches the closed connection and gracefully shuts down with `Connection refused (errno: 61)`, and sometimes it hangs. In both cases `LambdaChannelHandler.channelInactive()` is called with `self.state == .connected(_, .idle)` and `lastError == nil`.
The change to `.idle` happens after the response has been sent, on this line.

Looks like we have two behaviors, depending on when the client detects that the connection is closed.

Either the response has been sent, `LambdaChannelHandler` is in the `.idle` state, and the runtime detects the close of the connection before `nextInvocation()` has a chance to change the status and send the next request. In that case, `nextInvocation()` correctly reports `Connection refused (errno: 61)` (we can trap the error by adding a `do catch` block here; see the sketch below).

Or `nextInvocation()` has already changed the state: it created a new promise and switched the state again to `.waitingForNextInvocation` (on this line). This is where we have a problem, because this new promise is never fulfilled. We are past the call to `LambdaChannelHandler.channelInactive()` and there are no throwing functions that we can wrap in a `do catch` to trap the error.
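A hedged sketch of trapping the first behavior, reusing the stand-in types from the sketch earlier in this thread (the real run loop is more involved):

```swift
// First behavior: the close is detected before the state changes,
// so nextInvocation() throws and we can translate the failure into
// the dedicated runtime error.
func pollOnce(client: LambdaRuntimeClient) async throws -> Invocation {
    do {
        return try await client.nextInvocation()
    } catch {
        // e.g. "Connection refused (errno: 61)" surfaces here as `error`
        throw LambdaRuntimeError(code: .connectionToControlPlaneLost)
    }
}
```

The second behavior cannot be trapped this way, which is exactly the gap the continuation-resuming code below addresses.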
Instead of capturing the new unfulfilled promise in `channel.closeFuture.whenComplete`, I can capture it in `LambdaRuntimeClient.channelClosed()`, which is called by the former. It's not a big difference.

Anyway, when we detect that a connection is closed in the `LambdaRuntimeClient`, it sometimes happens after the call to `LambdaChannelHandler.channelInactive()`. And this is, IMHO, the code that allows us to fulfill the promise:

```swift
if case .connected(_, let handler) = self.connectionState {
    if case .connected(_, let lambdaState) = handler.state {
        if case .waitingForNextInvocation(let continuation) = lambdaState {
            continuation.resume(throwing: LambdaRuntimeError(code: .connectionToControlPlaneLost))
        }
    }
}
```
Can you write a test that replicates this code being called?
Done, b0234f5
As you noticed, the `connectionToControlPlaneLost` error was not triggered by the previous commit. This commit adds a test to make sure the two types of errors are triggered: 156e6c7
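For illustration only, a rough sketch of the shape such a test can take. This is not the actual test from the commits above; `MockControlPlane` and the client initializer are hypothetical helpers:

```swift
import Testing

@Test
func connectionLostWhileWaitingForNextInvocation() async throws {
    // Hypothetical mock that closes the connection after three invocations.
    let server = MockControlPlane(maxInvocations: 3)
    let client = LambdaRuntimeClient(endpoint: server.endpoint)

    // Drain the invocations the server is willing to serve.
    for _ in 0..<3 {
        _ = try await client.nextInvocation()
    }

    // The next call must fail with the propagated error instead of hanging.
    await #expect(throws: LambdaRuntimeError.self) {
        _ = try await client.nextInvocation()
    }
}
```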
… is closed, whatever is the state of the LambdaChannelHandler
This PR implements a mechanism to propagate connection loss information from the Lambda runtime client to the runtime loop, enabling termination without a backtrace when the connection to the Lambda control plane (or a Mock Server) is lost.

The changes are:
- When the connection is lost, the `channel.closeFuture.whenComplete` callback now correctly calls `resume(throwing:)` for the pending continuation, eliminating the hangs.
- I added top-level error handling on `LambdaRuntime._run()` (sketched below).
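A minimal sketch of what that top-level handling can look like, assuming the stand-in `LambdaRuntimeError` from the conversation above and a `runLoop()` like the poll loop sketched there; the actual implementation in the PR may differ:

```swift
func _run() async throws {
    do {
        try await runLoop()  // the main poll/process loop
    } catch let error as LambdaRuntimeError where error.code == .connectionToControlPlaneLost {
        // Swallow the expected error and exit cleanly: log the condition
        // instead of letting it escape and crash the process with a backtrace.
        print("Connection to the control plane lost; shutting down.")
    }
}
```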
================================================
The below is the original PR description. It is kept here for history.

This PR implements a mechanism to propagate connection loss information from the Lambda runtime client to the runtime loop using `EventLoopFuture`, enabling termination without a backtrace when the connection to the Lambda control plane (or a Mock Server) is lost.

This PR fixes #465
Changes Made

Core Changes:
- `LambdaRuntimeClient`: Added a `futureConnectionClosed` property to signal connection loss via `EventLoopFuture` (see the sketch below)
- `Lambda.swift`: Modified `runLoop` to monitor connection status and throw an error on connection loss
- `LambdaRuntime` and `ServiceLifecycle` integration: Modified to properly handle connection loss (and other top-level) errors and avoid crashing the application with a backtrace; the application terminates without a backtrace
Use Cases