-
-
Notifications
You must be signed in to change notification settings - Fork 15.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent SSL Handshakes cause IO threads to stay busy for long time #7020
Comments
ChannelInboundHandlers will execute on the EventLoop thread by default. Here are general recommendations to work through this performance issue:
|
Hi @Scottmitch
|
@anshuman-osc I did not say its unexpected I just said I would be interested to see a profiler snapshot ;) Can you share the snapshot ? |
Hi @normanmaurer Thanks |
If you are using Java for SSL I would also recommend updating your version of java.
Where are these attached? Can you provide a link? |
Hi @Scottmitch I have attached the screenshots to the issue itself, are you not able to view them? https://user-images.githubusercontent.com/7992699/28565800-9868cc76-7170-11e7-9f4e-55b4780a5c71.png https://user-images.githubusercontent.com/7992699/28565823-aaa3600e-7170-11e7-9173-36a44f5c3095.png Thanks |
I can see the screenshots. What we would like is the actual profile snapshot files themselves to understand what is going on in more detail. |
HI @Scottmitch Thanks |
+1 on this ticket. We have theorized about this attack vector and it'd be great to be able to offload the handshaking to a different thread pool that doesn't interfere with the normal worker threads. |
Hi @Scottmitch , @normanmaurer Did the flight recording help? Thanks |
sorry I have not had cycles to investigate this yet |
Same but still on my to do list
… Am 23.08.2017 um 17:28 schrieb Scott Mitchell ***@***.***>:
sorry I have not had cycles to investigate this yet
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
In my mind... Split SslHandler into two classes. Let there be a new SslHandshakeHandler and some class that does everything but the handshake portion. Wrap these two into a codec style class for the convenient path and give people to use the handlers separately. Then use That is obviously an API breaking change. For compatibility this could be baked into SslHandler which could take an optional configuration parameter in the form of an executor. Just throwing it out there. |
I still have this issue on my to do list but just had not cycles yet :( In general we should just “correctly” support an Executor here as SSLEngine.getTask() already allow doing this.
… On 6. Oct 2017, at 17:37, Roger ***@***.***> wrote:
In my mind... Split SslHandler into two classes. Let there be a new SslHandshakeHandler and some class that does everything but the handshake portion. Wrap these two into a codec style class for the convenient path and give people to use the handlers separately.
Then use ChannelPipeline#add(EventExecutorGroup, SslHandshakeHandler) if you want the handshakes to happen on a different thread than the default.
That is obviously an API breaking change. For compatibility this could be baked into SslHandler which could take an optional configuration parameter in the form of an executor.
Just throwing it out there.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#7020 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAa0QqF25oxKOwYJddEZfNbI6P9HrE3Rks5spklHgaJpZM4OiTMP>.
|
So I took a peek, looks like we have javax.net.ssl.SSLEngine#getDelegatedTask which is "implemented" here io.netty.handler.ssl.ReferenceCountedOpenSslEngine#getDelegatedTask.. @Override
public final Runnable getDelegatedTask() {
// Currently, we do not delegate SSL computation tasks
// TODO: in the future, possibly create tasks to do encrypt / decrypt async
return null;
} The Java implementation uses this exclusively for the handshake SSL computation which I think would work well enough to mitigate the DoS attack vector @rkapsi and I are concerned about (large amounts of SSL handshakes which in turn would slow down the event loop and potentially cause connection drops / latency). |
@normanmaurer if you don't mind I can try and add this to our scaling project? |
@johnou yep thats the way to go... I just had no time yet and its not trivial to make it work with the state-machine. Feel free to try tho :) |
@normanmaurer I started by pulling out all the handshake related code from ReferenceCountedOpenSslEngine into it's own class then realised that runDelegatedTasks will anyway block until the delegated task returns, even if we use another thread pool that would still block the event loop.. did you have something else in mind? |
Wouldn't it make more sense modifying io.netty.handler.ssl.SslHandler#handshake to execute in another thread pool and then set the promise allowing the connection to proceed without blocking the event loop. |
Or better yet.. why not just create a second event loop and when handshake completes, move the channel over. @Override
public void channelActive(ChannelHandlerContext context) throws Exception {
context.pipeline().get(SslHandler.class).handshakeFuture().addListener(sslFuture -> {
if (sslFuture.isSuccess()) {
context.channel().deregister().addListener(future -> {
if (future.isSuccess()) {
fastEventLoop.register(context.channel());
} else {
// log bad stuff
}
});
}
});
super.channelActive(context);
} |
@anshuman-osc would you please try with another event loop and see if that helps? |
In the linked SO thread (see the issue description) it was suggested that
registering/deregistering was hacky business, which may not be supported in
future.
…On 10 May 2018 7:19:12 pm Johno Crawford ***@***.***> wrote:
@anshuman-osc would you please try with another event loop and see if that
helps?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
@normanmaurer we heavily rely on registering/deregistering channels, would be a nightmare to lose this feature (after implementing the below solution I recall we observed a nice drop in CPU), consider that 60 people are in a chat room, when an incoming message needs to be sent to all players the remote message comes in (from outside the event loop) then needs to written to every single channel waking up each individual event loop (multiple times), what I've done is taken all the event loops and put them in a helper class which uses deterministic routing.. public EventLoop getEventLoop(int hashCode) {
return eventLoops[(Math.abs(hashCode)) % eventLoops.length];
} when a player enters a room we can take the hash code of that actor / room id and move the client to the correct event loop.. final EventLoop newEventLoop = eventLoopHelper.getEventLoop(hashCode);
if (clientSession.getChannel().eventLoop() != newEventLoop) {
clientSession.getChannel().deregister().addListener(future -> {
if (future.isSuccess()) {
newEventLoop.register(clientSession.getChannel());
} else {
logger.warn("Failed moving player #" + clientSession.getPlayerId() +
" to new event loop", future.cause());
}
});
} now that we know that every single client is on the same event loop for room x when processing a remote message to broadcast we can use the event loop to send it like this.. return new SpaceStreamProxyObject<T>(this, streamActorRef,
eventLoopHelper.getEventLoop(actorIdentity.hashCode())); @Override
public Task<Void> onNext(final T data, final StreamSequenceToken sequenceToken) {
return Task.done().whenCompleteAsync((aVoid, throwable) -> {
// invoke onNext from the correct event loop thread
}, executor).thenCompose(x -> onNext0(data));
} and the message is finally sent to all interested clients on the correct event loop thread avoiding un-necessary wakeups 🎉 |
SSL handshake is really the bottleneck, especially on Oracle JSSE as it processes a lot of algorithm constraint and generates a lot of garbage and CPU intensive. |
…en processing TLS / SSL via the SslHandler. Motivation: The SSLEngine does provide a way to signal to the caller that it may need to execute a blocking / long-running task which then can be offloaded to an Executor to ensure the I/O thread is not blocked. Currently how we handle this in SslHandler is not really optimal as while we offload to the Executor we still block the I/O Thread. Modifications: - Correctly support offloading the task to the Executor while suspending processing of SSL in the I/O Thread - Add new methods to SslContext to specify the Executor when creating a SslHandler - Remove @deprecated annotations from SslHandler constructor that takes an Executor - Adjust tests to also run with the Executor to ensure all works as expected. Result: Be able to offload long running tasks to an Executor when using SslHandler. Partly fixes #7862 and #7020.
…en processing TLS / SSL via the SslHandler. Motivation: The SSLEngine does provide a way to signal to the caller that it may need to execute a blocking / long-running task which then can be offloaded to an Executor to ensure the I/O thread is not blocked. Currently how we handle this in SslHandler is not really optimal as while we offload to the Executor we still block the I/O Thread. Modifications: - Correctly support offloading the task to the Executor while suspending processing of SSL in the I/O Thread - Add new methods to SslContext to specify the Executor when creating a SslHandler - Remove @deprecated annotations from SslHandler constructor that takes an Executor - Adjust tests to also run with the Executor to ensure all works as expected. Result: Be able to offload long running tasks to an Executor when using SslHandler. Partly fixes #7862 and #7020.
#8847) Motivation: The SSLEngine does provide a way to signal to the caller that it may need to execute a blocking / long-running task which then can be offloaded to an Executor to ensure the I/O thread is not blocked. Currently how we handle this in SslHandler is not really optimal as while we offload to the Executor we still block the I/O Thread. Modifications: - Correctly support offloading the task to the Executor while suspending processing of SSL in the I/O Thread - Add new methods to SslContext to specify the Executor when creating a SslHandler - Remove @deprecated annotations from SslHandler constructor that takes an Executor - Adjust tests to also run with the Executor to ensure all works as expected. Result: Be able to offload long running tasks to an Executor when using SslHandler. Partly fixes #7862 and #7020.
#8847) Motivation: The SSLEngine does provide a way to signal to the caller that it may need to execute a blocking / long-running task which then can be offloaded to an Executor to ensure the I/O thread is not blocked. Currently how we handle this in SslHandler is not really optimal as while we offload to the Executor we still block the I/O Thread. Modifications: - Correctly support offloading the task to the Executor while suspending processing of SSL in the I/O Thread - Add new methods to SslContext to specify the Executor when creating a SslHandler - Remove @deprecated annotations from SslHandler constructor that takes an Executor - Adjust tests to also run with the Executor to ensure all works as expected. Result: Be able to offload long running tasks to an Executor when using SslHandler. Partly fixes #7862 and #7020.
Is probably related to https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8219617 |
As of today we support offloading handshake stuff from the EventLoop by specify an |
-Dio.netty.handler.ssl.openssl.useTasks=true must be passed into the JVM. |
Expected behavior
Existing connections should not be impacted and should continue to exchange messages if new connections are added
Actual behavior
If there are a few thousand connections to the server, and then we get a burst of say another few thousand connections, these new SSL handshakes keep the worker threads busy for a long time, which causes the existing connections to start timing out.
Steps to reproduce
Compile and build the reproducer code using
mvn clean package assembly:single
Start the tcp server using this command:
java -jar tcp-server-1.0.0-SNAPSHOT-jar-with-dependencies.jar
This starts a TCP server with 8 IO threads and 1 acceptor thread on port 8000. For customizing, look at the class TcpServer
Open mission control and monitor the Connection Count using the TcpServer.getConnectionCount managed attribute. This should be 0 to start with
On a second machine, trigger the load using the tcp client jar.
java -jar tcp-client-1.0.0-SNAPSHOT-jar-with-dependencies.jar
This will trigger 5000 concurrent handshakes to the server. These connections are kept open and the client keeps sending requests to the server (1 at a time). These values can also be customized using the LoadRunner class
Monitor for a couple of minutes, the connection count reported would be 5000
On a third machine, repeat step 4 and monitor the connection count. You will see the connection count to rise to 10000, and after a couple of minutes the clients start timing out (graph attached)
Minimal yet complete reproducer code (or URL to code)
https://github.com/anshuman-osc/netty.git
Netty version
4.1.6.Final, 4.1.12.Final
JVM version (e.g.
java -version
)java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
OS version (e.g.
uname -a
)Tested on 2 Linux distributions:
Linux 4.4.0-83-generic #106-Ubuntu SMP Mon Jun 26 17:54:43 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Linux 3.0.101-63-default #1 SMP Tue Jun 23 16:02:31 UTC 2015 (4b89d0c) x86_64 x86_64 x86_64 GNU/Linux
![io-threads-blocked-counts](https://user-images.githubusercontent.com/7992699/28565800-9868cc76-7170-11e7-9f4e-55b4780a5c71.png)
Stack overflow thread
https://stackoverflow.com/questions/44751058/multiple-worker-event-loop-groups
The text was updated successfully, but these errors were encountered: