
netty doesn't handle 1k requests in one connection #10348

Open

sukawanth opened this issue Jun 11, 2020 · 9 comments

@sukawanth

We are using Spring Cloud Gateway as an HTTP/2 proxy: Reactor Netty receives the requests, and Spring WebClient with the reactive Jetty client forwards them to the next hop.

We are sending 4k requests over four connections to the proxy. Sometimes only the HEADERS frame is sent to the Jetty client, and it fails to receive the request content.

We are running the proxy in a Kubernetes environment with 4 vCPUs and 4 GB of memory as the container profile.

Our understanding is that 4 threads are created in the Reactor Netty server and each thread handles one connection. Could it be that the Netty threads are busy and cannot respond to the Jetty client's content subscription requests? We tried all other options and narrowed it down to this cause.

If we run the same traffic with 32 connections towards the proxy, there is no issue!
Is there any tuning that can be done to achieve this kind of traffic model?

We are using Vegeta for HTTP/2 load testing and, as mentioned earlier, we created 4 connections 20 seconds apart and ran 1k traffic on each. We see the four connections taking an equal amount of load until suddenly one thread starts stalling, and requests on that particular thread start to time out.

Our observation is that whenever the second connection is made, CPU utilization for the first connection shoots to 80% on one core. The timeouts happen because the Jetty client is not able to read the request body content from the Reactor Netty thread, and it is always the first thread that is unstable.
Is it because the first thread also acts as a boss thread?

Is there a way to set max concurrent streams (SETTINGS_MAX_CONCURRENT_STREAMS) and the flow-control window size (SETTINGS_INITIAL_WINDOW_SIZE) in the HTTP/2 SETTINGS frame?
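For reference, in plain Netty both values are carried by Http2Settings and advertised in the server's initial SETTINGS frame. A minimal sketch, assuming a server built directly on Http2FrameCodecBuilder rather than through Reactor Netty, with purely illustrative values:

import io.netty.handler.codec.http2.Http2FrameCodec;
import io.netty.handler.codec.http2.Http2FrameCodecBuilder;
import io.netty.handler.codec.http2.Http2Settings;

// Illustrative values only, not tuning recommendations.
Http2Settings settings = Http2Settings.defaultSettings()
        .maxConcurrentStreams(2048)      // SETTINGS_MAX_CONCURRENT_STREAMS
        .initialWindowSize(1024 * 1024); // SETTINGS_INITIAL_WINDOW_SIZE (1 MiB)

// The codec sends these settings as part of the connection preface.
Http2FrameCodec codec = Http2FrameCodecBuilder.forServer()
        .initialSettings(settings)
        .build();

Whether Reactor Netty 0.9 exposes these settings through its own API I am not certain; if it does not, they would have to be applied at the Netty layer as above.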

Netty version: 4.1.50.Final
JVM version (java -version): OpenJDK 14
OS and version (uname -a): Linux 4.1.12-112.16.4.el7uek.x86_64

@amizurov
Contributor

Hi, Gateway doesn't use Netty directly; it is built on top of other Spring components (gateway → spring (web server/client) → reactor-netty → netty), and there are a lot of components in this chain that can influence the correct behavior. Did you create an issue on the Gateway project?

@johnou
Contributor

johnou commented Jun 12, 2020

@sukawanth did you configure your Netty server with two separate event loop groups, i.e. io.netty.bootstrap.ServerBootstrap#group(io.netty.channel.EventLoopGroup, io.netty.channel.EventLoopGroup)?

// Dedicated single-thread boss group for accepting connections; a thread count
// of 0 lets the worker group default to 2 * available cores for channel I/O.
EventLoopGroup bossGroup = Epoll.isAvailable() ? new EpollEventLoopGroup(1) : new NioEventLoopGroup(1);
EventLoopGroup workerGroup = Epoll.isAvailable() ? new EpollEventLoopGroup(0) : new NioEventLoopGroup(0);
bootstrap.group(bossGroup, workerGroup);
bootstrap.channel(Epoll.isAvailable() ? EpollServerSocketChannel.class : NioServerSocketChannel.class);
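With this split, the boss loop only accepts connections and hands each accepted Channel off to a worker loop, where it stays pinned for its lifetime. A busy worker then cannot starve new accepts, though it can still stall the connections already pinned to it.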

@hyperxpro
Contributor

hyperxpro commented Jun 12, 2020

The title doesn't look right to me. We built our load balancer on top of Netty, and handling 10M connections is a piece of cake for us now. So I'm pretty sure Netty is not what is stopping you at 1k requests in one connection; it must be something else.

@sukawanth
Author

@amizurov Yes, I have raised an issue on Reactor Netty as well.

@sukawanth
Author

@hyperxpro So the issue is not the number of connections; the same load works fine on 32 connections or more. It's about the number of requests each connection can handle with a configured latency of 2 s. (If 1k means 1,000 requests/s per connection, that is roughly 1,000 × 2 s ≈ 2,000 concurrent streams in flight per connection, more than many HTTP/2 servers advertise by default.)

@hyperxpro
Contributor

hyperxpro commented Jun 12, 2020

Which OS are you testing it on?

@sukawanth
Author

@hyperxpro As I mentioned, it is running on a K8s cluster, with Reactor Netty on top of Netty.
K8s version: v1.13.2
Reactor Netty version: 0.9.5.RELEASE
JVM version (java -version): OpenJDK 14
OS and version (uname -a): Linux 4.1.12-112.16.4.el7uek.x86_64

@hyperxpro
Contributor

Sorry, I didn't check that. Have you tuned the underlying OS networking?

@sukawanth
Author

@hyperxpro Since it runs on Linux, it uses the native epoll transport, with Reactor Netty sitting on top of it. We tried some configuration changes, such as allocating dedicated LoopResources and increasing the number of worker and selector threads (a sketch follows below), but nothing seems to help.
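For context, a sketch of the kind of LoopResources configuration tried, assuming Reactor Netty 0.9's API; the name prefix and thread counts are illustrative:

import reactor.netty.http.server.HttpServer;
import reactor.netty.resources.LoopResources;

// Illustrative: 1 selector thread, 8 worker threads, daemon threads.
LoopResources loop = LoopResources.create("proxy-loops", 1, 8, true);

// Reactor Netty 0.9 style: run the underlying TCP server on the custom loops.
HttpServer server = HttpServer.create()
        .tcpConfiguration(tcp -> tcp.runOn(loop));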

Since it is a Kubernetes cluster shared by many parties, we don't have much access and lack the permissions for OS-level tuning. Are there any tuning suggestions?

Thanks in advance!
