server performance issues #3826
Comments
@xiaoshuang-lu would you mind sharing the code to construct your gRPC server? The following is mine and it works well:

```java
Executor executor = Utils.getDefaultExecutor(threadNum);
bossEventGroup = new EpollEventLoopGroup();
workerEventGroup = new EpollEventLoopGroup();
server = NettyServerBuilder.forPort(port)
    .addService(
        ServerInterceptors.intercept(
            new Service(backendFutureStub),
            new AuthInterceptor()
        )
    )
    .bossEventLoopGroup(bossEventGroup)
    .workerEventLoopGroup(workerEventGroup)
    .channelType(EpollServerSocketChannel.class)
    .executor(executor)
    .build();
```
@xiaoshuang-lu Can you try using ForkJoinPool instead of ThreadPoolExecutor? It's what we use for the gRPC benchmarks because it has better behavior under thread contention.
@carl-mastrangelo If ForkJoinPool is used, could we exhaust threads?
@xiaoshuang-lu I think they are created as needed. They don't go above the desired parallelism unless you use ManagedBlocker (this is rare).
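For reference, a minimal sketch of that swap, assuming the same server setup as in the snippet above (`threadNum` and the builder wiring are carried over from that snippet, not confirmed details from this thread):

```java
import java.util.concurrent.ForkJoinPool;

// Instead of a ThreadPoolExecutor (or Utils.getDefaultExecutor), hand the
// builder a ForkJoinPool with the desired parallelism. It will not grow
// beyond that parallelism unless ManagedBlocker is used.
ForkJoinPool executor = new ForkJoinPool(threadNum);

// ...then pass it to the builder exactly as before:
// NettyServerBuilder.forPort(port)
//     ...
//     .executor(executor)
//     .build();
```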
The thread dump indicates that most netty workers are in SerializingExecutor.execute().
@xiaoshuang-lu that means they are running your application code. What is the QPS you get? |
From the output of that text, it looks like you didn't use ForkJoinPool. |
No, I didn't. Yesterday I tried ForkJoinPool instead of ThreadPoolExecutor; unfortunately the JVM crashed. I will investigate this problem. The thread dump hints that the bottleneck is in SerializingExecutor.
Hi @carl-mastrangelo, are there any performance test reports that users can refer to?
We have a live dashboard of performance by language. You can also look at our benchmarks/ directory to see the code that performs the benchmarks. We don't have any tutorials yet on how to optimize.
Thank you, @carl-mastrangelo. I will try the programs in benchmarks directory. |
QPS in "Unary secure throughput server CPU (32 core client to 32 core server)" is about 850k. I ran the benchmarks (qps_server and qps_client) in my environment, but I just could not achieve the QPS shown on the live dashboard. Here are my commands. Did I do something wrong?
PS: I used 40 threads as netty server workers.
And what QPS did you get with that configuration? It would help if you could include information like the load averages or strace output. Alternatively, if you could post your code it would reduce the number of round trips on this issue.
I will upload my code package next week. Thanks. |
@carl-mastrangelo Here is my code package. :-) |
So, as I mentioned above, first you need to replace your ThreadPoolExecutor with a ForkJoinPool. Second, you have too many netty threads. In normal, non-gRPC netty, worker threads do most of the work. In gRPC, we move work off the network threads as soon as possible. The net effect for you is that you really only need ~1-2 workers in your event loop.
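For reference, a sketch of what that advice could look like applied to the earlier server setup (the thread counts here are illustrative, and `port`, `Service`, and `backendFutureStub` are taken from the snippet near the top of the thread):

```java
import io.grpc.Server;
import io.grpc.netty.NettyServerBuilder;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollServerSocketChannel;
import java.util.concurrent.ForkJoinPool;

// 1 boss thread to accept connections and ~2 workers for I/O; gRPC hands
// application work to the executor, so large worker groups mostly add
// context switching without adding throughput.
EpollEventLoopGroup bossGroup = new EpollEventLoopGroup(1);
EpollEventLoopGroup workerGroup = new EpollEventLoopGroup(2);

Server server = NettyServerBuilder.forPort(port)
    .bossEventLoopGroup(bossGroup)
    .workerEventLoopGroup(workerGroup)
    .channelType(EpollServerSocketChannel.class)
    .executor(new ForkJoinPool(Runtime.getRuntime().availableProcessors()))
    .addService(new Service(backendFutureStub))
    .build();
```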
It does not make any difference. |
If you can give a reproducible example I can look into this further. I will need the exact command you used to run and the specs of the hardware you ran on. |
Hardware and Software Specs
Server Command
Client Command
I ran the code you provided in grpc-benchmark. I made the following changes:
Some things are kind of odd:
@carl-mastrangelo Thank you very much. I appreciate your reply.
Each client has 4 threads: a thread pool with 2 threads and an event loop group with 2 threads. So the total thread number should be clientNumber * 4.
I limited the event loop group thread number per client to 2. As mentioned above, this may not be a problem.
I leveraged Dropwizard Metrics to collect statistics. The code can be found in GRPCTestServiceServer.java line 45 and GRPCTestServiceClient.java line 90. You can add reporters (e.g. GraphiteReporter) to the MetricRegistry. Again, thanks Carl.
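For context, a minimal sketch of that kind of Dropwizard Metrics wiring (the metric name and report interval are made up for illustration; the thread's actual code lives in GRPCTestServiceServer.java and GRPCTestServiceClient.java, which are not shown here):

```java
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import java.util.concurrent.TimeUnit;

MetricRegistry registry = new MetricRegistry();
Timer rpcTimer = registry.timer("echo-rpc-latency"); // hypothetical metric name

// Any reporter can be attached to the registry; ConsoleReporter is the
// simplest, and GraphiteReporter is wired the same way via its own builder.
ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
    .convertRatesTo(TimeUnit.SECONDS)
    .convertDurationsTo(TimeUnit.MILLISECONDS)
    .build();
reporter.start(10, TimeUnit.SECONDS);

// Time each RPC; Timer.Context is Closeable, so try-with-resources works.
try (Timer.Context ctx = rpcTimer.time()) {
    // issue the request / handle the call
}
```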
Looks like this is resolved. Closing. |
I am afraid this issue is not resolved... :( |
@carl-mastrangelo, do you know what needs to be done here? |
Without more data about the actual run, I don't know what more can be done here. @xiaoshuang-lu, if you follow the instructions here: https://medium.com/netflix-techblog/java-in-flames-e763b3d32166, you can get much more detailed data about why gRPC is slow.
Hi, I think it may not be difficult to reproduce this issue if there is a ganglia server or something similar. Add
@xiaoshuang-lu, did you have success on getting a repro? |
Closing it due to inactivity. @xiaoshuang-lu, feel free to reopen if you have new updates. |
Hi grpc-java guys, I have written some programs to do performance testing for grpc-java. The service is very simple: it just echoes the strings (about 400 bytes per request) received from clients. (Actually, I disabled the servers' logging.) It appears to be difficult for me to achieve high requests-per-second and low response time.
requests-per-second: about 220k (expected: 1000MB/400B)
response-time: p99 25ms, p999 33ms (expected: p99 below 4ms)
My environment and configurations are listed as follows.
Environment: 48 cores/128GB memory/10000Mbps net/CentOS Linux release 7.2.1511
gRPC Version: 1.8.0
JVM options: JDK 1.7.80 -Xms4g -Xmx8g
bossGroup: 4 threads
workerGroup: 160 threads
executor: ThreadPoolExecutor(64, 64, 1000 * 60 * 10, TimeUnit.MILLISECONDS, new LinkedBlockingQueue(65536))
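Reconstructed as code, the executor and event loop configuration described above would look roughly like this (a sketch only; the variable names are assumptions, the numbers are the ones listed):

```java
import io.netty.channel.epoll.EpollEventLoopGroup;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

EpollEventLoopGroup bossGroup = new EpollEventLoopGroup(4);     // 4 boss threads
EpollEventLoopGroup workerGroup = new EpollEventLoopGroup(160); // 160 worker threads

ThreadPoolExecutor executor = new ThreadPoolExecutor(
    64, 64,                                 // core and max pool size
    1000 * 60 * 10, TimeUnit.MILLISECONDS,  // 10-minute keep-alive
    new LinkedBlockingQueue<Runnable>(65536));
```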
threaddump.txt
According to the thread dump, many worker threads are stuck in SerializingExecutor.