High CPU utilization in Spring Cloud Gateway application #1908
Please try again with boot 2.3.3 and Hoxton.SR7.
Hi @spencergibb, I tried again with boot 2.3.3 and Hoxton.SR7 but do not see much difference. Below is a snapshot of `top | grep 18316` a few minutes after the start of the PST:

```
18316 lmpserv+  20   0 5280316 571796  16488 S 52.9  3.5  1:35.54 java
```

This time I noticed one more thread occasionally showing in the `top -n 1 -H -p 18316` list, but mostly it is still the 4 "epoll" threads:

```
"AsyncAppender-Worker-async" #14 daemon prio=5 os_prio=0 tid=0x00007f5f5caa9000 nid=0x47a6 waiting on condition [0x00007f5f325c6000]
```

Below are some of the parameters set in the JMeter JMX file: …
We also have a requirement to log the request and response; for that we have the filter below. With this filter in place, the utilization almost doubles (~80% after the initial few minutes of the PST), and it is mostly the 4 epoll threads that show high CPU utilization. This application is going to production soon, so any timely help would be appreciated; this issue has become a bottleneck. Below is the filter code.
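The filter code itself did not survive in the thread. As a point of reference, a minimal sketch of what such a request/response logging `GlobalFilter` typically looks like (class name, log format, and order are illustrative assumptions, not the reporter's actual filter):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.core.Ordered;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

// Hypothetical request/response logging filter, sketched for illustration only.
@Component
public class RequestResponseLogFilter implements GlobalFilter, Ordered {

    private static final Logger log = LoggerFactory.getLogger(RequestResponseLogFilter.class);

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        // Log request metadata before forwarding downstream.
        log.info("Request: {} {}", exchange.getRequest().getMethod(), exchange.getRequest().getURI());
        // Log the response status once the exchange completes (or errors).
        return chain.filter(exchange)
                .doFinally(signal -> log.info("Response: {} for {}",
                        exchange.getResponse().getStatusCode(), exchange.getRequest().getURI()));
    }

    @Override
    public int getOrder() {
        return Ordered.LOWEST_PRECEDENCE;
    }
}
```

If the actual filter buffers request/response bodies (e.g. via `ServerHttpRequestDecorator`/`ServerHttpResponseDecorator`), that copying work runs on the epoll event-loop threads, which would line up with the roughly doubled CPU reported here.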
Seeing the below in the log file. Could this have anything to do with the issue?

```
The library has been successfully loaded. The …
```
If I remember correctly, this issue is related to logging usage.
We are experiencing the same issue. Our stack is 2.3.3.RELEASE with Hoxton.SR7. The gateway runs in a Docker container on top of CentOS 7.8.2003 with AdoptOpenJDK (build 11.0.8+10). I experimented with both nio and epoll and with different selector and worker thread pool sizes, but the results are practically the same. We are migrating from the Zuul gateway, and the CPU usage is roughly 2x higher under the same load. The only issue I was able to spot is the lookup of ReactorLoadBalancer in ReactiveLoadBalancerClientFilter (#1941), but I do not think it explains the whole story. We use plain Netty or Spring WebFlux (on reactor-netty) in other microservices (same underlying JVM, OS, and infrastructure), but we did not run into this issue there.
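For anyone else experimenting with the selector and worker pool sizes mentioned above: Reactor Netty reads its event-loop sizes from system properties, which must be set before the first server/client is created. A minimal sketch, assuming a hypothetical `GatewayApplication` main class (the property names are Reactor Netty's own; the values here are arbitrary examples):

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class GatewayApplication {

    public static void main(String[] args) {
        // Must be set before Reactor Netty initializes its event loops.
        // Number of selector threads (by default no separate selector loop is used).
        System.setProperty("reactor.netty.ioSelectCount", "1");
        // Number of worker threads (defaults to max(4, available processors)).
        System.setProperty("reactor.netty.ioWorkerCount", "4");
        SpringApplication.run(GatewayApplication.class, args);
    }
}
```

Switching between the native epoll transport and nio should likewise be possible with the `reactor.netty.native` system property set to `false`.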
@jacob2221 did the logging hint help?
I had the same problem before, and I resolved it by removing the usage of …
@lkolisko have you upgraded to Hoxton.SR8 and boot 2.3.4?
We migrated to Hoxton.SR8 and boot 2.3.4 recently. I did some preliminary performance testing and did not see any change in the CPU utilization metric from the upgrade. However, I was able to lower CPU utilization by disabling Spring Sleuth (spring-cloud/spring-cloud-sleuth#1397) and implementing the ReactiveLoadBalancerClientFilter lookup fix (#1941).
#1941 was already merged into …
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days, this issue will be closed.
Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.
Hi, we're seeing the same issue using spring-cloud (Hoxton.SR9) and spring-boot (2.3.6.RELEASE), including spring-cloud-sleuth with propagation enabled and spring.sleuth.reactor.decorate-on-each=false.
standalone-gateway and embedded-gateway both have exactly the same spring-cloud-sleuth configuration regarding reactor decoration, correlation fields, local fields, and remote fields. The load test:

- starts by gradually ramping up to 180 req/sec and maintaining that rate for 5 minutes;
- increases the load to 240 req/sec for 5 minutes;
- increases the load to 300 req/sec for 10 minutes;
- tries to increase the load to 400 req/sec.

The main differences between standalone-gateway and embedded-gateway I've managed to see are: …

I hope it helps.
@qqeasonchen Hi, have you already found out the cause of the issue and solved it?
Hi, not yet.
In my case, the trace log looks like the issues mentioned in … Adding a timeout on request handling helped me recover from the high CPU utilization.
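Spring Cloud Gateway can apply such a timeout globally through its HttpClient properties (e.g. `spring.cloud.gateway.httpclient.response-timeout`), or the whole exchange can be capped in a filter. A minimal sketch of the filter approach, assuming a hypothetical 10-second budget (this is an illustration, not the commenter's actual fix):

```java
import java.time.Duration;

import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.core.Ordered;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

// Hypothetical global timeout guard, sketched for illustration only.
@Component
public class TimeoutGuardFilter implements GlobalFilter, Ordered {

    // Assumed budget; tune to the slowest acceptable downstream response.
    private static final Duration REQUEST_BUDGET = Duration.ofSeconds(10);

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        // Errors the exchange with a TimeoutException if handling exceeds the budget,
        // which cancels the downstream call instead of letting it spin indefinitely.
        return chain.filter(exchange).timeout(REQUEST_BUDGET);
    }

    @Override
    public int getOrder() {
        return Ordered.HIGHEST_PRECEDENCE;
    }
}
```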
Hi,
The sample Spring Cloud Gateway app used for the PST is available at https://github.com/jacob2221/basic-gateway
```
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
```
JRE version used: Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)

RESULTS

The below command was executed while the PST was running:

```
top -n 1 -H -p
```

Searching the output of "jstack" for the hex values of the thread PIDs from the above command shows 4 threads, each with a name like "reactor-http-epoll-*" and the stack below:
"reactor-http-epoll-2" #21 daemon prio=5 os_prio=0 tid=0x00007fcd20005000 nid=0x1b086 runnable [0x00007fcd40a6e000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.Native.epollWait(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:148)
at io.netty.channel.epoll.Native.epollWait(Native.java:141)
at io.netty.channel.epoll.EpollEventLoop.epollWaitNoTimerChange(EpollEventLoop.java:290)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:347)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
JVM arguments used for the test:

```
MEM_ARGS="-Xms2048m -Xmx2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10240K -Xloggc:$FILENAME -verbose:gc -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$LOGDIR -XX:+PrintStringTableStatistics -Djava.security.egd=file:///dev/./urandom -Dreactor.netty.http.server.accessLogEnabled=true -Dio.netty.leakDetection.level=advanced"
```
Could you help advise why these threads are consuming high CPU? Are there any further settings we can apply to optimize this?