High CPU usage at about 500 connections. #5278
This is a really strange problem which happens only on one specific computer.
Computer: Dell Latitude E5520, 4 GB memory, Windows 7 32-bit.
Netty version: 5.0.0.Alpha1, 5.0.0.Alpha2
When the number of connections to the Netty server reaches about 500 (there is a threshold, but it is not exact), one new connection causes a single thread to use 25% CPU and the message below to be printed continuously, and a second new connection raises CPU usage to 50%. If I close those two connections, CPU usage immediately returns to normal and the abnormal message stops. If instead I close other, unrelated connections (maybe 10 or 12 of them, the amount is not exact), CPU usage eventually returns to normal as well.

```
2016-05-19 11:19:01 [nioEventLoopGroup-11-1] [WARN]-[Slf4JLogger.java:136] - Selector.select() returned prematurely 1024 times in a row; rebuilding selector.
2016-05-19 11:19:01 [nioEventLoopGroup-11-1] [INFO]-[Slf4JLogger.java:101] - Migrated 64 channel(s) to the new Selector.
2016-05-19 11:19:01 [nioEventLoopGroup-11-1] [WARN]-[Slf4JLogger.java:136] - Selector.select() returned prematurely 1024 times in a row; rebuilding selector.
2016-05-19 11:19:01 [nioEventLoopGroup-11-1] [INFO]-[Slf4JLogger.java:101] - Migrated 64 channel(s) to the new Selector.
2016-05-19 11:19:01 [nioEventLoopGroup-11-1] [WARN]-[Slf4JLogger.java:136] - Selector.select() returned prematurely 1024 times in a row; rebuilding selector.
2016-05-19 11:19:01 [nioEventLoopGroup-11-1] [INFO]-[Slf4JLogger.java:101] - Migrated 64 channel(s) to the new Selector.
2016-05-19 11:19:01 [nioEventLoopGroup-11-1] [WARN]-[Slf4JLogger.java:136] - Selector.select() returned prematurely 1024 times in a row; rebuilding selector.
2016-05-19 11:19:01 [nioEventLoopGroup-11-1] [INFO]-[Slf4JLogger.java:101] - Migrated 64 channel(s) to the new Selector.
2016-05-19 11:19:01 [Thread-1] [DEBUG]-[EapBootstrap.java:415] - send heartbeat...
2016-05-19 11:19:01 [nioEventLoopGroup-11-1] [WARN]-[Slf4JLogger.java:136] - Selector.select() returned prematurely 1024 times in a row; rebuilding selector.
2016-05-19 11:19:01 [nioEventLoopGroup-11-1] [INFO]-[Slf4JLogger.java:101] - Migrated 64 channel(s) to the new Selector.
```
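The warning above is Netty's built-in defense against a spinning Selector: the event loop counts consecutive select() calls that return immediately with nothing ready and, once a threshold is crossed (configurable via -Dio.netty.selectorAutoRebuildThreshold), migrates all channels to a freshly opened Selector. A simplified sketch of the idea, with illustrative names (SelectLoopSketch, REBUILD_THRESHOLD, loop); the real logic lives in Netty's NioEventLoop:

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.util.concurrent.TimeUnit;

public final class SelectLoopSketch {
    private static final int REBUILD_THRESHOLD = 512;

    static void loop(Selector selector, long timeoutMillis) throws IOException {
        int prematureReturns = 0;
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            int selected = selector.select(timeoutMillis);
            long elapsedNanos = System.nanoTime() - start;
            if (selected > 0 || elapsedNanos >= TimeUnit.MILLISECONDS.toNanos(timeoutMillis)) {
                prematureReturns = 0; // real readiness or a full timeout: all is well
            } else if (++prematureReturns >= REBUILD_THRESHOLD) {
                // select() woke up immediately with nothing ready, many times in a
                // row: assume the Selector is broken and replace it.
                System.out.println("rebuilding selector after " + prematureReturns + " premature returns");
                prematureReturns = 0;
                // (Netty migrates all registered channels to a new Selector here.)
            }
            // ... process selector.selectedKeys() ...
        }
    }
}
```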
Comments

After testing, I found that the threshold at which the problem (high CPU usage) appears is directly related to the number of event-loop threads, which can be set via '-Dio.netty.eventLoopThreads'. With -Dio.netty.eventLoopThreads=8, the threshold is about 500 ≈ 8 * 64, so I guess the problem only appears once most of the threads reach 64 connections each. However, why does it happen on only one computer, and what is the real cause of the problem? |
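A quick way to sanity-check that relationship is to print how many event loops the property actually yields; here is a minimal sketch assuming Netty 4.1 on the classpath (EventLoopCount is an illustrative name). Run it with -Dio.netty.eventLoopThreads=8 and it should print 8; at about 8 * 64 connections, the round-robin child assignment puts roughly 64 channels on each Selector, which matches the "Migrated 64 channel(s)" lines in the log above.

```java
import io.netty.channel.nio.NioEventLoopGroup;

// Prints the number of event loops Netty actually created. With the no-arg
// constructor, Netty uses -Dio.netty.eventLoopThreads (default: 2 * cores).
public final class EventLoopCount {
    public static void main(String[] args) {
        NioEventLoopGroup group = new NioEventLoopGroup();
        System.out.println("event loops: " + group.executorCount());
        group.shutdownGracefully();
    }
}
```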
+1 @windie - Also great to see you are a Netty contributor now! Welcome aboard!! |
@Scottmitch - Thanks :) |
@windie The same problem also occurs with 4.1.0.CR7 on the aforementioned computer. It's really hard for me to determine what about that computer causes the bug. Do you have any idea? |
@alvin-xu What JDK? Are you able to take a CPU snapshot with VisualVM or another profiling tool? |
@alvin-xu I cannot reproduce it. Do you have a simple reproducer? |
I've had this problem too. The problem disappears if I use netty-transport-native-epoll. |
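For reference, here is a sketch of what that swap looks like on the server side (Linux only); EpollServerSketch, port 8080, and the empty initializer are illustrative placeholders, and only the EventLoopGroup and channel classes change relative to the NIO version:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollServerSocketChannel;
import io.netty.channel.socket.SocketChannel;

public final class EpollServerSketch {
    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup boss = new EpollEventLoopGroup();
        EventLoopGroup worker = new EpollEventLoopGroup();
        try {
            new ServerBootstrap()
                    .group(boss, worker)
                    .channel(EpollServerSocketChannel.class) // epoll instead of NIO
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        protected void initChannel(SocketChannel ch) {
                            // add real handlers here
                        }
                    })
                    .bind(8080).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            worker.shutdownGracefully();
        }
    }
}
```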
The JDK version is as follows: @johnou
This is the VisualVM snapshot; change the file suffix to "nps" and you will be able to open it. |
In the above result, I just start a plain server that accepts connections and does nothing, using the default configuration with -Dio.netty.eventLoopThreads=2. I think the code is not the cause; the specific platform is. @windie

```java
private EventLoopGroup bossGroup;
private EventLoopGroup workerGroup;
private ServerBootstrap bootstrap;
private int port;

public EapManageServer(int port) {
    bossGroup = new NioEventLoopGroup();
    workerGroup = new NioEventLoopGroup();
    bootstrap = new ServerBootstrap();
    this.port = port;
    bootstrap.group(bossGroup, workerGroup).channel(NioServerSocketChannel.class)
            .option(ChannelOption.SO_BACKLOG, 100).option(ChannelOption.TCP_NODELAY, true)
            .childOption(ChannelOption.SO_KEEPALIVE, true)
            .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                protected void initChannel(SocketChannel socketChannel) throws Exception {
                    System.out.println("connected");
                    socketChannel.pipeline().addLast(new DiscardServerHandler());
                }
            });
}

public void run() throws InterruptedException {
    bootstrap.bind(port).sync().channel().closeFuture().sync();
}

public void stop() {
    bossGroup.shutdownGracefully();
    workerGroup.shutdownGracefully();
}
```

Then I start a number of clients connecting to the server. When the client count reaches 120+, CPU usage rises to 50% (I start only 2 event-loop threads, and the computer has 4 CPU cores).

```java
public static void client(String host, int port) throws InterruptedException {
    EventLoopGroup workerGroup = new NioEventLoopGroup();
    try {
        Bootstrap b = new Bootstrap();
        b.group(workerGroup);
        b.channel(NioSocketChannel.class);
        b.option(ChannelOption.SO_KEEPALIVE, true);
        b.handler(new ChannelInitializer<SocketChannel>() {
            @Override
            public void initChannel(SocketChannel ch) throws Exception {
                ch.pipeline().addLast(new TimeClientHandler());
            }
        });
        // Start the client.
        ChannelFuture f = b.connect(host, port).sync();
        // Wait until the connection is closed.
        f.channel().closeFuture().sync();
    } finally {
        workerGroup.shutdownGracefully();
    }
}

public static void main(String[] args) throws Exception {
    ExecutorService es = Executors.newCachedThreadPool();
    for (int i = 0; i < 120; i++) {
        es.execute(new Runnable() {
            @Override
            public void run() {
                try {
                    client("192.168.0.23", 8080);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        });
        System.out.println(i);
    }
}
```
|
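For completeness: the DiscardServerHandler and TimeClientHandler classes referenced above are not shown in this thread. Minimal stand-ins like the following (an assumption, modeled on Netty's discard example) are enough to make the reproducer compile while keeping the connections idle:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.ReferenceCountUtil;

// Server side: silently discard whatever arrives.
class DiscardServerHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ReferenceCountUtil.release(msg);
    }
}

// Client side: ignore any data from the server.
class TimeClientHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ReferenceCountUtil.release(msg);
    }
}
```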
Can you provide a few stack traces while in this condition (use jstack)? |
There are three files from jstack. jstack_0 was captured when the server started up with no connections; jstack_124 is at 124 connections and still normal; the last one is the faulty one at 128 connections, where CPU usage by the Netty server reaches 50%. However, I can't find any problem; maybe you can. :) @Scottmitch |
@chenxiuheng The "netty-transport-native-epoll" transport is only supported on Linux, but my problem occurs on Windows. Thank you for your help anyway. |
@alvin-xu - Thanks for the stack dumps. It would be useful if you could provide a few stack traces while the CPU is at persistently elevated usage. IIUC only the last stack trace was taken at elevated CPU usage. I would like to compare the stack traces to see if there is any interesting pattern. Let's start with 4 stack traces while the CPU is elevated. |
I'm sorry, I couldn't understand your intent clearly; I have barely used jstack. I think the following stack dumps may be what you want, @Scottmitch. They were all captured while the CPU was elevated. |
@alvin-xu - Thanks for the info. It appears that you are using Netty 5.x? Can you provide the same information running with Netty 4.1.0.Final (5.x has been deprecated as @windie mentioned #5278 (comment))? |
@Scottmitch I used 4.1.0.Final and saw the same problem on that computer. The following are the stack traces. I closed the last 2 connections in "jstackl_4.1.0_10_normal.txt" and "jstackl_4.1.0_11_normal.txt", which made CPU usage return to normal. Then I opened another 2 connections in the latter test and the problem showed up again. jstackl_4.1.0_1.txt |
@alvin-xu - Thanks again for the stack traces. Most of the stack traces look like we are waiting on the poll mechanism. However, the 12th stack trace shows an interesting sign that we are being interrupted [1]. It is possible that the polling mechanism is behaving in a way we don't expect and we are just spinning calling poll... Can you verify which threads are consuming an unusually high amount of CPU via a profiler (VisualVM should work)? Can you also verify whether you are seeing log statements like [2]:
[1]
[2]
|
@Scottmitch I took 3 snapshots with the CPU profiler, as follows:
The log I captured is as follows. I think it is much like what you described.
|
Yes it is... Can you tell us what 'SelectorProvider.provider()' returns on this system?
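For anyone following along, a one-liner answers that question (PrintSelectorProvider is an illustrative wrapper); on Windows the JDK normally returns sun.nio.ch.WindowsSelectorProvider, and on Linux sun.nio.ch.EPollSelectorProvider:

```java
import java.nio.channels.spi.SelectorProvider;

// Print the concrete SelectorProvider the JDK picked on this machine.
public final class PrintSelectorProvider {
    public static void main(String[] args) {
        System.out.println(SelectorProvider.provider().getClass().getName());
    }
}
```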
|
@normanmaurer It's the following. I used JDK 1.7.0_79.
|
Is the other computer running 1.7.0_79 too? |
@johnou No, it's
So it does not happen on 1.8.0_60? If so, can you upgrade on the system where you see the problem and check again?
|
Are you able to upgrade to 1.8 to rule out a bug in the 1.7 JRE?
It seems the problem has no direct relation to the JDK. I have uninstalled JDK 1.7 and installed JDK 1.8; however, the problem is still there.
|
I tried to debug the library, and I found that the select function [1] does not block for the specified timeout. So the focus is: why does the select function [1] not block? The actions you suggested above rule out the JDK as a factor. Are there any other causes of the problem, such as the network interface card?
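That symptom can also be probed outside Netty; here is a minimal sketch (SelectBlockProbe is an illustrative name, and unlike the loaded server it registers no channels). On a healthy JDK/OS each iteration should report roughly 1000 ms, whereas a selector hit by this bug returns almost immediately:

```java
import java.nio.channels.Selector;

public final class SelectBlockProbe {
    public static void main(String[] args) throws Exception {
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < 5; i++) {
                long start = System.nanoTime();
                selector.select(1000); // nothing registered: should time out
                System.out.println("blocked ms: " + (System.nanoTime() - start) / 1_000_000);
            }
        }
    }
}
```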
|
It sounds like either an OS or a JDK bug.
|
@normanmaurer I have only found the problem on the one computer I mentioned at the beginning, which is also really strange. So I suspect that some hardware in that computer causes the problem... |
@alvin-xu It's rare to meet a fellow Chinese user here... |
Haha, does my Chinese English give me away? @freevest |
Let me close this as it's very old and there has been no further followup. |