Selector.select() returned prematurely 512 times in a row #8698

Open · mikomarrache opened this issue Dec 30, 2018 · 24 comments
@mikomarrache

I'm using Netty 4.1.32 and I see the following very often in the logs:

2018-12-30 10:06:53.286  WARN 22878 --- [nioEventLoopGroup-2-7] io.netty.channel.nio.NioEventLoop        : Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@280b6859.
2018-12-30 10:06:53.287  INFO 22878 --- [nioEventLoopGroup-2-7] io.netty.channel.nio.NioEventLoop        : Migrated 20 channel(s) to the new Selector.

I assume I'm doing something wrong since it's logged as a warning.

My app is running on CentOS 7.

@normanmaurer
Member

Can you share a reproducer?

@mikomarrache
Author

mikomarrache commented Dec 30, 2018

Not really, I'm using Netty inside a big application that I can't share.

Is there anything else I can do to help understand what is causing that?

@normanmaurer
Member

Sorry, but without some sort of reproducer there is almost zero chance I can help. Generally speaking, what you see is unexpected.

@mikomarrache
Author

mikomarrache commented Dec 31, 2018

I tried to disable some Netty-based components in the application and this is what I get in the logs:

2018-12-31 12:01:07.035  WARN 10590 --- [nioEventLoopGroup-2-2] io.netty.channel.DefaultChannelPipeline  : An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.

io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 889192448, max: 893386752)
        at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:655) ~[netty-common-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:610) ~[netty-common-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:769) ~[netty-buffer-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:745) ~[netty-buffer-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244) ~[netty-buffer-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:214) ~[netty-buffer-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:146) ~[netty-buffer-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324) ~[netty-buffer-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185) ~[netty-buffer-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176) ~[netty-buffer-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137) ~[netty-buffer-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) ~[netty-transport-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.channel.socket.nio.NioDatagramChannel.doReadMessages(NioDatagramChannel.java:244) ~[netty-transport-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:75) ~[netty-transport-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:591) [netty-transport-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:508) [netty-transport-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar!/:4.1.32.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.32.Final.jar!/:4.1.32.Final]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]

2018-12-31 12:01:07.053  WARN 10590 --- [nioEventLoopGroup-2-2] io.netty.channel.nio.NioEventLoop        : Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@5cabe2d2.
2018-12-31 12:01:07.055  INFO 10590 --- [nioEventLoopGroup-2-2] io.netty.channel.nio.NioEventLoop        : Migrated 0 channel(s) to the new Selector.

I'm pretty sure I don't have a leak here. I also enabled -Dio.netty.leakDetectionLevel=advanced and there is nothing in the logs.

I saw in other posts that it may be related to the application writing too fast. I'm sure that's not the case here, since the application only sends a small UDP datagram every minute. However, I know that the app also receives a LOT of datagrams that it processes. Am I doing something wrong here?
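
For reference, the leak-detection level mentioned above can also be set programmatically instead of via the system property; a minimal sketch, assuming Netty 4.1 on the classpath:

import io.netty.util.ResourceLeakDetector;

public final class LeakDetectionConfig {
    public static void main(String[] args) {
        // Equivalent to -Dio.netty.leakDetectionLevel=advanced (newer releases also accept
        // -Dio.netty.leakDetection.level=advanced). ADVANCED records where leaked buffers
        // were last accessed, at the cost of some sampling overhead.
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.ADVANCED);
        System.out.println("Leak detection level: " + ResourceLeakDetector.getLevel());
    }
}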

@mikomarrache
Author

I made two changes that solved the issue:

  1. I release the ReferenceCounted (in my case, a DatagramPacket) as early as possible. It is clear to me that this has an impact on the amount of direct memory in use at any point in time.

  2. I have a single Channel receiving DatagramPackets. The pipeline of this Channel consists of a single handler that dispatches the processing of every DatagramPacket to an ExecutorService. The processing of a DatagramPacket involved an IO operation that was performed synchronously from that thread; I moved this operation so that it is executed later on a different ExecutorService (a sketch of this pattern is shown after this comment).

First, I made the changes for point 1, and it didn't solve the issue (though I understand that it should help).

Then, I made the changes for point 2, and the issue doesn't appear anymore.

I'm not sure why point 2 solves the issue, since my single event loop thread simply dispatches the packets (and submission is never rejected).
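
A rough sketch of the dispatch pattern described in point 2 (the handler name, executor fields, and the decode/persist steps are placeholders, not the actual application code):

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.socket.DatagramPacket;

import java.util.concurrent.ExecutorService;

public class DispatchingDatagramHandler extends SimpleChannelInboundHandler<DatagramPacket> {

    private final ExecutorService workers;    // processing pool (hypothetical)
    private final ExecutorService ioExecutor; // separate pool for the blocking I/O step (hypothetical)

    public DispatchingDatagramHandler(ExecutorService workers, ExecutorService ioExecutor) {
        this.workers = workers;
        this.ioExecutor = ioExecutor;
    }

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, DatagramPacket packet) {
        // Copy only what is needed so the direct buffer can be released as soon as this
        // method returns (SimpleChannelInboundHandler releases the packet automatically).
        byte[] payload = new byte[packet.content().readableBytes()];
        packet.content().readBytes(payload);

        workers.execute(() -> {
            Object decoded = decode(payload);           // CPU work, off the event loop
            ioExecutor.execute(() -> persist(decoded)); // blocking I/O deferred to its own pool
        });
    }

    private Object decode(byte[] payload) { return payload; } // placeholder
    private void persist(Object decoded)  { /* placeholder */ }
}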

@mikomarrache
Author

After running for a few days, I got the OutOfDirectMemoryError again, but not the selector issue.

I suspected that I might be writing too fast, and indeed, using Channel.isWritable(), I see that the channels are very often in a non-writable state.

What can be the cause?

@normanmaurer
Member

@mikomarrache most likely the remote peer does not "read fast enough", or you submit writes from outside the EventLoop faster than the EventLoop can process them. Another possibility is that you do not call flush frequently enough while using write (and not writeAndFlush).
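
A minimal sketch of reacting to that writability signal (the pause/resume hooks are placeholders; how producers are actually throttled is application specific):

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

public class BackpressureAwareHandler extends ChannelInboundHandlerAdapter {

    @Override
    public void channelWritabilityChanged(ChannelHandlerContext ctx) throws Exception {
        if (ctx.channel().isWritable()) {
            // Outbound buffer drained below the low water mark: safe to resume producing.
            resumeProducers();
        } else {
            // Outbound buffer above the high water mark: stop queueing writes, otherwise
            // they accumulate in direct memory until an OutOfDirectMemoryError.
            pauseProducers();
        }
        ctx.fireChannelWritabilityChanged();
    }

    private void resumeProducers() { /* application specific (placeholder) */ }
    private void pauseProducers()  { /* application specific (placeholder) */ }
}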

@mikomarrache
Author

@normanmaurer

I'm seeing this behavior some time (a few hours, maybe more; I'm investigating) after the application is started.

I'm sure it's not related to flushing, since I only call writeAndFlush.

I also strongly doubt that the remote peer isn't reading fast enough, since after establishing the connection and exchanging a few initialization messages, I only send small ping messages every 30 seconds.

Therefore, I'm trying to understand what you are saying about the EventLoop. My bootstrap is configured as follows:

bootstrap = new Bootstrap()
	.group(new EpollEventLoopGroup())
	.channel(EpollSocketChannel.class)
	.option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
	.handler(/* My handler */);

I'm using the default EventLoopGroup constructor, so it creates threads based on the number of cores.
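
For what it's worth, the default constructor sizes the group at twice the number of available processors (overridable via -Dio.netty.eventLoopThreads). Pinning the size explicitly can make environments easier to compare; a sketch mirroring the bootstrap above, with an arbitrary thread count and a placeholder initializer:

import io.netty.bootstrap.Bootstrap;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollSocketChannel;

// Same bootstrap as above, but with an explicit (arbitrary) event-loop thread count.
EpollEventLoopGroup group = new EpollEventLoopGroup(4);
Bootstrap bootstrap = new Bootstrap()
	.group(group)
	.channel(EpollSocketChannel.class)
	.option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
	.handler(new ChannelInitializer<EpollSocketChannel>() {
		@Override
		protected void initChannel(EpollSocketChannel ch) {
			// add application handlers here (placeholder)
		}
	});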

@mikomarrache
Author

mikomarrache commented Feb 14, 2019

@normanmaurer

FYI, I'm not using the Bootstrap.connect() method as usual; instead, I do the following:

final ChannelFuture registerFuture = bootstrap.register().syncUninterruptibly();
final Channel channel = registerFuture.channel();

/* Keep a reference of the channel somewhere */

channel.connect(/* Socket address */)
	.addListener((ChannelFuture future) -> {
		if (!future.isSuccess()) {
			future.channel().eventLoop().schedule(() -> {
				doConnect();
			}, 30, TimeUnit.SECONDS);
			return;
		}

		channel.closeFuture().addListener((ChannelFuture _f) -> {
			doConnect();
		});
	});

I don't think that's related but just in case...

@mikomarrache
Author

I have multiple environments running exactly the same application. The only difference is the number of Channels created by the application (and therefore the number of read/write operations handled by the EventLoop). The problematic environment is the one with 200 Channels, while the other environments manage no more than 60 Channels. Therefore, I tend to think the issue is related to the EventLoop not being able to keep up with that number of operations. However, I'm very surprised that 200 TCP connections (with TLS) would be the upper bound here.

Is there an easy way to check that this is really the issue? And if it is, what can be done to solve it?
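
One rough way to check (a diagnostic sketch, not a definitive test): sample each event loop's pending-task queue and how long a no-op task waits before it runs; both should stay near zero on a healthy loop.

import io.netty.channel.EventLoopGroup;
import io.netty.util.concurrent.EventExecutor;
import io.netty.util.concurrent.SingleThreadEventExecutor;

import java.util.concurrent.TimeUnit;

public final class EventLoopLoadProbe {

    // Call this periodically (e.g. from a scheduled task) against the group in use.
    public static void sample(EventLoopGroup group) {
        for (EventExecutor executor : group) {
            if (executor instanceof SingleThreadEventExecutor) {
                SingleThreadEventExecutor loop = (SingleThreadEventExecutor) executor;
                final long submitted = System.nanoTime();
                loop.execute(() -> {
                    long lagMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - submitted);
                    System.out.printf("%s pendingTasks=%d scheduleLag=%dms%n",
                            loop, loop.pendingTasks(), lagMs);
                });
            }
        }
    }
}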

@xzzh999

xzzh999 commented Feb 27, 2019

@normanmaurer
HI~

I have the same problem: it just keeps logging this and CPU usage gets very high (running on Windows 7 64-bit, after 1~2 hours):

2019-02-27 13:50:31.007 WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@eb3195b.
2019-02-27 13:50:31.009 INFO  io.netty.channel.nio.NioEventLoop - Migrated 1 channel(s) to the new Selector.
2019-02-27 13:50:36.013 WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@62cac379.
2019-02-27 13:50:36.013 WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@26c6900e.
2019-02-27 13:50:36.013 WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@6b07426c.
2019-02-27 13:50:36.015 INFO  io.netty.channel.nio.NioEventLoop - Migrated 2 channel(s) to the new Selector.
2019-02-27 13:50:36.015 INFO  io.netty.channel.nio.NioEventLoop - Migrated 0 channel(s) to the new Selector.
2019-02-27 13:50:36.016 INFO  io.netty.channel.nio.NioEventLoop - Migrated 0 channel(s) to the new Selector.
2019-02-27 13:50:49.906 WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@21f0d42d.
2019-02-27 13:50:49.906 WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@8836b33.
2019-02-27 13:50:49.908 WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@5bb72d7c.
2019-02-27 13:50:49.906 WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@51838af8.
2019-02-27 13:50:49.913 INFO  io.netty.channel.nio.NioEventLoop - Migrated 1 channel(s) to the new Selector.
2019-02-27 13:50:49.913 INFO  io.netty.channel.nio.NioEventLoop - Migrated 0 channel(s) to the new Selector.
2019-02-27 13:50:49.913 INFO  io.netty.channel.nio.NioEventLoop - Migrated 1 channel(s) to the new Selector.
2019-02-27 13:50:49.913 INFO  io.netty.channel.nio.NioEventLoop - Migrated 1 channel(s) to the new Selector.
2019-02-27 13:50:49.946 WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@5131c2fa.
2019-02-27 13:50:49.946 WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@530b16ca.
2019-02-27 13:50:49.949 INFO  io.netty.channel.nio.NioEventLoop - Migrated 1 channel(s) to the new Selector.
2019-02-27 13:50:49.949 INFO  io.netty.channel.nio.NioEventLoop - Migrated 1 channel(s) to the new Selector.

Has this issue been around for 4 years? Is there any better solution?

@normanmaurer
Member

@xzzh999 any reproducer?

@xzzh999

xzzh999 commented Feb 27, 2019

@normanmaurer

Environment:
Java 1.8.0_161
Windows 7 64-bit
io.netty: netty-all 4.1.24.Final

It happens roughly 1~2 hours after startup; I create a channel pool and send some UDP requests every 10 seconds. Here is a simplified code snippet:

final EventLoopGroup group = new NioEventLoopGroup();
final Bootstrap strap = new Bootstrap();

strap.group(group).channel(NioDatagramChannel.class);
poolMap = new AbstractChannelPoolMap<InetSocketAddress, FixedChannelPool>() {
    @Override
    protected FixedChannelPool newPool(InetSocketAddress key) {
        return new FixedChannelPool(strap.remoteAddress(key), new DeviceClientPoolHandler(), MAX_CONNECTION_CHANNEL_SIZE);
    }
};

public List<ByteBuf> callSendMsgToPoint(ReqMsg req, RemotePoint rp, TimeoutParam tp) {
    InetSocketAddress addr = new InetSocketAddress(rp.ip, rp.port);
    FixedChannelPool scp = poolMap.get(addr);
    Future<Channel> fc = scp.acquire();
    if (fc == null) {
        logger.error("callSendMsgToPoint failed, client pool acquire failed!");
        return null;
    }
    try {
        // wait a very small time to block to get channel from the pool
        Channel ch = fc.get(1500, TimeUnit.MILLISECONDS);
        if (ch == null) {
            logger.warn("callSendMsgToPoint get null channel, ignore and return null!");
            return null;
        }
        // call the channel handler's method
        DeviceClientHandler dch = (DeviceClientHandler) ch.pipeline().last();
        dch.resetRemoteAddr(rp.ip, rp.port);
        List<ByteBuf> lst_bb = dch.doRequestMsg(req, tp.timeout, tp.unit);
        scp.release(ch);
        return lst_bb;
    } catch (Exception e) {
        logger.error("callSendMsgToPoint got exception! " + e.toString());
    }
    return null;
}

@normanmaurer
Member

@xzzh999 so you are not even serving any traffic?

@xzzh999

xzzh999 commented Feb 27, 2019

@normanmaurer very low traffic, there are about 20 remote endpoints, but only 2 are alive.

@normanmaurer
Member

Unfortunately, I will need some kind of "full reproducer" that I can run.

@normanmaurer
Member

@xzzh999 ping

@xzzh999

xzzh999 commented May 26, 2019

@normanmaurer

Sorry for the late reply...

I found some memory leaks where the ByteBuf messages were not released in the read callback. After I fixed those leaks, the CPU seems to be back to normal, although it still prints the following messages:

Selector.select() returned prematurely 512 times in a row;

...

I'm not sure if this is related...
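
For anyone hitting the same thing, a minimal sketch of that kind of fix (assuming a terminal handler that consumes the ByteBuf itself rather than forwarding it down the pipeline):

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.ReferenceCountUtil;

public class ReleasingReadHandler extends ChannelInboundHandlerAdapter {

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ByteBuf buf = (ByteBuf) msg;
        try {
            process(buf); // application-specific decoding (placeholder)
        } finally {
            ReferenceCountUtil.release(buf); // without this, pooled direct memory leaks
        }
    }

    private void process(ByteBuf buf) { /* placeholder */ }
}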

@normanmaurer
Member

@xzzh999 this log usually should not show up at all... how often do you see it?

@xzzh999

xzzh999 commented May 27, 2019

@normanmaurer

About every 20~30 minutes.

@normanmaurer
Member

@xzzh999 can you retest with the latest 4.1 branch and see if it is still a problem?

@jamesdbloom

jamesdbloom commented Jan 1, 2020

If it helps, I have noticed that as I increase the number of event loop threads, the number of errors I see goes up roughly in proportion:

threads: 25 to 65 (2.6x), errors: 10 to 20 (2.1x)
threads: 65 to 80 (1.23x), errors: 21 to 28 (1.3x)

I am seeing these errors while running the MockServer build, for example https://buildkite.com/mockserver/mockserver/builds/1012 for the repo https://github.com/mock-server/mockserver.

This is a reproducer but perhaps not the most helpful one.

@afelino

afelino commented Oct 20, 2022

I've run into the same problem on Android with 4.1.82.Final. Do you need a reproducer for that?

@apkelly

apkelly commented Nov 10, 2023

I too am seeing this error on a device running Android Oreo:

11-01 22:07:02.567 3393 4917 I NioEventLoop: Migrated 1 channel(s) to the new Selector.
11-01 22:07:02.580 3393 4917 W NioEventLoop: Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@b16c460.
11-01 22:07:02.582 3393 4917 I NioEventLoop: Migrated 1 channel(s) to the new Selector.
11-01 22:07:02.597 3393 4917 W NioEventLoop: Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@71461de.
11-01 22:07:02.597 3393 4917 I NioEventLoop: Migrated 1 channel(s) to the new Selector.
11-01 22:07:02.609 3393 4917 W NioEventLoop: Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@893c68c.
11-01 22:07:02.610 3393 4917 I NioEventLoop: Migrated 1 channel(s) to the new Selector.
11-01 22:07:02.622 3393 4917 W NioEventLoop: Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@a80bdea.
11-01 22:07:02.623 3393 4917 I NioEventLoop: Migrated 1 channel(s) to the new Selector.
11-01 22:07:02.635 3393 4917 W NioEventLoop: Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@6fcbf78.
11-01 22:07:02.636 3393 4917 I NioEventLoop: Migrated 1 channel(s) to the new Selector.
11-01 22:07:02.648 3393 4917 W NioEventLoop: Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@6a4ceb6.
11-01 22:07:02.649 3393 4917 I NioEventLoop: Migrated 1 channel(s) to the new Selector.
11-01 22:07:02.660 3393 4917 W NioEventLoop: Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@cfa5b24.
11-01 22:07:02.661 3393 4917 I NioEventLoop: Migrated 1 channel(s) to the new Selector.
11-01 22:07:02.672 3393 4917 W NioEventLoop: Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@6272042.
11-01 22:07:02.673 3393 4917 I NioEventLoop: Migrated 1 channel(s) to the new Selector.
11-01 22:07:02.684 3393 4917 W NioEventLoop: Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@1140590.
11-01 22:07:02.685 3393 4917 I NioEventLoop: Migrated 1 channel(s) to the new Selector.
11-01 22:07:02.696 3393 4917 W NioEventLoop: Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@58afe8e.
11-01 22:07:02.697 3393 4917 I NioEventLoop: Migrated 1 channel(s) to the new Selector.
11-01 22:07:02.708 3393 4917 W NioEventLoop: Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@8dbeabc.
11-01 22:07:02.709 3393 4917 I NioEventLoop: Migrated 1 channel(s) to the new Selector.
11-01 22:07:02.720 3393 4917 W NioEventLoop: Selector.select() returned prematurely 512 times in a row; rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@43c759a.

I'm unable to reproduce this, but we use Netty indirectly through Ktor v1.6.8.
