Selector.select() returned prematurely 512 times in a row; rebuilding selector while run testsuite #2616

Closed
normanmaurer opened this Issue Jun 27, 2014 · 7 comments

Comments

Projects
None yet
5 participants
Owner

normanmaurer commented Jun 27, 2014

The following is logged on my crunchbang linux installation:

21:30:14.072 [Thread-1] INFO  i.n.t.t.socket.SocketStartTlsTest - Running: testStartTls[0: serverEngine = io.netty.handler.ssl.JdkSslServerContext@50aeaf68, clientEngine = io.netty.handler.ssl.JdkSslClientContext@33bd9009] 3 of 3 (ServerBootstrap(group: OioEventLoopGroup, channelFactory: OioServerSocketChannel.class, localAddress: /0:0:0:0:0:0:0:1%1:33570, options: {SO_TIMEOUT=10, ALLOCATOR=UnpooledByteBufAllocator(directByDefault: true)}, childGroup: OioEventLoopGroup, childOptions: {ALLOCATOR=UnpooledByteBufAllocator(directByDefault: true)}) + Bootstrap(group: NioEventLoopGroup, channelFactory: NioSocketChannel.class, options: {ALLOCATOR=UnpooledByteBufAllocator(directByDefault: true)}, remoteAddress: /0:0:0:0:0:0:0:1%1:33570)) with UnpooledByteBufAllocator
21:30:14.075 [testsuite-nio-worker-4-3] WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding selector.
21:30:14.075 [testsuite-nio-worker-4-3] INFO  io.netty.channel.nio.NioEventLoop - Migrated 1 channel(s) to the new Selector.
21:30:14.075 [testsuite-nio-worker-4-3] WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding selector.
21:30:14.075 [testsuite-nio-worker-4-3] INFO  io.netty.channel.nio.NioEventLoop - Migrated 1 channel(s) to the new Selector.
21:30:14.076 [testsuite-nio-worker-4-3] WARN  io.netty.channel.nio.NioEventLoop - Selector.select() returned prematurely 512 times in a row; rebuilding selector.
➜  testsuite git:(4.0) ✗ java -version
java version "1.7.0_55"
OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-2)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
➜  testsuite git:(4.0) ✗ uname -a
Linux dev05 3.14-1-amd64 #1 SMP Debian 3.14.7-1 (2014-06-16) x86_64 GNU/Linux

This is reproducible all the time

@normanmaurer normanmaurer added this to the 4.0.21.Final milestone Jun 27, 2014

Owner

normanmaurer commented Jul 1, 2014

After more debugging this only happens on linux kernel 3.14. After downgrade to kernel 3.10.45 all works like expected.

The problem can be reproduced with pure java.nio:

import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.TimeUnit;

public class SelectorBug {

    public static void main(String[] args) throws Exception {

        SelectorLoop selectThread = new SelectorLoop();
        selectThread.start();
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);
        SelectionKey key = channel.register(selectThread.selector, 0, null);
        selectThread.selector.wakeup();
        Thread.sleep(10000);
        selectThread.shutdown();
        key.cancel();
    }

    private static final class SelectorLoop extends Thread {
        private volatile boolean shutdown;
        private final Selector selector;
        SelectorLoop() throws IOException {
            selector = Selector.open();
            setDaemon(true);
        }
        @Override
        public void run() {
            while (!shutdown) {
                try {
                    long before = System.nanoTime();
                    int selected = selector.select(1000);
                    System.out.println("Selected=" + selected + ' ' + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - before));
                    if (selected > 0) {
                        Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                        while (keys.hasNext()) {
                            keys.next();
                            keys.remove();
                        }
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                    break;
                }
            }
            try {
                selector.close();
            } catch (IOException e) {
                // ignore
            }
        }

        public void shutdown() {
            shutdown = true;
        }
    }
}

Normally you should not see more then about 11 print statements. With kernel 3.14 I see thousands.

Owner

trustin commented Jul 1, 2014

Linux Kernel 3.15:

trustin@infinity:/home/trustin/Sandbox
$ uname -r
3.15.2-1-ARCH
trustin@infinity:/home/trustin/Sandbox
$ java -cp . SelectorBug
Selected=0 1001
Selected=0 0
Selected=0 1001
Selected=0 1001
Selected=0 1001
Selected=0 1001
Selected=0 1001
Selected=0 1001
Selected=0 1001
Selected=0 1001
Selected=0 1001
trustin@infinity:/home/trustin/Sandbox
$ 
Owner

normanmaurer commented Jul 1, 2014

Also fixed in latest 3.14.10 kernel:

➜  src  java SelectorBug      
Selected=0 1
Selected=0 1001
Selected=0 1001
Selected=0 1002
Selected=0 1001
Selected=0 1002
Selected=0 1002
Selected=0 1001
Selected=0 1001
Selected=0 1001
➜  src  uname -r
3.14.10

@trustin trustin added the defect label Aug 14, 2014

@normanmaurer normanmaurer self-assigned this Aug 14, 2014

@trustin trustin added not a bug and removed defect labels Aug 14, 2014

@trustin trustin removed this from the 4.0.22.Final milestone Aug 14, 2014

I've the same problem with linux kernel 4.2 and 3.16. (Debian Jessie)

@adleong adleong referenced this issue in linkerd/linkerd Sep 12, 2017

Closed

High CPU load and many netty errors for Namerd 1.1.3 #1627

0 of 1 task complete
Contributor

timboudreau commented Oct 12, 2017

Also seeing this with a custom kernel 4.13.3 on Gentoo. Any idea if this is a kernel configuration issue, not a kernel bug?

Owner

normanmaurer commented Oct 22, 2017

@timboudreau sorry no idea :(

We're seeing this as well on our Play! app.

Netty 4.0.51.Final

java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

uname -r : 4.9.20-11.31.amzn1.x86_64

Happens under heavy load.

Any help would be appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment