
Epoll consuming lot more CPU than Nio #11695

Closed
praveen97uma opened this issue Sep 20, 2021 · 30 comments · Fixed by #12145

@praveen97uma

praveen97uma commented Sep 20, 2021

I am building a websocket-based broker. During load testing, we found that the Epoll transport uses around 55% CPU compared to around 20% for Nio, just for maintaining the connections without doing any business-specific IO on them. Is this expected? What could I be doing wrong? Happy to share any more info required around this.

Total concurrent connections: Around 27K
Boss Threads: 1
Worker Threads: 32

Cores in the VM: 8

Relevant code that sets up Netty:

  EventExecutorChooserFactory chooserFactory = DefaultEventExecutorChooserFactory.INSTANCE;

  ThreadFactory bossThreadFactory = ThreadFactoryUtil.createInstrumented("boss", metricRegistry);

  Executor workerExecutor = new ThreadPerTaskExecutor(
          ThreadFactoryUtil.createAffinityThreadFactory("worker.thread", metricRegistry));

private void setupNioEventLoopGroups(EventExecutorChooserFactory chooserFactory, ThreadFactory bossThreadFactory,
            Executor workerExecutor) {
        bossPool = new NioEventLoopGroup(acceptorThreads, bossThreadFactory);
        workerPool = new NioEventLoopGroup(workerThreads, workerExecutor, chooserFactory, SelectorProvider.provider(),
                DefaultSelectStrategyFactory.INSTANCE);
        channelType = NioServerSocketChannel.class;
        log.info("Initialiazing Java NIO Event System");
    }

    private void setupEpollEventLoopGroups(EventExecutorChooserFactory chooserFactory, ThreadFactory bossThreadFactory,
            Executor workerExecutor) {
        bossPool = new EpollEventLoopGroup(acceptorThreads, bossThreadFactory);
        workerPool = new EpollEventLoopGroup(workerThreads, workerExecutor, chooserFactory,
                DefaultSelectStrategyFactory.INSTANCE);
        channelType = EpollServerSocketChannel.class;
        log.info("Initialiazing Epoll IO Event System");
    }


        ServerBootstrap serverBootstrap = new ServerBootstrap().group(bossPool, workerPool);

        // Choose socket options.
        Map<ChannelOption<?>, Object> channelOptions = new HashMap<>();
        channelOptions.put(ChannelOption.SO_BACKLOG, 256);
        channelOptions.put(ChannelOption.ALLOCATOR, new PooledByteBufAllocator(true));
        channelOptions.put(ChannelOption.SO_TIMEOUT, 3000);

        channelOptions.forEach(
                (key, value) -> serverBootstrap.option(ChannelOption.valueOf(String.valueOf(key)), value));

        // Set transport options
        serverBootstrap.childOption(ChannelOption.TCP_NODELAY, true);
        serverBootstrap.childOption(ChannelOption.SO_KEEPALIVE, true);
        serverBootstrap.childOption(ChannelOption.SO_LINGER, -1);
        serverBootstrap.childOption(ChannelOption.SO_REUSEADDR, true);
        serverBootstrap.childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);
        serverBootstrap.childOption(ChannelOption.ALLOW_HALF_CLOSURE, false);
        serverBootstrap.childOption(ChannelOption.SO_SNDBUF, 10 * 1024);
        serverBootstrap.channel(serverGroup.getChannelType());
        serverBootstrap.childHandler(channelInitializer);
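
(Aside: the forEach above re-resolves each option by its string name via ChannelOption.valueOf. A minimal sketch of a type-safe alternative, using a hypothetical helper and an unchecked cast instead of the name lookup:)

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.ChannelOption;
    import java.util.Map;

    final class BootstrapOptions {
        // Applies each collected option directly; the unchecked cast is safe as long as
        // the map was populated with matching key/value pairs (as it is above).
        @SuppressWarnings("unchecked")
        static void apply(ServerBootstrap bootstrap, Map<ChannelOption<?>, Object> options) {
            options.forEach((key, value) -> bootstrap.option((ChannelOption<Object>) key, value));
        }
    }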

Netty version

4.1.68

JVM version (e.g. java -version)

openjdk version "11.0.12" 2021-07-20 LTS
OpenJDK Runtime Environment Zulu11.50+19-CA (build 11.0.12+7-LTS)
OpenJDK 64-Bit Server VM Zulu11.50+19-CA (build 11.0.12+7-LTS, mixed mode)

OS version (e.g. uname -a)

Linux bolt-004 5.8.0-1041-azure #44~20.04.1-Ubuntu SMP Fri Aug 20 20:41:09 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Epoll
Screenshot 2021-09-20 at 6 36 30 PM

Nio
Screenshot 2021-09-20 at 6 22 49 PM

@hyperxpro
Contributor

More information needed.

@praveen97uma
Author

@hyperxpro I have added some code. Please let me know if any specific info would help.

@franz1981
Contributor

@praveen97uma
Are the load and performance results the same for both tests?

e.g.
NIO performs the test in X seconds.
EPOLL performs the test in Y seconds.

@praveen97uma
Author

@franz1981 Yes, the ramp-up to 27K connections is the same for both NIO and EPOLL. The graphs I shared are in steady state after the ramp-up, with nothing being written to the connections except the websocket ping/pongs every 5 sec, which again happens for both transports.

@franz1981
Contributor

Yes, the ramp-up to 27K connections is the same for both NIO and EPOLL

It would be worth capturing a profile using https://github.com/jvm-profiling-tools/async-profiler

  1. https://github.com/jvm-profiling-tools/async-profiler#basic-usage
  2. add -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints as JVM flags
  3. run the profiler with -t -d 10 (or whatever duration you think is ok)

And share the flamegraphs (-f nio.html and -f epoll.html) somehow

qq

Worker Threads: 32
Cores in the VM: 8

if you're not supposed to block on the event loop, why not use 8 worker threads (or fewer)?

@praveen97uma
Author

praveen97uma commented Sep 20, 2021

[flame graph screenshots attached]

@franz1981
Contributor

franz1981 commented Sep 20, 2021

There is an insane amount of samples on epollWait0 due to the timer being set.
What's this instrumented runnable? Can it be disabled?

Another thing: I wouldn't suggest setting affinity with Netty threads > number of cores... it's a BAD idea :)
as suggested above:

if you're not supposed to block on the event loop, why not use 8 worker threads (or fewer)?

@hyperxpro
Contributor

Also, try this: ServerBootstrap#option(EpollChannelOption.EPOLL_MODE, EpollMode.EDGE_TRIGGERED)
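
(A minimal sketch of wiring that in against the serverBootstrap from the issue description; EpollChannelOption/EpollMode come from io.netty.channel.epoll:)

    // Epoll-specific option; only set this when the Epoll event loop groups / channel type are in use.
    serverBootstrap.option(EpollChannelOption.EPOLL_MODE, EpollMode.EDGE_TRIGGERED);
    serverBootstrap.childOption(EpollChannelOption.EPOLL_MODE, EpollMode.EDGE_TRIGGERED);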

@franz1981
Contributor

franz1981 commented Sep 20, 2021

Good point, it could be that the wakeup is happening spuriously (each time, so not so "spuriously"), or that there are scheduled tasks that cause epoll to wake up and go back to sleep continuously.

@praveen97uma
Author

@franz1981 @hyperxpro Thank you both for the help. So, I am making the following changes and will keep you posted.

  1. Remove the InstrumentedThreadFactory, which instruments the threads created/running/terminated by a thread factory.
  2. Reduce the worker threads to 8 (equal to the number of cores). I am anyway using a separate event loop group for blocking channel handlers. Found out the hard way not to set affinity on more threads than there are cores.
  3. Try ServerBootstrap#option(EpollChannelOption.EPOLL_MODE, EpollMode.EDGE_TRIGGERED).

@praveen97uma
Author

@franz1981 I made the changes. The CPU has dropped to 40%, but IMO it is still high.

Screenshot 2021-09-20 at 9 01 15 PM

Screenshot 2021-09-20 at 8 59 43 PM

@franz1981
Contributor

franz1981 commented Sep 20, 2021

It's still too much time...
Are there any scheduled tasks submitted?
Looking at the huge amount of samples spent dequeuing the priority queue on the right, I would say yes.

@praveen97uma
Author

I have scheduled tasks which run every 200ms to consume messages from a buffer queue for each connection and push them to the client. These are scheduled on the channel's event executor itself. That is the cause of all this, I guess. @franz1981
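
(Roughly what that per-connection scheduling looks like; flushBatch is a hypothetical stand-in for the batching/flush logic. With ~27K connections this arms ~27K independent recurring timers on the event loops:)

    // One recurring task per channel, scheduled on that channel's own event loop.
    ScheduledFuture<?> flushTask = channel.eventLoop().scheduleAtFixedRate(
            () -> flushBatch(channel),      // drain this connection's buffer queue, write and flush
            200, 200, TimeUnit.MILLISECONDS);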

@franz1981
Contributor

franz1981 commented Sep 20, 2021

Yeah, probably rescheduling it is a bit more costly with epoll, or it could be due to the hypervised environment you run the app in; according to the flamegraphs it doesn't seem to be a bare-metal server, no? Maybe I'm wrong :)

@hyperxpro
Contributor

Are you running the program on a shared-CPU VM or on bare metal?

@praveen97uma
Author

It's not bare metal but an Azure VM. These are virtual CPUs.

@normanmaurer
Member

@praveen97uma can you upload the flame graph so we can navigate it? It seems it spends quite some time in timerfd_settime.

@franz1981
Contributor

franz1981 commented Sep 20, 2021

@normanmaurer It looks to me that, due to the too-tight period of the scheduled tasks (for that virtual env), epoll always finds some new scheduled task to re-arm, causing that method to be called too much.

                                if (curDeadlineNanos == prevDeadlineNanos) {
                                    // No timer activity needed
                                    strategy = epollWaitNoTimerChange();
                                } else {
                                    // Timerfd needs to be re-armed or disarmed
                                    prevDeadlineNanos = curDeadlineNanos;
                                    strategy = epollWait(curDeadlineNanos);
                                }

on EpollEventLoop

@praveen97uma
Author

praveen97uma commented Sep 20, 2021

@praveen97uma
Author

@normanmaurer @franz1981 @hyperxpro Guys, so ByteBuf.duplicate() creates a new UnpooledDuplicatedByteBuf(). Is it possible to let Netty create a pooled version of the ByteBuf? This particular call is causing the highest allocations in my profiling.

@normanmaurer
Member

@praveen97uma if you use the PooledByteBufAllocator it should pool these.

@normanmaurer
Member

@praveen97uma also, can you explain why and how you schedule so many tasks?

@praveen97uma
Author

@normanmaurer We had to batch-push messages over the connection every 200ms. These msgs are merged into a specific structure to reduce bytes on the wire. I have a buffer queue for every connection, and tasks are scheduled for every connection on the event loops that do this batching and flush over the connection. Is this an entirely wrong approach, and is there a better way to do this in Netty? I had read about FlushConsolidationHandler but I'm not sure if there is a hook I can use to do the merging I want before flushing.

I am using PooledByteBufAllocator.

serverBootstrap.childOption(ChannelOption.ALLOCATOR, new PooledByteBufAllocator(true));


    public static ByteBuf encodePublishOperation(ByteBufAllocator byteBufAllocator,
            List<ServerPublishOperation> topicPublishes) {

        ByteBuf byteBuf = byteBufAllocator.buffer();

ServerPublishOperation is a ByteBufHolder. The channel's alloc() is passed to the above function.

Screenshot 2021-10-04 at 3 26 11 PM

@normanmaurer
Member


I see... If you switch to retainedDuplicate() it should be pooled as well. Just note that this will also increment the reference count by 1.
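
(A rough sketch of the difference, assuming msg is a ByteBuf obtained from a pooled allocator; exact wrapper classes vary by Netty version:)

    ByteBuf dup = msg.duplicate();              // derived view, shares msg's memory and refCnt; the wrapper itself is unpooled
    ByteBuf retained = msg.retainedDuplicate(); // derived view whose wrapper can be recycled; bumps the reference count by 1
    // ... use 'retained' ...
    retained.release();                         // releases the extra reference taken by retainedDuplicate()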

@chrisvest
Contributor

I have a buffer queue for every connection, and tasks are scheduled for every connection on the event loops that do this batching and flush over the connection. Is this an entirely wrong approach, and is there a better way to do this in Netty?

If you keep track of your connections (ideally just the ones with messages), you could have a single scheduled task that goes over all current connections and processes their batches.

@NiteshKant
Member

If you keep track of your connections (ideally just the ones with messages), you could have a single scheduled task that goes over all current connections and processes their batches.

+1 .. maybe not a single task, but a single task per event loop, so that you do not have to jump event loops to flush the connections.
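
(A sketch of that shape, with hypothetical names (channelsOnThisLoop, flushBatch); the set is only touched from its own event loop, so no synchronization is needed:)

    // Channels registered on this event loop, maintained from channelActive/channelInactive.
    final Set<Channel> channelsOnThisLoop = new HashSet<>();

    // One recurring task per event loop instead of one per connection.
    eventLoop.scheduleAtFixedRate(() -> {
        for (Channel ch : channelsOnThisLoop) {
            flushBatch(ch); // hypothetical: merge and flush that connection's queued messages
        }
    }, 200, 200, TimeUnit.MILLISECONDS);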

@normanmaurer
Member

I wonder if the time used by timerfd_settime is any indicator that we might be better off not using it when possible (when the timeout is in the milliseconds range) and just use epoll_wait(...) with the right timeout. WDYT?

@chrisvest
Contributor

Only using epoll_wait when the timeout is greater than a millisecond will cause rounding in timeouts, though. For 200 millisecond timeouts that probably won't matter, but it might for a 1.5 millisecond timeout. So we'd need to pick a cut-off point. On kernel 5.11 and newer, we can use epoll_pwait2, which takes a timespec parameter.

@normanmaurer
Member

Let me put this on my todo list...

normanmaurer self-assigned this Mar 3, 2022
@normanmaurer
Member

@praveen97uma I think this should improve things: #12145

normanmaurer added a commit that referenced this issue Mar 10, 2022
…uts are scheduled

Motivation:

At the moment we might end up calling timerfd_settime every time a new timer is scheduled. This can produce quite some overhead. We should try to reduce the number of syscalls when possible.

Modifications:

- If we are using Linux Kernel >= 5.11 use directly epoll_pwait2(...)
- If the scheduled timeout is big enough just use epoll_wait(...) without timerfd_settime and accept some inaccuracy.

Result:

Fixes #11695
lhotari added a commit to apache/pulsar that referenced this issue Apr 19, 2022
)

* Upgrade Netty to 4.1.76.Final and Netty Tcnative to 2.0.51.Final

Fixes #14015
- release notes https://netty.io/news/2022/04/12/4-1-76-Final.html
  - contains fix for netty/netty#11695

* Upgrade grpc to 1.45.1 and protobuf to 3.19.2

- grpc < 1.45.1 is not compatible with Netty > 4.1.74.Final
  - grpc/grpc-java#9004