NIO.2 support #2515

Closed
chrisprobst opened this Issue May 26, 2014 · 9 comments

@chrisprobst
Contributor

Hi,

I saw that Netty 4 Final dropped support for AIO. In the book "Netty in Action" I read that it was dropped mainly because it was not faster and because Netty's threading model would make it hard to interoperate, which makes sense.

But maybe I could provide another view, since I'm heavily interested in windows support:

The NIO.1 Selector works using kqueue, epoll and select on pretty much all platforms. Windows, however, has IOCP, which is far better than select (select on Windows has scalability issues). But, of course, IOCP follows the proactor model and does not fit the reactor/selector pattern.

That's why JDK 7 finally brought us AIO, which implements the proactor pattern. The good thing about the proactor pattern is that it abstracts both select-based techniques and IOCP very well.

What this all means is that NIO.1 (Netty =( ) on Windows simply does not scale as well as it could.
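
For readers less familiar with the reactor side of this comparison, here is a minimal sketch of a readiness-based loop using the standard `java.nio.channels.Selector`. The class name `ReactorSketch`, the loopback address and the 64-byte buffer are assumptions made for illustration; this is not Netty's NIO transport. The point to notice is that the buffer is allocated only after the selector reports the channel as readable.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.concurrent.TimeUnit;

public class ReactorSketch {

    /** Sends "ping" over loopback and returns what the selector loop read. */
    static String run() throws Exception {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            // A plain blocking client, just to generate one connection and one message.
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                client.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));

                long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
                while (System.nanoTime() - deadline < 0) {
                    selector.select(1000); // reactor: wait until some channel is ready
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.isAcceptable()) {
                            SocketChannel ch = server.accept();
                            if (ch != null) {
                                ch.configureBlocking(false);
                                ch.register(selector, SelectionKey.OP_READ);
                            }
                        } else if (key.isReadable()) {
                            // The buffer is needed only now, once data is known to be there.
                            ByteBuffer buf = ByteBuffer.allocate(64);
                            SocketChannel ch = (SocketChannel) key.channel();
                            ch.read(buf);
                            ch.close();
                            buf.flip();
                            return StandardCharsets.UTF_8.decode(buf).toString();
                        }
                    }
                }
                throw new IllegalStateException("timed out waiting for data");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

With a proactor API such as IOCP or NIO.2 AIO the order is reversed: the read is started first, with a buffer already attached, and the application is notified when it has completed.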

I do understand the technical consequences of AIO for netty, though. AIO does have limitations, too.

  • First, AIO expects a buffer before anything has actually been read, so if a lot of clients are connected in parallel, the reading must be limited in some way.
  • AIO runs the completion handlers on the given executor. So if the executor has multiple threads, the handlers may be invoked on different threads, which would destroy Netty's wonderful threading model. But the fix is basically simple: every event loop in Netty would have its own AsynchronousChannelGroup backed by a single-thread executor. This should work.

I know that this means some work and I am willing to contribute, but I would like to hear your point of view, maybe I do not have enough knowledge about this topic ;=)

Cheers,
Chris

@trustin
Member
trustin commented May 28, 2014

Hi Chris,

We obviously did not consider Windows a serious platform so far, and that's why we were neglecting the NIO.2 AIO API, which was implemented using IOCP on Windows. (On Linux, it wasn't any faster because it was using the same OS facility: epoll.)

There are two ways to implement your idea:

  1. Implement a new transport (Channel and EventLoop impl) using NIO.2 AIO API
  2. Implement a new transport using JNI

If you are proficient with Win32/C programming, I would recommend the second approach because there's not much point in having another layer of abstraction (NIO.2). As demonstrated by netty-transport-native-epoll, removing an abstraction layer between Netty and OS yields better performance.

If you are inclined to the first solution, you could check out some old pre-release Netty version that ships the AIO transport and start from there. In this case, I would be interested in comparing the performance of the NIO transport and the AIO transport on Windows before merging it.

@trustin trustin added this to the 4.1.0.Final milestone May 28, 2014
@trustin
Member
trustin commented May 28, 2014

Oh, and did I say you are more than welcome to contribute it to the project? :-)

@chrisprobst
Contributor

Hi trustin!

Great, I will look into this issue next weekend. Your point about native vs. NIO abstractions is definitely something I will consider. I will start with NIO.2 and check if it's worth it.

Some stuff I saw so far in the sun.nio.ch.WindowsSelectorImpl:

// Should be INIT_CAP times a power of 2
private final static int MAX_SELECTABLE_FDS = 1024;

// Number of helper threads needed for select. We need one thread per
// each additional set of MAX_SELECTABLE_FDS - 1 channels.
private int threadsCount = 0;

As you can see, the select implementation uses a new helper thread for every additional set of about 1024 sockets. That is not too bad, but because select is O(n) and especially slow on Windows, NIO.2 should bring a noticeable improvement. The bad thing is that I cannot really benchmark it, since I only have a dated Windows laptop. I mainly work on OS X, but Windows is important to me nonetheless.
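
To get a feeling for the numbers, the quoted comment can be turned into a small back-of-the-envelope calculation. This is a reading of the comment, not JDK code: the class `SelectorThreadMath` and the method `helperThreads` are hypothetical, and it assumes that the first set of `MAX_SELECTABLE_FDS - 1` channels is served by the selecting thread itself, so only further sets need a helper.

```java
public class SelectorThreadMath {

    // Value quoted from sun.nio.ch.WindowsSelectorImpl in the post above.
    static final int MAX_SELECTABLE_FDS = 1024;

    /**
     * Estimated number of helper threads for a given number of registered channels.
     * Assumption: the first set of MAX_SELECTABLE_FDS - 1 channels needs no helper.
     */
    static int helperThreads(int channels) {
        if (channels <= 0) {
            return 0;
        }
        int channelsPerSet = MAX_SELECTABLE_FDS - 1;
        return (channels - 1) / channelsPerSet;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_024, 10_000, 100_000}) {
            System.out.println(n + " channels -> " + helperThreads(n) + " helper threads");
        }
    }
}
```

Under these assumptions 10,000 registered channels would come out at 9 helper threads, so the thread count itself stays modest and the main concern remains the cost of each select call.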

@trustin
Member
trustin commented May 28, 2014

Sounds great. I will stay tuned to this thread. Cheers!

@chrisprobst
Contributor

I found some time today to investigate the AIO implementation of Netty 4.0.0.CR3.

It's basically exactly what I was thinking of. The custom implementation of AioEventLoop should be fine, since every continuation is scheduled on one thread: the event loop thread. No useless context switches =).

The AioSocketChannel was a bit complicated, but I think I understand it now. I will test this implementation this evening or probably tomorrow on a Windows network against the select version in terms of latency and CPU overhead (POSIX-style select could drive the CPU crazy on Windows systems).

But my guess is that the AIO implementation will perform quite well. If this is the case, I would kindly ask to take the AIO implementation back into the master branch =). I think it does not make much sense to write another AIO implementation if the existing one is already quite sophisticated ;=).

Or let's put it this way: what disadvantages do you see in supporting AIO? I mean, of course, it adds maintenance costs, which might or might not be worth it.

According to the book the main reasons were:

  • Not faster than NIO (epoll) on Unix systems (which is true)
  • There is no datagram support
  • Unnecessary threading model (too much abstraction without usage)

I agree that AIO will not easily replace NIO, but it is useful for Windows developers nonetheless.

These are my thoughts so far

@trustin
Member
trustin commented May 29, 2014

Sounds good. Looking forward to your pull request. Please do not forget to modify SocketTestPermutation so that the testsuite module tests your transport thoroughly.

@normanmaurer
Member

Sounds good... Thanks!

@chrisprobst
Contributor

Unfortunately I did not get access to the Windows network; the admin was against it, as always 8-|
Because of this I had to approach the issue theoretically, so I spoke to a JVM expert. He assured me that the 1024-sockets-per-selector-thread design is a well-proven and well-tested implementation on Windows and that IOCP is not going to improve much, apart from fewer thread context switches.

I told him that select on Windows is crappy, and he told me that this is no longer true for Windows 7 and 8, which was really interesting for me. He said that it was very true for XP (probably even Vista), but that the network stack got an update in Windows 7, which was new to me. IOCP still has some advantages, but selector-based networking has definitely been capable of handling a lot of throughput since Windows 7, which is nice =).

I do not have any numbers, but the project lead is happy with this now and my priorities have changed, so for me this issue is kind of closed.

@normanmaurer
Member

@chrisprobst @trustin let me close this ...

@trustin trustin modified the milestone: 4.1.0.Beta1, 4.1.0.Final Jul 3, 2014