
Garbage generated in EpollDatagramChannel #8770

Closed
estaban opened this issue Jan 23, 2019 · 6 comments

estaban commented Jan 23, 2019

I am developing a high-performance UDP client on Linux using Netty 4.1 with native transport, and DatagramSocketAddress objects are the main source of allocations in the JVM. The native code behind EpollDatagramChannel creates a DatagramSocketAddress object for each received UDP datagram:

return createDatagramSocketAddress(env, &addr, res, local);

Those allocations are quite heavy: internally, char[], String, Inet4Address, InetAddressHolder, InetSocketAddressHolder, InetAddress[], and byte[] objects are created when constructing each address.

On the client side, if the socket is connected, the address cannot change and does not have to be re-instantiated every time. What do you think about caching the sender address in EpollDatagramChannel and allocating it only if needed? That is what DatagramChannelImpl in the JDK does, which considerably reduces GC:
https://github.com/frohoff/jdk8u-jdk/blob/master/src/windows/native/sun/nio/ch/DatagramChannelImpl.c#L179
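The JDK-style caching suggested above can be sketched in plain Java. Note that SenderAddressCache and addressFor are hypothetical names for illustration only, not JDK or Netty API; the real DatagramChannelImpl does this in native code.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.Arrays;

// Hypothetical sketch: reuse the previously constructed InetSocketAddress
// when the raw sender address of the next datagram is unchanged.
public class SenderAddressCache {
    private byte[] cachedRaw;               // raw IP bytes of the last sender
    private int cachedPort;                 // port of the last sender
    private InetSocketAddress cachedAddress;

    // Returns the cached InetSocketAddress when the sender matches the
    // previous datagram; allocates a new one only when the sender changed.
    public InetSocketAddress addressFor(byte[] raw, int port) throws Exception {
        if (cachedAddress != null && cachedPort == port && Arrays.equals(cachedRaw, raw)) {
            return cachedAddress; // no allocation on the hot path
        }
        cachedRaw = raw.clone();
        cachedPort = port;
        cachedAddress = new InetSocketAddress(InetAddress.getByAddress(raw), port);
        return cachedAddress;
    }
}
```

For a connected socket the fast path is taken on every datagram after the first, so the per-datagram object graph listed above is never built.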

This can easily be reproduced using any UDP echo client/server example configured to use native transport.

Expected behavior

EpollDatagramChannel caches the DatagramSocketAddress object when receiving a datagram

Actual behavior

EpollDatagramChannel allocates one DatagramSocketAddress object per incoming datagram

Steps to reproduce

Simple UDP client configured with native transport

Netty version

4.1

JVM version (e.g. java -version)

openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

OS version (e.g. uname -a)

Ubuntu 16.04 x86_64 kernel 4.15.0-43-generic

@normanmaurer
Member

normanmaurer commented Jan 23, 2019 via email

@estaban
Author

estaban commented Jan 23, 2019

I don't think I will have time to work on this in the next few weeks; I will post an update on this thread when I find some time.

@estaban
Author

estaban commented Jan 24, 2019

When the fix is out and validated, would it be an option to backport it to Netty 4.0, or is development over for that version?

@normanmaurer
Member

@estaban this ship has sailed; 4.0.x has been EOL for a long time.

normanmaurer added a commit that referenced this issue Jan 30, 2019
…connected mode.

Motivation:

In the native code, EpollDatagramChannel / KQueueDatagramChannel creates a DatagramSocketAddress object for each received UDP datagram, even in connected mode, as it uses the recvfrom(...) / recvmsg(...) methods. Creating these is quite heavy in terms of allocations: internally, char[], String, Inet4Address, InetAddressHolder, InetSocketAddressHolder, InetAddress[], and byte[] objects are generated when constructing the object. When in connected mode we can just use regular read(...) calls, which do not need to allocate all of these.

Modifications:

- When in connected mode, use read(...) and NOT recvfrom(...) / recvmsg(...) to reduce allocations where possible.
- Adjust tests to ensure read works as expected when in connected mode.

Result:

Less allocations and GC when using native datagram channels in connected mode. Fixes #8770.
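The read(...) versus recvfrom(...) distinction the commit relies on can be demonstrated with the plain-NIO analogue: a connected DatagramChannel can satisfy a receive with read(...), which reports only a byte count, whereas an unconnected channel's receive(...) must materialize a sender SocketAddress per datagram. A minimal loopback round trip:

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.charset.StandardCharsets;

public class ConnectedReadDemo {
    public static void main(String[] args) throws Exception {
        // Echo peer bound to loopback (stands in for the remote UDP server).
        DatagramChannel server = DatagramChannel.open()
                .bind(new InetSocketAddress("127.0.0.1", 0));
        DatagramChannel client = DatagramChannel.open()
                .bind(new InetSocketAddress("127.0.0.1", 0));

        // Connected mode: the kernel filters by peer, so a plain read(...)
        // suffices and no per-datagram sender address is materialized.
        client.connect(server.getLocalAddress());
        client.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.US_ASCII)));

        // Unconnected mode: receive(...) must return the sender address,
        // which is exactly the per-datagram allocation the fix avoids.
        ByteBuffer in = ByteBuffer.allocate(16);
        SocketAddress sender = server.receive(in);
        in.flip();
        server.send(in, sender);

        ByteBuffer out = ByteBuffer.allocate(16);
        client.read(out); // returns only the byte count
        out.flip();
        System.out.println(new String(out.array(), 0, out.limit(), StandardCharsets.US_ASCII));

        client.close();
        server.close();
    }
}
```

This mirrors what the native transport now does: in connected mode the C code can call read(2) instead of recvfrom(2)/recvmsg(2), so no DatagramSocketAddress needs to be constructed per datagram.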
@normanmaurer
Member

@estaban PTAL #8806. I think this actually fixes it in an easier way. I was also thinking of introducing a ChannelOption so that, when used in connected mode, we would just fire the ByteBuf through the ChannelPipeline and not wrap it in a DatagramPacket at all. This would reduce GC even more. WDYT?
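If such an option were added, the pipeline would deliver the payload ByteBuf directly, so a handler could be parameterized on ByteBuf instead of DatagramPacket. A rough sketch of what that handler might look like (the option itself does not exist at this point, so this is only an assumption about what connected-mode reads would emit):

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;

// Hypothetical handler for the proposed behaviour: in connected mode the
// pipeline would see raw ByteBuf messages, skipping both the
// DatagramPacket wrapper and the per-datagram sender InetSocketAddress.
public class RawByteBufHandler extends SimpleChannelInboundHandler<ByteBuf> {
    @Override
    protected void channelRead0(ChannelHandlerContext ctx, ByteBuf msg) {
        int readable = msg.readableBytes();
        // ... application logic on the datagram payload ...
    }
}
```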

@estaban
Author

estaban commented Jan 30, 2019

@normanmaurer Thanks for the fix, it is much simpler than the one I proposed. When I looked into that option, I was concerned about race conditions between connect calls (potentially several of them?) and reads (using the cached address), but I missed that they are all done on the event loop thread, so that should not be a problem.

Firing a ByteBuf when reading UDP datagrams would be a great idea!

normanmaurer added a commit that referenced this issue Feb 1, 2019
…connected mode. (#8806)

normanmaurer added a commit that referenced this issue Feb 1, 2019
…connected mode. (#8806)