net: add UDPMsg, (*UDPConn).ReadUDPMsgs, (*UDPConn).WriteUDPMsgs #45886
What should the allocation strategy be here? The existing …
At a high level, the caller should supply all the memory. The whole thing should be zero allocations (see also: #43451 and https://golang.org/cl/291509), otherwise the sort of people who'd want to use this probably wouldn't want to use it. Probably pass a slice of messages similar to ipv4.PacketConn.ReadBatch but likely with a slightly different Message. I don't think the Message.Addr net.Addr field is amenable to the midstack inlining optimization from https://golang.org/cl/291509.
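For context, a minimal sketch (not the proposed API) of driving the existing golang.org/x/net/ipv4 batch reader with caller-supplied memory; the port, buffer sizes, and batch size are arbitrary:

```go
package main

import (
	"log"
	"net"

	"golang.org/x/net/ipv4"
)

func main() {
	c, err := net.ListenUDP("udp4", &net.UDPAddr{Port: 9999})
	if err != nil {
		log.Fatal(err)
	}
	pc := ipv4.NewPacketConn(c)

	// The caller allocates all message structs and payload buffers up front
	// and reuses them across reads.
	msgs := make([]ipv4.Message, 8)
	for i := range msgs {
		msgs[i].Buffers = [][]byte{make([]byte, 1500)}
	}
	for {
		n, err := pc.ReadBatch(msgs, 0) // up to len(msgs) packets per call (recvmmsg on Linux)
		if err != nil {
			log.Fatal(err)
		}
		for _, m := range msgs[:n] {
			log.Printf("%d bytes from %v", m.N, m.Addr)
		}
	}
}
```

Even with the buffers preallocated, each received Message.Addr is typically a freshly allocated *net.UDPAddr, which is part of the allocation problem being pointed at here.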
A possible concrete API:
This API preserves the existing limitation that there is no way to provide …
UDP doesn't have out-of-band data. TCP does. I have no idea why ReadMsgUDP refers to out-of-band data. For that matter I'm not sure off hand how to read out-of-band data for a TCP socket using the net package.
Oh, right. I mean "control" data--the …
Whoops, that led me astray also. But the end result is the same. As you say, the …
UDP sockets can have ancillary data. Setting the …
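For example, a sketch using the existing golang.org/x/net/ipv4 API (conn is assumed to be an already-open *net.UDPConn; this only illustrates ancillary data, it is not the proposed API):

```go
import (
	"log"
	"net"

	"golang.org/x/net/ipv4"
)

func readWithControl(conn *net.UDPConn) error {
	pc := ipv4.NewPacketConn(conn)
	// Ask the kernel to attach the destination address and interface index
	// to each received packet as control (ancillary) data.
	if err := pc.SetControlMessage(ipv4.FlagDst|ipv4.FlagInterface, true); err != nil {
		return err
	}
	buf := make([]byte, 1500)
	n, cm, src, err := pc.ReadFrom(buf)
	if err != nil {
		return err
	}
	if cm != nil { // nil when the platform doesn't support these options
		log.Printf("%d bytes from %v, delivered to %v on ifindex %d", n, src, cm.Dst, cm.IfIndex)
	}
	return nil
}
```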
Ah, OK, thanks.
We may want to wait on doing anything here until we figure out what to do with IP addresses, which still allocate.
It is important that the caller be able to specify whether they want to block for N packets, or receive up-to-N packets but only block until at least one is available. On Linux that's accomplished through flags, but we might want a nicer, higher-level way to express that.
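For illustration, with the existing x/net batch API that knob is the raw flags argument passed through to recvmmsg; a Linux-only sketch, reusing pc and msgs from the ReadBatch example above, with the constant from golang.org/x/sys/unix:

```go
// MSG_WAITFORONE tells recvmmsg to stop blocking once at least one message
// has arrived, rather than waiting to fill the whole batch. (How much this
// matters also depends on whether the socket is in blocking mode.)
n, err := pc.ReadBatch(msgs, unix.MSG_WAITFORONE)
```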
@josharian Is it important to support both? For …
Hmm. Yeah, I think always up-to-N seems fine, at least for my uses.
There is also a sendmmsg(2): https://man7.org/linux/man-pages/man2/sendmmsg.2.html
Russ's comment (#45886 (comment)) is already outdated because we now have the new IP address types: #46518
Another use-case (that I actively work on) is mass SNMP polling or other kinds of packet-based polling (polling hundreds of thousands of routers), mostly for their interface metrics. For that I can't exactly create a socket per host or create a pool of sockets to host polling sessions of individual hosts; I have to multiplex packets. Syscalls such as sendmmsg and recvmmsg help by queuing up multiple small packets (500-1000 bytes each). You can further increase performance by using GSO if your kernel allows it. The performance benefits of the above can be read about here as well: https://blog.cloudflare.com/accelerating-udp-packet-transmission-for-quic/

Currently I am manually creating non-blocking UDP sockets with additional epoll descriptors to pretty much circumvent Go's networking stack for that extra efficiency. That being said, if your host does not have more than a 10G interface then there isn't much point doing these optimizations, as a basic UDP connection will max out the interface unless your packets are extremely small (a few hundred bytes or less).
I use this heavily from https://github.com/anacrolix/torrent, where inbound UDP over uTP (a UDP-based protocol) is a bottleneck. On Linux I'm able to use https://github.com/anacrolix/mmsg from https://github.com/anacrolix/go-libutp, which I've adapted from golang.org/x/net to fix up some issues there around handling recvmmsg efficiently.
Updated proposal using netip:
This proposal has been added to the active column of the proposals project
Briefly discussed at proposal review. Given the spelling of ReadMsgUDP, we should probably call the type UDPMsg, so that there is only one spelling of "message" in the package. And then to avoid the new term Batch, we could call the methods ReadUDPMsgs and WriteUDPMsgs. Otherwise the semantics look fine (or at least as good as ReadMsgUDP with respect to the flags).
Note, Aug 31 2022: Changed type of Buffer and Control from []byte to [][]byte. Updated API (only the names are different):
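As an illustration of the shape being discussed (only the names UDPMsg/ReadUDPMsgs/WriteUDPMsgs and the [][]byte Buffers and Control fields are stated above; the remaining field names and exact signatures are guesses, not the proposal text):

```go
package net // illustration only

import "net/netip"

// UDPMsg sketch; field names beyond Buffers and Control are guessed from
// the surrounding discussion.
type UDPMsg struct {
	Buffers [][]byte       // packet payload, scattered across caller-supplied buffers
	Control [][]byte       // ancillary (control) data buffers
	Flags   int            // per-message flags, as in ReadMsgUDP
	Addr    netip.AddrPort // remote address
}

// Batch methods, using the names chosen above (signatures guessed):
//
//	func (c *UDPConn) ReadUDPMsgs(ms []UDPMsg) (n int, err error)
//	func (c *UDPConn) WriteUDPMsgs(ms []UDPMsg) (n int, err error)
```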
Does anyone object to adding this API?
It is not clear to me from the proposed documentation what happens if the provided buffers are smaller than the incoming message. I assume it truncates the data, since ReadMsgUDP and recvmsg/recvmmsg all do, but we should explicitly mention the behavior in the documentation.
Why does UDPMsg have [][]byte buffer fields? Is there any downside of using multiple buffers per message?
Sounds like the UDP GSO concern doesn't need to block this API after all.
No change in consensus, so accepted. 🎉
@bradfitz wrote:
Just asking for clarification on this point here; as far as I understand, the …
@matzf I think you are correct.
@rsc It's not entirely clear to me how we are going to reslice Buffers on a read. It'd be much easier to understand and use if we leave the caller's slices alone and report the length separately.
Yes, thanks.
OK, let's add Len int to the struct and just leave the slices alone. That will be a little more work for the caller but it's not any more than in C using iovecs. So the new struct would be:
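Roughly, as an illustration (Len is the only field named explicitly here; the rest is carried over from the sketch above):

```go
// Illustration only, not the accepted definition.
type UDPMsg struct {
	Buffers [][]byte       // left exactly as supplied; never resliced by Read/Write
	Len     int            // number of payload bytes read into (or to be written from) Buffers
	Control [][]byte
	Flags   int
	Addr    netip.AddrPort
}
```

Presumably the control buffers would want an analogous length field, though that isn't spelled out here.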
@database64128 are you implementing this change?
This proposal has been added to the active column of the proposals project
Any objections to #45886 (comment)?
Based on the discussion above, this proposal seems like a likely accept.
No change in consensus, so accepted. 🎉
Author of quic-go here 👋. For obvious reasons, we're very interested in this new API. Can't wait to play around with it!

We've been working on reducing allocations this year, and managed to significantly reduce GC pressure. We're now at a point where a large fraction of the remaining allocations happens within the standard library, especially during IP address handling, on both the send and the receive path. I see that the updated proposal is using the netip types, which should help with exactly that.

I agree with @bradfitz's point that zero-allocation should be an explicit design goal. Being able to do a QUIC transfer that doesn't allocate at all (amortized) would be huge!
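For context, the single-packet variants added alongside net/netip already return the peer address by value, which is what makes a zero-allocation receive path plausible; a sketch (handlePacket stands in for application code):

```go
func readLoop(conn *net.UDPConn) error {
	buf := make([]byte, 1500)
	oob := make([]byte, 128)
	for {
		n, oobn, flags, addr, err := conn.ReadMsgUDPAddrPort(buf, oob)
		if err != nil {
			return err
		}
		// addr is a netip.AddrPort value, so there is no per-packet
		// *net.UDPAddr allocation on this path.
		handlePacket(buf[:n], oob[:oobn], flags, addr)
	}
}
```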
It looks like this is not currently implemented? Can I try this one?
Nobody is currently working on this, so far as I know. I intend to get to it, but I don't know when that will be and I don't want to block someone else from working on it if they have time.

Since this proposal was written, UDP GSO/GRO (generic segmentation offload / generic receive offload) have gained some attention. (I'm not certain to what extent OSs supported these features when we wrote this proposal; either they didn't exist yet, or we didn't know about them.) GSO/GRO have proven highly effective at improving performance of applications with high UDP throughput. For example: https://blog.cloudflare.com/accelerating-udp-packet-transmission-for-quic/

Any implementation of this proposal should take advantage of GSO/GRO when available. If it isn't possible to support GSO/GRO within the API as proposed, then we should go back and redesign it to make it possible. I don't think we should add an implementation of this API that doesn't support GSO/GRO on at least one platform, to validate that we aren't adding an API that is already obsolete.

/cc @marten-seemann, who may have opinions on what quic-go needs from the net package here.
Is it not possible to just enable GSO segmentation with your own custom size as needed via ancillary/control buffers? Worst case, it can also be done via setsockopt from the syscall package for the UDP socket. Maybe an extra method can be added to UDPConn to "set/enable segmentation".

Either way, shouldn't GSO/GRO be a separate feature to be added to UDPConn (which I would highly support), since this feature is about enabling batch-sending and batch-receiving of UDP packets? Also, GSO segmentation may not make sense if you are dealing with variable-sized UDP packets (you would need to pad them with junk bytes to align to the segmentation size and hope the receiving end knows how to extract the actual data, ignoring that padding).
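As a concrete illustration of the setsockopt route (a Linux-only sketch, assuming golang.org/x/sys/unix exposes UDP_SEGMENT for the target platform; not part of this proposal):

```go
import (
	"net"

	"golang.org/x/sys/unix"
)

// enableGSO asks the kernel to segment writes on conn into segSize-byte
// UDP packets (UDP_SEGMENT, available since roughly Linux 4.18).
func enableGSO(conn *net.UDPConn, segSize int) error {
	raw, err := conn.SyscallConn()
	if err != nil {
		return err
	}
	var serr error
	if err := raw.Control(func(fd uintptr) {
		serr = unix.SetsockoptInt(int(fd), unix.IPPROTO_UDP, unix.UDP_SEGMENT, segSize)
	}); err != nil {
		return err
	}
	return serr
}
```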
The feature in this issue is about efficient batch sending/receiving of UDP packets. One way to do that is sendmmsg/recvmmsg. Another is GSO/GRO. And the two approaches can of course be combined at the same time. (At the time we wrote this proposal, either GSO/GRO didn't exist, or Brad and I weren't aware of it.)

I think that, to the extent possible, the new API should abstract over how the batching is actually done. A perfect abstraction isn't possible, since GSO/GRO require a contiguous buffer (and GSO in particular requires, I believe, all sent packets to be the same size), but if we're adding new API surface to …
Possibly, but if so, I think we should understand what that feature looks like before we commit to the API in this proposal, to make sure the two work together cleanly. And implement both sendmmsg/recvmmsg and GSO/GRO to verify they work together before committing to either implementation alone.
(co-written with @neild)
Linux has `recvmmsg` to read multiple UDP packets from the kernel at once. There is no `Recvmmsg` wrapper func in golang.org/x/sys/unix. That's easy enough to add, but it's not ideal: it means all callers of it would be using a thread while blocked waiting for a packet.

There is, however, batch support in golang.org/x/net/ipv{4,6}: e.g. https://pkg.go.dev/golang.org/x/net/ipv4#PacketConn.ReadBatch (added around golang/net@b8b1343). But it has the same thread-blocking problem. And it has the additional problem of having separate packages for IPv4 vs IPv6.
It'd be nicer to integrate with the runtime's poller.
Adding API to do this in the net package would mean both:

- integration with the runtime poller, so a blocked batch read doesn't pin a thread, and
- a single API that works for both IPv4 and IPv6.
For writing, `net.Buffers` already exists, as does golang.org/x/net/ipv{4,6}'s PacketConn.WriteBatch, so the write side is less important, but could be done for consistency.

As far as a potential API, https://pkg.go.dev/golang.org/x/net/ipv4#PacketConn.ReadBatch is close, but the platform-specific flags should probably not be included, at least as an int. While there's some precedent with https://golang.org/pkg/net/#UDPConn.ReadMsgUDP's use of `flags int`, we could probably use a better type if we need flags for some reason.

Alternatively, if callers of x/sys/unix or x/net/ipv{4,6} could do this efficiently with the runtime poller, that'd also work (even if they'd need to use some build tags, which is probably tolerable for anybody who cares about this).
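A minimal sketch of the existing net.Buffers gather-write mentioned above (header, payload, and conn are assumed to already exist; whether the slices go out in a single writev depends on the platform and connection type, otherwise net.Buffers falls back to sequential writes):

```go
// Write the header and payload without first copying them into one slice.
bufs := net.Buffers{header, payload}
if _, err := bufs.WriteTo(conn); err != nil {
	return err
}
```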