net: add UDPMsg, (*UDPConn).ReadUDPMsgs, (*UDPConn).WriteUDPMsgs #45886
Comments
What should the allocation strategy be here? The existing |
At a high level, the caller should supply all the memory. The whole thing should be zero allocations (see also: #43451 and https://golang.org/cl/291509), otherwise the sort of people who'd want to use this probably wouldn't want to use it. Probably pass a slice of messages similar to ipv4.PacketConn.ReadBatch but likely with a slightly different Message. I don't think the Message.Addr net.Addr field is amenable to the midstack inlining optimization from https://golang.org/cl/291509. |
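To make the "caller supplies all the memory" idea concrete, here is a rough sketch, not the proposed API. The `msg` type and the `writeAll`, `readOne`, and `echoDemo` helpers are hypothetical names invented for illustration. It uses only the existing scalar calls `WriteToUDPAddrPort` and `ReadFromUDPAddrPort` (one syscall per datagram), so it shows the buffer-ownership pattern and a possible fallback shape rather than any batching speedup.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"time"
)

// msg is a hypothetical message type (an assumption, not the proposed API).
// All storage is owned by the caller so it can be reused across calls.
type msg struct {
	Buf  []byte         // payload storage supplied by the caller
	Len  int            // number of bytes of Buf in use
	Addr netip.AddrPort // peer address, stored as a value type
}

// writeAll sends each message with one call per message. It reports how
// many messages were sent before the first error.
func writeAll(c *net.UDPConn, ms []msg) (int, error) {
	for i := range ms {
		if _, err := c.WriteToUDPAddrPort(ms[i].Buf[:ms[i].Len], ms[i].Addr); err != nil {
			return i, err
		}
	}
	return len(ms), nil
}

// readOne fills m in place, reusing the caller's m.Buf.
func readOne(c *net.UDPConn, m *msg) error {
	n, addr, err := c.ReadFromUDPAddrPort(m.Buf)
	m.Len, m.Addr = n, addr
	return err
}

// echoDemo sends three one-byte datagrams over loopback and reads them back
// into a single reused msg, returning the received payload bytes.
func echoDemo() (string, error) {
	lo := netip.AddrFrom4([4]byte{127, 0, 0, 1})
	recv, err := net.ListenUDP("udp4", net.UDPAddrFromAddrPort(netip.AddrPortFrom(lo, 0)))
	if err != nil {
		return "", err
	}
	defer recv.Close()
	send, err := net.ListenUDP("udp4", net.UDPAddrFromAddrPort(netip.AddrPortFrom(lo, 0)))
	if err != nil {
		return "", err
	}
	defer send.Close()

	dst := netip.AddrPortFrom(lo, uint16(recv.LocalAddr().(*net.UDPAddr).Port))
	out := make([]msg, 3)
	for i := range out {
		out[i] = msg{Buf: []byte{'a' + byte(i)}, Len: 1, Addr: dst}
	}
	if _, err := writeAll(send, out); err != nil {
		return "", err
	}

	// Avoid hanging forever if a datagram is lost.
	recv.SetReadDeadline(time.Now().Add(2 * time.Second))
	in := msg{Buf: make([]byte, 1500)} // allocated once, reused for every read
	var got []byte
	for range out {
		if err := readOne(recv, &in); err != nil {
			return string(got), err
		}
		got = append(got, in.Buf[:in.Len]...)
	}
	return string(got), nil
}

func main() {
	got, err := echoDemo()
	fmt.Println("received:", got, "err:", err)
}
```

The receive side touches only memory the caller allocated up front. Whether the address handling inside the standard library also avoids allocation is a separate question that this sketch does not answer.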
A possible concrete API:
This API preserves the existing limitation that there is no way to provide |
UDP doesn't have out-of-band data. TCP does. I have no idea why For that matter I'm not sure off hand how to read out-of-band data for a TCP socket using the net package. |
Oh, right. I mean "control" data--the |
Whoops, that led me astray also. But the end result is the same. As you say, the |
UDP sockets can have ancillary data: Setting the |
Ah, OK, thanks. |
We may want to wait on doing anything here until we figure out what to do with IP addresses, which still allocate. |
It is important that the caller be able to specify whether they want to block for N packets, or receive up-to-N packets but only block until at least one is available. On Linux that's accomplished through flags, but we might want a nicer, higher-level way to express that. |
@josharian Is it important to support both? For |
Hmm. Yeah, I think always up-to-N seems fine, at least for my uses. |
There is also a sendmmsg(2): https://man7.org/linux/man-pages/man2/sendmmsg.2.html |
Russ's comment (#45886 (comment)) is already outdated, because we now have a new IP address type: #46518 |
Another use case (that I actively work on) is mass SNMP polling or other types of packet-based polling (polling hundreds of thousands of routers), mostly for their interface metrics. For that I can't exactly create a socket per host or create a pool of sockets to host polling sessions of individual hosts; I have to multiplex packets. Syscalls such as sendmmsg and recvmmsg help by queuing up multiple small packets (500-1000 bytes each). You can further increase performance by using GSO if your kernel allows it. The performance benefits of the above are described here as well: https://blog.cloudflare.com/accelerating-udp-packet-transmission-for-quic/ Currently I am manually creating non-blocking UDP sockets with additional epoll descriptors to pretty much circumvent Go's networking stack for that extra efficiency. That being said, if your host does not have more than a 10G interface then there isn't much point doing these optimizations, as a basic UDP connection will max out the interface unless your packets are extremely small (a few hundred bytes or less). |
I use this heavily from https://github.com/anacrolix/torrent, where inbound UDP over uTP (a UDP-based protocol) is a bottleneck. On Linux I'm able to use https://github.com/anacrolix/mmsg from https://github.com/anacrolix/go-libutp, which I've adapted from golang.org/x/net to fix up some issues there around handling recvmmsg efficiently. |
Updated proposal using netip:
|
This proposal has been added to the active column of the proposals project |
Briefly discussed at proposal review. Given spelling of ReadMsgUDP, we should probably call the type UDPMsg, so that there is only one spelling of "message" in the package. And then to avoid the new term Batch, we could call the methods ReadUDPMsgs and WriteUDPMsgs. Otherwise the semantics look fine (or at least as good as ReadMsgUDP with respect to the flags.) |
Note, Aug 31 2022: Changed type of Buffer and Control from []byte to [][]byte. Updated API (only the names are different):
Does anyone object to adding this API? |
Any objections to #45886 (comment) ? |
Based on the discussion above, this proposal seems like a likely accept. |
No change in consensus, so accepted. 🎉 |
Author of quic-go here 👋 . For obvious reasons, we're very interested in this new API. Can't wait to play around with it! We've been working on reducing allocations this year, and managed to significantly reduce GC pressure. We're now at a point where a large fraction of the remaining allocations happens within the standard library, especially during IP address handling, on both the send and the receive path. I see that the updated proposal is using the I agree with @bradfitz's point that zero-allocation should be an explicit design goal. Being able to do a QUIC transfer that doesn't allocate at all (amortized) would be huge! |
It looks like this is not currently implemented? Can I try this one? |
Nobody is currently working on this, so far as I know. I intend to get to it, but I don't know when that will be and I don't want to block someone else from working on it if they have time. Since this proposal was written, UDP GSO/GRO (generic segmentation offload / generic receive offload) have gained some attention. (I'm not certain to what extent OSs supported these features when we wrote this proposal; either they didn't exist yet, or we didn't know about them.) GSO/GRO have proven highly effective at improving performance of applications with high UDP throughput. For example: https://blog.cloudflare.com/accelerating-udp-packet-transmission-for-quic/ Any implementation of this proposal should take advantage of GSO/GRO when available. If it isn't possible to support GSO/GRO within the API as proposed, then we should go back and redesign it to make it possible. I don't think we should add an implementation of this API that doesn't support GSO/GRO on at least one platform, to validate that we aren't adding an API that is already obsolete. /cc @marten-seemann, who may have opinions on what quic-go needs from the net package here. |
Is it not possible to just enable GSO segmentation with your own custom size as needed via ancillary/control buffers? Worst case it can also be done via setsockopt from syscall package for the UDP socket. Maybe an extra method can be added to UDPConn to "set/enable segmentation". Either way, shouldn't GSO/GRO be a separate feature to be added to UDPConn (which I would highly support) since this feature is about enabling batch-sending and batch-receiving of UDP packets? Also, GSO segmentation may not make sense if you are dealing with variable-sized UDP packets (you would need to pad them with junk bytes to align to the segmentation size and hope receiving end knows how to extract the actual data ignoring the said padding). |
The feature in this issue is about efficient batch sending/receiving of UDP packets. One way to do that is sendmmsg/recvmmsg. Another is GSO/GRO. And the two approaches can of course be combined at the same time. (At the time we wrote this proposal, either GSO/GRO didn't exist, or Brad and I weren't aware of it.) I think that to the extent possible, the A perfect abstraction isn't possible, since GSO/GRO require a contiguous buffer (and GSO in particular requires, I believe, all sent packets to be the same size), but if we're adding new API surface to
Possibly, but if so, I think we should understand what that feature looks like before we commit to the API in this proposal, to make sure the two work together cleanly. And implement both sendmmsg/recvmmsg and GSO/GRO to verify they work together before committing to either implementation alone. |
Thank you very much, I'm trying to work on this, and as far as I can tell from the information I've consulted (which may not be comprehensive) UDP GSO was added to the Linux kernel at v4.18. If you need to use GSO/GRO, the Linux kernel cannot be lower than these versions. How should ReadUDPMsgs and WriteUDPMsgs be handled in kernels below these versions? Also GSO/GRO can improve ReadMsgUDP and WriteMsgUDP as well. I had some problems doing this when I tried to run A lot of constants were removed from And some of the type fields in I am running it through Docker and here is how I am executing it.
docker run -v $(pwd):/work/go -it golang bash
cd /work/go/src/syscall
GOOS=linux GOARCH=amd64 ./mkall.sh
Kernel versions are as follows
|
I believe that GSO/GRO can't be relied on even in recent Linux kernels; it's possible for them to be disabled. I'm afraid I don't know the details, however. There needs to be a graceful fallback when GSO/GRO aren't available. One possibility would be for the A single API for batch send with a graceful fallback would be simplest to use, but perhaps exposing to the user whether a contiguous buffer is required or not would be better. I'm not certain. I think the real work here is going to be in figuring out what the API looks like and how fallback works. |
I would like API for GSO/GRO to be opt-in since it being always forced could make certain UDP workloads un-optimal. Most notably handling batches of packets with varying sizes. Correct me if I am wrong but, AFAIK, GSO segmentation requires each packet to be same size in the super-buffer. |
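As a small illustration of the size constraint being discussed, the sketch below checks whether a batch of packets has the shape Linux UDP GSO expects (every segment the same length, with only the final one allowed to be shorter) and, if so, packs them into one contiguous buffer. `coalesce` is a hypothetical helper, not part of any proposed API, and it models only the length rule. It ignores other kernel limits such as the maximum number of segments.

```go
package main

import "fmt"

// coalesce packs pkts into one contiguous buffer when they fit the GSO
// shape: every packet has the same non-zero length, except the last, which
// may be shorter. It returns ok=false for batches that would need padding
// or separate sends. Hypothetical helper for illustration only.
func coalesce(pkts [][]byte) (buf []byte, segSize int, ok bool) {
	if len(pkts) == 0 || len(pkts[0]) == 0 {
		return nil, 0, false
	}
	size := len(pkts[0])
	for i, p := range pkts {
		last := i == len(pkts)-1
		if len(p) == 0 || len(p) > size || (!last && len(p) != size) {
			return nil, 0, false
		}
		buf = append(buf, p...)
	}
	return buf, size, true
}

func main() {
	uniform := [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("cc")}
	buf, seg, ok := coalesce(uniform)
	fmt.Println(len(buf), seg, ok)

	mixed := [][]byte{[]byte("aaaa"), []byte("cc"), []byte("bbbb")}
	_, _, ok = coalesce(mixed)
	fmt.Println(ok)
}
```

A batch that fails this check is the variable-size workload described above: it would have to be sent with per-packet calls or sendmmsg instead, which is one argument for keeping GSO opt-in.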
During my verification, I learned that GSO is set on the socket fd: after setting a GSO size on the socket, all subsequent sendmmsg and sendmsg calls are affected by that size.
syscall.SetsockoptInt(int(fd), 17, 103, 20) // level 17 = SOL_UDP, option 103 = UDP_SEGMENT, segment size 20
conn.WriteToUDPAddrPort(make([]byte, 100), raddr.AddrPort()) // WriteToUDPAddrPort passes no control data
On a platform with GSO, turning on GSO this way means that all writes need to be adjusted. |
I believe it's also possible to set GSO on a per-sendmsg basis by setting SOL_UDP/UDP_SEGMENT in the cmsg. |
You are right that setting UDP_SEGMENT via cmsg works, so there are a couple of problems that follow.
|
GSO can be tricky to implement in practice. For reference: quic-go/quic-go#3911 |
recvmmsg takes a flags argument; do we need to expose it in this method? https://man7.org/linux/man-pages/man2/recvmmsg.2.html
These flags are different from the flags in the UDPMessage, and they need to be passed in separately.
func (c *UDPConn) ReadUDPMsgs(ms []UDPMsg, flags int) (msgsRead int, err error) |
We don't pass flags for other I think the |
There's an original issue for this at #3661. I've been using https://pkg.go.dev/github.com/anacrolix/mmsg@v1.0.1/socket for this for the last 6-7 years, which exposes an API that uses recvmmsg/sendmmsg if they're available otherwise falls back to the scalar operations. |
(co-written with @neild)
Linux has recvmmsg to read multiple UDP packets from the kernel at once.
There is no Recvmmsg wrapper func in golang.org/x/sys/unix. That's easy enough to add, but it's not ideal: it means all callers of it would be using a thread while blocked waiting for a packet.
There is, however, batch support in golang.org/x/net/ipv{4,6}: e.g. https://pkg.go.dev/golang.org/x/net/ipv4#PacketConn.ReadBatch (added around golang/net@b8b1343). But it has the same thread-blocking problem. And it has the additional problem of having separate packages for IPv4 vs IPv6.
It'd be nicer to integrate with the runtime's poller.
Adding API to do this in the net package would mean both:
For writing, net.Buffers already exists, as does golang.org/x/net/ipv{4,6}'s PacketConn.WriteBatch, so is less important, but could be done for consistency.
As far as a potential API, https://pkg.go.dev/golang.org/x/net/ipv4#PacketConn.ReadBatch is close, but the platform-specific flags should probably not be included, at least as an int. While there's some precedent with https://golang.org/pkg/net/#UDPConn.ReadMsgUDP use of flags int, we could probably use a better type if we need flags for some reason.
Alternatively, if callers of x/sys/unix or x/net/ipv{4,6} could do this efficiently with the runtime poller, that'd also work (even if they'd need to use some build tags, which is probably tolerable for anybody who cares about this).