net: UDPConn.WriteTo and UDPConn.ReadFromUDP both allocate #43451
Comments
I have a vague impression of pointing this out to @FiloSottile 2 years ago but I don't remember the conclusion of our conversation. CCing in case he has a better recollection. |
I found some old notes. The conclusion from last I looked into this was that the API made it unavoidable. As a result I wound up making direct syscalls on Linux but didn't port that to all platforms. I wonder if this warrants adding a new API. |
Change https://golang.org/cl/280934 mentions this issue: |
IIUC, most sockaddrs will be re-used and they are never mutated. Given that, we could intern them. E.g. we could use a sync.Pool of maps to opportunistically re-use them. (This is the technique that https://github.com/josharian/intern uses for strings; it is fast and pretty good, but not perfect. There are other options, with their own trade-offs, like go4.org/intern.) I threw together CL 280934 to illustrate using the sync.Pool of maps approach. |
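For readers unfamiliar with the technique, here is a minimal sketch of interning through a sync.Pool of maps. It is not the code from CL 280934; `sockaddrKey`, `pool`, and `intern` are hypothetical names, and the struct is only a stand-in for a real sockaddr. Because a sync.Pool may hand back a different map or drop maps entirely, interning is best effort: a repeated key usually, but not always, returns the same pointer.

```go
package main

import (
	"fmt"
	"sync"
)

// sockaddrKey is a hypothetical, comparable stand-in for an IPv4 sockaddr.
type sockaddrKey struct {
	Addr [4]byte
	Port int
}

// pool holds maps from a key to a previously allocated copy of that key.
var pool = sync.Pool{
	New: func() any { return make(map[sockaddrKey]*sockaddrKey) },
}

// intern returns a pointer to a shared copy of k. It allocates only when the
// map it borrowed from the pool has not seen k before. The shared value must
// never be mutated by anyone holding the pointer.
func intern(k sockaddrKey) *sockaddrKey {
	m := pool.Get().(map[sockaddrKey]*sockaddrKey)
	p, ok := m[k]
	if !ok {
		p = new(sockaddrKey) // the only allocation, on a miss
		*p = k
		m[k] = p
	}
	pool.Put(m)
	return p
}

func main() {
	k := sockaddrKey{Addr: [4]byte{127, 0, 0, 1}, Port: 53}
	a, b := intern(k), intern(k)
	// Pointer equality is likely but not guaranteed, so it is only reported.
	fmt.Println("same value:", *a == *b, "same pointer:", a == b)
}
```

The sketch only works because the interned values are treated as immutable; handing the same memory to callers who may write to it would break that assumption.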
Ugh. Nope, that is not safe. We end up exposing the sockaddr memory to the caller in this line (udpsock_posix.go:50): addr = &UDPAddr{IP: sa.Addr[0:], Port: sa.Port} Avoiding that would require making a copy of the We might still be able to intern the sockaddrs that are destined for the kernel at least. If we (a) switched to inet.af/netaddr's IP type and (b) returned a We could do both by letting people provide their own |
Evil idea: if the |
Hyrum's Law says no. (Plus the Go standard library generally doesn't go for such evil tricks, entertaining though they be.) |
Maybe a variant of that might be acceptable: Right now people pass in a buffer of the maximum size of data they want:
data := make([]byte, 1472)
n, addr, err := conn.ReadFromUDP(data)
data = data[:n]
My initial idea was to place the sockaddr allocations in the region of
It occurred to me that other Go APIs sometimes work by taking a slice to append to and then returning a new slice. The reasoning goes that the caller can preallocate by allocating a slice with a large capacity but a zero length, and then the appending operation is free. What if we use a related trick here:
data := make([]byte, 1472, 2000)
n, addr, err := conn.ReadFromUDP(data)
data = data[:n]
In this instance, rather than placing addr at |
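Mmm, looks like that can still lead to unexpected problems:
package main

import (
	"fmt"
)

// doTheAliasingTrick fills slice and returns a pointer into the spare
// capacity just past len(slice), where append writes the extra byte.
func doTheAliasingTrick(slice []byte) *byte {
	for i := range slice {
		slice[i] = 41
	}
	return &append(slice, 42)[len(slice)]
}

func main() {
	data := make([]byte, 1472, 2000)
	x := doTheAliasingTrick(data)
	fmt.Printf("*x = %d\n", *x) // Prints 42
	// A later append by the caller reuses the same spare capacity and
	// silently overwrites the byte that x points to.
	_ = append(data, 43)
	fmt.Printf("*x = %d\n", *x) // Prints 43
} |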
I think you can outline that. It will require some care on the caller side, but if someone needs it they can make sure they don't get in the way of the inliner. |
Change https://golang.org/cl/291390 mentions this issue: |
Change https://golang.org/cl/291509 mentions this issue: |
This commit rewrites ReadFromUDP to be mid-stack inlined and pass a UDPAddr for lower layers to fill in. This lets performance-sensitive clients avoid an allocation. It requires some care on their part to prevent the UDPAddr from escaping, but it is now possible. The UDPAddr trivially does not escape in the benchmark, as it is immediately discarded.

name                  old time/op    new time/op    delta
WriteToReadFromUDP-8    17.2µs ± 6%    17.1µs ± 5%     ~     (p=0.387 n=9+9)

name                  old alloc/op   new alloc/op   delta
WriteToReadFromUDP-8      112B ± 0%       64B ± 0%  -42.86%  (p=0.000 n=10+10)

name                  old allocs/op  new allocs/op  delta
WriteToReadFromUDP-8      3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=10+10)

Updates golang#43451
Co-authored-by: Filippo Valsorda <filippo@golang.org>
Change-Id: I1f9d2ab66bd7e4eff07fe39000cfa0b45717bd13
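The pattern the commit message describes can be sketched roughly as follows. This is an illustration under assumptions, not the code from the CL: `addr`, `conn`, `ReadFrom`, and `readFrom` are hypothetical stand-ins, and the address uses a fixed-size array rather than a net.IP slice so that the sketch has a single allocation to reason about. The exported method is kept tiny so the compiler can inline it into the caller; once inlined, the `&addr{}` belongs to the caller's frame, and escape analysis may keep it on the stack if the caller does not let it escape.

```go
package main

import "fmt"

// addr is a hypothetical stand-in for net.UDPAddr.
type addr struct {
	IP   [4]byte
	Port int
}

// conn is a hypothetical stand-in for net.UDPConn. peer is the address that
// the fake "kernel" reports for every packet.
type conn struct{ peer addr }

// ReadFrom is deliberately a one-line wrapper so that it can be inlined.
// After inlining, the &addr{} below is allocated in the caller, which can
// avoid a heap allocation if the caller keeps the result from escaping.
func (c *conn) ReadFrom(b []byte) (int, *addr, error) {
	return c.readFrom(b, &addr{})
}

// readFrom stands in for the larger function that does the real work. It
// fills in the caller-provided address instead of allocating its own.
func (c *conn) readFrom(b []byte, a *addr) (int, *addr, error) {
	*a = c.peer
	n := copy(b, "hello")
	return n, a, nil
}

func main() {
	c := &conn{peer: addr{IP: [4]byte{127, 0, 0, 1}, Port: 9000}}
	buf := make([]byte, 16)
	n, from, err := c.ReadFrom(buf)
	if err != nil {
		panic(err)
	}
	// Copying the fields out, rather than storing from, keeps it local.
	fmt.Println(n, from.Port)
}
```

Whether the allocation actually disappears depends on the compiler's inlining and escape decisions at each call site; `go build -gcflags=-m` reports them.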
I played with this a bit more. Results: With CL 291509 and those two commits, we would have zero allocs per write and one net.IP backing array allocated per receive (4 or 16 bytes). But those two commits involve new API and duplicating subtle code. :(
The new API is:
// WriterTo returns an io.Writer that writes UDP packets to addr.
// This is more efficient than WriteTo when many packets will be sent to the same addr.
func (c *UDPConn) WriterTo(addr Addr) (io.Writer, error)
I'm happy to propose the API and mail the other change if there's interest, but my working assumption is that they're both non-starters. |
This commit rewrites ReadFromUDP to be mid-stack inlined and pass a UDPAddr for lower layers to fill in. This lets performance-sensitive clients avoid an allocation. It requires some care on their part to prevent the UDPAddr from escaping, but it is now possible. The UDPAddr trivially does not escape in the benchmark, as it is immediately discarded.

name                  old time/op    new time/op    delta
WriteToReadFromUDP-8    17.2µs ± 6%    17.1µs ± 5%     ~     (p=0.387 n=9+9)

name                  old alloc/op   new alloc/op   delta
WriteToReadFromUDP-8      112B ± 0%       64B ± 0%  -42.86%  (p=0.000 n=10+10)

name                  old allocs/op  new allocs/op  delta
WriteToReadFromUDP-8      3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=10+10)

Updates #43451
Co-authored-by: Filippo Valsorda <filippo@golang.org>
Change-Id: I1f9d2ab66bd7e4eff07fe39000cfa0b45717bd13
Reviewed-on: https://go-review.googlesource.com/c/go/+/291509
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Trust: Filippo Valsorda <filippo@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Trust: Jason A. Donenfeld <Jason@zx2c4.com>
Duplicate some code to avoid an interface.

name                  old time/op    new time/op    delta
WriteToReadFromUDP-8    6.38µs ±20%    5.59µs ±10%  -12.38%  (p=0.001 n=10+9)

name                  old alloc/op   new alloc/op   delta
WriteToReadFromUDP-8     64.0B ± 0%     32.0B ± 0%  -50.00%  (p=0.000 n=10+10)

name                  old allocs/op  new allocs/op  delta
WriteToReadFromUDP-8      2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)

Updates golang#43451
Change-Id: Ied15ff92268c652cf445836e0446025eaeb60cc9
tailscale@1074dae removes all allocations for tailscale@4b4fb83 reduces the size of the allocation for Both come at the cost of duplicating a bunch of code, some of it non-trivial. I'm happy to mail for 1.18, if folks aren't too horrified by that. Remaining the last |
I'm not horrified by the amount of code duplication in those CLs. |
Great. I'll plan to mail them soon or early in the 1.18 cycle. And if I forget or don't get to it quickly enough, you (or anyone) have my explicit permission to do so. |
Change https://golang.org/cl/331489 mentions this issue: |
Change https://golang.org/cl/331490 mentions this issue: |
Change https://golang.org/cl/331511 mentions this issue: |
Duplicate some code to avoid an interface.

name                  old time/op    new time/op    delta
WriteToReadFromUDP-8    6.38µs ±20%    5.59µs ±10%  -12.38%  (p=0.001 n=10+9)

name                  old alloc/op   new alloc/op   delta
WriteToReadFromUDP-8     64.0B ± 0%     32.0B ± 0%  -50.00%  (p=0.000 n=10+10)

name                  old allocs/op  new allocs/op  delta
WriteToReadFromUDP-8      2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)

Windows is temporarily stubbed out.

Updates golang#43451
(cherry picked from golang.org/cl/331489)
Change-Id: Ied15ff92268c652cf445836e0446025eaeb60cc9
Switch to concrete types. Bring your own object to fill in. Allocate just enough for the IP byte slice. The allocation is now just 4 bytes for IPv4, which puts it in the tiny allocator, which is much faster.

name                  old time/op    new time/op    delta
WriteToReadFromUDP-8    13.7µs ± 1%    13.4µs ± 2%   -2.49%  (p=0.000 n=10+10)

name                  old alloc/op   new alloc/op   delta
WriteToReadFromUDP-8     32.0B ± 0%      4.0B ± 0%  -87.50%  (p=0.000 n=10+10)

name                  old allocs/op  new allocs/op  delta
WriteToReadFromUDP-8      1.00 ± 0%      1.00 ± 0%     ~     (all equal)

Windows is temporarily stubbed out.

Updates golang#43451
(cherry picked from golang.org/cl/331490)
Change-Id: Ief506f891b401d28715d22dce6ebda037941924e
This brings the optimizations added in CLs 331489 and 331490 to Windows.

Updates golang#43451
(cherry picked from golang.org/cl/331511)
Change-Id: I75cf520050325d9eb5c2785d6d8677cc864fcac8
Duplicate some code to avoid an interface.

name                  old time/op    new time/op    delta
WriteToReadFromUDP-8    6.38µs ±20%    5.59µs ±10%  -12.38%  (p=0.001 n=10+9)

name                  old alloc/op   new alloc/op   delta
WriteToReadFromUDP-8     64.0B ± 0%     32.0B ± 0%  -50.00%  (p=0.000 n=10+10)

name                  old allocs/op  new allocs/op  delta
WriteToReadFromUDP-8      2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=10+10)

Windows is temporarily stubbed out.

Updates #43451
Change-Id: Ied15ff92268c652cf445836e0446025eaeb60cc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/331489
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Trust: Damien Neil <dneil@google.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Switch to concrete types. Bring your own object to fill in. Allocate just enough for the IP byte slice. The allocation is now just 4 bytes for IPv4, which puts it in the tiny allocator, which is much faster.

name                  old time/op    new time/op    delta
WriteToReadFromUDP-8    13.7µs ± 1%    13.4µs ± 2%   -2.49%  (p=0.000 n=10+10)

name                  old alloc/op   new alloc/op   delta
WriteToReadFromUDP-8     32.0B ± 0%      4.0B ± 0%  -87.50%  (p=0.000 n=10+10)

name                  old allocs/op  new allocs/op  delta
WriteToReadFromUDP-8      1.00 ± 0%      1.00 ± 0%     ~     (all equal)

Windows is temporarily stubbed out.

Updates #43451
Change-Id: Ief506f891b401d28715d22dce6ebda037941924e
Reviewed-on: https://go-review.googlesource.com/c/go/+/331490
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Trust: Damien Neil <dneil@google.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Damien Neil <dneil@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
This brings the optimizations added in CLs 331489 and 331490 to Windows.

Updates #43451
Change-Id: I75cf520050325d9eb5c2785d6d8677cc864fcac8
Reviewed-on: https://go-review.googlesource.com/c/go/+/331511
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
I'd like to be able to write a program that uses UDPConn.WriteTo and UDPConn.ReadFromUDP without allocating per-packet.
This benchmark indicates one alloc per WriteTo and two allocs per ReadFromUDP.
Two of the allocs come from constructing syscall.Sockaddrs. Maybe this is fixable, but I don't see an easy way.
The last alloc is from constructing a *UDPAddr to return from ReadFromUDP. I fear the API may make this one unavoidable.
cc @bradfitz @danderson @zx2c4
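The benchmark referred to above is not reproduced in this thread. As a rough sketch of how such a measurement could be taken, the program below counts allocations for one WriteTo plus one ReadFromUDP on a loopback socket that sends to itself. `measureAllocs` is a hypothetical helper, the counts it reports depend on the Go version in use, and it assumes that binding a UDP socket on 127.0.0.1 is permitted.

```go
package main

import (
	"fmt"
	"net"
	"testing"
	"time"
)

// measureAllocs reports the average number of heap allocations for one
// WriteTo followed by one ReadFromUDP on a socket that talks to itself.
func measureAllocs(runs int) (float64, error) {
	conn, err := net.ListenUDP("udp4", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return 0, err
	}
	defer conn.Close()

	// One deadline for the whole run, so a lost packet cannot hang forever.
	if err := conn.SetDeadline(time.Now().Add(5 * time.Second)); err != nil {
		return 0, err
	}

	addr := conn.LocalAddr() // resolved once, outside the measured loop
	buf := make([]byte, 8)
	var loopErr error
	allocs := testing.AllocsPerRun(runs, func() {
		if _, err := conn.WriteTo(buf, addr); err != nil {
			loopErr = err
			return
		}
		if _, _, err := conn.ReadFromUDP(buf); err != nil {
			loopErr = err
		}
	})
	return allocs, loopErr
}

func main() {
	allocs, err := measureAllocs(1000)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("allocs per WriteTo+ReadFromUDP: %v\n", allocs)
}
```

The returned *UDPAddr is discarded immediately here, so this measures the best case for a caller that never retains the address.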