
net, internal/poll, runtime: remove mutex from UDP reads/writes #17520

@neild

Reads and writes on net.UDPConns are guarded by a mutex. Contention on the mutex makes it difficult to efficiently handle UDP requests concurrently. Or perhaps I'm overlooking the right way to do this.

The attached benchmark attempts to demonstrate the problem:
socks_test.go.txt
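For readers without the attachment, here is a minimal sketch of the shared-socket pattern in question: several goroutines calling `ReadFromUDP` and `WriteToUDP` on one `*net.UDPConn`. It is not the attached benchmark. The names `serveEcho` and `roundTrips`, the loopback address, and the one-second read deadline are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// serveEcho (hypothetical helper) starts n goroutines that all read from and
// write to the same *net.UDPConn. It returns after conn is closed and every
// goroutine has exited.
func serveEcho(conn *net.UDPConn, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			buf := make([]byte, 64) // per-goroutine buffer for 64-byte messages
			for {
				nr, addr, err := conn.ReadFromUDP(buf)
				if err != nil {
					return // the socket was closed
				}
				// Echo the message back; write errors are ignored in this sketch.
				conn.WriteToUDP(buf[:nr], addr)
			}
		}()
	}
	wg.Wait()
}

// roundTrips (hypothetical helper) runs serveEcho on a loopback socket, sends
// count messages one at a time, and reports how many echoes came back.
func roundTrips(readers, count int) (int, error) {
	srv, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return 0, err
	}
	done := make(chan struct{})
	go func() {
		serveEcho(srv, readers)
		close(done)
	}()
	defer func() {
		srv.Close() // unblocks the readers
		<-done
	}()

	cli, err := net.DialUDP("udp", nil, srv.LocalAddr().(*net.UDPAddr))
	if err != nil {
		return 0, err
	}
	defer cli.Close()

	msg := make([]byte, 64)
	reply := make([]byte, 64)
	got := 0
	for i := 0; i < count; i++ {
		if _, err := cli.Write(msg); err != nil {
			return got, err
		}
		cli.SetReadDeadline(time.Now().Add(time.Second))
		if _, err := cli.Read(reply); err != nil {
			return got, err
		}
		got++
	}
	return got, nil
}

func main() {
	n, err := roundTrips(4, 1000)
	fmt.Println(n, err)
}
```

Every goroutine in `serveEcho` goes through the same connection, so each read and each write has to take that connection's lock. That is the contention the issue describes.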

Annotated benchmark results from my desktop:

# All tests are of a server reading 64-byte UDP messages and responding to them.
#
# /echo tests are of an echo server--no processing is done of messages, so
# the test time is entirely spent in socket operations.
#
# /sha tests compute a SHA-256 sum of the input message 50 times, to simulate
# doing a small amount of real work per message.

# The read_1 tests process messages in serial on a single goroutine.
#
# Increasing GOMAXPROCS introduces a minor inefficiency for some reason,
# but these results are largely what you would expect from a non-concurrent server.
BenchmarkUDP/read_1/echo                 1000000              8698 ns/op
BenchmarkUDP/read_1/echo-2               1000000             11229 ns/op
BenchmarkUDP/read_1/echo-4               1000000             11873 ns/op
BenchmarkUDP/read_1/sha                   200000             29676 ns/op
BenchmarkUDP/read_1/sha-2                 200000             30997 ns/op
BenchmarkUDP/read_1/sha-4                 200000             35817 ns/op

# The read_n tests start multiple goroutines, each of which reads from
# and writes to a shared UDP socket.
#
# Increasing the number of goroutines causes the server to become slower,
# presumably due to lock contention on the socket.
BenchmarkUDP/read_n/echo                 1000000             10201 ns/op
BenchmarkUDP/read_n/echo-2                500000             19274 ns/op
BenchmarkUDP/read_n/echo-4                300000             24263 ns/op
BenchmarkUDP/read_n/sha                   200000             29522 ns/op
BenchmarkUDP/read_n/sha-2                 200000             41015 ns/op
BenchmarkUDP/read_n/sha-4                 200000             58748 ns/op

# The read_1n1 tests start one reader, one writer, and multiple worker goroutines
# connected by channels.
#
# Increasing the number of worker goroutines does not improve performance here either,
# presumably due to lock contention on the channels.
BenchmarkUDP/read_1n1/echo               1000000             11194 ns/op
BenchmarkUDP/read_1n1/echo-2              500000             20991 ns/op
BenchmarkUDP/read_1n1/echo-4              300000             28297 ns/op
BenchmarkUDP/read_1n1/sha                 200000             39178 ns/op
BenchmarkUDP/read_1n1/sha-2               200000             45770 ns/op
BenchmarkUDP/read_1n1/sha-4               200000             38197 ns/op

# The read_fake tests just run the work function in a loop without network operations.
# Performance scales mostly linearly with the number of worker goroutines.
BenchmarkUDP/read_fake/echo             2000000000               4.05 ns/op
BenchmarkUDP/read_fake/echo-2           3000000000               2.00 ns/op
BenchmarkUDP/read_fake/echo-4           10000000000              1.02 ns/op
BenchmarkUDP/read_fake/sha                300000             21178 ns/op
BenchmarkUDP/read_fake/sha-2              500000             10691 ns/op
BenchmarkUDP/read_fake/sha-4             1000000              5609 ns/op
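The read_1n1 layout described above (one reader, one writer, and several workers joined by channels) could look roughly like the sketch below. It is not the attached benchmark. The names `packet`, `shaWork`, `servePipeline`, and `roundTrip`, the channel buffer sizes, and the loopback setup are assumptions for illustration. `shaWork` follows the /sha description literally by hashing the same input 50 times.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"net"
	"sync"
	"time"
)

// packet (hypothetical type) pairs a payload with the peer it belongs to.
type packet struct {
	data []byte
	addr *net.UDPAddr
}

// shaWork computes the SHA-256 sum of msg 50 times and returns the digest.
func shaWork(msg []byte) []byte {
	var sum [sha256.Size]byte
	for i := 0; i < 50; i++ {
		sum = sha256.Sum256(msg)
	}
	return sum[:]
}

// servePipeline reads on the calling goroutine, hands packets to the worker
// goroutines over a channel, and lets a single writer goroutine send replies.
// It returns after conn is closed and the pipeline has drained.
func servePipeline(conn *net.UDPConn, workers int, work func([]byte) []byte) {
	in := make(chan packet, workers)
	out := make(chan packet, workers)

	writerDone := make(chan struct{})
	go func() { // the only goroutine that writes to the socket
		defer close(writerDone)
		for p := range out {
			conn.WriteToUDP(p.data, p.addr) // errors ignored in this sketch
		}
	}()

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range in {
				out <- packet{work(p.data), p.addr}
			}
		}()
	}

	for { // the only goroutine that reads from the socket
		buf := make([]byte, 64) // fresh buffer, since ownership passes to a worker
		n, addr, err := conn.ReadFromUDP(buf)
		if err != nil {
			break // the socket was closed
		}
		in <- packet{buf[:n], addr}
	}
	close(in)
	wg.Wait()
	close(out)
	<-writerDone
}

// roundTrip (hypothetical helper) sends msg to a loopback pipeline server and
// returns the reply.
func roundTrip(workers int, msg []byte) ([]byte, error) {
	srv, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return nil, err
	}
	done := make(chan struct{})
	go func() {
		servePipeline(srv, workers, shaWork)
		close(done)
	}()
	defer func() {
		srv.Close()
		<-done
	}()

	cli, err := net.DialUDP("udp", nil, srv.LocalAddr().(*net.UDPAddr))
	if err != nil {
		return nil, err
	}
	defer cli.Close()
	if _, err := cli.Write(msg); err != nil {
		return nil, err
	}
	cli.SetReadDeadline(time.Now().Add(time.Second))
	reply := make([]byte, 64)
	n, err := cli.Read(reply)
	return reply[:n], err
}

func main() {
	reply, err := roundTrip(4, make([]byte, 64))
	fmt.Printf("%x %v\n", reply, err)
}
```

Only one goroutine touches the socket in each direction here, so the connection's lock is uncontended. The cost is that every packet crosses two channels, which fits the read_1n1 numbers above not improving as workers are added.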

Labels: NeedsInvestigation, Performance