Open
Labels: NeedsInvestigation, Performance
Description
Reads and writes on net.UDPConns are guarded by a mutex. Contention on the mutex makes it difficult to efficiently handle UDP requests concurrently. Or perhaps I'm overlooking the right way to do this.
The attached benchmark attempts to demonstrate the problem:
socks_test.go.txt
Annotated benchmark results from my desktop:
# All tests are of a server reading 64-byte UDP messages and responding to them.
#
# /echo tests are of an echo server--no processing is done of messages, so
# the test time is entirely spent in socket operations.
#
# /sha tests compute a SHA-256 sum of the input message 50 times, to simulate
# doing a small amount of real work per message.
# The read_1 tests process messages in serial on a single goroutine.
#
# Increasing GOMAXPROCS introduces a minor inefficiency for some reason,
# but these results are largely what you would expect from a non-concurrent server.
BenchmarkUDP/read_1/echo 1000000 8698 ns/op
BenchmarkUDP/read_1/echo-2 1000000 11229 ns/op
BenchmarkUDP/read_1/echo-4 1000000 11873 ns/op
BenchmarkUDP/read_1/sha 200000 29676 ns/op
BenchmarkUDP/read_1/sha-2 200000 30997 ns/op
BenchmarkUDP/read_1/sha-4 200000 35817 ns/op
# The read_n tests start multiple goroutines, each of which reads from
# and writes to a shared UDP socket.
#
# Increasing the number of goroutines causes the server to become slower,
# presumably due to lock contention on the socket.
BenchmarkUDP/read_n/echo 1000000 10201 ns/op
BenchmarkUDP/read_n/echo-2 500000 19274 ns/op
BenchmarkUDP/read_n/echo-4 300000 24263 ns/op
BenchmarkUDP/read_n/sha 200000 29522 ns/op
BenchmarkUDP/read_n/sha-2 200000 41015 ns/op
BenchmarkUDP/read_n/sha-4 200000 58748 ns/op
# The read_1n1 tests start one reader, one writer, and multiple worker goroutines
# connected by channels.
#
# Increasing the number of worker goroutines does not improve performance here either,
# presumably due to lock contention on the channels.
BenchmarkUDP/read_1n1/echo 1000000 11194 ns/op
BenchmarkUDP/read_1n1/echo-2 500000 20991 ns/op
BenchmarkUDP/read_1n1/echo-4 300000 28297 ns/op
BenchmarkUDP/read_1n1/sha 200000 39178 ns/op
BenchmarkUDP/read_1n1/sha-2 200000 45770 ns/op
BenchmarkUDP/read_1n1/sha-4 200000 38197 ns/op
# The read_fake tests just run the work function in a loop without network operations.
# Performance scales mostly linearly with the number of worker goroutines.
BenchmarkUDP/read_fake/echo 2000000000 4.05 ns/op
BenchmarkUDP/read_fake/echo-2 3000000000 2.00 ns/op
BenchmarkUDP/read_fake/echo-4 10000000000 1.02 ns/op
BenchmarkUDP/read_fake/sha 300000 21178 ns/op
BenchmarkUDP/read_fake/sha-2 500000 10691 ns/op
BenchmarkUDP/read_fake/sha-4 1000000 5609 ns/op