
Goroutine Pool for UDP datagram reads #1416

Closed · wants to merge 3 commits

Conversation

@horahoradev commented Feb 4, 2023

This PR puts UDP server datagram reads into a Goroutine pool, and was made to address this issue within CoreDNS: coredns/coredns#5595 (comment)

Here are some benchmarks of CoreDNS with and without my changes, with one server, using dnsperf.

Without my change:

  Queries sent:         13988687
  Queries completed:    13988288 (100.00%)
  Queries lost:         300 (0.00%)
  Queries interrupted:  99 (0.00%)

  Response codes:       NOERROR 13988288 (100.00%)
  Average packet size:  request 32, response 100
  Run time (s):         105.433637
  Queries per second:   132673.863845

  Average Latency (s):  0.000627 (min 0.000023, max 0.022370)
  Latency StdDev (s):   0.000151

CPU utilization ~420%

With my change:

  Queries sent:         5735429
  Queries completed:    5735336 (100.00%)
  Queries lost:         0 (0.00%)
  Queries interrupted:  93 (0.00%)

  Response codes:       NOERROR 5735336 (100.00%)
  Average packet size:  request 32, response 100
  Run time (s):         29.624372
  Queries per second:   193601.943697

  Average Latency (s):  0.000483 (min 0.000014, max 0.009152)
  Latency StdDev (s):   0.000336

CPU utilization ~560%

So notably, CPU utilization went up significantly (~420% to ~560%), but QPS went up ~46% (132.7k to 193.6k) and average latency dropped (0.63 ms to 0.48 ms). Even normalized per CPU, throughput improved: roughly 31.6k vs 34.6k QPS per 100% of one core.

This is a pretty simple change, but it seems quite impactful. Let me know if there's anything you'd like to see here; from my reading, all affected method calls should be thread-safe (but let me know if I've missed anything!).

@tmthrgd (Collaborator) left a comment:
I’d like to see this with errgroup as I think that’s going to be better here.

@horahoradev (Author) replied:
Thanks for the quick review! I changed the waitgroup/error channel out for an error group.
I opted not to cause the rest of the Goroutine read loops to terminate if one of them returns an error, but let me know if you'd like to see something different here.

@miekg (Owner) commented Apr 27, 2023:

@tmthrgd is this good to merge?

@tmthrgd (Collaborator) commented Apr 29, 2023:

@miekg I'm not a huge fan. Pools and multiple readers are hard to get right in a general sense. What might improve things for one workload could easily worsen things for others. In particular, I know things can get very counterintuitive with high-core-count CPUs.

"strings"
"sync"

"golang.org/x/sync/errgroup"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't formatted properly. Stdlib should be grouped up separately from other imports.

)

From server.go:

  	if isUDP {
  		m, sUDP, err = reader.ReadUDP(lUDP, rtimeout)

@tmthrgd (Collaborator): There's nothing to actually ensure this call unblocks when the errgroup returns with an error. This could easily end up still reading in one goroutine while others have already exited.

From server.go:

  	if cap(m) == srv.UDPSize {
  		srv.udpPool.Put(m[:srv.UDPSize])

  	g := new(errgroup.Group)

@tmthrgd (Collaborator) suggested: var g errgroup.Group

From server.go:

  	g := new(errgroup.Group)

  	for i := 0; i < runtime.NumCPU(); i++ {

@tmthrgd (Collaborator): I don't know if NumCPU is correct here or not.

@tmthrgd (Collaborator) commented Apr 29, 2023:

Maybe we could define some sort of interface for people who want to implement pools of some kind. I'm not sure.

@miekg (Owner) commented Apr 29, 2023 via email.

@horahoradev horahoradev closed this May 1, 2023