
Use workers instead of spawning goroutines for each incoming DNS request #664

Merged
merged 2 commits into miekg:master from the worker branch on May 9, 2018

Conversation

UladzimirTrehubenka
Contributor

No description provided.


codecov-io commented Apr 2, 2018

Codecov Report

Merging #664 into master will increase coverage by 0.07%.
The diff coverage is 81.13%.


@@            Coverage Diff             @@
##           master     #664      +/-   ##
==========================================
+ Coverage   58.05%   58.12%   +0.07%     
==========================================
  Files          37       37              
  Lines        9999    10026      +27     
==========================================
+ Hits         5805     5828      +23     
- Misses       3144     3149       +5     
+ Partials     1050     1049       -1
Impacted Files   Coverage Δ
server.go        61.78% <81.13%> (+1.78%) ⬆️
dnssec.go        59.1% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01d5935...f0dd4e5.

server.go Outdated
select {
case srv.queueTCPConn <- w:
default:
srv.spawnWorkerTCP()
Collaborator

This will lead to unbounded goroutine growth. Maybe that’s not a problem in reality, but even a small burst of requests would keep a large number of goroutines forever.

Contributor Author

BTW we already have unbounded goroutine growth in the current implementation.

Collaborator

@UladzimirTrehubenka Not strictly true; currently the goroutines will be garbage collected once they've served the request. These ones will be kept around forever.

Consider a constant load of 1 req/s with a sudden burst of 100 req/s for 1 s:

  • Currently one goroutine is created per request, leading to a peak of 100 goroutines that are then garbage collected.
  • This pull request will cause 100 goroutines to be active forever.

The above would be particularly bad if a server were swamped with a very large number of bogus requests, as in a DDoS situation, because the memory can't be reclaimed.

Contributor Author

Added a worker limit (10000) and an idle-worker exit.

server.go Outdated
case srv.queueTCPConn <- w:
default:
srv.spawnWorkerTCP()
srv.queueTCPConn <- w
Collaborator

There’s a race condition here. If another request comes in and reaches the select statement above before this line is reached, it may steal the new worker.

A solution to this is simple: pass w directly to spawnWorkerTCP. If the argument is non-nil, it can be handled before the for range loop.
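
A rough sketch of that suggestion, reusing the names from the snippet above (the worker body is simplified; the real code would also handle the idle-timeout logic discussed further down):

func (srv *Server) spawnWorkerTCP(w *response) {
	go func() {
		if w != nil {
			srv.serve(w) // first handle the request that triggered the spawn
		}
		for w := range srv.queueTCPConn { // then drain the shared queue
			srv.serve(w)
		}
	}()
}

The dispatch site then just calls srv.spawnWorkerTCP(w) in the default: branch, with no follow-up channel send.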

Contributor Author
@UladzimirTrehubenka commented Apr 24, 2018

I don't follow the race concern - this is one thread, and srv.queue is an unbuffered channel, so we cannot reach the select again until we have pushed the job to the queue.

Collaborator

@UladzimirTrehubenka It's a race with a separate request that hits the select. The spawning and sending are not atomic operations as a set.

This isn't a correctness issue, but it could lead to hard-to-diagnose latency spikes.

Collaborator

@UladzimirTrehubenka You're dead right actually, ignore me. I didn't realise this was only ever called from one goroutine.

You could still see an (admittedly small) performance improvement by skipping the channel send as I suggested.

Contributor Author

Done

Owner

miekg commented Apr 24, 2018

Note that I believe the initial impetus for this is that it is faster? I can't tell, because the PR description and the commit description are pretty thin...
The Go standard library http.Server does:

		tempDelay = 0
		c := srv.newConn(rw)
		c.setState(c.rwc, StateNew) // before Serve can return
		go c.serve(ctx)

Which is what we do now: short-lived goroutines handling a request. That we somehow need to bound this... is that because of UDP and spoofing?

There is another issue open that says Go 1.11 will support more socket options, making the increased-speed argument less convincing.

I'll take a short peek at the PR nonetheless.

server.go Outdated
@@ -12,9 +12,6 @@ import (
"time"
)

// Maximum number of TCP queries before we close the socket.
const maxTCPQueries = 128
Owner

So this is not done anymore?

Contributor Author

I don't follow why we need to limit TCP queries. If we have a constant load between two CoreDNS instances, it is very expensive to drop the connection when maxTCPQueries is reached (say, after every 128 packets the first CoreDNS has to reconnect to the second one).

Collaborator

Dropping this limit would be best done as a separate pull request IMO, as it's orthogonal to this.

Contributor Author

Reverted maxTCPQueries - will remove it in a separate PR.

server.go Outdated
srv.spawnWorkerUDP()
}

func (srv *Server) spawnWorkerTCP() {
Owner

You can just call go spawnWorkerTCP; no need to do this inside the function.

Contributor Author

Done

server.go Outdated
// Shutdown handling
lock sync.RWMutex
started bool
}

func (srv *Server) spawnWorkerUDP() {
go func() {
Collaborator

This anonymous func should be a named func so it shows up in panics and is clearer to debug.

Replace it with something like func (srv *Server) workerUDP() { for range ... } and then replace srv.spawnWorkerUDP() with go srv.workerUDP().
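
Roughly the named form being suggested; srv.queueUDPConn is an assumed name for the UDP job channel, by analogy with queueTCPConn above:

// A named method shows up in stack traces and panics as dns.(*Server).workerUDP.
func (srv *Server) workerUDP() {
	for w := range srv.queueUDPConn {
		srv.serve(w)
	}
}

At the call sites, srv.spawnWorkerUDP() then becomes go srv.workerUDP().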

Contributor Author

Done

Collaborator

tmthrgd commented Apr 25, 2018

@UladzimirTrehubenka Do you have any benchmarks for this change? I can think of a few alternative implementations that won't suffer from the unbounded growth problem, but it's hard to compare without numbers.

Collaborator
@tmthrgd left a comment

I happen to really like this approach, but I think it does need to be justified by some benchmarks (gentle ping).

server.go Outdated
func (srv *Server) worker(w *response) {
workersCount := atomic.LoadInt32(&srv.workersCount)
if workersCount > maxWorkersCount {
w.Close()
Collaborator

This should still call srv.serve(w). The goroutine has already been created, there’s no reason to drop this request on the floor.

Contributor Author

So we need to move the worker-count check outside the worker func - otherwise, if we call srv.serve(w), it will work like the original CoreDNS, i.e. spawn a goroutine for each request.

Collaborator

That works too; it depends on whether it should be a hard limit on all goroutines or only a hard limit on live goroutines.

Personally I like this approach more because performance can't be worse than before. If you put it outside the goroutine and remove maxTCPQueries, there will be an exploitable DoS vector.

server.go Outdated
w.Close()
return
}
atomic.AddInt32(&srv.workersCount, 1)
Collaborator

It’s not hugely important, but this has a race with the LoadInt32 above.

This should be written something like:

for {
  count := atomic.LoadInt32(...)
  if count > ... {
    ...
    return
  }
  if atomic.CompareAndSwapInt32(..., count, count + 1) {
    break
  }
}
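
Filled in with the field and constant names used elsewhere in this diff, that loop would look roughly like:

for {
	count := atomic.LoadInt32(&srv.workersCount)
	if count > maxWorkersCount {
		w.Close() // over the limit: don't become a long-lived worker
		return
	}
	// Only one racing caller wins the swap; losers re-read the count,
	// so the limit cannot be overshot.
	if atomic.CompareAndSwapInt32(&srv.workersCount, count, count+1) {
		break
	}
}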

Contributor Author

Done

server.go Outdated
@@ -295,26 +303,76 @@ type Server struct {
DecorateReader DecorateReader
// DecorateWriter is optional, allows customization of the process that writes raw DNS messages.
DecorateWriter DecorateWriter

Collaborator

Leave the newline separating public fields from private ones.

Contributor Author

Done

server.go Outdated
break LOOP
}
count = 0
timeout = time.After(idleWorkerTimeout)
Collaborator

It should be safe to call timeout.Reset(...) here, paired with time.NewTicker (? - from memory), instead of creating a new timer each iteration.

Contributor Author

Done with Ticker - I don't follow the point about Reset().

Collaborator

I was thinking of time.NewTimer + Reset - I didn’t check the docs.

I don't think time.NewTicker is right here, because it drops channel sends on the floor, meaning this would be subtly wrong.
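
For reference, the time.NewTimer variant being suggested might look like this (a sketch only, assuming the queue channel is srv.queue as in the other snippets):

count := 0
timeout := time.NewTimer(idleWorkerTimeout)
defer timeout.Stop()

for {
	select {
	case w := <-srv.queue:
		srv.serve(w)
		count++
	case <-timeout.C:
		if count == 0 {
			return // idle for a full period: let this worker exit
		}
		count = 0
		// Reset is safe here: the timer has just fired, so its channel is drained.
		timeout.Reset(idleWorkerTimeout)
	}
}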

server.go Outdated
timeout = time.After(idleWorkerTimeout)
}
}
atomic.AddInt32(&srv.workersCount, -1)
Collaborator

Use a defer and pair it with the atomic operation above. The overhead of defer is negligible and it keeps correctness clearer.

Contributor Author

Done

server.go Outdated
// ListenAndServe starts a nameserver on the configured address in *Server.
func (srv *Server) ListenAndServe() error {
srv.lock.Lock()
defer srv.lock.Unlock()
if srv.started {
return &Error{err: "server already started"}
}

if srv.Handler == nil {
srv.Handler = DefaultServeMux
Collaborator

I'd move this conditional into srv.serveDNS; there's no need to modify srv here or below.

Contributor Author

I purposely moved this code from serveTCP/serveUDP because it should be called only once, on server start - I totally don't understand why we need to call it on each serveDNS (and another gap in the original code: the handler is passed as an argument instead of using srv.Handler).

Collaborator

It’s not wrong by any means, but it’s very much not idiomatic. Go code like this very rarely mutates public fields. (This is the net/http case: https://golang.org/src/net/http/server.go?s=82393:82459#L2676).

Also a single if statement will have zero effect on performance.

(You don't need to pass it in as an argument like it was before, but you can. It should just go in serveDNS.)
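
In other words, something along these lines inside serveDNS (a sketch only; the actual signature in the PR may differ):

func (srv *Server) serveDNS(w *response) {
	handler := srv.Handler
	if handler == nil {
		handler = DefaultServeMux // fall back without mutating srv.Handler
	}
	// ... unpack the request from w and dispatch it via handler ...
}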

Contributor Author

Done

@johnbelamaric

@tmthrgd agreed it needs benchmarks, but this will also fix a silent crash that happens when too many goroutines are spawned.

server.go Outdated
srv.serve(w)

count := 0
ticker := time.NewTicker(idleWorkerTimeout)
Collaborator

This should be time.NewTimer with ticker.Reset below. time.NewTicker will drop sends on the floor.

Contributor Author

Done

server.go Outdated
default:
}

for {
Collaborator

This doesn't work as well here, and dropping requests on the floor is very much the wrong approach. It leads to trivial DoS attacks against the TCP server in particular, which would be made worse by removing maxTCPQueries.

Either move this back into worker with a serveDNS before w.Close or replace the w.Close below with a blocking send.
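
Sketched against the dispatch code above, the two alternatives would be roughly as follows (names taken from the rest of the diff; tryAddWorker is a hypothetical helper wrapping the CompareAndSwap loop discussed earlier):

// Alternative 1: keep the limit check inside the goroutine, but always
// serve the request first so nothing is dropped.
func (srv *Server) worker(w *response) {
	srv.serve(w) // serve the request that triggered this goroutine
	if !srv.tryAddWorker() { // hypothetical helper: CAS increment, false once maxWorkersCount is reached
		return // over the limit: exit instead of becoming a long-lived worker
	}
	defer atomic.AddInt32(&srv.workersCount, -1)
	// ... long-lived loop draining srv.queue, as above ...
}

// Alternative 2: at the dispatch site, block instead of dropping once the
// worker limit has been reached:
//
//	srv.queue <- w // blocking send: backpressure rather than a dropped request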

Contributor Author
@UladzimirTrehubenka commented Apr 26, 2018

If we move the check into worker(), it means that maxWorkersCount makes no sense: 10000 workers are busy and we continue to spawn goroutines.
We cannot prevent DoS in any case - the difference is that your proposal doesn't prevent resource exhaustion, but the current code does.

Collaborator

This approach has a trivial DoS vector; the existing code doesn't, which is the key point to me. The existing code, and my suggestion to move the check, will scale as far as the host machine will. This code has an artificial hard limit of 10000, which is arbitrary. There are systems which can scale farther than that. The maxWorkersCount would still have value because it would limit the long-lived goroutines.

Some quick numbers: _StackMin is 2KiB meaning 10,000 goroutines uses 2048*10000/2^20 = 19GiB of memory for stack. That’s too large for many systems and far too small for many others.

I like the approach of the check being in the goroutine because it saves the overhead of creating and destroying goroutines, but doesn't limit the server's maximum performance. This one just drops requests on the floor, which feels very wrong to me.

I think conflating resource control with long-lived worker goroutines is a mistake, because they're two distinct issues.

Contributor Author

Done

Contributor Author

BTW 2KiB * 10000 ~ 20 MiB not 20 GiB

Collaborator

Whoops @ my math.

@UladzimirTrehubenka
Contributor Author

Regarding benchmarks - I am not sure that a Go benchmark is useful here.
I am testing CoreDNS with the dnsperf tool.
So far I have observed that the PR's code is about 10% faster with a simple forward config (130K vs 144K qps).
A configuration with the Themis policy plugin shows roughly a 30% improvement (93K vs 125K qps).

Collaborator

tmthrgd commented Apr 27, 2018

@UladzimirTrehubenka Those are exactly the sort of benchmarks I was hoping to see.

@UladzimirTrehubenka
Contributor Author

@miekg just a friendly reminder: could you provide some feedback?

Owner

miekg commented May 4, 2018

@tmthrgd already took a look, so that's good. I want to echo the sentiment that having a benchmark test will help - I haven't had time to follow the entire discussion. Taking a look now.

Owner

miekg commented May 4, 2018

Checked out the branch; I'm seeing the same increase @UladzimirTrehubenka also saw. Pasting here:

./bin/queryperf-linux_amd64 -d domain.lst.queryperf -l 15 -s 127.0.0.1 -p 1043

DNS Query Performance Testing Tool
Version: $Id: queryperf.c,v 1.12 2007/09/05 07:36:04 marka Exp $

[Status] Processing input data
[Status] Sending queries (beginning with 127.0.0.1)
[Status] Testing complete

Statistics:

  Parse input file:     multiple times
  Run time limit:       15 seconds
  Ran through file:     437041 times

  Queries sent:         1311126 queries
  Queries completed:    1311126 queries
  Queries lost:         0 queries
  Queries delayed(?):   0 queries

  RTT max:         	0.008981 sec
  RTT min:              0.000004 sec
  RTT average:          0.000215 sec
  RTT std deviation:    0.000172 sec
  RTT out of range:     0 queries

  Percentage completed: 100.00%
  Percentage lost:        0.00%

  Started at:           Fri May  4 18:45:58 2018
  Finished at:          Fri May  4 18:46:13 2018
  Ran for:              15.000139 seconds

  Queries per second:   87407.590023 qps

3.59s user 8.98s system 15.01s elapsed 83%CPU (make TIME=15 PORT=1043 queryperf)
% make TIME=15 PORT=1043 queryperf
./bin/queryperf-linux_amd64 -d domain.lst.queryperf -l 15 -s 127.0.0.1 -p 1043

DNS Query Performance Testing Tool
Version: $Id: queryperf.c,v 1.12 2007/09/05 07:36:04 marka Exp $

[Status] Processing input data
[Status] Sending queries (beginning with 127.0.0.1)
[Status] Testing complete

Statistics:

  Parse input file:     multiple times
  Run time limit:       15 seconds
  Ran through file:     433578 times

  Queries sent:         1300736 queries
  Queries completed:    1300736 queries
  Queries lost:         0 queries
  Queries delayed(?):   0 queries

  RTT max:         	0.008250 sec
  RTT min:              0.000005 sec
  RTT average:          0.000217 sec
  RTT std deviation:    0.000174 sec
  RTT out of range:     0 queries

  Percentage completed: 100.00%
  Percentage lost:        0.00%

  Started at:           Fri May  4 18:46:15 2018
  Finished at:          Fri May  4 18:46:30 2018
  Ran for:              15.000274 seconds

  Queries per second:   86714.149355 qps

3.24s user 9.19s system 15.02s elapsed 82%CPU (make TIME=15 PORT=1043 queryperf)
% make TIME=15 PORT=1043 queryperf
./bin/queryperf-linux_amd64 -d domain.lst.queryperf -l 15 -s 127.0.0.1 -p 1043

DNS Query Performance Testing Tool
Version: $Id: queryperf.c,v 1.12 2007/09/05 07:36:04 marka Exp $

[Status] Processing input data
[Status] Sending queries (beginning with 127.0.0.1)
[Status] Testing complete

Statistics:

  Parse input file:     multiple times
  Run time limit:       15 seconds
  Ran through file:     536946 times

  Queries sent:         1610839 queries
  Queries completed:    1610839 queries
  Queries lost:         0 queries
  Queries delayed(?):   0 queries

  RTT max:         	0.010821 sec
  RTT min:              0.000006 sec
  RTT average:          0.000154 sec
  RTT std deviation:    0.000153 sec
  RTT out of range:     0 queries

  Percentage completed: 100.00%
  Percentage lost:        0.00%

  Started at:           Fri May  4 18:48:19 2018
  Finished at:          Fri May  4 18:48:34 2018
  Ran for:              15.000164 seconds

  Queries per second:   107388.092557 qps

3.45s user 9.44s system 15.01s elapsed 85%CPU (make TIME=15 PORT=1043 queryperf)
% make TIME=15 PORT=1043 queryperf
./bin/queryperf-linux_amd64 -d domain.lst.queryperf -l 15 -s 127.0.0.1 -p 1043

DNS Query Performance Testing Tool
Version: $Id: queryperf.c,v 1.12 2007/09/05 07:36:04 marka Exp $

[Status] Processing input data
[Status] Sending queries (beginning with 127.0.0.1)
[Status] Testing complete

Statistics:

  Parse input file:     multiple times
  Run time limit:       15 seconds
  Ran through file:     503934 times

  Queries sent:         1511804 queries
  Queries completed:    1511804 queries
  Queries lost:         0 queries
  Queries delayed(?):   0 queries

  RTT max:         	0.011372 sec
  RTT min:              0.000005 sec
  RTT average:          0.000165 sec
  RTT std deviation:    0.000170 sec
  RTT out of range:     0 queries

  Percentage completed: 100.00%
  Percentage lost:        0.00%

  Started at:           Fri May  4 18:48:40 2018
  Finished at:          Fri May  4 18:48:55 2018
  Ran for:              15.000168 seconds

  Queries per second:   100785.804532 qps

3.36s user 9.26s system 15.01s elapsed 84%CPU (make TIME=15 PORT=1043 queryperf)

Owner

miekg commented May 4, 2018

pprof (I don't fully understand it, but the numbers are lower; the second profile is with this change applied):

Showing top 10 nodes out of 160
      flat  flat%   sum%        cum   cum%
    10.44s 17.39% 17.39%     10.92s 18.18%  syscall.Syscall
     3.60s  6.00% 23.38%      6.19s 10.31%  runtime.mallocgc
     2.88s  4.80% 28.18%      2.88s  4.80%  runtime.procyield
     2.70s  4.50% 32.67%      5.41s  9.01%  runtime.pcvalue
     2.19s  3.65% 36.32%      2.71s  4.51%  runtime.step
     2.16s  3.60% 39.92%      2.16s  3.60%  runtime.futex
     2.02s  3.36% 43.28%      2.51s  4.18%  runtime.scanobject
     1.87s  3.11% 46.39%      2.35s  3.91%  syscall.Syscall6
     1.55s  2.58% 48.98%     10.88s 18.12%  runtime.gentraceback
     1.19s  1.98% 50.96%      1.19s  1.98%  runtime.adjustpointers

patched:

Showing top 10 nodes out of 173
      flat  flat%   sum%        cum   cum%
    8670ms 19.81% 19.81%     9000ms 20.56%  syscall.Syscall
    3370ms  7.70% 27.51%     7030ms 16.06%  runtime.mallocgc
    1850ms  4.23% 31.73%     2510ms  5.73%  runtime.scanobject
    1820ms  4.16% 35.89%     1820ms  4.16%  runtime.futex
    1460ms  3.34% 39.23%     1860ms  4.25%  syscall.Syscall6
     930ms  2.12% 41.35%      930ms  2.12%  runtime.heapBitsSetType
     860ms  1.96% 43.32%      940ms  2.15%  github.com/miekg/dns.packDomainName
     840ms  1.92% 45.24%     2880ms  6.58%  github.com/miekg/dns.sprintName
     760ms  1.74% 46.97%      760ms  1.74%  runtime.heapBitsForObject
     640ms  1.46% 48.44%      880ms  2.01%  runtime.lock

Owner

miekg commented May 4, 2018

The diff looks pretty minimal; I don't have any major concerns. @tmthrgd, is this good to go?

Collaborator
@tmthrgd left a comment

This looks sensible and good to me.

server.go Outdated

defer atomic.AddInt32(&srv.workersCount, -1)

count := 0
Collaborator
@tmthrgd commented May 5, 2018

There's no need for an actual counter here. Replace this with a simple bool and set it to true in the <-srv.queue case and to false in the <-timeout.C case.
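
With that change the two cases in the worker loop reduce to something like (sawRequest is just an illustrative name):

case w := <-srv.queue:
	srv.serve(w)
	sawRequest = true
case <-timeout.C:
	if !sawRequest {
		return
	}
	sawRequest = false
	timeout.Reset(idleWorkerTimeout)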

Contributor Author

Done

Collaborator

tmthrgd commented May 5, 2018

@miekg Aside from one small nit, this LGTM. 👍

Owner

miekg commented May 5, 2018 via email

miekg merged commit 98a1ef4 into miekg:master on May 9, 2018
Owner

miekg commented May 9, 2018

probably mark this 1.0.6 soon.

UladzimirTrehubenka deleted the worker branch on May 10, 2018 06:38
Contributor

abh commented Jul 17, 2018

In a simple test of GeoDNS on my Mac, qps went up from ~29k to ~37k with this change; nice work!

(Specifically, I was testing the commit just before this change was merged against whatever is the latest now; and I didn't do anything to keep other things from running at the same time, configure the network stack to make sure it wasn't a bottleneck, etc.)
