
Use workers instead of spawning goroutines for each incoming DNS request #664

Merged
merged 2 commits into miekg:master from the worker branch on May 9, 2018

Conversation

UladzimirTrehubenka
Contributor

No description provided.


codecov-io commented Apr 2, 2018

Codecov Report

Merging #664 into master will increase coverage by 0.07%.
The diff coverage is 81.13%.


@@            Coverage Diff             @@
##           master     #664      +/-   ##
==========================================
+ Coverage   58.05%   58.12%   +0.07%     
==========================================
  Files          37       37              
  Lines        9999    10026      +27     
==========================================
+ Hits         5805     5828      +23     
- Misses       3144     3149       +5     
+ Partials     1050     1049       -1
Impacted Files   Coverage Δ
server.go        61.78% <81.13%> (+1.78%) ⬆️
dnssec.go        59.1% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01d5935...f0dd4e5.

server.go Outdated
select {
case srv.queueTCPConn <- w:
default:
srv.spawnWorkerTCP()
Collaborator

This will lead to unbounded goroutine growth. Maybe that’s not a problem in reality, but even a small burst of requests would keep a large number of goroutines forever.

Contributor Author

BTW we already have unbounded goroutine growth in the current implementation.

Collaborator

@UladzimirTrehubenka Not strictly true; currently the goroutines will be garbage collected once they've served the request. These ones will be kept around forever.

Consider a constant load of 1 req/s with a sudden burst of 100 req/s for 1 s:

  • Currently one goroutine is created per request, leading to a peak of 100 goroutines that are then garbage collected.
  • This pull request will cause 100 goroutines to be active forever.

The above would be particularly bad if a server were swamped with a very large number of bogus requests, as in a DDoS situation, because the memory can't be reclaimed.

Contributor Author

Added a worker limit (10000) and an idle-worker exit.

server.go Outdated
case srv.queueTCPConn <- w:
default:
srv.spawnWorkerTCP()
srv.queueTCPConn <- w
Collaborator

There’s a race condition here. If another request comes in and reaches the select statement above before this line is reached, it may steal the new worker.

A solution to this is simple: pass w directly to spawnWorkerTCP. If the argument is non-nil, it can be handled before the for range loop.
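
A rough sketch of that suggestion, reusing the names from the snippet above (the worker body is simplified; the real code would also handle the idle-timeout logic discussed further down):

func (srv *Server) spawnWorkerTCP(w *response) {
	go func() {
		if w != nil {
			srv.serve(w) // first handle the request that triggered the spawn
		}
		for w := range srv.queueTCPConn { // then drain the shared queue
			srv.serve(w)
		}
	}()
}

The dispatch site then just calls srv.spawnWorkerTCP(w) in the default: branch, with no follow-up channel send.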

Contributor Author
@UladzimirTrehubenka commented Apr 24, 2018

I don't follow the race concern - this is one thread, and srv.queue is an unbuffered channel, so we cannot reach the select again until we have pushed the job to the queue.

Collaborator

@UladzimirTrehubenka It's a race with a separate request that hits the select. The spawning and sending are not atomic operations as a set.

This isn't a correctness issue, but it could lead to hard-to-diagnose latency spikes.

Collaborator

@UladzimirTrehubenka You're dead right actually, ignore me. I didn't realise this was only ever called from one goroutine.

You could still see an (admittedly small) performance improvement by skipping the channel send as I suggested.

Contributor Author

Done

Owner

miekg commented Apr 24, 2018

Note that I believe the initial impetus for this is that it is faster? I can't tell, because the PR description and the commit description are pretty thin...
The Go standard library http.Server does:

		tempDelay = 0
		c := srv.newConn(rw)
		c.setState(c.rwc, StateNew) // before Serve can return
		go c.serve(ctx)

Which is what we do now: short-lived goroutines handling a request. That we somehow need to bound this... is that because of UDP and spoofing?

There is another issue open that says Go 1.11 will support more socket options, making the increased-speed argument less convincing.

I'll take a short peek at the PR nonetheless.

server.go Outdated
@@ -12,9 +12,6 @@ import (
"time"
)

// Maximum number of TCP queries before we close the socket.
const maxTCPQueries = 128
Owner

So this is not done anymore?

Contributor Author

I don't follow why we need to limit TCP queries. If we have a constant load between two CoreDNS instances, it is very expensive to drop the connection when maxTCPQueries is reached (say, after every 128 packets the first CoreDNS has to reconnect to the second one).

Collaborator

Dropping this limit would be best done as a separate pull request IMO, as it's orthogonal to this.

Contributor Author

Reverted maxTCPQueries - will remove it in a separate PR.

server.go Outdated
srv.spawnWorkerUDP()
}

func (srv *Server) spawnWorkerTCP() {
Owner

You can just call go spawnWorkerTCP; no need to do this inside the function.

Contributor Author

Done

server.go Outdated
// Shutdown handling
lock sync.RWMutex
started bool
}

func (srv *Server) spawnWorkerUDP() {
go func() {
Collaborator

This anonymous func should be a named func so it shows up in panics and is clearer to debug.

Replace it with something like func (srv *Server) workerUDP() { for range ... } and then replace srv.spawnWorkerUDP() with go srv.workerUDP().
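
Roughly the named form being suggested; srv.queueUDPConn is an assumed name for the UDP job channel, by analogy with queueTCPConn above:

// A named method shows up in stack traces and panics as dns.(*Server).workerUDP.
func (srv *Server) workerUDP() {
	for w := range srv.queueUDPConn {
		srv.serve(w)
	}
}

At the call sites, srv.spawnWorkerUDP() then becomes go srv.workerUDP().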

Contributor Author

Done

Collaborator

tmthrgd commented Apr 25, 2018

@UladzimirTrehubenka Do you have any benchmarks for this change? I can think of a few alternative implementations that won't suffer from the unbounded growth problem, but it's hard to compare without numbers.

Collaborator
@tmthrgd left a comment

I happen to really like this approach, but I think it does need to be justified by some benchmarks (gentle ping).

server.go Outdated
func (srv *Server) worker(w *response) {
workersCount := atomic.LoadInt32(&srv.workersCount)
if workersCount > maxWorkersCount {
w.Close()
Collaborator

This should still call srv.serve(w). The goroutine has already been created, there’s no reason to drop this request on the floor.

Contributor Author

So we need to move the worker-count check outside the worker func - otherwise, if we call srv.serve(w), it will work like the original CoreDNS, i.e. spawn a goroutine for each request.

Collaborator

That works too; it depends on whether it should be a hard limit on all goroutines or only a hard limit on live goroutines.

Personally I like this approach more because performance can't be worse than before. If you put it outside the goroutine and remove maxTCPQueries, there will be an exploitable DoS vector.

server.go Outdated
w.Close()
return
}
atomic.AddInt32(&srv.workersCount, 1)
Collaborator

It’s not hugely important, but this has a race with the LoadInt32 above.

This should be written something like:

for {
  count := atomic.LoadInt32(...)
  if count > ... {
    ...
    return
  }
  if atomic.CompareAndSwapInt32(..., count, count + 1) {
    break
  }
}
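
Filled in with the field and constant names used elsewhere in this diff, that loop would look roughly like:

for {
	count := atomic.LoadInt32(&srv.workersCount)
	if count > maxWorkersCount {
		w.Close() // over the limit: don't become a long-lived worker
		return
	}
	// Only one racing caller wins the swap; losers re-read the count,
	// so the limit cannot be overshot.
	if atomic.CompareAndSwapInt32(&srv.workersCount, count, count+1) {
		break
	}
}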

Contributor Author

Done

server.go Outdated
@@ -295,26 +303,76 @@ type Server struct {
DecorateReader DecorateReader
// DecorateWriter is optional, allows customization of the process that writes raw DNS messages.
DecorateWriter DecorateWriter

Collaborator

Leave the newline separating public fields from private ones.

Contributor Author

Done

server.go Outdated
break LOOP
}
count = 0
timeout = time.After(idleWorkerTimeout)
Collaborator

It should be safe to call timeout.Reset(...) here, paired with time.NewTicker (? - from memory), instead of creating a new timer each iteration.

Contributor Author

Done with Ticker - I don't follow the point about Reset().

Collaborator

I was thinking of time.NewTimer + Reset - I didn’t check the docs.

I don't think time.NewTicker is right here, because it drops channel sends on the floor, meaning this would be subtly wrong.
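
For reference, the time.NewTimer variant being suggested might look like this (a sketch only, assuming the queue channel is srv.queue as in the other snippets):

count := 0
timeout := time.NewTimer(idleWorkerTimeout)
defer timeout.Stop()

for {
	select {
	case w := <-srv.queue:
		srv.serve(w)
		count++
	case <-timeout.C:
		if count == 0 {
			return // idle for a full period: let this worker exit
		}
		count = 0
		// Reset is safe here: the timer has just fired, so its channel is drained.
		timeout.Reset(idleWorkerTimeout)
	}
}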

server.go Outdated
timeout = time.After(idleWorkerTimeout)
}
}
atomic.AddInt32(&srv.workersCount, -1)
Collaborator

Use a defer and pair it with the atomic operation above. The overhead of defer is negligible and it keeps correctness clearer.

Contributor Author

Done

server.go Outdated
// ListenAndServe starts a nameserver on the configured address in *Server.
func (srv *Server) ListenAndServe() error {
srv.lock.Lock()
defer srv.lock.Unlock()
if srv.started {
return &Error{err: "server already started"}
}

if srv.Handler == nil {
srv.Handler = DefaultServeMux
Collaborator

I'd move this conditional into srv.serveDNS; there's no need to modify srv here or below.

Contributor Author

I purposely moved this code from serveTCP/serveUDP because it should be called only once, on server start - I totally don't understand why we need to call it on each serveDNS (and another gap in the original code: the handler is passed as an argument instead of using srv.Handler).

Collaborator

It’s not wrong by any means, but it’s very much not idiomatic. Go code like this very rarely mutates public fields. (This is the net/http case: https://golang.org/src/net/http/server.go?s=82393:82459#L2676).

Also a single if statement will have zero effect on performance.

(You don't need to pass it in as an argument like it was before, but you can. It should just go in serveDNS.)
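
In other words, something along these lines inside serveDNS (a sketch only; the actual signature in the PR may differ):

func (srv *Server) serveDNS(w *response) {
	handler := srv.Handler
	if handler == nil {
		handler = DefaultServeMux // fall back without mutating srv.Handler
	}
	// ... unpack the request from w and dispatch it via handler ...
}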

Contributor Author

Done

@johnbelamaric

@tmthrgd agreed it needs benchmarks, but this will also fix a silent crash that happens when too many goroutines are spawned.

server.go Outdated
srv.serve(w)

count := 0
ticker := time.NewTicker(idleWorkerTimeout)
Collaborator

This should be time.NewTimer with ticker.Reset below. time.NewTicker will drop sends on the floor.

Contributor Author

Done

server.go Outdated
default:
}

for {
Collaborator

This doesn't work as well here, and dropping requests on the floor is very much the wrong approach. It leads to trivial DoS attacks against the TCP server in particular, which would be made worse by removing maxTCPQueries.

Either move this back into worker with a serveDNS before w.Close or replace the w.Close below with a blocking send.
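
Sketched against the dispatch code above, the two alternatives would be roughly as follows (names taken from the rest of the diff; tryAddWorker is a hypothetical helper wrapping the CompareAndSwap loop discussed earlier):

// Alternative 1: keep the limit check inside the goroutine, but always
// serve the request first so nothing is dropped.
func (srv *Server) worker(w *response) {
	srv.serve(w) // serve the request that triggered this goroutine
	if !srv.tryAddWorker() { // hypothetical helper: CAS increment, false once maxWorkersCount is reached
		return // over the limit: exit instead of becoming a long-lived worker
	}
	defer atomic.AddInt32(&srv.workersCount, -1)
	// ... long-lived loop draining srv.queue, as above ...
}

// Alternative 2: at the dispatch site, block instead of dropping once the
// worker limit has been reached:
//
//	srv.queue <- w // blocking send: backpressure rather than a dropped request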

Contributor Author
@UladzimirTrehubenka commented Apr 26, 2018

If we move the check into worker(), it means that maxWorkersCount makes no sense: 10000 workers are busy and we continue to spawn goroutines.
We cannot prevent DoS in any case - the difference is that your proposal doesn't prevent resource exhaustion, but the current code does.

Collaborator

This approach has a trivial DoS vector; the existing code doesn't, which is the key point to me. The existing code, and my suggestion to move the check, will scale as far as the host machine will. This code has an artificial hard limit of 10000, which is arbitrary. There are systems which can scale farther than that. The maxWorkersCount would still have value because it would limit the long-lived goroutines.

Some quick numbers: _StackMin is 2KiB meaning 10,000 goroutines uses 2048*10000/2^20 = 19GiB of memory for stack. That’s too large for many systems and far too small for many others.

I like the approach of the check being in the goroutine because it saves the overhead of creating and destroying goroutines, but doesn't limit the server's maximum performance. This one just drops requests on the floor, which feels very wrong to me.

I think conflating resource control with long-lived worker goroutines is a mistake, because they're two distinct issues.

Contributor Author

Done

Contributor Author

BTW 2KiB * 10000 ~ 20 MiB not 20 GiB

Collaborator

Whoops @ my math.

@UladzimirTrehubenka
Contributor Author

Regarding benchmarks - I am not sure that a Go benchmark is useful here.
I am testing CoreDNS with the dnsperf tool.
So far I have observed that the PR's code is about 10% faster with a simple forward config (130K vs 144K qps).
A configuration with the Themis policy plugin shows roughly a 30% improvement (93K vs 125K qps).

Collaborator

tmthrgd commented Apr 27, 2018

@UladzimirTrehubenka Those are exactly the sort of benchmarks I was hoping to see.

@UladzimirTrehubenka
Contributor Author

@miekg just a friendly reminder: could you provide some feedback?

Owner

miekg commented May 4, 2018

@tmthrgd already took a look, so that's good. I want to echo the sentiment that having a benchmark test will help - I haven't had time to follow the entire discussion. Taking a look now.

Owner

miekg commented May 4, 2018

Checked out the branch; I'm seeing the same increase @UladzimirTrehubenka also saw. Pasting here:

./bin/queryperf-linux_amd64 -d domain.lst.queryperf -l 15 -s 127.0.0.1 -p 1043

DNS Query Performance Testing Tool
Version: $Id: queryperf.c,v 1.12 2007/09/05 07:36:04 marka Exp $

[Status] Processing input data
[Status] Sending queries (beginning with 127.0.0.1)
[Status] Testing complete

Statistics:

  Parse input file:     multiple times
  Run time limit:       15 seconds
  Ran through file:     437041 times

  Queries sent:         1311126 queries
  Queries completed:    1311126 queries
  Queries lost:         0 queries
  Queries delayed(?):   0 queries

  RTT max:         	0.008981 sec
  RTT min:              0.000004 sec
  RTT average:          0.000215 sec
  RTT std deviation:    0.000172 sec
  RTT out of range:     0 queries

  Percentage completed: 100.00%
  Percentage lost:        0.00%

  Started at:           Fri May  4 18:45:58 2018
  Finished at:          Fri May  4 18:46:13 2018
  Ran for:              15.000139 seconds

  Queries per second:   87407.590023 qps

3.59s user 8.98s system 15.01s elapsed 83%CPU (make TIME=15 PORT=1043 queryperf)
% make TIME=15 PORT=1043 queryperf
./bin/queryperf-linux_amd64 -d domain.lst.queryperf -l 15 -s 127.0.0.1 -p 1043

DNS Query Performance Testing Tool
Version: $Id: queryperf.c,v 1.12 2007/09/05 07:36:04 marka Exp $

[Status] Processing input data
[Status] Sending queries (beginning with 127.0.0.1)
[Status] Testing complete

Statistics:

  Parse input file:     multiple times
  Run time limit:       15 seconds
  Ran through file:     433578 times

  Queries sent:         1300736 queries
  Queries completed:    1300736 queries
  Queries lost:         0 queries
  Queries delayed(?):   0 queries

  RTT max:         	0.008250 sec
  RTT min:              0.000005 sec
  RTT average:          0.000217 sec
  RTT std deviation:    0.000174 sec
  RTT out of range:     0 queries

  Percentage completed: 100.00%
  Percentage lost:        0.00%

  Started at:           Fri May  4 18:46:15 2018
  Finished at:          Fri May  4 18:46:30 2018
  Ran for:              15.000274 seconds

  Queries per second:   86714.149355 qps

3.24s user 9.19s system 15.02s elapsed 82%CPU (make TIME=15 PORT=1043 queryperf)
% make TIME=15 PORT=1043 queryperf
./bin/queryperf-linux_amd64 -d domain.lst.queryperf -l 15 -s 127.0.0.1 -p 1043

DNS Query Performance Testing Tool
Version: $Id: queryperf.c,v 1.12 2007/09/05 07:36:04 marka Exp $

[Status] Processing input data
[Status] Sending queries (beginning with 127.0.0.1)
[Status] Testing complete

Statistics:

  Parse input file:     multiple times
  Run time limit:       15 seconds
  Ran through file:     536946 times

  Queries sent:         1610839 queries
  Queries completed:    1610839 queries
  Queries lost:         0 queries
  Queries delayed(?):   0 queries

  RTT max:         	0.010821 sec
  RTT min:              0.000006 sec
  RTT average:          0.000154 sec
  RTT std deviation:    0.000153 sec
  RTT out of range:     0 queries

  Percentage completed: 100.00%
  Percentage lost:        0.00%

  Started at:           Fri May  4 18:48:19 2018
  Finished at:          Fri May  4 18:48:34 2018
  Ran for:              15.000164 seconds

  Queries per second:   107388.092557 qps

3.45s user 9.44s system 15.01s elapsed 85%CPU (make TIME=15 PORT=1043 queryperf)
% make TIME=15 PORT=1043 queryperf
./bin/queryperf-linux_amd64 -d domain.lst.queryperf -l 15 -s 127.0.0.1 -p 1043

DNS Query Performance Testing Tool
Version: $Id: queryperf.c,v 1.12 2007/09/05 07:36:04 marka Exp $

[Status] Processing input data
[Status] Sending queries (beginning with 127.0.0.1)
[Status] Testing complete

Statistics:

  Parse input file:     multiple times
  Run time limit:       15 seconds
  Ran through file:     503934 times

  Queries sent:         1511804 queries
  Queries completed:    1511804 queries
  Queries lost:         0 queries
  Queries delayed(?):   0 queries

  RTT max:         	0.011372 sec
  RTT min:              0.000005 sec
  RTT average:          0.000165 sec
  RTT std deviation:    0.000170 sec
  RTT out of range:     0 queries

  Percentage completed: 100.00%
  Percentage lost:        0.00%

  Started at:           Fri May  4 18:48:40 2018
  Finished at:          Fri May  4 18:48:55 2018
  Ran for:              15.000168 seconds

  Queries per second:   100785.804532 qps

3.36s user 9.26s system 15.01s elapsed 84%CPU (make TIME=15 PORT=1043 queryperf)

Owner

miekg commented May 4, 2018

pprof (I don't fully understand it, but the numbers are lower; the second profile is with this change applied):

Showing top 10 nodes out of 160
      flat  flat%   sum%        cum   cum%
    10.44s 17.39% 17.39%     10.92s 18.18%  syscall.Syscall
     3.60s  6.00% 23.38%      6.19s 10.31%  runtime.mallocgc
     2.88s  4.80% 28.18%      2.88s  4.80%  runtime.procyield
     2.70s  4.50% 32.67%      5.41s  9.01%  runtime.pcvalue
     2.19s  3.65% 36.32%      2.71s  4.51%  runtime.step
     2.16s  3.60% 39.92%      2.16s  3.60%  runtime.futex
     2.02s  3.36% 43.28%      2.51s  4.18%  runtime.scanobject
     1.87s  3.11% 46.39%      2.35s  3.91%  syscall.Syscall6
     1.55s  2.58% 48.98%     10.88s 18.12%  runtime.gentraceback
     1.19s  1.98% 50.96%      1.19s  1.98%  runtime.adjustpointers

patched:

Showing top 10 nodes out of 173
      flat  flat%   sum%        cum   cum%
    8670ms 19.81% 19.81%     9000ms 20.56%  syscall.Syscall
    3370ms  7.70% 27.51%     7030ms 16.06%  runtime.mallocgc
    1850ms  4.23% 31.73%     2510ms  5.73%  runtime.scanobject
    1820ms  4.16% 35.89%     1820ms  4.16%  runtime.futex
    1460ms  3.34% 39.23%     1860ms  4.25%  syscall.Syscall6
     930ms  2.12% 41.35%      930ms  2.12%  runtime.heapBitsSetType
     860ms  1.96% 43.32%      940ms  2.15%  github.com/miekg/dns.packDomainName
     840ms  1.92% 45.24%     2880ms  6.58%  github.com/miekg/dns.sprintName
     760ms  1.74% 46.97%      760ms  1.74%  runtime.heapBitsForObject
     640ms  1.46% 48.44%      880ms  2.01%  runtime.lock

Owner

miekg commented May 4, 2018

The diff looks pretty minimal; I don't have any major concerns. @tmthrgd, is this good to go?

Collaborator
@tmthrgd left a comment

This looks sensible and good to me.

server.go Outdated

defer atomic.AddInt32(&srv.workersCount, -1)

count := 0
Collaborator
@tmthrgd commented May 5, 2018

There's no need for an actual counter here. Replace this with a simple bool and set it to true in the <-srv.queue case and to false in the <-timeout.C case.
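
With that change the two cases in the worker loop reduce to something like (sawRequest is just an illustrative name):

case w := <-srv.queue:
	srv.serve(w)
	sawRequest = true
case <-timeout.C:
	if !sawRequest {
		return
	}
	sawRequest = false
	timeout.Reset(idleWorkerTimeout)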

Contributor Author

Done

Collaborator

tmthrgd commented May 5, 2018

@miekg Aside from one small nit, this LGTM. 👍

Owner

miekg commented May 5, 2018 via email

miekg merged commit 98a1ef4 into miekg:master on May 9, 2018
Owner

miekg commented May 9, 2018

probably mark this 1.0.6 soon.

UladzimirTrehubenka deleted the worker branch on May 10, 2018 06:38
Contributor

abh commented Jul 17, 2018

In a simple test of GeoDNS on my Mac, qps went up from ~29k to ~37k with this change; nice work!

(Specifically, I was testing the commit just before this change was merged against whatever is the latest now; and I didn't do anything to keep other things from running at the same time, configure the network stack to make sure it wasn't a bottleneck, etc.)
