net: better DNS transport for builtin DNS stub resolver #23866
Split from #18588. The existing DNS stub resolver uses per-query DNS transport that would cause socket descriptor exhaustion. It might be better to have some mechanism for DNS transport to reduce the use of socket descriptors. For UDP-based DNS transport, it might be the use of shared UDP endpoints. For TCP-based DNS transport, it might be the use of connection pooling.
This was referenced
Feb 16, 2018
referenced this issue
Feb 27, 2018
referenced this issue
Jun 26, 2018
I have code written to implement sharing sockets for DNS. The change ended up being more significant than I first realized and I wanted to discuss the new code before asking for codereview.
This is my first codereview request and so I'm sorry if this description is too detailed and long. Can I ask for a codereview based on this description?
The following two public options are added to Resolver:
For implementation, the Resolver has a private map dnsConns and a mutex dnsConnsMu. The map key is the (network, server) tuple.
The type dnsConn has the following mutable fields:
r.exchange now calls a general dnsRoundTrip() function, in a new dns_conn.go file. It handles the r.dial(), Conn.Read/Readfull, conn.Write, and conn.Close functions instead of exchange, dnsPacketRoundTrip and dnsStreamRoundTrip functions.
dnsRoundTrip locks the dnsConns map. If the *dnsConn does not exist for (network, server), then the slice is created and a connection is opened with openRefs of 1. If *dnsConn exists for (network, server), dnsRoundTrip checks each existing dnsConn for one with openRefs < MaxConnsPerSocket. Also, hasErr cannot be true to use the dnsConn. If dnsRoundTrip gets to end of slice, a new dnsConn is created.
The newDnsConn function starts three different go routines: open, read and loop. There is also one go routine created for each individual request, to handle specific context and timeout. There are many channels for the routines to communicate to each other:
For each request, dnsRoundTrip puts the request information on the AddIdCh, which is received by Loop. There is a callback in dnsConnAddIdParam and dnsRoundTrip blocks each request dnsRoundTrip receives on this callback channel. The callback channel will receiver (dnsmessage.Parser, dnsmessage.Header, error) struct, including errors from opening connection.
hasOpenedCh is first sent on by the Open go routine, after r.dial either returns a Conn or an error. Read and Loop go routines each take the open results and put it back. Read takes the Open result once to get the Conn and exits if there was an open error. Loop, when it handles the AddIdCh, will block until it receives on hasOpenedCh. Loop puts the result back into the channel. If there is an Open error, Loop sends the error on the callback channel to dnsRoundTrip. If there is not an open error, Loop gets a unique ID, adds the ID to the request map and writes the DNS message on the wire.
stopReadCh is a callback channel sent from Loop to Read back to Loop. When Loop is closing the connection, Loop uses the stopReadCh channel to ensure the Read go routine closes.
readErrorCh goes from Read to Loop in case of an error from c.Read. The channel is buffered and so does not block. Read then blocks until it receives on stopReadCh. Loop handles readErrorCh by sending the read error back to all dnsRoundTrip callbacks still listening.
readCh goes from Read to Loop is only for the byte slice from successful Conn.Read or Conn.ReadFull. Loop handles readCh by first parsing only the Header. For both udp and tcp connections, the loop ignores parsing errors. Before, only bad udp responses ignored parsing errors. I needed to add tcp connections as well since a tcp connection could send a correct response for a request timed out. If the request ID is found and it matches the question for that ID, the results are sent on the addIdCh callback and the go routine for the request's timer is cancelled.
reqTimedOutCh, reqCancelledCh and reqTimerExitCh are sent from go routines for each request to Loop. reqTimedOutCh is sent on when timer.C returns, after the DNS request's timeout. reqCancelledCh is to handle ctx.Done for the specific request. There is also a buffered cancelTimer for every specific request, which Loop sends on for a successful request. reqTimerExitCh is sent from request go routine to Loop for every request go routine exit. Only after receiving reqTimerExitCh does Loop delete the ID.
closeCh goes from dnsRoundTrip to the loop. When dnsRoundTrip receives the result back from callback sent on addIdCh, the dnsConnsMu mutex is first locked. Then openRefs is decremented and if dnsRoundTrip receives an error, sets hasErr to true.
If openRefs is 0 and hasErr is true, dnsRoundTrip sends on closeCh and dnsConn is deleted from dnsConns. If openRefs is 0 but Conn is not in error, timer.AfterFunc is called. The passed function checks locks dnsConnsMu. The passed function sends on closeCh and deletes dnsConn if all three are true: dnsConn still exists, openRefs is 0 and lastNewConn does not indicate a new connection after when timer.AfterFunc was called. Loop's closeCh handler makes sure the Read and request timer go routines all close and then Loop is exited.
For testing, along with some new tests, fakeDNSConn had a channel added from the Write function to Read function. The previous tests assumed Write would be called before Read. dnsConn likely starts Read before the first Write. So Read for fakeDNSConn blocks until it receives a parsed question from Write. Write starts a go routine to write on that channel but not block.
Two existing tests needed to be substantially changed. TestIgnoreDNSForgeries called dnsPacketRoundTrip directly. LookupHost was called instead, with the fake connection still sending two bad responses and one good one for every request. TestStrictErrorsLookupIP had an issue with flaky failures on its last test case. The test sent a correct response for the A request and a timeout error for the AAAA request. If the dnsConn Loop received the error for the AAAA request first, it would return the timeout error for both requests. The rh function now checks that the correct A request was returned before the timeout for the AAAA request, which is how a DNS resolver which ignores AAAA records would act.
++ on addressing the root cause here. I believe that I've run into this as well with a highly concurrent system making many HTTP requests. Some of the requests are to the same API endpoint and we see results where one connection works and another fails with
This is on macOS where the default per-process file descriptor limit is 256. Increasing the limit resolved the problem.