Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: BSD: DNS go resolver loses first UDP outbound packet if onlink target not in neighbor cache #43398

Open
philpennock opened this issue Dec 28, 2020 · 28 comments

Comments

@philpennock
Copy link

@philpennock philpennock commented Dec 28, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.6 freebsd/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/pdp/.cache/go-build"
GOENV="/home/pdp/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="freebsd"
GOINSECURE=""
GOMODCACHE="/home/pdp/go/pkg/mod"
GONOPROXY="github.com/ConnectEverything"
GONOSUMDB="github.com/ConnectEverything"
GOOS="freebsd"
GOPATH="/home/pdp/go"
GOPRIVATE="github.com/ConnectEverything"
GOPROXY="https://athens.home.pennock.cloud,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/freebsd_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/home/pdp/tmp/go-build876596449=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I have recurring problems with DNS resolution taking 5+ seconds "sometimes", and only for Go software. I finally got lucky and got a tcpdump showing it.

dns_resolve.go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"strconv"
	"time"

	"golang.org/x/net/idna"
)

const EX_USAGE = 64 // per sysexits.h

var Progname string

func Stderr(spec string, args ...interface{}) {
	call := make([]interface{}, 1, 1+len(args))
	call[0] = Progname
	call = append(call, args...)
	fmt.Fprintf(os.Stderr, "%s: "+spec+"\n", call...)
}

func main() {
	Progname = filepath.Base(os.Args[0])
	retval := 0
	defer func() {
		if retval != 0 {
			os.Exit(retval)
		}
	}()

	type resolveDuration struct {
		name     string
		duration time.Duration
	}

	durations := make([]resolveDuration, 0, len(os.Args)) // need _at most_ len(..)-1

	startAll := time.Now()

	res := &net.Resolver{}
	ctx := context.Background()
	count := 0
	for _, hostname := range os.Args[1:] {
		count += 1
		puny, err := idna.ToASCII(hostname)
		if err != nil {
			Stderr("punycode translation of %q failed: %v", hostname, err)
			retval = 1
			continue
		}
		var labelQuoted string
		if puny != hostname {
			labelQuoted = strconv.Quote(hostname) + " [via " + strconv.Quote(puny) + "]"
		} else {
			labelQuoted = strconv.Quote(hostname)
		}
		startThis := time.Now()
		hostlist, err := res.LookupHost(ctx, puny)
		if err != nil {
			Stderr("failed to resolve %s: %v", labelQuoted, err)
			retval = 1
			continue
		}
		durations = append(durations, resolveDuration{name: puny, duration: time.Now().Sub(startThis)})
		fmt.Printf("%s: %s is %+v\n", Progname, labelQuoted, hostlist)
	}
	if count == 0 {
		Stderr("need hostnames to resolve")
		retval = EX_USAGE
	}
	overallDuration := time.Now().Sub(startAll).Round(time.Millisecond)
	Stderr("[%s] overall", overallDuration)
	for _, rd := range durations {
		Stderr("[%s]   %q", rd.duration.Round(time.Millisecond), rd.name)
	}
}
% dns_resolve nats.lan
dns_resolve: "nats.lan" is [192.168.120.160 fd81:1c9e:b379:0:ff:60ff:fe10:e88d]
dns_resolve: [5.039s] overall
dns_resolve: [5.039s]   "nats.lan"

% cat /etc/resolv.conf
# Generated by resolvconf
search lan
nameserver 192.168.120.23
nameserver 192.168.120.24
nameserver 192.168.120.210

All DNS resolvers are fully functional (Unbound 1.13.0), all responding. The .lan zone is served from the resolvers, has an SOA and NS records, has IPv4 and IPv6.

What did you expect to see?

A fraction of a second for net.Resolver.LookupHost() to return A and AAAA records.

What did you see instead?

As you can see in this TCP dump output, the A response came in immediately, but it was 5 seconds before the LookupHost() call issued the AAAA query. There was no retry, the result from the first query was accepted. Go's DNS resolution just wedged for a duration which happens to be equal to the timeout period, before accepting the response.

tcpdump output
# tcpdump -vvvnpi vnet0.37 -Xs0 port 53
tcpdump: listening on vnet0.37, link-type EN10MB (Ethernet), capture size 262144 bytes
00:39:58.588176 IP (tos 0x0, ttl 64, id 24746, offset 0, flags [none], proto UDP (17), length 54)
    192.168.120.241.25371 > 192.168.120.23.53: [udp sum ok] 61515+ A? nats.lan. (26)
        0x0000:  4500 0036 60aa 0000 4011 a7b3 c0a8 78f1  E..6`...@.....x.
        0x0010:  c0a8 7817 631b 0035 0022 8569 f04b 0100  ..x.c..5.".i.K..
        0x0020:  0001 0000 0000 0000 046e 6174 7303 6c61  .........nats.la
        0x0030:  6e00 0001 0001                           n.....
00:39:58.588948 IP (tos 0x0, ttl 64, id 56517, offset 0, flags [none], proto UDP (17), length 70)
    192.168.120.23.53 > 192.168.120.241.25371: [udp sum ok] 61515* q: A? nats.lan. 1/0/0 nats.lan. [1h] A 192.168.120.160 (42)
        0x0000:  4500 0046 dcc5 0000 4011 2b88 c0a8 7817  E..F....@.+...x.
        0x0010:  c0a8 78f1 0035 631b 0032 f95b f04b 8580  ..x..5c..2.[.K..
        0x0020:  0001 0001 0000 0000 046e 6174 7303 6c61  .........nats.la
        0x0030:  6e00 0001 0001 c00c 0001 0001 0000 0e10  n...............
        0x0040:  0004 c0a8 78a0                           ....x.
00:40:03.624669 IP (tos 0x0, ttl 64, id 30198, offset 0, flags [none], proto UDP (17), length 54)
    192.168.120.241.27018 > 192.168.120.24.53: [udp sum ok] 19282+ AAAA? nats.lan. (26)
        0x0000:  4500 0036 75f6 0000 4011 9266 c0a8 78f1  E..6u...@..f..x.
        0x0010:  c0a8 7818 698a 0035 0022 23d8 4b52 0100  ..x.i..5."#.KR..
        0x0020:  0001 0000 0000 0000 046e 6174 7303 6c61  .........nats.la
        0x0030:  6e00 001c 0001                           n.....
00:40:03.625467 IP (tos 0x0, ttl 64, id 8673, offset 0, flags [none], proto UDP (17), length 82)
    192.168.120.24.53 > 192.168.120.241.27018: [udp sum ok] 19282* q: AAAA? nats.lan. 1/0/0 nats.lan. [1h] AAAA fd81:1c9e:b379:0:ff:60ff:fe10:e88d (54)
        0x0000:  4500 0052 21e1 0000 4011 e65f c0a8 7818  E..R!...@.._..x.
        0x0010:  c0a8 78f1 0035 698a 003e ba9d 4b52 8580  ..x..5i..>..KR..
        0x0020:  0001 0001 0000 0000 046e 6174 7303 6c61  .........nats.la
        0x0030:  6e00 001c 0001 c00c 001c 0001 0000 0e10  n...............
        0x0040:  0010 fd81 1c9e b379 0000 00ff 60ff fe10  .......y....`...
        0x0050:  e88d                                     ..
^C
4 packets captured
101 packets received by filter
0 packets dropped by kernel
@philpennock
Copy link
Author

@philpennock philpennock commented Dec 28, 2020

While I can prove this happens with FreeBSD, I think I've seen it with Linux too. I just don't have things hitting DNS in the same way from Linux clients, as regularly, and I can't swear that I've seen it on Linux.

I first saw this behavior with the natscli(1) tool for talking to NATS servers, so I wrote a client which did what I wanted in one process. But I still saw the behavior, and after a lot of head-scratching, I put in a pre-flight DNS resolution and managed to isolate the delays there. I wrote a simplified DNS resolution tool to just do the DNS lookups, with timing. And then late last night I got very lucky and managed to trigger the delay again, with just the simplified tool, while I was running a tcpdump looking for the delay.

I was expecting to see retransmissions. Not to see that the response came back immediately and the client never asked that DNS question again, so it did process and accept the response, eventually.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 28, 2020

What is the contents of /etc/nsswitch.conf on the system where this happens? I'm wondering whether the net package is using the native Go DNS resolver or whether it is calling out to the system DNS resolver by calling the C function getaddrinfo. It might also be interesting to run the program with GODEBUG=netdns=1 in the environment.

@ianlancetaylor ianlancetaylor changed the title DNS stalls for timeout before trying AAAA net: DNS stalls for timeout before trying AAAA Dec 28, 2020
@ianlancetaylor ianlancetaylor added this to the Go1.17 milestone Dec 28, 2020
@philpennock
Copy link
Author

@philpennock philpennock commented Dec 28, 2020

Should have said, sorry: hosts: files dns -- second thing I checked.

$ cat /etc/nsswitch.conf 
#
# nsswitch.conf(5) - name service switch configuration file
# $FreeBSD: releng/11.3/etc/nsswitch.conf 301711 2016-06-09 01:28:44Z markj $
#
group: compat
group_compat: nis
hosts: files dns
netgroup: compat
networks: files
passwd: compat
passwd_compat: nis
shells: files
services: compat
services_compat: nis
protocols: files
rpc: files
@philpennock
Copy link
Author

@philpennock philpennock commented Dec 28, 2020

$ GODEBUG=netdns=1 dns_resolve nats.lan
go package net: dynamic selection of DNS resolver
dns_resolve: "nats.lan" is [192.168.120.160 fd81:1c9e:b379:0:ff:60ff:fe10:e88d]
dns_resolve: [5.035s] overall
dns_resolve: [5.035s]   "nats.lan"
@philpennock
Copy link
Author

@philpennock philpennock commented Dec 28, 2020

I kept periodically trying again with a forced resolver, finally got a 5seconds delay: Go resolver.

0:pdp@olaf-z[14:05](66)~$ GODEBUG=netdns=2+go dns_resolve nats.lan
go package net: GODEBUG setting forcing use of Go's resolver
go package net: hostLookupOrder(nats.lan) = files,dns
dns_resolve: "nats.lan" is [192.168.120.160 fd81:1c9e:b379:0:ff:60ff:fe10:e88d]
dns_resolve: [5.027s] overall
dns_resolve: [5.027s]   "nats.lan"
@philpennock
Copy link
Author

@philpennock philpennock commented Dec 29, 2020

I can now reproduce this more reliably: if the IP address of the first DNS resolver is not in the ARP cache, then after the response once the query goes out, the IP moves into the ARP cache but the Go programs seem to stall instead of processing the DNS response UDP packet in a timely manner.

I can not reproduce this "stalling after clearing ARP cache" with C programs doing DNS work, only with Go programs.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 29, 2020

That's weird. I don't understand why the ARP cache would be involved at all.

5 seconds is the default timeout for waiting for a response from a DNS server. So it seems that Go is not recognizing the DNS server's response for some reason. But also Go normally sends both an A and an AAAA query at the same time, unless single-request appears in the resolv.conf file, which doesn't seem to be the case here. So at this point I have no idea what is going on.

@philpennock
Copy link
Author

@philpennock philpennock commented Dec 29, 2020

Some rebuilding later: if the DNS resolvers in /etc/resolv.conf are given as IPv6 addresses, then the same effect applies if I remove the entries from the NDP cache.

My assumption is that the ARP and NDP caches have to be populated before the outbound query can be passed onto the DNS server, so these should have been populated before the response comes back.

The FreeBSD environments where these tests are running are FreeBSD Jails on a FreeNAS box, with vnet virtualized network stacks connected to the NAS's bridge, which is bridged onto the home network. The NAS and the DNS resolvers are all wired ethernet. IPv6 is ULA with no global connectivity. There's no BPF inside the jail, so I can't tcpdump there; I can try to change that to see if the tcpdump from the host vs from the jail see different packets.


Without a reset ARP cache, I see the Go app send out the AAAA query just before the A query, in what looks like a Happy Eyeballs approach.

So at this point, it looks like if there's no entry in the cache, the first packet is held up, or dropped, before the timeout triggers a retransmit, while the second DNS UDP packet goes through fine.

This can't be a generic "UDP packet dropped when entry not in cache" issue though, since I don't see delays in this scenario with C language DNS tools.

@philpennock
Copy link
Author

@philpennock philpennock commented Dec 29, 2020

Unless the connected UDP socket is in non-blocking mode, so the Go code is getting an -1/EAGAIN response to the UDP send when it's an on-link address with no entry in the appropriate cache, and the Go code is not handling it that EAGAIN?

Rampant speculation, sorry, but it's the only thing which makes any kind of sense to me.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 29, 2020

The Go standard library always uses non-blocking network I/O, so it always gets EAGAIN errors. It seems unlikely that that would be a problem here.

@philpennock
Copy link
Author

@philpennock philpennock commented Dec 29, 2020

With hindsight: the fact that when the AAAA packet is eventually observed in the tcpdump, it's going to the second DNS resolver is a stronger signal that "something was lost" with the original AAAA query. So it's not "stalls for timeout before trying AAAA" as I originally titled this, but "loses first transmitted DNS query and retries to second resolver after timeout".

@philpennock philpennock changed the title net: DNS stalls for timeout before trying AAAA net: DNS native resolver loses first UDP outbound packet if onlink target not in neighbor cache Dec 29, 2020
@philpennock philpennock changed the title net: DNS native resolver loses first UDP outbound packet if onlink target not in neighbor cache net: DNS go resolver loses first UDP outbound packet if onlink target not in neighbor cache Dec 29, 2020
@philpennock
Copy link
Author

@philpennock philpennock commented Dec 29, 2020

I can confirm that with GODEBUG=netdns=2+cgo there are no problems in DNS resolution, this only occurs with GODEBUG=netdns=2+go

@philpennock philpennock changed the title net: DNS go resolver loses first UDP outbound packet if onlink target not in neighbor cache net: BSD: DNS go resolver loses first UDP outbound packet if onlink target not in neighbor cache Dec 29, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 29, 2020

The Go DNS resolver should normally send out queries for both the A and the AAAA records simultaneously, so it's weird that that is not happening.

@philpennock
Copy link
Author

@philpennock philpennock commented Dec 30, 2020

Found it. The parallel transmission is triggering BSD behavior (shared with macOSX, but Go forces cgo DNS resolver on macOSX for other reasons) of dropping all but the most recent UDP packet for a destination on the local network when the destination is not in the ARP cache.

man 4 arp contains:

     ARP caches Internet-Ethernet address mappings.  When an interface
     requests a mapping for an address not in the cache, ARP queues the
     message which requires the mapping and broadcasts a message on the
     associated network requesting the address mapping.  If a response is
     provided, the new mapping is cached and any pending message is
     transmitted.  ARP will queue at most one packet while waiting for a
     response to a mapping request; only the most recently ``transmitted''
     packet is kept.  If the target host does not respond after several
     requests, the host is considered to be down allowing an error to be
     returned to transmission attempts.  Further demand for this mapping
     causes ARP request retransmissions, that are ratelimited to one packet
     per second.  The error is EHOSTDOWN for a non-responding destination
     host, and EHOSTUNREACH for a non-responding router.

So the difference between C programs and Go programs is that most C programs are not doing happy-eyeballs by default, so there's only one concurrent DNS request pending.

Go's DNS resolver is too aggressive for the OS's UDP stack when the DNS resolver is not in the relevant neighbor cache.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 30, 2020

Thanks, nice analysis. So in practice this only matters if the DNS server is on the local network, rather than being reached through a router.

Now we have to figure out what to do about it.

@philpennock
Copy link
Author

@philpennock philpennock commented Dec 30, 2020

On Linux, arp(7) documents the unres_qlen entry for procfs neighbor entries (/proc/sys/net/ipv4/neigh/default/unres_qlen for instance) as being:

The maximum number of packets which may be queued for each unresolved address by other network layers. Defaults to 3.

On my laptop, using wifi, this value appears to have been bumped to 101, for default as well as the wifi interface.

@philpennock
Copy link
Author

@philpennock philpennock commented Dec 30, 2020

I think that the same applies if the DNS resolver is reached through a router, if the router is not currently in the ARP cache.

This seems like something the BSDs should be fixing in a world of Happy Eyeballs. But that doesn't help us now.

In my case, I had a gitolite server using a post-commit hook to publish to NATS; the server runs in a VNET jail, so gitolite is normally the only thing populating the ARP cache, so I'd see 5 second stalls "annoyingly often" when doing a git push. I've worked around it for now, since these are all static hosts and known to my config management system, by publishing static ARP/NDP files with entries for the DNS servers, flagged as permanent entries, so I won't see timeouts here. I can remove this to help test any proposed solutions.

Spitballing:

  1. We could have a half-second timeout as the first step for DNS resolutions and flag "it's alive" when we get a response, and do an early retransmit of DNS for those entries.
  2. the ARP cache is exposed via sysctl MIBs; if net.link.ether.inet.max_age seconds have passed since we last talked to a DNS server or this is our first transmission to it, we could check to see if it's in the ARP cache and pre-warm
  3. Contrary to the above text, further down in the same man-page we find that net.link.ether.inet.maxhold is a sysctl which defaults to 1, so it's possible to raise the limit system-wide. If net.link.ether.inet.maxhold is 1 on BSDs, we could set conf.singleRequest and disable concurrent DNS queries. GODEBUG=netdns could warn that this happened.

Excuse me, I have to go tune a sysctl ...

@philpennock
Copy link
Author

@philpennock philpennock commented Dec 30, 2020

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252278 filed asking the FreeBSD maintainers to raise the default, which is a longer-term mitigation.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 30, 2020

I"m not quite sure that this is a problem to be fixed on the Go side at all. The timeout could also be reduced in /etc/resolv.conf. Or /etc/resolv.conf could set single-request to disable Happy Eyeballs. Or, of course, GODEBUG=netdns=cgo.

@philpennock
Copy link
Author

@philpennock philpennock commented Dec 30, 2020

We build a client tool for multiple OSes, single-binary so env hacks are not viable. We added cross-compilation for FreeBSD and I've been testing the tools on FreeBSD. The stalls, only happening with Go, are significant enough to cause concern. We can't tune system settings for people installing our software. We generally try to avoid having problems which warrant FAQ entries.

So releasing client tools and client libraries, where we can't tune the system or update /etc/resolv.conf we're looking for "what can we do in our Golang code to make the Go binaries behave with acceptable performance". Unfortunately the inability to replace the net resolver or to touch much of its state limits our options here. I believe that building with the netcgo tag would block us from using cross-compilation without a lot of extra framework setup, right? If there were a way to do the opposite of setting net.DefaultResolver.PreferGo then that would work. Or any way of setting the .singleRequest field from Go.

Ideally we'd call a target-os-build-tag provided PlatformSetup function which we'd write, where on FreeBSD we'd grab the integer MIB for maxhold from sysctl and if it's 1, force on SingleRequest.

Failing that, I guess we just document the limitation and ask people to tune their systems.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 30, 2020

As far as I can see this problem is quite unlikely to happen in real situations. It is likely that the DNS server is reached through a router, or is on the router itself; in such a case that router will reliably be in the ARP cache. When the DNS server is on the local network but is not the router, then any name lookup will contact the DNS server, so again the DNS server is likely to be in the ARP cache. So as far as I can tell the problem only occurs in cases that are quite unlikely to occur in practice, for all that they can occur in test setups.

I'm not strongly opposed to fixing this in the Go standard library, but for what seems to me to be an unlikely case I don't see a good argument for adding new API, and I wouldn't want to see a lot of OS-specific code to check the ARP cache.

We could consider making single-request the default for BSD based systems, though that would slightly punish BSD systems for an unlikely case.

@philpennock
Copy link
Author

@philpennock philpennock commented Dec 31, 2020

I did not encounter this in a test setup. This was real.

Any virtualization using a virtualized network stack for the container will encounter it more often. I would expect to see this behavior in most offices, where DNS is not served from the home router, when FreeBSD is in use. FreeNAS is a popular NAS storage OS, so this will affect people running Go software in jails on their NAS.

Of my three spitballed suggestions, number 3 seems the sanest to me. There are already multiple sysctl int retrievals in the net package on BSDs, in sock_bsd.go and sysctl is found in a few other places already. So a start-up sysctl call to check to see if net.link.ether.inet.maxhold is greater than 1, and if not to set single-request seems like the sanest low-impact way to avoid spurious long delays.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 1, 2021

OK, want to send a patch?

@philpennock
Copy link
Author

@philpennock philpennock commented Jan 3, 2021

Sure. I suspect my contributor status might have lapsed, and I'm not at Google, so my 2009-era status might need a refresh? (My golang.org email forwarding was removed when I left, for sure) I can set up gerrit accounts etc, but I'm also happy for you to just take this patch and brush it up. LMK which way you'd like me to go here.

Unfortunately the current tip doesn't install on FreeBSD, broken tests, and I'm not chasing that down right now, so this is a "rough draft" of what I think the solution looks like. I've chased down the manual-pages on the current BSDs to back the claims I make here.

From 916c9766b40249d13e9a3e4e9d02eb954d03b296 Mon Sep 17 00:00:00 2001
From: Phil Pennock <phil@pennock-tech.com>
Date: Sun, 3 Jan 2021 00:03:16 -0500
Subject: [PATCH] Throttle DNS to singleRequest where ARP limited

The concurrent DNS resolution of the Go resolver is not safe when the next-hop
IP address of the DNS resolver is not currently in the neighbor cache (ARP for
IPv4, NDP for IPv6), unless the kernel will retain more than one packet for a
given IP while awaiting ARP/NDP resolution.

On Linux, the default number of packets to hold while waiting is 101 and
there's no problem.  Many OSes still use historical BSD sockets behavior of
limiting this to just 1.  FreeBSD and Darwin use a sysctl to let the limit be
varied, but while Darwin raises the default, FreeBSD currently still defaults
to 1.

If we think the limit is 1 and any of the resolvers are not on a loopback IP,
then implicitly set singleRequest for the DNS resolver.

Resolves #43398
---
 src/net/conf.go            | 27 +++++++++++++++++++++++++++
 src/net/limits.go          | 18 ++++++++++++++++++
 src/net/limits_bsdconst.go | 15 +++++++++++++++
 src/net/limits_bsddyn.go   | 26 ++++++++++++++++++++++++++
 src/net/limits_linux.go    | 32 ++++++++++++++++++++++++++++++++
 src/net/limits_other.go    | 16 ++++++++++++++++
 6 files changed, 134 insertions(+)
 create mode 100644 src/net/limits.go
 create mode 100644 src/net/limits_bsdconst.go
 create mode 100644 src/net/limits_bsddyn.go
 create mode 100644 src/net/limits_linux.go
 create mode 100644 src/net/limits_other.go

diff --git a/src/net/conf.go b/src/net/conf.go
index f1bbfedad0..66bdc3e7e2 100644
--- a/src/net/conf.go
+++ b/src/net/conf.go
@@ -107,6 +107,33 @@ func initConfVal() {
 		confVal.forceCgoLookupHost = true
 	}
 
+	// LookupHost stalls of DNS timeout are seen when the DNS resolver is
+	// reached via a next-hop IP not currently in the ARP/NDP cache if the
+	// kernel only keeps the very latest packet for that IP while waiting for
+	// ARP/NDP resolution.  Our Go resolver concurrency results in performance
+	// loss.  We can't sanely tell if it's safe "right now" without a lot more
+	// intricate ARP parsing and remaining time evaluation, so instead we just
+	// disable concurrent DNS on hosts where the limit is 1.
+	if safeMaxUDP, _ := MaxSafeConcurrentUDP(); safeMaxUDP == 1 {
+		// A common pattern is to have a localhost DNS resolver; it's common
+		// enough to be worth a hail mary check before throttling.
+		allLocal := true
+		for _, ipPortStr := range confVal.resolv.servers {
+			ipStr, _, err := SplitHostPort(ipPortStr)
+			if err != nil {
+				allLocal = false
+				break
+			}
+			if !ParseIP(ipStr).IsLoopback() {
+				allLocal = false
+				break
+			}
+		}
+		if !allLocal {
+			confVal.resolv.singleRequest = true
+		}
+	}
+
 	if _, err := os.Stat("/etc/mdns.allow"); err == nil {
 		confVal.hasMDNSAllow = true
 	}
diff --git a/src/net/limits.go b/src/net/limits.go
new file mode 100644
index 0000000000..63eb1ec8fd
--- /dev/null
+++ b/src/net/limits.go
@@ -0,0 +1,18 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package net
+
+// MaxSafeConcurrentUDP is the safe limit on concurrent UDP packets to a
+// destination when the local next-hop IP is not in the L2 cache (assuming
+// ethernet; ARP for IPv4, NDP for IPv6).  Until the MAC address is known, most
+// OSes will hold "a small number" of packets for a given next-hop and then
+// start discarding earlier packets.  Unfortunately this limit is still 1 on
+// some OSes.
+//
+// The maxSafe return value will be 1 in error scenarios, so should always be a
+// safe value to use as a safe limit.
+func MaxSafeConcurrentUDP() (maxSafe int, ok bool) {
+	return maxSafeConcurrentUDP()
+}
diff --git a/src/net/limits_bsdconst.go b/src/net/limits_bsdconst.go
new file mode 100644
index 0000000000..1f5ca38d74
--- /dev/null
+++ b/src/net/limits_bsdconst.go
@@ -0,0 +1,15 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build dragonfly netbsd openbsd
+
+package net
+
+// Unless and until we know of something like the FreeBSD sysctl, we rely upon
+// the manual-page arp(4) documented limit.
+// As of 2021-01, NetBSD, OpenBSD and DragonFlyBSD all document a static value
+// of 1.
+func maxSafeConcurrentUDP() (maxSafe int, ok bool) {
+	return 1, true
+}
diff --git a/src/net/limits_bsddyn.go b/src/net/limits_bsddyn.go
new file mode 100644
index 0000000000..d5294b406c
--- /dev/null
+++ b/src/net/limits_bsddyn.go
@@ -0,0 +1,26 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build freebsd darwin
+
+package net
+
+import (
+	"syscall"
+)
+
+// On FreeBSD, as of 2021-01 the default untuned value of the sysctl is 1; on a
+// test mac mini running Darwin 17.7.0, the value observed is 16.
+func maxSafeConcurrentUDP() (maxSafe int, ok bool) {
+	var (
+		n   uint32
+		err error
+	)
+	n, err = syscall.SysctlUint32("net.link.ether.inet.maxhold")
+	if n == 0 || err != nil {
+		return 1, false
+	}
+	return int(n), true
+
+}
diff --git a/src/net/limits_linux.go b/src/net/limits_linux.go
new file mode 100644
index 0000000000..c602731e22
--- /dev/null
+++ b/src/net/limits_linux.go
@@ -0,0 +1,32 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+
+package net
+
+func maxSafeConcurrentUDP() (maxSafe int, ok bool) {
+	// For Linux, we return the default value, not the specific value for a
+	// given interface, as being "close enough".  We don't want code sending
+	// packets to destinations to also need to care about which interface the
+	// packets are going out of: the complexity would spread too far up the
+	// APIs to support an edge scenario.  Modern Linux has sane defaults anyway
+	// and is not the reason for this check.
+	fd, err := open("/proc/sys/net/ipv4/neigh/default/unres_qlen")
+	if err != nil {
+		return 1, false
+	}
+	defer fd.close()
+	l, ok := fd.readLine()
+	if !ok {
+		return 1, false
+	}
+	f := getFields(l)
+	n, _, ok := dtoi(f[0])
+	if n == 0 || !ok {
+		return 1, false
+	}
+
+	return n, true
+}
diff --git a/src/net/limits_other.go b/src/net/limits_other.go
new file mode 100644
index 0000000000..1b48e9c0b6
--- /dev/null
+++ b/src/net/limits_other.go
@@ -0,0 +1,16 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !linux
+// +build !darwin,!freebsd,!netbsd,!openbsd
+// +build !dragonfly,!netbsd,!openbsd
+
+package net
+
+func maxSafeConcurrentUDP() (maxSafe int, ok bool) {
+	// Our assumption is that the old BSD socket behavior is the default unless
+	// and until we learn otherwise.  This appears to be true on Windows, at
+	// least.
+	return 1, false
+}
-- 
2.28.0
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 3, 2021

If you're willing to send the change through Gerrit or GitHub, please do that, as it will ensure that we have the right copyright agreement. Thanks.

@philpennock
Copy link
Author

@philpennock philpennock commented Jan 4, 2021

SG. Talking to CEO/COO to double-check CLA status, will get this in Mon/Tue (I expect).

Looks like very old Linux kernels don't have the proc path I reference above, but will have /proc/sys/net/ipv4/neigh/eth0/unres_qlen assuming eth0 exists: it's a missing default interface that bites. Could try eth0 as an appropriate fallback for kernels of that vintage, and if that fails too then on Linux we might make the default value be 3 instead of 1. The "too old, badly needs a rebuild" box I discovered this on has 31 in that file.

@dmgk
Copy link
Contributor

@dmgk dmgk commented Jan 6, 2021

@philpennock
Copy link
Author

@philpennock philpennock commented Jan 6, 2021

@dmgk I didn't see that much detail but it being net/http rings a bell.

$currentEmployer now has an org CLA signed so I expect to get the patch submitted that way today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.