net: DNS broken on darwin without cgo (1.13 regression) #31705

Open
bradfitz opened this issue Apr 26, 2019 · 18 comments

@bradfitz
Member

commented Apr 26, 2019

I was testing some new code for the Go build system and found that a simple TCP dial doesn't work on Mac anymore, at least when the binary is cross-compiled.

Code is just:

var coordDialer = &net.Dialer{
        Timeout:   10 * time.Second,
        KeepAlive: 15 * time.Second,
}

// dialCoordinatorTCP returns a TCP connection to the coordinator, making
// a CONNECT request to a proxy as a fallback.
func dialCoordinatorTCP(ctx context.Context, addr string) (net.Conn, error) {
        tcpConn, err := coordDialer.DialContext(ctx, "tcp", addr)

... with a context.Background() for ctx.

It always times out after 10 seconds.

But if I redeploy the same code but built with Go 1.12.x, it works fine.

@bradfitz

Member Author

commented Apr 26, 2019

My best guess is f6b42a5 ("net: use libSystem bindings for DNS resolution on macos if cgo is unavailable").

We might need some more test coverage. Or a no-cgo darwin builder.

/cc @ianlancetaylor @grantseltzer

@bradfitz changed the title from "net: buildlet doesn't work on darwin-amd64 with Go master" to "net: cross-compiled cgo-less buildlet doesn't work on darwin-amd64 with Go master" on Apr 26, 2019

@groob

Contributor

commented Apr 28, 2019

FWIW I can't seem to reproduce with a binary compiled on a 10.14.4 mac. I tested building with cgo disabled and setting GODEBUG=netdns=go.

@bradfitz

Member Author

commented Apr 29, 2019

I built on Linux and ran on a Mac, without setting any special environment variables.

@grantseltzer

Contributor

commented Apr 29, 2019

Could this be because when you cross compile on Linux the linker doesn't have access to libSystem?

I'm not sure how this is done for every other binding when cross compiling.

Also, this is with CGO enabled, netcgo not specified, cross compiled for darwin on linux?

@bradfitz

Member Author

commented Apr 29, 2019

@bradfitz

Member Author

commented Apr 30, 2019

We just disabled cgo support for darwin/386 (per #31751) so we now have a CGO_ENABLED=0 Mac builder, which now hits this issue. Which is good in that we can reproduce it.

Looks like it's stuck in DNS queries, so f6b42a5 looks implicated.

https://build.golang.org/log/289a154e730768cccbc64dd0ea2af16b4b48db88

ok  	mime	0.017s
ok  	mime/multipart	0.365s
ok  	mime/quotedprintable	0.112s
panic: test timed out after 3m0s

goroutine 343 [running]:
testing.(*M).startAlarm.func1()
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:1380 +0xc5
created by time.goFunc
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/time/sleep.go:169 +0x31

goroutine 1 [chan receive, 2 minutes]:
testing.(*T).Run(0x11720f00, 0x239388, 0xf, 0x2482c8, 0x1)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:964 +0x2c5
testing.runTests.func1(0x114da000)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:1205 +0x54
testing.tRunner(0x114da000, 0x11498f10)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:912 +0x90
testing.runTests(0x114a8020, 0x3eef00, 0xdf, 0xdf, 0x0)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:1203 +0x227
testing.(*M).Run(0x11474200, 0x0)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:1120 +0x111
net.TestMain(0x11474200)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/net/main_test.go:52 +0x25
main.main()
	_testmain.go:552 +0xfa

goroutine 612 [chan receive, 2 minutes]:
testing.(*T).Parallel(0x11720aa0)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:817 +0x18d
net.TestLookupGoogleSRV(0x11720aa0)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/net/lookup_test.go:70 +0x29
testing.tRunner(0x11720aa0, 0x2482f0)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:912 +0x90
created by testing.(*T).Run
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:963 +0x2a6

goroutine 613 [chan receive, 2 minutes]:
testing.(*T).Parallel(0x11720b40)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:817 +0x18d
net.TestLookupGmailMX(0x11720b40)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/net/lookup_test.go:119 +0x1f
testing.tRunner(0x11720b40, 0x2482d8)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:912 +0x90
created by testing.(*T).Run
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:963 +0x2a6

goroutine 614 [chan receive, 2 minutes]:
testing.(*T).Parallel(0x11720be0)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:817 +0x18d
net.TestLookupGmailNS(0x11720be0)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/net/lookup_test.go:165 +0x1f
testing.tRunner(0x11720be0, 0x2482dc)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:912 +0x90
created by testing.(*T).Run
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:963 +0x2a6

goroutine 615 [chan receive, 2 minutes]:
testing.(*T).Parallel(0x11720c80)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:817 +0x18d
net.TestLookupGmailTXT(0x11720c80)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/net/lookup_test.go:214 +0x29
testing.tRunner(0x11720c80, 0x2482e0)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:912 +0x90
created by testing.(*T).Run
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:963 +0x2a6

goroutine 619 [running]:
	goroutine running on other thread; stack unavailable
created by testing.(*T).Run
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/testing/testing.go:963 +0x2a6
FAIL	net	180.028s
@randall77

Contributor

commented Apr 30, 2019

> Could this be because when you cross compile on Linux the linker doesn't have access to libSystem?

I don't think this should matter. We don't actually need access to libSystem to build a binary which dynamically links to it. Building on Linux and running on a Mac should work fine with regards to this feature.

@gopherbot

commented Apr 30, 2019

Change https://golang.org/cl/174637 mentions this issue: dashboard: add darwin-amd64-nocgo config, remove nacl-386 trybot

gopherbot pushed a commit to golang/build that referenced this issue Apr 30, 2019

dashboard: add darwin-amd64-nocgo config, remove nacl-386 trybot
Also remove dead nacl-arm. It hasn't run in ages.

And update netbsd comment about why 386 doesn't run. And correct its
VM image name.

Updates golang/go#31705
Updates golang/go#31726

Change-Id: I9de4605f34a052d0a84684fca098388d75602a82
Reviewed-on: https://go-review.googlesource.com/c/build/+/174637
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@bcmills

Member

commented May 6, 2019

This seems to be the same failure on the darwin-386-10_14 buildlet; retitling accordingly.

https://build.golang.org/log/d3e5f74f9924af70e34b2de05f64cb9cdfb41310

@bcmills changed the title from "net: cross-compiled cgo-less buildlet doesn't work on darwin-amd64 with Go master" to "net: cross-compiled cgo-less buildlet doesn't work on darwin with Go master" on May 6, 2019

@grantseltzer

Contributor

commented May 6, 2019

My theory is that this has to do with the build constraints here. Perhaps CgoEnabled isn't checked correctly? I know that cgo is disabled when cross compiling.

Is reproducing this just cross compiling from macos to linux?

@bradfitz changed the title from "net: cross-compiled cgo-less buildlet doesn't work on darwin with Go master" to "net: DNS broken on darwin without cgo (1.13 regression)" on May 6, 2019

@bradfitz

Member Author

commented May 6, 2019

> Is reproducing this just cross compiling from macos to linux?

You can reproduce this on a Mac, without cross compiling. Just build with CGO_ENABLED=0.
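
For checking this outside the test suite, a standalone probe along these lines might help. It is only a sketch: the helper name lookupWithTimeout and the default host golang.org are made up for illustration, and whether it hangs will depend on which resolver path the binary actually selects. It times a single lookup through net.DefaultResolver under the same 10-second budget as the dialer in the original report, so a stall in name resolution can be told apart from a stall in the TCP connect.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

// lookupWithTimeout (hypothetical helper) resolves host using the default
// resolver, bounded by timeout, and reports how long the lookup took.
func lookupWithTimeout(host string, timeout time.Duration) ([]string, time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	start := time.Now()
	addrs, err := net.DefaultResolver.LookupHost(ctx, host)
	return addrs, time.Since(start), err
}

func main() {
	host := "golang.org" // assumption: any name that needs a real DNS query
	if len(os.Args) > 1 {
		host = os.Args[1]
	}

	// 10 seconds mirrors the Timeout on the dialer in the original report.
	addrs, elapsed, err := lookupWithTimeout(host, 10*time.Second)
	if err != nil {
		fmt.Printf("lookup %s failed after %v: %v\n", host, elapsed, err)
		return
	}
	fmt.Printf("lookup %s took %v: %v\n", host, elapsed, addrs)
}
```

Building it once with CGO_ENABLED=1 and once with CGO_ENABLED=0, then comparing the elapsed times, would be one way to compare the two configurations.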

@bradfitz

Member Author

commented May 15, 2019

Easy way to repro on a mac:

Works:

barloga5k:net $ CGO_ENABLED=1 go test -v -short -run=TestGoLookupIP net 
=== RUN   TestGoLookupIPWithResolverConfig
--- PASS: TestGoLookupIPWithResolverConfig (0.03s)
=== RUN   TestGoLookupIPOrderFallbackToFile
--- PASS: TestGoLookupIPOrderFallbackToFile (0.00s)
PASS
ok  	net

Hangs:

barloga5k:net $ CGO_ENABLED=0 go test -v -short -run=TestGoLookupIP net 
=== RUN   TestGoLookupIPWithResolverConfig
--- PASS: TestGoLookupIPWithResolverConfig (0.03s)
=== RUN   TestGoLookupIPOrderFallbackToFile
--- PASS: TestGoLookupIPOrderFallbackToFile (0.00s)
=== RUN   TestGoLookupIP

(hang)

^\SIGQUIT: quit
PC=0x7fff7b71c1b2 m=0 sigcode=0

goroutine 0 [idle]:
runtime.pthread_cond_wait(0x149b788, 0x149b748, 0x7ffe00000000)
	/Users/bradfitz/go/src/runtime/sys_darwin.go:368 +0x39
runtime.semasleep(0xffffffffffffffff, 0x104afac)
	/Users/bradfitz/go/src/runtime/os_darwin.go:63 +0x85
runtime.notesleep(0x149b548)
	/Users/bradfitz/go/src/runtime/lock_sema.go:173 +0xe0
runtime.stopm()
	/Users/bradfitz/go/src/runtime/proc.go:1928 +0xc0
runtime.findrunnable(0xc00001e000, 0x0)
	/Users/bradfitz/go/src/runtime/proc.go:2391 +0x53f
runtime.schedule()
	/Users/bradfitz/go/src/runtime/proc.go:2524 +0x2be
runtime.park_m(0xc000062600)
	/Users/bradfitz/go/src/runtime/proc.go:2610 +0x9d
runtime.mcall(0x105a796)
	/Users/bradfitz/go/src/runtime/asm_amd64.s:318 +0x5b

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000116200, 0x12ad59e, 0xe, 0x12bcfc0, 0x10a9c01)
	/Users/bradfitz/go/src/testing/testing.go:964 +0x377
testing.runTests.func1(0xc000116000)
	/Users/bradfitz/go/src/testing/testing.go:1210 +0x78
testing.tRunner(0xc000116000, 0xc00007ae08)
	/Users/bradfitz/go/src/testing/testing.go:912 +0xbf
testing.runTests(0xc00000c040, 0x14972c0, 0xdf, 0xdf, 0x0)
	/Users/bradfitz/go/src/testing/testing.go:1208 +0x2a7
testing.(*M).Run(0xc0000de000, 0x0)
	/Users/bradfitz/go/src/testing/testing.go:1125 +0x160
net.TestMain(0xc0000de000)
	/Users/bradfitz/go/src/net/main_test.go:52 +0x39
main.main()
	_testmain.go:554 +0x135

goroutine 73 [running]:
	goroutine running on other thread; stack unavailable
created by testing.(*T).Run
	/Users/bradfitz/go/src/testing/testing.go:963 +0x350

rax    0x104
rbx    0x2
rcx    0x7ffeefbff548
rdx    0xb00
rdi    0x149b788
rsi    0x290100002a00
rbp    0x7ffeefbff5d0
rsp    0x7ffeefbff548
r8     0x0
r9     0xa0
r10    0x0
r11    0x202
r12    0x149b788
r13    0x16
r14    0x290100002a00
r15    0x882f5c0
rip    0x7fff7b71c1b2
rflags 0x203
cs     0x7
fs     0x0
gs     0x0
FAIL	net	13.181s
FAIL

The hang is somewhere inside res_search.

The path that hangs is:

netgo_unix_test.go => cgo_darwin_stub.go : func cgoLookupIP (misleading name, supposed to only conditionally use cgo) => func resolverGetResources => res_search (defined in runtime).

Note that the libSystem call to res_init does succeed, at least, so it's not some libcCall cgo_import_dynamic problem in general.

@randall77, any ideas?

@groob

Contributor

commented May 15, 2019

When I run the test above and look at the logs, I see error messages related to /var/db/DetachedSignatures, which usually indicates a code signing check, along with MacOS error: -67062.

I wonder if the issue occurs when the binary is code signed.

@bradfitz

Member Author

commented May 15, 2019

I didn't sign any binary. I'm not aware of any part of the Go build process that automatically signs binaries, either?

@groob

Contributor

commented May 15, 2019

I'm suggesting that a signed binary might succeed while an unsigned one will get blocked by the kernel. Sorry for the confusion.

I'll test my theory.

@groob

Contributor

commented May 15, 2019

Can someone share a reproducible func main example? I can reproduce by running the test, but I've tried multiple combinations of CGO_ENABLED and GODEBUG=netdns and I can't get the issue to show up that way.

@groob

Contributor

commented May 15, 2019

(the code signing logs were not an issue)

I modified runtime/lookup_darwin.go to add a println and I can see that I'm actually getting to the res_search function, but the function returns successfully and my binary works fine.

@groob

Contributor

commented May 15, 2019

Looking at the build tags.
The failing test is set to have !cgo netgo and darwin,

// +build !cgo netgo
// +build darwin dragonfly freebsd linux netbsd openbsd solaris

but the stub which is getting invoked is the one with

// +build !netgo,!cgo
// +build darwin

which was added as part of f6b42a5#diff-612b16c746d269bd07f162f3fd6ea47eR6

The test is still expecting to reach cgo_stub.go which now has !darwin
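
To double-check how the tag sets quoted above interact, here is a rough sketch using the standard go/build/constraint package (available in Go 1.16 and later, so newer than the toolchain under discussion). The helper name selected and the tag maps are illustrative assumptions. The go tool sets many more tags than this; the sketch only models darwin, cgo and netgo.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// selected (hypothetical helper) reports whether a file carrying the given
// "// +build" lines would be compiled when exactly the tags in the map are
// set. Multiple +build lines in one file are ANDed together.
func selected(lines []string, tags map[string]bool) bool {
	for _, line := range lines {
		expr, err := constraint.Parse(line)
		if err != nil {
			panic(err)
		}
		if !expr.Eval(func(tag string) bool { return tags[tag] }) {
			return false
		}
	}
	return true
}

func main() {
	// Constraint lines copied from the comment above.
	testFile := []string{
		"// +build !cgo netgo",
		"// +build darwin dragonfly freebsd linux netbsd openbsd solaris",
	}
	darwinStub := []string{
		"// +build !netgo,!cgo",
		"// +build darwin",
	}

	// Assumed tag sets: darwin with CGO_ENABLED=0 and with CGO_ENABLED=1,
	// no netgo tag in either case.
	noCgo := map[string]bool{"darwin": true}
	withCgo := map[string]bool{"darwin": true, "cgo": true}

	fmt.Println("CGO_ENABLED=0: test file:", selected(testFile, noCgo),
		"darwin stub:", selected(darwinStub, noCgo))
	fmt.Println("CGO_ENABLED=1: test file:", selected(testFile, withCgo),
		"darwin stub:", selected(darwinStub, withCgo))
}
```

Under these assumptions both files are selected on darwin with cgo disabled, and neither is selected with cgo enabled. That is consistent with TestGoLookupIP only running, and hanging, in the CGO_ENABLED=0 case.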

This "fixes" the issue by making the tags match up:

diff --git a/src/net/netgo_unix_test.go b/src/net/netgo_unix_test.go
index c672d3e8eb..3cd85d2ccd 100644
--- a/src/net/netgo_unix_test.go
+++ b/src/net/netgo_unix_test.go
@@ -3,7 +3,7 @@
 // license that can be found in the LICENSE file.

 // +build !cgo netgo
-// +build darwin dragonfly freebsd linux netbsd openbsd solaris
+// +build !darwin dragonfly freebsd linux netbsd openbsd solaris

 package net

I'm not sure what the correct solution is, but it doesn't seem like this test should pass on darwin anymore, since it's expecting the placeholder, not an actual function call.

	_, err, ok := cgoLookupIP(ctx, "ip", host)
	if ok {
		t.Errorf("cgoLookupIP must be a placeholder")