net: Support the /etc/resolver DNS resolution configuration hierarchy on OS X #12524

Closed
Rotonen opened this Issue Sep 6, 2015 · 34 comments

Rotonen commented Sep 6, 2015

https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man5/resolver.5.html

OS X allows you to add TLD-specific resolver configurations. Popular ones are /etc/resolver/vm for local virtual machines and /etc/resolver/dev for local development.

https://golang.org/src/net/dnsclient_unix.go#L231

Go seems to be hardcoded to only take /etc/resolv.conf into account on Unix platforms.
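To make the request concrete, here is a minimal sketch of what per-domain resolver files imply: read the nameserver lines of each file and pick the file whose name is the longest suffix of the host being resolved. The file contents, addresses, and the helper names parseNameservers and pickDomain are assumptions for illustration; longest-suffix selection is also an assumption about how the system chooses, and none of this is code from package net.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseNameservers extracts the addresses from "nameserver" lines in the
// contents of a resolver(5)-style file. Other directives are ignored here.
func parseNameservers(contents string) []string {
	var servers []string
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "nameserver" {
			servers = append(servers, fields[1])
		}
	}
	return servers
}

// pickDomain returns the longest configured domain that is a suffix of host
// on a label boundary, or "" if none matches.
func pickDomain(host string, domains []string) string {
	host = strings.TrimSuffix(strings.ToLower(host), ".")
	best := ""
	for _, d := range domains {
		if (host == d || strings.HasSuffix(host, "."+d)) && len(d) > len(best) {
			best = d
		}
	}
	return best
}

func main() {
	// Hypothetical contents of /etc/resolver/vm and /etc/resolver/dev.
	files := map[string]string{
		"vm":  "nameserver 192.168.64.1\n",
		"dev": "nameserver 127.0.0.1\nport 5353\n",
	}
	domains := []string{"vm", "dev"}
	for _, host := range []string{"box.vm", "api.dev", "example.com"} {
		d := pickDomain(host, domains)
		if d == "" {
			fmt.Printf("%s -> default resolver\n", host)
			continue
		}
		fmt.Printf("%s -> %v\n", host, parseNameservers(files[d]))
	}
}
```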

Member

nodirt commented Sep 6, 2015

I don't think the Go-native DNS resolver is used on Mac.
https://golang.org/src/net/dnsclient_unix.go#L231 is not executed if I run

addrs, err := net.LookupHost("google.com")

on my Mac.

If I enable debugging (GODEBUG=netdns=2 go run test.go), the following is printed:

go package net: using cgo DNS resolver
go package net: hostLookupOrder(google.com) = cgo

which means that OS-native DNS resolving is used.

Can you supply an exact configuration file, Go code, actual and expected output?
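For readers who want to compare the two code paths from inside a program rather than through GODEBUG, the following sketch uses the PreferGo field of net.Resolver, which exists in current Go releases (it postdates the Go version discussed in this comment). The helper name newResolver is illustrative, and whether PreferGo actually takes effect depends on the platform and build.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newResolver returns a resolver that asks for the pure-Go implementation
// when preferGo is true; otherwise the platform default is used.
func newResolver(preferGo bool) *net.Resolver {
	return &net.Resolver{PreferGo: preferGo}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Run the same lookup through both resolver settings and print the results.
	for _, preferGo := range []bool{false, true} {
		addrs, err := newResolver(preferGo).LookupHost(ctx, "google.com")
		fmt.Printf("PreferGo=%v addrs=%v err=%v\n", preferGo, addrs, err)
	}
}
```

Running it with GODEBUG=netdns=2 set should print the same kind of "hostLookupOrder" lines shown above for each lookup.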

Member

titanous commented Sep 6, 2015

@nodirt This is for a binary with cgo off.

Contributor

davecheney commented Sep 6, 2015

If cgo is disabled then the pure Go DNS resolver will be used. If you want
to use the Mac DNS resolver, please build with cgo.


Member

nodirt commented Sep 6, 2015

Shouldn't be a problem since this is needed only on a dev machine.


Member

titanous commented Sep 6, 2015

In this specific case, @Rotonen was using the Flynn binary that we distribute as a compiled artifact. It is compiled without cgo to ease cross-compilation. Just because the user is a developer doesn't mean that they are a Go developer or want to compile the binary themselves. The only question here is whether this feature is out of scope for the pure-Go resolver.

Member

minux commented Sep 7, 2015

@ianlancetaylor ianlancetaylor changed the title Support the /etc/resolver DNS resolution configuration hierarchy on OS X net: Support the /etc/resolver DNS resolution configuration hierarchy on OS X Sep 8, 2015

Contributor

ianlancetaylor commented Sep 8, 2015

I don't see anything wrong with supporting the OS X /etc/resolver directory. That said, my understanding is that the Go DNS resolver does not work well on most OS X machines. That is why it is disabled by default.

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Sep 8, 2015

mterron commented Jun 1, 2016

This would be great on all platforms anyway. Is there any disadvantage to supporting this behaviour? It seems that it would neatly remove the need to install and configure dnsmasq just to have different resolvers for different TLDs.

jason-riddle commented Feb 16, 2017

I know this issue is quite old, but has there been any traction on this?

pantocrator27 commented Jun 23, 2017

any resolution?

Member

bradfitz commented Jun 23, 2017

Any updates would be posted here. No updates have been posted here.

bitglue commented Apr 20, 2018

See resolver(5). Just reading the files out of /etc/resolver/* will miss out on other mechanisms for configuring the same thing, for example configuration profiles or IKE attributes.

flyinprogrammer commented Sep 25, 2018

Just stumbled upon this today while attempting to use CoreDNS as a DNS proxy for local development. It's a real bummer to discover how naive our support for OS X is.

Member

bradfitz commented Sep 25, 2018

We've generally assumed people use cgo on Darwin, so this bug has never been a priority.

I do admit that practically means that Darwin binaries need to be built on Darwin, which is difficult for people wanting to cross-compile for a dozen platforms as part of their release process.

Perhaps on Darwin without cgo we could just shell out to a program to do DNS resolution (e.g. host, dig, nslookup?). At least nslookup has an interactive mode that would permit re-using a child process for multiple lookups, if that proves necessary for performance.

bitglue commented Sep 26, 2018

I think the reality is that most command-line utilities will compile for two platforms, Linux and OS X, and the OS X build will always have cgo disabled. Some subset of the OS X users use a VPN, expect .local names to resolve, or have some other situation where hostname resolution is more than "just query this one DNS server always". Some subset of those users will actually open an issue with the tool, and of those an even smaller subset will identify Go as the problem and raise an issue here.

So I think you underestimate the impact of the problem.

Shelling out to nslookup will not fix it. The problem is "doing a DNS query" is not the same thing as "resolving a hostname". Resolving a hostname involves more, such as:

  • /etc/hosts
  • RFC6762 .local names
  • Other hostname resolution protocols, such as NIS or LDAP, if configured
  • Honoring the domain search path, if configured
  • If DNS is to be used, determining which server to use.
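As one illustration of why a single DNS query is not the whole story, the search path item in the list above can be sketched as follows. This assumes the conventional resolv.conf semantics (a search list plus an ndots threshold); the function name candidates and the example domains are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// candidates lists the fully qualified names a stub resolver would try for
// name, given a search list and an ndots threshold.
func candidates(name string, search []string, ndots int) []string {
	if strings.HasSuffix(name, ".") {
		return []string{name} // already absolute: the search list is skipped
	}
	var out []string
	asIs := name + "."
	enoughDots := strings.Count(name, ".") >= ndots
	if enoughDots {
		out = append(out, asIs) // looks qualified: try it unchanged first
	}
	for _, s := range search {
		out = append(out, name+"."+strings.TrimSuffix(s, ".")+".")
	}
	if !enoughDots {
		out = append(out, asIs) // short name: try it unchanged last
	}
	return out
}

func main() {
	search := []string{"corp.example.com", "example.com"}
	fmt.Println(candidates("db", search, 1))
	fmt.Println(candidates("db.corp", search, 1))
	fmt.Println(candidates("db.example.org.", search, 1))
}
```

A tool that sends exactly one query for the literal name skips all of this, which is one way its answers can differ from the system resolver's.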

Tools like host, nslookup, and dig perform DNS queries by design; they do not resolve hostnames. This is as true on Linux as on OS X. Unfortunately OS X has somehow acquired some lore about having "two DNS systems", which is simply false. Or at least it was false, until Go command-line utilities gained popularity.

If you do want to shell out to a command to perform host resolution, the correct command on OS X is dscacheutil -q host -a name $hostname. This is analogous to getent hosts $hostname on Linux.

Another path is to make the Go resolver's behavior more consistent with the OS X system resolver. This begins with obtaining resolver configuration from SystemConfiguration.framework or scutil --dns, not /etc/resolv.conf.
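The dscacheutil suggestion above could look roughly like this from Go. It is only a sketch: the "ip_address:" and "ipv6_address:" line format that parseHostOutput expects is an assumption about the tool's output, the sample text is hand-written, and lookupHost is a hypothetical helper that only makes sense on macOS.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"runtime"
	"strings"
)

// parseHostOutput collects addresses from "ip_address:" and "ipv6_address:"
// lines, the output shape assumed here for dscacheutil -q host.
func parseHostOutput(out string) []string {
	var addrs []string
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		switch strings.TrimSpace(key) {
		case "ip_address", "ipv6_address":
			addrs = append(addrs, strings.TrimSpace(value))
		}
	}
	return addrs
}

// lookupHost shells out to dscacheutil with the arguments quoted in the
// comment above and parses whatever it prints.
func lookupHost(name string) ([]string, error) {
	out, err := exec.Command("dscacheutil", "-q", "host", "-a", "name", name).Output()
	if err != nil {
		return nil, err
	}
	return parseHostOutput(string(out)), nil
}

func main() {
	// Hand-written sample in the assumed format, so the parser can be tried anywhere.
	sample := "name: localhost\nip_address: 127.0.0.1\n\nname: localhost\nipv6_address: ::1\n"
	fmt.Println(parseHostOutput(sample))

	if runtime.GOOS == "darwin" {
		fmt.Println(lookupHost("localhost"))
	}
}
```

Spawning a process per lookup is slow compared with an in-process resolver, which is part of why the thread moves on to calling the system libraries directly.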

Member

bradfitz commented Sep 26, 2018

dscacheutil sounds good. I was thinking of lookupd when I wrote the comment above but my local machine didn't have lookupd so I omitted it. Now I see that dscacheutil replaced lookupd.

I don't think we want to get into the business of reimplementing Darwin's name resolution.

@randall77, since you're having fun with macOS lately, any thoughts here? Could we have non-cgo binaries still call into the macOS name resolution code somehow with some assembly/linker goo?

Contributor

rsc commented Sep 26, 2018

Let's see if we can use the libSystem bindings directly even when cgo is ostensibly disabled.

Author

Rotonen commented Sep 26, 2018

"expect .local names to resolve"

I actually expect .local names to resolve on all platforms via mDNS anyway, if the target responds to the multicast query appropriately.

nordicmachine commented Jan 21, 2019

@bitglue is correct. I think a lot of people are going to file issues against a tool and not raise issues with the Go project. A good example of this is Homebrew. They recently removed support for options in their install, which means people can no longer install packages written in Go, like HashiCorp's Vault, with cgo support. We used to be able to do 'brew install vault --with-dynamic' to enable cgo support and get correct DNS resolution, but now that is removed and we're stuck with having to hack their install script to get Vault compiled with cgo. It would be nice to see Go's native resolver work in a less naive fashion so we don't need to worry about this issue anymore.

See Homebrew/homebrew-core#33507 for reference.

timfallmk commented Feb 12, 2019

I would chime in and venture that the root of this issue might be that the net package treats all Unix systems the same. Perhaps there should be a stubbed-out variant for macOS to deal with its configd-based resolution?

This issue, as has been noted, will affect every binary not compiled with cgo when users are using VPNs, which would seem to be a common use case.

Contributor

grantseltzer commented Feb 12, 2019

@rsc Can you provide some detail on how we might be able to call libSystem bindings without cgo?

Contributor

ianlancetaylor commented Feb 12, 2019

@grantseltzer The current runtime package is full of examples of calling into libSystem. See runtime/sys_darwin.go.

Contributor

grantseltzer commented Feb 19, 2019

I'm taking a stab at this. I have a branch on my GitHub fork here: https://github.com/grantseltzer/go but could use some help.

The function I'm looking for is res_search, which is in libresolv (/usr/lib/libresolv.9.dylib).

I have the cgo_import_dynamic directive:

//go:cgo_import_dynamic libresolv_res_search res_search "/usr/lib/libresolv.9.dylib"

The Go function that makes the libcCall call and trampoline (sys_darwin.go):

//go:nosplit
//go:cgo_unsafe_args
func Res_search(name *byte, class int32, rtype int32, answer *byte, anslen int32) int32 {
	return libcCall(unsafe.Pointer(funcPC(res_search_trampoline)), unsafe.Pointer(&name))
}
func res_search_trampoline()

and defined the amd64 assembly routine (sys_darwin_amd64.s):

TEXT runtime·res_search_trampoline(SB),NOSPLIT,$0
	PUSHQ	BP
	MOVQ	SP, BP
	MOVL	0(DI), SI		// arg 1 name
	MOVQ	8(DI), DX		// arg 2 class
	MOVQ	12(DI), CX		// arg 3 type
	MOVQ	16(DI), R8		// arg 4 answer
	MOVQ	24(DI), R9		// arg 5 anslen
	CALL	libresolv_res_search(SB)
	POPQ	BP
	RET
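As a cross-check on the offsets used in the trampoline above, the argument block that libcCall receives can be modelled as a struct. This sketch assumes the arguments are laid out contiguously in declaration order, as with the stack-based calling convention of that Go release, and that it runs on a 64-bit platform. resSearchArgs and argOffsets are illustrative names, not runtime code.

```go
package main

import (
	"fmt"
	"unsafe"
)

// resSearchArgs mirrors the parameter list of Res_search as one struct so the
// offsets used by the trampoline can be checked with unsafe.Offsetof.
type resSearchArgs struct {
	name   *byte
	class  int32
	rtype  int32
	answer *byte
	anslen int32
}

// argOffsets returns the byte offset of each field in declaration order.
func argOffsets() [5]uintptr {
	var a resSearchArgs
	return [5]uintptr{
		unsafe.Offsetof(a.name),
		unsafe.Offsetof(a.class),
		unsafe.Offsetof(a.rtype),
		unsafe.Offsetof(a.answer),
		unsafe.Offsetof(a.anslen),
	}
}

func main() {
	names := [5]string{"name", "class", "rtype", "answer", "anslen"}
	for i, off := range argOffsets() {
		fmt.Printf("%-6s offset %d\n", names[i], off)
	}
	fmt.Println("pointer size:", unsafe.Sizeof((*byte)(nil)), "int32 size:", unsafe.Sizeof(int32(0)))
}
```

On amd64 this reports offsets 0, 8, 12, 16 and 24, with 8-byte pointers and 4-byte int32 values, so the pointer arguments call for MOVQ and the int32 arguments for MOVL.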

When testing the function (which is exported just for testing), I get a return code of -1 and no response in buffer:

func main() {

	name := "google.com"
	var nameAddr = name[0]

	var buffer = [512]byte{}

	x := runtime.Res_search(&nameAddr, 255,
		255, &buffer[0], 512)

	fmt.Println("res_search return code:", x)
	fmt.Printf("Buffer: %s\n", buffer)
}

Anything glaring that I'm missing? Perhaps my data types or stack offset sizes.

Most importantly, can someone link me to documentation on how to debug the code at this level?
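One detail of the test program above that may be worth checking: name[0] yields a copy of the first byte, so &nameAddr points at a lone byte rather than at the string's data, and a Go string carries no trailing NUL in any case. A C function such as res_search expects a pointer to a NUL-terminated buffer. Below is a minimal sketch of building one; cString is a hypothetical helper, not something the runtime exports.

```go
package main

import "fmt"

// cString returns a NUL-terminated copy of s, the form a C function such as
// res_search expects for its name argument.
func cString(s string) []byte {
	b := make([]byte, len(s)+1) // the final byte stays 0
	copy(b, s)
	return b
}

func main() {
	name := cString("google.com")
	// first points into the slice's backing array, so the bytes that follow
	// it in memory are the rest of the name and then the terminating 0.
	first := &name[0]
	fmt.Printf("%d bytes, first=%q, last=%d\n", len(name), *first, name[len(name)-1])
}
```

Separately, 255 for both class and type requests ANY in DNS terms; class 1 (IN) with type 1 (A) is the more usual query when testing.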

EDIT:
more testing/version information:

uname -a

Darwin Grant-SelzterRichman 17.7.0 Darwin Kernel Version 17.7.0: Thu Dec 20 21:47:19 PST 2018; root:xnu-4570.71.22~1/RELEASE_X86_64 x86_64
go version go1.11.5 darwin/amd64

Contributor

ianlancetaylor commented Feb 19, 2019

Contributor

grantseltzer commented Feb 25, 2019

I believe I was misusing MOVQ vs MOVL (now potentially fixed to this):

TEXT runtime·res_search_trampoline(SB),NOSPLIT,$0
	PUSHQ	BP
	MOVQ	SP, BP
	MOVQ	0(DI), DI		// arg 1 name
	MOVL	8(DI), SI		// arg 2 class
	MOVL	12(DI), DX		// arg 3 type
	MOVQ	16(DI), CX		// arg 4 answer
	MOVL	24(DI), R8		// arg 5 anslen
	CALL	libresolv_res_search(SB)
	POPQ	BP
	RET

Still not there yet though.

I'm stepping through with delve and my hunch is that RDI has not been properly initialized when entering the res_search_trampoline in sys_darwin_amd64.s

When moving from offsets off DI to the respective arg registers the program appears to be blowing away the destination registers instead (pictured below):

[Screenshot: debugger-blowing-away-regs]

Another thing that's confusing me is that when I step into Res_search (the Go function that makes the call to libcCall), my arguments are unreadable:

[Screenshot: 2019-02-25, 3:00:45 PM]

Anyone have a hunch of why this call isn't working or have advice on debugging?

Contributor

grantseltzer commented Feb 27, 2019

Update:

I am getting DNS records using the libresolv res_search binding with cgo disabled :D!

Working to confirm that this actually honors the /etc/resolver files, not sure if it is at the moment.

[Screenshot: 2019-02-27, 4:01:36 PM]

Would still love to hear an explanation for this, but the way I got it working was by changing the order of the arguments being loaded to the order of them listed in the dlv screenshot above:

TEXT runtime·res_search_trampoline(SB),NOSPLIT,$0
	PUSHQ	BP
	MOVQ	SP, BP
	MOVL	(DI), R8		// arg 5 anslen
	MOVQ	16(DI), CX		// arg 4 answer
	MOVL	8(DI), SI		// arg 2 class
	MOVQ	0(DI), DI		// arg 1 name
	MOVL	12(DI), DX		// arg 3 type
	CALL	libresolv_res_search(SB)
	POPQ	BP
	RET

Contributor

grantseltzer commented Mar 1, 2019

Current update: Calling this routine does in fact honor /etc/resolver/ files. I'm currently trying to figure out an issue where the specified query 'type' is not being honored and only AAAA queries are sent.

My questions for once I fix that and prepare it for a CL:

  1. Should this routine be defined for all of i386, x86_64, ARM, and ARM64?
  2. What testing mechanisms exist for code at this level, beyond manual testing?
  3. Should the cgo bindings exist in runtime or are they appropriate for the net package?

Contributor

grantseltzer commented Mar 8, 2019

Opened #30686

gopherbot commented Mar 8, 2019

Change https://golang.org/cl/166297 mentions this issue: net: Use libSystem bindings for DNS resolution on macos if CGO is unavailable

Contributor

mikioh commented Mar 12, 2019

If we want to accommodate several DNS stub resolver implementations, typically it would be as follows:

  • well-cooked external getaddrinfo based one; currently enabled by netdns=cgo,
  • half-baked external resolver library, res_xxx, based one,
  • from scratch; currently enabled by netdns=go.

However, I'm still not sure we really need to hold all of the implementations in package net. Is there any specific reason not to make a new API that accepts external stub resolver implementations? Once we open up the API, we could also use it for upcoming fancy technologies such as DoH (DNS over HTTPS).
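To give the idea of a pluggable API some shape, here is one possible form it could take. Everything in it is hypothetical: StubResolver, staticResolver, chain and lookup are invented for this sketch and nothing like them is defined by package net. The static table simply stands in for a real backend such as getaddrinfo, a res_xxx library, or a DoH client.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// StubResolver is a hypothetical interface; package net does not define it.
type StubResolver interface {
	LookupHost(ctx context.Context, host string) ([]string, error)
}

var errNotFound = errors.New("no such host")

// staticResolver answers from a fixed table, standing in for a real backend.
type staticResolver map[string][]string

func (s staticResolver) LookupHost(_ context.Context, host string) ([]string, error) {
	if addrs, ok := s[host]; ok {
		return addrs, nil
	}
	return nil, errNotFound
}

// chain tries each resolver in order and returns the first successful answer.
type chain []StubResolver

func (c chain) LookupHost(ctx context.Context, host string) ([]string, error) {
	err := errNotFound
	for _, r := range c {
		addrs, e := r.LookupHost(ctx, host)
		if e == nil {
			return addrs, nil
		}
		err = e
	}
	return nil, err
}

// lookup is a convenience wrapper that supplies a background context.
func lookup(r StubResolver, host string) ([]string, error) {
	return r.LookupHost(context.Background(), host)
}

func main() {
	r := chain{
		staticResolver{"box.vm": {"192.168.64.2"}},
		staticResolver{"api.dev": {"127.0.0.1"}},
	}
	fmt.Println(lookup(r, "api.dev"))
	fmt.Println(lookup(r, "missing.example"))
}
```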

Contributor

randall77 commented Mar 12, 2019

TEXT runtime·res_search_trampoline(SB),NOSPLIT,$0
	PUSHQ	BP
	MOVQ	SP, BP
	MOVL	(DI), R8		// arg 5 anslen
	MOVQ	16(DI), CX		// arg 4 answer
	MOVL	8(DI), SI		// arg 2 class
	MOVQ	0(DI), DI		// arg 1 name
	MOVL	12(DI), DX		// arg 3 type
	CALL	libresolv_res_search(SB)
	POPQ	BP
	RET

The last MOVL is using a DI value that just got clobbered in the previous instruction. You have to load DI last.
The manpage is unclear about what the return value of res_search is. You might need to call libc_error if the return value is <0 to get an actual error code. See mmap for an example.

Debugging this stuff is hard generally. Sorry about that. It does seem that you're making progress though.

Contributor

ianlancetaylor commented Mar 12, 2019

By the way, if Darwin supports res_nsearch and friends, we should probably use them, as they are thread-safe. res_search and res_nsearch normally return the length of the response and I assume the same is true on Darwin.

Contributor

grantseltzer commented Mar 12, 2019

@randall77 Ah, that makes a lot of sense, thank you! I pushed changes including the error checks (the functions return the size of the response, or -1 on error).

@ianlancetaylor I have been working on this today, as well as changing the GODEBUG/CGO set logic discussed on gerrit.

res_nsearch is supported.

Contributor

ianlancetaylor commented Mar 12, 2019

In order to use res_nsearch we would have to use res_ninit. I don't know whether res_search would also work OK, but it's troubling that it's not considered to be thread-safe on GNU/Linux. I don't know about Darwin. I don't know when the global variable is modified.

But I guess that to use res_ninit and res_nsearch we would need to at least know the size of res_state. Probably the best approach would be to double-check that on Darwin res_state is <= 512 bytes, as I expect it is, and then use [64]uint64.
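The opaque-buffer idea can be written down in a few lines. This is only a sketch of the sizing trick: the 512-byte bound is the figure proposed above and would still need to be checked against Darwin's headers, and resState and resStateSize are illustrative names rather than code from any CL.

```go
package main

import (
	"fmt"
	"unsafe"
)

// resState is an opaque, 8-byte-aligned buffer meant to be at least as large
// as the C res_state structure. Go never looks inside it; a pointer to it
// would simply be handed to res_ninit and res_nsearch.
type resState [64]uint64

// resStateSize reports the size of the buffer in bytes.
func resStateSize() uintptr {
	return unsafe.Sizeof(resState{})
}

func main() {
	var state resState
	fmt.Println("buffer bytes:", resStateSize())
	fmt.Println("alignment:", unsafe.Alignof(state))
}
```

Using uint64 elements rather than bytes gives the buffer 8-byte alignment, which matters if the C structure contains pointers or other 8-byte fields.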
