Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reflect: duplicate reflect.Type values under dynamic linking #18729

Closed
fsenart opened this issue Jan 20, 2017 · 23 comments
Closed

reflect: duplicate reflect.Type values under dynamic linking #18729

fsenart opened this issue Jan 20, 2017 · 23 comments

Comments

@fsenart
Copy link

fsenart commented Jan 20, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.8rc2 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/fsenart/projects"
GORACE=""
GOROOT="/home/fsenart/tools/go1.8rc2"
GOTOOLDIR="/home/fsenart/tools/go1.8rc2/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build840231100=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.

// main.go

package main

import "plugin"

func main() {
	p, err := plugin.Open("plugin.so")
	if err != nil {
		panic(err)
	}
	f, err := p.Lookup("F")
	if err != nil {
		panic(err)
	}

	f.(func())()
}
// plugin.go

package main

import "C"
import "net/http"

func F() {
	_, err := http.Get("https://amazon.com")
	if err != nil {
		panic(err)
	}

	_, err = http.Get("https://google.com")
	if err != nil {
		panic(err)
	}
}
test: build
	@./main

build: clean
	@go build -buildmode=plugin -o plugin.so plugin.go
	@go build -o main main.go

clean:
	@rm -rf main

What did you expect to see?

I don't expect any panic to happen.

What did you see instead?

A panic happens when requesting https://google.com while all works as expected when requesting https://amazon.com.

panic: Get https://google.com: remote error: tls: bad record MAC

goroutine 1 [running]:
plugin/unnamed-7da7c3830d2c48526e010430dd8f9f30a6c7f139.F()
	/home/fsenart/projects/src/issues/issue_badmac/plugin.go:14 +0xa7
main.main()
	/home/fsenart/projects/src/issues/issue_badmac/main.go:15 +0xb6
Makefile:2: recipe for target 'test' failed
make: *** [test] Error 2

Observations

@bradfitz
Copy link
Contributor

/cc @crawshaw @randall77

@bradfitz bradfitz added this to the Go1.8Maybe milestone Jan 20, 2017
@crawshaw
Copy link
Member

This is probably a reflect bug of some sort meaning a message is being encoded incorrectly.

While it's a nice short test case (thanks!) it's tricky to pull apart because the "bad record MAC" is an error coming back from the server. I'm going to see if I can get some of the cyrpto/tls unit tests running inside a plugin and hope those narrow it down.

If they don't, does anyone have a TLS server designed for debugging? (A TLS-aware equivalent of tcpdump?)

@crawshaw
Copy link
Member

crawshaw commented Jan 24, 2017

The same error occurs with -buildmode=shared, which gives us some (not exactly scrutable) unit test failures.

$ go test -linkshared -short crypto/tls
--- FAIL: TestClientResumption (0.02s)
	handshake_client_test.go:687: DifferentCipherSuite: handshake failed: local error: tls: bad record MAC
--- FAIL: TestVersion (0.01s)
	handshake_server_test.go:340: handshake failed: local error: tls: bad record MAC
--- FAIL: TestSCTHandshake (0.01s)
	handshake_server_test.go:390: handshake failed: local error: tls: bad record MAC
--- FAIL: TestHandshakeClientECDHERSAChaCha20 (0.00s)
	handshake_client_test.go:396: TLSv12-ECDHE-RSA-CHACHA20-POLY1305 #2: mismatch on read: got:16030300251000002120459a0798b78ac5dc1220f0798323cb75f8ce32eb80fbdeb886eaf25de1b57f541403030001011603030020a879516ad07f72e38cc4ec2a6a947bd25db2acce314e1101365e995f9ef0f643 want:160303002510000021202fe57da347cd62431528daac5fbb290730fff684afc4cfc2ed90995f58cb3b7414030300010116030300209d2fa6b72156ad38a831200b2edc3f8a3464de810ed3a5b1c1fc0518d93e7735
--- FAIL: TestHandshakClientSCTs (0.00s)
	handshake_client_test.go:396: TLSv12-SCT #2: mismatch on read: got:16030300251000002120459a0798b78ac5dc1220f0798323cb75f8ce32eb80fbdeb886eaf25de1b57f54140303000101160303002098378699d1c39a3dd65fd0e3575bfd63e7101d88cd3e47f79aa9ccdeda303546 want:160303002510000021202fe57da347cd62431528daac5fbb290730fff684afc4cfc2ed90995f58cb3b7414030300010116030300207a58e133d4ceca57efeab99df24decce864be9c2b564dd0f32f066654274d859
...

After many more similar failures it locks up. Building with -c and sending it a SIGQUIT show it stuck in a way that suggests one end of the client/server has locked up. EDIT: I'm omitting the stack dump but it's available if anyone wants it.

I suspect going after the unit test failures will be easier than the lockup, so I'll spend some time today trying to simplify one of them.

cc @mwhudson as this now affects shared libraries. Would be nice if we had a -linkshared builder to have caught this earlier.

@crawshaw
Copy link
Member

Ah ha, I'm going to start here:

$ go install -buildmode=shared runtime sync runtime/cgo time
$ go test -short -linkshared encoding/...
?   	encoding	[no test files]
ok  	encoding/ascii85	0.041s
--- FAIL: TestUnmarshal (0.00s)
	asn1_test.go:506: #3:
		have &[]int{1, 2, 3}
		want &[]int{1, 2, 3}
FAIL
FAIL	encoding/asn1	0.033s
ok  	encoding/base32	0.035s
ok  	encoding/base64	0.037s
ok  	encoding/binary	0.028s
ok  	encoding/csv	0.030s
ok  	encoding/gob	0.067s
ok  	encoding/hex	0.028s
ok  	encoding/json	0.404s
ok  	encoding/pem	0.054s
ok  	encoding/xml	0.060s

Oddly if I try to run go test -short -linkshared encoding/asn1 directly, it hangs, but in this form, the failure is consistent.

@fsenart
Copy link
Author

fsenart commented Jan 24, 2017

@crawshaw do you remember the other issue I've opened about encoding/asn1 #18584. The two may be related.

@crawshaw
Copy link
Member

I had forgotten about that bug. Presumably the fix for #18252 wasn't enough (which has been my working assumption so far, that there's a reflect/type-info bug). I'll dig around.

@crawshaw
Copy link
Member

Simplified test case:

package main // prg.go

import "reflect"

var d interface{} = &[]int{}

func main() {
	pv := reflect.New(reflect.TypeOf(d).Elem())
	v := pv.Interface()

	println(reflect.TypeOf(v) == reflect.TypeOf(d))
}

At tip:

$ go run prg.go 
true
$ go run -linkshared prg.go 
false
$

@crawshaw crawshaw changed the title plugin: tls bad record MAC reflect: duplicate reflect.Type values under dynamic linking Jan 24, 2017
@crawshaw
Copy link
Member

In the above example, both the binary and libstd.so contain type symbols for *[]int, as you would expect. But the program does not contain any dynamic relocations for it:

$ go tool nm libstd.so | grep "type.\*\[]int$"
 147e240 D type.*[]int
$ go tool nm prg | grep "type.\*\[]int$"
  602960 D type.*[]int
$ readelf -r prg | grep type
0000006029d0  000400000001 R_X86_64_64       0000000000000000 type.int + 0
000000602a10  000400000001 R_X86_64_64       0000000000000000 type.int + 0
000000603040  001400000001 R_X86_64_64       0000000000000000 type.8b7X1see + 0
000000603048  001100000001 R_X86_64_64       0000000000000000 type.H0+h5nWH + 0

@mwhudson I think this is basically the eface version of the itab problem from #18252. Shouldn't there be a dynamic relocation for the *rtype in the binary prg?

@mwhudson
Copy link
Contributor

Well no, because prg is an executable so the system linker can assume no other type.*[]int symbol will be interposed in front of the one in prg.

Adding

	println("d", reflect.TypeOf(d))
	println("v", reflect.TypeOf(v))

to prg.go gives as output:

d (0x603040,0x602940)
v (0x603040,0x7f84580f66a0)

d looks correct to my level of understanding, 0x602940 is the address of type.[]int in the executable. 0x7f84580f66a0 is the address of type.[]int in the library though, which is wrong. I don't understand right now how v got constructed, but I guess if this address was computed by adding some offsets this might have happened. (The bug is present in 1.7 but not 1.6 which lends support to this theory).

Have we gotten @ianlancetaylor to look at this bug yet?

@crawshaw
Copy link
Member

Ah of course, thanks, I was inverting the libstd.so and prg modules in my mind. That gives me a new place to hunt.

@crawshaw
Copy link
Member

The reflect.New function takes a type []int and returns a value wrapping a *[]int. It gets to the pointer using the ptrToThis field. This is resolved via the moduledata typemap, and it resolves to the wrong module. That is, it points to the *[]int symbol that comes from libstd.so, not the program binary.

This is because libstd.so is listed first the list of modules. I instrumented typelinksinit and firstmoduledata is libstd.so, with firstmoduledata.next being the program binary module.

That's wrong. We want the linked list of modules to have same load order the dynamic linker uses. But the runtime package is defined in libstd.so, and that's where firstmoduledata is defined and first used.

Problematically it's backwards in plugins. The runtime package is always in the initial binary, so firstmoduledata is the initial program.

(I feel like I've run into this before, and I thought module load order was solved. Clearly it's not.)

We actually have a slice of active modules now, independent from the linked list. So if we can come up with a way to order the modules correctly, we can reorder in the slice before typelinksinit.

One hack I'm considering: sort the modules from lowest to highest values of md.text. From what I've seen in my printing on linux/glibc and macOS that matches what the dynamic linker does. (But I don't know it for a fact.)

@mwhudson
Copy link
Contributor

mwhudson commented Jan 25, 2017 via email

@crawshaw
Copy link
Member

crawshaw commented Jan 25, 2017

A less evil hack would be to swap the position of firstmoduledata with whatever module contains the main_main symbol. That would be correct in all practical situations we will see with 1.8 (libstd.so, and plugins). It will eventually have to be revisited, if someone builds a complex enough loading scenario. WDYT?

@crawshaw
Copy link
Member

(At this point I'm looking for something small enough to go into 1.8. I agree going after initarray is more principled, it just sounds a bit hairy to me for a close-to-release change, because of that asm blob in the linker.)

@crawshaw
Copy link
Member

Of course, this bug was originally about a plugin failure. Runtime module ordering does not explain why the type would be wrong in a plugin. Either this diagnosis is wrong, or the failing test case I extracted is for a different bug.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/35644 mentions this issue.

@mwhudson
Copy link
Contributor

go test -linkshared seems to be pretty broken right now (sadface) but doing things by hand suggests that go test -linkshared crypto/tls is still broken with that CL (but encoding/asn1 is indeed fixed).

gopherbot pushed a commit that referenced this issue Jan 25, 2017
Modules appear in the moduledata linked list in the order they are
loaded by the dynamic loader, with one exception: the
firstmoduledata itself the module that contains the runtime.
This is not always the first module (when using -buildmode=shared,
it is typically libstd.so, the second module).

The order matters for typelinksinit, so we swap the first module
with whatever module contains the main function.

Updates #18729

This fixes the test case extracted with -linkshared, and now

	go test -linkshared encoding/...

passes. However the original issue about a plugin failure is not
yet fixed.

Change-Id: I9f399ecc3518e22e6b0a350358e90b0baa44ac96
Reviewed-on: https://go-review.googlesource.com/35644
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@crawshaw
Copy link
Member

Now there are a couple of bugs here, and I'm not sure how to separate them into issues. I'll put a bit of time tomorrow into poking at it all tomorrow.

@crawshaw
Copy link
Member

I hacked up the crypto/tls, encoding/asn1, and reflect unit tests to run as a plugin. The tls tests indeed fail just as this example fails. Unfortunately they are not easy things to debug.

The asn1 and reflect tests all pass except for this interesting one:

reflect_all.go:4800: typelinks not sorted: "struct {}" [1537] > "***int" [1538]

I'll work on fixing that, then try hacking up the other encoding tests to run under a plugin.

@crawshaw
Copy link
Member

Actually that typelinks error is just a bad unit test, it cannot handle multiple modules. Onward.

@crawshaw
Copy link
Member

crawshaw commented Jan 27, 2017

More interesting, from vendor/golang_org/x/crypto/curve25519:

curve25519_test.go:27: incorrect result: got 469423c8d7cc27bb36c7096de54d38d285740efa69594c1f98f49b113dcc7a19, want 89161fde887b2b53de549af483940106ecc114d6982daa98256de23bdf77661a

I'm guessing some of the amd64 asm in this package is trying to hold CX across a load, and when dynamic linking for -buildmode=plugin or -buildmode=shared, it doesn't work. If so, this could be the encoding/crypto hang @mwhudson mentioned above.

@ianlancetaylor
Copy link
Contributor

@crawshaw Are you still looking at this?

@crawshaw
Copy link
Member

crawshaw commented Jan 30, 2017

I believe the only bug remaining here is #18820, which has been split out.

There also needs to be better testing, which I split out as #18814.

As this issue ranged across several problems, and I believe everything left is split out into those two, I'm going to close this issue. @fsenart, please follow #18820 for a resolution to your original issue.

@golang golang locked and limited conversation to collaborators Jan 30, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants