Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: misleading stack trace related to struct embedding. #33724

Open
MrJoy opened this issue Aug 19, 2019 · 9 comments

Comments

@MrJoy
Copy link

commented Aug 19, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.8 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/jonathon.frisby/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/jonathon.frisby/go"
GOPROXY=""
GORACE=""
GOROOT="/Users/jonathon.frisby/.gimme/versions/go1.12.9.darwin.amd64"
GOTMPDIR=""
GOTOOLDIR="/Users/jonathon.frisby/.gimme/versions/go1.12.9.darwin.amd64/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/k5/w0_r1hrd6tqgb0k86pfgpq900000gp/T/go-build524396974=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

This code: https://play.golang.org/p/vYYtv0x4IyJ

Produces this error:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0xffffffff addr=0x0 pc=0xd97fb]

goroutine 1 [running]:
main.(*APIErrorResponseImpl).Error(...)
  /tmp/sandbox612567163/prog.go:21
main.main()
  /tmp/sandbox612567163/prog.go:31 +0x1b

However, if you uncomment the fmt.Printf statement: https://play.golang.org/p/A0mCRgbR8OJ

Then, you get this, which seems to be the "correct" stack trace:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0xffffffff addr=0x0 pc=0xd97fa]

goroutine 1 [running]:
main.main()
  /tmp/sandbox315389324/prog.go:31 +0x1a

What did you expect to see?

The second error.

What did you see instead?

The first error.

@agnivade agnivade changed the title Misleading stack trace related to struct embedding. runtime: misleading stack trace related to struct embedding. Aug 20, 2019

@agnivade

This comment has been minimized.

Copy link
Member

commented Aug 20, 2019

We have several other incorrect line number issues. This may likely be a duplicate of something else.

/cc @mdempsky @griesemer @randall77

@agnivade agnivade added this to the Go1.14 milestone Aug 20, 2019

@MrJoy

This comment has been minimized.

Copy link
Author

commented Aug 20, 2019

Looking at the asm produced for the "correct" version of the code, there's at least one surprising thing. Forgive me if I'm being obtuse about something obvious, but I don't understand what TESTB is doing in this code. I mean, it's setting a flag bit, but I don't see anything that reads the flags until after another instruction (CMQP) that sets flags:

"".main STEXT size=267 args=0x0 locals=0x68
	0x0000 00000 (/Users/jonathon.frisby/vm/test/main.go:29)	TEXT	"".main(SB), ABIInternal, $104-0
	0x0000 00000 (/Users/jonathon.frisby/vm/test/main.go:29)	MOVQ	(TLS), CX
	0x0009 00009 (/Users/jonathon.frisby/vm/test/main.go:29)	CMPQ	SP, 16(CX)
	0x000d 00013 (/Users/jonathon.frisby/vm/test/main.go:29)	JLS	257
	0x0013 00019 (/Users/jonathon.frisby/vm/test/main.go:29)	SUBQ	$104, SP
	0x0017 00023 (/Users/jonathon.frisby/vm/test/main.go:29)	MOVQ	BP, 96(SP)
	0x001c 00028 (/Users/jonathon.frisby/vm/test/main.go:29)	LEAQ	96(SP), BP
	0x0021 00033 (/Users/jonathon.frisby/vm/test/main.go:29)	FUNCDATA	$0, gclocals·69c1753bd5f81501d95132d08af04464(SB)
	0x0021 00033 (/Users/jonathon.frisby/vm/test/main.go:29)	FUNCDATA	$1, gclocals·568470801006e5c0dc3947ea998fe279(SB)
	0x0021 00033 (/Users/jonathon.frisby/vm/test/main.go:29)	FUNCDATA	$3, gclocals·f08c0631293fa71ce239c95c8d919476(SB)
	0x0021 00033 (/Users/jonathon.frisby/vm/test/main.go:29)	FUNCDATA	$4, "".main.stkobj(SB)
	0x0021 00033 (/Users/jonathon.frisby/vm/test/main.go:31)	PCDATA	$2, $1
	0x0021 00033 (/Users/jonathon.frisby/vm/test/main.go:31)	PCDATA	$0, $0
	0x0021 00033 (/Users/jonathon.frisby/vm/test/main.go:31)	XORL	AX, AX
	0x0023 00035 (/Users/jonathon.frisby/vm/test/main.go:31)	PCDATA	$2, $0
	0x0023 00035 (/Users/jonathon.frisby/vm/test/main.go:31)	TESTB	AL, (AX)
	0x0025 00037 (/Users/jonathon.frisby/vm/test/main.go:31)	XCHGL	AX, AX
	0x0026 00038 (/Users/jonathon.frisby/vm/test/main.go:20)	PCDATA	$0, $1
	0x0026 00038 (/Users/jonathon.frisby/vm/test/main.go:20)	XORPS	X0, X0
	0x0029 00041 (/Users/jonathon.frisby/vm/test/main.go:20)	MOVUPS	X0, ""..autotmp_19+80(SP)
	0x002e 00046 (/Users/jonathon.frisby/vm/test/main.go:20)	PCDATA	$2, $2
	0x002e 00046 (/Users/jonathon.frisby/vm/test/main.go:20)	LEAQ	type.*"".APIErrorResponseImpl(SB), CX
	0x0035 00053 (/Users/jonathon.frisby/vm/test/main.go:20)	PCDATA	$2, $0
	0x0035 00053 (/Users/jonathon.frisby/vm/test/main.go:20)	MOVQ	CX, ""..autotmp_19+80(SP)
	0x003a 00058 (/Users/jonathon.frisby/vm/test/main.go:20)	MOVQ	$0, ""..autotmp_19+88(SP)
	0x0043 00067 (/Users/jonathon.frisby/vm/test/main.go:20)	XCHGL	AX, AX
	0x0044 00068 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $2
	0x0044 00068 ($GOROOT/src/fmt/print.go:208)	MOVQ	os.Stdout(SB), CX
	0x004b 00075 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $3
	0x004b 00075 ($GOROOT/src/fmt/print.go:208)	LEAQ	go.itab.*os.File,io.Writer(SB), DX
	0x0052 00082 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $2
	0x0052 00082 ($GOROOT/src/fmt/print.go:208)	MOVQ	DX, (SP)
	0x0056 00086 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $0
	0x0056 00086 ($GOROOT/src/fmt/print.go:208)	MOVQ	CX, 8(SP)
	0x005b 00091 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $2
	0x005b 00091 ($GOROOT/src/fmt/print.go:208)	LEAQ	go.string."%+v\n"(SB), CX
	0x0062 00098 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $0
	0x0062 00098 ($GOROOT/src/fmt/print.go:208)	MOVQ	CX, 16(SP)
	0x0067 00103 ($GOROOT/src/fmt/print.go:208)	MOVQ	$4, 24(SP)
	0x0070 00112 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $2
	0x0070 00112 ($GOROOT/src/fmt/print.go:208)	PCDATA	$0, $0
	0x0070 00112 ($GOROOT/src/fmt/print.go:208)	LEAQ	""..autotmp_19+80(SP), CX
	0x0075 00117 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $0
	0x0075 00117 ($GOROOT/src/fmt/print.go:208)	MOVQ	CX, 32(SP)
	0x007a 00122 ($GOROOT/src/fmt/print.go:208)	MOVQ	$1, 40(SP)
	0x0083 00131 ($GOROOT/src/fmt/print.go:208)	MOVQ	$1, 48(SP)
	0x008c 00140 ($GOROOT/src/fmt/print.go:208)	CALL	fmt.Fprintf(SB)
	0x0091 00145 (/Users/jonathon.frisby/vm/test/main.go:21)	PCDATA	$2, $1
	0x0091 00145 (/Users/jonathon.frisby/vm/test/main.go:21)	XORL	AX, AX
	0x0093 00147 (/Users/jonathon.frisby/vm/test/main.go:21)	PCDATA	$2, $4
	0x0093 00147 (/Users/jonathon.frisby/vm/test/main.go:21)	MOVQ	(AX), CX
	0x0096 00150 (/Users/jonathon.frisby/vm/test/main.go:21)	PCDATA	$2, $2
	0x0096 00150 (/Users/jonathon.frisby/vm/test/main.go:21)	CMPQ	8(AX), $9
	0x009b 00155 (/Users/jonathon.frisby/vm/test/main.go:31)	JNE	178
	0x009d 00157 (/Users/jonathon.frisby/vm/test/main.go:31)	MOVQ	$8461244823619071827, AX
	0x00a7 00167 (/Users/jonathon.frisby/vm/test/main.go:31)	CMPQ	(CX), AX
	0x00aa 00170 (/Users/jonathon.frisby/vm/test/main.go:31)	JNE	178
	0x00ac 00172 (/Users/jonathon.frisby/vm/test/main.go:31)	PCDATA	$2, $0
	0x00ac 00172 (/Users/jonathon.frisby/vm/test/main.go:31)	CMPB	8(CX), $101
	0x00b0 00176 (/Users/jonathon.frisby/vm/test/main.go:31)	JEQ	188
	0x00b2 00178 (<unknown line number>)	PCDATA	$2, $-2
	0x00b2 00178 (<unknown line number>)	PCDATA	$0, $-2
	0x00b2 00178 (<unknown line number>)	MOVQ	96(SP), BP
	0x00b7 00183 (<unknown line number>)	ADDQ	$104, SP
	0x00bb 00187 (<unknown line number>)	RET
	0x00bc 00188 (/Users/jonathon.frisby/vm/test/main.go:32)	PCDATA	$2, $0
	0x00bc 00188 (/Users/jonathon.frisby/vm/test/main.go:32)	PCDATA	$0, $0
	0x00bc 00188 (/Users/jonathon.frisby/vm/test/main.go:32)	XCHGL	AX, AX
	0x00bd 00189 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $1
	0x00bd 00189 ($GOROOT/src/fmt/print.go:208)	MOVQ	os.Stdout(SB), AX
	0x00c4 00196 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $4
	0x00c4 00196 ($GOROOT/src/fmt/print.go:208)	LEAQ	go.itab.*os.File,io.Writer(SB), CX
	0x00cb 00203 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $1
	0x00cb 00203 ($GOROOT/src/fmt/print.go:208)	MOVQ	CX, (SP)
	0x00cf 00207 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $0
	0x00cf 00207 ($GOROOT/src/fmt/print.go:208)	MOVQ	AX, 8(SP)
	0x00d4 00212 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $1
	0x00d4 00212 ($GOROOT/src/fmt/print.go:208)	LEAQ	go.string."Shouldn't get here of course"(SB), AX
	0x00db 00219 ($GOROOT/src/fmt/print.go:208)	PCDATA	$2, $0
	0x00db 00219 ($GOROOT/src/fmt/print.go:208)	MOVQ	AX, 16(SP)
	0x00e0 00224 ($GOROOT/src/fmt/print.go:208)	MOVQ	$28, 24(SP)
	0x00e9 00233 ($GOROOT/src/fmt/print.go:208)	MOVQ	$0, 32(SP)
	0x00f2 00242 ($GOROOT/src/fmt/print.go:208)	XORPS	X0, X0
	0x00f5 00245 ($GOROOT/src/fmt/print.go:208)	MOVUPS	X0, 40(SP)
	0x00fa 00250 ($GOROOT/src/fmt/print.go:208)	CALL	fmt.Fprintf(SB)
	0x00ff 00255 (<unknown line number>)	JMP	178
	0x0101 00257 (<unknown line number>)	NOP
	0x0101 00257 (/Users/jonathon.frisby/vm/test/main.go:29)	PCDATA	$0, $-1
	0x0101 00257 (/Users/jonathon.frisby/vm/test/main.go:29)	PCDATA	$2, $-1
	0x0101 00257 (/Users/jonathon.frisby/vm/test/main.go:29)	CALL	runtime.morestack_noctxt(SB)
	0x0106 00262 (/Users/jonathon.frisby/vm/test/main.go:29)	JMP	0
	0x0000 65 48 8b 0c 25 00 00 00 00 48 3b 61 10 0f 86 ee  eH..%....H;a....
	0x0010 00 00 00 48 83 ec 68 48 89 6c 24 60 48 8d 6c 24  ...H..hH.l$`H.l$
	0x0020 60 31 c0 84 00 90 0f 57 c0 0f 11 44 24 50 48 8d  `1.....W...D$PH.
	0x0030 0d 00 00 00 00 48 89 4c 24 50 48 c7 44 24 58 00  .....H.L$PH.D$X.
	0x0040 00 00 00 90 48 8b 0d 00 00 00 00 48 8d 15 00 00  ....H......H....
	0x0050 00 00 48 89 14 24 48 89 4c 24 08 48 8d 0d 00 00  ..H..$H.L$.H....
	0x0060 00 00 48 89 4c 24 10 48 c7 44 24 18 04 00 00 00  ..H.L$.H.D$.....
	0x0070 48 8d 4c 24 50 48 89 4c 24 20 48 c7 44 24 28 01  H.L$PH.L$ H.D$(.
	0x0080 00 00 00 48 c7 44 24 30 01 00 00 00 e8 00 00 00  ...H.D$0........
	0x0090 00 31 c0 48 8b 08 48 83 78 08 09 75 15 48 b8 53  .1.H..H.x..u.H.S
	0x00a0 6f 6d 65 56 61 6c 75 48 39 01 75 06 80 79 08 65  omeValuH9.u..y.e
	0x00b0 74 0a 48 8b 6c 24 60 48 83 c4 68 c3 90 48 8b 05  t.H.l$`H..h..H..
	0x00c0 00 00 00 00 48 8d 0d 00 00 00 00 48 89 0c 24 48  ....H......H..$H
	0x00d0 89 44 24 08 48 8d 05 00 00 00 00 48 89 44 24 10  .D$.H......H.D$.
	0x00e0 48 c7 44 24 18 1c 00 00 00 48 c7 44 24 20 00 00  H.D$.....H.D$ ..
	0x00f0 00 00 0f 57 c0 0f 11 44 24 28 e8 00 00 00 00 eb  ...W...D$(......
	0x0100 b1 e8 00 00 00 00 e9 f5 fe ff ff                 ...........
	rel 5+4 t=16 TLS+0
	rel 49+4 t=15 type.*"".APIErrorResponseImpl+0
	rel 71+4 t=15 os.Stdout+0
	rel 78+4 t=15 go.itab.*os.File,io.Writer+0
	rel 94+4 t=15 go.string."%+v\n"+0
	rel 141+4 t=8 fmt.Fprintf+0
	rel 192+4 t=15 os.Stdout+0
	rel 199+4 t=15 go.itab.*os.File,io.Writer+0
	rel 215+4 t=15 go.string."Shouldn't get here of course"+0
	rel 251+4 t=8 fmt.Fprintf+0
	rel 258+4 t=8 runtime.morestack_noctxt+0
@MrJoy

This comment has been minimized.

Copy link
Author

commented Aug 20, 2019

So this looks like it's setting AX to zero (XORL AX, AX), then attempting to dereference that -- which makes sense, since I'm dereferencing a nil pointer in my example -- as part of a test whose results... never get used?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 20, 2019

The TESTB is a nil pointer test, so that the nil pointer panic will occur at the right point during execution.

@mdempsky mdempsky changed the title runtime: misleading stack trace related to struct embedding. cmd/compile: misleading stack trace related to struct embedding. Aug 20, 2019

@mdempsky

This comment has been minimized.

Copy link
Member

commented Aug 20, 2019

I think this is an issue with one of the compiler's optimization passes. It doesn't seem like a runtime issue to me.

Also, //go:noinline also works to get the correct line number. I'm guessing when APIErrorResponseImpl.Error is inlined, the position information for its nil pointer check is getting preserved instead of the one that we actually want.

@mdempsky

This comment has been minimized.

Copy link
Member

commented Aug 20, 2019

Slightly simpler repro:

package main

type Inner struct {
	Err string
}

//go:noinline
func (i *Inner) Error() string {
	if i == nil {
		return ""
	}
	return i.Err
}

type Outer struct {
	Inner
}

func main() {
	var o *Outer
	println(o.Error())
}

Remove //go:noinline to see the panic incorrectly attributed to i.Err, whereas the panic is actually happening in the generated wrapper method.

Lastly, it's nothing related to the wrapper method being compiler generated and its stack frame elided by the runtime. If you explicitly add:

func (o *Outer) Error() string { return o.Inner.Error() }

then you see the same behavior, except with the extra stack frame.

@mdempsky

This comment has been minimized.

Copy link
Member

commented Aug 20, 2019

The problem is both a bit more benign and more interesting than that actually.

Looking at GOSSAFUNC=main output, the correct NilCheck is preserved in nilcheckelim.

But eventually we get to late nilcheck, and we have:

v5 (+20) = LoweredNilCheck <void> v4 v1
v8 (20) = InlMark <void> [0] v1
v27 (11) = MOVQload <int> [8] v4 v1 (~r0.len[int])

and LoweredNilCheck gets eliminated. This causes the panic to happen at the MOVQload at line 11 (return i.Err), instead of the LoweredNilCheck at line 20 (println(o.Error())).

One option might be that if late nilcheck eliminates a nil check X because of subsequent load L, it should change L's position information to X's. This might be a little confusing to people who are single stepping through the assembly, but it would give better (correct) stack traces for normal production builds.

@mdempsky

This comment has been minimized.

Copy link
Member

commented Aug 20, 2019

It looks like LoweredNilCheck is an IsStmt position (based on (+20) above), and that nilcheckelim2 has some logic to try to preserve those. It seems like that code's not working here though.

/cc @dr2chase for IsStmt stuff

@randall77

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2019

I think @mdempsky 's idea of updating the load's line number works.

We're effectively combining the nil check and the load into a single instruction. We can only pick one of the line numbers, so no matter what we're losing information. For panic backtraces, it always makes sense to use the line number of the nilcheck, because that's the semantic location of the panic.

For things like pprof backtraces, it isn't so clear. Maybe we want the line numbers to indicate the load location, not the nilcheck location. But that's a pretty vague objection, panic backtraces are more important IMO.

There are some trickier cases, like two nilchecks with different line numbers that are both made redundant by a subsequent load. We'd need to pick the first nilcheck in the store chain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.