Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: nil-dereference panic refers to addr=0x8 #56568

Open
dennwc opened this issue Nov 4, 2022 · 6 comments
Open

runtime: nil-dereference panic refers to addr=0x8 #56568

dennwc opened this issue Nov 4, 2022 · 6 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@dennwc
Copy link
Contributor

dennwc commented Nov 4, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18.7 linux/amd64

Does this issue reproduce with the latest release?

Yes, confirmed via playground.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/snap/go/9981"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/snap/go/9981/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.7"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build662566094=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Run the following program:

package main

type A struct {
	F int
}

func (a *A) Foo() {
	if a != nil {
		a.F++
	}
}

type B struct {
	A
}

func main() {
	var a *A
	a.Foo()
	var b *B
	b.Foo()
}

Link to a playground.

What did you expect to see?

No panic for both method calls.

What did you see instead?

Panic on the b.Foo() method call:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x454ea2]

goroutine 1 [running]:
main.main()
	/tmp/sandbox1079324543/prog.go:21 +0x2

Discussion

Technically, I would expect b.Foo() to be equivalent to (*A).Foo(nil) in this case, at least when the struct offset of embedded A field is zero.

It is clear that the following would case panics in any case (struct offset is not zero, equivalent to (*A).Foo(0x8)):

type B struct {
	F2 int
	A
}

As well as this code (always requires pointer de-reference):

type B struct {
	*A
}

If I understand correctly, as of now the behavior of b.Foo() is similar to (*A).Foo(&b.A), where first b.A causes an operation analogous to pointer de-reference. I think this is a surprising behavior that makes it hard to embed structs (promoting methods) and make the new type nil-safe.

The only non-error-prone way to fix it currently, is to make a named type instead of embedding and manually expose all the methods:

type B A

func (b *B) Foo() {
	(*A)(b).Foo()
}

As I mentioned previously, I see no technical reason why the compiler cannot call (*A).Foo(nil) instead of requiring an additional boilerplate. Otherwise it defeats the purpose of struct composition, when nil-safety is required for promoted methods.

I believe this change doesn't break the backward compatibility promise, only makes more programs safe(er).

Open questions are:

  • Can this be implemented without slowing down existing programs?
  • Should the case of (*A).Foo(0x8) be translated to (*A).Foo(nil) as well, so that the struct offset doesn't matter? This complicates the implementation, so likely is unnecessary.

I admit this is not a bug per-se and works "as intended", however it's surprising enough to be worth fixing. Also, when using unsafe/cgo, I was able to somehow confuse the compiler to emit case similar to (*A).Foo(0x8) (thus skipping the initial nil check). I will try to provide a reproducer for it as well.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 4, 2022
@cherrymui
Copy link
Member

cherrymui commented Nov 4, 2022

I would expect b.Foo() to be equivalent to (*A).Foo(nil)

I'd expect b.Foo() be equivalent to b.A.Foo() (the embedding makes the .A implicit). And per spec, for a nil b, b.A panics.

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Nov 4, 2022

As you say, this is how the language works. It would be quite subtle for the language to behave differently depending on whether the embedded field is the first field in a struct. In any case, the way to propose a language change is the proposal process described at https://go.dev/s/proposal. However, iIn this case I think it is unlikely that the proposal would be accepted.

@ianlancetaylor ianlancetaylor closed this as not planned Won't fix, can't repro, duplicate, stale Nov 4, 2022
@dennwc
Copy link
Contributor Author

dennwc commented Nov 5, 2022

And per spec, for a nil b, b.A panics.

@cherrymui Thank you, indeed I missed that it's actually guaranteed by the spec.

Having said that, I didn't find references in the spec about how exactly the promoted methods must be implemented. The spec currently specifies:

If S contains an embedded field T, the method sets of S and *S both include promoted methods with receiver T. The method set of *S also includes promoted methods with receiver *T.

There's no explicit mention that b.Foo() must be implemented as b.A.Foo(), causing evaluation of b.A instead of directly calling (*A).Foo(<addr of A>).

The reason it caused a confusion for me is closer to this example:

package main

import "unsafe"

type A struct {
	F    int
	next *A
}

func (a *A) NextA() *A {
	return (*A)(unsafe.Pointer(a.next))
}

func (a *A) Foo() {
	if a != nil {
		a.F++
	}
}

type B struct {
	A
}

func (b *B) NextB() *B {
	return (*B)(unsafe.Pointer(b.NextA()))
}

func main() {
	var a *A
	a.Foo()
	var b *B
	c := b.NextB()
	c.Foo()
}

This might be a different bug, or consequences of compiler optimization, but this is what can be observed here:

  • Assuming the spec implies that b.NextA() is the same as b.A.NextA(), the panic must be triggered at line 25 where b.A is first evaluated. But it doesn't. The actual panic is at line 11. Which leads us to...
  • The fact that the panic at line 11 says addr=0x8, meaning the compiler successfully evaluated &b.A without de-referencing b first. This is what my original message implies: there's no need to evaluate b.A when calling a promoted method.

To be clear, it still panics, but for a different reason: de-reference of (*A)(0x8) instead of (*B)(0x0).

Thus, going back to this example, either the spec should allow evaluation of &b.A during the promoted method call without de-referencing b (in which case my original proposal/bug report is valid), or it should explicitly mention that promoted method call must evaluate b.A (which is not how the compiler works currently, according to the example).

So @ianlancetaylor , I'm still not exactly sure how to proceed with this report. Should I then open a separate bug report for the second example? Should I open a proposal to clarify the spec in a way that allows promoting nil-protected methods with this example (e.g. propose an offset-independent implementation)? Both?

In fact, in the real program was able to pass the 0x8 pointer around, which caused a similar panic in unrelated part of the program. I was unable to reproduce it in the playground yet, but maybe I will be able eventually. This is why I think it may still be a bug of some kind.

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Nov 5, 2022

That second case does seem like a potential quality of implementation issue. gccgo reports

panic: runtime error: invalid memory address or nil pointer dereference

goroutine 1 [running]:
main.B.NextB
	foo.go:25
main.main
	foo.go:32

That is, I think we agree that the program should crash, but the crash report from the gc compiler could be clearer as to where the bad nil dereference occurs.

Go 1.4 reports

[signal 0xb code=0x1 addr=0x0 pc=0x400c45]

goroutine 1 [running]:
main.(*B).NextB(0x0, 0x0)
	/home/iant/foo11.go:25 +0x15
main.main()
	/home/iant/foo11.go:32 +0x37

but after that it gets more confusing due to the fact that older gc releases were not as reliable at reporting failures in inlined functions. At least I think that's the reason, I didn't look too deeply. The reference to addr=0x8 starts in Go 1.10.

Reopening in case this is something we want to fix in the gc compiler.

@ianlancetaylor ianlancetaylor reopened this Nov 5, 2022
@ianlancetaylor ianlancetaylor changed the title runtime: Panic calling promoted nil-protected methods on nil embedding struct pointers runtime: nil-dereference panic refers to addr=0x8 Nov 5, 2022
@mknyszek mknyszek added this to the Backlog milestone Nov 7, 2022
@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 7, 2022
@cherrymui
Copy link
Member

cherrymui commented Nov 8, 2022

In the example above, both (*B).NextB and (*A).NextA are inlined into main. In the AST there is a nil check at line 25 (from NextB) before the load at line 11. But at a later stage the compiler somehow schedules the load before the nil check. I think we have logic to schedule nil checks before loads. This may be a compiler bug. I'll take a look. Thanks.

@randall77
Copy link
Contributor

randall77 commented Nov 8, 2022

Possibly related to #42673

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Status: Todo
Development

No branches or pull requests

6 participants