Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: internal compiler error: panic: interface conversion: types.Object is nil, not *ir.Name #67746

Closed
Cylkal opened this issue May 31, 2024 · 13 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@Cylkal
Copy link

Cylkal commented May 31, 2024

Go version

go version go1.22.3 linux/amd64

Output of go env in your module/workspace:

GO111MODULE='on'
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOROOT='/root/go'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/root/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.3'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/root/code/test/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1081386337=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I have a project that uses ClickHouse, and it compiles successfully. However, when the pprof samples collected online service are placed in the main package of default.pgo, the compilation fails.

I apologize for not being able to provide the source code directly, but here are the pprof sample file default.pgo and the code main.go that can reproduce the compilation error.

ls

default.pgo  go.mod  go.sum  main.go
// main.go
// don't need to run
package main

import (
	"context"
	_ "net/http/pprof"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	cfg, err := clickhouse.ParseDSN("clickhouse://user:password@127.0.0.1:19000/default")
	if err != nil {
		panic(err)
	}
	conn, err := clickhouse.Open(cfg)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	b, err := conn.PrepareBatch(
		context.Background(),
		"INSERT INTO some_table (x) VALUES (?)",
	)
	if err != nil {
		panic(err)
	}
	b.Append(1)
	b.Send()
}
// go.mod
module test

go 1.22.3

require (
	github.com/ClickHouse/ch-go v0.61.5 // indirect
	github.com/ClickHouse/clickhouse-go/v2 v2.25.0
)

require (
	github.com/andybalholm/brotli v1.1.0 // indirect
	github.com/go-faster/city v1.0.1 // indirect
	github.com/go-faster/errors v0.7.1 // indirect
	github.com/google/uuid v1.6.0 // indirect
	github.com/klauspost/compress v1.17.7 // indirect
	github.com/kr/text v0.2.0 // indirect
	github.com/paulmach/orb v0.11.1 // indirect
	github.com/pierrec/lz4/v4 v4.1.21 // indirect
	github.com/pkg/errors v0.9.1 // indirect
	github.com/segmentio/asm v1.2.0 // indirect
	github.com/shopspring/decimal v1.4.0 // indirect
	go.opentelemetry.io/otel v1.26.0 // indirect
	go.opentelemetry.io/otel/trace v1.26.0 // indirect
	golang.org/x/sys v0.18.0 // indirect
	gopkg.in/yaml.v3 v3.0.1 // indirect
)

What did you see happen?

go build

# github.com/ClickHouse/clickhouse-go/v2/ext
/root/go/pkg/mod/github.com/!click!house/ch-go@v0.61.5/proto/col_nullable.go:20:6: internal compiler error: panic: interface conversion: types.Object is nil, not *ir.Name

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new

When the PGO file is removed, i.e., when PGO is disabled, it compiles successfully.

The PGO file comes from my online service, please rename the file profile.pb.gz to default.pgo.
profile.pb.gz

What did you expect to see?

Build pass when enable PGO

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label May 31, 2024
@cuonglm
Copy link
Member

cuonglm commented May 31, 2024

Interestingly, this is fixed at tip after https://go-review.googlesource.com/c/go/+/578815

How was your default.pgo generated?

@Cylkal
Copy link
Author

Cylkal commented Jun 3, 2024

@cuonglm It was collected from my online service by using curl -o pprof.profile http://localhost:7999/debug/pprof/profile, and the file has been processed. For example, using '--focus clickhouse' on the original sample file outputs only the samples related to ClickHouse. Of course, the original sample file can also trigger this issue.

I tried writing a separate demo program to perform batch insert operations into ClickHouse and found that this issue does not occur. This problem only appears when collecting samples from the online service.

@mknyszek
Copy link
Contributor

mknyszek commented Jun 3, 2024

CC @golang/compiler

@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 3, 2024
@mknyszek mknyszek added this to the Backlog milestone Jun 3, 2024
@thanm
Copy link
Contributor

thanm commented Jun 3, 2024

@cuonglm I think you mean fixed by https://go.dev/cl/581436, no?

@cuonglm
Copy link
Member

cuonglm commented Jun 3, 2024

@cuonglm I think you mean fixed by https://go.dev/cl/581436, no?

No, because go1.22.3 includes this CL but still trigger ICE.

Unless my operation was wrong, my git bisect points to https://go-review.googlesource.com/c/go/+/578815 which does fix this issue.

@thanm
Copy link
Contributor

thanm commented Jun 3, 2024

@cuonglm thanks for that clarification, I hadn't realized that we backported 581436. Interesting.

@thanm
Copy link
Contributor

thanm commented Jun 3, 2024

Stack trace from go build -gcflags=all=-d=panic=1:

goroutine 1 [running]:
runtime/debug.Stack()
	runtime/debug/stack.go:24 +0x5e
cmd/compile/internal/base.FatalfAt({0x54f3e8?, 0xc0?}, {0xdf1b88, 0x9}, {0xc00054f418, 0x1, 0x1})
	cmd/compile/internal/base/print.go:225 +0x1d7
cmd/compile/internal/base.Fatalf(...)
	cmd/compile/internal/base/print.go:194
cmd/compile/internal/gc.handlePanic()
	cmd/compile/internal/gc/main.go:52 +0x90
panic({0xd722c0?, 0xc000c10f30?})
	runtime/panic.go:770 +0x132
cmd/compile/internal/escape.(*batch).initFunc(...)
	cmd/compile/internal/escape/escape.go:199
cmd/compile/internal/escape.Batch({0xc000c0e638, 0x1, 0xc000a339e0?}, 0x0?)
	cmd/compile/internal/escape/escape.go:139 +0x19de
cmd/compile/internal/ir.(*bottomUpVisitor).visit(0xc000c10480, 0xc000a339e0)
	cmd/compile/internal/ir/scc.go:121 +0x2e2
cmd/compile/internal/ir.(*bottomUpVisitor).visit.func1({0xf64760?, 0xc000a339e0?})
	cmd/compile/internal/ir/scc.go:78 +0x3b
cmd/compile/internal/ir.(*bottomUpVisitor).visit.func2({0xf64460, 0xc000be2ba0})
	cmd/compile/internal/ir/scc.go:92 +0xf7
cmd/compile/internal/ir.Visit.func1({0xf64460, 0xc000be2ba0})
	cmd/compile/internal/ir/visit.go:105 +0x2d
cmd/compile/internal/ir.(*CallExpr).doChildren(0xc000bdcfa0, 0xc000c20270)
	cmd/compile/internal/ir/node_gen.go:298 +0x9c
cmd/compile/internal/ir.DoChildren(...)
	cmd/compile/internal/ir/visit.go:94
cmd/compile/internal/ir.Visit.func1({0xf646a0, 0xc000bdcfa0})
	cmd/compile/internal/ir/visit.go:106 +0x51
cmd/compile/internal/ir.doNodes(...)
	cmd/compile/internal/ir/node_gen.go:1797
cmd/compile/internal/ir.(*ReturnStmt).doChildren(0xc000bc7cc0, 0xc000c20270)
	cmd/compile/internal/ir/node_gen.go:1231 +0xbc
cmd/compile/internal/ir.DoChildren(...)
	cmd/compile/internal/ir/visit.go:94
cmd/compile/internal/ir.Visit.func1({0xf649a0, 0xc000bc7cc0})
	cmd/compile/internal/ir/visit.go:106 +0x51
cmd/compile/internal/ir.doNodes(...)
	cmd/compile/internal/ir/node_gen.go:1797
cmd/compile/internal/ir.(*Func).doChildren(0xf64760?, 0xc000c20270)
	cmd/compile/internal/ir/func.go:177 +0x4e
cmd/compile/internal/ir.DoChildren(...)
	cmd/compile/internal/ir/visit.go:94
cmd/compile/internal/ir.Visit.func1({0xf64760, 0xc000be1440})
	cmd/compile/internal/ir/visit.go:106 +0x51
cmd/compile/internal/ir.Visit({0xf64760, 0xc000be1440}, 0xc000c0f080)
	cmd/compile/internal/ir/visit.go:108 +0xb3
cmd/compile/internal/ir.(*bottomUpVisitor).visit(0xc000c10480, 0xc000be1440)
	cmd/compile/internal/ir/scc.go:84 +0x1bf
cmd/compile/internal/ir.VisitFuncsBottomUp({0xc000a76488, 0x43, 0x2?}, 0xe2af68)
	cmd/compile/internal/ir/scc.go:58 +0xb8
cmd/compile/internal/escape.Funcs(...)
	cmd/compile/internal/escape/escape.go:122
cmd/compile/internal/gc.Main(0xe2adb0)
	cmd/compile/internal/gc/main.go:256 +0x1076
main.main()
	cmd/compile/main.go:57 +0xf9

so this appears to be an escape analysis problem, at least initially.

@thanm
Copy link
Contributor

thanm commented Jun 4, 2024

OK, I spent a little while debugging this.

Escape analysis is bombing in this part of initFunc:

// Initialize resultIndex for result parameters.
for i, f := range fn.Type().Results() {
	e.oldLoc(f.Nname.(*ir.Name)).resultIndex = 1 + i
}

because f.Nname is nil as opposed to pointing to the name for the function, which in this case is proto.Nullable[uint64].IsSet.

What's happening here is that at some point during the compilation of the offending package github.com/ClickHouse/clickhouse-go/v2/ext we encounter a reference to this specific generic function, and the code that reads the export data marks it for inclusion/compilation within the current package. The key bit of code here is this:

https://go.googlesource.com/go/+/5bf8c0cf09ee5c7e5a37ab90afcce154ab716a97/src/cmd/compile/internal/noder/reader.go#1123

What's supposed to happen at that point is that we'll read in the body for the function, and in fact there is a call to r.addBody later in funcExt, e.g. here.

The problem: at the point of that call, r.curfn is nil, and as a result instead of immediately reading in the body, it gets tacked onto the todoBodies list here.

Ordinarily this would be fine, but: at the point that this happens we're not doing a "regular" read of the export data, we're looking things up as part of PGO devirtualization. Here's the stack trace:

      cmd/compile/internal/noder.(*reader).funcExt /w/go1.22/src/cmd/compile/internal/noder/reader.go:1110
      cmd/compile/internal/noder.(*reader).method /w/go1.22/src/cmd/compile/internal/noder/reader.go:1039
      cmd/compile/internal/noder.(*pkgReader).objIdxMayFail /w/go1.22/src/cmd/compile/internal/noder/reader.go:818
      cmd/compile/internal/noder.(*pkgReader).objIdx /w/go1.22/src/cmd/compile/internal/noder/reader.go:672
      cmd/compile/internal/noder.(*pkgReader).objInstIdx /w/go1.22/src/cmd/compile/internal/noder/reader.go:664
      cmd/compile/internal/noder.(*reader).obj /w/go1.22/src/cmd/compile/internal/noder/reader.go:636
      ...
      <a couple hundred similar frames elided>
      ...
      cmd/compile/internal/noder.(*reader).doTyp /w/go1.22/src/cmd/compile/internal/noder/reader.go:524
      cmd/compile/internal/noder.(*pkgReader).typIdx /w/go1.22/src/cmd/compile/internal/noder/reader.go:431
      cmd/compile/internal/noder.(*reader).typWrapped /w/go1.22/src/cmd/compile/internal/noder/reader.go:390
      cmd/compile/internal/noder.(*pkgReader).objIdxMayFail /w/go1.22/src/cmd/compile/internal/noder/reader.go:807
      cmd/compile/internal/noder.lookupMethod /w/go1.22/src/cmd/compile/internal/noder/unified.go:113
      cmd/compile/internal/noder.LookupFunc /w/go1.22/src/cmd/compile/internal/noder/unified.go:65
      cmd/compile/internal/pgo.addIndirectEdges /w/go1.22/src/cmd/compile/internal/pgo/irgraph.go:443
      cmd/compile/internal/pgo.createIRGraph /w/go1.22/src/cmd/compile/internal/pgo/irgraph.go:291
      cmd/compile/internal/pgo.New /w/go1.22/src/cmd/compile/internal/pgo/irgraph.go:192
      cmd/compile/internal/gc.Main /w/go1.22/src/cmd/compile/internal/gc/main.go:221

So essentially we wind up with some dangling references on todoBodies, and we never actually read the body of the function in question.

Not sure yet what the best overall approach is here. I am also unsure as to why this problem would suddenly go away on tip, since as far as can see there haven't been specific changes there that would ameliorate the problem.

More work needed. CC @prattmic for additional eyeballs relating to PGO stuff.

@prattmic
Copy link
Member

prattmic commented Jun 4, 2024

@thanm Thanks for taking a look. noder.LookupFunc does not currently handle instantiating generic functions, so that part is unsurprising. However, the expectation is that LookupFunc would simply never find the symbol name that appears in the profile at all because that includes the .go.shape.blah component that doesn't exist in the high-level symbol name.

Since this is a method, we first look up the type, then search for the method. Perhaps one of the other methods on proto.Nullable[uint64] has already been instantiated, so we successfully find the type after all, but IsSet itself isn't instantiated? Just a thought, I'm not too familiar with instantiations.

@thanm
Copy link
Contributor

thanm commented Jun 5, 2024

However, the expectation is that LookupFunc would simply never find
the symbol name that appears in the profile at all because that includes
the .go.shape.blah component that doesn't exist in the high-level symbol name

Yes, this is certainly true. However what's happening here is that the pgoir stuff is looking for (and finding) the method

github.com/ClickHouse/clickhouse-go/v2/lib/column.(*Map).AppendRow

Nothing wrong there, but the body of AppendRow refers to a struct type that has a data member of generic type (proto.Nullable[uint64]) which has a method proto.Nullable[uint64].IsSet) and it is that generic method that is causing the problem.

The detailed sequence of events is as follows:

  1. cmd/compile/internal/pgo.addIndirectEdges runs and pulls in the IR for column.(*Map).AppendRow since it is a hot indirect target.

  2. the importing process leaves a lot of dangling functions on the todoBodies list (as described above), including 'proto.Nullable[uint64]'. In addition, the imported adds proto.Nullable[uint64] to the needWrapperTypes list.

  3. The inliner runs, but it turns out that the top level package being compiled, github.com/ClickHouse/clickhouse-go/v2/ext is unusual and it has zero inlining opportunities. If the inliner had run, it would have made a call to readBodies, but this never happens.

  4. Wrapper generation runs (after inlining, before escape analysis). Since proto.Nullable[uint64] is on the needWrapperTypes list, we manufacture pointer wrappers for all its methods, including proto.(*Nullable[uint64]).IsSet. This new wrapper has a direct call to proto.Nullable[uint64].IsSet (which still doesn't have a func body).

  5. Now escape analysis runs. It processes the two IsSet methods, at which point it crashes.

So we have to have a lot of things lining up in exactly the right way for this to happen.

I am pretty sure that the right fix here is to add a cleanup call to readBodies from pgo.createIRGraph, but I am having a heck of a time trying to create a minimized test case that will reproduce the problem.

The thing I keep stumbling over in my smaller test case is that the generic type in question gets added to needWrapperTypes (which is what I want) but also at some later point gets added to haveWrapperTypes, which means we don't get the necessary wrapper. Very frustrating. I'll keep hacking away on the minimization.

@cuonglm
Copy link
Member

cuonglm commented Jun 6, 2024

@thanm I double check and the CL that changes to reflect package actually "fix" this issue.

It's very weird.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/591076 mentions this issue: cmd/go: new test case for pgo-devirt compile problem

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/591075 mentions this issue: cmd/compile/internal: add a PGO devirt post-lookup cleanup hook

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Development

No branches or pull requests

6 participants