
runtime: panic: runtime error: hash of unhashable type [2]string #67608

Open
titpetric opened this issue May 23, 2024 · 14 comments
Assignees
Labels
compiler/runtime: Issues related to the Go compiler and/or runtime.
NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
WaitingForInfo: Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@titpetric

titpetric commented May 23, 2024

Go version

go1.22.3 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.3'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/root/tyk/tyk/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build999686032=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I'm running some integration tests trying to upgrade to 1.22.3 and am encountering a panic which seems impossible.

What did you see happen?

I get the following panic during the execution of the integration test.

tyk-1             | panic: runtime error: hash of unhashable type [2]string
tyk-1             | 
tyk-1             | goroutine 54 [running]:
tyk-1             | go.opentelemetry.io/otel/exporters/otlp/otlptrace/internal/tracetransform.Spans({0xc00020af08, 0x4, 0xc000e6a080?})
tyk-1             | 	go.opentelemetry.io/otel/exporters/otlp/otlptrace@v1.26.0/internal/tracetransform/span.go:41 +0x2d9
tyk-1             | go.opentelemetry.io/otel/exporters/otlp/otlptrace.(*Exporter).ExportSpans(0xc000304370, {0x404dc18, 0xc0002dc0e0}, {0xc00020af08?, 0xc00008eef2?, 0xc0002936c0?})
tyk-1             | 	go.opentelemetry.io/otel/exporters/otlp/otlptrace@v1.26.0/exporter.go:31 +0x34
tyk-1             | go.opentelemetry.io/otel/sdk/trace.(*batchSpanProcessor).exportSpans(0xc00031c140, {0x404dba8, 0xc00017c6e0})
tyk-1             | 	go.opentelemetry.io/otel/sdk@v1.26.0/trace/batch_span_processor.go:277 +0x238
tyk-1             | go.opentelemetry.io/otel/sdk/trace.(*batchSpanProcessor).processQueue(0xc00031c140)
tyk-1             | 	go.opentelemetry.io/otel/sdk@v1.26.0/trace/batch_span_processor.go:305 +0x36e
tyk-1             | go.opentelemetry.io/otel/sdk/trace.NewBatchSpanProcessor.func1()
tyk-1             | 	go.opentelemetry.io/otel/sdk@v1.26.0/trace/batch_span_processor.go:117 +0x54
tyk-1             | created by go.opentelemetry.io/otel/sdk/trace.NewBatchSpanProcessor in goroutine 1
tyk-1             | 	go.opentelemetry.io/otel/sdk@v1.26.0/trace/batch_span_processor.go:115 +0x2e5
tyk-1 exited with code 2

So far it only happens when the binary is built with goreleaser. I've verified that the go version -m output matches between the breaking and the passing build. Both builds come from the same source tree, but a direct build from source doesn't trigger the panic. I've tried various debugging steps:

  • enabling/disabling -buildvcs=false and git --safe-directory
  • adding -race to the build (no races reported)
  • ensuring the go version -m output shows no differences
  • building with -gcflags='-N -l' to disable optimizations and inlining; the panic remains

The exact same goreleaser pipeline is used with recent 1.21 versions (1.21.8-1.21.10), and the resulting binary doesn't trigger the panic. The panic is reliably triggered with 1.22.3 and doesn't seem racy; however, I haven't been able to reproduce it with a direct build from source, using this Dockerfile or this one based on the 1.22-bookworm image. Both issue make build, which is essentially just a wrapper for go build -tags=goplugin -trimpath ..

One thing that made a difference was running the binary under the delve debugger: in that case, the panic doesn't occur. The panic itself is also strange, because [2]string appears to be a perfectly valid map key (playground: https://go.dev/play/p/tm_uKffqff0), and the source code doesn't look like it could trigger this exact panic (see the sketch after the list below):

  • span.go source on L41
  • the map key is a key struct with two fields, declared in local function scope, and the map value is a pointer
  • no idea where [2]string could be coming from
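
For illustration, here is a minimal sketch of the points above. The key struct is only a schematic paraphrase of the pattern in span.go (the field types are assumed, not copied from the otel source); the [2]string map mirrors the playground link and shows the array type is an ordinary hashable key, while the final line shows the kind of key that normally does produce this panic.

package main

import "fmt"

// Schematic paraphrase of the pattern in tracetransform/span.go: a small
// struct key declared for use in function scope, with a pointer value.
// The field types here are assumptions made for illustration only.
type key struct {
	res   string
	scope string
}

func main() {
	// A struct key with comparable fields is hashable.
	byKey := map[key]*int{}
	n := 1
	byKey[key{"res", "scope"}] = &n

	// [2]string is also a comparable, hashable map key, so the
	// reported panic should be impossible for this type.
	m := map[[2]string]bool{}
	m[[2]string{"a", "b"}] = true
	fmt.Println(byKey, m)

	// What "hash of unhashable type" normally means: using a
	// non-comparable type (slice, map, func) as a key through an
	// interface panics at run time instead of failing to compile.
	anyKeyed := map[any]bool{}
	anyKeyed[[]string{"a", "b"}] = true // panic: hash of unhashable type []string
}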

The build has been breaking with both the golang:1.22-bullseye and golang:1.22-bookworm (1.22.3) base images.

What did you expect to see?

no panic

titpetric changed the title from "runtime/build: impossible panic" to "runtime/build: panic: runtime error: hash of unhashable type [2]string" May 23, 2024
@ianlancetaylor
Contributor

CC @golang/runtime

This does seem impossible. Have you tried running the program under the race detector?

ianlancetaylor changed the title from "runtime/build: panic: runtime error: hash of unhashable type [2]string" to "runtime: panic: runtime error: hash of unhashable type [2]string" May 23, 2024
ianlancetaylor added the compiler/runtime label May 23, 2024
@titpetric
Author

> CC @golang/runtime
>
> This does seem impossible. Have you tried running the program under the race detector?

Added -race to the build (unsure if there are additional steps required); the panic remained and no race was reported.

cagedmantis added the NeedsInvestigation label May 24, 2024
cagedmantis added this to the Backlog milestone May 24, 2024
@titpetric
Author

titpetric commented May 27, 2024

Passing on gotip (tyk: devel go1.23-3776465 Fri May 24 22:23:55 2024 +0000), but panicking on >=1.22.0 <=1.22.3.

@randall77
Contributor

If you have a passing and a failing state, then a binary search might reveal which CL changed things.

Does your program use plugins? I noticed the -tags=goplugin build tag, which is suggestive of that.
I would not be surprised if this was a plugin bug. Getting type identity right in the presence of plugins is tricky.

@titpetric
Author

> If you have a passing and a failing state, then a binary search might reveal which CL changed things.

👍 - is there perhaps a guide we could follow, or do we just list the git hashes and have at it?

> Does your program use plugins? I noticed the -tags=goplugin build tag, which is suggestive of that. I would not be surprised if this was a plugin bug. Getting type identity right in the presence of plugins is tricky.

We do use plugins; however, no plugins are loaded as part of the test (no .so files, no plugin.Open). We can likely disable CGO (and with it plugins) and see if the issue persists.

@randall77
Contributor

You can use git bisect https://git-scm.com/docs/git-bisect
I have to relearn it each time I use it :(
At each stage you'll have to run make.bash in the Go repository and then build/run your test with the result.

@titpetric
Author

@randall77 one of our brilliant SRE guys managed to do as you suggested:

In light of the traced CL, is there some workaround for the behaviour, or some indication of what may be causing it? As mentioned, a direct build from source seems to pass, so there must be some wider environment difference at build or run time that results in the panic, meaning there should be some way to avoid it...

@randall77
Contributor

> the bisect seems to point to the fix to: https://go-review.googlesource.com/c/go/+/567335

Excellent, thanks. That CL certainly looks related.
I will investigate later today. Maybe we can roll that CL back, although it was fixing a different bug.

@randall77
Contributor

> I will investigate later today. Maybe we can roll that CL back, although it was fixing a different bug.

Never mind, that CL fixes the problem. So I guess we could reconsider backporting (which we chose not to do).
It is strange that your failure is only on 1.22, where #65957 was an issue since at least 1.18.

> Both builds come from the same source tree, but a direct build from source doesn't trigger the panic.

I'm afraid I don't understand this. What other build is there? You mention goreleaser, but I don't know what that is.

> is there some workaround for the behaviour

Maybe. Using the type [2]string in just the right package (go.opentelemetry.io/otel/exporters/otlp/otlptrace/internal?) might help.

@titpetric
Author

> It is strange that your failure is only on 1.22, where #65957 was an issue since at least 1.18.

Even stranger is that it only fails when built via goreleaser (release tooling). When built from source without that indirection, via Dockerfile, the failure doesn't appear. It also doesn't appear on 1.21 with the same goreleaser release pipeline.

I've asked if we could pinpoint the breaking CL as well.

> I'm afraid I don't understand this. What other build is there? You mention goreleaser, but I don't know what that is.

We build with two different processes, as I tried to describe in the issue:

  • a local Dockerfile, built directly with go build, which passes the test suite even with 1.22.3
  • a release CI action which uses goreleaser for building and packaging, with docker installed via deb (task test:build for a local partial build, particularly for amd64)

goreleaser is, in essence, release build tooling that wraps go build, creates deb and rpm packages, and ultimately builds a docker image where those packages are installed; our release build fails these CI tests, while the very minimal Dockerfile that skips all of those steps and just uses go build passes them.

> Maybe. Using the type [2]string in just the right package (go.opentelemetry.io/otel/exporters/otlp/otlptrace/internal?) might help.

I didn't catch your meaning there; the key is a struct{ptr, ptr} type, and it's in an imported third-party package, so type safety shouldn't allow using [2]string in lieu of the code that lives there.

@randall77
Contributor

Hm, I'm not sure what goreleaser might do differently then. Certainly trying to match what goreleaser does in your simple docker build, or paring back what goreleaser does to match the simple docker build, might illuminate things.

One thing I would check: make sure that you're actually getting the right Go version in both cases. You can print runtime.Version() into a log somewhere, or run go version <binary> on the binary in the final docker file.
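
As a minimal sketch of that check (illustrative only, not taken from the build in question), logging the toolchain version at startup looks like this:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Prints the Go toolchain version the binary was built with,
	// e.g. "go1.22.3", so the two builds can be compared.
	fmt.Println("built with:", runtime.Version())
}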

> I didn't catch your meaning there; the key is a struct{ptr, ptr} type, and it's in an imported third-party package, so type safety shouldn't allow using [2]string in lieu of the code that lives there.

I mean doing, anywhere in the package, something like:

var a any = [2]string{}
var b any = [2]string{}
func init() {
    if a != b {
        panic("bad")
    }
}

This just introduces a use of [2]string, and in a context where equality must work, into the package.
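
A variation on the same idea, sketched here only as a guess (it is not confirmed to be necessary), would be to key a package-level map by [2]string, so that the hash function the panic complains about is also referenced from that package:

// Hypothetical anchor: using [2]string as a map key references its
// runtime hash function from this package as well.
var anchor = map[[2]string]struct{}{
	[2]string{"a", "b"}: {},
}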

@titpetric
Author

titpetric commented May 29, 2024

This would be the breaking CL: cf68384
Title: cmd/compile/internal/gc: steps towards work-queue

@titpetric
Author

> One thing I would check: make sure that you're actually getting the right Go version in both cases. You can print runtime.Version() into a log somewhere, or run go version on the binary in the final docker file.

Verified with go version -m before filing the issue; the output between the failing and the passing binary matches 1:1. ☑️

mknyszek added the WaitingForInfo label May 29, 2024
@kellen-miller

kellen-miller commented Jun 5, 2024

Just want to add we are seeing this as well. Same runtime panic within the opentelemetry code.

Go 1.22.4 on macOS 14.5 Sonoma with an M1 Max chip.

go env output

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/kellen.miller/Library/Caches/go-build'
GOENV='/Users/kellen.miller/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/kellen.miller/go/pkg/mod'
GOOS='darwin'
GOPATH='/Users/kellen.miller/go'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.22.4/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.22.4/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.4'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/69/3t8bmcl909g3zc07kv0v1dyr0000gr/T/go-build3499618249=/tmp/go-build -gno-record-gcc-switches -fno-common'
