Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: malloc instancing generic function seems to deadlock(ish) during benchmark #58744

Closed
joeycumines opened this issue Feb 26, 2023 · 9 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@joeycumines
Copy link

joeycumines commented Feb 26, 2023

What version of Go are you using (go version)?

$ go version
go version go1.20 linux/amd64

I believe I've also replicated it on windows, but I haven't verified the stack trace looks the same. On windows, I have experienced an issue with similar characteristics, but haven't managed to retrieve a stacktrace. I have managed to reliably replicate the issue on an M2 Macbook, using v1.20.1.

Does this issue reproduce with the latest release?

Yes, I've replicated it on v1.20.1. The stack trace looks the same.

What operating system and processor architecture are you using (go env)?

Note: I only just updated the go version.

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/joey/.cache/go-build"
GOENV="/home/joey/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/joey/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/joey/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.1"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/joey/dev/go-utilpkg/go.mod"
GOWORK="/home/joey/dev/go-utilpkg/go.work"
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3747613050=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I don't have a nice replicator, but it's an issue I ran into within a for-fun personal project, so I can share the code:

git clone https://github.com/joeycumines/go-utilpkg.git
cd go-utilpkg
git checkout dc3656b25a8e74fb38c60aed2d81f77b3e156d96
cd logiface/zerolog
while go test -timeout=15m -count=6 -benchmem -run=^$ -bench=^BenchmarkContextFieldType_Int$ 2>&1 >bench.txt; do date; sleep 1; done
less bench.txt

The stack trace seems to indicate it (could, possibly) be related to instancing the newBuilder generic function: https://github.com/joeycumines/go-utilpkg/blob/dc3656b25a8e74fb38c60aed2d81f77b3e156d96/logiface/logger.go#L349

This function is used as the sync.Pool.New field, and is implemented like:

func newBuilder[E Event]() any {
	return new(Builder[E])
}

What did you expect to see?

The benchmark run successfully.

What did you see instead?

goos: linux
goarch: amd64
pkg: github.com/joeycumines/go-utilpkg/logiface/zerolog
cpu: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
BenchmarkContextFieldType_Int/variant=baseline-16         	 7817979	       158.2 ns/op	     512 B/op	       1 allocs/op
BenchmarkContextFieldType_Int/variant=baseline-16         	 7773796	       158.2 ns/op	     512 B/op	       1 allocs/op
BenchmarkContextFieldType_Int/variant=baseline-16         	 8651400	       141.2 ns/op	     512 B/op	       1 allocs/op
BenchmarkContextFieldType_Int/variant=baseline-16         	 8568614	       140.7 ns/op	     512 B/op	       1 allocs/op
BenchmarkContextFieldType_Int/variant=baseline-16         	 7566774	       140.2 ns/op	     512 B/op	       1 allocs/op
BenchmarkContextFieldType_Int/variant=baseline-16         	 8818876	       136.5 ns/op	     512 B/op	       1 allocs/op
BenchmarkContextFieldType_Int/variant=generic-16          	14591700	        84.28 ns/op	     152 B/op	       5 allocs/op
BenchmarkContextFieldType_Int/variant=generic-16          	SIGQUIT: quit
PC=0x45544a m=12 sigcode=0

goroutine 0 [idle]:
runtime.pcvalueCacheKey(...)
	/usr/local/go/src/runtime/symtab.go:906
runtime.pcvalue({0x6c6e98?, 0x6d3020?}, 0x4d699a00?, 0x45522a?, 0x7fe44d699b70?, 0x1?)
	/usr/local/go/src/runtime/symtab.go:923 +0x4a fp=0x7fe44d6999d0 sp=0x7fe44d699900 pc=0x45544a
runtime.pcdatavalue({0x6c6e98?, 0x6d3020?}, 0x0?, 0x18bfb6b4ed389?, 0x7fe44d699a58?)
	/usr/local/go/src/runtime/symtab.go:1109 +0x4f fp=0x7fe44d699a10 sp=0x7fe44d6999d0 pc=0x4560cf
runtime.gentraceback(0x10000c000047900?, 0x7fe44d699dc0?, 0x427805?, 0xc00033f6c0, 0x4, 0xc000209c58, 0x20, 0x0, 0x702420?, 0x0)
	/usr/local/go/src/runtime/traceback.go:362 +0x19e5 fp=0x7fe44d699d68 sp=0x7fe44d699a10 pc=0x45e165
runtime.callers.func1()
	/usr/local/go/src/runtime/traceback.go:844 +0x4f fp=0x7fe44d699dd0 sp=0x7fe44d699d68 pc=0x45f88f
runtime.systemstack()
	/usr/local/go/src/runtime/asm_amd64.s:496 +0x49 fp=0x7fe44d699dd8 sp=0x7fe44d699dd0 pc=0x4689c9

goroutine 667 [running]:
runtime.systemstack_switch()
	/usr/local/go/src/runtime/asm_amd64.s:463 fp=0xc000209bb0 sp=0xc000209ba8 pc=0x468960
runtime.callers(0x7fe4640bf828?, {0xc000209c58?, 0x7fe44ee9cd28?, 0x3c?})
	/usr/local/go/src/runtime/traceback.go:843 +0x98 fp=0xc000209c18 sp=0xc000209bb0 pc=0x45f7d8
runtime.mProf_Malloc(0xc000072010, 0x10)
	/usr/local/go/src/runtime/mprof.go:420 +0x6a fp=0xc000209d90 sp=0xc000209c18 pc=0x42e28a
runtime.profilealloc(0xc000072010?, 0x10?, 0x10?)
	/usr/local/go/src/runtime/malloc.go:1290 +0x85 fp=0xc000209dc8 sp=0xc000209d90 pc=0x40e5e5
runtime.mallocgc(0x10, 0x594180, 0x1)
	/usr/local/go/src/runtime/malloc.go:1138 +0x685 fp=0xc000209e30 sp=0xc000209dc8 pc=0x40e105
runtime.newobject(0xc000132000?)
	/usr/local/go/src/runtime/malloc.go:1254 +0x27 fp=0xc000209e58 sp=0xc000209e30 pc=0x40e467
github.com/joeycumines/go-utilpkg/logiface.newBuilder[...]()
	/home/joey/dev/go-utilpkg/logiface/logger.go:420 +0x25 fp=0xc000209e78 sp=0xc000209e58 pc=0x561a85
github.com/joeycumines/go-utilpkg/logiface.(*loggerShared[...]).init(...)
	/home/joey/dev/go-utilpkg/logiface/logger.go:349

.. TRUNCATED, 63MB file, it's just those two lines ..

github.com/joeycumines/go-utilpkg/logiface.(*loggerShared[...]).init(...)
	/home/joey/dev/go-utilpkg/logiface/logger.go:349
	/home/joey/dev/go-utilpkg/logiface/logger.go:*** Test killed: ran too long (6m0s).
signal: killed
FAIL	github.com/joeycumines/go-utilpkg/logiface/zerolog	365.059s
@qiulaidongfeng
Copy link
Member

GOOS/GOARCH=windows/amd64
go version : go version go1.20.1 windows/amd64
I didn't have a reproduction problem on my computer

@joeycumines
Copy link
Author

joeycumines commented Feb 27, 2023

Interesting, thanks for testing @qiulaidongfeng.

I've managed to replicate some sort of issue on an m2 macbook, still testing.
EDIT: I can replicate the issue using darwin/arm64, same behavior, and the stack trace is almost identical.

I can't seem to replicate on windows, though, which is not what I expected. Still testing that too though, using:

@echo off
setlocal EnableDelayedExpansion

:loop
go test -timeout=15m -count=6 -benchmem -run=^$ -bench=^BenchmarkContextFieldType_Int$ >bench.txt 2>&1 && (
    echo %date% %time%
    timeout /t 1 /nobreak >nul
    goto :loop
)

@qiulaidongfeng
Copy link
Member

I use virtualbox to reproduce the problem in GOOS=linux GOARCH=amd64
bench.txt
I tested GOOS=windows GOARCH=amd64 for dozens of minutes, and finally saw

*** Test killed: ran too long (16m0s).

exit status 1

@joeycumines
Copy link
Author

joeycumines commented Feb 27, 2023

I've updated the windows reproducer script, it seems I put 2>&1 in the wrong place (I didn't get the stack trace either).

I'm running it now, perhaps it'll output a useful stack trace.

EDIT: No dice, still no stack trace on windows. Does seem like it's probably like same issue, however.

goos: windows
goarch: amd64
pkg: github.com/joeycumines/go-utilpkg/logiface/zerolog
cpu: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
BenchmarkContextFieldType_Int/variant=baseline-16         	 6666600	       170.2 ns/op	     512 B/op	       1 allocs/op
BenchmarkContextFieldType_Int/variant=baseline-16         	 7100318	       169.3 ns/op	     512 B/op	       1 allocs/op
BenchmarkContextFieldType_Int/variant=baseline-16         	 6332016	       182.0 ns/op	     512 B/op	       1 allocs/op
BenchmarkContextFieldType_Int/variant=baseline-16         	 7103192	       169.4 ns/op	     512 B/op	       1 allocs/op
BenchmarkContextFieldType_Int/variant=baseline-16         	 6305948	       184.1 ns/op	     512 B/op	       1 allocs/op
BenchmarkContextFieldType_Int/variant=baseline-16         	 7185529	       184.5 ns/op	     512 B/op	       1 allocs/op
BenchmarkContextFieldType_Int/variant=generic-16          	11053662	        91.35 ns/op	     152 B/op	       5 allocs/op
BenchmarkContextFieldType_Int/variant=generic-16          	15646696	        75.85 ns/op	     152 B/op	       5 allocs/op
BenchmarkContextFieldType_Int/variant=generic-16          	*** Test killed: ran too long (16m0s).
exit status 1
FAIL	github.com/joeycumines/go-utilpkg/logiface/zerolog	960.254s

@seankhliao seankhliao changed the title runtime/malloc (UNSURE): instancing generic function seems to deadlock(ish) during benchmark runtime: malloc instancing generic function seems to deadlock(ish) during benchmark Feb 27, 2023
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Feb 27, 2023
@dmitshur
Copy link
Contributor

CC @golang/runtime.

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 27, 2023
@dmitshur dmitshur added this to the Backlog milestone Feb 27, 2023
@prattmic
Copy link
Member

This sounds familiar to #58513.

cc @mdempsky

@mdempsky
Copy link
Member

Yeah, I suspect this is a duplicate of #58513 too. traceback.go:362 from the reported stack trace is inside the FUNCDATA_InlTree loop, which is consistent with that issue.

The fix for that issue has just been submitted to both master and release-branch.go1.20 (to be included in 1.20.2). It would be great if you could build either of these from source and confirm whether the issue still repros. Thanks.

@joeycumines
Copy link
Author

I've only tested on linux/amd64 with release-branch.go1.20, but nothing has blown up yet.

Might run it later tonight as well, but it looks promising. Thanks for the assistance, I guess close this if you'd like. I'll re-open if I reproduce it using a patched build.

@mknyszek mknyszek added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Mar 1, 2023
@prattmic
Copy link
Member

prattmic commented Mar 1, 2023

I'm going to close this as a duplicate of #58513. Please reopen if it does reproduce.

@prattmic prattmic closed this as completed Mar 1, 2023
@golang golang locked and limited conversation to collaborators Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

7 participants