runtime: deadlock when running concurrent builds on MacOS #59657

@sluongng

What version of Go are you using (go version)?

$ go version
1.20.2

Does this issue reproduce with the latest release?

Yes, this can be reproduced on 1.20.3, but it cannot be reproduced on 1.19.8.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/sluongng/Library/Caches/go-build"
GOENV="/Users/sluongng/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/sluongng/work/golang/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/sluongng/work/golang"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/private/var/tmp/_bazel_sluongng/5c02d8a9b82454292bdfef7b6f9ac04e/external/go_sdk"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/private/var/tmp/_bazel_sluongng/5c02d8a9b82454292bdfef7b6f9ac04e/external/go_sdk/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.20.2"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/sluongng/work/bazelbuild/rules_go/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/ly/gjsqx9wd0cv1ck7xdslrt6t00000gn/T/go-build51901763=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

In Bazel's rules_go, we need to compile a helper binary using go build ... before building anything else.

However, because different Bazel configurations exist in the same repo (e.g. CGO enabled/disabled), this same helper binary is built multiple times, once for each unique set of configurations.

When trying to build a test in rules_go with bazel build //tests/core/go_path:copy_path, the helper binary was scheduled by Bazel to be compiled 9 times in parallel. This resulted in 6 different go build processes running at once.

Each of these build processes could in turn spawn multiple compile subprocesses.
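For anyone trying to reproduce this outside Bazel, the situation above can be approximated by launching several copies of a command at the same time. This is only a rough sketch and not what rules_go actually runs: the function name runConcurrently, the program name repro, and the package path ./some/pkg in the usage comment are all hypothetical. Be careful: on an affected setup, running it against go build may hang the machine as described in this report.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"sync"
)

// runConcurrently starts n copies of the given command at the same time and
// returns one result per copy (nil means that copy exited successfully).
// Hypothetical helper for illustration; it is not part of rules_go or Bazel.
func runConcurrently(n int, name string, args ...string) []error {
	errs := make([]error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each copy is a separate OS process, similar to parallel build actions.
			errs[i] = exec.Command(name, args...).Run()
		}(i)
	}
	wg.Wait()
	return errs
}

func main() {
	// Hypothetical usage: repro 6 go build -o /dev/null ./some/pkg
	if len(os.Args) < 3 {
		fmt.Println("usage: repro N command [args...]")
		return
	}
	n, err := strconv.Atoi(os.Args[1])
	if err != nil || n < 1 {
		fmt.Println("N must be a positive integer")
		return
	}
	for i, err := range runConcurrently(n, os.Args[2], os.Args[3:]...) {
		fmt.Printf("process %d: %v\n", i, err)
	}
}
```

Whether this triggers the hang likely depends on details not captured here, such as the build cache state and the exact flags Bazel passes, so a clean run of this sketch does not rule out the bug.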

What did you expect to see?

Build to complete without issue

What did you see instead?

All of these builds get stuck in a deadlock that completely freezes other applications running on the same machine. Opening a new Chrome tab or running git config never finishes.

Using Activity Monitor's sampling feature, we could see similar call graphs for each go build process.

There appear to be 3 types of stacks ending in libsystem_kernel.dylib / libsystem_pthread.dylib:

    2430 Thread_44978   DispatchQueue_1: com.apple.main-thread  (serial)
    + 2430 ???  (in <unknown binary>)  [0x1358]
    +   2430 runtime.asmcgocall.abi0  (in go) + 124  [0x100b702ac]
    +     2430 runtime.syscall6.abi0  (in go) + 56  [0x100b71a98]
    +       2430 __wait4_nocancel  (in libsystem_kernel.dylib) + 8  [0x1a17a04f4]

    2430 Thread_44999
    + 2430 runtime.asmcgocall.abi0  (in go) + 201  [0x100b702f9]
    +   2430 runtime.pthread_cond_timedwait_relative_np_trampoline.abi0  (in go) + 28  [0x100b717ec]
    +     2430 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1276  [0x1a17d45a0]
    +       2430 __psynch_cvwait  (in libsystem_kernel.dylib) + 8  [0x1a1797710]

    2430 Thread_45038
    + 2430 runtime.kevent_trampoline.abi0  (in go) + 40  [0x100b71518]
    +   2430 kevent  (in libsystem_kernel.dylib) + 8  [0x1a179a060]

The last stack seems to be waiting for some event that never arrives (?).

I can reproduce this bug consistently with Go 1.20.2 and 1.20.3. Reproducing it is a bit painful, though, as I need to restart my laptop each time. The problem goes away if I build the binaries sequentially by limiting Bazel's parallelism, or if I downgrade Go to 1.19.8.

Here is some additional system info:

> uname -a
Darwin Sons-Laptop.local 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000 arm64

Labels: NeedsInvestigation, OS-Darwin, compiler/runtime