Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: new goroutine launches can starve other work indefinitely #52934

Open
rhysh opened this issue May 16, 2022 · 2 comments
Open

runtime: new goroutine launches can starve other work indefinitely #52934

rhysh opened this issue May 16, 2022 · 2 comments
Labels
NeedsInvestigation
Milestone

Comments

@rhysh
Copy link
Contributor

@rhysh rhysh commented May 16, 2022

What version of Go are you using (go version)?

$ go version
go version devel go1.19-db875f4d1b Mon May 16 16:15:58 2022 +0000 darwin/arm64
$ go1.18 version
go version go1.18.2 darwin/arm64

Does this issue reproduce with the latest release?

The problem is present in Go 1.18 and in the development tip.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/rhys/Library/Caches/go-build"
GOENV="/Users/rhys/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/rhys/go/pkg/mod"
GONOPROXY="*"
GONOSUMDB="*"
GOOS="darwin"
GOPATH="/Users/rhys/go"
GOPRIVATE="*"
GOPROXY="direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="devel go1.19-db875f4d1b Mon May 16 16:15:58 2022 +0000"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/lf/n4ry0qv5639_3z0nhyhy3z7c0000gs/T/go-build743809376=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

// Copyright 2022 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package main

import (
	"fmt"
	"os"
	"sync"
	"sync/atomic"
	"time"
)

const duration = 2000 * time.Millisecond
const workers = 3

var stop int32

func main() {
	time.AfterFunc(duration, func() { atomic.StoreInt32(&stop, 1) })

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { wg.Done() }()
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			wg.Done()
			churn()
		}()
	}
	wg.Wait()

	if atomic.LoadInt32(&stop) == 1 {
		fmt.Printf("PROBLEM: first goroutine did not get a chance to run\n")
		os.Exit(1)
	}
	fmt.Printf("ok\n")
}

func churn() {
	if atomic.LoadInt32(&stop) == 0 {
		go churn()
	}
}

What did you expect to see?

I expected each goroutine in the program to get a chance to run within a few 10ms scheduling intervals. The result of that would be for both wg.Done calls to execute, for the wg.Wait call to complete, and for the program to exit quickly without error.

What did you see instead?

When I run the test program with GOMAXPROCS=1, the new goroutine launches keep the runtime busy so it does not give all existing goroutines a chance to run. The goroutine that does nothing but call wg.Done does not get a chance to run until the timer expiration stops the new goroutine launches after 2000 milliseconds.

This isn't specific to GOMAXPROCS=1, additional calls to go churn() can keep additional scheduler threads busy.

Refined from an original report in #52916. CC @golang/runtime

$ GOMAXPROCS=1 go run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go    
PROBLEM: first goroutine did not get a chance to run
exit status 1
$ GOMAXPROCS=2 go run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
PROBLEM: first goroutine did not get a chance to run
exit status 1
$ GOMAXPROCS=3 go run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
PROBLEM: first goroutine did not get a chance to run
exit status 1
$ GOMAXPROCS=4 go run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
ok
$ GOMAXPROCS=1 go1.18 run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
PROBLEM: first goroutine did not get a chance to run
exit status 1
$ GOMAXPROCS=2 go1.18 run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
PROBLEM: first goroutine did not get a chance to run
exit status 1
$ GOMAXPROCS=3 go1.18 run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
PROBLEM: first goroutine did not get a chance to run
exit status 1
$ GOMAXPROCS=4 go1.18 run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
ok
@gopherbot
Copy link

@gopherbot gopherbot commented May 16, 2022

Change https://go.dev/cl/406654 mentions this issue: runtime/pprof: slow new goroutine launches in test

gopherbot pushed a commit that referenced this issue May 16, 2022
The goroutine profiler tests include one that launches a steady stream
of goroutines. That creates a scheduler busy loop that can prevent
forward progress in the rest of the program. Slow down the launches a
bit so other goroutines have a chance to run.

Fixes #52916
For #52934

Change-Id: I748557201b94918b1fa4960544a51a48d9cacc6b
Reviewed-on: https://go-review.googlesource.com/c/go/+/406654
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@mknyszek mknyszek added this to the Backlog milestone May 17, 2022
@mknyszek mknyszek added the NeedsInvestigation label May 17, 2022
@prattmic
Copy link
Member

@prattmic prattmic commented May 18, 2022

The runtime does generally give some priority to new goroutines, because runtime.newproc puts new goroutines in the runnext slot, and runtime.findRunnable (generally) prefers the local runq (of which runnext is on top) over the global runq. Perhaps we are a bit too aggressive there.

runnext does make the new goroutine share the same time slice with the parent goroutine (see inheritTime here), but I suspect that churn runs for such a short period of time that sysmon's attempts to preempt it are generally unsuccessful [1].

[1] I think to succeed, it would need to preempt after churn starts executing, but before it gets to the body of runtime.newproc, which is realistically probably only a dozen or so instructions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation
Projects
None yet
Development

No branches or pull requests

4 participants