Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: timeouts prevent deadlock detection from working #69188

Open
myaaaaaaaaa opened this issue Aug 31, 2024 · 4 comments
Open

testing: timeouts prevent deadlock detection from working #69188

myaaaaaaaaa opened this issue Aug 31, 2024 · 4 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@myaaaaaaaaa
Copy link

myaaaaaaaaa commented Aug 31, 2024

Go version

go version go1.23.0 linux/amd64

Output of go env in your module/workspace:

Click to expand
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='~/.cache/go-build'
GOENV='~/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='~/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='~/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/lib/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.0'
GODEBUG=''
GOTELEMETRY='on'
GOTELEMETRYDIR='~/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='0'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3043789877=/tmp/go-build -gno-record-gcc-switches'

What did you do?

package main

func main() {
	select {}
}



package main_test

import "testing"

func TestDeadlock(t *testing.T) {
	select {}
}

What did you see happen?

$ go run main.go
fatal error: all goroutines are asleep - deadlock!
...


$ go test -timeout 0
fatal error: all goroutines are asleep - deadlock!
...


$ go test -timeout 1s    # note that the default is 10m
panic: test timed out after 1s
	running tests:
		TestDeadlock (1s)
...

What did you expect to see?

 $ go run main.go
 fatal error: all goroutines are asleep - deadlock!
 ...
 
 
 $ go test -timeout 0
 fatal error: all goroutines are asleep - deadlock!
 ...
 
 
 $ go test -timeout 1s    # note that the default is 10m
-panic: test timed out after 1s
-	running tests:
-		TestDeadlock (1s)
+fatal error: all goroutines are asleep - deadlock!
 ...
@ianlancetaylor
Copy link
Contributor

If there is an active timer, the program is not deadlocked. It will keep running when the timer expires.

To fix this we would need to somehow add a special case for the testing timeout. I don't think it's worth it. There are already many other cases where the deadlock detector does not reliably detect deadlocks.

@myaaaaaaaaa
Copy link
Author

myaaaaaaaaa commented Sep 1, 2024

Would it suffice to simply keep a count of goroutines excluded from deadlock analysis? Something like the following pseudocode:

diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index c4f175b..5304612 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -5902,10 +5902,12 @@ func incidlelocked(v int32) {
 		checkdead()
 	}
 	unlock(&sched.lock)
 }
 
+var excludedGoroutines atomic.Int32
+
 // Check for deadlock situation.
 // The check is based on number of running M's, if 0 -> deadlock.
 // sched.lock must be held.
 func checkdead() {
 	assertLockHeld(&sched.lock)
@@ -5931,10 +5933,11 @@ func checkdead() {
 	// for details.)
 	var run0 int32
 	if !iscgo && cgoHasExtraM && extraMLength.Load() > 0 {
 		run0 = 1
 	}
+	run0 += excludedGoroutines.Load()
 
 	run := mcount() - sched.nmidle - sched.nmidlelocked - sched.nmsys
 	if run > run0 {
 		return
 	}
diff --git a/src/testing/testing.go b/src/testing/testing.go
index 526cba3..d6731e2 100644
--- a/src/testing/testing.go
+++ b/src/testing/testing.go
@@ -2370,10 +2370,11 @@ func (m *M) startAlarm() time.Time {
 			}
 			extra = b.String()
 		}
 		panic(fmt.Sprintf("test timed out after %v%s", *timeout, extra))
 	})
+	excludedGoroutines.Add(1)
 	return deadline
 }
 
 // runningList returns the list of running tests.
 func runningList() []string {
@@ -2387,10 +2388,11 @@ func runningList() []string {
 }
 
 // stopAlarm turns off the alarm.
 func (m *M) stopAlarm() {
 	if *timeout > 0 {
+		excludedGoroutines.Add(-1)
 		m.timer.Stop()
 	}
 }
 
 func parseCpuList() {

Relevant code blocks:

go/src/runtime/proc.go

Lines 5928 to 5940 in 6885bad

// If we are not running under cgo, but we have an extra M then account
// for it. (It is possible to have an extra M on Windows without cgo to
// accommodate callbacks created by syscall.NewCallback. See issue #6751
// for details.)
var run0 int32
if !iscgo && cgoHasExtraM && extraMLength.Load() > 0 {
run0 = 1
}
run := mcount() - sched.nmidle - sched.nmidlelocked - sched.nmsys
if run > run0 {
return
}

go/src/testing/testing.go

Lines 2352 to 2394 in 6885bad

// startAlarm starts an alarm if requested.
func (m *M) startAlarm() time.Time {
if *timeout <= 0 {
return time.Time{}
}
deadline := time.Now().Add(*timeout)
m.timer = time.AfterFunc(*timeout, func() {
m.after()
debug.SetTraceback("all")
extra := ""
if list := runningList(); len(list) > 0 {
var b strings.Builder
b.WriteString("\nrunning tests:")
for _, name := range list {
b.WriteString("\n\t")
b.WriteString(name)
}
extra = b.String()
}
panic(fmt.Sprintf("test timed out after %v%s", *timeout, extra))
})
return deadline
}
// runningList returns the list of running tests.
func runningList() []string {
var list []string
running.Range(func(k, v any) bool {
list = append(list, fmt.Sprintf("%s (%v)", k.(string), highPrecisionTimeSince(v.(highPrecisionTime)).Round(time.Second)))
return true
})
slices.Sort(list)
return list
}
// stopAlarm turns off the alarm.
func (m *M) stopAlarm() {
if *timeout > 0 {
m.timer.Stop()
}
}

@ianlancetaylor
Copy link
Contributor

It doesn't really have anything to do with goroutines. A timer does not imply a goroutine. A timer just mean that at some point in the future a value will appear on a channel or a function will be run. That might in turn unblock some other goroutine.

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 4, 2024
@dmitshur dmitshur added this to the Backlog milestone Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants