runtime: syscall hangs during STW #36273

Closed
WangLeonard opened this issue Dec 24, 2019 · 2 comments

@WangLeonard WangLeonard commented Dec 24, 2019

What version of Go are you using (go version)?

$ go version
go version devel +48ed1e6113 Tue Dec 24 04:59:06 2019 +0000 darwin/amd64

Also reproduced with go1.13.3.

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/wangdeyu/Library/Caches/go-build"
GOENV="/Users/wangdeyu/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/wangdeyu/Project/GOPATH"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/wangdeyu/Local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/wangdeyu/Local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/09/p8vzp3rn55ggpkq_rv1_hg100000gn/T/go-build532421777=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

package main

//#include <stdio.h>
//#include <stdlib.h>
//#include <unistd.h>
// int mysleep(){
// usleep(1000000);
// return 0;
// }
import "C"
import (
   "log"
   "time"
   _ "unsafe" // required for the go:linkname directives below
)

//go:linkname stopTheWorld runtime.stopTheWorld
func stopTheWorld(reason string)

//go:linkname startTheWorld runtime.startTheWorld
func startTheWorld()

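// loopDoAdd spins forever to keep a P busy.
func loopDoAdd() {
   for {
      ans := 0
      for i := 0; i < 1000000; i++ {
         ans++
      }
   }
   // Unreachable as written: the loop above never exits.
   time.Sleep(time.Microsecond * 10)
}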

func main() {
   // Just keep the application busy.
   for i := 0; i < 20; i++ {
      go loopDoAdd()
   }

   for {
      stopTheWorld("TEST")
      C.mysleep()
      startTheWorld()
      log.Println("Done")
      time.Sleep(time.Millisecond)
   }
}

The package also needs an empty assembly file, empty.s.

What did you expect to see?

Done is printed continuously.

What did you see instead?

2019/12/24 22:02:16 Done
2019/12/24 22:02:17 Done

...

and then the program hangs.

I was able to reproduce the problem on macOS and on one Ubuntu machine, but not on a second Ubuntu machine. I am not sure how the retake strategy differs there, but sysmon's retake is not triggered on that machine.

I found that the hang occurs when a goroutine makes a syscall during STW and the P it was running on is retaken by sysmon.

My analysis is that, on return from the syscall, exitsyscall takes this path: exitsyscallfast fails -> exitsyscall0 -> globrunqput(gp) -> stopm.

During STW, however, schedule() never gets a chance to call globrunqget, so the goroutine stays on the global run queue and the program hangs.
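
The path described above can be sketched with a toy model. This is only an illustration under stated assumptions, not runtime code: the sched type, its fields, and the simulate helper are hypothetical, and the method names merely mirror the runtime functions named in this report. The model assumes that no P is idle while the world is stopped, so a goroutine whose P was retaken can only be queued.

```go
package main

import "fmt"

// sched is a hypothetical stand-in for the scheduler state; it is not the
// runtime's own type.
type sched struct {
	gcwaiting bool     // true while the world is stopped
	numP      int      // Ps that are idle when the world is running
	globalRun []string // goroutines waiting on the global run queue
}

func (s *sched) stopTheWorld()  { s.gcwaiting = true }
func (s *sched) startTheWorld() { s.gcwaiting = false }

// idlePs models the assumption that no P is idle during STW.
func (s *sched) idlePs() int {
	if s.gcwaiting {
		return 0
	}
	return s.numP
}

// exitSyscall models a goroutine returning from a syscall and reports
// whether it keeps running.
func (s *sched) exitSyscall(g string, retaken bool) bool {
	if !retaken {
		return true // fast path: the goroutine still owns its P
	}
	if s.idlePs() > 0 {
		return true // picked up an idle P
	}
	s.globalRun = append(s.globalRun, g) // queued, then parked
	return false
}

// schedule models the only way a queued goroutine resumes; it hands out
// nothing while the world is stopped.
func (s *sched) schedule() (string, bool) {
	if s.gcwaiting || len(s.globalRun) == 0 {
		return "", false
	}
	g := s.globalRun[0]
	s.globalRun = s.globalRun[1:]
	return g, true
}

// simulate reports whether the goroutine that stopped the world ever gets
// to restart it.
func simulate(retaken bool) bool {
	s := &sched{numP: 4}
	s.stopTheWorld()
	// The goroutine that stopped the world makes a blocking call.
	if !s.exitSyscall("main", retaken) {
		// Parked: only schedule() could resume it, and it refuses during STW.
		_, ok := s.schedule()
		return ok
	}
	s.startTheWorld()
	return true
}

func main() {
	fmt.Println("P kept, world restarted:", simulate(false))
	fmt.Println("P retaken, world restarted:", simulate(true))
}
```

In this model the goroutine that would call startTheWorld is the one left on the queue, which is why nothing can make progress once its P has been retaken.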

To avoid the problem, I tried the following modification to func retake(now int64) uint32 in proc.go.

It is not a perfect solution, though. I suspect that handoffp mishandles the P's status during STW.

func retake(now int64) uint32 {
   n := 0
   lock(&allpLock)
   for i := 0; i < len(allp); i++ {
      _p_ := allp[i]
      if _p_ == nil {
         continue
      }
      pd := &_p_.sysmontick
      s := _p_.status

      // Skip Ps that are in a syscall during STW.
      if s == _Psyscall && sched.gcwaiting != 0 {
         continue
      }
     ……
     ……
@smasher164 smasher164 changed the title runtime: syscall in STW maybe hangup runtime: syscall hangs during STW Dec 25, 2019

@odeke-em odeke-em commented Dec 26, 2019

Thank you for this report and reproducer @WangLeonard!

Kindly looping in some runtime and garbage collection folks @aclements @mknyszek

@cherrymui cherrymui commented Dec 26, 2019

stopTheWorld is a runtime internal function. Calling stopTheWorld from user code is not supported.

@cherrymui cherrymui closed this Dec 26, 2019