Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: preemption in startTemplateThread may cause infinite hang [1.13 backport] #38932

gopherbot opened this issue May 7, 2020 · 6 comments
CherryPickApproved FrozenDueToAge


Copy link

@gopherbot gopherbot commented May 7, 2020

@prattmic requested issue #38931 to be considered for backport to the next 1.13 minor release.

@gopherbot backport please 1.13 1.14

@gopherbot gopherbot added the CherryPickCandidate label May 7, 2020
@gopherbot gopherbot added this to the Go1.13.11 milestone May 7, 2020
Copy link

@prattmic prattmic commented May 8, 2020 states: "A “serious” problem is one that prevents a program from working at all. "Use a more recent stable version" is a valid workaround, so very few fixes will be backported to both previous issues."

Assuming #38933 is approved, then using 1.14 would be a valid workaround, so I think this cherry-pick should be rejected.

Copy link

@dmitshur dmitshur commented May 8, 2020

That paragraph on the MinorReleases wiki page is unfortunately out of date, and needs to be updated. Issue #34536 tracks the task of updating documentation. The current policy is best described in #34536 (comment).

Copy link

@prattmic prattmic commented May 8, 2020

Thanks for the clarification. Based on #38931 (comment), I'll do a bit more digging to reproduce this on 1.13.

Copy link

@prattmic prattmic commented May 11, 2020

This does indeed reproduce on 1.13:

$ ../bin/go test -run=TestLockOSThreadTemplateThreadRace -v runtime          
=== RUN   TestLockOSThreadTemplateThreadRace
--- FAIL: TestLockOSThreadTemplateThreadRace (60.58s)
    crash_test.go:105: /tmp/go-build942353858/testprog.exe LockOSThreadTemplateThreadRace exit status: exit status 2
    proc_test.go:929: run 20: want "OK\n", got "SIGQUIT: quit\nPC=0x458ac1 m=0 sigcode=0\n\ngoroutine 0 [idle]:\nruntime.futex(0x5d6d28, 0x80, 0x0, 0x0, 0x0, 0x7ffd0000001e, 0x43193a, 0x0, 0x7ffdb3a4bff0, 0x40a21f, ...)\n\t/usr/local/google/home/mpratt/src/go/src/runtime/sys_linux_amd64.s:535 +0x21\nruntime.futexsleep(0x5d6d28, 0x0, 0xffffffffffffffff)\n\t/usr/local/google/home/mpratt/src/go/src/runtime/os_linux.go:44 +0x46\nruntime.notesleep(0x5d6d28)\n\t/usr/local/google/home/mpratt/src/go/src/runtime/lock_futex.go:151 +0x9f\nruntime.stoplockedm()\n\t/usr/local/google/home/mpratt/src/go/src/runtime/proc.go:2074 +0x88\nruntime.schedule()\n\t/usr/local/google/home/mpratt/src/go/src/runtime/proc.go:2475 +0x485\nruntime.goschedImpl(0xc000072780)\n\t/usr/local/google/home/mpratt/src/go/src/runtime/proc.go:2631 +0xd7\nruntime.gosched_m(0xc000072780)\n\t/usr/local/google/home/mpratt/src/go/src/runtime/proc.go:2639 +0x34\nruntime.mcall(0x0)\n\t/usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:318 +0x5b\n\ngoroutine 1 [semacquire]:\nsync.runtime_Semacquire(0xc0000a8028)\n\t/usr/local/google/home/mpratt/src/go/src/runtime/sema.go:56 +0x42\nsync.(*WaitGroup).Wait(0xc0000a8020)\n\t/usr/local/google/home/mpratt/src/go/src/sync/waitgroup.go:130 +0x64\nmain.LockOSThreadTemplateThreadRace()\n\t/usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprog/lockosthread.go:241 +0x10a\nmain.main()\n\t/usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprog/main.go:34 +0x1cc\n\ngoroutine 18 [running]:\n\tgoroutine running on other thread; stack unavailable\ncreated by main.LockOSThreadTemplateThreadRace\n\t/usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprog/lockosthread.go:213 +0x79\n\ngoroutine 19 [runnable]:\nruntime.startTemplateThread(...)\n\t/usr/local/google/home/mpratt/src/go/src/runtime/proc.go:1873\nruntime.LockOSThread()\n\t/usr/local/google/home/mpratt/src/go/src/runtime/proc.go:3543 +0x98\nmain.LockOSThreadTemplateThreadRace.func2(0xbfa67a88c162afba, 0x9a7d7b, 0x5d6420, 0xc0000a8020)\n\t/usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprog/lockosthread.go:235 +0x6d\ncreated by main.LockOSThreadTemplateThreadRace\n\t/usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprog/lockosthread.go:227 +0xee\n\ngoroutine 20 [runnable, locked to thread]:\nruntime.Gosched(...)\n\t/usr/local/google/home/mpratt/src/go/src/runtime/proc.go:269\nmain.LockOSThreadTemplateThreadRace.func2(0xbfa67a88c162afba, 0x9a7d7b, 0x5d6420, 0xc0000a8020)\n\t/usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprog/lockosthread.go:236 +0x7e\ncreated by main.LockOSThreadTemplateThreadRace\n\t/usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprog/lockosthread.go:227 +0xee\n\ngoroutine 21 [runnable]:\nmain.LockOSThreadTemplateThreadRace.func2.1()\n\t/usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprog/lockosthread.go:233\ncreated by main.LockOSThreadTemplateThreadRace.func2\n\t/usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprog/lockosthread.go:233 +0x68\n\ngoroutine 33 [runnable]:\nmain.LockOSThreadTemplateThreadRace.func2.1()\n\t/usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprog/lockosthread.go:233\ncreated by main.LockOSThreadTemplateThreadRace.func2\n\t/usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprog/lockosthread.go:233 +0x68\n\nrax    0xca\nrbx    0x5d6be0\nrcx    0x458ac3\nrdx    0x0\nrdi    0x5d6d28\nrsi    0x80\nrbp    0x7ffdb3a4bfb8\nrsp    0x7ffdb3a4bf70\nr8     0x0\nr9     0x0\nr10    0x0\nr11    0x286\nr12    0xf1\nr13    0x0\nr14    0x5255f4\nr15    0x0\nrip    0x458ac1\nrflags 0x286\ncs     0x33\nfs     0x0\ngs     0x0\n"
FAIL    runtime 60.605s

Once again, bugs (thankfully!) don't reproduce if you leave the fix enabled...

@andybons andybons removed this from the Go1.13.11 milestone May 14, 2020
@andybons andybons added this to the Go1.13.12 milestone May 14, 2020
Copy link

@gopherbot gopherbot commented May 21, 2020

Change mentions this issue: [release-branch.go1.13] runtime: disable preemption in startTemplateThread

@andybons andybons added the CherryPickApproved label May 27, 2020
@gopherbot gopherbot removed the CherryPickCandidate label May 27, 2020
Copy link

@gopherbot gopherbot commented May 27, 2020

Closed by merging 1a11781 to release-branch.go1.13.

gopherbot pushed a commit that referenced this issue May 27, 2020

When a locked M wants to start a new M, it hands off to the template
thread to actually call clone and start the thread. The template thread
is lazily created the first time a thread is locked (or if cgo is in

stoplockedm will release the P (_Pidle), then call handoffp to give the
P to another M. In the case of a pending STW, one of two things can

1. handoffp starts an M, which does acquirep followed by schedule, which
will finally enter _Pgcstop.

2. handoffp immediately enters _Pgcstop. This only occurs if the P has
no local work, GC work, and no spinning M is required.

If handoffp starts an M, and must create a new M to do so, then newm
will simply queue the M on newmHandoff for the template thread to do the

When a stop-the-world is required, stopTheWorldWithSema will start the
stop and then wait for all Ps to enter _Pgcstop. If the template thread
is not fully created because startTemplateThread gets stopped, then
another stoplockedm may queue an M that will never get created, and the
handoff P will never leave _Pidle. Thus stopTheWorldWithSema will wait

A sequence to trigger this hang when STW occurs can be visualized with
two threads:

  T1                                 T2
-------------------------------   -----------------------------

LockOSThread                      LockOSThread
  haveTemplateThread == 0
    haveTemplateThread = 1
    newm                            haveTemplateThread == 1
      preempt -> schedule           g.m.lockedExt++
        gcstopm -> _Pgcstop         g.m.lockedg = ...
        park                        g.lockedm = ...

                                 ... (any code)
                                   preempt -> schedule
                                       releasep -> _Pidle
                                         startm (first 3 handoffp cases)
                                            g.m.lockedExt != 0
                                            Add to newmHandoff, return

Note that the P in T2 is stuck sitting in _Pidle. Since the template
thread isn't running, the new M will not be started complete the
transition to _Pgcstop.

To resolve this, we disable preemption around the assignment of
haveTemplateThread and the creation of the template thread in order to
guarantee that if handTemplateThread is set then the template thread
will eventually exist, in the presence of stops.

For #38931
Fixes #38932

Change-Id: I50535fbbe2f328f47b18e24d9030136719274191
Run-TryBot: Michael Pratt <>
Reviewed-by: Austin Clements <>
TryBot-Result: Gobot Gobot <>
(cherry picked from commit 11b3730)
Reviewed-by: Cherry Zhang <>
@golang golang locked and limited conversation to collaborators May 27, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
CherryPickApproved FrozenDueToAge
None yet

No branches or pull requests

4 participants