runtime: preemption in startTemplateThread may cause infinite hang [1.13 backport] #38932
https://github.com/golang/go/wiki/MinorReleases states: "A “serious” problem is one that prevents a program from working at all. "Use a more recent stable version" is a valid workaround, so very few fixes will be backported to both previous issues."
Assuming #38933 is approved, then using 1.14 would be a valid workaround, so I think this cherry-pick should be rejected.
This does indeed reproduce on 1.13:
Once again, bugs (thankfully!) don't reproduce if you leave the fix enabled...
…hread When a locked M wants to start a new M, it hands off to the template thread to actually call clone and start the thread. The template thread is lazily created the first time a thread is locked (or if cgo is in use). stoplockedm will release the P (_Pidle), then call handoffp to give the P to another M. In the case of a pending STW, one of two things can happen: 1. handoffp starts an M, which does acquirep followed by schedule, which will finally enter _Pgcstop. 2. handoffp immediately enters _Pgcstop. This only occurs if the P has no local work, GC work, and no spinning M is required. If handoffp starts an M, and must create a new M to do so, then newm will simply queue the M on newmHandoff for the template thread to do the clone. When a stop-the-world is required, stopTheWorldWithSema will start the stop and then wait for all Ps to enter _Pgcstop. If the template thread is not fully created because startTemplateThread gets stopped, then another stoplockedm may queue an M that will never get created, and the handoff P will never leave _Pidle. Thus stopTheWorldWithSema will wait forever. A sequence to trigger this hang when STW occurs can be visualized with two threads: T1 T2 ------------------------------- ----------------------------- LockOSThread LockOSThread haveTemplateThread == 0 startTemplateThread haveTemplateThread = 1 newm haveTemplateThread == 1 preempt -> schedule g.m.lockedExt++ gcstopm -> _Pgcstop g.m.lockedg = ... park g.lockedm = ... return ... (any code) preempt -> schedule stoplockedm releasep -> _Pidle handoffp startm (first 3 handoffp cases) newm g.m.lockedExt != 0 Add to newmHandoff, return park Note that the P in T2 is stuck sitting in _Pidle. Since the template thread isn't running, the new M will not be started complete the transition to _Pgcstop. To resolve this, we disable preemption around the assignment of haveTemplateThread and the creation of the template thread in order to guarantee that if handTemplateThread is set then the template thread will eventually exist, in the presence of stops. For #38931 Fixes #38932 Change-Id: I50535fbbe2f328f47b18e24d9030136719274191 Reviewed-on: https://go-review.googlesource.com/c/go/+/232978 Run-TryBot: Michael Pratt <email@example.com> Reviewed-by: Austin Clements <firstname.lastname@example.org> TryBot-Result: Gobot Gobot <email@example.com> (cherry picked from commit 11b3730) Reviewed-on: https://go-review.googlesource.com/c/go/+/234888 Reviewed-by: Cherry Zhang <firstname.lastname@example.org>