On LR (link register) machines, our function prologue generally looks like this (stack bounds check and frame pointer handling omitted): store the LR at -frame_size(SP), then decrement the SP by frame_size. The two steps are separate whenever they cannot be done in a single instruction, either because the architecture lacks such an instruction or because the stack frame is too large.
If we decremented the SP first and a signal arrived right after that instruction, before the LR is saved, the runtime would see a junk value in the LR slot and fail to unwind the stack. So we store the LR first, then decrement the SP.
This generally works. But in some cases the signal stack is not set (e.g. #53374; iOS also doesn't support sigaltstack). The signal then arrives on the current stack, and the kernel pushes the signal frame immediately below the SP, clobbering our saved LR. In those cases we have to store the LR again after decrementing the SP, which makes our function prologue a bit inefficient.
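The hazards described above can be illustrated with a toy model. This is only a sketch: the stack is a slice of slots, and every name in it (`machine`, `run`, the slot counts, the marker values) is hypothetical rather than taken from the runtime or the assembler. It assumes the signal frame is at least as large as the function frame, so that it reaches the saved LR slot.

```go
package main

import "fmt"

// Toy model only: all names and sizes below are hypothetical.
const (
	frameSize   = 4      // function frame size, in stack slots
	signalFrame = 8      // slots the kernel writes for a signal frame (assumed >= frameSize)
	retAddr     = 0x1000 // return address held in the LR on entry
	junk        = 0xDEAD // stale stack contents
	clobber     = 0xBAD  // contents written by the signal frame
)

type ordering int

const (
	decrementFirst     ordering = iota // decrement SP, then store LR at 0(SP)
	storeFirst                         // store LR below SP, then decrement SP
	storeFirstAndAgain                 // as storeFirst, plus a second store at 0(SP)
)

type machine struct {
	stack []uint64 // grows toward lower indexes
	sp    int
	lr    uint64
}

func newMachine() *machine {
	m := &machine{stack: make([]uint64, 16), sp: 12, lr: retAddr}
	for i := range m.stack {
		m.stack[i] = junk
	}
	return m
}

// signal models delivery without a signal stack: the kernel writes the
// signal frame immediately below the current SP.
func (m *machine) signal() {
	for i := 1; i <= signalFrame; i++ {
		m.stack[m.sp-i] = clobber
	}
}

// run executes a prologue with a signal arriving between its two steps.
// duringSignal is the return address an unwinder would find at that moment;
// afterPrologue is the value at 0(SP) once the prologue has finished.
func run(o ordering, onCurrentStack bool) (duringSignal, afterPrologue uint64) {
	m := newMachine()
	switch o {
	case decrementFirst:
		m.sp -= frameSize
		if onCurrentStack {
			m.signal()
		}
		duringSignal = m.stack[m.sp] // unwinder reads 0(SP), which is not written yet
		m.stack[m.sp] = m.lr
	case storeFirst, storeFirstAndAgain:
		m.stack[m.sp-frameSize] = m.lr
		if onCurrentStack {
			m.signal()
		}
		duringSignal = m.lr // SP is unchanged, so in this model the unwinder uses the LR
		m.sp -= frameSize
		if o == storeFirstAndAgain {
			m.stack[m.sp] = m.lr
		}
	}
	return duringSignal, m.stack[m.sp]
}

func main() {
	for _, o := range []ordering{decrementFirst, storeFirst, storeFirstAndAgain} {
		d, a := run(o, true)
		fmt.Printf("ordering %d: during signal %#x, after prologue %#x\n", o, d, a)
	}
}
```

In this model, decrementing first exposes the junk slot to the unwinder, storing first loses the saved LR to the signal frame, and only the second store keeps both observations correct.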
We can avoid the problem if we expand the stack unwind metadata to express "the SP has been decremented but the return address is still in the LR register". Currently, the metadata only records the SP delta, and the runtime always assumes the return address can be found at 0(SP) (for non-leaf functions). With the expanded metadata, we can decrement the SP first and then store the LR.
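One way to picture the expanded metadata is a per-PC record that carries the location of the return address alongside the SP delta. The following is a minimal sketch under that assumption; `unwindInfo`, `retAddrLoc`, and `callerFrame` are hypothetical names, not the runtime's actual tables or functions.

```go
package main

import "fmt"

// retAddrLoc says where the return address lives at a given PC.
// Hypothetical type: the current metadata has no such field.
type retAddrLoc int

const (
	retAddrAtSP retAddrLoc = iota // saved at 0(SP)
	retAddrInLR                   // still in the LR register
)

// unwindInfo is the hypothetical expanded per-PC metadata.
type unwindInfo struct {
	spDelta int        // how far the SP has been decremented, in slots
	loc     retAddrLoc // where to find the return address
}

type regs struct {
	sp int
	lr uint64
}

// callerFrame recovers the return address and the caller's SP
// from the metadata for the current PC.
func callerFrame(info unwindInfo, r regs, stack []uint64) (ret uint64, callerSP int) {
	callerSP = r.sp + info.spDelta
	if info.loc == retAddrInLR {
		return r.lr, callerSP
	}
	return stack[r.sp], callerSP
}

// unwindAtEachPC runs a "decrement SP, then store LR" prologue and
// unwinds before, between, and after its two instructions.
func unwindAtEachPC() (rets [3]uint64, callerSPs [3]int) {
	const frameSize = 4
	stack := make([]uint64, 16)
	for i := range stack {
		stack[i] = 0xDEAD // stale contents
	}
	r := regs{sp: 12, lr: 0x1000}
	table := [3]unwindInfo{
		{0, retAddrInLR},         // on entry
		{frameSize, retAddrInLR}, // SP decremented, LR not yet stored
		{frameSize, retAddrAtSP}, // LR stored at 0(SP)
	}
	rets[0], callerSPs[0] = callerFrame(table[0], r, stack)
	r.sp -= frameSize
	rets[1], callerSPs[1] = callerFrame(table[1], r, stack)
	stack[r.sp] = r.lr
	rets[2], callerSPs[2] = callerFrame(table[2], r, stack)
	return rets, callerSPs
}

func main() {
	rets, sps := unwindAtEachPC()
	for i := range rets {
		fmt.Printf("pc%d: return address %#x, caller SP %d\n", i, rets[i], sps[i])
	}
}
```

The middle entry is the state the current metadata cannot describe. With it, nothing is ever written below the SP, so a signal frame pushed on the current stack has nothing to clobber.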
This also makes our prologue more similar to C functions.
I plan to do it in Go 1.20.
We create threads with signals blocked, but glibc's
pthread_sigmask doesn't really allow us to block SIGSETXID. So we
may get a signal early on, before the signal stack is set. If we
get a signal on the current stack, it will clobber anything below
the SP. This CL saves the LR and decrements the SP in a single
MOVD.W instruction for small frames, so we don't write below the
SP.

We used to use a single MOVD.W instruction before CL 379075.
CL 379075 changed to use an STP instruction to save the LR and FP,
and then decrement the SP. This CL changes it back, just this part
(epilogues and large frame prologues are unchanged). For small
frames, it is the same number of instructions either way.

This decreases the maximum size of a "small" frame from 0x1f0 to
0xf0. Frame sizes in between could benefit from using an STP
instruction instead of the prologue for the "large" frame case.
We don't bother with that for now, as this is a stop-gap solution.

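The effect of the new bound can be sketched as a simple classification. This is an illustration only: `classify` and the constant names are hypothetical, and treating both bounds as inclusive is an assumption rather than something stated above.

```go
package main

import "fmt"

// Hypothetical names; the bounds are the two values quoted above.
const (
	oldSmallMax = 0x1f0 // largest "small" frame before this CL
	newSmallMax = 0xf0  // largest "small" frame with this CL
)

// classify reports which prologue shape a frame of the given size gets
// under a given bound. The bound is assumed to be inclusive.
func classify(frameSize, smallMax int) string {
	if frameSize <= smallMax {
		return "small"
	}
	return "large"
}

func main() {
	for _, size := range []int{0x40, 0xf0, 0x100, 0x1f0, 0x200} {
		fmt.Printf("frame %#x: was %s, now %s\n",
			size, classify(size, oldSmallMax), classify(size, newSmallMax))
	}
}
```

Frames larger than 0xf0 and up to 0x1f0 are the ones that move from the "small" to the "large" prologue, which is the range where an STP-based prologue could help.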
This only addresses the issue with small frames. Luckily, all
functions from thread entry to setting up the signal stack have
small frames.

Other possible ideas:
- Expand the unwind info metadata, separate SP delta and the
location of the return address, so we can express "SP is
decremented but the return address is in the LR register". Then
we can always create the frame first then write the LR, without
writing anything below the SP (except the frame pointer at SP-8,
which is minor because it doesn't really affect program
execution).
- Set up the signal stack immediately in mstart in assembly.
For Go 1.19 we do this simple fix. We plan to do the metadata fix
in Go 1.20 (#53609).
Other LR architectures are addressed in CL 413428.
Run-TryBot: Cherry Mui <email@example.com>
TryBot-Result: Gopher Robot <firstname.lastname@example.org>
Reviewed-by: Austin Clements <email@example.com>
Reviewed-by: Eric Fang <firstname.lastname@example.org>