Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: ppc64x binaries randomly segfault on linux 5.13rc6 [1.15 backport] #46857

Closed
gopherbot opened this issue Jun 21, 2021 · 3 comments
Closed
Labels
CherryPickApproved
Milestone

Comments

@gopherbot
Copy link

@gopherbot gopherbot commented Jun 21, 2021

@pmur requested issue #46803 to be considered for backport to the next 1.15 minor release.

@gopherbot please consider a backport to 1.16 and 1.15. This is necessary for running any linux kernel version 5.13 or above.

@gopherbot gopherbot added the CherryPickCandidate label Jun 21, 2021
@gopherbot gopherbot added this to the Go1.15.14 milestone Jun 21, 2021
@dmitshur dmitshur removed this from the Go1.15.14 milestone Jul 12, 2021
@dmitshur dmitshur added this to the Go1.15.15 milestone Jul 12, 2021
@gopherbot
Copy link
Author

@gopherbot gopherbot commented Jul 13, 2021

Change https://golang.org/cl/334411 mentions this issue: [release-branch.go1.15] runtime: fix crash during VDSO calls on PowerPC

@cagedmantis
Copy link
Contributor

@cagedmantis cagedmantis commented Jul 29, 2021

This issue is being approved. This is a major problem without a viable workaround.

@cagedmantis cagedmantis added the CherryPickApproved label Jul 29, 2021
@gopherbot gopherbot removed the CherryPickCandidate label Jul 29, 2021
@gopherbot
Copy link
Author

@gopherbot gopherbot commented Aug 3, 2021

Closed by merging 7921829 to release-branch.go1.15.

gopherbot pushed a commit that referenced this issue Aug 3, 2021
This patch reinstates a fix for PowerPC with regard to making VDSO calls
while receiving a signal, and subsequently crashing. The crash happens
because certain VDSO calls can modify the r30 register, which is where g
is stored. This change was reverted for PowerPC because r30 is supposed
to be a non-volatile register. This is true, but that only makes a
guarantee across function calls, but not "within" a function call. This
patch was seemingly fine before because the Linux kernel still had hand
rolled assembly VDSO function calls, however with a recent change to C
function calls it seems the compiler used can generate instructions
which temporarily clobber r30. This means that when we receive a signal
during one of these calls the value of r30 will not be the g as the
runtime expects, causing a segfault.

You can see from this assembly dump how the register is clobbered during
the call:

(the following is from a 5.13rc2 kernel)

```
Dump of assembler code for function __cvdso_clock_gettime_data:
   0x00007ffff7ff0700 <+0>:     cmplwi  r4,15
   0x00007ffff7ff0704 <+4>:     bgt     0x7ffff7ff07f0 <__cvdso_clock_gettime_data+240>
   0x00007ffff7ff0708 <+8>:     li      r9,1
   0x00007ffff7ff070c <+12>:    slw     r9,r9,r4
   0x00007ffff7ff0710 <+16>:    andi.   r10,r9,2179
   0x00007ffff7ff0714 <+20>:    beq     0x7ffff7ff0810 <__cvdso_clock_gettime_data+272>
   0x00007ffff7ff0718 <+24>:    rldicr  r10,r4,4,59
   0x00007ffff7ff071c <+28>:    lis     r9,32767
   0x00007ffff7ff0720 <+32>:    std     r30,-16(r1)
   0x00007ffff7ff0724 <+36>:    std     r31,-8(r1)
   0x00007ffff7ff0728 <+40>:    add     r6,r3,r10
   0x00007ffff7ff072c <+44>:    ori     r4,r9,65535
   0x00007ffff7ff0730 <+48>:    lwz     r8,0(r3)
   0x00007ffff7ff0734 <+52>:    andi.   r9,r8,1
   0x00007ffff7ff0738 <+56>:    bne     0x7ffff7ff07d0 <__cvdso_clock_gettime_data+208>
   0x00007ffff7ff073c <+60>:    lwsync
   0x00007ffff7ff0740 <+64>:    mftb    r30      <---- RIGHT HERE
=> 0x00007ffff7ff0744 <+68>:    ld      r12,40(r6)
```

What I believe is happening is that the kernel changed the PowerPC VDSO
calls to use standard C calls instead of using hand rolled assembly. The
hand rolled assembly calls never touched r30, so this change was safe to
roll back. That does not seem to be the case anymore as on the 5.13rc2
kernel the compiler *is* generating assembly which modifies r30, making
this change again unsafe and causing a crash when the program receives a
signal during these calls (which will happen often due to async
preempt). This change happened here:
https://lwn.net/ml/linux-kernel/235e5571959cfa89ced081d7e838ed5ff38447d2.1601365870.git.christophe.leroy@csgroup.eu/.

I realize this was reverted due to unexplained hangs in PowerPC
builders, but I think we should reinstate this change and investigate
those issues separately:
f4ca3c1

Fixes #46857

Change-Id: Ib18d7bbfc80a1a9cb558f0098878d41081324b52
GitHub-Last-Rev: c3002bc
GitHub-Pull-Request: #46767
Reviewed-on: https://go-review.googlesource.com/c/go/+/328110
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
(cherry picked from commit 16e82be)
Reviewed-on: https://go-review.googlesource.com/c/go/+/334411
Run-TryBot: Carlos Amedee <carlos@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CherryPickApproved
Projects
None yet
Development

No branches or pull requests

3 participants