
runtime: memory corruption on Linux 5.2+ #35777

Closed
aclements opened this issue Nov 22, 2019 · 88 comments


@aclements aclements commented Nov 22, 2019

We've had several reports of memory corruption on Linux 5.3.x (or later) kernels from people running tip since asynchronous preemption was committed. This is a super-bug to track these issues. I suspect they all have one root cause.

Typically these are "runtime error: invalid memory address or nil pointer dereference" or "runtime: unexpected return pc" or "segmentation violation" panics. They can also appear as self-detected data corruption.

If you encounter a crash that could be random memory corruption, are running Linux 5.3.x or later, and are running a recent tip Go (after commit 62e53b7), please file a new issue and add a comment here. If you can reproduce it, please try setting "GODEBUG=asyncpreemptoff=1" in your environment and seeing if you can still reproduce it.

Duplicate issues (I'll edit this comment to keep this up-to-date):

runtime: corrupt binary export data seen after signal preemption CL (#35326): Corruption in file version header observed by vet. Medium reproducible. Strong leads.

cmd/compile: panic during early copyelim crash (#35658): Invalid memory address in cmd/compile/internal/ssa.copyelim. Not reproducible. Nothing obvious in stack trace. Haven't dug into assembly.

runtime: SIGSEGV in mapassign_fast64 during cmd/vet (#35689): Invalid memory address in runtime.mapassign_fast64 in vet. Stack trace includes random pointers. Some assembly decoding work.

runtime: unexpected return pc for runtime.(*mheap).alloc (#35328): Unexpected return pc. Stack trace includes random pointers. Not reproducible.

cmd/dist: I/O error: read src/xxx.go: is a directory (#35776): Random misbehavior. Not reproducible.

runtime: "fatal error: mSpanList.insertBack" in mallocgc (#35771): Bad mspan next pointer (random and unaligned). Not reproducible.

cmd/compile: invalid memory address or nil pointer dereference in gc.convlit1 (#35621): Invalid memory address in cmd/compile/internal/gc.convlit1. Evidence of memory corruption, though no obvious random pointers. Not reproducible.

cmd/go: unexpected signal during runtime execution (#35783): Corruption in file version header observed by vet. Not reproducible.

runtime: unexpected return pc for runtime.systemstack_switch (#35592): Unexpected return pc. Stack trace includes random pointers. Not reproducible.

cmd/compile: random compile error running tests (#35760): Compiler data corruption. Not reproducible.

@aclements aclements added this to the Go1.14 milestone Nov 22, 2019
@aclements aclements self-assigned this Nov 22, 2019
@aclements aclements pinned this issue Nov 22, 2019

@mvdan mvdan commented Nov 22, 2019

@aclements for your records, #35328 and #35776 might be related as well. Those two were on the same Linux 5.3.x machine of mine.


@aclements aclements commented Nov 22, 2019

Thanks @mvdan. I've folded those into the list above.


@zikaeroh zikaeroh commented Nov 22, 2019

#35621 from me. One time, no repro.


@myitcv myitcv commented Nov 22, 2019

@aclements just saw #35783 for the record.

If you think we have enough "evidence" please say and I'll stop creating issues for now 😄


@bradfitz bradfitz commented Nov 22, 2019

Have we roughly bisected which Linux versions are affected? Looking at the kernel changes in that region might yield a clue about where the bug is and whose it is.

5.3 = bad.
5.2 = ?


@zikaeroh zikaeroh commented Nov 22, 2019

In #35326 (comment), I used Arch's 4.19 LTS and could not reproduce the bexport corruption. However, the kernel configuration differs between 4.19 and 5.3, so that may be unscientific. (I'm letting my machine rebuild 5.3 without PREEMPT set to see if that's the problem, but I have doubts. EDIT: It was not PREEMPT, so setting up a builder with a newer kernel would likely be good regardless.)

What set of kernels do the current Linux builders use? That might provide a lower bound, as I've never seen the issue there.

(I'd bring up #9505 to advocate for an Arch builder, but that issue is more about everything but the kernel version. I feel like there should be some builder which is at the latest Linux kernel, whatever that may be.)


@bradfitz bradfitz commented Nov 22, 2019

The existing Go Linux builders use Container Optimized OS with a Linux kernel 4.19.72+.


@aclements aclements commented Nov 22, 2019

Thanks @myitcv, I think we have enough reports. If you do happen to find another one that's reproducible, that would be very helpful, though.


@dr2chase dr2chase commented Nov 25, 2019

To recap experiments last Friday (and I rechecked the test for the more mystifying of these Sunday afternoon), Cherry and I tried the following:

Doubled the size of the sigaltstack, just in case. Also sanity-checked the bounds within gdb; they were okay.

Modified the definition of fpstate to conform to what is defined in the linux header files.
Modified sigcontext to use the new Xstate:

fpstate *Xstate // *fpstate1

Wrote a method to allow us to store the ymm registers that were supplied (as registers) to the signal handler, then:

  1. tried an experiment in the assembly language handler to trash the YMM registers (not the data structures) before return. We never saw any sign of the trash but this seemed to raise the rate of the failures (running "go vet all"). The trashing string stored was "This_is_a_test. "

  2. tried printing the saved and current ymm registers in sigtrampgo.
    The saved ones looked like memmove artifacts (source code while running vet all), and the current ones were always zero. The memmove artifacts stayed unchanged, a lot, between signals.
    I rechecked the code that did this earlier today, just in case we got it wrong.

  3. made a copy of the saved xmm and ymm registers on sigtrampgo entry, then checked the copy against the saved registers, to see if our code ever somehow modified them. That never fired.

I spent some time Saturday looking for "interesting" comments in the Linux git log, I have some to review. What I am wondering is if there was some attempt to optimize saving of the ymm registers and that got fouled up. One thing I wonder a little about was what they are doing for power management with AVX use, I saw some mention of that.
(I.e., what triggers AVX use, can they "save power" if they don't touch the registers, if they believe AVX is not being used? Suppose they rely on some hardware bit that isn't set under exactly the expected conditions?)

type Xstate struct {
   Fpstate Fpstate
   Hdr Header
   Ymmh Ymmh_state
}

type Fpstate struct {
   Cwd uint16
   Swd uint16
   Twd uint16
   Fop uint16
   Rip uint64
   Rdp uint64
   Mxcsr uint32
   Mxcsr_mask uint32
   St_space [32]uint32
   Xmm_space [64]uint32
   Reserved2 [12]uint32
   Reserved3 [12]uint32
}

type Header struct {
   Xfeatures uint64
   Reserved1 [2]uint64
   Reserved2 [5]uint64
}

type Ymmh_state struct {
   Space [64]uint32
}
TEXT runtime·getymm(SB),NOSPLIT,$0
    MOVQ    0(FP), AX
    VMOVDQU Y0,0(AX)
    VMOVDQU Y1,(1*32)(AX)
    VMOVDQU Y2,(2*32)(AX)
    VMOVDQU Y3,(3*32)(AX)
    VMOVDQU Y4,(4*32)(AX)
    VMOVDQU Y5,(5*32)(AX)
    VMOVDQU Y6,(6*32)(AX)
    VMOVDQU Y7,(7*32)(AX)
    VMOVDQU Y8,(8*32)(AX)
    VMOVDQU Y9,(9*32)(AX)
    VMOVDQU Y10,(10*32)(AX)
    VMOVDQU Y11,(11*32)(AX)
    VMOVDQU Y12,(12*32)(AX)
    VMOVDQU Y13,(13*32)(AX)
    VMOVDQU Y14,(14*32)(AX)
    VMOVDQU Y15,(15*32)(AX)
    RET

@aclements aclements commented Nov 25, 2019

An update from over in #35326: I've bisected the issue to kernel commit torvalds/linux@d9c9ce3, which happened between v5.1 and v5.2. It also requires the kernel to be built with GCC 9 (GCC 8 does not reproduce the issue).


@dr2chase dr2chase commented Nov 26, 2019

Not sure where Austin's reporting this or if he had time today, but:

  • he has a C program demonstrating the bug in Linux 5.3 (built with gcc 9) for purposes of filing a bug soonish;
  • there is a workaround on the Go implementation side (be sure the signal stack is mapped);
  • I managed to create a failing Linux 5.3 where the entire kernel is compiled with gcc 8, except for arch/x86/kernel/fpu/signal.c.

@zikaeroh zikaeroh commented Nov 26, 2019

All of the progress updates have been going on #35326. (Most recently, #35326 (comment).)


@dvyukov dvyukov commented Nov 26, 2019

There is this commit that claims to fix something in the culprit commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b81ff1013eb8eef2934ca7e8cf53d553c1029e84
I don't know if it will help or not, but @aclements, if you have a test setup ready, it may be worth cherry-picking and trying.


@cherrymui cherrymui commented Nov 26, 2019

I think that commit is already included in the 5.2 and 5.3 kernels, which still have the problem.


@aclements aclements commented Nov 26, 2019

Thanks @dvyukov. I just re-confirmed that I can still reproduce it in the same way on 5.3, which includes that commit. I'll double check that I can still reproduce right at that commit, just in case it was somehow re-introduced later.


@aclements aclements commented Nov 26, 2019

Reproduced at torvalds/linux@b81ff10, as well as v5.4, which was just released.

I've filed the upstream kernel bug here: https://bugzilla.kernel.org/show_bug.cgi?id=205663


@aclements aclements commented Dec 5, 2019

I've filed a tracking bug to remove the workaround for Go 1.15: #35979.


@aclements aclements commented Dec 5, 2019

To close with a summary:

Linux 5.2 introduced a bug when compiled with GCC 9 that could cause vector register corruption on amd64 on return from a signal handler where the top page of the signal stack had not yet been paged in. This can affect any program in any language (assuming it uses at least SSE registers), including versions of Go before 1.14, and generally results in arbitrary memory corruption. It became significantly more likely in Go 1.14 because the addition of signal-based non-cooperative preemption significantly increased the number of asynchronous signals received by Go processes. It's also somewhat more likely in Go than other languages because Go regularly creates new OS threads with alternate signal stacks that are likely not to be paged in.

The kernel bug was fixed in Linux 5.3.15 and 5.4.2, and the fix will appear in 5.5 and all future releases. 5.4 is a long-term support release, and 5.4.2 was released with the fix just 10 days after 5.4.

For Go 1.14, we introduced a workaround that mlocks the top page of each signal stack on the affected kernel versions to ensure it is paged in and remains paged in.

Thanks everyone for helping track this down!

I'll keep this issue pinned until next week for anyone running a tip from before the workaround.


@lmb lmb commented Dec 5, 2019

Do you have an idea how much memory is going to be mlocked? Distros have different values for RLIMIT_MEMLOCK, some of them are pretty low.


@mdempsky mdempsky commented Dec 5, 2019

Looks like the workaround CL only applies to linux/amd64. Shouldn't it apply to linux/386 too?


@bradfitz bradfitz commented Dec 5, 2019

Looks like the workaround CL only applies to linux/amd64. Shouldn't it apply to linux/386 too?

Elsewhere @aclements said he's been unable to reproduce the problem on 386.


@dr2chase dr2chase commented Dec 5, 2019

@lmb How low is "pretty low"? The expected number of pages locked is O(threads) (not goroutines), since it is one page per thread's signal stack. Unless you have a lot of goroutines tied to threads, it ought to be about GOMAXPROCS pages, plus a few for bad luck.

And this is also tied to just a few versions of Linux that we hope nobody will be using a year from now.


@gopherbot gopherbot commented Dec 5, 2019

Change https://golang.org/cl/210098 mentions this issue: runtime: give useful failure message on mlock failure


@mdempsky mdempsky commented Dec 5, 2019

Elsewhere @aclements said he's been unable to reproduce the problem on 386.

Hm, where was that? I looked through this issue and #35326, and didn't notice any comments to that effect.

@aclements did mention that it also affects XMM registers, which are available on 386. The Linux kernel fix looks generic to all of x86, not amd64-specific.

I'm willing to believe it doesn't affect 386 executables, but then I'm curious why not.


@zikaeroh zikaeroh commented Dec 5, 2019

@mdempsky In the comments on CL 209899.


@mdempsky mdempsky commented Dec 5, 2019

@zikaeroh Hiding in plain sight. Thanks.


@aclements aclements commented Dec 5, 2019

@mdempsky: https://go-review.googlesource.com/c/go/+/209899/3/src/runtime/os_linux_386.go#7 (it's a little buried)

It may just be harder to reproduce. But we do use the X registers in memmove on 386, so I would still have expected to see it.


@mdempsky mdempsky commented Dec 6, 2019

@aclements Thanks. Do you mind elaborating how you tested 386? Like the C reproducer exhibits the issue when built with -m64 but not with -m32, when all else is the same (e.g., exact same kernel)?


@lmb lmb commented Dec 6, 2019

@dr2chase I did an unrepresentative survey amongst colleagues. Debian (and Ubuntu) allows 64 MiB of locked memory by default; Arch Linux only has 64 KiB.


@bradfitz bradfitz commented Dec 6, 2019

@lmb, thanks for the info. At least Arch users will now get a failure message when mlock fails, telling them to update their kernel to a fixed version (at which point the mlock of stack tops won't be used).


@mvdan mvdan commented Dec 6, 2019

Speaking of Arch, 5.4.2 just landed on their mirrors.

gopherbot pushed a commit that referenced this issue Dec 6, 2019
Currently, we're ignoring failures to mlock signal stacks in the
workaround for #35777. This means if your mlock limit is low, you'll
instead get random memory corruption, which seems like the wrong
trade-off.

This CL checks for mlock failures and panics with useful guidance.

Updates #35777.

Change-Id: I15f02d3a1fceade79f6ca717500ca5b86d5bd570
Reviewed-on: https://go-review.googlesource.com/c/go/+/210098
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

@aclements aclements commented Dec 6, 2019

Do you mind elaborating how you tested 386?

I ran the go vet stress test with a toolchain built with GOHOSTARCH=386 GOARCH=386.

However, I just ran my C reproducer, changed it to use XMM instead of YMM, compiled it with gcc -m32 -msse4.2 -pthread testxmm.c, and it failed. So I guess 386 has this problem, too. :(

@mdempsky mdempsky reopened this Dec 6, 2019

@mdempsky mdempsky commented Dec 6, 2019

Reopening for 386 fix.


@gopherbot gopherbot commented Dec 6, 2019

Change https://golang.org/cl/210299 mentions this issue: runtime: suggest 5.3.15 kernel upgrade for mlock failure


@mpx mpx commented Dec 6, 2019

FYI, max locked memory is 64 KiB on Fedora by default (all.bash currently fails). It looks like a 5.3.15 update is in the pipeline, so this failure should be temporary.


@eliasnaur eliasnaur commented Dec 8, 2019

I'm also on Fedora, getting

$ GODEBUG=asyncpreemptoff=1 ./make.bash 
Building Go cmd/dist using /home/elias/dev/go-1.7. (go1.7.1 linux/amd64)
Building Go toolchain1 using /home/elias/dev/go-1.7.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for linux/amd64.
runtime: mlock of signal stack failed: 12
runtime: increase the mlock limit (ulimit -l) or
runtime: update your kernel to 5.4.2 or later
fatal error: mlock failed

even with GODEBUG=asyncpreemptoff=1. How can I proceed while waiting for kernel 5.3.15? ulimit -l 4096 doesn't seem to make a difference (ulimit -l still reports 64).


@bradfitz bradfitz commented Dec 8, 2019

@eliasnaur, modify/add the memlock value in /etc/security/limits.conf?


@siebenmann siebenmann commented Dec 8, 2019

The cleanest official Fedora way is to create a new 95-memlock.conf file in /etc/security/limits.d/ that has the contents:

*  -   memlock   131072

Then, unfortunately, you need to log out and back in again (or ssh to your own machine) to get the new limits applied to your login session. Replace 131072 with another number if you want a different limit than 128 MiB; I aimed high because my Fedora machines are single-user machines with only me on them.


@gopherbot gopherbot commented Dec 9, 2019

Change https://golang.org/cl/210345 mentions this issue: runtime: mlock top of signal stack on both amd64 and 386

@gopherbot gopherbot closed this in 1c8d1f4 Dec 9, 2019
gopherbot pushed a commit that referenced this issue Dec 9, 2019
Some Linux distributions will continue to provide 5.3.x kernels for a
while rather than 5.4.x.

Updates #35777

Change-Id: I493ef8338d94475f4fb1402ffb9040152832b0fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/210299
Reviewed-by: Austin Clements <austin@google.com>

@mpx mpx commented Dec 10, 2019

The 5.3.15 kernel update has been released for Fedora 30/31. all.bash builds again.


@ajstarks ajstarks commented Dec 10, 2019

I can confirm that the C reproducer program runs correctly on Fedora with the 5.3.15 kernel:

$ gcc -pthread test.c
$ time ./a.out

real	1m0.009s
user	0m0.001s
sys	0m0.004s
$ uname -r
5.3.15-300.fc31.x86_64