Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: memory corruption on Linux 5.2+ #35777

Closed
aclements opened this issue Nov 22, 2019 · 93 comments
Closed

runtime: memory corruption on Linux 5.2+ #35777

aclements opened this issue Nov 22, 2019 · 93 comments
Assignees
Milestone

Comments

@aclements
Copy link
Member

@aclements aclements commented Nov 22, 2019

We've had several reports of memory corruption on Linux 5.3.x (or later) kernels from people running tip since asynchronous preemption was committed. This is a super-bug to track these issues. I suspect they all have one root cause.

Typically these are "runtime error: invalid memory address or nil pointer dereference" or "runtime: unexpected return pc" or "segmentation violation" panics. They can also appear as self-detected data corruption.

If you encounter a crash that could be random memory corruption, are running Linux 5.3.x or later, and are running a recent tip Go (after commit 62e53b7), please file a new issue and add a comment here. If you can reproduce it, please try setting "GODEBUG=asyncpreemptoff=1" in your environment and seeing if you can still reproduce it.

Duplicate issues (I'll edit this comment to keep this up-to-date):

runtime: corrupt binary export data seen after signal preemption CL (#35326): Corruption in file version header observed by vet. Medium reproducible. Strong leads.

cmd/compile: panic during early copyelim crash (#35658): Invalid memory address in cmd/compile/internal/ssa.copyelim. Not reproducible. Nothing obvious in stack trace. Haven't dug into assembly.

runtime: SIGSEGV in mapassign_fast64 during cmd/vet (#35689): Invalid memory address in runtime.mapassign_fast64 in vet. Stack trace includes random pointers. Some assembly decoding work.

runtime: unexpected return pc for runtime.(*mheap).alloc (#35328): Unexpected return pc. Stack trace includes random pointers. Not reproducible.

cmd/dist: I/O error: read src/xxx.go: is a directory (#35776): Random misbehavior. Not reproducible.

runtime: "fatal error: mSpanList.insertBack" in mallocgc (#35771): Bad mspan next pointer (random and unaligned). Not reproducible.

cmd/compile: invalid memory address or nil pointer dereference in gc.convlit1 (#35621): Invalid memory address in cmd/compile/internal/gc.convlit1. Evidence of memory corruption, though no obvious random pointers. Not reproducible.

cmd/go: unexpected signal during runtime execution (#35783): Corruption in file version header observed by vet. Not reproducible.

runtime: unexpected return pc for runtime.systemstack_switch (#35592): Unexpected return pc. Stack trace includes random pointers. Not reproducible.

cmd/compile: random compile error running tests (#35760): Compiler data corruption. Not reproducible.

@aclements aclements added this to the Go1.14 milestone Nov 22, 2019
@aclements aclements self-assigned this Nov 22, 2019
@aclements aclements pinned this issue Nov 22, 2019
@mvdan
Copy link
Member

@mvdan mvdan commented Nov 22, 2019

@aclements for your records, #35328 and #35776 might be related as well. Those two were on the same Linux 5.3.x machine of mine.

@aclements
Copy link
Member Author

@aclements aclements commented Nov 22, 2019

Thanks @mvdan. I've folded those into the list above.

@zikaeroh
Copy link
Contributor

@zikaeroh zikaeroh commented Nov 22, 2019

#35621 from me. One time, no repro.

@myitcv
Copy link
Member

@myitcv myitcv commented Nov 22, 2019

@aclements just saw #35783 for the record.

If you think we have enough "evidence" please say and I'll stop creating issues for now 😄

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Nov 22, 2019

Have we roughly bisected which Linux versions are affected? Looking at the kernel changes in that region might yield a clue about where and whose the bug is.

5.3 = bad.
5.2 = ?

@zikaeroh
Copy link
Contributor

@zikaeroh zikaeroh commented Nov 22, 2019

In #35326 (comment), I used Arch's 4.19 LTS and could not reproduce the bexport corruption. However, the kernel configuration differs between 4.19 and 5.3, so that may be unscientific. (I'm letting my machine rebuild 5.3 without PREEMPT set to see if that's the problem, but I have doubts. EDIT: It was not PREEMPT, so setting up a builder with a newer kernel would likely be good regardless.)

What set of kernels do the current Linux builders use? That might provide a lower bound, as I've never seen the issue there.

(I'd bring up #9505 to advocate for an Arch builder, but that issue is more about everything but the kernel version. I feel like there should be some builder which is at the latest Linux kernel, whatever that may be.)

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Nov 22, 2019

The existing Go Linux builders use Container Optimized OS with a Linux kernel 4.19.72+.

@aclements
Copy link
Member Author

@aclements aclements commented Nov 22, 2019

Thanks @myitcv, I think we have enough reports. If you do happen to find another one that's reproducible, that would be very helpful, though.

@dr2chase
Copy link
Contributor

@dr2chase dr2chase commented Nov 25, 2019

To recap experiments last Friday (and I rechecked the test for the more mystifying of these Sunday afternoon), Cherry and I tried the following:

Double the size of the sigaltstack, just in case. Also sanity check the bounds within gdb, they were okay.

Modified the definition of fpstate to conform to what is defined in the linux header files.
Modified sigcontext to use the new Xstate:

fpstate *Xstate // *fpstate1

Wrote a method to allow us to store the ymm registers that were supplied (as registers) to the signal handler,

  1. tried an experiment in the assembly language handler to trash the YMM registers (not the data structures) before return. We never saw any sign of the trash but this seemed to raise the rate of the failures (running "go vet all"). The trashing string stored was "This_is_a_test. "

  2. tried printing the saved and current ymm registers in sigtrampgo.
    The saved ones looked like memmove artifacts (source code while running vet all), and the current ones were always zero. The memmove artifacts stayed unchanged, a lot, between signals.
    I rechecked the code that did this earlier today, just in case we got it wrong.

  3. made a copy of the saved xmm and ymm registers on sigtrampgo entry, then checked the copy against the saved registers, to see if our code ever somehow modified them. That never fired.

I spent some time Saturday looking for "interesting" comments in the Linux git log, I have some to review. What I am wondering is if there was some attempt to optimize saving of the ymm registers and that got fouled up. One thing I wonder a little about was what they are doing for power management with AVX use, I saw some mention of that.
(I.e., what triggers AVX use, can they "save power" if they don't touch the registers, if they believe AVX is not being used? Suppose they rely on some hardware bit that isn't set under exactly the expected conditions?)

type Xstate struct {
   Fpstate Fpstate
   Hdr Header
   Ymmh Ymmh_state
}

type Fpstate struct {
   Cwd uint16
   Swd uint16
   Twd uint16
   Fop uint16
   Rip uint64
   Rdp uint64
   Mxcsr uint32
   Mxcsr_mask uint32
   St_space [32]uint32
   Xmm_space [64]uint32
   Reserved2 [12]uint32
   Reserved3 [12]uint32
}

type Header struct {
   Xfeatures uint64
   Reserved1 [2]uint64
   Reserved2 [5]uint64
}

type Ymmh_state struct {
   Space [64]uint32
}
TEXT runtime·getymm(SB),NOSPLIT,$0
    MOVQ    0(FP), AX
    c Y0,0(AX)
    VMOVDQU Y1,(1*32)(AX)
    VMOVDQU Y2,(2*32)(AX)
    VMOVDQU Y3,(3*32)(AX)
    VMOVDQU Y4,(4*32)(AX)
    VMOVDQU Y5,(5*32)(AX)
    VMOVDQU Y6,(6*32)(AX)
    VMOVDQU Y7,(7*32)(AX)
    VMOVDQU Y8,(8*32)(AX)
    VMOVDQU Y9,(9*32)(AX)
    VMOVDQU Y10,(10*32)(AX)
    VMOVDQU Y11,(11*32)(AX)
    VMOVDQU Y12,(12*32)(AX)
    VMOVDQU Y13,(13*32)(AX)
    VMOVDQU Y14,(14*32)(AX)
    VMOVDQU Y15,(15*32)(AX)
    RET
@aclements
Copy link
Member Author

@aclements aclements commented Nov 25, 2019

An update from over in #35326: I've bisected the issue to kernel commit torvalds/linux@d9c9ce3, which happened between v5.1 and v5.2. It also requires the kernel to be built with GCC 9 (GCC 8 does not reproduce the issue).

@dr2chase
Copy link
Contributor

@dr2chase dr2chase commented Nov 26, 2019

Not sure where Austin's reporting this or if he had time today, but:

  • he has a C program demonstrating the bug in Linux 5.3 (built with gcc 9) for purposes of filing a bug soonish;
  • there is a workaround on the Go implementation side (be sure the signal stack is mapped);
  • I managed to create a failing Linux 5.3 where the entire kernel is compiled with gcc 8, except for arch/x86/kernel/fpu/signal.c.
@zikaeroh
Copy link
Contributor

@zikaeroh zikaeroh commented Nov 26, 2019

All of the progress updates have been going on #35326. (Most recently, #35326 (comment).)

@dvyukov
Copy link
Member

@dvyukov dvyukov commented Nov 26, 2019

There is this commit that clams to be fixing something in the culprit commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b81ff1013eb8eef2934ca7e8cf53d553c1029e84
I don't know if it will help or not, but @aclements if you have test setup ready, may be worth cherry-picking and trying.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Nov 26, 2019

I think that commit is already included in 5.2 and 5.3 kernel, which still has the problem.

@aclements
Copy link
Member Author

@aclements aclements commented Nov 26, 2019

Thanks @dvyukov. I just re-confirmed that I can still reproduce it in the same way on 5.3, which includes that commit. I'll double check that I can still reproduce right at that commit, just in case it was somehow re-introduced later.

@aclements
Copy link
Member Author

@aclements aclements commented Nov 26, 2019

Reproduced at torvalds/linux@b81ff10, as well as v5.4, which was just released.

I've filed the upstream kernel bug here: https://bugzilla.kernel.org/show_bug.cgi?id=205663

@fvoznika
Copy link

@fvoznika fvoznika commented Feb 4, 2020

This workaround can be problematic for applications that run with limited RLIMIT_MEMLOCK, e.g. systemd services, apps running as root. I bumped into the limit running gVisor with Docker which inherits limits from containerd service.

runtime: mlock of signal stack failed: 12
runtime: increase the mlock limit (ulimit -l) or
runtime: update your kernel to 5.3.15+, 5.4.2+, or 5.5+

Is there an option to disable mlock'ing pages, and possibly preemption to avoid random corruptions? Is the plant to remove this workaround from 1.15?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 5, 2020

You can disable preemption by setting the environment variable GODEBUG=asyncpreemptoff=1.

But the key point here is that that doesn't avoid random corruption. The random corruption can occur with any program in any language. Using async preemption does make the random corruption more likely. But it can happen regardless.

Therefore, since the mlock call is the only way we know to avoid the corruption, there is no way to disable that mlock call. Other than upgrading or downgrading to a fixed kernel.

@0987363 0987363 mentioned this issue Mar 17, 2020
0 of 1 task complete
azdagron added a commit to azdagron/spire that referenced this issue Jul 10, 2020
We put off moving to go1.14+ to give time things to settle related to
the (largely patched) kernel bug that go1.14 tickles more due to the
signals generated by the preemptive scheduler (see
golang/go#35777).

There is a diminishingly small risk of unpatched kernels out there.

Also, go1.15 comes out in roughly a month and we'll need to move to at least
go1.14 by then to continue to get security updates (since go1.13.x will
no longer be maintained).

We've watched the ecosystem and waited for large infrastrucure products
to move to go1.14. Kubernetes and etcd, among others, have made the plunge.

Now feels like a good time.

Signed-off-by: Andrew Harding <andrew.harding@hpe.com>
azdagron added a commit to azdagron/spire that referenced this issue Jul 10, 2020
We put off moving to go1.14+ to give time things to settle related to
the (largely patched) kernel bug that go1.14 tickles more due to the
signals generated by the preemptive scheduler (see
golang/go#35777).

There is a diminishingly small risk of unpatched kernels out there.

Also, go1.15 comes out in roughly a month and we'll need to move to at least
go1.14 by then to continue to get security updates (since go1.13.x will
no longer be maintained).

We've watched the ecosystem and waited for large infrastrucure products
to move to go1.14. Kubernetes and etcd, among others, have made the plunge.

Now feels like a good time.

Signed-off-by: Andrew Harding <andrew.harding@hpe.com>
azdagron added a commit to azdagron/spire that referenced this issue Jul 10, 2020
We put off moving to go1.14+ to give time things to settle related to
the (largely patched) kernel bug that go1.14 tickles more due to the
signals generated by the preemptive scheduler (see
golang/go#35777).

There is a small risk of unpatched kernels out there.

Also, go1.15 comes out in roughly a month and we'll need to move to at least
go1.14 by then to continue to get security updates (since go1.13.x will
no longer be maintained).

We've watched the ecosystem and waited for large infrastructure products
to move to go1.14. Kubernetes and etcd, among others, have made the plunge.

Now feels like a good time.

Signed-off-by: Andrew Harding <andrew.harding@hpe.com>
@gopherbot
Copy link

@gopherbot gopherbot commented Jul 20, 2020

Change https://golang.org/cl/243658 mentions this issue: runtime: let GODEBUG=mlock=0 disable mlock calls

@gopherbot
Copy link

@gopherbot gopherbot commented Jul 22, 2020

Change https://golang.org/cl/244059 mentions this issue: runtime: don't mlock on Ubuntu 5.4 systems

gopherbot pushed a commit that referenced this issue Jul 22, 2020
For #35777
For #37436
Fixes #40184

Change-Id: I68561497d9258e994d1c6c48d4fb41ac6130ee3a
Reviewed-on: https://go-review.googlesource.com/c/go/+/244059
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
@gopherbot
Copy link

@gopherbot gopherbot commented Jul 31, 2020

Change https://golang.org/cl/246200 mentions this issue: runtime: revert signal stack mlocking

gopherbot pushed a commit that referenced this issue Aug 13, 2020
Go 1.14 included a (rather awful) workaround for a Linux kernel bug
that corrupted vector registers on x86 CPUs during signal delivery
(https://bugzilla.kernel.org/show_bug.cgi?id=205663). This bug was
introduced in Linux 5.2 and fixed in 5.3.15, 5.4.2 and all 5.5 and
later kernels. The fix was also back-ported by major distros. This
workaround was necessary, but had unfortunate downsides, including
causing Go programs to exceed the mlock ulimit in many configurations
(#37436).

We're reasonably confident that by the Go 1.16 release, the number of
systems running affected kernels will be vanishingly small. Hence,
this CL removes this workaround.

This effectively reverts CLs 209597 (version parser), 209899 (mlock
top of signal stack), 210299 (better failure message), 223121 (soft
mlock failure handling), and 244059 (special-case patched Ubuntu
kernels). The one thing we keep is the osArchInit function. It's empty
everywhere now, but is a reasonable hook to have.

Updates #35326, #35777 (the original register corruption bugs).
Updates #40184 (request to revert in 1.15).
Fixes #35979.

Change-Id: Ie213270837095576f1f3ef46bf3de187dc486c50
Reviewed-on: https://go-review.googlesource.com/c/go/+/246200
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet