runtime: possible memory corruption on FreeBSD #46272

Open
mknyszek opened this issue May 19, 2021 · 9 comments

Comments

@mknyszek (Contributor) commented May 19, 2021

Several failures in the last month on FreeBSD builders have failure modes that are very difficult to explain, such as SIGSEGVs in hot runtime paths accessing values that don't quite make sense (e.g. spanOf sees a bad arena entry, fixalloc accesses an out-of-bounds slot, a broken return PC in a runtime stack frame). I suspect they share an underlying cause. Three issues have already been opened for these: #45977, #46103, #46182.

As far as I know, these all seem to be specific to FreeBSD, and even more specifically, the "race" and "12_2" builders.

The relevant logs are available below.

2021-05-04T20:50:35-d19e549/freebsd-amd64-race
2021-05-10T23:42:56-5c48951/freebsd-amd64-12_2
2021-05-14T16:42:01-3d324f1/freebsd-amd64-race

@dmitshur (Contributor) commented May 21, 2021

For information, the freebsd-amd64-12_2 builder was recently added as part of #44431, and in that same change, the freebsd-amd64-race builder was updated to also use FreeBSD 12.2 (up from FreeBSD 12.0 previously).

CC @paulzhol, @dmgk, @cagedmantis.

@mknyszek (Contributor, Author) commented May 25, 2021

If it's correlated with FreeBSD being updated, this may not be a release blocker. We should still probably figure out what's wrong, but I don't have any good ideas besides stress-testing all.bash.

It's also still possible that this is a Go issue that only manifests on FreeBSD 12.2. About two weeks passed between the builder update (looks like... April 23rd) and the first failure, and those two weeks included the last week before the freeze.

@mknyszek (Contributor, Author) commented May 27, 2021

Running all.bash in a loop on 10 FreeBSD 12.2 instances. Let's see if I can get it to fail...

@mknyszek (Contributor, Author) commented May 27, 2021

109 all.bash runs in and nothing yet.

@mknyszek (Contributor, Author) commented May 27, 2021

I stand corrected! I do actually have a failure that looks promising. Again, in fixalloc.

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x84cf86050 pc=0x41f43b]

runtime stack:
runtime.throw({0x67b9e2, 0x2a})
        /tmp/workdir/go/src/runtime/panic.go:1198 +0x74 fp=0x7fffdddeccf8 sp=0x7fffdddeccc8 pc=0x43ed54
runtime.sigpanic()
        /tmp/workdir/go/src/runtime/signal_unix.go:719 +0x4a5 fp=0x7fffdddecd58 sp=0x7fffdddeccf8 pc=0x457545
runtime.(*fixalloc).alloc(0x8404f0)
        /tmp/workdir/go/src/runtime/mfixalloc.go:72 +0x3b fp=0x7fffdddecd98 sp=0x7fffdddecd58 pc=0x41f43b
runtime.allocmcache.func1()
        /tmp/workdir/go/src/runtime/mcache.go:88 +0x48 fp=0x7fffdddecdc0 sp=0x7fffdddecd98 pc=0x41af08
runtime.allocmcache()
        /tmp/workdir/go/src/runtime/mcache.go:86 +0x58 fp=0x7fffdddecdf8 sp=0x7fffdddecdc0 pc=0x41ae58
runtime.(*p).init(0xc00019c800, 0x10)
        /tmp/workdir/go/src/runtime/proc.go:4860 +0x114 fp=0x7fffdddece18 sp=0x7fffdddecdf8 pc=0x44c8b4
runtime.procresize(0x64)
        /tmp/workdir/go/src/runtime/proc.go:5036 +0x3a9 fp=0x7fffdddecec8 sp=0x7fffdddece18 pc=0x44d1a9
runtime.startTheWorldWithSema(0x0)
        /tmp/workdir/go/src/runtime/proc.go:1256 +0xa5 fp=0x7fffdddecf28 sp=0x7fffdddecec8 pc=0x444625
runtime.startTheWorld.func1()
        /tmp/workdir/go/src/runtime/proc.go:1093 +0x26 fp=0x7fffdddecf48 sp=0x7fffdddecf28 pc=0x474fe6
runtime.systemstack()
        /tmp/workdir/go/src/runtime/asm_amd64.s:383 +0x49 fp=0x7fffdddecf50 sp=0x7fffdddecf48 pc=0x479ba9

@mknyszek (Contributor, Author) commented May 27, 2021

Due to a bug in my script, I have lost the gomote state (and any potential core dump). Re-trying. At least I've found it's reproducible (kind of).

@heschi (Contributor) commented Jun 10, 2021

Ping -- I understand this is a tricky one, but it does still seem important to resolve in some way. Worst case we might need a prominent release note.

Do we know if this is a regression in Go? That seems worth understanding.

@mknyszek (Contributor, Author) commented Jun 10, 2021

I think a prominent release note is overkill. It's unclear where the regression is. Given the frequency of failure, it is still possible it's a FreeBSD 12.2 x Go 1.17 thing.

I have been trying to reproduce it in gomotes but haven't managed to since that one time. I'm going to check the logs again and update this. I'll also spin up the gomotes again.

@mknyszek (Contributor, Author) commented Jun 10, 2021

I still haven't seen any such failure on the builders since those three I posted earlier.
