syscall: memory corruption on OpenBSD/amd64 when forking #34988

Open
jrick opened this issue Oct 18, 2019 · 11 comments

@jrick jrick commented Oct 18, 2019

What version of Go are you using (go version)?

$ go version
go version go1.13.2 openbsd/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jrick/.cache/go-build"
GOENV="/home/jrick/.config/go/env"
GOEXE=""
GOFLAGS="-tags=netgo -ldflags=-extldflags=-static"
GOHOSTARCH="amd64"
GOHOSTOS="openbsd"
GONOPROXY=""
GONOSUMDB=""
GOOS="openbsd"
GOPATH="/home/jrick/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/jrick/src/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/jrick/src/go/pkg/tool/openbsd_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"

What did you do?

I observed these issues in one of my applications, and assumed it was a race or invalid unsafe.Pointer usage or some other fault of the application code. When the 1.13.2 release dropped yesterday I built it from source and observed a similar issue running the regression tests. The failed regression test does not look related to the memory corruption, but I can reproduce the problem by repeatedly running the test in a loop:

$ cd test # from go repo root
$ while :; do go run run.go -- fixedbugs/issue27829.go || break; done >go.panic 2>&1

It can take several minutes to observe the issue but here are some of the captured panics and fatal runtime errors:

https://gist.githubusercontent.com/jrick/f8b21ecbfbe516e1282b757d1bfe4165/raw/6cf0efb9ba47ba869f98817ce945971f2dff47d6/gistfile1.txt

https://gist.githubusercontent.com/jrick/9a54c085b918aa32910f4ece84e5aa21/raw/91ec29275c2eb1be49f62ad8a01a5317ad168c94/gistfile1.txt

https://gist.githubusercontent.com/jrick/8faf088593331c104cc0da0adb3f24da/raw/7c92e7e7d60d426b2156fd1bdff42e0717b708f1/gistfile1.txt

https://gist.githubusercontent.com/jrick/4645316444c12cd815fb71874f6bdfc4/raw/bffac2a448b07242a538b77a2823c9db34b6ef6f/gistfile1.txt

https://gist.githubusercontent.com/jrick/3843b180670811069319e4122d32507a/raw/0d1f897aa25d91307b04ae951f1b260f33246b61/gistfile1.txt

https://gist.githubusercontent.com/jrick/99b7171c5a49b4b069edf06884ad8e17/raw/740c7b9e8fa64d9ad149fd2669df94e89c466927/gistfile1.txt

Additionally, I observed go run hanging (no runtime failure due to deadlock) and it had to be killed with SIGABRT to get a trace: https://gist.githubusercontent.com/jrick/d4ae1e4355a7ac42f1910b7bb10a1297/raw/54e408c51a01444abda76dc32ac55c2dd217822b/gistfile1.txt

It may not matter which regression test is run as the errors also occur in run.go.

Author

@jrick jrick commented Oct 18, 2019

I missed that 1.13.3 was also released yesterday. Currently updating to that and will report whether this is still an issue.

Contributor

@randall77 randall77 commented Oct 18, 2019

This looks like cmd/go crashing while building the test, not the test itself.
The errors look heap-related. @mknyszek

Contributor

@mknyszek mknyszek commented Oct 18, 2019

@jrick maybe you meant this in your original post, but I just want to be clear. Does this reproduce with Go 1.12.X or older versions of Go?

Since we have a reasonable reproducer, the next step to me would be to just bisect what went into Go 1.13, if we know it isn't reproducing in Go 1.12. I genuinely have no idea what this could be. I thought at first that it could be scavenging related but that's highly unlikely for a number of reasons. I won't rule it out yet, though.

Author

@jrick jrick commented Oct 18, 2019

I haven't tested 1.12.x but will follow up testing that next. Currently hammering this test with 1.13.3 and so far it has not failed, but my application built with 1.13.3 still fails with SIGBUS (could be unrelated).

Author

@jrick jrick commented Oct 18, 2019

@mknyszek it still hasn't failed on 1.13.3 (running close to an hour now) but quickly failed on 1.12.12.

https://gist.githubusercontent.com/jrick/bb5a493e6ebd88e1e846f1c5c09c9e9a/raw/e82b0136b0826581f6e591915d3a634112f323a1/gistfile1.txt

Author

@jrick jrick commented Dec 6, 2019

This remains a problem in 1.13.5, so it's not addressed by the recent fixes to the go tool.

https://gist.githubusercontent.com/jrick/a2499b2ae10b4c63359174e26c0fd936/raw/b233f14a518ca828c4416d803f81b1e8ca34d073/gistfile1.txt

Author

@jrick jrick commented May 20, 2020

This may be fork/exec related. This program exhibits similar crashes on OpenBSD 6.7 and Go 1.14.3.

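package main

import (
        "os/exec"
)

func main() {
        // Buffered channel used as a semaphore: at most 100 children in flight.
        sem := make(chan struct{}, 100)
        for {
                sem <- struct{}{} // blocks once 100 goroutines are running
                go func() {
                        // Each Run forks and execs a short-lived child process.
                        err := exec.Command("/usr/bin/true").Run()
                        if err != nil {
                                panic(err)
                        }
                        <-sem // release the slot
                }()
        }
}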

crash trace: https://gist.github.com/jrick/8d6ef72796a772668b891310a18dd805

Synchronizing the os/exec call with an additional mutex appears to remove the crash.

@ianlancetaylor changed the title from "Memory corruption on OpenBSD/amd64" to "syscall: memory corruption on OpenBSD/amd64 when forking" on May 20, 2020

Contributor

@ianlancetaylor ianlancetaylor commented May 20, 2020

Thanks for the stack trace. That looks very much like a forked child process is changing the memory seen by the parent process. Which should of course be impossible. Specifically it seems that sched.lock.key is being set to zero while the lock is held during goschedImpl.

Author

@jrick jrick commented May 22, 2020

I'm seeing another strange thing in addition to that crash. Sometimes the program runs forever, spinning the CPU, but it appears to be deadlocked because the PIDs of the true processes never change. Here's the trace after sending SIGQUIT: https://gist.github.com/jrick/74aaa63624961145b7bc7b9518da75e1
