runtime: SIGSEGV crash on Linux in docker container, possible net/http crash #51113

Open
szuecs opened this issue Feb 9, 2022 · 4 comments
Labels
NeedsInvestigation

@szuecs szuecs commented Feb 9, 2022

Context: I run two binaries inside a Docker container, one from an open source project (first crash) and one closed source (second crash).
Both of them crashed with signal SIGSEGV: segmentation violation.

The first crash started with these lines:

0x2000: fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
goroutine 301131 [running]:
runtime.throw({0xff4e03, 0x2})
/usr/local/go/src/runtime/panic.go:1198 +0x71 fp=0xc010903260 sp=0xc010903230 pc=0x4355b1
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:719 +0x396 fp=0xc0109032b0 sp=0xc010903260 pc=0x44ba36
runtime: unexpected return pc for runtime.hexdumpWords called from 0x40
stack: frame={sp:0xc0109032b0, fp:0xc010903310} stack=[0xc010902000,0xc010904000)

The second crash started with these lines:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x194b8 pc=0x436da4]
runtime stack:
runtime: unexpected return pc for runtime.acquireSudog called from 0xc00004be30
stack: frame={sp:0xc0003adf28, fp:0xc0003adf98} stack=[0xc0003ac000,0xc0003ae000)
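
For comparing the two crash headers, here is a minimal sketch that pulls the code, addr and pc fields out of the "[signal ...]" line. The names sigInfo and parseSignalLine are hypothetical and not part of any Go tooling. On Linux, code=0x1 for SIGSEGV corresponds to SEGV_MAPERR (address not mapped), and pc=0x0 in the first crash would be consistent with a call through a nil or overwritten function pointer, though that is only a guess from the header.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// sigInfo holds the fields of a Go runtime "[signal ...]" crash header.
// The type is hypothetical and only serves this sketch.
type sigInfo struct {
	Name           string
	Code, Addr, PC uint64
}

// sigLine matches headers such as
// [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
var sigLine = regexp.MustCompile(
	`\[signal (\w+): .* code=(0x[0-9a-f]+) addr=(0x[0-9a-f]+) pc=(0x[0-9a-f]+)\]`)

// parseSignalLine extracts the signal name and the three hex fields.
func parseSignalLine(line string) (sigInfo, error) {
	m := sigLine.FindStringSubmatch(line)
	if m == nil {
		return sigInfo{}, fmt.Errorf("no signal header in %q", line)
	}
	var vals [3]uint64
	for i, s := range m[2:] {
		// Base 0 lets ParseUint accept the "0x" prefix.
		v, err := strconv.ParseUint(s, 0, 64)
		if err != nil {
			return sigInfo{}, err
		}
		vals[i] = v
	}
	return sigInfo{Name: m[1], Code: vals[0], Addr: vals[1], PC: vals[2]}, nil
}

func main() {
	lines := []string{
		"[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]",
		"[signal SIGSEGV: segmentation violation code=0x1 addr=0x194b8 pc=0x436da4]",
	}
	for _, l := range lines {
		s, err := parseSignalLine(l)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("%s code=%d addr=%#x pc=%#x\n", s.Name, s.Code, s.Addr, s.PC)
	}
}
```
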

What version of Go are you using (go version)?

The open source project runs with Go 1.17.6.
The closed source project with Go 1.14.4.
Both linux/amd64.

Does this issue reproduce with the latest release?

I can't reproduce it. It's an event I have never observed before, and we run about 400 instances across 95 clusters with more than 1M RPS.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN="/usr/local/bin"
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY="github.example.org"
GONOSUMDB="github.example.org"
GOOS="linux"
GOPATH="/go"
GOPRIVATE="github.example.org"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.6"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/workspace/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3473157175=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We run two processes in one Docker container. The first process is the open source project, an HTTP reverse proxy with some additional features such as authnz calls to another HTTP endpoint. The second process is the closed source project, an HTTP server handler that does JWT validation and is called by the first process to validate the token.
At runtime both processes crashed.
This is a single occurrence. I found https://github.com/golang/go/wiki/LinuxKernelSignalVectorBug and that page is mentioned in the second crash log. However, I could not reproduce the bug with the test from https://github.com/golang/go/wiki/LinuxKernelSignalVectorBug#bug-test. I compiled the test with the same gcc version that the kernel was built with.

Gcc version:

gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

Kernel version:

% uname -a
Linux ip-172-31-22-74 5.4.0-1063-aws #66-Ubuntu SMP Wed Jan 12 17:49:45 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
% cat /proc/version_signature
Ubuntu 5.4.0-1063.66-aws 5.4.157
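
As a cross-check against the wiki page, the following is a minimal sketch that compares the upstream kernel version with the ranges the page lists as affected. It assumes those ranges are 5.2.x, 5.3.0-5.3.14 and 5.4.0-5.4.1 (copied by hand, so please verify against the wiki) and that the last field of /proc/version_signature is the upstream version on Ubuntu. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// upstreamFromSignature returns the last field of a /proc/version_signature
// line. Assumption: on Ubuntu this field is the upstream kernel version.
func upstreamFromSignature(sig string) (string, error) {
	fields := strings.Fields(sig)
	if len(fields) == 0 {
		return "", fmt.Errorf("empty version signature")
	}
	return fields[len(fields)-1], nil
}

// parseKernel splits a version such as "5.4.157" into major, minor, patch.
func parseKernel(v string) (major, minor, patch int, err error) {
	parts := strings.SplitN(v, ".", 3)
	if len(parts) != 3 {
		return 0, 0, 0, fmt.Errorf("unexpected kernel version %q", v)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		if nums[i], err = strconv.Atoi(p); err != nil {
			return 0, 0, 0, fmt.Errorf("unexpected kernel version %q: %w", v, err)
		}
	}
	return nums[0], nums[1], nums[2], nil
}

// inAffectedRange reports whether the version falls into the ranges assumed
// above: 5.2.x, 5.3.0-5.3.14, 5.4.0-5.4.1.
func inAffectedRange(major, minor, patch int) bool {
	if major != 5 {
		return false
	}
	switch minor {
	case 2:
		return true
	case 3:
		return patch < 15
	case 4:
		return patch < 2
	}
	return false
}

func main() {
	sig := "Ubuntu 5.4.0-1063.66-aws 5.4.157" // value from this report
	up, err := upstreamFromSignature(sig)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	major, minor, patch, err := parseKernel(up)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("upstream %s in affected range: %v\n", up, inAffectedRange(major, minor, patch))
}
```

Under these assumptions the 5.4.157 kernel falls outside the affected ranges, which is consistent with the bug test not reproducing on this node.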

We run all nodes without swap, for example:

% free
              total        used        free      shared  buff/cache   available
Mem:       15803320     1581248    12482748        3600     1739324    13949664
Swap:             0           0           0

The first crash logs
go_1st_crash.log

The second crash logs
go_2nd_crash.log

What did you expect to see?

no crash

What did you see instead?

a crash

@mdlayher mdlayher changed the title from "affected/package: runtime and maybe net/http crash" to "runtime: SIGSEGV crash on Linux in docker container, possible net/http crash" Feb 9, 2022
@mdlayher mdlayher added the NeedsInvestigation label Feb 9, 2022

@szuecs szuecs commented Feb 10, 2022

About 14 months ago I created a similar issue, #42977, which may be related, but I don't know.


@davecheney davecheney commented Feb 10, 2022

This looks like memory corruption. Have you tried running your program under the race detector? See https://blog.golang.org/race-detector .


@szuecs szuecs commented Feb 10, 2022

All code paths are covered by tests, and we run go test -race on every change. I think a race would show up there. Or do you think go run -race would reveal something the tests do not?


@szuecs szuecs commented Feb 10, 2022

I tried to run it with go run -race. I get no error if I just use curl, and if I run vegeta at 500 rps I get:

race: limit on 8128 simultaneously alive goroutines is exceeded, dying
exit status 66

@aclements, maybe you want to check the crash log files, as you did in #42977. Thanks.
