runtime: unexpected signal during runtime execution #41285

Closed
karalabe opened this issue Sep 9, 2020 · 8 comments

Comments

@karalabe
Contributor

@karalabe karalabe commented Sep 9, 2020

Sorry for not adhering to the template; the crash report is from one of our users.

The reported Go version is 1.15 AMD64, the OS is Linux.

We've never seen this crash before and it seems to originate from the runtime.

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xffffffffffffff83 pc=0x424779]

goroutine 559704308 [running]:
runtime.throw(0x12aa8d0, 0x2a)
        runtime/panic.go:1116 +0x72 fp=0xc008e23b10 sp=0xc008e23ae0 pc=0x44e2d2
runtime.sigpanic()
        runtime/signal_unix.go:704 +0x4ac fp=0xc008e23b40 sp=0xc008e23b10 pc=0x464c2c
runtime.mallocgc(0x50, 0x10a4460, 0x1, 0x1)
        runtime/malloc.go:1063 +0x7d9 fp=0xc008e23be0 sp=0xc008e23b40 pc=0x424779
runtime.makeslice(0x10a4460, 0x44, 0x44, 0x44)
        runtime/slice.go:98 +0x6c fp=0xc008e23c10 sp=0xc008e23be0 pc=0x465ccc
github.com/golang/snappy.Decode(0x0, 0x0, 0x0, 0xc063fd0c00, 0x47, 0x600, 0x0, 0x0, 0x10, 0x20, ...)
        github.com/golang/snappy@v0.0.2-0.20200707131729-196ae77b8a26/decode.go:65 +0x1e5 fp=0xc008e23c68 sp=0xc008e23c10 pc=0x6d7225
github.com/ethereum/go-ethereum/p2p.(*rlpxFrameRW).ReadMsg(0xc07f900540, 0xbfce2a2e4429a2c3, 0x5d562b579e63, 0x1f537c0, 0x0, 0x0, 0x1f537c0, 0x0, 0x0, 0xc0f1e27560, ...)
        github.com/ethereum/go-ethereum/p2p/rlpx.go:702 +0x6e8 fp=0xc008e23d58 sp=0xc008e23c68 pc=0xb38208
github.com/ethereum/go-ethereum/p2p.(*rlpx).ReadMsg(0xc0539454d0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        github.com/ethereum/go-ethereum/p2p/rlpx.go:95 +0x11f fp=0xc008e23e60 sp=0xc008e23d58 pc=0xb3305f
github.com/ethereum/go-ethereum/p2p.(*Peer).readLoop(0xc03c486540, 0xc03c4868a0)
        github.com/ethereum/go-ethereum/p2p/peer.go:271 +0xb5 fp=0xc008e23fd0 sp=0xc008e23e60 pc=0xb304d5
runtime.goexit()
        runtime/asm_amd64.s:1374 +0x1 fp=0xc008e23fd8 sp=0xc008e23fd0 pc=0x486ea1
created by github.com/ethereum/go-ethereum/p2p.(*Peer).run
        github.com/ethereum/go-ethereum/p2p/peer.go:208 +0xf4
@randall77
Contributor

@randall77 randall77 commented Sep 9, 2020

This is odd. Could you disassemble the area around that pc (0x424779) and post it?
I built a binary with Go 1.15 on linux/amd64, and the instruction at offset +0x7d9 in mallocgc is one that should never segfault (it is mov %rax,40(%rsp)).
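To find the range worth disassembling, the function's entry address can be recovered from the traceback itself: each frame prints its offset from the function start (`+0x7d9`) and the absolute `pc`. The following is a minimal sketch, not part of the original report; the helper name `funcEntry` and the regular expression are illustrative assumptions. The resulting addresses could then be given to a disassembler such as `go tool objdump`.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// frameRe matches the location line of a Go traceback frame, e.g.
//   runtime/malloc.go:1063 +0x7d9 fp=0x... sp=0x... pc=0x424779
// It captures the "+0x" offset and the "pc=" value.
var frameRe = regexp.MustCompile(`\+0x([0-9a-f]+) .*\bpc=0x([0-9a-f]+)`)

// funcEntry (hypothetical helper) returns the entry address of the
// function a frame belongs to, computed as pc minus the printed offset.
func funcEntry(line string) (entry, pc uint64, err error) {
	m := frameRe.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, fmt.Errorf("no +0x offset and pc= field in %q", line)
	}
	off, err := strconv.ParseUint(m[1], 16, 64)
	if err != nil {
		return 0, 0, err
	}
	pc, err = strconv.ParseUint(m[2], 16, 64)
	if err != nil {
		return 0, 0, err
	}
	if off > pc {
		return 0, 0, fmt.Errorf("offset %#x exceeds pc %#x", off, pc)
	}
	return pc - off, pc, nil
}

func main() {
	// The mallocgc frame copied from the crash report above.
	line := "runtime/malloc.go:1063 +0x7d9 fp=0xc008e23be0 sp=0xc008e23b40 pc=0x424779"
	entry, pc, err := funcEntry(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("mallocgc entry=%#x faulting pc=%#x\n", entry, pc)
}
```

For the frame in this report, the entry address works out to 0x423fa0, so the instructions of interest lie between 0x423fa0 and a little past 0x424779.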

@dmitshur
Member

@dmitshur dmitshur commented Sep 9, 2020

This may be related to issues #41296, #41099.

@fjl

@fjl fjl commented Sep 9, 2020

The geth binary that produced this crash is attached: geth.gz

@randall77
Contributor

@randall77 randall77 commented Sep 9, 2020

Thanks. Yep, same instruction in your binary. I don't see any way that instruction gets a segv. Unless the stack pointer was corrupted, but that doesn't look to be the case here.
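The mismatch can be checked with a little arithmetic. The sketch below is illustrative only and rests on one assumption: that the `sp` value printed for the mallocgc frame (0xc008e23b40) is the value of %rsp when the instruction faulted. Under that assumption, a store to 40(%rsp) targets an ordinary stack address, while the reported fault address, read as a signed number, is a small negative value. The function names are hypothetical.

```go
package main

import "fmt"

// signedAddr reinterprets a 64-bit fault address as a signed value,
// which makes addresses just below zero easy to recognise.
func signedAddr(addr uint64) int64 {
	return int64(addr)
}

// storeTarget returns the effective address of disp(%rsp) for a given
// stack pointer value. The displacement is decimal, as in AT&T syntax.
func storeTarget(sp uint64, disp int64) uint64 {
	return sp + uint64(disp)
}

func main() {
	const (
		faultAddr = 0xffffffffffffff83 // addr= from the signal line
		sp        = 0xc008e23b40       // sp= of the mallocgc frame (assumed to be %rsp at the fault)
		disp      = 40                 // from mov %rax,40(%rsp)
	)
	fmt.Printf("fault address as signed: %d\n", signedAddr(faultAddr))
	fmt.Printf("expected store target:   %#x\n", storeTarget(sp, disp))
}
```

The fault address decodes to -125, whereas the store would be expected to hit 0xc008e23b68, which is inside the goroutine's stack frame. The two do not line up, which is consistent with the observation that this instruction should not be able to produce the reported fault.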

It's possible that a serious bug in our signal handler, triggered by async preempt, could be to blame. Seems really unlikely though. It would somehow have to confuse a SIGUSR1 with a SIGSEGV.
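One way to probe the async-preemption hypothesis, should the crash ever recur, would be to run the node with asynchronous preemption disabled through the `GODEBUG=asyncpreemptoff=1` runtime setting (available since Go 1.14). This was not tried in the thread; the sketch below only shows how a launcher might add the setting to a child process environment while preserving any existing `GODEBUG` value. The helper name `withAsyncPreemptOff` is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// withAsyncPreemptOff (hypothetical helper) returns a copy of env in
// which GODEBUG includes asyncpreemptoff=1. An existing GODEBUG entry
// is extended rather than replaced; if none exists, one is appended.
func withAsyncPreemptOff(env []string) []string {
	const setting = "asyncpreemptoff=1"
	out := make([]string, 0, len(env)+1)
	found := false
	for _, kv := range env {
		if strings.HasPrefix(kv, "GODEBUG=") {
			found = true
			if !strings.Contains(kv, setting) {
				if kv == "GODEBUG=" {
					kv += setting
				} else {
					kv += "," + setting
				}
			}
		}
		out = append(out, kv)
	}
	if !found {
		out = append(out, "GODEBUG="+setting)
	}
	return out
}

func main() {
	// Sample environment; a real launcher would start from os.Environ().
	env := []string{"PATH=/usr/bin", "GODEBUG=gctrace=1"}
	for _, kv := range withAsyncPreemptOff(env) {
		fmt.Println(kv)
	}
}
```

The returned slice could be assigned to the `Env` field of an `exec.Cmd` before starting the binary. If the crash never reappears with the setting on, that says little by itself, since the failure was only seen once.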

Random corruption on that machine? Also unlikely. Any chance you could get a memory test run on that machine?

@fjl

@fjl fjl commented Sep 10, 2020

The original reporter says:

> I am not an expert in hardware, but another idea - my ESXi host is running on Supermicro X10SDV-4C-TLN2F motherboard with non ECC RAM. Maybe there was a glitch in RAM?
> On the other hand, I have been running that ESXi host for quite a few years now, and I have been running a virtual machine with Geth node since April this year. Never had problems.

@randall77
Contributor

@randall77 randall77 commented Sep 10, 2020

I hate chalking up bugs to "ghost in the machine", but I don't see anything else we can do here.
I'll close for now. If you see this again, or have some way of reproducing, please reopen.

@randall77 randall77 closed this Sep 10, 2020
@networkimprov

@networkimprov networkimprov commented Sep 10, 2020

At least amend the title to mention "snappy.Decode" and "runtime.makeslice" so that someone else might find it.

Honestly, I'd also leave this open for several weeks; closing after one day seems rather rapid.

@randall77
Contributor

@randall77 randall77 commented Sep 10, 2020

> At least amend the title to mention "snappy.Decode" and "runtime.makeslice" so that someone else might find it.

I don't think this issue has anything to do with either of those things. They just happened to be on the stack at the time.

> Honestly, I'd also leave this open for several weeks; closing after one day seems rather rapid.

Closing is not permanent; if more data becomes available we can reopen. Closing just means it isn't on our dashboard of things that still need investigating or fixing.
