runtime: SIGSEGV in mstart #47441

Closed
prattmic opened this issue Jul 28, 2021 · 4 comments

@prattmic (Member) commented Jul 28, 2021

On some Google-internal workloads running Go 1.17rc1, we are seeing SIGSEGVs of the form:

#4  <C signal handler>
#5  0x000056222cf1049d in runtime.sigfwd () at /root/gc/src/runtime/sys_linux_amd64.s:327
#6  0x00007f596f4bf6c0 in ?? ()
#7  0x000056222cef2174 in runtime.sigfwdgo (sig=11, info=0x7f596f4bf8f0, ctx=0x7f596f4bf7c0) at /root/gc/src/runtime/signal_unix.go:1032
#8  0x000056222cef0927 in runtime.sigtrampgo (sig=11, info=0x7f596f4bf8f0, ctx=0x56222ee116ad <sys_gettid+13>) at /root/gc/src/runtime/signal_unix.go:418
#9  0x000056222cf10ff0 in runtime.sigtrampgo (sig=11, info=0x7f596f4bf8f0, ctx=0x7f596f4bf7c0) at <autogenerated>:1
#10 0x000056222cf104fd in runtime.sigtramp () at /root/gc/src/runtime/sys_linux_amd64.s:344
#11 <signal handler called>
#12 0x00007f596f4c0500 in ?? ()
#13 0x000056222cf0c665 in runtime.mstart () at /root/gc/src/runtime/asm_amd64.s:248
#14 0x000056222cf10e45 in runtime.mstart () at <autogenerated>:1
#15 0x000056222dd24440 in crosscall_amd64 () at gcc_amd64.S:40
#16 0x0000000000000000 in ?? ()

0x00007f596f4c0500 seems to be an address on the C stack calling mstart, indicating we jumped into the stack.

We don't have many details yet; this is still being investigated. If anyone else has seen similar crashes on 1.17, we'd love to hear about them.

cc @ianlancetaylor @cherrymui @aclements

@ianlancetaylor (Contributor) commented Jul 29, 2021

@cherrymui found the problem: a preemption can occur after the call to unlockOSThread in unwindm before g.m.incgo = true in cgocallbackg. That can cause the G to move to a new M, which breaks the assumptions of cgocallbackg.
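To illustrate why moving to a new M matters, here is a minimal user-level sketch. It uses the public runtime.LockOSThread and runtime.UnlockOSThread rather than the runtime-internal lockOSThread and unlockOSThread, and it assumes Linux, since syscall.Gettid is only available there. The helper name sameThreadWhileLocked is hypothetical. While the goroutine is locked, it keeps observing the same thread ID even across scheduler yields; once it is unlocked, no such guarantee holds.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// sameThreadWhileLocked reports whether the calling goroutine observes the
// same OS thread ID across n scheduler yields while locked to its thread.
// Hypothetical helper for illustration; Linux-only because of Gettid.
func sameThreadWhileLocked(n int) bool {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	tid := syscall.Gettid()
	for i := 0; i < n; i++ {
		runtime.Gosched() // yield, giving the scheduler a chance to run other goroutines
		if syscall.Gettid() != tid {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(sameThreadWhileLocked(100))
}
```

A cgo callback has to return to C on the thread that made the call, which is why a yield after the unlock is a problem.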

The preemption point is the first defer in cgocallbackg1. When that deferred function runs (after the second defer, the one for unwindm, since defers run in reverse order), it can be preempted. I believe that this is due to the regabidefer experiment, which wraps most deferred functions. Those wrappers are separate functions, and as such are preemption points.
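The ordering described above follows from Go running deferred calls last-in, first-out. A minimal sketch, with labels that only mirror the description and are not the runtime's code (deferOrder is a hypothetical name):

```go
package main

import "fmt"

// deferOrder records the order in which two deferred calls run.
// The labels mirror the two defers described for cgocallbackg1.
func deferOrder() (order []string) {
	defer func() { order = append(order, "first defer") }()
	defer func() { order = append(order, "second defer (unwindm)") }()
	return
}

func main() {
	fmt.Println(deferOrder())
}
```

The defer registered second runs first, so anything the first defer does happens after unwindm has already finished.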

Here is a test case that recreates the observed problem. To do so, it uses runtime.SetCgoTraceback to install a context function, so that the first defer in cgocallbackg1 is executed.

foo1.go:

package main

/*
extern void cgoTraceback(void* p);
extern void cgoSymbolizer(void* p);
extern void cgoContext(void* p);
extern void GoFunction(int);

static void callGo(int i) {
	GoFunction(i);
}
*/
import "C"

import (
	"fmt"
	"runtime"
	"sync"
	"unsafe"
)

//export GoFunction
func GoFunction(i C.int) {
	// Do some busy work; the formatted string is intentionally discarded.
	_ = fmt.Sprintf("%d\n", i)
}

func main() {
	runtime.SetCgoTraceback(0, unsafe.Pointer(C.cgoTraceback), unsafe.Pointer(C.cgoContext), unsafe.Pointer(C.cgoSymbolizer))
	const funcs = 1e3
	const calls = 1e5
	var wg sync.WaitGroup
	wg.Add(funcs) // one Done call per goroutine started below
	for i := 0; i < funcs; i++ {
		go func(i int) {
			defer wg.Done()
			for j := 0; j < calls; j++ {
				C.callGo(C.int(i*calls + j))
			}
		}(i)
	}
	wg.Wait()
}

foo2.c:

#include <stdio.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static void crash(int signum) {
	printf("caught SIGSEGV\n");
	abort();
}

__attribute__((constructor))
void setSignalHandler() {
	struct sigaction sa;

	memset(&sa, 0, sizeof sa);
	if (sigfillset(&sa.sa_mask) != 0) {
		abort();
	}
	sa.sa_handler = crash;
	if (sigaction(SIGSEGV, &sa, NULL) != 0) {
		abort();
	}
}

void cgoTraceback(void* p) {}
void cgoSymbolizer(void* p) {}

struct contextArg {
	uintptr_t context;
};

void cgoContext(struct contextArg* p) {
	if (p->context == 0) {
		p->context = 1;
	}
}

@gopherbot commented Jul 29, 2021

Change https://golang.org/cl/338197 mentions this issue: runtime: avoid possible preemption when returning from Go to C

@gopherbot commented Jul 29, 2021

Change https://golang.org/cl/338270 mentions this issue: cmd/compile: mark defer wrapper nosplit for runtime and nosplit callee

@gopherbot gopherbot closed this in 70fd4e4 Jul 29, 2021
steeve added a commit to znly/go that referenced this issue Aug 19, 2021
When returning from Go to C, it was possible for the goroutine to be
preempted after calling unlockOSThread. This could happen when a
context function installed by SetCgoTraceback set a non-zero context,
leading to a defer call in cgocallbackg1. The defer function wrapper,
introduced in 1.17 as part of the regabi support, was not nosplit,
and hence was a potential preemption point. If it did get preempted,
the G would move to a new M. It would then attempt to return to C
code on a different stack, typically leading to a SIGSEGV.

Fix this in a simple way by postponing the unlockOSThread until after
the other defer. Also check for the failure condition and fail early,
rather than waiting for a SIGSEGV.

Without the fix to cgocall.go, the test case fails about 50% of the
time on my laptop.

Fixes golang#47441

Change-Id: Ib8ca13215bd36cddc2a49e86698824a29c6a68ba
Reviewed-on: https://go-review.googlesource.com/c/go/+/338197
Trust: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
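
The reordering described in the commit message can be modeled with a small simulation. This is only a sketch under stated assumptions, not the actual change to cgocall.go: fakeThread, wrapper, before, and after are hypothetical names, and the "lock" is a plain boolean rather than a real thread lock. It shows that a deferred wrapper runs unlocked when the unlock is deferred last, and runs locked once the unlock is postponed until after the other defer.

```go
package main

import "fmt"

// fakeThread is a hypothetical stand-in for the thread-lock state.
type fakeThread struct {
	locked bool
	log    []string
}

func (t *fakeThread) lock()   { t.locked = true }
func (t *fakeThread) unlock() { t.locked = false }

// wrapper models a deferred function that is a possible preemption point.
// It records whether the thread was still locked when it ran.
func (t *fakeThread) wrapper() {
	if t.locked {
		t.log = append(t.log, "wrapper ran locked")
	} else {
		t.log = append(t.log, "wrapper ran unlocked")
	}
}

// before models the old ordering: the unlock is deferred last, so it runs
// first, and the wrapper then runs with the thread unlocked.
func before() []string {
	t := &fakeThread{}
	func() {
		t.lock()
		defer t.wrapper()
		defer t.unlock()
	}()
	return t.log
}

// after models the fix: the unlock is deferred first, so it runs last.
func after() []string {
	t := &fakeThread{}
	func() {
		t.lock()
		defer t.unlock()
		defer t.wrapper()
	}()
	return t.log
}

func main() {
	fmt.Println(before()) // [wrapper ran unlocked]
	fmt.Println(after())  // [wrapper ran locked]
}
```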

@gopherbot commented Oct 30, 2021

Change https://golang.org/cl/359796 mentions this issue: runtime: add always-preempt maymorestack hook

gopherbot pushed a commit that referenced this issue Nov 5, 2021
This adds a maymorestack hook that forces a preemption at every
possible cooperative preemption point. This would have helped us catch
several recent preemption-related bugs earlier, including #47302,
#47304, and #47441.

For #48297.

Change-Id: Ib82c973589c8a7223900e1842913b8591938fb9f
Reviewed-on: https://go-review.googlesource.com/c/go/+/359796
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: David Chase <drchase@google.com>