runtime: stalls under Rosetta 2 #42700

Open
FiloSottile opened this issue Nov 18, 2020 · 14 comments

Member

@FiloSottile FiloSottile commented Nov 18, 2020

What version of Go are you using (go version)?

$ go version
go version devel +041a4e4c34 Tue Nov 17 22:57:34 2020 +0000 darwin/amd64

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/filippo/Library/Caches/go-build"
GOENV="/Users/filippo/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/filippo/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/filippo"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/filippo/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/filippo/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="devel +041a4e4c34 Tue Nov 17 22:57:34 2020 +0000"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/jh/3ydm4lxd71s2__g_x4hny6r00000gn/T/go-build062328339=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

While trying to run make.bash under Rosetta 2, I noticed high-CPU endless stalls in link, asm and compile. Without GOMAXPROCS=1, make.bash never gets to finish; with it, it works more often than not, but not always.

Once, I got this printed to my terminal during a stall

rosetta error: ulock_wait failure: 105

I have no reason to think this is specific to make.bash; these are probably just the longest-running processes I have run so far.

/cc @cherrymui

Contributor

@cherrymui cherrymui commented Nov 18, 2020

Try GODEBUG=asyncpreemptoff=1 and don't use signals in general. I think Rosetta 2's signal handling may be buggy (I have a list of different crashes, including the emulator internal errors).

Member Author

@FiloSottile FiloSottile commented Nov 18, 2020

Indeed, with GODEBUG=asyncpreemptoff=1 I am not observing the stalls anymore, thank you! I'll leave this open to figure out if there's anything we should do for Rosetta 2 compatibility.

@zwarich zwarich commented Nov 18, 2020

> I think Rosetta 2's signal handling may be buggy (I have a list of different crashes, including the emulator internal errors).

@cherrymui, could you share a gist of any Rosetta error messages you've seen? I'm the main developer of Rosetta 2, and I would be interested in seeing them.

Contributor

@cherrymui cherrymui commented Nov 18, 2020

@zwarich thanks for reaching out!

One of the internal errors is

assertion failed [abi_info.kind == AbiKind::TranslatedCode]: emulated forward to an arm pc that isn't in translated code. arm_pc=0x1020ad4e8 abi_kind=6 emulation_interval=[0x1021e13c0,0x1021e13d0) instruction_interval=[0x1021e13b4, 0x1021e13d0) x86_rip=0x1222ffc

I'll see if I can find others. Thanks.

Contributor

@cherrymui cherrymui commented Nov 18, 2020

Here is another one

oah error: unexpectedly need to EmulateForward on a synchronous exception x86_rip=0x404421e arm_pc=0x10472e9bc num_insts=6 inst_index=4 x86 instruction bytes: 0xc3940f045ab10ff0 0x9e3d83487175db84

This program includes sending itself a SIGSEGV using kill.

Others are more likely to show up as program failures, e.g. hangs or memory corruption. I'll see if I can reproduce them with C code. Thanks.

@zwarich zwarich commented Nov 19, 2020

@cherrymui, is the program that sends itself a SIGSEGV publicly available?

Contributor

@cherrymui cherrymui commented Nov 19, 2020

Yes, all programs I mentioned are publicly available. They are mostly Go programs. If you're okay with building and running Go programs, I can attach instructions here. Or if you prefer, I can attach binaries.

I'm still working on C reproducers.

@zwarich zwarich commented Nov 19, 2020

@cherrymui, instructions would be greatly appreciated.

@zwarich zwarich commented Nov 19, 2020

Actually, I think we can now reproduce all of the issues mentioned here.

Contributor

@cherrymui cherrymui commented Nov 19, 2020

@zwarich that's great!

I was about to send you the instructions for running Go programs, so I'll do it anyway.

  1. Install Go. The easiest way is to download the darwin-amd64 binary distribution from https://golang.org/dl/ . You can also do it with other methods mentioned in https://golang.org/doc/install and https://golang.org/doc/install/source
  2. Suppose GOROOT is the root of your Go installation. Then run:
       cd GOROOT/src/runtime/testdata/testprogcgo
       GOROOT/bin/go build -o /tmp/test
       /tmp/test SegvInCgo

This is the program that sends SIGSEGV to itself. It is expected to crash due to SIGSEGV with a stack dump (but not with an emulator crash :). It does not always fail; in fact, it fails at a fairly low rate.

I also see another error for this program:

assertion failed [!is_synchronous_signal(sig)]: can't pend synchronous signals
        (ThreadContextSignals.cpp:234 pend_signal)

This is somewhat more likely to happen than the one mentioned previously.

I managed to reproduce the last one with a C program:

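#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>	// uintptr_t
#include <signal.h>
#include <unistd.h>
#include <pthread.h>

// Target of the injected call; does nothing.
void e() {}

// Rewrites the interrupted x86-64 context so that, when the handler returns,
// the thread behaves as if it had just called e() at the point of interruption.
void handler(int sig, siginfo_t *info, ucontext_t *uap)
{
	// inject a call to e
	uintptr_t pc = uap->uc_mcontext->__ss.__rip;
	uintptr_t sp = uap->uc_mcontext->__ss.__rsp;
	sp -= 8;			// make room for a return address
	*(uintptr_t*)sp = pc;		// push the interrupted pc as the return address
	uap->uc_mcontext->__ss.__rsp = sp;
	uap->uc_mcontext->__ss.__rip = (uintptr_t)e;	// resume execution in e
}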

void setup_handler()
{
	struct sigaction sa = {(void*)&handler, -1, SA_SIGINFO | SA_ONSTACK | SA_RESTART};
	stack_t sigstk;
	if ((sigstk.ss_sp = malloc(SIGSTKSZ)) == NULL)
		exit(1);
	sigstk.ss_size = SIGSTKSZ;
	sigstk.ss_flags = 0;
	if (sigaltstack(&sigstk,0) < 0)
		exit(2);

	sigaction(SIGSEGV, &sa, 0);
	sigaction(SIGURG, &sa, 0);
}

void *thr(void *a)
{
	setup_handler();
	while (1);
}

int main()
{
	pid_t pid = getpid();
	int i, j;
	pthread_t tid[10];

	setup_handler();

	for (i = 0; i < 10; i++)
		pthread_create(&tid[i], 0, thr, 0);

	e();
	usleep(1000);
	for (j = 0; j < 10; j++)
		for (i = 0; i < 10; i++) {
			pthread_kill(tid[i], SIGURG);
			pthread_kill(tid[i], SIGSEGV);
		}
	usleep(1000000);
	return 0;
}

This program, running under Rosetta 2, is likely to emit the error mentioned above, or hang.

The call injection part is a bit hacky under C ABI, but it works okay given what e() compiles to. And it somewhat mimics what the Go runtime does.

Thanks!

Contributor

@tmm1 tmm1 commented Nov 19, 2020

Seeing something similar as well:

assertion failed [abi_info.kind == AbiKind::TranslatedCode]: emulated forward to an arm pc that isn't in translated code. arm_pc=0x10686e4d8 abi_kind=6 emulation_interval=[0x106ea790c,0x106ea7920) instruction_interval=[0x106ea78f8, 0x106ea7920) x86_rip=0x4bddb6d
(ThreadContextRegisterState.cpp:677 move_to_instruction_boundary)

@tmm1 tmm1 commented Nov 19, 2020

Is there a way to change runtime.asyncpreemptoff once the process has already started? If I use os.Setenv("GODEBUG", "asyncpreemptoff=1") during init(), will that work, or is it already too late?

Contributor

@cherrymui cherrymui commented Nov 19, 2020

> Is there a way to change runtime.asyncpreemptoff once the process has already started?

No. This is a temporary workaround and not a long-term solution.

(You can re-exec the process with the environment variable set, if that counts.)

Contributor

@tmm1 tmm1 commented Nov 20, 2020

> (You can re-exec the process with the environment variable set, if that counts.)

Thanks for this suggestion. I needed a quick workaround and this fit the bill.

The simplest way I've found to check if a process is running under Rosetta 2:

if v, _ := syscall.SysctlUint32("sysctl.proc_translated"); v == 1 {
    // Rosetta 2
}
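
A rough sketch of how the re-exec workaround could be put together, under stated assumptions. The helper names mergeGodebug and underRosetta are hypothetical. To keep the sketch buildable on any Unix system, underRosetta shells out to the sysctl command rather than calling syscall.SysctlUint32 as in the snippet above, which is only available on Darwin and the BSDs.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"syscall"
)

// mergeGodebug returns a copy of env whose GODEBUG contains setting,
// keeping any other GODEBUG settings. The second result reports whether
// anything had to be changed, which also guards against re-exec loops.
func mergeGodebug(env []string, setting string) ([]string, bool) {
	out := make([]string, 0, len(env)+1)
	found, changed := false, false
	for _, kv := range env {
		if !strings.HasPrefix(kv, "GODEBUG=") {
			out = append(out, kv)
			continue
		}
		found = true
		val := strings.TrimPrefix(kv, "GODEBUG=")
		present := false
		for _, s := range strings.Split(val, ",") {
			if s == setting {
				present = true
			}
		}
		switch {
		case present:
			out = append(out, kv)
		case val == "":
			out = append(out, "GODEBUG="+setting)
			changed = true
		default:
			out = append(out, "GODEBUG="+val+","+setting)
			changed = true
		}
	}
	if !found {
		out = append(out, "GODEBUG="+setting)
		changed = true
	}
	return out, changed
}

// underRosetta reports whether sysctl.proc_translated reads as 1.
// Any error (for example, the key not existing) is treated as "no".
func underRosetta() bool {
	out, err := exec.Command("sysctl", "-n", "sysctl.proc_translated").Output()
	return err == nil && strings.TrimSpace(string(out)) == "1"
}

func main() {
	if underRosetta() {
		if env, changed := mergeGodebug(os.Environ(), "asyncpreemptoff=1"); changed {
			exe, err := os.Executable()
			if err == nil {
				// On success Exec replaces this process and never returns.
				err = syscall.Exec(exe, os.Args, env)
			}
			fmt.Fprintln(os.Stderr, "re-exec failed:", err)
		}
	}
	fmt.Println("GODEBUG =", os.Getenv("GODEBUG"))
}
```

The check has to run as early as possible in main, since anything executed before the re-exec still runs with asynchronous preemption enabled.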