Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime, cgo: deadlock at _cgo_wait_runtime_init_done in LD_PRELOAD so #43836

Open
james-li opened this issue Jan 22, 2021 · 3 comments
Open

runtime, cgo: deadlock at _cgo_wait_runtime_init_done in LD_PRELOAD so #43836

james-li opened this issue Jan 22, 2021 · 3 comments
Milestone

Comments

@james-li
Copy link

@james-li james-li commented Jan 22, 2021

What version of Go are you using (go version)?

$ go version
go version go1.15.7 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
# go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build815615511=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I've compiled a so file to use LD_PRELOAD to hook bash execve. Deadlock occurs when using the LD_PRELOAD so file

What did you expect to see?

What did you see instead?

[golib]# LD_PRELOAD=`pwd`/libshellhook.so bash  grepconf.sh -c
<hang here>

When I use gdb to debug this, it's stack show:

(gdb) bt
#0  0x00007fdf8d79a995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fdf8e313723 in _cgo_wait_runtime_init_done () at gcc_libinit.c:40
#2  0x00007fdf8e313446 in execve (filename=0x1909350 "/usr/libexec/grepconf.sh", argv=0x1908b00, envp=0x1907d10) at _cgo_export.c:26
#3  0x000000000042fc02 in shell_execve ()

(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7fdf8e856740 (LWP 25485) "bash" 0x00007fdf8d79a995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

I've found some clues. I put some debug code in /usr/lib/golang/src/runtime/cgo/gcc_libinit.c to trace the x_cgo_notify_runtime_init_done and _cgo_wait_runtime_init_done.

$ diff -u  ./src/runtime/cgo/gcc_libinit.c /usr/lib/golang/src/runtime/cgo/gcc_libinit.c
--- ./src/runtime/cgo/gcc_libinit.c	2021-01-21 13:31:03.192576533 +0800
+++ /usr/lib/golang/src/runtime/cgo/gcc_libinit.c	2021-01-21 16:08:13.426097949 +0800
@@ -11,6 +11,8 @@
 #include <stdlib.h>
 #include <string.h> // strerror
 #include <time.h>
+#include <unistd.h>
+#include <sys/types.h>
 #include "libcgo.h"
 #include "libcgo_unix.h"
 
@@ -28,7 +30,9 @@
 	if (err != 0) {
 		fprintf(stderr, "pthread_create failed: %s", strerror(err));
 		abort();
-	}
+	}else{
+		fprintf(stderr, "pthread_create success: %d:%x\n", getpid(), (unsigned)p);
+        }
 }
 
 uintptr_t
@@ -37,7 +41,15 @@
 
 	pthread_mutex_lock(&runtime_init_mu);
 	while (runtime_init_done == 0) {
-		pthread_cond_wait(&runtime_init_cond, &runtime_init_mu);
+                struct timespec ts;
+                clock_gettime(CLOCK_REALTIME, &ts);
+                ts.tv_sec += 1;
+		if(pthread_cond_timedwait(&runtime_init_cond, &runtime_init_mu, &ts)){
+                        fprintf(stderr, "wait runtime init done at thread %d:%u, value address %p\n", 
+                                (int)getpid(), (unsigned)pthread_self(), &runtime_init_done);
+                        continue;
+                }
+//                pthread_cond_wait(&runtime_init_cond, &runtime_init_mu);
 	}
 
 	// TODO(iant): For the case of a new C thread calling into Go, such
@@ -49,7 +61,10 @@
 	// initialization to be complete anyhow, later, by waiting for
 	// main_init_done to be closed in cgocallbackg1. We should wait here
 	// instead. See also issue #15943.
-	pfn = cgo_context_function;
+        if(runtime_init_done)
+                pfn = cgo_context_function;
+        else
+                pfn = nil;
 
 	pthread_mutex_unlock(&runtime_init_mu);
 	if (pfn != nil) {
@@ -64,10 +79,14 @@
 
 void
 x_cgo_notify_runtime_init_done(void* dummy __attribute__ ((unused))) {
-	pthread_mutex_lock(&runtime_init_mu);
-	runtime_init_done = 1;
-	pthread_cond_broadcast(&runtime_init_cond);
-	pthread_mutex_unlock(&runtime_init_mu);
+        int tries = 0;
+        while(tries ++ < 3){
+                fprintf(stderr, "notify runtime init done at thread %d:%u, value address %p\n", (int)getpid(), (unsigned)pthread_self(), &runtime_init_done);
+                pthread_mutex_lock(&runtime_init_mu);
+                runtime_init_done = 1;
+                pthread_cond_broadcast(&runtime_init_cond);
+                pthread_mutex_unlock(&runtime_init_mu);
+        }
 }
 
 // Sets the context function to call to record the traceback context

Here is the log

notify runtime init done at thread 32626:1276888832, value address 0x7ff74cea5268
wait runtime init done at thread 32630:1292597056, value address 0x7ff74cea5268

It's clearly shown that the notify and wait action occur in two different processes(parent process and child process). So deadlock is unavoidable.

@james-li
Copy link
Author

@james-li james-li commented Jan 22, 2021

source code:

package main

/*
#ifndef __DEF_H
#define __DEF_H 1
#define _GNU_SOURCE
typedef char * const *  char_pointer_const_t;
typedef const char * const_char_t;
#endif
*/
import "C"

import (
	"fmt"
	"log"
	"os"
	"unsafe"

	"github.com/rainycape/dl"

)

func main() {
	test.DoTest()
}

//export execve
func execve(filename C.const_char_t, argv C.char_pointer_const_t, envp C.char_pointer_const_t) C.int {
	var command = parseCommandLine(argv)
	log.Println("execve hooked, file="+C.GoString(filename), "command="+command)


    lib, err := dl.Open("libc", 0)
    if err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
    defer lib.Close()
    var old_execve func(filename C.const_char_t, argv C.char_pointer_const_t, envp C.char_pointer_const_t) C.int
    lib.Sym("execve", &old_execve)
    return old_execve(filename, argv, envp)
}

/**
//export fork
func fork() C.pid_t {
	lib, err := dl.Open("libc", 0)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	defer lib.Close()
	var old_fork func() C.pid_t
	lib.Sym("fork", &old_fork)
	return old_fork()
}
**/
func parseCommandLine(argv C.char_pointer_const_t) string {
	length := 255
	tmpslice := (*[1 << 30]*C.char)(unsafe.Pointer(argv))[:length:length]
	var commandLine = ""
	for i, s := range tmpslice {
		if s == nil || i > length {
			break
		}
		commandLine += C.GoString(s) + " "
	}
	return commandLine
}

@dmitshur dmitshur added this to the Backlog milestone Jan 22, 2021
@dmitshur dmitshur changed the title runtime/cgo: deadlock at _cgo_wait_runtime_init_done in LD_PRELOAD so runtime, cgo: deadlock at _cgo_wait_runtime_init_done in LD_PRELOAD so Jan 22, 2021
@james-li
Copy link
Author

@james-li james-li commented Jan 26, 2021

There's some progress. I use pthread_atfork to ensure runtime init done before fork.

[root@centos-linux-7 cgo]# diff -u gcc_libinit.c gcc_libinit.c.org 
--- gcc_libinit.c       2021-01-26 09:57:39.734649377 +0800
+++ gcc_libinit.c.org   2020-12-04 01:32:52.000000000 +0800
@@ -88,12 +88,6 @@
        return ret;
 }
 
-void prepare(void) {
-       fprintf(stderr, "wait init done before fork, pid %d\n", getpid());
-       _cgo_wait_runtime_init_done();
-       //pthread_mutex_unlock(&runtime_init_mu);
-}
-
 // _cgo_try_pthread_create retries pthread_create if it fails with
 // EAGAIN.
 int
@@ -101,7 +95,7 @@
        int tries;
        int err;
        struct timespec ts;
-       pthread_atfork(prepare, NULL, NULL);
+
        for (tries = 0; tries < 20; tries++) {
                err = pthread_create(thread, attr, pfn, arg);
                if (err == 0) {

It works. But in the case when I hook execve, deadlock occurs again. Here is the stack.

bt
#0  runtime.futex () at /usr/lib/golang/src/runtime/sys_linux_amd64.s:588
#1  0x00007feb9befe086 in runtime.futexsleep (addr=0xc000036148, val=0, ns=-1)
    at /usr/lib/golang/src/runtime/os_linux.go:45
#2  0x00007feb9bed93a5 in runtime.notesleep (n=0xc000036148)
    at /usr/lib/golang/src/runtime/lock_futex.go:159
#3  0x00007feb9bf090f3 in runtime.stoplockedm () at /usr/lib/golang/src/runtime/proc.go:2085
#4  0x00007feb9bf0c954 in runtime.exitsyscall0 (gp=0xc000000600)
    at /usr/lib/golang/src/runtime/proc.go:3412
#5  0x00007feb9bf31f53 in runtime.mcall () at /usr/lib/golang/src/runtime/asm_amd64.s:318
#6  0x00007fffff8d5ec0 in ?? ()
#7  0x800000000000000e in ?? ()
#8  0x00007fffff8d5f00 in ?? ()
#9  0x00007feb9bf339be in runtime.cgocallback ()
    at /usr/lib/golang/src/runtime/asm_amd64.s:705
#10 0x00007feb9bfc8aa7 in _cgoexp_0ef309af05f9_execve (a=0x7fffff8d5fa0, n=32, ctxt=0)
    at _cgo_gotypes.go:64
#11 0x00007feb9bf37bf7 in crosscall2 () at /usr/lib/golang/src/runtime/cgo/asm_amd64.s:59
#12 0x00007feb9bfc8b00 in execve (filename=0x7fffff8d5df8 "0^\215\377\377\177", 
    argv=0x21d7950, envp=0x21d6c70) at _cgo_export.c:38
#13 0x000000000042fce2 in shell_execve ()
#14 0x000000000043185c in execute_simple_command ()
#15 0x0000000000432343 in execute_command_internal ()
#16 0x0000000000433d2e in execute_command ()
#17 0x0000000000432ba3 in execute_command_internal ()
#18 0x0000000000433d2e in execute_command ()
#19 0x000000000041e365 in reader_loop ()
#20 0x000000000041c9ce in main ()

It seems that there are still some locks which can not be released between different processes.

@networkimprov
Copy link

@networkimprov networkimprov commented Feb 4, 2021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants