Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: infinite loop in lockextra on linux/amd64 #42207

Closed
SergeyLysanov opened this issue Oct 26, 2020 · 12 comments
Closed

runtime: infinite loop in lockextra on linux/amd64 #42207

SergeyLysanov opened this issue Oct 26, 2020 · 12 comments
Labels
Milestone

Comments

@SergeyLysanov
Copy link

@SergeyLysanov SergeyLysanov commented Oct 26, 2020

What version of Go are you using (go version)?

$ go version
go version go1.12.4 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/root/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/lib/golang"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build326439240=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Go program uses cgo. C part of program creates 2 threads by pthread_create: event_loop and watchdog(for monitoring hanged event loop). C code doesn't install any signal handlers and doesn't change sigmask.
C-code calls go-code for logging.

Simple C code of watchdog thread that uses popen/pclose and generates SIGCHLD signals:

static void dump_os_state(void)
{
	FILE *fp = popen("ps axv", "re");
	if (fp) {
		dump_to_log(fp); // call go code for logging
		pclose(fp);
	}
}

static void do_monitor(struct pcs_evloop *evloop, struct pcs_watchdog *proc_wd, abs_time_t now)
{
	// if event loop thread was inactive for a long period than dump stack and list of processes into log
	dump_kstack(evloop);
	dump_os_state();
}

static pcs_thread_ret_t watchdog_thread(void * arg)
{
	struct pcs_process *proc = arg;
	struct pcs_watchdog *wd = proc->evloops[0].wd;

	while (wd->run) {
		u32 i;
		for (i = 0; i < proc->nr_evloops; i++)
			do_monitor(&proc->evloops[i], wd, last);
	}

	return 0;
}

Backtrace of deadlock:

(gdb) bt
#0  runtime.osyield () at /usr/lib/golang/src/runtime/sys_linux_amd64.s:627
#1  0x000000000043adf4 in runtime.lockextra (nilokay=false, ~r1=) at /usr/lib/golang/src/runtime/proc.go:1754
#2  0x000000000043a88d in runtime.needm (x=) at /usr/lib/golang/src/runtime/proc.go:1566
#3  0x000000000044ba57 in runtime.badsignal (sig=17, c=0x7f844d3de6d8) at /usr/lib/golang/src/runtime/signal_unix.go:594
#4  0x000000000044b2d0 in runtime.sigtrampgo (sig=, info=0x7f844d3de8b0, ctx=0x7f844d3de780) at /usr/lib/golang/src/runtime/signal_unix.go:299
#5  0x0000000000466c23 in runtime.sigtramp () at /usr/lib/golang/src/runtime/sys_linux_amd64.s:357
#6  signal handler called
#7  runtime.muintptr.ptr (mp=824635164416, ~r0=) at /usr/lib/golang/src/runtime/runtime2.go:241
#8  0x000000000043a8bc in runtime.needm (x=) at /usr/lib/golang/src/runtime/proc.go:1577
#9  0x0000000000464887 in runtime.cgocallback_gofunc () at /usr/lib/golang/src/runtime/asm_amd64.s:712
#10 0x000000000046483e in runtime.cgocallback () at /usr/lib/golang/src/runtime/asm_amd64.s:684
#11 0x000000000065efd6 in _cgoexp_cd95bf089ac7_goLogHandler (a=0x7f844d3df590, n=24, ctxt=0) at _cgo_gotypes.go:294
#12 0x00000000005e4317 in crosscall2 () at /usr/lib/golang/src/runtime/cgo/asm_amd64.s:59
#13 0x0000000000af2ae0 in goLogHandler (p0=1295905936, p1=0x7f844d3df710 "  PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND\n", p2=64) at _cgo_export.c:38
#14 0x00007f8457c10080 in log_print_buf (level=level@entry=0, buf=buf@entry=0x7f844d3df710 "  PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND\n", len=64, skiplock=skiplock@entry=0) at log.c:597
#15 0x00007f8457c120a7 in pcs_valog (level=0, prefix=prefix@entry=0x0, fmt=fmt@entry=0x7f8457c37cd0 "%s", va=va@entry=0x7f844d3dfb60) at log.c:978
#16 0x00007f8457c122b3 in pcs_log (level=level@entry=0, fmt=fmt@entry=0x7f8457c37cd0 "%s") at log.c:999
#17 0x00007f8457c176e3 in dump_kstack (evloop=) at pcs_watchdog.c:82
#18 0x00007f8457c17821 in do_monitor (evloop=) at pcs_watchdog.c:101
#19 watchdog_thread (arg=0x7f84400008e0) at pcs_watchdog.c:116
#20 0x00007f84575afe65 in start_thread (arg=0x7f844d3e0700) at pthread_create.c:307
#21 0x00007f84572d888d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

What did you expect to see?

Successful execution of cgo callback.

What did you see instead?

The execution got stuck into an infinite loop in runtime.lockextra.
What happened:

  1. C-code wanted to log string and called go-code
  2. cgocallback called needm and took lock in lockextra
  3. signal handler called before unlockextra in needm function (received SIGCHLD from C thread from popen)
  4. badsignal from sigtrampgo also called needm function and got stuck into infinite loop in lockextra

Looks like there is a deadlock when signal raised in section between lockextra/unlockextra.
So is it a valid bug or I've violated some cgo rules? I've found #34391 issue but looks like root cause is different.

@ALTree
Copy link
Member

@ALTree ALTree commented Oct 26, 2020

Go1.12 is no longer supported. Does this happen in Go1.15?

@SergeyLysanov
Copy link
Author

@SergeyLysanov SergeyLysanov commented Oct 26, 2020

Go1.12 is no longer supported. Does this happen in Go1.15?

It's hard to reproduce and I can't reproduce deadlock one more time either in 1.12 or 1.15.

needm code is the same in 1.15 and 1.12 and I suppose it's also relevant for Go1.15.
cgocallback doesn't block signals for current thread and what if signal will be raised between lockextra and unlockextra:

func needm(x byte) {
	if (iscgo || GOOS == "windows") && !cgoHasExtraM {
		// Can happen if C/C++ code calls Go from a global ctor.
		// Can also happen on Windows if a global ctor uses a
		// callback created by syscall.NewCallback. See issue #6751
		// for details.
		//
		// Can not throw, because scheduler is not initialized yet.
		write(2, unsafe.Pointer(&earlycgocallback[0]), int32(len(earlycgocallback)))
		exit(1)
	}

	// Lock extra list, take head, unlock popped list.
	// nilokay=false is safe here because of the invariant above,
	// that the extra list always contains or will soon contain
	// at least one m.
	mp := lockextra(false)

	// Set needextram when we've just emptied the list,
	// so that the eventual call into cgocallbackg will
	// allocate a new m for the extra list. We delay the
	// allocation until then so that it can be done
	// after exitsyscall makes sure it is okay to be
	// running at all (that is, there's no garbage collection
	// running right now).
	mp.needextram = mp.schedlink == 0
	extraMCount--
	unlockextra(mp.schedlink.ptr()) // <--- signal raised before unlockextra
...

sigtrampgo will call badsignal since g==nil in cgocallback. badsignal will call needm and we get deadlock:

func sigtrampgo(sig uint32, info *siginfo, ctx unsafe.Pointer) {
	if sigfwdgo(sig, info, ctx) {
		return
	}
	c := &sigctxt{info, ctx}
	g := sigFetchG(c)
	setg(g)
	if g == nil {
		if sig == _SIGPROF {
			sigprofNonGoPC(c.sigpc())
			return
		}
		if sig == sigPreempt && preemptMSupported && debug.asyncpreemptoff == 0 {
			// This is probably a signal from preemptM sent
			// while executing Go code but received while
			// executing non-Go code.
			// We got past sigfwdgo, so we know that there is
			// no non-Go signal handler for sigPreempt.
			// The default behavior for sigPreempt is to ignore
			// the signal, so badsignal will be a no-op anyway.
			return
		}
		c.fixsigcode(sig)
		badsignal(uintptr(sig), c) // <--- here we will get deadlock
		return
	}
...

P.S. it's code from Go1.15 runtime

@SergeyLysanov
Copy link
Author

@SergeyLysanov SergeyLysanov commented Oct 26, 2020

UPD

I've written a simple cgo program that reproduces this bug in Go1.15.3

$ go version
go version go1.15.3 linux/amd64
go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY="git.acronis.com"
GONOSUMDB="git.acronis.com"
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE="git.acronis.com"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build195996747=/tmp/go-build -gno-record-gcc-switches"

Program consists of 3 files: main.go, thread.h, thread.c.

main.go

package main

/*
#cgo CFLAGS: -g -Wall
#include <stdlib.h>
#include "thread.h"
*/
import "C"
import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

//export goCallbackHandler
func goCallbackHandler(s *C.char) {
	fmt.Println(C.GoString(s))
}

func main() {

	for i := 0; i < 5; i++ {
		C.start_pthread()
	}

	sigsch := make(chan os.Signal, 1)
	donech := make(chan bool, 1)
	signal.Notify(sigsch, syscall.SIGINT, syscall.SIGTERM)

	go func() {
		<-sigsch
		donech <- true
	}()

	<-donech
}

thread.h

#ifndef _THREAD_H
#define _TREAD_H

extern void goCallbackHandler(char*);
int start_pthread(void);

#endif

thread.c

#include "thread.h"
#include <unistd.h>
#include <stdio.h>
#include <pthread.h>

static pthread_t thr;

static void dump_to_golog(FILE *fp)
{
    char buf[256];
    while (fgets(buf, sizeof(buf) - 1, fp))
    {
        buf[sizeof(buf) - 1] = 0;
        goCallbackHandler(buf);
    }
}

static void *watchdog_thread(void *arg)
{
    while (1)
    {
        FILE *fp = popen("ps axv", "re");
        if (fp)
        {
            dump_to_golog(fp);
            pclose(fp);
        }
        usleep(1000);
    }
    return 0;
}

int start_pthread(void)
{
    pthread_create(&thr, NULL, watchdog_thread, NULL);
    return 0;
}

Deadlock can be reproduced almost immediately:

(gdb) bt
#0  runtime.osyield () at /usr/local/go/src/runtime/sys_linux_amd64.s:677
#1  0x0000000000439445 in runtime.lockextra (nilokay=false, ~r1=) at /usr/local/go/src/runtime/proc.go:1721
#2  0x0000000000438e8f in runtime.needm (x=) at /usr/local/go/src/runtime/proc.go:1535
#3  0x0000000000448fa5 in runtime.badsignal (sig=17, c=0x7f6d95ffa688) at /usr/local/go/src/runtime/signal_unix.go:933
#4  0x00000000004477b9 in runtime.sigtrampgo (sig=17, info=0x7f6d95ffa830, ctx=0x7f6d95ffa700) at /usr/local/go/src/runtime/signal_unix.go:431
#5  0x0000000000467dc3 in runtime.sigtramp () at /usr/local/go/src/runtime/sys_linux_amd64.s:409
#6  
#7  0x0000000000438ea1 in runtime.needm (x=) at /usr/local/go/src/runtime/proc.go:1544
#8  0x0000000000465f17 in runtime.cgocallback_gofunc () at /usr/local/go/src/runtime/asm_amd64.s:733
#9  0x0000000000465ebe in runtime.cgocallback () at /usr/local/go/src/runtime/asm_amd64.s:705
#10 0x000000000049cea7 in _cgoexp_a037f58797fa_goCallbackHandler (a=0x7f6d95ffadd8, n=8, ctxt=0) at _cgo_gotypes.go:72
#11 0x000000000046a077 in crosscall2 () at /usr/local/go/src/runtime/cgo/asm_amd64.s:59
#12 0x000000000049cfaf in goCallbackHandler (s=0x7f6d95fface0 "", s@entry=0x7f6d95ffae00 "   84 ?        S<     0:00      0     0     0     0  0.0 [kpsmoused]\n") at _cgo_export.c:32
#13 0x000000000049d038 in dump_to_golog (fp=) at thread.c:13
#14 watchdog_thread (arg=) at thread.c:22
#15 0x00007f6dc4737e65 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f6dc446088d in clone () from /lib64/libc.so.6
@ALTree
Copy link
Member

@ALTree ALTree commented Oct 27, 2020

Thanks for the reproducer.

cc @ianlancetaylor

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 27, 2020

Thanks for the test case. I can't reproduce the problem on my system, but I think I see the problem.

@gopherbot
Copy link

@gopherbot gopherbot commented Oct 27, 2020

Change https://golang.org/cl/265759 mentions this issue: runtime: block signals in needm before allocating M

@gopherbot gopherbot closed this in 368c401 Oct 28, 2020
@gopherbot
Copy link

@gopherbot gopherbot commented Oct 28, 2020

Change https://golang.org/cl/265778 mentions this issue: runtime: move TestNeedmDeadlock to crash_cgo_test.go

gopherbot pushed a commit that referenced this issue Oct 28, 2020
It requires cgo. Also, skip the test on windows and plan9.

For #42207

Change-Id: I8522773f93bc3f9826506a41a08b86a083262e31
Reviewed-on: https://go-review.googlesource.com/c/go/+/265778
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@SergeyLysanov
Copy link
Author

@SergeyLysanov SergeyLysanov commented Nov 16, 2020

Thanks for quick fix!
What about merging this fix into minor release go1.15.X?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 16, 2020

@gopherbot Please open backport issues.

This bug can cause a deadlock for programs that create threads in C code such that those threads call into Go code, if a signal is received at the wrong time. There is no workaround.

Note that CL 265759 had a bug in the test, and that CL 265778 (a test-only change) is also required.

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 16, 2020

Backport issue(s) opened: #42635 (for 1.14), #42636 (for 1.15).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@dmitshur dmitshur added this to the Go1.16 milestone Nov 16, 2020
@gopherbot
Copy link

@gopherbot gopherbot commented Nov 20, 2020

Change https://golang.org/cl/271847 mentions this issue: [release-branch.go1.15] runtime: block signals in needm before allocating M

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 20, 2020

Change https://golang.org/cl/271848 mentions this issue: [release-branch.go1.14] runtime: block signals in needm before allocating M

gopherbot pushed a commit that referenced this issue Nov 20, 2020
…ting M

Otherwise, if a signal occurs just after we allocated the M,
we can deadlock if the signal handler needs to allocate an M
itself.

For #42207
Fixes #42636

Change-Id: I76f44547f419e8b1c14cbf49bf602c6e645d8c14
Reviewed-on: https://go-review.googlesource.com/c/go/+/265759
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
(cherry picked from commit 368c401)
Reviewed-on: https://go-review.googlesource.com/c/go/+/271847
gopherbot pushed a commit that referenced this issue Nov 20, 2020
…ting M

Otherwise, if a signal occurs just after we allocated the M,
we can deadlock if the signal handler needs to allocate an M
itself.

For #42207
Fixes #42635

Change-Id: I76f44547f419e8b1c14cbf49bf602c6e645d8c14
Reviewed-on: https://go-review.googlesource.com/c/go/+/265759
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
(cherry picked from commit 368c401)
Reviewed-on: https://go-review.googlesource.com/c/go/+/271848
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants