Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: deadlock in runtime_pollWait #30957

Closed
dieselburner opened this issue Mar 20, 2019 · 7 comments
Closed

runtime: deadlock in runtime_pollWait #30957

dieselburner opened this issue Mar 20, 2019 · 7 comments
Milestone

Comments

@dieselburner
Copy link

@dieselburner dieselburner commented Mar 20, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="arm"
GOBIN=""
GOCACHE="/home/andrejs/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/andrejs/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
GOARM="7"
CC="gcc"
CXX="g++"
CGO_ENABLED="0"
GOMOD="/data/dev/andrejs-tools/imud/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -marm -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build038904013=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Following function executes in a TCP connection handler. This code is triggered every 10ms by a ticker (time.NewTicker(10 * time.Millisecond)).

func get_imu_data(f *os.File) (data, error) {

	var d data

	err := f.SetDeadline(time.Now().Add(time.Millisecond))
	if err != nil {
		return d, err
	}

	b := make([]byte, 6)
	n, err := f.Read(b)
	if err != nil {
		return d, err
	}
	if n != 6 {
		return d, errors.New("Read error: bad read length")
	}

	r := bytes.NewReader(b)
	err = binary.Read(r, binary.LittleEndian, &d)
	if err != nil {
		return d, err
	}

	return d, nil
}

What did you expect to see?

No deadlock.

What did you see instead?

Deadlock. Hard to reproduce (~5 times in 2 days of spontaneous testing).

kill -3 <pid> gives following:

# ./imud 
Listening on 0.0.0.0:2948
Connection accepted: 172.16.71.100:48764
SIGQUIT: quit
PC=0x6a8f0 m=0 sigcode=0

goroutine 0 [idle]:
runtime.futex(0x312cb4, 0x80, 0x0, 0x0, 0x0, 0x0, 0x19701, 0x0, 0x0, 0x435a0, ...)
        /usr/local/go/src/runtime/sys_linux_arm.s:334 +0x1c
runtime.futexsleep(0x312cb4, 0x0, 0xffffffff, 0xffffffff)
        /usr/local/go/src/runtime/os_linux.go:46 +0x78
runtime.notesleep(0x312cb4)
        /usr/local/go/src/runtime/lock_futex.go:151 +0xc4
runtime.stopm()
        /usr/local/go/src/runtime/proc.go:1936 +0x7c
runtime.findrunnable(0x416000, 0x0)
        /usr/local/go/src/runtime/proc.go:2399 +0x2a0
runtime.schedule()
        /usr/local/go/src/runtime/proc.go:2525 +0x1e4
runtime.park_m(0x4009a0)
        /usr/local/go/src/runtime/proc.go:2605 +0x5c
runtime.mcall(0x3129f8)
        /usr/local/go/src/runtime/asm_arm.s:289 +0x5c

goroutine 1 [IO wait, 15 minutes]:
internal/poll.runtime_pollWait(0xa6c4beb8, 0x72, 0x2094f8)
        /usr/local/go/src/runtime/netpoll.go:182 +0x4c
internal/poll.(*pollDesc).wait(0x474014, 0x72, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x88
internal/poll.(*pollDesc).waitRead(0x474014, 0xffffff00, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 +0x3c
internal/poll.(*FD).Accept(0x474000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:384 +0x2f4
net.(*netFD).accept(0x474000, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/fd_unix.go:238 +0x5c
net.(*TCPListener).accept(0x40e440, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/tcpsock_posix.go:139 +0x50
net.(*TCPListener).Accept(0x40e440, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/tcpsock.go:260 +0x70
main.main()
        /data/dev/andrejs-tools/imud/main.go:165 +0xb6c

goroutine 6 [runnable]:
internal/poll.runtime_pollWait(0xa6c4bfc0, 0x72, 0xb)
        /usr/local/go/src/runtime/netpoll.go:182 +0x4c
internal/poll.(*pollDesc).wait(0x438594, 0x72, 0x209401, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x88
internal/poll.(*pollDesc).waitRead(0x438594, 0x4bb201, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 +0x3c
internal/poll.(*FD).Read(0x438580, 0x4bb2a6, 0x6, 0x6, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:169 +0x320
os.(*File).read(0x40e368, 0x4bb2a6, 0x6, 0x6, 0x0, 0x0, 0x0)
        /usr/local/go/src/os/file_unix.go:263 +0x64
os.(*File).Read(0x40e368, 0x4bb2a6, 0x6, 0x6, 0x0, 0x0, 0x0)
        /usr/local/go/src/os/file.go:108 +0xb0
main.get_imu_data(0x40e368, 0x0, 0x0, 0x0, 0x0)
        /data/dev/andrejs-tools/imud/main.go:378 +0x15c
main.prepare_message(0x0, 0x0, 0x0, 0x0)
        /data/dev/andrejs-tools/imud/main.go:330 +0x48
main.connection_handler(0x20a7e8, 0x40e450)
        /data/dev/andrejs-tools/imud/main.go:300 +0x164
created by main.main
        /data/dev/andrejs-tools/imud/main.go:170 +0xc94

trap    0x0
error   0x0
oldmask 0x0
r0      0x312cb4
r1      0x80
r2      0x0
r3      0x0
r4      0x0
r5      0x0
r6      0x0
r7      0xf0
r8      0x7
r9      0x0
r10     0x3129f8
fp      0x7
ip      0x448c6d
sp      0xbe9d8cc0
lr      0x3b6e0
pc      0x6a8f0
cpsr    0x20000010
fault   0x0
@ianlancetaylor ianlancetaylor changed the title Deadlock in runtime_pollWait runtime: deadlock in runtime_pollWait Mar 20, 2019
@ianlancetaylor ianlancetaylor added this to the Go1.13 milestone Mar 20, 2019
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 20, 2019

What is GOMAXPROCS on your system?

Can you share a complete program that demonstrates the problem?

@dieselburner

This comment has been minimized.

Copy link
Author

@dieselburner dieselburner commented Mar 20, 2019

Hi Ian,

As you can see from go env, I'm cross-compiling. Target is a single core ARM CPU.

I will prepare a minimal working example, and will try to reproduce this on my workstation.

@dieselburner

This comment has been minimized.

Copy link
Author

@dieselburner dieselburner commented Mar 22, 2019

@ianlancetaylor I believe I have made a code that has 100% reproducibility on my machine. Deadlock is instant.

Code:

package main

import (
	"fmt"
	"net"
	"os"
	"runtime"
	"time"
)

var tty *os.File

func main() {

	var err error

	// Open TTY

	tty, err = os.Open("/dev/tty")
	if err != nil {
		fmt.Println("Error opening TTY:", err.Error())
		os.Exit(1)
	}
	defer tty.Close()

	// Listen for incoming connections

	l, err := net.Listen("tcp", "0.0.0.0:2019")
	if err != nil {
		fmt.Println("Error listening:", err.Error())
		os.Exit(1)
	}

	fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0))

	// Close the listener when the application closes

	defer l.Close()
	fmt.Println("Listening on port 2019")

	// Listen for an incoming connection

	for {
		conn, err := l.Accept()
		if err != nil {
			fmt.Println("Error accepting:", err.Error())
			os.Exit(1)
		}
		go connection_handler(conn)
	}
}

func connection_handler(conn net.Conn) {

	fmt.Println("Connection accepted:", conn.RemoteAddr())

	// Close connection on completion

	defer conn.Close()

	// Use ticker to send data at fixed intervals

	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()

	// Connection loop

	for {
		// Read data

		msg, err := read_tty(tty)
		if err != nil {
			if os.IsTimeout(err) {
				continue
			}
			fmt.Println("Error preparing message:", err.Error())
			break
		}

		// Send data

		_, err = conn.Write([]byte(msg))
		if err != nil {
			break
		}

		fmt.Print(".")

		// Wait for ticket trigger

		<-ticker.C
	}

	// Done

	fmt.Println("Connection closed:", conn.RemoteAddr())
}

func read_tty(f *os.File) ([]byte, error) {

	// Set tiny read timeout

	err := f.SetDeadline(time.Now().Add(time.Millisecond))
	if err != nil {
		return nil, err
	}

	// Read few bytes

	data := make([]byte, 10)

	_, err = f.Read(data)
	if err != nil {
		return nil, err
	}

	// Done

	return data, nil
}

Instructions:

Terminal 1:

$ go build
$ ./pollwait 
GOMAXPROCS = 1
Listening on port 2019
Connection accepted: 127.0.0.1:37586

Terminal 2:

nc 127.0.0.1 2019

This results to immediate deadlock. Killing application from terminal 3 with kill -3 <pid> gives following:

SIGQUIT: quit
PC=0x458da1 m=0 sigcode=0

goroutine 0 [idle]:
runtime.futex(0x605728, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7fff0ba8f4d8, 0x40b0d1, ...)
	/usr/local/go/src/runtime/sys_linux_amd64.s:535 +0x21
runtime.futexsleep(0x605728, 0x7fff00000000, 0xffffffffffffffff)
	/usr/local/go/src/runtime/os_linux.go:46 +0x4b
runtime.notesleep(0x605728)
	/usr/local/go/src/runtime/lock_futex.go:151 +0xa1
runtime.stopm()
	/usr/local/go/src/runtime/proc.go:1936 +0xc1
runtime.findrunnable(0xc000022000, 0x0)
	/usr/local/go/src/runtime/proc.go:2399 +0x54a
runtime.schedule()
	/usr/local/go/src/runtime/proc.go:2525 +0x21c
runtime.park_m(0xc000001500)
	/usr/local/go/src/runtime/proc.go:2605 +0xa1
runtime.mcall(0x0)
	/usr/local/go/src/runtime/asm_amd64.s:299 +0x5b

goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x7fed68f51e38, 0x72, 0x0)
	/usr/local/go/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc00007c018, 0x72, 0x0, 0x0, 0x51e295)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x9b
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Accept(0xc00007c000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/usr/local/go/src/internal/poll/fd_unix.go:384 +0x1ba
net.(*netFD).accept(0xc00007c000, 0xc000010001, 0x0, 0x0)
	/usr/local/go/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc000010040, 0xc000047e80, 0xc000047e88, 0x10)
	/usr/local/go/src/net/tcpsock_posix.go:139 +0x32
net.(*TCPListener).Accept(0xc000010040, 0x525ed0, 0x541780, 0xc000010050, 0x0)
	/usr/local/go/src/net/tcpsock.go:260 +0x48
main.main()
	/home/test/pollwait/pollwait.go:45 +0x2a6

goroutine 5 [runnable]:
internal/poll.runtime_pollWait(0x7fed68f51f08, 0x72, 0xffffffffffffffff)
	/usr/local/go/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc000056198, 0x72, 0x1, 0xa, 0xffffffffffffffff)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x9b
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000056180, 0xc0000ee540, 0xa, 0xa, 0x0, 0x0, 0x0)
	/usr/local/go/src/internal/poll/fd_unix.go:169 +0x19b
os.(*File).read(...)
	/usr/local/go/src/os/file_unix.go:263
os.(*File).Read(0xc000010030, 0xc0000ee540, 0xa, 0xa, 0x0, 0x0, 0x604dc0)
	/usr/local/go/src/os/file.go:108 +0x70
main.read_tty(0xc000010030, 0xc0000ed080, 0x1, 0x0, 0x53fc00, 0xc0000ed080)
	/home/test/pollwait/pollwait.go:113 +0xc5
main.connection_handler(0x541780, 0xc000010050)
	/home/test/pollwait/pollwait.go:72 +0x157
created by main.main
	/home/test/pollwait/pollwait.go:50 +0x292

rax    0xca
rbx    0x6055e0
rcx    0x458da3
rdx    0x0
rdi    0x605728
rsi    0x80
rbp    0x7fff0ba8f4a0
rsp    0x7fff0ba8f458
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0x55
r13    0x54
r14    0x200
r15    0x0
rip    0x458da1
rflags 0x286
cs     0x33
fs     0x0
gs     0x0

$ go env:

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/test/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/test/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build981214745=/tmp/go-build -gno-record-gcc-switches"
@bmartynov

This comment has been minimized.

Copy link

@bmartynov bmartynov commented May 8, 2019

i have same problem.
@dieselburner repro works in my environment.

➜  ~ go version                               
go version go1.12.5 linux/amd64
➜  ~ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/user/go"
GOPROXY=""
GORACE=""
GOROOT="/home/user/opt/go/go"
GOTMPDIR=""
GOTOOLDIR="/home/user/opt/go/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build475130163=/tmp/go-build -gno-record-gcc-switches"
trace
GOMAXPROCS = 8
Listening on port 2019
Connection accepted: 127.0.0.1:36608
SIGQUIT: quit
PC=0x458d63 m=0 sigcode=0

goroutine 50 [syscall]:
runtime.notetsleepg(0x608a80, 0xf013e, 0x1414)
	/home/user/opt/go/go/src/runtime/lock_futex.go:227 +0x34 fp=0xc0000e6760 sp=0xc0000e6730 pc=0x40b3a4
runtime.timerproc(0x608a60)
	/home/user/opt/go/go/src/runtime/time.go:311 +0x2ea fp=0xc0000e67d8 sp=0xc0000e6760 pc=0x44891a
runtime.goexit()
	/home/user/opt/go/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0000e67e0 sp=0xc0000e67d8 pc=0x456f01
created by runtime.(*timersBucket).addtimerLocked
	/home/user/opt/go/go/src/runtime/time.go:169 +0x10e

goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x7fb2f64fce38, 0x72, 0x0)
	/home/user/opt/go/go/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc0000a4018, 0x72, 0x0, 0x0, 0x51e296)
	/home/user/opt/go/go/src/internal/poll/fd_poll_runtime.go:87 +0x9b
internal/poll.(*pollDesc).waitRead(...)
	/home/user/opt/go/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Accept(0xc0000a4000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/user/opt/go/go/src/internal/poll/fd_unix.go:384 +0x1ba
net.(*netFD).accept(0xc0000a4000, 0xc0000c0001, 0x0, 0x0)
	/home/user/opt/go/go/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc000010038, 0xc000057e80, 0xc000057e88, 0x10)
	/home/user/opt/go/go/src/net/tcpsock_posix.go:139 +0x32
net.(*TCPListener).Accept(0xc000010038, 0x525f30, 0x541800, 0xc0000c0000, 0x0)
	/home/user/opt/go/go/src/net/tcpsock.go:260 +0x48
main.main()
	/home/user/work/go/repro/repro.go:44 +0x2a6

goroutine 18 [IO wait]:
internal/poll.runtime_pollWait(0x7fb2f64fcf08, 0x72, 0xffffffffffffffff)
	/home/user/opt/go/go/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc00006c198, 0x72, 0x1, 0xa, 0xffffffffffffffff)
	/home/user/opt/go/go/src/internal/poll/fd_poll_runtime.go:87 +0x9b
internal/poll.(*pollDesc).waitRead(...)
	/home/user/opt/go/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc00006c180, 0xc0001b6250, 0xa, 0xa, 0x0, 0x0, 0x0)
	/home/user/opt/go/go/src/internal/poll/fd_unix.go:169 +0x19b
os.(*File).read(...)
	/home/user/opt/go/go/src/os/file_unix.go:263
os.(*File).Read(0xc000010028, 0xc0001b6250, 0xa, 0xa, 0x0, 0x0, 0x605de0)
	/home/user/opt/go/go/src/os/file.go:108 +0x70
main.read_tty(0xc000010028, 0xc0001b4b10, 0x1, 0x0, 0x53fc80, 0xc0001b4b10)
	/home/user/work/go/repro/repro.go:112 +0xc5
main.connection_handler(0x541800, 0xc0000c0000)
	/home/user/work/go/repro/repro.go:71 +0x157
created by main.main
	/home/user/work/go/repro/repro.go:49 +0x292

rax    0xfffffffffffffffc
rbx    0xf013e
rcx    0x458d63
rdx    0x0
rdi    0x608a80
rsi    0x80
rbp    0xc0000e66e8
rsp    0xc0000e66a0
r8     0x0
r9     0x0
r10    0xc0000e66d8
r11    0x202
r12    0x26
r13    0x25
r14    0x200
r15    0x0
rip    0x458d63
rflags 0x202
cs     0x33
fs     0x0
gs     0x0
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 8, 2019

Thanks for the test case. The problem doesn't reproduce for me on amd64, and I'm trying to see if I can recreate it on ARM. In the meantime, can you run the Go program with GODEBUG=schedtrace=1,scheddetail=1 in the environment? Thanks.

@dieselburner

This comment has been minimized.

Copy link
Author

@dieselburner dieselburner commented May 8, 2019

That's weird. I have updated to latest 1.12.5 release and cannot reproduce the issue anymore. GOMAXPROCS defaults now to 8, tried setting it to 1 as well.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 8, 2019

I don't know what change might have fixed the issue, but if we can't reproduce this I will close the issue. Please comment if you disagree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.