net: Not close accept conn when epoll failed #34392

Closed
woodliu opened this issue Sep 19, 2019 · 2 comments

@woodliu woodliu commented Sep 19, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.9 linux/amd64

Does this issue reproduce with the latest release?

Code review

What operating system and processor architecture are you using (go env)?

[root@test-master1 ~]# go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/root/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build784496647=/tmp/go-build -gno-record-gcc-switches"


What did you do?

Code review

What did you expect to see?

In the code below (net/fd_unix.go), when netfd.init() returns a non-nil error, accept() calls fd.Close(), which closes the listening socket. But by that point fd.pfd.Accept() has already established a connection, and nothing on this path closes it.

func (fd *netFD) accept() (netfd *netFD, err error) {
	d, rsa, errcall, err := fd.pfd.Accept()
	if err != nil {
		if errcall != "" {
			err = wrapSyscallError(errcall, err)
		}
		return nil, err
	}

	if netfd, err = newFD(d, fd.family, fd.sotype, fd.net); err != nil {
		poll.CloseFunc(d)
		return nil, err
	}
	if err = netfd.init(); err != nil {
		fd.Close()
		return nil, err
	}
	lsa, _ := syscall.Getsockname(netfd.pfd.Sysfd)
	netfd.setAddr(netfd.addrFunc()(lsa), netfd.addrFunc()(rsa))
	return netfd, nil
}
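
To make the ownership problem concrete, here is a minimal sketch that models this error path. The fakeFD type, the register hook, and acceptBuggy are hypothetical names invented for illustration; none of them exist in the net package, and no real sockets are involved. The sketch only shows which descriptor ends up closed when registration fails.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeFD is a hypothetical stand-in for a descriptor wrapper such as netFD.
type fakeFD struct {
	name   string
	closed bool
}

func (f *fakeFD) Close() { f.closed = true }

var errRegister = errors.New("poller registration failed")

// acceptBuggy mirrors the shape of the quoted error path: when registering
// the newly accepted descriptor fails, it closes the listener instead.
// The accepted descriptor is returned even on error, purely so the caller
// can inspect its state.
func acceptBuggy(listener *fakeFD, register func(*fakeFD) error) (*fakeFD, error) {
	accepted := &fakeFD{name: "accepted"}
	if err := register(accepted); err != nil {
		listener.Close() // wrong descriptor: accepted stays open
		return accepted, err
	}
	return accepted, nil
}

func main() {
	listener := &fakeFD{name: "listener"}
	accepted, err := acceptBuggy(listener, func(*fakeFD) error { return errRegister })
	fmt.Println("error:", err)
	fmt.Println("listener closed:", listener.closed)
	fmt.Println("accepted closed:", accepted.closed)
}
```

In this model the listener ends up closed while the accepted descriptor stays open, which is the combination the report describes.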

What did you see instead?

I tested this case with the following steps:
1: build the code below

package main

import (
	"net"
	"fmt"
	"bytes"
	"runtime"
	"strconv"
)

func GetGID() uint64 {
	b := make([]byte, 64)
	b = b[:runtime.Stack(b, false)]
	b = bytes.TrimPrefix(b, []byte("goroutine "))
	b = b[:bytes.IndexByte(b, ' ')]
	n, _ := strconv.ParseUint(string(b), 10, 64)
	return n
}

func Handler(conn net.Conn) {

	fmt.Println("connection is connected from ...", conn.RemoteAddr().String())

	buf := make([]byte, 1024)
	for {
		length, err := conn.Read(buf)
		if err != nil {
			fmt.Println("close conn,err=", err)
			conn.Close()
			break
		}
		fmt.Println("read gid = ", GetGID())
		fmt.Println("Rec[", conn.RemoteAddr().String(), "] Say :", string(buf[0:length]))
		// Copy the bytes that were read so the writer goroutine does not
		// race with the next Read into buf.
		msg := append([]byte(nil), buf[:length]...)
		go func() {
			conn.Write(msg)
			fmt.Println("write gid = ", GetGID())
		}()
	}
}


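func main() {
	service := ":19090"
	tcpAddr, err := net.ResolveTCPAddr("tcp4", service)
	if err != nil {
		panic(err)
	}
	l, err := net.ListenTCP("tcp", tcpAddr)
	if err != nil {
		panic(err)
	}
	for {
		conn, err := l.Accept()
		if err != nil {
			// Accept errors are ignored here; the loop simply retries.
			continue
		}

		go Handler(conn)
	}
}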

2: gdb test
3: gdb# b fd_unix.go:250 // this is the line "if err = netfd.init(); err != nil" in the accept function
4: telnet 127.0.0.1 19090
5: step into the call chain netfd.init -> poll.Init -> poll.init
6: after runtime_pollOpen returns, set errno=1 manually in gdb

func (pd *pollDesc) init(fd *FD) error {
	serverInit.Do(runtime_pollServerInit)
	ctx, errno := runtime_pollOpen(uintptr(fd.Sysfd))
	if errno != 0 {
		if ctx != 0 {
			runtime_pollUnblock(ctx)
			runtime_pollClose(ctx)
		}
		return syscall.Errno(errno)
	}
	pd.runtimeCtx = ctx
	return nil
}

7: gdb# c
At this point netstat shows the output below: the server side is stuck in CLOSE_WAIT, and the accepted connection is still open.

# netstat -ntp|grep 127.0.0.1
tcp        0      0 127.0.0.1:47958         127.0.0.1:19090         FIN_WAIT2   -
tcp6       1      0 127.0.0.1:19090         127.0.0.1:47958         CLOSE_WAIT  46521/test

A second telnet attempt is then refused:

# telnet 127.0.0.1 19090
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused

I tried adding the line marked below. It closes the accepted connection, so the server side moves to TIME_WAIT, but the listening socket is still closed by fd.Close(), so no new connection can be established and telnet is still refused.

        if err = netfd.init(); err != nil {
                fd.Close()
                poll.CloseFunc(d) // added
                return nil, err
        }

I think the code should look like this instead:

        if err = netfd.init(); err != nil {
                poll.CloseFunc(d) // replaces fd.Close()
                return nil, err
        }

With this change, starting a new telnet session establishes a new connection:

# netstat -ntp|grep 127.0.0.1
tcp        0      0 127.0.0.1:48506         127.0.0.1:19090         ESTABLISHED 53559/telnet
tcp6       0      0 127.0.0.1:19090         127.0.0.1:48496         TIME_WAIT   -
tcp6       0      0 127.0.0.1:19090         127.0.0.1:48506         ESTABLISHED 53464/test

I think this should be the normal behavior.

@agnivade changed the title from "runtime:Not close accept conn when epoll failed" to "net: Not close accept conn when epoll failed" on Sep 19, 2019
@gopherbot gopherbot commented Sep 21, 2019

Change https://golang.org/cl/196778 mentions this issue: net: close correct file descriptor when netpoll registration fails

@gopherbot gopherbot closed this in 361ab73 Sep 23, 2019
@lcdbin lcdbin commented Mar 5, 2020

To anyone who cares about this bug: yesterday I found that the listener on one of our online servers had disappeared. I tried several ways to find the cause, since our own Go code could not be what closed it. Finally I used a gdb watchpoint and caught this goroutine stack:

go version 1.12.6
0 0x00000000005374d5 in syscall.Syscall
at /usr/local/go/src/syscall/asm_linux_amd64.s:19
1 0x00000000005347f4 in syscall.Close
at /usr/local/go/src/syscall/zsyscall_linux_amd64.go:310
2 0x0000000000559622 in internal/poll.(*FD).destroy
at /usr/local/go/src/internal/poll/fd_unix.go:78
3 0x000000000055837b in internal/poll.(*FD).decref
at /usr/local/go/src/internal/poll/fd_mutex.go:213
4 0x0000000000559709 in internal/poll.(*FD).Close
at /usr/local/go/src/internal/poll/fd_unix.go:100
5 0x00000000005e29f1 in net.(*netFD).Close
at /usr/local/go/src/net/fd_unix.go:184
6 0x00000000005e38db in net.(*netFD).accept
at /usr/local/go/src/net/fd_unix.go:251
7 0x000000000061317e in net.(*TCPListener).accept
at /usr/local/go/src/net/tcpsock_posix.go:139

then found this bug...

(dlv) p netfd
net.netFD {
pfd: internal/poll.FD {
fdmu: (*"internal/poll.fdMutex")(0xc00adf0000),
Sysfd: 995,

net.netFD {
pfd: internal/poll.FD {
fdmu: (*"internal/poll.fdMutex")(0xc000166f80),
Sysfd: -1,

(dlv) p err
error(syscall.Errno) EBADF (9)

So I changed the code from fd.Close() to netfd.Close().

My point is that this bug can be reproduced in a production environment. The error returned by epoll_ctl (EBADF) should be impossible, I think: both epfd and newFD.Sysfd look fine. I'll post more information here if I solve this.

Thanks!
