runtime: epoll reader is blocked for a period of time because of runtime.netpollWaiters overflow #60782

Closed
hanchao886 opened this issue Jun 14, 2023 · 5 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.

@hanchao886

hanchao886 commented Jun 14, 2023

What version of Go are you using (go version)?

$ go version
go version go1.20.5 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="off"
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.5"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1611007689=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I verified that runtime.netpollWaiters is incremented each time a goroutine waits on the network, using this example TCP server:

package main

import (
	"flag"
	"fmt"
	"io"
	"net"
	"os"
	"sync"
	"sync/atomic"
	_ "unsafe"
)

//go:linkname netpollWaiters runtime.netpollWaiters
var netpollWaiters atomic.Uint32

var bufPool = sync.Pool{
	New: func() interface{} {
		buffer := make([]byte, 32<<10)
		return &buffer
	},
}

func copyNet(c net.Conn) error {
	p := bufPool.Get().(*[]byte)
	defer bufPool.Put(p)
	if _, err := io.CopyBuffer(os.Stdout, c, *p); err != nil {
		fmt.Println("copy buffer error:", err)
	}
	return nil
}

func main() {
	netpollWaiters.Store(4294967000) // to quickly reproduce the problem
	listener, err := net.Listen("tcp", "127.0.0.1:7020")
	if err != nil {
		fmt.Println("Listen err:", err)
		return
	}
	defer listener.Close()

	conn, err := listener.Accept()
	if err != nil {
		fmt.Println("Accept err:", err)
		return
	}
	defer conn.Close()
	copyNet(conn)
}

TCP client

package main

import (
	"bytes"
	"fmt"
	"io"
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("tcp", "127.0.0.1:7020")
	if err != nil {
		fmt.Println("Dial err:", err)
		return
	}
	defer conn.Close()

	for {
		begin := time.Now()
		log := begin.Format("2006-01-02 15:04:05.000")
		log += "\n"
		buffer := bytes.NewReader([]byte(log))
		io.Copy(conn, buffer)
		time.Sleep(time.Millisecond * 3)
	}
}

What did you expect to see?

The server is not blocked for a period of time.

What did you see instead?

The server is blocked for a period of time.
We can see that runtime.netpollWaiters has become zero:

(dlv) vars runtime.netpollWaiters
runtime.netpollWaiters = sync/atomic.Uint32 {_: sync/atomic.noCopy {}, v: 0}
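
The zero shown by the debugger is consistent with unsigned wrap-around: arithmetic on a 32-bit unsigned counter is modulo 2^32, so a counter preset to 4294967000 returns to 0 after 296 net increments. A minimal sketch of that arithmetic follows; `incrementsUntilZero` is a hypothetical helper written for illustration, not part of the runtime.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// incrementsUntilZero counts how many Add(1) calls it takes for a
// 32-bit unsigned counter starting at start to wrap around to zero.
// Hypothetical helper for illustration only; not runtime code.
func incrementsUntilZero(start uint32) int {
	var c atomic.Uint32
	c.Store(start)
	n := 0
	for {
		n++
		if c.Add(1) == 0 {
			return n
		}
	}
}

func main() {
	// 2^32 - 4294967000 = 296
	fmt.Println(incrementsUntilZero(4294967000)) // prints 296
}
```

This is why the reproduction presets the counter close to the maximum: the wrap is reached after a few hundred unbalanced increments rather than billions.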

A previous issue mentioned the same problem: #33624

@ianlancetaylor ianlancetaylor changed the title from "runtime/epoll: epoll reader is blocked for a period of time because of runtime.netpollWaiters overflow" to "runtime: epoll reader is blocked for a period of time because of runtime.netpollWaiters overflow" Jun 15, 2023
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jun 15, 2023
@ianlancetaylor
Contributor

@gopherbot Please open backport issues.

This is an unusual but serious problem. A counter is not decremented as expected, and can overflow.
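
To illustrate the general failure mode (not the runtime's actual code), here is a toy model of a waiter counter that is incremented on every wait but decremented only when the wake-up takes a path that remembers to do so. The names `waiters`, `wait`, and `leaked` are hypothetical, and the single boolean is an assumed simplification of the different wake-up paths.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// waiters stands in for a "goroutines currently waiting" counter.
// Hypothetical model; it is not the runtime's netpollWaiters.
var waiters atomic.Uint32

// wait models one park/wake cycle. The counter goes up when the waiter
// parks and comes back down only if the wake-up path decrements it.
func wait(decrementOnWake bool) {
	waiters.Add(1)
	if decrementOnWake {
		waiters.Add(^uint32(0)) // adding all-ones is the unsigned form of -1
	}
}

// leaked resets the counter, runs n cycles, and returns the counter's
// final value. With balanced increments and decrements it is 0.
func leaked(n int, decrementOnWake bool) uint32 {
	waiters.Store(0)
	for i := 0; i < n; i++ {
		wait(decrementOnWake)
	}
	return waiters.Load()
}

func main() {
	fmt.Println(leaked(1000, true))  // prints 0: every wait is balanced
	fmt.Println(leaked(1000, false)) // prints 1000: the counter only grows
}
```

In the unbalanced case nothing is actually waiting at the end, yet the counter keeps climbing, so with enough cycles it eventually wraps.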

@gopherbot
Contributor

Backport issue(s) opened: #60832 (for 1.19), #60833 (for 1.20).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Contributor

Change https://go.dev/cl/503923 mentions this issue: runtime: decrement netpollWaiters in netpollunblock

@dmitshur dmitshur added the NeedsFix The path to resolution is known, but the work has not been done. label Jun 16, 2023
@dmitshur dmitshur modified the milestones: Go1.21, Go1.22 Jun 16, 2023
@AHNakbari

AHNakbari commented Jun 26, 2023

Problem: the netpollWaiters variable overflows, which causes the server to block after a period of time.
Reason: in the Go source, src/runtime/netpoll.go at line 523, there is a line that increments netpollWaiters (an atomic.Uint32) each time a goroutine waits on the network poller. At line 530 netpollWaiters is decremented, but that code is never called in your snippet.
How to fix it:

If you insist on using io.CopyBuffer to copy data from the client connection to the server's standard output,
you can modify the server side to limit the number of goroutines that are waiting on the network poller:

package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"sync"
	"sync/atomic"
	_ "unsafe" // required for the go:linkname directive below
)

//go:linkname netpollWaiters runtime.netpollWaiters
var netpollWaiters atomic.Uint32

var bufPool = sync.Pool{
	New: func() interface{} {
		buffer := make([]byte, 32<<10)
		return &buffer
	},
}

func copyNet(c net.Conn) error {
	p := bufPool.Get().(*[]byte)
	defer bufPool.Put(p)
	if _, err := io.CopyBuffer(os.Stdout, c, *p); err != nil {
		fmt.Println("copy buffer error:", err)
	}
	return nil
}

func main() {
	netpollWaiters.Store(4294967000) // There is no problem now!
	listener, err := net.Listen("tcp", "127.0.0.1:7020")
	if err != nil {
		fmt.Println("Listen err:", err)
		return
	}
	defer func(listener net.Listener) {
		err := listener.Close()
		if err != nil {

		}
	}(listener)

	// Here we set the limit
	poolSize := 100
	goroutinePool := make(chan struct{}, poolSize)

	for {
		conn, err := listener.Accept()
		if err != nil {
			fmt.Println("Accept err:", err)
			return
		}
		goroutinePool <- struct{}{}

		go func() {
			// Release the pool slot on every exit path, including errors.
			defer func() { <-goroutinePool }()
			defer conn.Close()
			if err := copyNet(conn); err != nil {
				fmt.Println("copyNet err:", err)
			}
		}()
	}
}

Now test it. The counter no longer overflows, and the server will not block as long as the client is sending data.

If you are open to another approach:
you can modify the server side to copy data from the client connection to the server's standard output differently. Instead of using io.CopyBuffer with os.Stdout, read from the client connection manually and write to os.Stdout:

package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"sync/atomic"
	_ "unsafe" // required for the go:linkname directive below
)

//go:linkname netpollWaiters runtime.netpollWaiters
var netpollWaiters atomic.Uint32

func copyNet(c net.Conn) error {
	buffer := make([]byte, 32<<10)
	defer func(c net.Conn) {
		err := c.Close()
		if err != nil {
			
		}
	}(c)
	for {
		n, err := c.Read(buffer)
		if err != nil {
			if err != io.EOF {
				fmt.Println("Read error:", err)
			}
			break
		}

		if _, err := os.Stdout.Write(buffer[:n]); err != nil {
			fmt.Println("Write error:", err)
			break
		}
	}
	return nil
}

func main() {
	netpollWaiters.Store(4294967000) // There is no problem now!
	listener, err := net.Listen("tcp", "127.0.0.1:7020")
	if err != nil {
		fmt.Println("Listen err:", err)
		return
	}
	defer func(listener net.Listener) {
		err := listener.Close()
		if err != nil {
			
		}
	}(listener)
	conn, err := listener.Accept()
	if err != nil {
		fmt.Println("Accept err:", err)
		return
	}
	defer func(conn net.Conn) {
		err := conn.Close()
		if err != nil {
			
		}
	}(conn)
	err = copyNet(conn)
	if err != nil {
		return 
	}
}

Test it again. It works, and the server will never block.
In both versions I handled the errors out of habit.

@gopherbot
Contributor

Change https://go.dev/cl/511356 mentions this issue: Revert "runtime: decrement netpollWaiters in netpollunblock"
