runtime: network read using a lot of CPU #52341

Open
vinoblade opened this issue Apr 14, 2022 · 5 comments
Labels
WaitingForInfo

Comments

vinoblade commented Apr 14, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18 darwin/amd64

Does this issue reproduce with the latest release?

Yes. This issue is more of a feature request than a bug report.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.1"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/go/snap-statsd/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3187482189=/tmp/go-build -gno-record-gcc-switches"

What did you do?

My code runs a TCP server and reads from each connection in a for loop:

    // c is a net.Conn; t.payloads is a channel consumed elsewhere.
    r := bufio.NewReader(c)
    for {
        // Read one newline-delimited message at a time.
        buf, err := r.ReadBytes('\n')
        if err != nil {
            _ = c.Close()
            return
        }
        if len(buf) > 0 {
            t.payloads <- buf
        }
    }

The problem is that syscall.read accounts for about 50% of the total CPU time, as shown in the profile below.
[CPU profile screenshot: reduce-connection-1]

Looking at https://github.com/golang/go/blob/master/src/internal/poll/fd_unix.go#L163-L167

    n, err := ignoringEINTRIO(syscall.Read, fd.Sysfd, p)
    if err != nil {
        n = 0
        if err == syscall.EAGAIN && fd.pd.pollable() {
            if err = fd.pd.waitRead(fd.isFile); err == nil {
                continue
            }
        }
    }

My understanding is that on line 163 it calls syscall.Read to read from the socket, and syscall.Read returns an error (EAGAIN) when there is nothing to read. On line 167 it then calls fd.pd.waitRead() to wait for data; when data arrives on the connection, the fd is notified and the loop continues to read. I see two problems here:

  1. If syscall.Read on line 163 returns nothing, we have made one unnecessary system call. Why not call fd.pd.waitRead() first, wait for data to arrive, and only then call syscall.Read? That would save one system call each time.
  2. In the current mechanism, whenever data arrives we issue one syscall.Read, no matter how much data there is. Could fd.pd.waitRead() be implemented to wait until some amount of time has passed, or some amount of data has accumulated? That way far fewer syscall.Read calls would be needed to read the same amount of data, which should save a lot of CPU (a user-space illustration of this batching effect follows below).
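For reference, a minimal self-contained variant of the handler above; it is only a sketch, and the listen address, channel capacity, buffer size, and newline framing are arbitrary choices for illustration. bufio.Reader already amortizes syscalls whenever data has accumulated (its default buffer is 4 KB), and a larger buffer lets a single syscall.Read drain more of the backlog before the next wait:

    package main

    import (
        "bufio"
        "log"
        "net"
    )

    func handle(c net.Conn, payloads chan<- []byte) {
        defer c.Close()
        // 64 KB instead of bufio's 4 KB default: one syscall.Read can pull in
        // up to 64 KB of accumulated data, and the following ReadBytes calls
        // are served from the buffer without touching the kernel.
        r := bufio.NewReaderSize(c, 64<<10)
        for {
            buf, err := r.ReadBytes('\n')
            if err != nil {
                return
            }
            if len(buf) > 0 {
                payloads <- buf
            }
        }
    }

    func main() {
        payloads := make(chan []byte, 1024)
        go func() {
            for range payloads {
                // consume payloads here
            }
        }()

        ln, err := net.Listen("tcp", "127.0.0.1:9000")
        if err != nil {
            log.Fatal(err)
        }
        for {
            c, err := ln.Accept()
            if err != nil {
                log.Fatal(err)
            }
            go handle(c, payloads)
        }
    }

This does not change the wait/read ordering inside the runtime; it only reduces how many reads are needed once the connection is busy.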
ianlancetaylor changed the title from "poll: fd_unix Read() seems using a lot of CPU" to "runtime: network read using a lot of CPU" on Apr 14, 2022
ianlancetaylor (Contributor) commented Apr 14, 2022

If syscall.Read on line 163 returns nothing, we have made one unnecessary system call. Why not call fd.pd.waitRead() first, wait for data to arrive, and only then call syscall.Read? That would save one system call each time.

That would force us to use kqueue in level-triggered mode, which in common cases would create a lot of busywork for the poller, receiving and ignoring notifications that we don't care about. While the current approach does cost an extra system call in the case you describe, on average it still saves system calls.

In the current mechanism, whenever data arrives we issue one syscall.Read, no matter how much data there is. Could fd.pd.waitRead() be implemented to wait until some amount of time has passed, or some amount of data has accumulated? That way far fewer syscall.Read calls would be needed to read the same amount of data, which should save a lot of CPU.

Unix kernels do not support this kind of operation, and of course we couldn't use it in general as the Read method has no way to say "don't return until you've read at least N bytes."

I'm not sure how to interpret your profile. You didn't show us the complete program. It's normal for a program that doesn't do anything other than read from the network to spend most of its time reading from the network; what else might it be doing? You also didn't tell us what kind of network traffic it is handling.

It's not clear to me that there is anything we can do here, though I would be happy to hear specific tested suggestions. The Go networking code is optimized for handling many different network connections simultaneously. That does mean that it is not optimized for handling a single network connection; that case works fine but it does use more CPU time than is strictly required.

ianlancetaylor added the WaitingForInfo label on Apr 14, 2022
vinoblade (Author) commented Apr 14, 2022

Hey, thanks for replying! You are right that my application is not handling a large number of connections; in most cases there are at most 5-6. The connections are usually very busy, though, serving more than 5K QPS per connection, and the traffic is TCP. I understand Read doesn't have the capability to "pre-read", but is it possible to have waitRead() do such things?

ianlancetaylor (Contributor) commented Apr 14, 2022

is it possible to have waitRead() do such things?

I don't know of a way to do that.

vinoblade (Author) commented Apr 14, 2022

Got it. Do you think it makes sense to do an event-based implementation, using epoll explicitly, to improve CPU usage? My gut says it will not help, because Accept and Read are already implemented with epoll under the hood on Linux, but I just want to double-check with you.
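For concreteness, an explicit epoll read loop might look roughly like the following (Linux only, using golang.org/x/sys/unix; the address, buffer size, and single-connection handling are arbitrary choices for illustration, not the runtime's implementation):

    package main

    import (
        "log"

        "golang.org/x/sys/unix"
    )

    func main() {
        // A plain listening socket created outside the Go runtime's netpoller.
        lfd, err := unix.Socket(unix.AF_INET, unix.SOCK_STREAM, 0)
        if err != nil {
            log.Fatal(err)
        }
        sa := &unix.SockaddrInet4{Port: 9000, Addr: [4]byte{127, 0, 0, 1}}
        if err := unix.Bind(lfd, sa); err != nil {
            log.Fatal(err)
        }
        if err := unix.Listen(lfd, 128); err != nil {
            log.Fatal(err)
        }

        // Accept a single connection and make it non-blocking.
        cfd, _, err := unix.Accept(lfd)
        if err != nil {
            log.Fatal(err)
        }
        if err := unix.SetNonblock(cfd, true); err != nil {
            log.Fatal(err)
        }

        // Level-triggered epoll: wait for readability first, then read once.
        epfd, err := unix.EpollCreate1(0)
        if err != nil {
            log.Fatal(err)
        }
        ev := unix.EpollEvent{Events: unix.EPOLLIN, Fd: int32(cfd)}
        if err := unix.EpollCtl(epfd, unix.EPOLL_CTL_ADD, cfd, &ev); err != nil {
            log.Fatal(err)
        }

        buf := make([]byte, 64<<10)
        events := make([]unix.EpollEvent, 8)
        for {
            n, err := unix.EpollWait(epfd, events, -1)
            if err == unix.EINTR {
                continue
            }
            if err != nil {
                log.Fatal(err)
            }
            for i := 0; i < n; i++ {
                m, err := unix.Read(int(events[i].Fd), buf)
                if err == unix.EAGAIN {
                    continue // spurious wakeup; go back to waiting
                }
                if m <= 0 || err != nil {
                    return // connection closed or failed
                }
                // process buf[:m] ...
            }
        }
    }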

ianlancetaylor (Contributor) commented Apr 14, 2022

Yes, since the Go runtime already uses epoll under the hood (or kqueue on Darwin), using it in the main program won't make any difference.
