What version of Go are you using (`go version`)?
```
$ go version
go version go1.18 darwin/amd64
```
Does this issue reproduce with the latest release?
Yes. This issue is more of a feature request than a bug report.
What operating system and processor architecture are you using (`go env`)?
```
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.1"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/go/snap-statsd/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3187482189=/tmp/go-build -gno-record-gcc-switches"
```
What did you do?
My code runs a TCP server and reads from the connection in a `for` loop:
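```go
// c is the accepted connection (a net.Conn).
r := bufio.NewReader(c)
for {
	// ReadBytes returns a newly allocated slice holding one
	// newline-terminated message, so it is safe to send on a channel.
	buf, err := r.ReadBytes('\n')
	if err != nil {
		// Any error, including io.EOF, ends the connection.
		_ = c.Close()
		return
	}
	if len(buf) > 0 {
		t.payloads <- buf
	}
}
```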
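To see why the number of reads tracks how the data arrives rather than the size of the `bufio` buffer, the sketch below drives the same `ReadBytes` loop over a simulated connection. `chunkReader`, `countReads`, `makeSegments`, and `joined` are hypothetical helpers written only for this illustration, and `chunkReader` assumes a simplified socket where each `Read` returns exactly the segment that has arrived so far.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// chunkReader is a hypothetical stand-in for a net.Conn: each Read
// returns at most one queued segment, the way a socket read returns
// only the bytes that have arrived so far.
type chunkReader struct {
	chunks [][]byte
	calls  int // Read calls; each would cost at least one read(2) on a real socket
}

func (c *chunkReader) Read(p []byte) (int, error) {
	c.calls++
	if len(c.chunks) == 0 {
		return 0, io.EOF
	}
	n := copy(p, c.chunks[0])
	c.chunks[0] = c.chunks[0][n:]
	if len(c.chunks[0]) == 0 {
		c.chunks = c.chunks[1:]
	}
	return n, nil
}

// countReads runs the same ReadBytes loop as the server and returns the
// number of messages read and the number of underlying Read calls.
func countReads(chunks [][]byte, bufSize int) (lines, calls int) {
	src := &chunkReader{chunks: chunks}
	r := bufio.NewReaderSize(src, bufSize)
	for {
		buf, err := r.ReadBytes('\n')
		if err != nil {
			return lines, src.calls
		}
		if len(buf) > 0 {
			lines++
		}
	}
}

// makeSegments returns n newline-terminated messages, one per segment.
func makeSegments(n int) [][]byte {
	segs := make([][]byte, 0, n)
	for i := 0; i < n; i++ {
		segs = append(segs, []byte(fmt.Sprintf("msg %d\n", i)))
	}
	return segs
}

// joined merges all segments into one, as if the data had accumulated
// in the kernel before the first read.
func joined(segs [][]byte) [][]byte {
	var all []byte
	for _, s := range segs {
		all = append(all, s...)
	}
	return [][]byte{all}
}

func main() {
	for _, size := range []int{4096, 64 * 1024} {
		lines, calls := countReads(makeSegments(100), size)
		fmt.Printf("one segment per message, buffer %d: %d messages, %d reads\n", size, lines, calls)
	}
	lines, calls := countReads(joined(makeSegments(100)), 4096)
	fmt.Printf("all data already buffered: %d messages, %d reads\n", lines, calls)
}
```

Under these assumptions both buffer sizes need 101 reads for 100 messages (one per segment plus the read that reports `io.EOF`), while the pre-accumulated input needs only 2. A bigger buffer alone does not reduce the read count when each message arrives separately.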
The problem is that `syscall.Read` accounts for 50% of total CPU time in my CPU profile.
Looking at the read loop in https://github.com/golang/go/blob/master/src/internal/poll/fd_unix.go#L163-L167:
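```go
// Excerpt from the retry loop of (*FD).Read in internal/poll/fd_unix.go.
n, err := ignoringEINTRIO(syscall.Read, fd.Sysfd, p) // line 163: try to read
if err != nil {
	n = 0
	if err == syscall.EAGAIN && fd.pd.pollable() {
		if err = fd.pd.waitRead(fd.isFile); err == nil { // line 167: park until readable
			continue // woken up: retry the read
		}
	}
}
```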
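The first step of this loop can be reproduced outside the runtime. The sketch below assumes Linux or macOS and uses a pipe instead of a TCP socket to stay self-contained; `tryRead` and `demo` are hypothetical helpers, and unlike the real loop the sketch neither retries on `EINTR` nor parks on the netpoller. A read on the empty non-blocking pipe fails immediately with `EAGAIN`, which is the call the first problem below considers wasted; after a write, the same read returns the data.

```go
//go:build linux || darwin

package main

import (
	"fmt"
	"syscall"
)

// tryRead issues one non-blocking read(2), like line 163 of (*FD).Read
// (without the EINTR retry). ready is false when the kernel had nothing
// to return (EAGAIN); that is where the runtime would call waitRead.
func tryRead(fd int, p []byte) (n int, ready bool, err error) {
	n, err = syscall.Read(fd, p)
	if err == syscall.EAGAIN {
		return 0, false, nil
	}
	if err != nil {
		return 0, false, err
	}
	return n, true, nil
}

// demo reads an empty non-blocking pipe, writes to it, and reads again.
// It reports whether the first read found data and what the second read got.
func demo() (firstReady bool, data string, err error) {
	p := make([]int, 2)
	if err = syscall.Pipe(p); err != nil {
		return false, "", err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])
	if err = syscall.SetNonblock(p[0], true); err != nil {
		return false, "", err
	}

	buf := make([]byte, 64)
	if _, firstReady, err = tryRead(p[0], buf); err != nil {
		return false, "", err
	}
	if _, err = syscall.Write(p[1], []byte("hello\n")); err != nil {
		return false, "", err
	}
	n, _, err := tryRead(p[0], buf)
	if err != nil {
		return false, "", err
	}
	return firstReady, string(buf[:n]), nil
}

func main() {
	firstReady, data, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("first read found data: %v\n", firstReady)
	fmt.Printf("second read returned: %q\n", data)
}
```

Running it should report `false` for the first read and `"hello\n"` for the second.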
My understanding is that line 163 calls `syscall.Read` on the socket, which fails with `EAGAIN` when there is nothing to read, because Go puts network sockets in non-blocking mode. Line 167 then calls `fd.pd.waitRead()` to wait for data. When data arrives on the connection, the runtime's netpoller wakes the waiting goroutine and the loop continues with another read. I see two problems here:
- If `syscall.Read` on line 163 finds no data, that system call is wasted. Why not call `fd.pd.waitRead()` first, wait for data to arrive, and only then call `syscall.Read`? That would save one system call each time the reader has to wait.
- With the current mechanism, each time data arrives we make one `syscall.Read` call, regardless of how much data arrived. Could `fd.pd.waitRead()` be made to wait until a certain amount of time has passed or a certain amount of data has accumulated? Then far fewer `syscall.Read` calls would be needed to read the same amount of data, which should save a lot of CPU.
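A rough way to compare the two proposals is to count read system calls under a simple arrival pattern. The function below is a toy model, not a measurement: `readCounts` and its parameters are hypothetical, it assumes fixed-size segments that each arrive while the reader is idle (the case where the current loop pays for an `EAGAIN` read before every wait), and the threshold in the second proposal is an imagined knob that `waitRead` does not have.

```go
package main

import "fmt"

// readCounts is a toy model of read(2) counts, not a measurement.
// Assumptions: `arrivals` segments of `segSize` bytes each arrive while
// the reader is idle, threshold >= segSize, and the read buffer is large
// enough to drain everything available in one call.
func readCounts(arrivals, segSize, threshold int) (current, waitFirst, batched int) {
	// Current loop: every wait is preceded by a read that fails with
	// EAGAIN, and every wake-up is followed by a read that returns data.
	current = 2 * arrivals
	// Proposal 1: wait first, then read once per arrival.
	waitFirst = arrivals
	// Proposal 2 (hypothetical knob, combined with proposal 1): wake only
	// after `threshold` bytes have accumulated, then read them in one call.
	total := arrivals * segSize
	batched = (total + threshold - 1) / threshold // ceiling division
	return current, waitFirst, batched
}

func main() {
	cur, wf, b := readCounts(1000, 100, 4096)
	fmt.Println("current:", cur, "wait first:", wf, "batched:", b)
}
```

With 1,000 segments of 100 bytes and a 4,096-byte threshold, the model gives 2,000, 1,000, and 25 reads. Real traffic is burstier, and the model ignores the extra latency that a time- or size-based wait would add.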