Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: default random write scheduler uses too much cpu when there are a lot of streams #21200

Closed
goby opened this issue Jul 28, 2017 · 3 comments

Comments

@goby
Copy link

@goby goby commented Jul 28, 2017

What version of Go are you using (go version)?

go version go1.8.3 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/goby/dev"
GORACE=""
GOROOT="/home/goby/go"
GOTOOLDIR="/home/goby/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build890478411=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

When we run kubernetes apiserver, we simluate 200 clients in one exe, and we have about 300 exe to simulate 60k clients. So there're many streams in one tcp connection.

This is my steps:

  • Setup a https server like kube-apiserver, use tls1.3, enabled http2
  • Run clients, and create tons of streams, and most of them are idle -- no data transfered most of time.

Here's some clues we found:

# curl localhost:16080/debug/pprof/heap -s >mem.prof
# go tool pprof bin mem.prof
ROUTINE ======================== net/http.(*http2randomWriteScheduler).Push in /home/goby/go/src/net/http/h2_bundle.go
  507.52MB   682.79MB (flat, cum)  5.78% of Total
         .          .   8393:}
         .          .   8394:
         .          .   8395:func (ws *http2randomWriteScheduler) Push(wr http2FrameWriteRequest) {
         .          .   8396:   id := wr.StreamID()
         .          .   8397:   if id == 0 {
         .          .   8398:           ws.zero.push(wr)
         .          .   8399:           return
         .          .   8400:   }
         .          .   8401:   q, ok := ws.sq[id]
         .          .   8402:   if !ok {
  206.51MB   206.51MB   8403:           q = ws.queuePool.get()
         .   175.27MB   8404:           ws.sq[id] = q
         .          .   8405:   }
  301.02MB   301.02MB   8406:   q.push(wr)
         .          .   8407:}
         .          .   8408:
         .          .   8409:func (ws *http2randomWriteScheduler) Pop() (http2FrameWriteRequest, bool) {
         .          .   8410:
         .          .   8411:   if !ws.zero.empty() {

the Push of http2randomWriteScheduler use 5.78% of memory, and about 1%(175.27MB) in ws.sq, so we
guess there're many streams.

And the cpu hotspot:

# capture 30s
# curl localhost:16080/debug/pprof/profile?seconds=30 -s >cpu.prof
# go tool pprof bin cpu.prof
(pprof) list Pop
Total: 1.21mins
ROUTINE ======================== net/http.(*http2randomWriteScheduler).Pop in /home/goby/go/src/net/http/h2_bundle.go
     1.75s        46s (flat, cum) 63.62% of Total
         .          .   8410:
         .          .   8411:   if !ws.zero.empty() {
         .          .   8412:           return ws.zero.shift(), true
         .          .   8413:   }
         .          .   8414:
     400ms     12.63s   8415:   for _, q := range ws.sq {
     1.35s     33.37s   8416:           if wr, ok := q.consume(math.MaxInt32); ok {
         .          .   8417:                   return wr, true
         .          .   8418:           }
         .          .   8419:   }
         .          .   8420:   return http2FrameWriteRequest{}, false
         .          .   8421:}
(pprof) list consume
Total: 1.21mins
ROUTINE ======================== net/http.(*http2writeQueue).consume in /home/goby/go/src/net/http/h2_bundle.go
    32.01s     32.02s (flat, cum) 44.29% of Total
         .          .   7907:
         .          .   7908:// consume consumes up to n bytes from q.s[0]. If the frame is
         .          .   7909:// entirely consumed, it is removed from the queue. If the frame
         .          .   7910:// is partially consumed, the frame is kept with the consumed
         .          .   7911:// bytes removed. Returns true iff any bytes were consumed.
     740ms      740ms   7912:func (q *http2writeQueue) consume(n int32) (http2FrameWriteRequest, bool) {
    30.79s     30.79s   7913:   if len(q.s) == 0 {
     480ms      480ms   7914:           return http2FrameWriteRequest{}, false
         .          .   7915:   }
         .          .   7916:   consumed, rest, numresult := q.s[0].Consume(n)
         .          .   7917:   switch numresult {
         .          .   7918:   case 0:
         .          .   7919:           return http2FrameWriteRequest{}, false
         .          .   7920:   case 1:
         .       10ms   7921:           q.shift()
         .          .   7922:   case 2:
         .          .   7923:           q.s[0] = rest
         .          .   7924:   }
         .          .   7925:   return consumed, true
         .          .   7926:}

It used 44.29% cpu time on http2writeQueue.consume, as the fact, most of the queue is empty.

Here the png generated by the pprof tools, it maybe help:
image

What did you expect to see?

The http2 random write scheduler works fine and uses lower cpu.
The scheduler should find not empty queues as soon as possible or should we have any newer scheduler can be used?

What did you see instead?

The scheduler uses so much cpu, about 45% in my case.

@ALTree ALTree changed the title x/net/http2: default random write scheduler uses too mach cpu when existed a lot of streams x/net/http2: default random write scheduler uses too much cpu when there are a lot of streams Jul 28, 2017
@bradfitz bradfitz added this to the Go1.10 milestone Jul 28, 2017
@tombergan

This comment has been minimized.

Copy link
Contributor

@tombergan tombergan commented Jul 28, 2017

Can you try with Go1.9? We switched to a new scheduler for 1.9 (unfortunately, this is likely to get worse, but I'm curious how much so).

@goby

This comment has been minimized.

Copy link
Author

@goby goby commented Aug 4, 2017

@tombergan sorry for my late reply. I run 4days with 1.9 and 3 days with 1.8, but could not reproduce. I'm trying...

@tombergan

This comment has been minimized.

Copy link
Contributor

@tombergan tombergan commented Aug 7, 2017

Since you're unable to reproduce, I'm going to close under the assumption that it's been fixed. There have been some performance improvements in 1.9. Please reopen if the problem persists.

@tombergan tombergan closed this Aug 7, 2017
@golang golang locked and limited conversation to collaborators Aug 7, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.