x/net/http2: steady performance degeneration seen with HTTP/2 over recent releases #25117

Open
abhinavdangeti opened this Issue Apr 27, 2018 · 6 comments

abhinavdangeti commented Apr 27, 2018

Hello, I work on the full-text search engine at Couchbase, and we recently moved our scatter-gather protocol from net/http to x/net/http2.

The Go version we currently use is 1.9.3.

In some in-house testing, I've noticed query throughput over a full-text index take a nosedive as I upgraded the x/net release version in use. (These numbers are from a system running CentOS 7.)

branch                   query throughput
release-branch.go1.6     27734 q/s
release-branch.go1.7     17037 q/s
release-branch.go1.8     15139 q/s
release-branch.go1.9     14836 q/s
release-branch.go1.10    9500 q/s
master                   8336 q/s

This is how the scatter-gather client is set up in our code using x/net/http2's transport:

transport2 := &http2.Transport{
	TLSClientConfig: clientTLSConfig,
}
cbft.Http2Client = &http.Client{Transport: transport2}

And here's how the server is set up, which also uses x/net/netutil to set up a LimitListener:

listener, _ := net.Listen("tcp", bindHTTP)
server := &http.Server{
    Addr:         bindHTTP,
    Handler:      routerInUse,
    ReadTimeout:  httpReadTimeout,
    WriteTimeout: httpWriteTimeout,
}
...
keepAliveListener := tcpKeepAliveListener{listener.(*net.TCPListener)}
limitListener := netutil.LimitListener(keepAliveListener, httpMaxConnections)
tlsListener := tls.NewListener(limitListener, serverTLSConfig)
...
_ = server.Serve(tlsListener)

I was wondering if there are any known issues in this area, or any additional settings I would have to tune in the later branches of x/net.

I also tried writing a simple server/client benchmark, based on your current tests, to see if it would point out anything obvious:

func BenchmarkServerClient(b *testing.B) {
	b.ReportAllocs()

	const (
		itemSize  = 1 << 10
		itemCount = 1000
	)

	st := newServerTester(b, func(w http.ResponseWriter, r *http.Request) {
		for i := 0; i < itemCount; i++ {
			_, err := w.Write(make([]byte, itemSize))
			if err != nil {
				return
			}
		}
	}, optOnlyServer)
	defer st.Close()

	tlsConfig := &tls.Config{
		InsecureSkipVerify: true,
	}

	tr := &Transport{TLSClientConfig: tlsConfig}
	defer tr.CloseIdleConnections()
	cl := &http.Client{Transport: tr}

	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		resp, err := cl.Get(st.ts.URL)
		if err != nil {
			b.Fatal(err)
		}

		resp.Body.Close()
	}
}

And here are the results of the above benchmark on my machine (macOS 10.13.4):

release-branch.go1.6   	    BenchmarkServerClient-8   	    2000	    742842 ns/op	   18715 B/op	      64 allocs/op
release-branch.go1.7   	    BenchmarkServerClient-8   	    2000	    554580 ns/op	   20688 B/op	      83 allocs/op
release-branch.go1.8   	    BenchmarkServerClient-8   	    2000	    631001 ns/op	   20728 B/op	      84 allocs/op
release-branch.go1.9   	    BenchmarkServerClient-8   	    2000	    675503 ns/op	   19745 B/op	      87 allocs/op
release-branch.go1.10       BenchmarkServerClient-8   	    2000	    772618 ns/op	   19087 B/op	      89 allocs/op
master   	       	    BenchmarkServerClient-8   	    2000	    930152 ns/op	   19212 B/op	      89 allocs/op

So although I couldn't reproduce the issue exactly (1.6 is the anomaly here), there is a steady degradation in time per op from 1.7 to master.

This is the go env output from my system:

GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/abhinavdangeti/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/abhinavdangeti/Documents/go"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.10.1/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.10.1/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/ln/vkg1npcj5tz0ps359y6fwmn80000gn/T/go-build266872403=/tmp/go-build -gno-record-gcc-switches -fno-common"

I'd appreciate any pointers or advice on what I'm observing here.

@gopherbot gopherbot added this to the Unreleased milestone Apr 27, 2018

@agnivade agnivade changed the title x/net: Steady performance degeneration seen with HTTP/2 over recent releases x/net/http2: steady performance degeneration seen with HTTP/2 over recent releases Apr 27, 2018

bcmills commented Apr 27, 2018

CC: @bradfitz

meirf commented May 5, 2018

I was able to reproduce these BenchmarkServerClient slowdowns:

  • 1.7 -> 1.8
  • 1.8 -> 1.9
  • 1.9 -> 1.10

1.9 -> 1.10 is due to https://go-review.googlesource.com/71372. I don't yet have answers on the sources of the 1.7 -> 1.8 and 1.8 -> 1.9 slowdowns, on tips for improvement, or on whether there are fixes we need.

For BenchmarkServerClient, the biggest degradation claimed in this issue is 1.10 -> master, but I wasn't able to reproduce that slowdown. @abhinavdangeti, can you please specify which commit master was at when you ran that benchmark? When I run it, for example at golang/net@640f462, I see almost no degradation from 1.10.

abhinavdangeti commented May 7, 2018

Hey @meirf, thank you for confirming the slowdowns in the earlier versions.
This was the tip of my master when I tested it last: golang/net@5f9ae10.

However, I just re-ran the test on master a number of times, and the number looks a bit different from what I reported earlier. It's closer to 1.10, but still shows a certain dip in performance:

BenchmarkServerClient-8   	    2000	    849227 ns/op	   19133 B/op	      89 allocs/op

I re-ran the test with the latest code from all the branches, and here are the results. You're right: I don't see a big dip between 1.10 and master, so what I saw earlier could have been an anomaly.

goos: darwin
goarch: amd64
pkg: golang.org/x/net/http2
release-branch.go1.6    BenchmarkServerClient-8   	    2000	    754807 ns/op	   19309 B/op	      64 allocs/op
release-branch.go1.7    BenchmarkServerClient-8   	    3000	    550916 ns/op	   20445 B/op	      83 allocs/op
release-branch.go1.8    BenchmarkServerClient-8   	    2000	    617894 ns/op	   20441 B/op	      84 allocs/op
release-branch.go1.9    BenchmarkServerClient-8   	    2000	    673070 ns/op	   19736 B/op	      87 allocs/op
release-branch.go1.10   BenchmarkServerClient-8   	    2000	    792244 ns/op	   19101 B/op	      89 allocs/op
master                  BenchmarkServerClient-8   	    2000	    783671 ns/op	   19086 B/op	      89 allocs/op

@bcmills bcmills added the Performance label May 8, 2018

bcmills commented May 8, 2018

@abhinavdangeti If I'm reading those results correctly, you're also seeing a significant speedup from 1.6 to 1.7, where you previously saw a significant decline. Is that correct?

If so, it might be helpful to eliminate noise (including thermal effects!) on your test machine and see how significant the remaining effect is. (Note that benchcmp can compare the results of arbitrarily many runs of the benchmarks.)
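The multi-run comparison might look like this (a sketch; branch names follow the table above, output file names are illustrative, and benchcmp comes from golang.org/x/tools/cmd/benchcmp):

```shell
# Inside a checkout of golang.org/x/net, run the benchmark many times
# per branch so the comparison can average out machine noise:
git checkout release-branch.go1.10
go test -run=NONE -bench=ServerClient -benchmem -count=10 ./http2 > go1.10.txt

git checkout master
go test -run=NONE -bench=ServerClient -benchmem -count=10 ./http2 > master.txt

# Compare the two sets of runs:
benchcmp go1.10.txt master.txt
```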

bcmills commented May 8, 2018

Hmm, never mind. You had that dip in the original post too: any idea why your observed throughput numbers don't track the benchmark timings?

abhinavdangeti commented May 9, 2018

Hey @bcmills, that's the part that's still a little confusing to me: why our throughput numbers don't match the benchmark timings going from 1.6 -> 1.7, even though repeated tests have shown this regression.
