
net: 1.14 performance regression on mac #36298

Open
erikdubbelboer opened this issue Dec 28, 2019 · 22 comments
erikdubbelboer (Contributor) commented Dec 28, 2019

What version of Go are you using (go version)?

$ go version
go version go1.13.5 darwin/amd64

and

$ go version
go version go1.14beta1 darwin/amd64

Does this issue reproduce with the latest release?

Yes, the issue started with the latest beta release.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/erik/Library/Caches/go-build"
GOENV="/Users/erik/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/erik"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/s6/xkrsszq51hb0vrnmn08jsr140000gn/T/go-build119700800=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I wrote a simple program that uses the net package to open 9000 connections and run 9000×2 goroutines (one for each side of a connection) that exchange a request and response every 200 milliseconds. I then measure the average delay until the response arrives (so a little over 200ms is expected).
Program: https://gist.github.com/erikdubbelboer/d925c6e271aa0c5c8bb20a6ec16455c5

What did you expect to see?

On go1.13.5 the average response time is around 206ms. So 6ms more than the expected delay.

What did you see instead?

On go1.14beta1 the average response time is 300ms. That's 94ms extra delay just for upgrading Go.

The program only consumes about half of the CPUs I have, so it's not a CPU bottleneck.

I have tested the exact same thing on an Ubuntu machine, where the average response time is 200ms (rounded down; it's only a couple of microseconds of delay) for both versions of Go.

agnivade (Contributor) commented Dec 29, 2019

Thank you for this report @erikdubbelboer. Unfortunately, I don't have a Mac, so I can't test this. Would you be open to running a git bisect to pinpoint the offending commit? That would help a lot in the investigation.

erikdubbelboer (Contributor, Author) commented Dec 29, 2019

Some information I found bisecting.

go1.13rc1 had bad performance for my program, which was fixed in 81d995d (runtime: fix darwin syscall performance regression).

c1635ad (the same fix) fixed it on master at the same time, but performance regressed again with 6becb03 (runtime: switch to using new timer code), though not nearly as badly as before the syscall fix.

josharian (Contributor) commented Dec 29, 2019

ianlancetaylor (Contributor) commented Dec 30, 2019

I'm seeing similar behavior with Go 1.12, Go 1.13, and tip. The program starts out reporting a latency of about 200ms. As the connections get up to the full 9000, the latency starts to increase. For me on macOS 10.15 it seems to settle around 220ms to 250ms. This is an example of what I see. Are these the kinds of numbers you see as well? Do you really see a difference between 1.13 and tip? I'm not seeing it. I see similar delays on GNU/Linux on my laptop.

connections:  343  latency: 200.890396ms  (requests: 683)
connections:  697  latency: 200.821368ms  (requests: 2418)
connections:  1061  latency: 200.743184ms  (requests: 4192)
connections:  1434  latency: 200.733952ms  (requests: 6017)
connections:  1808  latency: 200.761542ms  (requests: 7896)
connections:  2187  latency: 200.714418ms  (requests: 9797)
connections:  2583  latency: 200.639253ms  (requests: 11672)
connections:  2992  latency: 200.642091ms  (requests: 13695)
connections:  3404  latency: 200.817111ms  (requests: 15702)
connections:  3828  latency: 200.752635ms  (requests: 17853)
connections:  4268  latency:  200.64982ms  (requests: 19982)
connections:  4715  latency: 200.606723ms  (requests: 22118)
connections:  5165  latency: 200.740386ms  (requests: 24346)
connections:  5615  latency: 200.706656ms  (requests: 26657)
connections:  6072  latency: 201.176292ms  (requests: 28991)
connections:  6545  latency: 201.548536ms  (requests: 30761)
connections:  7024  latency: 201.627506ms  (requests: 33446)
connections:  7503  latency: 202.091374ms  (requests: 35604)
connections:  7997  latency: 212.458301ms  (requests: 36428)
connections:  8467  latency: 216.730454ms  (requests: 36998)
connections:  8929  latency: 227.487405ms  (requests: 38116)
connections:  9000  latency: 251.811791ms  (requests: 35618)
connections:  9000  latency: 240.457705ms  (requests: 37968)
connections:  9000  latency: 237.856099ms  (requests: 37371)
connections:  9000  latency: 234.247637ms  (requests: 38453)
connections:  9000  latency: 256.271508ms  (requests: 34680)
connections:  9000  latency: 247.740709ms  (requests: 36829)
connections:  9000  latency: 236.892206ms  (requests: 37733)
connections:  9000  latency: 236.814958ms  (requests: 38422)
connections:  9000  latency: 226.538981ms  (requests: 39630)
connections:  9000  latency: 239.949913ms  (requests: 37197)
connections:  9000  latency: 226.853039ms  (requests: 40724)
connections:  9000  latency: 226.292862ms  (requests: 39254)
connections:  9000  latency: 229.269346ms  (requests: 39063)
connections:  9000  latency: 235.468175ms  (requests: 37769)
connections:  9000  latency: 220.767293ms  (requests: 41690)
connections:  9000  latency: 224.779376ms  (requests: 39430)
connections:  9000  latency: 256.439405ms  (requests: 34553)
connections:  9000  latency: 229.501098ms  (requests: 39812)
connections:  9000  latency: 242.980294ms  (requests: 37187)
connections:  9000  latency: 234.840228ms  (requests: 38104)
connections:  9000  latency: 216.742172ms  (requests: 41804)
connections:  9000  latency: 232.979317ms  (requests: 38800)
connections:  9000  latency: 222.800823ms  (requests: 40336)
connections:  9000  latency: 233.061475ms  (requests: 38445)
connections:  9000  latency: 246.251366ms  (requests: 36510)
connections:  9000  latency: 244.446013ms  (requests: 36494)
connections:  9000  latency: 247.226202ms  (requests: 37439)
connections:  9000  latency: 221.866441ms  (requests: 39705)
connections:  9000  latency: 227.286424ms  (requests: 39899)
connections:  9000  latency: 223.551262ms  (requests: 40033)
connections:  9000  latency:  245.03208ms  (requests: 36543)

ianlancetaylor (Contributor) commented Dec 30, 2019

I think part of the problem is that the timer implementation tends to cause all your servers to synchronize. They all write data at the same time and the readers all run into each other, so some see higher latency.

If I change the handler function as follows, the numbers are slightly better, but the program still gets overwhelmed on macOS 10.15 (it seems stable at 200ms on my laptop).

func handler(c net.Conn) {
	buff := make([]byte, 4)
	for {
		if _, err := c.Read(buff); err != nil {
			panic(err)
		}
		// Add up to ±8ms of random jitter so the timers don't all fire together.
		off := (rand.Int() % 16) - 8
		time.Sleep(responseDelay + time.Duration(off)*time.Millisecond)
		if _, err := c.Write(buff); err != nil {
			panic(err)
		}
	}
}

ianlancetaylor (Contributor) commented Dec 30, 2019

Any program like this is bound to collapse at some level of connections. As I said above, I'm seeing similar results with Go 1.13 and tip. Can you confirm that you see better results with 1.13? If you push up the number of connections, it will collapse on 1.13 as well; what number of connections causes that collapse? Thanks.

erikdubbelboer (Contributor, Author) commented Dec 30, 2019

I have a 2019 MacBook Pro (2.3 GHz 8-core) running macOS 10.15.2.

This is a run on tip (go version devel +dcd3b2c173 Fri Dec 27 14:52:12 2019 +0000 darwin/amd64):
connections:  214  latency: 200.602515ms  (requests: 420)
connections:  433  latency: 200.708629ms  (requests: 1501)
connections:  654  latency: 200.724819ms  (requests: 2592)
connections:  878  latency: 200.721209ms  (requests: 3707)
connections:  1104  latency: 200.697312ms  (requests: 4811)
connections:  1333  latency: 200.653961ms  (requests: 5972)
connections:  1566  latency: 200.545604ms  (requests: 7107)
connections:  1799  latency: 200.676203ms  (requests: 8293)
connections:  2033  latency: 200.810307ms  (requests: 9441)
connections:  2267  latency: 200.825205ms  (requests: 10541)
connections:  2502  latency: 200.596215ms  (requests: 11805)
connections:  2739  latency:  200.62905ms  (requests: 12957)
connections:  2978  latency: 200.517847ms  (requests: 14150)
connections:  3217  latency: 200.539781ms  (requests: 15216)
connections:  3458  latency: 200.700861ms  (requests: 16513)
connections:  3698  latency: 200.447265ms  (requests: 17752)
connections:  3937  latency: 200.696562ms  (requests: 18862)
connections:  4176  latency:  200.63284ms  (requests: 20144)
connections:  4418  latency: 200.346688ms  (requests: 21362)
connections:  4662  latency:  200.69621ms  (requests: 22556)
connections:  4905  latency: 200.712916ms  (requests: 23775)
connections:  5148  latency: 202.988265ms  (requests: 24897)
connections:  5391  latency: 201.089358ms  (requests: 26045)
connections:  5634  latency: 200.549375ms  (requests: 27386)
connections:  5877  latency: 200.697466ms  (requests: 28579)
connections:  6122  latency: 205.651507ms  (requests: 28230)
connections:  6366  latency:   209.0161ms  (requests: 29643)
connections:  6609  latency: 203.697572ms  (requests: 32314)
connections:  6851  latency: 206.174856ms  (requests: 32867)
connections:  7094  latency: 206.177448ms  (requests: 32674)
connections:  7336  latency: 223.981978ms  (requests: 32597)
connections:  7507  latency: 305.926898ms  (requests: 25405)
connections:  7724  latency: 305.048324ms  (requests: 23950)
connections:  7912  latency: 296.864946ms  (requests: 27612)
connections:  8093  latency: 337.062981ms  (requests: 18546)
connections:  8297  latency: 322.061387ms  (requests: 26544)
connections:  8495  latency: 316.352134ms  (requests: 27000)
connections:  8681  latency: 315.528667ms  (requests: 28212)
connections:  8882  latency:  305.82877ms  (requests: 30176)
connections:  9000  latency: 330.977453ms  (requests: 22154)
connections:  9000  latency: 362.380999ms  (requests: 27621)
connections:  9000  latency: 332.453846ms  (requests: 28781)
connections:  9000  latency: 320.721338ms  (requests: 28836)
connections:  9000  latency: 285.075168ms  (requests: 31100)
connections:  9000  latency: 251.145905ms  (requests: 33885)
connections:  9000  latency: 266.993671ms  (requests: 34766)
connections:  9000  latency: 314.165259ms  (requests: 26445)
connections:  9000  latency: 345.336108ms  (requests: 25636)
connections:  9000  latency: 361.087745ms  (requests: 28983)
connections:  9000  latency: 295.295363ms  (requests: 30191)
connections:  9000  latency: 339.091424ms  (requests: 22256)
connections:  9000  latency: 293.209347ms  (requests: 37219)
connections:  9000  latency: 286.418834ms  (requests: 31369)
connections:  9000  latency: 338.615434ms  (requests: 25998)
connections:  9000  latency: 302.197456ms  (requests: 26307)
connections:  9000  latency:  339.63387ms  (requests: 27489)
connections:  9000  latency: 345.729196ms  (requests: 28294)
connections:  9000  latency: 296.134887ms  (requests: 29979)
While this is a run on go version go1.13.5 darwin/amd64:
connections:  206  latency: 201.149826ms  (requests: 406)
connections:  416  latency: 200.932811ms  (requests: 1445)
connections:  628  latency: 200.991975ms  (requests: 2493)
connections:  844  latency: 200.945477ms  (requests: 3545)
connections:  1061  latency: 200.918099ms  (requests: 4639)
connections:  1280  latency: 201.072663ms  (requests: 5734)
connections:  1498  latency: 201.003851ms  (requests: 6771)
connections:  1719  latency: 201.163361ms  (requests: 7895)
connections:  1942  latency: 201.064027ms  (requests: 9034)
connections:  2165  latency: 201.100604ms  (requests: 10112)
connections:  2387  latency: 201.302735ms  (requests: 11237)
connections:  2613  latency: 201.103637ms  (requests: 12310)
connections:  2837  latency: 201.162047ms  (requests: 13473)
connections:  3060  latency: 201.149551ms  (requests: 14530)
connections:  3283  latency: 201.063125ms  (requests: 15556)
connections:  3510  latency: 201.138935ms  (requests: 16813)
connections:  3739  latency: 201.106901ms  (requests: 17954)
connections:  3969  latency: 200.785103ms  (requests: 19103)
connections:  4198  latency: 200.967839ms  (requests: 20288)
connections:  4424  latency: 201.233829ms  (requests: 21369)
connections:  4655  latency: 201.029016ms  (requests: 22482)
connections:  4886  latency: 200.937855ms  (requests: 23499)
connections:  5099  latency: 237.045212ms  (requests: 21638)
connections:  5307  latency: 201.191379ms  (requests: 25906)
connections:  5520  latency: 201.021086ms  (requests: 26960)
connections:  5732  latency: 201.196859ms  (requests: 28005)
connections:  5947  latency: 201.223842ms  (requests: 28388)
connections:  6156  latency: 201.125707ms  (requests: 30039)
connections:  6368  latency: 201.053725ms  (requests: 31157)
connections:  6580  latency: 201.053327ms  (requests: 31922)
connections:  6792  latency: 201.039457ms  (requests: 32863)
connections:  7005  latency: 201.009148ms  (requests: 34001)
connections:  7218  latency: 201.153349ms  (requests: 35231)
connections:  7430  latency: 201.104808ms  (requests: 36403)
connections:  7642  latency: 201.181692ms  (requests: 37396)
connections:  7854  latency: 201.059726ms  (requests: 38416)
connections:  8066  latency: 201.171463ms  (requests: 39493)
connections:  8279  latency: 201.136226ms  (requests: 40496)
connections:  8491  latency: 201.199482ms  (requests: 41736)
connections:  8700  latency: 201.219195ms  (requests: 42761)
connections:  8911  latency: 201.325808ms  (requests: 43509)
connections:  9000  latency:  201.12872ms  (requests: 44478)
connections:  9000  latency: 201.443376ms  (requests: 44568)
connections:  9000  latency: 202.122442ms  (requests: 44627)
connections:  9000  latency: 203.712159ms  (requests: 44141)
connections:  9000  latency: 224.089598ms  (requests: 38300)
connections:  9000  latency: 205.978398ms  (requests: 44455)
connections:  9000  latency: 202.635591ms  (requests: 45000)
connections:  9000  latency: 204.245141ms  (requests: 45000)
connections:  9000  latency: 205.165565ms  (requests: 44985)
connections:  9000  latency: 203.643746ms  (requests: 44517)
connections:  9000  latency:  205.35368ms  (requests: 45000)
connections:  9000  latency: 207.054406ms  (requests: 42367)
connections:  9000  latency: 204.450071ms  (requests: 42891)
connections:  9000  latency: 207.518396ms  (requests: 42052)
connections:  9000  latency: 205.640631ms  (requests: 44188)
connections:  9000  latency: 205.211299ms  (requests: 45000)
connections:  9000  latency: 205.542529ms  (requests: 44677)
connections:  9000  latency: 221.424168ms  (requests: 37812)
connections:  9000  latency: 216.833935ms  (requests: 44775)
connections:  9000  latency: 203.923357ms  (requests: 44981)
connections:  9000  latency: 204.407349ms  (requests: 44999)
connections:  9000  latency: 203.890018ms  (requests: 43897)
connections:  9000  latency: 205.134139ms  (requests: 42448)
connections:  9000  latency:   202.2458ms  (requests: 44625)

As you can see go1.13.5 is quite a lot better on my machine.

erikdubbelboer (Contributor, Author) commented Dec 30, 2019

@ianlancetaylor what minor version of 1.13 did you use? 1.13.2 was the first version where it was fixed.

With 1.13.5 the collapsing seems to start happening around 16000 connections on my machine:

connections:  10081  latency: 201.543594ms  (requests: 49629)
connections:  10287  latency: 201.842059ms  (requests: 50637)
connections:  10492  latency: 201.619908ms  (requests: 51731)
connections:  10697  latency: 201.800619ms  (requests: 52655)
connections:  10902  latency: 202.044265ms  (requests: 53543)
connections:  11109  latency: 201.760951ms  (requests: 54709)
connections:  11315  latency:  201.67879ms  (requests: 55913)
connections:  11520  latency: 201.629559ms  (requests: 55798)
connections:  11728  latency: 201.725815ms  (requests: 57648)
connections:  11932  latency: 201.624005ms  (requests: 58427)
connections:  12137  latency: 201.747662ms  (requests: 59554)
connections:  12340  latency:   201.7525ms  (requests: 60301)
connections:  12544  latency: 201.881979ms  (requests: 61589)
connections:  12747  latency: 202.253278ms  (requests: 62605)
connections:  12951  latency: 201.842114ms  (requests: 63771)
connections:  13156  latency: 203.120681ms  (requests: 64749)
connections:  13356  latency: 202.875871ms  (requests: 64827)
connections:  13557  latency: 202.099093ms  (requests: 66139)
connections:  13760  latency: 202.912646ms  (requests: 67198)
connections:  13962  latency:  203.19292ms  (requests: 68694)
connections:  14166  latency: 202.143645ms  (requests: 69493)
connections:  14367  latency: 205.881382ms  (requests: 69463)
connections:  14568  latency: 202.632688ms  (requests: 71619)
connections:  14767  latency: 202.851463ms  (requests: 71808)
connections:  14965  latency: 202.405885ms  (requests: 73269)
connections:  15166  latency: 203.179725ms  (requests: 74557)
connections:  15369  latency: 202.990612ms  (requests: 75057)
connections:  15569  latency:  203.49432ms  (requests: 76009)
connections:  15770  latency: 204.251799ms  (requests: 76570)
connections:  15968  latency: 208.079449ms  (requests: 76338)
connections:  16166  latency: 212.495423ms  (requests: 74899)
connections:  16367  latency: 219.372552ms  (requests: 73830)
connections:  16564  latency: 225.440395ms  (requests: 72965)
connections:  16768  latency: 233.959613ms  (requests: 71311)
connections:  16965  latency: 238.820195ms  (requests: 70405)
connections:  17163  latency:   239.0021ms  (requests: 71058)
connections:  17364  latency: 243.902386ms  (requests: 70551)
connections:  17564  latency: 247.059713ms  (requests: 70397)
connections:  17767  latency: 253.104628ms  (requests: 69966)
connections:  17969  latency: 249.847045ms  (requests: 71142)
connections:  18171  latency: 264.383372ms  (requests: 68061)
connections:  18370  latency: 266.974333ms  (requests: 68307)
connections:  18567  latency: 283.147913ms  (requests: 64881)
connections:  18768  latency: 269.575891ms  (requests: 69489)
connections:  18963  latency: 273.408313ms  (requests: 68427)
connections:  19000  latency: 274.476879ms  (requests: 69898)
connections:  19000  latency: 279.551007ms  (requests: 68027)
connections:  19000  latency: 269.565461ms  (requests: 70508)
connections:  19000  latency: 272.486302ms  (requests: 68795)
connections:  19000  latency:  274.05602ms  (requests: 70395)
connections:  19000  latency: 270.531462ms  (requests: 69769)
connections:  19000  latency: 273.705709ms  (requests: 69718)

ianlancetaylor (Contributor) commented Dec 30, 2019

I was testing with the tip of the 1.13 branch, which is a couple of CLs past 1.13.5.

What happens on your machine if you tweak the handler function as I suggested above?

erikdubbelboer (Contributor, Author) commented Dec 31, 2019

The tip of the 1.13 branch is steady at around 202ms for me at 9000 connections.

These are two runs with the handler function modified exactly as you suggested:

go1.13.5 (1.13 tip performs the same)
connections:  205  latency: 200.651663ms  (requests: 402)
connections:  421  latency: 200.240554ms  (requests: 1460)
connections:  644  latency: 200.070669ms  (requests: 2556)
connections:  866  latency: 200.062754ms  (requests: 3662)
connections:  1092  latency: 199.830623ms  (requests: 4779)
connections:  1321  latency: 199.773243ms  (requests: 5920)
connections:  1552  latency: 199.801887ms  (requests: 7083)
connections:  1782  latency: 199.827072ms  (requests: 8272)
connections:  2012  latency:  199.80562ms  (requests: 9388)
connections:  2243  latency:  199.80677ms  (requests: 10584)
connections:  2475  latency:  199.77306ms  (requests: 11703)
connections:  2707  latency: 199.720148ms  (requests: 12851)
connections:  2939  latency: 199.737232ms  (requests: 14031)
connections:  3172  latency: 199.744186ms  (requests: 15202)
connections:  3405  latency: 199.720737ms  (requests: 16330)
connections:  3640  latency: 199.683117ms  (requests: 17526)
connections:  3873  latency: 199.614053ms  (requests: 18700)
connections:  4107  latency: 199.655703ms  (requests: 19870)
connections:  4343  latency: 199.680723ms  (requests: 21051)
connections:  4581  latency: 199.662293ms  (requests: 22237)
connections:  4818  latency: 199.709878ms  (requests: 23406)
connections:  5057  latency: 199.661038ms  (requests: 24611)
connections:  5294  latency: 199.698572ms  (requests: 25803)
connections:  5534  latency: 199.655629ms  (requests: 27008)
connections:  5775  latency: 199.657599ms  (requests: 28182)
connections:  6009  latency:  202.75378ms  (requests: 28921)
connections:  6239  latency: 199.950074ms  (requests: 30538)
connections:  6472  latency: 199.959554ms  (requests: 31663)
connections:  6704  latency:   200.0548ms  (requests: 32838)
connections:  6935  latency: 199.983346ms  (requests: 34029)
connections:  7095  latency: 200.047203ms  (requests: 24050)
connections:  7308  latency:  315.82355ms  (requests: 35000)
connections:  7512  latency: 222.749987ms  (requests: 32470)
connections:  7733  latency: 200.669254ms  (requests: 38009)
connections:  7956  latency: 200.167569ms  (requests: 39104)
connections:  8170  latency: 209.557317ms  (requests: 36422)
connections:  8387  latency: 206.165387ms  (requests: 41438)
connections:  8610  latency:  202.99401ms  (requests: 42428)
connections:  8830  latency: 200.106641ms  (requests: 41330)
connections:  9000  latency: 229.503818ms  (requests: 39202)
connections:  9000  latency: 203.331271ms  (requests: 44816)
connections:  9000  latency:  200.15151ms  (requests: 44976)
connections:  9000  latency: 200.213181ms  (requests: 44990)
connections:  9000  latency: 200.094154ms  (requests: 45038)
connections:  9000  latency:  200.18218ms  (requests: 44994)
connections:  9000  latency: 200.058057ms  (requests: 45034)
connections:  9000  latency: 200.129857ms  (requests: 44978)
connections:  9000  latency: 200.117958ms  (requests: 45020)
connections:  9000  latency: 200.052849ms  (requests: 45050)
connections:  9000  latency: 200.060677ms  (requests: 45034)
connections:  9000  latency: 200.089998ms  (requests: 44965)
current tip of master aa175a196d
connections:  223  latency: 199.842716ms  (requests: 433)
connections:  462  latency:  199.69198ms  (requests: 1600)
connections:  706  latency: 199.634942ms  (requests: 2799)
connections:  951  latency: 199.541451ms  (requests: 4031)
connections:  1197  latency: 199.519781ms  (requests: 5273)
connections:  1444  latency: 199.514354ms  (requests: 6495)
connections:  1692  latency: 199.594346ms  (requests: 7723)
connections:  1937  latency: 199.654528ms  (requests: 8976)
connections:  2184  latency: 199.574595ms  (requests: 10202)
connections:  2432  latency:  199.60285ms  (requests: 11441)
connections:  2679  latency:  199.61805ms  (requests: 12667)
connections:  2927  latency: 199.575252ms  (requests: 13926)
connections:  3175  latency: 199.654334ms  (requests: 15154)
connections:  3423  latency: 199.570494ms  (requests: 16401)
connections:  3671  latency: 199.523459ms  (requests: 17638)
connections:  3920  latency: 199.482107ms  (requests: 18913)
connections:  4168  latency: 199.568823ms  (requests: 20178)
connections:  4417  latency: 199.546459ms  (requests: 21403)
connections:  4665  latency: 199.593138ms  (requests: 22616)
connections:  4914  latency: 199.555076ms  (requests: 23867)
connections:  5162  latency: 199.576197ms  (requests: 25134)
connections:  5408  latency: 199.909673ms  (requests: 26282)
connections:  5657  latency: 199.576071ms  (requests: 27596)
connections:  5905  latency: 199.564561ms  (requests: 28838)
connections:  6153  latency: 199.577316ms  (requests: 30076)
connections:  6401  latency: 199.622011ms  (requests: 31295)
connections:  6649  latency:  199.56113ms  (requests: 32570)
connections:  6799  latency: 224.169372ms  (requests: 29791)
connections:  7038  latency: 237.610424ms  (requests: 28856)
connections:  7259  latency: 279.478036ms  (requests: 23731)
connections:  7473  latency: 279.591356ms  (requests: 25734)
connections:  7679  latency: 290.216978ms  (requests: 25919)
connections:  7850  latency: 322.964509ms  (requests: 25184)
connections:  8070  latency: 289.276727ms  (requests: 26234)
connections:  8276  latency: 331.784857ms  (requests: 25808)
connections:  8466  latency: 329.789152ms  (requests: 27138)
connections:  8590  latency: 352.660204ms  (requests: 23912)
connections:  8810  latency:  367.09193ms  (requests: 22240)
connections:  8985  latency: 343.293889ms  (requests: 28132)
connections:  9000  latency: 350.080837ms  (requests: 28876)
connections:  9000  latency: 318.476377ms  (requests: 28205)
connections:  9000  latency: 330.702274ms  (requests: 23668)
connections:  9000  latency: 354.526881ms  (requests: 28601)
connections:  9000  latency: 337.101232ms  (requests: 29011)
connections:  9000  latency:  357.45473ms  (requests: 20601)
connections:  9000  latency: 331.605019ms  (requests: 28740)
connections:  9000  latency: 296.477423ms  (requests: 31863)
connections:  9000  latency: 260.442915ms  (requests: 33965)
connections:  9000  latency: 280.284734ms  (requests: 31509)
connections:  9000  latency: 270.296341ms  (requests: 33823)
connections:  9000  latency: 290.062132ms  (requests: 30314)
connections:  9000  latency: 318.941952ms  (requests: 30438)
connections:  9000  latency: 262.615993ms  (requests: 34628)

As you can see, even with the randomized response delay, master is still a lot slower.

ianlancetaylor (Contributor) commented Dec 31, 2019

When I look at an execution trace on GNU/Linux I see that the program is idle about half the time, so I'm currently still inclined to think that this is more of a timing synchronization problem than a performance problem as such. That is, I'm not yet persuaded that this would be a problem for a real program that does network activity but doesn't use timers. But I'll keep looking.

networkimprov commented Dec 31, 2019

cc @eliasnaur @odeke-em as other macOS devs...

@gopherbot remove WaitingForInfo

erikdubbelboer (Contributor, Author) commented Jan 1, 2020

@ianlancetaylor for me it's also idle about half the time (with both 1.13 and 1.14), so I agree it's better framed as a timing problem. But that doesn't change the fact that the new timer code could potentially add a lot of unnecessary delay to high-traffic network code on Mac. I'm not running any production environment on Mac, so I can't say whether any real programs are affected.

Maybe a more realistic scenario is having the delay between requests: https://gist.github.com/erikdubbelboer/d493450f38a77fc74b57f4a1cd4f7380
You can now think of it as a client sending an HTTP request and reading the response every 200ms, with the server responding as fast as possible.

go1.13.5 has around 2ms delay
connections:  210  latency:    191.002µs  (requests: 622)
connections:  423  latency:     209.91µs  (requests: 1676)
connections:  639  latency:    243.975µs  (requests: 2763)
connections:  857  latency:    279.191µs  (requests: 3845)
connections:  1073  latency:    332.069µs  (requests: 4900)
connections:  1292  latency:    399.005µs  (requests: 6015)
connections:  1510  latency:    526.117µs  (requests: 7112)
connections:  1732  latency:     544.43µs  (requests: 8183)
connections:  1953  latency:    569.576µs  (requests: 9274)
connections:  2173  latency:    740.093µs  (requests: 10387)
connections:  2398  latency:    959.499µs  (requests: 11521)
connections:  2623  latency:    657.265µs  (requests: 12562)
connections:  2848  latency:    655.602µs  (requests: 13602)
connections:  3072  latency:    938.408µs  (requests: 14891)
connections:  3301  latency:    883.951µs  (requests: 15904)
connections:  3526  latency:   1.231755ms  (requests: 17176)
connections:  3753  latency:   2.302141ms  (requests: 17988)
connections:  3980  latency:    530.625µs  (requests: 19319)
connections:  4208  latency:    427.564µs  (requests: 20502)
connections:  4438  latency:    959.507µs  (requests: 21630)
connections:  4667  latency:    537.394µs  (requests: 22821)
connections:  4898  latency:    522.938µs  (requests: 23915)
connections:  5129  latency:    612.374µs  (requests: 25182)
connections:  5346  latency:  17.847547ms  (requests: 24053)
connections:  5573  latency:    265.663µs  (requests: 27389)
connections:  5797  latency:    326.151µs  (requests: 28408)
connections:  6019  latency:    360.931µs  (requests: 29571)
connections:  6240  latency:    437.708µs  (requests: 30650)
connections:  6460  latency:    550.442µs  (requests: 31723)
connections:  6677  latency:    723.032µs  (requests: 31555)
connections:  6846  latency:  65.381637ms  (requests: 21947)
connections:  7055  latency:  40.699971ms  (requests: 33396)
connections:  7262  latency:    651.958µs  (requests: 35886)
connections:  7467  latency:    9.19435ms  (requests: 33836)
connections:  7677  latency:     1.8626ms  (requests: 37929)
connections:  7886  latency:   1.060051ms  (requests: 38991)
connections:  8095  latency:    573.726µs  (requests: 40049)
connections:  8308  latency:    505.408µs  (requests: 41102)
connections:  8521  latency:    542.991µs  (requests: 41928)
connections:  8732  latency:    839.358µs  (requests: 42980)
connections:  8943  latency:     638.77µs  (requests: 44049)
connections:  9000  latency:    656.128µs  (requests: 44567)
connections:  9000  latency:  10.647651ms  (requests: 40523)
connections:  9000  latency:   1.582034ms  (requests: 44887)
connections:  9000  latency:   2.668245ms  (requests: 44935)
connections:  9000  latency:   4.070734ms  (requests: 44920)
connections:  9000  latency:   4.721951ms  (requests: 44124)
connections:  9000  latency:   3.543523ms  (requests: 44485)
connections:  9000  latency:   2.903695ms  (requests: 44009)
connections:  9000  latency:   1.166801ms  (requests: 44037)
connections:  9000  latency:   4.330805ms  (requests: 43123)
connections:  9000  latency:   4.595809ms  (requests: 42088)
connections:  9000  latency:   5.344845ms  (requests: 44361)
connections:  9000  latency:    3.50281ms  (requests: 44773)
connections:  9000  latency:   1.270037ms  (requests: 45000)
connections:  9000  latency:    1.28122ms  (requests: 45000)
tip (go version devel +aa175a196d Mon Dec 30 20:44:00 2019 +0000 darwin/amd64) still has around 100ms delay
connections:  213  latency:    197.913µs  (requests: 635)
connections:  437  latency:    230.572µs  (requests: 1734)
connections:  672  latency:    170.671µs  (requests: 2884)
connections:  908  latency:    236.383µs  (requests: 4049)
connections:  1142  latency:    319.981µs  (requests: 5224)
connections:  1381  latency:    405.083µs  (requests: 6420)
connections:  1615  latency:    452.673µs  (requests: 7574)
connections:  1854  latency:    438.935µs  (requests: 8792)
connections:  2092  latency:    491.614µs  (requests: 9977)
connections:  2330  latency:    374.371µs  (requests: 11173)
connections:  2567  latency:    623.341µs  (requests: 12358)
connections:  2805  latency:    401.139µs  (requests: 13546)
connections:  3042  latency:    649.438µs  (requests: 14591)
connections:  3281  latency:    363.364µs  (requests: 15922)
connections:  3516  latency:    840.844µs  (requests: 17007)
connections:  3752  latency:    763.746µs  (requests: 18093)
connections:  3991  latency:    573.783µs  (requests: 19471)
connections:  4233  latency:     4.3477ms  (requests: 20180)
connections:  4476  latency:   2.729418ms  (requests: 21895)
connections:  4720  latency:    318.776µs  (requests: 23085)
connections:  4945  latency:   4.871933ms  (requests: 22857)
connections:  5189  latency:   2.608706ms  (requests: 25451)
connections:  5429  latency:   3.554109ms  (requests: 26490)
connections:  5669  latency:   6.950101ms  (requests: 26548)
connections:  5861  latency:  11.042294ms  (requests: 26314)
connections:  6008  latency:  25.257558ms  (requests: 25757)
connections:  6204  latency:  40.020907ms  (requests: 26194)
connections:  6432  latency:  57.460309ms  (requests: 23019)
connections:  6629  latency:  59.107574ms  (requests: 23109)
connections:  6826  latency:  83.458901ms  (requests: 21615)
connections:  7016  latency:   89.02983ms  (requests: 22460)
connections:  7214  latency:  87.295282ms  (requests: 23732)
connections:  7407  latency: 139.117764ms  (requests: 23633)
connections:  7632  latency: 122.404973ms  (requests: 25276)
connections:  7796  latency: 104.030479ms  (requests: 17655)
connections:  7984  latency: 106.160659ms  (requests: 24955)
connections:  8199  latency: 100.033108ms  (requests: 26870)
connections:  8405  latency:  60.981735ms  (requests: 34501)
connections:  8609  latency:   66.85082ms  (requests: 32020)
connections:  8793  latency:  85.746594ms  (requests: 22607)
connections:  9000  latency: 104.688803ms  (requests: 30429)
connections:  9000  latency: 112.219789ms  (requests: 28528)
connections:  9000  latency:  95.875495ms  (requests: 30169)
connections:  9000  latency:  78.598719ms  (requests: 31431)
connections:  9000  latency: 122.359633ms  (requests: 28625)
connections:  9000  latency:  65.958915ms  (requests: 32001)
connections:  9000  latency: 123.409997ms  (requests: 28438)
connections:  9000  latency:  34.628089ms  (requests: 33896)
connections:  9000  latency:  92.292065ms  (requests: 28895)
connections:  9000  latency:  94.782454ms  (requests: 30240)
connections:  9000  latency: 141.724581ms  (requests: 25506)
connections:  9000  latency:  75.539259ms  (requests: 33497)
connections:  9000  latency: 131.929565ms  (requests: 20559)
connections:  9000  latency:  89.196202ms  (requests: 31967)
connections:  9000  latency:  93.219438ms  (requests: 30251)
connections:  9000  latency:  91.787391ms  (requests: 29767)
@erikdubbelboer commented Jan 1, 2020

After splitting the server and client into separate processes, I can still reproduce the issue when the server runs go1.13.5 (without any sleep in the handler) and the client runs tip with a 200ms delay between requests. I think this shows the issue lies in the net code in combination with the sleep. With go1.13.5 for the client and tip for the server, I could never reproduce the issue.

@erikdubbelboer commented Jan 1, 2020

Another hint: every run shows a drop in the number of requests handled right before the latency increases. This is with separate server and client processes and a random sleep between requests.

run 1
connections:  5920  latency:    108.459µs  (requests: 29157)
connections:  6167  latency:    263.384µs  (requests: 30345)
connections:  6415  latency:    106.456µs  (requests: 31641)
connections:  6663  latency:    111.774µs  (requests: 32885)
connections:  6912  latency:    115.933µs  (requests: 34132)
connections:  7160  latency:    119.558µs  (requests: 35398)
connections:  7202  latency:     262.22µs  (requests: 6035) <-- here
connections:  7453  latency: 1.919001617s  (requests: 25186)
connections:  7685  latency:   55.58679ms  (requests: 31844)
connections:  7939  latency:  37.682556ms  (requests: 30672)
connections:  8158  latency:  39.397319ms  (requests: 34357)
connections:  8347  latency:  74.686898ms  (requests: 26174)
connections:  8567  latency:  94.930925ms  (requests: 28020)
connections:  8782  latency: 108.449594ms  (requests: 29237)
connections:  8953  latency: 106.607694ms  (requests: 29934)
run 2
connections:  5671  latency:    122.542µs  (requests: 27920)
connections:  5918  latency:    189.623µs  (requests: 29170)
connections:  6165  latency:    137.299µs  (requests: 30348)
connections:  6413  latency:    160.631µs  (requests: 31650)
connections:  6662  latency:    175.661µs  (requests: 32864)
connections:  6706  latency:    237.109µs  (requests: 5985) <-- here
connections:  6942  latency: 1.528003839s  (requests: 24469)
connections:  7198  latency:  39.526885ms  (requests: 31393)
connections:  7424  latency:   48.19922ms  (requests: 26490)
connections:  7682  latency:  54.880246ms  (requests: 33168)
connections:  7921  latency:   61.12974ms  (requests: 27962)
run 3
connections:  6165  latency:     83.668µs  (requests: 30374)
connections:  6413  latency:     79.824µs  (requests: 31655)
connections:  6662  latency:    157.149µs  (requests: 32870)
connections:  6910  latency:     88.691µs  (requests: 34132)
connections:  7159  latency:     93.118µs  (requests: 35371)
connections:  7180  latency:    311.564µs  (requests: 2984) <-- here
connections:  7412  latency: 1.362051106s  (requests: 24037)
connections:  7624  latency:  79.527981ms  (requests: 26877)
connections:  7850  latency:  35.147355ms  (requests: 32510)
connections:  8089  latency:  17.806359ms  (requests: 35870)
connections:  8336  latency:  43.741719ms  (requests: 35076)
connections:  8577  latency:   6.058145ms  (requests: 41767)
run 4
connections:  5173  latency:     65.637µs  (requests: 25426)
connections:  5422  latency:     71.411µs  (requests: 26658)
connections:  5670  latency:      66.32µs  (requests: 27903)
connections:  5919  latency:     64.958µs  (requests: 29189)
connections:  5947  latency:    136.446µs  (requests: 3431) <-- here
connections:  6176  latency: 1.189237636s  (requests: 20437)
connections:  6416  latency:  46.963571ms  (requests: 28186)
connections:  6640  latency:  45.592822ms  (requests: 24123)
connections:  6857  latency:  52.164676ms  (requests: 29145)
connections:  7069  latency:  54.679434ms  (requests: 25002)
connections:  7259  latency:   58.74997ms  (requests: 24658)
connections:  7453  latency:  69.040101ms  (requests: 25217)

Something happens at these points that apparently causes all requests to synchronize around the same time, even with a random sleep. This is with GOGC=off, to rule out GC pauses.

Oddly, I don't see this same drop when everything runs in one process with otherwise identical code; there the latency just increases gradually.

@ianlancetaylor commented Jan 1, 2020

> But that doesn't change the fact that it seems the new timer code could potentially add a lot of unnecessary delay to high traffic network code on Mac.

If I'm right, though, this only happens when the program's time.Sleep calls cause the timers to synchronize under the new code in ways they did not under the old code. That seems unlikely in a real network server, since real servers rarely call time.Sleep.

@ianlancetaylor commented Jan 1, 2020

I tweaked the runtime to print every time it runs more than 500 timers in a single call to checkTimers. This is from my GNU/Linux laptop, going up to 15000 connections. Since the only timers here are calls to time.Sleep, this does suggest a fair degree of synchronization of time.Sleep completions.

connections:  449  latency: 200.414124ms  (requests: 891)
connections:  908  latency:  200.34785ms  (requests: 3141)
connections:  1366  latency: 200.333493ms  (requests: 5439)
connections:  1837  latency: 200.241471ms  (requests: 7758)
connections:  2292  latency: 200.316448ms  (requests: 10094)
connections:  2765  latency: 200.325891ms  (requests: 12382)
connections:  3230  latency: 200.455848ms  (requests: 14729)
connections:  3703  latency: 200.457528ms  (requests: 17072)
connections:  4165  latency:  200.41005ms  (requests: 19456)
connections:  4640  latency: 200.361869ms  (requests: 21744)
connections:  5117  latency: 200.368466ms  (requests: 24097)
connections:  5584  latency: 200.834337ms  (requests: 26529)
connections:  6056  latency: 202.885266ms  (requests: 27686)
connections:  6525  latency: 206.370416ms  (requests: 31120)
ran 538 timers
connections:  6986  latency: 217.894851ms  (requests: 30759)
ran 782 timers
connections:  7408  latency: 250.366346ms  (requests: 24522)
ran 1059 timers
connections:  7794  latency: 295.580142ms  (requests: 30539)
ran 805 timers
connections:  8214  latency: 311.639705ms  (requests: 24828)
ran 662 timers
connections:  8634  latency: 310.074713ms  (requests: 26119)
ran 583 timers
connections:  9063  latency: 312.046573ms  (requests: 27310)
connections:  9505  latency: 315.841608ms  (requests: 28354)
ran 853 timers
connections:  9923  latency: 317.957578ms  (requests: 29570)
ran 656 timers
connections:  10353  latency: 326.432386ms  (requests: 30669)
connections:  10797  latency: 330.348747ms  (requests: 32110)
ran 1440 timers
connections:  11235  latency: 330.969847ms  (requests: 33496)
ran 619 timers
ran 1495 timers
connections:  11640  latency: 334.269563ms  (requests: 34751)
connections:  12069  latency: 339.508599ms  (requests: 35959)
connections:  12473  latency: 350.345072ms  (requests: 37031)
ran 592 timers
connections:  12855  latency:  356.60846ms  (requests: 26378)
ran 711 timers
ran 1089 timers
ran 1263 timers
ran 569 timers
connections:  13252  latency: 347.959632ms  (requests: 38713)
connections:  13704  latency: 348.470149ms  (requests: 40538)
ran 1229 timers
ran 527 timers
connections:  14132  latency: 350.545532ms  (requests: 41985)
ran 887 timers
ran 1301 timers
connections:  14502  latency: 287.791229ms  (requests: 48603)
ran 613 timers
ran 1215 timers
ran 511 timers
ran 697 timers
connections:  14939  latency: 265.150884ms  (requests: 55492)
connections:  15000  latency: 322.280038ms  (requests: 45778)
ran 749 timers
connections:  15000  latency: 360.427487ms  (requests: 45510)
ran 553 timers
ran 1145 timers
ran 1219 timers
ran 753 timers
ran 618 timers
connections:  15000  latency: 361.385123ms  (requests: 33958)
ran 1012 timers
connections:  15000  latency: 358.502829ms  (requests: 42082)
ran 595 timers
connections:  15000  latency:  358.96632ms  (requests: 45358)
ran 558 timers
connections:  15000  latency: 360.259505ms  (requests: 45463)
connections:  15000  latency:  360.96247ms  (requests: 33950)
connections:  15000  latency: 364.252616ms  (requests: 41909)
ran 561 timers
ran 539 timers
ran 841 timers
connections:  15000  latency: 360.952581ms  (requests: 45210)
ran 970 timers
@ianlancetaylor commented Jan 1, 2020

I've convinced myself that this is due to timer synchronization. With the original program I see delays increase at around 7000 connections on my GNU/Linux laptop. Modifying the handler to add an offset of (rand.Int() % 32) - 16 milliseconds to the time.Sleep, I can run up to 15000 connections without seeing any delays.

I'm retargeting this for 1.15, but happy to hear counter-arguments.

@ianlancetaylor modified the milestones: Go1.14, Go1.15 (Jan 1, 2020)
@erikdubbelboer commented Jan 1, 2020

@ianlancetaylor I agree. I guess the next step is to investigate why the new timer code is so much more likely to synchronize the timers.

@ianlancetaylor commented Jan 1, 2020

One more comment: if I change goroutineReady in runtime/time.go to

	ready(arg.(*g), 0, false)

the program can run a few more connections before collapsing. The difference is that the current version queues up the goroutine whose sleep time has expired to be the next goroutine that the P runs. That is normally the desired behavior. But when a bunch of sleep timers expire in the same run of checkTimers, the effect is to run the goroutines in reverse order of timer expiration.

@networkimprov commented May 23, 2020

#38860 has some CLs which might apply here, including https://golang.org/cl/232298

@ianlancetaylor modified the milestones: Go1.15, Go1.16 (Jun 16, 2020)
@ianlancetaylor commented Jun 16, 2020

Didn't do anything for 1.15, rolling on to 1.16.
