
x/time/rate: Limiter.Wait waits too long #47084

Open
beanieboi opened this issue Jul 7, 2021 · 7 comments

Comments


@beanieboi beanieboi commented Jul 7, 2021

What version of Go are you using (go version)?

$ go version

1.16.5

Does this issue reproduce with the latest release?

yes, tested on 1.17beta1 as well.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/root/sdk/go1.16.5"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/root/sdk/go1.16.5/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16.5"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/user/go-rate-limiter-bug/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2983600181=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I created a reproduction case: https://github.com/beanieboi/go-rate-limiter-bug/blob/main/main_test.go

package main

import (
	"context"
	"testing"
	"time"

	"golang.org/x/time/rate"
)

func TestRateLimiter(t *testing.T) {
	ctx := context.Background()
	numItems := 300000

	endTime := time.Now().Add(10 * time.Second)
	timeRemaining := time.Until(endTime)
	dispatchInterval := time.Duration(int64(timeRemaining) / int64(numItems))
	limiter := rate.NewLimiter(rate.Every(dispatchInterval), 1)

	loopStart := time.Now()
	for i := 0; i < numItems; i++ {
		err := limiter.Wait(ctx)
		if err != nil {
			t.Errorf("error waiting: %v", err)
		}
		// do actual dispatching
	}
	duration := time.Since(loopStart)

	if duration > 15*time.Second {
		t.Errorf("expected execution time of less than 15s, got: %v", duration)
	}
}

In short: we are using rate.NewLimiter() with limiter.Wait(ctx) and seeing much longer wait times than on 1.15.7.

What did you expect to see?

The same behaviour as on 1.15.7: the test linked above should finish within ~15s.

What did you see instead?

The limiter waits too long, and the whole test takes ~160s instead of ~15s.

@mdlayher mdlayher changed the title limiter.Wait() on 1.16 and higher waits too long x/time/rate: Limiter.Wait waits too long Jul 7, 2021
@gopherbot gopherbot added this to the Unreleased milestone Jul 7, 2021
Contributor

@mknyszek mknyszek commented Jul 7, 2021

Have you been able to determine whether this is due to a change in x/time/rate or something deeper? I will note #44343, which seems to have regressed in Go 1.16.


Author

@beanieboi beanieboi commented Jul 28, 2021

hey @mknyszek

I was able to use git bisect and pin it down to 8fdc79e:

~/goroot$ git bisect start go1.16.1 go1.15.14
Bisecting: a merge base must be tested
[ba9e10889976025ee1d027db6b1cad383ec56de8] cmd: update golang.org/x/xerrors
$ git bisect run src/bisect.sh
...Building Go...
ok  	github.com/beanieboi/go-rate-limiter-bug	10.003s

Bisecting: 891 revisions left to test after this (roughly 10 steps)
[0040adfd0f98e1012837b0317fabf69cf1f8855b] runtime: define ios/arm64 entry points
...Building Go...
ok  	github.com/beanieboi/go-rate-limiter-bug	10.003s
Bisecting: 445 revisions left to test after this (roughly 9 steps)

[0932dc21180642ce1ff095b9b3e68b06c6f440b3] runtime: declare arg size/map for race version of sync/atomic functions
...Building Go...
FAIL	github.com/beanieboi/go-rate-limiter-bug	16.968s
Bisecting: 222 revisions left to test after this (roughly 8 steps)

[d4c1ad882973e407ff85b977f4ce5b9435451190] database/sql: fix tx stmt deadlock when rollback
...Building Go...
FAIL	github.com/beanieboi/go-rate-limiter-bug	16.972s
Bisecting: 111 revisions left to test after this (roughly 7 steps)

[db185e543fe471c522790b7d93291e786dc54a84] runtime: drop redundant gcBlackenEnabled reset
...Building Go...
ok  	github.com/beanieboi/go-rate-limiter-bug	10.003s
Bisecting: 55 revisions left to test after this (roughly 6 steps)

[36c5edd8d9e6c13af26733e5c820eae0598203fe] runtime: add timeHistogram type
...Building Go...ok  	github.com/beanieboi/go-rate-limiter-bug	10.003s
Bisecting: 27 revisions left to test after this (roughly 5 steps)

[79a3482d9e735fc6d0373da170fd1b4ff9222bc2] cmd/compile: remove support for untyped ssa rules
...Building Go...
FAIL	github.com/beanieboi/go-rate-limiter-bug	17.042s
Bisecting: 13 revisions left to test after this (roughly 4 steps)

[333e90448a0e55f2e1161853caecf3d30ef3a74a] crypto/rand: generate random numbers using RtlGenRandom on Windows
...Building Go...
ok  	github.com/beanieboi/go-rate-limiter-bug	10.003s
Bisecting: 6 revisions left to test after this (roughly 3 steps)

[b3f7f60129b822978115717912f4d477a46e8467] cmd/dist: fix build failure of misc/cgo/test on arm64
...Building Go...
ok  	github.com/beanieboi/go-rate-limiter-bug	10.003s
Bisecting: 3 revisions left to test after this (roughly 2 steps)

[e3bb53a7683eb9c3d04c09f28abb4cf9aa89a7c1] cmd/compile: remove isLiteral
...Building Go...
FAIL	github.com/beanieboi/go-rate-limiter-bug	17.011s
Bisecting: 0 revisions left to test after this (roughly 1 step)

[3f6b1a0d5eea4756b905db1c2b2c03e8594850d3] misc/cgo/test: test C.enum_*
...Building Go...
FAIL	github.com/beanieboi/go-rate-limiter-bug	16.950s
Bisecting: 0 revisions left to test after this (roughly 0 steps)

[8fdc79e18a9704185bd6471b592db1e8004bd993] runtime: reduce timer latency
...Building Go...
FAIL	github.com/beanieboi/go-rate-limiter-bug	16.989s
8fdc79e18a9704185bd6471b592db1e8004bd993 is the first bad commit


Contributor

@mknyszek mknyszek commented Aug 9, 2021

@ChrisHines @prattmic Is this the same root cause as #44343?


Author

@beanieboi beanieboi commented Aug 11, 2021

@mknyszek I saw that #44343 is tagged as "OS-Windows" and the discussion is mainly about Windows.
We are seeing this issue on Linux (Ubuntu 20.04).


Contributor

@mknyszek mknyszek commented Aug 11, 2021

It is, but we've been having other late-timer issues since 8fdc79e, and I thought that issue spun out of that. Perhaps it's moved on from the general issues around that commit.


Member

@prattmic prattmic commented Aug 11, 2021

This is probably the same cause as #44343. That bug is mostly discussing Windows as the problem is more pronounced there (15ms sleeps), but it does exist on Linux as well (1ms sleeps).


Contributor

@ChrisHines ChrisHines commented Aug 12, 2021

I agree that the change in behavior goes back to 8fdc79e. After that change, timers rely entirely on the netpoller to sleep and wake up threads. It removed a variable-frequency polling loop in the sysmon thread that was helping to wake up threads more quickly in some situations (but not all).

On Linux that polling loop could run as frequently as every ~20 microseconds or as infrequently as every 10 milliseconds. By comparison, the netpoller on Linux has a wakeup granularity of ~1 millisecond. So that change reduced the worst-case sleep time from 10ms to 1ms on Linux, but also raised the best-case time. As a counterpoint, the change also helped reduce CPU load for programs that mostly do lots of sleeps (see: #27707 (comment)).

The test code above calls rate.NewLimiter with a burst size of one and an event interval of 33.333 microseconds. If timers sleep for a minimum of 1ms, then I believe on average the rate.Limiter will alternate between waiting for 1ms and 0ms. The first long sleep forces enough time to pass that a full burst is allowed on the subsequent call. But since a burst is limited to one event, no additional tokens accumulate in the bucket until the zero wait has consumed the one that's there.

As a result the test is performing the equivalent of ~150,000 sleeps of 1ms, which matches up reasonably well with the report of the total runtime of 160 seconds.

The above test wants to allow 300,000 events in 10 seconds, which is the same as 30 events per ms. If we know that the Go 1.16 runtime cannot reliably perform sleeps of less than 1ms, then using a burst size of ~30 would be more achievable. That should allow 30 events back-to-back followed by a ~1ms sleep, then another 30 events, and so on.

