
runtime: time.Sleep takes more time than expected #44343

Open · egonelbre opened this issue Feb 17, 2021 · 93 comments
Labels: NeedsInvestigation, OS-Linux, OS-Windows

Comments

@egonelbre (Contributor) commented Feb 17, 2021

This seems to be a regression with Go 1.16 time.Sleep.

What version of Go are you using (go version)?

$ go version
go version go1.16 windows/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=Z:\gocache
set GOENV=C:\Users\egone\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=F:\Go\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=F:\Go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=c:\go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=c:\go\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.16
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=f:\temp\sleep\go.mod
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=Z:\Temp\go-build3014714148=/tmp/go-build -gno-record-gcc-switches

What did you do?

package main

import (
	"fmt"
	"time"

	"github.com/loov/hrtime"
)

func main() {
	b := hrtime.NewBenchmark(100)
	for b.Next() {
		time.Sleep(50 * time.Microsecond)
	}
	fmt.Println(b.Histogram(10))
}

hrtime is a package that uses RDTSC for time measurements.
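
(For anyone reproducing without hrtime, a rough stdlib-only approximation; time.Now is much coarser than RDTSC, but the millisecond-scale overshoot is still obvious. The outputs below are from the hrtime program above.)

package main

import (
	"fmt"
	"time"
)

func main() {
	const n = 100
	var total, max time.Duration
	for i := 0; i < n; i++ {
		start := time.Now()
		time.Sleep(50 * time.Microsecond)
		elapsed := time.Since(start)
		total += elapsed
		if elapsed > max {
			max = elapsed
		}
	}
	fmt.Printf("avg %v; max %v\n", total/n, max)
}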

Output on Windows 10:

> go1.16 run .
  avg 15.2ms;  min 55.3µs;  p50 15.5ms;  max 16.3ms;
  p90 16.1ms;  p99 16.3ms;  p999 16.3ms;  p9999 16.3ms;
     55.3µs [  2] █
        2ms [  0]
        4ms [  0]
        6ms [  0]
        8ms [  0]
       10ms [  1] ▌
       12ms [  0]
       14ms [ 75] ████████████████████████████████████████
       16ms [ 22] ███████████▌
       18ms [  0]

> go1.15.8 run .
  avg 1.03ms;  min 63.9µs;  p50 1ms;  max 2.3ms;
  p90 1.29ms;  p99 2.3ms;  p999 2.3ms;  p9999 2.3ms;
     63.9µs [  1] ▌
      500µs [ 47] ███████████████████████████████████████
        1ms [ 48] ████████████████████████████████████████
      1.5ms [  1] ▌
        2ms [  3] ██
      2.5ms [  0]
        3ms [  0]
      3.5ms [  0]
        4ms [  0]
      4.5ms [  0]

Output on Linux (Debian 10):

$ go1.16 run test.go
  avg 1.06ms;  min 1.06ms;  p50 1.06ms;  max 1.08ms;
  p90 1.07ms;  p99 1.08ms;  p999 1.08ms;  p9999 1.08ms;
     1.06ms [  4] █▌
     1.07ms [ 84] ████████████████████████████████████████
     1.07ms [  7] ███
     1.08ms [  3] █
     1.08ms [  1]
     1.09ms [  1]
     1.09ms [  0]
      1.1ms [  0]
      1.1ms [  0]
     1.11ms [  0]

$ go1.15.8 run test.go
  avg 86.7µs;  min 57.3µs;  p50 83.6µs;  max 132µs;
  p90 98.3µs;  p99 132µs;  p999 132µs;  p9999 132µs;
     57.3µs [  2] █
       60µs [  1] ▌
       70µs [ 13] ████████
       80µs [ 64] ████████████████████████████████████████
       90µs [ 11] ██████▌
      100µs [  2] █
      110µs [  1] ▌
      120µs [  3] █▌
      130µs [  3] █▌
      140µs [  0]

The timer granularity shouldn't be that bad, even on Windows, so something seems to be going wrong somewhere.

@seankhliao seankhliao added the NeedsInvestigation label Feb 17, 2021
@seankhliao seankhliao changed the title from "time.Sleep takes more time than expected" to "runtime: time.Sleep takes more time than expected" Feb 17, 2021
@bcmills (Member) commented Feb 17, 2021

That same CL shook out a number of kernel and runtime bugs in various configurations. (See previously #43067, #42515, #42237; cc @prattmic.)

@ianlancetaylor ianlancetaylor added this to the Go1.17 milestone Feb 17, 2021
@prattmic (Member) commented Feb 17, 2021

This is reproducible with a trivial benchmark in the time package (Sleep and Microsecond are unqualified because the benchmark lives in package time's tests):

func BenchmarkSimpleSleep(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Sleep(50 * Microsecond)
	}
}

amd64/linux, before/after http://golang.org/cl/232298:

name            old time/op  new time/op   delta
SimpleSleep-12  86.9µs ± 0%  609.8µs ± 5%  +601.73%  (p=0.000 n=10+9)

@prattmic (Member) commented Feb 17, 2021

For reference, across different sleep times:

name                  old time/op  new time/op   delta
SimpleSleep/1ns-12     460ns ± 3%    479ns ± 1%    +4.03%  (p=0.000 n=10+9)
SimpleSleep/100ns-12   466ns ± 3%    476ns ± 2%    +2.35%  (p=0.001 n=10+9)
SimpleSleep/500ns-12  6.47µs ±11%   6.70µs ± 5%      ~     (p=0.105 n=10+10)
SimpleSleep/1µs-12    10.3µs ±10%   12.2µs ±13%   +18.23%  (p=0.000 n=10+10)
SimpleSleep/10µs-12   81.9µs ± 1%  502.5µs ± 4%  +513.45%  (p=0.000 n=10+10)
SimpleSleep/50µs-12   87.0µs ± 0%  622.9µs ±18%  +615.69%  (p=0.000 n=8+10)
SimpleSleep/100µs-12   179µs ± 0%   1133µs ± 1%  +533.52%  (p=0.000 n=8+10)
SimpleSleep/500µs-12   592µs ± 0%   1137µs ± 1%   +91.97%  (p=0.000 n=10+10)
SimpleSleep/1ms-12    1.12ms ± 2%   1.14ms ± 1%    +1.36%  (p=0.000 n=9+10)
SimpleSleep/10ms-12   10.2ms ± 0%   10.3ms ± 0%    +0.79%  (p=0.000 n=9+9)
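
For anyone trying to reproduce these sub-benchmarks, a sketch of the parameterized version (an assumed shape, since the exact code wasn't posted; again inside package time's tests):

func BenchmarkSimpleSleep(b *testing.B) {
	// Durations match the sub-benchmark names above via Duration.String.
	for _, d := range []Duration{
		Nanosecond,
		100 * Nanosecond,
		500 * Nanosecond,
		Microsecond,
		10 * Microsecond,
		50 * Microsecond,
		100 * Microsecond,
		500 * Microsecond,
		Millisecond,
		10 * Millisecond,
	} {
		b.Run(d.String(), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				Sleep(d)
			}
		})
	}
}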

@prattmic (Member) commented Feb 17, 2021

Looking at the 100µs case, the immediate problem is the delay resolution in netpoll.

Prior to http://golang.org/cl/232298, 95% of timer expirations in the 100µs case are detected by sysmon, which calls startm to wake an M to handle the timer (though this is not a particularly efficient approach).

After http://golang.org/cl/232298, this path is gone and the wakeup must come from netpoll (assuming all Ms are parked/blocked). netpoll on Linux only has 1ms resolution, so it must sleep at least that long before detecting the timer.

I'm not sure why I'm seeing ~500µs on the 10µs and 50µs benchmarks, but I may be seeing a bimodal distribution where in ~50% of cases a spinning M is still awake long enough to detect the timer before entering netpoll.

I'm also not sure why @egonelbre is seeing ~14ms on Windows, as that also appears to have 1ms resolution on netpoll.
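
For context, the 1ms floor comes from the conversion netpoll does before calling epoll_wait; paraphrased from Go 1.16's runtime/netpoll_epoll.go (delay is in nanoseconds):

var waitms int32
if delay < 0 {
	waitms = -1 // block until netpollBreak
} else if delay == 0 {
	waitms = 0 // non-blocking poll
} else if delay < 1e6 {
	waitms = 1 // any sub-millisecond delay rounds up to 1ms
} else if delay < 1e15 {
	waitms = int32(delay / 1e6)
} else {
	// An arbitrary cap on how long to wait.
	waitms = 1e9
}
n := epollwait(epfd, &events[0], int32(len(events)), waitms)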

@prattmic (Member) commented Feb 17, 2021

I think the ideal fix for this would be to increase the resolution of netpoll. Even for longer timers, this limited resolution will cause slight skew in timer delivery (though of course there are no real-time guarantees).

As it happens, Linux v5.11 includes epoll_pwait2 which switches the timeout argument to a timespec for nanosecond resolution. Unfortunately, Linux v5.11 was released ... 3 days ago, so availability is not widespread to say the least.
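
For illustration, a minimal sketch of calling epoll_pwait2 directly via its raw syscall number (441 on amd64; the local constant and the 50µs timeout are illustrative, not from any proposed CL):

package main

import (
	"fmt"
	"time"
	"unsafe"

	"golang.org/x/sys/unix"
)

const sysEpollPwait2 = 441 // amd64 syscall number for epoll_pwait2 (Linux >= 5.11)

func main() {
	epfd, err := unix.EpollCreate1(unix.EPOLL_CLOEXEC)
	if err != nil {
		panic(err)
	}
	defer unix.Close(epfd)

	var events [1]unix.EpollEvent
	ts := unix.NsecToTimespec((50 * time.Microsecond).Nanoseconds())

	start := time.Now()
	// Same shape as epoll_pwait, but the timeout is a *timespec
	// (nanosecond resolution) instead of an int of milliseconds.
	_, _, errno := unix.Syscall6(sysEpollPwait2,
		uintptr(epfd),
		uintptr(unsafe.Pointer(&events[0])),
		uintptr(len(events)),
		uintptr(unsafe.Pointer(&ts)),
		0, 0) // sigmask NULL
	if errno != 0 && errno != unix.EINTR {
		panic(errno)
	}
	fmt.Println("woke after", time.Since(start)) // tens of µs, not ~1ms
}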

In the past, I've also prototyped changing the netpoll timeout to being controlled by a timerfd (with the intention of being able to adjust the timer earlier without a full netpollBreak). That could be an option as well.

Both of these are Linux-specific solutions, I'd have to research other platforms more to get a sense of the options there.

We also may just want to bring the sysmon wakeup back, perhaps with slight overrun allowed to avoid excessive M wakeups.

@ianlancetaylor (Contributor) commented Feb 18, 2021

I guess that wakeNetPoller doesn't help, because there is no poller sleeping at the point of calling time.Sleep.

Perhaps when netpoll sees a delay that is shorter than the poller resolution it should just do a non-blocking poll. That will effectively turn findrunnable into a busy wait when the next timer is very soon.

@ChrisHines (Contributor) commented Feb 18, 2021

While working on CL232298 I definitely observed anecdotal evidence that the netpoller has more latency than other ways of sleeping. From #38860 (comment):

My anecdotal observation here is that it appears the Linux netpoller implementation has more latency when waking after a timeout than the Go 1.13 timerproc implementation. Most of these benchmark numbers replicate on my Mac laptop, but the darwin netpoller seems to suffer less of that particular latency by comparison, and is also worse in other ways. So it may not be possible to close the gap with Go 1.13 purely in the scheduler code. Relying on the netpoller for timers changes the behavior as well, but these new numbers are at least in the same order of magnitude as Go 1.13.

I didn't try to address that in CL232298 primarily because it was already risky enough that I didn't want to make bigger changes. But an idea for something to try occurred to me back then. Maybe we could improve the latency of non-network timers by having one M block on a notesleep call instead of the netpoller. That would require findrunnable to compute two wakeup times, one for net timers to pass to the netpoller and one for all other timers to use with notesleep depending on which role it takes on (if any) when it cannot find any other work.

I haven't fully gauged how messy that would get.

Questions and concerns:

  • Coordinating two sleeping M's probably has complicating edge cases to figure out.
  • I haven't tested the latency of notesleep wakeups to know if it would actually help.
  • Would it require duplicating all the timer fields on each P, one for net timers and one for the others?

One other oddity that I noticed when testing CL232298: the Linux netpoller sometimes wakes up early from the timeout value by several microseconds. When that happens, findrunnable usually does not find any expired timers, since they haven't actually expired yet. A new--very short--pollUntil value gets computed and the M reenters the netpoller. The subsequent wakeup is then typically rather late, maybe up to 1ms, but I am going from memory here. I might be able to dig up some trace logs showing this behavior if I still have them and people are interested.

@prattmic (Member) commented Feb 18, 2021

I guess that wakeNetPoller doesn't help, because there is no poller sleeping at the point of calling time.Sleep.

wakeNetPoller shouldn't matter either way, because even if we wake the netpoller, it will just sleep again with a new timeout of 1ms, which is too long. (Unless wakeNetPoller happens to take so long that the timer has expired by the time the woken M gets to checkTimers).

Perhaps when netpoll sees a delay that is shorter than the poller resolution it should just do a non-blocking poll. That will effectively turn findrunnable into a busy wait when the next timer is very soon.

Maybe we could improve the latency of non-network timers by having one M block on a notesleep call instead of the netpoller.

As somewhat of a combination of these, one potential option would be to make netpoll with a short timeout do non-blocking netpoll, short notetsleep, non-blocking netpoll. Though this has the disadvantage of slightly increasing latency of network events from netpoll.
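
Roughly, in pseudocode (pollResolution and shortSleepNote are invented names, and this glosses over M coordination):

// Pseudocode only, not real runtime code. netpoll(0) is a non-blocking
// poll in the real runtime; on Linux, notetsleep (futex-based) has much
// finer granularity than epoll_wait's 1ms timeout.
func pollShortTimeout(delay int64) gList {
	if delay > 0 && delay < pollResolution {
		if list := netpoll(0); !list.empty() { // non-blocking netpoll first
			return list
		}
		notetsleep(&shortSleepNote, delay) // finer-grained timed sleep
		return netpoll(0)                  // catch IO that arrived meanwhile
	}
	return netpoll(delay) // blocking netpoll, as today
}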

One other oddity that I noticed when testing CL232298: the Linux netpoller sometimes wakes up early from the timeout value by several microseconds. When that happens, findrunnable usually does not find any expired timers, since they haven't actually expired yet. A new--very short--pollUntil value gets computed and the M reenters the netpoller. The subsequent wakeup is then typically rather late, maybe up to 1ms, but I am going from memory here. I might be able to dig up some trace logs showing this behavior if I still have them and people are interested.

Hm, this sounds like another bug, or perhaps a spurious netpollBreak from another M.

@prattmic (Member) commented Feb 18, 2021

It seems that on Windows, notetsleep has 1ms precision in addition to netpoll, so the explanation in #44343 (comment) doesn't explain the increase in latency on Windows.

@ChrisHines (Contributor) commented Feb 18, 2021

My first thought on the Windows behavior is that somehow osRelax is being mismanaged, allowing the timer resolution to fall back to its resting mode. That thought is driven by the spike at ~15ms in the histograms above. I haven't thought through how that might happen, though.

@ChrisHines (Contributor) commented Feb 18, 2021

Hm, this sounds like another bug, or perhaps a spurious netpollBreak from another M.

That could be, but I was logging at least some of the calls to netpollBreak as well and don't recall seeing that happen. I saved my logging code in case it can help: https://github.com/ChrisHines/go/tree/dlog-backup

@vellotis commented Feb 25, 2021

For reference, output on my Windows 10:

> go1.16 run .
  avg 1.06ms;  min 475µs;  p50 1.01ms;  max 1.99ms;
  p90 1.13ms;  p99 1.99ms;  p999 1.99ms;  p9999 1.99ms;
      475µs [  1] ▌
      600µs [  1] ▌
      800µs [ 36] ██████████████████████████
        1ms [ 55] ████████████████████████████████████████
      1.2ms [  0]
      1.4ms [  0]
      1.6ms [  0]
      1.8ms [  7] █████
        2ms [  0]
      2.2ms [  0]

Totally different results from #44343 (comment).

go env Output
$ go env
set GO111MODULE=on
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\user\AppData\Local\go-build
set GOENV=C:\Users\user\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=C:\Projects\Go\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=C:\Projects\Go
set GOPRIVATE=
set GOPROXY=
set GOROOT=C:\Tools\Go\go1.16
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=C:\Tools\Go\go1.16\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.16
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=NUL
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=C:\Users\user\AppData\Local\Temp\go-build2862485594=/tmp/go-build -gno-record-gcc-switches

@egonelbre (Contributor, Author) commented Feb 26, 2021

@vellotis this could be because there's something running in the background changing the Windows timer resolution. It could be some other Go service/binary built with an older Go version; of course, plenty of other programs may change it as well.

You can use https://github.com/tebjan/TimerTool to see what the current value is. There's some more detail in https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted.
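
For a quick programmatic check, a hedged sketch using ntdll's long-stable but undocumented NtQueryTimerResolution (values are in 100ns units, so 156250 = 15.625ms; "Minimum" is the coarsest setting, "Maximum" the finest):

package main

import (
	"fmt"
	"unsafe"

	"golang.org/x/sys/windows"
)

func main() {
	ntdll := windows.NewLazySystemDLL("ntdll.dll")
	proc := ntdll.NewProc("NtQueryTimerResolution")

	var min, max, cur uint32 // all in 100ns units
	status, _, _ := proc.Call(
		uintptr(unsafe.Pointer(&min)),
		uintptr(unsafe.Pointer(&max)),
		uintptr(unsafe.Pointer(&cur)),
	)
	if status != 0 {
		panic("NtQueryTimerResolution failed")
	}
	// 10000 units of 100ns = 1ms.
	fmt.Printf("min %.3fms  max %.3fms  current %.3fms\n",
		float64(min)/10000, float64(max)/10000, float64(cur)/10000)
}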

@zobkiw commented Mar 19, 2021

For what it's worth, go1.16.2 darwin/amd64 is also exhibiting this. In a program I'm running, the line time.Sleep(1 * time.Hour) takes roughly an hour and 3 minutes each time.

@prattmic (Member) commented Mar 19, 2021

@zobkiw It sounds like that is more likely to be related to #44868. I'm also curious whether the failure is consistent and it really is always 1hr 3min, not 1hr 2min (or rather, ranging from 2-4min, since alignment with the 2min forcegc will vary)?

@zobkiw commented Mar 19, 2021

@prattmic Strangely, I just checked the output of the script now and (although I restarted it just a few hours ago) it was spot on at one hour twice in a row. However, yesterday it was "generally" 3 minutes late. I don't have exact times since we only logged the minute, but it was always 3 minutes (rounded) after the previous run: 1:00pm, 2:03pm, 3:06pm, 4:09pm, etc.

Some things to note about this loop: it calls a shell script using exec.Command before sleeping for an hour, then does it again. The script takes about 10 seconds to execute and completes its job. It is also running in a screen session, but so was the one this morning that was doing fine, so I don't think that's it. The machine it was running on, a Mac mini, was basically idle other than this lightweight job.

If you have some code you would like me to run I would be happy to - otherwise I will keep an eye on it here as well and report anything else I see. Hopefully some of this is helpful in narrowing down the cause if you haven't already.

UPDATE 3-20-2021 10am ET: I ran my program for about the last 24 hours and sure enough, it would run fine for the first few iterations and then start to sleep longer, based on the logging. Mostly it would hover between 3-5 minutes late, but once it was 9 minutes! This morning I wrote another program using the same logic but with simulated activity (since the actual tasks did not seem to be part of the problem) and much more detailed logging. I have it running now in 3 environments, two using screen and one not, on an M1 and an Intel Mac mini, both running Big Sur 11.2.3. One thing that struck me this morning was wondering whether the machine being asleep at all caused the delay, since it ran OK for a while (while I was using the machine, presumably) and then had the delays overnight. This morning, the latest reading (while I was using the machine again to install the new test code) was back to 1 hour spot on - no delay. Once I get some more data at the end of the day or tomorrow morning I will report back and share the code if there remains a problem.

@tandr commented Mar 19, 2021

3 minutes per hour is rather worrisome, especially if cron-like functionality is required in a long-running service...

@networkimprov commented Mar 19, 2021

In case it's related: on Windows, timers tick while the system is in standby/suspend, but they pause on other platforms.

This is supposed to have been addressed already in #36141.

@zobkiw commented Mar 21, 2021

@prattmic et al, I took the liberty of writing a testbed to experiment with this issue. See all the details of test runs on Linux, Intel, and M1, along with source code here:

https://github.com/zobkiw/sleeptest

Comments welcome - especially if you find a bug in the code. Also, see the results directory for the output of the tests. Regardless, these results seem strange to me. I look forward to other input on this issue.

@networkimprov commented Mar 21, 2021

cc @zx2c4 :-)

@egonelbre (Contributor, Author) commented Mar 21, 2021

@zobkiw could it be something due to CPU frequency scaling? Can you try locking the CPU to a specific (low) frequency and see whether the problem disappears?

@ChrisHines (Contributor) commented Nov 11, 2021

My understanding is that this issue is about Go 1.16+ runtime timers having worse best-case granularity than earlier versions of Go. In other words, a 50 microsecond timer always takes longer than that to fire, and also takes longer to fire than on earlier versions of Go, especially on Windows. It does not require high CPU load to trigger the problem described here; it is fundamental to changes in how timers are serviced in the runtime in Go 1.16+.

I think a fix for this issue would be focused on reducing the timer latency and would not have to reduce the runtime overhead of epollwait to address the original problem report. That's why your concern seems like a separate issue to me.

@egonelbre (Contributor, Author) commented Nov 11, 2021

I'm not aware of platforms other than Windows being affected by this issue (unless I missed an important comment). So if the performance issue happens on other platforms, it's probably something different.

@coder543 commented Nov 11, 2021

@egonelbre my entire analysis was on Mac and Linux. Mac was unaffected by these changes, but Linux on both ARM and x86 was strongly impacted by a regression in Go 1.16, as I documented there.

This isn’t just about Windows.

I would also point to my comment discussing some ideas for how to move forward with resolving this regression without, as far as I can tell, completely negating the original reasons the change was introduced.

@ChrisHines (Contributor) commented Nov 11, 2021

@egonelbre Your original post shows a regression on Debian 10 from ~80µs to ~1ms. The regression is much less than on Windows, but it's still there.

@prattmic (Member) commented Nov 11, 2021

The underlying issue here is that we depend on the netpoll timeout to wait for the next timer, but the netpoll timeout may have fairly coarse resolution (1ms on Linux, up to 15ms on Windows, depending on the current system timer settings). Timers that expire in less than that resolution can thus have comparatively high overshoot.

Due to the extremely coarse resolution on Windows, the issue is more extreme, but it still affects all systems to some extent.

@egonelbre (Contributor, Author) commented Nov 11, 2021

@ChrisHines I need to find the facepalm emote. Somehow I completely forgot, and my eyes glided over it.

@prattmic (Member) commented Nov 11, 2021

In #44343 (comment), I noted that epoll_pwait2 can pretty simply resolve the problem for Linux. It is still quite new, but if we switch to that, at least there is a clear workaround for Linux: ensure you are running on a newer kernel.

@egonelbre No worries, this issue has gotten rather long and I keep having to reread to remember as well. :)

@prattmic (Member) commented Nov 11, 2021

I've thrown together the mostly-untested https://golang.org/cl/363417 to prototype using epoll_pwait2 (only linux-amd64 implemented) if anyone would like to give it a try. You'll need Linux kernel >=5.11 to take the new code path.

@gopherbot commented Nov 11, 2021

Change https://golang.org/cl/363417 mentions this issue: runtime: use epoll_pwait2 for netpoll if available

@zhouguangyuan0718 (Contributor) commented Nov 11, 2021

Looking back at the beginning of this thread from Feb, the original discussion was that the netpoller had more latency than previous implementations. This aligned with our pprof data between 1.15.x and 1.16.x, where a large portion of time was spent in netpoll. Is that not still the case with this issue, or have I misinterpreted something?

If you focus on the time spent in netpoll, maybe this issue is related: #43997.
The recheck for timers and goroutines by non-spinning Ms can take a lot of time.
Some of our applications, such as etcd, have higher CPU usage on 1.16 than on 1.14, and the main difference we can find is the time spent in netpoll. We analysed its trace data: on 1.16 there are more proc start and stop events, which appear when the application goes from busy to idle.
Our team has backported the modification from that issue to our own Go, and the problem was resolved.
Maybe the refactoring and fix of findrunnable can be backported from 1.17 to 1.16 in an official release? @prattmic

@zhouguangyuan0718 (Contributor) commented Nov 12, 2021

By the way, could you review my CL https://go-review.googlesource.com/c/go/+/356253? It's for issue #49026.

We are testing it on our own Go, but we don't have enough confidence to use it in a production environment.
It would be helpful if you have some advice about it, or if it could be accepted officially. Thank you. @prattmic

@woidl commented Jan 19, 2022

Dear contributors,
we have the accuracy problem on different Windows 10 versions (in the current test: version 19043.1415) with Go versions after 1.15.
hrtimedemo shows avg 1ms on 1.15.15, but avg 13.8ms on 1.17.6.

This is very bad for us, since we use the sleep and ticker funcs for periodic hardware measurements. I think those funcs should always stay in the region of 1ms accuracy, otherwise they are not usable anymore - timer funcs are very important.
Isn't it possible to use the old (good) working implementation from 1.15.x?

Many thanks for your great work!

go version go1.15.15 windows/amd64
hrtimeDemo.exe
avg 1.01ms; min 77.1µs; p50 1ms; max 2.02ms;
p90 1.19ms; p99 2.02ms; p999 2.02ms; p9999 2.02ms;
77.1µs [ 2] █▌
200µs [ 0]
400µs [ 1] ▌
600µs [ 5] ████
800µs [ 34] ███████████████████████████
1ms [ 50] ████████████████████████████████████████
1.2ms [ 5] ████
1.4ms [ 0]
1.6ms [ 1] ▌
1.8ms+[ 2] █▌

go version go1.17.6 windows/amd64
hrtimeDemo.exe
avg 13.8ms; min 16.1µs; p50 15.4ms; max 16.8ms;
p90 16ms; p99 16.8ms; p999 16.8ms; p9999 16.8ms;
16.1µs [ 11] █████▌
2ms [ 0]
4ms [ 0]
6ms [ 0]
8ms [ 0]
10ms [ 0]
12ms [ 0]
14ms [ 76] ████████████████████████████████████████
16ms [ 13] ██████▌
18ms [ 0]

@gmalouf commented Jan 25, 2022

I want to pile on and say we are also experiencing this issue; it caused us to roll back to Go 1.15 for our project (performance sensitivity; the degradation is not worth any other benefits right now). The problem has been shown to be present on Linux-based distributions, so it would be great if the tagging reflected that (or the Windows-specific label were removed) to help with prioritization.

@tsachiherman commented Jan 25, 2022

I wish there were voting on this ticket, as I would have given it a +10 (just like @jhnlsn). The current labeling of this issue as Windows-specific is, in my view, misleading.

If I read what's being written here correctly, Go 1.16 introduced several independent regressions within its event-triggering subsystem:
The first issue is higher CPU pressure on CPU-intensive applications, in particular those that involve a high number of context switches and blocking operations.
This issue affects the whole range of solutions across the board, and breaks linear CPU utilization (i.e., 2x CPU won't translate into 2x work due to the newly introduced overhead).

Second, on Linux-based systems, the entire time-based event system (which includes time.Sleep, but also time.After and time.Timer) was downgraded to millisecond-level accuracy.
This regression on its own could have been tolerated if "it was always like that". Unfortunately, it puts into question any previously deployed platform using these calls, as those platforms would need to be re-evaluated and tested with these regressions in mind before they could be upgraded to 1.16 (or above).

Third, on Windows, the regression (which I have not verified myself; I'll refer to @woidl's excellent post to speak for itself) makes the usage of these calls absurdly hard to justify.
A timing precision of 15ms was acceptable 25 years ago, when writing Windows programs that waited for WM_TIMER messages. Windows has evolved considerably since, and there is no justification for such a slow mechanism (maybe with the exception of UI-related activity).

I do realize that fixing these issues might not be trivial, especially as @prattmic noted with his Linux v5.11 solution. Could I suggest that, besides prioritizing this issue, the documentation be updated to reflect the above?
As is, these functions might be reasonable for coarse-resolution timers, but they are unsuitable for high-precision timers (as were previously supported in 1.15). Documenting this would reduce false expectations about these methods.

Do you agree with my conclusions above @mknyszek @ChrisHines?

@prattmic (Member) commented Feb 3, 2022

With regards to fixes, I've thought about this more today and my thoughts are:

  • For Linux, we should use epoll_pwait2, like https://go.dev/cl/363417. This system call is on the newer side, but it will improve things going forward and provides a workaround (upgrade the kernel) for particularly affected users.
  • For Windows, I think we can do something like @jstarks' (2) in #44343 (comment). We already have a per-thread high-resolution timer, so netpoll would do the following (sketched below):
    • SetWaitableTimer, to set the timeout.
    • WaitForMultipleObjects to wait for IO or timeout.
    • GetQueuedCompletionStatusEx, nonblocking, if the prior call indicated IO readiness.
    • This seems fairly straightforward; the main concern is how much more expensive making these two extra calls is.

cc @aclements
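
To make the cost question concrete, a hedged, standalone sketch of that wait sequence using raw kernel32 calls (illustrative only: the real netpoll would wait on the timer and the IOCP handle together, and CREATE_WAITABLE_TIMER_HIGH_RESOLUTION requires Windows 10 1803+):

package main

import (
	"fmt"
	"time"
	"unsafe"

	"golang.org/x/sys/windows"
)

const (
	createWaitableTimerHighResolution = 0x00000002 // CREATE_WAITABLE_TIMER_HIGH_RESOLUTION
	timerAllAccess                    = 0x1F0003   // TIMER_ALL_ACCESS
)

func main() {
	kernel32 := windows.NewLazySystemDLL("kernel32.dll")
	createTimer := kernel32.NewProc("CreateWaitableTimerExW")
	setTimer := kernel32.NewProc("SetWaitableTimer")

	h, _, err := createTimer.Call(0, 0, createWaitableTimerHighResolution, timerAllAccess)
	if h == 0 {
		panic(err)
	}
	timer := windows.Handle(h)
	defer windows.CloseHandle(timer)

	due := -int64(50 * time.Microsecond / 100) // negative = relative, in 100ns units
	if r, _, err := setTimer.Call(uintptr(timer), uintptr(unsafe.Pointer(&due)), 0, 0, 0, 0); r == 0 {
		panic(err)
	}

	start := time.Now()
	// The proposal would use WaitForMultipleObjects on {timer, IOCP} and
	// then a nonblocking GetQueuedCompletionStatusEx; waiting on the
	// timer alone is enough to measure the achievable precision.
	windows.WaitForSingleObject(timer, windows.INFINITE)
	fmt.Println("woke after", time.Since(start))
}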

@prattmic prattmic removed this from the Backlog milestone Feb 18, 2022
@prattmic prattmic added this to the Go1.19 milestone Feb 18, 2022
@aclements (Member) commented Feb 22, 2022

This seems fairly straightforward; the main concern is how much more expensive making these two extra calls is.

If we're willing to assume that long timers require less precision, we could somewhat reduce this cost by only using the high-resolution timer if the timeout is short, and otherwise using the low-precision timeout to the blocking GetQueuedCompletionStatusEx.

We could go even further and record the precision when a timer is enqueued, so we know at creation time how precise we have to be once it bubbles to the top of the heap. However, this is complicated because the top timer might be a low-precision timer firing soon, immediately followed by a high-precision timer, and we actually need a high-precision wait to avoid missing the second timer.
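
Sketched, with every name hypothetical rather than from the runtime:

package sketch

import "time"

// All names below are illustrative, not runtime identifiers.
const highResThreshold = 16 * time.Millisecond // ~Windows default tick

func waitForNextTimer(d time.Duration, highResWait, coarseWait func(time.Duration)) {
	if d < highResThreshold {
		// Short timer: pay the extra syscalls for precision
		// (e.g. SetWaitableTimer + WaitForMultipleObjects).
		highResWait(d)
	} else {
		// Long timer: a coarse (1-15.6ms granularity) blocking wait
		// is fine; being ~15ms late on a minutes-long timeout is noise.
		coarseWait(d)
	}
}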

@CannibalVox commented Mar 7, 2022

If we're willing to assume that long timers require less precision, we could somewhat reduce this cost by only using the high-resolution timer if the timeout is short, and otherwise using the low-precision timeout to the blocking GetQueuedCompletionStatusEx.

We could go even further and record the precision when a timer is enqueued, so we know at creation time how precise we have to be once it bubbles to the top of the heap. However, this is complicated because the top timer might be a low-precision timer firing soon, immediately followed by a high-precision timer, and we actually need a high-precision wait to avoid missing the second timer.

Would this involve returning to 1ms precision on the regular timer? There are probably many cases where +/- 8ms is acceptable, but I doubt they account for the majority of sleeps in any given Go program.

@woidl commented Mar 8, 2022

Hi, I still hold the opinion that 1ms resolution is needed.
I hope you can solve the issue (still approx. 14ms resolution on Windows 10/11 with go1.17.8 windows/amd64) in the near future.
Thank you all for doing a great job!

@aclements (Member) commented Mar 8, 2022

Would this involve returning to 1ms precision on the regular timer? There are probably many cases where +/- 8ms is acceptable, but I doubt they account for the majority of sleeps in any given Go program.

I'm not sure what you mean by a "regular" timer. If it's a short sleep or timeout, we would use a slightly more CPU-expensive way to wait that has higher precision. Hopefully we can do that without timeBeginPeriod.

I suspect it's actually quite common for programs to have long timers. RPC-driven programs often have a large number of long-duration timers for timeouts (often on the order of minutes). They also often go idle, which would enter a blocking netpoll the same way a sleep does.

@CannibalVox commented Mar 8, 2022

I'm not sure what you mean by a "regular" timer. If it's a short sleep or timeout, we would use a slightly more CPU-expensive way to wait that has higher precision. Hopefully we can do that without timeBeginPeriod.

I suspect it's actually quite common for programs to have long timers. RPC-driven programs often have a large number of long-duration timers for timeouts (often on the order of minutes). They also often go idle, which would enter a blocking netpoll the same way a sleep does.

I see, you're right. The main concern I have, then, is that semaphores (and therefore notesleep/notewakeup) are still based on the system timer granularity. I understand that this is the "slow" path, but there's a difference between a 1ms futex wait and a 16ms (when unmodified) single-object wait.

tsachiherman added a commit to algorand/go-algorand that referenced this issue Mar 21, 2022
## Summary

This PR upgrades the go-algorand repository to use Go 1.16 while adding workarounds for golang/go#44343.

## Test Plan

Use performance testing to verify no regressions.