runtime: windows system clock resolution issue #8687

Open
defia opened this issue Sep 9, 2014 · 81 comments

@defia defia commented Sep 9, 2014

What does 'go version' print?

go version go1.3.1 windows/amd64

actually it's just the same as the issue in chrome:
https://code.google.com/p/chromium/issues/detail?id=153139

What steps reproduce the problem?

1. Download and run the system clock resolution tool linked from
https://code.google.com/p/chromium/issues/detail?id=153139
It will show your current timer interval.
2. If the current timer interval is smaller than 15.625ms, close running programs until it
goes back to 15.625ms.
3. Run any program compiled by Go.
3.run any program compiled by go

What happened?
The current timer interval goes to 1ms while a Go-compiled binary is running, and returns to 15.625ms
after it quits.

What should have happened instead?
The current timer interval should not change.

At least, when the time package is not used, the timer setting should not change, or there should
be a way to control this.

@ianlancetaylor ianlancetaylor commented Sep 9, 2014

Comment 1:

Labels changed: added repo-main, release-go1.4, os-windows.

@alexbrainman alexbrainman commented Sep 10, 2014

Comment 2:

I confirm what DefiaQ said.
The problem is that we call the timeBeginPeriod Windows API. It was introduced as part of
the pprof implementation:
changeset:   9786:9c5c0cbadb4d
user:        Hector Chu <hectorchu@gmail.com>
date:        Sat Sep 17 17:57:59 2011 +1000
summary:     runtime: implement pprof support for windows
but I view it as "Windows timer precision is very low (about 15ms) compared to
other OSes, so let's get the best precision we can (1ms)".
timeBeginPeriod has many problems. It does not work on some systems (it does not work on
my oldish Windows XP). I suspect it fails if you don't have the correct access rights on
the system (it works for administrator). When successfully set, timeBeginPeriod will
drain laptop batteries faster (I have read about it everywhere, but I haven't had a chance to
see it in practice). But I suspect many other applications call timeBeginPeriod
too, so Go is not the only one at fault here.
Personally I don't care if timeBeginPeriod is set, but others might feel differently.
Perhaps there is a case here for some way to control it from within a Go program.
Alternatively, people who care can just remove the timeBeginPeriod call in their own
runtime.
Alex

@defia defia commented Sep 10, 2014

Comment 3:

It will succeed even without administrator privileges.
So far I haven't found any application other than Chrome that changes the timer precision.

@alexbrainman alexbrainman commented Sep 10, 2014

Comment 4:

Well, I am not running chrome but my timer interval is set to 1ms.
Also, why did you say "... if current timer interval smaller than 15.625ms,try to close
running programs until it goes to 15.625ms ..."? You should have said "... close chrome
...".
Alex

@defia defia commented Sep 10, 2014

Comment 5:

I also suspect other applications will change the setting, though I haven't found any so
far.

@rsc rsc commented Sep 18, 2014

Comment 6:

Too late for 1.4. Maybe for a future release, if someone wants to do the work to prove
that it matters and that removing this is safe. I think without a clear report about bad
implications we won't do that work ourselves.

Labels changed: added release-none, removed release-go1.4.

Status changed to Accepted.

@defia defia added accepted labels Sep 18, 2014

@defia defia commented Dec 9, 2014

I made a package that calls timeEndPeriod on initialization to reset the timer to normal.

If the program doesn't need much performance but runs on a Windows laptop for a long time, just import the package.
I've been using this for some days, and everything seems fine so far.

@defia defia commented Dec 11, 2014

I've done some benchmarks today.
At first I ran a CPU-heavy program and modified it to use hundreds of goroutines. To my surprise, it actually ran ~5% faster when I set the system clock resolution to 15.6ms (the default value).

Then I ran this benchmark
I didn't look into the code; I guess it uses only one goroutine. It is still purely CPU-bound, and I got the same result.

Then I wrote an HTTP benchmark here
This time it ran a lot faster (~50%) with the 1ms system clock resolution.

So I guess that if your program doesn't do much I/O (which means fewer goroutine context switches), you can discard the system clock resolution change.

@minux minux commented Dec 11, 2014

Could we use QueryPerformanceCounter for pprof support?

@alexbrainman alexbrainman commented Dec 11, 2014

And how are you going to use QueryPerformanceCounter in pprof support?

Alex

@mattn mattn commented Dec 12, 2014

cyclesPerSecond uses systime in os1_windows.go

https://github.com/golang/go/blob/master/src/runtime/os1_windows.go#L314-L316

https://github.com/golang/go/blob/master/src/runtime/os1_windows.go#L286-L306

This part could be replaced with QueryUnbiasedInterruptTime (not QueryPerformanceCounter).

@randomascii randomascii commented Jun 12, 2015

I've blogged about this issue in the past (https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/) and I wanted to share what I have learned:

As defia pointed out, having a higher timer frequency wastes CPU time and can make some CPU heavy programs run more slowly. This is because the OS is waking up ~15 times more frequently, consuming CPU time each time it does. It also means that other programs may wake up more frequently and waste CPU time - some software sits in Sleep(1) loops and will wake up ~15 times more frequently when the timer resolution is changed. So, raising the timer frequency can waste performance.

Raising the timer frequency will almost always waste some power and will harm battery life. This is the kicker. This is why no program should ever raise the timer frequency unless it needs the timer frequency raised. The timer frequency is a global setting and raising it without a good reason is poor citizenship.

If you have a work loop that relies on Sleep(1) or timeouts from WaitForSingleObject(handle, 1) then raising the timer frequency will make your loop run faster. Presumably that is why the http test ran faster. Most software should not be relying on timeouts, it should be running based on events. But, it is reasonable for specific go programs to raise the timer frequency if needed.

It is not reasonable for the go runtime to unilaterally raise timer frequency.

@randomascii randomascii commented Jun 12, 2015

With the background information out of the way...

I discovered this issue because I noticed (using clockres.exe) that the timer frequency on my computer was always raised. I used "powercfg /energy /duration 5" to identify the culprit, a go program that was running in the background. This go program was monitoring system status and was idle about 99.9% of the time but because of the go runtime's decision to raise the timer frequency for all go programs this background program was measurably affecting power consumption and battery life. This 99.9% idle go program was affecting my computer more than a program that was actively running.

This was clearly a case where the program did not need a higher timer frequency. Raising the timer frequency is a reasonable option for the go runtime, but it is absolutely not a reasonable default for the go runtime.

Chrome has been fixed to not leave the timer frequency high, so has SDL, and so have many other products. Go needs to do likewise. Please.

@alexbrainman alexbrainman commented Sep 15, 2015

@randomascii

I appreciate and share your concerns. So I tried to measure the impact of removing the timeBeginPeriod call on my Windows 7 amd64 computer (see bench.txt in https://gist.github.com/alexbrainman/914885b57518cb4a6e39 ). Unfortunately I can see quite significant slowdowns here and there. The change looks like a no go for me. Maybe gophers who know more about the Go scheduler can explain the slowdowns and suggest ways to improve the situation.

For the moment, if someone cares about this, they can comment out the call to timeBeginPeriod.

Alex

@randomascii randomascii commented Sep 15, 2015

I know that some video applications (games, Chrome, a few others) have to leave the frequency high in order to accurately schedule tasks and meet frame rates. For other programs I don't understand the need to have the timer frequency high. I would imagine that tasks would typically be happening as quickly as possible (reading files, doing calculations, passing messages) and I'm curious as to what sort of timeouts, or waits, or sleeps are being affected by the timer frequency.

Anyway, if some Go programs need the timer frequency then those Go programs should raise the timer frequency. I have no problem with that. But if the Go runtime raises the timer frequency then that is a problem. We could debate whether raising the timer frequency should be opt-in (my strong vote) or opt-out, but it should be a choice. Otherwise long-running low-activity Go programs will be blacklisted by anybody who wants good battery life.

Go deserves better than that.

@alexbrainman alexbrainman commented Sep 15, 2015

> I would imagine that tasks would typically be happening as quickly as possible ...

Things are never simple; it is always about trade-offs. But I don't know enough about the Go runtime to give you an explanation, and I am hopeful that other gophers will comment. All I can say is that Windows makes the task much harder when everything is measured in 15 milliseconds (compared to nanoseconds on Linux) - imagine writing a scheduler that performs equally well on both. But maybe the Go runtime can be improved on Windows, and we can disable the timeBeginPeriod call.

Alex

@randomascii randomascii commented Sep 15, 2015

FWIW, an ETW trace (see https://github.com/google/UIforETW/releases for the easiest way to record one) can be used to record, among other things, all context switches. This makes it possible to determine where applications are waiting.

The Windows timer resolution is a continuing source of frustration to be sure, but it only slows down code that calls Sleep(n) or WaitForSingleObject(handle, n) (where 'n' is not INFINITE).

Looking at bench.txt I see that *OneShotTimeout-2 runs slower with a lower timer frequency. That sounds like it is by-design and shouldn't be used to justify all Go programs running with a high timer frequency. Ditto with BenchmarkGoroutineIdle-2.

BenchmarkAppendGrowByte-2 runs 368% slower which suggests a benchmark bug. Either the code being tested is waiting (which it should not do) or the code that times is using timeGetTime and is stopping and starting the timer frequently and is therefore hitting aliasing errors.

I strongly suspect that the latter is the problem. timeGetTime's precision is increased when the timer frequency is increased. The benchmarks with the lower timer precision are mostly not slower; they are just less accurate. The correct fix is not to raise the timer frequency (the timers will still be inaccurate) but to use accurate timers such as QueryPerformanceCounter.

It is a frustrating problem I'll agree, but raising the timer frequency so that benchmarks can measure the execution time better is not a great solution.

If it turns out that the benchmarks such as BenchmarkAppendGrowByte-2 are actually slower when the timer frequency is lower then I will be surprised and might actually investigate. Fixing slow benchmarks requires understanding why they are slow.

danhhz pushed a commit to danhhz/cockroach that referenced this issue May 28, 2019
time.Now() has a precision of about 1ms on Windows[0], which is coarse
enough to fail some of our tests. This implementation has a precision
of about 10µs, which is fine enough to trip a hastily-written test in
storage; that test is also fixed here.

[0] golang/go#8687

Note that this implementation is significantly slower than time.Now:
```
name   old time/op  new time/op   delta
Now-2  14.0ns ± 3%  339.4ns ± 4%  +2332.32%  (p=0.000 n=9+9)
```
I suspect this is because of the explicit syscall required. time.Now
avoids the syscall by reading the system clock directly[1][2], so if
we need better performance, we can do something similar.

[1] golang/go@74b62b4
[2] https://github.com/golang/go/blob/go1.8/src/runtime/os_windows.go#L581:L624

(sapling split of 3fc5c81)
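A quick way to see what step size time.Now has on a given machine is to spin until its value changes. This is a minimal sketch, not the implementation from the commit above; smallestStep is a name made up here, and the result depends on whichever clock source the Go runtime uses on the platform.

```go
package main

import (
	"fmt"
	"time"
)

// smallestStep spins until time.Now reports a later time than the previous
// reading and returns the smallest such step seen over samples attempts.
func smallestStep(samples int) time.Duration {
	var best time.Duration
	for i := 0; i < samples; i++ {
		start := time.Now()
		next := time.Now()
		for !next.After(start) {
			next = time.Now()
		}
		if step := next.Sub(start); best == 0 || step < best {
			best = step
		}
	}
	return best
}

func main() {
	fmt.Println("smallest observed step of time.Now:", smallestStep(100))
}
```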

@randomascii randomascii commented Nov 28, 2019

I am a Windows developer working at Google. I checked recently and I found ten (10!) different programs on my work machine that were all raising the Windows system-global timer interrupt frequency to 1 kHz. I raised this with our winops team who is responsible for creating and pushing these tools and they said that it is likely that these are all Go tools.

This means that my machine is essentially always running with a raised timer interrupt frequency. This then makes it quite difficult to do any testing of Chrome's behavior with the timer interrupt frequency not raised.

The winops team is not thrilled about having to apply mitigations to every tool which they write in order to stop the Go Runtime from doing this. In addition, all that they can do is revert the timer frequency change after it happens, which means that it still happens briefly whenever a Go program is launched.

Can we please get the Go Runtime to stop doing this? It's a bad default. If Go ever becomes popular on Windows then it will need to be fixed to avoid globally increasing power usage and it would be better to fix it now.

Any regressions from fixing it necessarily come from inappropriately designed benchmarks which are inserting delays and hoping that they will be short.

If specific Go programs need the timer frequency raised then they should raise it themselves.

The internal Google bug number that I filed is 133354039.

This is the script that I have written in order to record these timer frequency transitions:
https://github.com/google/UIforETW/blob/master/bin/trace_timer_intervals.bat

@randomascii randomascii commented Nov 28, 2019

It was just pointed out to me that the Go Runtime was changed to only raise the timer interrupt frequency on demand.

11eaf42

This is an improvement, to be sure, but it is still not ideal. Because the Windows timer interrupt is a system global resource, any changes which Go makes to the timer interrupt frequency will affect other programs. That is, due to the many Go programs which our winops team runs on my workstation I can never test what Chrome behaves like when the timer interrupt frequency is "normal". Our winops team doesn't want or need the raised timer frequency and would appreciate a way to opt-out (or, better yet if the timer adjustments were opt-in).

Note that if the osRelax() implementation was behind a flag then this would also give an easy fix to this issue:

#28255

TL;DR - automatically changing the system-global timer interrupt frequency is not universally desirable and should be opt-in (ideally) or opt-out (next best). Either solution, but especially opt-in, would fix both this issue and issue 28255.

Thanks!

@ianlancetaylor ianlancetaylor commented Dec 2, 2019

Let's revisit in the 1.15 release cycle.

We started using a higher resolution in https://golang.org/cl/4960057, which added support for profiling on Windows. Perhaps we could at least switch to the higher resolution only while profiling, not all the time.

@aclements aclements commented Dec 3, 2019

I think we're all in agreement that the runtime shouldn't be raising the system timer frequency. The question is what we should be doing instead.

We tried removing the timeBeginPeriod in 2016 and it caused severe regressions in some benchmarks (#14790) because it affected sysmon's ability to promptly retake Ps from system calls and C calls. We're not doing this because specific Go applications depend on the higher frequency. If that were the case, I'd be happy to give them the API to ask for it. We're doing this because the Go runtime itself depends on the higher frequency. This doesn't affect all Go applications, but it's hard to predict what it will affect; it's clearly not just the games and multimedia applications that typically depend on a higher timer frequency.

I'd be happy to be told this is no longer the case, but as far as I know the solution is still to use UMS to detect blocked system calls rather than short timers (this would be great, in fact; UMS is a far better solution than short timers, just nobody has tackled that complexity yet). Until that's implemented, my understanding is that simply removing the timeBeginPeriod calls isn't viable.

@ianlancetaylor ianlancetaylor commented Dec 3, 2019

Have we tried changing Windows so that entersyscall acts like entersyscallblock? It seems that then the sysmon timer frequency would be less important, though of course there would be a performance penalty for every non-blocking system call.

@johnrs johnrs commented Dec 3, 2019

I've been out of the loop for a year or two, so forgive me for not being up to date on recent changes.

> If specific Go programs need the timer frequency raised then they should raise it themselves.

It depends on your priorities. Some need the higher resolution a faster clock offers and aren't as concerned about the extra overhead it costs.

Some other languages do this, too. Perl comes to mind (last time I checked it was about 5 years ago). And there are some other programs which set it fast, sometimes dynamically (as Go does now), for example Chrome.

As far as I can remember: Initially Go set it fast, always. But the user could slow it down. Most were happy.

To make the others happy, Go set it slow after start up. This made them happy but it upset the other camp. For example, this broke 2 of my programs. But the user could force it fast, so it was easy to patch.

Then the Go runtime decided it needed to be fast from time to time for internal reasons, so it dynamically changed it. But this wasn't coordinated with any user interface, so the user lost the ability to slow it. Some from both camps (fast and slow) were upset.

Then Austin came up with the great idea to set it dynamically, when sleep and some other time related functions were called. This made most happy. The slow camp only saw occasional unwanted speedups which automatically reverted about 50ms later. The first time called, however, there could be a long (2 x 15.6ms, if memory serves) delay. This troubled the fast group but they found ways to keep it always fast (my favorite is to make a goroutine that does a 10ms sleep in a tight loop). Almost everyone was happy.

If a stable environment for testing is what you need, then one way is to run a go "utility" which stays in fast mode (as I just described) for the duration of the test, because if ANYONE on the system asks for fast then EVERYONE is affected. On the other hand, if your program runs in a pristine environment and is certain that nobody will do this, then I'd think there is no problem at test time either, as long as you test in the same environment. :)

A work-around might be to give the user the ability to enable (default) or disable the dynamic speedup done based on time-related calls. This wouldn't block what the runtime does for itself, however.

Side note: Why do some want the slow clock? Embedded systems trying to save power and high performance systems who don't want the extra OS overhead. Am I missing any? Have you done an estimate of what percentage effect is incurred by fast vs slow? I realize that situations vary, but some idea of the average and worst case would be interesting, I think.

Another side note: An additional consideration is a problem I mentioned a few years ago. Portable code might require more resolution than Windows is capable of (1ms). The greater the disparity (15.6ms), the higher the odds that something bad might happen. The 2 programs I mentioned above both exhibited this. They ran fine on a Linux system, and fine on Windows fast (1ms) - mainly because I had allowed for a clock as slow as 1ms. But they both failed on Windows slow (15.6ms). So the Go compatibility goals are a consideration also.

Side note: The Go runtime does internal timing much faster than 1ms in a number of places. I asked then whether they were safe with a slower (either 1 or 15.6ms) clock. I don't recall anyone ever looking into that.

And then there's the other issue that "always longer than requested" is incorrect. It should say "from 1 tick (whatever that is on your system) less to something longer". This is most troublesome at the 1 tick boundary, of course. For example, on a fast (1ms) Windows system, if you ask for a 1ms sleep, it might sleep anywhere from 0 to 2ms (or longer if it's a busy system). It's the 0ms side that scares me.

@randomascii randomascii commented Dec 10, 2019

@ToadKing ToadKing commented Dec 10, 2019

I second the power saving/battery life issue. We ship a Windows program built in Go that is long-running and in the background most of the time. It spends most of the time idle or only waking up and doing very small and quick processing before going idle again. Our use-case would 100% take the power saving over faster performance that a faster clock might bring.

@johnrs johnrs commented Dec 10, 2019

I like the idea of an environment/runtime variable. Some (all?) of the problems between the Go runtime and the Go user could be resolved this way. It seems like the best we can hope for.

@randomascii randomascii commented Feb 5, 2020

I found another data point to answer the question of why this matters. I just found that the performance measurement test machines used at Google to test Chromium have long-running Go programs on them. This means that the timer interrupt frequency is sometimes/frequently/always raised, which means that Chrome behaves slightly differently on those machines compared to normal consumer machines. This then makes it more difficult for us to validate changes that we make to Chrome to make it less dependent on a raised timer interrupt frequency, because the timer interrupt frequency will still be raised.

FWIW.

@aclements aclements commented Jul 9, 2020

I just reviewed this issue from the top and my understanding (which hasn’t really changed since December) is that this remains blocked on implementing UMS for blocked syscall detection (#7876). Once we’ve done that, we believe the runtime itself will no longer depend on the high timer frequency, and we can drop timeBeginPeriod from the runtime (and probably expose it to applications that need it).

Unfortunately, we have very little Windows expertise on the runtime team at Google (in a somewhat ridiculous twist, our Windows laptop is in a drawer in my office, which remains inaccessible for the foreseeable future) and it’s difficult to solicit complex changes like this from open source contributors (though that would be extremely welcome!). I’m looking for solutions to this.

Ian suggested using entersyscallblock for all syscalls on Windows, though this would come at a performance penalty for all system calls. I don’t believe this has been tried.

History up to now

The call to timeBeginPeriod was added in 2011 as part of CPU profiling support. However, the runtime had several tight sleep loops and, as a result, increasing the timer frequency became important to the performance of the runtime and hence many Go programs.

In 2015, we'd eliminated nearly all of these sleep loops and it appeared the runtime no longer depended on higher timer frequency, so the frequency adjustment was removed. Unfortunately, it turned out that some benchmarks still suffered (#14790), so timeBeginPeriod was added back. @jstarks's analysis showed that this was because of the sysmon loop that detects blocked system calls. Hence, @alexbrainman suggested that the way to resolve this is to resolve #7876 by switching to user-mode scheduling to detect blocked system calls.

We made some progress on this in 2017 by reducing the timer frequency when the whole process goes idle. This improved the situation, but is far from ideal because the timer frequency is still regularly raised, causing battery drain and interfering with testing of other applications that also need to manipulate the timer frequency (e.g., Chrome).

However, the fundamental issue with detecting blocked system calls in the runtime that led to #14790 remains. Windows UMS is the right way to fix this (indeed, I wish other OSes had an equivalent to UMS; our blocked system call detection just targets the lowest common denominator right now). It seems likely that UMS support will obviate the need for a high timer frequency in the Go runtime itself.

@aclements aclements modified the milestones: Go1.15, Go1.16 Jul 9, 2020

@jstarks jstarks commented Jul 9, 2020

Some quick thoughts, from my perspective inside the Windows organization. Although UMS seems like a good fit, I don't believe it will actually work that well for this use case. I can't recall the details right now, but for example I have found that UMS thread switches that should be cheap sometimes result in expensive cross-processor context switches. I looked into fixing this in Windows, but in general UMS is considered deprecated technology, with support only on a subset of x86_64 processors (no planned support for x86 or ARM32 or ARM64).

So while I'd love to make UMS work for this case, it would likely require Windows changes to be effective, and we're not likely to make these Windows changes soon.

However, some good news: newer versions of Windows (from roughly 2019 onward) support high-resolution timers without increasing the timer frequency. There is a (currently undocumented, working to fix this) flag to CreateWaitableTimerExW, CREATE_WAITABLE_TIMER_HIGH_RESOLUTION, that essentially creates a timer that expires when you ask it to instead of aligning to a tick. So a less invasive fix to this problem might be to use this on operating systems that support it.

From the public SDK:

```c
#define CREATE_WAITABLE_TIMER_MANUAL_RESET  0x00000001
#if (_WIN32_WINNT >= _NT_TARGET_VERSION_WIN10_RS4)
#define CREATE_WAITABLE_TIMER_HIGH_RESOLUTION 0x00000002
#endif
```

@alexbrainman alexbrainman commented Jul 10, 2020

@jstarks

> There is a (currently undocumented, working to fix this) flag to CreateWaitableTimerExW, CREATE_WAITABLE_TIMER_HIGH_RESOLUTION, that essentially creates a timer that expires when you ask it to instead of aligning to a tick.

I take it you propose to change runtime.usleep2 function

```
// Runs on OS stack. duration (in 100ns units) is in BX.
// The function leaves room for 4 syscall parameters
// (as per windows amd64 calling convention).
TEXT runtime·usleep2(SB),NOSPLIT|NOFRAME,$48
	MOVQ	SP, AX
	ANDQ	$~15, SP	// alignment as per Windows requirement
	MOVQ	AX, 40(SP)
	// Want negative 100ns units.
	NEGQ	BX
	LEAQ	32(SP), R8	// ptime
	MOVQ	BX, (R8)
	MOVQ	$-1, CX	// handle
	MOVQ	$0, DX	// alertable
	MOVQ	runtime·_NtWaitForSingleObject(SB), AX
	CALL	AX
	MOVQ	40(SP), SP
	RET
```

Am I correct?

How would I use CreateWaitableTimerExW to do that? CreateWaitableTimerExW just creates a handle. How do I use the handle to sleep for intervals of less than 1 ms?

Thank you.

Alex

@jstarks jstarks commented Jul 10, 2020

Unfortunately you will now need multiple syscalls, which will increase the overhead of this call (but maybe that's OK since you're sleeping anyway). You'll need to call SetWaitableTimer on the timer handle with the desired due time, and then you can call WaitForSingleObject (or NtWaitForSingleObject, doesn't matter) without a timeout.

Of course, you'll need a separate waitable timer pre-allocated for each thread that might be doing this. Is that one per M?

@zx2c4 may be interested in this thread.

@zx2c4 zx2c4 commented Jul 10, 2020

Thanks for @-ing me in @jstarks.

Right, it looks like usleep2 is executed on an m using its g0 stack. In this case, we'd just add the pre-allocated handle to mOS, where we already have some semaphores and locks and whatnot. The usleep2 code is already invoked via function address. So, in the event that CreateWaitableTimerExW(CREATE_WAITABLE_TIMER_HIGH_RESOLUTION) is successful and populates the member in mOS, we'd just update the value of usleep2Addr to point to usleep2_highres or whatever. Seems doable.
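The selection logic described above can be sketched in plain Go. All names here (mState, sleepTickBased, sleepHighRes, chooseSleep) are hypothetical stand-ins, not runtime identifiers, and the timer creation is injected as a function so the sketch runs on any platform.

```go
package main

import "fmt"

// mState is a stand-in for the per-thread state (mOS in the runtime).
type mState struct {
	highResTimer uintptr // 0 means no high-resolution timer is available
}

// sleepFunc is the shape of a sleep implementation. It returns a label
// instead of sleeping so the choice can be observed.
type sleepFunc func(m *mState, us uint32) string

func sleepTickBased(m *mState, us uint32) string { return "tick-based sleep" }
func sleepHighRes(m *mState, us uint32) string   { return "high-resolution timer sleep" }

// chooseSleep tries to create a high-resolution timer. On success it stores
// the handle in m and selects the timer-based implementation; otherwise it
// falls back to the existing tick-based one.
func chooseSleep(m *mState, createTimer func() (uintptr, bool)) sleepFunc {
	if h, ok := createTimer(); ok {
		m.highResTimer = h
		return sleepHighRes
	}
	return sleepTickBased
}

func main() {
	newer := func() (uintptr, bool) { return 0x1234, true } // flag supported
	older := func() (uintptr, bool) { return 0, false }     // flag rejected

	var m1, m2 mState
	fmt.Println(chooseSleep(&m1, newer)(&m1, 20))
	fmt.Println(chooseSleep(&m2, older)(&m2, 20))
}
```

Deciding once at start-up keeps the fallback for older Windows versions out of the hot path, which matches the idea of swapping the function address rather than branching on every sleep.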

Biggest issue potentially is extra overhead from two syscalls. But the timer period would start with the first call to SetWaitableTimer anyway, so maybe indeed it doesn't really matter. After all, sleeps are always expensive since they tend to hit a scheduler.

@alexbrainman alexbrainman commented Jul 14, 2020

> You'll need to call SetWaitableTimer on the timer handle with the desired due time, and then you can call WaitForSingleObject (or NtWaitForSingleObject, doesn't matter) without a timeout.

Thank you. We already use SetWaitableTimer in the runtime profiler. So that should be fine.

> Unfortunately you will now need multiple syscalls, which will increase the overhead of this call (but maybe that's OK since you're sleeping anyway).

Let's implement the code first, and then we can decide about this. We might do some optimisations; for example, we could call the old code for longer wait times, and only call the new code for short waits. Or something like that.

I tried running your code https://play.golang.org/p/Zh8F6iqJOE from #14790 (comment) to reproduce #14790. I had to set GOMAXPROCS=1, otherwise I could not reproduce #14790.

If I run #14790 test against current tip, I see this:

```
c:\Users\Alex\dev\src\issue\go\14790>set GOMAXPROCS=1

c:\Users\Alex\dev\src\issue\go\14790>go test -bench=. -count=10 > b && %GOPATH%\bin\benchstat b
name               time/op
DefaultResolution  1.08ms ± 2%
1ms                1.09ms ± 1%

c:\Users\Alex\dev\src\issue\go\14790>
```

But if I change runtime.osRelax, like this

```diff
diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index a584ada702..a485571003 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -407,6 +407,8 @@ const osRelaxMinNS = 60 * 1e6
 // if we're already using the CPU, but if all Ps are idle there's no
 // need to consume extra power to drive the high-res timer.
 func osRelax(relax bool) uint32 {
+       return 0
+
        if relax {
                return uint32(stdcall1(_timeEndPeriod, 1))
        } else {
```

I see this test output:

```
c:\Users\Alex\dev\src\issue\go\14790>go test -bench=. -count=10 > b && %GOPATH%\bin\benchstat b
name               time/op
DefaultResolution  9.35ms ±100%
1ms                1.09ms ± 2%

c:\Users\Alex\dev\src\issue\go\14790>
```

So, yes, we can still reproduce #14790, if we stop calling timeBeginPeriod from runtime.

Next we should replace timeBeginPeriod with code that uses CreateWaitableTimerExW(CREATE_WAITABLE_TIMER_HIGH_RESOLUTION) to sleep, and see if #14790 is still broken.

I will do that when I have time, unless someone beats me to it.

Did I miss anything?

Alex
