testing: benchmark is using low resolution time on Windows #31160

Open
egonelbre opened this Issue Mar 30, 2019 · 11 comments

@egonelbre
Contributor

commented Mar 30, 2019

time.Now has a coarse resolution on Windows, so benchmark results end up with a timing error.

As a demonstration here's code that measures QueryPerformanceCounter vs. time.Now: https://play.golang.org/p/FN4AtJ9b51m.

f:\Go\src\sandbox\benchmark>go version
go version devel +2bd767b102 Thu Mar 28 09:36:43 2019 +0000 windows/amd64

f:\Go\src\sandbox\benchmark>go run main.go
                             50%       90%       99%
time.Now    Z=9999935    1000000   1005100   1014600
QPC         Z=0              100       200       200

This roughly says that I'm getting time.Now measurements at 1000000ns (1ms) granularity.

Fixing nanotime would be problematic (as seen in #8687); however, benchmarks could use QueryPerformanceCounter instead.

@egonelbre egonelbre changed the title testing: benchmark is using time.Now on Windows for measuring time. testing: benchmark is using low resolution time on Windows for measuring time. Mar 30, 2019

@egonelbre

Contributor Author

commented Mar 30, 2019

I could make an easy patch like I did in https://github.com/rakyll/hey/blob/master/requester/now_windows.go#L24. However, since runtime already contains definitions for QPC, I was wondering whether it can be organized better.

@egonelbre

Contributor Author

commented Mar 30, 2019

As a more practical example:

func Benchmark(b *testing.B) {
	for i := 0; i < b.N; i++ {
		fmt.Sprintf("hello world")
	}
}

As we vary benchtime it shows quite a significant timing error:

f:\Go\src\sandbox\benchmark>go test -bench . -benchtime 1000x .
goos: windows
goarch: amd64
pkg: sandbox/benchmark
Benchmark-32                1000                 0.00 ns/op
PASS
ok      sandbox/benchmark       0.261s

f:\Go\src\sandbox\benchmark>go test -bench . -benchtime 100000x .
goos: windows
goarch: amd64
pkg: sandbox/benchmark
Benchmark-32              100000                80.4 ns/op
PASS
ok      sandbox/benchmark       0.285s

f:\Go\src\sandbox\benchmark>go test -bench . -benchtime 1000000x .
goos: windows
goarch: amd64
pkg: sandbox/benchmark
Benchmark-32             1000000                67.0 ns/op
PASS
ok      sandbox/benchmark       0.342s

@egonelbre egonelbre changed the title testing: benchmark is using low resolution time on Windows for measuring time. testing: benchmark is using low resolution time on Windows Mar 30, 2019

@networkimprov

commented Mar 31, 2019

Related: #29714

@bcmills

Member

commented Apr 12, 2019

@egonelbre

Contributor Author

commented Apr 17, 2019

Ping.

I could make a PR and discuss a specific fix, if that requires less effort from core maintainers.

@josharian

Contributor

commented Apr 17, 2019

I don't have much to say about this. A higher resolution benchmarking timer certainly seems desirable. What are the downsides?

cc @alexbrainman for windows

@egonelbre

Contributor Author

commented Apr 17, 2019

Some duplicated code for Windows, plus handling to keep other platforms compatible with it.

Specifically the QPC part in runtime:

// useQPCTime controls whether time.now and nanotime use QueryPerformanceCounter.

That is, unless there's a nice way to share that code with the testing package.

@josharian

Contributor

commented Apr 17, 2019

We could use //go:linkname to get access to it. Or an internal package, perhaps.

@egonelbre

Contributor Author

commented Apr 17, 2019

After poking at it a bit, it seems that nanotimeQPC currently works only for Wine, since the implementation converts a division into a multiplication that doesn't work in the general case.

The usual way to convert a QPC value is result = qpcCounter * 1e9 / qpcFrequency. Currently the implementation uses qpcMultiplier = 1e9 / qpcFrequency and then result = qpcCounter * qpcMultiplier, which is exact only when the frequency evenly divides 1e9 (for example, when it is a power of 10).

https://github.com/golang/go/blob/master/src/runtime/os_windows.go#L447
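The two conversion formulas can be compared directly. The following is an illustrative sketch, not the runtime code: `viaMultiplier` and `viaDivision` are hypothetical names, and both frequencies are assumed values chosen for the example. It relies on the multiplier being computed with integer division, which truncates whenever the frequency does not divide 1e9 evenly.

```go
package main

import "fmt"

const nsPerSecond = 1_000_000_000

// viaMultiplier precomputes an integer multiplier, as in
// qpcMultiplier = 1e9 / qpcFrequency. The integer division truncates, so any
// fractional part of the multiplier is lost.
func viaMultiplier(counter, frequency int64) int64 {
	multiplier := nsPerSecond / frequency
	return counter * multiplier
}

// viaDivision computes counter * 1e9 / frequency directly. It is accurate to
// the nanosecond, but the intermediate product overflows int64 once counter
// exceeds roughly 9.2e9 ticks.
func viaDivision(counter, frequency int64) int64 {
	return counter * nsPerSecond / frequency
}

func main() {
	// Assumed frequencies: 10 MHz divides 1e9 evenly, 3.3125 MHz does not.
	for _, freq := range []int64{10_000_000, 3_312_500} {
		counter := freq // exactly one second worth of ticks
		m := viaMultiplier(counter, freq)
		d := viaDivision(counter, freq)
		fmt.Printf("freq=%d Hz: multiplier=%d ns, division=%d ns, error=%d ns\n",
			freq, m, d, d-m)
	}
}
```

With the 10 MHz frequency both methods agree; with the 3.3125 MHz frequency the multiplier truncates from about 301.9 to 301, so the multiplier-based clock runs slow by roughly 0.3%.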

Ideas so far:

  1. add QPC funcs into internal/syscall/windows and use them rather than try to modify runtime to fit the needs
  2. use division inside nanotimeQPC to avoid the error and accept the cost of division
  3. find a faster way to calculate division by qpcFrequency, if there exists one

1 seems the smallest and easiest change. 2 requires adjusting things inside runtime... also, I'm not sure exactly how big the cost of additional division would be in practice. 3 seems slightly overkill, and I don't have a really good idea at the moment.

Based on these thoughts, my instincts tell me to go with 1.
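For option 2, one common way to divide without overflowing the intermediate product is to split the counter into whole seconds and a remainder. This is a sketch under stated assumptions rather than a proposed runtime patch: `qpcToNanos` is a hypothetical name, the frequency is assumed to be positive and well below 9.2e9 Hz, and it costs two divisions per call, which is exactly the overhead being weighed above.

```go
package main

import "fmt"

const nsPerSecond = 1_000_000_000

// qpcToNanos converts a performance-counter reading to nanoseconds.
// It splits the counter into whole seconds and leftover ticks so that the
// only multiplication by 1e9 involving ticks uses a value smaller than
// frequency, which keeps the intermediate product within int64 for any
// realistic frequency.
func qpcToNanos(counter, frequency int64) int64 {
	seconds := counter / frequency
	remainder := counter % frequency
	return seconds*nsPerSecond + remainder*nsPerSecond/frequency
}

func main() {
	const freq = 3_312_500 // assumed frequency that does not divide 1e9 evenly

	// One second worth of ticks converts exactly.
	fmt.Println(qpcToNanos(freq, freq))

	// A million seconds of uptime: counter*1e9 would overflow int64 here,
	// but the split form does not.
	fmt.Println(qpcToNanos(freq*1_000_000, freq))
}
```

The result is truncated to a whole nanosecond, but unlike a truncated multiplier the error does not accumulate as the counter grows.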

@alexbrainman

Member

commented Apr 19, 2019

As a demonstration here's code that measures QueryPerformanceCounter vs. time.Now: https://play.golang.org/p/FN4AtJ9b51m.

I agree. I also see that QueryPerformanceCounter performs better than time.Now on my Windows 10 computer. Each qpcDeltas[i] is about 600-700 (ns), while most timeDeltas[i] are 0.

Fixing nanotime would be problematic (as seen in #8687)

I don't see why we cannot change nanotime to use QueryPerformanceCounter. Remind me, please. I just changed runtime like this:

$ git diff
diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index d3e84fe..8f1373c 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -251,10 +251,10 @@ func loadOptionalSyscalls() {
                throw("WSAGetOverlappedResult not found")
        }

-       if windowsFindfunc(n32, []byte("wine_get_version\000")) != nil {
+       //if windowsFindfunc(n32, []byte("wine_get_version\000")) != nil {
                // running on Wine
                initWine(k32)
-       }
+       //}
 }

 //go:nosplit
$

and it seems to be working fine.

As we vary benchtime it shows quite a significant timing error:

f:\Go\src\sandbox\benchmark>go test -bench . -benchtime 1000x .
goos: windows
goarch: amd64
pkg: sandbox/benchmark
Benchmark-32                1000                 0.00 ns/op
PASS
ok      sandbox/benchmark       0.261s

I would not use -benchtime 1000x for this test on Windows. Just let your benchmark run for long enough (default of 1 second is good), and you should be good.

Ideas so far:

  1. add QPC funcs into internal/syscall/windows and use them rather than try to modify runtime to fit the needs
  2. use division inside nanotimeQPC to avoid the error and accept the cost of division
  3. find a faster way to calculate division by qpcFrequency, if there exists one

1 seems the smallest and easiest change. 2 requires adjusting things inside runtime... also, I'm not sure exactly how big the cost of additional division would be in practice. 3 seems slightly overkill, and I don't have a really good idea at the moment.

Based on these thoughts, my instincts tell me to go with 1.

I would (if I had free time) try 2 or 3 first. Maybe division is not that expensive? Could we do the division on every 100th nanotime call to adjust for the drift?

Making runtime.nanotime and time.Now more precise might be beneficial in other areas (not just for benchmarks).

I am also worried about creating too many functions that read time. They all use different approaches to collect time. How do you know the times they return are all in sync? And different implementations might diverge.

Alex

@egonelbre

Contributor Author

commented Apr 19, 2019

As far as I recall, QPC is unreliable for measuring long periods of time.

The best reference mentioning the issues that I was able to find: https://stackoverflow.com/a/5297163/192220. More in-depth info on timing issues: https://github.com/chromium/chromium/blob/08dbb44e81454b6d67c3b6f4989e7e58e88f4b0b/base/time/time_win.cc#L14. MSDN also suggests it only for measuring short periods of time: https://docs.microsoft.com/en-us/windows/desktop/SysInfo/acquiring-high-resolution-time-stamps.

I would not use -benchtime 1000x for this test on Windows. Just let your benchmark run for long enough (default of 1 second is good), and you should be good.

Yeah, I wouldn't use that value either. It is only there to make the issue more obvious.
