Which clock should we use on Windows? #33

Closed
njsmith opened this Issue Jan 23, 2017 · 8 comments

@njsmith
Member

njsmith commented Jan 23, 2017

Right now we use time.monotonic everywhere. This is a bit problematic on Windows, where time.monotonic is GetTickCount64, which has ~15 ms resolution. The other option is QueryPerformanceCounter, which has much higher resolution, is also monotonic, and is exposed as time.perf_counter (checked with time.get_clock_info).
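
For reference, the backing clock implementations can be checked directly (a sketch; the output described below is what CPython reported on Windows around the versions current when this issue was filed, and may differ elsewhere):

import time

# On Windows, CPython reports GetTickCount64() as the implementation of
# time.monotonic and QueryPerformanceCounter() for time.perf_counter;
# the resolution fields show the ~15 ms vs. sub-microsecond difference.
print(time.get_clock_info("monotonic"))
print(time.get_clock_info("perf_counter"))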

The reason for this appears to be that QueryPerformanceCounter has had a troubled past: https://www.python.org/dev/peps/pep-0418/#windows-queryperformancecounter

But all of these issues are along the lines of "VirtualBox had a bug that was fixed 6 years ago" or "Windows XP is flaky" – probably true, but irrelevant, since we don't support XP. It's not clear that any of them still apply.

Advantages of using a higher-precision clock:

  • Right now, if there's just one task running and it sleeps for t seconds, the actual timeout passed to the underlying system call is (t + current_time()) - current_time(), which can be off by a full clock tick depending on whether the clock ticks over between the two calls (see the sketch after this list). OTOH, I don't know what the actual resolution of our sleeping syscall (currently GetQueuedCompletionStatusEx) is, or whether anyone cares about millisecond-accurate sleeps.

  • I've had two bugs already with tests that assumed that time always, like, passes. The fixes were trivial (e.g. replacing a < with a <=), but it's always annoying when tests pass locally and then fail on AppVeyor.

  • If we implement a fancier scheduling system (#32) then we'll definitely need better than 15 ms precision to measure tasks running. (Though there's no reason that the clock we use for that has to match the clock we use for deadlines.)
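
A minimal sketch of the first bullet (hypothetical numbers, not trio's actual code): with a clock that only advances in ~15 ms ticks, the deadline arithmetic can hand the system call a timeout that's short (or long) by up to a full tick.

TICK = 0.015      # hypothetical GetTickCount64 granularity, ~15 ms
t = 0.100         # the task asked to sleep for 100 ms

first_read = 0.000               # current_time() when the deadline is set
second_read = first_read + TICK  # the clock ticked over before the re-read

deadline = t + first_read        # deadline = t + current_time()
timeout = deadline - second_read # timeout = deadline - current_time()
print(timeout)                   # ~0.085: the syscall sleeps ~85 ms, not 100 ms

The same granularity explains the second bullet: two back-to-back reads of a 15 ms clock often return the identical value, so a test asserting t2 > t1 has to be loosened to t2 >= t1.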

@njsmith
Member

njsmith commented Sep 8, 2018

@tkerwin reported in chat that they have a program that misbehaves on Windows if trio uses time.monotonic, but works correctly if trio uses time.perf_counter. To me, this seems like a pretty good argument for switching over.

How risky is it? It looks like libuv uses QueryPerformanceCounter exclusively (that's what uv__hrtime uses, and I don't see any other timer calls in the libuv source code). Chromium uses QueryPerformanceCounter if the CPU is new enough to have a stable rdtsc (ref), so that QPC is fast. A comment notes that as of August 2015 this was true for 72% of Chrome's userbase, and it's presumably only gone up in the three years since.

I guess I'd still like to see the output from these commands on some Windows system:

python -m timeit -s "from time import monotonic" "monotonic()" 
python -m timeit -s "from time import perf_counter" "perf_counter()"

(On my Linux laptop, they both give ~0.05 µs/loop.)

@tkerwin
Contributor

tkerwin commented Sep 11, 2018

On Windows 10, with an i5-3570:

λ python -m timeit -s "from time import monotonic" "monotonic()"
5000000 loops, best of 5: 78 nsec per loop

λ python -m timeit -s "from time import perf_counter" "perf_counter()"
2000000 loops, best of 5: 125 nsec per loop

@njsmith
Member

njsmith commented Sep 12, 2018

Meh, 50 ns is like an attribute lookup or two; I think spending that once or twice per pass through the run loop is fine if it gives better user-visible behavior.
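
One way to sanity-check the attribute-lookup comparison (a sketch; numbers vary by machine and Python version) is to time the lookup alone, without calling it:

python -m timeit -s "import time" "time.monotonic"

On typical hardware this lands in the tens of nanoseconds, the same ballpark as the ~47 ns gap between the two clocks measured above.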

@tkerwin Any interest in putting together a PR?

tkerwin added a commit to tkerwin/trio that referenced this issue Sep 25, 2018

njsmith added a commit that referenced this issue Sep 29, 2018

Merge pull request #682 from tkerwin/master
Change to high-resolution clock per #33

@njsmith
Member

njsmith commented Sep 29, 2018

Resolved in #33

@njsmith njsmith closed this Sep 29, 2018

@pquentin
Member

pquentin commented Sep 29, 2018

And in #682 :)

@pquentin
Member

pquentin commented Nov 16, 2018

For reference, on macOS 10.14 with a mid-2014 MacBook Pro:

$ python -m timeit -s "from time import monotonic" "monotonic()"
10000000 loops, best of 3: 0.0779 usec per loop

$ python -m timeit -s "from time import perf_counter" "perf_counter()"
10000000 loops, best of 3: 0.0796 usec per loop

Both also take about 80 nsec per loop here.

pquentin added a commit to pquentin/progress that referenced this issue Nov 16, 2018

Update avg/eta/eta_td only once per second
When the progress bar is updated frequently (every few milliseconds),
the eta can change so quickly that it's impossible to read.

This means we're calling monotonic/time often, but those calls take less
than 100 nsec per loop on Linux, Windows and macOS [0], which is
equivalent to one attribute lookup or two.

[0]: python-trio/trio#33
@njsmith
Member

njsmith commented Nov 16, 2018

@pquentin That makes sense. There's some confusing indirection to trace through, but if I'm reading it right, when CPython is built for anything besides Windows, monotonic and perf_counter end up calling the exact same code internally.
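
One way to see this without tracing the CPython source (a sketch; the exact implementation strings vary by platform and Python version):

import time

# On non-Windows builds, both clocks typically report the same backing
# implementation – e.g. clock_gettime(CLOCK_MONOTONIC) on Linux or
# mach_absolute_time() on macOS – which is why their timings match.
print(time.get_clock_info("monotonic").implementation)
print(time.get_clock_info("perf_counter").implementation)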
