bpo-31415: Rewrite _PyTime_GetWinPerfCounter() for _PyTime_t #3942
Conversation
The first commit of this PR ("Rewrite _PyTime_GetWinPerfCounter() for _PyTime_t") modifies _PyTime_GetWinPerfCounter() to get the clock value as a _PyTime_t directly. The complexity is avoiding integer overflow, since the function computes value * SEC_TO_NS / perf_frequency, where SEC_TO_NS = 1_000_000_000. On my Windows 8.1 VM, perf_frequency = 10_000_000, so the ratio reduces to 100/1. In the end, _PyTime_GetWinPerfCounter() just has to multiply the clock by 100, and there is no risk of precision loss from an irrational denominator.

While modifying Windows time.perf_counter(), I also modified macOS time.monotonic(), which also uses a numerator/denominator ratio. It's just to make the code more consistent: on macOS Sierra the ratio is 1/1, so computing the GCD is useless, but the GCD is only computed once, so it doesn't matter.

I finished with a code cleanup to normalize the code for PEP 7: add { ... }, since I like "my" tiny pytime.c file :-)
This doesn't look wrong to me, but I'd never really followed how the old version worked, so I'm not sure that actually counts for much :)
Using pseudo-code, this big PR replaces:
with
Thanks for the explanation - I agree this patch does indeed achieve that goal.
One comment inline regarding a function name that you may want to change to improve readability, but I don't consider that a blocker for merging.
Python/pytime.c
Outdated
assert(div >= 1);
_PyTime_t t = *tp;
if (t > _PyTime_MAX / mul) {
It occurs to me that if the low order bits in mul are zero, and the high order bits in div are also zero, then shifting mul right and div left could allow even more computations to avoid overflow.
I'd be surprised if that was worth the extra complexity given the GCD calculations further down, though.
Using GCD, there is no need to count the number of zero low order bits.
I'd be surprised if that was worth the extra complexity given the GCD calculations further down, though.
On Windows, the ratio is SEC_TO_NS / perf_frequency where SEC_TO_NS=1_000_000_000. On my Windows 8.1 VM, perf_frequency=10_000_000. At the end, you get numer=100 and denom=1.
If you only use binary shifts, you lose the common divisor 5.
Python/pytime.c
Outdated
if (!QueryPerformanceFrequency(&freq) || freq.QuadPart == 0) {
    PyErr_SetFromWindowsErr(0);
if (perf_frequency == 0) {
    if (get_perf_frequency(&t0, &perf_frequency, &numer, &denom) < 0) {
I almost missed that this initialised t0 in addition to perf_frequency. Since it initialises all the statics for this function rather than just perf_frequency, perhaps _init_win_perf_counter would be a better name?
I renamed the method to init_win_perf_counter(). There is no need for a "_" prefix, since the function is already private thanks to the "static" keyword.
I wasn't comfortable with checking for integer overflow and raising an exception on each clock read. So I replaced the check at each clock read with a check done only once, when the clock is initialized. The maximum multiplier should now be lower than 291. On macOS, the multiplier is always 1, so it's ok. On my Windows 8.1 VM, the multiplier is 100 (so it's ok on my VM). I don't know about older Windows versions; we will see during the beta stage :-) If the check fails on some machines, we will have to find a different strategy, like using larger integers (128 bits) to prevent any risk of integer overflow.
What is the purpose of this change? Why is using integer arithmetic better than using floats?
A large number of brace additions distracts attention and makes reviewing harder. I'm not sure that I haven't skipped some meaningful changes.
Python/pytime.c
Outdated
if (!QueryPerformanceFrequency(&freq) || freq.QuadPart == 0) {
    PyErr_SetFromWindowsErr(0);
if (perf_frequency == 0) {
    if (init_win_perf_counter(&t0, &perf_frequency, &numer, &denom) < 0) {
It seems to me that _PyTime_GetWinPerfCounterWithInfo() can now return -1 without setting an error.
Yeah, it's on purpose.
_PyTime_Init() makes sure the time.time(), time.monotonic() and time.perf_counter() clocks work. If a function returns -1 without raising an exception, it's ok. _Py_InitializeMainInterpreter() calls Py_FatalError() with a nice error message if _PyTime_Init() fails. If an exception is raised, it is displayed. If no exception is raised, we only have the Py_FatalError() message which should be enough.
I only used -1 without any exception set for cases which "should never occur in practice" :-)
Why not use asserts instead?
It would be better to document this behavior explicitly, because it is different from the conventional behavior of the C API.
Why not use asserts instead?
I prefer not to make too many assumptions about clocks. Some platforms are really weird.
When I implemented time.monotonic(), I added a check at runtime, but only in debug mode, to really make sure that the clock is monotonic. The test failed on a strange Debian VM. We decided to just drop the assertion... Honestly, I only added the assertion to really check if some weird clock exist in the world. So yes, they do exist :-)
It would be better to document this behavior explicitly, because it is different from the conventional behavior of the C API.
In fact, the same C function is used in 3 different ways:
- _PyTime_Init(): read each clock once to check if the clock works. The check is required since clock_gettime(CLOCK_MONOTONIC) can return an error on Hurd for example.
- time.get_clock_time()
- time.time(), time.perf_counter(), time.monotonic()
Python/pytime.c
Outdated
static _PyTime_t
_PyTime_GCD(_PyTime_t a, _PyTime_t b)
{
    _PyTime_t x, y;
Why not reuse a and b as in the common implementation of Euclid's algorithm? This would slightly simplify the code. In any case you need only one temporary variable.
Ok, I will do that once I rebase this PR.
gcd = _PyTime_GCD(timebase.numer, timebase.denom);
timebase.numer /= gcd;
timebase.denom /= gcd;
if (timebase.numer < 1 || timebase.denom < 1) {
When is this possible?
These are sanity checks, "just in case". In practice, mach_timebase_info() always returns (1, 1).
The checks avoid a possible division by zero.
gcd = _PyTime_GCD(num, den);
num /= gcd;
den /= gcd;
if (num < 1 || den < 1) {
When is this possible?
freq.QuadPart is supposed to be >= 1... but I prefer to be super safe and check everything, "just in case".
The check is cheap and only done once, whereas a crash caused by a bug can be very annoying and complex to debug.
Python/pytime.c
Outdated
diff = diff / (double)cpu_frequency;
return _PyTime_FromDouble(t, diff, _PyTime_ROUND_FLOOR, SEC_TO_NS);
QueryPerformanceCounter(&now);
t = (now.QuadPart - t0);
Can this overflow?
(now.QuadPart - t0) shouldn't overflow. Both values are supposed to be positive and the clock is supposed to be monotonic. In practice there is no guarantee: the clock might go backward, which is why it isn't used for time.monotonic() but only for time.perf_counter().
Note: my PR doesn't introduce this code; it has existed since Python 2.7 at least:
https://github.com/python/cpython/blob/2.7/Modules/timemodule.c#L184
I don't mean that the subtraction can overflow. I mean that casting the result from LARGE_INTEGER to _PyTime_t can overflow. In the current code the result is cast to double, which never overflows.
(Stupid GitHub, my rebase dropped the commits, so I don't have the context anymore.)
No, in practice _PyTime_t == LONGLONG; it's the same type. I only added an assertion to be "future proof".
Hum, I'm making more changes than I expected. I created PR #3955. Once that cleanup PR is merged, I will rebase this one on top of it to keep this PR reviewable.
This PR is a follow-up to my first PR, which was already merged. See my comment: this PR is supposed to reduce the risk of precision loss even further.
I checked test.pythoninfo on the 3.x Windows buildbots and AppVeyor. I looked at the resolution of time.perf_counter:
Full example:
resolution=2.793651148400146e-07 can be an issue: it corresponds to QueryPerformanceFrequency() == 3579545, and that fraction cannot be simplified much:
Note: 3579545 Hz comes from the Programmable Interval Timer (PIT) chip (also called an 8253/8254 chip). https://stackoverflow.com/questions/2345599/why-is-my-stopwatch-frequency-so-low
The _PyTime_GetWinPerfCounterWithInfo() function now only uses _PyTime_t (integer) internally, rather than using a C double to divide the clock by the performance frequency. Add _PyTime_GCD() to compute the smallest numerator and denominator to apply the performance frequency in _PyTime_GetWinPerfCounterWithInfo(). Also modify the macOS pymonotonic() implementation, which uses mach_absolute_time() and mach_timebase_info(), to use _PyTime_GCD(). Also add _PyTime_MulDiv() to apply a numerator and denominator to a timestamp while checking for integer overflow.
I abandon this PR: my approach doesn't work. This PR is superseded by PR #3964, which uses a C double for time.clock() and time.perf_counter(). While my PR was fun to write and works nicely with QueryPerformanceFrequency() == 10**7 (10_000_000, 10 MHz, 100 ns resolution), it quickly leads to integer overflow in _PyTime_MulDiv() with QueryPerformanceFrequency() == 3,579,545. I don't know how to implement 128-bit int mul and 128-bit int div in a portable way, especially on 32-bit architectures. I tried __int128 in Visual Studio but got an error in x64 (64-bit) mode (so I don't expect it to work in 32-bit mode). And I don't want to create temporary "heavy" PyLongObject... just to read a clock.

Note: 3,579,545 decomposes into prime factors as 5 * 715,909. The 715,909 factor is not really convenient for integer arithmetic with 64-bit integers (since I would like to store nanoseconds).
https://bugs.python.org/issue31415