Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch Python to using native Win32 condition variables #104530

Open
adr26 opened this issue May 16, 2023 · 10 comments
Open

Switch Python to using native Win32 condition variables #104530

adr26 opened this issue May 16, 2023 · 10 comments
Labels
OS-windows type-feature A feature request or enhancement

Comments

@adr26
Copy link
Contributor

adr26 commented May 16, 2023

Py_HAVE_CONDVAR denotes support for Python condition variable APIs. On Win32, the _PY_EMULATED_WIN_CV macro exists to allow condition variables to be emulated using Win32 critical section and semaphore APIs.

Native Win32 support for condition variables has existed from Windows Vista onwards (courtesy of my friend and fellow ex-MSFT colleague Neill Clift).

An implementation of Python condition variable also exists with _PY_EMULATED_WIN_CV set to 0 which uses native Win32 condition variable / SRW lock (mutex) support.

Since support for Windows XP was dropped as of Python 3.5, I propose moving to the native Win32 condition variable APIs, as they are more efficient and easier to maintain.

Linked PRs

@adr26 adr26 added the type-feature A feature request or enhancement label May 16, 2023
adr26 added a commit to adr26/cpython that referenced this issue May 16, 2023
@eryksun
Copy link
Contributor

eryksun commented May 16, 2023

PyThread_type_lock in "Python/thread_nt.h" currently defaults to using a mutex and a condition variable (emulated) instead of just using a semaphore, whereas the POSIX implementation in "Python/thread_pthread.h" prefers to use a semaphore instead of a mutex and a condition variable. Possibly the decision on Windows was in anticipation of using native condition variables. But a downside of that choice, as things stand, would be that we wouldn't be able to implement the intr_flag parameter of PyThread_acquire_lock_timed(). That's because there's no documented way to interrupt SleepConditionVariableSRW() for a particular thread.

For now there's no difference because no one has bothered to modify the Windows implementation of PyThread_acquire_lock_timed() to support intr_flag. It could be implemented rather simply, however. For example, include the SIGINT event in a WaitForMultipleObjects() call, as is used elsewhere in CPython (e.g. by time.sleep()), or use an alertable WaitForSingleObjectEx() call and modify the C signal handler to queue an APC to the main thread via QueueUserAPC().

If native condition variables are adopted, then if we ever want to support a Ctrl+C interrupt when waiting on a threading.Lock() or queue.Queue() in the main thread, a capability that has been implemented on POSIX for many years, then we would need to use a different lock implementation on Windows in "Modules/_threadmodule.c" and "Modules/_queuemodule.c" instead of using a PyThread_type_lock. Or maybe we could keep using semaphores for PyThread_type_lock, while using a condition variable for the global interpreter lock (GIL).

@adr26
Copy link
Contributor Author

adr26 commented May 16, 2023

Hi @eryksun,

I appreciate you looking at these changes.

I had looked at the POSIX implementation of PyThread_acquire_lock_timed() while putting these changes together and in the current pthreads implementation, PyThread_acquire_lock_timed() will return PY_LOCK_INTR if the underlying pthread_cond_wait()/pthread_cond_timedwait() call returns 0 (which denotes successful completion of the condition variable wait, as per the linked POSIX specifications), but lock->locked != 0 (i.e. the condition has not been met).

The POSIX specification details how POSIX condition variables are subject to spurious wakeup:

Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur. Since the return from pthread_cond_timedwait() or pthread_cond_wait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return.

and how the execution of a signal handler may cause the condition variable to have a spurious wakeup:

If a signal is delivered to a thread waiting for a condition variable, upon return from the signal handler the thread resumes waiting for the condition variable as if it was not interrupted, or it shall return zero due to spurious wakeup.

Note in particular that this means that PY_LOCK_INTR may be returned by the POSIX implementation of PyThread_acquire_lock_timed() both in the case of a spurious wakeup caused by the execution of a signal handler, and also as a result of spurious wakeups caused by other conditions (e.g. another thread performing a wait on the condition variable having stolen the condition variable signal due to a race between it and the wakeup of this thread from a wait condition, after the condition variable was signalled). As such, a return value of PY_LOCK_INTR means the caller must evaluate whether any signal-related conditions exist and then either handle these or re-wait on the condition variable, as appropriate.

Win32 condition variables are likewise subject to spurious and stolen wakeups, as documented in Raymond Chen's 'Old New Thing' blog and the docs for SleepConditionVariableSRW(). As such, the natural implementation of support for the intr_flag in PyThread_acquire_lock_timed() on Windows when using native Win32 condition variables would appear to be to map the same semantics for returning PY_LOCK_INTR — i.e. to return PY_LOCK_INTR if SleepConditionVariableSRW() returned != FALSE /PyCOND_TIMEDWAIT() returned 0 (success), but mutex->locked != 0 (the condition has not been met).

PY_LOCK_INTR would then be returned from PyThread_acquire_lock_timed() on Windows for spurious condition variable wakeups (as on pthreads), but you could also then implement the required semantics for PY_LOCK_INTR handling a signal…

The C signal handler code could signal all threads waiting on the condition variable. The thread(s) waiting on it would be woken without the condition being satisfied, and so would return PY_LOCK_INTR. Any thread not associated with the C signal being handled would treat the PY_LOCK_INTR as another spurious wakeup, while a thread needing to handle the signal would do so, as in the POSIX case.

That's because there's no documented way to interrupt SleepConditionVariableSRW() for a particular thread.

The documented way to wake SleepConditionVariableSRW() is through Wake­Condition­Variable() (for a single thread) / WakeAllConditionVariable() (for all threads). As explained above, I would envisage Win32 support for the intr_flag in PyThread_acquire_lock_timed() would use WakeAllConditionVariable() to wake all threads waiting on the condition variable.

Given that there appears to be a natural path to implementing support for the intr_flag in PyThread_acquire_lock_timed() on Win32 while using native condition variables, if this is required, I didn't see this as a blocker to implementing the changes I'm suggesting (though I would value further discussion and insights into this problem if this analysis is incorrect).

Thanks, Andrew R

@eryksun
Copy link
Contributor

eryksun commented May 16, 2023

Wake­Condition­Variable() wakes an arbitrary thread, so it's useless, and it's why I chose the phrase "a particular thread". The interpreter handles signals only on the main thread. Thus only WakeAllConditionVariable() would be viable. When the main thread waits on a condition variable, and only the main thread, it would have to store a global reference to the condition variable that's accessible to the C signal handler, which executes on a new, non-Python thread.

@adr26
Copy link
Contributor Author

adr26 commented May 17, 2023

Hi Eryk,

Thanks for clarifying what you meant by "a particular thread" — I hadn't dug into the details of how the signalling handling worked, as the back-of-an-envelope analysis I'd done of how support for the intr_flag in PyThread_acquire_lock_timed() on Win32 was sufficient to convince me that moving to Win32 condition variables and SRW locks would not interfere with that.

I hope that the explanation I've given of how PY_LOCK_INTR can be returned from PyThread_acquire_lock_timed() on POSIX today for any thread as a result of spurious wakeup (whether the main thread in the main interpreter or not) is a convincing enough argument that the semantics I'd proposed for a Win32 implementation wouldn't be fundamentally different from what is implemented today for POSIX.

I think we both agree that WakeAllConditionVariable() would have to be used for a Win32 implementation, because (as you correctly point out) neither of the two Win32 APIs for waking a condition variable can target a specific thread. Your suggestion that the thread which wants to wake from a PyThread_acquire_lock_timed() call to handle signals would store a global reference to the condition variable on which it is waiting, which is accessible to the C signal handler, before it makes the PyThread_acquire_lock_timed() call, further flushes out the details of how this mechanism would be designed. I agree with you that that seems like the appropriate way this should be implemented.

Since semantically, any caller of PyThread_acquire_lock_timed() has to be able to handle PY_LOCK_INTR being returned as a result of spurious condition variable wakeups, the only remaining objection I could envisage would be that the use of WakeAllConditionVariable() is less efficient than it needs to theoretically be, since an individual thread waiting on a condition variable can't be targeted for wakeup using this API (or WakeConditionVariable()). If this is a concern, it would possible to analyse how frequently PyThread_acquire_lock_timed() is called by the singular thread which wishes to handle signals, while another thread also waits on the same condition variable. It may be that this almost never happens, in which case WakeAllConditionVariable() on a waitlist with only one waiter would produce no spurious wakes and target precisely the desired thread for wakeup. If it does happen, then the frequency at which it does (and the amortised expected number of spurious wakeups) should be compared to the average number of spurious wakeups caused by typical use of the condition variables. If the rate is similar, then the use of WakeAllConditionVariable() will not be causing a performance cost which should be a concern, as it will be similar to the 'background noise' of condition variable use.

In any case (to the point in your original post), if the use of Win32 condition variables and SRW locks were to be considered a performance hazard at such point in time as support for the intr_flag in PyThread_acquire_lock_timed() on Win32 is implemented, then the implementation used for the GIL and for PyThread_type_lock could be bifurcated. As such, even such a (hypothetical) performance concern does not preclude making the change to using Win32 condition variables and SRW locks now (when the intr_flag in PyThread_acquire_lock_timed() is not supported on Win32) and thereby achieving the performance advantage that should accrue from doing so.

In summary, I don't see any remaining reason to object to taking this change…

Thanks, Andrew

@adr26
Copy link
Contributor Author

adr26 commented May 17, 2023

Hi @ericsnowcurrently,

Two quick points:

  1. The remaining CI failure on gh-104530: Use native Win32 condition variables #104531 appears to be test__xxsubinterpreters is Occasionally Crashing #104341, which you're chasing.
  2. Do you know how I could usefully test the performance impact (if any) of this change, as that may help the discussion with Eryk?

Thanks, Andrew R

@ericsnowcurrently
Copy link
Member

I've started a benchmarking run for your PR.

@ericsnowcurrently
Copy link
Member

There was a problem. I've restarted the benchmarking run.

@ericsnowcurrently
Copy link
Member

https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20230516-3.12.0a7+-4f0355f

@ericsnowcurrently
Copy link
Member

So a little bit faster, but not much.

@adr26
Copy link
Contributor Author

adr26 commented May 18, 2023

So a little bit faster, but not much.

A shame it's not more, but we have a grocery chain here in the UK whose motto is "Every Little Helps"…

What do you think?

Andrew R

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS-windows type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants