- 
                Notifications
    
You must be signed in to change notification settings  - Fork 1k
 
Description
Hi @daanx,
First thanks again for this awesome library!
My team (COD devs @ Activsion) have spotted some issues with hanging akin to #890 but on Windows. Very rare, real hard to repro, but given our fleet of test infra it happens quite a lot.
We found that it maybe due to improper use of atomics, and compiling to C++ possibly fixes it (comparing the assembly generated .cod files from MSVC) shows the C++ version would add memory barriers, and the nop (npad 1) instruction, and possibly other changes.
So this was hopeful, but wanted to check what else changes by moving to C++, and I've found that now MSVCP140.dll loads too (which is not unexpected in general, but before it wasn't). And here are the two functions it needed.
(we have special custom build of mimalloc with switchable flavors at runtime controlled by env, but essentially each "mimalloc-flavor-xxxx.dll" is a mimalloc compiled with special flags, in case you are wondering about the naming).
So from the MSVCP140.dll it needs _Thrd_yield an std::get_new_handler - now the latter is expected I guess, but I got curious about the former.
Why would changing to C++ would need this now, and not before, and this led me to this discovery:
mimalloc/include/mimalloc/atomic.h
Line 361 in 09a2709
| #if defined(__cplusplus) | 
#if defined(__cplusplus)
#include <thread>
static inline void mi_atomic_yield(void) {
  std::this_thread::yield();
}
#elif defined(_WIN32)
static inline void mi_atomic_yield(void) {
  YieldProcessor();
}
#elif defined(__SSE2__)
#include <emmintrin.h>
...and then to the std::this_thread::yield implementation -
https://github.com/microsoft/STL/blob/5f8b52546480a01d1d9be6c033e31dfce48d4f13/stl/src/cthread.cpp#L86
which seems to call SwitchToThread
_CRTIMP2_PURE void __cdecl _Thrd_yield() noexcept { // surrender remainder of timeslice
    SwitchToThread();
}So the difference is that: YieldProcessor emits pause (same as what _mm_pause would do), while std::this_thread::yield would call SwitchToProcessor() that may sometimes call into kernel to reschedule the current thread. I've asked ChatGPT here - https://chatgpt.com/share/689beeff-69c0-800a-be43-c16de7a405e3
I didn't want to change too much the behavior, so I've decided to stick for these two cases the way "C" did it (and also did not want to load the MSVC140.dll - it gets loaded anyway, but didn't want to disturb this part).
But it raises the question - what is really intended to be called here - The "pause" (YieldProcessor), or std::this_thread::yield SwitchToProcessor?
Thank you!