Optimize atomic_thread_fence #740

Merged
11 changes: 9 additions & 2 deletions stl/inc/atomic
@@ -1942,8 +1942,15 @@ extern "C" inline void atomic_thread_fence(const memory_order _Order) noexcept {
#else // ^^^ ARM32/ARM64 hardware / x86/x64 hardware vvv
_Compiler_barrier();
if (_Order == memory_order_seq_cst) {
-static long _Guard;
-(void) _InterlockedCompareExchange(&_Guard, 0, 0);
+volatile long _Guard; // Not initialized to avoid an unnecessary operation; the value does not matter
+
+// _mm_mfence could have been used, but it is not supported on older x86 CPUs and is slower on some recent CPUs.
+// The memory fence provided by interlocked operations has some exceptions, but this is fine:
+// std::atomic_thread_fence works with respect to other atomics only; it may not be a full fence for all ops.
+#pragma warning(suppress : 6001) // "Using uninitialized memory '_Guard'"
+#pragma warning(suppress : 28113) // "Accessing a local variable _Guard via an Interlocked function: This is an unusual
+// usage which could be reconsidered."
+(void) _InterlockedIncrement(&_Guard);
_Compiler_barrier();
}
#endif // hardware
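
To illustrate the guarantee this hunk has to preserve, here is a minimal sketch of the classic store-buffering test written against standard C++ only. The helper names `both_read_zero` and `count_violations` are hypothetical and are not part of the STL or of this change. With a `memory_order_seq_cst` fence between the relaxed store and the relaxed load in each thread, the standard forbids the outcome where both threads read 0; on x86/x64 that requires a real store-load barrier, which is what the interlocked operation on `_Guard` appears intended to provide.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs one store-buffering round and reports whether
// both threads observed the other's variable as still 0.
bool both_read_zero() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread a([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst); // store-load barrier
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst); // store-load barrier
        r2 = x.load(std::memory_order_relaxed);
    });

    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}

// Hypothetical helper: repeats the round and counts forbidden outcomes.
int count_violations(int rounds) {
    assert(rounds >= 0);
    int violations = 0;
    for (int i = 0; i < rounds; ++i) {
        if (both_read_zero()) {
            ++violations;
        }
    }
    return violations;
}
```

Calling `count_violations(200)` should return 0 with a conforming fence. A run like this cannot prove the fence correct, since the forbidden interleaving is rare even without a barrier; it only shows the property that the fence is required to uphold.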