Improve spinlock implementation

Calling lock.test_and_set() in a tight loop creates expensive memory synchronization among processors and penalizes other running threads. So attempt the lock once with fetch_sub() and, if it is already taken, loop on a simple load(), which puts much less pressure on the system. Reported speed-up of about 2-3% on various systems. Patch by Ronald de Man. No functional change.
official-stockfish · Feb 23, 2015 · d3d26a9
1 parent 3811206
Showing 1 changed file with 9 additions and 5 deletions.
diff --git a/src/thread.h b/src/thread.h
@@ -39,16 +39,20 @@ const size_t MAX_THREADS = 128;
 const size_t MAX_SPLITPOINTS_PER_THREAD = 8;
 const size_t MAX_SLAVES_PER_SPLITPOINT = 4;
 
-/// Spinlock class wraps low level atomic operations to provide spin lock functionality
+
+/// Spinlock class wraps low level atomic operations to provide a spin lock
 
 class Spinlock {
 
-  std::atomic_flag lock;
+  std::atomic_int lock;
 
 public:
-  Spinlock() { std::atomic_flag_clear(&lock); }
-  void acquire() { while (lock.test_and_set(std::memory_order_acquire)) {} }
-  void release() { lock.clear(std::memory_order_release); }
+  Spinlock() { lock = 1; } // Init here to workaround a bug with MSVC 2013
+  void acquire() {
+    while (lock.fetch_sub(1, std::memory_order_acquire) != 1)
+        while (lock.load(std::memory_order_relaxed) <= 0) {}
+  }
+  void release() { lock.store(1, std::memory_order_release); }
 };
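
For readers who want to try the scheme outside the engine, here is a minimal standalone sketch of the patched lock. The Spinlock body mirrors the added lines above, except that it initializes the counter in the member initializer list rather than in the constructor body (the patch does the latter to work around MSVC 2013). The helper run_counter_demo() is hypothetical and not part of Stockfish; it only checks that the lock gives mutual exclusion on a plain counter.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Same scheme as the patched Spinlock: 1 means free, <= 0 means held.
class Spinlock {

  std::atomic_int lock;

public:
  Spinlock() : lock(1) {}

  void acquire() {
    // One read-modify-write per attempt: only the thread that sees the
    // previous value 1 becomes the owner.
    while (lock.fetch_sub(1, std::memory_order_acquire) != 1)
        // Contended: wait on a plain load until release() stores 1 again,
        // then retry the fetch_sub() above.
        while (lock.load(std::memory_order_relaxed) <= 0) {}
  }

  void release() { lock.store(1, std::memory_order_release); }
};

// Hypothetical helper (not in Stockfish): each thread increments a shared,
// deliberately non-atomic counter under the lock and the total is returned.
long run_counter_demo(int num_threads, int iterations) {
  Spinlock sl;
  long counter = 0;
  std::vector<std::thread> workers;

  for (int t = 0; t < num_threads; ++t)
      workers.emplace_back([&] {
          for (int i = 0; i < iterations; ++i) {
              sl.acquire();
              ++counter; // protected by sl
              sl.release();
          }
      });

  for (auto& w : workers)
      w.join();

  return counter;
}
```

Calling run_counter_demo(4, 20000) from a test should return 80000 if the lock is correct. Waiting threads stay in the inner load() loop while the lock is held, so each of them performs only one fetch_sub() per failed attempt instead of a continuous stream of read-modify-write operations.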