mempool: lock-free bucket-based memory pools for threads and tasks #183
A 128-bit atomic CAS is useful for implementing a lock-free LIFO, but most compiler intrinsics do not support 128-bit atomic operations. This patch implements a 128-bit atomic compare-and-swap. Since some CPU models do not support the necessary instructions, availability is detected at configure time. Currently, x86/64, 64-bit ARM, and POWER 8 and 9 are supported. Note that, even if the CPU supports these instructions, some compilers fail to recognize them in inline assembly code: for example, Clang 9 and older do not recognize 128-bit LL/SC instructions on ARM and POWER.
Atomic tagged pointer operations (void * + size_t) are implemented primarily for the lock-free LIFO. They require a special instruction (e.g., a 128-bit atomic CAS on a 64-bit OS), so not all environments support them. If this atomic type is supported, ABTD_ATOMIC_SUPPORT_TAGGED_PTR is defined. Note that this atomic type is special and therefore provides only a weak CAS and non-atomic acquire/release/relaxed load and store.
ABTI_sync_lifo is a scalable LIFO implementation that does not have the ABA problem. If atomic tagged pointer operations are supported (i.e., most x86/64 with ICC, GCC, and Clang, 64-bit ARM with GCC, and POWER 8 and 9 with XLC and GCC), push and pop are lock-free. If not, it falls back on a spinlock-based blocking implementation.
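To illustrate how a tag avoids the ABA problem, here is a minimal sketch of a tagged-CAS LIFO. It is not the ABTI_sync_lifo code: to stay portable it packs a 32-bit node index and a 32-bit tag into one 64-bit word instead of using a 128-bit (void * + size_t) CAS, and every name in it (`tagged_lifo`, `lifo_push`, `lifo_pop`, `LIFO_NIL`, `LIFO_CAPACITY`) is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define LIFO_NIL UINT32_MAX /* hypothetical "empty" index */
#define LIFO_CAPACITY 8     /* hypothetical number of nodes */

/* Nodes are addressed by index so that (index, tag) fits in 64 bits.
 * The real patch uses (pointer, tag) and a 128-bit CAS instead. */
typedef struct {
    _Atomic uint32_t next[LIFO_CAPACITY]; /* next[i]: node below node i */
    _Atomic uint64_t top; /* low 32 bits: index, high 32 bits: tag */
} tagged_lifo;

static uint64_t pack(uint32_t idx, uint32_t tag)
{
    return ((uint64_t)tag << 32) | idx;
}
static uint32_t idx_of(uint64_t w) { return (uint32_t)w; }
static uint32_t tag_of(uint64_t w) { return (uint32_t)(w >> 32); }

static void lifo_init(tagged_lifo *l)
{
    atomic_init(&l->top, pack(LIFO_NIL, 0));
    for (int i = 0; i < LIFO_CAPACITY; i++)
        atomic_init(&l->next[i], LIFO_NIL);
}

static void lifo_push(tagged_lifo *l, uint32_t idx)
{
    assert(idx < LIFO_CAPACITY);
    uint64_t old = atomic_load_explicit(&l->top, memory_order_relaxed);
    uint64_t new_top;
    do {
        /* Link the new node to the current top, then try to publish it. */
        atomic_store_explicit(&l->next[idx], idx_of(old),
                              memory_order_relaxed);
        new_top = pack(idx, tag_of(old) + 1);
    } while (!atomic_compare_exchange_weak_explicit(
        &l->top, &old, new_top, memory_order_release, memory_order_relaxed));
}

/* Returns the popped index, or LIFO_NIL if the LIFO is empty. */
static uint32_t lifo_pop(tagged_lifo *l)
{
    uint64_t old = atomic_load_explicit(&l->top, memory_order_acquire);
    uint64_t new_top;
    do {
        if (idx_of(old) == LIFO_NIL)
            return LIFO_NIL;
        uint32_t below = atomic_load_explicit(&l->next[idx_of(old)],
                                              memory_order_relaxed);
        /* The tag changes on every update, so a stale `old` whose index
         * was popped and pushed back in the meantime fails the CAS. */
        new_top = pack(below, tag_of(old) + 1);
    } while (!atomic_compare_exchange_weak_explicit(
        &l->top, &old, new_top, memory_order_acquire, memory_order_acquire));
    return idx_of(old);
}
```

A caller would run `lifo_init` once and then call `lifo_push` and `lifo_pop` from any thread. The 32-bit tag can wrap around in principle, which is one reason a full-width tag next to a full pointer is the more robust choice when a 128-bit CAS is available.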
ABTI_mem_pool is a generic memory pool implementation. The basic algorithm is similar to the current memory pool implementation: it first accesses a local memory pool and then a global memory pool if the local one is empty or full. The advantages of the new algorithm are as follows:
- Generic: the implementation takes a segment size as a runtime argument.
- Per-bucket operation: not a single memory segment but multiple segments in a "bucket" are transferred when accessing a global memory pool, which reduces the number of global pool accesses. The number of local buckets and the bucket size are constant, so the local pool does not keep too many segments.
- Lock-free: entire push-pop operations are lock-free (on most CPUs). No ABA problem.
The current memory pool implementations should be replaced by this.
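The local/global bucket scheme can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the ABTI_mem_pool code: the global pool here is a plain linked stack of buckets (the patch uses a lock-free LIFO), and all names and constants (`global_pool`, `local_pool`, `pool_alloc`, `pool_free`, `BUCKET_SIZE`, `LOCAL_BUCKETS`, `num_accesses`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BUCKET_SIZE 4   /* segments per bucket (hypothetical value) */
#define LOCAL_BUCKETS 2 /* buckets a local pool may hold (hypothetical) */

/* Header stored inside a free segment; user data overwrites it later. */
typedef struct seg {
    struct seg *next;        /* next segment in the same bucket */
    struct seg *next_bucket; /* used only by a bucket's first segment */
} seg;

typedef struct {
    size_t seg_size;     /* segment size, chosen at run time */
    seg *buckets;        /* stack of full buckets */
    size_t num_accesses; /* counts global pool accesses (illustration) */
} global_pool;

typedef struct {
    global_pool *global;
    seg *buckets[LOCAL_BUCKETS]; /* all full except possibly the last */
    size_t num_buckets;
    size_t top_count; /* segments in the last bucket */
} local_pool;

static void pool_init(global_pool *gp, local_pool *lp, size_t seg_size)
{
    gp->seg_size = seg_size < sizeof(seg) ? sizeof(seg) : seg_size;
    gp->buckets = NULL;
    gp->num_accesses = 0;
    lp->global = gp;
    lp->num_buckets = 0;
    lp->top_count = 0;
}

/* Take one full bucket from the global pool, allocating it if needed. */
static seg *global_take_bucket(global_pool *gp)
{
    gp->num_accesses++;
    if (gp->buckets) {
        seg *b = gp->buckets;
        gp->buckets = b->next_bucket;
        return b;
    }
    seg *head = NULL;
    for (int i = 0; i < BUCKET_SIZE; i++) {
        seg *s = malloc(gp->seg_size);
        if (!s)
            abort();
        s->next = head;
        head = s;
    }
    return head;
}

static void global_put_bucket(global_pool *gp, seg *b)
{
    gp->num_accesses++;
    b->next_bucket = gp->buckets;
    gp->buckets = b;
}

static void *pool_alloc(local_pool *lp)
{
    if (lp->num_buckets == 0) { /* local pool empty: fetch a whole bucket */
        lp->buckets[0] = global_take_bucket(lp->global);
        lp->num_buckets = 1;
        lp->top_count = BUCKET_SIZE;
    }
    size_t t = lp->num_buckets - 1;
    seg *s = lp->buckets[t];
    lp->buckets[t] = s->next;
    if (--lp->top_count == 0) { /* last bucket drained; one below is full */
        lp->num_buckets--;
        lp->top_count = BUCKET_SIZE;
    }
    return s;
}

static void pool_free(local_pool *lp, void *p)
{
    seg *s = p;
    if (lp->num_buckets == 0 || lp->top_count == BUCKET_SIZE) {
        /* Need a new bucket; if the local pool is full, return one first. */
        if (lp->num_buckets == LOCAL_BUCKETS)
            global_put_bucket(lp->global, lp->buckets[--lp->num_buckets]);
        s->next = NULL;
        lp->buckets[lp->num_buckets++] = s;
        lp->top_count = 1;
    } else {
        size_t t = lp->num_buckets - 1;
        s->next = lp->buckets[t];
        lp->buckets[t] = s;
        lp->top_count++;
    }
}
```

With these constants, allocating 12 segments and then freeing them touches the global pool four times (three bucket fetches and one bucket return) rather than once per segment, which is the point of the per-bucket operation.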
This patch introduces new memory pools for threads and tasks, which replace the existing ones.
I have confirmed that this PR works on POWER 9 and 64-bit ARM in addition to x86/64, with various compilers including GCC 4.8, 6.5, 8.3, and 9.2, Clang 3.9, 7.0, 9.0, and 10.0, and architecture-specific compilers such as ICC 18, 19, and 20 and XLC 16.
Note that this change might cause performance regressions; please let us know if you encounter any.