mempool: lock-free bucket-based memory pools for threads and tasks #183
A 128-bit atomic CAS is useful for implementing a lock-free LIFO, but most compiler intrinsics do not support 128-bit atomic operations. This patch implements 128-bit atomic compare-and-swap. Since some CPU models do not support the necessary instructions, availability is detected at configure time. Currently, x86/64, 64-bit ARM, and POWER 8 and 9 are supported. Note that, even if the CPU supports these instructions, some compilers fail to recognize them in inline assembly code: for example, Clang 9 and older do not recognize 128-bit LL/SC instructions on ARM and POWER.
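As a rough illustration of what such a primitive can look like on x86/64, here is a minimal sketch built on `lock cmpxchg16b` in GCC/Clang inline assembly. The names `u128_t` and `cas128` are hypothetical and are not taken from this patch. The spinlock path for other architectures is only an assumption that keeps the sketch portable; the patch itself detects availability at configure time.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 16-byte value; cmpxchg16b requires 16-byte alignment. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} __attribute__((aligned(16))) u128_t;

/* Returns 1 and stores *desired into *dst if *dst equals *expected.
 * Otherwise returns 0 and copies the observed value into *expected. */
static inline int cas128(volatile u128_t *dst, u128_t *expected,
                         const u128_t *desired)
{
#if defined(__x86_64__)
    unsigned char ok;
    /* RDX:RAX holds the expected value, RCX:RBX the desired value.
     * On failure the CPU loads the current value into RDX:RAX. */
    __asm__ __volatile__("lock cmpxchg16b %1\n\t"
                         "sete %0"
                         : "=q"(ok), "+m"(*dst),
                           "+a"(expected->lo), "+d"(expected->hi)
                         : "b"(desired->lo), "c"(desired->hi)
                         : "memory", "cc");
    return ok;
#else
    /* Assumption for this sketch only: a blocking fallback guarded by
     * a single global spinlock. */
    static atomic_flag lock = ATOMIC_FLAG_INIT;
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;
    int ok = (dst->lo == expected->lo && dst->hi == expected->hi);
    if (ok) {
        dst->lo = desired->lo;
        dst->hi = desired->hi;
    } else {
        expected->lo = dst->lo;
        expected->hi = dst->hi;
    }
    atomic_flag_clear_explicit(&lock, memory_order_release);
    return ok;
#endif
}
```

On failure, the caller can retry immediately with the refreshed `*expected`, which is the usual shape of a CAS loop.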
Atomic tagged pointer operations (void * + size_t) are implemented primarily for the lock-free LIFO. They require a special instruction (i.e., a 128-bit atomic CAS is needed on a 64-bit OS), so not all environments support them. If this atomic type is supported, ABTD_ATOMIC_SUPPORT_TAGGED_PTR is defined. Note that this atomic type is special and therefore provides only weak CAS and non-atomic acquire/release/relaxed load and store.
ABTI_sync_lifo is a scalable LIFO implementation that does not have the ABA problem. If an atomic tagged pointer operation is supported (i.e., most x86/64 with ICC, GCC, Clang, 64-bit ARM with GCC, and POWER 8 and 9 with XLC and GCC), push and pop are lock-free. If not, it falls back on a spinlock-based blocking implementation.
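To show how a tag avoids the ABA problem, here is a minimal sketch of a tagged LIFO. It is not the ABTI_sync_lifo code: as a simplifying assumption, elements are identified by 32-bit indices instead of pointers, so the index and the tag fit in one 64-bit word and a standard C11 64-bit CAS suffices. All names (`tagged_lifo_t`, `lifo_push`, `lifo_pop`, `LIFO_CAP`, `LIFO_NIL`) are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define LIFO_CAP 1024u        /* hypothetical fixed number of elements */
#define LIFO_NIL UINT32_MAX   /* hypothetical "empty" marker */

typedef struct {
    _Atomic uint64_t head;           /* low 32 bits: top index, high 32 bits: tag */
    _Atomic uint32_t next[LIFO_CAP]; /* next[i]: index of the element below i */
} tagged_lifo_t;

static inline uint64_t lifo_pack(uint32_t idx, uint32_t tag)
{
    return ((uint64_t)tag << 32) | idx;
}

static void lifo_init(tagged_lifo_t *l)
{
    atomic_init(&l->head, lifo_pack(LIFO_NIL, 0));
}

static void lifo_push(tagged_lifo_t *l, uint32_t idx)
{
    uint64_t old = atomic_load_explicit(&l->head, memory_order_relaxed);
    uint64_t desired;
    do {
        /* Link the new element to the current top, then try to publish it. */
        atomic_store_explicit(&l->next[idx], (uint32_t)old,
                              memory_order_relaxed);
        desired = lifo_pack(idx, (uint32_t)(old >> 32) + 1);
    } while (!atomic_compare_exchange_weak_explicit(
        &l->head, &old, desired, memory_order_release, memory_order_relaxed));
}

static uint32_t lifo_pop(tagged_lifo_t *l)
{
    uint64_t old = atomic_load_explicit(&l->head, memory_order_acquire);
    for (;;) {
        uint32_t top = (uint32_t)old;
        if (top == LIFO_NIL)
            return LIFO_NIL;
        uint32_t below =
            atomic_load_explicit(&l->next[top], memory_order_relaxed);
        /* The tag changes on every update, so a head that was popped and
         * pushed back in the meantime no longer matches `old`. */
        uint64_t desired = lifo_pack(below, (uint32_t)(old >> 32) + 1);
        if (atomic_compare_exchange_weak_explicit(&l->head, &old, desired,
                                                  memory_order_acquire,
                                                  memory_order_acquire))
            return top;
    }
}
```

With real pointers, the head grows to a pointer plus a tag (16 bytes on a 64-bit OS), which is why the lock-free path depends on a 128-bit CAS.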
ABTI_mem_pool is a generic memory pool implementation. The basic algorithm is similar to the current memory pool implementation: it first accesses a local memory pool, and then accesses a global memory pool if the local pool is empty or full. The advantages of the new algorithm are as follows:
- Generic: the implementation takes a segment size as a runtime argument.
- Per-bucket operation: not a single memory segment but multiple segments in a "bucket" are transferred when accessing a global memory pool, which reduces the number of global pool accesses. The number of local buckets and the bucket size are constant, so the local pool does not keep too many segments.
- Lock-free: entire push-pop operations are lock-free (on most CPUs). No ABA problem.
The current memory pool implementations should be replaced by this.
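The per-bucket idea can be sketched as follows. This is a simplified, single-threaded illustration and not the ABTI_mem_pool code: the global pool is a plain array of buckets rather than a lock-free LIFO, malloc failures are not checked, and every name and constant (`BUCKET_SIZE`, `LOCAL_BUCKETS`, `GLOBAL_BUCKETS`, `pool_alloc`, `pool_free`, `num_accesses`) is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { BUCKET_SIZE = 4, LOCAL_BUCKETS = 2, GLOBAL_BUCKETS = 64 };

/* A bucket groups BUCKET_SIZE segments that move together. */
typedef struct {
    void *seg[BUCKET_SIZE];
} bucket_t;

typedef struct {
    size_t seg_size;                  /* segment size, chosen at run time */
    bucket_t buckets[GLOBAL_BUCKETS];
    size_t num_buckets;
    size_t num_accesses;              /* counts global pool accesses */
} global_pool_t;

typedef struct {
    global_pool_t *global;
    void *seg[LOCAL_BUCKETS * BUCKET_SIZE]; /* strict local capacity */
    size_t num_segs;
} local_pool_t;

static void *pool_alloc(local_pool_t *lp)
{
    if (lp->num_segs == 0) {
        /* Local pool is empty: take one whole bucket from the global pool,
         * or allocate a fresh bucket if the global pool has none. */
        global_pool_t *gp = lp->global;
        gp->num_accesses++;
        if (gp->num_buckets > 0) {
            memcpy(lp->seg, gp->buckets[--gp->num_buckets].seg,
                   sizeof(void *) * BUCKET_SIZE);
        } else {
            for (int i = 0; i < BUCKET_SIZE; i++)
                lp->seg[i] = malloc(gp->seg_size);
        }
        lp->num_segs = BUCKET_SIZE;
    }
    return lp->seg[--lp->num_segs];
}

static void pool_free(local_pool_t *lp, void *seg)
{
    if (lp->num_segs == LOCAL_BUCKETS * BUCKET_SIZE) {
        /* Local pool is full: hand one whole bucket back to the global
         * pool, or release it if the global pool is full too. */
        global_pool_t *gp = lp->global;
        gp->num_accesses++;
        lp->num_segs -= BUCKET_SIZE;
        if (gp->num_buckets < GLOBAL_BUCKETS) {
            memcpy(gp->buckets[gp->num_buckets++].seg,
                   &lp->seg[lp->num_segs], sizeof(void *) * BUCKET_SIZE);
        } else {
            for (int i = 0; i < BUCKET_SIZE; i++)
                free(lp->seg[lp->num_segs + i]);
        }
    }
    lp->seg[lp->num_segs++] = seg;
}
```

With these constants, 12 consecutive allocations touch the global pool only 3 times, once per bucket, and the local pool never holds more than 8 segments.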
This patch introduces new memory pools for threads and tasks, which replace the existing ones.
I have finally confirmed that this PR works on POWER 9 and 64-bit ARM in addition to x86/64, with various compilers including GCC 4.8, 6.5, 8.3, and 9.2, Clang 3.9, 7.0, 9.0, and 10.0, and architecture-specific compilers such as ICC 18, 19, and 20 and XLC 16. Note that this change might cause some performance issues; please tell us if you encounter any.
Problems
The current memory pools for threads and tasks use the following algorithms:
Thread pool
ABT_MEM_MAX_NUM_STACKS
).ABT_MEM_MAX_NUM_STACKS
of elements.
Task pool
Both have several issues:
Solution
This PR creates a generic lock-free bucket-based memory pool that allows users to set the strict capacity of every local pool. The new implementation has the following merits:
Issues
The new algorithm is not always faster than the existing implementations. Because of differences in memory access order and cache access patterns, I observed a 60% slowdown (and up to a 2500% speedup) with fork-join microbenchmarks. I set the local pool capacity arbitrarily, and this tuning can also hurt performance in some cases. For now, because of the ABA problem (#178), I do not think the current implementation is better than this one, but if this PR noticeably changes your application's performance, please let me/us know so that we can fix it.