A lock-free bounded queue #83

polytypic · 2023-07-14T13:28:16Z

This implements a lock-free bounded queue — blocking using domain-local-await.

The data structure / algorithm is based on the Michael-Scott queue extended with length maintenance and caching of remaining capacity in such a way that additional contention is avoided in the happy paths.

TODO:

fix opam files to build on CI
use backoff from separate package
- fix dune-project
try_push, length (can be done in O(1)), is_empty
tests
ability to change capacity
benchmarking
consider whether making wakeups fair is worth it (currently all awaiters are woken up so there is no fairness)
documentation

lyrm · 2023-10-20T09:55:10Z

src/cue.ml

+  | Null -> None
+  | Node r as node ->
+      let value = r.value in
+      if Atomic.get t.head != node then peek_opt t else Some value


if Atomic.get t.head != node then ...

There is a small bug here, it should be :

if Atomic.get t.head != head then ...

Same in peek.

There is still some bugs between peek_opt and pop_opt.

| Try_push 42 : true | .------------------------------------. | | Peek_opt : Some (0) Pop_opt : Some (42)

I will continue investigating !

Good catch! Yes, this is a bug. What would ideally be needed here is a(n acquire) fence to prevent the r.value read from being reordered after the Atomic.get, but OCaml currently doesn't provide a direct way to do only that. Some alternatives:

value could be turned into an atomic and then r.value would be replaced with Atomic.get r.value. This would be quite expensive.

Instead of Atomic.get t.head == head one could use Atomic.compare_and_set t.head head head. This is also somewhat expensive, because it means that the t.head location is actually written to, which causes additional contention.

In practise, but not in the OCaml memory model, which doesn't match actual hardware, it is also possible to generate fences by accessing an unrelated location. For example, one could define
let fence () = Atomic.set (Atomic.make 0) 0
and use that to write
let value = r.value in fence (); if Atomic.get t.head != head then ...
This is technically not guaranteed by the OCaml memory model. However, I'm not aware of any hardware where that would not work, because hardware doesn't track fences (or "frontiers") per memory location like the OCaml memory model does (my understanding of the physics is that it would be too expensive / impractical / wouldn't make any sense to do that at the hardware level).

The fence trick seems to work !

This may actually be a compiler bug.

The OCaml 5 compiler definitely has a bug related to an incorrect optimization of Atomic.get operations.

What remains to consider is whether it is actually allowed to reorder a nonatomic load that precedes an atomic load to happen after the atomic load. I suspect it is not actually allowed. See the explanation of fences in this post about fences in C++11:

An acquire fence prevents the memory reordering of any read which precedes it in program order with any read or write which follows it in program order.

This, of course, is not exactly the same as the semantics of Atomic.get in OCaml 5, but according to the above explanation the original code

let value = r.value in (* The above read should not be allowed to happen after the read below as the read below has an acquire fence. *) if Atomic.get t.head != head then ...

would be correct. I'll try to redecipher the Bounding Data Races in Space and Time paper to find definite word on this.

I actually checked with Godbolt that, indeed, in this queue case, what happens is that the compiler optimizes the second Atomic.get t.head away and that is what causes the code to misbehave.

Assuming this is just a compiler bug, then the following workaround would be sufficient:

let value = r.value in (* The [Sys.opaque_identity] below prevents OCaml 5 from optimizing the repeated load away. *) if Atomic.get (Sys.opaque_identity t.head) != head then ...

polytypic · 2024-03-01T11:06:57Z

FYI, I will squash this PR to a single commit to make this PR a little bit easier to work with.

There was a trivial bug using a wrong variable. Additionally, it turns out the OCaml 5 compiler incorrectly optimized away a repeated `Atomic.get`. This commit works around that compiler bug by using `Sys.opaque_identity` to provent the compiler from figuring out that the location has been read before.

The latest version of DLA includes bug fixes.

polytypic mentioned this pull request Jul 14, 2023

No bounded version of ws_deque and mpsc_queue? #52

Open

polytypic force-pushed the lock-free-bounded-queue branch 3 times, most recently from 3bee128 to 79f3692 Compare July 15, 2023 19:54

polytypic mentioned this pull request Jul 18, 2023

change the domain API? ocaml/ocaml#12385

Open

polytypic force-pushed the lock-free-bounded-queue branch from 79f3692 to d6bc981 Compare July 27, 2023 13:08

polytypic force-pushed the lock-free-bounded-queue branch from d6bc981 to 98804fa Compare August 21, 2023 07:50

lyrm reviewed Oct 20, 2023

View reviewed changes

polytypic force-pushed the lock-free-bounded-queue branch 4 times, most recently from d4b7e51 to e548e50 Compare November 3, 2023 13:31

This was referenced Nov 3, 2023

Issues due to OCaml 5 compiler bug with Atomic.get #97

Closed

disable CSE for atomic loads ocaml/ocaml#12715

Merged

polytypic force-pushed the lock-free-bounded-queue branch 2 times, most recently from 0084799 to 2aae21c Compare November 20, 2023 17:59

polytypic force-pushed the lock-free-bounded-queue branch 4 times, most recently from 261eabb to 928660b Compare January 11, 2024 14:40

polytypic force-pushed the lock-free-bounded-queue branch 2 times, most recently from 12a77c7 to 46c2cae Compare January 14, 2024 12:28

lyrm mentioned this pull request Jan 24, 2024

Saturn 1.0 : progress tracking #123

Open

12 tasks

Sudha247 added this to the 1.0 milestone Jan 29, 2024

polytypic force-pushed the lock-free-bounded-queue branch 3 times, most recently from b47847c to 3749cbf Compare February 17, 2024 17:22

polytypic force-pushed the lock-free-bounded-queue branch from 3749cbf to 5e7d637 Compare March 1, 2024 10:16

polytypic force-pushed the lock-free-bounded-queue branch from 734c367 to 47cdeb1 Compare March 1, 2024 11:49

polytypic force-pushed the lock-free-bounded-queue branch from 47cdeb1 to 4accc07 Compare March 1, 2024 13:52

polytypic force-pushed the lock-free-bounded-queue branch from 4accc07 to 3f75aea Compare March 13, 2024 07:47

polytypic and others added 10 commits April 2, 2024 19:19

A lock-free bounded queue

6045520

STM test.

44bd147

Upgrade ocamlformat

cde6bd8

Work around compiler bug in snapshot

cb8e726

Move Cue under Saturn_lockfree

c363e83

Bench

de7924e

Require latest version of DLA

b663cbe

The latest version of DLA includes bug fixes.

Use GADT to make typing somewhat more precise

f6752d7

Fix STM test to build on OCaml 4

5565737

polytypic force-pushed the lock-free-bounded-queue branch from 3f75aea to 5565737 Compare April 2, 2024 16:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A lock-free bounded queue #83

A lock-free bounded queue #83

polytypic commented Jul 14, 2023 •

edited

lyrm Oct 20, 2023

lyrm Oct 20, 2023

polytypic Oct 24, 2023 •

edited

lyrm Oct 24, 2023

polytypic Nov 3, 2023 •

edited

polytypic Nov 3, 2023

polytypic commented Mar 1, 2024

A lock-free bounded queue #83

Are you sure you want to change the base?

A lock-free bounded queue #83

Conversation

polytypic commented Jul 14, 2023 • edited

lyrm Oct 20, 2023

Choose a reason for hiding this comment

lyrm Oct 20, 2023

Choose a reason for hiding this comment

polytypic Oct 24, 2023 • edited

Choose a reason for hiding this comment

lyrm Oct 24, 2023

Choose a reason for hiding this comment

polytypic Nov 3, 2023 • edited

Choose a reason for hiding this comment

polytypic Nov 3, 2023

Choose a reason for hiding this comment

polytypic commented Mar 1, 2024

polytypic commented Jul 14, 2023 •

edited

polytypic Oct 24, 2023 •

edited

polytypic Nov 3, 2023 •

edited