Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A lock-free bounded queue #83

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from
Draft

A lock-free bounded queue #83

wants to merge 10 commits into from

Conversation

polytypic
Copy link
Contributor

@polytypic polytypic commented Jul 14, 2023

This implements a lock-free bounded queue — blocking using domain-local-await.

The data structure / algorithm is based on the Michael-Scott queue extended with length maintenance and caching of remaining capacity in such a way that additional contention is avoided in the happy paths.

TODO:

  • fix opam files to build on CI
  • use backoff from separate package
    • fix dune-project
  • try_push, length (can be done in O(1)), is_empty
  • tests
  • ability to change capacity
  • benchmarking
  • consider whether making wakeups fair is worth it (currently all awaiters are woken up so there is no fairness)
  • documentation

src/cue.ml Outdated
| Null -> None
| Node r as node ->
let value = r.value in
if Atomic.get t.head != node then peek_opt t else Some value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if Atomic.get t.head != node then ...

There is a small bug here, it should be :

if Atomic.get t.head != head then ...

Same in peek.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still some bugs between peek_opt and pop_opt.

                              |                        
                    Try_push 42 : true               
                              |                        
           .------------------------------------.
           |                                    |                        
  Peek_opt : Some (0)            Pop_opt : Some (42) 

I will continue investigating !

Copy link
Contributor Author

@polytypic polytypic Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Yes, this is a bug. What would ideally be needed here is a(n acquire) fence to prevent the r.value read from being reordered after the Atomic.get, but OCaml currently doesn't provide a direct way to do only that. Some alternatives:

  • value could be turned into an atomic and then r.value would be replaced with Atomic.get r.value. This would be quite expensive.
  • Instead of Atomic.get t.head == head one could use Atomic.compare_and_set t.head head head. This is also somewhat expensive, because it means that the t.head location is actually written to, which causes additional contention.
  • In practise, but not in the OCaml memory model, which doesn't match actual hardware, it is also possible to generate fences by accessing an unrelated location. For example, one could define
    let fence () = Atomic.set (Atomic.make 0) 0
    
    and use that to write
    let value = r.value in
    fence ();
    if Atomic.get t.head != head then ...
    
    This is technically not guaranteed by the OCaml memory model. However, I'm not aware of any hardware where that would not work, because hardware doesn't track fences (or "frontiers") per memory location like the OCaml memory model does (my understanding of the physics is that it would be too expensive / impractical / wouldn't make any sense to do that at the hardware level).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fence trick seems to work !

Copy link
Contributor Author

@polytypic polytypic Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may actually be a compiler bug.

The OCaml 5 compiler definitely has a bug related to an incorrect optimization of Atomic.get operations.

What remains to consider is whether it is actually allowed to reorder a nonatomic load that precedes an atomic load to happen after the atomic load. I suspect it is not actually allowed. See the explanation of fences in this post about fences in C++11:

An acquire fence prevents the memory reordering of any read which precedes it in program order with any read or write which follows it in program order.

This, of course, is not exactly the same as the semantics of Atomic.get in OCaml 5, but according to the above explanation the original code

let value = r.value in
(* The above read should not be allowed to happen after
    the read below as the read below has an acquire fence. *)
if Atomic.get t.head != head then ...

would be correct. I'll try to redecipher the Bounding Data Races in Space and Time paper to find definite word on this.

I actually checked with Godbolt that, indeed, in this queue case, what happens is that the compiler optimizes the second Atomic.get t.head away and that is what causes the code to misbehave.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this is just a compiler bug, then the following workaround would be sufficient:

let value = r.value in
(* The [Sys.opaque_identity] below prevents OCaml 5 from
    optimizing the repeated load away. *)
if Atomic.get (Sys.opaque_identity t.head) != head then ...

@polytypic polytypic force-pushed the lock-free-bounded-queue branch 4 times, most recently from d4b7e51 to e548e50 Compare November 3, 2023 13:31
@polytypic polytypic force-pushed the lock-free-bounded-queue branch 2 times, most recently from 0084799 to 2aae21c Compare November 20, 2023 17:59
@polytypic polytypic force-pushed the lock-free-bounded-queue branch 4 times, most recently from 261eabb to 928660b Compare January 11, 2024 14:40
@polytypic polytypic force-pushed the lock-free-bounded-queue branch 2 times, most recently from 12a77c7 to 46c2cae Compare January 14, 2024 12:28
@lyrm lyrm mentioned this pull request Jan 24, 2024
12 tasks
@Sudha247 Sudha247 added this to the 1.0 milestone Jan 29, 2024
@polytypic polytypic force-pushed the lock-free-bounded-queue branch 3 times, most recently from b47847c to 3749cbf Compare February 17, 2024 17:22
@polytypic
Copy link
Contributor Author

FYI, I will squash this PR to a single commit to make this PR a little bit easier to work with.

polytypic and others added 10 commits April 2, 2024 19:19
There was a trivial bug using a wrong variable.

Additionally, it turns out the OCaml 5 compiler incorrectly optimized away a
repeated `Atomic.get`.  This commit works around that compiler bug by using
`Sys.opaque_identity` to provent the compiler from figuring out that the
location has been read before.
The latest version of DLA includes bug fixes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants