Make Eio.Semaphore lock-free #398

Merged: 2 commits into ocaml-multicore:main, Jan 4, 2023

Conversation

talex5 (Collaborator) commented Jan 3, 2023

This also changes the semaphore benchmark. Before, two domains just passed control back and forth, which was really just timing how long it takes to wake a domain. Now, we keep the domains busy while waiting for the semaphore. We also now have 4 domains trying to use 2-4 resources, which is a more realistic use of a semaphore.

This is a little faster in the single domain case, and quite a bit faster with multiple domains:
(benchmark chart: semaphore)

```ocaml
(* This state is unreachable because we (the provider) haven't set this yet *)
assert false
in
aux ()
```
polytypic (Contributor) commented Jan 4, 2023

I do not yet fully understand the Cells abstraction. However, looking at the code here it seems that Resumed is a terminal state. If that is the case, then it may be safe to always exchange the state to Resumed. Like I said, I do not yet fully understand Cells so this might be wrong.

However, assuming that exchanging to Resumed is safe, the above could be changed to eliminate the aux loop:

```ocaml
let rec resume t =
  let cell = Cells.next_resume t.cells in
  match (Atomic.exchange cell Resumed : cell) with
  | Request r ->
    (* The common case: there was a waiter for the value *)
    r ()
  | (Cancelled
     (* The waker had finished cancelling. Ignore it and resume the next one. *)
    | Resumed
      (* We lost the race. Ignore and resume the next one. *)) ->
    resume t
  | (Empty
     (* The consumer had reserved this cell but not yet stored the request.
        We placed Resumed there and the consumer will handle it soon. *)
    | Cancelling
      (* The waker had started cancelling. We let it know we want to resume it
         and then let it handle it. *)) ->
    ()
```

(I edited the above to change the response to Resumed where the race was lost.)

Assuming it is safe to always set the cell to Resumed, the above version avoids the loop overhead and does only a single atomic read-write. Assuming there are not many racing to do the same, this may be slightly faster. This also likely results in a smaller amount of machine code.

talex5 (Collaborator, Author) commented Jan 4, 2023

One side-effect is that calling cancel twice currently reliably raises Invalid_argument, whereas with this it might return false. However, the only current user of this module (Semaphore) should never call cancel twice anyway.

It doesn't seem noticeably faster (with `dune exec -- ./bench/bench_semaphore.exe`, any effect is lost in the noise, on my machine anyway).

> Assuming there are not many racing to do the same this may be slightly faster.

The only thing we can be racing with is a single consumer thread. The next_resume means this is the only resume with this cell. Likewise, there is only one suspender with this cell. And if the suspender cancels, it is required to call cancel only from the same domain.

talex5 (Collaborator, Author):

> This also likely results in a smaller amount of machine code.

We could combine the Empty and Cancelled handlers to save some code though (they're identical). But maybe it's clearer as it is.

polytypic (Contributor):

Let me just play a devil's advocate a bit here. :)

Perhaps one possibility might be to merge Resumed and Cancelled into a single, let's say, Terminal state. There is a race to set the cell to Terminal state. The action depends on the state from which the Terminal state was reached.

> One side-effect is that calling cancel twice currently reliably raises Invalid_argument

About cancel. Hmm... Under which circumstances might one call cancel twice? (Asking because I'm not sure I understand the concern.)

Assuming there is a race to call cancel, I would tend to think it is important that cancel returns true at most once. That way, any follow-up actions to a successful cancellation are performed only once. If this makes sense, then the API might be more user-friendly without raising Invalid_argument. John Ousterhout talks about this in his talk on software design.
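A minimal sketch of that idea (the constructor and function names here are my assumptions, not the final Eio API): with a single terminal state, cancel becomes a CAS from Request to the terminal state, so it returns true for at most one caller and false thereafter, instead of raising:

```ocaml
(* Hedged sketch: a cell with one terminal state. *)
type cell =
  | In_transition                 (* suspender hasn't installed its request yet *)
  | Request of (unit -> unit)     (* waiter installed, ready to resume *)
  | Finished                      (* terminal: resumed or cancelled *)

let cancel (cell : cell Atomic.t) : bool =
  match Atomic.get cell with
  | Request _ as prev ->
    (* Succeeds for at most one caller; a racing resume wins otherwise. *)
    Atomic.compare_and_set cell prev Finished
  | Finished -> false        (* already resumed, or already cancelled *)
  | In_transition -> false   (* no request installed yet; nothing to cancel *)
```

Whichever of a racing resume and cancel installs Finished first wins, so any follow-up to a successful cancellation runs exactly once.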

talex5 (Collaborator, Author) commented Jan 4, 2023

> Perhaps one possibility might be to merge Resumed and Cancelled into a single, let's say, Terminal state.

Makes sense. I updated the PR (I called it Finished though, mainly because I started just before seeing your comment!).

> Under which circumstances might one call cancel twice?

It would always be a bug. I was just noting it as a change to this module's external behaviour (but it's only used by Semaphore anyway, and that will only call it once).

Assuming the new version is correct, I think we should stop and admire the fact that our version has only 4 states, while the original paper had 9! (https://arxiv.org/pdf/2111.12682.pdf, section 3.1, plus the REFUSE state added in 3.2)

Thanks!

```ocaml
(* To call [cancel] the user needs a [request] value,
   which they only get once we've reached the [Request] state.
   [Empty] is unreachable from [Request]. *)
assert false
```
polytypic (Contributor) commented Jan 4, 2023

Hmm... Speaking of reducing states, I wonder whether Empty and Cancelling states could actually be merged.

Is this the only place where they are handled differently? And both indicate a programming error?

Thinking of the approach here in more general terms, both Empty and Cancelling seem to be kind of InTransition states where the suspender is planning to do something and needs to indicate that to resumers.
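To make the merged-state idea concrete, here is a hedged sketch (type and function names are my assumptions, not the final Eio code) of a resumer that treats the single In_transition state uniformly: it exchanges the terminal state into the cell and only needs to know whether a token was delivered.

```ocaml
type cell =
  | In_transition                 (* suspender is about to install a request, or cancel *)
  | Request of (unit -> unit)     (* waiter installed *)
  | Finished                      (* terminal: resumed or cancelled *)

(* Returns [true] if the token was delivered (waiter woken, or left for a
   suspender in transition to pick up); [false] if the cell turned out to be
   cancelled, so the caller should move on to the next cell. *)
let resume_cell (cell : cell Atomic.t) : bool =
  match Atomic.exchange cell Finished with
  | Request r -> r (); true   (* common case: wake the waiter *)
  | In_transition -> true     (* suspender will see Finished and take the token *)
  | Finished -> false         (* was cancelled: try the next cell *)
```

The point of the merge is visible here: the resumer no longer distinguishes "cell reserved but empty" from "cancellation in progress"; both are In_transition and both are handed back to the suspender.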

talex5 (Collaborator, Author):

Good point - done :-)

```ocaml
(* No free resources.
   We must wait until one of the existing users increments the counter and resumes us.
   It's OK if they resume before we suspend; we'll just pick up the token they left. *)
Suspend.enter_unchecked (fun ctx enqueue ->
    match Sem_state.suspend t.state (fun () -> enqueue (Ok ())) with
```
polytypic (Contributor) commented Jan 4, 2023

Is there an opportunity here to avoid the Suspend.enter_unchecked call (which, I assume, captures the continuation) by exposing the Sem_state protocol a bit more?

Basically, require the acquirer to create the cell separately. After creating the cell, the acquirer would read the cell state (as Sem_state.suspend does at the moment when the CAS fails) before capturing the continuation. If the state has already been set to Finished, there is no need to capture a continuation; otherwise, capture the continuation and proceed as before. Basically, this results in a kind of double-checked pattern.

This could improve performance in highly contended cases where a resumer actually manages to see the InTransition (or Empty) state. The downside, of course, is making the Sem_state API/protocol more complex (adding an extra step).

talex5 (Collaborator, Author):

I don't think so. This case wouldn't work:

  1. The suspender decrements the count to say they're planning to suspend.
  2. The resumer resumes the cell.
  3. The suspender creates the cell.

We have to be able to initialise cells before the suspender does anything beyond modifying the counter. We don't want it creating cells before changing the counter because that includes the fast path (where we don't need a cell at all).
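The ordering constraint above can be sketched as follows (a hedged sketch: the counter name and helper are my assumptions, based on the diff's `s > 0` logic rather than the exact Eio code). The counter is decremented before any cell exists; only a non-positive pre-decrement value sends us to the slow path, so cells are never created on the common, uncontended path:

```ocaml
(* Fast path of acquire: decrement first, then decide whether to suspend. *)
let acquire_fast (count : int Atomic.t) : bool =
  let s = Atomic.fetch_and_add count (-1) in
  (* We got a token iff we decremented *from* a positive value.
     Otherwise the caller must take the slow path and allocate a cell. *)
  s > 0
```

Allocating a cell before touching the counter would tax exactly this fast path, which is why the cell must be creatable after the counter modification.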

polytypic (Contributor) commented Jan 4, 2023

Just to be clear, this is what I had in mind:

```diff
diff --git a/lib_eio/sem_state.ml b/lib_eio/sem_state.ml
index 24a301c..7ffebf6 100644
--- a/lib_eio/sem_state.ml
+++ b/lib_eio/sem_state.ml
@@ -109,6 +109,26 @@ let acquire t =
      which happens if we decremented *from* a positive one. *)
   s > 0
 
+let prepare_suspend t =
+  Cells.next_suspend t.cells
+
+let is_in_transition (_, (cell: cell Atomic.t)) =
+  match Atomic.get cell with
+  | In_transition -> true
+  | Finished | Request _ -> false
+
+let perform_suspend t (segment, (cell: cell Atomic.t)) k : request option =
+  if Atomic.compare_and_set cell In_transition (Request k) then Some (t, segment, cell)
+  else (
+    (* We got resumed before we could add the waiter. *)
+    k ();
+    None
+  )
+
 let suspend t k : request option =
   let (segment, cell) = Cells.next_suspend t.cells in
   if Atomic.compare_and_set cell In_transition (Request k) then Some (t, segment, cell)
diff --git a/lib_eio/semaphore.ml b/lib_eio/semaphore.ml
index 2733be5..a90fa64 100644
--- a/lib_eio/semaphore.ml
+++ b/lib_eio/semaphore.ml
@@ -20,21 +20,24 @@ let acquire t =
     (* No free resources.
        We must wait until one of the existing users increments the counter and resumes us.
        It's OK if they resume before we suspend; we'll just pick up the token they left. *)
-    Suspend.enter_unchecked (fun ctx enqueue ->
-        match Sem_state.suspend t.state (fun () -> enqueue (Ok ())) with
-        | None -> ()   (* Already resumed *)
-        | Some request ->
-          Ctf.note_try_read t.id;
-          match Fiber_context.get_error ctx with
-          | Some ex ->
-            if Sem_state.cancel request then enqueue (Error ex);
-            (* else already resumed *)
-          | None ->
-            Fiber_context.set_cancel_fn ctx (fun ex ->
-                if Sem_state.cancel request then enqueue (Error ex)
-                (* else already resumed *)
-              )
-      )
+    let segment_cell = Sem_state.prepare_suspend t.state in
+    (* We may have already been resumed at this point. So, check before capturing continuation. *)
+    if Sem_state.is_in_transition segment_cell then (
+      Suspend.enter_unchecked (fun ctx enqueue ->
+          match Sem_state.perform_suspend t.state segment_cell (fun () -> enqueue (Ok ())) with
+          | None -> ()   (* Already resumed *)
+          | Some request ->
+            Ctf.note_try_read t.id;
+            match Fiber_context.get_error ctx with
+            | Some ex ->
+              if Sem_state.cancel request then enqueue (Error ex);
+              (* else already resumed *)
+            | None ->
+              Fiber_context.set_cancel_fn ctx (fun ex ->
+                  if Sem_state.cancel request then enqueue (Error ex)
+                  (* else already resumed *)
+                )
+        )
+    )
   );
   Ctf.note_read t.id
```

talex5 (Collaborator, Author):

Ah, I see. Pushing some of the work outside the suspend, in case we can skip suspending by the time it's done, at the cost of an extra Atomic.get.

I tried measuring this by incrementing slow and fast counters in is_in_transition; running the benchmark gives:

`slow=850475, fast=75, frac=0.01%`

So probably not worth it, I think.

talex5 (Collaborator, Author):

OK, I guess I should have disabled the single-process runs before measuring that... it's `slow=51221, fast=98, frac=0.19%` now (the fast path is taken in 0.19% of the slow-path cases). But I don't think that changes anything.

This uses the new Cells module to replace the use of a mutex.

Co-authored-by: Vesa Karvonen <vesa.a.j.k@gmail.com>
@talex5 talex5 merged commit 32c26ab into ocaml-multicore:main Jan 4, 2023
@talex5 talex5 deleted the lock-free-semaphore branch January 4, 2023 17:47
talex5 added a commit to talex5/opam-repository that referenced this pull request Feb 1, 2023
CHANGES:

New features:

- Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408).
  Runs an accept loop in one or more domains, with cancellation and graceful shutdown,
  and an optional maximum number of concurrent connections.

- Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399).
  Parse numbers in various binary formats.

- Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418).

Performance:

- Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381).
  In addition to being faster, this allows using conditions in signal handlers.

- Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398).

- Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411).

- Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401).

Bug fixes:

- eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428).
  Previously, we could fail to submit a job promptly because the SQE queue was full.

- Fix luv signals (@haesbaert ocaml-multicore/eio#412).
  `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run.

- eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421).

- eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb).

- Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418).

Documentation:

- Add example programs (@talex5 ocaml-multicore/eio#389).

- Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417).

- Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394).

- Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395).

- Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426).

- Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393).

Other changes:

- Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420).

- Remove debug-level logging (@talex5 ocaml-multicore/eio#403).

- eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414).

- Update to Dune 3 (@talex5 ocaml-multicore/eio#410).

- Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404).

- Simplify cancellation logic (@talex5 ocaml-multicore/eio#396).

- time: `Mtime.Span.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).