Make Eio.Semaphore lock-free #398

Merged: 2 commits into ocaml-multicore:main, Jan 4, 2023

Conversation

talex5 (Collaborator) commented Jan 3, 2023

This also changes the semaphore benchmark. Before, two domains just passed control back and forth, which was really just timing how long it takes to wake a domain. Now, we keep the domains busy while waiting for the semaphore. We also now have 4 domains trying to use 2-4 resources, which is a more realistic use of a semaphore.

This is a little faster in the single domain case, and quite a bit faster with multiple domains:
(benchmark chart: semaphore)

```ocaml
(* This state is unreachable because we (the provider) haven't set this yet *)
assert false
in
aux ()
```
polytypic (Contributor) commented Jan 4, 2023

I do not yet fully understand the Cells abstraction. However, looking at the code here it seems that Resumed is a terminal state. If that is the case, then it may be safe to always exchange the state to Resumed. Like I said, I do not yet fully understand Cells so this might be wrong.

However, assuming that exchanging to Resumed is safe, the above could be changed to eliminate the aux loop:

```ocaml
let rec resume t =
  let cell = Cells.next_resume t.cells in
  match (Atomic.exchange cell Resumed : cell) with
  | Request r ->
    (* The common case: there was a waiter for the value *)
    r ()
  | (Cancelled
     (* The waker had finished cancelling. Ignore it and resume the next one. *)
    | Resumed
      (* We lost the race. Ignore and resume the next one. *)) ->
    resume t
  | (Empty
     (* The consumer had reserved this cell but not yet stored the request.
        We placed Resumed there and the consumer will handle it soon. *)
    | Cancelling
      (* The waker had started cancelling. We let it know we want to resume it
         and then let it handle it. *)) ->
    ()
```

(I edited the above to change the response to Resumed where the race was lost.)

Assuming it is safe to always set the cell to Resumed, the above version avoids the loop overhead and does only a single atomic read-write. Assuming there are not many racing to do the same, this may be slightly faster. This also likely results in a smaller amount of machine code.

talex5 (Collaborator, Author) commented Jan 4, 2023

One side-effect is that calling cancel twice currently reliably raises Invalid_argument, whereas with this it might return false. However, the only current user of this module (Semaphore) should never call cancel twice anyway.

It doesn't seem noticeably faster (with `dune exec -- ./bench/bench_semaphore.exe`, any effect is lost in the noise, on my machine anyway).

> Assuming there are not many racing to do the same this may be slightly faster.

The only thing we can be racing with is a single consumer thread. The next_resume means this is the only resume with this cell. Likewise, there is only one suspender with this cell. And if the suspender cancels, it is required to call cancel only from the same domain.

talex5 (Collaborator, Author):

> This also likely results in a smaller amount of machine code.

We could combine the Empty and Cancelled handlers to save some code though (they're identical). But maybe it's clearer as it is.

polytypic (Contributor):

Let me just play a devil's advocate a bit here. :)

Perhaps one possibility might be to merge Resumed and Cancelled into a single, let's say, Terminal state. There is a race to set the cell to Terminal state. The action depends on the state from which the Terminal state was reached.

> One side-effect is that calling cancel twice currently reliably raises Invalid_argument

About cancel. Hmm... Under which circumstances might one call cancel twice? (Asking because I'm not sure I understand the concern.)

Assuming there is a race to call cancel, I would tend to think it is important that cancel returns true at most once. That way, any follow-up actions to a successful cancellation are performed only once. If this makes sense, then the API might be more user-friendly without raising Invalid_argument. John Ousterhout talks about this in his talk on software design.
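A minimal sketch of that idea (the constructor and function names here are my assumptions, not the final Eio API): with a single terminal state, cancel becomes a CAS from Request to the terminal state, so it returns true for at most one caller and false thereafter, instead of raising:

```ocaml
(* Hedged sketch: a cell with one terminal state. *)
type cell =
  | In_transition                 (* suspender hasn't installed its request yet *)
  | Request of (unit -> unit)     (* waiter installed, ready to resume *)
  | Finished                      (* terminal: resumed or cancelled *)

let cancel (cell : cell Atomic.t) : bool =
  match Atomic.get cell with
  | Request _ as prev ->
    (* Succeeds for at most one caller; a racing resume wins otherwise. *)
    Atomic.compare_and_set cell prev Finished
  | Finished -> false        (* already resumed, or already cancelled *)
  | In_transition -> false   (* no request installed yet; nothing to cancel *)
```

Whichever of a racing resume and cancel installs Finished first wins, so any follow-up to a successful cancellation runs exactly once.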

talex5 (Collaborator, Author) commented Jan 4, 2023

> Perhaps one possibility might be to merge Resumed and Cancelled into a single, let's say, Terminal state.

Makes sense. I updated the PR (I called it Finished though, mainly because I started just before seeing your comment!).

> Under which circumstances might one call cancel twice?

It would always be a bug. I was just noting it as a change to this module's external behaviour (but it's only used by Semaphore anyway, and that will only call it once).

Assuming the new version is correct, I think we should stop and admire the fact that our version has only 4 states, while the original paper had 9! (https://arxiv.org/pdf/2111.12682.pdf, section 3.1, plus the REFUSE state added in 3.2)

Thanks!

```ocaml
(* To call [cancel] the user needs a [request] value,
   which they only get once we've reached the [Request] state.
   [Empty] is unreachable from [Request]. *)
assert false
```
polytypic (Contributor) commented Jan 4, 2023

Hmm... Speaking of reducing states, I wonder whether Empty and Cancelling states could actually be merged.

Is this the only place where they are handled differently? And both indicate a programming error?

Thinking of the approach here in more general terms, both Empty and Cancelling seem to be kind of InTransition states where the suspender is planning to do something and needs to indicate that to resumers.
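To make the merged-state idea concrete, here is a hedged sketch (type and function names are my assumptions, not the final Eio code) of a resumer that treats the single In_transition state uniformly: it exchanges the terminal state into the cell and only needs to know whether a token was delivered.

```ocaml
type cell =
  | In_transition                 (* suspender is about to install a request, or cancel *)
  | Request of (unit -> unit)     (* waiter installed *)
  | Finished                      (* terminal: resumed or cancelled *)

(* Returns [true] if the token was delivered (waiter woken, or left for a
   suspender in transition to pick up); [false] if the cell turned out to be
   cancelled, so the caller should move on to the next cell. *)
let resume_cell (cell : cell Atomic.t) : bool =
  match Atomic.exchange cell Finished with
  | Request r -> r (); true   (* common case: wake the waiter *)
  | In_transition -> true     (* suspender will see Finished and take the token *)
  | Finished -> false         (* was cancelled: try the next cell *)
```

The point of the merge is visible here: the resumer no longer distinguishes "cell reserved but empty" from "cancellation in progress"; both are In_transition and both are handed back to the suspender.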

talex5 (Collaborator, Author):

Good point - done :-)

```ocaml
(* No free resources.
   We must wait until one of the existing users increments the counter and resumes us.
   It's OK if they resume before we suspend; we'll just pick up the token they left. *)
Suspend.enter_unchecked (fun ctx enqueue ->
    match Sem_state.suspend t.state (fun () -> enqueue (Ok ())) with
```
polytypic (Contributor) commented Jan 4, 2023

Is there an opportunity here to avoid the Suspend.enter_unchecked call (which, I assume, captures the continuation) by exposing the Sem_state protocol a bit more?

Basically, require the acquirer to create the cell separately. After creating the cell, the acquirer would read the cell state (as Sem_state.suspend does at the moment when the CAS fails) before capturing the continuation. If the state has already been set to Finished, there is no need to capture a continuation; otherwise, capture the continuation and proceed as before. Basically, this results in a kind of double-checked pattern.

This could improve performance in highly contended cases where a resumer actually manages to see the InTransition (or Empty) state. The downside, of course, is making the Sem_state API/protocol more complex (adding an extra step).

talex5 (Collaborator, Author):

I don't think so. This case wouldn't work:

  1. The suspender decrements the count to say they're planning to suspend.
  2. The resumer resumes the cell.
  3. The suspender creates the cell.

We have to be able to initialise cells before the suspender does anything beyond modifying the counter. We don't want it creating cells before changing the counter because that includes the fast path (where we don't need a cell at all).
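The ordering constraint above can be sketched as follows (a hedged sketch: the counter name and helper are my assumptions, based on the diff's `s > 0` logic rather than the exact Eio code). The counter is decremented before any cell exists; only a non-positive pre-decrement value sends us to the slow path, so cells are never created on the common, uncontended path:

```ocaml
(* Fast path of acquire: decrement first, then decide whether to suspend. *)
let acquire_fast (count : int Atomic.t) : bool =
  let s = Atomic.fetch_and_add count (-1) in
  (* We got a token iff we decremented *from* a positive value.
     Otherwise the caller must take the slow path and allocate a cell. *)
  s > 0
```

Allocating a cell before touching the counter would tax exactly this fast path, which is why the cell must be creatable after the counter modification.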

polytypic (Contributor) commented Jan 4, 2023

Just to be clear, this is what I had in mind:

```diff
diff --git a/lib_eio/sem_state.ml b/lib_eio/sem_state.ml
index 24a301c..7ffebf6 100644
--- a/lib_eio/sem_state.ml
+++ b/lib_eio/sem_state.ml
@@ -109,6 +109,26 @@ let acquire t =
      which happens if we decremented *from* a positive one. *)
   s > 0
 
+let prepare_suspend t =
+  Cells.next_suspend t.cells
+
+let is_in_transition (_, (cell: cell Atomic.t)) =
+  match Atomic.get cell with
+  | In_transition -> true
+  | Finished | Request _ -> false
+
+let perform_suspend t (segment, (cell: cell Atomic.t)) k : request option =
+  if Atomic.compare_and_set cell In_transition (Request k) then Some (t, segment, cell)
+  else (
+    (* We got resumed before we could add the waiter. *)
+    k ();
+    None
+  )
+
 let suspend t k : request option =
   let (segment, cell) = Cells.next_suspend t.cells in
   if Atomic.compare_and_set cell In_transition (Request k) then Some (t, segment, cell)
diff --git a/lib_eio/semaphore.ml b/lib_eio/semaphore.ml
index 2733be5..a90fa64 100644
--- a/lib_eio/semaphore.ml
+++ b/lib_eio/semaphore.ml
@@ -20,21 +20,24 @@ let acquire t =
     (* No free resources.
        We must wait until one of the existing users increments the counter and resumes us.
        It's OK if they resume before we suspend; we'll just pick up the token they left. *)
-    Suspend.enter_unchecked (fun ctx enqueue ->
-        match Sem_state.suspend t.state (fun () -> enqueue (Ok ())) with
-        | None -> ()   (* Already resumed *)
-        | Some request ->
-          Ctf.note_try_read t.id;
-          match Fiber_context.get_error ctx with
-          | Some ex ->
-            if Sem_state.cancel request then enqueue (Error ex);
-            (* else already resumed *)
-          | None ->
-            Fiber_context.set_cancel_fn ctx (fun ex ->
-                if Sem_state.cancel request then enqueue (Error ex)
-                (* else already resumed *)
-              )
-      )
+    let segment_cell = Sem_state.prepare_suspend t.state in
+    (* We may have already been resumed at this point. So, check before capturing continuation. *)
+    if Sem_state.is_in_transition segment_cell then (
+      Suspend.enter_unchecked (fun ctx enqueue ->
+          match Sem_state.perform_suspend t.state segment_cell (fun () -> enqueue (Ok ())) with
+          | None -> ()   (* Already resumed *)
+          | Some request ->
+            Ctf.note_try_read t.id;
+            match Fiber_context.get_error ctx with
+            | Some ex ->
+              if Sem_state.cancel request then enqueue (Error ex);
+              (* else already resumed *)
+            | None ->
+              Fiber_context.set_cancel_fn ctx (fun ex ->
+                  if Sem_state.cancel request then enqueue (Error ex)
+                  (* else already resumed *)
+                )
+        )
+    )
   );
   Ctf.note_read t.id
```

talex5 (Collaborator, Author):

Ah, I see. Pushing some of the work outside the suspend, in case we can skip suspending by the time it's done, at the cost of an extra Atomic.get.

I tried measuring this by incrementing slow and fast counters in is_in_transition; running the benchmark gives:

`slow=850475, fast=75, frac=0.01%`

So probably not worth it, I think.

talex5 (Collaborator, Author):

OK, I guess I should have disabled the single-process runs before measuring that... it's `slow=51221, fast=98, frac=0.19%` now (the fast path is taken in 0.19% of the slow-path cases). But I don't think that changes anything.

This uses the new Cells module to replace the use of a mutex.

Co-authored-by: Vesa Karvonen <vesa.a.j.k@gmail.com>
@talex5 talex5 merged commit 32c26ab into ocaml-multicore:main Jan 4, 2023
@talex5 talex5 deleted the lock-free-semaphore branch January 4, 2023 17:47
talex5 added a commit to talex5/opam-repository that referenced this pull request Feb 1, 2023
CHANGES:

New features:

- Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408).
  Runs an accept loop in one or more domains, with cancellation and graceful shutdown,
  and an optional maximum number of concurrent connections.

- Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399).
  Parse numbers in various binary formats.

- Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418).

Performance:

- Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381).
  In addition to being faster, this allows using conditions in signal handlers.

- Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398).

- Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411).

- Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401).

Bug fixes:

- eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428).
  Previously, we could fail to submit a job promptly because the SQE queue was full.

- Fix luv signals (@haesbaert ocaml-multicore/eio#412).
  `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run.

- eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421).

- eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb).

- Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418).

Documentation:

- Add example programs (@talex5 ocaml-multicore/eio#389).

- Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417).

- Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394).

- Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395).

- Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426).

- Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393).

Other changes:

- Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420).

- Remove debug-level logging (@talex5 ocaml-multicore/eio#403).

- eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414).

- Update to Dune 3 (@talex5 ocaml-multicore/eio#410).

- Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404).

- Simplify cancellation logic (@talex5 ocaml-multicore/eio#396).

- time: `Mtime.Span.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).