Add blocking #32

Merged
@polytypic merged 2 commits into main from add-blocking on Apr 28, 2023

@polytypic (Contributor) commented Feb 28, 2023

This PR adds blocking support to kcas and basically turns kcas into a proper software transactional memory (STM) implementation.

For blocking, this PR introduces an experimental domain local await mechanism. It is a first-order interface inspired by the proposed "rendezvous" mechanism in a lockfree PR, with support for cancellation. Ultimately, some sort of unified interface for suspend functionality should be provided by some other library, possibly the Stdlib.
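To make "first-order interface" concrete: the waiting side obtains a pair of plain closures, publishes one of them where a writer can find it, and calls the other to block. The sketch below is a hypothetical, stdlib-only model (the names `t`, `release`, `await`, and `prepare_for_await` are illustrative assumptions, not the PR's API) whose default strategy blocks the calling domain on a mutex and condition variable; a scheduler such as a fiber library would supply its own pair instead. Cancellation is not modeled.

```ocaml
(* Hypothetical first-order await interface: two closures, no effects.
   The waiter publishes [release] somewhere a writer can find it and then
   calls [await]. *)
type t = { release : unit -> unit; await : unit -> unit }

(* Default strategy: block the calling domain on a mutex/condition pair. *)
let prepare_for_await () : t =
  let mutex = Mutex.create () in
  let condition = Condition.create () in
  let released = ref false in
  let release () =
    Mutex.lock mutex;
    released := true;
    Condition.broadcast condition;
    Mutex.unlock mutex
  in
  let await () =
    Mutex.lock mutex;
    (* Loop to tolerate spurious wakeups; returns at once if already released. *)
    while not !released do
      Condition.wait condition mutex
    done;
    Mutex.unlock mutex
  in
  { release; await }

(* Usage: this domain waits until another domain releases it. *)
let () =
  let t = prepare_for_await () in
  let releaser = Domain.spawn (fun () -> t.release ()) in
  t.await ();
  Domain.join releaser
```

Keeping the interface first-order (just closures) is what lets different schedulers plug in their own suspension without kcas knowing about them.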

For blocking support, this PR also introduces a dedicated Retry.Later exception for retrying transactions and higher-order single-location updates. Previously, the Stdlib.Exit exception was used for that purpose, but I feel that a dedicated exception makes the intent clearer and frees Stdlib.Exit for other purposes, such as aborting a transaction.
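A dedicated exception lets a higher-order operation distinguish "retry later" from any other exception the user's function raises, Stdlib.Exit included. The following stdlib-only sketch mirrors the `Retry.later` and `Retry.unless` names from this PR, but its `get_as` is a hypothetical toy over `Stdlib.Atomic` that spins until the location changes, whereas kcas blocks the caller instead.

```ocaml
(* Hypothetical stand-in for the PR's Retry module. *)
module Retry = struct
  exception Later

  (* Signal that the operation should be retried after the examined
     location(s) change. *)
  let later () = raise Later

  (* Retry unless [condition] holds. *)
  let unless condition = if not condition then later ()
end

(* Toy higher-order read: apply [f] to the current value; if [f] raises
   [Retry.Later], spin until the location holds a different value and then
   try again. Any other exception from [f] propagates unchanged. *)
let rec get_as f loc =
  let seen = Atomic.get loc in
  match f seen with
  | result -> result
  | exception Retry.Later ->
      while Atomic.get loc == seen do
        Domain.cpu_relax ()
      done;
      get_as f loc

let () =
  let loc = Atomic.make 0 in
  let writer = Domain.spawn (fun () -> Atomic.set loc 42) in
  let v = get_as (fun x -> Retry.unless (x <> 0); x) loc in
  Domain.join writer;
  Printf.printf "read %d\n" v
```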

This PR adds a Promise module, based on the Promise module of Eio, to the kcas_data package. I also drafted some other Eio-style primitives, but decided to move the development of those to another PR.

Perhaps a good place to start the review is the couple of sections added to the README:

The blocking support works by adding a list of awaiters to locations. When a location is modified, its awaiters are resumed. This adds a bit of overhead to all operations, as locations take an extra word of memory and that word also needs to be accessed on every write operation. Based on the benchmarks, the previous implementation is roughly 1.05x faster (in some cases the overhead is smaller and in some cases larger). After a blocking operation is resumed, the implementation eagerly removes any awaiters it attached to locations to avoid space leaks.
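A minimal model of the mechanism described above: each location carries its awaiters next to its value, a write hands them all over to be resumed, and a waiter that was resumed through one location eagerly removes itself from the others. This is a hypothetical sketch that uses a mutex per location purely for clarity; kcas itself is lock-free, and none of these names are its API.

```ocaml
(* Hypothetical model: a location holds its value plus the awaiters to
   resume when that value changes. *)
type awaiter = unit -> unit

type 'a loc = {
  mutable value : 'a;
  mutable awaiters : awaiter list; (* the extra field per location *)
  lock : Mutex.t;
}

let make value = { value; awaiters = []; lock = Mutex.create () }

let get loc =
  Mutex.lock loc.lock;
  let value = loc.value in
  Mutex.unlock loc.lock;
  value

(* A write takes the awaiter list and resumes every awaiter after unlocking. *)
let set loc value =
  Mutex.lock loc.lock;
  loc.value <- value;
  let to_resume = loc.awaiters in
  loc.awaiters <- [];
  Mutex.unlock loc.lock;
  List.iter (fun resume -> resume ()) to_resume

let add_awaiter loc awaiter =
  Mutex.lock loc.lock;
  loc.awaiters <- awaiter :: loc.awaiters;
  Mutex.unlock loc.lock

(* Called once the waiter has been resumed (through any location), so the
   closure does not stay reachable from locations that never changed. *)
let remove_awaiter loc awaiter =
  Mutex.lock loc.lock;
  loc.awaiters <- List.filter (fun a -> a != awaiter) loc.awaiters;
  Mutex.unlock loc.lock

let () =
  let a = make 0 and b = make 0 in
  let woken = ref 0 in
  let awaiter () = incr woken in
  add_awaiter a awaiter;
  add_awaiter b awaiter;
  set a 1;                  (* resumes the awaiter *)
  remove_awaiter b awaiter; (* eager cleanup on the untouched location *)
  set b 1;                  (* nothing left to resume *)
  Printf.printf "woken %d time(s)\n" !woken
```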

One internal change made in this PR is that uses of Atomic.get and Atomic.set that do not need fences for correctness now go through the functions fenceless_get and fenceless_set, which are, for now, just aliases for Atomic.get and Atomic.set. I have another PR, #46, that changes those to actually use fenceless operations and introduces some additional optimizations.

This PR also changes the description of the library to reflect its new capabilities, calling kcas "Software transactional memory based on lock-free multi-word compare-and-set". I believe "multi-word compare-and-set" is less familiar technical jargon to potential users than "software transactional memory".

@polytypic linked an issue Feb 28, 2023 that may be closed by this pull request
@polytypic force-pushed the add-blocking branch 5 times, most recently from 9e6ea54 to 8ba2122, March 7, 2023 09:19
@polytypic force-pushed the add-blocking branch 6 times, most recently from 00fe8f8 to 1d03f71, March 22, 2023 07:40
@polytypic force-pushed the main branch 2 times, most recently from 1f32020 to 790267c, April 5, 2023 09:52
@polytypic force-pushed the add-blocking branch 4 times, most recently from c2c780c to 3d8521b, April 8, 2023 23:28
@polytypic changed the title from "WIP: Add blocking" to "Add blocking", Apr 9, 2023
@polytypic force-pushed the add-blocking branch 8 times, most recently from 3e9287e to 0dce076, April 11, 2023 20:15
@polytypic force-pushed the add-blocking branch 6 times, most recently from 9c4951f to a13b8d7, April 20, 2023 07:20
@polytypic (Contributor Author) commented:

kcas-this is the build with blocking support and kcas-main is the build without blocking support. As expected, blocking support adds some overhead.

Benchmark 1: kcas-main/_build/default/test/benchmark.exe 1 10000
  Time (mean ± σ):       3.5 ms ±   0.0 ms    [User: 2.6 ms, System: 0.6 ms]
  Range (min … max):     3.5 ms …   4.0 ms    826 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: kcas-this/_build/default/test/benchmark.exe 1 10000
  Time (mean ± σ):       3.6 ms ±   0.0 ms    [User: 2.7 ms, System: 0.6 ms]
  Range (min … max):     3.5 ms …   3.9 ms    822 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'kcas-main/_build/default/test/benchmark.exe 1 10000' ran
    1.01 ± 0.01 times faster than 'kcas-this/_build/default/test/benchmark.exe 1 10000'


Benchmark 1: kcas-main/_build/default/test/benchmark.exe 2 10000
  Time (mean ± σ):       7.7 ms ±   0.0 ms    [User: 6.8 ms, System: 0.6 ms]
  Range (min … max):     7.7 ms …   8.0 ms    386 runs
 
Benchmark 2: kcas-this/_build/default/test/benchmark.exe 2 10000
  Time (mean ± σ):       8.0 ms ±   0.0 ms    [User: 7.1 ms, System: 0.6 ms]
  Range (min … max):     8.0 ms …   8.3 ms    372 runs
 
Summary
  'kcas-main/_build/default/test/benchmark.exe 2 10000' ran
    1.04 ± 0.01 times faster than 'kcas-this/_build/default/test/benchmark.exe 2 10000'


Benchmark 1: kcas-main/_build/default/test/benchmark.exe 4 10000
  Time (mean ± σ):      12.9 ms ±   0.1 ms    [User: 11.9 ms, System: 0.7 ms]
  Range (min … max):    12.8 ms …  13.2 ms    230 runs
 
Benchmark 2: kcas-this/_build/default/test/benchmark.exe 4 10000
  Time (mean ± σ):      13.4 ms ±   0.1 ms    [User: 12.5 ms, System: 0.7 ms]
  Range (min … max):    13.3 ms …  13.8 ms    223 runs
 
Summary
  'kcas-main/_build/default/test/benchmark.exe 4 10000' ran
    1.04 ± 0.01 times faster than 'kcas-this/_build/default/test/benchmark.exe 4 10000'


Benchmark 1: kcas-main/_build/default/test/xt_benchmark.exe 1 10000
  Time (mean ± σ):       7.0 ms ±   0.1 ms    [User: 6.1 ms, System: 0.6 ms]
  Range (min … max):     6.8 ms …   7.3 ms    422 runs
 
Benchmark 2: kcas-this/_build/default/test/xt_benchmark.exe 1 10000
  Time (mean ± σ):       7.3 ms ±   0.1 ms    [User: 6.4 ms, System: 0.7 ms]
  Range (min … max):     7.2 ms …   7.9 ms    407 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'kcas-main/_build/default/test/xt_benchmark.exe 1 10000' ran
    1.05 ± 0.01 times faster than 'kcas-this/_build/default/test/xt_benchmark.exe 1 10000'


Benchmark 1: kcas-main/_build/default/test/xt_benchmark.exe 2 10000
  Time (mean ± σ):      12.7 ms ±   0.1 ms    [User: 11.7 ms, System: 0.7 ms]
  Range (min … max):    12.5 ms …  13.2 ms    237 runs
 
Benchmark 2: kcas-this/_build/default/test/xt_benchmark.exe 2 10000
  Time (mean ± σ):      13.0 ms ±   0.1 ms    [User: 12.0 ms, System: 0.7 ms]
  Range (min … max):    12.8 ms …  13.3 ms    227 runs
 
Summary
  'kcas-main/_build/default/test/xt_benchmark.exe 2 10000' ran
    1.02 ± 0.01 times faster than 'kcas-this/_build/default/test/xt_benchmark.exe 2 10000'


Benchmark 1: kcas-main/_build/default/test/xt_benchmark.exe 4 10000
  Time (mean ± σ):      22.7 ms ±   0.2 ms    [User: 21.7 ms, System: 0.7 ms]
  Range (min … max):    22.4 ms …  23.7 ms    133 runs
 
Benchmark 2: kcas-this/_build/default/test/xt_benchmark.exe 4 10000
  Time (mean ± σ):      23.4 ms ±   0.4 ms    [User: 22.3 ms, System: 0.8 ms]
  Range (min … max):    23.0 ms …  25.5 ms    129 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'kcas-main/_build/default/test/xt_benchmark.exe 4 10000' ran
    1.03 ± 0.02 times faster than 'kcas-this/_build/default/test/xt_benchmark.exe 4 10000'


Benchmark 1: kcas-main/_build/default/test/xt_parallel_cmp_bench.exe 100000
  Time (mean ± σ):      20.9 ms ±   1.9 ms    [User: 35.0 ms, System: 1.6 ms]
  Range (min … max):    19.1 ms …  31.5 ms    139 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: kcas-this/_build/default/test/xt_parallel_cmp_bench.exe 100000
  Time (mean ± σ):      20.7 ms ±   1.3 ms    [User: 34.1 ms, System: 1.6 ms]
  Range (min … max):    19.6 ms …  29.5 ms    151 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'kcas-this/_build/default/test/xt_parallel_cmp_bench.exe 100000' ran
    1.01 ± 0.11 times faster than 'kcas-main/_build/default/test/xt_parallel_cmp_bench.exe 100000'


Benchmark 1: kcas-main/_build/default/test/xt_parallel_cmp_bench.exe 200000
  Time (mean ± σ):      39.6 ms ±   2.1 ms    [User: 68.7 ms, System: 2.2 ms]
  Range (min … max):    36.0 ms …  41.4 ms    73 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: kcas-this/_build/default/test/xt_parallel_cmp_bench.exe 200000
  Time (mean ± σ):      38.8 ms ±   2.0 ms    [User: 66.3 ms, System: 2.2 ms]
  Range (min … max):    37.2 ms …  42.7 ms    70 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (42.3 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You are already using the '--warmup' option which helps to fill these caches before the actual benchmark. You can either try to increase the warmup count further or re-run this benchmark on a quiet system in case it was a random outlier. Alternatively, consider using the '--prepare' option to clear the caches before each timing run.
 
Summary
  'kcas-this/_build/default/test/xt_parallel_cmp_bench.exe 200000' ran
    1.02 ± 0.08 times faster than 'kcas-main/_build/default/test/xt_parallel_cmp_bench.exe 200000'


Benchmark 1: kcas-main/_build/default/test/xt_parallel_cmp_bench.exe 400000
  Time (mean ± σ):      76.1 ms ±   4.9 ms    [User: 134.8 ms, System: 3.1 ms]
  Range (min … max):    69.8 ms …  85.3 ms    42 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: kcas-this/_build/default/test/xt_parallel_cmp_bench.exe 400000
  Time (mean ± σ):      76.7 ms ±   5.2 ms    [User: 133.9 ms, System: 3.3 ms]
  Range (min … max):    72.4 ms …  91.4 ms    34 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (86.5 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You are already using the '--warmup' option which helps to fill these caches before the actual benchmark. You can either try to increase the warmup count further or re-run this benchmark on a quiet system in case it was a random outlier. Alternatively, consider using the '--prepare' option to clear the caches before each timing run.
 
Summary
  'kcas-main/_build/default/test/xt_parallel_cmp_bench.exe 400000' ran
    1.01 ± 0.09 times faster than 'kcas-this/_build/default/test/xt_parallel_cmp_bench.exe 400000'


Benchmark 1: kcas-main/_build/default/test/benchmark.exe 1 200000
  Time (mean ± σ):      35.3 ms ±   0.3 ms    [User: 34.3 ms, System: 0.7 ms]
  Range (min … max):    34.5 ms …  35.8 ms    85 runs
 
Benchmark 2: kcas-this/_build/default/test/benchmark.exe 1 200000
  Time (mean ± σ):      36.5 ms ±   0.2 ms    [User: 35.5 ms, System: 0.7 ms]
  Range (min … max):    36.0 ms …  36.8 ms    82 runs
 
Summary
  'kcas-main/_build/default/test/benchmark.exe 1 200000' ran
    1.03 ± 0.01 times faster than 'kcas-this/_build/default/test/benchmark.exe 1 200000'


Benchmark 1: kcas-main/_build/default/test/benchmark.exe 2 200000
  Time (mean ± σ):     120.9 ms ±   0.3 ms    [User: 119.6 ms, System: 1.0 ms]
  Range (min … max):   120.5 ms … 121.6 ms    24 runs
 
Benchmark 2: kcas-this/_build/default/test/benchmark.exe 2 200000
  Time (mean ± σ):     127.1 ms ±   0.1 ms    [User: 125.7 ms, System: 1.0 ms]
  Range (min … max):   126.8 ms … 127.3 ms    23 runs
 
Summary
  'kcas-main/_build/default/test/benchmark.exe 2 200000' ran
    1.05 ± 0.00 times faster than 'kcas-this/_build/default/test/benchmark.exe 2 200000'


Benchmark 1: kcas-main/_build/default/test/benchmark.exe 4 200000
  Time (mean ± σ):     224.9 ms ±   0.3 ms    [User: 223.4 ms, System: 1.2 ms]
  Range (min … max):   224.5 ms … 225.5 ms    13 runs
 
Benchmark 2: kcas-this/_build/default/test/benchmark.exe 4 200000
  Time (mean ± σ):     235.3 ms ±   0.3 ms    [User: 233.8 ms, System: 1.2 ms]
  Range (min … max):   234.7 ms … 235.8 ms    12 runs
 
Summary
  'kcas-main/_build/default/test/benchmark.exe 4 200000' ran
    1.05 ± 0.00 times faster than 'kcas-this/_build/default/test/benchmark.exe 4 200000'


Benchmark 1: kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 90
  Time (mean ± σ):      43.2 ms ±   0.1 ms    [User: 42.0 ms, System: 0.9 ms]
  Range (min … max):    42.9 ms …  43.6 ms    68 runs
 
Benchmark 2: kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 90
  Time (mean ± σ):      44.1 ms ±   0.2 ms    [User: 42.9 ms, System: 0.9 ms]
  Range (min … max):    43.9 ms …  44.7 ms    67 runs
 
Summary
  'kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 90' ran
    1.02 ± 0.00 times faster than 'kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 90'


Benchmark 1: kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 90
  Time (mean ± σ):      27.3 ms ±   0.2 ms    [User: 50.3 ms, System: 1.2 ms]
  Range (min … max):    27.1 ms …  29.0 ms    108 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 90
  Time (mean ± σ):      29.5 ms ±   0.2 ms    [User: 54.7 ms, System: 1.3 ms]
  Range (min … max):    29.3 ms …  30.5 ms    100 runs
 
Summary
  'kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 90' ran
    1.08 ± 0.01 times faster than 'kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 90'


Benchmark 1: kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 90
  Time (mean ± σ):      17.3 ms ±   0.3 ms    [User: 57.1 ms, System: 2.0 ms]
  Range (min … max):    16.7 ms …  18.8 ms    159 runs
 
Benchmark 2: kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 90
  Time (mean ± σ):      17.5 ms ±   0.2 ms    [User: 58.0 ms, System: 2.1 ms]
  Range (min … max):    17.1 ms …  18.6 ms    174 runs
 
Summary
  'kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 90' ran
    1.01 ± 0.02 times faster than 'kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 90'


Benchmark 1: kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 10
  Time (mean ± σ):     255.8 ms ±   0.3 ms    [User: 254.3 ms, System: 1.2 ms]
  Range (min … max):   255.4 ms … 256.3 ms    11 runs
 
Benchmark 2: kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 10
  Time (mean ± σ):     261.1 ms ±   0.3 ms    [User: 259.5 ms, System: 1.2 ms]
  Range (min … max):   260.7 ms … 261.6 ms    11 runs
 
Summary
  'kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 10' ran
    1.02 ± 0.00 times faster than 'kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 10'


Benchmark 1: kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 10
  Time (mean ± σ):     157.1 ms ±   0.2 ms    [User: 308.9 ms, System: 1.6 ms]
  Range (min … max):   156.8 ms … 157.4 ms    19 runs
 
Benchmark 2: kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 10
  Time (mean ± σ):     177.3 ms ±   0.2 ms    [User: 349.0 ms, System: 1.7 ms]
  Range (min … max):   176.9 ms … 177.5 ms    16 runs
 
Summary
  'kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 10' ran
    1.13 ± 0.00 times faster than 'kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 10'


Benchmark 1: kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 10
  Time (mean ± σ):      90.0 ms ±   0.6 ms    [User: 344.9 ms, System: 2.7 ms]
  Range (min … max):    88.9 ms …  91.2 ms    33 runs
 
Benchmark 2: kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 10
  Time (mean ± σ):      91.4 ms ±   0.4 ms    [User: 350.6 ms, System: 2.8 ms]
  Range (min … max):    90.5 ms …  92.3 ms    32 runs
 
Summary
  'kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 10' ran
    1.02 ± 0.01 times faster than 'kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 10'
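The "times faster" figures in these summaries are ratios of the two mean times. For example, for benchmark.exe 2 200000 and for the two-domain hashtbl_bench.exe 2 1_000_000 1000 10 run:

$$
\frac{127.1\ \text{ms}}{120.9\ \text{ms}} \approx 1.051,
\qquad
\frac{177.3\ \text{ms}}{157.1\ \text{ms}} \approx 1.129
$$

These match the reported 1.05 and 1.13; the latter is the largest slowdown among these runs.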

@polytypic marked this pull request as ready for review April 20, 2023 07:57
@polytypic requested a review from a team April 20, 2023 08:01
```diff
@@ -1504,7 +1602,7 @@ experiment where we abort the transaction in case we observe that the values of
 # with_updater @@ fun () ->
-for _ = 1 to 10_000 do
+for _ = 1 to 100_000 do
```
@polytypic (Contributor Author):

This is inherently non-deterministic, and I noticed that it occasionally failed on CI. I haven't noticed failures since increasing the number of attempts, but it can, of course, still happen.

```ocaml
let fenceless_get = Atomic.get
let fenceless_set = Atomic.set
```
@polytypic (Contributor Author), Apr 20, 2023:

I have another PR, #46, that changes these aliases to actually perform fenceless operations. Making these fenceless should be safe (because the fences would be redundant), seems to significantly improve performance on ARM (Apple M1), and is also completely internal to the library, so I personally feel that we should just make the optimization.

```ocaml
state : 'a state;
lt : cass;
gt : cass;
mutable awaiters : awaiter list;
```
@polytypic (Contributor Author), Apr 20, 2023:

This is mutable so that, during the determine phase, the awaiters can be efficiently copied from the updated locations, to be resumed during the release phase.

```ocaml
awaiter
then add_awaiters awaiter casn gt
else stop
| CASN _ as stop -> stop)
```
@polytypic (Contributor Author):

add_awaiters signals that some location has already been updated by returning the CASN _ descriptor of that location. This way, the subsequent call of remove_awaiters (to prevent space leaks) can stop early at that same descriptor.
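The early-stop idea can be modeled with plain atomics. In this hypothetical sketch (none of these names are kcas internals), add_awaiters returns the first location found to have changed, in place of the CASN descriptor, and remove_awaiters uses it as its stopping point.

```ocaml
(* Hypothetical model: an int location with a lock-free list of awaiters. *)
type loc = { value : int Atomic.t; awaiters : (unit -> unit) list Atomic.t }

let make v = { value = Atomic.make v; awaiters = Atomic.make [] }

let rec push loc a =
  let old = Atomic.get loc.awaiters in
  if not (Atomic.compare_and_set loc.awaiters old (a :: old)) then push loc a

let rec drop loc a =
  let old = Atomic.get loc.awaiters in
  let rest = List.filter (fun b -> b != a) old in
  if not (Atomic.compare_and_set loc.awaiters old rest) then drop loc a

(* Attach [a] to each location the transaction read, in order, re-checking
   the value after attaching so that a concurrent write cannot be missed.
   If a location no longer holds the value that was read, stop and return
   it: blocking is pointless, and later locations were never touched. *)
let rec add_awaiters a = function
  | [] -> None
  | (loc, seen) :: rest ->
      push loc a;
      if Atomic.get loc.value = seen then add_awaiters a rest else Some loc

(* Undo [add_awaiters], walking only up to (and including) [stop]. *)
let rec remove_awaiters a stop = function
  | [] -> ()
  | (loc, _) :: rest -> (
      drop loc a;
      match stop with
      | Some s when s == loc -> ()
      | _ -> remove_awaiters a stop rest)
```

When add_awaiters returns None, every location still matched and the caller would block; either way, the same result value tells the cleanup how far to go.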

@@ -0,0 +1,57 @@

```ocaml
open Kcas

type 'a internal = 'a Magic_option.t Loc.t
```
@polytypic (Contributor Author):

The Magic_option avoids a level of indirection. I decided to go with this optimization to reduce space usage and because the promise implementation already uses some other OCaml magic (as does the Eio Promise implementation).
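The PR does not show Magic_option itself, so the following is only a hypothetical reconstruction of the general trick the name suggests: an option whose `some x` is represented by `x` directly and whose `none` is a private sentinel, saving the `Some` box and one pointer indirection per stored value. It relies on `Obj` and is unsafe if misused.

```ocaml
(* Hypothetical allocation-free option. Limitation: [some none] cannot be
   told apart from [none], so these options must not be nested. *)
module Magic_option : sig
  type 'a t

  val none : 'a t
  val some : 'a -> 'a t
  val is_none : 'a t -> bool
  val get_unsafe : 'a t -> 'a (* only valid when [not (is_none o)] *)
end = struct
  type 'a t = Obj.t

  (* A fresh mutable block: physically distinct from any user value. *)
  let none = Obj.repr (ref ())
  let some x = Obj.repr x
  let is_none o = o == none
  let get_unsafe o = Obj.obj o
end

let () =
  let o = Magic_option.some "hello" in
  if not (Magic_option.is_none o) then print_endline (Magic_option.get_unsafe o)
```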

@lyrm left a comment:

For this review, I:

  • read and tested the examples in the README
  • read the documentation
  • had a look at the added code, but I cannot say I fully understand the kcas algorithm
  • made some tests/tries of my own
  • had a quick look at the kcas_data changes

What I did not do:

  • try the changes in the kcas_data data structures
  • try/read the Kcas_data.Promise implementation

Overall, there is not much to say, except that the documentation is (as usual) great and easy to read, and this seems like a great addition to kcas and kcas_data!

src/kcas.mli (outdated), comment on lines 9 to 18:

```ocaml
(** Exception that may be raised to signal that the operation should be
    retried, at some point in the future, after the examined shared memory
    location or locations have changed.

    {b NOTE}: It is important to understand that "{i after}" may effectively
    mean "{i immediately}", because it may be the case that the examined
    shared memory locations have already changed. *)

val later : unit -> 'a
(** [later ()] is equivalent to [raise Later]. *)
```
@lyrm:

Maybe give more information about where this exception is caught (at least a pointer to the corresponding functions)?

@polytypic (Contributor Author):

Yes.

BTW, it could be a nice addition to odoc to automatically attach a list of references to items. The list of operations that refer to Later, like get_as, update, and commit, would then be visible directly when looking at Later.

src/kcas.mli (outdated), comment on lines 30 to 32:

```ocaml
(** In [lock_free] mode the algorithm makes sure that at least one domain will
    be able to make progress by performing read-only operations as read-write
    operations. *)
```
@lyrm:

Which is a more expensive operation, right? It may be good to emphasize this with something like:

> at least one domain will be able to make progress at the cost of performing read-only operations as read-write operations.

@polytypic (Contributor Author), Apr 26, 2023:

I will adjust the text. The cost is actually non-trivial, but, generally speaking, mutating locations is more expensive, unless avoiding the mutation leads to starvation, which is essentially the difference between the modes. (The commit mechanism is designed to avoid the starvation that obstruction-free mode can cause by switching to lock-free mode in case interference is detected.)

README.md (outdated), comment on lines 112 to 116:

```ocaml
let x =
  x
  |> Loc.get_as @@ fun x ->
     Retry.unless (x <> 0);
     x
```
@lyrm:

Not convinced this is more readable than:

```ocaml
let x = Loc.get_as (fun x -> Retry.unless (x <> 0); x) x
```

@polytypic (Contributor Author):

Hmm... I guess it is better to avoid unnecessary cute uses of operators in examples.

Comment on lines +277 to +267:

```ocaml
val to_blocking : xt:'x t -> (xt:'x t -> 'a option) -> 'a
(** [to_blocking ~xt tx] converts the non-blocking transaction [tx] to a
    blocking transaction by retrying on [None]. *)

val to_nonblocking : xt:'x t -> (xt:'x t -> 'a) -> 'a option
(** [to_nonblocking ~xt tx] converts the blocking transaction [tx] to a
    non-blocking transaction by returning [None] on retry. *)
```
@lyrm:

Seems like simple but useful functions 👍
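The documented semantics of these two conversions fit in a few lines. In this hypothetical sketch, `Later` is a local stand-in for `Retry.Later` and `xt` is passed through untouched, so it models the behavior described in the doc comments rather than reproducing the PR's implementation.

```ocaml
(* Hypothetical stand-in for the PR's Retry.Later. *)
exception Later

(* [xt] stands for the transaction log; it is only passed through. *)
let to_blocking ~xt tx =
  match tx ~xt with
  | None -> raise Later (* ask the enclosing commit to retry later *)
  | Some x -> x

let to_nonblocking ~xt tx =
  match tx ~xt with
  | x -> Some x
  | exception Later -> None

let () =
  (* A hypothetical non-blocking "pop" over a list ref, used as [xt]. *)
  let pop_opt ~xt:stack =
    match !stack with
    | [] -> None
    | x :: rest ->
        stack := rest;
        Some x
  in
  let stack = ref [ 1 ] in
  assert (to_blocking ~xt:stack pop_opt = 1);
  (* Round trip: the emptied stack makes the blocking version retry, which
     the non-blocking wrapper turns back into [None]. *)
  assert (to_nonblocking ~xt:stack (fun ~xt -> to_blocking ~xt pop_opt) = None)
```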

Comment on lines -242 to -248:

```ocaml
val attempt : ?mode:Mode.t -> 'a tx -> 'a
(** [attempt tx] attempts to atomically perform the transaction over shared
    memory locations recorded by calling [tx] with a fresh explicit
    transaction log. If used in {!Mode.obstruction_free} may raise
    {!Mode.Interference}. Otherwise either raises [Exit] on failure to commit
    the transaction or returns the result of the transaction. The default for
    [attempt] is {!Mode.lock_free}. *)
```
@lyrm:

Why remove this function? Is it because commit can be aborted by an exception?

@polytypic (Contributor Author):

Good question! I usually try to comment on changes like this, but I apparently missed this one.

I decided to remove attempt for two reasons:

  1. attempt cannot implement the blocking mechanism by itself. Instead, it would potentially raise the Later exception for the user to handle, which would mean that there should probably be some additional support for awaiting changes to multiple locations.

  2. I initially provided attempt as a means for users to write their own version of commit (e.g. with additions like timeouts). However, one should be able to achieve pretty much everything via commit already (including timeouts, e.g. by setting up a location that is written at timeout), and I don't see many use cases for attempt; I never used it myself except in tests.

So, rather than add even more (potentially unused) things to the API, I decided that it is better to just remove attempt. Something like it can be added back later if there are real use cases for it.

@lyrm:

I am interested to see how you add a timeout (this also seems like it could be a good example).
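The timeout idea from this thread (a location that some timer writes when time is up) can be illustrated with a deliberately naive, lock-based stand-in for a blocking transaction. Everything here (`commit`, `Later`, `get`, `set`, `take_with_timeout`, and the global lock) is hypothetical and only models the semantics: a transaction that raises `Later` is re-run after some other commit, so a timeout flag is just one more location the transaction reads. kcas does this without a global lock.

```ocaml
(* Hypothetical, naive stand-in for a blocking STM: one global lock and a
   condition variable that is broadcast after every successful commit.
   There is no rollback, so a transaction must raise [Later] before it
   writes anything. *)
exception Later

let lock = Mutex.create ()
let changed = Condition.create ()

type 'a loc = { mutable value : 'a }

let make value = { value }

(* Only call [get] and [set] from inside a transaction passed to [commit]. *)
let get loc = loc.value
let set loc value = loc.value <- value

(* Run [tx] under the lock; if it raises [Later], wait until some other
   commit changes something, then run it again from scratch. *)
let commit tx =
  Mutex.lock lock;
  let rec loop () =
    match tx () with
    | result ->
        Condition.broadcast changed;
        Mutex.unlock lock;
        result
    | exception Later ->
        Condition.wait changed lock;
        loop ()
    | exception e ->
        Mutex.unlock lock;
        raise e
  in
  loop ()

(* A timeout is just another location the transaction reads: give up once
   a (hypothetical) timer has written [true] to [timed_out]. *)
let take_with_timeout ~timed_out box =
  commit (fun () ->
      if get timed_out then None
      else
        match get box with
        | Some x ->
            set box None;
            Some x
        | None -> raise Later)

let () =
  let box = make None and timed_out = make false in
  (* Stand-in for a timer: another domain writes the timeout location. *)
  let timer = Domain.spawn (fun () -> commit (fun () -> set timed_out true)) in
  let result = take_with_timeout ~timed_out box in
  Domain.join timer;
  assert (result = None)
```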

```ocaml
let modify ?backoff loc f = update ?backoff loc f |> ignore [@@inline]
match f before with
| after ->
if before == after then before
```
@lyrm:

resume_awaiters is not called here because the value has not changed, right?

@polytypic (Contributor Author):

Yes, this is an optimization to avoid unnecessarily updating locations. It wasn't implemented before, but with the addition of awaiters it also potentially avoids waking up awaiters unnecessarily (as nothing logically changed), which could save a lot of unnecessary computation.
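The short-circuit can be sketched over `Stdlib.Atomic`. This hypothetical `update` (no backoff, no blocking, and a counter standing in for resuming awaiters) returns early when `f` returns its argument physically unchanged, so neither a write nor a wakeup happens.

```ocaml
(* Hypothetical stand-in for "resume this location's awaiters": count calls. *)
let wakeups = Atomic.make 0

(* Hypothetical single-location update; returns the value seen before it. *)
let rec update loc f =
  let before = Atomic.get loc in
  let after = f before in
  if before == after then
    (* Nothing logically changed: skip the write and wake nobody. *)
    before
  else if Atomic.compare_and_set loc before after then begin
    Atomic.incr wakeups;
    before
  end
  else (* Lost a race with another writer: try again. *)
    update loc f

let modify loc f = ignore (update loc f)
```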

src/kcas.ml (outdated):

```ocaml
resume_awaiters before state'.awaiters
else update_no_alloc await (Backoff.once backoff) loc state f
| exception Retry.Later ->
let state = new_state (Obj.magic ()) in
```
@lyrm:

I am not extremely familiar with the use of Obj.magic, but why not use before here instead?

@polytypic (Contributor Author):

Hmm... Both state.before and state.after will be overwritten, so using before would also work here. Sure, why not.

@polytypic (Contributor Author) commented Apr 26, 2023:

> For this review [...]

Thanks for the review!

My plan now is to move the Domain_local_await thing to a separate repository, release it as a separate package on opam, and adjust this PR (and related PRs) to use that package.

As has been discussed elsewhere, the idea with Domain_local_await is to provide a mechanism that works today, and should also work in the future, to implement blocking in kcas and potentially in other places like lockfree. Specifically, I want to be able to publish a new version of kcas that potential users can install from opam today, and that version of kcas should then just work out of the box, allowing users to use blocking and to communicate and synchronize between domains, systhreads, Eio fibers, Domainslib fibers, and anything else that supports Domain_local_await.

However, in the future we may end up using some different means to provide blocking support. This is not a problem for me and should not be a problem in general; we can then change kcas to use such a future mechanism when it is available. At that point we can deprecate Domain_local_await and remove support for it from schedulers.

It should also be mentioned that Domain_local_await is something that 99% of users should not need to know about at all. It is just an internal mechanism that enables blocking support. So, if and when we decide what the final interface for such blocking support is, most users should not even notice that it changed (they just upgrade their packages, and at some point the Domain_local_await package is no longer used at all).

@kayceesrk (Collaborator) commented Apr 26, 2023:

Sounds good to me.

> 99% of users should not need to know about at all.

The 1% will be the implementations of concurrency libraries such as Eio and Domainslib, and any blocking synchronization structures such as promises, kcas, rendezvous channels, etc. That said, by going ahead with the proposed interface for DLA, I'm hoping that we will gain experience by doing it rather than trying to come up with the perfect scheme now.

@polytypic merged commit 877fe54 into main Apr 28, 2023
1 of 2 checks passed
@polytypic deleted the add-blocking branch April 28, 2023 06:29
Linked issue that may be closed by merging this pull request: Extend the Tx mechanism to support non-busy wait or blocking

3 participants