multicore: expose the domain index for advanced use-cases #13171
Thanks a lot for having taken the time to explain things so well; as a
non-expert I very much appreciate it.
So far I didn't manage to understand why we'd prefer to use non-unique
identifiers over those almost-guaranteed to be unique (modulo the 32-bit
overflows)?
Sorry if I missed something obvious.
|
kayceesrk
left a comment
I like the idea. The code looks simple -- the feature already exists in the runtime and this PR only exposes the functionality to the user.
> The purpose of this simple change is to make it easier for expert users to implement data structures with per-domain values, which come up frequently in various concurrent designs.
It would be useful to document in this PR examples of such data structures (if you have a couple of them handy).
```c
}

/* The index of the current domain. It is an integer between [0] and
   [caml_num_domains_running] unique among currently-running domains.
```
I suppose you meant to write
```diff
-   [caml_num_domains_running] unique among currently-running domains.
+   [Max_domains] unique among currently-running domains.
```
since caml_num_domains_running is, well, the number of currently running domains, i.e. the number of valid entries in all_domains[].
I want to say something more precise than Max_domains (and also, we don't expose the existence of Max_domains to users, which I think is the better choice, as exposing it would tie us to an implementation decision that could become a limitation in the future). But you are correct that caml_num_domains_running is not quite right. The index of a domain D is in the interval [0; n] where n was the number of running domains at the time D was created.
(I am trying to be precise about how "dense" the indexing is: it is more dense than just [0; Max_domains], but I also do not want to set in stone implementation details about how indices are chosen at domain creation time.)
Here is something more precise than Max_domains and more correct than what I wrote:
It is an integer unique among
currently-running domains, in the interval [0; N] where N is the
peak number of domains running simultaneously so far.
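To make the density claim concrete, here is a toy model in OCaml. It is not the runtime's code: the `table` type and the `spawn` and `terminate` functions are hypothetical, and picking the lowest free slot is an assumed policy (the discussion above deliberately avoids fixing how indices are chosen). It only illustrates why an index that is reused after termination stays bounded by the peak number of simultaneously live domains, while a unique identifier keeps growing.

```ocaml
(* Toy model of index allocation. Hypothetical names, assumed policy. *)
type table = {
  slots : bool array;         (* slots.(i) is true when index i is in use *)
  mutable next_unique : int;  (* next unique id to hand out *)
  mutable running : int;      (* number of live entries *)
  mutable peak : int;         (* peak number of simultaneously live entries *)
}

let create capacity =
  { slots = Array.make capacity false; next_unique = 0; running = 0; peak = 0 }

(* Register a new entry and return [(index, unique_id)].
   Assumption: the lowest free slot is reused first. *)
let spawn t =
  let rec first_free i =
    if i >= Array.length t.slots then failwith "table full"
    else if t.slots.(i) then first_free (i + 1)
    else i
  in
  let index = first_free 0 in
  t.slots.(index) <- true;
  t.running <- t.running + 1;
  t.peak <- max t.peak t.running;
  let unique_id = t.next_unique in
  t.next_unique <- unique_id + 1;
  (index, unique_id)

(* Release an index so that a later [spawn] may reuse it. *)
let terminate t index =
  t.slots.(index) <- false;
  t.running <- t.running - 1
```

In this model, keeping one entry alive and cycling a second one 1000 times never hands out an index other than 0 or 1, while the unique ids reach 1000.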
```ocaml
val self_index : unit -> int
(** The index of the current domain. It is an integer between [0] and
    and the number of currently running domains, unique among
```
Same comment applies here too.
```diff
-    and the number of currently running domains, unique among
+    and the maximum number of domains, unique among
```
|
To test the theory that this is reasonably useful to implement concurrent data structures, I wrote a small microbenchmark that implements a simplified form of DLS using domain indices: https://github.com/gasche/ocaml/blob/domain-index-bench/bench.ml . The microbenchmark implements integer-only domain-local storage (each instance stores one integer per domain) with the following interface:

```ocaml
module type IntegerDLS = sig
  type key
  val new_key : unit -> key
  val get : key -> int (* default value: 0 *)
  val set : key -> int -> unit
end
```

The implementation on top of domain indices:

```ocaml
module IndexDLS : IntegerDLS = struct
  (* One atomic cell per domain index, in an array that grows on demand. *)
  type key = int Atomic.t array Atomic.t

  let new_key () = Atomic.make [| |]

  (* Slow path: the array is too small for [idx]. Build a larger array that
     shares the existing cells, then try to install it; retry on conflict. *)
  let rec get_ref_slow key idx =
    let arr = Atomic.get key in
    if idx < Array.length arr then arr.(idx)
    else begin
      let new_arr =
        Array.init (max 1 (2 * idx)) (fun i ->
          if i < Array.length arr then arr.(i)
          else Atomic.make_contended 0
        )
      in
      if Atomic.compare_and_set key arr new_arr
      then new_arr.(idx)
      else get_ref_slow key idx
    end

  (* Fast path: the cell for the current domain index already exists. *)
  let get_ref key =
    let idx = Domain.self_index () in
    let arr = Atomic.get key in
    if idx < Array.length arr then arr.(idx)
    else get_ref_slow key idx

  let get key = Atomic.get (get_ref key)
  let set key v = Atomic.set (get_ref key) v
end
```

I then compare the performance of this to the performance of Domain.DLS on a silly micro-benchmark. (Create one DLS key, spawn N domains which increment their keyed value R times in a loop.) On this silly micro-benchmark, testing with 2 to 8 domains on my machine, this implementation is 10-20% faster than the stdlib DLS implementation. I wouldn't read too much into the result (this is a very small difference for a microbenchmark, as likely to be measurement bias as an actual performance difference between the two), but qualitatively it suggests that this domain index enables reasonably efficient domain-specific structures.
|
For other examples of domain-indexed structures: I am not at all an expert in concurrent data structures. @polytypic implemented something fairly similar as Multicore_magic.instantaneous_domain_index, which he uses in several places (a GitHub search finds occurrences in Saturn, kcas and picos). Representative code examples:
(I don't understand the use-case in picos_hashtbl.)
The typical use-case is to build associative (mutable) maps from domains to values. With a dense enough domain index (it is in the interval [0; n[ for some small enough n), I can just use an array. If I get unique identifiers that may be arbitrarily large, using an array would waste memory and have bad locality, so I may prefer to use a hashtable instead, but then random access is noticeably more expensive. (The identifiers of domains can grow large if the program regularly terminates domains and starts new domains, so that the total number of distinct domains created is much larger than the maximum number of live domains at any point in time. One could argue that one should not do this, and instead spawn a fixed number of domains at the beginning of the application and terminate them all at the end. But this is a global decision of the implementor of the final application; the author of a concurrent library has no control over the domain-creation policy.)
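As a small illustration of the array-based approach, here is a sketch of a counter with one slot per domain index. All names are hypothetical. To keep the sketch independent of the compiler version, the index function is passed as an argument; with this PR applied one would pass `Domain.self_index`. The `capacity` bound is an assumption chosen by the caller, not something the runtime exposes, and padding against false sharing is omitted.

```ocaml
(* Sketch: a counter striped by domain index (hypothetical names). *)
type t = {
  self_index : unit -> int;   (* e.g. Domain.self_index with this PR applied *)
  slots : int Atomic.t array; (* one cell per possible index *)
}

(* [capacity] is an assumed upper bound on the indices that will be seen. *)
let create ~self_index ~capacity =
  { self_index; slots = Array.init capacity (fun _ -> Atomic.make 0) }

(* Each domain only touches its own slot, so increments do not contend.
   A slot may be inherited by a later domain that reuses the index, which
   is harmless for a counter since the counts simply accumulate. *)
let increment t =
  let i = t.self_index () in
  if i >= Array.length t.slots then
    invalid_arg "increment: domain index exceeds capacity";
  Atomic.incr t.slots.(i)

(* Reading sums all the slots. The result is only a snapshot if other
   domains are incrementing concurrently. *)
let total t =
  Array.fold_left (fun acc slot -> acc + Atomic.get slot) 0 t.slots
```

With unique identifiers instead of indices, the same structure would need a hashtable keyed by identifier (or an array as large as the largest identifier ever seen), which is the trade-off described above.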
```c
}

/* The index of the current domain. It is an integer unique among
   currently-running domains, in the interval [0; N] where N is the
```
```diff
-   currently-running domains, in the interval [0; N] where N is the
+   currently-running domains, in the interval [0; N[ where N is the
```
I think the [0;N[ notation for open intervals is a French-speaking convention. Though I like it, the rest of the world seems to use [0;N). To avoid cultural fights perhaps [0;N-1] is best :-)
|
Thanks a lot for your additional explanations, @gasche.
I think I'd still prefer unique ids, or maybe I'd like to have the
choice of which ones I'd like to use, but I also think, as a total
newcomer to the topic, the previously expressed opinion can safely be
ignored. :)
|
Yes, unique ids are already exposed to users (at the OCaml level this is `(Domain.self () :> int)`).
|
There is a weird MSVC CI failure that looks like this:
It is independent from the present PR and we should just ignore it.
Besides, I wonder if I should go ahead and merge -- it has been approved and reviewed -- or whether there is more discussion to be had. I don't expect to get feedback from @polytypic (that would be lovely of course, but oh well), and I don't know if there are other people that would have arguments in favor or against. My impression is that at worst we find out that this is not as useful as I hoped, and we exposed a small amount of unnecessary surface. So the risks are very limited, and it's not worth thinking too hard about this; I should just go ahead and merge.
|
Ah, but I forgot: this is a stdlib change so it needs two maintainer approvals. |
```c
Caml_inline int caml_domain_index(void)
{
  return Caml_state->id;
}
```
This function is not mentioned in the changelog. If it is meant to be public, shouldn't it be documented in the manual?
This is an excellent question. I looked at it but ended up doing nothing for the following reasons:
- as far as I know, there is no documentation in the manual for the C interface to the domain machinery (there is a chapter on multicore programming, that is pure OCaml; and there is a chapter on the C FFI that mostly does not mention domains)
- this function is in CAML_INTERNALS, as the rest of domain.h, so technically users are not supposed to rely on it
In this context one can wonder whether we should bother exposing this as a function at all -- especially since the definition is trivial. I think that it makes sense to have it as an abstraction, for the same reasons that it is useful in Domain, and I could see people writing low-level C extensions for concurrent programming (using CAML_INTERNALS) using it in a way that would be clearer and more forward-compatible than relying on Caml_state->id directly.
daf964a to 7407878
|
Gabriel Scherer (2024/05/16 04:09 -0700):
> > I think I'd still prefer unique ids, or maybe I'd like to have the
> > choice of which ones I'd like to use
>
> Yes, unique ids are already exposed to users (at the OCaml level this
> is `(Domain.self () :> int)`).

Ah sorry, I thought they weren't or you were replacing them by non-unique
ids.
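For readers who have not used the unique identifier mentioned above, here is a minimal sketch, assuming an OCaml 5 compiler. The helper name `sequential_unique_ids` is hypothetical; `Domain.spawn`, `Domain.join` and `Domain.self` are the existing standard library functions.

```ocaml
(* Run [n] short-lived domains one after the other and collect the unique
   identifier that each one observes for itself. *)
let sequential_unique_ids n =
  List.init n (fun _ ->
    Domain.join (Domain.spawn (fun () -> (Domain.self () :> int))))
```

Because these domains never overlap in time, their domain indices may well coincide, whereas the unique identifiers collected here are expected to be pairwise distinct.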
|
This exposes `caml_ml_domain_index` as `Domain.self_index` for use cases such as avoiding contention based on using arrays with per domain elements. See ocaml/ocaml#13171 for the upstream version.
Context
The multicore runtime of OCaml uses two different notions of "identifier" for a domain:
- `Caml_state->id` is the index of the domain in the table of current domains. It is between 0 and the number of running domains at the time the domain was created. Spawning a new domain may reuse the index of another domain that has already terminated.
- `Caml_state->unique_id` is a unique identifier generated at domain startup time, almost-guaranteed to be distinct from the identifiers of all other domains (almost: overflows on 32-bit systems can lose this guarantee).

But then the runtime only exposes `Caml_state->unique_id` to users, which is more principled.

Proposed change: expose (non-unique) domain indices
The present PR exposes the non-unique identifiers `Caml_state->id` to users (in C and in OCaml), and calls it the "domain index".

The purpose of this simple change is to make it easier for expert users to implement data structures with per-domain values, which come up frequently in various concurrent designs. This can be done with Domain.DLS, but there is an ongoing discussion about changing the semantics of Domain.DLS to make it thread-local, which would make it less suitable for certain use-cases. (See in particular the remnants of @polytypic's comments at #12719 (comment).)

Exposing domain indices is a simpler, lower-level mechanism that puts advanced users in control of their implementation choices. They can experiment outside the standard library and runtime, with a useful building block that we had been hiding from them so far.