Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Domain.DLS is not thread-safe #12677

Closed
polytypic opened this issue Oct 19, 2023 · 3 comments · Fixed by #12889
Closed

Domain.DLS is not thread-safe #12677

polytypic opened this issue Oct 19, 2023 · 3 comments · Fixed by #12889

Comments

@polytypic
Copy link

polytypic commented Oct 19, 2023

The current Domain.DLS implementation is not (sys)thread-safe, because the implementations of the various state updating operations include safe points during which the runtime may switch between threads. I'll briefly highlight some specific scenarios of what can go wrong.

Consider the implementation of Domain.DLS.get:

ocaml/stdlib/domain.ml

Lines 120 to 127 in e397ed2

let get (idx, init) =
let st = maybe_grow idx in
let v = st.(idx) in
if v == unique_value then
let v' = Obj.repr (init ()) in
st.(idx) <- (Sys.opaque_identity v');
Obj.magic v'
else Obj.magic v

Two threads on a single domain that concurrently get with the same key before init has been called may both end up calling init (so init would be called twice for a single key). That can happen because maybe_grow (which potentially allocates) includes a safe point and because init (which often allocates) itself may include a safe point. What can happen then is that the two calls of get return two different results (e.g. two different mutable records), one of which is stored in DLS.

Consider the implementation of the internal Domain.DLS.maybe_grow:

ocaml/stdlib/domain.ml

Lines 98 to 111 in e397ed2

let maybe_grow idx =
let st = get_dls_state () in
let sz = Array.length st in
if idx < sz then st
else begin
let rec compute_new_size s =
if idx < s then s else compute_new_size (2 * s)
in
let new_sz = compute_new_size sz in
let new_st = Array.make new_sz unique_value in
Array.blit st 0 new_st 0 sz;
set_dls_state new_st;
new_st
end

Two threads on a single domain that concurrently get with two different keys both call maybe_grow and may both end up allocating a new array for the DLS. This is possible because maybe_grow includes a safe point. Only one of the two arrays eventually ends up being stored as the DLS and one of the get operations ends up being undone. Furthermore if the two threads also perform set operations, some of those may end up being undone.

Here is an example interleaving (variation of the above) where the effects of DLS operations performed by Thread B vanish:

  Thread A                 Thread B
  get x starts

    <--- switch at a safe point --->
  
                           get y starts
                           get y ends
                           set y starts
                           set y ends  

    <--- switch at a safe point --->
  
  set_dls_state
  get x ends

The effects of both get y and set y vanish. Even just having get y vanish is a problem, because get may call init, which may return a mutable object, which might be mutated or stored outside of the DLS.

@kayceesrk
Copy link
Contributor

@polytypic The issue isn't fixed yet. Unsure whether the issue has been accidentally closed.

@gasche
Copy link
Member

gasche commented Jan 8, 2024

I would like to see progress on #12724, but this is at best for 5.3. I think that there is a real bug here that we could fix for 5.2, and I will look at it -- this is related to some of the thread-safety musings I had with Dynarray.

@gasche
Copy link
Member

gasche commented Jan 8, 2024

I implemented a candidate fix for 5.2 in #12889, but this is not tested to fix the problem. (The current issue does not contain a repro case that I could run on my branch.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants