Code comprehension: why don't STW sections keep `all_domains_lock` the whole time? #11073

gasche · 2022-02-28T23:56:44Z

STW sections provide a mutual-exclusion mechanism, but the way they prevent races with new domains being created is subtle: they use a condition variable all_domains_cond to signal the end of their section, paired with an atomic variable stw_leader to test whether a STW section is currently running.

ocaml/runtime/domain.c

Lines 455 to 468 in be2db8e

    
           static void create_domain(uintnat initial_minor_heap_wsize) { 
        
             dom_internal* d = 0; 
        
             caml_domain_state* domain_state; 
        
             struct interruptor* s; 
        
             CAMLassert (domain_self == 0); 
        
             /* take the all_domains_lock so that we can alter the STW participant 
        
                set atomically */ 
        
             caml_plat_lock(&all_domains_lock); 
        
             /* wait until any in-progress STW sections end */ 
        
             while (atomic_load_acq(&stw_leader)) 
        
               caml_plat_wait(&all_domains_cond);

I tried to document (my understanding of) the current synchronization mechanism for STW sections in #11072.

Question: why don't STW sections keep all_domains_lock the whole time? (Instead of taking it once at setup, and once at the end to signal that condition variable.) Then create_domain could just take all_domains_lock, and that would guarantee that all STW sections are done running. It looks like the behavior would be the same (no less parallelism), and the code would be simpler.

cc @ctk21 @kayceesrk

(Note: this question arose from a code-reading party I had today with @Engil, @Armael and @jhjourdan)

The text was updated successfully, but these errors were encountered:

xavierleroy · 2022-03-01T08:56:32Z

One possible reason is that the last domain to exit the STW is not necessarily the first domain to enter it. A POSIX mutex must be unlocked by the thread that locked it.

More generally, the mutex + shared state + condition variable dance can express many more protocols than just mutexes or just semaphores or just Win32 events.

xavierleroy · 2022-03-01T09:00:31Z

On a related note: variables protected by mutexes need not be atomic. So I believe stw_leader doesn't need to be declared atomic, and could have its true pointer type instead of "atomic_uintnat".

gasche · 2022-03-01T09:06:27Z

There are unprotected accesses to stw_leader, see the "check for a leader or try to take the lock" dance at:

ocaml/runtime/domain.c

Lines 1109 to 1137 in be2db8e

    
           int caml_try_run_on_all_domains_with_spin_work( 
        
             void (*handler)(caml_domain_state*, void*, int, caml_domain_state**), 
        
             void* data, 
        
             void (*leader_setup)(caml_domain_state*), 
        
             void (*enter_spin_callback)(caml_domain_state*, void*), 
        
             void* enter_spin_data) 
        
           { 
        
             int i; 
        
             caml_domain_state* domain_state = domain_self->state; 
        
             caml_gc_log("requesting STW"); 
        
             /* Don't touch the lock if there's already a stw leader 
        
               OR we can't get the lock */ 
        
             if (atomic_load_acq(&stw_leader) || 
        
                 !caml_plat_try_lock(&all_domains_lock)) { 
        
               caml_handle_incoming_interrupts(); 
        
               return 0; 
        
             } 
        
             /* see if there is a stw_leader already */ 
        
             if (atomic_load_acq(&stw_leader)) { 
        
               caml_plat_unlock(&all_domains_lock); 
        
               caml_handle_incoming_interrupts(); 
        
               return 0; 
        
             } 
        
             /* we have the lock and can claim the stw_leader */ 
        
             atomic_store_rel(&stw_leader, (uintnat)domain_self);

Maybe this could be done with just a try_lock? If it fails, we give up, if it succeeds we know that we must be the leader.

gasche · 2022-03-01T09:11:17Z

One possible reason is that the last domain to exit the STW is not necessarily the first domain to enter it.

Right, so the leader would have to wait for all other domains to exit the STW sections before releasing the lock, which adds more latency to the leader. (We could do useful "spin callback" work at this point, but most STW sections don't provide a spin callback.)

I'll wait a bit more in case other people would be able to comment, and then propose the best explanation as a documentation PR.

ctk21 · 2022-03-01T10:13:08Z

I believe the optimization at the top of caml_try_run_on_all_domains_with_spin_work in which we poll stw_leader was an optimization born from benchmarking parallel applications with very large numbers of cores. The benchmarks used are here.

It is also possible that the benefit outweighs the complexity; sadly I don't think I can resuscitate the numbers easily.

gasche · 2022-03-01T10:22:46Z

I wrote out of curiosity a patch to make stw_leader non-atomic at
https://github.com/gasche/ocaml/compare/non-atomic-stw-leader-base..non-atomic-stw-leader

Indeed the main change is that the current code does an atomic load before trylock, while the patch must use trylock directly. I suppose that trylock is a bit more expensive that a load-acquire (if only because glibc must test the mutex kind, etc.), so doing an atomic load first is an optimization in the case where many domains try to run a STW concurrently.

For now my plan is to change nothing (not submit my change as a PR), but document this suggested reason for the current code.

ctk21 · 2022-03-01T10:24:39Z

Question: why don't STW sections keep all_domains_lock the whole time? (Instead of taking it once at setup, and once at the end to signal that condition variable.) Then create_domain could just take all_domains_lock, and that would guarantee that all STW sections are done running. It looks like the behavior would be the same (no less parallelism), and the code would be simpler.

Flexibility of the lock/cond pair with state idiom. In particular the pthread leaving the STW section is very likely not to be the one that triggered it.

Also, while your suggestion might simplify create_domain, I'm not (immediately) sure it will simplify terminate_domain as that thread needs to know it can get into the STW section if needed:

ocaml/runtime/domain.c

Lines 1404 to 1439 in 04ddddd

    
               /* take the all_domains_lock to try and exit the STW participant set 
        
                  without racing with a STW section being triggered */ 
        
               caml_plat_lock(&all_domains_lock); 
        
               /* The interaction of termination and major GC is quite subtle. 
        
                * 
        
                * At the end of the major GC, we decide the number of domains to mark and 
        
                * sweep for the next cycle. If a STW section has been started, it will 
        
                * require this domain to participate, which in turn could involve a 
        
                * major GC cycle. This would then require finish marking and sweeping 
        
                * again in order to decrement the globals [num_domains_to_mark] and 
        
                * [num_domains_to_sweep] (see major_gc.c). 
        
                */ 
        
               if (!caml_incoming_interrupts_queued() && 
        
                   domain_state->marking_done && 
        
                   domain_state->sweeping_done) { 
        
                 finished = 1; 
        
                 s->terminating = 0; 
        
                 s->running = 0; 
        
                 /* Remove this domain from stw_domains */ 
        
                 remove_from_stw_domains(domain_self); 
        
                 /* signal the interruptor condition variable 
        
                  * because the backup thread may be waiting on it 
        
                  */ 
        
                 caml_plat_lock(&s->lock); 
        
                 caml_plat_broadcast(&s->cond); 
        
                 caml_plat_unlock(&s->lock); 
        
                 CAMLassert (domain_self->backup_thread_running); 
        
                 domain_self->backup_thread_running = 0; 
        
               } 
        
               caml_plat_unlock(&all_domains_lock);

Dropping the all_domain_lock in the STW initiation (caml_try_run_on_all_domains_with_spin_work ) is making sure that the terminating domains can join that STW section.

gasche · 2022-03-01T10:36:27Z

Thanks! I believe that my questions have been adequately answered. I would like to close this issue, but rather to "fix" it by documenting these discussions in the code itself, possibly in #11072.

@ctk21 ah right, I missed this interaction with domain_terminate. I guess we could try_lock instead of locking there, and handle interrupts as long as we cannot get the lock, just like the STW startup code does? (caml_try_... does not try_lock in a loop, but its callers call it in a loop.)

Note: I'm happy to see that I'm not the only one having trouble remembering the inconsistent function names create_domain and domain_terminate. I felt a little too proud to send a PR just for that, but maybe I should.

ctk21 · 2022-03-01T10:57:17Z

@ctk21 ah right, I missed this interaction with domain_terminate. I guess we could try_lock instead of locking there, and handle interrupts as long as we cannot get the lock, just like the STW startup code does? (caml_try_... does not try_lock in a loop, but its callers call it in a loop.)

Possibly, I'd have to see the proposed code. The mutex/condition pair with state is an idiomatic and flexible way to do things.

gasche · 2022-03-17T13:36:38Z

I have documented the answer to this question in the now-merged #11072: https://github.com/ocaml/ocaml/pull/11072/files#diff-67115925103982a8ebeb085cfab5ef31a182c9a442bc51e053934364d3750dafR1258-R1270 . This can now be closed. Thanks!

gasche added the devel-discussion label Feb 28, 2022

gasche added a commit to gasche/ocaml that referenced this issue Mar 2, 2022

document questions from ocaml#11073

4551a1c

gasche added a commit to gasche/ocaml that referenced this issue Mar 4, 2022

document questions from ocaml#11073

319f96c

gasche closed this as completed Mar 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Code comprehension: why don't STW sections keep `all_domains_lock` the whole time? #11073

Code comprehension: why don't STW sections keep `all_domains_lock` the whole time? #11073

gasche commented Feb 28, 2022

xavierleroy commented Mar 1, 2022

xavierleroy commented Mar 1, 2022

gasche commented Mar 1, 2022

gasche commented Mar 1, 2022 •

edited

Loading

ctk21 commented Mar 1, 2022

gasche commented Mar 1, 2022

ctk21 commented Mar 1, 2022

gasche commented Mar 1, 2022

ctk21 commented Mar 1, 2022

gasche commented Mar 17, 2022

Code comprehension: why don't STW sections keep all_domains_lock the whole time? #11073

Code comprehension: why don't STW sections keep all_domains_lock the whole time? #11073

Comments

gasche commented Feb 28, 2022

xavierleroy commented Mar 1, 2022

xavierleroy commented Mar 1, 2022

gasche commented Mar 1, 2022

gasche commented Mar 1, 2022 • edited Loading

ctk21 commented Mar 1, 2022

gasche commented Mar 1, 2022

ctk21 commented Mar 1, 2022

gasche commented Mar 1, 2022

ctk21 commented Mar 1, 2022

gasche commented Mar 17, 2022

Code comprehension: why don't STW sections keep `all_domains_lock` the whole time? #11073

Code comprehension: why don't STW sections keep `all_domains_lock` the whole time? #11073

gasche commented Mar 1, 2022 •

edited

Loading