Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code comprehension: why don't STW sections keep all_domains_lock the whole time? #11073

Closed
gasche opened this issue Feb 28, 2022 · 10 comments
Closed

Comments

@gasche
Copy link
Member

gasche commented Feb 28, 2022

STW sections provide a mutual-exclusion mechanism, but the way they prevent races with new domains being created is subtle: they use a condition variable all_domains_cond to signal the end of their section, paired with an atomic variable stw_leader to test whether a STW section is currently running.

ocaml/runtime/domain.c

Lines 455 to 468 in be2db8e

static void create_domain(uintnat initial_minor_heap_wsize) {
dom_internal* d = 0;
caml_domain_state* domain_state;
struct interruptor* s;
CAMLassert (domain_self == 0);
/* take the all_domains_lock so that we can alter the STW participant
set atomically */
caml_plat_lock(&all_domains_lock);
/* wait until any in-progress STW sections end */
while (atomic_load_acq(&stw_leader))
caml_plat_wait(&all_domains_cond);

I tried to document (my understanding of) the current synchronization mechanism for STW sections in #11072.

Question: why don't STW sections keep all_domains_lock the whole time? (Instead of taking it once at setup, and once at the end to signal that condition variable.) Then create_domain could just take all_domains_lock, and that would guarantee that all STW sections are done running. It looks like the behavior would be the same (no less parallelism), and the code would be simpler.

cc @ctk21 @kayceesrk

(Note: this question arose from a code-reading party I had today with @Engil, @Armael and @jhjourdan)

@xavierleroy
Copy link
Contributor

One possible reason is that the last domain to exit the STW is not necessarily the first domain to enter it. A POSIX mutex must be unlocked by the thread that locked it.

More generally, the mutex + shared state + condition variable dance can express many more protocols than just mutexes or just semaphores or just Win32 events.

@xavierleroy
Copy link
Contributor

On a related note: variables protected by mutexes need not be atomic. So I believe stw_leader doesn't need to be declared atomic, and could have its true pointer type instead of "atomic_uintnat".

@gasche
Copy link
Member Author

gasche commented Mar 1, 2022

There are unprotected accesses to stw_leader, see the "check for a leader or try to take the lock" dance at:

ocaml/runtime/domain.c

Lines 1109 to 1137 in be2db8e

int caml_try_run_on_all_domains_with_spin_work(
void (*handler)(caml_domain_state*, void*, int, caml_domain_state**),
void* data,
void (*leader_setup)(caml_domain_state*),
void (*enter_spin_callback)(caml_domain_state*, void*),
void* enter_spin_data)
{
int i;
caml_domain_state* domain_state = domain_self->state;
caml_gc_log("requesting STW");
/* Don't touch the lock if there's already a stw leader
OR we can't get the lock */
if (atomic_load_acq(&stw_leader) ||
!caml_plat_try_lock(&all_domains_lock)) {
caml_handle_incoming_interrupts();
return 0;
}
/* see if there is a stw_leader already */
if (atomic_load_acq(&stw_leader)) {
caml_plat_unlock(&all_domains_lock);
caml_handle_incoming_interrupts();
return 0;
}
/* we have the lock and can claim the stw_leader */
atomic_store_rel(&stw_leader, (uintnat)domain_self);

Maybe this could be done with just a try_lock? If it fails, we give up, if it succeeds we know that we must be the leader.

@gasche
Copy link
Member Author

gasche commented Mar 1, 2022

One possible reason is that the last domain to exit the STW is not necessarily the first domain to enter it.

Right, so the leader would have to wait for all other domains to exit the STW sections before releasing the lock, which adds more latency to the leader. (We could do useful "spin callback" work at this point, but most STW sections don't provide a spin callback.)

I'll wait a bit more in case other people would be able to comment, and then propose the best explanation as a documentation PR.

@ctk21
Copy link
Contributor

ctk21 commented Mar 1, 2022

I believe the optimization at the top of caml_try_run_on_all_domains_with_spin_work in which we poll stw_leader was an optimization born from benchmarking parallel applications with very large numbers of cores. The benchmarks used are here.

It is also possible that the benefit outweighs the complexity; sadly I don't think I can resuscitate the numbers easily.

@gasche
Copy link
Member Author

gasche commented Mar 1, 2022

I wrote out of curiosity a patch to make stw_leader non-atomic at
https://github.com/gasche/ocaml/compare/non-atomic-stw-leader-base..non-atomic-stw-leader

Indeed the main change is that the current code does an atomic load before trylock, while the patch must use trylock directly. I suppose that trylock is a bit more expensive that a load-acquire (if only because glibc must test the mutex kind, etc.), so doing an atomic load first is an optimization in the case where many domains try to run a STW concurrently.

For now my plan is to change nothing (not submit my change as a PR), but document this suggested reason for the current code.

@ctk21
Copy link
Contributor

ctk21 commented Mar 1, 2022

Question: why don't STW sections keep all_domains_lock the whole time? (Instead of taking it once at setup, and once at the end to signal that condition variable.) Then create_domain could just take all_domains_lock, and that would guarantee that all STW sections are done running. It looks like the behavior would be the same (no less parallelism), and the code would be simpler.

Flexibility of the lock/cond pair with state idiom. In particular the pthread leaving the STW section is very likely not to be the one that triggered it.

Also, while your suggestion might simplify create_domain, I'm not (immediately) sure it will simplify terminate_domain as that thread needs to know it can get into the STW section if needed:

ocaml/runtime/domain.c

Lines 1404 to 1439 in 04ddddd

/* take the all_domains_lock to try and exit the STW participant set
without racing with a STW section being triggered */
caml_plat_lock(&all_domains_lock);
/* The interaction of termination and major GC is quite subtle.
*
* At the end of the major GC, we decide the number of domains to mark and
* sweep for the next cycle. If a STW section has been started, it will
* require this domain to participate, which in turn could involve a
* major GC cycle. This would then require finish marking and sweeping
* again in order to decrement the globals [num_domains_to_mark] and
* [num_domains_to_sweep] (see major_gc.c).
*/
if (!caml_incoming_interrupts_queued() &&
domain_state->marking_done &&
domain_state->sweeping_done) {
finished = 1;
s->terminating = 0;
s->running = 0;
/* Remove this domain from stw_domains */
remove_from_stw_domains(domain_self);
/* signal the interruptor condition variable
* because the backup thread may be waiting on it
*/
caml_plat_lock(&s->lock);
caml_plat_broadcast(&s->cond);
caml_plat_unlock(&s->lock);
CAMLassert (domain_self->backup_thread_running);
domain_self->backup_thread_running = 0;
}
caml_plat_unlock(&all_domains_lock);

Dropping the all_domain_lock in the STW initiation (caml_try_run_on_all_domains_with_spin_work ) is making sure that the terminating domains can join that STW section.

@gasche
Copy link
Member Author

gasche commented Mar 1, 2022

Thanks! I believe that my questions have been adequately answered. I would like to close this issue, but rather to "fix" it by documenting these discussions in the code itself, possibly in #11072.

@ctk21 ah right, I missed this interaction with domain_terminate. I guess we could try_lock instead of locking there, and handle interrupts as long as we cannot get the lock, just like the STW startup code does? (caml_try_... does not try_lock in a loop, but its callers call it in a loop.)

Note: I'm happy to see that I'm not the only one having trouble remembering the inconsistent function names create_domain and domain_terminate. I felt a little too proud to send a PR just for that, but maybe I should.

@ctk21
Copy link
Contributor

ctk21 commented Mar 1, 2022

@ctk21 ah right, I missed this interaction with domain_terminate. I guess we could try_lock instead of locking there, and handle interrupts as long as we cannot get the lock, just like the STW startup code does? (caml_try_... does not try_lock in a loop, but its callers call it in a loop.)

Possibly, I'd have to see the proposed code. The mutex/condition pair with state is an idiomatic and flexible way to do things.

gasche added a commit to gasche/ocaml that referenced this issue Mar 2, 2022
gasche added a commit to gasche/ocaml that referenced this issue Mar 4, 2022
@gasche
Copy link
Member Author

gasche commented Mar 17, 2022

I have documented the answer to this question in the now-merged #11072: https://github.com/ocaml/ocaml/pull/11072/files#diff-67115925103982a8ebeb085cfab5ef31a182c9a442bc51e053934364d3750dafR1258-R1270 . This can now be closed. Thanks!

@gasche gasche closed this as completed Mar 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants