Code comprehension: why don't STW sections keep all_domains_lock the whole time?
#11073
Comments
One possible reason is that the last domain to exit the STW is not necessarily the first domain to enter it. A POSIX mutex must be unlocked by the thread that locked it. More generally, the mutex + shared state + condition variable dance can express many more protocols than just mutexes or just semaphores or just Win32 events.
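To make the ownership point concrete, here is a minimal sketch of the mutex + shared state + condition variable idiom with plain POSIX threads. All names here (section_lock, section_cond, section_running, section_begin, section_end, demo_handoff) are hypothetical and not taken from the OCaml runtime. The mutex is only held for short updates of the shared flag, so the thread that ends the section does not have to be the one that began it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
static pthread_mutex_t section_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t section_cond = PTHREAD_COND_INITIALIZER;
/* Protected by section_lock, so it does not need to be atomic. */
static bool section_running = false;

/* Begin a section: wait until no section is running, then mark one as
   running. The mutex is released before returning, so it is NOT held
   for the duration of the section. */
void section_begin(void) {
  pthread_mutex_lock(&section_lock);
  while (section_running)
    pthread_cond_wait(&section_cond, &section_lock);
  section_running = true;
  pthread_mutex_unlock(&section_lock);
}

/* End the section. Any thread may call this, not only the one that
   called section_begin: each lock/unlock pair stays within one thread. */
void section_end(void) {
  pthread_mutex_lock(&section_lock);
  assert(section_running);
  section_running = false;
  pthread_cond_broadcast(&section_cond);
  pthread_mutex_unlock(&section_lock);
}

static void *finisher(void *arg) {
  (void)arg;
  section_end();
  return NULL;
}

/* The calling thread begins a section and a different thread ends it.
   Returns 0 on success, -1 if the helper thread could not be created. */
int demo_handoff(void) {
  pthread_t t;
  section_begin();
  if (pthread_create(&t, NULL, finisher, NULL) != 0)
    return -1;
  pthread_join(t, NULL);
  /* Would block forever here if the other thread had not ended the section. */
  section_begin();
  section_end();
  return 0;
}
```

Doing the same by holding section_lock for the whole section would require the finishing thread to unlock a mutex it never locked, which POSIX does not allow for normal mutexes.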
On a related note: variables protected by mutexes need not be atomic. So I believe
There are unprotected accesses to Lines 1109 to 1137 in be2db8e
Maybe this could be done with just a
Right, so the leader would have to wait for all other domains to exit the STW sections before releasing the lock, which adds more latency to the leader. (We could do useful "spin callback" work at this point, but most STW sections don't provide a spin callback.) I'll wait a bit more in case other people would be able to comment, and then propose the best explanation as a documentation PR.
I believe the optimization at the top of It is also possible that the benefit outweighs the complexity; sadly I don't think I can resuscitate the numbers easily.
I wrote out of curiosity a patch to make Indeed the main change is that the current code does an atomic load before For now my plan is to change nothing (not submit my change as a PR), but document this suggested reason for the current code.
Flexibility of the lock/cond pair with state idiom. In particular the pthread leaving the STW section is very likely not to be the one that triggered it. Also, while your suggestion might simplify Lines 1404 to 1439 in 04ddddd
Dropping all_domains_lock during STW initiation (caml_try_run_on_all_domains_with_spin_work) ensures that terminating domains can join that STW section.
Thanks! I believe that my questions have been adequately answered. Rather than simply closing this issue, I would like to "fix" it by documenting these discussions in the code itself, possibly in #11072.
@ctk21 ah right, I missed this interaction with
Note: I'm happy to see that I'm not the only one having trouble remembering the inconsistent function names
Possibly, I'd have to see the proposed code. The mutex/condition pair with state is an idiomatic and flexible way to do things.
I have documented the answer to this question in the now-merged #11072: https://github.com/ocaml/ocaml/pull/11072/files#diff-67115925103982a8ebeb085cfab5ef31a182c9a442bc51e053934364d3750dafR1258-R1270 . This can now be closed. Thanks!
STW sections provide a mutual-exclusion mechanism, but the way they prevent races with new domains being created is subtle: they use a condition variable all_domains_cond to signal the end of their section, paired with an atomic variable stw_leader to test whether a STW section is currently running.
ocaml/runtime/domain.c
Lines 455 to 468 in be2db8e
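For readers following along, the scheme described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in runtime/domain.c: it reuses the names all_domains_lock, all_domains_cond and stw_leader from this discussion, but stw_try_begin, stw_end, create_domain_sketch and domain_count are hypothetical, and plain pthread and C11 atomics calls stand in for the runtime's own wrappers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

static pthread_mutex_t all_domains_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_domains_cond = PTHREAD_COND_INITIALIZER;
/* 0 means "no STW section is running". Atomic because it is also read
   without holding all_domains_lock (see the early check below). */
static atomic_uintptr_t stw_leader = 0;
/* Hypothetical stand-in for the domain table; protected by all_domains_lock. */
static int domain_count = 0;

/* Try to become the leader of a STW section.
   Returns 1 on success, 0 if the attempt must be abandoned. */
int stw_try_begin(uintptr_t self) {
  /* Cheap early exit without touching the lock. */
  if (atomic_load_explicit(&stw_leader, memory_order_acquire) != 0)
    return 0;
  if (pthread_mutex_trylock(&all_domains_lock) != 0)
    return 0;
  /* Re-check under the lock: someone may have become leader meanwhile. */
  if (atomic_load(&stw_leader) != 0) {
    pthread_mutex_unlock(&all_domains_lock);
    return 0;
  }
  atomic_store(&stw_leader, self);
  /* The lock is dropped here; the STW section runs without it. */
  pthread_mutex_unlock(&all_domains_lock);
  return 1;
}

/* End the STW section; in this sketch any thread may do so. */
void stw_end(void) {
  pthread_mutex_lock(&all_domains_lock);
  assert(atomic_load(&stw_leader) != 0);
  atomic_store(&stw_leader, 0);
  pthread_cond_broadcast(&all_domains_cond);
  pthread_mutex_unlock(&all_domains_lock);
}

/* Domain creation: taking the lock is not enough, it must also wait
   until no STW section is running. Returns the new domain's index. */
int create_domain_sketch(void) {
  pthread_mutex_lock(&all_domains_lock);
  while (atomic_load(&stw_leader) != 0)
    pthread_cond_wait(&all_domains_cond, &all_domains_lock);
  int id = domain_count++;
  pthread_mutex_unlock(&all_domains_lock);
  return id;
}
```

In this sketch, create_domain_sketch cannot add a domain in the middle of a STW section even though the leader does not hold all_domains_lock while the section runs.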
I tried to document (my understanding of) the current synchronization mechanism for STW sections in #11072.
Question: why don't STW sections keep all_domains_lock the whole time? (Instead of taking it once at setup, and once at the end to signal that condition variable.) Then create_domain could just take all_domains_lock, and that would guarantee that all STW sections are done running. It looks like the behavior would be the same (no less parallelism), and the code would be simpler.
cc @ctk21 @kayceesrk
(Note: this question arose from a code-reading party I had today with @Engil, @Armael and @jhjourdan)