-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Force all domain threads to exit before the main thread #13010
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two questions inline, and two questions here:
-
caml_do_exit
seems to also be called from a side thread under Windows (seecaml_signal_thread
in runtime/win32.c). Is that correct? My uninformed intuition is that it is wrong, but was already wrong before the current change. (But maybe the change makes it worse?) -
You made the choice to call
pthread_exit
on all non-main domains and their backup threads, but not to call their termination logic (the many things done indomain_terminate
after we exit the STW). I suppose this is an intentional design choice? Do we believe that the cleanup-mode-at-exit will be good enough to reclaim all the domain-owned resources that would have been released by the termination logic, or do we risk leaking resources with your approach?
@@ -191,6 +191,7 @@ CAML_RUNTIME_EVENTS_DESTROY(); | |||
#ifndef NATIVE_CODE | |||
caml_debugger(PROGRAM_EXIT, Val_unit); | |||
#endif | |||
caml_terminate_all_domains(); | |||
if (caml_params->cleanup_on_exit) | |||
caml_shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to call caml_terminate_all_domains
unconditionally or only when cleanup_on_exit
is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be preferable to do this regardless of whether cleanup will occur, in order not to leave "dangling" threads in programs embedding the caml runtime (as mentioned in the PR description).
But if people are worried of the consequences of this change, I'll move it to caml_shutdown
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't personally have a strong opinion -- I don't know about threads and systems programming enough to tell the amount of risk involved here -- so I am happy to follow your preference unless someone else speaks out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the one hand, this adds some time overhead.
On the other hand, I think there is a race in the existing code between caml_terminate_signals
and the delivery of signals to some other (still-running) domain, which this will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this call to caml_terminate_signals
is problematic, but I also think it can be removed.
I am not sure I understand the point about programs embedding the ocaml runtime. These programs will indeed call caml_shutdown
directly rather than caml_do_exit
.
Thanks for noticing this. I have not looked at the windows code paths yet. I will have a look and answer later.
My understanding of the STW rendezvous mechanism requires the callbacks to be limited in what they do. In particular,
which is why I am not doing any of the domain cleanup work here at this point (my intent is to have the main thread do it in |
@@ -191,6 +191,7 @@ CAML_RUNTIME_EVENTS_DESTROY(); | |||
#ifndef NATIVE_CODE | |||
caml_debugger(PROGRAM_EXIT, Val_unit); | |||
#endif | |||
caml_terminate_all_domains(); | |||
if (caml_params->cleanup_on_exit) | |||
caml_shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't personally have a strong opinion -- I don't know about threads and systems programming enough to tell the amount of risk involved here -- so I am happy to follow your preference unless someone else speaks out.
f05742e
to
5d2647e
Compare
This looks harmless to me. That use case is specific to the use of (apparently undocumented) |
The code looks reasonable to me. IIUC, the current PR unconditionally forces all the non-main domains to terminate before the main domain termites, irrespective of whether memory cleanup at exit (MCE) mode is on. Do we know how by how much this slows down multi-domain programs that do not use MCE? My motivation is to allow command-line tools to be written using multiple domains. My experimental observations with OCaml 5 lead me to think that the cost of parallelism creation is still quite high. Without significant work (100s of milliseconds), the parallelism isn't worth it now. I would like to bring this cost down. I wonder whether it is worth considering doing this only when the "memory cleanup at exit" mode is on for performance? |
Yes.
I have not attempted to measure this, and this will depend upon the kind of activity done in each domain thread. This adds the cost of a STW rendez-vous,
This point is open for discussion. |
Sorry, I noticed that the point that I had raised was already discussed here: https://github.com/ocaml/ocaml/pull/13010/files#r1514040251. I'm happy to go with the unconditional wait to terminate all domains. We can come back to this point if and when we solve other issues with scaling short-lived programs with multiple domains. |
For the record, @damiendoligez suggested raising an exception to force the domain thread to exit in a more "natural" way and proper cleanup. However I can't get this to work at the moment (and also, raising the exception in the STW callback will cause it to be taken care of immediately, which goes against the "no mutator code" STW requirement). |
Can you elaborate on this idea? Does this mean that any GC safe point (allocation, poll points, etc) can raise an exception that the user code is expected to handle cleanly? IINM, today, only |
My understanding is that the idea is not to encourage users to try to catch failures at poll points (in fact many of them are not visible in the source), but to use coarser-grained exception handlers, typically using helpers such as The documentation of
There are proposals to change the way we handle asynchronous exceptions. As far as I know there are currently two competing proposals:
In any case I would expect |
Thanks @gasche. I like the idea (2); it makes sense to relegate asynchronous exceptions to a second-class status. @dustanddreams I would like to know the proposed design for the asynchronous exception. In particular, what exception are we planning to raise on the other threads? Will this be
Curious whether this requirement is documented somewhere in the runtime. |
He suggested raising a I was thinking of introducing something similar to Unfortunately, in the bytecode flavour, this reliably causes a |
See #13010 (comment) |
Discontinuing non-main domains with an exception
I would try to do this by handling this "domain exit" as a "pending action" in the sense of the I don't think you need to run a STW yourself to do this. You could presumably just set a global flag in the runtime ("we are in the process of shutting down"), then call One question that arises is how to prevent a sort of race where you ask all current domains to shut down, but there are other domains being spawned at the same time that don't get the memo. Maybe this can be avoided by taking one of the locks on domain states that are needed to spawn (so as to prevent new domain from spawning), or by changing the spawning code to fail once we are in the "we are trying to shut down everyone" global state. Feature scopeI wonder what is the feature scope for the present PR. I think that you started with "valgrind should not report memory leaks due to uncollected domain structures on program exit", and now we are in the territory of "let's make a best effort to let OCaml code cleanup its own logical resources on program exit", which is a much harder problem. I would propose two things moving forward:
|
(I edited my post above to discuss how the "let's shutdown everyone" decision should affect domains that are being spawned concurrently.) |
Out of curiosity, do you know of a particular situation where minor allocations (or poll points) can raise It has already been mentioned that asynchronous exceptions are a thing, but there has been ample discussion and research on actual code usage in the past couple of years, and there is no longer a debate about whether these exceptions are relied-upon in practice. Perhaps you mean to say that out of the box, this sort of exception does not arise. Indeed, async exceptions are currently opt-in. One of the requirements of our current work on async exceptions is that programs built with the assumption that there are no async exceptions should remain correct by keeping async exceptions opt-in. For this reason, the solution cannot be to interrupt threads by default. With this in mind, I propose to split the idea of interruption in two. The default behaviour should be for the runtime running on domain 0 to wait for the completion of other domains, similarly to how the main systhread of a domain waits on other systhreads at termination. The programmer can then interrupt domains with asynchronous exceptions as a deliberate choice. (A variant of exit that exits all domains at once could also be used in some situations, assuming it remains opt-in for the programmer.) Then, the support for interruption in OCaml should be developed to become more convenient and adapted to multicore. Part of this support can be delegated to specialized libraries. The cancellation tokens from memprof-limits can already serve for this purpose in some situations, and serve as a proof of concept. There are many ways in which support for interruption could be improved with better support from the runtime. Since cancellation becomes all the more important in a parallel setting, interruption by asynchronous exceptions is indeed deserving of attention from core developers. I agree that there is an issue with domain termination order even without the "memory cleanup at exit" mode. I'd also be wary about introducing major differences between two modes. By the way, the "memory cleanup at exit" mode already introduces semantic changes with regular mode that makes it unusable in practice especially for library writers1. I was also left with the impression that this mode was unmaintained, cf. the conclusion of #10304. So I think there is a problem to solve worthy of some efforts, but not because of the "memory cleanup at exit" mode, itself not worthy of too much effort. Also, I think it would be beneficial for discussions in the OCaml github that they are not based on hearsay about private experiments in 3rd-party tools that have not yet been proposed for public discussion. It is not true that one could adapt Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In summary:
- There indeed seems to be an issue with domain termination order, and not only in the "memory cleanup at exit" mode. It seems very preferable to have the same behaviour in the two modes. (Edit: this mainly concerns
caml_shutdown
, rather thancaml_do_exit
, but this PR on changescaml_do_exit
.) - Interrupting domains with asynchronous exceptions as a default behaviour breaks an implicit rule that asynchronous exceptions are opt-in, which some want to see preserved.
- The default behaviour could be to wait on the completion of all domains before tearing down the global runtime state.
- Separately, support for interruption could be improved in the runtime.
- The current PR could be adapted into adding some "exit_all_domains" primitive that the programmer calls explicitly if that's the right solution.
I don't know the answer to the question of whether there is a divergence in the failure mode (Out_of_memory vs fatal error). I agree that making the behaviour consistent is a good idea. It would be useful to track this somewhere (perhaps not an issue because we don't know whether the issue even exists). Perhaps just document the prescribed behaviour in code / manual in the appropriate place? |
In the interest of not making this PR more complicated, I would split this into two steps.
Would this be acceptable? Also, it would be interesting to hear from @damiendoligez, who originally suggested the idea of tearing down by raising exceptions. |
Waiting for the completion of all domains seems the cleanest solution to me, as it avoids the many drawbacks of asynchronous termination. However, there is one caveat: the expected behavior of I don't remember why I suggested raising an exception, probably as an attempt at implementing the asynchronous termination while allowing for in-program resource cleanup, but I think that was misguided, since an exception doesn't guarantee termination. As a side note, I totally support making sure that |
This looks like I opened a can of worms larger than expected, and there is no strong consensus on what to do. I think we can all agree that the expected behaviour of a well-written Caml program using multiple domains is to make sure to If we can be bold and write somewhere that terminating the main thread while other domains are still running is a programming error, then the current intent of this PR, forcing remaining domains to If the behaviour of letting domain threads outlive the main thread is accepted, then this PR won't do and I'll withdraw it. |
Please don't, it was not a documented invariant and it's not a particularly nice one for people who are already trying to use domain pools. I've never had to make sure all threads were stopped before exiting the main thread, why would I have to do it for domains (which are a necessity even for those of us who prefer to run threads, as it's necessary to run domains even just to dispatch threads on them). The domain interface is already not the greatest imho, I'd like it if it was not made even more complicated to deal with. |
@c-cube Threads are automatically joined when the main thread exits. Is this what you would like to see for domains? One could distinguish calling It makes sense to halt other domains inside (From this point of view my suggestion of an |
@gadmm are they? let () =
let rd, _wr = Unix.pipe () in
let _t =
Thread.create
(fun () ->
Printf.printf "thread started…\n%!";
let buf = Bytes.create 4 in
let n = Unix.read rd buf 0 4 in
Printf.printf "read %dB\n%!" n)
()
in
Thread.delay 0.1;
() this prints "thread started" and exits after ~ 0.1s. The thread is not joined since it's in a blocking syscall that will never return. Am I missing something? |
Thanks for the example. I missed an important code path: regarding systhreads, domain termination is handled differently for domain 0 and other domains. In the case of domain 0, threads on this domain are not joined during termination. Other points:
To go back at the core of this PR, it makes sense to have a conditional STW to stop all domains in the Lastly, I am missing at this point the link with the scenario from the description of this PR:
There is likely something to do directly in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think caml_shutdown
, rather than caml_do_exit
, should be fixed. In the specific case of callers that are going to call exit
, it makes sense to call pthread_exit
in all threads that hold the domain lock of their domain to avoid interference with clean-up functions.
But I sense a need to go to the blackboard regarding the desired semantics of caml_shutdown
, and also the cleanup_on_exit
mode that relies on the pooled mode.
thread to terminate here. */ | ||
terminate_backup_thread(domain_self); | ||
atomic_fetch_add(&caml_num_domains_running, -1); | ||
pthread_exit(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this is dealing with other threads running on the same domain by not releasing the domain lock. Since the other threads cannot acquire the domain lock, they won't access the runtime state. Doesn't the same reasoning apply for the backup thread? As in, you do not need to terminate it?
Another potential bug (perhaps unrelated to this PR) is that threads calling caml_stat_free
without the domain lock will race with the cleanup of the pools. For this bug specifically, rather than changing this PR to stop all threads (some of which might well be blocked actually), I think one should fix the caml_stat_*
functions to prevent this race, if the pooled mode is to be retained.
while (!caml_domain_alone()) { | ||
pthread_t myself = pthread_self(); | ||
caml_try_run_on_all_domains(stw_terminate_domain, &myself, NULL); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while (!caml_domain_alone()) { | |
pthread_t myself = pthread_self(); | |
caml_try_run_on_all_domains(stw_terminate_domain, &myself, NULL); | |
} | |
pthread_t myself = pthread_self(); | |
while (!caml_try_run_on_all_domains(stw_terminate_domain, &myself, NULL)) {}; |
Does this work? It avoids hijacking caml_num_domains_running
.
run domain_terminate() on their own, so we need to ask the backup | ||
thread to terminate here. */ | ||
terminate_backup_thread(domain_self); | ||
atomic_fetch_add(&caml_num_domains_running, -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atomic_fetch_add(&caml_num_domains_running, -1); |
(see other comment)
@@ -191,6 +191,7 @@ CAML_RUNTIME_EVENTS_DESTROY(); | |||
#ifndef NATIVE_CODE | |||
caml_debugger(PROGRAM_EXIT, Val_unit); | |||
#endif | |||
caml_terminate_all_domains(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that caml_shutdown
still runs OCaml code. After you ran pthread_exit
from all the threads holding a domain lock, the runtime is probably in an inconsistent state, and trying to run OCaml code can go badly. If running OCaml code afterwards (even trivial functions), as if on a lone domain, is intended, I expect it to be brittle. Terminating all domains could be moved to caml_shutdown
, taking into account the fact that it's also better not to introduce delays at exit.
But then it is not clear to me whether immediately terminating all domains is always the expected behaviour of caml_shutdown
(e.g. for callers of caml_shutdown
who do not calling exit
immediately thereafter). As it leaves threads waiting for their domain lock, it also fails to achieve the goal of cleaning-up for programs embedding the OCaml runtime.
@@ -191,6 +191,7 @@ CAML_RUNTIME_EVENTS_DESTROY(); | |||
#ifndef NATIVE_CODE | |||
caml_debugger(PROGRAM_EXIT, Val_unit); | |||
#endif | |||
caml_terminate_all_domains(); | |||
if (caml_params->cleanup_on_exit) | |||
caml_shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this call to caml_terminate_signals
is problematic, but I also think it can be removed.
I am not sure I understand the point about programs embedding the ocaml runtime. These programs will indeed call caml_shutdown
directly rather than caml_do_exit
.
Thanks for the reply @damiendoligez. |
running this in a Going back to the initial premise of this PR:
that's great for the use case of embedding OCaml as a .so in another program, but if it breaks (in a horrible deadlock way) some already existing programs relying on the current behavior, I'm less than enthusiastic about it. And yes, this would break in a non-trivial way some of my programs that use domain pools and background threads. |
To clarify: domain 1 waits for the threads to stop, whereas domain 0 waits neither for threads nor the domains. To observe a difference with |
To further clarify and sum up, I think a way forward is the following:
|
Ah!!! That makes a lot more sense to me, thank you for clarifying. |
I'm closing this PR for now. This comment sums up what needs to be done, and this PR does not fit. Thanks for the feedback, everyone! I'll try to come up with a better PR in the not-so-distant future. |
This is a subtask of the "memory cleanup at exit time" work in progress.
In order to be able to correctly release the various memory resources held by the program's domains, I need to make sure than no other threads are running.
In the current state of the runtime, no effort is made to ensure that all domain threads exit at the end of the program execution; it relies upon the operating system to reap all the threads at program termination time. This can be a problem if the caml runtime is embedded in a larger program which does not necessarily exit after running a caml program.
I have experimented with various way of getting these threads to terminate. It turns out that the simplest and probably safest way is to simply force an STW rendezvous, during which all the threads but the main thread will ask their backup thread to exit (if it had been started) and exit themselves. Of course, there is a risk of not releasing locks or mutexes, but since all the other threads will also exit this should not matter.
For the record, there are five tests in the compiler testsuite which end up with threads running and cause this new code path to be taken:
tests/lazy/lazy3.ml
tests/lib-runtime-events/test_dropped_events.ml
tests/parallel/backup_thread.ml
tests/parallel/major_gc_wait_backup.ml
tests/parallel/mctest.ml
This PR passes all the compiler test and also the mulitcoretests.