Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force all domain threads to exit before the main thread #13010

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions Changes
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,9 @@ _______________
- #12915: Port ThreadSanitizer support to Linux on s390x
(Miod Vallat, review by Tim McGilchrist)

- #13010: Force all domain threads to terminate upon caml program exit.
(Miod Vallat, review by Gabriel Scherer)

### Code generation and optimizations:

### Standard library:
Expand Down
2 changes: 2 additions & 0 deletions runtime/caml/domain.h
Original file line number Diff line number Diff line change
Expand Up @@ -178,6 +178,8 @@ int caml_global_barrier_num_domains(void);
int caml_domain_terminating(caml_domain_state *);
int caml_domain_is_terminating(void);

void caml_terminate_all_domains(void);

#endif /* CAML_INTERNALS */

#ifdef __cplusplus
Expand Down
31 changes: 29 additions & 2 deletions runtime/domain.c
Original file line number Diff line number Diff line change
Expand Up @@ -1091,6 +1091,13 @@ static void install_backup_thread (dom_internal* di)
}
}

static void terminate_backup_thread (dom_internal *di)
{
atomic_store_release(&di->backup_thread_msg, BT_TERMINATE);
/* Wakeup backup thread if it is sleeping */
caml_plat_signal(&di->domain_cond);
}

static void caml_domain_initialize_default(void)
{
return;
Expand Down Expand Up @@ -1624,6 +1631,27 @@ void caml_interrupt_all_signal_safe(void)
}
}

static void stw_terminate_domain(caml_domain_state *domain, void *data,
int participating_count,
caml_domain_state **participating)
{
if (!pthread_equal(pthread_self(), *(pthread_t *)data)) {
/* Domains threads forced to exit here will not have a chance to
run domain_terminate(), so we need to ask the backup thread to
terminate here. */
terminate_backup_thread(domain_self);
pthread_exit(0);
Copy link
Contributor

@gadmm gadmm Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this is dealing with other threads running on the same domain by not releasing the domain lock. Since the other threads cannot acquire the domain lock, they won't access the runtime state. Doesn't the same reasoning apply for the backup thread? As in, you do not need to terminate it?

Another potential bug (perhaps unrelated to this PR) is that threads calling caml_stat_free without the domain lock will race with the cleanup of the pools. For this bug specifically, rather than changing this PR to stop all threads (some of which might well be blocked actually), I think one should fix the caml_stat_* functions to prevent this race, if the pooled mode is to be retained.

}
dustanddreams marked this conversation as resolved.
Show resolved Hide resolved
}

void caml_terminate_all_domains(void)
{
if (!caml_domain_alone()) {
pthread_t myself = pthread_self();
caml_try_run_on_all_domains(stw_terminate_domain, &myself, NULL);
}
}

/* To avoid any risk of forgetting an action through a race,
[caml_reset_young_limit] is the only way (apart from setting
young_limit to -1 for immediate interruption) through which
Expand Down Expand Up @@ -1972,8 +2000,7 @@ static void domain_terminate (void)
/* signal the domain termination to the backup thread
NB: for a program with no additional domains, the backup thread
will not have been started */
atomic_store_release(&domain_self->backup_thread_msg, BT_TERMINATE);
caml_plat_signal(&domain_self->domain_cond);
terminate_backup_thread(domain_self);
caml_plat_unlock(&domain_self->domain_lock);

caml_plat_assert_all_locks_unlocked();
Expand Down
1 change: 1 addition & 0 deletions runtime/sys.c
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,7 @@ CAML_RUNTIME_EVENTS_DESTROY();
#ifndef NATIVE_CODE
caml_debugger(PROGRAM_EXIT, Val_unit);
#endif
caml_terminate_all_domains();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that caml_shutdown still runs OCaml code. After you ran pthread_exit from all the threads holding a domain lock, the runtime is probably in an inconsistent state, and trying to run OCaml code can go badly. If running OCaml code afterwards (even trivial functions), as if on a lone domain, is intended, I expect it to be brittle. Terminating all domains could be moved to caml_shutdown, taking into account the fact that it's also better not to introduce delays at exit.

But then it is not clear to me whether immediately terminating all domains is always the expected behaviour of caml_shutdown (e.g. for callers of caml_shutdown who do not calling exit immediately thereafter). As it leaves threads waiting for their domain lock, it also fails to achieve the goal of cleaning-up for programs embedding the OCaml runtime.

if (caml_params->cleanup_on_exit)
caml_shutdown();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to call caml_terminate_all_domains unconditionally or only when cleanup_on_exit is used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be preferable to do this regardless of whether cleanup will occur, in order not to leave "dangling" threads in programs embedding the caml runtime (as mentioned in the PR description).

But if people are worried of the consequences of this change, I'll move it to caml_shutdown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't personally have a strong opinion -- I don't know about threads and systems programming enough to tell the amount of risk involved here -- so I am happy to follow your preference unless someone else speaks out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the one hand, this adds some time overhead.

On the other hand, I think there is a race in the existing code between caml_terminate_signals and the delivery of signals to some other (still-running) domain, which this will fix.

Copy link
Contributor

@gadmm gadmm Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this call to caml_terminate_signals is problematic, but I also think it can be removed.

I am not sure I understand the point about programs embedding the ocaml runtime. These programs will indeed call caml_shutdown directly rather than caml_do_exit.

#ifdef _WIN32
Expand Down
Loading