Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor caml_c_thread_(un)register to fix various bugs #12834

Merged
merged 11 commits into from
Jan 12, 2024

Conversation

gadmm
Copy link
Contributor

@gadmm gadmm commented Dec 15, 2023

Another systhread rabbit-hole: caml_c_thread_(un)register. In order to make it better understandable, I refactored it in order to share code with thread_new, thread_start and thread_stop. Differences between thread_new+thread_start and c_thread_register are subtle and come down to when is the GC available to allocate the thread descriptor. We could share even more code if we change a few details, I can share some idea if someone has the time to implement it.

I think several bugs are fixed by this refactor:

@gadmm
Copy link
Contributor Author

gadmm commented Dec 15, 2023

Effort has been made to let this PR be reviewed commit-per-commit.

@gadmm gadmm changed the title Refactor thread register to fix various bugs Refactor caml_c_thread_(un)register to fix various bugs Dec 15, 2023
@gadmm gadmm force-pushed the refactor_thread_register branch 2 times, most recently from 01a9c6e to a6682e6 Compare December 15, 2023 23:00
@gadmm
Copy link
Contributor Author

gadmm commented Dec 17, 2023

Another discrepancy: in case of an uncaught exception, caml_fatal_uncaught_exception is called and the program stops. This is in contrast with regular systhreads: in case of an uncaught exception, the thread stops and an error is output on stderr, and Thread.set_uncaught_exception_handler is obeyed.

Copy link
Contributor

@dustanddreams dustanddreams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! This looks more readable to me, and should help preventing introducing new bugs or inconsistencies in the future.

I can confirm I can no longer reproduce the sporadic failure encountered in parallel/test_c_thread_register.ml on this branch.

@gasche
Copy link
Member

gasche commented Dec 18, 2023

@dustanddreams I'm thankful for your review (I was worried of ending up being the reviewer by lack of any other volunteer), but I am surprised that you managed to do a full review of this tricky code without asking any question / for any clarification or with no particular improvement to suggest. Is the code really that nice, or are you adverse to nitpicking?

@dustanddreams
Copy link
Contributor

Is the code really that nice, or are you adverse to nitpicking?

The individual commits were simple enough and self-contained to be able to assess the impact of each of them.

@gadmm
Copy link
Contributor Author

gadmm commented Dec 18, 2023

Thanks for the review.

It is indeed tricky code if we consider that we cannot rely on the current state of the code, so a global review could be useful in addition of a commit-per-commit review. Some of the bugs look big-if-true (e.g. missing altsigstack) so a helpful method @gasche could be for each bug listed, to confirm that there was a bug, and to convince oneself that it is fixed correctly in this PR. This would make a cross-review that would usefully increase confidence in this code.

thread_lock_release(Dom_c_threads);
/* Release the domain lock the regular way. Note: we cannot receive
an exception here. */
caml_enter_blocking_section_no_pending();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point there are many ways offered to "enter blocking section" or "release the runtime lock", including this one or thread_lock_release but also caml_thread_enter_blocking_section. I'm a bit lost as to which does what. Can you explain why caml_enter_blocking_section(_no_pending) is the right one to use here, and maybe include this explanation in the comment on lines 709-710?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that is confusing to me is that "thread lock" and "domain lock" are two notions that are not orthogonal but overlapping. In particular, if I understand correctly, thread_lock_release eventually calls caml_release_domain_lock (among other things).

Reading this code, I have no clear idea whether one of the following holds:

  1. in the success path, the stuff that occurs before caml_enter_blocking_section sums up to the equivalent of what thread_lock_release would have done, so we end up with something that is equivalent to a well-bracketed acquisition and release of the thread lock, or
  2. in the success path, the code is doing less than what thread_lock_release would have done, because the point is to keep ownership of something after the function returns

(My bet would be that (1) holds, that those caml_c_{register,unregister} functions are expected to be called from foreign threads that are not "active" in any respect with regard to the OCaml runtime (or equivalently, that can run concurrently with the runtime or OCaml mutators), and preserve this property on exit.)

Copy link
Contributor Author

@gadmm gadmm Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed the two functions must be intimate of the details of the relationship the thread lock and the domain lock. The goal is to construct and destruct an "OCaml" thread from scratch which is why it deals with the details of acquiring/releasing the lock. Out of the four locking operations, though, two should be straightforward. If the C thread is correctly attached to the domain, then one can call caml_enter_blocking_section at the end of having attached it, and caml_leave_blocking_section at the beginning of detaching it. Indeed the point is to have a thread with which caml_enter/leave_blocking_section are functional. I think the current comment meant to explain this is clear enough, but this might be more clear when reading the code directly rather than from the diff.

caml_thread_t th = thread_alloc_and_add();
/* If it fails, we release the lock and return an error. */
if (th == NULL) goto out_err;
thread_init_current(th);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am missing something here. My reading of the code suggest that thread_alloc_and_add returns thread info block with an uninitialized runtime state (all fields are 0/NULL), and then thread_init_current calls restore_runtime_state(th) which "restores" this unitialized runtime state into Caml_state, resulting in a domain in an incorrect state.

I would naively expect a new thread to somehow get a valid Caml_state (this could be done by caml_init_domain_self for example), and then start by saving this valid runtime state in the thread info block.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Ah: maybe it is fine if the domain is in an invalid state, because we are giving up control so immediately another thread will be activated and restore its own (valid) state?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit counter-intuitive, but the thread info is initialized with a valid runtime state indeed. I added a comment to clarify this in a new commit.

In reviewing this fact, I have found that I do not understand the initialization of th->trap_barrier_off which is in discrepancy with that in domain_create. This needs to be reviewed by someone knowledgeable before merging. (See the temporary TODO comment.)

(All this initialization between domain_create and caml_thread_new_info could be further factored, but I won't be doing this here.)

Sorry for piling up the commits, but upon the review you prompted, I fixed another issue with error handling in caml_thread_new_info, whereby the NULL return value of allocated memprof state was ignored.

@gasche
Copy link
Member

gasche commented Dec 21, 2023

This PR introduces a detach_from_runtime abstraction that I like, and in particular it makes the final code of unregister very simple. On the other hand, the current register implementation is not symmetric, it remains messy. Could we maybe introduce an attach_to_runtime function that would serve a symmetric purpose?

@jmid
Copy link
Contributor

jmid commented Dec 21, 2023

FYI, I've given this PR a spin on multicoretests as a stress test. It triggered

(none of which are related to this PR) (actually I had a small hope of the PR fixing the former)
and otherwise held up nicely!

@gadmm
Copy link
Contributor Author

gadmm commented Dec 21, 2023

Thanks everyone, I might get back to this in January! Have nice holidays.

@gadmm
Copy link
Contributor Author

gadmm commented Jan 2, 2024

This PR introduces a detach_from_runtime abstraction that I like, and in particular it makes the final code of unregister very simple. On the other hand, the current register implementation is not symmetric, it remains messy. Could we maybe introduce an attach_to_runtime function that would serve a symmetric purpose?

It is factored as much as possible. (Well, I could move create_tick_thread into thread_alloc_and_add, but this would change the error message in the cases where creating the tick thread fails. It's up for discussion.)

There is an inextricable difference between the two ways to attach a new thread. It is essential in one that the thread descriptor is allocated before creating the thread, while in the other, it must be allocated while attaching the thread. The details of when the GC is available are tricky.

@gadmm
Copy link
Contributor Author

gadmm commented Jan 7, 2024

This PR is a bugfix candidate for 5.2. Do you think this could arrive in time, or should I target 5.3 instead?

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the code, made some suggestions, and haven't found any important issue. I am happy to approve this bugfix PR and move towards merging in 5.2 -- I will wait a bit in case @gadmm or others want to discuss further changes.

@@ -298,12 +303,11 @@ static caml_thread_t caml_thread_new_info(void)

#ifndef NATIVE_CODE
th->trap_sp_off = 1;
th->trap_barrier_off = 2;
th->trap_barrier_off = 2; /* TODO: 0? trap_barrier_block? */
th->external_raise = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dustanddreams would you by chance be able to dive into this trap_barrier_off business and give an opinion on what would be best here?

For now I think that keeping the current behavior unchanged with the TODO question is okay, so we could merge the PR without clarification if useful to make progress, but it would of course be nicer to answer the question.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc also @fabbing who presented an impressive detailed account of how exception handling works, hence they might be familiar with this already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The question is why is there a discrepancy—the only one—with domain_create here.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why there is such a discrepancy here. I think caml_thread_new_info should definitely initialize trap_barrier_block to -1 here, and I also think setting trap_barrier_off to 0 rather than 2 in domain_create would make sense, unless this was done on purpose to prevent barriers from firing too early in a domain's life (in which case a comment explaining that rationale would have been nice).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your analysis.

I see that these fields were not there in OCaml 4. They are new, including this initialization code.

So it is worth fixing quickly. A fix would need a second person to review (not me). Hence I recommend fixing it separately and leave the TODO in the code (and create a new issue about it/add it to the systhread issue).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it is worth fixing quickly. A fix would need a second person to review (not me). Hence I recommend fixing it separately and leave the TODO in the code (and create a new issue about it/add it to the systhread issue).

There should be two TODO then, one in caml_thread_new_info regarding the non-initialization of trap_barrier_block and one in domain_create regarding the initialization of trap_barrier_off.

Copy link
Contributor

@fabbing fabbing Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think trap_barrier_block should be initialized to -1.
But, if my understanding is correct, trap_barrier_off should stay initialized to 2, to avoid triggering a trap when reaching the default exception handler (as guessed from https://github.com/gadmm/ocaml/blob/f00e977cc5779fa73c42a0db5bcb112de7f16237/runtime/debugger.c#L248). Also this constant is expected in this computation: https://github.com/gadmm/ocaml/blob/f00e977cc5779fa73c42a0db5bcb112de7f16237/runtime/interp.c#L157

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dustanddreams As far I can see, trap_barrier_off is already initialized to 0 inside domain_create. Did you mean something different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fabbing and @dustanddreams, I'll simply keep the TODO as is for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is now recorded as part of #12399 with a link to the present discussion

@gadmm
Copy link
Contributor Author

gadmm commented Jan 12, 2024

I just rebased.

@gasche gasche merged commit 5348fa9 into ocaml:trunk Jan 12, 2024
9 checks passed
gasche added a commit that referenced this pull request Jan 12, 2024
Refactor caml_c_thread_(un)register to fix various bugs

(cherry picked from commit 5348fa9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TSan-reported race in tests/parallel/test_c_thread_register.ml
6 participants