Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed implementation of Threads in terms of Domain and Atomic #240

Closed
wants to merge 2 commits into from

Conversation

jhwoodyatt
Copy link

@jhwoodyatt jhwoodyatt commented Apr 21, 2019

This is a request for review and comment on a candidate for implementing the Threads library in terms of the new Domain and and Atomic modules in the Multicore OCaml standard library. This change, when combined with my other patch for Unix.system to not use Unix.fork makes the stock Dune 1.9.2 package compile and run properly on Multicore OCaml. (Or at least, it did until something in the master branch related to Lazy values introduced a regression.)

Some caveats are worth noting up front:

Firstly, I've made a half-hearted attempt at introducing this as a third library without overwriting the existing vmthread and systhread implementations. I noticed this decision is visible when trying to compile the ounit external package, so I'm anticipating the need to revisit some of these interoperability concerns.

Secondly, I've tried to improve the assurance of correctness by using the unit tests, but unfortunately, this pull request does not currently pass all of them reliably, and moreover, the only reason it can pass most of them at all is that I inserted a call to Gc.minor in the blocking section entrance hook. Obviously, that's a suboptimal workaround. Also, the swapchan unit test looks to my eye like it has a concurrency error of its own, and can only pass reliably if the Thread library is expected to have some guarantee about context switches that I don't think it actually promises. In short, I think it only accidentally passes reliably in the mainline monocore distribution.

(This introductory message is updated from its previous content.)

@@ -74,8 +74,10 @@ static void init_segments(void)
}

/* These are termination hooks used by the systhreads library */
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not the most friendly way to remove this logic. Should properly delete this logic if it's really obsolete.

@@ -128,10 +130,12 @@ value caml_startup_common(char_os **argv, int pooling)
else
exe_name = caml_search_exe_in_path(exe_name);
caml_init_argv(exe_name, argv);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here.

byterun/domain.c Outdated
@@ -1055,7 +1054,9 @@ CAMLprim value caml_ml_domain_yield_until(value t)
if (ts < caml_time_counter ()) {
ret = Val_int(0); /* Domain.Sync.Timeout */
break;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My recollection is that this change was necessary to address an intermittently observed deadlock failure in the threads unit tests. Unfortunately, the test for pr5325 is still failing deterministically. Not sure why. It does seem to be deadlocking on this same call to caml_plat_timedwait.

configure Outdated
@@ -1280,7 +1280,7 @@ config UNIX_OR_WIN32 "$unix_or_win32"
config UNIXLIB "$unixlib"
config GRAPHLIB "$graphlib"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes here are provisional. Not sure how this should be addressed properly.

@@ -0,0 +1,276 @@
(**************************************************************************)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just basically copied the Event library from systhreads. It's weird, actually, there are two Event library implementations already. This is a copy-paste that adds a third.

@@ -0,0 +1,46 @@
(** Lightweight threads. *)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs Doxygen.

@@ -0,0 +1,6 @@
type t
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs Doxygen.

@@ -0,0 +1,6 @@
type t
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs Doxygen.

@jhwoodyatt
Copy link
Author

jhwoodyatt commented Dec 11, 2019

— This branch is not ready for merge. —

I have reimplemented this work with two noteworthy improvements:

  1. It passes the test suite most of the time (and fails for reasons that seem to be either unrelated to the Thread API or because the test suite has what I consider errors).

  2. It has a removable global interlock (enabled at startup) that prevents more than one thread from executing at a time. It's possible for a domain that does not use the thread API to run concurrently with others that do, but the interlock applies as soon as a domain enters the thread module.

Also, the logic is peppered with verbose logging of a very idiosyncratic form. I would like to remove that logic before this branch is considered eligible for merging.

Finally, worth noting: I had been using a slightly modified obsolescent version of Dune as another check on the correctness of this library, but sadly that tactic is no longer working for me because A) something gets into a deadlock trying to force a Lazy value, and B) the new Dune 2.0 releases require OCaml 4.07 or later, and multicore is still baselined on 4.06.

@jhwoodyatt
Copy link
Author

After rebasing, many of the tests, which were previously failing for reasons I suspected were caused by other problems, are now passing. There are still some tests that need attending, and I'll get to those soonish. I don't feel like there is enormous pressure to get this working now now now, but if there is some urgency I'm not sensing, then please respond.

@jhwoodyatt
Copy link
Author

jhwoodyatt commented Mar 10, 2020

And with a little bit of additional effort to make this threads library replace the obsolete ones, I'm happy to see that Dune 2.4.0 now builds with a secondary tool chain, which makes it usable here for building other packages.

Accordingly, I've now seen that this branch also builds ounit2 and successfully uses it to run all the tests in my orsetto project. Also, with a little bit of modification, the omake build tool also works fine, using this threads library. Still needs some polish, as mentioned previously, but I feel like this is a noteworthy intermediate achievement.

@avsm
Copy link
Collaborator

avsm commented Mar 16, 2020

Many thanks for the sustained efforts on this PR! We're going to review this soon as the rest of the GC is now pretty stable.

@jhwoodyatt
Copy link
Author

jhwoodyatt commented Mar 17, 2020

I'm grateful for the opportunity to make what I hope will be a noteworthy contribution. I hope. This branch is ready for preliminary review. It is not remotely close to ready for merge. I suppose if I knew how to file a "draft" pull request, this would be more correctly categorized as that.

@avsm
Copy link
Collaborator

avsm commented May 18, 2020

Now that the rebasing to 4.10 is complete in the default branch, @Engil has been working on rebasing this PR from 4.06. His working tree is at https://github.com/Engil/ocaml-multicore/tree/parallel_minor_gc_4_10%2Bdthreads (to avoid any duplication of efforts).

@abbysmal
Copy link
Collaborator

Hello,
Thank you for your work on this PR @jhwoodyatt.
As @avsm mentioned, I will be assisting with upstreaming your contribution to OCaml Multicore, starting with rebasing it to our latest development branch.

A small update on the rebase work to 4.10 parallel_minor:
The rebase is still unclean and slightly broken, but I managed to get libthread building and running the testsuite.
Minor notes: the signal.ml test, which used to not pass, now seems to reliably pass.
However the testsuite now fails on the delayintr.ml test (which was added at some point after 4.06.0), which I shall investigate in the next few days.
This is still very early observations, I will investigate these changes more properly once the rebasing work reaches an acceptable state.

@jhwoodyatt
Copy link
Author

Thank you for doing this! I'll follow the new branch. Feel free to close this one when that once that one is made authoritative for this change.

@abbysmal
Copy link
Collaborator

Closing this Pull Request, as #342 is now the authoritative PR for merging this contribution.

@abbysmal abbysmal closed this May 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants