
Implement Systhreads in terms of domains #100

Closed
kayceesrk opened this issue Jan 20, 2017 · 44 comments

@kayceesrk
Contributor

The systhread implementation is currently only safe when there is a single domain. The implementation aims to be backwards compatible with vanilla OCaml. Eventually, systhreads should be deprecated, with possibly a backwards compatible implementation over domains.

@kayceesrk kayceesrk added this to the multicore-beta milestone Jan 20, 2017
@lpw25
Contributor

lpw25 commented Jan 21, 2017

I don't think that systhreads should be deprecated. I think having multiple threads inside a single domain is the right way to implement asynchronous versions of synchronous system operations. It is probably useful in other situations as well.

Of course it should be made clear in documentation that domains are how to get parallelism. Since the name "threads" is pretty firmly connected to parallelism, it may even be necessary to rename the threads library to get the point across, but the functionality should still be available.

@lpw25
Contributor

lpw25 commented Oct 9, 2018

I think it would be useful to have a vaguely concrete implementation
plan about how to do this, and also how to handle asynchronous effects
and preemption. So here is my attempt to outline a design that I think
will work.

The aims of this design are:

  1. Support the existing Threads API
  2. Support using "asynchronous" effects to do preemptive schedulers and
    handle things like signals
  3. Remove from OCaml any genuine asynchronous exceptions -- this makes
    reasoning about resource management much simpler/possible.

Perhaps those aren't actually the aims we need, but they are some
things that I'm most interested in, so that's what I've started from.

Threads

The first step is to add a concept of "thread" to the multicore
runtime. These are not the current heavyweight threads tied to system
threads. They are a lightweight concept based on the existing
continuations.

A thread that is not currently running is represented by just a
continuation, although continuations/stacks would need to be extended to
include information about which asynchronous events (e.g. signals,
alarms for preemptive schedulers) can be handled and how they should be
handled.

Unlike an ordinary continuation created when an effect is performed,
this continuation may be at a location that is not "exception-safe". By
"exception-safe" I mean a place where an effect or exception can safely
be performed/raised. As such, this continuation should never be
discontinued, it may only be continued. The continuation is always at a
point that expects () so there is essentially just a simple resume
operation on threads that takes no other arguments.

At any time a domain may have multiple threads scheduled to run. If there
are multiple threads then they are scheduled preemptively, using similar
heuristics to the current system threads implementation for when to
switch threads.

My reasoning for including another concept alongside effect
continuations is that when multiple threads are created using the
Threads API we need to run them concurrently even when there is no
user-level scheduler provided. You can imagine that there is a built-in
user-level scheduler surrounding the whole program and the threads API
performs effects that are handled by this scheduler. However, we can't
track these effects in the type system backwards compatibly so they
can't really be ordinary effects. Also the concept of "the whole
program" is a bit tricky. The behaviour of these threading effects in
places like signal handlers needs to be well specified -- especially for
the case of finalisers which are crucial for allowing "asynchronous"
effects without causing genuinely asynchronous exceptions.

So, rather than having the threads API perform effects that are handled
by a builtin user-level scheduler, we have the builtin scheduler be part
of the runtime and create an explicit concept in the runtime to
represent the units of work handled by this scheduler.
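To make the threads-as-continuations idea concrete, here is a minimal cooperative round-robin scheduler written with the OCaml 5 effects API. This is only an illustration of the shape of the built-in scheduler: the `Fork` and `Yield` effects and the `run` function are names I am inventing for this sketch, not part of the proposal (whose effects would be untyped and live in the runtime).

```ocaml
open Effect
open Effect.Deep

(* Illustrative effects; not part of the proposed API. *)
type _ Effect.t += Fork : (unit -> unit) -> unit Effect.t
type _ Effect.t += Yield : unit Effect.t

let run (main : unit -> unit) : unit =
  (* Ready queue of paused threads: each is just a continuation
     expecting (), matching the "simple resume" operation above. *)
  let ready : (unit, unit) continuation Queue.t = Queue.create () in
  let enqueue k = Queue.push k ready in
  let dequeue () =
    match Queue.pop ready with
    | k -> continue k ()
    | exception Queue.Empty -> ()
  in
  let rec spawn f =
    match_with f ()
      { retc = (fun () -> dequeue ());   (* thread finished: run the next one *)
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
            Some (fun (k : (a, unit) continuation) -> enqueue k; dequeue ())
          | Fork f ->
            Some (fun (k : (a, unit) continuation) -> enqueue k; spawn f)
          | _ -> None) }
  in
  spawn main
```

With this, `perform (Fork f)` starts a new thread and `perform Yield` switches to the next one; the proposal differs mainly in that the scheduler lives in the runtime, switches preemptively, and its effects are not tracked by the type system.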

"Asynchronous" effects

In order to handle asynchronous events like signals we would provide
functions that interpreted these events as effects. Such functions
would look something like:

val handle_signal : int -> ('a -> 'b) -> 'a -[ Signal : int -> unit ]-> 'b

Calling this function would mark the current thread as able to handle the
given signal, and then execute the given function. If that signal is
received whilst running the given function, then this thread may be chosen
by the runtime to handle it. When that happens we:

  1. Pause execution of the currently running thread.
  2. Create a thread/continuation representing the computation from the
    top of the stack down to the nearest enclosing effect handler from
    the call to handle_signal (the information required for this would
    be included in the thread).
  3. Pass this thread to the enclosing effect handler along with the
    Signal effect.
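To make the flow concrete, a scheduler built on handle_signal might look like the following pseudocode, written in the proposal's hypothetical syntax (the `effect ..., t` match form and `resume` do not exist in OCaml today):

```ocaml
(* Pseudocode in the proposed syntax. *)
let run_with_sigint (task : unit -> unit) : unit =
  match handle_signal Sys.sigint task () with
  | () -> ()                       (* task finished normally *)
  | effect (Signal signo), t ->
      (* t is a thread: it may be at an exception-unsafe point, so it can
         only be resumed, never discontinued. *)
      Printf.eprintf "interrupted by signal %d\n" signo;
      resume t                     (* continue from the interruption point *)
```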

Since this continuation was created when not performing any effects,
this continuation is not exception safe. As such, we need to specify
what happens if it is discontinued by the user or dropped on the floor
and garbage collected. In these cases, we essentially just hand the
thread/continuation to the builtin scheduler to be scheduled just as if
it had been created using the Threads API. When the thread finishes
its result is simply ignored. If the thread performs an effect which is
not handled by any of the effect handlers within it, then we simply
raise the Killed exception -- which is what we do if an ordinary
effect continuation is garbage collected. This is safe since we know the
thread is now at an exception safe location.

Note that some care needs to be taken to mark the appropriate
continuations as handling a particular asynchronous event. Essentially,
a continuation should only be marked as handling a signal if it contains
the stack frame for the call to handle_signal.

Preemptive schedulers

Preemptive schedulers would be written using a function like:

val preempt : int -> ('a -> 'b) -> 'a -[ Preempted : unit -> unit ]-> 'b

which would mark the current thread/continuation to be preempted, then
run the given function. Whilst that function is running, the runtime
will interrupt it at the given period (not sure what measure this should
be yet -- maybe time in ms, maybe some other measure of progress). When
the thread is interrupted the same process as happened for handle_signal
happens, only this time performing the Preempted effect.

As before, the produced continuation cannot be safely discontinued or
garbage collected, so in those cases it will continue to be executed by
the scheduler.
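In the same hypothetical syntax, a round-robin preemptive scheduler built on preempt might be sketched as follows (`preempt`, `resume`, and the Preempted effect are all proposed, not existing, constructs):

```ocaml
(* Pseudocode in the proposed syntax. *)
let round_robin (tasks : (unit -> unit) list) : unit =
  let ready = Queue.of_seq (List.to_seq tasks) in
  let rec next () =
    match Queue.pop ready with
    | exception Queue.Empty -> ()          (* all tasks done *)
    | task ->
        match preempt 10 (* period *) task () with
        | () -> next ()                    (* task ran to completion *)
        | effect (Preempted ()), t ->
            (* requeue the interrupted thread and switch to the next one *)
            Queue.push (fun () -> resume t) ready;
            next ()
  in
  next ()
```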

Thread API

The Thread API would be reimplemented on top of the new runtime
threads instead of system threads. Possibly we could use parts of the
existing implementation based on VM-threads, since what we're doing is
pretty similar.

Since Thread.create would no longer create a separate system thread,
we need to handle the case where a blocking C call is made. I believe
@stedolan already has a whole plan for how to handle this transparently
using a pool of system threads. So I defer to his plan on that point.

We would also deprecate Thread.kill, which is already a
noop in native code.

Asynchronous exceptions

OCaml currently supports a number of asynchronous exceptions (as well as
some situations where any exception can become asynchronous, but those
are basically bugs). In particular, there is Out_of_memory,
Stack_overflow, and Break.

Out_of_memory should almost certainly be a fatal_error -- and it
barely ever happens since you either hit swap space and wait for the
heat death of the universe or you get OOM killed.

Stack_overflow is notoriously unreliable and again could easily be
replaced by a fatal error. Some people use it as a lazy implementation
of searching for something up to a limit -- this is a very bad idea and
they should rewrite their code to not be crazy.

Treating Break as an exception has some use cases. People use it to
allow long running operations to be interrupted. However, this is
unsafe in general and instead they should implement a proper mechanism
to safely kill their operations.

One exception to this is the OCaml toplevel -- where the code that might
need to be interrupted is the user's code (e.g. the user accidentally
writes an infinite loop and wants to carry on with their day). I suggest
we add some unsafe mechanism, probably in Obj, to handle this
case. Something like:

val kill_continuation : (_, _) continuation -> unit

that marks a continuation as run even though it hasn't been. Then you
can use handle_signal to catch the interrupt and just kill the
returned continuation.
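Putting those two pieces together for the toplevel, the sketch might look like this (pseudocode: handle_signal, the Signal effect, and Obj.kill_continuation are all hypothetical):

```ocaml
(* Pseudocode: interrupt a toplevel phrase on Ctrl-C. *)
let run_phrase_protected (run_phrase : unit -> unit) =
  match handle_signal Sys.sigint run_phrase () with
  | () -> ()
  | effect (Signal _), k ->
      (* Unsafe: abandon the user's computation without unwinding it,
         by marking the continuation as already run. *)
      Obj.kill_continuation k;
      print_endline "Interrupted."
```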

@kayceesrk
Contributor Author

I have a few high-level concerns and a few comments on this specific proposal.

The motivation for the Threads API is to provide a way to implement asynchronous versions of blocking system operations (such as reads from the disk). Do we necessarily need a Threads API in addition to effect handlers? It creates an additional layer in the scheduling hierarchy, which now becomes Effect Handlers => Threads => Domains. Can we solve this in a way that doesn’t introduce a new scheduling layer?

This roughly solves the same problem as Safe FFI calls in GHC, but one that works with the fact that the scheduler is not part of the runtime system. Safe FFI in GHC solves two problems:

  1. How to handoff control to another system thread when the current system thread blocks on a system call.
  2. How to find the next Haskell task to run.

We need to solve (1) in almost the same way as safe FFI, with backup threads servicing a domain. When the current system thread blocks, the backup thread wakes up to do useful work. In GHC, (2) is straightforward since the thread scheduler is part of the runtime system. However, in Multicore OCaml, the scheduler is not (and shouldn’t be) part of the runtime system. How does the Threads API handle (2)? I don’t think you are proposing to implement the Thread scheduler in the runtime system.

I am leaning towards the solution that we have described in Sec 4.4 of the TFP paper (http://kcsrk.info/papers/system_effects_feb_18.pdf) in order to solve (2). When the current task gets blocked on a synchronous system call, the same task gets taken over by the backup thread, in which the Delayed effect is performed. This allows control to get to the scheduler and schedule another thread. When the system call returns, we perform the Completed effect so that we can enqueue the thread back into the scheduler. There is some magic that needs to happen to properly handle the return value from the system call (and to get the type of the continuation right), but this is just a detail. The default handler for Delayed is to just block the thread making the system call, waiting for it to return. With the Delayed and Completed effects, you can implement a Systhreads-like interface on top.
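The Delayed/Completed scheme could be sketched, in the same hypothetical effect syntax used earlier in this thread, roughly as follows. All names are made up for illustration, and the bookkeeping that reunites a Completed notification with its blocked task is elided:

```ocaml
(* Pseudocode: a scheduler handling the Delayed/Completed effects. *)
match run_task task with
| result -> finish result
| effect (Delayed ()), k ->
    (* The task is blocked in a system call: remember its continuation
       and switch to another runnable task. *)
    remember_blocked k;
    schedule_next ()
| effect (Completed ()), k ->
    (* The system call returned: put the blocked task back on the run
       queue and carry on. *)
    enqueue (find_blocked ());
    continue k ()
```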

Other comments

  • In order to reason about the interaction between effect handlers and threads, is it appropriate to consider the thread scheduler to be the outermost handler?

  • Once we add a Threads API with its own default scheduler, we need a way to synchronise between them. Perhaps there should also be a simple MVar library to go with the Threads API such as: https://github.com/kayceesrk/effects-examples/blob/master/mvar/MVar.ml. Since these threads are effect oblivious and we imagine a top-level scheduler / MVar library, the effects Suspend and Resume (from the linked example) would also be “built-in” effects, so that they don’t get tracked in the effect system. If the scheduler for Threads lives in the runtime system, then the MVar also should live there, which is undesirable.

  • What is the signal handling behaviour when the thread with the signal handler is not currently running? Should the signal be deferred until the thread with the signal handler runs next? Alternatively, the current thread may be interrupted with the signal handler and the default handler kicks in. Perhaps the latter is the right behaviour since it minimises bookkeeping associated with signal handlers.

The rest makes sense to me.

@lpw25
Contributor

lpw25 commented Oct 10, 2018

My aim with this proposal is to try and address a few issues. Part of the underlying problem is working out exactly how inter-related these issues are. I hope that by determining that it should become clear how many additional concepts are needed in the runtime. The issues I have in mind are the following:

  1. We need a backwards compatible implementation of the Thread API because there is too much code using it in the wild for us to just drop it in multicore OCaml.
  2. In addition to the existing concurrency from the Thread API, there are other parts of the existing runtime that produce concurrency: signal handlers and finalisers. So even if we drop Thread we will still be in a situation where there is not simply a single running fiber.
  3. To make things like handle_signal work for driving signal handling in a user scheduler we need to ensure that all signals are given to one of the schedulers (assuming a scheduler in each domain) rather than attempting to perform them in another thread or dropping them on the floor because the currently running thread doesn't handle them. For example, we need to make things well behaved when a signal arrives whilst we are running a finaliser.
  4. We need a clean story about what to do when a continuation created by handle_signal or preempt is discontinued or garbage collected. Ideally we want to resume that continuation until we hit some exception-safe point and then discontinue it but I think for clean semantics it should probably be an exception-safe point related to the handler which did the discontinue (i.e. when we have executed the continuation to completion or performed an effect which is handled by that handler).

In my first attempt, I essentially addressed issue 4 by treating these zombie continuations like threads started by Thread. I addressed issue 3 by having threads marked with the signals that they can handle. However, now that I've made these issues explicit I can see that I don't address issue 2, in particular I don't actually address issue 3 for finalisers and signal handlers.

Signal handlers and finalisers

I can see two possible directions for addressing issue 2.

  • We could run every signal handler and finaliser in its own thread, scheduled just like all other threads, and thus marked with the signals which they can handle.
  • We could create another concept to represent things like signal handlers and finalisers. I can't think of a good name so let's just use the word "context" -- i.e. when executing a finaliser you create a new context and run the finaliser in that. In particular, the context would be the thing which tracks which signals can be handled, and each continuation/thread would have an associated context. Note that when running a finaliser, the context of the main thread still needs to be stored in the runtime so that we can tell which signals should be delayed until the finaliser is done.

An alternative for "zombie continuations"

Thinking about these issues also suggests to me another approach for "zombie continuations" (i.e. issue 4). Rather than turning them into threads and giving them to the thread scheduler, we could treat them like signal handlers and finalisers. So they would get executed at the next minor collection in a new context. They would run until they completed. If they performed an effect that reached their outermost effect handler -- i.e. the one which turned them into a zombie -- then it would be handled by discontinue. This ensures that zombie continuations don't interfere with the original scheduler -- which is important since they are executed concurrently with it.

This approach leaves the whole threads concept as only used for implementing the Thread API, which does sound better.

Answers to specific questions

Do we necessarily need a Threads API in addition to effect handlers?

I think we need one for backwards compatibility.

I also think we need the "context" part to deal with signals properly. In my mind, I originally tied that idea closely to the "threads" concept, but now I think it is more a property of continuations in general.

Can we solve this in a way that doesn’t introduce a new scheduling layer?

I don't see how. The problem is that it is not a new scheduling layer, it is an existing scheduling layer, so it is hard to remove.

Safe FFI in GHC solves two problems:

  1. How to handoff control to another system thread when the current system thread blocks on a system call.
  2. How to find the next Haskell task to run.

We need to solve (1) in almost the same way as safe FFI, with backup threads servicing a domain. When the current system thread blocks, the backup thread wakes up to do useful work. In GHC, (2) is straightforward since the thread scheduler is part of the runtime system. However, in Multicore OCaml, the scheduler is not (and shouldn’t be) part of the runtime system. How does the Threads API handle (2). I don’t think you are proposing to implement the Thread scheduler in the runtime system?

I am leaning towards the solution that we have described in TFP paper Sec 4.4

I hadn't given this aspect much thought. But how about the following scheme:

When we call an external function and it does enter_blocking_section, we pause the current thread and if there are other available threads we try to execute them -- obviously the system thread running the call is no longer available for use in other C calls. When the system thread does leave_blocking_section the thread is made available for scheduling again. This is basically an implementation of the current OCaml behaviour but using a pool of system threads rather than one per OCaml thread.

Then we add a new form of external:

external foo : int -> int -[ Delayed : int -> unit ]-> int = "foo"
[@@blocking]

When one of these externals does enter_blocking_section we do exactly as the paper suggests.

Essentially, the answer to the question of "How to find the next [OCaml] task to run" is to use the thread scheduler to find another thread for old externals and to use the Delayed mechanism with the current thread for new [@@blocking] externals.

This should allow existing code using Thread to work as expected and also allow new code to use the nicer effects based approach. I think we need some kind of transition plan like this anyway because the nice new mechanism requires a change to the type of the external.

In order to reason about the interaction between effect handlers and threads, is it appropriate to consider thread scheduler to be the outermost handler?

I think so. I think it is a handler surrounding the entire program, including all the default handlers, using some special untyped effects. Since those effects can only be handled by this outer handler we optimise their implementation by not searching our way up the stack of handlers.

What is the signal handling behaviour when the thread with the signal handler is not currently running? Should the signal be deferred until the thread with the signal handler runs next?

Yes it should be delayed, or the other thread should be resumed as the handler now. This is an instance of my issue 3 above, and it is crucial for getting predictable behaviour.

Once we add a Threads API with its own default scheduler, we need a way to synchronise between them. Perhaps there should also be a simple MVar library to go with the Threads API

Well, the Threads API already has its own synchronisation primitives. Personally, I'm not particularly interested in adding more as I think that the Threads API will essentially be deprecated by the new stuff.

@bluddy
Contributor

bluddy commented Oct 10, 2018

Would it be possible to create an emulation layer for Thread? Wrap it in a generic effect handler, translate all synchronous calls to asynchronous ones, and place a yield effect at every allocation point, similar to the way Haskell does it implicitly? This could be a special compilation mode for backwards compatibility invoked when the thread lib is requested. If any synchronous call cannot be mapped to an async one, it would fail.

@lpw25
Contributor

lpw25 commented Oct 11, 2018

I think that would be ridiculously inefficient. The point is to allow Threads code to keep working, that means it can't take a huge performance hit.

@lpw25
Contributor

lpw25 commented Oct 11, 2018

Maybe I'm not being ambitious enough. Maybe we could remove Thread support in multicore. A lot of code would be updated by updating the code in Async and Lwt. However, it would be a much more painful upgrade path than we've used for anything else. There would be no version of the OCaml runtime that supported both the old API and the new API, so things using threads will have to have different versions for before and after the change. My proposal here is essentially that we should add Threads support to OCaml multicore so that the API can be deprecated over a number of versions rather than all at once.

@kayceesrk
Contributor Author

part 1 of N, rest to follow

  1. We need a backwards compatible implementation of the Thread API because there is too much code using it in the wild for us to just drop it in multicore OCaml.

Agreed. My original comment in this issue about dropping systhreads was based on the incorrect assumption that systhreads weren't widely used. I think we should maintain the systhreads API to provide a transition path.

  2. In addition to the existing concurrency from the Thread API, there are other parts of the existing runtime that produce concurrency: signal handlers and finalisers. So even if we drop Thread we will still be in a situation where there is not simply a single running fiber.

Aren't finalisers simpler than signal handlers? Finaliser actions, in trunk and also in multicore, are currently run in a caml_callback_exn. Exceptions escaping the finaliser action interrupt whatever is being done currently, which isn't good behaviour, and I would like that to be a fatal error. In the same way, if we treat effects escaping finalisers as running their default actions, we would have sensible semantics for finaliser actions.

I like the context idea. If the finaliser were to be run in its own context (with the main computation's context saved in the runtime), we will have sensible semantics for finalisers. Finalisers are always run in a default context (for a start).

@lpw25
Contributor

lpw25 commented Oct 11, 2018

Aren't finalisers simpler than signal handlers?

Possibly, although to me they seem to have the same problems as finalisers. For clarity, I should point out that I mean the OCaml signal handlers, not the OS level ones. The OCaml runtime creates its own OS level signal handlers that capture signals and put them on a queue to be handled by the appropriate OCaml signal handler when the system is in a sensible state.

Exceptions escaping the finaliser action interrupts whatever is being done currently, which isn't a good behaviour and I would like that to be a fatal error.

100% agree. I basically consider that a bug. IIRC the same happens with exceptions from signal handlers as well.

@mshinwell
Contributor

I was going to propose a patch to make the out of memory error, together with any exceptions escaping from finalisers, fatal errors. Maybe I should do that for 4.08.

@mshinwell
Contributor

(Exceptions from signal handlers I guess should be treated the same as from finalisers.)

@kayceesrk
Contributor Author

kayceesrk commented Oct 11, 2018

  3. To make things like handle_signal work for driving signal handling in a user scheduler we need to ensure that all signals are given to one of the schedulers (assuming a scheduler in each domain) rather than attempting to perform them in another thread or dropping them on the floor because the currently running thread doesn't handle them. For example, we need to make things well behaved when a signal arrives whilst we are running a finaliser.

Agreed. I prefer delivering signals to the handler rather than running them in a separate context. This will allow us to implement preemptive multithreading at the very least without resorting to another concept in the RTS. Scoped signal handling also solves interruption in the REPL in a sane manner. I believe that scoped signal handling has other benefits as well.

  4. We need a clean story about what to do when a continuation created by handle_signal or preempt is discontinued or garbage collected. Ideally we want to resume that continuation until we hit some exception-safe point and then discontinue it but I think for clean semantics it should probably be an exception-safe point related to the handler which did the discontinue (i.e. when we have executed the continuation to completion or performed an effect which is handled by that handler).

I agree with your solution. Regarding the implementation, we can treat the continuations created during signal handling as having the following finaliser attached:

Gc.finalise (fun k -> ignore @@ 
    try continue k () with 
    | Invalid_argument "continuation already resumed" -> ()) k 

The main observation is that in your Thread API proposal, you introduced a default user-level scheduler which can handle zombie continuations. But we already have a scheduler in the form of finaliser action queue in the runtime system. We can (ab)use this built-in scheduler for handling zombie continuations. We know that the signal handler continuations expect a unit value. If the continuation were resumed already, we ignore the failure. Of course, we need to optimise the actual implementation so that we don't run the Gc.finalise action for every signal interruption. This means that we will need to remove the finaliser attached to the continuation when the continuation is resumed by user code.

@kayceesrk
Contributor Author

An alternative for "zombie continuations"

Ok. I just reread your comment, and it sounds like we are on the same page.

@kayceesrk
Contributor Author

kayceesrk commented Oct 11, 2018

I was going to propose a patch to make the out of memory error, together with any exceptions escaping from finalisers, fatal errors. Maybe I should do that for 4.08.

sgtm.

(Exceptions from signal handlers I guess should be treated the same as from finalisers.)

Wouldn't this break Ctrl-C in the top-level? IIRC it is implemented by throwing an exception from the Ctrl-C handler.

@mshinwell
Contributor

Ah, yes, there is the Ctrl+C in the toplevel problem. I thought Jeremie had a solution for that, I will ask him.

@lpw25
Contributor

lpw25 commented Oct 11, 2018

we can treat the continuations created during signal handling to be associated with the following finaliser:

  Gc.finalise (fun k -> ignore @@ 
      try continue k () with 
      | Invalid_argument "continuation already resumed" -> ()) k 

I agree with the underlying idea here, but I think we actually need something a little more sophisticated than this. Basically, I think it is important that we replace the bottom-most handler of k -- i.e. the handler which created the zombie -- with a handler like:

match ... with
| effect _, k -> discontinue k Killed

where effect _ means handle all effects. Since we have no actual representation of this handler there will need to be some runtime support for it but, as you say, we already want some special support here for performance reasons.

@dbuenzli
Contributor

together with any exceptions escaping from finalisers, fatal errors.
[...]
(Exceptions from signal handlers I guess should be treated the same as from finalisers.)

I think you should rather give them to a client-definable trap function (whose default behaviour is to be a fatal error) along with a type that indicates the context in which it occurred. This could also be used for functions registered with at_exit that raise.

@lpw25
Contributor

lpw25 commented Oct 11, 2018

I think you should rather give them to a client definable trap function

We could have a trap function for an extra layer of safety, but note that if that trap function itself raises we really are going to need to fatal error.

@dbuenzli
Contributor

We could have a trap function for an extra layer of safety, but note that if that trap function itself raises we really are going to need to fatal error.

Indeed. Also, if you could use a well-defined high exit number for that (I'd suggest using the highest usable one: 125), that would be great; see this comment.

@kayceesrk
Contributor Author

@lpw25 Given that zombie continuations created by signal handlers can be handled with finalisers++, is there a reason why the systhreads can't just be implemented on top of Domains? Systhreads are already preemptively scheduled, which means that correctly written libraries should already be thread-safe. So we can map every systhread to a domain.

@lpw25
Contributor

lpw25 commented Oct 11, 2018

Many of the libraries written against Threads rely on knowing the points at which a system thread may yield, so they are not thread-safe from the perspective of Domains.

@lpw25
Contributor

lpw25 commented Oct 11, 2018

For example, code like this:

https://github.com/janestreet/jenga/blob/114.04/tenacious/lib/ring.ml#L36

would break if Threads were implemented with Domains.

@kayceesrk
Contributor Author

I see. Even though systhreads are preemptively scheduled [0], the context switch can only happen at GC safe points. The code that you pointed out is atomic since there are no allocations. Is that right?

We could add a master lock for the Thread library, that needs to be held for a thread to run.

[0] https://github.com/ocaml/ocaml/blob/trunk/otherlibs/systhreads/thread.ml#L61

@lpw25
Contributor

lpw25 commented Oct 11, 2018

The code that you pointed out is atomic since there are no allocations. Is that right?

Yeah, that's right.

We could add a master lock for the Thread library, that needs to be held for a thread to run.

Something like that would probably work. You'd definitely need to do benchmarks on what the performance was like. The extra book-keeping for domains and domain creation might be problematic, although equally there might be some savings since some of the GCing could probably still be run in parallel. You might also want to try to somehow have a different master lock for Threads created by different Domains, so that you can e.g. run two separate Async schedulers in parallel whilst still preserving the expected behaviour within each scheduler.

I guess since this is probably less work than implementing the builtin scheduler it should probably be the first implementation to try and see how it goes.
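A minimal sketch of the master-lock idea using OCaml 5's Domain and Mutex modules might look as follows. The `create`/`yield` names are illustrative, not a proposed API, and a real implementation would also have to release the lock at GC safe points and around blocking sections:

```ocaml
(* Sketch: emulate systhreads by running each one in its own domain while
   a global "master lock" ensures only one runs OCaml code at a time. *)
let master = Mutex.create ()

let create (f : unit -> 'a) : 'a Domain.t =
  Domain.spawn (fun () ->
    Mutex.lock master;
    (* Release the lock even if f raises. *)
    Fun.protect ~finally:(fun () -> Mutex.unlock master) f)

let yield () =
  Mutex.unlock master;
  Domain.cpu_relax ();   (* give another emulated thread a chance to run *)
  Mutex.lock master
```

Having one such lock per group of Threads (rather than a single global one) would correspond to the per-Domain variant suggested above, e.g. for running two Async schedulers in parallel.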

@gadmm
Contributor

gadmm commented Nov 4, 2018

I would like to try to summarise your points to be sure I understand the latest proposal regarding the removal of asynchronous exceptions and the clean dropping of threads, and propose an alternative.

Proposal so far

OCaml currently supports preemptive task killing (e.g. Sys.Break, but also any other exception raised from signal handlers the user has set up). On the other hand, resource safety requires restricting exceptions during the acquisition and the release of resources, so we find ourselves with a distinction between exception-safe and exception-unsafe places in the code.

Previous proposal (Dolan et al. “Concurrent system programming with effect handlers”): in order to make preemptive task killing (Sys.Break) resource-safe, let us mask asynchronous exceptions during the acquisition and release of resources. This is inspired by Haskell's bracket. Concretely, if the user presses Ctrl+C, then the runtime waits to be in an exception-safe place before raising Sys.Break. Then all the user has to worry about is enforcing exception safety with the standard idioms.
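The bracket idiom referred to above could be sketched like this in OCaml. Note that `mask` and `unmask` are purely hypothetical primitives (stubbed here as identity functions); no such masking exists in OCaml today, which is precisely what the proposal would add.

```ocaml
(* Placeholder masking primitives: a real implementation would defer
   asynchronous exceptions inside [mask] and re-enable them in [unmask];
   these stubs only mark where masking would apply. *)
let mask f = f ()
let unmask f = f ()

(* Haskell-style bracket: acquisition and release run masked, while the
   work itself runs unmasked so it can still be interrupted. *)
let bracket ~acquire ~release work =
  mask (fun () ->
    let r = acquire () in
    match unmask (fun () -> work r) with
    | v -> release r; v
    | exception e -> release r; raise e)
```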

Latest proposal: continuations captured by effect handlers correspond to exception-safe places in the code; but in addition to these, a new kind of continuation is introduced, called "threads" for now, for want of a better name (https://www.thesaurus.com/browse/thread). Threads correspond to possibly any place in the code, and only support restarting. They are captured by asynchronous effect handlers, which can in turn express preemptive scheduling and signal handling.

Threads cannot be dropped safely, so the proposal is to give up support for resource-safe preemptive thread killing (in particular to remove Sys.Break). Instead, you propose a combination of three alternatives:

  1. Resource-safe cooperative thread killing (“they should implement a proper mechanism to safely kill their operations”)
  2. Resource-safe preemptive thread non-killing (“zombie threads”)
  3. Resource-unsafe preemptive thread killing (dangerous, we put it in the Obj module)

I agree that 1. can make a lot of sense for the scenario you are describing (interrupt a component in accordance with the program's logic). One reservation that I have is that it assumes that the components to be interrupted are written in a specific way (and the components they use, etc.). So while this gives you a clean story in specific situations, I worry that this cannot be the general story for how to interrupt a component without bringing the whole program down, since it is a non-compositional solution.

In the event of 3. in production, one may as well abort the program. Whereas for 2., in order to make it possible to reason about the consequences of the thread being dropped, you propose to wait for the next associated effect, if one is performed at some point. Would it be fair to say that in the case where this works as intended (because the user took this event into account) we are back to an instance of 1., and that it could already be expressed as 1. (by having the signal handler set a flag rather than dropping the thread, and polling in the effect handlers)? And otherwise (for instance if no associated effect is performed anytime soon), 2. is not much safer: from the outside there is no difference between a leak and a zombie thread that holds on to a resource.
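The flag-plus-polling encoding of 2. as 1. could look like the following sketch. `cancelled` and `check_cancelled` are illustrative names, and the idea that the scheduler calls the check from its effect handler is an assumption about how such a scheduler would be structured.

```ocaml
(* The signal handler sets a flag instead of dropping the thread; the
   scheduler polls the flag at each effect, i.e. at an exception-safe
   point, before resuming a continuation. *)
let cancelled = Atomic.make false

let () =
  Sys.set_signal Sys.sigint
    (Sys.Signal_handle (fun _ -> Atomic.set cancelled true))

(* Called by the scheduler's effect handler before resuming. *)
let check_cancelled () =
  if Atomic.get cancelled then raise Sys.Break
```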

There does not seem to be a way around the fact that you need a way of killing threads both resource-safely and non-cooperatively.

Alternative: masking

What about dropping the thread with an asynchronous exception ASAP, as a replacement for 2. and 3., and using bracket-style masking for asynchronous exceptions during acquisition and release of resources, like in Dolan et al.? Then we get to your point that asynchronous exceptions make it impossible to reason about resource safety. I would like to be sure to understand your point. I thought of two possible reasons you may have in mind.

  • Although bracket provides resource safety in the face of exceptions, there is a fear that we may become trapped with an idiom too limited in expressiveness.

  • It forces the user to think about asynchronous exceptions all the time for exception safety. Code as simple as let push x s = s.c <- x :: s.c; s.len <- s.len + 1 (stack.ml:26) would need explicit masking of asynchronous exceptions. (Here there is no allocation, but I vaguely remember some project to add more safe points, and the alternative of reasoning in terms of allocation points is brittle anyway.)

Let me know if you had more points in mind, especially lower-level obstacles I may be less familiar with.

Solution: poisoning à la Rust

For the first item, I think that resource manipulation could be improved in the short term with a library-supplied solution, if needed, so that it becomes plausible for types managing resources to offer resource-safe interfaces.

I would like to make a point about the second item. The requirement for exception safety can be different for asynchronous exceptions than for synchronous exceptions. Basic exception safety is 1) no leaks (i.e. leave the rest of the world in a consistent state), 2) the invariants of the structure are preserved. This is the most basic notion if you want to reliably use and catch synchronous exceptions in your code. However, one could decide that for asynchronous exceptions, requirement 1) is important so as not to bring the world down with us, but 2) is less important in the perspective that everybody able to see the inconsistent state is going to disappear with us anyway. This is the principle of panics in Rust.

In the model I propose, the responsibility of not leaving accessible invalid data falls upon whoever catches an asynchronous exception. I give four examples.

  • The asynchronous exception is not caught, so it brings down the whole thread (releasing resources). Now, some mutable values may have been shared with other threads, what happens to them? Given that mutable values shared with other threads are always going to have some synchronisation device, the standard library could implement a strategy similar to poisoning in Rust (https://doc.rust-lang.org/nomicon/poisoning.html).
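A poisoning strategy could be sketched like this. The type and function names are illustrative, and for brevity this sketch poisons on any exception, whereas (as discussed below) a real version would poison only on "serious" (asynchronous) exceptions.

```ocaml
(* Rust-style poisoning for a mutex-protected value: if the critical
   section dies with an exception, the lock is marked poisoned and
   later users are told the data may be inconsistent. *)
exception Poisoned

type 'a guarded = {
  lock : Mutex.t;
  mutable poisoned : bool;
  mutable value : 'a;
}

let make v = { lock = Mutex.create (); poisoned = false; value = v }

let with_lock g f =
  Mutex.lock g.lock;
  if g.poisoned then (Mutex.unlock g.lock; raise Poisoned);
  match f g.value with
  | v -> Mutex.unlock g.lock; v
  | exception e ->
    g.poisoned <- true;   (* invariants may be broken *)
    Mutex.unlock g.lock;
    raise e
```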

  • The programmer catches the asynchronous exception, in such a way that the interrupted component is isolated by design. The program may safely continue with its contingency plans.

  • The programmer needs to build a resilient program: they decide to enforce stronger requirements of exception safety for their data structures using rollback guards, masking... Then they can treat asynchronous exceptions like synchronous ones.

  • A bit of a cliché for asynchronous exceptions, but the programmer may want to interrupt a pure CPU-bound computation, and there are few other possibilities (this is at the intersection of the last two: isolated and strongly exception-safe). This is safe in all circumstances.

I think the new data here, to address your concern of being able to reason about asynchronous exceptions, is to consider the use of poisoning. Keeping asynchronous exceptions leaves the door open to the other three scenarios, but they are only side benefits.

One implication is that it gives to threads (I mean your new kind of continuation) a role of unit of isolation like threads in Rust. For instance, mutable values can be shared between fibers without synchronisation nor poison guards, and the default behaviour of asynchronous exceptions is to kill the whole thread, so (correctly) also the user's scheduler and all its fibers, IIUC.

Note that implementing poisoning requires introducing a formal distinction between "serious" and "non-serious" exceptions, where all asynchronous exceptions would be "serious", so that it can be detected from the poison guard (https://github.com/rust-lang/rust/blob/master/src/libstd/sys_common/poison.rs#L46). Maybe this can be done library-side.

Aside: OOM exception

Like the relaxed safety requirement above, one should distinguish being resilient in the face of OOM and having the possibility to clean up. The OOM exception is a synchronous one (you cannot "wait"). To be resource-safe, one needs the acquisition of resources to be exception-safe, and their release to not raise exceptions. To me, the only real obstacle which you mention is that it is hard to control allocations. In the long run, one could imagine using Minamide's values-with-hole to express preallocation, in the dialect needed to write correct acquisition functions (they are one of the lowest-hanging examples of linear values).

I am not sure about the argument that there is swapping and overcommitting, because the user who cares can choose and set up their OS according to their needs. Furthermore, OOM does not necessarily mean that one is running out of memory, only that one is trying to allocate something too big (or a few other conditions)—and Linux by default does not always overcommit: it rejects allocations that are obviously too big. For instance, would the proposal affect the behaviour of Array.make 10000000000 0?
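Today such a request can be handled like an ordinary exception, which is the behaviour under discussion. A small sketch (whether the allocation actually fails depends on the platform and overcommit settings, so no particular outcome is asserted):

```ocaml
(* An allocation request the runtime cannot satisfy raises
   Out_of_memory synchronously, so the caller can recover from it
   like any other exception. *)
let try_alloc n =
  match Array.make n 0 with
  | a -> Some a
  | exception Out_of_memory -> None
```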

It is worth noting that Rust went the other direction and decided to support panic on OOM (for reference see https://github.com/rust-lang/rfcs/blob/master/text/2116-alloc-me-maybe.md, but the various discussions around OOM behaviour in Rust were at least as interesting as the RFC itself).

@gadmm
Copy link
Contributor

gadmm commented Nov 5, 2018

In addition to poisoning, OCaml lets you easily implement synchronisation devices with other strategies, such as having your mutex roll back the value.

@DemiMarie
Copy link
Contributor

Out_of_memory should almost certainly be a fatal_error -- and it
barely ever happens since you either hit swap space and wait for the
heat death of the universe or you get OOM killed.

I strongly disagree with this. Turning Out_of_memory into a fatal error would actually be a breaking change. Libvirt, in particular, depends on recovering from it, and Mirage may also. So would anyone using OCaml in a kernel-mode driver or a critical system process (such as PID 1), where a crash of OCaml means the entire machine goes down.

@gadmm
Copy link
Contributor

gadmm commented Nov 17, 2018

@DemiMarie For OOM specifically, there is a more advanced discussion on the short-term changes there: ocaml/ocaml#1926 (thanks to Gabriel for referring me to it). Perhaps could you contribute your use-case and tell us if the current proposal suits it?

@ejgallego
Copy link

Treating Break as an exception has some use cases. People use it to
allow long running operations to be interrupted. However, this is
unsafe in general and instead they should implement a proper mechanism
to safely kill their operations.

Maybe I am confused here, but what would the alternative be in today's OCaml?

@gadmm
Copy link
Contributor

gadmm commented Feb 26, 2019

Leo means that the operation itself could periodically check for some condition and, say, raise an exception if met, as far as I understand. This ensures that the exception only happens in expected contexts.
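In today's OCaml that pattern could be sketched as follows; `Interrupted` and `stop` are illustrative names, with `stop` set from a signal handler or another thread.

```ocaml
(* Cooperative interruption: the computation polls a flag at points of
   its own choosing and raises in a known, exception-safe context. *)
exception Interrupted

let stop = ref false  (* set from a signal handler or another thread *)

let long_computation () =
  for i = 0 to 10_000_000 do
    if i land 1023 = 0 && !stop then raise Interrupted
    (* ... actual work on iteration [i] ... *)
  done
```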

@ejgallego
Copy link

Leo means that the operation itself could periodically check for some condition and, say, raise an exception if met, as far as I understand. This ensures that the exception only happen in expected contexts.

I see, thanks; however this is very painful to do for computation-intensive code that may not have a predictable path, or a continuously long one.

@jhwoodyatt
Copy link

In related news, I have written a sufficiently close approximation of the Thread, Mutex and Condition modules in terms of Domain and Atomic that Dune 1.8.2 can build and run correctly with it. With some refinements, I think it could be made into a workable substitute for the existing thread library.

The basic theory I used is different from the proposals above. My design comprises the following ideas:

  • Mutex.t is an adaptive mutex comprising a spin lock, some state, and a queue of excluded domains in critical sections awaiting notification by other domains calling Mutex.unlock.
  • Condition.t is a spin lock and a queue of waiting domains in critical sections awaiting notification by other domains calling Condition.signal and Condition.broadcast.
  • Thread.t is a more complicated record and some global state protected by a singleton mutex.
  • Each thread is its own domain, i.e. a thread identifier is a domain identifier.
  • Not every domain is a thread. All domains spawned in Thread.create are threads, and other domains are only promoted into threads while holding ownership of a mutex.
  • The invariant in the unicore OCaml system that no more than a single thread may run in parallel would be respected by using a global mutex held by whichever domain corresponds to the currently running thread. Accordingly, when a thread would awaken after its domain leaves the critical section in Condition.wait or Mutex.lock, its domain could block again on the global thread interlock mutex while another thread runs. The synchronous I/O functions would all temporarily give up the global thread interlock, allowing other ready threads to run while they are blocked on a synchronous I/O function.
  • Would not define any new effects.
  • Thread.kill could be implemented by acquiring the spin lock in the thread record and setting a "killed" flag, notifying its domain if it's currently blocked, and when the thread awakens, it would check the killed flag and throw a private exception rather than proceed normally.

I don't think this method would necessarily offer great performance, but I believe it could be made to correctly simulate the behavior of the thread library in Unicore OCaml.
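The spin-lock building block of the design above could be sketched with `Atomic` alone (the adaptive `Mutex.t` described additionally keeps state and a queue of blocked domains, which this sketch omits):

```ocaml
(* A minimal test-and-set spin lock over Atomic. *)
type spinlock = bool Atomic.t

let create () : spinlock = Atomic.make false

let rec lock l =
  if not (Atomic.compare_and_set l false true) then begin
    Domain.cpu_relax ();   (* back off briefly, then retry *)
    lock l
  end

let unlock l = Atomic.set l false
```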

@lpw25
Copy link
Contributor

lpw25 commented Mar 21, 2019

Thread.kill could be implemented by acquiring the spin lock in the thread record and setting a "killed" flag, notifying its domain if it's currently blocked, and when the thread awakens, it would check the killed flag and throw a private exception rather than proceed normally.

Note that Thread.kill currently just raises an Invalid_arg exception, and it should probably stay that way rather than be given an actual implementation.

@jhwoodyatt
Copy link

Still working on this in my copious spare time.

@jhwoodyatt
Copy link

jhwoodyatt commented Mar 30, 2019

Ugh. So there are multiple things blocking me now.

  1. The fileio test calls Unix.system which is implemented in terms of Unix.fork so that fails just because my tactic of using a domain per thread means the fork isn't allowed. I posted that other patch that reimplements all those functions without using Unix.fork, but who knows what will ever become of that.
  2. The pr5325 test is deadlocking. Not sure why. Investigation reveals that something somewhere inside the multicore platform does not like Domain.Sync.wait_until when there is only one other domain and it's blocked in a system call. It can decide for reasons I don't fully understand that it needs to go do a GC major slice, and that blocks on the platform lock somehow.

I'm gonna post my unfinished branch to my account and go do other things for a while.

@kayceesrk
Copy link
Contributor Author

kayceesrk commented Apr 11, 2019

The fileio test calls Unix.system which is implemented in terms of Unix.fork so that fails just because my tactic of using a domain per thread means the fork isn't allowed. I posted that other patch that reimplements all those functions without using Unix.fork, but who knows what will ever become of that.

Apologies for the delay in looking at this @jhwoodyatt. Have you opened a PR for this in ocaml/ocaml?

I'm gonna post my unfinished branch to my account and go do other things for a while.

I can take a shot at getting this working.

@jhwoodyatt
Copy link

Have you opened a PR for this in ocaml/ocaml?

Alas, no I have not. I'm actually not sure the mainline OCaml committers view this is as an upstream problem, and I was hoping that authoritative voices in the multicore project would champion this among them.

@kayceesrk
Copy link
Contributor Author

Multicore is certainly an upstream problem. See the PRs tagged with label multicore-prerequisite https://github.com/ocaml/ocaml/pulls?utf8=%E2%9C%93&q=is%3Apr+label%3Amulticore-prerequisite+. The use of Unix.fork is certainly a problem. If the fix is reasonable, I can certainly champion it and so would others.

@jhwoodyatt
Copy link

jhwoodyatt commented Apr 17, 2019

If the fix is reasonable, I can certainly champion it and so would others.

My branch is currently based on the multicore master branch, and it's medium sized I would say.

It would not be difficult to rebase it onto the mainline OCaml trunk. If you think it would help to have such a rebase, I could probably find time in the next few days to make one. I'd also be willing to be responsive to review comments and make updates accordingly if the pull request had an authoritative champion.

Whether the change is "reasonable" is I think a matter of some concern. Some of the logic around Unix.execvpe is a bit hairy, and I made some choices that I feel are probably open to challenge.

@kayceesrk
Copy link
Contributor Author

kayceesrk commented Apr 18, 2019 via email

@jhwoodyatt
Copy link

Ok that's useful. Where is your private branch?

Under the link I keep making for the word "branch" in my comments above. I'll make a pull request later this weekend. On a related note, my current work in progress for replacing the Threads library with a pure implementation over domains, when combined with the branch for the Unix.fork problem, is successfully compiling and running stock Dune on Mac OS X at least. I'll polish up them both and make pull requests for discussion purposes.

@jhwoodyatt
Copy link

I have filed #240 as a candidate for addressing this issue.

@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@kayceesrk
Copy link
Contributor Author

Multicore now supports systhreads. Removing the stale label.

Projects
Domains-only Multicore OCaml
  
To do : Systhreads
Development

No branches or pull requests

9 participants