The Actor Model

Discussion

Our implementation of the actor model should facilitate using actors in the general case, and be highly useful for building the distribution mechanism and protocol. Said distribution mechanism(s) should also be useful for facilitating typed channels.

There is actually a weird triangular dependency between the subsystems that we need to avoid. Actors really ought to use typed channels as their core communication strategy; actors would then make the most sense as a means to implement distributed typed channels, which require a node and carrier subsystem. In turn, these distributed channels (which utilise a node and carrier built from actors) provide the backbone for actors to communicate with one another across remote boundaries. And of course, we'll need specialised channels between nodes for communicating actor-related info (such as monitor signals).

Key points...

  • our actor system should work locally, without needing a network-transport to make a solitary node work
  • remoting (i.e. running as a distributed system) with other nodes should be an add-on layer
  • actors are a useful model for networking code, so we should use them to build remoting/distribution support
  • our remoting/distribution mechanism should work for (remote) typed channels without forcing us to use actors
  • thus we may use actors to implement remoting for typed channels, but users of channels aren't forced to use them

Our actor system should follow the same formal semantics that distributed-process currently observes. It might be wise, however, to implement the lower level constructs (such as sending and receiving) according to a subset of the formal semantics, and layer additional features on top. What I mean by this is to break up the formal semantics, and implement it in separate layers, such that sending, receiving, monitoring and linking are all covered by the core, but things like name registration are not considered core and can be handled higher up the stack.
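To make that layering concrete, here's a minimal sketch: every name below is hypothetical and serializability constraints are elided, but the shape is the point — the core exposes exactly the primitives the formal semantics covers, while registration lives in a separate layer on top.

```haskell
-- A minimal sketch of the proposed layering; all names are hypothetical
-- and Serializable constraints are elided for brevity.
newtype ProcessId  = ProcessId Int  deriving (Eq, Ord, Show)
newtype MonitorRef = MonitorRef Int deriving (Eq, Show)

-- Core layer: only what the formal semantics treats as primitive.
class Monad m => ActorCore m where
  send    :: ProcessId -> msg -> m ()   -- asynchronous send
  expect  :: m msg                      -- blocking receive
  monitor :: ProcessId -> m MonitorRef  -- unidirectional monitoring
  link    :: ProcessId -> m ()          -- unidirectional linking

-- Naming layer: register/nsend sit above the core, so alternative schemes
-- (such as URI-based addressing) could replace them without touching the core.
class ActorCore m => ActorNaming m where
  register :: String -> ProcessId -> m ()
  nsend    :: String -> msg -> m ()
```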

Let's take some examples here from akka...

  • akka implements supervision strategies for all actors
  • makes all actors children of some actor or another
  • makes all actors addressable through a URI scheme, akka.{protocol}.{addr}//{parent}/{child}/{etc}
  • allows you to select an actor reference by passing the URI
  • since all akka actors are registered at a URI on creation, races surrounding name registration are avoided

Frankly, this seems preferable to Erlang's use of the process/name registry, and it simplifies name/id management quite a lot. The URI scheme also removes awkward complexities surrounding global name registration, and simplifies routing. Taking register and nsend as a semantic layer separate from send and receive would allow us to consider offering both options.
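As a rough illustration of what URI-style addressing buys us, here is a hedged sketch; the ActorPath shape, the example scheme and selectActor are assumptions for illustration, not a concrete proposal.

```haskell
import           Data.Map (Map)
import qualified Data.Map as Map

-- Hypothetical types mirroring akka-style URI addressing; the field names,
-- the example scheme and selectActor are illustrative assumptions only.
data ActorPath = ActorPath
  { pathProtocol :: String    -- e.g. "ch" in ch://node@host:port/user/foo
  , pathAddress  :: String    -- e.g. "node@host:port"
  , pathElements :: [String]  -- e.g. ["user", "foo"]
  } deriving (Eq, Ord, Show)

data ActorRef = ActorRef deriving Show  -- placeholder

-- Every actor is registered at a unique path when it is spawned, so
-- selection is a race-free lookup rather than a mutable name registration.
selectActor :: Map ActorPath ActorRef -> ActorPath -> Maybe ActorRef
selectActor refs path = Map.lookup path refs
```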

Built-in supervision

Monitoring, supervision, and restarts are complex issues. The formal semantics consider monitoring and unidirectional linking. I believe the monitoring and linking primitives should remain as they are; however, there are some things for us to consider.

Again, let's look at akka as an alternative actor implementation. To borrow from its documentation:

Akka implements a specific form called “parental supervision”. Actors can only be created by other actors—where the top-level actor is provided by the library—and each created actor is supervised by its parent. This restriction makes the formation of actor supervision hierarchies implicit and encourages sound design decisions. It should be noted that this also guarantees that actors cannot be orphaned or attached to supervisors from the outside, which might otherwise catch them unawares. In addition, this yields a natural and clean shutdown procedure for (sub-trees of) actor applications.

Since Akka inserts guardian actors at the / root, /user, and /system elements in its implicit supervision tree, it is not only able to standardise the shutdown procedure of system processes, but can also protect system processes from outside interference. In Cloud Haskell, anyone can send messages to system processes like the logger, tracer, and so on. Especially if we consider spinning up actors to handle network connections, we really need to make sure that it's not possible to dispatch unwanted traffic to them from /user/.. land.
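One cheap way to get that protection, assuming path-based addressing along the lines sketched earlier, is to have the delivery layer refuse user-land traffic to the /system subtree. The Sender distinction and both function names below are assumptions, not an existing design:

```haskell
import Data.List (intercalate)

-- Sketch of guarding the /system subtree; hypothetical names throughout.
data Sender = UserLand | SystemLand deriving (Eq, Show)

isSystemPath :: [String] -> Bool
isSystemPath ("system" : _) = True
isSystemPath _              = False

-- Refuse user-land traffic addressed to system actors (logger, tracer, ...),
-- so that only the system guardian can reach them.
checkDelivery :: Sender -> [String] -> Either String ()
checkDelivery UserLand path
  | isSystemPath path = Left ("refused: /" ++ intercalate "/" path)
checkDelivery _ _ = Right ()
```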

Another important note regarding Akka's supervision approach is that its actors have a special (separate) mailbox into which termination control messages are placed. Quite how this works at runtime isn't clear to me, but since akka runs on the JVM, which has no green threads, it schedules its actors to be run by a configurable executor (such as the thread pool executor). This approach is pointless in a language with green threads, and since I doubt we will have much luck trying to be more efficient than GHC's scheduler, I personally think that's the end of the debate. It seems, therefore, that asynchronous exceptions are the more sensible route for us to continue using for actor termination.
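For contrast, here's roughly what we already get from GHC's asynchronous exceptions, with no second mailbox. The Kill type and both functions are purely illustrative; distributed-process defines its own exception types for this.

```haskell
import Control.Concurrent (ThreadId, forkIO, throwTo)
import Control.Exception  (Exception, handle)

-- A made-up termination signal, purely to illustrate the mechanism.
data Kill = Kill deriving Show
instance Exception Kill

-- An actor is just a green thread: termination is an asynchronous exception
-- thrown into it, with no separate control mailbox or executor required.
spawnActor :: IO () -> IO ThreadId
spawnActor body = forkIO (handle onKill body)
  where onKill Kill = putStrLn "actor terminated, running cleanup"

terminate :: ThreadId -> IO ()
terminate tid = throwTo tid Kill
```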

So do we want to copy akka?

Well... I think Erlang's semantics for working with actors are somewhat simpler, and potentially easier to reason about. However, the kind of internal infrastructure we'd need in order to duplicate some of akka's features would fit quite nicely.

Let's make some assumptions and see if this all fits...

  1. In the local case, actors communicate by sending data to one another asynchronously

Currently, whilst this is true, all local sending happens by placing messages in a Control.Concurrent.Chan, which is read by the node controller thread, which in turn looks up the target process' CQueue and dispatches the message by writing to the STM TChan inbound queue. Could the local call site for send potentially block, since Chan is implemented in terms of MVars? I'm not sure, however I would've thought there's an alternative approach...
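To pin the question down, the current two-hop path looks roughly like this — a heavy simplification of the distributed-process internals, with stand-in names:

```haskell
import Control.Concurrent.Chan (Chan, readChan, writeChan)
import Control.Concurrent.STM  (TChan, atomically, writeTChan)

type ProcessId = Int     -- stand-in
type Message   = String  -- stand-in for the real encoded message type

-- Hop 1: the sender writes to the node controller's Chan; writeChan briefly
-- takes an MVar lock on the write end, which is where contention would show.
localSend :: Chan (ProcessId, Message) -> ProcessId -> Message -> IO ()
localSend ctrlChan to msg = writeChan ctrlChan (to, msg)

-- Hop 2: the node controller thread drains the Chan, resolves the target
-- process' queue, and writes into its STM TChan inbound queue.
controllerLoop :: Chan (ProcessId, Message)
               -> (ProcessId -> Maybe (TChan Message)) -> IO ()
controllerLoop ctrlChan lookupQueue = do
  (to, msg) <- readChan ctrlChan
  case lookupQueue to of
    Just q  -> atomically (writeTChan q msg)
    Nothing -> pure ()  -- unknown or dead process: silently dropped
  controllerLoop ctrlChan lookupQueue
```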

Since we only have to guarantee ordering between any two processes (the sender and the receiver), we could bake the routing tables into the actor's hidden internal state and skip the node controller altogether for local sending. Currently the LocalProcess record holds the LocalNode and, in turn, the LocalNodeState. However... I do not think we need to worry about that anyway. Since address information can only be written by the node controller (or its subsystem, if we choose to break it up), we can protect writes (and synchronise the addressing subsystem as a whole, if necessary) using an MVar. Reads, however, could take place without acquiring the lock, since they're not modifying the data at all.

The semantics make no mention of what happens if one tries to send to a process using a name, before the process is properly registered. It seems obvious that this will silently fail. Indeed, in the current implementation (and in Erlang) you would avoid this by sending to a ProcessId, and it is not possible for this object to escape the node controller's scope and be visible to the outside world until after spawn has completed. Since this is the case today, and spawn updates the routing tables before it starts running the new actor - and certainly before handing the ProcessId back to the initial call site - it seems reasonable to me that we do not need to synchronise on reads.
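In code, that ordering argument looks like this. This is a sketch under the assumptions above; pidCounter and the Routes shape are hypothetical:

```haskell
import           Control.Concurrent     (forkIO)
import           Control.Concurrent.STM (TChan, newTChanIO)
import           Data.IORef             (IORef, atomicModifyIORef')
import           Data.Map               (Map)
import qualified Data.Map               as Map

type ProcessId = Int     -- stand-in
type Message   = String  -- stand-in
type Routes    = Map ProcessId (TChan Message)

-- The route is published *before* the actor runs and before the ProcessId is
-- handed back, so no caller can ever hold a pid the tables don't know about.
spawn :: IORef ProcessId -> IORef Routes -> (TChan Message -> IO ()) -> IO ProcessId
spawn pidCounter routesRef body = do
  pid <- atomicModifyIORef' pidCounter (\n -> (n + 1, n))
  q   <- newTChanIO
  atomicModifyIORef' routesRef (\rs -> (Map.insert pid q rs, ()))  -- 1. publish
  _   <- forkIO (body q)                                           -- 2. run
  pure pid                                                         -- 3. return pid
```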

Given the above, we could store the local address tables in an IORef and access these in read-only fashion from the sender's threads, and protect any writes to the said IORef with an MVar. In fact, since we currently only know of one node controller thread, atomicModifyIORef' might be sufficient, with no explicit locking.
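Both variants in sketch form (Routes and the function names are assumptions):

```haskell
import           Control.Concurrent.MVar (MVar, withMVar)
import           Control.Concurrent.STM  (TChan)
import           Data.IORef              (IORef, atomicModifyIORef', readIORef)
import           Data.Map                (Map)
import qualified Data.Map                as Map

type ProcessId = Int  -- stand-in
type Routes    = Map ProcessId (TChan String)

-- (a) If we ever grow multiple writers: serialise them with an MVar lock.
addRouteLocked :: MVar () -> IORef Routes -> ProcessId -> TChan String -> IO ()
addRouteLocked lock ref pid q =
  withMVar lock $ \_ ->
    atomicModifyIORef' ref (\rs -> (Map.insert pid q rs, ()))

-- (b) With the single node controller thread as sole writer,
--     atomicModifyIORef' alone is sufficient.
addRoute :: IORef Routes -> ProcessId -> TChan String -> IO ()
addRoute ref pid q = atomicModifyIORef' ref (\rs -> (Map.insert pid q rs, ()))

-- Readers never take the lock; a plain readIORef is enough.
lookupRoute :: IORef Routes -> ProcessId -> IO (Maybe (TChan String))
lookupRoute ref pid = Map.lookup pid <$> readIORef ref
```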

This might not work for the process registry as it stands today - two processes could race to write their ProcessId against "Name1" - but that's a separate discussion anyway, since the proc-reg is not core semantics as far as I'm concerned.

Given this mechanism, then, we could see very fast sends between local intra-node peers. A sending process would simply evaluate the following steps (sketched in code after the list):

  1. ask for my local state, which holds a WeakRef (IORef Routes)
  2. deRefWeak and make sure the reference is still valid - if it's not, the node is presumably screwed...
  3. readIORef (or atomicModifyIORef?) and grab the routing tables
  4. routeForName "//user/target/actor/name" (where //user simply ignores the URI prefix)
  5. ALTERNATIVELY routeForPid (p :: ProcessId)
  6. the result of (4) or (5) should be either Nothing or Just (WeakRef InputChannel)
  7. deRefWeak and if it's a valid reference, then atomically $ writeTChan to place the message in the target mailbox
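In sketch form, under the same assumptions as above — the Routes shape, the Either-typed target standing in for routeForName/routeForPid, and fastSend itself are all hypothetical:

```haskell
import           Control.Concurrent.STM (TChan, atomically, writeTChan)
import           Data.IORef             (IORef, readIORef)
import           Data.Map               (Map)
import qualified Data.Map               as Map
import           System.Mem.Weak        (Weak, deRefWeak)

type ProcessId = Int     -- stand-in
type Message   = String  -- stand-in
data Routes = Routes
  { byName :: Map String    (Weak (TChan Message))
  , byPid  :: Map ProcessId (Weak (TChan Message)) }

-- Steps (1)-(7) above as one fast-path local send. The Weak (IORef Routes)
-- argument is what step (1) fetches from the sender's hidden local state.
fastSend :: Weak (IORef Routes) -> Either String ProcessId -> Message -> IO ()
fastSend weakRoutes target msg = do
  mref <- deRefWeak weakRoutes                             -- (2) node still alive?
  case mref of
    Nothing  -> pure ()                                    -- node presumably screwed
    Just ref -> do
      routes <- readIORef ref                              -- (3) grab the tables
      let route = case target of
            Left name -> Map.lookup name (byName routes)   -- (4) routeForName
            Right pid -> Map.lookup pid  (byPid  routes)   -- (5) routeForPid
      case route of                                        -- (6) Nothing | Just weak q
        Nothing -> pure ()
        Just wq -> do
          mq <- deRefWeak wq                               -- (7) target still alive?
          case mq of
            Just q  -> atomically (writeTChan q msg)       --     deliver to mailbox
            Nothing -> pure ()
```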

The type/definition of InputChannel is up for discussion elsewhere on this wiki...

Someone please pick this apart... @qnikst?