
Remote (Distributed) Channels

Tim Watson edited this page Dec 2, 2018 · 5 revisions

Location transparency is a double-edged sword. On the one hand, we would rather not deal with the intricacies of remote calls: how to serialise data, how to connect to the network, how to handle network failures, and so on. On the other hand, if we don't know that an interaction might travel over the network, be delayed by congestion, or find that the recipient is no longer running (or is not in a state to handle our inputs), how can we build reliability into our systems? Erlang offers to deal with all of these issues for us, and when that works it is great not to have to worry about the difficult details. When it doesn't work, however, it is hard to understand what went wrong, because we are used to the platform, the framework and the language taking care of things for us.

In Cloud Haskell, real location transparency means observing the constraints of the remote case even in local cases. Erlang has no real concept of data that cannot be serialised. (Strictly speaking that isn't quite true, but even passing handles that are useless on a remote node will not fail outright: the data travels over the wire, but is either unusable or leads to a crash on one side or the other.) In Cloud Haskell, observing the remote constraints locally also means taking a huge performance hit for local message passing, because local messages pay the cost of serialisation too. An early version of distributed-process had an API that let you send locally without serialising messages unnecessarily, and it improved performance massively. For channels we can avoid all of these issues entirely, so I propose two separate APIs, one for local and one for remote sending. The remote case can enforce Serializable m => Channel m, whilst the local case doesn't have to worry about that constraint.
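To make the local/remote split concrete, here is a minimal sketch in plain Haskell. It assumes a Chan of lazy ByteStrings as a stand-in for a network connection; sendLocal, sendRemote and receiveRemote are hypothetical names, and Data.Binary's Binary class stands in for Cloud Haskell's Serializable (which in distributed-process combines Binary and Typeable). The local path hands the value over unencoded, so even a value with no Binary instance, such as an IO action, can be sent; the remote path has to encode and decode.

```haskell
module Main where

import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Data.Binary (Binary, decode, encode)
import qualified Data.ByteString.Lazy as BL

-- Hypothetical local send: the value is passed by reference, so no
-- serialisation constraint (and no encoding cost) is required.
sendLocal :: Chan a -> a -> IO ()
sendLocal = writeChan

-- Hypothetical remote send: the 'wire' Chan simulates a network link,
-- so the value must be turned into bytes, hence the Binary constraint.
sendRemote :: Binary a => Chan BL.ByteString -> a -> IO ()
sendRemote wire x = writeChan wire (encode x)

-- The receiving end of the simulated link decodes the bytes again.
receiveRemote :: Binary a => Chan BL.ByteString -> IO a
receiveRemote wire = decode <$> readChan wire

main :: IO ()
main = do
  -- An IO action has no Binary instance, but it can travel locally.
  local <- newChan :: IO (Chan (IO ()))
  sendLocal local (putStrLn "ran an action passed over a local channel")
  act <- readChan local
  act

  -- A serialisable value makes a full encode/decode round trip.
  wire <- newChan
  sendRemote wire (42 :: Int, "hello")
  (n, s) <- receiveRemote wire :: IO (Int, String)
  print (n, s)
```

Running it prints the action's message and then (42,"hello"). Passing the IO action to sendRemote instead would be rejected at compile time, since IO () has no Binary instance; that static guarantee is what a separate remote API buys.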

There are a couple of different API options here, too. One would be to create a broad set of type classes:

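{-# LANGUAGE ExplicitForAll #-}
import Control.Monad.IO.Class (MonadIO)
import Data.Typeable (Typeable)
import Control.Distributed.Process.Serializable (Serializable)
-- ChannelException is assumed to be defined elsewhere; this page does not define it.

class ReceivePort c where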
  readChannel  :: forall a m . (Typeable a, MonadIO m) => c a -> m a

class SendPort c where
  writeChannel :: forall a m . (Typeable a, MonadIO m) => c a -> a -> m ()

class CheckedReceivePort c where
  -- report failure as a value, mirroring writeCChan
  readCChan  :: forall a m . (Typeable a, MonadIO m) =>
                c a -> m (Either ChannelException a)

class CheckedSendPort c where
  writeCChan :: forall a m . (Typeable a, MonadIO m) => 
                c a -> a -> m (Either ChannelException ())

class RemoteSendPort c where
  writeRemote :: forall a m . (Serializable a, MonadIO m) => 
                 c a -> a -> m ()

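Note that the read classes need no Serializable constraint, and there is no remote variant of them: a receive port is always read on the node that owns it, so only the remote send side has to serialise.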
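As an illustration of the type-class option, the following sketch implements ReceivePort, SendPort and CheckedSendPort for a hypothetical LocalChan built from Control.Concurrent.Chan plus a "closed" flag. ChannelException and its ChannelClosed constructor are assumptions, since the page does not define them; the class declarations are repeated (without the explicit foralls) so that the block compiles on its own.

```haskell
module Main where

import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Typeable (Typeable)

-- Hypothetical error type; not defined on the original page.
data ChannelException = ChannelClosed
  deriving (Show, Eq)

class ReceivePort c where
  readChannel :: (Typeable a, MonadIO m) => c a -> m a

class SendPort c where
  writeChannel :: (Typeable a, MonadIO m) => c a -> a -> m ()

class CheckedSendPort c where
  writeCChan :: (Typeable a, MonadIO m) => c a -> a -> m (Either ChannelException ())

-- Hypothetical local channel: a plain Chan plus a closed flag.
data LocalChan a = LocalChan (Chan a) (IORef Bool)

newLocalChan :: IO (LocalChan a)
newLocalChan = LocalChan <$> newChan <*> newIORef False

closeLocalChan :: LocalChan a -> IO ()
closeLocalChan (LocalChan _ closed) = writeIORef closed True

-- No Serializable constraint anywhere: values never leave the process.
instance ReceivePort LocalChan where
  readChannel (LocalChan ch _) = liftIO (readChan ch)

instance SendPort LocalChan where
  writeChannel (LocalChan ch _) x = liftIO (writeChan ch x)

-- The checked variant reports a closed channel as a Left value
-- instead of throwing an exception.
instance CheckedSendPort LocalChan where
  writeCChan (LocalChan ch closed) x = liftIO $ do
    isClosed <- readIORef closed
    if isClosed
      then pure (Left ChannelClosed)
      else Right <$> writeChan ch x

main :: IO ()
main = do
  c <- newLocalChan :: IO (LocalChan String)
  writeChannel c "ping"
  readChannel c >>= putStrLn      -- prints ping
  closeLocalChan c
  writeCChan c "pong" >>= print   -- prints Left ChannelClosed
```

Because every method is polymorphic in MonadIO m, the same instances work directly in IO (as here) or in any monad stack that can lift IO.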

Another option would be to create two distinct data types, one for local and one for remote channels, and to provide a different API for each. The advantage of this scheme is that common concerns can be baked into the channel-handling framework, with the relevant data stored in these handles/references so that individual channel implementations are not burdened with it. Indeed, those data types might make use of the type classes above:

data TypedChannel a =
    forall r s . ( ReceivePort r, CheckedReceivePort r
                 , SendPort s, CheckedSendPort s) =>
    TypedChannel { channelLocalId  :: Integer
                 , channelMonitors :: WeakRef (TMVar ThreadId)
                 , receivePort :: r a
                 , sendPort :: s a
                 }
  | forall s r . (ReceivePort r, CheckedReceivePort r, RemoteSendPort s) =>
    RemoteChannel { channelLocalId  :: Integer
                  , channelMonitors :: WeakRef (TMVar ThreadId)
                  , receivePort :: r a
                  , remoteSendPort :: s a
                  }
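A cut-down sketch of this second option, written with GADT syntax, shows one way a single handle type can dispatch on where its endpoint lives. Channel, LocalChannel, RemoteChannel, channelId and send are hypothetical names; the monitors and receive ports from the declaration above are dropped, concrete Chans replace the type classes, and Binary again stands in for Serializable. The remote constructor captures the Binary dictionary when the channel is built.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Data.Binary (Binary, decode, encode)
import qualified Data.ByteString.Lazy as BL

-- Hypothetical handle type. Both constructors carry a local id; only
-- the remote one requires (and stores) a Binary instance for 'a'.
data Channel a where
  LocalChannel  :: Integer -> Chan a -> Channel a
  RemoteChannel :: Binary a => Integer -> Chan BL.ByteString -> Channel a

-- Common data is available uniformly, whatever the endpoint.
channelId :: Channel a -> Integer
channelId (LocalChannel i _)  = i
channelId (RemoteChannel i _) = i

-- One write operation for both kinds of channel. Matching on
-- RemoteChannel brings the captured Binary a instance into scope.
send :: Channel a -> a -> IO ()
send (LocalChannel _ ch) x = writeChan ch x
send (RemoteChannel _ w) x = writeChan w (encode x)

main :: IO ()
main = do
  lc <- newChan
  wc <- newChan
  let local  = LocalChannel 1 lc :: Channel Int
      remote = RemoteChannel 2 wc :: Channel Int
  mapM_ (\c -> send c 10) [local, remote]
  readChan lc >>= print                          -- prints 10
  (decode <$> readChan wc :: IO Int) >>= print   -- prints 10
  print (map channelId [local, remote])          -- prints [1,2]
```

Since the constraint is checked when a RemoteChannel is constructed, send itself needs no Binary constraint, and a remote channel cannot be built at all for a type that cannot be serialised. Whether that trade-off is the right one is left open here.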
  • Exercise: Since we do want distribution, what separates channels with remote endpoints from local ones?
  • Exercise: What would it mean to compose local and remote channels?