## Parallel and Distributed Simulation

Simulus supports parallel discrete-event simulation via concurrent execution of multiple simulators. These simulators can be created and run simultaneously on different processors or cores of the same machine, or on different interconnected machines in a cluster. To distinguish the two cases, we sometimes call the former "parallel simulation" and the latter "distributed simulation", although these terms have never been standardized; in practice, "parallel simulation" is commonly used to refer to both.

To understand how simulus handles parallel and distributed simulation, we first introduce the concept of a synchronized group.

### Synchronized Group

A synchronized group is a group of simulators whose simulation clocks advance synchronously. That is, although each simulator still processes the events on its own event list in timestamp order, the simulators in a synchronized group advance their simulation clocks in a coordinated fashion, so that no simulator gets too far ahead in simulation time of the rest of the simulators in the group.

Some synchronization or coordination must take place to guarantee that an event generated by one simulator and destined for another simulator in the synchronized group never arrives in the receiving simulator's simulated past. That would happen if the receiving simulator's clock were allowed to get too far ahead into the simulated future.

In parallel discrete-event simulation terminology, this is called a causality error. Causality errors should never happen in a simulation, as long as we are modeling a world without time travel. In parallel simulation, however, one has to contemplate the possibility of causality errors, since the simulators may execute on separate processors or cores, or even on separate machines. To solve the problem, one can either constrain the time advancement of the simulators to prevent causality errors from happening, or roll back the simulators to an earlier state to correct causality errors after they are detected.
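
A causality error is easy to state in code. The following is a minimal, hypothetical sketch (the names `ToyReceiver` and `CausalityError` are illustrative and not part of simulus) of a receiver that refuses any message stamped earlier than its own clock, which is exactly the situation a synchronization scheme must rule out:

```python
class CausalityError(Exception):
    """Raised when a message would be delivered in the receiver's past."""


class ToyReceiver:
    """Hypothetical receiver used only to illustrate the problem (not simulus)."""

    def __init__(self, now=0.0):
        self.now = now    # current simulation time of this receiver
        self.inbox = []   # (timestamp, message) pairs accepted so far

    def deliver(self, timestamp, msg):
        # A message stamped earlier than the clock would land in the past.
        if timestamp < self.now:
            raise CausalityError("message stamped %g, but clock is already at %g"
                                 % (timestamp, self.now))
        self.inbox.append((timestamp, msg))


rcv = ToyReceiver(now=7.0)
rcv.deliver(8.0, "on time")        # fine: 8.0 lies in the receiver's future
try:
    rcv.deliver(6.5, "too late")   # 6.5 < 7.0: a causality error
except CausalityError as err:
    print("causality error:", err)
```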

The former is called "conservative simulation". The latter is called "optimistic simulation". Simulus implements the synchronized group using a conservative simulation approach. In this case, a synchronized group is created with a "lookahead", which dictates how far a simulator can advance its simulation clock ahead of the rest of the simulators in the group.
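
The conservative rule can be sketched in a few lines. Assuming every message between simulators is delayed by at least the lookahead, a peer sitting at time t can only produce messages stamped t + lookahead or later, so a simulator may safely advance up to the smallest peer clock plus the lookahead. The helper `safe_bound()` below is hypothetical, a textbook illustration rather than simulus's internal code:

```python
def safe_bound(clocks, me, lookahead):
    """Hypothetical helper: the latest time simulator `me` may advance to
    without risking a causality error under conservative synchronization.

    clocks    -- dict mapping simulator name -> current simulation time
    lookahead -- assumed minimum delay of every message between simulators
    """
    peers = [t for name, t in clocks.items() if name != me]
    if not peers:
        return float("inf")  # no peer can ever send us anything
    # Any future message from a peer is stamped >= that peer's clock + lookahead,
    # so nothing can arrive earlier than min(peer clocks) + lookahead.
    return min(peers) + lookahead


clocks = {"sim1": 5.0, "sim2": 6.5}
print(safe_bound(clocks, "sim1", 2))  # 8.5
print(safe_bound(clocks, "sim2", 2))  # 7.0
```

The simulator that is already ahead (`sim2`) gets the tighter bound: it can move only 0.5 further until `sim1` catches up.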

Let's look at an example. First, let's create a couple of simulators:

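```python
import random, simulus
random.seed(13579)  # fixed seed for repeatable output

def p(sim, mean_intv):
    # sleep for an exponentially distributed time with mean 'mean_intv',
    # report the current simulation time, and repeat forever
    while True:
        sim.sleep(random.expovariate(1/mean_intv))
        print("'%s' gets to %g" % (sim.name, sim.now))

sim1 = simulus.simulator('sim1')
sim1.process(p, sim1, 1)      # mean sleep interval 1

sim2 = simulus.simulator('sim2')
sim2.process(p, sim2, 0.5)    # mean sleep interval 0.5

sim1.run(5)                   # advance sim1 to time 5
sim2.run(2)                   # advance sim2 to time 2
```

```
'sim1' gets to 0.315117
'sim1' gets to 0.498901
'sim1' gets to 1.78212
'sim1' gets to 2.46515
'sim1' gets to 4.82285
'sim2' gets to 0.414588
'sim2' gets to 0.69689
'sim2' gets to 1.67856
'sim2' gets to 1.72484
```
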
The two simulators, `sim1` and `sim2`, each run one process starting from the same function `p()`, which takes two arguments: the simulator instance and the mean sleep interval. The process repeatedly sleeps for a random time, exponentially distributed with the given mean, and then prints a message.

We run both simulators separately, one to time 5 and the other to time 2. As expected, both simulators advance their simulation time independently. Their simulation clocks are not synchronized or coordinated.

```python
sim1.now, sim2.now
```

```
(5, 2)
```

We now create a synchronized group that includes both simulators above. This is done by calling `sync()` with either the names of the simulators or the simulator instances (the example below mixes the two). `sync()` first brings all simulators in the group into synchrony by independently advancing each simulator's clock to the maximum simulation time among all simulators in the group (that's 5, in the above case).
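
To illustrate what bringing a group into synchrony means, here is a hypothetical sketch (`align()` and the dictionary layout are invented for this example, and the event timestamps are made up): each simulator is advanced independently to the largest clock in the group, handling any pending events it passes along the way. A real simulator would execute those events; the sketch merely collects them.

```python
import heapq

def align(sims):
    """Hypothetical sketch of bringing a group into synchrony.

    sims maps a simulator name to {'now': clock, 'events': heap of pending
    event timestamps}. Each simulator is advanced independently to the
    group's maximum clock, processing (here: just collecting) every pending
    event it passes on the way. Returns (target, processed-per-simulator).
    """
    target = max(s["now"] for s in sims.values())
    processed = {}
    for name, s in sims.items():
        done = []
        while s["events"] and s["events"][0] <= target:
            done.append(heapq.heappop(s["events"]))
        s["now"] = target
        processed[name] = done
    return target, processed

sims = {
    "sim1": {"now": 5.0, "events": [5.5]},
    "sim2": {"now": 2.0, "events": [2.5, 3.0, 4.5, 6.0]},  # already a valid heap
}
target, processed = align(sims)
print(target, processed)  # 5.0 {'sim1': [], 'sim2': [2.5, 3.0, 4.5]}
```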

```python
# This cell can be run only once: each simulator can belong to only
# one synchronized group, and the membership is immutable. To run
# this cell again, restart the notebook's kernel.
g = simulus.sync([sim1, 'sim2'], lookahead=2)
```

```
'sim2' gets to 3.4852
'sim2' gets to 3.672
'sim2' gets to 4.11777
```

We see that `sim2` advances its simulation clock from time 2 to time 5. Both simulators' current simulation time should now be 5. To find the current simulation time, we can either inspect each simulator's `now` attribute or use the `now` attribute provided by the synchronized group.

```python
sim1.now, sim2.now, g.now
```

```
(5, 5, 5)
```

From now on, all the simulators are bound to the synchronized group; that is, their simulation time will be advanced synchronously. When we called `sync()`, we specified the lookahead to be 2, which means that the simulation clock of one simulator will never get ahead of another's by more than 2 in simulation time. (Later we will see that we don't need to specify the lookahead at all; simulus can calculate it automatically.)

We can now run all simulators together using the `run()` method of the synchronized group. The method is similar to the simulator's `run()` method: the user can specify either 'offset' or 'until' (but not both). Each simulator processes its events in timestamp order up to the given time, and yet no simulator gets too far ahead of the others, as dictated by the lookahead. Simulus handles the synchronization so that messages sent between the simulators cannot cause causality errors.
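
The 'offset'/'until' rule can be sketched as a small, hypothetical helper that converts the two forms into an absolute end time. The function name and the behavior when neither argument is given are assumptions for illustration, not simulus's documented behavior:

```python
def resolve_until(now, offset=None, until=None):
    """Hypothetical helper: turn run()'s arguments into an absolute end time.
    'offset' is relative to the current time, 'until' is absolute, and
    giving both is an error."""
    if offset is not None and until is not None:
        raise ValueError("specify either offset or until, not both")
    if offset is not None:
        return now + offset
    if until is not None:
        return until
    # Assumption for this sketch: with neither argument, run without a bound.
    return float("inf")

print(resolve_until(5, 5))         # 10, i.e. what g.run(5) means when g.now is 5
print(resolve_until(5, until=10))  # 10
```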

```python
# 5 is the offset (as a positional argument)
g.run(5)
```

```
'sim1' gets to 5.20161
'sim1' gets to 5.2917
'sim1' gets to 5.77787
'sim1' gets to 5.79523
'sim2' gets to 5.84559
'sim2' gets to 5.90297
'sim2' gets to 6.00801
'sim2' gets to 6.09771
'sim2' gets to 6.7076
'sim2' gets to 6.7555
'sim1' gets to 7.84578
'sim1' gets to 8.35717
'sim1' gets to 8.55197
'sim1' gets to 8.76079
'sim2' gets to 7.53865
'sim2' gets to 8.68613
'sim2' gets to 9.35494
```

We see that both simulators get to run from time 5 to time 10. Since the lookahead is 2, however, neither simulator gets too far ahead of the other, and eventually both simulators reach time 10.
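
One way to picture the interleaving in the output of `g.run(5)` is a windowed scheme: the group advances in windows as wide as the lookahead, each simulator processes its own events inside the current window, and all of them meet at the window boundary before moving on. The sketch below is a hypothetical model (`sync_windows()` and `windowed_trace()` are invented names), not simulus's actual scheduler. Fed with the printed timestamps, it reproduces the same order of messages, which suggests, but does not prove, that simulus behaves similarly in this example.

```python
def sync_windows(start, end, lookahead):
    """Split [start, end] into consecutive windows of width at most `lookahead`."""
    windows, t = [], start
    while t < end:
        nxt = min(t + lookahead, end)
        windows.append((t, nxt))
        t = nxt
    return windows

def windowed_trace(events, start, end, lookahead):
    """Within each window, let every simulator (in a fixed order) process its
    own events with timestamps in (lo, hi]; nobody starts the next window
    until all have finished the current one. Returns the processing order."""
    trace = []
    for lo, hi in sync_windows(start, end, lookahead):
        for name, times in events.items():
            trace.extend((name, t) for t in times if lo < t <= hi)
    return trace

# Event timestamps taken from the output of g.run(5) above.
events = {
    "sim1": [5.20161, 5.2917, 5.77787, 5.79523, 7.84578, 8.35717, 8.55197, 8.76079],
    "sim2": [5.84559, 5.90297, 6.00801, 6.09771, 6.7076, 6.7555,
             7.53865, 8.68613, 9.35494],
}
print(sync_windows(5, 10, 2))  # [(5, 7), (7, 9), (9, 10)]
for name, t in windowed_trace(events, 5, 10, 2):
    print("'%s' gets to %g" % (name, t))
```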

```python
sim1.now, sim2.now, g.now
```

```
(10, 10, 10)
```

### Communication among Simulators

There is no obvious advantage to having multiple simulators in a synchronized group unless we want them to participate in a large model, each simulating one of its components. For example, in a simulation of a computer system, we could have one or more simulators to model the CPUs or cores, one to model the memory, one for each I/O device, and so on. To be part of a large model, the simulators need to communicate by sending timestamped messages to each other.

Simulus facilitates communication between simulators through mailboxes. Recall that a mailbox in simulus is a facility for message passing between processes or functions. A mailbox consists of one or more compartments, or partitions. A sender can send a message to one of the partitions of a mailbox with a specified delay. The message will be delivered to the designated mailbox at the expected time and stored in the designated partition until a receiver retrieves it and removes it from the mailbox.
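
The mailbox semantics just described (delayed delivery, partitions, retrieval that removes messages) can be modeled with a priority queue. The class below is a hypothetical, single-process toy written for this explanation; it does not reflect simulus's implementation or method signatures, and it takes the current time explicitly instead of reading a simulator clock:

```python
import heapq
import itertools

class ToyMailbox:
    """Hypothetical mailbox for illustration (not simulus): messages are sent
    with a delay to one of several partitions and become visible in that
    partition only once the clock reaches their delivery time."""

    def __init__(self, nparts=1):
        self.parts = [[] for _ in range(nparts)]  # delivered, not yet retrieved
        self.pending = []                         # heap of (time, seq, part, msg)
        self.seq = itertools.count()              # tie-breaker for equal times

    def send(self, now, msg, delay, part=0):
        heapq.heappush(self.pending, (now + delay, next(self.seq), part, msg))

    def advance(self, now):
        # Deliver every message whose delivery time has been reached.
        while self.pending and self.pending[0][0] <= now:
            _, _, part, msg = heapq.heappop(self.pending)
            self.parts[part].append(msg)

    def retrieve(self, part=0):
        # A receiver takes (and removes) everything stored in the partition.
        msgs, self.parts[part] = self.parts[part], []
        return msgs

mb = ToyMailbox(nparts=2)
mb.send(0, "a", delay=1.0)
mb.send(0, "b", delay=3.0, part=1)
mb.advance(2.0)
print(mb.retrieve(0), mb.retrieve(1))  # ['a'] []
mb.advance(3.0)
print(mb.retrieve(1))                  # ['b']
```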

Simulators in a synchronized group can send messages to named mailboxes that belong to other simulators. A message in simulus takes a broad meaning: it can be any Python object, as long as it is pickle-able. This is because, as we will see, the simulators can potentially run in different processes (with different Python interpreters) on separate machines. Simulus relies on the `pickle` module to serialize and deserialize the Python objects.
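
Since every message must survive serialization, one cheap sanity check is a pickle round trip before sending. The helper `is_picklable()` is hypothetical and uses only the standard `pickle` module:

```python
import pickle

def is_picklable(obj):
    """Hypothetical helper: return True if obj survives a pickle round trip,
    i.e. it could be serialized and sent to another Python interpreter."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:  # PicklingError, TypeError, AttributeError, ...
        return False

print(is_picklable({"kind": "ping", "seq": 1}))  # True
print(is_picklable(lambda x: x))                # False: lambdas cannot be pickled
```

Objects such as lambdas, open files, locks, and generators typically fail this check, while plain data (numbers, strings, tuples, lists, dicts, and instances of module-level classes with picklable attributes) usually passes.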

In the following example, we create two simulators each with a mailbox. A ping-pong message will be sent back and forth between the two simulators.

```python
# %load "../examples/advanced/pingpong.py"
import simulus

def p(sim, mbox, mbname):
    # wait for a message, then reply to the peer's mailbox 'mbname'
    while True:
        msg = mbox.recv(isall=False)
        print("%g: '%s' rcvd msg '%s'" % (sim.now, sim.name, msg))
        sim.sync().send(sim, mbname, 'pong' if msg=='ping' else 'ping')

sim1 = simulus.simulator('sim1')
mb1 = sim1.mailbox('mb1', 1)       # named mailbox with min_delay 1
sim1.process(p, sim1, mb1, 'mb2')

sim2 = simulus.simulator('sim2')
mb2 = sim2.mailbox('mb2', 1)       # named mailbox with min_delay 1
sim2.process(p, sim2, mb2, 'mb1')

mb1.send('ping') # send initial message to start ping-ponging

g = simulus.sync([sim1, sim2])     # no lookahead given: derived from the mailboxes
g.run(10)
```

```
1: 'sim1' rcvd msg 'ping'
2: 'sim2' rcvd msg 'pong'
3: 'sim1' rcvd msg 'ping'
4: 'sim2' rcvd msg 'pong'
5: 'sim1' rcvd msg 'ping'
6: 'sim2' rcvd msg 'pong'
7: 'sim1' rcvd msg 'ping'
8: 'sim2' rcvd msg 'pong'
9: 'sim1' rcvd msg 'ping'
10: 'sim2' rcvd msg 'pong'
```

In the above example, each simulator creates a mailbox with a distinct name. Each mailbox also defines a 'min_delay', which, as the name suggests, is the minimum delay for which messages are expected to be delivered to the mailbox. Simulus uses the min_delay of all the named mailboxes of the simulators to calculate the lookahead for the synchronized group. 

The min_delay of every named mailbox of the simulators must be strictly positive. The overhead of parallel simulation depends heavily on the size of the lookahead, and in general a larger min_delay is preferable: a larger min_delay for the mailboxes means a larger lookahead, and a larger lookahead entails less synchronization overhead and therefore better performance.
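
Assuming the lookahead is simply the smallest min_delay among the named mailboxes (a natural choice, since no message can arrive sooner than that, but an assumption here), the calculation and the positivity requirement look like this. `group_lookahead()` is a hypothetical helper. Under a window-based scheme, a zero min_delay would give windows of zero width, and the group could never advance, which is one way to see why the delays must be strictly positive:

```python
def group_lookahead(min_delays):
    """Hypothetical sketch: derive a group lookahead from the min_delay of
    every named mailbox, assuming the lookahead is simply their minimum.

    min_delays -- dict mapping mailbox name -> min_delay
    """
    if any(d <= 0 for d in min_delays.values()):
        raise ValueError("every min_delay must be strictly positive")
    if not min_delays:
        # Assumption: with no named mailboxes nothing can be sent between
        # simulators, so nothing constrains how far their clocks drift apart.
        return float("inf")
    return min(min_delays.values())

print(group_lookahead({"mb1": 1, "mb2": 1}))     # 1
print(group_lookahead({"cpu": 0.5, "disk": 4}))  # 0.5
```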

The simulators communicate by sending messages with the `send()` method of the synchronized group to which they belong. (A simulator can find its synchronized group using its `sync()` method.) The `send()` method of the synchronized group takes at least three arguments: 'sim' is the simulator from which the message is sent; 'name' is the name of the mailbox to which the message is to be delivered (the mailbox must belong to one of the simulators in the group); and 'msg' is the message itself, which can be any pickle-able Python object except None. Optionally, one can specify the 'delay' of the message. If it is omitted, the delay is set to the min_delay of the mailbox; if it is given, it must not be smaller than the min_delay of the mailbox. One can also specify 'part', the partition number of the mailbox to which the message will be delivered; the default is zero.
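
The argument rules for the group's `send()` can be summarized as a validation function. `check_send()` is hypothetical and only restates the rules above (plus a partition-range check, which is our own assumption); it is not simulus's signature:

```python
def check_send(msg, min_delay, delay=None, part=0, nparts=1):
    """Hypothetical validation mirroring the rules described for send().
    Returns the effective (delay, part) of the message."""
    if msg is None:
        raise ValueError("a message cannot be None")
    if delay is None:
        delay = min_delay          # omitted delay defaults to the mailbox's min_delay
    elif delay < min_delay:
        raise ValueError("delay %g is smaller than min_delay %g" % (delay, min_delay))
    if not 0 <= part < nparts:     # assumption: part must name an existing partition
        raise IndexError("no partition %d" % part)
    return delay, part

print(check_send("ping", min_delay=1))             # (1, 0)
print(check_send("ping", min_delay=1, delay=2.5))  # (2.5, 0)
```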

In the example, the process at each simulator directly uses the `recv()` method of the mailbox to receive messages. This method must be called within a process context (in the starting function of a process, or in a function called directly or indirectly from the starting function). If the mailbox partition is empty when the call is made, the process is put on hold until a message arrives. When this method returns, at least one message has been retrieved from the mailbox. If 'isall' is True (the default), it returns a list containing all the messages currently stored in the mailbox partition; if 'isall' is False, it returns only the first arrived message (not wrapped in a list).
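
The difference between `isall=True` and `isall=False` is easiest to see on a plain queue. The function `take()` below is a hypothetical sketch of the retrieval rule only; it assumes the partition is a `collections.deque` in arrival order and leaves out the blocking behavior, which requires a process context:

```python
from collections import deque

def take(partition, isall=True):
    """Hypothetical sketch of the retrieval rule described for recv(),
    applied to a non-empty partition (a deque of messages in arrival order).
    The blocking part, waiting while the partition is empty, is omitted."""
    if not partition:
        raise LookupError("empty partition: the real recv() would block here")
    if isall:
        msgs = list(partition)   # every stored message, as a list
        partition.clear()
        return msgs
    return partition.popleft()   # only the first-arrived message, unwrapped

box = deque(["m1", "m2", "m3"])
print(take(box, isall=False))  # m1
print(take(box))               # ['m2', 'm3']
```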

### Parallel Simulation on Shared-Memory Multiprocessors

A synchronized group of simulators can run sequentially (as shown in the previous section) or in parallel. In the latter case, the simulators can run on shared-memory multiprocessors, on distributed-memory machines in a cluster, or on a combination of both.

Most computers today are shared-memory multiprocessors. A computer with multiple CPUs or cores can support parallel execution of multiple processes. For example, my laptop has one CPU with two cores. With hyper-threading enabled, the machine in theory has four "processors" for parallel execution. 

To take advantage of parallel simulation, we need to set `enable_smp` to True when we create the synchronized group. In that case, simulus automatically forks separate processes to run the simulators in the group. By default, simulus uses as many processes as there are "processors" (4 on my laptop). If there are more simulators in the synchronized group than processors, the simulators are divided among the processes, and each process may run multiple simulators. If there are fewer simulators than processors, each simulator runs in a separate process, and each process runs on a different processor.
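
The text does not say exactly how simulus divides simulators among processes, so the sketch below assumes a simple round-robin placement. `assign()` is hypothetical; it uses `os.cpu_count()`, which reports logical processors (hyper-threads included), as the default number of processes, and never starts more processes than there are simulators:

```python
import os

def assign(sim_names, nprocs=None):
    """Hypothetical round-robin placement of simulators onto worker processes.
    nprocs defaults to os.cpu_count(), i.e. the number of logical processors;
    no more processes are used than there are simulators."""
    if nprocs is None:
        nprocs = os.cpu_count() or 1
    nprocs = max(1, min(nprocs, len(sim_names)))
    groups = [[] for _ in range(nprocs)]
    for i, name in enumerate(sim_names):
        groups[i % nprocs].append(name)
    return groups

print(assign(["s1", "s2", "s3", "s4", "s5"], nprocs=4))
# [['s1', 's5'], ['s2'], ['s3'], ['s4']]
print(assign(["sim1", "sim2"], nprocs=4))
# [['sim1'], ['sim2']]
```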

In the following example, we run the ping-pong example on shared-memory multiprocessors. The only difference here from the previous example is that we set `enable_smp` when calling `sync()`.