## MPI Messaging and Deadlock

### Point-to-Point Messaging

* This is the fundamental operation in MPI
* Send a message from one process to another
  * Blocking I/O
  * Blocking provides built-in synchronization
  * Blocking can lead to deadlock
* Send and receive: see the example in `nodeadlock.c`
  * This program passes a "token" around a ring of processes
  * The ring is defined by the program
  * **NOTE**: this is a flawed program.  The correct implementation is `passitforward.c`
  
<img src="https://upload.wikimedia.org/wikipedia/commons/3/36/MPI_Ring_topology.png" width=312 />

  
* What's in a message?
  * The first three arguments specify the content
  * then the location (`source` to receive from or `dest` to send to)
  * then message metadata (the `tag`)
  * and a "communicator", which is a virtual network
  
```c
int MPI_Send (
    void* sendbuf,
    int count,
    MPI_Datatype datatype,
    int dest,
    int tag,
    MPI_Comm comm )

int MPI_Recv (
    void* recvbuf,
    int count,
    MPI_Datatype datatype,
    int source,
    . . . )
```

* All MPI data are arrays
  * Where is it? `void *`
  * How many? `count`
  * What type? `MPI_Datatype`


### Deadlock

In concurrent computing, a deadlock is a state in which each member of a group is waiting for another member, including itself, to take action, such as sending a message or more commonly releasing a lock.

* Conditions for deadlock
  * Mutual exclusion
  * Hold and wait
  * No preemption
  * Circular wait
  
The simplest deadlock occurs with two processes and two resources.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/28/Process_deadlock.svg/440px-Process_deadlock.svg.png" width=256 />

<img src="https://upload.wikimedia.org/wikipedia/commons/d/df/Two_processes%2C_two_resources.gif" width=256 />


Note that in the GIF, the deadlock is resolved by restarting the top process, which breaks the "hold and wait" condition.
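The circular-wait condition can be made concrete with a wait-for graph. The sketch below is a minimal illustration, not part of the course code: it assumes each process is blocked on at most one other process, and the names `has_circular_wait`, `waits_for`, and `NOT_WAITING` are hypothetical.

```c
#include <assert.h>

#define NOT_WAITING (-1)

/* waits_for[i] is the index of the process that process i is blocked on,
   or NOT_WAITING if process i can make progress.
   Returns 1 if some process can reach itself by following waits_for links
   (a circular wait), and 0 otherwise. */
int has_circular_wait(const int waits_for[], int n)
{
    assert(n > 0);
    for (int start = 0; start < n; start++) {
        int p = start;
        /* A cycle through `start` must close within n hops. */
        for (int hops = 0; hops < n && p != NOT_WAITING; hops++) {
            p = waits_for[p];
            if (p == start)
                return 1;
        }
    }
    return 0;
}
```

For the two-process case above, `waits_for = {1, 0}` contains a cycle, while `waits_for = {1, NOT_WAITING}` does not, because the second process can still run and eventually release what the first one needs.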

In MPI, programmers design their own messaging protocols.  We note that:

* MPI messaging is synchronous and blocking
* i.e. the caller waits on the message to be delivered prior to returning

So why didn't `nodeadlock.c` produce a deadlock?

Examples:
* `deadlock.c` shows the flaw in `nodeadlock.c`
  * it forces the send to be synchronous
* `passitforward.c` is a correct implementation with _paired send and receive_

### On `MPI_Send`

* MPI has the option to buffer messages when there is memory available
  * So `MPI_Send` can return before the message has been received
* MPI is not required to buffer messages
  * and does not do so when memory is limited
* This leads to horrible errors
  * Because semantics/correctness change based on job configuration.
  * You develop the program on a small cluster
    * It has plenty of memory for small instances
    * Messages get buffered, which hides an unsafe (deadlock-prone) messaging protocol
  * You launch the code on a big cluster with a big instance
    * More memory consumption means that MPI can't buffer messages
    * Your code deadlocks

_Best practice_: test messaging protocols with synchronous sends `MPI_Ssend`, deploy code with `MPI_Send`.
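The dependence on buffering can be reproduced in a toy model that needs no MPI installation. The sketch below is only an illustration under stated assumptions: every rank runs the unsafe "send to next, then receive from prev" program, and `buffer_slots` is a hypothetical knob standing in for however much space the library has for undelivered messages. It does not describe how any MPI implementation actually manages its buffers.

```c
#include <assert.h>

#define MAX_PROCS 64

/* Every rank sends to its next neighbor and then receives from its previous
   neighbor. buffer_slots is the number of undelivered messages the simulated
   library can hold per receiver; 0 models a fully synchronous send.
   Returns 1 if every rank finishes and 0 if the ring deadlocks. */
int send_first_ring_completes(int n, int buffer_slots)
{
    assert(n >= 2 && n <= MAX_PROCS && buffer_slots >= 0);
    int state[MAX_PROCS];  /* 0 = in send, 1 = in receive, 2 = done */
    int inbox[MAX_PROCS];  /* buffered messages waiting for each rank */
    for (int r = 0; r < n; r++) {
        state[r] = 0;
        inbox[r] = 0;
    }
    int progress = 1;
    while (progress) {
        progress = 0;
        for (int r = 0; r < n; r++) {
            int next = (r + 1) % n;
            if (state[r] == 0 && inbox[next] < buffer_slots) {
                /* Buffer space available: the send copies the message and returns. */
                inbox[next]++;
                state[r] = 1;
                progress = 1;
            } else if (state[r] == 1 && inbox[r] > 0) {
                /* A buffered message is waiting: the receive completes. */
                inbox[r]--;
                state[r] = 2;
                progress = 1;
            }
            /* Without buffer space a send could complete only if `next` were
               already in its receive. That never happens in this program,
               because every rank sends first. */
        }
    }
    for (int r = 0; r < n; r++)
        if (state[r] != 2)
            return 0;
    return 1;
}
```

In this model `send_first_ring_completes(4, 1)` returns 1 and `send_first_ring_completes(4, 0)` returns 0: the same program is "correct" or deadlocked depending only on whether buffer space happens to exist.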

## Messaging Topologies

There are many strategies for breaking deadlock.

### One Receiver

For linear orderings and rings:

* Simplest and sufficient: n-1 processes send then receive, and 1 process receives then sends
* Correct for any connected topology and any number of nodes
```c
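// Example for a ring: rank 0 receives first, every other rank sends first.
// Run with at least 2 processes; the token value and tag 0 are placeholders.
#include <mpi.h>
int main(int argc, char **argv) {
    int ID, num_procs, token = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &ID);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    int next = (ID + 1) % num_procs;
    int prev = (ID == 0) ? num_procs - 1 : ID - 1;
    if (ID == 0) {  // the one receiver
        MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
    } else {        // everyone else
        MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}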
```
This discipline is used by the `MPI_Sendrecv()` call.  It's always correct, but creates a serial dependency among messages.  The following chart shows the outcome of send calls (yellow) and receive calls (blue).

<img src="http://pages.tacc.utexas.edu/~eijkhout/pcse/html/graphics/linear-serial.jpg" width="400" />

### Paired Sends and Receives

Order/pair sends and receives to avoid deadlocks (see `passitforward.c`)
 * A more parallel alternative
 * Break two-way communication into two phases: half of the nodes send in phase one and receive in phase two, and vice versa

<img src="./images/pairedsr.png" width=512 />
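To compare the disciplines in this section without running MPI, the sketch below simulates one exchange around a ring in which every send is synchronous (as with `MPI_Ssend`). It is a toy model under stated assumptions: a "round" is a hypothetical unit in which every send whose receiver is already waiting completes at once, and the names `ring_rounds`, `all_send_first`, `one_receiver`, and `paired_by_parity` are invented for illustration. Pairing by rank parity is one possible pairing and may differ from what `passitforward.c` does.

```c
#include <assert.h>

#define MAX_PROCS 64

enum op { SEND, RECV };

/* A discipline picks each rank's first operation; the second is the other one. */
typedef enum op (*discipline_fn)(int rank);

enum op all_send_first(int rank)   { (void)rank; return SEND; }
enum op one_receiver(int rank)     { return rank == 0 ? RECV : SEND; }
enum op paired_by_parity(int rank) { return rank % 2 == 0 ? SEND : RECV; }

/* Each rank sends to (rank + 1) % n and receives from its previous neighbor.
   Returns the number of rounds until every rank has finished both
   operations, or -1 if no rank can make progress (deadlock). */
int ring_rounds(int n, discipline_fn first_op)
{
    assert(n >= 2 && n <= MAX_PROCS);
    enum op prog[MAX_PROCS][2];
    int pc[MAX_PROCS];       /* index of the current operation; 2 = finished */
    int matched[MAX_PROCS];
    for (int r = 0; r < n; r++) {
        prog[r][0] = first_op(r);
        prog[r][1] = (prog[r][0] == SEND) ? RECV : SEND;
        pc[r] = 0;
    }
    int finished = 0, rounds = 0;
    while (finished < n) {
        int progress = 0;
        for (int r = 0; r < n; r++)
            matched[r] = 0;
        /* A synchronous send completes only if the receiver is in its receive. */
        for (int r = 0; r < n; r++) {
            int next = (r + 1) % n;
            if (pc[r] < 2 && pc[next] < 2 &&
                prog[r][pc[r]] == SEND && prog[next][pc[next]] == RECV) {
                matched[r] = matched[next] = 1;
                progress = 1;
            }
        }
        if (!progress)
            return -1;
        for (int r = 0; r < n; r++)
            if (matched[r] && ++pc[r] == 2)
                finished++;
        rounds++;
    }
    return rounds;
}
```

With 8 ranks, this model reports a deadlock for `all_send_first`, 8 rounds for `one_receiver`, and 2 rounds for `paired_by_parity`, which matches the serial-versus-parallel contrast described above. With an odd number of ranks, parity pairing still completes in this model but needs a third round, because the first and last ranks both send first.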

#### More complex communication topologies?

How would we design a messaging discipline for a continuous/cyclic 2-d grid
of processes (as in our Game of Life assignment)?
  * That want to send to up/down/left/right neighbors?
  * That want to send to diagonal neighbors also?
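A useful first step for either question is working out who the neighbors are once the grid wraps around. The sketch below is a starting point only and does not answer the discipline question: it assumes ranks are laid out in row-major order, and `grid_rank` and `grid_neighbors` are hypothetical helper names.

```c
#include <assert.h>

/* Rank of the process at (row, col) on a rows x cols grid stored in
   row-major order (an assumption). Coordinates wrap, so the grid is cyclic. */
int grid_rank(int row, int col, int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    int r = ((row % rows) + rows) % rows;  /* also handles row == -1 */
    int c = ((col % cols) + cols) % cols;
    return r * cols + c;
}

/* Fills out[0..3] with the up, down, left, and right neighbors of `rank`,
   and out[4..7] with the up-left, up-right, down-left, and down-right ones. */
void grid_neighbors(int rank, int rows, int cols, int out[8])
{
    static const int drow[8] = { -1, 1,  0, 0, -1, -1,  1, 1 };
    static const int dcol[8] = {  0, 0, -1, 1, -1,  1, -1, 1 };
    assert(rows > 0 && cols > 0 && rank >= 0 && rank < rows * cols);
    int row = rank / cols;
    int col = rank % cols;
    for (int i = 0; i < 8; i++)
        out[i] = grid_rank(row + drow[i], col + dcol[i], rows, cols);
}
```

On a 3 x 4 grid, rank 0 has up neighbor 8 and left neighbor 3 because of the wraparound. MPI also provides Cartesian communicators (`MPI_Cart_create`, `MPI_Cart_shift`) that can compute the up/down/left/right neighbors for you.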