In [None]:
%%sh

# reset all programs
rm -rf debug*

# MPI Communications

Communications with **Message Passing Interface** (MPI)

## Standard Point-to-point communications

The basic communication method provided by the MPI library is communication between two processes.

* Source process `A` sends a **message** to destination process `B`; `B` then receives the message from `A`;
* Communication takes place within a **communicator**;
* The processes are identified by their **rank** in the communicator.

<img src="./Images/pointToPoint.png" width=40% style="margin-left:auto; margin-right:auto">

## Message

A message is composed of a **buffer** and an **envelope**.

* Data is exchanged in the buffer, an array of `count` elements of some particular **MPI data type**;
* The envelope identifies the message. A message is exchanged **only if** the envelope specified by the sender matches the one specified by the receiver.

<img src="./Images/message.png" width=60% style="margin-left:auto; margin-right:auto">
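
The matching rule can be illustrated with a small toy model in plain C++. This is not MPI code: the `Envelope` struct and the `matches` function are hypothetical names introduced only for illustration. The sketch assumes the envelope consists of source, destination, tag and communicator, and it ignores the wildcards introduced later.

```cpp
// Toy model of MPI envelope matching (illustration only, no MPI calls).
struct Envelope
{
    int source;       // rank of the sending process
    int destination;  // rank of the receiving process
    int tag;          // user-chosen message label
    int communicator; // stand-in for an MPI communicator handle
};

// A posted receive is described by the same four fields: the message is
// delivered only if every field agrees.
bool matches(const Envelope &message, const Envelope &receive)
{
    return message.source == receive.source &&
           message.destination == receive.destination &&
           message.tag == receive.tag &&
           message.communicator == receive.communicator;
}
```

For example, a message sent by rank 0 to rank 1 with tag 10 is not matched by a receive that asks for tag 11, even if every other field agrees.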

## DataTypes

MPI Data types can be:
* Basic types
* Derived types (`MPI_Type_xxx` functions)

<div class="alert alert-block alert-info"> <b>NOTE</b>: a derived type can be built up from basic types. </div>

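MPI defines **handles** to allow programmers to refer to data types and structures.

<div class="alert alert-block alert-warning"> In Open MPI, C/C++ handles are macros that expand to pointers to internal structs (<code>#define MPI_INT</code> …); other implementations, such as MPICH, use integer constants instead. </div>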

### Blocking Send and Recv

Calls used to send and receive a simple message.

* `MPI_Send`: see https://www-lb.open-mpi.org/doc/v4.1/man3/MPI_Send.3.php
* `MPI_Recv`: see https://www-lb.open-mpi.org/doc/v4.1/man3/MPI_Recv.3.php

In [None]:
%%writefile main_send_recv.cpp

#include <iostream>
#include <mpi.h>

int main(int argc, char **argv) 
{
    MPI_Init(&argc, &argv);
    
    int process_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &process_rank);
    
    int tag = 10;
    float a[2] = { 1.2, 3.4 };
    float b[2] = { 0.0, 0.0 };
    MPI_Status status;
    
    if (process_rank == 0)
    {
        // a decays to a pointer to its first element
        MPI_Send(a, 2, MPI_FLOAT, 1, tag, MPI_COMM_WORLD);
    }
    else if (process_rank == 1)
    {
        MPI_Recv(b, 2, MPI_FLOAT, 0, tag, MPI_COMM_WORLD, &status);

        // status is only filled in by the receive, so print it on rank 1 only
        std::cout<< "Process "<< process_rank<< " ";
        std::cout<< "Status SOURCE: "<< status.MPI_SOURCE<< " ";
        std::cout<< "TAG: "<< status.MPI_TAG<< " ";
        std::cout<< "ERROR: "<< status.MPI_ERROR<< std::endl;
    }
    
    std::cout<< "Process "<< process_rank<< " ";
    std::cout<< "a "<< a[0]<< ", "<< a[1]<< " ";
    std::cout<< "b "<< b[0]<< ", "<< b[1]<< std::endl; 
    
    MPI_Finalize();
    return 0;
}

In [None]:
%%sh

# compile program
mkdir -p ./debug_send_recv
cd debug_send_recv
cmake -DSOURCES="main_send_recv.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_send_recv
mpirun -np 2 2_MPI_Communications

<div class="alert alert-block alert-warning"> <b>NOTE</b>: pay attention to the <code>if</code> statements: every process runs exactly the same program, and the rank selects which branch is executed. </div>

### C/C++ MPI Data Types

<img src="./Images/types.png" width=90% style="margin-left:auto; margin-right:auto">

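Data that does not fit a basic type can be described with different MPI routines, for example:

* `MPI_Type_create_struct` (builds a derived DataType): https://www.open-mpi.org/doc/v4.1/man3/MPI_Type_create_struct.3.php
* `MPI_Pack` (does not create a DataType: it copies the data into a contiguous buffer): https://www.open-mpi.org/doc/v4.1/man3/MPI_Pack.3.php
* ...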

<div class="alert alert-block alert-success"> Before using a new DataType, we shall <b>commit</b> it. </div>

* `MPI_Type_commit`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Type_commit.3.php

<div class="alert alert-block alert-danger"> <b>REMARK</b>: Once a new data type is created we shall <b>destroy</b> it before closing the application:</div>

* `MPI_Type_free`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Type_free.3.php

<div class="alert alert-block alert-info"> <b>NOTE</b>: We will see more about data types when we cover <b>advanced</b> data types. </div>

In [None]:
%%writefile main_struct.cpp

#include <cstddef> // offsetof
#include <iostream>
#include <mpi.h>

struct Car 
{
    int Model;
    int Color;
};

int main(int argc, char **argv) 
{
    MPI_Init(&argc, &argv);

    const int tag = 13;
    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // create an MPI type for struct Car
    const int nitems = 2; // number of struct fields
    int blocklengths[nitems] = { 1, 1 }; // number of elements in each field
    MPI_Datatype types[nitems] = { MPI_INT, MPI_INT }; // MPI type of each field
    MPI_Datatype mpi_car_type; // the new MPI dataType
    MPI_Aint offsets[nitems]; // byte offset of each field within the struct

    offsets[0] = offsetof(Car, Model); // offset of Model from the start of Car
    offsets[1] = offsetof(Car, Color); // offset of Color from the start of Car

    // create the new dataType
    MPI_Type_create_struct(nitems, blocklengths, offsets, types, &mpi_car_type);
    MPI_Type_commit(&mpi_car_type); // commit operation

    Car car = { 0, 0 };
    
    if (rank == 0) 
    {
        car.Model = 4;
        car.Color = 100;

        MPI_Send(&car, 1, mpi_car_type, 1, tag, MPI_COMM_WORLD);

        std::cout<< "Process "<< rank<< ": sent structure car"<< std::endl;
    }
    else if (rank == 1) 
    {
        MPI_Status status;
        MPI_Recv(&car, 1, mpi_car_type, 0, tag, MPI_COMM_WORLD, &status);
        std::cout<< "Process "<< rank<< ": recv structure car"<< std::endl;
    }
    
    std::cout<< "Process "<< rank<< ": car.Model "<< car.Model<< " car.Color "<< car.Color<< std::endl;

    MPI_Type_free(&mpi_car_type); // destroy operation

    MPI_Finalize();
    return 0;
}

In [None]:
%%sh

# compile program
mkdir -p ./debug_struct
cd debug_struct
cmake -DSOURCES="main_struct.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_struct
mpirun -np 3 2_MPI_Communications

## More about communications...

For a communication to succeed:

1. Always specify a **valid** source/destination rank in the communicator;
2. The communicator must be **the same**;
3. Tags must **match**;
4. The range of valid tag values is 0,..., **UB**, where the value of **UB** is implementation dependent (it can be queried through the `MPI_TAG_UB` attribute and is at least 32767);
5. Buffers must be **large enough**!

<div class="alert alert-block alert-danger"><b>Check</b> all the arguments very carefully: the call may succeed and still deliver wrong data. </div>

### Message Order

Messages are **nonovertaking**

> ...If a sender sends two messages in succession to the same destination, and both match the same receive, then this operation cannot receive the
second message if the first one is still pending...

### Wildcards

MPI also provides some special constant values, called **wildcards**.

* `MPI_ANY_SOURCE`: receive from any source;
* `MPI_ANY_TAG`: receive with any tag.

<div class="alert alert-block alert-danger"> The reverse <b>DOES NOT</b> apply: there is no <code>MPI_ANY_DEST</code>, so a send call must always name its destination.</div>

* `MPI_STATUS_IGNORE`: ignore the status of the communication
* `MPI_PROC_NULL`: dummy destination/source process
* ...

### Deadlock

A **deadlock** occurs when $2$ (or more) processes are **blocked**, each waiting
for the other to make progress. It should not be confused with a *race condition*, where the result depends on the order in which events happen.

<img src="./Images/deadlock.png" width=60% style="margin-left:auto; margin-right:auto">

<div class="alert alert-block alert-warning"><b>NOTE</b>: a deadlocked job keeps running until its allocated time (and budget) expires, although <b>no work</b> is being done.</div>

In [None]:
%%writefile main_deadlock.cpp

#include <iostream>
#include <mpi.h>

int main(int argc, char **argv) 
{
    MPI_Init(&argc, &argv);
    
    int process_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &process_rank);
    
    int tag = 10;
    float a[2] = { 0.0, 0.0 };
    float b[2] = { 0.0, 0.0 };
    MPI_Status status;
    
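    if (process_rank == 0)
    {
        a[0] = 1.0; a[1] = 2.0;
        // blocks: rank 1 has not sent yet, it is itself waiting in MPI_Recv
        MPI_Recv(&b, 2, MPI_FLOAT, 1, tag + 1, MPI_COMM_WORLD, &status);
        MPI_Send(&a, 2, MPI_FLOAT, 1, tag, MPI_COMM_WORLD); // never reached
    }
    else if (process_rank == 1)
    {
        b[0] = 3.0; b[1] = 4.0;
        // blocks: rank 0 has not sent yet, it is itself waiting in MPI_Recv
        MPI_Recv(&a, 2, MPI_FLOAT, 0, tag, MPI_COMM_WORLD, &status);
        MPI_Send(&b, 2, MPI_FLOAT, 0, tag + 1, MPI_COMM_WORLD); // never reached
    }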
        
    std::cout<< "Process "<< process_rank<< " ";
    std::cout<< "Status SOURCE: "<< status.MPI_SOURCE<< " ";
    std::cout<< "TAG: "<< status.MPI_TAG<< " ";
    std::cout<< "ERROR: "<< status.MPI_ERROR<< std::endl;
        
    std::cout<< "Process "<< process_rank<< " ";
    std::cout<< "a "<< a[0]<< ", "<< a[1]<< " ";
    std::cout<< "b "<< b[0]<< ", "<< b[1]<< std::endl; 
    
    MPI_Finalize();
    return 0;
}

In [None]:
%%sh

# compile program
mkdir -p ./debug_deadlock
cd debug_deadlock
cmake -DSOURCES="main_deadlock.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_deadlock
mpirun -np 2 2_MPI_Communications

### Sendrecv

`MPI_Sendrecv` both sends and receives a message in a single call.

* `MPI_Sendrecv`: see https://www.open-mpi.org/doc/v4.1/man3/MPI_Sendrecv.3.php

<div class="alert alert-block alert-success"> <b>REMARK</b>: <code>MPI_Sendrecv</code> can also be used to
eliminate deadlock</div>

<img src="./Images/sendrecv.png" width=40% style="margin-left:auto; margin-right:auto">

In [None]:
%%writefile main_sendrecv.cpp

#include <iostream>
#include <mpi.h>

int main(int argc, char **argv) 
{
    MPI_Init(&argc,&argv);
    
    int rank, procs, tag = 10;
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
    int right = (rank + 1) % procs;
    int left = rank == 0 ? procs - 1 : rank - 1;
    
    int buffer1 = rank, buffer2 = -1;
    MPI_Sendrecv(&buffer1, 1, MPI_INT, right, tag, 
                 &buffer2, 1, MPI_INT, left, tag, 
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    
    std::cout<< "Process "<< buffer1<< " ";
    std::cout<< "recvs from "<< buffer2<< std::endl; 
    
    MPI_Finalize();
    return 0;
}

In [None]:
%%sh

# compile program
mkdir -p ./debug_sendrecv
cd debug_sendrecv
cmake -DSOURCES="main_sendrecv.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_sendrecv
mpirun -np 4 2_MPI_Communications

## Communication Modes

In a perfect world, every send operation would be perfectly synchronized with its matching receive. 
This is **rarely** the case.

<div class="alert alert-block alert-success"><b>NOTE</b>: The MPI implementation is able to deal with storing data when the two tasks are out of sync. </div>

> ...The message might be copied directly into the matching receive buffer, or it
might be copied into a temporary system buffer...

### Communication Mode Types

MPI offers a choice of **several communication modes** that allow one to control the choice of the communication protocol. In addition to the standard mode there are:

* **Buffered Mode**
* **Synchronous Mode**
* **Ready Mode**

<div class="alert alert-block alert-warning"> The send call <code>MPI_Send</code> uses the <b>standard</b> communication mode: it is up to MPI to decide whether outgoing messages will be buffered (typically based on a message-size threshold). </div>

<img src="./Images/send.png" width=30% style="margin-left:auto; margin-right:auto">


### Buffered Send

<img src="./Images/bsend.png" width=30% style="margin-left:auto; margin-right:auto">

* `MPI_Bsend`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Bsend.3.php
* `MPI_Ibsend`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Ibsend.3.php

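<div class="alert alert-block alert-info"> <b>NOTE</b> Removes synchronization between processes: the send can complete before a matching receive is posted.</div>

<div class="alert alert-block alert-warning"> <b>WARNING</b> the programmer is responsible for allocating and managing the data buffer (see <code>MPI_Buffer_attach</code> and <code>MPI_Buffer_detach</code>).</div>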

### Synchronous Send

<img src="./Images/ssend.png" width=40% style="margin-left:auto; margin-right:auto">

* `MPI_Ssend`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Ssend.3.php
* `MPI_Issend`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Issend.3.php

<div class="alert alert-block alert-info"> <b>NOTE</b> this is the safest point-to-point mode: completion of the send guarantees that the matching receive has started.</div>

### Ready Send

<img src="./Images/rsend.png" width=30% style="margin-left:auto; margin-right:auto">

* `MPI_Rsend`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Rsend.3.php
* `MPI_Irsend`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Irsend.3.php


<div class="alert alert-block alert-info"> <b>NOTE</b> attempts to reduce system and synchronization overhead.</div>

<div class="alert alert-block alert-danger"><b>WARNING</b> If the matching receive has not already been posted when the send starts, the call is erroneous and its outcome is undefined.</div>

### To understand better

> ...A possible communication protocol for the various communication modes is outlined below...

* **ready send**: The message is sent as soon as possible.
* **synchronous send**: The sender sends a request-to-send message. The receiver stores this request. When a matching receive is started, the receiver sends back a permission-to-send message, and the sender now sends the message.
* **standard send**: First protocol may be used for short messages, and second protocol for long messages.
* **buffered send**: The sender copies the message into a buffer and then sends it with a nonblocking send (using the same protocol as for standard send).

## Non-blocking communications

MPI point-to-point routines can be used in either **blocking** or **non-blocking** mode.

<div class="alert alert-block alert-info"> Non-blocking communications are identified by the prefix <code>I</code> (for <i>immediate</i>), as in <code>MPI_Isend</code>.</div>

<img src="./Images/non_block.png" width=90% style="margin-left:auto; margin-right:auto">

<div class="alert alert-block alert-warning"><b>NOTE</b>: Overlapping communication and computation is not always possible but is worth trying: the gain depends on how much
calculation can be done that does not require the transferred data.</div>

### Nonblocking communication is important for: 

* **performance**
* expressing complex and possibly dynamic communication patterns without needing to ensure that all sends and receives are issued in a particular order (prevents **deadlock**)
* **overlap** of communication with other communication operations
* **overlap** of communication with computation

### Moreover

<div class="alert alert-block alert-warning"><b>NOTE</b>: Nonblocking send start calls can use the same four modes as blocking sends: standard, buffered, synchronous, and ready.</div>

> ...a nonblocking **ready** send can be started only if the matching receive is already started...If the send mode is **synchronous**, then the send-complete call is nonlocal; the send can complete only if a matching receive has been started and has been matched with the send. Note that a synchronous mode send may complete, if matched by a nonblocking receive, before the receive complete call occurs...If the send mode is **buffered**, then the send-complete call is local; the send must complete irrespective of the status of a matching receive. If there is no pending receive operation, then the message must be buffered...If the send mode is **standard**, then the send-complete call can be either local or nonlocal...

### Request Objects and Communication Completion

<div class="alert alert-block alert-success">Nonblocking communication operations use <b>opaque</b> request objects to identify communication operations</div>

* The completion of a send operation indicates that the sender is now **free to update
the send buffer**

> ...It does not indicate that the message has been received, rather, it may have been buffered
by the communication subsystem. However, if a synchronous mode send was used, the
completion of the send operation indicates that a matching receive was initiated, and that
the message will eventually be received by this matching receive...

* The completion of a receive operation indicates that the **receive buffer contains the
received message**, the receiver is now free to access it, and that the status object is set.

> ...It does not indicate that the matching send operation has completed (but indicates, of course,
that the send was initiated)...


### Isend and Irecv

Calls used to send and receive a message without blocking.

* `MPI_Isend`: see https://www.open-mpi.org/doc/v4.1/man3/MPI_Isend.3.php
* `MPI_Irecv`: see https://www.open-mpi.org/doc/v4.1/man3/MPI_Irecv.3.php

<div class="alert alert-block alert-danger"> <b>REMARK</b>: every non-blocking operation must be completed, so we should <b>wait</b> for each request:</div>

* `MPI_Wait`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Wait.3.php
* `MPI_Waitall`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Waitall.3.php
* ...

In [None]:
%%writefile main_isend_irecv.cpp

#include <iostream>
#include <mpi.h>

int main(int argc, char **argv) 
{
    MPI_Init(&argc, &argv);
    
    int process_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &process_rank);
    
    int tag = 10;
    float a[2] = { 0.0, 0.0 };
    float b[2] = { 0.0, 0.0 };
    MPI_Status status[2];
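    // null requests make MPI_Wait return immediately on ranks other than 0 and 1
    MPI_Request request[2] = { MPI_REQUEST_NULL, MPI_REQUEST_NULL };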
    
    if (process_rank == 0)
    {
        a[0] = 1.0; a[1] = 2.0;
        MPI_Irecv(&b, 2, MPI_FLOAT, 1, tag + 1, MPI_COMM_WORLD, &request[1]);
        MPI_Isend(&a, 2, MPI_FLOAT, 1, tag, MPI_COMM_WORLD, &request[0]);
    }
    else if (process_rank == 1)
    {
        b[0] = 3.0; b[1] = 4.0;
        MPI_Irecv(&a, 2, MPI_FLOAT, 0, tag, MPI_COMM_WORLD, &request[1]);
        MPI_Isend(&b, 2, MPI_FLOAT, 0, tag + 1, MPI_COMM_WORLD, &request[0]);
    }

    std::cout<< "Process "<< process_rank<< " waiting..."<< std::endl;
    
    MPI_Wait(&request[0], &status[0]);
    MPI_Wait(&request[1], &status[1]);
    //MPI_Waitall(2, request, status); // alternative
    
    std::cout<< "Process "<< process_rank<< " ";
    std::cout<< "Status[0] SOURCE: "<< status[0].MPI_SOURCE<< " ";
    std::cout<< "TAG: "<< status[0].MPI_TAG<< " ";
    std::cout<< "ERROR: "<< status[0].MPI_ERROR<< std::endl;
    
    std::cout<< "Process "<< process_rank<< " ";
    std::cout<< "Status[1] SOURCE: "<< status[1].MPI_SOURCE<< " ";
    std::cout<< "TAG: "<< status[1].MPI_TAG<< " ";
    std::cout<< "ERROR: "<< status[1].MPI_ERROR<< std::endl;
    
    std::cout<< "Process "<< process_rank<< " ";
    std::cout<< "a "<< a[0]<< ", "<< a[1]<< " ";
    std::cout<< "b "<< b[0]<< ", "<< b[1]<< std::endl; 
    
    MPI_Finalize();
    return 0;
}

In [None]:
%%sh

# compile program
mkdir -p ./debug_isend_irecv
cd debug_isend_irecv
cmake -DSOURCES="main_isend_irecv.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_isend_irecv
mpirun -np 2 2_MPI_Communications

## Persistent communications

If a point-to-point message-passing routine is called repeatedly _with the same arguments_, **persistent communication** can be used to avoid redundancy in setting up the message each time it is sent.

* `MPI_Send_init`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Send_init.3.php
* `MPI_Ssend_init`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Ssend_init.3.php
* ...
* `MPI_Recv_init`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Recv_init.3.php

<div class="alert alert-block alert-info"> <b>NOTE</b>: persistent communications are <b>non-blocking</b>: each started request must be completed, for example with <code>MPI_Wait</code>.</div>

* `MPI_Start`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Start.3.php

<div class="alert alert-block alert-danger"> <b>REMARK</b>: We shall <b>destroy</b> each <code>MPI_Request</code> created, using <code>MPI_Request_free</code>.</div>

In [None]:
%%writefile main_persistent.cpp

#include <iostream>
#include <mpi.h>

int main(int argc, char **argv) 
{
    MPI_Init(&argc,&argv);
    
    int rank, procs, tag = 10;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
    MPI_Request rqst;
    MPI_Status status;

    int buffer[2];
    
    if (rank == 0)
    {
        buffer[0] = 2; 
        buffer[1] = 3; 
    }
    
    //Step 1) Initialize persistent send/recv requests
    if (rank == 1)
        MPI_Recv_init(buffer, 2, MPI_INT, 0, tag, MPI_COMM_WORLD, &rqst);
    if (rank == 0)
        MPI_Send_init(buffer, 2, MPI_INT, 1, tag, MPI_COMM_WORLD, &rqst);

    for (unsigned int i = 0; i < 2; i++)
    {
        //Step 2) Use start in place of send/recv
        if (rank == 1)
            MPI_Start(&rqst);

        if (rank == 0)
            buffer[0] += i;
        //... do work

        if (rank == 0)
            MPI_Start(&rqst);

        //Wait for send/recv to complete
        MPI_Wait(&rqst, &status);
        
        std::cout<< "Iteration "<< i<< " ";
        std::cout<< "Rank "<< rank<< " ";
        std::cout<< "Buffer "<< buffer[0]<< ", "<< buffer[1]<< std::endl;
    }

    //Step 3) Clean up the requests
    MPI_Request_free(&rqst);
    
    MPI_Finalize();
    return 0;
}

In [None]:
%%sh

# compile program
mkdir -p ./debug_persistent
cd debug_persistent
cmake -DSOURCES="main_persistent.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_persistent
mpirun -np 2 2_MPI_Communications