
In [None]:
%%sh

# remove all previous build directories
rm -rf debug*

# MPI Advanced

Advanced concepts of the **Message Passing Interface** (MPI)

## Advanced DataTypes

## Previously on DataTypes...

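Non-contiguous or heterogeneous data can be communicated either by packing it explicitly or by building a derived DataType, for example:

* `MPI_Pack`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Pack.3.php - packs data into a contiguous buffer, which is then sent with type `MPI_PACKED`
* `MPI_Type_create_struct`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Type_create_struct.3.php - creates a new DataType made of blocks of (possibly) different types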

<div class="alert alert-block alert-success"> Before using a new DataType, we shall <b>commit</b> it. </div>

* `MPI_Type_commit`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Type_commit.3.php

<div class="alert alert-block alert-danger"> <b>REMARK</b>: Once a new data type is created, we shall <b>destroy</b> it before closing the application:</div>

* `MPI_Type_free`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Type_free.3.php

## Type_Contiguous

The simplest constructor: it makes `count` **copies** of an existing datatype, placed contiguously in memory.

* `MPI_Type_contiguous`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Type_contiguous.3.php

<img src="./Images/contigous.png" width=80% style="margin-left:auto; margin-right:auto">

## Type_Vector

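Like contiguous, but the copies are grouped into `count` blocks of `blocklength` elements, with a regular gap (**stride**, in elements of the old type) between the starts of consecutive blocks.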

* `MPI_Type_vector`, `MPI_Type_create_hvector` (formerly `MPI_Type_hvector`, removed in MPI-3.0): https://www.open-mpi.org/doc/v4.1/man3/MPI_Type_vector.3.php

<img src="./Images/vector.png" width=60% style="margin-left:auto; margin-right:auto">

<div class="alert alert-block alert-info"> <b>NOTE</b>: For <code>MPI_Type_create_hvector</code> the stride is specified in bytes. </div>

## Type_Indexed

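Blocks of (possibly different) lengths are taken at **non-regular displacements**: lengths and displacements are given as arrays, in units of the input data type, and together form the map of the new data type.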

* `MPI_Type_indexed`, `MPI_Type_create_hindexed` (formerly `MPI_Type_hindexed`, removed in MPI-3.0): https://www.open-mpi.org/doc/v4.1/man3/MPI_Type_indexed.3.php

<img src="./Images/indexed.png" width=60% style="margin-left:auto; margin-right:auto">

<div class="alert alert-block alert-info"> <b>NOTE</b>: For <code>MPI_Type_create_hindexed</code> the displacements are specified in bytes. </div>

In [None]:
%%writefile main_vector.cpp

#include <iostream>
#include <mpi.h>

int main(int argc, char **argv) 
{
    MPI_Init(&argc, &argv);
    
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
    double a[4][4];
    for (unsigned int i = 0; i < 4; i++)
        for (unsigned int j = 0; j < 4; j++)
            a[i][j] = (rank == 0) ? i * 4.0 + j : 0.0;
    
    MPI_Datatype rowType, colType;
    
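    // one row: 4 contiguous doubles
    MPI_Type_contiguous(4, MPI_DOUBLE, &rowType);
    // 4 blocks of 2 doubles, 4 doubles apart: starting from a[0][2] it covers columns 2 and 3
    MPI_Type_vector(4, 2, 4, MPI_DOUBLE, &colType);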
    
    MPI_Type_commit(&rowType);
    MPI_Type_commit(&colType);
    
    if (rank == 0)
    {
        MPI_Send(&a[2][0], 1, rowType, 1, 10, MPI_COMM_WORLD);
        MPI_Send(&a[0][2], 1, colType, 1, 11, MPI_COMM_WORLD);
    }
    else if (rank == 1)
    {
        MPI_Recv(&a[2][0], 1, rowType, 0, 10, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(&a[0][2], 1, colType, 0, 11, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    
    std::cout<< "Process "<< rank<< ": "<< std::endl;
    for (unsigned int i = 0; i < 4; i++)
    {
        for (unsigned int j = 0; j < 4; j++)
            std::cout<< (j == 0 ? "" : ", ")<< a[i][j];
        std::cout<< std::endl;
    }
    
    MPI_Type_free(&rowType);
    MPI_Type_free(&colType);
    
    MPI_Finalize();
    return 0;
}

In [None]:
%%sh

# compile program
mkdir -p ./debug_vector
cd debug_vector
cmake -DSOURCES="main_vector.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_vector
mpirun -np 2 4_MPI_Advanced

## Topologies

## Virtual topology vs Physical Topology

An MPI **topology**, or **virtual topology**, is a logical arrangement of the processes of a communicator in a topological pattern.

Typical examples are 2D or 3D **grids**; more general patterns can be described by a **graph**.

<div class="alert alert-block alert-success"><b>NOTE</b>: The system can exploit the virtual topology when assigning processes to physical processors (<b>Physical Topology</b>), e.g. when rank reordering is allowed. This can improve the
communication performance on a given machine.</div>

* `MPI_Cart_create`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Cart_create.3.php
* `MPI_Graph_create`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Graph_create.3.php
* ...

<div class="alert alert-block alert-info">Topology information is always associated with a <b>new communicator</b>. </div>

### REMARK

<div class="alert alert-block alert-danger">Once a new communicator is created we shall <b>destroy</b> it before closing the application:</div>

* `MPI_Comm_free`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Comm_free.3.php


### Example of MPI Cartesian Topology

`MPI_Cart_create`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Cart_create.3.php

<img src="./Images/graph.png" width=40% style="margin-left:auto; margin-right:auto">

<div class="alert alert-block alert-info"> <b>NOTE</b> Periodicity can be selected for each direction. </div>

### Cart Topology Utilities

* `MPI_Dims_create`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Dims_create.3.php - computes a balanced distribution of processes along each coordinate direction, respecting the dimensions already fixed by the caller (non-zero entries);
* `MPI_Cart_coords`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Cart_coords.3.php - given a rank, returns the process's coordinates;
* `MPI_Cart_rank`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Cart_rank.3.php - given the process's coordinates, returns its rank;
* `MPI_Cart_shift`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Cart_shift.3.php - given a direction and a displacement, returns the source and destination ranks of a shift, e.g. to be used in `MPI_Sendrecv`.

In [None]:
%%writefile main_cart.cpp

#include <iostream>
#include <mpi.h>

int main(int argc, char **argv) 
{
    MPI_Init(&argc, &argv);
    
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    
    int dim[] = { 0, 0 };
    MPI_Dims_create(nprocs, 2, dim); // create 2D cart dimensions
    
    int period[] = {1, 0}; // periodic in the first dimension
    MPI_Comm grid_comm;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dim, period, 1, &grid_comm);
    
    int new_rank;
    int coordinates[2] = { 0, 0 };
    MPI_Comm_rank(grid_comm, &new_rank); // new rank in the communicator
    MPI_Cart_coords(grid_comm, new_rank, 2, coordinates); // coordinate in the grid
    
    std::cout<< "Process "<< rank<< ": "<< " ";
    std::cout<< "new_rank "<< new_rank<< " ";
    std::cout<< "coordinates "<< coordinates[0]<< ", "<< coordinates[1]<< std::endl;
    
    MPI_Barrier(grid_comm);
    
    int source, dest;
    for (int direction = 0; direction < 2; direction++) 
    {
        for (int displacement = -1; displacement < 2; displacement += 2) 
        {
            int bufferSend = new_rank, bufferRecv = -1;
            
            MPI_Cart_shift(grid_comm, direction, displacement, &source, &dest);
                        
            // send to dest (coordinate + displacement), receive from source (coordinate - displacement)
            MPI_Sendrecv(&bufferSend, 1, MPI_INT, dest, 10, 
                         &bufferRecv, 1, MPI_INT, source, 10,
                         grid_comm, MPI_STATUS_IGNORE);
            
            std::cout<< "Process "<< rank<< " - ";
            std::cout<< "direction "<< direction<< " ";
            std::cout<< "displacement "<< displacement<< ": ";
            std::cout<< "send to "<< dest<< " ";
            std::cout<< "data "<< bufferSend<< "; ";
            std::cout<< "recv from "<< source<< " ";
            std::cout<< "data "<< bufferRecv<< std::endl;
        }
    }
           
    MPI_Comm_free(&grid_comm);
    
    MPI_Finalize();
    return 0;
}

In [None]:
%%sh

# compile program
mkdir -p ./debug_cart
cd debug_cart
cmake -DSOURCES="main_cart.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_cart
mpirun --oversubscribe -np 4 4_MPI_Advanced

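<div class="alert alert-block alert-warning"><b>NOTE</b>: in non-periodic directions, <code>MPI_Cart_shift</code> returns <code>MPI_PROC_NULL</code> (a negative constant in common implementations) for neighbours outside the grid. Send and receive operations with <code>MPI_PROC_NULL</code> complete immediately without transferring data, so the receive buffer keeps its previous value (here -1). </div>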

## Circular Graph

<img src="./Images/ring.png" width=30% style="float: right;">

**Circular shift** is another typical MPI communication pattern.

<div class="alert alert-block alert-success">Such a pattern is nothing more than a <b>1D Cartesian grid</b> topology: periodic for a ring, non-periodic for an open chain.</div>

<img src="./Images/cart1d.png" width=50% style="margin-left:auto; margin-right:auto">


## Neighbour Communications

`MPI-3.0` introduced **neighbourhood collective communications** for topologies:

* `MPI_NEIGHBOR_ALL(GATHER[V] | TOALL[V])`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Neighbor_allgather.3.php - each process exchanges data only with its neighbours in the topology; since the communication pattern is fixed by the topology, the implementation can optimize it

<div class="alert alert-block alert-info"><b>NOTE:</b> There are also the <b>non-blocking</b> versions <code>MPI_INEIGHBOR_ALL(GATHER[V] | TOALL[V])</code>. </div>



## One side communications

Standard point-to-point communications are **two-sided**: both sender and receiver take part, so there can be a delay if the sender has to wait because the receiver is not ready.

`MPI-2` standard added Remote Memory Access (**RMA**), also called **one-sided communication**.

`MPI-3` further extended RMA to improve functionality and performance.

<img src="./Images/oneside.png" width=50% style="margin-left:auto; margin-right:auto">

<div class="alert alert-block alert-warning"><b>GOAL</b>: to <b>decouple</b> data transfer from process synchronisation. </div>

### Pros and cons of RMA

**Advantages**:
* <div class="alert alert-block alert-success">Only <b>one process</b> actively takes part in each transfer, so performance can be higher: there is no implicit synchronization and all data movement routines are non-blocking;</div>
* <div class="alert alert-block alert-success">Some programs are <b>easier to write</b> with RMA, since the model resembles <b>shared memory</b> rather than message passing.</div>

**Disadvantages**:
* <div class="alert alert-block alert-danger">The programmer shall explicitly manage the <b>synchronization</b> of the processes accessing the public window memory;</div>
* <div class="alert alert-block alert-danger">In practice, RMA is not necessarily <b>faster</b> than well-written point-to-point communication.</div>

### Main RMA Routines

Routines to create, synchronize and free an RMA memory window:
* `MPI_Win_create`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Win_create.3.php
* `MPI_Win_fence`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Win_fence.3.php
* `MPI_Win_free`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Win_free.3.php

<div class="alert alert-block alert-danger"><b>REMARK</b>: each window created shall be <b>destroyed</b>.</div>

Routines to perform RMA communications:
* `MPI_Put`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Put.3.php
* `MPI_Get`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Get.3.php

In [None]:
%%writefile main_rma.cpp

#include <iostream>
#include <mpi.h>

int main(int argc, char **argv) 
{
    MPI_Init(&argc, &argv);
    
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
    MPI_Win window;
    int shared_buffer[2] = { rank, rank };
    // create a simple RMA window of 2 integers
    MPI_Win_create(shared_buffer, sizeof(int) * 2, sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &window);
    
    MPI_Win_fence(0, window); // synchronization
    
    int local_buffer[2] = { rank + 2, rank + 4 };
    int get_local_buffer[2] = { 0, 0 };
    
    if (rank == 0)
        MPI_Get(get_local_buffer, 2, MPI_INT, 1, 0, 2, MPI_INT, window);
    else if (rank == 1)
        MPI_Get(get_local_buffer, 2, MPI_INT, 0, 0, 2, MPI_INT, window);
    
    MPI_Win_fence(0, window); // synchronization
    
    std::cout<< "Process "<< rank<< ": ";
    std::cout<< "shared_buffer "<< shared_buffer[0]<< " "<< shared_buffer[1]<< " ";
    std::cout<< "local_buffer "<< local_buffer[0]<< " "<< local_buffer[1]<< " ";
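    std::cout<< "get_local_buffer "<< get_local_buffer[0]<< " "<< get_local_buffer[1]<< std::endl;
    
    MPI_Win_fence(0, window); // synchronization: local reads of shared_buffer above complete before the remote puts below
    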
    if (rank == 0)
        MPI_Put(local_buffer, 2, MPI_INT, 1, 0, 2, MPI_INT, window);
    else if (rank == 1)
        MPI_Put(local_buffer, 2, MPI_INT, 0, 0, 2, MPI_INT, window);
    
    MPI_Win_fence(0, window); // synchronization
    
    if (rank == 0)
        MPI_Get(get_local_buffer, 2, MPI_INT, 0, 0, 2, MPI_INT, window);
    else if (rank == 1)
        MPI_Get(get_local_buffer, 2, MPI_INT, 1, 0, 2, MPI_INT, window);
    
    MPI_Win_fence(0, window); // synchronization
    
    std::cout<< "Process "<< rank<< ": ";
    std::cout<< "shared_buffer "<< shared_buffer[0]<< " "<< shared_buffer[1]<< " ";
    std::cout<< "local_buffer "<< local_buffer[0]<< " "<< local_buffer[1]<< " ";
    std::cout<< "get_local_buffer "<< get_local_buffer[0]<< " "<< get_local_buffer[1]<< std::endl;
    
    MPI_Win_free(&window);
    
    MPI_Finalize();
    return 0;
}

In [None]:
%%sh

# compile program
mkdir -p ./debug_rma
cd debug_rma
cmake -DSOURCES="main_rma.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_rma
mpirun -np 2 4_MPI_Advanced

## Dynamic processes in MPI

Normally the number of MPI processes is **fixed** at the start of execution (e.g. by `mpirun -np`).

`MPI-2` provides calls to create processes "**on the fly**".

* `MPI_COMM_SPAWN`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Comm_spawn.3.php - starts a new set of processes, all running the same command line (SPMD)
* `MPI_COMM_SPAWN_MULTIPLE`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Comm_spawn_multiple.3.php - starts a new set of processes, potentially running different commands (MPMD)