In [None]:
%%sh

# remove the build directories of previous runs
rm -rf debug*

# MPI Collectives

Collective Communications with **Message Passing Interface** (MPI)

Communications involving groups of processes are called **collectives**.

<div class="alert alert-block alert-warning">In <code>MPI-1</code> and <code>MPI-2</code> collective calls are blocking. <code>MPI-3</code> introduced <b>non-blocking</b> collectives (e.g. <code>MPI_Ibcast</code>, <code>MPI_Ireduce</code>).</div>

Collectives have the following characteristics:
* **Every** process in the communicator must call the collective function;
* **No tags** are required.

<div class="alert alert-block alert-success"> Designed to replace loops of point-to-point calls with a single call that the MPI library can implement <b>more efficiently</b>. </div>

## Barriers

Stops each process of a group until all processes have reached the barrier, i.e. until they are **synchronized**.

* `MPI_Barrier`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Barrier.3.php

<img src="./Images/barrier.png" width=50% style="margin-left:auto; margin-right:auto">

<div class="alert alert-block alert-danger"> <b>Severe</b> performance impact if used too often. </div>
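
The MPI standard does not say how `MPI_Barrier` must be implemented. One textbook approach is the *dissemination barrier*: in round k every process signals the process at distance 2^k and waits for a signal from the process 2^k behind it. The sketch below is a hypothetical helper (not Open MPI's code) that only computes this communication schedule, to show that every barrier call costs about ⌈log2 p⌉ rounds of messages on every process.

```cpp
#include <cassert>
#include <vector>

// One round of a dissemination barrier, seen from a single rank (hypothetical helper type).
struct BarrierRound
{
    int send_to;    // rank this process signals in this round
    int recv_from;  // rank this process waits for in this round
};

// Communication schedule of a dissemination barrier for `rank` among `p` processes:
// in the round with distance `dist` (1, 2, 4, ...) a process signals (rank + dist) mod p
// and waits for (rank - dist) mod p. After ceil(log2(p)) rounds every process has
// (transitively) heard from all the others, so it can safely leave the barrier.
std::vector<BarrierRound> dissemination_schedule(int rank, int p)
{
    assert(p > 0 && rank >= 0 && rank < p);
    std::vector<BarrierRound> rounds;
    for (int dist = 1; dist < p; dist *= 2)
        rounds.push_back({ (rank + dist) % p, (rank - dist + p) % p });
    return rounds;
}
```

For example, `dissemination_schedule(0, 4)` gives two rounds: signal rank 1 while waiting for rank 3, then signal and wait for rank 2.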

## Broadcast

Broadcasts a message from one process (the **root**) to **all other processes** of the group.

* `MPI_Bcast`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Bcast.3.php

<img src="./Images/bcast.png" width=40% style="margin-left:auto; margin-right:auto">
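
Why is `MPI_Bcast` faster than a loop of `MPI_Send` calls from the root? The standard leaves the algorithm to the library; a classic choice for short messages is a *binomial tree*, in which the set of processes holding the data doubles every round. The hypothetical helper below (a sketch, not Open MPI's source) lists who sends to whom in each round: the p-1 sends are still all performed, but they complete in ⌈log2 p⌉ rounds instead of p-1 sequential sends from the root.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sends (source, destination) performed in each round of a binomial-tree broadcast
// among `p` processes. Ranks are relabelled relative to `root`, so the root behaves as
// rank 0: in the round with distance `mask` (1, 2, 4, ...), every relative rank below
// `mask` already holds the data and forwards it to relative rank + mask.
std::vector<std::vector<std::pair<int, int>>> binomial_bcast_rounds(int p, int root)
{
    assert(p > 0 && root >= 0 && root < p);
    std::vector<std::vector<std::pair<int, int>>> rounds;
    for (int mask = 1; mask < p; mask *= 2)
    {
        std::vector<std::pair<int, int>> sends;
        for (int rel = 0; rel < mask && rel + mask < p; rel++)
        {
            // convert relative ranks back to real ranks
            int src = (rel + root) % p;
            int dst = (rel + mask + root) % p;
            sends.push_back({ src, dst });
        }
        rounds.push_back(sends);
    }
    return rounds;
}
```

With 4 processes and root 0 there are two rounds: `0 -> 1`, then `0 -> 2` and `1 -> 3` in parallel.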

In [None]:
%%writefile main_bcast.cpp

#include <iostream>
#include <mpi.h>

int main(int argc, char **argv) 
{
    MPI_Init(&argc, &argv);
    
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
    double a[2] = { 0.0, 0.0 };
    if ( rank == 0 ) 
    {
        a[0] = 2.1; 
        a[1] = 4.3;
    }
    
    // send the information to all the other processes
    MPI_Bcast(a, 2, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    
    std::cout<< "Process "<< rank<< " ";
    std::cout<< "a "<< a[0]<< ", "<< a[1]<< std::endl; 
    
    MPI_Finalize();
    return 0;
}

In [None]:
%%sh

# compile program
mkdir -p ./debug_bcast
cd debug_bcast
cmake -DSOURCES="main_bcast.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_bcast
mpirun -np 4 3_Collectives

## Gather and Scatter

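In a **Gather**, one process (the root) collects data elements from all the processes and stores them in rank order; a **Scatter** does the reverse, splitting the root's array into equal chunks and sending one chunk to each process, in rank order.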

* `MPI_Gather`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Gather.3.php
* `MPI_Scatter`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Scatter.3.php

<img src="./Images/gather.png" width=48% style="float: left;">  
<img src="./Images/scatter.png" width=48% style="float: right;">  
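
It can help to see the data movement without any MPI at all. The two functions below are a hypothetical serial model (not part of MPI) of what `MPI_Scatter` and `MPI_Gather` do with a root buffer of `p * count` elements, using the same eight values as the example program below: rank `r` receives the contiguous block starting at `r * count`, and gathering concatenates the blocks back in rank order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial model of MPI_Scatter: the root's buffer holds p * count elements and
// rank r receives the contiguous block [r * count, (r + 1) * count).
std::vector<std::vector<int>> scatter_model(const std::vector<int> &root_buf, int p, int count)
{
    assert(p > 0 && count >= 0 && root_buf.size() == static_cast<std::size_t>(p) * count);
    std::vector<std::vector<int>> chunks(p);
    for (int r = 0; r < p; r++)
        chunks[r].assign(root_buf.begin() + r * count, root_buf.begin() + (r + 1) * count);
    return chunks;
}

// Serial model of MPI_Gather: the chunks are concatenated on the root in rank order.
std::vector<int> gather_model(const std::vector<std::vector<int>> &chunks)
{
    std::vector<int> root_buf;
    for (const auto &chunk : chunks)
        root_buf.insert(root_buf.end(), chunk.begin(), chunk.end());
    return root_buf;
}
```

Scattering `{1, ..., 8}` over 4 ranks with `count = 2` gives rank 2 the chunk `{5, 6}`; doubling every chunk and gathering yields `{2, 4, ..., 16}` on the root.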

In [None]:
%%writefile main_gather_scatter.cpp

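#include <iostream>
#include <mpi.h>

int main(int argc, char **argv) 
{
    MPI_Init(&argc, &argv);
    
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    
    // this example splits 8 integers into chunks of 2, so it needs exactly 4 processes
    if (size != 4)
    {
        if (rank == 0)
            std::cerr<< "Run with exactly 4 processes"<< std::endl;
        MPI_Finalize();
        return 1;
    }
    
    // full array: only meaningful on the root (rank 0)
    int a[8] = { 0 };
    if (rank == 0) 
    {
        for (int i = 0; i < 8; i++)
            a[i] = i + 1; 
    }
    
    // local chunk of 2 elements on every process
    // (MPI forbids overlapping send and receive buffers, so a separate array is used)
    int chunk[2] = { 0, 0 };
    
    // split the root's array into chunks of 2 and send one chunk to each process
    MPI_Scatter(a, 2, MPI_INT, chunk, 2, MPI_INT, 0, MPI_COMM_WORLD);
    
    std::cout<< "After Scatter, Process "<< rank<< " ";
    std::cout<< "chunk: "<< chunk[0]<< ", "<< chunk[1]<< std::endl;
        
    chunk[0] *= 2;
    chunk[1] *= 2;
    
    // collect the chunks back into the root's array, in rank order
    MPI_Gather(chunk, 2, MPI_INT, a, 2, MPI_INT, 0, MPI_COMM_WORLD);
    
    // synchronize all processes before printing the result
    // (note: this does not guarantee that output lines appear in rank order)
    MPI_Barrier(MPI_COMM_WORLD);
    
    // only the root holds the gathered array
    if (rank == 0)
    {
        std::cout<< "After Gather, Process "<< rank<< " ";
        std::cout<< "a: ";
        for (int i = 0; i < 8; i++)
            std::cout<< (i == 0 ? "" : ", ")<< a[i];
        std::cout<< std::endl;
    }
    
    MPI_Finalize();
    return 0;
}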

In [None]:
%%sh

# compile program
mkdir -p ./debug_gather_scatter
cd debug_gather_scatter
cmake -DSOURCES="main_gather_scatter.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_gather_scatter
mpirun -np 4 3_Collectives

## Gatherv and Scatterv

More **flexible** variants of gather and scatter, in which each process can send or receive a different number of elements, placed at a chosen offset (displacement) in the root's buffer.

* `MPI_Gatherv`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Gatherv.3.php
* `MPI_Scatterv`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Scatterv.3.php

<img src="./Images/scatterv.png" width=50% style="margin-left:auto; margin-right:auto"> 
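
A common reason to use the `v` variants is that the data does not divide evenly among the processes. The hypothetical helper below (a sketch, not part of MPI) splits `n` elements over `p` processes as evenly as possible, giving one extra element to the first `n % p` ranks, and fills the count and displacement arrays that `MPI_Scatterv` and `MPI_Gatherv` expect.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: fill `counts` (sendcounts for MPI_Scatterv, recvcounts for
// MPI_Gatherv) and `displs` for splitting `n` elements over `p` processes.
// The first n % p ranks get one extra element.
void even_split(int n, int p, std::vector<int> &counts, std::vector<int> &displs)
{
    assert(n >= 0 && p > 0);
    counts.assign(p, 0);
    displs.assign(p, 0);
    int offset = 0;
    for (int r = 0; r < p; r++)
    {
        counts[r] = n / p + (r < n % p ? 1 : 0);
        displs[r] = offset;   // where rank r's block starts in the root buffer
        offset += counts[r];
    }
}
```

For `n = 10` and `p = 4` this gives `counts = {3, 3, 2, 2}` and `displs = {0, 3, 6, 8}`. Every rank can compute the same arrays locally and then call, for example, `MPI_Scatterv(buf, counts.data(), displs.data(), MPI_INT, local, counts[rank], MPI_INT, 0, MPI_COMM_WORLD)`, so the arrays themselves never need to be communicated.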

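## Other collectives

`MPI_Allgather` works like a gather whose result is delivered to **every** process, while `MPI_Alltoall` lets every process send a distinct chunk to every other process. As before, the `v` variants allow a different number of elements per process.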

* `MPI_Allgather`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Allgather.3.php
* `MPI_Alltoall`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Alltoall.3.php
* `MPI_Allgatherv`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Allgatherv.3.php
* `MPI_Alltoallv`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Alltoallv.3.php
* ...

<div class="alert alert-block alert-danger"><b>NOTE</b>: these are expensive calls, since every process ends up exchanging data with every other process: use them only when needed.</div>

<img src="./Images/allgather.png" width=48% style="float: left;">  
<img src="./Images/alltoall.png" width=48% style="float: right;">  
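
The data movement of `MPI_Alltoall` amounts to a block transpose: block `j` of rank `i`'s send buffer ends up as block `i` of rank `j`'s receive buffer. The hypothetical serial model below (no MPI involved) makes this explicit; it also shows why the call is expensive, since in a straightforward implementation each of the p processes sends a distinct block to each of the other p-1 processes, i.e. p(p-1) messages in total.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial model of MPI_Alltoall with `count` elements per pair of processes:
// send[i] holds p blocks (block j is destined for rank j); after the exchange,
// block i of recv[j] is the block that rank i sent to rank j.
std::vector<std::vector<int>> alltoall_model(const std::vector<std::vector<int>> &send, int count)
{
    assert(count >= 0);
    const int p = static_cast<int>(send.size());
    std::vector<std::vector<int>> recv(p, std::vector<int>(static_cast<std::size_t>(p) * count));
    for (int i = 0; i < p; i++)
    {
        assert(send[i].size() == static_cast<std::size_t>(p) * count);
        for (int j = 0; j < p; j++)
            for (int k = 0; k < count; k++)
                recv[j][i * count + k] = send[i][j * count + k];
    }
    return recv;
}
```

With 3 ranks, one element each, the send buffers `{0, 1, 2}`, `{10, 11, 12}`, `{20, 21, 22}` become the receive buffers `{0, 10, 20}`, `{1, 11, 21}`, `{2, 12, 22}`.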

## Reductions

<img src="./Images/reduce.png" width=23% style="float: right;">  

A **reduction** takes values from a group of processes and combines them into a **single value** with some operation (e.g. a sum, a product, a maximum or a minimum; an average is obtained as a sum divided by the number of processes).

* collect data from different processes;
* store the result on a single process (`MPI_Reduce`) or distribute it to all processes (`MPI_Allreduce`).

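<div class="alert alert-block alert-success"> Designed to <b>avoid race conditions</b>: the MPI library combines all contributions itself, so the accumulation does not have to be hand-coded with point-to-point messages.</div>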

* `MPI_Reduce`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Reduce.3.php
* `MPI_Allreduce`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Allreduce.3.php

### Reduction Operations
<img src="./Images/reduction.png" width=50% style="margin-left:auto; margin-right:auto"> 
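
When the buffers contain several elements, the operation is applied **element by element** across the ranks. The hypothetical serial model below (no MPI involved) reproduces this with the same inputs as the example program below, four ranks each contributing `{rank, rank + 1}`: with a sum (as with `MPI_SUM`) the root should obtain `6, 10`, and with a maximum (as with `MPI_MAX`) it would obtain `3, 4`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Serial model of MPI_Reduce / MPI_Allreduce: contributions[r] is the send buffer of
// rank r, and the result combines the buffers element by element with `op`
// (e.g. std::plus<int>() for MPI_SUM). With MPI_Reduce only the root receives the
// result; with MPI_Allreduce every rank does.
std::vector<int> reduce_model(const std::vector<std::vector<int>> &contributions,
                              const std::function<int(int, int)> &op)
{
    assert(!contributions.empty());
    std::vector<int> result = contributions[0];
    for (std::size_t r = 1; r < contributions.size(); r++)
    {
        assert(contributions[r].size() == result.size());
        for (std::size_t i = 0; i < result.size(); i++)
            result[i] = op(result[i], contributions[r][i]);
    }
    return result;
}
```

This model combines the ranks strictly in order; a real MPI library may combine them in a different order (e.g. along a tree), which is why MPI assumes the operation is associative and floating-point results may differ slightly between runs or process counts.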

In [None]:
%%writefile main_reduce.cpp

#include <iostream>
#include <mpi.h>

int main(int argc, char **argv) 
{
    MPI_Init(&argc, &argv);
    
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
    int a[2] = { rank, rank + 1 };
    int sum[2] = { 0, 0 };
    
    std::cout<< "Before Process "<< rank<< " ";
    std::cout<< "a: "<< a[0]<< ", "<< a[1]<< std::endl;
    
    // element-wise sum over all processes; the result is only significant on the root (rank 0)
    MPI_Reduce(a, sum, 2, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
            
    std::cout<< "After Process "<< rank<< " ";
    std::cout<< "sum: "<< sum[0]<< ", "<< sum[1]<< std::endl;
    
    MPI_Finalize();
    return 0;
}

In [None]:
%%sh

# compile program
mkdir -p ./debug_reduce
cd debug_reduce
cmake -DSOURCES="main_reduce.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_reduce
mpirun -np 4 3_Collectives

## Collectives and Performance

MPI vendors work **hard** to optimise collectives for parallel hardware.

_Latency measurements_ (minimum time needed to transfer data) on different machines (image taken from https://doi.org/10.1002/cpe.6769):
<img src="./Images/performance.png" width=80% style="margin-left:auto; margin-right:auto"> 

<div class="alert alert-block alert-warning"> <b>NOTE</b>: parallel scaling is often dictated by MPI collectives. </div>
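
A rough way to see why collectives matter for scaling is the simple latency-bandwidth (Hockney) cost model, in which sending a message of $n$ bytes costs $\alpha + n\beta$, with $\alpha$ the latency and $\beta$ the inverse bandwidth. Under that simplifying assumption (no network contention, at most one message sent per process per step), a broadcast written as a loop of sends from the root and a binomial-tree broadcast cost:

$$
T_{\text{loop}}(p) = (p-1)\,(\alpha + n\beta), \qquad
T_{\text{tree}}(p) = \lceil \log_2 p \rceil\,(\alpha + n\beta)
$$

For $p = 1024$ processes this is $1023$ versus $10$ message steps. Even the tree cost still grows with $p$, and for large messages real MPI libraries typically switch to other algorithms that reduce the $n\beta$ term, so this model is only a first-order guide.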