
In [None]:
%%sh

# remove the build directories (debug*) of previous runs
rm -rf debug*

# MPI Introduction

An introduction to the basic concepts of the **Message Passing Interface** (MPI)

## Is it easy to parallelize a code? (1)

A few comments:

* For intra-node parallelism, `OpenMP` is "simple" **but** it is easy to end up with **bad** performance...
* `OpenACC` provides more implicit and "good" parallelism than `OpenMP`, but in practice it is **mostly** supported by Nvidia compilers...
* `CUDA` **only** works on Nvidia GPUs, while `OpenCL` is neither widespread nor easy to use...
* `SYCL` and `oneAPI` (Intel) can offer a complete solution, but **only** for C++...

### What about MPI?

* requires **many** code changes to go from the serial to the parallel version...
* can be hard to **debug**...

## Is it easy to parallelize a code? (2)

<div class="alert alert-block alert-danger"><b>No free lunch</b>: you can't just "turn on" parallelism</div>

Parallel programming requires work:
* **Code modification** - always
* **Algorithm modification** - often
* **New sneaky bugs** - you bet

## MPI 

Message-Passing Interface

> MPI is *NOT* a language!

<div class="alert alert-block alert-warning"> MPI is a <b>standard interface specification</b> maintained by the <a href="https://www.mpi-forum.org/">MPI Forum</a>. </div>

<div class="alert alert-block alert-info"> <b>MAIN GOAL</b>: specify a message-passing parallel programming model. </div>

Extensions:

* collective operations
* dynamic process creation
* parallel I/O
* ... 

## Story

Born in 1992, first version released in 1994.

Versions:
* *MPI-1*, 1994
* *MPI-2*, 1997
* *MPI-3*, 2012
* *MPI-4*, 2021
* *MPI-5*, 2025

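> We will analyse the *MPI-4.1* standard (2023); for the implementation we refer to the Open MPI v4.1 documentation, https://www.open-mpi.org/doc/v4.1/ (note that Open MPI v4.1 implements the MPI-3.1 standard, so some MPI-4 features are not available there)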

## Story: MPI-1

Born in 1992, first version released in 1994.

> ...The standardization process began with the Workshop on Standards for Message-Passing in a Distributed Memory Environment, sponsored by the Center for Research on
Parallel Computing, held April 29–30, 1992, in Williamsburg, Virginia...

Focused mainly on **point-to-point communications** (including **persistent** requests), but it already defined the _classic_ **collective operations** (broadcast, reduce, ...)

> ...MPI standard by the Fall of 1993 was set. To achieve this goal the MPI working group met every 6 weeks for two days throughout the first 9 months of 1993, and presented the draft MPI standard at the Supercomputing 93 conference in November 1993...


## Story: MPI-2

Work began in 1995; MPI-2.0 was released in 1997.

* **extended** collective operations (e.g. on intercommunicators, `MPI_IN_PLACE`)
* new types of functionality (dynamic processes, one-sided communication, **parallel I/O**, etc.)
* bindings for Fortran 90 and **C++**.


## Story: MPI-3

MPI-3.0 was released in 2012 (MPI-3.1 in 2015).

* **nonblocking** collective operations
* **nonblocking** collective I/O routines (MPI-3.1)

<div class="alert alert-block alert-danger">The MPI C++ bindings, deprecated in MPI-2.2, were removed in MPI-3.0: from C++ we call the C API</div>

## Story: MPI-4

MPI-4.0 was released in 2021 (MPI-4.1 in 2023).

* **large-count** routines (counts that do not fit in an `int`, e.g. `MPI_Send_c`)
* **persistent** collectives
* **partitioned** communications

## What Platforms Are Targets for Implementation?

<div class="alert alert-block alert-warning"> <b>GOAL</b> wide portability. </div>

> ...Programs expressed this way may run on distributed-memory multiprocessors,
networks of workstations, and combinations of all of these. In addition, shared-memory
implementations, including those for multi-core processors and hybrid architectures, are
possible....

MPI can be used by general **MIMD** (Multiple Instruction, Multiple Data) programs, as well as by **SPMD** (Single Program, Multiple Data) ones.

> ...implementations of MPI on top of standard Unix interprocessor communication protocols will provide portability to workstation clusters and heterogenous networks of workstations....

## MPI Implementations

<div class="alert alert-block alert-warning"> Different <b>implementations</b> of the standard exist, usable from: </div>

* C/C++;
* Fortran;
* Python;
* ...

Both **open source** and **proprietary**:
* [open-mpi](https://www.open-mpi.org/)
* [mpich](https://www.mpich.org/)
* [intelmpi](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html) (proprietary)
* [mpi4py](https://mpi4py.readthedocs.io/en/stable/index.html) (Python bindings built on top of one of the implementations above)

<div class="alert alert-block alert-info"> <b>DOCUMENTATION</b>: the official standard used in this course (MPI 4.1) is available at <a href="https://www.mpi-forum.org/docs/">https://www.mpi-forum.org/docs/</a>. For the <b>implementation</b> we are going to use Open MPI from C/C++: <a href="https://www.open-mpi.org/doc/v4.1/">https://www.open-mpi.org/doc/v4.1/</a>.</div>

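<div class="alert alert-block alert-success"> Some implementations (e.g. Open MPI, Intel MPI) provide several <b>compiler wrappers</b>, one per language (in Open MPI: <code>mpicc</code> for C, <code>mpic++</code> for C++, <code>mpifort</code> for Fortran). </div>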

## MPI - Start coding

### To start

* For C++ we are going to use the **mpic++** compiler wrapper (it calls the underlying C++ compiler, e.g. **g++**, adding the MPI include and link flags) with the `openmpi` MPI implementation.
* For Python we use the `mpi4py` library.


#### C++ (1)

C++ Source file (*.cpp) + CMake file (CMakeLists.txt)

In [None]:
%%writefile main_example.cpp

#include <iostream> // output library 
#include <mpi.h> // MPI library

int main(int argc, char **argv) 
{
    std::cout<< "Hello world!"<< std::endl;
    return 0;
}

In [None]:
%%writefile CMakeLists.txt

cmake_minimum_required(VERSION 3.9) # the MPI::MPI_CXX imported target requires CMake >= 3.9
project(1_MPI_Basic LANGUAGES CXX C VERSION 1.0.0) # Name of the project

set(SOURCES "" CACHE STRING "The sources list") # Variable which stores the cpp files

find_package(MPI REQUIRED) # Find the MPI library

add_executable(${PROJECT_NAME} ${SOURCES}) # add list of files to executable
target_link_libraries(${PROJECT_NAME} MPI::MPI_CXX) # Link MPI library

#### C++ (2)

**Compile** and run code from terminal:

In [None]:
%%sh
# compile program
mkdir -p ./debug_example
cd debug_example
cmake -DSOURCES="main_example.cpp" ..
make

In [None]:
%%sh
# run program serial
cd debug_example
./1_MPI_Basic # classic call
mpirun -np 1 1_MPI_Basic # mpi call

In [None]:
%%sh
# run program in parallel
cd debug_example
mpirun -np 2 1_MPI_Basic

In [None]:
%%sh
# run program with more processes
cd debug_example
mpirun --oversubscribe -np 10 1_MPI_Basic

#### Python

Python Source file (*.py) and run code from terminal (**No Compilation step**)

In [None]:
%%writefile main_example.py

import mpi4py # import mpi4py module
mpi4py.rc.initialize = False  # do not initialize MPI automatically
mpi4py.rc.finalize = False    # do not finalize MPI automatically

from mpi4py import MPI # import the 'MPI' module

print("Hello world!")

In [None]:
%%sh
# run program serial
python main_example.py # classic run
mpirun -np 1 python main_example.py # mpi serial run

In [None]:
%%sh
# run program in parallel
mpirun -np 2 python main_example.py

In [None]:
%%sh
# run program with more processes
mpirun --oversubscribe -np 10 python main_example.py

## MPI Logic and Rationale

### MPI Logic

> ...all MPI operations are expressed as functions...

MPI procedures are specified using a **language-independent** notation.

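<div class="alert alert-block alert-success"> Arguments are marked as <b>IN</b> (used but not updated by the procedure), <b>OUT</b> (may be updated), or <b>INOUT</b> (both used and updated). </div>

<div class="alert alert-block alert-danger"> Unless specified otherwise, an OUT or INOUT argument <b>cannot</b> be aliased with any other argument passed to the same MPI procedure. </div>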

### MPI in an image...

<img src="./Images/mpi_idea.webp" width=90% style="margin-left:auto; margin-right:auto">

### Distributed Memory Programming

<img src="./Images/parallelism.png" width=75% style="margin-left:auto; margin-right:auto">

### MPI Message

<img src="./Images/message.jpg" width=30% style="margin-left:auto; margin-right:auto">

<div class="alert alert-block alert-success"> <b>MPI Data Buffer</b>: data exchanged by the MPI procedure.</div>

* **Message Data Buffer**: used in MPI communication procedures (send, receive,...)
* **File Data Buffer**: used in MPI I/O procedures




### MPI Operation

<div class="alert alert-block alert-success"> <b>MPI Operation</b>: a sequence of steps performed by the MPI library.</div>

4 stages:
1. **Initialization**: hands over the argument list to the operation, but _NOT_ the content of the data buffers;
2. **Starting**: hands over the control of the data buffers to MPI;
3. **Completion**: returns the control of the content of the data buffers; output buffers and arguments have been updated;
4. **Freeing**: returns the control of the rest of the argument list to the user.


### Operation Types

* **Blocking**
 
<img src="./Images/block.png" width=30% style="margin-left:auto; margin-right:auto">

* **Nonblocking**

<img src="./Images/non_block.png" width=30% style="margin-left:auto; margin-right:auto">

* **Persistent**

<img src="./Images/persistent.png" width=30% style="margin-left:auto; margin-right:auto">

### MPI Procedures (1)

<div class="alert alert-block alert-success"> An <b>MPI procedure</b> is a subpart of an MPI operation.</div>

MPI procedures can be:
* **Nonlocal**: if returning requires, during its execution, some specific semantically-related MPI procedure to be called on another MPI process
* **Local**: if it is not nonlocal
* **Completing**: if return from the procedure indicates that at least one associated operation has finished its completion stage, which implies that the user can rely on the content of the output data buffers
* **Incomplete**: if it is not completing.

> ...in most cases incomplete procedures are local and completing procedures are nonlocal...

### MPI Procedures (2)

* **Nonblocking**: if it is incomplete and local
* **Blocking**: if it is not nonblocking
* **Collective**: if all processes in a group or groups of MPI processes need to invoke the procedure

### MPI Process

<div class="alert alert-block alert-success"> An MPI program consists of <b>autonomous processes</b>, each executing its own code (<b>MIMD</b> style). </div>

* The codes executed by each process need **NOT** be identical. 
* The processes **communicate** via calls to MPI communication primitives. 
* Each process executes in its **OWN** address space

<div class="alert alert-block alert-warning"> Shared-memory implementations of MPI are possible </div>

## MPI Parallel session - Init and Finalize

### Opaque Objects (1)

**MPI manages system memory** that is used for buffering messages and for storing internal
representations of various MPI objects (such as groups, communicators, datatypes, ...)

<div class="alert alert-block alert-info"> This memory is not directly accessible to the user, and objects stored there are <b>OPAQUE</b>: their size and shape are not visible to the user. </div>

* Opaque objects are accessed via **handles**, which exist in user space.
* MPI procedures that operate on opaque objects are passed handle arguments to access these objects.


### Opaque Objects (2)

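<div class="alert alert-block alert-warning"> ⚠️ <b>NOTE:</b> an opaque object and its handle are significant only at the process where the object was created and cannot be transferred to another process. MPI also provides <b>predefined</b> opaque objects with static handles (e.g. <code>MPI_COMM_WORLD</code>, <code>MPI_INT</code>): the user must not free such objects. </div>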

> ...This design hides the internal representation used for MPI data structures, thus allowing similar calls in C and Fortran...

### Init and Finalize

Calls used to initialize and terminate the parallel session.

<div class="alert alert-block alert-info"> 🔵 They <b>allocate</b> and <b>deallocate</b> the opaque objects 🔵</div>

* `MPI_Init`: see https://www.open-mpi.org/doc/v4.1/man3/MPI_Init.3.php
* `MPI_Finalize`: see https://www.open-mpi.org/doc/v4.1/man3/MPI_Finalize.3.php

### C++ (1)

In [None]:
%%writefile main_init.cpp

#include <iostream>
#include <mpi.h>

int main(int argc, char **argv) 
{
    // Initialize MPI
    // This must always be called before any other MPI functions
    MPI_Init(&argc, &argv);
    
    std::cout<< "Hello world!"<< std::endl;

    // Finalize MPI
    // This must always be called after all other MPI functions
    MPI_Finalize();

    return 0;
}

<div class="alert alert-block alert-info"> Since MPI-2 the arguments of <code>MPI_Init</code> are <strong>optional</strong>: <code>MPI_Init(NULL, NULL)</code> is valid, while passing <code>&argc, &argv</code> is still common practice. </div>

### C++ (2)

In [None]:
%%sh

# compile program
mkdir -p ./debug_init
cd debug_init
cmake -DSOURCES="main_init.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_init
mpirun -np 4 1_MPI_Basic

### Python

In [None]:
%%writefile main_init.py

import mpi4py
mpi4py.rc.initialize = False  # do not initialize MPI automatically
mpi4py.rc.finalize = False    # do not finalize MPI automatically

from mpi4py import MPI # import the 'MPI' module

# manual initialization of the MPI environment
MPI.Init()

print("Hello world!")

# manual finalization of the MPI environment
MPI.Finalize()

In [None]:
%%sh

# run program
mpirun -np 4 python main_init.py

## MPI Communicators

The processes (tasks) can be organized into groups; a **communicator** couples such a group with a communication context.

> ...A communicator specifies the communication context for a communication operation...


<div class="alert alert-block alert-info">The default communicator is called <code>MPI_COMM_WORLD</code> and includes <b>all</b> the tasks available to the program.</div>

<img src="./Images/COMM_WORLD.png" width=40% style="margin-left:auto; margin-right:auto">

The group inside a communicator has a **finite size**.

<div class="alert alert-block alert-success"> The group is <b>ordered</b> and MPI processes are identified by their <b>RANK</b> within this group </div>

* `MPI_Comm_size`: see https://www.open-mpi.org/doc/v4.1/man3/MPI_Comm_size.3.php
* `MPI_Comm_rank`: see https://www.open-mpi.org/doc/v4.1/man3/MPI_Comm_rank.3.php

### MPI in an image...now we know

<img src="./Images/mpi_idea.webp" width=90% style="margin-left:auto; margin-right:auto">

In [None]:
%%writefile main_communicators.cpp

#include <iostream>
#include <mpi.h>

int main(int argc, char **argv)
{
    int err; // MPI routines return an error code (MPI_SUCCESS on success); by default errors abort the program
    err = MPI_Init(&argc, &argv);
    
    int nprocs, my_rank;
    
    // Get the number of processes in MPI_COMM_WORLD
    err = MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    
    // Get the rank of this process in MPI_COMM_WORLD
    err = MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    std::cout<< "Hello I am process "<< my_rank<< " of "<< nprocs<< " processes"<< std::endl; 
    
    err = MPI_Finalize();
    
    return 0;
}

<div class="alert alert-block alert-success">
    <ol>
        <li> Remember that <b>every</b> process is running the same code independently 
        <li> After the call to <code>MPI_Comm_rank</code>, <code>my_rank</code> has a <b>different</b> value on every process!
    </ol>
</div>
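
A typical use of the (different) rank and the (common) size is to split work among processes. A minimal sketch of a block decomposition of `n` items: `block_range` is a hypothetical, pure C++ helper (no MPI calls) that each process would call with its own `my_rank` and `nprocs`; the first `n % nprocs` ranks receive one extra item.

```cpp
#include <utility>

// Hypothetical helper: half-open range [begin, end) of the items owned by `rank`
// when n items are distributed as evenly as possible among `nprocs` processes.
std::pair<long, long> block_range(long n, int rank, int nprocs)
{
    const long base  = n / nprocs;   // items every rank gets
    const long extra = n % nprocs;   // the first `extra` ranks get one more
    const long begin = rank * base + (rank < extra ? rank : extra);
    const long end   = begin + base + (rank < extra ? 1 : 0);
    return {begin, end};
}
```

For instance, 10 items on 4 processes give the ranges [0,3), [3,6), [6,8) and [8,10).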

In [None]:
%%sh

# compile program
mkdir -p ./debug_communicators
cd debug_communicators
cmake -DSOURCES="main_communicators.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_communicators
mpirun -np 4 1_MPI_Basic

### Communicator Creation

Communicators can be created with different MPI routines:
* `MPI_Comm_split`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Comm_split.3.php
* `MPI_Comm_dup`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Comm_dup.3.php
* `MPI_Comm_create`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Comm_create.3.php
* ...

<div class="alert alert-block alert-danger"> <b>REMARK</b>: every communicator we create must be <b>destroyed</b> (freed) before finalizing the application:</div>

* `MPI_Comm_free`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Comm_free.3.php

<div class="alert alert-block alert-info"> <b>NOTE</b>: We are going to see more about communicators when we are going to see <b>Topologies</b>. </div>


### Example of MPI Comm Split

`MPI_Comm_split`: https://www.open-mpi.org/doc/v4.1/man3/MPI_Comm_split.3.php
<img src="./Images/comm_split.png" width=60% style="margin-left:auto; margin-right:auto">
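
To make the semantics explicit: processes passing the same `color` end up in the same new communicator, and inside it they are ranked by `key`, ties being broken by their rank in the old communicator. The following pure C++ sketch (no MPI calls, C++17; `simulate_split` is a hypothetical helper, not how an MPI library implements the call, and it ignores `MPI_UNDEFINED` colors) computes the new rank and size for every old rank.

```cpp
#include <algorithm>
#include <map>
#include <tuple>
#include <vector>

struct SplitResult { int new_rank; int new_size; };

// Hypothetical helper: colors[i] and keys[i] are the arguments passed to
// MPI_Comm_split by old rank i; returns the resulting rank/size of each old rank.
std::vector<SplitResult> simulate_split(const std::vector<int>& colors,
                                        const std::vector<int>& keys)
{
    // Group the old ranks by color
    std::map<int, std::vector<int>> groups;
    for (int r = 0; r < static_cast<int>(colors.size()); ++r)
        groups[colors[r]].push_back(r);

    std::vector<SplitResult> result(colors.size());
    for (auto& entry : groups) {
        std::vector<int>& members = entry.second;
        // Order by key, breaking ties with the old rank
        std::sort(members.begin(), members.end(), [&](int a, int b) {
            return std::tie(keys[a], a) < std::tie(keys[b], b);
        });
        const int size = static_cast<int>(members.size());
        for (int i = 0; i < size; ++i)
            result[members[i]] = {i, size};
    }
    return result;
}
```

With the choice made in the program below (16 processes, `color = rank / 4`, `key = rank`), world rank 5 becomes rank 1 of a 4-process row communicator.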

In [None]:
%%writefile main_comm_split.cpp

#include <iostream>
#include <mpi.h>
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    
    // Get the rank and size in the original communicator
    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // View the processes as a grid with 4 processes per row:
    // ranks 0-3 get color 0, ranks 4-7 color 1, and so on
    int color = world_rank / 4;
    
    // Split the communicator based on the color and use the
    // original rank (key) for ordering inside each new communicator
    MPI_Comm row_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &row_comm);
    
    int row_rank, row_size;
    MPI_Comm_rank(row_comm, &row_rank);
    MPI_Comm_size(row_comm, &row_size);
    
    std::cout<< "WORLD RANK/SIZE: "<< world_rank<< "/"<< world_size<< "\t";
    std::cout<< "ROW RANK/SIZE: "<< row_rank<< "/"<< row_size<< std::endl;
    
    // Free the communicator
    MPI_Comm_free(&row_comm);
    
    MPI_Finalize();
    
    return 0;
}

In [None]:
%%sh

# compile program
mkdir -p ./debug_comm_split
cd debug_comm_split
cmake -DSOURCES="main_comm_split.cpp" ..
make

In [None]:
%%sh

# run program
cd debug_comm_split
mpirun --oversubscribe -np 16 1_MPI_Basic