# Introduction To Message Passing Interface (MPI) Using Python And MPI4Py Library

## Introduction

Serial processing is a style of programming in which tasks are completed one at a time, executed sequentially by a single processor. For large-scale applications in physics and mathematics, however, a single processor does not provide sufficient performance. In those cases, one needs a framework capable of running an application across multiple processors simultaneously. The Message Passing Interface (MPI) is designed to deliver high performance on a variety of parallel computing architectures. MPI is a standardized and portable message-passing specification whose implementations are used primarily from C, C++ and Fortran. The MPI4Py library provides bindings of the MPI standard for the Python programming language, allowing a Python program to exploit multiple processors.

In this blog post, we learn the basics of MPI programming using the MPI4Py library and show how programs can be written and executed. First, let us install the MPI4Py library.

## Installation 

1. Creating A Conda Environment

```shell
$ conda create --name Parallel-Programming
```

2. Activating An Environment

```shell
$ conda activate Parallel-Programming
```
For more information on managing conda environments, refer to [**Managing environments**](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).

<div class="alert alert-block alert-info">
    <b> Note: </b> When running a parallel program, the Parallel-Programming environment must always be active.
</div>
 

3. Installing MPI4Py

```shell
$ conda install -c conda-forge mpi4py
```

For other installation options visit [**Anaconda.org**](https://anaconda.org/conda-forge/mpi4py).

4. Check The Installation 

Typing the command <span style="color:DimGray;background-color: #F0FFFF">which mpirun</span> should produce output similar to the following, which confirms a successful installation. The exact path depends on where conda is installed on your system.

```shell
$ which mpirun
~/anaconda/envs/Parallel-Programming/bin/mpirun
```
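
The same check can also be scripted from Python. The sketch below is only an illustration and is not part of MPI4Py: the helper name `check_mpi_setup` is hypothetical, and it relies solely on the standard library to look for the `mpirun` launcher on the `PATH` and for an importable `mpi4py` package in the active environment.

```python
import importlib.util
import shutil


def check_mpi_setup():
    """Report whether the mpirun launcher and the mpi4py package can be found."""
    return {
        # Full path to mpirun, or None if it is not on the PATH
        "mpirun": shutil.which("mpirun"),
        # True if Python can locate mpi4py in the active environment
        "mpi4py": importlib.util.find_spec("mpi4py") is not None,
    }


if __name__ == "__main__":
    for name, value in check_mpi_setup().items():
        print(f"{name}: {value}")
```

If `mpirun` is reported as `None` or `mpi4py` as `False`, the most likely cause is that the Parallel-Programming environment is not active.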

## Running Python Scripts Using MPI

Python programs that use MPI must be started with an MPI launcher, which is provided by the command mpirun.

1. Activating The Conda Environment

```shell
$ conda activate Parallel-Programming
```

2. Running The Program

```shell
$ mpirun -n 4 python script.py
```

Here, <span style="color:DimGray;background-color: #F0FFFF">-n 4</span> tells MPI to use four processes, which is the number of cores on my laptop. The command <span style="color:DimGray;background-color: #F0FFFF">python script.py</span> tells MPI to run the Python script named script.py.

If you are running this on a desktop computer, set the -n argument to the number of cores on your system or the maximum number of processes needed for your job, whichever is smaller. On a large cluster, specify the number of cores that your program needs, up to the maximum number of cores available on that cluster.
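
The "whichever is smaller" rule can be written down in a few lines. This is a minimal sketch under stated assumptions: the helper name `choose_process_count` is hypothetical, and `os.cpu_count()` reports logical cores, which may be higher than the number of physical cores on machines with hyper-threading.

```python
import os


def choose_process_count(needed, available=None):
    """Return the smaller of the processes the job needs and the cores available."""
    if available is None:
        # os.cpu_count() reports logical cores and may return None
        available = os.cpu_count() or 1
    # Always ask for at least one process
    return max(1, min(needed, available))


if __name__ == "__main__":
    n = choose_process_count(needed=8)
    # Print the command line one could then run by hand
    print(f"mpirun -n {n} python script.py")
```

For example, `choose_process_count(8, available=4)` returns 4, while `choose_process_count(2, available=4)` returns 2.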

## Hello World Program

Here is a basic hello world program using MPI. 

```python
from mpi4py import MPI
print("hello world")
```


Type it in a text file named <span style="color:DimGray;background-color: #F0FFFF">hello-world.py</span> and execute the program as follows:

```shell
$ mpirun -n 4 python hello-world.py
hello world
hello world
hello world
hello world
$
```

Here we are running the program using four processes. When we run the mpirun command, four processes are started, each with its own copy of the program. From then on, each process runs its separate local copy of the program independently. Each one first imports the MPI library and then executes the print statement, so each process prints "hello world" as directed. Since the output of every process is directed to the same terminal window, we see four lines saying "hello world".

In the above program, each process does the same task. Therefore, we cannot tell which "hello world" line was printed by which process. To identify a process, we need to assign some kind of process ID to each one. MPI provides functions that let a process determine its process ID, as well as the total number of processes that have been created. By default, MPI assigns each process an integer, starting at 0 and incrementing by one for each additional process. This process ID is called the "rank" of the process.

Let us rewrite the hello world program so that each process prints its own rank and the process with rank 0 prints the total number of processes.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD  # communicator
size = comm.Get_size() # save the total number of processes in size variable
rank = comm.Get_rank() # save the rank of each process in rank variable
print("hello world from process-%d "%rank)

if rank == 0:
    print("There are %d processes in total"%size)
```
The output is as follows:

```shell
$ mpirun -n 4 python hello-world.py
hello world from process-0 
There are 4 processes in total
hello world from process-1 
hello world from process-3 
hello world from process-2 
$
```

Note that the process numbers are not printed in any particular order, and the order can change every time the program is executed. That is because the processes execute independently, and the execution order is not controlled in any way in this particular program.
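
If a fixed order is wanted, one common approach is to collect the messages on a single process and print them there. The sketch below is one possible way to do this, not the only one: it assumes MPI4Py's `comm.gather` method, which collects one Python object from every process on the root, and the helper name `report_in_rank_order` is hypothetical.

```python
def report_in_rank_order(comm):
    """Gather one greeting per process on rank 0 and print them ordered by rank."""
    rank = comm.Get_rank()
    message = "hello world from process-%d" % rank
    # gather returns a list indexed by rank on the root process and None elsewhere
    messages = comm.gather(message, root=0)
    if rank == 0:
        for line in messages:
            print(line)
    return messages


if __name__ == "__main__":
    try:
        from mpi4py import MPI
    except ImportError:
        print("mpi4py is not installed in this environment")
    else:
        report_in_rank_order(MPI.COMM_WORLD)
```

Because only rank 0 prints, the lines appear in rank order regardless of which process finishes first. The script is launched with mpirun in the same way as the earlier examples.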