# Distributed memory parallelism with mpi4py

## mpi4py basics

In MPI for Python, `Comm` is the base class of communicators. 

The two available predefined intracommunicator instances are `COMM_SELF` and `COMM_WORLD`. 

The number of processes in a communicator and the calling process rank can be respectively obtained with methods `Get_size()` and `Get_rank()`.

In [17]:
%%writefile communicator.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

print('Number of processes is %i.' %size)
print('Hello, I am process %i.' % rank)

Overwriting communicator.py


To run an MPI-enabled Python application, use the command **`mpirun -np .. python3 myprog.py`**, where the `-np` option specifies how many processes MPI should start.

The `mpirun` command below starts two processes to run the `communicator.py` script. Each process gets the total number of processes and its own rank number.

In [18]:
!mpirun -np 2 --oversubscribe python3 communicator.py

Number of processes is 2.
Hello, I am process 0.
Number of processes is 2.
Hello, I am process 1.


When we run this program with ```mpirun```, the code is executed by all processes in the communicator.
To use MPI efficiently, we need some way to give each process different work.
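
One common way to differentiate the work is to let each rank pick its own slice of a shared index range. The sketch below is plain Python with no MPI calls. The helper name `indices_for_rank` is hypothetical, and `rank` and `size` stand in for the values returned by `Get_rank()` and `Get_size()`.

```python
def indices_for_rank(rank, size, n_items):
    """Return the item indices that process `rank` out of `size` would handle.

    Round-robin split: rank r takes items r, r + size, r + 2*size, ...
    """
    return list(range(rank, n_items, size))


# Emulate 3 processes sharing 10 work items (no MPI involved here).
size = 3
for rank in range(size):
    print("rank", rank, "handles", indices_for_rank(rank, size, 10))
```

In a real MPI program each process would call the helper once with its own rank, so no two processes work on the same item.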

Starting from the example above, make the process with ```rank == 0``` print ```Hoi. Ik ben process 0``` instead.
Plain Python syntax (an `if` statement) is all you need.

In [4]:
%%writefile communicator_mod.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# continue from here


Writing communicator_mod.py


In [5]:
!mpirun --oversubscribe -np 2 python3 communicator_mod.py

Hoi. Ik ben process 0


To look up the documentation of a communication function, use `help(...)` as shown below.

In [6]:
from mpi4py import MPI
help(MPI.COMM_WORLD.Get_rank)

Help on built-in function Get_rank:

Get_rank(...) method of mpi4py.MPI.Intracomm instance
    Comm.Get_rank(self)
    
    Return the rank of this process in a communicator



In [7]:
from mpi4py import MPI
MPI.COMM_WORLD.Get_rank??

## Collective Communications

Collective communications allow the communication of data between multiple processes of a group simultaneously.
The collective functions used in this tutorial are the blocking versions; MPI-3 also defines nonblocking variants such as `Ibcast`.
![mpi_coll_com](https://hpc-tutorials.llnl.gov/mpi/images/collective_comm.gif)
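
Before running the MPI versions, it can help to see the data movement of the three collectives in plain Python. The sketch below is a serial emulation only, with no MPI calls: each returned list holds what rank 0, rank 1, ... would end up with. The `emulate_*` function names are hypothetical.

```python
def emulate_bcast(root_data, size):
    """Broadcast: every rank ends up with the root's object (a copy in real MPI)."""
    return [root_data for _ in range(size)]


def emulate_scatter(root_data, size):
    """Scatter: rank i receives element i of the root's sequence."""
    if len(root_data) != size:
        raise ValueError("scatter needs exactly one element per rank")
    return [root_data[rank] for rank in range(size)]


def emulate_gather(per_rank_data, root=0):
    """Gather: only the root receives the list of contributions; others get None."""
    return [list(per_rank_data) if rank == root else None
            for rank in range(len(per_rank_data))]


size = 4
print("bcast:  ", emulate_bcast([1, 2, 3, 4], size))
print("scatter:", emulate_scatter([1, 2, 3, 4], size))
print("gather: ", emulate_gather(list(range(size))))
```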

### Broadcast

In [19]:
%%writefile broadcast.py
from mpi4py import MPI
comm = MPI.COMM_WORLD
print("Hello")
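# Only the root process (rank 0) holds the data before the broadcast.
if comm.rank == 0:
    data = [1,2,3,4]
else:
    data = None

# After bcast, every rank holds a copy of the root's list.
data = comm.bcast(data, root=0)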
print("rank:", comm.rank, "data:", data)

Overwriting broadcast.py


In [20]:
!mpirun -np 4 --oversubscribe python3 broadcast.py

Hello
Hello
rank: 0 data: [1, 2, 3, 4]
rank: 1 data: [1, 2, 3, 4]
Hello
rank: 2 data: [1, 2, 3, 4]
Hello
rank: 3 data: [1, 2, 3, 4]


### Scatter

In [21]:
%%writefile scatter.py
from mpi4py import MPI
comm = MPI.COMM_WORLD

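# The root provides one list element per process (4 elements for 4 processes).
if comm.rank == 0:
    data = [1,2,3,4]
else:
    data = None

# After scatter, rank i holds element i of the root's list.
data = comm.scatter(data, root=0)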
print("rank:", comm.rank, "data:", data)

Overwriting scatter.py


In [22]:
!mpirun --oversubscribe -np 4 python3 scatter.py

rank: 0 data: 1
rank: 1 data: 2
rank: 2 data: 3
rank: 3 data: 4


### Gather

In [23]:
%%writefile gather.py
from mpi4py import MPI
comm = MPI.COMM_WORLD

data = comm.rank
gathered_data = comm.gather(data, root=0)

# Only the root (rank 0) receives the gathered list; every other rank gets None.
print("rank:", comm.rank, "data:", data)
print("rank:", comm.rank, "gathered data:", gathered_data)


Overwriting gather.py


In [24]:
!mpirun --oversubscribe -np 4 python3 gather.py

rank: 3 data: 3
rank: 3 gathered data: None
rank: 2 data: 2
rank: 2 gathered data: None
rank: 0 data: 0
rank: 0 gathered data: [0, 1, 2, 3]
rank: 1 data: 1
rank: 1 gathered data: None


### Reduction

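To explain the reduction, we are going to use the "π with Monte Carlo" example: points are drawn uniformly in the unit square, and the fraction that falls inside the quarter circle of radius 1 estimates π/4.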

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/MonteCarloIntegrationCircle.svg/1024px-MonteCarloIntegrationCircle.svg.png" alt="drawing" width="400"/>

$$
    \frac{\pi}{4} = \frac{A_{quarter\,circle}}{A_{square}} \approx \frac{N_{inside}}{N_{total}}
    \quad \Rightarrow \quad
    \pi \approx 4 \, \frac{N_{inside}}{N_{total}}
$$

In [7]:
#Serial Example
import time 
from random import random 

count = 0
Niter = 10000000

t1 = time.time()
for i in range(0,Niter):
    (x,y) = (random(), random())
    if (x * x) + (y * y) <= 1.0:
        count += 1
t2 = time.time()

print(t2-t1)

pi = 4.0 * count / Niter
print("pi = {}".format(pi))
print("% error = {}".format(abs(pi - 3.14159265359)/3.14159265359*100))

3.4409689903259277
pi = 3.1417604
% error = 0.005339534067478378


In [8]:
%%writefile reduction_pi.py
from mpi4py import MPI
from random import random 
comm = MPI.COMM_WORLD

count = 0
Niter = 10000000

t1 = MPI.Wtime()
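# Each rank handles roughly Niter / comm.size of the iterations.
for i in range(0,Niter,comm.size):
    (x,y) = (random(), random())
    if (x * x) + (y * y) <= 1.0:
        count += 1

# Sum the per-rank counts; only the root (rank 0) receives the result.
sum_count = comm.reduce(count, op=MPI.SUM, root=0)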
t2 = MPI.Wtime()

if comm.rank == 0:
    print(t2-t1)
    pi = 4.0 * sum_count / Niter
    print("pi = {}".format(pi))
    print("% error = {}".format(abs(pi - 3.14159265359)/3.14159265359*100))

Overwriting reduction_pi.py


In [9]:
!mpirun --oversubscribe -np 3 python3 reduction_pi.py

1.248424767
pi = 3.1414944
% error = 0.0031275089050059957


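We are interested in performance, which is why both versions above are timed: the serial loop with `time.time()` and the MPI version with `MPI.Wtime()`. In the runs shown, three processes take about 1.25 s against about 3.44 s for the serial loop.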
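
The role of `comm.reduce(count, op=MPI.SUM)` can be emulated serially: every rank produces a partial count, and the reduction adds them up on the root. The sketch below uses no MPI at all. The helper `count_inside`, the fixed seeds, and the small sample size are assumptions chosen only to keep the illustration fast and reproducible.

```python
import math
import random


def count_inside(n_samples, seed):
    """Count random points of the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # one independent generator per emulated rank
    count = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            count += 1
    return count


size = 4                    # number of emulated ranks
samples_per_rank = 50_000

# What each rank would compute locally.
partial_counts = [count_inside(samples_per_rank, seed=rank) for rank in range(size)]

# What reduce with a sum operation would deliver to the root.
sum_count = sum(partial_counts)

pi_estimate = 4.0 * sum_count / (size * samples_per_rank)
print("partial counts:", partial_counts)
print("pi = {}".format(pi_estimate))
print("% error = {}".format(abs(pi_estimate - math.pi) / math.pi * 100))
```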

## Point-to-Point Communications

Point-to-point communication enables the transmission of data between a pair of processes, one side sending, the other receiving. MPI provides a set of *send* and *receive* functions allowing the communication of *typed* data with an associated *tag*.

### Blocking Communications

Blocking functions in MPI block the caller until the data buffers involved in the communication can be safely reused by the application program.

In [19]:
%%writefile sendrecv.py
from mpi4py import MPI
comm = MPI.COMM_WORLD

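if comm.rank == 0:
    comm.send("Hello world", dest=1)

# The receive must be posted by rank 1, the destination of the send above.
if comm.rank == 1:
    message = comm.recv(source=0)
    print("Rank 1 received '%s'" %
          message)

Overwriting sendrecv.py


In [20]:
!mpirun -np 2 --oversubscribe python3 sendrecv.py

Rank 1 received 'Hello world'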


Let's now take a look at the Alice/Bob example:

In [12]:
%%writefile alicebob.py
from mpi4py import MPI
comm = MPI.COMM_WORLD

# Alice; say Hello to Bob
if comm.rank == 0:
    comm.send("Hello Bob!", 1)
    mesg = comm.recv()
    print("Alice: Bob said {}".format(mesg))

# Bob; say Hello to Alice
if comm.rank == 1:
    comm.send("Hello Alice!", 0)
    mesg = comm.recv()
    print("Bob: Alice said {}".format(mesg))


Overwriting alicebob.py


In [11]:
!mpirun -np 2 --oversubscribe python3 alicebob.py

^C


MPI implementations usually send small messages eagerly, so ```MPI_Send()``` can return immediately. This is an implementation choice that the standard does not mandate, and what counts as a "small" message depends on the library version, the interconnect you are using, and other factors.

Try replacing ```comm.send``` in the example above with the synchronous ```comm.ssend``` and see what happens.

Can you solve the deadlock?
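
One possible way out, sketched below, is to order the calls so that one side sends first while the other receives first. This is only one of several options, not the reference solution. The function name `exchange` is hypothetical, and `comm` is assumed to be an mpi4py communicator with at least two processes.

```python
def exchange(comm):
    """Deadlock-free greeting exchange between rank 0 (Alice) and rank 1 (Bob).

    Alice sends first and then receives; Bob receives first and then sends.
    Each synchronous send is therefore matched by an already posted receive.
    """
    if comm.rank == 0:
        comm.ssend("Hello Bob!", dest=1)
        return comm.recv(source=1)
    if comm.rank == 1:
        mesg = comm.recv(source=0)
        comm.ssend("Hello Alice!", dest=0)
        return mesg
    return None  # any other rank takes no part in the exchange


# Usage inside a script launched with `mpirun -np 2 python3 ...`:
#     from mpi4py import MPI
#     print(MPI.COMM_WORLD.rank, exchange(MPI.COMM_WORLD))
```

Another option is to post nonblocking operations and wait for them afterwards, as described in the next section.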

### Nonblocking Communications

Nonblocking send and receive functions return immediately after initiating the *send/receive* operation. The process can therefore continue with something else, e.g. computation, and check the status of the *send/receive* operation later.
This makes it possible to overlap communication and computation, which can improve the performance of the program.

In MPI, nonblocking communication is achieved using the `Isend` and `Irecv` methods (`isend` and `irecv` for generic Python objects). They initiate a send and a receive operation respectively, and then return **immediately**.

These methods return an instance of the `Request` class, which uniquely identifies the started operation. Its completion can then be managed using the `Test`, `Wait`, and `Cancel` methods of the `Request` class (or the lowercase `test` and `wait` for requests created by `isend` and `irecv`). The management of `Request` objects and the associated memory buffers involved in communication requires careful coordination. Users must ensure that objects exposing their memory buffers are not accessed at the Python level while they are involved in nonblocking message-passing operations.
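
Overlapping communication and computation can be sketched as a polling loop: post the receive, then alternate between a chunk of work and a completion check. The function name `receive_while_working` is hypothetical and the "work" is only a counter. The sketch assumes that `comm` is an mpi4py communicator and that the lowercase `test()` returns a `(flag, message)` pair.

```python
def receive_while_working(comm, source=0, tag=11):
    """Post a nonblocking receive and keep working until the message arrives.

    Returns the received object and the number of work steps done meanwhile.
    """
    req = comm.irecv(source=source, tag=tag)  # returns at once with a Request
    work_steps = 0
    while True:
        done, message = req.test()  # completion check that does not block
        if done:
            return message, work_steps
        work_steps += 1  # placeholder for a chunk of real computation


# Usage on the receiving rank of a script launched with `mpirun -np 2 python3 ...`:
#     from mpi4py import MPI
#     data, steps = receive_while_working(MPI.COMM_WORLD, source=0, tag=11)
```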

In [27]:
%%writefile p2pisendirecv.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

start = MPI.Wtime()

if rank == 0:
    data = {'a': 7, 'b': 3.14} 
    req = comm.isend(data, dest=1, tag=11)
    req.wait()  # wait for the send to complete
elif rank == 1:
    req = comm.irecv(source=0, tag=11)
    data = req.wait()

end = MPI.Wtime()
elapsed = end - start

print("Rank {}: Elapsed time is {} seconds.  Data is {}.".format(rank, elapsed, data))

Overwriting p2pisendirecv.py


In [28]:
!mpirun -np 2 --oversubscribe python3 p2pisendirecv.py

Rank 1: Elapsed time is 4.4991e-05 seconds.  Data is {'a': 7, 'b': 3.14}.
Rank 0: Elapsed time is 3.1976e-05 seconds.  Data is {'a': 7, 'b': 3.14}.


### Working with NumPy arrays

MPI for Python can communicate any built-in or user-defined Python object by using the Python `pickle` module under the hood.

It also supports direct communication of any object exporting the single-segment buffer interface (e.g. NumPy arrays) with negligible overhead.

The Python buffer protocol is a framework in which Python objects can expose raw byte arrays to other Python objects.  Using the buffer protocol, we can let multiple objects efficiently manipulate views of the same data buffers, without having to make copies of the often large datasets.
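
The difference between the two mechanisms can be seen without MPI. In the minimal sketch below, a pickle round trip produces an independent copy of the data, while a `memoryview` obtained through the buffer protocol shares the original memory. The standard-library `array` type is used here as a stand-in for a NumPy array; that substitution is an assumption made to keep the example dependency-free.

```python
import pickle
from array import array

data = array('i', range(5))  # typed, contiguous buffer of C ints

# Pickle round trip: the reconstructed object is an independent copy.
copy = pickle.loads(pickle.dumps(data))
copy[0] = -1
print(data[0])  # 0, the original is untouched

# Buffer protocol: the memoryview shares the same memory, no copy is made.
view = memoryview(data)
view[0] = 42
print(data[0])  # 42, the change is visible through the original object

# The view exposes exactly the raw bytes of the array.
print(view.nbytes == len(data) * data.itemsize)  # True
```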

As seen in the above examples, communication of generic Python objects makes use of **all-lowercase** methods of the `Comm` class, i.e. `send()`, `recv()`, `isend()`, etc.

To communicate buffer-like objects, one has to use method names starting with an **upper-case** letter, like `Send()`, `Recv()`, `Bcast()`, etc.

In general, buffer arguments to these calls must be explicitly specified by using a list or tuple like ```[data, MPI.DOUBLE]``` or ```[data, MPI.INT]```.

Let's look at two examples, one for each method.

**Exercise:**

Modify `p2psendrecv.py` (the first example below) to communicate 1000 integers. How long does the communication take?

Compare the result with the one obtained from `p2pnumpysendrecv.py` (the second example below) for the same number of integers.

Which one is faster?

In [48]:
%%writefile p2psendrecv.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# initialize data
if rank == 0:
    data = [i for i in range(100000)]
if rank == 1: 
    data = [0 for i in range(100000)]

print('data sum for rank {} is: {}'.format(rank,sum(data)))

# measure communication time
start = MPI.Wtime()
if rank == 0:
    comm.send(data, dest=1, tag=11)
elif rank == 1:
    data = comm.recv(source=0, tag=11)

end = MPI.Wtime()
elapsed = end - start

print('data sum for rank {} is: {}'.format(rank,sum(data)))
print("Rank {} Elapsed time is {} seconds.".format(rank, elapsed))

Overwriting p2psendrecv.py


In [49]:
!mpirun -np 2 --oversubscribe python3 p2psendrecv.py

data sum for rank 1 is: 0
data sum for rank 0 is: 4999950000
data sum for rank 0 is: 4999950000
Rank 0 Elapsed time is 0.002038242 seconds.
data sum for rank 1 is: 4999950000
Rank 1 Elapsed time is 0.007116538 seconds.


In [51]:
%%writefile p2pnumpysendrecv.py
from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# initialize data
if rank == 0:
    data = numpy.arange(100000, dtype='i')
elif rank == 1:
    data = numpy.empty(100000, dtype='i')

# measure communication time
start = MPI.Wtime()
if rank == 0:
    comm.Send([data, MPI.INT], dest=1, tag=77)
elif rank == 1:
    comm.Recv([data, MPI.INT], source=0, tag=77)
end = MPI.Wtime()
elapsed = end - start

print("Rank {} Elapsed time is {} seconds.".format(rank, elapsed))

Overwriting p2pnumpysendrecv.py


In [52]:
!mpirun -np 2 --oversubscribe python3 p2pnumpysendrecv.py

Rank 0 Elapsed time is 0.000228778 seconds.
Rank 1 Elapsed time is 0.000434174 seconds.
