#### Resources
Documentation
https://mpi4py.readthedocs.io/en/stable/tutorial.html

Princeton Research Computing (this covers running on real clusters; if you are only running on a PC, you can just look at the example code)
https://researchcomputing.princeton.edu/mpi4py

Other sources (these videos are great)
https://www.youtube.com/watch?v=CT9tqR7XeX0&list=PLQVvvaa0QuDf9IW-fe6No8SCw-aVnCfRi&index=14

### To install on PC
On Linux (tested on Ubuntu 18.04):

The easiest way is through conda (or, if you have Anaconda Navigator installed, go to the Environments page and install the packages there).
First install Open MPI, then install mpi4py.

For reference and alternative installation methods, check these links:

https://pypi.org/project/mpi4py/

https://pythonprogramming.net/installing-testing-mpi4py-mpi-python-tutorial/

https://anaconda.org/anaconda/mpi4py

#### To run code (Open MPI)
mpiexec -n 8 python Script.py (this runs Script.py on 8 slots in parallel)

### Running on hardware threads instead of cores
If you are running on a PC and want each hardware thread, rather than each core, to count as a slot, use a command like this:

mpiexec --use-hwthread-cpus -n 16 python Script.py

instead of

mpiexec -n 8 python Script.py

for an 8-core, 16-thread CPU.

See this link for more control over the slots: https://github.com/open-mpi/ompi/issues/6020
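
To pick a value for -n, it helps to know how many logical CPUs the machine reports. The sketch below is only an illustration: build_mpiexec_command is a hypothetical helper (it is not part of mpi4py or Open MPI) that assembles the two command lines above as argument lists. Note that os.cpu_count() reports logical CPUs (hardware threads), not physical cores.

```python
import os


def build_mpiexec_command(script, n_slots, use_hwthreads=False):
    """Return the mpiexec command as a list of arguments (hypothetical helper)."""
    command = ["mpiexec"]
    if use_hwthreads:
        # Open MPI flag from the text above: count hardware threads as slots.
        command.append("--use-hwthread-cpus")
    command += ["-n", str(n_slots), "python", script]
    return command


# os.cpu_count() reports logical CPUs (hardware threads); it may return None.
logical_cpus = os.cpu_count() or 1
print("logical CPUs:", logical_cpus)
print(" ".join(build_mpiexec_command("Script.py", logical_cpus, use_hwthreads=True)))
```

The returned list can be passed directly to subprocess.run(...) if you prefer to launch the job from Python rather than from the shell.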

### Now, test whether the block below works (hint: it should!)
Running each block writes a Python script file. Then use the command given above to run it on however many slots you want; just change the filename to the script you want to run.

In [2]:
%%writefile Script_Parallel_Testing.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

if rank == 0:
    # one item per rank: comm.scatter needs exactly `size` items
    data = [(x + 1)**x for x in range(size)]
    print('we will be scattering:', data)
else:
    data = None

data = comm.scatter(data, root=0)
print('rank', rank, 'has data:', data)

Overwriting Script_Parallel_Testing.py


In [4]:
%%writefile Script_Parallel_Testing.py
# Hello-world test (written to Script_Parallel_Testing.py by the cell magic above)
# usage: mpiexec -n 5 python Script_Parallel_Testing.py

from mpi4py import MPI
import sys

def print_hello(rank, size, name):
  # rank = process index, size = number of processes, name = host name
  msg = "Hello World! I am process {0} of {1} on {2}.\n"
  sys.stdout.write(msg.format(rank, size, name))

if __name__ == "__main__":
  size = MPI.COMM_WORLD.Get_size()
  rank = MPI.COMM_WORLD.Get_rank()
  name = MPI.Get_processor_name()

  print_hello(rank, size, name)

#printouts: 5 tasks, 1/cpu
#Hello World! I am process 1 of 5 on DESKTOP-HV1REH2.
#Hello World! I am process 3 of 5 on DESKTOP-HV1REH2.
#Hello World! I am process 2 of 5 on DESKTOP-HV1REH2.
#Hello World! I am process 0 of 5 on DESKTOP-HV1REH2.
#Hello World! I am process 4 of 5 on DESKTOP-HV1REH2.

Overwriting Script_Parallel_Testing.py


## Tutorials

Some of these examples are taken and modified from online examples, such as those given in the resources mentioned above.

### Basic Commands

In [118]:
%%writefile Script_Parallel_Testing.py
from mpi4py import MPI
comm = MPI.COMM_WORLD  # the default communicator: rank, size, etc. come from this object
print('Hi, my rank is:', comm.rank)
if comm.rank == 1:  # if this is rank 1
    print('doing the task for rank(node) 1')
elif comm.rank == 0: 
    print('doing the task for rank(node) 0')
elif comm.rank == 2:
    print('doing the task for rank(node) 2')

print('finishing node %i \n\n'%comm.rank)

Overwriting Script_Parallel_Testing.py


In [12]:
%%writefile Script_Parallel_Testing.py
from mpi4py import MPI

comm = MPI.COMM_WORLD  # rank, size, etc. can be read from this object
rank = comm.rank
size = comm.size  # so that we don't need to hard-code the size
print('Hi, my rank is:', rank)
print('rank^size is ',rank**size)
print('finishing node %i \n'%rank)

Overwriting Script_Parallel_Testing.py


### Passing data

In [30]:
%%writefile Script_Parallel_Testing.py
from mpi4py import MPI
import time
comm = MPI.COMM_WORLD
rank = comm.rank
size = comm.size
name = MPI.Get_processor_name()  # note the () at the end: this is a function call

shared1 = 'shared 1'  # the message to be shared (could also be e.g. (rank+1)*5)
shared2 = 'shared 2'
if rank == 1:
    data_from_0 = comm.recv(source=0)
    data_from_0pt2 = comm.recv(source=0)
    print('on rank',rank,name,'we received data=',data_from_0,data_from_0pt2)
if rank == 0:  # rank 0 is usually used as the master
    data_to_1 = shared1
    data_to_1pt2 = shared2
    time.sleep(2)  # sleep for 2 seconds
    comm.send(data_to_1,dest=1)  # send data from rank 0 to rank 1
    comm.send(data_to_1pt2,dest=1)
    # general form: comm.send(DATA_TO_BE_SENT, dest=DESTINATION_RANK)
    print('from rank',rank,name,'we sent data=',data_to_1,data_to_1pt2)

####PRINTOUTS
#from rank 0 DESKTOP-HV1REH2 we sent data= shared 1 shared 2
#on rank 1 DESKTOP-HV1REH2 we received data= shared 1 shared 2

Overwriting Script_Parallel_Testing.py


- Regardless of whether the receive or the send is coded first, the receiver waits until the data arrives.
- Messages from the same sender are received in the order they were sent.
- If a rank waits for data that is never sent, the task runs indefinitely at full CPU utilization (more on tagging below).
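
One way to avoid waiting forever is to use a non-blocking receive and poll it against a deadline. The sketch below assumes mpi4py's comm.irecv(...), whose request object has a test() method returning a (done, message) pair. wait_with_timeout is a hypothetical helper, and fake_poll only stands in for a real request so that the snippet runs without MPI.

```python
import time


def wait_with_timeout(poll, timeout_s, interval_s=0.01):
    """Call poll() until it reports completion or timeout_s elapses.

    poll must return a (done, value) pair, like Request.test() in mpi4py.
    Returns (True, value) on success and (False, None) on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        done, value = poll()
        if done:
            return True, value
        if time.monotonic() >= deadline:
            return False, None
        time.sleep(interval_s)  # sleep instead of spinning at full CPU load


# Stand-in for a request that completes on the third poll (illustration only).
calls = {"count": 0}


def fake_poll():
    calls["count"] += 1
    finished = calls["count"] >= 3
    return finished, ("shared 1" if finished else None)


print(wait_with_timeout(fake_poll, timeout_s=1.0))

# With mpi4py this would look like (assumed usage, not run here):
#   request = comm.irecv(source=0)
#   ok, data = wait_with_timeout(request.test, timeout_s=5.0)
```

On a timeout the caller can report the missing message and exit cleanly instead of hanging.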

### Dynamically send and receive

In [42]:
%%writefile Script_Parallel_Testing.py
from mpi4py import MPI
import time
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.rank
size = comm.size
name = MPI.Get_processor_name()

shared = [rank,rank**2*np.pi]
comm.send(shared,dest = (rank+1)%size) #send to the next worker/node
data = comm.recv(source = (rank-1)%size) #receive from the previous worker/node
print(rank, name)
print('sent data = ', shared, ' to rank ',(rank+1)%size)
print('received = ', data, ' from rank ',(rank-1)%size)

Overwriting Script_Parallel_Testing.py
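
In the ring above every rank calls a blocking send before its recv, which relies on the MPI library buffering the outgoing message; with large messages this pattern can deadlock. mpi4py provides comm.sendrecv, which does both in one call. The sketch below only computes the neighbour ranks so that it runs without MPI; ring_neighbors is a hypothetical helper name.

```python
def ring_neighbors(rank, size):
    """Return (dest, source): send to the next rank, receive from the previous one."""
    dest = (rank + 1) % size
    source = (rank - 1) % size
    return dest, source


# Show the neighbours for a 4-rank ring without starting MPI.
for rank in range(4):
    print(rank, ring_neighbors(rank, 4))

# Inside an MPI script the pair could be used as (assumed usage, not run here):
#   dest, source = ring_neighbors(comm.rank, comm.size)
#   data = comm.sendrecv(shared, dest=dest, source=source)
```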


### Tagging
Compare the two examples below.

In [122]:
%%writefile Script_Parallel_Testing.py
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.rank

if rank == 0:
    shared1 = {'d1':1,'d2':2}
    comm.send(shared1,dest = 1)
    shared2 = {'d1':999,'d2':9999}
    comm.send(shared2,dest = 1)
if rank == 1:
    receive1 = comm.recv(source=0)
    print(receive1)
    print(receive1['d1'])
    receive2 = comm.recv(source=0)
    print(receive2)
    print(receive2['d1'])

Writing Script_Parallel_Testing.py


In [53]:
%%writefile Script_Parallel_Testing.py
# But what if we don't know the order in which the messages are sent
# and we still want to tell them apart? Use tags.
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.rank

if rank == 0:
    shared1 = {'d1':1,'d2':2}
    comm.send(shared1, dest=1, tag=1)
    shared2 = {'d1':999,'d2':9999}
    comm.send(shared2, dest=1, tag=2)
if rank == 1:
    receive2 = comm.recv(source=0, tag=2)
    print(receive2)
    print(receive2['d1'])
    receive1 = comm.recv(source=0, tag=1)
    print(receive1)
    print(receive1['d1'])

Overwriting Script_Parallel_Testing.py
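
To see why the tagged version can receive the second message first, here is a toy in-memory model of tag matching. It is not mpi4py: ToyMailbox is a hypothetical class that keeps pending messages in arrival order and hands back the oldest one whose tag matches. A real recv would block rather than raise when nothing matches.

```python
class ToyMailbox:
    """Toy in-memory model of one rank's incoming messages (not mpi4py)."""

    def __init__(self):
        self._pending = []  # (tag, obj) pairs in arrival order

    def send(self, obj, tag=0):
        self._pending.append((tag, obj))

    def recv(self, tag=None):
        """Return the oldest pending message whose tag matches (None matches any)."""
        for index, (msg_tag, obj) in enumerate(self._pending):
            if tag is None or msg_tag == tag:
                del self._pending[index]
                return obj
        raise LookupError("no pending message with tag %r" % (tag,))


box = ToyMailbox()
box.send({'d1': 1, 'd2': 2}, tag=1)
box.send({'d1': 999, 'd2': 9999}, tag=2)

receive2 = box.recv(tag=2)  # picks the tag-2 message although it arrived later
receive1 = box.recv(tag=1)
print(receive2['d1'], receive1['d1'])
```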


### Broadcasting: the message is sent to all ranks

In [72]:
%%writefile Script_Parallel_Testing.py
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.rank

if rank == 0:  # create the data on the master rank
    data_0 = {'a':1,'b':2,'c':3}
else:
    data_0 = None  # set data = None on all other ranks
    print('rank',rank,'before receiving, has ',data_0)
data_0 = comm.bcast(data_0, root=0)


print('rank ',rank,' has data = ',data_0)
if rank == 0:
    print('this is rank 0, I have both', data_0,data_0)

Overwriting Script_Parallel_Testing.py


In [87]:
%%writefile Script_Parallel_Testing.py
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.rank

if rank == 0:  # create the data on the master rank
    data_0 = {'a':1,'b':2,'c':3}
else:
    # Set data = None on all other ranks.
    # This is very important: the variable passed to bcast
    # must be defined on every rank before bcast is called.
    data_0 = None
print('before Bcast, rank',rank,'has data',data_0)

data_0 = comm.bcast(data_0,root=0)
print('after Bcast, rank',rank,'has data',data_0)

Overwriting Script_Parallel_Testing.py


### Scatter and Gather

Scatter takes a list on the root rank and splits it up, sending one piece to each rank.

Gather then collects the pieces from all ranks and reassembles them into a list on the root rank.

In [123]:
%%writefile Script_Parallel_Testing.py
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

if rank == 0:
    data = np.array([(x+1) for x in range(size)])  # exactly one item per rank
    # data = np.append(data,[9,9,9,9])  # this won't work: comm.scatter needs exactly `size` items
    print('we will be scattering:',data)
else:
    data = None

data_scat = comm.scatter(data,root=0)
print('rank',rank,'has scattered data = ', data_scat)
if rank == 0:
    print('this is rank 0')
    print('data = ',data)
    print('data_scat = ',data_scat)

data_scat = data_scat*2
dataNew = comm.gather(data_scat,root=0)
if rank == 0:
    print('this is rank 0, dataNew = ', dataNew)
print(rank,'check',dataNew)  # on non-root ranks this prints None
print(rank,'check',data_scat)  # each rank still holds its own (doubled) scattered value

Overwriting Script_Parallel_Testing.py
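
Because comm.scatter needs exactly one item per rank, a longer list has to be grouped into `size` chunks before it is scattered. The sketch below shows one possible way to do that in plain Python; split_into_chunks and flatten are hypothetical helper names, and the MPI calls appear only in the trailing comment.

```python
def split_into_chunks(items, n_chunks):
    """Split items into n_chunks lists whose lengths differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        stop = start + base + (1 if i < extra else 0)
        chunks.append(list(items[start:stop]))
        start = stop
    return chunks


def flatten(chunks):
    """Undo split_into_chunks, e.g. on the list that gather returns on the root."""
    return [item for chunk in chunks for item in chunk]


chunks = split_into_chunks(list(range(10)), 4)
print(chunks)
print(flatten(chunks))

# In an MPI script (assumed usage, not run here):
#   chunks = split_into_chunks(data, size) if rank == 0 else None
#   my_chunk = comm.scatter(chunks, root=0)
#   gathered = comm.gather([2 * x for x in my_chunk], root=0)
```

If there are fewer items than ranks, the trailing chunks are empty lists, so every rank still receives something from scatter.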


### CPU load testing
cat /proc/cpuinfo | grep processor | wc -l

gives the number of logical processors (hardware threads), which is not necessarily the number of physical cores.

For example, on a Ryzen 3700X the maximum number of slots is 8 when counting cores, although it has 16 hardware threads, which you can use with the second command below.

#### Normally:
mpiexec -n 8 python CPU_Load_5s_Testing.py

#### To run one task per hardware thread, use the command:
mpiexec --use-hwthread-cpus -n 16 python CPU_Load_5s_Testing.py 

In [124]:
%%writefile CPU_Load_5s_Testing.py 
# This takes about 5 seconds, depending on the computer you are working on; it should not take too long.
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
rank = comm.rank
for i in range(500000000):
    if i>=-1:
        pass
print('rank = ',rank,'done')

Overwriting CPU_Load_5s_Testing.py
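
To compare the core-based and thread-based runs, it helps to measure how long the loop takes on each rank. Below is a minimal timing sketch in plain Python; busy_loop and timed are hypothetical helpers, and the iteration count is kept small so the demo finishes quickly. Inside an MPI script, MPI.Wtime() could be used in place of time.perf_counter().

```python
import time


def busy_loop(n_iterations):
    """Pure-Python loop used only to keep one CPU busy; returns the iteration count."""
    count = 0
    for _ in range(n_iterations):
        count += 1
    return count


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Small count so the demo finishes quickly; raise it for a real load test.
result, elapsed = timed(busy_loop, 1_000_000)
print("iterations:", result, "elapsed: %.3f s" % elapsed)
```

If each rank prints its own elapsed time, the slowdown per task (if any) when going from 8 to 16 slots becomes visible directly.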
