# Parallel Programming 

## MPI

*Flesh out introduction here*
- `mpi_rank` and `mpi_size`
- Scattered arrays
- Reduced arrays
- Experiment with array sizes and timing
- Create a graph of time vs. array size
- Create a graph of time vs. number of MPI tasks
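
The scatter and reduce steps in the list above can be pictured without MPI at all. The following is a single-process sketch, not mpi4py code: `scatter` and `reduce_sum` are hypothetical helpers that only mimic what `comm.Scatter` and `comm.Reduce` with `op=MPI.SUM` would do across the ranks, assuming the array length divides evenly by the number of ranks.

```python
def scatter(data, n_ranks):
    """Split data into n_ranks equal, contiguous chunks (one per simulated rank)."""
    if len(data) % n_ranks != 0:
        raise ValueError("array length must be divisible by the number of ranks")
    chunk = len(data) // n_ranks
    return [data[r * chunk:(r + 1) * chunk] for r in range(n_ranks)]


def reduce_sum(chunks):
    """Element-wise sum of the per-rank chunks, as a SUM reduction would deliver to the root."""
    return [sum(values) for values in zip(*chunks)]


data = list(range(8))
chunks = scatter(data, n_ranks=4)   # [[0, 1], [2, 3], [4, 5], [6, 7]]
reduced = reduce_sum(chunks)        # [12, 16]

print(chunks)
print(reduced)
print(sum(reduced) == sum(data))    # True: summing the reduced chunk recovers the full sum
```

Note that the reduced buffer has the length of one chunk, not of the full array, which is why the receive buffers in a real MPI program are usually sized as `array_size // mpi_size`.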

In [1]:
from mpi4py import MPI 
import numpy as np 
import timeit


comm = MPI.COMM_WORLD
mpi_size = comm.Get_size()
mpi_rank = comm.Get_rank()
array_size = 100

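def add_cpu():
    # Each rank receives an equal chunk, so array_size must be divisible by mpi_size.
    chunk_size = array_size // mpi_size
    a = np.arange(array_size)
    a_scattered = np.zeros(chunk_size, dtype=a.dtype)
    a_reduced = np.zeros(chunk_size, dtype=a.dtype)

    # Split a across the ranks, then sum the chunks element-wise onto rank 0.
    comm.Scatter(a, a_scattered, root=0)
    comm.Reduce(a_scattered, a_reduced, op=MPI.SUM, root=0)
    # Only rank 0 holds the reduced chunk; there, its sum equals np.sum(a).
    return np.sum(a_reduced)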

# timeit.timeit calls add_cpu 1,000,000 times by default and returns the total time in seconds.
total_time = timeit.timeit(add_cpu)
print(f'Total time for MPI addition is: {total_time}')

Total time for MPI addition is: 7.157103275999816
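
The figure printed above is the total over all repetitions, because `timeit.timeit` runs the callable `number` times (1,000,000 by default). A minimal sketch of converting such a total into a per-call time is shown here; `work` is a hypothetical stand-in for `add_cpu` so that the example runs without MPI, and the values of `number` and `repeat` are arbitrary choices.

```python
import timeit


def work():
    # Hypothetical stand-in for add_cpu: sum the integers 0..99.
    return sum(range(100))


number = 10_000  # calls per measurement (timeit's default is 1_000_000)

# repeat=5 yields five independent totals; the minimum is the one least
# disturbed by other activity on the machine.
totals = timeit.repeat(work, number=number, repeat=5)
per_call = min(totals) / number

print(f'Best time per call: {per_call:.3e} s')
```

Reporting time per call makes measurements taken with different `number` values directly comparable.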


In [3]:
import matplotlib.pyplot as plt 

array_sizes = [100, 500, 1000, 10000, 50000, 1000000]
times = []

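for size in array_sizes:
    array_size = size  # add_cpu reads the module-level array_size
    # number=1000 keeps the largest sizes tractable; the default is 1,000,000 calls.
    times.append(timeit.timeit(add_cpu, number=1000))

plt.style.use('bmh')
plt.plot(array_sizes, times)
plt.xlabel('Array size')
plt.ylabel('Total time for 1000 calls (s)')
plt.show()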

## CUDA

*Flesh out introduction here*
- Array on CPU, and array on GPU.
- Time vs array size. 

In [2]:
import cupy as cp 
import timeit 

array_size = 100

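def add_gpu():
    # cp.arange allocates the array on the GPU and cp.sum reduces it there.
    # Kernels are launched asynchronously, so the call can return before the GPU finishes.
    a = cp.arange(array_size)
    return cp.sum(a)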

# timeit.timeit calls add_gpu 1,000,000 times by default and returns the total time in seconds.
total_time = timeit.timeit(add_gpu)
print(f'Total time for GPU addition is: {total_time}')

Total time for GPU addition is: 61.88499339000009
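
Because CuPy launches GPU kernels asynchronously, a wall-clock timer on the host can stop before the GPU has finished its work. One common remedy, sketched here under assumptions, is to synchronize the stream before the timed function returns. `add_gpu_synced` and `HAVE_GPU` are hypothetical names introduced for this example, `number=1000` is an arbitrary choice, and the block skips the measurement when no usable CUDA device is found.

```python
import timeit

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no CUDA device or driver is usable
    HAVE_GPU = True
except Exception:
    HAVE_GPU = False


def add_gpu_synced(array_size):
    a = cp.arange(array_size)
    total = cp.sum(a)
    # Block the host until all queued GPU work on the default stream is done,
    # so the timer also covers kernel execution, not just the launch.
    cp.cuda.Stream.null.synchronize()
    return total


if HAVE_GPU:
    number = 1000
    total_time = timeit.timeit(lambda: add_gpu_synced(100), number=number)
    print(f'Synchronized GPU time per call: {total_time / number:.3e} s')
else:
    print('No usable CUDA device found; skipping the GPU timing.')
```

For very small arrays such as `array_size = 100`, the per-call cost is likely dominated by launch overhead rather than by the addition itself, which is one reason to plot time against array size.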


In [None]:
import matplotlib.pyplot as plt 

array_sizes = [100, 500, 1000, 10000, 50000, 1000000]
times = []

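for size in array_sizes:
    array_size = size  # add_gpu reads the module-level array_size
    # number=1000 keeps the largest sizes tractable; the default is 1,000,000 calls.
    times.append(timeit.timeit(add_gpu, number=1000))

plt.style.use('bmh')
plt.plot(array_sizes, times)
plt.xlabel('Array size')
plt.ylabel('Total time for 1000 calls (s)')
plt.show()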

*Other operations maybe: scalar multiply?*