### A "Hello World" Example of MPI on Domino

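In this example we compare approximating pi in a single process with approximating it across several processes using MPI.  The single-process run uses a simple Monte Carlo estimate.  Notice that the MPI run takes about 1.2 s of wall time against 1 min 23 s, more than a sixty-fold reduction.  The two runs are not like-for-like, though: the first draws 10**8 random samples, while the second broadcasts a problem size of N = 100 to its workers.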

MPI (Message Passing Interface) is a powerful tool used in supercomputing.  Try it on Domino with this "hello world" example.  For more complex examples, reach out to training@dominodatalab.com.

In [1]:
import math
import random

def sample(num_samples):
    # Count random points in the square [-1, 1] x [-1, 1] that land inside the unit circle.
    num_inside = 0
    for _ in range(num_samples):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if math.hypot(x, y) <= 1:
            num_inside += 1
    return num_inside

def approximate_pi(num_samples):
    # The fraction of points inside the circle approaches pi/4.
    # Timing is left to the %%time cell magic in the next cell.
    num_inside = sample(num_samples)

    print("pi ~= {}".format((4*num_inside)/num_samples))

In [2]:
%%time

approximate_pi(10**8)

pi ~= 3.1414932
CPU times: user 1min 23s, sys: 17.4 ms, total: 1min 23s
Wall time: 1min 23s
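
The estimate above can be read as follows. A point drawn uniformly from the square [-1, 1] x [-1, 1] falls inside the unit circle with probability equal to the ratio of the two areas, so the fraction of samples inside the circle converges to pi/4. The standard-error expression is the usual binomial result, added here for context; it is not part of the original notebook.

$$
P\left(x^2 + y^2 \le 1\right) = \frac{\pi \cdot 1^2}{2 \cdot 2} = \frac{\pi}{4},
\qquad
\hat{\pi} = \frac{4\,N_{\text{inside}}}{N},
\qquad
\operatorname{SE}(\hat{\pi}) = \sqrt{\frac{\pi\,(4 - \pi)}{N}} \approx \frac{1.64}{\sqrt{N}}
$$

For N = 10^8 this gives a standard error of about 1.6 x 10^-4, which is in line with the printed value 3.1414932 (it differs from pi by about 1.0 x 10^-4).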


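To calculate pi using MPI we will use the mpi4py library, which provides Python bindings for MPI.  Spawn launches new worker processes at run time, here two Python interpreters running cpi.py, so the calculation can be distributed over several processes on separate CPU cores or nodes.  The parent then broadcasts the problem size N to the workers and sums their partial results with a reduction.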

In [3]:
%%time

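from mpi4py import MPI
import numpy
import sys

# Start two worker processes, each running cpi.py with the current interpreter.
comm = MPI.COMM_SELF.Spawn(sys.executable,
                           args=['cpi.py'],
                           maxprocs=2)

# Send the problem size N to every worker.
N = numpy.array(100, 'i')
comm.Bcast([N, MPI.INT], root=MPI.ROOT)

# Sum the workers' partial results into PI on the parent.
PI = numpy.array(0.0, 'd')
comm.Reduce(None, [PI, MPI.DOUBLE],
            op=MPI.SUM, root=MPI.ROOT)
print(PI)

# Close the connection to the workers.
comm.Disconnect()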

3.141600986923124
CPU times: user 180 ms, sys: 72.6 ms, total: 253 ms
Wall time: 1.17 s
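
The worker script cpi.py is not shown in this notebook, so the following is only a sketch of what each worker might compute, not the actual script. It assumes the workers apply the midpoint rule to the integral of 4/(1+x^2) over [0, 1], with the N intervals dealt out round-robin by rank, as in the worker script from the mpi4py tutorial. That assumption is consistent with the value printed above, which is what the midpoint rule gives for N = 100. The name partial_midpoint_sum is hypothetical, and the MPI reduction is replaced by a plain Python sum so the sketch runs without MPI.

```python
import math

def partial_midpoint_sum(rank, size, n):
    """Share of the midpoint-rule sum for the integral of 4/(1+x^2) on [0, 1]
    handled by worker `rank` out of `size` workers (hypothetical helper)."""
    h = 1.0 / n  # width of each interval
    s = 0.0
    # Round-robin split: this worker takes intervals rank, rank+size, ...
    for i in range(rank, n, size):
        x = h * (i + 0.5)  # midpoint of interval i
        s += 4.0 / (1.0 + x * x)
    return s * h

# Stand-in for the MPI sum reduction: add up the partial results of
# two simulated workers, with N = 100 as broadcast by the parent.
size, n = 2, 100
estimate = sum(partial_midpoint_sum(rank, size, n) for rank in range(size))
print(estimate, abs(estimate - math.pi))
```

Under this assumption each worker evaluates only 50 terms, so the MPI cell is a far smaller workload than the 10**8-sample loop in the first cell.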
