# Python tools for high-performance computing applications

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

## multiprocessing

Python has a built-in process-based library for concurrent computing, called `multiprocessing`. 

The multiprocessing module has a major limitation when it comes to IPython use:

Functionality within this package requires that the __main__ module be importable by the children. [...] This means that some examples, such as the multiprocessing.pool.Pool examples will not work in the interactive interpreter. [from the documentation]

Fortunately, there is a fork of the `multiprocessing` module called `multiprocess`, which uses `dill` instead of `pickle` for serialization and conveniently overcomes this issue.

Just install multiprocess and replace multiprocessing with multiprocess in your imports:

import multiprocess as mp

def f(x):
    return x*x

with mp.Pool(5) as pool:
    print(pool.map(f, [1, 2, 3, 4, 5]))
Of course, externalizing the code as suggested in this answer works as well, but I find it very inconvenient: That is not why (and how) I use IPython environments.

<tl;dr> multiprocessing does not work in IPython environments right away, use its fork multiprocess instead.
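
The difference between the two serializers can be seen without starting any worker processes. The sketch below uses only the standard-library `pickle` module; `can_pickle` is a hypothetical helper introduced here for illustration. `pickle` stores functions by reference (module and name), so an anonymous function, which has no importable name, is rejected, whereas `dill` is designed to serialize such objects by value.

```python
import pickle

def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

# A built-in function is pickled by reference to its module and name.
print(can_pickle(len))

# A lambda has no importable name, so pickle refuses it.
print(can_pickle(lambda x: x * x))
```

The same by-reference lookup is why the child processes of the standard `multiprocessing` pool need an importable `__main__` module.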

In [2]:
import multiprocess as multiprocessing
import os
import time
import numpy

In [3]:
def task(args):
    print("PID =", os.getpid(), ", args =", args)
    
    return os.getpid(), args

In [4]:
task("test")

PID = 97950 , args = test


(97950, 'test')

In [5]:
pool = multiprocessing.Pool(processes=4)

PID =PID =PID =PID =    97954979569795597953    , args =, args =, args =, args =    2341



PID =PID =PID =PID =    97954979559795397956   , args = , args =, args = , args =  56 
7
8



In [6]:
result = pool.map(task, [1,2,3,4,5,6,7,8])

In [7]:
result

[(97953, 1),
 (97954, 2),
 (97955, 3),
 (97956, 4),
 (97954, 5),
 (97955, 6),
 (97953, 7),
 (97956, 8)]

The `multiprocessing` package is very useful for highly parallel tasks that do not need to communicate with each other, except when the initial data is sent to the pool of processes and when the results are collected.
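
Because the tasks are independent, `map` can hand them to whichever worker is free and still return the results in input order, as the result list above shows. A minimal sketch of that behavior follows. It assumes the standard-library `multiprocessing.pool.ThreadPool`, a thread-based stand-in that shares the `Pool` interface, so that it runs in any environment; it is not process-based and gives no speed-up for CPU-bound work. The function `square` is a hypothetical example task.

```python
from multiprocessing.pool import ThreadPool

def square(x):
    return x * x

inputs = list(range(8))

with ThreadPool(4) as pool:
    # map blocks until all tasks finish and preserves the input order.
    ordered = pool.map(square, inputs)
    # imap_unordered yields results as they complete, in no guaranteed order.
    unordered = list(pool.imap_unordered(square, inputs))

print(ordered)
print(sorted(unordered) == ordered)
```

`imap_unordered` is worth considering when the order of the results does not matter, as in a sum or an average.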

In [8]:
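def mc_pi(n):
    # A fresh generator per call is seeded from OS entropy, so forked
    # worker processes do not replay the same global random stream.
    rng = numpy.random.default_rng()
    rvec = rng.random((n, 2))           # n points in the unit square
    r2 = numpy.sum(rvec**2, axis=1)     # squared distance from the origin
    inside = numpy.sum(r2 < 1.)         # points inside the quarter circle
    return 4. * inside / float(n)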

In [9]:
Nmc = 100000
time0 = time.time()
print(mc_pi(Nmc))
time1 = time.time()
print(f"Took {time1-time0} seconds")

3.1334
Took 0.010711193084716797 seconds


In [None]:
nproc = 4
# Start the timer before dispatching the work so the parallel run is measured.
time0 = time.time()
est_pis = pool.map(mc_pi, [Nmc//nproc]*nproc)
print(numpy.sum(est_pis)/nproc)
time1 = time.time()
print(f"Took {time1-time0} seconds")
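
Dividing the summed estimates by `nproc` is valid because every chunk has the same number of samples, so the mean of the per-chunk estimates equals the estimate from the pooled counts. The sketch below checks this serially. It is an illustration under stated assumptions, not the notebook's code: it uses the standard-library `random` module instead of NumPy, the built-in `map` in place of `pool.map`, and a hypothetical helper `count_inside` that takes an explicit seed so that each chunk draws an independent stream.

```python
import random

def count_inside(n, seed):
    """Count how many of n random points fall inside the quarter circle."""
    rng = random.Random(seed)  # one independent generator per chunk
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:
            inside += 1
    return inside

n_total = 100_000
nproc = 4
chunk = n_total // nproc

# Serial stand-in for pool.map: one call per chunk, each with its own seed.
counts = list(map(count_inside, [chunk] * nproc, range(nproc)))

# Estimate from the pooled counts versus the mean of per-chunk estimates.
pi_pooled = 4.0 * sum(counts) / (chunk * nproc)
pi_mean = sum(4.0 * c / chunk for c in counts) / nproc

print(pi_pooled, pi_mean)
```

If the chunks had different sizes, the per-chunk estimates would need to be weighted by their sample counts instead of averaged directly.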