# Parallel Processing in Python

The idea is to distribute work across multiple machines, or across the cores of a single machine, to improve performance. How much you gain depends on how many cores are available.

See the [ipyparallel documentation](https://ipyparallel.readthedocs.io/en/latest/index.html) for details.

Before you begin, be sure to:

    pip install ipyparallel

The examples below assume a cluster with four engines is already running, for example one started from a shell with:

    ipcluster start -n 4


In [18]:
import ipyparallel as ipp 
import numpy as np # for local execution

In [2]:
rc = ipp.Client()

In [3]:
rc.ids

[0, 1, 2, 3]

In [4]:
rc

<ipyparallel.client.client.Client at 0x7f9cb94d0d30>

In [5]:
dview = rc[:] # use all engines

In [6]:
dview.block = True # wait until functions finish before returning

In [19]:
dview.execute('import numpy as np, os')

<AsyncResult: execute:finished>

In [20]:
dview.execute('pid = os.getpid()')

<AsyncResult: execute:finished>

In [21]:
dview['pid']

[1212, 1213, 1215, 1217]

In [22]:
dview.scatter('a',[1,2,3,4,5,6,7,8,9,10])

In [23]:
dview['a']

[[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]
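
The split shown above hands each engine a contiguous slice, with lengths that differ by at most one. The sketch below reproduces that partition in plain Python to make the pattern explicit. `split_even` is a hypothetical helper written for illustration; it is not ipyparallel's implementation, only a rule consistent with the output above.

```python
def split_even(seq, n):
    """Split seq into n contiguous chunks whose lengths differ by at most one."""
    base, extra = divmod(len(seq), n)
    chunks = []
    start = 0
    for i in range(n):
        # The first `extra` chunks each take one additional element.
        size = base + (1 if i < extra else 0)
        chunks.append(seq[start:start + size])
        start += size
    return chunks

print(split_even([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4))
# [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]
```

Concatenating the chunks in order gives back the original list, which is what `gather` returns in the next cell.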

In [24]:
dview.gather('a')

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [25]:
dview.execute('a = np.sum(a)')

<AsyncResult: execute:finished>

In [26]:
dview['a']

[6, 15, 15, 19]

In [27]:
dview.gather('a')

[6, 15, 15, 19]

In [28]:
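def calcPi_dumb(N):
    # Draw all N points at once; memory use grows linearly with N.
    x = np.random.random(N)
    y = np.random.random(N)
    # Fraction of points inside the quarter circle, times 4, approximates pi.
    return ((x**2 + y**2) < 1).sum() * 4 / N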

calcPi_dumb(1000)

3.12
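
The estimate works because points drawn uniformly from the unit square land inside the quarter circle of radius 1 with probability π/4. The sketch below is a seeded variant of the same estimator that also reports a binomial standard error, to show how much the result can be expected to vary for a given N. `estimate_pi` and its `seed` parameter are assumptions introduced here for illustration, not part of the original notebook.

```python
import numpy as np

def estimate_pi(n, seed=0):
    # Hypothetical helper: returns the Monte Carlo estimate and its standard error.
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = rng.random(n)
    p = np.mean(x**2 + y**2 < 1)            # fraction of points inside the quarter circle
    estimate = 4 * p
    std_err = 4 * np.sqrt(p * (1 - p) / n)  # binomial standard error, scaled by 4
    return estimate, std_err

estimate, std_err = estimate_pi(100_000)
print(f"pi ~ {estimate:.4f} +/- {std_err:.4f}")
```

The standard error shrinks like 1/sqrt(N), which is why the later cells use hundreds of millions of points.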

In [29]:
%time calcPi_dumb(300000000)

CPU times: user 8.33 s, sys: 4.65 s, total: 13 s
Wall time: 16.4 s


3.1416860933333335

In [31]:
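def calcPi_inMem(N=100, chunksize=10000):
    # Estimate pi from N random points, generated chunksize at a time
    # so that memory use stays bounded.

    def parts(N, chunksize):
        # Split N into full chunks plus one remainder chunk, if any.
        sizes = [chunksize] * (N // chunksize)
        rest = N % chunksize
        if rest:
            sizes.append(rest)
        return sizes

    hits = 0    # points that fall inside the quarter circle
    count = 0
    for s in parts(N, chunksize):
        x = np.random.random(s)
        y = np.random.random(s)
        hits += ((x**2 + y**2) < 1).sum()
        count += s
    assert count == N
    return hits * 4 / N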


In [42]:
%time calcPi_inMem(1000000000,10000)

CPU times: user 18.4 s, sys: 171 ms, total: 18.6 s
Wall time: 18.6 s


3.141537164
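
The chunked version keeps memory use bounded by the chunk size rather than by N. The sketch below isolates the chunking step and compares the size of the coordinate arrays in the two approaches. `chunk_sizes` is a hypothetical standalone copy of the inner helper, and the byte counts assume 8-byte float64 values (what `np.random.random` returns) and ignore temporaries.

```python
def chunk_sizes(n, chunksize):
    # Full chunks, plus one smaller chunk for the remainder if there is one.
    sizes = [chunksize] * (n // chunksize)
    if n % chunksize:
        sizes.append(n % chunksize)
    return sizes

print(chunk_sizes(25, 10))  # [10, 10, 5]
print(chunk_sizes(20, 10))  # [10, 10]

BYTES_PER_FLOAT64 = 8
# x and y together, all points at once (N = 300000000):
all_at_once = 2 * 300_000_000 * BYTES_PER_FLOAT64
# x and y together, one chunk at a time (chunksize = 10000):
per_chunk = 2 * 10_000 * BYTES_PER_FLOAT64
print(all_at_once, per_chunk)  # 4800000000 160000
```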

In [35]:
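def calcPi_parallel(N=100, chunksize=10000):
    # Give each engine an equal share of the N points (any remainder is dropped),
    # then average the per-engine estimates.
    n_engines = len(rc.ids)
    results = dview.map(calcPi_inMem, [N // n_engines] * n_engines, [chunksize] * n_engines)
    return np.average(results)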

In [43]:
%time calcPi_parallel(1000000000)

CPU times: user 15 ms, sys: 2.66 ms, total: 17.6 ms
Wall time: 8.49 s


3.141541872
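
To put the two wall times above side by side, the short sketch below computes the speedup and the per-engine efficiency. It uses only the timings printed in this notebook and the four engines listed by `rc.ids`; the variable names are chosen here for illustration.

```python
serial_wall = 18.6     # seconds, calcPi_inMem(1000000000, 10000)
parallel_wall = 8.49   # seconds, calcPi_parallel(1000000000)
n_engines = 4          # len(rc.ids) in this run

speedup = serial_wall / parallel_wall
efficiency = speedup / n_engines

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.2f}")
# speedup: 2.19x, efficiency: 0.55
```

Note that the CPU time reported for the parallel call is only the client's; the engines do the actual work in their own processes.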