# Parallel computing in the notebook

We can use the IPython `ipyparallel` environment for parallel computing right in the notebook.

[Read the docs.](https://ipyparallel.readthedocs.io/en/latest/intro.html)

[See also.](https://github.com/ResearchComputing/jupyter-at-rc/wiki/Parallel-Programming-with-Jupyter-Notebooks)

## Getting started

[To install](https://ipyparallel.readthedocs.io/en/latest/index.html):

    conda install ipyparallel
    ipcluster nbextension enable
    
If that doesn't work, try doing `conda install jupyter` first.

In [1]:
import numpy as np
import ipyparallel as ipp

## A little demo

In [2]:
c = ipp.Client()
c.ids

[0, 1, 2, 3, 4, 5, 6, 7]

If this doesn't work, the cluster is not running. One of these should work:

- Go to the Clusters tab in Jupyter and start the cluster from there. (For some reason this doesn't currently work for me.)
- Go to the command line and type this:

      ipcluster start -n 6
    
(Substitute the number of engines you want. The outputs in this notebook came from a cluster of 8 engines.)

In [3]:
# DirectView
dview = c[:]

In [4]:
dview.apply_sync(lambda: "Hello world")

['Hello world',
 'Hello world',
 'Hello world',
 'Hello world',
 'Hello world',
 'Hello world',
 'Hello world',
 'Hello world']

Let's now try mapping a math function (as a `lambda`) over all the integers from 0 up to (but not including) 10 million:

In [13]:
%timeit list(map(lambda x: (x**3.14159)**0.5, range(int(1e7))))

23.7 s ± 304 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [14]:
%timeit list(dview.map_sync(lambda x: (x**3.14159)**0.5, range(int(1e7))))

13.9 s ± 448 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


With 8 cores, that's a reduction in time of about 40%, from ca. 24 s on one core to ca. 14 s on eight.
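
To put the two timings in perspective, we can turn them into a speedup and a parallel efficiency. This is a minimal sketch using the mean values reported by `%timeit` above (23.7 s and 13.9 s); the variable names are ours, and it assumes all 8 engines took part in the parallel run.

```python
# Mean timings reported by %timeit above, in seconds.
t_serial = 23.7
t_parallel = 13.9
n_engines = 8  # assumption: all 8 engines took part

speedup = t_serial / t_parallel         # how many times faster
reduction = 1 - t_parallel / t_serial   # fraction of wall time saved
efficiency = speedup / n_engines        # 1.0 would be perfect scaling

print(f"speedup:    {speedup:.2f}x")
print(f"reduction:  {reduction:.0%}")
print(f"efficiency: {efficiency:.0%}")
```

A speedup of roughly 1.7x on 8 engines is a long way from ideal scaling. A plausible explanation is that each task here is tiny, so the cost of serializing the data and shipping it to the engines dominates the actual arithmetic.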

## `%px` magic

We can do parallel execution easily with a magic:

In [15]:
with c[:].sync_imports():
    import numpy

importing numpy on engine(s)


In [16]:
%px a = numpy.random.rand(4, 4)

In [20]:
%px a

Out[0:3]:
array([[ 0.03701927,  0.53774334,  0.44958466,  0.89284712],
       [ 0.16523111,  0.50169926,  0.96849713,  0.35176255],
       [ 0.25380272,  0.81994303,  0.28031714,  0.16830454],
       [ 0.84968901,  0.27967622,  0.90894912,  0.65984922]])

Out[1:3]:
array([[ 0.07710678,  0.62904108,  0.03239626,  0.61971934],
       [ 0.99078508,  0.45049191,  0.49148635,  0.33234115],
       [ 0.45242296,  0.47661924,  0.04298056,  0.5380677 ],
       [ 0.46871512,  0.1392049 ,  0.82761392,  0.92057602]])

Out[2:3]:
array([[ 0.47755782,  0.7779398 ,  0.85881187,  0.46616464],
       [ 0.36717042,  0.74342638,  0.32894222,  0.76166346],
       [ 0.32240941,  0.49473283,  0.21707251,  0.31168806],
       [ 0.99247435,  0.04802051,  0.5545358 ,  0.17899133]])

Out[3:3]:
array([[ 0.73110534,  0.00139245,  0.19784837,  0.47421884],
       [ 0.31537955,  0.77746832,  0.34500844,  0.04950764],
       [ 0.54524569,  0.9087478 ,  0.24922474,  0.18884121],
       [ 0.51323712,  0.94718406,  0.25388826,  0.07001268]])

Out[4:3]:
array([[ 0.77158134,  0.69796627,  0.97938771,  0.42326031],
       [ 0.64858394,  0.70046635,  0.57153017,  0.20946679],
       [ 0.24345341,  0.7238011 ,  0.6396564 ,  0.45897476],
       [ 0.7651639 ,  0.56153819,  0.74853815,  0.91700093]])

Out[5:3]:
array([[ 0.1920007 ,  0.38059109,  0.1287843 ,  0.80417254],
       [ 0.84958604,  0.64542689,  0.80368946,  0.32733817],
       [ 0.73366387,  0.16158646,  0.80335254,  0.07140604],
       [ 0.3137122 ,  0.86613477,  0.62957465,  0.55042083]])

Out[6:3]:
array([[ 0.37385365,  0.29406662,  0.00637818,  0.05550585],
       [ 0.41579396,  0.30873879,  0.92848221,  0.12675799],
       [ 0.5769089 ,  0.2849222 ,  0.19418576,  0.38765198],
       [ 0.12798724,  0.80824266,  0.95212294,  0.8691391 ]])

Out[7:3]:
array([[ 0.75582248,  0.28577498,  0.50809606,  0.98393874],
       [ 0.33922415,  0.613402  ,  0.14156422,  0.27024774],
       [ 0.93938644,  0.64325037,  0.4486949 ,  0.13230209],
       [ 0.90007662,  0.96159618,  0.92573198,  0.05786313]])

In [17]:
%px numpy.linalg.eigvals(a)

Out[0:2]:
array([ 1.98623425+0.j        ,  0.57268100+0.j        ,
       -0.54001518+0.19381252j, -0.54001518-0.19381252j])

Out[1:2]:
array([ 1.91343718,  0.46218753, -0.66650623, -0.2179632 ])

Out[2:2]:
array([ 1.99547203+0.j      , -0.08917408+0.437758j, -0.08917408-0.437758j,
       -0.20007584+0.j      ])

Out[3:2]:
array([ 1.56988099+0.j        ,  0.48634515+0.j        ,
       -0.11420753+0.02267252j, -0.11420753-0.02267252j])

Out[4:2]:
array([ 2.45181526+0.j        ,  0.04391217+0.26334858j,
        0.04391217-0.26334858j,  0.48906542+0.j        ])

Out[5:2]:
array([ 2.02744443+0.j        , -0.02028533+0.49720693j,
       -0.02028533-0.49720693j,  0.20432719+0.j        ])

Out[6:2]:
array([ 1.63231000+0.j        ,  0.49537134+0.j        ,
       -0.19088202+0.32879812j, -0.19088202-0.32879812j])

Out[7:2]:
array([ 2.20548965+0.j        ,  0.37495345+0.j        ,
       -0.35233030+0.34227802j, -0.35233030-0.34227802j])

## Function decorator

In [21]:
np.random.seed(1)
layers = np.random.random(int(1e6))

In [22]:
#@dview.parallel()  ## See remark below about this decorator.
def compute_rc(layers):
    """
    Computes reflection coefficients given
    a list of layer impedances.
    """
    uppers = layers[:-1]
    lowers = layers[1:]
    rcs = []
    for upper, lower in zip(uppers, lowers):
        # Same sign convention as compute_rc_vector: (lower - upper) / (lower + upper).
        rc = (lower - upper) / (lower + upper)
        rcs.append(rc)
    return rcs

In [23]:
def compute_rc_vector(layers):
    layers = np.array(layers)
    uppers = layers[:-1]
    lowers = layers[1:]
    return (lowers - uppers) / (uppers + lowers)

#### list, serial

In [24]:
%timeit compute_rc(layers)

444 ms ± 5.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


#### list, parallel

In [25]:
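# NB This is the same as using @dview.parallel() to decorate the
# original function when we defined it, as shown in that block.
# Unless the view is blocking (e.g. dview.parallel(block=True)), calling
# the result returns an AsyncResult right away rather than the values,
# so the timing below may mostly reflect submission overhead.
compute_rc_parallel = dview.parallel()(compute_rc)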

In [26]:
%timeit compute_rc_parallel(layers)

17.1 ms ± 632 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


#### ndarray, serial

In [27]:
%timeit compute_rc_vector(layers)

The slowest run took 12.69 times longer than the fastest. This could mean that an intermediate result is being cached.
76.3 ms ± 71.6 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)


#### ndarray, parallel

In [28]:
compute_rc_vector_parallel = dview.parallel()(compute_rc_vector)

In [30]:
%timeit compute_rc_vector_parallel(layers)

19.7 ms ± 3.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
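
One thing to check before trusting the parallel versions: as we understand it, a parallel function called on a whole sequence is applied once per chunk, not once on the full sequence. A computation that looks at neighbouring elements, like reflection coefficients, then loses the pairs that straddle chunk boundaries. The sketch below emulates this in plain Python with no cluster involved; `split_blocks` is a hypothetical stand-in for the partitioning step, `rc_pairs` is a simplified version of the function above, and the impedances are made up.

```python
def rc_pairs(layers):
    """Reflection coefficient at each interface between adjacent layers."""
    return [(lower - upper) / (lower + upper)
            for upper, lower in zip(layers[:-1], layers[1:])]

def split_blocks(seq, n):
    """Hypothetical stand-in for the partitioning: n contiguous blocks."""
    size, extra = divmod(len(seq), n)
    blocks, start = [], 0
    for i in range(n):
        stop = start + size + (1 if i < extra else 0)
        blocks.append(seq[start:stop])
        start = stop
    return blocks

layers = [float(z) for z in range(1, 17)]  # 16 made-up impedances

serial = rc_pairs(layers)

# Emulate 8 engines: apply the function to each block, then concatenate.
chunked = [rc for block in split_blocks(layers, 8) for rc in rc_pairs(block)]

print(len(serial))   # 15 interfaces
print(len(chunked))  # 8: one per 2-element block, so 7 boundary pairs are lost
```

If every interface matters, one option is to scatter overlapping chunks; another is to parallelize over independent traces rather than within a single one.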


## Parallel list comprehension

Via `scatter` and `gather`. [From the docs](https://ipyparallel.readthedocs.io/en/latest/multiengine.html#scatter-and-gather):

> Sometimes it is useful to partition a sequence and push the partitions to different engines. In MPI language, this is know as scatter/gather and we follow that terminology [...] `scatter()` is from the interactive IPython session to the engines and `gather()` is from the engines back to the interactive IPython session.

We start by scattering the iterable. (Notice that we wrap the results in `list` to display them, because they are evaluated lazily.)

In [5]:
dview.scatter('y', range(16))

# And look at it:
list(dview['y'])

[range(0, 2),
 range(2, 4),
 range(4, 6),
 range(6, 8),
 range(8, 10),
 range(10, 12),
 range(12, 14),
 range(14, 16)]

Now we can compute with the pieces, using the 'parallel execution' magic, `%px`:

In [6]:
%px z = [(i**3.14159)**0.5 for i in y]
list(dview['z'])

[[0.0, 1.0],
 [2.970683691519495, 5.616421346404785],
 [8.824961595059897, 12.529639852302871],
 [16.68461129846666, 21.255717809934282],
 [26.216169488730305, 31.544188740351338],
 [37.22159676984888, 43.23290542839072],
 [49.564702683696815, 56.20521948320366],
 [63.14401424951226, 70.37173672923794]]

Now gather the pieces back into a single Python sequence.

In [7]:
z = dview.gather('z')
list(z)

[0.0,
 1.0,
 2.970683691519495,
 5.616421346404785,
 8.824961595059897,
 12.529639852302871,
 16.68461129846666,
 21.255717809934282,
 26.216169488730305,
 31.544188740351338,
 37.22159676984888,
 43.23290542839072,
 49.564702683696815,
 56.20521948320366,
 63.14401424951226,
 70.37173672923794]
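
To make the data flow concrete, here is a cluster-free emulation of the three steps in plain Python. The helpers `scatter` and `gather` are hypothetical stand-ins written for this sketch, not the ipyparallel methods. They assume contiguous partitions, which is what the output above shows for 16 items on 8 engines.

```python
from itertools import chain

def scatter(seq, n_engines):
    """Split seq into n_engines contiguous partitions (emulation only)."""
    size, extra = divmod(len(seq), n_engines)
    parts, start = [], 0
    for i in range(n_engines):
        stop = start + size + (1 if i < extra else 0)
        parts.append(seq[start:stop])
        start = stop
    return parts

def gather(parts):
    """Concatenate the per-engine results back into one list."""
    return list(chain.from_iterable(parts))

# 1. Scatter: each "engine" gets its own slice, called y in the notebook.
ys = scatter(range(16), 8)

# 2. Compute: every engine runs the same comprehension on its own slice.
zs = [[(i**3.14159)**0.5 for i in y] for y in ys]

# 3. Gather: flatten the pieces, preserving their order.
z = gather(zs)

# The round trip matches the ordinary serial comprehension.
assert z == [(i**3.14159)**0.5 for i in range(16)]
print(ys[:2])
```

Because each element is computed independently, the partitioning does not change the result here; only the order of the pieces has to be preserved when gathering.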