## Calculate $\pi$ with IPython Parallel

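A simple Monte Carlo simulation to approximate the value of $\pi$ involves randomly selecting points in the unit square and counting how many of them land inside the quarter circle $x^2+y^2 \le 1$. That fraction approximates $\pi/4$, so multiplying it by 4 gives an estimate of $\pi$.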
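
As a brief justification, assuming the points are drawn uniformly from the unit square $[0,1]^2$, the probability of a point falling inside the quarter disc $x^2+y^2 \le 1$ equals the ratio of the two areas. With $N$ points, of which $N_{\text{inside}}$ land inside (symbols introduced here for illustration):

$$
\frac{N_{\text{inside}}}{N} \approx \frac{\text{area of quarter disc}}{\text{area of unit square}} = \frac{\pi/4}{1}
\quad\Longrightarrow\quad
\pi \approx 4\,\frac{N_{\text{inside}}}{N}
$$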



Start the ipcluster in a terminal with 4 engines:
```
    $ source pythonhpc.sh 
    $ ipcluster start --n 4 &
```
Then import ipyparallel in your notebook, and initialize a Client instance:

In [None]:
import ipyparallel as ipp
client = ipp.Client()
client

Create a DirectView object for direct execution on the engines:

In [None]:
dview = client[:]
dview

Parallelize the evaluation of $\pi$ using a Monte Carlo method. First load the required names, and push the `random` function to the engines:

In [None]:
from random import random
from math import pi
dview['random'] = random

The function `monte_carlo_pi` estimates $\pi$ with the Monte Carlo method. We will time its execution using `%%timeit -n 1` and a sample size of 10 million. This run is serial: it executes in the notebook process only, not on the engines.

In [None]:
def monte_carlo_pi(nsamples):
    s = 0
    for i in range(nsamples):
        x = random()
        y = random()
        if x*x + y*y <= 1:
            s+=1
    return 4.*s/nsamples

In [None]:
%%timeit -n 1 
monte_carlo_pi(10000000)
# should take a couple of seconds per timeit trial 

<mark>Exercise</mark>: The incomplete function below should take a DirectView object and a number of samples, divide the number of samples between the engines, and call `monte_carlo_pi()` with a subset of the samples on each engine.

Complete the parallel function (by replacing the "TO_DO" fields), call it with $10^7$ samples, time it and compare the performance with the serial version.

In [None]:
def multi_monte_carlo_pi(view, nsamples):
    p = len(<TO_DO>.targets)
    if nsamples % p:
        # to ensure even divisibility
        nsamples += p - (nsamples%p)
    
    subsamples = <TO_DO>//p
    
    ar = view.<TO_DO>(monte_carlo_pi, <TO_DO>)
    return sum(ar)/<TO_DO>
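
The padding step at the top of the function can be looked at in isolation. The sketch below only illustrates the arithmetic of rounding the sample count up to a multiple of the number of engines. The helper name `pad_to_multiple` is hypothetical and not part of the exercise.

```python
def pad_to_multiple(nsamples, p):
    """Round nsamples up to the nearest multiple of p (hypothetical helper)."""
    if nsamples % p:
        # same adjustment as in multi_monte_carlo_pi
        nsamples += p - (nsamples % p)
    return nsamples


# Show the padded total and the per-engine share for a few cases.
for n, p in [(10, 4), (12, 4), (10_000_000, 6)]:
    padded = pad_to_multiple(n, p)
    print(f"nsamples={n}, engines={p}: padded={padded}, per engine={padded // p}")
```

With 4 engines, $10^7$ samples already divide evenly. The padding only matters for engine counts such as 6, where a few extra samples are added so that every engine receives the same share.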

In [None]:
%%timeit 
multi_monte_carlo_pi(dview, 10000000)

<mark>Question</mark>: Confirm that the results of the serial and parallel $\pi$ calculations are correct, for example by comparing them with `pi` from the `math` module.
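
One possible way to judge "correct" for a random estimate is to compare the error against the expected statistical spread. The sketch below assumes each sample is an independent hit-or-miss trial with hit probability $\pi/4$, which gives a standard error of $4\sqrt{p(1-p)/N}$ for the estimate. The function names and the 4-sigma threshold are illustrative choices, not part of the original notebook.

```python
from math import pi, sqrt


def mc_pi_standard_error(nsamples):
    """Standard error of the hit-or-miss estimate of pi for nsamples points."""
    p = pi / 4  # probability that a uniform point lands in the quarter disc
    return 4 * sqrt(p * (1 - p) / nsamples)


def is_plausible(estimate, nsamples, n_sigma=4):
    """True if the estimate lies within n_sigma standard errors of math.pi."""
    return abs(estimate - pi) <= n_sigma * mc_pi_standard_error(nsamples)


print(f"standard error for 10**7 samples: {mc_pi_standard_error(10**7):.2e}")
```

In the notebook this could be used as `is_plausible(monte_carlo_pi(10**7), 10**7)`, and likewise for the parallel result.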

<mark>Question</mark>: Did you see any speedup?

Add another 2 engines by running 

```
$ ipengine &
```

twice in your terminal. Ensure that your view includes the additional engines. Run the parallel $\pi$ calculation again. Do you see any further speedup? 
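
A view created with `client[:]` covers the engines that were registered at the time it was built, so it likely needs to be recreated after new engines join. The sketch below shows one way to do that. The helper name `refresh_view` is hypothetical, and it relies only on the `ids` attribute and slice indexing of the client.

```python
def refresh_view(client):
    """Build a new DirectView over every engine the client currently knows about.

    `client` is assumed to be an ipyparallel Client, as created earlier
    with ipp.Client(). The helper itself is illustrative only.
    """
    view = client[:]  # slice over all currently registered engines
    print(f"{len(client.ids)} engines registered: {client.ids}")
    return view


# Intended use in the notebook (requires a running cluster):
# dview = refresh_view(client)
# dview['random'] = random   # newly started engines have an empty namespace
```

Note that freshly started engines do not yet have `random` in their namespace, so the push from earlier presumably has to be repeated on the new view before calling the parallel function.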

<mark>Question</mark>: Finally, add another 2 engines and test again. Have you reached the limit of scalability on this toy problem?
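
To make the scalability comparison concrete, speedup and parallel efficiency can be computed from the measured timings. The sketch below uses the usual definitions (speedup is serial time divided by parallel time, efficiency is speedup divided by the number of engines). The timing values are placeholders to be replaced with your own measurements, not results from this notebook.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_engines):
    """Return (speedup, efficiency) for one parallel timing."""
    speedup = t_serial / t_parallel
    efficiency = speedup / n_engines
    return speedup, efficiency


# Placeholder timings in seconds: substitute the values reported by %%timeit.
t_serial = 4.0
timings = {4: 1.25, 6: 1.0, 8: 0.9}  # engines -> parallel time (made up)

for n_engines, t_parallel in timings.items():
    s, e = speedup_and_efficiency(t_serial, t_parallel, n_engines)
    print(f"{n_engines} engines: speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency that drops noticeably as engines are added suggests that overhead, or the number of physical cores, is starting to limit scaling.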
