# Monte Carlo Estimate of $\pi$

<img src="http://dask.readthedocs.io/en/latest/_images/dask_horizontal.svg" 
     width="50%" 
     align=top
     alt="Dask logo">
<img src="https://upload.wikimedia.org/wikipedia/commons/b/ba/Monte-Carlo01.gif" 
     width="30%" 
     align=top
     alt="PI monte-carlo estimate">
     
Using [Dask's adaptivity](http://docs.dask.org/en/latest/setup/adaptive.html), we'll show that the available resources can be scaled so that wall times stay almost identical irrespective of the actual workload:

- Estimating $\pi$ from 16 GB of random data is done in 17 seconds using 3 workers (with 2 cores each).
- Estimating $\pi$ from 512 GB of random data is done in 19 seconds using 142 workers (with 2 cores each).
- Estimating $\pi$ from 1024 GB of random data is done in 21 seconds using 273 workers (with 2 cores each).
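
The estimator behind these numbers draws points uniformly in the unit square and counts the fraction that falls inside the quarter unit circle, which approximates $\pi / 4$. A minimal single-machine sketch in plain NumPy is shown here for orientation; the function name `estimate_pi` and the `seed` argument are illustrative and not part of the notebook.

```python
import numpy as np


def estimate_pi(n_points, seed=0):
    """Estimate pi from n_points uniform samples in the unit square."""
    rng = np.random.default_rng(seed)
    # One row per point, columns are the x and y coordinates.
    xy = rng.uniform(0, 1, size=(n_points, 2))
    # A point lies inside the quarter circle if x^2 + y^2 < 1.
    in_circle = (xy ** 2).sum(axis=-1) < 1
    # The hit fraction approximates pi / 4.
    return 4 * in_circle.mean()


print(estimate_pi(1_000_000))
```

The Dask version in this notebook follows the same steps, but builds `xy` as a chunked `dask.array` so that the chunks can be generated and reduced on many workers.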

In [1]:
from dask_kubernetes import KubeCluster
cluster = KubeCluster(n_workers=1)

In [2]:
# Run `Adaptive?` for help on the keyword arguments accepted by `cluster.adapt`.
from dask.distributed import Adaptive

In [3]:
cluster.adapt(minimum=1, maximum=400,
              target_duration="20s",  # more realistic than the default "5s"?
              wait_count=10,  # remove a worker only after 10 consecutive removal suggestions (~10 s)
              scale_factor=1.2);  # scale slower than doubling (default)

In [4]:
from dask.distributed import Client
c = Client(cluster)
c

Client  Scheduler: tcp://10.23.27.5:37004  Dashboard: /user/willirath/proxy/8787/status
Cluster  Workers: 0  Cores: 0  Memory: 0 B


(Check the dashboard to see the cluster scale up and down!)

In [5]:
import dask.array as da
import numpy as np
from time import time

def calc_pi_mc(size):
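    # `size` is in bytes: 8 bytes per float64, 2 coordinates per point; chunks hold ~500 MB each.
    # Shapes and chunk sizes must be integers, so convert before dividing.
    xy = da.random.uniform(0, 1, size=(int(size) // 8 // 2, 2), chunks=(int(0.25e9) // 8, 2))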
    
    in_circle = ((xy ** 2).sum(axis=-1) < 1)
    pi = 4 * in_circle.mean()

    start = time()
    pi = pi.compute()
    end = time()
    
    num_pods = len(cluster.pods())
    
    print("Size of data:", xy.nbytes / 1e9, "GB")
    print("Monte-Carlo pi:", pi)
    print("Numpys pi:", np.pi)
    print("Delta:", abs(pi - np.pi))
    print("Duration: {:.2f} seconds with {} pods".format(end-start, num_pods))
    print()
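
The sizing arithmetic in `calc_pi_mc` can be made explicit: a data size in bytes corresponds to `size / 8 / 2` points, because each point consists of two 8-byte floats. The sketch below also computes the expected statistical error of the estimate, assuming independent samples, so that each hit is a Bernoulli trial with probability $\pi / 4$ and the standard deviation of the estimate is $4 \sqrt{p (1 - p) / N}$. The helper names `n_points` and `expected_std` are hypothetical and not part of the notebook.

```python
import math

BYTES_PER_FLOAT64 = 8


def n_points(size_bytes):
    """Number of (x, y) points that fit into size_bytes of float64 data."""
    return int(size_bytes) // BYTES_PER_FLOAT64 // 2


def expected_std(size_bytes):
    """Standard deviation of the pi estimate for the given data size."""
    p = math.pi / 4  # probability that a point lands inside the quarter circle
    return 4 * math.sqrt(p * (1 - p) / n_points(size_bytes))


for size in (1e9, 16e9, 1024e9):
    print(f"{size / 1e9:.0f} GB: {n_points(size):,} points, "
          f"expected std ~ {expected_std(size):.1e}")
```

Since the error shrinks only as $1 / \sqrt{N}$, doubling the data size improves the expected accuracy by a factor of about 1.4, and the `Delta` of any single run can land above or below this value.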

In [6]:
from time import sleep

for size in [1e9 * 2 ** n for n in range(11)]:
    
    calc_pi_mc(size)
    sleep(10)  # allow for some scale-down time

Size of data: 1.0 GB
Monte-Carlo pi: 3.141738048
Numpys pi: 3.141592653589793
Delta: 0.0001453944102070004
Duration: 4.68 seconds with 1 pods

Size of data: 2.0 GB
Monte-Carlo pi: 3.1416384
Numpys pi: 3.141592653589793
Delta: 4.574641020704817e-05
Duration: 5.31 seconds with 1 pods

Size of data: 4.0 GB
Monte-Carlo pi: 3.141615792
Numpys pi: 3.141592653589793
Delta: 2.3138410206957616e-05
Duration: 7.91 seconds with 2 pods

Size of data: 8.0 GB
Monte-Carlo pi: 3.141654136
Numpys pi: 3.141592653589793
Delta: 6.148241020698109e-05
Duration: 10.73 seconds with 3 pods

Size of data: 16.0 GB
Monte-Carlo pi: 3.141506724
Numpys pi: 3.141592653589793
Delta: 8.592958979303233e-05
Duration: 17.35 seconds with 3 pods

Size of data: 32.0 GB
Monte-Carlo pi: 3.141638062
Numpys pi: 3.141592653589793
Delta: 4.5408410207059546e-05
Duration: 12.77 seconds with 12 pods

Size of data: 64.0 GB
Monte-Carlo pi: 3.141572989
Numpys pi: 3.141592653589793
Delta: 1.9664589792967035e-05
Duration: 19.20 seconds wit