<img src="http://dask.readthedocs.io/en/latest/_images/dask_horizontal.svg" 
     width="30%" 
     align=right
     alt="Dask logo">

Custom Workloads
-------------------------

*Because not all problems are dataframes*

This notebook shows how to use [dask.delayed](http://dask.pydata.org/en/latest/delayed.html) to parallelize generic Python code.

Dask.delayed is a simple and powerful way to parallelize existing code.  It allows users to delay function calls into a task graph with dependencies.  Unlike Dask.dataframe, Dask.delayed doesn't provide any fancy parallel algorithms, but it does give users complete control over what they want to build.

Systems like Dask.dataframe are built with Dask.delayed.  If you have a problem that is parallelizable, but isn't as simple as just a big array or a big dataframe, then dask.delayed may be the right choice for you.
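To make "delaying function calls into a task graph" concrete, here is a minimal pure-Python sketch of the idea.  The names `LazyCall` and `lazy` are hypothetical and exist only for this illustration; this is not how Dask is implemented, and it evaluates the graph sequentially with no parallelism.  It only shows that calling a wrapped function records work rather than doing it.

```python
class LazyCall:
    """Toy stand-in for a delayed call: stores a function and its arguments."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args  # may contain other LazyCall objects (dependencies)

    def compute(self):
        # Evaluate dependencies first, then call the function itself.
        resolved = [a.compute() if isinstance(a, LazyCall) else a
                    for a in self.args]
        return self.func(*resolved)


def lazy(func):
    """Wrap func so that calling it records a LazyCall instead of running."""
    def wrapper(*args):
        return LazyCall(func, *args)
    return wrapper


calls = []  # records which functions actually ran, and in what order


@lazy
def inc(x):
    calls.append("inc")
    return x + 1


@lazy
def add(x, y):
    calls.append("add")
    return x + y


z = add(inc(1), inc(2))    # builds a small graph; nothing has run yet
ran_before = list(calls)   # still empty at this point
result = z.compute()       # walks the graph: (1 + 1) + (2 + 1)
```

Because the whole graph is known before anything runs, a real scheduler is free to run independent nodes (here the two `inc` calls) at the same time.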

### Normal Python code

These aren't exciting functions, but hopefully you can see how the functions in your own workflow might fit in here.

In [1]:
from time import sleep

def inc(x):
    from random import random
    sleep(random())
    return x + 1

def dec(x):
    from random import random
    sleep(random())
    return x - 1
    
def add(x, y):
    from random import random
    sleep(random())
    return x + y 

### Run sequentially

In [2]:
%%time
x = inc(1)
y = dec(2)
z = add(x, y)
z

CPU times: user 989 µs, sys: 101 µs, total: 1.09 ms
Wall time: 1.68 s


In [3]:
import jupyterlab
jupyterlab.__version__

'0.31.1'

### Annotate Normal Python functions with Dask

These now become lazy versions.  Rather than compute the result immediately, they record what we want to compute and stick that task into a graph that we'll run later on parallel hardware.

In [None]:
import dask
inc = dask.delayed(inc)
dec = dask.delayed(dec)
add = dask.delayed(add)

Calling these lazy functions is now almost free.  We're just constructing a graph.

In [None]:
x = inc(1)
y = dec(x)

x = dec(1)
y = inc(x)

In [None]:
%%time
x = inc(1)
y = dec(2)
z = add(x, y)
z

### Visualize computation

In [None]:
z.visualize(rankdir='LR')

### Execute with threads on our local machine

In [None]:
%%time
z.compute()

### Connect to a cluster and run there

We connect to our cluster.  Now rather than run locally, all of our computations will happen on our cluster.

In [None]:
from dask.distributed import Client, progress
c = Client('localhost:8786')
c

In [None]:
z.compute()

In [None]:
c.start_ipython_scheduler(qtconsole=True)

### Parallelize Normal Python code

Now we use Dask in normal for-loopy Python code.  This generates graphs instead of doing computations directly, but still looks like the code we had before.  Dask is a convenient way to add parallelism to existing workflows.

In [None]:
%%time
zs = []
for i in range(256):
    x = inc(i)
    y = dec(x)
    z = add(x, y)
    zs.append(z)
    
zs = dask.persist(*zs)
total = dask.delayed(sum)(zs)

In [None]:
total.compute()

By looking at the Dask dashboard we can see that Dask spreads this work around our cluster, managing load balancing, dependencies, etc.

### Custom computation: Tree summation

As an example of a non-trivial algorithm, consider the classic tree reduction.  We accomplish this with a nested for loop and a bit of normal Python logic.

```
finish           total             single output
    ^          /        \
    |        c1          c2        neighbors merge
    |       /  \        /  \
    |     b1    b2    b3    b4     neighbors merge
    ^    / \   / \   / \   / \
start   a1 a2 a3 a4 a5 a6 a7 a8    many inputs
```

In [None]:
L = zs
while len(L) > 1:
    new_L = []
    for i in range(0, len(L), 2):
        lazy = add(L[i], L[i + 1])  # add neighbors
        new_L.append(lazy)
    L = new_L                       # swap old list for new   

In [None]:
dask.visualize(*L)

In [None]:
dask.compute(L)
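The loop above pairs `L[i]` with `L[i + 1]`, which works here because 256 is a power of two and every level has an even length.  As a hedged sketch of one way to generalize it, the hypothetical helper `tree_reduce` below carries an unpaired last element up to the next level unchanged.  It is written against plain values so it can run anywhere; passing the delayed `add` as `combine` should build the same kind of graph lazily.

```python
def tree_reduce(items, combine):
    """Pairwise-reduce items level by level; works for any length >= 1."""
    level = list(items)
    if not level:
        raise ValueError("tree_reduce needs at least one item")
    while len(level) > 1:
        next_level = []
        # Combine neighbors (0, 1), (2, 3), ... and stop before a lone tail.
        for i in range(0, len(level) - 1, 2):
            next_level.append(combine(level[i], level[i + 1]))
        # An odd-length level leaves one element over; carry it up as-is.
        if len(level) % 2:
            next_level.append(level[-1])
        level = next_level
    return level[0]


def plain_add(a, b):
    return a + b


total_of_seven = tree_reduce(range(7), plain_add)    # odd number of inputs
total_of_256 = tree_reduce(range(256), plain_add)    # same size as the notebook
```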

In [None]:
import dask.dataframe as dd
df = dd.read_parquet('/home/mrocklin/data/nyc/nyc-2015.parquet/')
df.head()

In [None]:
c

In [None]:
df.passenger_count.sum().compute()

In the dashboard for the tree summation, note the red bars for inter-worker communication.  Also note how there is lots of parallelism at the beginning but less towards the end as we reach the top of the tree where there is less work to do.
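The shrinking parallelism can be read off from the shape of the tree.  The small sketch below counts how many `add` tasks each level of the reduction contains, assuming (as in this notebook) that the number of inputs is a power of two; `tasks_per_level` is a hypothetical helper written for this illustration.

```python
def tasks_per_level(n_inputs):
    """Number of pairwise-add tasks at each level, bottom to top.

    Assumes n_inputs is a power of two, so every level pairs up evenly.
    """
    widths = []
    n = n_inputs
    while n > 1:
        n //= 2            # each level halves the number of values
        widths.append(n)   # one add task per pair
    return widths


widths = tasks_per_level(256)
n_levels = len(widths)     # depth of the tree
n_adds = sum(widths)       # total add tasks in the reduction
```

For 256 inputs the first level can keep up to 128 workers busy, while the final level is a single task, so most of the cluster is idle by the time the last few additions run.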

## Asynchronous Computing

In [None]:
# Redefine plain (non-delayed) versions of the functions for use with futures
from time import sleep

def inc(x):
    from random import random
    sleep(random())
    return x + 1

def dec(x):
    from random import random
    sleep(random())
    return x - 1
    
def add(x, y):
    from random import random
    sleep(random())
    return x + y

In [None]:
from dask.distributed import as_completed
futures = c.map(inc, range(256))

seq = as_completed(futures)    # As tasks complete

while True:
    try:
        a = next(seq)          # Get two finished tasks
        b = next(seq)
    except StopIteration:
        break
    
    new = c.submit(add, a, b)  # Submit new task adding them
    seq.add(new)
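When the loop above ends, the last future taken from `seq` is left in `a`, so `a.result()` should give the final total.  For readers without a cluster at hand, the same "combine whatever finishes first" pattern can be sketched with the standard library alone.  This is an analogous technique, not the Dask API: the standard library's `as_completed` cannot accept new futures mid-iteration, so the sketch uses `wait(..., return_when=FIRST_COMPLETED)` instead, and `async_tree_sum` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def inc(x):
    return x + 1


def add(x, y):
    return x + y


def async_tree_sum(values, max_workers=8):
    """Sum inc(v) over values by adding results pairwise as they finish."""
    values = list(values)
    if not values:
        raise ValueError("async_tree_sum needs at least one value")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(inc, v) for v in values}
        ready = []  # finished futures that have not been paired yet
        while True:
            # Pair up finished futures and submit a task that adds them.
            while len(ready) >= 2:
                a = ready.pop()
                b = ready.pop()
                pending.add(pool.submit(add, a.result(), b.result()))
            if not pending:
                break  # nothing running and fewer than two results left
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            ready.extend(done)
        return ready[0].result()  # the single remaining future holds the total


total = async_tree_sum(range(256))
```

Unlike the level-by-level tree, this version never waits for a whole level to finish: any two completed results are combined immediately, which helps when task durations vary.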