<img src="http://dask.readthedocs.io/en/latest/_images/dask_horizontal.svg"
     align="right"
     width="30%"
     alt="Dask logo">


# Parallelize code with Dask.delayed

In this section we parallelize simple for-loop style code with Dask and Dask.delayed.

This is a simple way to use Dask to parallelize existing codebases or build complex systems.  This will also help us to build intuition for future sections.

### Alternative software

The solutions presented in this section are similar to the following tools:

1.  concurrent.futures
2.  multiprocessing.Pool
3.  Airflow/Luigi
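
For comparison, here is a minimal sketch of the same idea using the first tool in the list, `concurrent.futures`. The function `slow_inc` is a hypothetical stand-in for a slow function and is not part of this notebook; the sleep time is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep


def slow_inc(x):
    # Hypothetical slow function: sleeps briefly to simulate work
    sleep(0.1)
    return x + 1


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() schedules the call and returns a Future right away
    fx = pool.submit(slow_inc, 1)
    fy = pool.submit(slow_inc, 2)
    # result() blocks until the corresponding call has finished
    total = fx.result() + fy.result()

print(total)
```

Here the two calls are submitted explicitly and their results are collected by hand. With `dask.delayed` we instead describe the whole computation first and let Dask decide what can run at the same time.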

### Simplest possible example

We make some simple functions, inc and add, that sleep for a while to simulate work.  We time running these functions normally.  

In the next section we'll parallelize this code.

In [None]:
from time import sleep

def inc(x):
    sleep(1)
    return x + 1

def add(x, y):
    sleep(1)
    return x + y

In [None]:
%%time
# This takes three seconds to run because we call each function sequentially, one after the other
x = inc(1)
y = inc(2)
z = add(x, y)

### Parallelize with dask.delayed decorator

Those two increment calls *could* run in parallel, because neither depends on the other's result.

In this section we wrap `inc` and `add` with `dask.delayed`.  The wrapped functions don't run when called; instead, each call records the function and its arguments in a task graph.  As a result, the code below returns immediately, because all it does is build the graph.  We then compute the result separately by calling the `.compute()` method.
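
`dask.delayed` can also be applied with decorator syntax when we define the functions ourselves. The sketch below assumes we are free to redefine the functions; the names `lazy_inc` and `lazy_add` are hypothetical, chosen so they don't shadow the notebook's `inc` and `add`, and the sleep time is shortened.

```python
from time import sleep

import dask


@dask.delayed
def lazy_inc(x):
    sleep(0.1)
    return x + 1


@dask.delayed
def lazy_add(x, y):
    sleep(0.1)
    return x + y


# Each call returns a Delayed object instead of running the function
x = lazy_inc(1)
y = lazy_inc(2)
z = lazy_add(x, y)

# Only now does the work actually run
result = z.compute()
print(result)
```

The decorator form is convenient for functions we own. Wrapping at call time, as in the cells below, leaves the original function untouched, so it can still be called eagerly elsewhere.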

In [None]:
import dask

In [None]:
%%time
# This runs immediately, all it does is build a graph
x = dask.delayed(inc)(1)
y = dask.delayed(inc)(2)
z = dask.delayed(add)(x, y)

In [None]:
%%time
# This actually runs our computation using a local thread pool
z.compute()

### What just happened?

The `z` object is a lazy `Delayed` object (`dask.delayed.Delayed`).  This object holds everything we need to compute the final result.  We can compute the result with `.compute()` as above or we can visualize the task graph with `.visualize()`.
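
One way to convince ourselves that nothing runs until `.compute()` is to record each call as it happens. This is a small sketch, not part of the original notebook; `traced_inc`, `traced_add`, and the `calls` list are hypothetical names used only for illustration.

```python
import dask
from dask.delayed import Delayed

calls = []  # records which functions have actually executed


def traced_inc(x):
    calls.append("inc")
    return x + 1


def traced_add(x, y):
    calls.append("add")
    return x + y


# Building the graph does not call either function
z = dask.delayed(traced_add)(
    dask.delayed(traced_inc)(1),
    dask.delayed(traced_inc)(2),
)

is_lazy = isinstance(z, Delayed)
calls_before = list(calls)   # snapshot taken before any work has run

result = z.compute()         # the three tasks execute here
calls_after = sorted(calls)  # sorted because the two inc calls may run in any order
```

After building the graph, `calls_before` is still empty. Only after `z.compute()` does `calls` contain the two `inc` entries and the one `add` entry.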

In [None]:
z

In [None]:
z.visualize(rankdir="LR")

### Some questions to consider:

-  Why did we go from 3s to 2s?  Why weren't we able to parallelize down to 1s?
-  What would have happened if the inc and add functions didn't include the `sleep(1)`?  Would Dask still be able to speed up this code?
-  What if we have multiple outputs or also want to get access to x or y?

### Exercise: Parallelize a for loop

For loops are one of the most common things that we want to parallelize.  Use dask.delayed on `inc` and `sum` to parallelize the computation below:

In [None]:
data = [1, 2, 3, 4, 5, 6, 7, 8]

In [None]:
%%time
# Sequential code

results = []
for x in data:
    y = inc(x)
    results.append(y)
    
total = sum(results)

In [None]:
total

In [None]:
%%time
# Parallel code

results = []
for x in data:
    pass  # TODO

In [None]:
total

### Parallelizing for-loop code with control flow

Often we want to delay only *some* functions, running a few of them immediately.  This is especially helpful when those functions are fast and help us to determine what other slower functions we should call.  This decision, to delay or not to delay, is usually where we need to be thoughtful when using dask.delayed.

In the example below we iterate through a list of inputs.  If the input is even then we want to call `double`.  If the input is odd then we want to call `inc`.  This `iseven` decision to call `double` or `inc` has to be made immediately (not lazily) in order for our graph-building Python code to proceed.
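
The same idea applies to other kinds of control flow. The sketch below uses a different, hypothetical scenario: a fast check (`is_valid`) runs immediately and decides which inputs get a slow, delayed call (`slow_square`). Neither function belongs to this notebook, and the input values are made up.

```python
from time import sleep

import dask


def is_valid(x):
    # Fast check: run it immediately so Python's `if` gets a real bool
    return x >= 0


def slow_square(x):
    # Slow work: worth delaying so that the calls can run in parallel
    sleep(0.05)
    return x * x


inputs = [3, -1, 4, -5, 2]

squares = []
for x in inputs:
    if is_valid(x):  # evaluated now, while the graph is being built
        squares.append(dask.delayed(slow_square)(x))

# Delay the final reduction so it becomes part of the same graph
total = dask.delayed(sum)(squares)
result = total.compute()
print(result)
```

Only the valid inputs produce tasks, so the graph contains three `slow_square` calls and one `sum`.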

In [None]:
def double(x):
    sleep(1)
    return 2 * x

def iseven(x):
    return x % 2 == 0

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [None]:
%%time
# Sequential code

results = []
for x in data:
    if iseven(x):
        y = double(x)
    else:
        y = inc(x)
    results.append(y)
    
total = sum(results)
total

In [None]:
%%time
# Parallel code
# TODO: parallelize the sequential code above using dask.delayed
# You will need to delay some functions, but not all



In [None]:
total.visualize()

In [None]:
%time total.compute()

In [None]:
%load solutions/01-delayed-inc-double.py

### Some questions to consider:

-  What are other examples of control flow where we can't use delayed?
-  What would have happened if we had delayed the evaluation of `iseven(x)` in the example above?
-  What are your thoughts on delaying `sum`?  This function does real computation but is also very fast to run.

## Pandas exercise

In this exercise we read several CSV files and perform a groupby operation in parallel.  We are given sequential code to do this and parallelize it with Dask.delayed.

The computation we will parallelize is to compute the daily high-low spread of a stock over time.  We will do this by using dask.delayed together with Pandas.  In a future section we will do this same exercise with dask.dataframe.

### Prep data

First, run this code to prep some data.  You don't need to understand this code.

This downloads daily stock prices for a few tech companies and then interpolates between these daily values with random data to simulate per-second prices.  This will create a local `data` directory that holds around 1GB of time series data as CSV files.  It should only require downloading a few kilobytes of data from the internet.

In [None]:
%run prep-stocks.py

### Inspect data

In [None]:
import os
sorted(os.listdir(os.path.join('data', 'stocks')))

In [None]:
sorted(os.listdir(os.path.join('data', 'stocks', 'GOOG')))

### Read one file with pandas.read_csv and compute spread

In [None]:
import pandas as pd
df = pd.read_csv(os.path.join('data', 'stocks', 'GOOG', '2015-01-02.csv'), 
                 parse_dates=['timestamp'], 
                 index_col='timestamp')
df.head()

In [None]:
spread = df.high.max() - df.low.min()
spread

### Sequential code: spread over time

This code performs the spread computation on every day of data using a sequential for loop.

In [None]:
from glob import glob
filenames = sorted(glob(os.path.join('data', 'stocks', 'GOOG', '*.csv')))

In [None]:
%%time

spreads = []
days = []
for fn in filenames:
    df = pd.read_csv(fn, parse_dates=['timestamp'], index_col='timestamp')
    spread = df.high.max() - df.low.min()
    day = df.index[0].round('1d')
    
    spreads.append(spread)
    days.append(day)

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure(figsize=(10, 5))
plt.plot(days, spreads)

### Exercise: parallelize the code above

Use dask.delayed to parallelize the code above.  Some extra things you will need to know:

1.  Methods and attribute access on delayed objects work automatically, so if you have a delayed object you can perform normal arithmetic, slicing, and method calls on it and it will produce the correct delayed calls.

    ```python
    import dask
    import numpy as np
    x = dask.delayed(np.arange)(10)
    y = (x + 1)[::2].sum()  # everything here was delayed
    ```
2.  Calling the `.compute()` method works well when you have a single output.  When you have multiple outputs you might want to use the `dask.compute` function:

    ```python
    >>> x = dask.delayed(np.arange)(10)
    >>> y = x ** 2
    >>> lo, hi = dask.compute(y.min(), y.max())
    >>> int(lo), int(hi)
    (0, 81)
    ```
    
    This way dask can share the intermediate values (like `y = x**2`) instead of computing them once per output.
    
So your goal is to parallelize the code above (which has been copied below) using Dask.delayed.  You may also want to visualize a bit of the computation to see if you're doing it correctly.

*Note: performance will improve a little bit, but not a whole lot.  We'll discuss why afterwards.*

In [None]:
%%time

spreads = []
days = []
for fn in filenames:
    ...
    
spreads, days = dask.compute(spreads, days)

In [None]:
%load solutions/01-delayed-pandas.py