Ad-hoc computations with Futures
------------------------------------

Some computations are more complex than an embarrassingly parallel map over a linear collection.  We might call several different functions, we might iterate over multiple collections, or we might conditionally run computations based on the values of the data.

In this section we look at the asynchronous `Future` interface, which provides a simple API for ad-hoc parallelism.

### Objectives

*  Use the `concurrent.futures` function `submit` to parallelize non-map patterns

### Requirements

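*  scikit-learn
*  concurrent.futures (standard in Python 3, `pip install futures` in Python 2)
*  snakeviz (optional, used for profiling)

Install the extras with pip (the `futures` backport is only needed on Python 2):

    pip install snakeviz
    pip install futures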

### Application

We train a support vector classifier on handwritten digits across many parameter settings, using cross validation to avoid over-fitting.  This is slightly more complex than a map, so we use `submit`.

As before we start with a sequential solution.

In [None]:
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.model_selection import ParameterSampler  # was sklearn.grid_search in old releases
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt

from cv_params_demo import load_cv_split, evaluate_one

digits = load_digits()

plt.imshow(digits.data[0].reshape(8, 8),
           interpolation='nearest', cmap='gray');

In [None]:
param_grid = {
    'C': np.logspace(-10, 10, 1001),
    'gamma': np.logspace(-10, 10, 1001),
    'tol': np.logspace(-4, -1, 4),
}

# Fixed seed so that every pass over param_space yields the same samples
param_space = ParameterSampler(param_grid, 10, random_state=0)

next(iter(param_space))

In [None]:
from cv_params_demo import load_cv_split

cv_splits = [load_cv_split(i) for i in range(2)]
idx, (x_train, x_test, y_train, y_test) = cv_splits[0]
x_train, y_train

### Compute and Profile sequentially

In [None]:
%load_ext snakeviz

In [None]:
%%time
# %%snakeviz  # uncomment for profiling

results = []

for split in cv_splits:
    for params in param_space:
        result = evaluate_one(SVC, params, split)
        results.append(result)

### Plot results

In [None]:
from cv_params_demo import plot_results

plot_results(results)

### Analysis

This computation uses a doubly nested for loop.  

```python
results = []

for split in cv_splits:
    for params in param_space:
        result = evaluate_one(SVC, params, split)
        results.append(result)
```

It *is* possible to solve this problem with `map`, but it requires some cleverness.  Instead we'll learn `submit`, an interface for starting many individual function calls at the same time.  `submit` can also solve problems that are much more complex than this one, so it's a good tool to learn.
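For the curious, here is a minimal sketch of the "clever" `map` route: flatten the two loops into a single sequence of pairs with `itertools.product`, then map over it.  The `score` function and the small input lists are hypothetical stand-ins for `evaluate_one`, `cv_splits` and `param_space`, not the notebook's real data.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def score(split, params):
    # Hypothetical stand-in for evaluate_one: just combine the two inputs
    return split * 10 + params

splits = [0, 1]
param_choices = [1, 2, 3]

# Flatten the nested loops into one list of (split, params) pairs
pairs = list(product(splits, param_choices))

with ThreadPoolExecutor(4) as executor:
    # Executor.map takes one iterable per argument, so unzip the pairs
    split_args, param_args = zip(*pairs)
    map_results = list(executor.map(score, split_args, param_args))

# The nested loop produces the same values in the same order
loop_results = [score(s, p) for s in splits for p in param_choices]
```

This works, but the flattening step hides the original loop structure, and it gets awkward once the loops contain conditionals.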

Our solution finishes by analyzing and plotting all of the intermediate results.  We won't parallelize this step because it is already very fast.

```python
plot_results(results)
```

### Learn Executor.submit

The `submit` method starts a computation in a separate thread or process and immediately gives us a `Future` object that refers to the result.  At first, the future is pending.  Once the task completes it is finished.  We can collect the result of the task with the `.result()` method.

In [None]:
from time import sleep
from concurrent.futures import ThreadPoolExecutor

e = ThreadPoolExecutor(4)

def slowadd(a, b, delay=1):
    sleep(delay)
    return a + b

future = e.submit(slowadd, 1, 2)
future

In [None]:
future

In [None]:
future.result()
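The pending and finished states can also be checked without blocking, using the future's `done()` method.  The sketch below is self-contained; `slow_double` is a hypothetical helper used only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def slow_double(x, delay=0.2):
    # Hypothetical helper: pretend to work, then return twice the input
    sleep(delay)
    return 2 * x

with ThreadPoolExecutor(1) as executor:
    fut = executor.submit(slow_double, 21)
    done_before = fut.done()  # checked right after submit, task is still sleeping
    value = fut.result()      # blocks until the task has finished
    done_after = fut.done()   # the result is available, so the future is done

print(done_before, value, done_after)
```

Calling `.result()` on a finished future returns immediately, so it is safe to call it more than once.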

### Submit many tasks, receive many futures

Because `submit` returns immediately, we can submit many tasks at once and they execute in parallel, up to the number of workers in the pool (four for our executor `e`).

In [None]:
%%time
results = [slowadd(i, i, delay=1) for i in range(10)]

In [None]:
%%time
futures = [e.submit(slowadd, i, i, delay=1) for i in range(10)]

In [None]:
%%time
results = [f.result() for f in futures]

### Submit different tasks

The virtue of `submit` is that you can submit different functions and perform a bit of logic on each input.
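As a small illustration of the pattern, the sketch below picks a function per input and skips one input entirely.  The functions `square` and `negate` are hypothetical and unrelated to the exercise that follows.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def negate(x):
    return -x

with ThreadPoolExecutor(4) as executor:
    mixed_futures = []
    for x in range(6):
        if x % 2 == 0:
            # Even inputs go to one function
            mixed_futures.append(executor.submit(square, x))
        elif x > 1:
            # Odd inputs greater than one go to another
            mixed_futures.append(executor.submit(negate, x))
        # x == 1 matches neither branch, so nothing is submitted for it

    values = [f.result() for f in mixed_futures]
```

The futures list keeps submission order, so `values` lines up with the order in which the loop made its decisions.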

### Exercise: parallelize the following code with e.submit

1.  Replace the `results` list with a list called `futures`
2.  Replace calls to `slowadd` and `slowsub` with `e.submit` calls on those functions
3.  At the end, block on the computation by recreating the `results` list, calling `.result()` on each future in the `futures` list.

In [None]:
%%time

### Sequential Version

def slowsub(a, b, delay=1):
    sleep(delay)
    return a - b

results = []
for i in range(5):
    for j in range(5):
        if i < j:
            results.append(slowadd(i, j, delay=1))
        elif i > j:
            results.append(slowsub(i, j, delay=1))
            
total = sum(results)

In [None]:
%%time

### Parallel Version

# TODO

### Conclusion on submit

*  Submit fires off a single function call in the background, returning a future.  
*  When we combine submit with a single for loop we recover the functionality of map.  
*  When we want to collect our results we replace each of our futures, `f`, with a call to `f.result()`
*  We can combine submit with multiple for loops and other general programming to get something more general than map.
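
To make the second point concrete, here is a minimal sketch that compares `Executor.map` with `submit` in a single loop.  The `inc` function and the `data` list are made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def inc(x):
    return x + 1

data = [1, 2, 3, 4]

with ThreadPoolExecutor(4) as executor:
    # The map version
    via_map = list(executor.map(inc, data))

    # The submit version: one future per input, then gather the results
    submitted = [executor.submit(inc, x) for x in data]
    via_submit = [f.result() for f in submitted]

print(via_map == via_submit)
```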


Application
------------

Now we use `e.submit` to parallelize our nested for loop over `evaluate_one` from above.

Here is the sequential code that we want to parallelize:

```python
results = []

for split in cv_splits:
    for params in param_space:
        result = evaluate_one(SVC, params, split)
        results.append(result)
```

In [None]:
%%time

futures = []

for split in cv_splits:
    for params in param_space:
        future = e.submit(evaluate_one, SVC, params, split)
        futures.append(future)
        
results = [future.result() for future in futures]

Conclusion
-----------

*  We learned how `e.submit` can help us to parallelize more complex applications
*  We used `e.submit` to parallelize cross-validated parameter sweeps