# Parallelized Feature Location using IPython Parallel

Feature-finding can easily be parallelized: each frame is an independent task, and the tasks can be divided among the available CPUs. [IPython parallel](https://github.com/ipython/ipyparallel) makes this very straightforward.

## Install ipyparallel

As of IPython 6.2 (November 2017), IPython parallel is a separate package. You may need to install it at the command prompt using `pip install ipyparallel` or `conda install ipyparallel`.

It is simplest to start a cluster on the CPUs of the local machine. To do so, open a terminal and type:
```
ipcluster start -n 4
```

The number 4 should be replaced by the number of available CPUs. Now you are running a cluster -- it's that easy. More information on IPython parallel is available in [the IPython parallel documentation](http://ipyparallel.readthedocs.io/en/latest/intro.html).
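If you are unsure how many CPUs the machine has, Python's standard library can report it. The snippet below is a small sketch that builds the matching `ipcluster` command. Note that `os.cpu_count()` counts logical CPUs, which may exceed the number of physical cores, so treat the result as an upper bound rather than a recommendation.

```python
import os

# os.cpu_count() reports logical CPUs and can return None if the count
# cannot be determined, so fall back to a single engine in that case.
n_engines = os.cpu_count() or 1

# Build the command to run in a terminal.
command = f"ipcluster start -n {n_engines}"
print(command)
```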

In [2]:
from ipyparallel import Client
client = Client()
view = client.load_balanced_view()

We can see that there are four engines available, one for each core requested when the cluster was started.

In [3]:
client[:]

<DirectView [0, 1, 2, 3]>

Use a little magic, ``%%px``, to import trackpy on all engines.

In [4]:
%%px
import trackpy as tp

Now do the usual setup: import trackpy in the local session as well, and load the frames to analyze.

In [5]:
import pims
import trackpy as tp

@pims.pipeline
def gray(image):
    return image[:, :, 0]

frames = gray(pims.ImageSequence('../sample_data/bulk_water/*.png'))

Define a function that wraps ``locate`` with all of its parameters specified, so that the function's only argument is the image to be analyzed. We can then map this function directly onto our collection of images. (This is loosely called "currying" a function, hence the choice of name.)

In [6]:
curried_locate = lambda image: tp.locate(image, 13, invert=True)
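The same pattern can also be written with `functools.partial` from the standard library instead of a lambda. The sketch below uses a hypothetical stand-in, `locate_stub`, in place of `tp.locate` so that it runs without trackpy or image data. The toy frames are likewise an assumption for illustration, and only the pattern carries over.

```python
from functools import partial


def locate_stub(image, diameter, invert=False):
    """Hypothetical stand-in for tp.locate: just echoes its arguments."""
    return {"n_pixels": len(image), "diameter": diameter, "invert": invert}


# Fix every parameter except the image, mirroring curried_locate above.
curried_stub = partial(locate_stub, diameter=13, invert=True)

# Toy "frames": flat lists of pixel values (an assumption for illustration).
toy_frames = [[0, 1, 2], [3, 4, 5, 6]]

# The curried function now takes a single argument, so it can be mapped.
results = list(map(curried_stub, toy_frames))
print(results)
```

With trackpy installed, the equivalent would presumably be `partial(tp.locate, diameter=13, invert=True)`, which can be passed to `view.map` in the same way as the lambda.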

In [7]:
view.map(curried_locate, frames[:4])  # Optionally, prime each engine: make it set up numba.

<AsyncMapResult: <lambda>>

Compare the time it takes to locate features in the first 32 images with and without parallelization.

In [8]:
%%timeit
amr = view.map_async(curried_locate, frames[:32])
amr.wait_interactive()
results = amr.get()

  32/32 tasks finished after    0 s
done
817 ms ± 48.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [9]:
%%timeit
serial_result = list(map(curried_locate, frames[:32]))

1.29 s ± 45.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


The speedup is not very impressive in this case because each frame is relatively easy to compute, and parallel processing introduces some overhead (for example, the parent process still has to read each frame and send it to a worker). For more demanding uses of `locate`, where the computation per frame outweighs this overhead, parallelization can speed up processing considerably.
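To put numbers on the comparison, the timings reported above can be turned into a speedup and a parallel efficiency. This is a back-of-the-envelope sketch that uses only the mean times printed by `%%timeit` and assumes the four engines started earlier. Your own timings will differ.

```python
# Mean wall-clock times from the two %%timeit cells above, in seconds.
t_parallel = 0.817
t_serial = 1.29
n_engines = 4  # matches `ipcluster start -n 4`

# How many times faster the parallel run is than the serial one.
speedup = t_serial / t_parallel

# Fraction of the ideal n_engines-fold speedup that was achieved.
efficiency = speedup / n_engines

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

For these timings the parallel run is about 1.58 times faster, which is roughly 39% of the ideal fourfold speedup.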