# Parallelization

In [1]:
# IPython.parallel was moved to the separate ipyparallel package in IPython 4.0.
from ipyparallel import Client
client = Client()
view = client.load_balanced_view()

Indexing the client lists the available engines; here there are four.

In [2]:
client[:]

<DirectView [0, 1, 2, 3]>

Use the ``%%px`` cell magic to import trackpy on all engines.

In [3]:
%%px
import trackpy as tp

Now do the normal setup: import trackpy in the local session and load the frames to analyze.

In [4]:
import pandas as pd  # needed for pd.concat in the cells below
import trackpy as tp

def gray(image):
    return image[:, :, 0]

images = tp.ImageSequence('../sample_data/bulk_water/*.png', process_func=gray)

Define a function from ``locate`` with all the parameters specified, so that its only argument is the image to be analyzed. We can then map this function directly onto our collection of images. (This is loosely called "currying" a function, hence the choice of name; strictly speaking, it is partial application.)

In [5]:
curried_locate = lambda image: tp.locate(image, 9, invert=True)
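
The same kind of single-argument wrapper can also be built with ``functools.partial`` from the standard library. The sketch below is only an illustration: ``fake_locate`` is a hypothetical stand-in for ``tp.locate`` that echoes its arguments, so the example runs without trackpy or any image data, and the frame names are made up.

```python
from functools import partial

# Hypothetical stand-in for tp.locate: it only echoes its arguments,
# so this sketch runs without trackpy or any image data.
def fake_locate(image, diameter, invert=False):
    return {"image": image, "diameter": diameter, "invert": invert}

# Fix every argument except the image, as the lambda above does.
locate_partial = partial(fake_locate, diameter=9, invert=True)
locate_lambda = lambda image: fake_locate(image, 9, invert=True)

# Both wrappers take a single argument and can be mapped over a sequence.
frames = ["frame0", "frame1"]  # placeholder frame names
results_partial = list(map(locate_partial, frames))
results_lambda = list(map(locate_lambda, frames))
```

Both wrappers produce the same results here; which one to use is largely a matter of taste.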

Compare the time it takes to locate features in the first ten images with and without parallelization.

In [6]:
%%timeit
features_normal = pd.concat(map(curried_locate, images[:10]))

1 loops, best of 3: 3.59 s per loop


In [7]:
%%timeit
results = view.map(curried_locate, images[:10])
features_parallel = pd.concat(list(results))

1 loops, best of 3: 1.91 s per loop
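
To put the two timings in perspective, the short calculation below turns them into a speedup and a parallel efficiency. It assumes the four engines shown in the ``DirectView`` earlier and uses only the numbers reported by ``%%timeit`` above; timings on another machine will differ.

```python
# Timings reported by %%timeit above, in seconds.
serial_time = 3.59
parallel_time = 1.91
n_engines = 4  # engines listed in the DirectView

speedup = serial_time / parallel_time  # how many times faster the parallel run is
efficiency = speedup / n_engines       # fraction of the ideal linear speedup

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

The speedup is a little under 2x on four engines, so less than half of the ideal linear speedup is realized for this ten-frame batch. Overhead such as sending images to the engines and collecting the results is a likely contributor, though it was not measured here.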


Verify that the results match to high precision.

In [8]:
from numpy.testing import assert_allclose
# We can't check for exact equality because
# the numbers are floating-point numbers.

features_normal = pd.concat(map(curried_locate, images[:10]))

results = view.map_async(curried_locate, images[:10])
features_parallel = pd.concat(list(results))

assert_allclose(features_normal.values, features_parallel.values)
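
As a general illustration of why a tolerance-based check is preferred over exact equality for floating-point values, here is a minimal sketch using only the standard library. ``math.isclose`` serves as a stand-in for ``assert_allclose``; the relative tolerance of ``1e-7`` matches the default ``rtol`` of ``assert_allclose``, and the values are arbitrary examples unrelated to trackpy.

```python
import math

a = 0.1 + 0.2
b = 0.3

# Exact comparison fails because a and b differ in their last bits.
exactly_equal = (a == b)

# Comparison within a relative tolerance succeeds.
close_enough = math.isclose(a, b, rel_tol=1e-7)

print(exactly_equal, close_enough)
```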