<img src="http://dask.readthedocs.io/en/latest/_images/dask_horizontal.svg"
     align="right"
     width="30%"
     alt="Dask logo">


# Diagnostics (Local)

Profiling parallel code can be challenging, but dask.diagnostics provides functionality to aid in profiling and inspecting execution with the local task scheduler.

This page describes the following few built-in options:

- ProgressBar
- Profiler
- ResourceProfiler
- CacheProfiler

This page also provides instructions on how to build your own custom diagnostic.

### 1. Progress Bar

The ProgressBar class builds on the scheduler callbacks to display a progress bar in the terminal or notebook during computation. This gives useful feedback during long-running graph execution. It can be used as a context manager around calls to get or compute to track the progress of the computation:

In [1]:
import dask.array as da
from dask.diagnostics import ProgressBar
a = da.random.normal(size=(1000, 1000), chunks=(100, 100))
res = a.dot(a.T).mean(axis=0)

with ProgressBar():
    out = res.compute()


[########################################] | 100% Completed |  2.8s
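The progress bar can also be registered globally instead of wrapping a single call. The following is a minimal sketch, assuming the `register()` and `unregister()` methods that the diagnostics expose and the `out` keyword of ProgressBar. The `out` stream is used here only so that the rendered text can be captured and inspected, and the `double` task is a made-up example.

```python
import io

import dask
from dask.diagnostics import ProgressBar


@dask.delayed
def double(x):
    # Made-up example task, used only for this illustration.
    return 2 * x


# Capture the progress bar text in memory instead of printing to the terminal.
buffer = io.StringIO()
pbar = ProgressBar(out=buffer)

# Register globally: every local-scheduler computation now reports progress.
pbar.register()
try:
    total = dask.delayed(sum)([double(i) for i in range(10)])
    result = total.compute(scheduler="threads")
finally:
    # Always unregister so later computations are not affected.
    pbar.unregister()

progress_text = buffer.getvalue()
```

Omitting `out` sends the bar to standard output, which is the usual choice in a terminal or notebook.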


### 2. Profiler

Dask provides a few tools for profiling execution. As with the ProgressBar, each can be used as a context manager or registered globally.

The Profiler class is used to profile Dask’s execution at the task level. During execution, it records the following information for each task:

- Key
- Task
- Start time in seconds since the epoch
- Finish time in seconds since the epoch
- Worker id
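The fields listed above can be read directly from the records that the Profiler collects. The following sketch assumes the records are namedtuples with the attributes `key`, `task`, `start_time`, `end_time`, and `worker_id`; the `square` task is a made-up example.

```python
import dask
from dask.diagnostics import Profiler


@dask.delayed
def square(x):
    # Made-up example task, used only for this illustration.
    return x * x


total = dask.delayed(sum)([square(i) for i in range(5)])

# Record one entry per executed task while the graph runs.
with Profiler() as prof:
    result = total.compute(scheduler="threads")

# Map each task key to how long that task took, in seconds.
durations = {rec.key: rec.end_time - rec.start_time for rec in prof.results}

for rec in prof.results:
    print(rec.key, rec.worker_id, durations[rec.key])
```

Sorting `durations` by value is a quick way to find the slowest tasks without opening the visualization.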

### 3. ResourceProfiler

The ResourceProfiler class is used to profile Dask’s execution at the resource level. During execution, it records the following information for each timestep:

- Time in seconds since the epoch
- Memory usage in MB
- % CPU usage

The default timestep is 1 second, but can be set manually using the dt keyword:

In [4]:
from dask.diagnostics import ResourceProfiler
rprof = ResourceProfiler(dt=0.5)

### 4. CacheProfiler

The CacheProfiler class is used to profile Dask’s execution at the scheduler cache level. During execution, it records the following information for each task:

- Key
- Task
- Size metric
- Cache entry time in seconds since the epoch
- Cache exit time in seconds since the epoch

Here the size metric is the output of a function called on the result of each task. The default metric is to count each task (metric is 1 for all tasks). Other functions may be used as a metric instead through the metric keyword. For example, the nbytes function found in cachey can be used to measure the number of bytes in the scheduler cache:

In [None]:
from dask.diagnostics import CacheProfiler
from cachey import nbytes
cprof = CacheProfiler(metric=nbytes)
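Any callable that takes a task result and returns a number can serve as the metric, so cachey is not strictly required. The sketch below uses `sys.getsizeof` as a rough stand-in for a size metric. The helper name `approx_size` and the example tasks are hypothetical, and the records are assumed to expose the size metric as a `metric` attribute.

```python
import sys

import dask
from dask.diagnostics import CacheProfiler


def approx_size(value):
    # Hypothetical metric: shallow size in bytes of a task result.
    # It does not follow references, so nested containers are undercounted.
    return sys.getsizeof(value)


@dask.delayed
def make_list(n):
    # Made-up example task that produces results of different sizes.
    return list(range(n))


@dask.delayed
def count(items):
    return len(items)


total = dask.delayed(sum)([count(make_list(n)) for n in (10, 100, 1000)])

# Each task result is passed to approx_size when it enters the cache.
with CacheProfiler(metric=approx_size) as cprof:
    result = total.compute(scheduler="threads")

sizes = [entry.metric for entry in cprof.results]
```

A shallow metric like this is cheap to compute; a deeper one gives more accurate numbers at the cost of extra work after every task.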

### Example
To demonstrate the diagnostics, we’ll profile some linear algebra done with Dask Array.

We’ll create a random array, take its QR decomposition, and then reconstruct the initial array by multiplying the Q and R components together. Note that since the profilers (and all diagnostics) are just context managers, multiple profilers can be used in a with block:

In [2]:
import dask.array as da
from dask.diagnostics import Profiler, ResourceProfiler, CacheProfiler
a = da.random.random(size=(10000, 1000), chunks=(1000, 1000))
# Factor the matrix as q @ r, where q is orthonormal and r is upper-triangular.
q, r = da.linalg.qr(a)
a2 = q.dot(r)

with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
    out = a2.compute()

Each profiler stores its results in its results attribute as a list of namedtuple objects:

<img src="images/profiler.JPG">

In [8]:
rprof.results[0]

ResourceData(time=0.3811103, mem=169.402368, cpu=0.0)

In [None]:
cprof.results[0]

In [3]:
prof.visualize()

<img src="images/profilegraph.JPG">

In [4]:
from dask.diagnostics import visualize
visualize([prof, rprof, cprof])

<img src="images/multiprofiler.JPG">

Looking at the above figure, from top to bottom:

- The results from the Profiler object: This shows the execution time for each task as a rectangle, organized along the y-axis by worker (in this case threads). Similar tasks are grouped by color and, by hovering over each task, one can see the key and task that each block represents.
- The results from the ResourceProfiler object: This shows two lines, one for total CPU percentage used by all the workers, and one for total memory usage.
- The results from the CacheProfiler object: This shows a line for each task group, plotting the sum of the current metric in the cache against time. In this case it’s the default metric (count), and the lines represent the number of each object in the cache over time. Note that the grouping and coloring is the same as for the Profiler plot, and that the task represented by each line can be found by hovering over the line.

From these plots we can see that the initial tasks (calls to numpy.random.random and numpy.linalg.qr for each chunk) are run concurrently, but use only slightly more than 100% CPU. This is because the call to numpy.linalg.qr currently doesn’t release the Global Interpreter Lock (GIL), so those calls can’t truly be done in parallel. Next, there’s a reduction step where all the blocks are combined. This requires all the results from the first step to be held in memory, as shown by the increased number of results in the cache and the increase in memory usage. Immediately after this task ends, the number of elements in the cache decreases, showing that they were only needed for this step. Finally, there’s an interleaved set of calls to dot and sum. The CPU plot shows that these run both concurrently and in parallel, as the CPU percentage spikes up to around 350%.
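The distinction between concurrent and parallel execution can also be checked numerically. The sketch below counts how many task time spans overlap at once, which measures concurrency only; whether the overlapping tasks actually ran in parallel still has to be read from the CPU usage. The `Span` type and the sample values are hypothetical stand-ins for Profiler records, which are assumed to carry `start_time` and `end_time` attributes.

```python
from collections import namedtuple

# Hypothetical stand-in with the two timing fields that Profiler records carry.
Span = namedtuple("Span", ["start_time", "end_time"])


def max_concurrency(records):
    """Return the largest number of records whose time spans overlap."""
    events = []
    for rec in records:
        events.append((rec.start_time, 1))   # a task starts
        events.append((rec.end_time, -1))    # a task finishes
    # Sort by time; at equal times a finish (-1) sorts before a start (+1),
    # so back-to-back tasks are not counted as overlapping.
    events.sort()
    running = 0
    peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Made-up spans: at most two of them are active at the same moment.
sample = [Span(0.0, 2.0), Span(1.0, 3.0), Span(2.0, 4.0), Span(5.0, 6.0)]
peak = max_concurrency(sample)
```

After running the example above, `max_concurrency(prof.results)` would give the peak number of simultaneously running tasks for comparison with the CPU percentage recorded by the ResourceProfiler.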