## Structured debugging

Writing code always involves a significant amount of debugging. Treating debugging in a structured way can therefore prevent unnecessary work and frustration.

#### 1. Avoiding bugs from the start

- We all write buggy code. Accept it. Deal with it.
- Write your code with testing and debugging in mind.
- KISS (Keep It Simple, Stupid): What is the simplest thing that could possibly work?
- DRY (Don't Repeat Yourself): Every piece of knowledge must have a single, unambiguous representation within a system.
- Try to limit interdependencies of your code (Loose Coupling).
- Give your variables, functions and modules meaningful names (not single-letter, mathematics-style names).
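As a small illustration of DRY and meaningful names, the sketch below keeps a conversion factor in exactly one place instead of repeating the number throughout the code. All function and constant names are hypothetical and chosen only for this example.

```python
# Hypothetical example: the conversion factor lives in exactly one place (DRY).
SECONDS_PER_MINUTE = 60


def minutes_to_seconds(duration_minutes):
    """Convert a duration from minutes to seconds."""
    return duration_minutes * SECONDS_PER_MINUTE


def total_duration_seconds(durations_minutes):
    """Sum several durations given in minutes and return the total in seconds."""
    return sum(minutes_to_seconds(d) for d in durations_minutes)


print(total_duration_seconds([1, 2.5]))  # 210.0
```

If the factor ever has to change, only `SECONDS_PER_MINUTE` needs to be touched, and the names make the intent readable without comments.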

#### 2. Debugging workflow

For non-trivial bugs in a larger system:
1. Make it fail reliably. Find a test case that makes the code fail every time.
2. Divide and Conquer. Once you have a failing test case, isolate the failing code:
    - Which module.
    - Which function.
    - Which line of code.
3. Change one thing at a time and re-run the failing test case.
4. Use the debugger to understand what is going wrong.
5. Take notes and be patient. It may take a while.
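
Step 1 can be as small as a single test function that reproduces the failure on every run. The sketch below reuses the `add_strings` function from the `pdb` example in the next section. The test name is hypothetical, and plain `try`/`except` is used here instead of a particular test framework.

```python
def add_strings(x, s):
    return str(x) + s


def test_add_strings_rejects_float_suffix():
    """Reproduce the bug deterministically: a float as second argument fails."""
    try:
        add_strings('s', 2.3)
    except TypeError as error:
        # Return the message so it can be written down in the debugging notes.
        return str(error)
    raise AssertionError("expected a TypeError, but the call succeeded")


message = test_add_strings_rejects_float_suffix()
print(message)
```

Once the bug is fixed, the same test can be inverted to assert the correct result, so the failure cannot silently return.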

#### 3. Using the Python debugger

An efficient way to debug is to use tools that are specifically designed for it. One example is the built-in Python debugger [`pdb`](https://docs.python.org/library/pdb.html).

Debugger commands are not Python statements. Some examples:

| Command | Description |
| :--- | :--- |
| `list` | List the source code around the current position |
| `up` | Move one frame up the call stack |
| `down` | Move one frame down the call stack |
| `bt` | Print the call stack (backtrace) |
| `a` | Print the arguments of the current function |
| `!statement` | Execute a Python statement in the context of the current frame |
| `q` | Quit the debugger |

Invoking `pdb` post-mortem with the `%debug` magic after an exception:

In [1]:
def add_strings(x, s):
    return str(x) + s

a = 's'
b = 2.3
c = add_strings(a, b)

d = add_strings(2.3, 's')

TypeError: can only concatenate str (not "float") to str

In [2]:
%debug

> <ipython-input-1-19b4066e23d7>(2)add_strings()
      1 def add_strings(x, s):
----> 2     return str(x) + s
      3 
      4 a = 's'
      5 b = 2.3

ipdb> list
      1 def add_strings(x, s):
----> 2     return str(x) + s
      3 
      4 a = 's'

Invoking the debugger by setting the `-d` flag upon script execution:
```ipython
%run script.py
%run -d script.py
```
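
Outside IPython, a script can be run under the debugger with `python -m pdb script.py`, and a post-mortem session can be started from code with `pdb.post_mortem` from the standard library. The following is a minimal sketch: the helper `run_with_post_mortem` and its `interactive` flag are hypothetical names for this example, and the debugger only opens when `interactive=True`.

```python
import pdb
import sys
import traceback


def add_strings(x, s):
    return str(x) + s


def run_with_post_mortem(func, *args, interactive=False):
    """Call func(*args); on failure print the traceback and optionally open pdb."""
    try:
        return func(*args)
    except Exception:
        traceback.print_exc()
        if interactive:
            # Opens the debugger in the frame where the exception was raised.
            pdb.post_mortem(sys.exc_info()[2])
        return None


# The failing call from the example above; no debugger is opened here.
result = run_with_post_mortem(add_strings, 's', 2.3)
```

Calling the helper with `interactive=True` drops into the same kind of prompt as `%debug`, where the commands from the table above apply.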

## Optimizing code

Optimizing code is only relevant for problems that require many loop iterations or deal with large data samples. Premature optimization is considered bad style because it often goes against the KISS principle. Code should only be optimized if it already works correctly and runtime is the remaining issue.

#### 1. Optimization workflow
- Make it work: write the code in simple, legible ways (KISS).
- Make it work reliably: write automated test cases, make really sure that your algorithm is right and that if you break it, the test will capture the breakage.
- Optimize the code by **profiling** simple use-cases to find the bottlenecks and speeding up these bottlenecks by finding a better algorithm or implementation.
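
The second step of this workflow can be sketched as a test that compares a candidate optimization against the simple reference version. The function names below are hypothetical, and plain Python lists are used so that the example has no dependencies.

```python
def apply_mask_simple(values, mask):
    """Reference version: explicit loop, easy to read and verify."""
    result = []
    for value, keep in zip(values, mask):
        result.append(value * keep)
    return result


def apply_mask_fast(values, mask):
    """Candidate optimization: the same computation as a list comprehension."""
    return [value * keep for value, keep in zip(values, mask)]


def test_fast_matches_simple():
    """If the optimization breaks the algorithm, this comparison fails."""
    values = [1.0, 2.0, 3.0, 4.0]
    mask = [0, 1, 0, 1]
    assert apply_mask_fast(values, mask) == apply_mask_simple(values, mask)


test_fast_matches_simple()
```

Keeping the simple version around as a reference makes it cheap to check every later optimization against known-good behavior.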

#### 2. Profiling by time measurements: `timeit`

Directly measuring time between code snippets:

In [3]:
import numpy as np

n = 256

arr = np.ones((20, n, n))
mask = np.zeros((n, n))
res = arr.copy()

for i in range(20):
    for x in range(n):
        for y in range(n):
            res[i, x, y] = arr[i, x, y] * mask[x, y]

In [4]:
from timeit import default_timer as dt
import numpy as np

n = 256

t0 = dt()
arr = np.ones((20, n, n))
mask = np.zeros((n, n))
res = arr.copy()
t1 = dt()

t2 = dt()
for i in range(20):
    for x in range(n):
        for y in range(n):
            res[i, x, y] = arr[i, x, y] * mask[x, y]
t3 = dt()

profile0 = t1 - t0
profile1 = t3 - t2
print("Profiling: array creation took %.4f s, mask application took %.4f s" %
     (profile0, profile1))

Profiling: array creation took 0.0080 s, mask application took 0.7851 s


In [5]:
from timeit import default_timer as dt
import numpy as np

n = 256

t0 = dt()
arr = np.ones((20, n, n))
mask = np.zeros((n, n))
t1 = dt()

t2 = dt()
arr *= mask
t3 = dt()

profile0 = t1 - t0
profile1 = t3 - t2
print("Profiling: array creation took %.4f s, mask application took %.4f s" %
     (profile0, profile1))

Profiling: array creation took 0.0048 s, mask application took 0.0018 s


Measuring runtime statistically with the `%timeit` magic, which repeats the call many times and reports mean and standard deviation:

In [6]:
import numpy as np


def implementation_1(n):    
    arr = np.ones((20, n, n))
    mask = np.zeros((n, n))
    res = arr.copy()
    
    for i in range(20):
        for x in range(n):
            for y in range(n):
                res[i, x, y] = arr[i, x, y] * mask[x, y]
    
    return res


def implementation_2(n):
    arr = np.ones((20, n, n))
    mask = np.zeros((n, n))
    
    arr *= mask
    
    return arr

In [7]:
%timeit implementation_1(8)

607 µs ± 7.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [8]:
%timeit implementation_2(8)

4.73 µs ± 28.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
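
Outside IPython, comparable measurements can be taken with the standard-library `timeit` module. The sketch below uses two hypothetical functions, not the array implementations above, so that it runs without NumPy. The absolute numbers depend on the machine.

```python
import timeit


def sum_with_loop(n):
    """Add the numbers 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_with_builtin(n):
    """Add the numbers 0..n-1 with the built-in sum."""
    return sum(range(n))


# 5 repetitions of 1000 calls each; the minimum is the least disturbed run.
for func in (sum_with_loop, sum_with_builtin):
    timings = timeit.repeat(lambda: func(1000), repeat=5, number=1000)
    print(f"{func.__name__}: {min(timings) / 1000 * 1e6:.2f} us per call")
```

`timeit.repeat` returns one total time per repetition, so each value is divided by `number` to get the time per call.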


Timing and profiling by setting a flag upon script execution (`-t` prints timing information, `-p` runs the script under the Python profiler):
```ipython
%run script.py
%run -t script.py
%run -p script.py
```
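
The profiler can also be used directly from Python through the standard-library modules `cProfile` and `pstats`. The following is a minimal sketch; `slow_square_sum` is a hypothetical function standing in for the code to be profiled.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Hypothetical workload: sum of squares computed in pure Python."""
    return sum(i * i for i in range(n))


# Record every function call made between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
result = slow_square_sum(100_000)
profiler.disable()

# Collect the five most expensive entries by cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and times per function, which points to the bottleneck that is worth optimizing first.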

## Implementing parallelization with `dask`

For some problems, algorithms are as efficient as Python allows but still take a while to execute. Although these problems might not be reducible, they might be separable. This is where parallelization is the final option to speed up execution. [`dask`](https://docs.dask.org/en/latest/) offers convenient, high-level implementations of parallelization and supports numpy arrays and pandas dataframes in a simple fashion.
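
What "separable" means can be sketched with the standard library alone: the data is cut into independent chunks, each chunk is processed on its own, and the partial results are combined afterwards. The example below uses `concurrent.futures` as a stand-in and does not show how `dask` works internally. The helper names are hypothetical, and threads are used only to show the structure, since they do not speed up CPU-bound pure-Python code.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sum(chunk):
    """Work on one independent piece of the data."""
    return sum(chunk)


def split_into_chunks(data, chunk_size):
    """Cut the data into consecutive pieces of at most chunk_size items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


data = list(range(1_000))
chunks = split_into_chunks(data, chunk_size=250)

# Each chunk is processed independently; partial results are combined afterwards.
with ThreadPoolExecutor(max_workers=2) as executor:
    partial_sums = list(executor.map(chunk_sum, chunks))

total = sum(partial_sums)
print(total)  # 499500
```

The chunked dask arrays and dataframes in the following cells follow the same split, compute, combine pattern, with the scheduling handled by the library.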

#### 0. Setup of CPU and memory usage

In [9]:
from dask.distributed import Client, progress

client = Client(processes=False, threads_per_worker=2, n_workers=2, 
                memory_limit='2GB')
client

| Client | Cluster |
| :--- | :--- |
| Scheduler: inproc://192.168.0.227/15155/1 | Workers: 2 |
| Dashboard: http://192.168.0.227:8787/status | Cores: 4 |
|  | Memory: 4.00 GB |


#### 1. Using dask methods to initialize parallel computation

In [10]:
import dask.array as da
x = da.random.random((10000, 10000), chunks=(1000, 1000))
x

|  | Array | Chunk |
| :--- | :--- | :--- |
| Bytes | 800.00 MB | 8.00 MB |
| Shape | (10000, 10000) | (1000, 1000) |
| Count | 100 Tasks | 100 Chunks |
| Type | float64 | numpy.ndarray |


In [11]:
y = x + x.T
z = y[::2, 5000:].mean(axis=1)
z

|  | Array | Chunk |
| :--- | :--- | :--- |
| Bytes | 40.00 kB | 4.00 kB |
| Shape | (5000,) | (500,) |
| Count | 430 Tasks | 10 Chunks |
| Type | float64 | numpy.ndarray |


In [12]:
z = z.compute()
z

array([0.99870379, 0.99977211, 0.99483732, ..., 0.996346  , 0.99309488,
       1.0088732 ])

#### 2. Parallelized computation of a numpy array

In [13]:
%reset -f

In [14]:
import numpy as np
import dask.array as da

arr = np.ones((20, 1028, 1028))

da_arr = da.array(arr)
da_arr

|  | Array | Chunk |
| :--- | :--- | :--- |
| Bytes | 169.09 MB | 42.27 MB |
| Shape | (20, 1028, 1028) | (20, 514, 514) |
| Count | 5 Tasks | 4 Chunks |
| Type | float64 | numpy.ndarray |


In [15]:
da_arr = da_arr.rechunk((1, 1028, 1028))
da_arr

|  | Array | Chunk |
| :--- | :--- | :--- |
| Bytes | 169.09 MB | 8.45 MB |
| Shape | (20, 1028, 1028) | (1, 1028, 1028) |
| Count | 105 Tasks | 20 Chunks |
| Type | float64 | numpy.ndarray |


In [16]:
mask = np.zeros((1028, 1028))

res = da_arr * mask
res

|  | Array | Chunk |
| :--- | :--- | :--- |
| Bytes | 169.09 MB | 8.45 MB |
| Shape | (20, 1028, 1028) | (1, 1028, 1028) |
| Count | 126 Tasks | 20 Chunks |
| Type | float64 | numpy.ndarray |


In [17]:
res[0, 0, 0]

|  | Array | Chunk |
| :--- | :--- | :--- |
| Bytes | 8 B | 8 B |
| Shape | () | () |
| Count | 127 Tasks | 1 Chunks |
| Type | float64 | numpy.ndarray |


In [18]:
res = res.compute()

In [19]:
res[0, 0, 0]

0.0

#### 3. Parallelized computation of a pandas dataframe

In [20]:
%reset -f

In [21]:
import numpy as np
import pandas as pd
import dask.dataframe as dd

n = 501
t = np.linspace(0, 1, n)  # time-axis
data = np.array([t, np.sin(t), np.cos(t)]).T  # (rows, columns) = (n, 3)
labels = ['time', 'sin', 'cos']

signal = pd.DataFrame(data, columns=labels)
signal

|  | time | sin | cos |
| :--- | :--- | :--- | :--- |
| 0 | 0.000 | 0.000000 | 1.000000 |
| 1 | 0.002 | 0.002000 | 0.999998 |
| 2 | 0.004 | 0.004000 | 0.999992 |
| 3 | 0.006 | 0.006000 | 0.999982 |
| 4 | 0.008 | 0.008000 | 0.999968 |
| ... | ... | ... | ... |
| 496 | 0.992 | 0.837122 | 0.547017 |
| 497 | 0.994 | 0.838214 | 0.545341 |
| 498 | 0.996 | 0.839303 | 0.543664 |
| 499 | 0.998 | 0.840389 | 0.541984 |


In [22]:
dd_signal = dd.from_pandas(signal, chunksize=125)
dd_signal

| npartitions=4 | time | sin | cos |
| :--- | :--- | :--- | :--- |
| 0 | float64 | float64 | float64 |
| 125 | ... | ... | ... |
| 250 | ... | ... | ... |
| 375 | ... | ... | ... |
| 500 | ... | ... | ... |


In [23]:
y = dd_signal.mean()
y

Dask Series Structure:
npartitions=1
cos     float64
time        ...
dtype: float64
Dask Name: dataframe-mean, 15 tasks

In [24]:
y.compute()

time    0.500000
sin     0.459620
cos     0.841328
dtype: float64