# Debugging PyCUDA Kernels with cuda-gdb
To debug a program, `gdb` must attach to its process. This is usually achieved by passing the executable as an argument, such as:
```bash
nvcc -g -G kernel.cu
cuda-gdb a.out
```
But PyCUDA uses JIT (just-in-time) compilation to run CUDA code, so how do we debug kernels when there is no standalone binary to pass to the debugger?


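- **Step 1**: Set debugging flags.

    We must compile with debugging enabled by passing the `-g -G` flags to the NVIDIA compiler, `nvcc`
    (`-g` generates debug information for host code, `-G` for device code).
    This is achieved by running
    `export PYCUDA_DEFAULT_NVCC_FLAGS="-g -G"`
    in the shell session that will launch this notebook, before launching it.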


- **Step 2**: Attach to the correct process.

    After the debugging environment variable has been set and the IPython notebook has been launched,
    attach to the Python kernel with the following command:
    ```
    cuda-gdb -p $(pgrep -f ipykernel)
    ```
    
    Here the command-line tool `pgrep` returns the PID of the notebook session.
    This is equivalent to running
    ```
    cuda-gdb --pid $(ps aux | grep '[i]pykernel' | awk '{print $2}')
    ```
    Note: this can fail if more than one IPython kernel is running, since `pgrep` then returns several PIDs.
    In that case, copy and paste the PID of the desired process. The same technique can be used to attach
    to a plain Python process, e.g. `cuda-gdb -p $(pgrep -f python)`.

- **Step 3**: Set desired breakpoints.

    When `cuda-gdb` attaches to the IPython kernel it pauses the process, so that the desired breakpoints
    can be set. Breakpoints are set in the usual gdb way, e.g.
    ```
    break cuda_kernel_name
    ```
    Then type `continue` to resume the IPython process. See the [cuda-gdb docs](https://docs.nvidia.com/cuda/cuda-gdb/index.html) for more.
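
When several kernels are running, one way to find the right PID is to ask the notebook itself: `os.getpid()`, run in a cell, returns the PID of the kernel executing that cell. The following is a minimal sketch; `attach_command` is a hypothetical helper name introduced here for illustration, not part of PyCUDA or cuda-gdb.

```python
import os


def attach_command(pid=None):
    """Build the cuda-gdb command line that attaches to `pid`.

    Hypothetical helper: when no PID is given, use the PID of the
    current process, i.e. the kernel running this cell.
    """
    if pid is None:
        pid = os.getpid()
    return f"cuda-gdb -p {pid}"


# Print the command to paste into a terminal.
print(attach_command())
```

Run this in the notebook you want to debug, then paste the printed command into a terminal on the same machine.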


In [6]:
%matplotlib inline

# ensure that any edits to libraries
# are reloaded automatically
%reload_ext autoreload
%autoreload 2

In [7]:
# make cuda_slic importable from the notebook directory
import sys
sys.path.append("./..")

In [8]:
# The %set_env magic below does not work for some reason.
# This environment variable must be set in the same bash session
# that the Jupyter notebook is launched from.
#%set_env PYCUDA_DEFAULT_NVCC_FLAGS="-g -G"
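
An alternative that may be worth trying is to set the variable through `os.environ` before anything imports PyCUDA. This is only a sketch and has not been verified against this notebook. It assumes that PyCUDA reads `PYCUDA_DEFAULT_NVCC_FLAGS` when `pycuda.compiler` is first imported, and that one possible reason the `%set_env` line fails is that the magic stores the surrounding double quotes as part of the value. `enable_nvcc_debug_flags` is a hypothetical helper name.

```python
import os
import sys


def enable_nvcc_debug_flags(flags="-g -G"):
    """Set PYCUDA_DEFAULT_NVCC_FLAGS for this process (hypothetical helper).

    Returns False if pycuda.compiler has already been imported, in which
    case the new value is probably too late to take effect (assumption).
    """
    # Store the bare flags, without surrounding quotes.
    os.environ["PYCUDA_DEFAULT_NVCC_FLAGS"] = flags
    return "pycuda.compiler" not in sys.modules


if not enable_nvcc_debug_flags():
    print("pycuda.compiler is already imported; restart the kernel and run this cell first")
```

If this does not take effect, exporting the variable in the shell before launching the notebook remains the approach known to work here.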

In [9]:
from cuda_slic.slic import slic3d as cuda_slic

In [10]:
import numpy as np
from skimage import data, color, filters, segmentation
from skimage.util import img_as_float32, img_as_float64
from skimage.segmentation import slic as sk_slic
from skimage.segmentation import mark_boundaries

import matplotlib.pyplot as plt
from ipywidgets import interact

Using `cuda-gdb`, set a breakpoint such as `break init_clusters`, then run the following code block.

In [12]:
# initializing a 3 dim dataset

blob = data.binary_blobs(length=20, n_dim=3, seed=2)
blob = np.float32(blob)

cuda_labels = cuda_slic(blob, n_segments=4, compactness=.5)


  return _merge_small3d(data, labels, labels.max()+1, min_size)


IndexError: too many indices for array

Let's explore the segmentation of the whole 3D array.