In [2]:
import numpy as np
from os import system
from importlib import reload
from pygdbmi.gdbcontroller import GdbController
from IPython.display import display, HTML

import util.gdb_nb as gdb_nb
from util.gdb_nb import cmd, show_current_location, get_variable_type, load_array


# Reload changes if needed
reload(gdb_nb)

<module 'util.gdb_nb' from '/workspace/util/gdb_nb.py'>

# The multi-process/remote case

We now want to debug a multi-process program in which many instances run independently, each with its own execution state. A straightforward strategy is to launch one GDB instance per process and inspect each local scope independently. Our case is significantly simpler in this respect, since numerical simulations built on top of MPI usually share the same code base, with a few branches that depend on the current process number (rank) to handle specific tasks such as IO or value aggregation. This way of writing parallel programs relies on synchronization points within the code (`MPI_BARRIER` and other synchronous calls) at which a process waits for the others to reach the same point. Debugging programs like this is significantly easier, since pausing one of the processes does not trigger race conditions, and inspecting its local scope tends to give a representative view of the whole state.

It is also worth noting that debugging a single process programmatically avoids the task of coordinating multiple GDB instances, which can be cumbersome if one is not used to async programming.

Putting things together, our debugging process is as follows:
1. Launch the multi-process inferior with at least one GDB instance attached to a process.
2. Attach to this instance and inspect it programmatically using GDB MI.

Task (1) is in general very hard, especially if one cannot predict when the targeted process will be launched [1]. In our case, the Intel MPI implementation has a `-gtool` option which lets one wrap each process launch with an external tool [2]. Note that, as opposed to debugging a serial program, gdb is **not responsible** for launching the inferior [3]! Moreover, the gdb instance which controls each process is not launched by our ipython instance. To handle this kind of case, GDB exposes a client-server functionality which lets an instance (the server) be controlled remotely by a client, just as in the local serial case.

```
ipython GDB MI <---> GDB client <---> GDB server at port XXXX <---> Rank k of the n-process inferior
```

GDB server lets us set up this client-server communication in two ways: either through a TCP connection listening on a port, or by attaching to a TTY. While the latter is probably more flexible and performant (I have not evaluated this), the former is much easier and works out of the box in cluster scenarios; one just has to find a free port (or port range) on the machine where the server is launched. I will restrict the examples to our interest, which is programmatic debugging from Python. Assuming the inferior executable is at `./my_prog`, launching 4 MPI processes, with the ranks selected in `./gtool_config` each wrapped by a gdbserver instance, is done by:

```bash
mpirun -n 4 -gtoolfile ./gtool_config ./my_prog
```

Where the content of `./gtool_config` is (the number after the last colon is the rank to wrap, here rank 0):

```
gdbserver :8800:0
```

If other gdb servers are to be controlled (which also requires more clients), more lines can be added, one per rank, such as:

```
gdbserver :8800:0
gdbserver :9301:1
gdbserver :1234:2
```
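Since every server needs its own free port, the configuration file can also be generated from Python. The following is a minimal sketch, not part of the original notebook: the helper names `is_port_free`, `gtool_lines` and `write_gtool_config` are hypothetical, and the port check is only meaningful when run on the same host where the servers will be launched.

```python
import socket


def is_port_free(port, host=""):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
    return True


def gtool_lines(rank_to_port):
    """Build one 'gdbserver :<port>:<rank>' line per rank, as in the file above."""
    return [f"gdbserver :{port}:{rank}" for rank, port in sorted(rank_to_port.items())]


def write_gtool_config(path, rank_to_port):
    """Write the gtool file after checking that the ports are distinct and free."""
    ports = list(rank_to_port.values())
    if len(set(ports)) != len(ports):
        raise ValueError("each rank needs its own port")
    busy = [port for port in ports if not is_port_free(port)]
    if busy:
        raise RuntimeError(f"ports already in use: {busy}")
    with open(path, "w") as fh:
        fh.write("\n".join(gtool_lines(rank_to_port)) + "\n")
```

For example, `write_gtool_config("gtool_config", {0: 8800, 1: 9301, 2: 1234})` would reproduce the three-line file shown above. The check is only a snapshot: a port can still be taken by another program between the check and the `mpirun` call.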

We just have to make sure each port is not in use by any other program on the host (server) machine. In the client, instead of loading the inferior executable, one connects to one of the launched servers:

```python
# Instead of this
cmd(gdb, "-file-exec-and-symbols ./my_prog")

# Do this
cmd(gdb, "-target-select remote localhost:8800")
```

Connecting to a remote target takes a little more time than usual, because some debugging information needs to be downloaded from the server. Make sure to connect to every gdbserver instance you have launched: when a gdbserver instance starts, it stops the program at its first instruction and waits for a client to connect.
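When several servers were launched, one client per server is needed. A rough sketch of how this could be automated is given below; `connect_all` is a hypothetical helper (it is not part of `util/gdb_nb.py`), and it assumes that pygdbmi reports a successful `-target-select` as a result record whose message is `connected`. The longer timeout accounts for the slower remote connection.

```python
def connect_all(rank_to_port, host="localhost", timeout_sec=30, make_controller=None):
    """Create one GDB client per rank and connect it to its gdbserver.

    rank_to_port maps an MPI rank to the port its gdbserver listens on.
    Returns a dict mapping each rank to its connected controller.
    """
    if make_controller is None:
        # Imported lazily so the helper can be tried with a stand-in controller.
        from pygdbmi.gdbcontroller import GdbController
        make_controller = GdbController

    controllers = {}
    for rank, port in sorted(rank_to_port.items()):
        gdb = make_controller()
        response = gdb.write(
            f"-target-select remote {host}:{port}", timeout_sec=timeout_sec
        )
        # Assumption: success shows up as a 'result' record with message 'connected'.
        connected = any(
            rec.get("type") == "result" and rec.get("message") == "connected"
            for rec in response
        )
        if not connected:
            gdb.exit()
            raise RuntimeError(f"rank {rank}: could not connect to {host}:{port}")
        controllers[rank] = gdb
    return controllers
```

With the three-line configuration file, `clients = connect_all({0: 8800, 1: 9301, 2: 1234})` would give one controller per rank, e.g. `clients[0]` for rank 0. Each of them has to be closed with `exit()` at the end of the session.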

*Footnotes*
1. In real world programs this can happen in the middle of the execution.
2. For other MPI implementations, such as Open MPI or MPICH, specific tricks are needed to accomplish the same goal.
3. One would encounter the same situation when running the inferior on another machine, such as a cluster node.

## Create controller and set unlimited output

In [3]:
gdb = GdbController()
gdb.write("-gdb-set max-value-size unlimited")
gdb.write("-gdb-set max-composite-size unlimited")
gdb.write("-gdb-set print repeats 0")
gdb.write("-gdb-set print elements 0")
print("Controller loaded")

Controller loaded


## Launch the processes using gtool

Switch to another terminal session and launch the MPI processes with a `gtool` configuration.

```bash
mpirun -n 2 -gtoolfile gtool_config ./build/parallel
```

## Load inferior and setup breakpoints

In [4]:
# cmd() is just a wrapper for gdb.write which does pretty output 
# (see util/gdb_nb.py for details)

#cmd(gdb, "-file-exec-and-symbols ./build/simple")
cmd(gdb, "-target-select remote localhost:8800")

In [6]:
# run until the breakpoint
cmd(gdb, "-break-insert parallel.f90:23")
cmd(gdb, "-exec-continue")
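
In an MPI run the breakpoint may be reached well after `-exec-continue` returns, for instance while the rank waits for the others at a barrier. A possible way to block until the inferior actually stops is sketched below. `wait_for_stop` is a hypothetical helper, not part of `util/gdb_nb.py`; it assumes that pygdbmi reports the `*stopped` event as a `notify` record with message `stopped`, and that the event has not already been consumed by the call that sent `-exec-continue`.

```python
import time


def wait_for_stop(gdb, timeout_sec=60.0, poll_sec=1.0):
    """Poll the controller until a '*stopped' record arrives and return its payload."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        # Return whatever GDB has emitted so far, without raising on silence.
        records = gdb.get_gdb_response(
            timeout_sec=poll_sec, raise_error_on_timeout=False
        )
        for rec in records:
            if rec.get("type") == "notify" and rec.get("message") == "stopped":
                # The payload usually carries the reason, e.g. 'breakpoint-hit'.
                return rec.get("payload") or {}
    raise TimeoutError(f"inferior did not stop within {timeout_sec} s")
```

A call such as `stop = wait_for_stop(gdb)` right after the cell above would then give access to `stop.get("reason")` before inspecting variables.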

## Debug at current breakpoint

In [7]:
show_current_location(gdb)

In [8]:
# get_variable_type parses gdb output to fetch a variable's info
get_variable_type(gdb, "x")

{'type': 'real'}

In [9]:
# load_array uses get_variable_type in the background.
# For some reason it fails with scalars (see the IndexError below)
arr = load_array(gdb, "x")
arr

IndexError: index 1 is out of bounds for axis 0 with size 1
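
Until `load_array` handles scalars, a scalar can be read directly through the GDB MI command `-data-evaluate-expression`. The helper below is only a sketch under stated assumptions: `read_scalar` is a hypothetical name, the variable is assumed to be a numeric scalar whose printed value `float()` can parse, and the reply is assumed to be a `done` result record carrying a `value` field.

```python
def read_scalar(gdb, name, timeout_sec=5):
    """Evaluate a scalar variable in the current frame and return it as a float."""
    response = gdb.write(f"-data-evaluate-expression {name}", timeout_sec=timeout_sec)
    for rec in response:
        if rec.get("type") == "result" and rec.get("message") == "done":
            # GDB returns the value as text, e.g. '1' or '3.1400001'.
            return float(rec["payload"]["value"])
    raise RuntimeError(f"could not evaluate {name!r}")
```

Non-numeric scalars (logicals, complex numbers, strings) would need their own parsing.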

In [8]:
# From program's line 5, we expect this slice to equal 1
arr[:, 0, 0]

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)
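
Because the data arrives as a NumPy array, an expectation like this one can be checked programmatically instead of by eye. The snippet below is a small sketch that uses a stand-in array, since the real one comes from `load_array`; its shape `(10, 3, 3)` is an assumption made only for illustration.

```python
import numpy as np

# Stand-in for the array returned by load_array (hypothetical shape).
arr = np.ones((10, 3, 3), dtype=np.float32)

# Raises AssertionError if any element of the slice differs from 1.
np.testing.assert_allclose(arr[:, 0, 0], 1.0)
slice_is_one = bool(np.all(arr[:, 0, 0] == 1.0))
```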

## Exit (important!)

In [13]:
gdb.exit()