debugging: adding a script and instructions for debugging the GO shim #9585

Merged May 14, 2024 (1 commit)
185 changes: 185 additions & 0 deletions docs/Debug-shim-guide.md
@@ -0,0 +1,185 @@
# Using a debugger with the runtime

Setting up a debugger for the runtime is fairly complex: the shim is a server
process that is started by the runtime manager (containerd/CRI-O) and controlled
by gRPC requests sent to it.
Starting the shim under a debugger yourself just gives you a process that waits
for commands on its socket: since the runtime manager didn't start it, it will
never send requests to it.

The first method is to attach a debugger to the process started by the runtime
manager.
If the issue you're trying to debug does not happen at container creation, this
is probably the easiest method.

The other method places a script between the runtime manager and the actual
shim binary. The script starts the shim under a debugger that waits for a client
connection before running it, which lets you debug the Kata runtime from the
very beginning.

## Prerequisites

At the time of writing, a debugger has only been used with the Go shim, but a
similar process should be doable with runtime-rs. This documentation will be
enhanced with Rust-specific instructions later on.

In order to debug the go runtime, you need to use the [Delve debugger](https://github.com/go-delve/delve).

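You will also need to build the shim binary with flags that disable optimizations
and inlining, so that the debugger has accurate symbol, line and variable information.
Typically, the flags should be: `-gcflags="all=-N -l"` (the quotes are needed
because the value contains a space).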
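A minimal sketch of such a build follows, assuming the kata-containers repository layout in which the Go shim's `main` package lives in `src/runtime/cmd/containerd-shim-kata-v2` (an assumption; check your checkout). The hypothetical `build_debug_shim` helper calls `go build` directly and names the output `__debug_bin`, the name used by the debug script and by the `follow-exec` filter in this guide. Because it bypasses the project's Makefile, any linker flags the Makefile normally sets (version information, for instance) are not applied.

```bash
# Hypothetical helper, not part of Kata: build the Go shim with optimizations
# (-N) and inlining (-l) disabled, so Delve can map code to source lines.
#   $1: path to a kata-containers checkout (assumed layout, see above)
#   $2: optional output path; relative paths are resolved from src/runtime
#       (defaults to src/runtime/__debug_bin)
build_debug_shim() {
    local repo=$1
    local out=${2:-__debug_bin}

    # Fail early if the checkout does not have the expected layout.
    [ -d "$repo/src/runtime" ] || return 1

    # "all=" applies the flags to every package, not only the main one.
    # The value is quoted because it contains a space.
    (cd "$repo/src/runtime" &&
        go build -gcflags="all=-N -l" -o "$out" ./cmd/containerd-shim-kata-v2)
}
```

For example, `build_debug_shim /go/src/github.com/kata-containers/kata-containers` would produce the binary that the script's default `SHIM_BINARY` value points to.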

## Attach to the running process

To attach the debugger to the running process, let the container start as usual,
then run the following `dlv` command:

`$ dlv attach [pid of your kata shim]`

If you need to use your debugger remotely, you can use the following on your target
machine:

`$ dlv attach [pid of your kata shim] --headless --listen=[IP:port]`

then from your client computer:

`$ dlv connect [IP:port]`
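To find the PID to attach to, you can look the shim up by its sandbox ID. The sketch below assumes the shim runs under its usual name, `containerd-shim-kata-v2`, and that its command line contains `-id <sandbox id>` (as in the `ps` output shown later in this document); `find_shim_pid` is a hypothetical helper built on `pgrep`, not part of Kata.

```bash
# Hypothetical helper: print the PID(s) of the Kata shim serving a sandbox.
# Assumes the shim binary is named containerd-shim-kata-v2 and that it was
# started with "-id <sandbox id>" on its command line.
find_shim_pid() {
    local sandbox_id=$1
    [ -n "$sandbox_id" ] || return 1
    # -f matches the pattern (an extended regex) against the full command
    # line. The trailing "( |$)" avoids matching a longer ID with the same prefix.
    pgrep -f -- "containerd-shim-kata-v2 .*-id ${sandbox_id}( |\$)"
}
```

For example, `sudo dlv attach "$(find_shim_pid "$sandbox_id")"`, with the sandbox ID taken from `crictl pods`. `pgrep` exits with a non-zero status when nothing matches, so the helper fails cleanly if the shim is not running.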

## Make CRI-O/containerd start the shim with the debugger

You can use [this script](../tools/containerd-shim-katadbg-v2) to run the shim
binary through a debugger, and to make the debugger wait for a client connection
before running the shim.
This allows starting your container, connecting your debugger, and controlling the
shim execution from the beginning.

### Adapt the script to your setup

You need to edit the script itself to give it the actual binary
to execute.
Locate the following line in the script, and set the path accordingly.

```bash
SHIM_BINARY=
```

You may also need to edit the `PATH` variable set within the script,
to make sure that the `dlv` binary is accessible.
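If you rebuild often or keep the binary in a different location, the edit can be scripted. The sketch below is a hypothetical `set_shim_binary` helper that rewrites the `SHIM_BINARY=` line in place; it assumes GNU `sed` (on BSD/macOS, `sed -i` needs an explicit suffix argument) and a path that contains neither `|` nor `&`, which `sed` would interpret.

```bash
# Hypothetical helper: point the debug wrapper at your unoptimized shim build.
#   $1: path to the wrapper script (e.g. tools/containerd-shim-katadbg-v2)
#   $2: absolute path to the shim binary built with -gcflags="all=-N -l"
set_shim_binary() {
    local script=$1 binary=$2
    # Both arguments are required, and the script must exist.
    if [ ! -f "$script" ] || [ -z "$binary" ]; then
        return 1
    fi
    # Replace the whole "SHIM_BINARY=..." line. "|" is the sed delimiter so
    # the slashes in the path need no escaping.
    sed -i "s|^SHIM_BINARY=.*|SHIM_BINARY=${binary}|" "$script"
}
```

For example: `set_shim_binary tools/containerd-shim-katadbg-v2 /path/to/__debug_bin`. Only the `SHIM_BINARY` line changes; `DLV_PORT` and the rest of the script are left untouched.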

### Configure your runtime manager to use the script

Using either containerd or CRI-O, you will need to have a runtime class that
uses the script in place of the actual runtime binary.
To do that, we will create a separate runtime class dedicated to debugging.

- **For containerd**:
Make sure that the `containerd-shim-katadbg-v2` script is available to containerd
(typically by putting it in the same folder as your regular Kata shim).
Then edit the containerd configuration and add the following runtime configuration:

```toml
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.katadbg]
runtime_type = "io.containerd.katadbg.v2"
```

- **For CRI-O**:
Copy your existing Kata runtime configuration from `/etc/crio/crio.conf.d/`,
name the new runtime `katadbg`, and set its `runtime_path` to the location
of the script.

For example:

```toml
[crio.runtime.runtimes.katadbg]
runtime_path = "/usr/local/bin/containerd-shim-katadbg-v2"
runtime_root = "/run/vc"
runtime_type = "vm"
privileged_without_host_devices = true
runtime_config_path = "/usr/share/defaults/kata-containers/configuration.toml"
```

NOTE: for CRI-O, the name of the runtime class doesn't need to match the name of the
script. But for consistency, we're using `katadbg` here too.
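For containerd, on the other hand, the `runtime_type` value is what ties the configuration entry to the script's file name. As we understand containerd's runtime v2 naming convention, the shim binary name is built from the last two dot-separated parts of `runtime_type`, so `io.containerd.katadbg.v2` resolves to `containerd-shim-katadbg-v2`. The hypothetical helper below reproduces that mapping, which can help double-check a runtime name before restarting containerd.

```bash
# Hypothetical helper: map a containerd runtime_type to the shim binary name,
# following the "containerd-shim-<name>-<version>" convention.
# Example values in the comments are for "io.containerd.katadbg.v2".
shim_binary_name() {
    local runtime_type=$1
    local version=${runtime_type##*.}          # "v2"
    local without_version=${runtime_type%.*}   # "io.containerd.katadbg"
    local name=${without_version##*.}          # "katadbg"
    # Require at least "<name>.<version>", both parts non-empty.
    if [ "$without_version" = "$runtime_type" ] || [ -z "$name" ] || [ -z "$version" ]; then
        return 1
    fi
    printf 'containerd-shim-%s-%s\n' "$name" "$version"
}
```

For example, `command -v "$(shim_binary_name io.containerd.katadbg.v2)"` shows whether the script is on your shell's `PATH`. containerd's own `PATH` (for instance when it runs as a systemd service) may differ from your shell's.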

### Start your container and connect to the debugger

Once the above configuration is in place, you can start your container, using
your `katadbg` runtime class.

For example: `$ crictl runp --runtime=katadbg sandbox.json`

The command will hang, and you can see that a `dlv` process has been started:

```
$ ps aux | grep dlv
root 9137 1.4 6.8 6231104 273980 pts/10 Sl 15:04 0:02 dlv exec /go/src/github.com/kata-containers/kata-containers/src/runtime/__debug_bin --headless --listen=:12345 --accept-multiclient -r stdout:/tmp/shim_output_oMC6Jo -r stderr:/tmp/shim_output_oMC6Jo -- -namespace default -address -publish-binary /usr/local/bin/crio -id 0bc23d2208d4ff8c407a80cd5635610e772cae36c73d512824490ef671be9293 -debug start
```

Then you can use the `dlv` debugger to connect to it:

```
$ dlv connect localhost:12345
Type 'help' for list of commands.
(dlv)
```
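If you automate this step, `dlv connect` can fail when it runs before the headless server is listening. A minimal sketch, assuming bash's `/dev/tcp` redirection support (a bash feature that some builds disable) and the port 12345 set in the script, polls the port before connecting; `wait_for_port` is a hypothetical helper. The probe opens and immediately closes a connection, which should be harmless here because the script starts `dlv` with `--accept-multiclient`.

```bash
# Hypothetical helper: wait until something accepts TCP connections on
# host:port, trying once per second for at most $3 seconds (default 30).
wait_for_port() {
    local host=$1 port=$2 timeout=${3:-30} i
    for ((i = 0; i < timeout; i++)); do
        # Redirecting to /dev/tcp/host/port makes bash open a TCP connection;
        # the subshell closes it again as soon as it exits.
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}
```

For example: `wait_for_port localhost 12345 60 && dlv connect localhost:12345`.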

Before doing anything else, you need to enable `follow-exec` mode in Delve.
This is because the first thing the shim does is daemonize itself, i.e. start a
copy of itself as a subprocess and exit. You want the debugger to attach to that
child process.

```
(dlv) target follow-exec -on .*/__debug_bin
```

Note that we are providing a regular expression to filter on the name of the binary.
This makes sure that the debugger attaches to the runtime shim, and not
to other subprocesses (typically the hypervisor).

To ease this process, we recommend using an init file containing the above
command.

```
$ cat dlv.ini
target follow-exec -on .*/__debug_bin
$ dlv connect localhost:12345 --init=dlv.ini
Type 'help' for list of commands.
(dlv)
```
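As far as we can tell from Delve's documentation for `target follow-exec`, the regular expression is matched against the command line of each new child process. The sketch below approximates that check with bash's `=~` operator, which uses POSIX extended regular expressions rather than Go's regexp syntax (the two agree for a pattern this simple), so you can try a pattern before typing it into `dlv`. `would_follow` is a hypothetical helper.

```bash
# Hypothetical helper approximating Delve's follow-exec filter with bash's
# =~ operator (POSIX ERE). Returns 0 if a child process with the given
# command line would be followed, 1 otherwise.
FOLLOW_EXEC_RE='.*/__debug_bin'

would_follow() {
    # Unquoted right-hand side: bash treats it as a regex, not a literal string.
    [[ $1 =~ $FOLLOW_EXEC_RE ]]
}
```

For example, with illustrative command lines, `would_follow "/go/src/github.com/kata-containers/kata-containers/src/runtime/__debug_bin -namespace default"` succeeds, while `would_follow /usr/bin/qemu-system-x86_64` fails.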

Once this is done, you can set breakpoints, and use the `continue` command to
start the execution of the shim.

You can also use a different client, like VSCode, to connect to it.
A typical `launch.json` configuration for VSCode would look like:

```json
[...]
{
    "name": "Connect to the debugger",
    "type": "go",
    "request": "attach",
    "mode": "remote",
    "port": 12345,
    "host": "127.0.0.1"
}
[...]
```

NOTE: VS Code's Go extension doesn't seem to support Delve's `follow-exec` mode.
So if you want to use VS Code, you'll still need to use a command-line `dlv`
client to set the `follow-exec` flag.

## Caveats

Debugging takes time, and there are a lot of timeouts involved in a Kubernetes
environment. While you're debugging, some processes may well time out and cancel
the container's execution, possibly breaking your debugging session.

You can mitigate this by increasing the timeouts in the different components
involved in your environment.
5 changes: 5 additions & 0 deletions docs/Developer-Guide.md
@@ -771,6 +771,11 @@ $ sudo su -c 'cd /var/run/vc/vm/${sandbox_id} && socat "stdin,raw,echo=0,escape=
To disconnect from the virtual machine, type `CONTROL+q` (hold down the
`CONTROL` key and press `q`).

## Use a debugger with the runtime

For developers interested in using a debugger with the runtime, please
look at [this document](Debug-shim-guide.md).

## Obtain details of the image

If the image is created using
101 changes: 101 additions & 0 deletions tools/containerd-shim-katadbg-v2
@@ -0,0 +1,101 @@
#!/bin/bash
#
# Copyright (c) 2024 Red Hat, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
# This script allows debugging the Go Kata shim using Delve.
# It will start the delve debugger in a way where it runs the program and waits
# for connections from your client interface (dlv commandline, vscode, etc).
#
# You need to configure crio or containerd to use this script in place of the
# regular kata shim binary.
# For cri-o, that would be in the runtime configuration, under
# /etc/crio/crio.conf.d/
#

# Use this for quick-testing the shim binary without a debugger
#NO_DEBUG=1

# Edit this to point to the actual shim binary that needs to be debugged
# Make sure you build it with the following flags:
# -gcflags="all=-N -l"
SHIM_BINARY=/go/src/github.com/kata-containers/kata-containers/src/runtime/__debug_bin

DLV_PORT=12345

# Edit the following to make sure dlv is in the PATH
export PATH=/usr/local/go/bin/:$PATH

# The shim can be called multiple times for the same container.
# If it is already running, subsequent calls just return the socket address that
# crio/containerd need to connect to.
# This is useful for recovery, if crio/containerd is restarted and loses context.
#
# We usually don't want to debug those additional calls while we're already
# debugging the actual server process.
# To avoid running additional debuggers and blocking on them, we use a lock file.
LOCK_FILE=/tmp/shim_debug.lock
if [ -e $LOCK_FILE ]; then
NO_DEBUG=1
fi

# crio can try to call the shim with the "features" or "--version" parameters
# to get capabilities from the runtime (assuming it's an OCI compatible runtime).
# No need to debug that, so just run the regular shim.
case "$1" in
"features" | "--version")
NO_DEBUG=1
;;
esac


if [ "$NO_DEBUG" == "1" ]; then
$SHIM_BINARY "$@"
exit $?
fi


# dlv command line
#
# --headless: run dlv as a server, waiting for a connection
#
# --listen: address to listen on (only a port here, i.e. all interfaces)
#
# --accept-multiclient: allow multiple dlv client connections
# Allows having both a commandline and a GUI
#
# -r: have the output of the shim redirected to a separate file.
# This script will retrieve the output and return it to the
# caller, while letting dlv run in the background for debugging.
#
# -- $@ => give the shim all the parameters this script was given
#

SHIMOUTPUT=$(mktemp /tmp/shim_output_XXXXXX)

cat > $LOCK_FILE << EOF
#!/bin/bash
dlv exec ${SHIM_BINARY} --headless --listen=:$DLV_PORT --accept-multiclient -r stdout:$SHIMOUTPUT -r stderr:$SHIMOUTPUT -- "\$@"
rm $LOCK_FILE
EOF
chmod +x $LOCK_FILE

# We're starting dlv as a background task, so that it continues to run while
# this script returns, letting the caller resume its execution.
#
# We're redirecting the outputs of dlv itself to a separate file so that the
# only output the caller will have is the one from this script, giving it the
# address of the socket to connect to.
#
${LOCK_FILE} "$@" > /tmp/dlv_output 2>&1 &


# wait for the output file of the shim process to be filled with the address.
while [ ! -s $SHIMOUTPUT ]; do
sleep 1
done

# write the address to stdout
cat $SHIMOUTPUT
exit 0