tutorials vrx_docker_debug_info

crvogt edited this page Aug 23, 2023 · 10 revisions

Get Basic Debugging Information

If your competitor image is not working as expected, it's helpful to know the following:

  • Whether your image is running at all.
  • Whether it exits early or stays running the entire trial.
  • Whether it is running the correct entrypoint.

In this tutorial we will walk through how to get this information using docker commands.

Debugging Steps

Preparation

Before you begin, open two terminals:

  • In one terminal, we will run the trial you want to debug, using the testing instructions.
  • In the other terminal, we will run some Docker commands to get information about your container.

Is your container running?

  • Begin running your trial as described in the testing tutorial, using the command:
    ./run_trial.bash -n $TEAM $TASK $TRIAL
    
  • While you are running the trial in one terminal, execute the following in the other:
    docker container ls
    
    This will list all currently running containers.
  • While the simulation is still starting up, the output of the above command will be empty (other than the list headers):
    CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
    
  • If you repeatedly run the docker container ls command, you should eventually see the VRX server listed, then both the server and your competitor image listed as running at the same time. For example:
    CONTAINER ID   IMAGE                                    COMMAND                  CREATED          STATUS                  PORTS       NAMES
    efb815475139   virtualrobotx/vrx_2022_simple:test       "/ros_entrypoint.sh"     1 second ago     Up Less than a second               vrx-competitor-system
    61768582e987   vrx-server-noetic-nvidia:latest          "/vrx_entrypoint.sh …"   10 seconds ago   Up 6 seconds            11345/tcp   vrx-server-system
    
  • Note that you can automate the process of repeatedly running the docker container ls command using the watch command:
    watch docker container ls
    
    This will show the output of docker container ls and by default will dynamically refresh every 2 seconds.
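
As an alternative to watching the table by eye, the check can be scripted. The following is a minimal sketch, not part of the vrx-docker tooling: the function names `competitor_is_running` and `wait_for_competitor` are hypothetical, and the container name `vrx-competitor-system` is assumed from the example output above.

```shell
#!/usr/bin/env bash
# Container name assumed from the example output above; adjust if yours differs.
COMPETITOR_NAME="vrx-competitor-system"

# Succeeds (exit status 0) if a container with exactly this name is running.
competitor_is_running() {
  docker container ls --format '{{.Names}}' | grep -qx "${COMPETITOR_NAME}"
}

# Poll once per second for up to $1 seconds (default 60) and report
# whether the competitor container appeared in that window.
wait_for_competitor() {
  local timeout="${1:-60}"
  local elapsed=0
  while [ "${elapsed}" -lt "${timeout}" ]; do
    if competitor_is_running; then
      echo "${COMPETITOR_NAME} is running after ${elapsed}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "${COMPETITOR_NAME} was not seen within ${timeout}s"
  return 1
}
```

After sourcing these functions in the second terminal, calling `wait_for_competitor 120` while the trial starts up reports whether the container ever appeared. A container that exits within the one-second polling interval can still be missed, so a timeout here is a hint rather than proof.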

If you only see the server image listed, or if your image only appears for a short period of time, then it is most likely exiting early. To find out more about what happened, we can check the exit status.

Preserving terminated containers

  • By default, the run_trial.bash script cleans up its server and competitor containers before exiting.
  • When debugging an image it is often useful to disable this behavior.
  • To do this, use a text editor to open the run_trial.bash script at the root of the vrx-docker repository.
  • Go to the bottom of the script and comment out the second-to-last command:
    # Kill and remove all containers before exit
    #${DIR}/utils/kill_vrx_containers.bash
    
    exit 0
    
  • Save the file.
  • You will now be able to use docker commands to get information about these containers after the run has terminated.
  • When you are finished debugging you can reverse this change.
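
While the cleanup step is disabled, stopped containers remain on your machine between runs. The sketch below shows one way to remove them by hand. The function name `remove_leftover_containers` is hypothetical, and the two container names are assumed from the example `docker container ls` output in this tutorial.

```shell
#!/usr/bin/env bash
# Container names assumed from the example output in this tutorial.
VRX_CONTAINERS="vrx-competitor-system vrx-server-system"

# Force-remove each leftover container that still exists,
# whether it is running or has already exited.
remove_leftover_containers() {
  local name
  for name in ${VRX_CONTAINERS}; do
    if docker container inspect "${name}" >/dev/null 2>&1; then
      docker rm -f "${name}" >/dev/null && echo "removed ${name}"
    else
      echo "${name} not present"
    fi
  done
}
```

Run `remove_leftover_containers` once you have collected the information you need. The repository's own utils/kill_vrx_containers.bash script, which run_trial.bash normally calls, may do more than this sketch and is probably the better choice when it is available.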

Get exit status of your container

These instructions assume you have modified your run_trial.bash script as described in the previous section.

  • If your container is exiting early, you can check the exit status:
    docker ps -a | grep "CONTAINER\|vrx-competitor-system"
    
  • If this command produces no output except the header row, then it is possible that your image did not run at all.
    • In this case, the most likely culprit is a misspelled URL in your dockerhub_image.txt.
    • A second possibility is that your image is stored in a private repository and you do not have access from your terminal.
    • This situation should be very noticeable because the entire trial will exit early when your image cannot be downloaded.
    • In either case, the best way to troubleshoot is to re-run the validation tutorial and make sure you can pull your image.
  • If your container ran, but exited early, the output of the command should look something like:
    CONTAINER ID   IMAGE                                COMMAND                  CREATED              STATUS                            PORTS     NAMES
    1bd20825263d   virtualrobotx/vrx_2022_simple:test   "/bin/bash -c 'catki…"   About a minute ago   Exited (127) About a minute ago             vrx-competitor-system
    
  • In this example, the output tells us:
    • The container did run, but it exited at about the same time it was created.
    • The exit code was 127, which means it could not find the command it tried to run, according to Docker's exit code documentation.
    • The command it tried to run began with /bin/bash -c 'catki…. This command is abbreviated, but it would be possible to see the expanded version using the --no-trunc option. For example:
      docker ps -a --no-trunc | grep "vrx-competitor-system"
      
    • In our case, however, we can already guess what is wrong. The correct entrypoint for the container should be /ros_entrypoint.sh, so the command listed should also be /ros_entrypoint.sh.
    • Instead, the image has been misconfigured to call a catkin tool on startup.
    • Since nothing is running ros_entrypoint.sh, setup.bash has not been sourced, and the container is crashing because it can't find the requested tool in its path.
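
The same information can be read directly from the preserved container with `docker inspect`, which avoids parsing the truncated table. The following is a sketch under stated assumptions: the helper names `explain_exit_code` and `report_container` are hypothetical, and the container name defaults to `vrx-competitor-system` as in the example above. The meanings given for 125, 126 and 127 follow Docker's documented `docker run` exit statuses. Treating codes above 128 as "128 plus signal number" is a common shell convention, not something this tutorial states.

```shell
#!/usr/bin/env bash
# Translate a container exit code into a short hint.
explain_exit_code() {
  local code="$1"
  case "${code}" in
    0)   echo "exited normally" ;;
    125) echo "docker run itself failed" ;;
    126) echo "command found but could not be invoked" ;;
    127) echo "command not found" ;;
    *)
      # Assumption: codes above 128 follow the 128 + signal convention.
      if [ "${code}" -gt 128 ]; then
        echo "terminated by signal $((code - 128))"
      else
        echo "application-specific error"
      fi
      ;;
  esac
}

# Print the recorded exit code plus the configured entrypoint and command
# of a stopped (but not yet removed) container.
report_container() {
  local name="${1:-vrx-competitor-system}"
  local code
  code="$(docker inspect --format '{{.State.ExitCode}}' "${name}")" || return 1
  echo "exit code ${code}: $(explain_exit_code "${code}")"
  docker inspect --format 'entrypoint={{.Config.Entrypoint}} cmd={{.Config.Cmd}}' "${name}"
}
```

Calling `report_container` after a failed trial prints the exit code with a hint, followed by the untruncated entrypoint and command. This only works if the container still exists, so it relies on the cleanup step having been disabled as described earlier. Running `docker logs vrx-competitor-system` on the same container shows whatever the process wrote before it exited.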

Note: Avoid building at runtime

  • Using the --no-trunc option reveals that the intended command was catkin_make.
  • This demonstrates a second common error: attempting to build software in the container at runtime.
  • Since our Docker image represents the WAMV platform, this is analogous to compiling software on the WAMV after it has been placed in the water.
  • Although there may be some exceptional cases where building on the fly might make sense, generally the best practice is to build any required system software into the image during the image build process.