How the Dontbug Debugger works

Sidharth Kshatriya edited this page Oct 15, 2016 · 50 revisions

Copyright © 2016 Sidharth Kshatriya. This document is licensed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Preface

The Dontbug Debugger is a reverse debugger (aka time travel debugger) for PHP. Please see the README for an overview of its capabilities. This document is a technical overview of how the Dontbug Debugger actually works.

  • If you just want to use Dontbug for its (awesome) reverse debugging abilities you don't need to read this document
  • If you're planning to contribute to Dontbug development or are just curious about how Dontbug gets the job done, please read how to use Dontbug first and then come back to this document

Introduction

Dontbug is a reverse debugger (aka time travel debugger) for PHP. This means that Dontbug allows you to move backwards and forwards in time while executing PHP scripts. It provides the normal facilities you would expect from debuggers such as breakpoints, ability to inspect the call stack and variables, stepping in/out of code etc. The twist is that it provides these facilities in both forward and reverse debugging flavors. So "step over", for instance, can be done as both "step over forwards" and "step over backwards" in PHP source code.

Dontbug uses capabilities provided by Mozilla/RR to run execution forwards and backwards. But RR does not know anything about PHP. To RR, the PHP interpreter is just like any other program written in C/C++. Dontbug therefore needs to build upon the foundations provided by RR to "expose" a debugger backend to PHP IDEs like Netbeans PHP, Eclipse PDT and PhpStorm etc.

How exactly this happens is the subject of this document.

But for now, as a first approximation, think of Dontbug as a program that sits between RR and the PHP IDE. Here is what happens when you debug a PHP script using Dontbug (this corresponds to what happens during dontbug replay):

 PHP IDE <---> Dontbug <---> GDB C/C++ Debugger <---> RR <---> PHP Interpreter

To the PHP IDE (Netbeans, PhpStorm, Eclipse PDT), Dontbug appears just like any PHP debugger engine/backend and communicates with it using the dbgp protocol. The dbgp protocol is the de facto standard for debuggers in the PHP world. Behind the scenes, Dontbug communicates with RR by sending commands to the GNU GDB debugger. Dontbug talks to GDB using the GDB/MI (Machine Interface) protocol [1]. GDB in turn talks to RR using the GDB Remote Protocol. Finally, RR controls the PHP interpreter using the ptrace API.

Understanding how Dontbug works can often get confusing because terms like "breakpoint" and "call stack" mean different things in different contexts. To the PHP IDE, "breakpoints" mean PHP script breakpoints. But to GDB and RR, "breakpoints" mean the breakpoints placed in the C code of the PHP interpreter itself (the PHP interpreter is written in the C language). Similarly, the PHP IDE understands the "call stack" as the call stack of the currently executing PHP script (e.g. PHP function foo() calls PHP function bar() etc.). GDB/RR on the other hand understands the "call stack" as the call stack of the PHP interpreter executable /usr/bin/php. It is in the Dontbug implementation that these two worlds collide: the "C-code world" of GDB/RR and the "PHP script world" of the PHP IDE debugger. Very approximately speaking, Dontbug can be thought of as a translation layer between the PHP world and the C world of GDB/RR.

What exactly is the role of GNU GDB? Basically, GDB provides a well understood and widely used API (GDB/MI) for setting breakpoints, inspecting program values, asking for forward/reverse execution etc. Just as importantly, GDB allows the creation of backend remote execution engines. RR provides a GDB remote execution engine with reverse execution powers [2]. It is RR that is actually running the PHP interpreter backwards/forwards, providing breakpoint functionality and so forth, and communicating with GDB via the low level GDB Remote Protocol. Note: GDB also provides its own "native" reverse execution abilities but that's hundreds of times slower than what RR can do. Dontbug uses the reverse/forward execution abilities provided by RR and not GDB.

Now, it's important to keep in mind that RR is executing the whole of the /usr/bin/php executable forwards and backwards as required, but RR does not know anything about PHP scripts. It is Dontbug that knows about the internal structure of the PHP interpreter and uses that knowledge to control its execution. If you want to step forward or back one PHP statement, Dontbug sets the appropriate breakpoints in the /usr/bin/php C-code using GDB/MI and asks RR to run forwards/backwards accordingly.

Thus you can think of Dontbug as a program which understands the "C-level" execution of the PHP interpreter and uses that "C-level" understanding of the execution to do the corresponding "PHP execution" demanded by the PHP IDE.

Dontbug also uses some facilities provided by the Xdebug debugger to accomplish its goals. Xdebug is the dominant debugger in the PHP world. Xdebug is a "traditional" debugger in the sense that debugging is only in the forward direction. Xdebug is loaded as a shared object (xdebug.so) by the PHP executable. Now, this might be confusing because you might think, "Dontbug is already a PHP debugger, so how does Xdebug come into the picture?" The answer is subtle, but more on that later. Fundamentally speaking, Dontbug does not strictly need Xdebug to be present. But the presence of Xdebug has made the implementation of Dontbug much simpler. If you're confused, don't worry, all will be explained in due course.

Now may be a good time to take a detour and understand how Mozilla/RR record/replay execution works. It will help you understand how Dontbug works on a more fundamental level. The RR project homepage has some great videos on how RR works and how to use it. There is a fantastic presentation on RR that should not be missed. Once you come back, we will look at Dontbug in detail. There is also a brief overview of how RR works below, if you just want to read that.

Roadmap

How RR Works (Briefly)

RR is a record and replay framework. RR allows you to record the execution of arbitrary programs. Later, you can replay the same program and obtain exactly the same results as before. Let's say we have a mathematical calculation e.g. sin(x) or fibonacci(x) as our program. We don't need to do anything special to replay the program: running the program again on the same input will give you exactly the same result. Alas, most functions/programs are not so well behaved: they read or write from network sockets or from arbitrary files or devices, use random numbers, or ask for the time(). They can (and usually do) give different results every time they execute. Making executions exactly the same during future runs (even for non-deterministic code) is the core ability of RR.

Now it is probably impossible to decide, just by looking at source code, which functions are pure (like sin(x)) and which are impure. So RR does something interesting: whenever there is a Linux system call (i.e. a function call handled by the Linux Kernel) it records the results of that system call. All reads/writes to network sockets, files, devices etc. boil down to Linux Kernel calls in the end, so simply recording the results of system calls is a way to capture all non-determinism in your program. During the record phase, RR records the results of all system calls into a trace log file. During the replay phase, these system calls are not re-executed: the program that is being "replayed" is given the illusion that they were executed. Behind the scenes, RR is injecting the (previously recorded) results of the system calls from the trace log into the program being replayed.

This means that during replay, read/write from a network socket is exactly what it was during record, or a call to read() to a file gives the same data as before etc. so each replayed execution is the same. Note that this is a conceptual simplification: when the program is run again, a network socket is not really created again in the Linux kernel network stack. But as far as the program being replayed is concerned, it does not know any better. All its "system calls" return the same results as before.

Linux provides some powerful APIs that allow RR to implement this functionality. The ptrace(2) API allows a controlling program (RR in this case) to trap all system calls made by a tracee program (the PHP interpreter in our case). During the record phase, every time the PHP interpreter makes a system call, RR records the results. During replay, RR reads the trace log and injects the results of these system calls into the program being replayed. This means that a recorded execution trace of the PHP interpreter on a particular PHP script can be replayed again and again "exactly" at any point in the future.
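To make the record/inject idea concrete, here is a minimal toy sketch in Go. It is not how RR is implemented (RR works at the ptrace/syscall level); the Tracer type, its methods and the program function are hypothetical names invented for this illustration. The "system call" is just a random number generator whose results are logged during record and handed back during replay.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Tracer is a toy stand-in for RR's trace log (hypothetical type): in record
// mode it stores the result of every "system call"; in replay mode it hands
// the stored results back in order without executing the call again.
type Tracer struct {
	replaying bool
	log       []int64 // recorded results, in call order
	next      int     // index of the next result to inject during replay
}

// Syscall runs call() while recording, or injects the logged result while replaying.
func (t *Tracer) Syscall(call func() int64) int64 {
	if t.replaying {
		result := t.log[t.next]
		t.next++
		return result
	}
	result := call()
	t.log = append(t.log, result)
	return result
}

// StartReplay rewinds the log so that the next run consumes recorded results.
func (t *Tracer) StartReplay() {
	t.replaying = true
	t.next = 0
}

// program plays the role of the tracee: it is deterministic except for the
// values it obtains through the tracer.
func program(t *Tracer) int64 {
	var sum int64
	for i := 0; i < 3; i++ {
		sum += t.Syscall(func() int64 { return rand.Int63n(1000) })
	}
	return sum
}

func main() {
	tracer := &Tracer{}
	recorded := program(tracer) // record phase: the random source really runs
	tracer.StartReplay()
	replayed := program(tracer) // replay phase: results come from the log
	fmt.Println(recorded == replayed)
}
```

The replayed run never touches the random source, yet the program cannot tell the difference. That is the same illusion described above, just at a much smaller scale.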

This is of course a simplification of how RR works. RR has quite a few optimizations and tricks. For instance, by using the magic of LD_PRELOAD, RR can reduce the huge overhead of the ptrace API for syscall interception by injecting code into the tracee to create a syscall buffer. This syscall buffer allows some system calls to be recorded/replayed with lower overhead. RR also deals elegantly with multithreaded programs by using various CPU performance counters to record when exactly a thread was active and do the same scheduling during replay [3].

Please consult RR resources for more details. This is just an overview and does not do full justice to the various techniques and optimizations used by RR to do performant record/replay.

Reverse Execution in RR

Till now we have been talking about record and replay. So where does reverse execution come into the picture? It is the totally deterministic replay that makes reverse execution possible. When you set a breakpoint (using GDB) anywhere before the current point of execution (in the program being replayed) and run in reverse, RR replays the program right from the beginning until it hits the breakpoint [4]. Since there is a guarantee that each replay will be exactly like it was during record, we can do this without worrying that the program will give different results when run again. This is somewhat of a simplification: to prevent RR from having to replay from the beginning every time (which would be very slow), RR maintains checkpoints at various points in the execution of the tracee. RR can start running the tracee from these checkpoints instead of from the beginning. This makes reverse execution performant.
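The "go backwards by replaying forwards" trick can be sketched with a toy model in Go. This is only an illustration under simplifying assumptions: the recorded execution is a fixed slice of source line numbers, checkpoints sit at every third step, and the names trace, checkpointEvery and reverseContinue are all made up here. RR's real algorithm is considerably more sophisticated.

```go
package main

import "fmt"

// trace is the recorded execution: the source line executed at each step.
// Replaying it always yields the same sequence, which is what makes
// "run forward again in order to go backward" sound.
var trace = []int{10, 11, 12, 11, 12, 13, 11, 14}

// checkpointEvery is a hypothetical checkpoint spacing: steps 0, 3, 6, ...
const checkpointEvery = 3

// reverseContinue returns the last step before cur (0 <= cur <= len(trace))
// that executed breakLine, or -1 if there is none. It restarts from the
// nearest checkpoint and replays forward; if the breakpoint was not hit in
// that stretch it falls back one checkpoint and tries again. replayed counts
// how many steps had to be re-executed.
func reverseContinue(cur, breakLine int) (step, replayed int) {
	end := cur
	for start := (cur - 1) / checkpointEvery * checkpointEvery; start >= 0 && end > 0; start -= checkpointEvery {
		last := -1
		for s := start; s < end; s++ { // deterministic forward replay
			replayed++
			if trace[s] == breakLine {
				last = s
			}
		}
		if last >= 0 {
			return last, replayed
		}
		end = start
	}
	return -1, replayed
}

func main() {
	step, cost := reverseContinue(6, 11)
	fmt.Println(step, cost) // 3 3
	step, cost = reverseContinue(6, 10)
	fmt.Println(step, cost) // 0 6
}
```

In the first call the breakpoint is found inside the stretch that starts at the nearest checkpoint, so only three steps are replayed. In the second call the sketch has to fall back to the checkpoint at step 0, which costs six replayed steps in total. Without checkpoints every reverse operation would cost a replay from the very beginning.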

Building a toy PHP debugger on top of GDB & RR

In this section we will learn how to build a toy version of the Dontbug PHP debugger to illustrate some fundamental concepts. In the next section we will see how the Dontbug debugger is actually implemented in its full complexity.

Preliminaries and Implementing Step Into

The PHP interpreter provides various mechanisms to extend/enhance/improve its out-of-the-box functionality. One such mechanism is to allow developers to build a Zend Extension. A Zend Extension is an advanced way to extend/hook into PHP and should not be confused with PHP Extensions, which are the more common way to extend/improve/enhance PHP. Most of the time, users install PHP extensions on their installations. It is mainly PHP extensions that talk to SQL databases or popular libraries like LibXML2, bzip2, iconv, image manipulation libraries like imagemagick, GD etc. Zend Extensions on the other hand are more advanced "plugins" that allow you to hook deeply into the execution of the PHP interpreter itself. As an example, Zend Extensions can provide a callback function that will be called by the PHP interpreter every time a PHP statement is going to be executed (the statement handler callback) or a callback that will be called every time a PHP function is entered (function call begin handler) and so forth.

Till now whenever we have talked about the Dontbug Debugger, we have meant the golang executable (dontbug) that communicates with both GDB and a PHP IDE. Actually the Dontbug debugger also consists of a custom Zend Extension (dontbug.so) that is dynamically loaded by the PHP interpreter. The source code for the Zend Extension is in the ext subfolder of the github repository and it is written in C.

At the core of the Dontbug Zend Extension implementation is the dontbug_statement_handler() callback function. Every time the PHP interpreter is about to execute a new PHP statement, it calls this function. Our implementation of the Dontbug PHP debugger is centered around this function.

We're going to understand the actual dontbug_statement_handler() function in the next section. For now let's understand a toy version:

// This function is called before every PHP statement that is
// about to be executed. This is a simplified/toy version.
void dontbug_statement_handler(zend_op_array *op_array) {
  zend_execute_data* execute_data = EG(current_execute_data);

  if (!execute_data) {
      return;
  }

  if (ZEND_USER_CODE(execute_data->func->type) && op_array->filename) {
      char *filename = ZSTR_VAL(op_array->filename); // PHP script filename
      int lineno = execute_data->opline->lineno;     // PHP script line number

      return;  // master breakpoint position
  }
}

Don't worry about the mechanics of how we register this callback with PHP -- it's done in a very standard way. Let's try to understand the function itself.

The op_array is a structure that consists of the current set of PHP opcodes that are to be interpreted after the function returns. execute_data is a struct that represents the current PHP stack entry. Every time a PHP function calls another PHP function a new zend_execute_data struct is allocated [5]. We can traverse the full PHP call stack by simply chasing execute_data->prev_execute_data.
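As a small illustration of "chasing" the previous pointer, here is a Go sketch of the same linked-list walk. The frame type and its fields are hypothetical stand-ins for zend_execute_data and prev_execute_data; the real struct carries much more state.

```go
package main

import "fmt"

// frame is a hypothetical stand-in for zend_execute_data: one entry per
// active PHP function call, linked to the frame of its caller.
type frame struct {
	function string
	prev     *frame // plays the role of prev_execute_data
}

// backtrace walks from the innermost frame outward, the way one would follow
// execute_data->prev_execute_data until it is NULL.
func backtrace(current *frame) []string {
	var names []string
	for f := current; f != nil; f = f.prev {
		names = append(names, f.function)
	}
	return names
}

func main() {
	mainFrame := &frame{function: "{main}"}
	foo := &frame{function: "foo", prev: mainFrame}
	bar := &frame{function: "bar", prev: foo}

	fmt.Println(backtrace(bar))      // [bar foo {main}]
	fmt.Println(len(backtrace(bar))) // 3
}
```

The length of the walk is the PHP stack depth, and the list of names is the PHP call stack that an IDE would display.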

In the toy statement handler above, locate the return; statement marked as the master breakpoint position. Now let's assume the master breakpoint location is in file dontbug.c on line 99. The central question is:

Given the above toy statement handler, how can we implement the debugger command that will take us to the following PHP statement i.e. step_into ?

Note that what is referred to as step-into in dbgp parlance is simply step in GDB parlance.

The answer is straightforward:

  • The PHP IDE sends the Dontbug Debugger, i.e. the golang compiled executable (dontbug), a step_into dbgp command on TCP port 9000. In the dbgp protocol this looks like: step_into -i 100 [6]
  • Remember that the PHP executable /usr/bin/php is additionally running with the Dontbug Zend Extension (dontbug.so) loaded. It is this shared object that has the dontbug_statement_handler() function defined in it. We now set a GDB breakpoint b dontbug.c:99. (Since we use GDB/MI to communicate with GDB, it actually looks like -break-insert --source dontbug.c --line 99. Both forms are equivalent.)
  • We issue a continue command to GDB (in GDB/MI: -exec-continue). The PHP interpreter starts executing again. We wait for the breakpoint to hit and execution to stop
  • Once execution stops, we have reached the next PHP statement successfully. This is because we have stopped inside the statement handler function, and that function is called every time a new statement is about to be executed. Now we just need to know which PHP script and PHP script line number we have reached so that the PHP IDE can highlight the appropriate line. So, we issue the -data-evaluate-expression filename and -data-evaluate-expression lineno GDB/MI commands (the equivalent GDB commands would be print filename and print lineno). Let's say the returned filename is /home/sidk/testd/drupal/index.php and lineno is 14
  • Finally, Dontbug returns the following XML to the PHP IDE:
<response xmlns="urn:debugger_protocol_v1" 
    xmlns:xdebug="http://xdebug.org/dbgp/xdebug"
    command="step_into" transaction_id="100"
    status="break" reason="ok">
      <xdebug:message filename="file:///home/sidk/testd/drupal/index.php" 
          lineno="14"></xdebug:message>
</response>
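The workflow above can be sketched in Go as a handful of string builders. This is a simplified illustration, not the code in the dontbug executable: the function names breakInsertCmd, evalCmd and stepIntoResponse are invented here, the GDB connection itself is left out, and the XML is assembled without escaping, which is only safe for paths that contain no XML special characters.

```go
package main

import "fmt"

// Location of the master breakpoint in the toy extension, as assumed in the text.
const (
	masterFile = "dontbug.c"
	masterLine = 99
)

// breakInsertCmd builds the GDB/MI command that sets the master breakpoint.
func breakInsertCmd() string {
	return fmt.Sprintf("-break-insert --source %s --line %d", masterFile, masterLine)
}

// evalCmd builds a GDB/MI command that evaluates a C expression in the
// frame where execution has stopped.
func evalCmd(expr string) string {
	return "-data-evaluate-expression " + expr
}

// stepIntoResponse renders the dbgp reply sent to the IDE once GDB has stopped
// at the master breakpoint and filename/lineno have been read back.
func stepIntoResponse(transactionID, filename string, lineno int) string {
	return fmt.Sprintf(`<response xmlns="urn:debugger_protocol_v1" `+
		`xmlns:xdebug="http://xdebug.org/dbgp/xdebug" `+
		`command="step_into" transaction_id="%s" status="break" reason="ok">`+
		`<xdebug:message filename="file://%s" lineno="%d"></xdebug:message>`+
		`</response>`, transactionID, filename, lineno)
}

func main() {
	// The commands Dontbug would send to GDB for a single step_into.
	commands := []string{
		breakInsertCmd(),
		"-exec-continue",
		evalCmd("filename"),
		evalCmd("lineno"),
	}
	for _, cmd := range commands {
		fmt.Println(cmd)
	}

	// Pretend GDB reported these values for filename and lineno.
	fmt.Println(stepIntoResponse("100", "/home/sidk/testd/drupal/index.php", 14))
}
```

Note that the transaction id from the incoming step_into command is echoed back in the response, which is how the IDE matches replies to requests.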

If you notice the use of the xdebug XML namespace for <xdebug:message> understand that it's nothing unusual. We're using the dbgp protocol that allows debuggers to add their own XML extensions for replies. The Xdebug debugger is a particular implementation of the dbgp protocol that is PHP focussed. In this case the Dontbug Debugger uses <xdebug:message> (instead of something like <dontbug:message>) because PHP IDE debuggers expect this specific XML element when an execution break happens.

So we've implemented the ability to do a PHP step_into using GDB/RR. How would we implement reverse step-into? The answer is not difficult. First we have to realize that (currently) PHP Debuggers don't come with separate icons/buttons for forward/backward step-into. So, we need to tell Dontbug to switch to reverse mode. This is accomplished by pressing r <enter> on the Dontbug prompt. When that happens, Dontbug is put into reverse mode. When the user presses the step-into button, the same steps as in the bullet list above are put into operation. The only difference is that we issue a -exec-continue --reverse command instead of a -exec-continue command. This results in the debugger going "back" a single PHP statement. In GDB parlance, we've implemented reverse-step.

This implementation of step_into is actually quite close to how it's implemented in the full Dontbug Debugger. This is because step_into is actually quite a simple operation as we've seen.

Implementing Line Breakpoints

We've been able to implement forward and backward PHP step_into on top of GDB/RR. Given the above dontbug_statement_handler() function, how can we implement breakpoints in PHP scripts? For example, what if we have a script cat.php and we want a breakpoint on line 30 of that script?

A naive way might be to set the following breakpoint in GDB (the equivalent GDB/MI syntax is slightly different and not provided for brevity):

b dontbug.c:99 if strcmp(filename, "full-path-to/cat.php") == 0 && lineno == 30

(Here filename and lineno are local variables in C-code)

In other words we create a conditional breakpoint. This approach is not viable. The main problem is that it's going to be quite slow. Every time a PHP statement is going to be executed, the CPU is going to have to execute a strcmp(). RR also does not behave nicely when conditional breakpoints involve executing functions (though this may have been a transient issue that has now been fixed).

In summary, while this is a theoretically sound approach, we'll take a better one in the actual Dontbug implementation.

Getting the value of variables and inspecting the call stack

The dbgp protocol allows the debugger front-ends (PhpStorm, Netbeans etc.) to ask the debugger engine (Dontbug) things like:

You will notice that we have not listed property_set. Since Dontbug (via RR) replays a recorded trace, you cannot persistently modify a PHP variable value in the debugger. All variables (and "state") in the PHP script are read-only as far as the debugger is concerned. This limitation is fundamental to the current record/replay architecture. Therefore, there is no hope of implementing property_set and we omit it. In practice, not being able to change a PHP variable on the fly during debugging is not a big limitation as we rarely need to do that.

Coming back to the central question: How would we implement the various dbgp commands above? The answer is: through RR diversion sessions. RR diversion sessions allow you to call arbitrary functions defined in the tracee program codebase. These functions are executed in a fork() session so they don't interfere with the program being replayed.

The PHP interpreter supports eval, with which you can evaluate any PHP expression. An interesting eval function is zend_eval_stringl, which evaluates a string as PHP code. So if we want to get the value of a specific variable $foo in a diversion session in RR we can execute print zend_eval_stringl("var_export($foo, true)", ...) in GDB [7]. To get the stack trace we can call the PHP function debug_backtrace() via zend_eval_stringl. Generally speaking though, an eval is unnecessarily "heavy-handed" for things like getting the stack trace or stack depth. We can do things in a more lightweight fashion. We can actually call any C function in the PHP interpreter code base. We can create helper C functions in the Dontbug Zend Extension that call Zend API functions to implement property_get, stack_get etc. and subsequently call them in diversion sessions. (The Zend API is the API of the core of the PHP interpreter, called the Zend Engine.)

So here is the approach we could take:

  • Implement helper functions in C corresponding to property_get, stack_get in the Dontbug Zend Extension using the Zend API. Program these helper functions in such a way that they return XML strings (as per dbgp protocol) so that GDB/MI picks it up and passes it onto Dontbug
  • When the PHP IDE sends the dbgp commands to Dontbug, execute the corresponding helper function in an RR diversion session. These diversion sessions will execute in a fork() and will have the correct PHP state they need when they start executing. The fork() takes place from wherever the replayed program is currently paused in GDB.

This is a viable approach and during initial experiments, Dontbug was to be implemented like this. But this approach is actually quite arduous. property_get, stack_get, property_value etc. all have quite complicated XML schemas and would need to be implemented from scratch by calling the appropriate Zend API. But the Xdebug codebase already implements all the dbgp commands like property_get, property_value and stack_get!

Idea: What if the Xdebug codebase was available to be called by the Dontbug Zend Extension?

To put this idea into practice, we do the following during dontbug record:

  • When dontbug record happens, we make sure the Xdebug Zend extension (xdebug.so) is loaded by /usr/bin/php. We also create a debug listener at TCP port 9000
  • Xdebug connects to TCP port 9000. Xdebug thinks that it is connecting to a PHP IDE, but actually it's just connecting to the golang executable dontbug that temporarily behaves like a PHP IDE would. There are no breakpoints set in this phase and dontbug golang executable keeps asking Xdebug to run

At every point in time during recording, Xdebug ends up maintaining the appropriate data structures in its local and global variables but its services are never required. During replay we can call Xdebug functions in RR diversion sessions. We can do that because Xdebug was part of the recording and has got the appropriate global and init state "locked into" the recording.

So in summary, in the full Dontbug implementation, we call into Xdebug code (via RR Diversion sessions) to implement functionality that needs us to get the value of PHP variables or inspect the call stack. When it comes to breakpoints and script forward/reverse execution we use the GDB/RR infrastructure.

Implementing Debugger Functionalities

In this section we look at how things are actually implemented in the Dontbug debugger. We are no longer talking about the toy implementation.

This is the full dontbug_statement_handler() function. Please look at this function carefully. There are some new things here, but everything will be explained.

void dontbug_statement_handler(zend_op_array *op_array) {
    zend_execute_data* execute_data = EG(current_execute_data);

    if (!execute_data) {
        return;
    }

    if (ZEND_USER_CODE(execute_data->func->type) && op_array->filename) {
        // Here just for gdb purposes
        char *filename = ZSTR_VAL(op_array->filename);
        // php line number
        int lineno = execute_data->opline->lineno;

        // stack depth
        // XG is the macro for the xdebug globals variable
        // XG(level) is equivalent to xdebug_globals.level
        unsigned long level = XG(level);

        // level related breakpoints
        dontbug_level_location(level, filename, lineno);

        // Pass the zend_string and not the cstring
        dontbug_break_location(op_array->filename, execute_data, lineno, level);

        return;  // master breakpoint position
    }
}

Implementing Line breakpoints

Let's understand how Dontbug implements line breakpoints with an example. Say you have a PHP project with PHP files within the /home/sidk/php-play folder. Assume that the PHP files don't include() or require() any PHP files outside this directory. In other words, the /home/sidk/php-play folder is the "PHP source root directory" (see dontbug record --help for more details on PHP source directories).

Furthermore, assume that the php-play directory contains the following PHP files:

/home/sidk/php-play/new/print.php
/home/sidk/php-play/new/list.php
/home/sidk/php-play/index.php
/home/sidk/php-play/tst.php
/home/sidk/php-play/print.php
/home/sidk/php-play/test.php
/home/sidk/php-play/foobar.php
/home/sidk/php-play/another.php
/home/sidk/php-play/phpinfo.php

So if we run this project (with index.php as the initial PHP file), the PHP interpreter may need to interpret any of these PHP files, depending on how these files call each other. Now, if we wish to debug this project in a PHP IDE like PhpStorm, Dontbug should allow breakpoints to be placed in any of these files.

To meet this requirement, when we do dontbug record /home/sidk/php-play, Dontbug compiles the dontbug.so Zend extension (whose source code is in the ext/dontbug folder of the github repo) along with some dynamically generated C code which is specific to the /home/sidk/php-play project. Here is an excerpt of the generated code:

// file dontbug_break.c
void dontbug_break_location(zend_string* zfilename, zend_execute_data *execute_data, int lineno, unsigned long level) {
    zend_ulong hash = zfilename->h;
    char *filename = ZSTR_VAL(zfilename);

    if (hash == Z_UL(15356533969297793014)) {
        // hash == 15356533969297793014
        return; //### /home/sidk/php-play/test.php
    } else if (hash < Z_UL(15356533969297793014)) {
        if (hash == Z_UL(13944496156718642123)) {
            // hash == 13944496156718642123
            return; //### /home/sidk/php-play/new/list.php
        } else if (hash < Z_UL(13944496156718642123)) {
            // hash == 10514725896023383951
            return; //### /home/sidk/php-play/foobar.php
        } else {
            if (hash == Z_UL(14936857711330785703)) {
                // hash == 14936857711330785703
                return; //### /home/sidk/php-play/another.php
            } else {
                // hash == 14959553872593471658
                return; //### /home/sidk/php-play/phpinfo.php
            }
        }
    } else {
        if (hash == Z_UL(17446521398817570172)) {
            // hash == 17446521398817570172
            return; //### /home/sidk/php-play/new/print.php
        } else if (hash < Z_UL(17446521398817570172)) {
            // hash == 17235116854503738705
            return; //### /home/sidk/php-play/tst.php
        } else {
            if (hash == Z_UL(17926887926721582222)) {
                // hash == 17926887926721582222
                return; //### /home/sidk/php-play/index.php
            } else {
                // hash == 17926897948860496515
                return; //### /home/sidk/php-play/print.php
            }
        }
    }
}

This code looks complicated but is actually quite simple; please read it. The dontbug_break_location() function is generated automatically behind the scenes and gives Dontbug the ability to add breakpoints in C code that maps back to line breakpoints in PHP scripts.

The function dontbug_break_location() is called by dontbug_statement_handler() function. Now strings in PHP are represented by the zend_string struct. Each zend_string has a hash, calculated using the "Daniel J. Bernstein, Times 33 with Addition" string hashing algorithm (DJBX33A), which makes it easy to compare two strings for equality (assuming we don't have a hash collision, of course).

Furthermore, in the above code:

  • The PHP filename for a file being interpreted is the absolute, full system path with any symlink paths converted to the real path
  • The variable hash represents the DJBX33A hash of the full path of the PHP file (filename) that is currently being interpreted and lineno is the line number of the current PHP file being interpreted
  • Z_UL(1234) is just a Zend macro for the C literal 1234ULL i.e. unsigned long long on 64-bit platforms and 1234UL i.e. unsigned long on 32-bit platforms
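For readers who have not met DJBX33A before, here is a minimal Go sketch of the "times 33 with addition" scheme. The starting value 5381 is the usual one for this hash. The withTopBit helper rests on an assumption: to my understanding, 64-bit PHP 7 builds additionally set the most significant bit of the result, which would be consistent with every hash in the listing above being larger than 2^63. Treat both function names as illustrative, and check the PHP source before relying on exact values.

```go
package main

import "fmt"

// djbx33a computes the "times 33 with addition" hash of s, starting from 5381.
// The arithmetic wraps modulo 2^64, as it would for an unsigned 64-bit C integer.
func djbx33a(s string) uint64 {
	hash := uint64(5381)
	for i := 0; i < len(s); i++ {
		hash = hash*33 + uint64(s[i])
	}
	return hash
}

// withTopBit sets the most significant bit of h. ASSUMPTION: this mirrors what
// 64-bit PHP 7 builds do to the final hash value; it is not stated in this document.
func withTopBit(h uint64) uint64 {
	return h | 1<<63
}

func main() {
	fmt.Println(djbx33a(""))   // 5381
	fmt.Println(djbx33a("a"))  // 177670
	fmt.Println(djbx33a("ab")) // 5863208

	// The value that would be compared against Z_UL(...) constants under the
	// assumption described above (no expected output is claimed for real paths).
	fmt.Println(withTopBit(djbx33a("/home/sidk/php-play/tst.php")))
}
```

Because the hash is already stored in the zend_string, the generated C code only needs integer comparisons and never has to touch the characters of the filename.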

Each PHP file in /home/sidk/php-play has its own if/else/else if with a return; statement in the generated source e.g. for /home/sidk/php-play/tst.php we have:

// dontbug_break.c
// Assume the code has the source line numbers as indicated in the comments
...
...
/*line 54*/        } else if (hash < Z_UL(17446521398817570172)) {
/*line 55*/            // hash == 17235116854503738705
/*line 56*/            return; //### /home/sidk/php-play/tst.php
/*line 57*/        } else {
...
...

So if we want to place a breakpoint in file tst.php line 333 we can place the following breakpoint using GDB:

b dontbug_break.c:56 if lineno == 333

Elaborating, this is the breakpoint setting workflow:

  • PHP IDE sends dbgp command breakpoint_set that looks like breakpoint_set -i 33 -t line -f file:///home/sidk/php-play/tst.php -n 333
  • Dontbug looks up source line number at which return; //### /home/sidk/php-play/tst.php is mentioned in a golang map[string]int. This map is prepared in advance by processing the dontbug_break.c file (very early during dontbug replay)
  • Dontbug places the breakpoint using GDB/MI
  • In the future, whenever this breakpoint is hit while running forward/backward, Dontbug returns the appropriate XML back to the PHP IDE
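The lookup part of this workflow can be sketched in Go as follows. This is an illustration under assumptions, not the actual dontbug source: the function names buildLineMap and gdbBreakpoint are invented, the dbgp argument parsing is skipped (the file URI and line number are passed in directly), and the -c form of -break-insert is one plausible GDB/MI spelling of the conditional breakpoint shown earlier.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// marker precedes the PHP file path on every "breakpoint bed" line of the
// generated C source.
const marker = "return; //### "

// buildLineMap scans generated C source and maps each PHP file path to the
// 1-based C line number of its "return; //### <path>" line.
func buildLineMap(cSource string) map[string]int {
	lines := make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(cSource))
	for n := 1; scanner.Scan(); n++ {
		text := scanner.Text()
		if i := strings.Index(text, marker); i >= 0 {
			lines[strings.TrimSpace(text[i+len(marker):])] = n
		}
	}
	return lines
}

// gdbBreakpoint translates a PHP line breakpoint (file URI plus PHP line) into
// a GDB/MI command on the matching C line. It reports false for unknown files.
func gdbBreakpoint(lines map[string]int, fileURI string, phpLine int) (string, bool) {
	cLine, ok := lines[strings.TrimPrefix(fileURI, "file://")]
	if !ok {
		return "", false
	}
	cmd := fmt.Sprintf(`-break-insert -c "lineno == %d" --source dontbug_break.c --line %d`,
		phpLine, cLine)
	return cmd, true
}

func main() {
	// A tiny, made-up fragment in the style of dontbug_break.c.
	src := "        } else if (hash < Z_UL(17446521398817570172)) {\n" +
		"            // hash == 17235116854503738705\n" +
		"            return; //### /home/sidk/php-play/tst.php\n" +
		"        } else {\n"

	lines := buildLineMap(src)
	cmd, ok := gdbBreakpoint(lines, "file:///home/sidk/php-play/tst.php", 333)
	fmt.Println(cmd, ok)
}
```

Building the map once, early during replay, means that every later breakpoint_set is just a map lookup followed by a single GDB/MI command.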

This approach has many advantages over the naive approach discussed earlier:

  • No function needs to be executed (e.g. the strcmp function that needed to be executed in the conditional breakpoint in the naive approach)
  • If PHP is not interpreting tst.php, the else if statement (line 54 onwards) is never true and GDB does not even need to evaluate the conditional breakpoint placed on line 56 (thus saving us from the killing performance hit from a conditional breakpoint whose condition was not met).

Look at dontbug_break_location(). At each C code indentation level you only have a single if and else (with a single else if sometimes). Inside each branch we may have nested if/else if/else statements. This generated code is structured as a binary search for a matching hash. Because it's a binary search it's extremely fast, and this approach can scale.
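To show how such a function body could be produced, here is a hedged Go sketch of a generator. It is not Dontbug's actual generator: the phpFile type and the emit, leaf and generate functions are invented for this illustration, and the midpoint rule was chosen only because it reproduces the shape of the nine-file listing above. The hashes in main are small made-up numbers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// phpFile pairs a PHP source path with the hash of that path (hypothetical type).
type phpFile struct {
	hash uint64
	path string
}

// leaf writes the "breakpoint bed" for one file: a comment plus a return statement.
func leaf(b *strings.Builder, f phpFile, indent string) {
	fmt.Fprintf(b, "%s// hash == %d\n%sreturn; //### %s\n", indent, f.hash, indent, f.path)
}

// emit writes a binary search over files (sorted by hash, non-empty) as nested
// C if / else if / else statements. A range holding a single file needs no
// comparison at all.
func emit(b *strings.Builder, files []phpFile, indent string) {
	if len(files) == 1 {
		leaf(b, files[0], indent)
		return
	}
	mid := (len(files) - 1) / 2
	m, inner := files[mid], indent+"    "
	fmt.Fprintf(b, "%sif (hash == Z_UL(%d)) {\n", indent, m.hash)
	leaf(b, m, inner)
	if mid > 0 { // there are files with a smaller hash
		fmt.Fprintf(b, "%s} else if (hash < Z_UL(%d)) {\n", indent, m.hash)
		emit(b, files[:mid], inner)
	}
	fmt.Fprintf(b, "%s} else {\n", indent)
	emit(b, files[mid+1:], inner)
	fmt.Fprintf(b, "%s}\n", indent)
}

// generate sorts a copy of files by hash and returns the C statements that
// would form the body of a dontbug_break_location()-style function.
func generate(files []phpFile) string {
	sorted := append([]phpFile(nil), files...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].hash < sorted[j].hash })
	var b strings.Builder
	if len(sorted) > 0 {
		emit(&b, sorted, "")
	}
	return b.String()
}

func main() {
	fmt.Print(generate([]phpFile{
		{hash: 30, path: "/project/c.php"},
		{hash: 10, path: "/project/a.php"},
		{hash: 20, path: "/project/b.php"},
	}))
}
```

As in the listing above, the innermost branches return without checking the hash, so the sketch shares the same property: it assumes the file being interpreted is one of the files known at generation time.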

Consider this: a vanilla Drupal 8 install has ~9000 PHP files. Let's say the Drupal PHP source is at /home/sidk/testd/drupal82. When we do dontbug record /home/sidk/testd/drupal82 we generate a file dontbug_break.c containing a huge dontbug_break_location() function. This generated function is approximately 36000 lines of C code long!

void dontbug_break_location(zend_string* zfilename, zend_execute_data *execute_data, int lineno, unsigned long level) {
    zend_ulong hash = zfilename->h;
    char *filename = ZSTR_VAL(zfilename);

    if (hash == Z_UL(13945946094730388462)) {
        // hash == 13945946094730388462
        return; //### /home/sidk/testd/drupal82/core/lib/Drupal/Core/Theme/ActiveTheme.php
    } else if (hash < Z_UL(13945946094730388462)) {
        if (hash == Z_UL(11634453371230061943)) {
            // hash == 11634453371230061943
            return; //### /home/sidk/testd/drupal82/vendor/phpunit/phpunit/src/Framework/Constraint/Attribute.php
        } else if (hash < Z_UL(11634453371230061943)) {
            if (hash == Z_UL(10424772969510268645)) {
                // hash == 10424772969510268645
                return; //### /home/sidk/testd/drupal82/core/modules/taxonomy/src/Tests/Views/TaxonomyRelationshipTest.php
            } else if (hash < Z_UL(10424772969510268645)) {
                if (hash == Z_UL(9809122038075710629)) {
                    // hash == 9809122038075710629
                    return; //### /home/sidk/testd/drupal82/core/modules/system/tests/modules/keyvalue_test/keyvalue_test.module
                } else if (hash < Z_UL(9809122038075710629)) {
...
... etc. etc. so that each PHP file within /home/sidk/testd/drupal82 is covered
...

But traversing the dontbug_break_location() function (which is placed in dontbug.so) when /usr/bin/php runs is extremely fast. In the worst case you would need to descend through approximately log2(9000), or about 14, levels of if statements (which is hardly anything).
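The depth estimate above can be checked with a small sketch. It assumes the generator always tests the median of the remaining sorted hashes first, which is an assumption about how the tree is balanced and not something taken from the Dontbug source. The names tree_depth and levels_visited are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Depth of the nested if / else if / else tree obtained by always
 * testing the median of the remaining sorted hashes first. The larger
 * remaining half never holds more than n / 2 hashes. */
unsigned tree_depth(size_t n)
{
    unsigned depth = 0;
    while (n > 0) {
        depth++;        /* one more indentation level to descend */
        n /= 2;         /* size of the larger remaining half     */
    }
    return depth;
}

/* Walks the same implicit tree over a sorted array of hashes and
 * reports how many levels were visited before the hash was found.
 * Returns 0 if the hash belongs to no known file. */
unsigned levels_visited(const unsigned long long *sorted, size_t n,
                        unsigned long long hash)
{
    size_t lo = 0, hi = n;
    unsigned levels = 0;

    assert(sorted != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        levels++;
        if (hash == sorted[mid]) {
            return levels;      /* the "return;" line for this file */
        } else if (hash < sorted[mid]) {
            hi = mid;           /* descend into the "else if" branch */
        } else {
            lo = mid + 1;       /* descend into the "else" branch    */
        }
    }
    return 0;
}
```

Under that assumption, tree_depth(9000) evaluates to 14, which agrees with the estimate for a Drupal-sized source tree.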

Another interesting thing about dontbug_break_location() is that you always end up in a return; statement and nothing useful is done by the function (except serve as a source code "bed" where you can place breakpoints via GDB/MI). So we must always compile the dontbug.so Zend extension as -g -O0 i.e. with debugging information and without any optimizations. If we applied optimizations, the C compiler would probably remove most of the code in dontbug_break_location() as dead code, rendering the whole exercise useless.

Implementing Step Over and Step Out

In this section we will learn how step-over and step-out are implemented in Dontbug. Step-over is a surprisingly subtle operation and so we will review what step-over is before learning how it's implemented. Step-out is very similar to step-over and we'll be able to understand it after understanding step-over.

Refer to the example PHP program below. Let's say the PHP Debugger is currently at line 2, in function foo(). Also let's assume foo() was originally called from line 20 which is the outermost level in this script (the outermost level is often referred to as {main} in PHP).

function foo() {   // line 1
    $a = 1;        // line 2
    bar();         // line 3
    buzz();        // line 4
    return $a;     // line 5
}                  // line 6
                   // line 7
function bar() {   // line 8
    $b = 2;        // line 9
    buzz();        // line 10
    return $b;     // line 11
}                  // line 12
                   // line 13
function buzz() {  // line 14
    $b = 2;        // line 15
    $c = 3;        // line 16
    return $c;     // line 17
}                  // line 18
                   // line 19
foo();             // line 20
$l = 3;            // line 21

Assume we start doing step-over from line 2 onwards. Successive step-overs will take the program from:

line 2 -> line 3 -> line 4 -> line 5 -> line 21

Note:
- At line 2 the call stack is {main} => foo() (call stack 2 levels deep)
- At line 3 the call stack is {main} => foo() (call stack 2 levels deep)
- At line 4 the call stack is {main} => foo() (call stack 2 levels deep)
- At line 5 the call stack is {main} => foo() (call stack 2 levels deep)
- At line 21 the call stack is {main}         (call stack  1 level deep)

What if we were doing reverse step-overs and started out on line 21? Our execution trace would simply be line 21 -> line 20. At both of those source locations our call stack is simply {main} and we have a call stack depth of 1.

Observations for step-over:

  • When a single step-over takes place the call stack depth either remains the same or decreases, if nothing is left to step over in the current function
  • The call stack depth can never increase in a step-over
  • Stack depth delta is always either -1 or 0 between two successive step-overs
  • These observations hold true for step-over in both forward and reverse directions

Let's come back to the forward step-over example. When we start stepping-over at line 2 the stack depth is 2; it remains at 2 and eventually becomes 1. So while stepping over, the stack depth should always be <= 2. In other words, if we start stepping at stack depth x, the step-over constraint will be to stay at stack depth <= x.

Let's contrast this with step-into. Assume we start out on line 20. Successive step-intos will take us from:

line 20 -> line 2 -> line 3 -> line 9 -> line 10 -> line 15 -> line 16 -> line 17 ->
 line 11 -> line 4 -> line 15 -> line 16 -> line 17 -> line 5 -> line 21

Note:
- At line 20 the call stack is {main} (1 level deep)
- At line 2 the call stack is {main} => foo() (2 levels deep)
- At line 3 the call stack is {main} => foo() (2 levels deep)
- At line 9 the call stack is {main} => foo() => bar() (3 levels deep)
- At line 10 the call stack is {main} => foo() => bar() (3 levels deep)
- At line 15 the call stack is {main} => foo() => bar() => buzz() (4 levels deep)
- At line 16 the call stack is {main} => foo() => bar() => buzz() (4 levels deep)
- At line 17 the call stack is {main} => foo() => bar() => buzz() (4 levels deep)
- At line 11 the call stack is {main} => foo() => bar() (3 levels deep)
- At line 4 the call stack is {main} => foo() (2 levels deep)
- At line 15 the call stack is {main} => foo() => buzz() (3 levels deep)
- At line 16 the call stack is {main} => foo() => buzz() (3 levels deep)
- At line 17 the call stack is {main} => foo() => buzz() (3 levels deep)
- At line 5 the call stack is {main} => foo() (2 levels deep)
- At line 21 the call stack is {main} (1 level deep)

The reverse step-into trace is simply the forward step-into trace in the opposite order. If we started out at line 21 and did successive backward step-intos the execution trace would be:

line 21 -> line 5 -> line 17 -> line 16 -> line 15 -> line 4 -> line 11 -> line 17 -> line 16 -> line 15
  -> line 10 -> line 9 -> line 3 -> line 2 -> line 20

The call stack looks the same whether we are moving in the forward or reverse direction, so we don't repeat the call stack details.

Observations for step-into:

  • Stack depth can either increase or decrease or remain the same
  • Stack depth delta is either -1, 0 or +1 between two successive step-intos
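The observations for step-over and step-into can be checked against the traces listed above. The following is a minimal sketch, not code from Dontbug: it stores the forward step-into trace of the example script as (line, depth) pairs and finds the next step-over stop as the first statement whose depth is <= the depth we started at. The names hit, trace and step_over are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One statement-handler hit: PHP source line and call stack depth. */
struct hit { int line; int depth; };

/* Forward step-into trace of the example script, copied from the
 * call stack notes above. */
const struct hit trace[] = {
    {20, 1}, {2, 2}, {3, 2}, {9, 3}, {10, 3}, {15, 4}, {16, 4}, {17, 4},
    {11, 3}, {4, 2}, {15, 3}, {16, 3}, {17, 3}, {5, 2}, {21, 1},
};
const size_t trace_len = sizeof trace / sizeof trace[0];

/* Index of the statement a step-over stops at, starting from index
 * `from` and moving by `dir` (+1 forward, -1 reverse). Returns -1 if
 * the trace ends before the depth constraint is met. */
int step_over(int from, int dir)
{
    assert(from >= 0 && (size_t)from < trace_len);
    int limit = trace[from].depth;      /* constraint: depth <= x */

    for (int i = from + dir; i >= 0 && (size_t)i < trace_len; i += dir) {
        if (trace[i].depth <= limit) {
            return i;
        }
    }
    return -1;
}
```

Starting at index 1 (line 2) and calling step_over(i, +1) repeatedly visits lines 3, 4, 5 and 21, while step_over(14, -1) lands on line 20, matching the step-over traces given earlier. A step-into in either direction is simply the neighbouring index.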

Implementation of Step Over

In a previous section we explored how it would be advantageous to do a record with xdebug.so loaded and use some of its functionality during the replay phase. We implement stepping-over with a little bit of help from Xdebug.

Let's review the dontbug_statement_handler() function. Only the relevant portions are repeated here. See here for a full listing.

void dontbug_statement_handler(zend_op_array *op_array) {
...
        // stack depth
        // XG is the macro for the xdebug_globals variable
        // XG(level) is equivalent to xdebug_globals.level
        unsigned long level = XG(level);

        // level related breakpoints
        dontbug_level_location(level, filename, lineno);
...
}

In the above code snippet, we simply obtain the current call stack level by accessing the Xdebug global variable xdebug_globals and its struct member level. We could have calculated the stack depth by simply chasing execute_data->prev_execute_data (and so on) but there could be edge cases (e.g. internal functions) and it's just simpler to use what Xdebug has already calculated.

Incidentally, Xdebug is also a Zend Extension. The Xdebug Zend Extension is activated before Dontbug and so its statement call handler (the equivalent of dontbug_statement_handler()) is always called before Dontbug's, so level should hold the appropriate, up-to-date stack depth value.

Once we have the stack depth available in the level variable, we call the dontbug_level_location() function which is where we place some breakpoints (in the same vein as we do in dontbug_break_location()). This function looks like:

 // This function is generated in dontbug_break.c, after dontbug_break_location()
 void dontbug_level_location(unsigned long level, char* filename, int lineno) {
     int count = 0;

     if (level <= 0) {
         count++; //$$$ 0
     }
     if (level <= 1) {
         count++; //$$$ 1
     }
     if (level <= 2) {
         count++; //$$$ 2
     }
     if (level <= 3) {
         count++; //$$$ 3
     }
     if (level <= 4) {
         count++; //$$$ 4
     }
     if (level <= 5) {
         count++; //$$$ 5
     }
     if (level <= 6) {
         count++; //$$$ 6
     }
     ...
     ...
     ...
     if (level <= 253) {
         count++; //$$$ 253
     }
     if (level <= 254) {
         count++; //$$$ 254
     }
     if (level <= 255) {
         count++; //$$$ 255
     }
 }

As we saw above, if we start stepping-over at stack depth x, the step-over constraint is stack depth <= x. We can use the above function to place the appropriate stack depth constraint breakpoint and wait for it to be hit. When the breakpoint hits, the execution of /usr/bin/php is paused (and so is the execution of the PHP script) and Dontbug can tell the PHP IDE that we've reached the next step-over statement.
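Since dontbug_level_location() is generated code, a generator for it can be quite small. The sketch below is an assumption about what such a generator might look like, written in C for consistency with the listings here; it is not Dontbug's actual generator, and emit_level_location is a made-up name. It writes one breakpoint site per stack depth from 0 to max_level.

```c
#include <assert.h>
#include <stdio.h>

/* Writes a dontbug_level_location()-style function to `out`, with one
 * breakpoint site per stack depth from 0 to max_level inclusive.
 * Returns the number of breakpoint sites written. */
unsigned emit_level_location(FILE *out, unsigned max_level)
{
    unsigned sites = 0;

    assert(out != NULL);
    fputs("void dontbug_level_location(unsigned long level, "
          "char* filename, int lineno) {\n"
          "    int count = 0;\n\n", out);

    for (unsigned k = 0; k <= max_level; k++) {
        /* The marker comment records which depth this line serves,
         * so the line can be located later. */
        fprintf(out,
                "    if (level <= %u) {\n"
                "        count++; //$$$ %u\n"
                "    }\n", k, k);
        sites++;
    }

    fputs("}\n", out);
    return sites;
}
```

Calling emit_level_location(stdout, 255) prints a function with 256 sites, one for each of the depths 0 to 255 shown in the listing above.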

This is the step-over workflow:

  • The PHP IDE sends the Dontbug debugger a dbgp step_over command
  • Dontbug ascertains the current stack depth by reading the level local variable in dontbug_statement_handler() via GDB/MI. Let's say the stack depth was 5
  • Dontbug places a breakpoint at count++; i.e. within the if (level <= 5) statement in dontbug_level_location() function:
...
     if (level <= 5) {
         count++; // GDB breakpoint PLACED HERE via GDB/MI
     }
...
  • Depending on whether we are in forward or reverse mode, Dontbug either issues a continue or reverse-continue command (in GDB/MI this is -exec-continue and -exec-continue --reverse respectively)
  • Dontbug waits for the breakpoint to hit. Once it hits, Dontbug knows that we've reached the next step-over PHP statement
  • Dontbug now positions the instruction pointer at the "master breakpoint position". The master breakpoint position is a position in the dontbug_statement_handler() function (see the function listing). It's a sort of "default" position and the Dontbug debugger always repositions the program there once breakpoints have been hit (usually in places like the dontbug_break_location() and dontbug_level_location() functions). It gives the Dontbug Debugger a reliable position from which to start processing the next dbgp command. If you're wondering how we had access to the level local variable in bullet 2 of this list, that was because we were at the master breakpoint position when the step_over command was issued. If you're confused by the concept of the "master breakpoint", ignore it. This is only necessary to understand if you wish to develop for Dontbug.
  • Since we're in the dontbug_statement_handler() function after coming back to the master breakpoint, we can access the filename and lineno local variables. Using the GDB/MI commands -data-evaluate-expression filename and -data-evaluate-expression lineno we obtain the current values of these variables
  • We then return the appropriate XML to the PHP IDE saying that we're currently located at PHP filename filename and source line number lineno
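The GDB/MI traffic in the workflow above can be sketched as a handful of command strings. This is only an illustration in C: Dontbug itself talks to GDB through a Go library (see footnote 1), the helper names are made up, and the source line passed to -break-insert is hypothetical, since the real line of each count++ statement depends on the generated dontbug_break.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* GDB/MI command that resumes execution in the requested direction. */
const char *mi_continue(int reverse)
{
    return reverse ? "-exec-continue --reverse" : "-exec-continue";
}

/* Formats the GDB/MI command that places a breakpoint on a source line
 * of the generated file. Returns 0 on success, -1 if buf is too small. */
int mi_break_insert(char *buf, size_t size, const char *file, int line)
{
    assert(buf != NULL && file != NULL);
    int n = snprintf(buf, size, "-break-insert %s:%d", file, line);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Formats the command that reads a local variable while stopped at the
 * master breakpoint, e.g. level, filename or lineno. */
int mi_read_local(char *buf, size_t size, const char *name)
{
    assert(buf != NULL && name != NULL);
    int n = snprintf(buf, size, "-data-evaluate-expression %s", name);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

A step-over would then use mi_read_local() for level, mi_break_insert() for the matching count++ line, mi_continue() for the current direction, and finally mi_read_local() again for filename and lineno.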

Some finer points on dontbug_level_location() implementation:

  • If you're wondering what the purpose of count++ is, the answer is that there is no real purpose. We're really interested in the source line positions where we can place a breakpoint -- that's all. We could just as well have put any other C statement there
  • Implicit in the construction of the dontbug_level_location() is that the stack won't be deeper than 256 levels. This is more than enough for most programs. You may change that by adding the configuration max-stack-depth: 300 (or any other "sane" integer) in the $HOME/.dontbug.yaml configuration file
  • Unlike the dontbug_break_location() function, where we calculated that a breakpoint in a Drupal PHP installation would result in ~14 if statements being evaluated in the worst case, the overhead of going through dontbug_level_location() is fixed at 256 if statements. But this is not a problem for interpreter performance during record/replay because, unlike the binary-search-like structure of dontbug_break_location() in which we're jumping around unpredictably in code, this is more or less straight-line code [8]

Implementation of Step Out

We can quickly understand step-out as it's similar to step-over. Unlike step-over in which the call stack level must remain the same or decrease, in step-out the call stack depth must strictly decrease. This makes sense: we want to exit the current function which will lead to the reduction of the stack depth by 1. Step-out is often called finish in GDB parlance and you can probably understand why.

So if we're on call stack level 5 and we want to do a step-over we're going to place a breakpoint inside the if (level <= 5) statement (as we saw above). But if we want to do a step-out the stack level needs to be < 5 after the step-out. So we place a breakpoint inside the if (level <= 4) statement in the dontbug_level_location() function. In other words, if we are at stack depth x, the step-out constraint will be stack depth <= x - 1.
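The difference between the two constraints comes down to which if block receives the breakpoint. Here is a minimal sketch of that choice; the names step_kind, level_threshold and site_is_hit are invented for illustration and do not come from the Dontbug source.

```c
#include <assert.h>

enum step_kind { STEP_OVER, STEP_OUT };

/* Threshold k of the `if (level <= k)` block that receives the
 * breakpoint, for a step issued at stack depth `depth`.
 * Returns -1 when no block can satisfy the constraint. */
long level_threshold(enum step_kind kind, unsigned long depth)
{
    if (kind == STEP_OVER) {
        return (long)depth;         /* stay at depth <= x     */
    }
    if (depth == 0) {
        return -1;                  /* nothing to step out to */
    }
    return (long)depth - 1;         /* reach depth <= x - 1   */
}

/* Whether the breakpoint inside `if (level <= threshold)` is reached
 * when the statement handler runs at stack depth `level`. */
int site_is_hit(long threshold, unsigned long level)
{
    return threshold >= 0 && level <= (unsigned long)threshold;
}
```

At depth 5, a step-over breakpoint (threshold 5) is hit by the next statement at depth 5 or shallower, whereas a step-out breakpoint (threshold 4) ignores every statement at depth 5 and only fires once the current function has been left.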

Just like we have step-out in the forward direction, we have a reverse step-out, which is called reverse-finish in GDB parlance. In reverse-finish we get to the source line where we are just about to enter the function we were originally in. In Dontbug we provide users the ability to do both step-out and reverse step-out (by pressing the step-out button in the PHP IDE) depending on whether the reverse mode is enabled or not.

Getting the value of variables and inspecting the call stack

In a previous section we built a case for calling code in Xdebug to implement responses to "What is the current state of the PHP program?" kind of dbgp commands like context_get, context_names, property_get, property_value, stack_depth, stack_get and so forth. In this section, we learn how we implement these commands in Dontbug. (Hint: it involves RR diversion sessions. Please read above if you don't know what those are).

Preliminaries:

  • Xdebug has a function called xdebug_dbgp_parse_option() whose job it is to parse the incoming dbgp command, process it and return back the appropriate XML
  • The Dontbug Zend Extension has a wrapper/helper function char * dontbug_xdebug_cmd(char *). It receives a dbgp command e.g. "stack_get -i 101", sets up the required argument list and calls xdebug_dbgp_parse_option()

Armed with this knowledge, here is the workflow:

  • The PHP IDE sends Dontbug a dbgp command e.g. stack_get -i 101
  • Dontbug calls dontbug_xdebug_cmd("stack_get -i 101") via GDB/MI. In GDB/MI syntax it is:
-data-evaluate-expression dontbug_xdebug_cmd("stack_get -i 101")
  • The dontbug_xdebug_cmd() function is executed in a RR diversion session. The function is really a wrapper function and the real work is done by the xdebug_dbgp_parse_option() function
  • Once RR is done executing dontbug_xdebug_cmd(), it returns a string which is the XML response. Dontbug forwards this XML response to the PHP IDE

One important thing to note is that, currently, the xdebug_dbgp_parse_option() function is defined with static linkage. A very small patch converts this to extern linkage so that dontbug_xdebug_cmd() may call xdebug_dbgp_parse_option() in a diversion session.
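To make the shape of a dbgp command string such as "stack_get -i 101" concrete, here is a small sketch that separates the command name from the transaction id passed with -i (see footnote 6). It is purely illustrative and is not Xdebug's parser: the real parsing is done inside xdebug_dbgp_parse_option(), and split_dbgp_command is a made-up name.

```c
#include <assert.h>
#include <string.h>

/* Copies the command name (the text before the first space) into
 * `name` and the value following " -i " into `txn`. Returns 0 on
 * success and -1 if either part is missing or does not fit. */
int split_dbgp_command(const char *cmd, char *name, size_t name_size,
                       char *txn, size_t txn_size)
{
    assert(cmd != NULL && name != NULL && txn != NULL);

    size_t name_len = strcspn(cmd, " ");
    if (name_len == 0 || name_len >= name_size) {
        return -1;
    }
    memcpy(name, cmd, name_len);
    name[name_len] = '\0';

    /* Look for the transaction id flag among the arguments. */
    const char *flag = strstr(cmd + name_len, " -i ");
    if (flag == NULL) {
        return -1;
    }
    flag += 4;                              /* skip " -i " */

    size_t txn_len = strcspn(flag, " ");
    if (txn_len == 0 || txn_len >= txn_size) {
        return -1;
    }
    memcpy(txn, flag, txn_len);
    txn[txn_len] = '\0';
    return 0;
}
```

For "stack_get -i 101" this yields the name stack_get and the transaction id 101, which is the id the XML response must echo back to the PHP IDE.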

Summary

We've seen how Dontbug implements a reversible debugger using various techniques and technologies.

Here is a summary of the various dbgp commands handled by Dontbug:

For more details on the various dbgp commands, consult the dbgp protocol reference.

Final Notes:

  • Dontbug will fail if it receives any other dbgp command. Either Dontbug will reply to the PHP IDE with an XML message indicating failure or abort with a fatal error
  • As discussed earlier, Dontbug does not allow users to change a PHP variable on the fly during debugging so something like property_set will result in an error
  • Some of the features of various dbgp commands are not currently implemented e.g. we cannot set "exception breakpoints" in breakpoint_set

If you want to go into even more detail than what has been presented in this document, please look at the Dontbug source code.

Footnotes

[1] Dontbug uses the excellent golang library cyrus-and/gdb to send GDB/MI commands to GDB

[2] GDB internals make for very interesting reading. Here is an excellent resource

[3] Note that RR runs all programs on one core only, even if you have a multicore CPU. Therefore RR supports concurrency but not parallelism

[4] It's slightly more complicated: RR needs to hit the breakpoint as many times as possible and only take the last hit before it comes back to the current instruction, because only that would be the correct "reverse" run

[5] This is a slight simplification: PHP functions can also call special C functions which are registered with the PHP interpreter. These "internal" functions also will get their own zend_execute_data structure as of PHP 7.0

[6] The -i flag is just a monotonically increasing sequence number required by the dbgp protocol so that each command can be uniquely referred to in the debug engine's response

[7] In practice, we would not call zend_eval_stringl directly from GDB. Instead we would create a helper function in the Dontbug Zend extension code which would set up all the required arguments before calling zend_eval_stringl, but the concept remains the same

[8] Strictly speaking, since there are ifs, jumps are present, but these are small fixed jumps compared to the variable size jumps in dontbug_break_location() and therefore should be executed by the CPU faster