Compiler and Code Generator Debugging

Jiří Malák edited this page Aug 29, 2024 · 6 revisions

This page contains notes and tips related to debugging the Open Watcom compilers and especially the code generator. It is assumed the reader is familiar with the pertinent Compiler Architecture.

Isolating Problems

An important first step is isolating the problem. This may take some effort but is usually well worth the trouble. The vast majority of bugs can be condensed into testcases with under five lines of code. A testcase that small makes it much easier to zero in on the issue without being distracted by irrelevancies. Not infrequently, isolating the problem also provides clues as to what might be causing it.

For C code, if the initial testcase is complex, it is recommended to run it through a preprocessor and strip away anything that does not directly contribute to the problem. Typedefs may be removed, macros are (obviously) expanded and conditionally compiled code does not get in the way. Function bodies should be replaced by extern declarations to the maximum extent possible.

For debugging front end issues, it is necessary to get rid of headers and as many redundant declarations as possible so that they don't get in the way. For back end problems, headers don't matter much but it is critical to minimize the amount of generated code and reduce its complexity. The only exception to this rule is a situation where the compiler crashes (not that this ever happens!), in which case executing the compiler under debugger control may be enough.

Tip: If the compiler is not crashing but dying with an internal error, place a breakpoint on a function called Zoiks.

Debug Builds of Compilers

The next step in debugging the compilers is getting a debug build. Apart from having full debugging information, debug builds include a number of routines to dump internal data structures and, especially in the case of the C++ front end, a number of additional sanity checks.

Depending on one's area of interest, a debug build of the code generator may or may not be needed. For work that is restricted purely to the language front end, it is not necessary. For anything involving lower level code generation, a debug cg is needed. It is often useful even for bugs in the front ends where the calls to the cg are incorrect. Pre-made debug directories are supplied for Win32 and OS/2, for instance bld/cg/intel/386/nt386.dbg (Win32-hosted 386 code generator). Simply run wmake in the appropriate directory. Feel free to create new ones for your host platform.

Similarly, there are pre-made debug directories for the front ends, such as bld/cc/dnt386.386 (Win32-hosted 386 C front end). Note that the makefiles turn on debugging and wmake may be run without any additional input, unlike the usual situation where debugging needs to be explicitly requested.

Debugging Front/Back End Interface

Once a debug compiler is built, a number of avenues are available. One of the simplest is tracing of the cg library calls. This is accomplished using the -lc switch (C compiler only), or specifying #pragma on( dump_cg ) in the source. Understanding the output takes a little practice, but isn't tremendously difficult.

Tracing the cg interface calls is often a good starting point when it isn't clear whether a problem is in the front end or back end. A review of the call trace typically shows whether the generated code corresponds to the calls (and hence the front end is incorrect) or the generated code is different (and hence the back end is incorrect).
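As a minimal sketch of the pragma route, the small function used later on this page can be turned into a trace testcase by adding the pragma to the source. Placing the pragma at the top of the file is an assumption made here for illustration. The trace only appears with a debug build of the Open Watcom C compiler; other compilers will typically ignore or warn about the unrecognized pragma.

```c
/* Request tracing of cg library calls (debug compiler builds only).
 * Assumption: the pragma is placed ahead of the code to be traced. */
#pragma on( dump_cg )

/* Small testcase: one conditional, no headers, no extra declarations. */
int foo( int a, int b )
{
    return( a > b ? 3 : 4 );
}
```

Comparing the resulting call trace with the generated code for foo then indicates which side of the interface deserves a closer look.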

Debugging the C Front End

The C front end does not offer a whole lot in the way of debug instrumentation and hence a debugger with cleverly planted breakpoints and strategically selected watches is by far the most productive tool.

There are, however, several useful routines callable from the debugger. Most of them are located in cfedump.c, but a few are also in cmac1.c.

The most basic way to use those routines is to simply execute them from the debugger using the call command, such as executing

call DumpProgram

on the debugger command line (reminder: the command line may be brought up by hitting colon, à la vi). This will execute the routine and print information on the debuggee's console.

A more interesting way is capturing the program's output and displaying it within the debugger. Note that this works well on OS/2 but may not work on other platforms, depending on debugger trap file capabilities.

vc DumpProgram

will call DumpProgram and View its Captured output in a debugger window.

The above mentioned DumpProgram routine will print a human-readable version of the cfe's representation of the source module. Examine the cfedump.c module to see which functions are available. Keep in mind that if a function takes arguments, you must supply them.

For instance the DumpStmt function takes a single argument of TREEPTR type. If you call it without any argument, it's almost certainly going to crash. But suppose you're debugging the AddrFold function in cdinit.c and the debugger stopped in the middle of it. You could run

vc DumpStmt(CurFuncNode)

to see what goodies the CurFuncNode global is pointing to. But you can also execute

vc DumpStmt(tree)

to dump what the tree argument to AddrFold represents.

The debugger includes a good expression evaluator so you shouldn't feel constrained in what information you can dump; you could for instance run

vc DumpStmt(tree->left->right)

to dig a little deeper into the expression tree.

Debugging the C++ Front End

The C++ front end provides a good amount of debug instrumentation in several flavours. There is considerable passive debugging help in the form of debug assertions. In some cases, simply running a debug version of the compiler will trigger an assertion and point in the direction of the problem.

The C++ compiler also provides a way to dump various internal structures to the console. This is controlled through #pragma statements. See bld/plusplus/notes/debug.txt for further information.

Debugging the Code Generator

For diagnosing many problems related to code generation, a good way to start is with a breakpoint at the call to FixReturns in function Generate (in generate.c). Generate is called for every function in the program to generate its code.

The code generator provides a wealth of debug routines to print internal data structures. These are mostly located in modules whose names start with dump. Taking a brief look at these modules to see what's available is highly recommended (needless to say, implementing additional dumping functions is always an option).

The most useful function is DumpBlk. This function will dump the pseudo-assembly representation of the current function, with some additional information. While this pseudo-assembly is not formally documented, reading it is easy for anyone passingly familiar with assembler. All instructions are simple operations such as MOV, ADD, XOR, CALL, NOP. Results and operands are 'names', often machine registers or memory locations, but also constants and (initially) temporaries. It is often instructive to watch how the pseudo-assembly changes in the process of optimization and code generation, and running DumpBlk after each function call and viewing its output often shows where the code generation went wrong.

An example may be helpful. Consider the following C source code:

int foo( int a, int b )
{
    return( a > b ? 3 : 4 );
}

Running DumpBlk early inside Generate will show the following output:

002A93B0 Block 1(1) L002A4260 foo Depth 0
----Jmp -------------LBL -------------------------------------------------------
00000000           Origins: 
     DADADADADADADADADADADADADADADADA DADADADA  EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002A94F0 (       1):  nop             XX  
     DADADADADADADADADADADADADADADADA DADADADA  EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002A95F0 (       2):  parm            I4  ==> EAX 
     DADADADADADADADADADADADADADADADA DADADADA  EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002A9670 (       3):  cnv             I4  I4 EAX ==> t1(a) 
     DADADADADADADADADADADADADADADADA DADADADA  EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002A9730 (       4):  parm            I4  ==> EDX 
     DADADADADADADADADADADADADADADADA DADADADA  EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002A97B0 (       5):  cnv             I4  I4 EDX ==> t2(b) 
     00000000000000000000000000000000 00000000
002A9488           Destinations: L002A9830

002A9FB0 Block 2(2) L002A9830 *** NULL *** Depth 0
--------Cond -------------------------------------------------------------------
00000000           Origins: 
     DADADADADADADADADADADADADADADADA DADADADA  EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
    Line number=3
002A9F30 (       6):  if >            I4 T=0 t1(a), t2(b)  then Block 7(3333) else Block 129(2793392)
     00000000000000000000000000000000 00000000
002AA088           Destinations: L002AA1D0, L002A9E70

002A9870 Block 3(3) L002AA1D0 *** NULL *** label dies Depth 0
----Jmp ------------------------------------------------------------------------
00000000           Origins: 
     DADADADADADADADADADADADADADADADA DADADADA  EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002AA250 (       7):  mov             I4 00000003 ==> t4 
     00000000000000000000000000000000 00000000
002A9948           Destinations: L002A9EB0

002AA2D0 Block 4(4) L002A9E70 *** NULL *** label dies Depth 0
----Jmp ------------------------------------------------------------------------
00000000           Origins: 
     DADADADADADADADADADADADADADADADA DADADADA  EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002AA3D0 (       8):  mov             I4 00000004 ==> t4 
     00000000000000000000000000000000 00000000
002AA3A8           Destinations: L002A9EB0

002AA450 Block 5(5) L002A9EB0 *** NULL *** label dies Depth 0
Ret ----------------------------------------------------------------------------
00000000           Origins: 
     DADADADADADADADADADADADADADADADA DADADADA  EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002AA550 (       9):  mov             I4 t4 ==> t3 
     DADADADADADADADADADADADADADADADA DADADADA  EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
    Line number=4
002AA650 (      10):  mov             U4 t3 ==> EAX 
     DADADADADADADADADADADADADADADADA DADADADA  EDX:ESP:GS:FS:DS:CS:SS:AL:BL:CH:DI:ST(1):ST(3):ST(4):ST(6):ST(7):
002AA5D0 (      11):  nop             XX  
     00000000000000000000000000000000 00000000
002AA528           Destinations: 

First off, the 0xDADA pattern is what the memory allocator pre-initializes memory to. Any memory containing this pattern is unused. The code is split into basic blocks, all nicely separated. Several types of blocks are shown here. The Jmp blocks simply transfer control to another block. The Cond block ends with a conditional and will jump to one of two basic blocks depending on the result of the comparison. The final block is marked with Ret and represents return from the function.
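The fill pattern idea can be sketched in a few lines of C. This is only an illustration of the technique, not the code generator's allocator: dbg_alloc, looks_unused and FILL_BYTE are hypothetical names, and the only detail taken from the dump is the 0xDA byte value.

```c
#include <stdlib.h>
#include <string.h>

#define FILL_BYTE 0xDA   /* matches the DADA... runs seen in the dump */

/* Hypothetical debug allocator: fills fresh memory with a known pattern. */
void *dbg_alloc( size_t size )
{
    void *p = malloc( size );

    if( p != NULL ) {
        memset( p, FILL_BYTE, size );
    }
    return( p );
}

/* Returns 1 if every byte of the region still holds the fill pattern,
 * i.e. nothing has written to it since allocation. */
int looks_unused( const void *p, size_t size )
{
    const unsigned char *bytes = p;
    size_t              i;

    for( i = 0; i < size; i++ ) {
        if( bytes[i] != FILL_BYTE ) {
            return( 0 );
        }
    }
    return( 1 );
}
```

A recognizable pattern such as this makes fields that were never initialized stand out in a dump, which is why the liveness columns in the early output read as DADADADA.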

Where applicable, instructions are preceded by source code line number. Each instruction has its ID, initially sequential numbers, from 1 to 11 in the example. Every block has information about its origins (who jumps to it) and destinations (where it jumps to), but it hasn't been filled in yet. Note that each basic block except the first has one or more origins and each basic block except the last has one or two destinations. Blocks with no origins (which may arise during optimization) are dead code and will be culled.
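The relationship between origins, destinations and dead blocks can be illustrated with a small sketch. The block type and both functions below are hypothetical and greatly simplified; they are not the code generator's data structures. The sketch assumes the first array entry is the function's entry block.

```c
#include <stddef.h>

#define MAX_TARGETS 2   /* a Cond block has two destinations, a Jmp block one */

/* Hypothetical, simplified basic block. */
typedef struct block {
    int             id;
    int             n_targets;              /* 0 for a Ret block */
    struct block    *targets[MAX_TARGETS];  /* destinations */
    int             n_origins;              /* number of blocks jumping here */
} block;

/* Recompute origin counts from the destination lists. */
static void count_origins( block **blocks, size_t n )
{
    size_t  i;
    int     t;

    for( i = 0; i < n; i++ ) {
        blocks[i]->n_origins = 0;
    }
    for( i = 0; i < n; i++ ) {
        for( t = 0; t < blocks[i]->n_targets; t++ ) {
            blocks[i]->targets[t]->n_origins++;
        }
    }
}

/* Remove blocks without origins (other than the entry, blocks[0]) until
 * none remain; returns the new block count. Removing one dead block can
 * leave another without origins, hence the repeated passes. */
static size_t cull_dead( block **blocks, size_t n )
{
    int     changed = 1;
    size_t  i, j;

    while( changed ) {
        changed = 0;
        count_origins( blocks, n );
        for( i = 1; i < n; i++ ) {
            if( blocks[i]->n_origins == 0 ) {
                for( j = i; j + 1 < n; j++ ) {
                    blocks[j] = blocks[j + 1];
                }
                n--;
                changed = 1;
                break;
            }
        }
    }
    return( n );
}
```

Applied to the five blocks of the example plus any unreachable extras, only the five reachable blocks survive, and the Ret block ends up with two origins (the two mov blocks).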

The first block represents function prolog and contains information about passed arguments. In this case, because the 386 compiler with register calling convention was used, the parameters (parm) were passed in registers EAX and EDX. Observe that the type of each instruction is provided, in this case I4 for signed 32-bit integer. These are converted (cnv) into temporaries t1 and t2. The dump provides information about the variable that a temporary corresponds to (if any); also note that the conversions do not really change type and will likely be eliminated shortly.

The second block contains a comparison instruction (if) with a 'greater than' condition code. The two temporaries t1 and t2 are compared and control will be transferred to one of two blocks that follow.

The next two blocks simply assign (mov) a constant, either 3 or 4, to temporary t4. The final block moves t4 to t3, which is then assigned to EAX and used as function return value.

After working through the pre-optimization steps and just before register allocation, the pseudo-assembly is much transformed:

002A93B0 Block 1(1) L002A4260 foo Depth 0
----Jmp -------------LBL -------------------------------------------------------
00000000           Origins: 
IN   00000000000000000000000000000000 OUT  00000006000000000000000000000000
DEF  00000006000000000000000000000000 USE  00000000000000000000000000000000
LOAD 00000000000000000000000000000000 STOR 00000000000000000000000000000000
     00000000000000000000000000000000 00000000  ESP:FS:ES:DS:CS:SS
002A94F0 (       0):  nop             XX  
     00000000000000000000000000000000 00000000  ESP:FS:ES:DS:CS:SS
002A95F0 (      10):  parm            I4  ==> EAX 
     00000000000000000000000000000000 00000000  EAX:ESP:FS:ES:DS:CS:SS
002A9670 (      20):  cnv             I4  I4 EAX ==> t1(a) 
     00000004000000000000000000000000 00000000  ESP:FS:ES:DS:CS:SS
002A9730 (      30):  parm            I4  ==> EDX 
     00000004000000000000000000000000 00000000  EDX:ESP:FS:ES:DS:CS:SS
002A97B0 (      40):  cnv             I4  I4 EDX ==> t2(b) 
     00000006000000000000000000000000 00000000  ESP:FS:ES:DS:CS:SS
002AA250 (      50):  nop             XX  
     00000006000000000000000000000000 00000000  ESP:FS:ES:DS:CS:SS
002A9488           Destinations: Block 2(5)

002AA450 Block 2(5) L002A9830 *** NULL *** label dies Depth 0
Ret ----------------------------------------------------------------------------
002A9488           Origins: Block 1(1)
IN   00000006000000000000000000000000 OUT  00000000000000000000000000000000
DEF  00000001000000000000000000000000 USE  00000006000000000000000000000000
LOAD 00000000000000000000000000000000 STOR 00000000000000000000000000000000
     00000006000000000000000000000000 00000000  ESP:FS:ES:DS:CS:SS
    Line number=3
002A9F30 (      60):  if <=           I4 T=0 t1(a), t2(b) ==> t5 
     00000000000000000000000000000000 00000002  ESP:FS:ES:DS:CS:SS
002AA550 (      70):  cnv             I4  U1 t5 ==> t6 
     00000000000000000000000000000000 00000001  ESP:FS:ES:DS:CS:SS
002AA9F0 (      80):  add             I4 t6, 00000003 ==> t4 
     00000001000000000000000000000000 00000000  ESP:FS:ES:DS:CS:SS
    Line number=4
002AA650 (      90):  mov             U4 t4 ==> EAX 
     00000000000000000000000000000000 00000000  EAX:ESP:FS:ES:DS:CS:SS
002AA5D0 (     100):  nop             XX  
     00000000000000000000000000000000 00000000  EAX:ESP:FS:ES:DS:CS:SS
002AA3D0 (     110):  nop             XX  
     00000000000000000000000000000000 00000000  EAX:ESP:FS:ES:DS:CS:SS
002AA528           Destinations:

We only have two basic blocks now because the code generator figured out that the IA-32 SETcc instruction can be used for the conditional. The pseudo-assembly instruction is still called if, but now it is a conditional assignment, not a conditional jump. The long rows of 0xDADA are gone too, mostly replaced by zeros. This data tells the code generator which registers are 'live' at each point. At the top of each block, a summary for the block is provided with information about registers that are INput or OUTput, DEFined and USEd, LOADed and STORed.
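The effect of this transformation can be expressed back in C. The sketch below mirrors the three pseudo-instructions of the second block; the function names foo_branch and foo_setcc are made up for illustration, while the temporaries are named after those in the dump.

```c
/* The original form: a conditional selecting one of two constants. */
int foo_branch( int a, int b )
{
    return( a > b ? 3 : 4 );
}

/* The branch-free form, one statement per pseudo-instruction. */
int foo_setcc( int a, int b )
{
    unsigned char   t5 = ( a <= b );    /* if <=  I4 t1(a), t2(b) ==> t5 */
    int             t6 = t5;            /* cnv    I4 U1 t5 ==> t6        */
    int             t4 = t6 + 3;        /* add    I4 t6, 3 ==> t4        */

    return( t4 );
}
```

The trick works because the two constants differ by one: the inverted comparison yields 0 when a > b and 1 otherwise, so adding 3 produces 3 or 4 without any jump.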

Some of the old temporaries are now gone but new ones have shown up (the temp numbers are not reused). The register allocator will get rid of them. After register allocation is performed and the code further optimized, the pseudo-assembly looks like this:

002A93B0 Block 1(1) L002A4260 foo Depth 0
----Jmp -------------LBL -------------------------------------------------------
00000000           Origins: 
IN   00000000000000000000000000000000 OUT  00000006000000000000000000000000
DEF  00000006000000000000000000000000 USE  00000000000000000000000000000000
LOAD 00000000000000000000000000000000 STOR 00000000000000000000000000000000
     00000000000000000000000000000000 00000000  EAX:EDX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002A94F0 (       0):  nop             XX  
     00000000000000000000000000000000 00000000  EAX:EDX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002AA250 (      10):  nop             XX  
     00000000000000000000000000000000 00000000  EAX:EDX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002A9488           Destinations: L002A9830

002AA450 Block 2(5) L002A9830 *** NULL *** label dies Depth 0
Ret ----------------------------------------------------------------------------
00000000           Origins: 
IN   00000006000000000000000000000000 OUT  00000000000000000000000000000000
DEF  00000001000000000000000000000000 USE  00000006000000000000000000000000
LOAD 00000000000000000000000000000000 STOR 00000000000000000000000000000000
     00000000000000000000000000000000 00000000  EAX:EDX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
    Line number=3
002A9F30 (      20):  if <=           I4 T=0 EAX, EDX ==> AL 
     00000000000000000000000000000000 00000000  ESP:FS:ES:DS:CS:SS:AL:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002AA550 (      30):  cnv             I4  U1 AL ==> EAX 
     00000000000000000000000000000000 00000000  EAX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002AA9F0 (      40):  add             I4 EAX, 00000003 ==> EAX 
     00000000000000000000000000000000 00000000  EAX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
    Line number=4
002AA5D0 (      50):  nop             XX  
     00000000000000000000000000000000 00000000  EAX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002AA3D0 (      60):  nop             XX  
     00000000000000000000000000000000 00000000  EAX:ESP:FS:ES:DS:CS:SS:ST(0):ST(1):ST(2):ST(3):ST(4):ST(5):ST(6):ST(7)
002AA528           Destinations:

The prolog is now empty because parameters arrived in registers and didn't need any further work. There are now only three instructions left. It may be easiest to compare them with the final generated code:

0000                          foo_:
0000    39 D0                     cmp         eax,edx 
0002    0F 9E C0                  setle       al 
0005    0F B6 C0                  movzx       eax,al 
0008    83 C0 03                  add         eax,0x00000003 
000B    C3                        ret         

The if pseudo-instruction got turned into two machine instructions, CMP and SETLE. Conversion from U1 type to I4 is handled by MOVZX, although it could also be implemented using AND. The final ADD looks the same in IA-32 assembler as it did in the pseudo-assembler, and of course there is now also a return instruction.
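The equivalence of the two ways to widen the U1 result can be checked in C. This is a sketch with hypothetical function names; the argument stands for the contents of EAX after SETLE has written its low byte, with the upper 24 bits holding whatever was there before.

```c
#include <stdint.h>

/* What MOVZX eax,al computes: the low byte, zero-extended to 32 bits. */
uint32_t widen_movzx( uint32_t eax )
{
    return( (uint8_t)eax );
}

/* The AND alternative: mask off everything except the low byte. */
uint32_t widen_and( uint32_t eax )
{
    return( eax & 0xFFu );
}
```

Both functions discard the stale upper bits and keep only the 0 or 1 produced by the comparison, which is all the following ADD needs.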

Functions for dumping Code Generator structures

DumpFPUIns
Dumpan
DumpBlk
DumpConflicts
DumpCurrLoop
DumpIns
DumpOpcodeName
DumpRegTree
DumpSc
DumpGen
DumpTree
DumpIVList
DumpInvariants
DumpOpt
DumpDataDag