Skip to content

Debugging Optimized Code

R. Bernstein edited this page Aug 11, 2016 · 9 revisions

In describing decompilation to Mark Wegman at IBM Research, he mentioned the question of debugging optimized code. It was a project his thesis advisor Sue Graham had suggested a while ago, and I think the gold standard for this research is what Polle Zellweger did on her Ph.D thesis.

Also see An Interactive High-Level Debugger for Control-Flow Optimized Programs

I have a lot of admiration for Dr. Zellweger in tackling this task. To work on debuggers, you have to roll up your sleeves and get your hands plenty dirty. I think we'd both agree that adding debug support of optimized code is a lot of work, however you accomplish it.

My particular point of view comes is a little different though, from having debugged lots of optimized and unoptimized code.

Mark's example of a canonical tricky problem in optimized code was:

    d = ....
    ...
    t1 = x / d  # hoisted code
    ...
    if ...
         ...  x / d  # or code replacing this with t1
    else
         .... x / d  #  or code replacing this with t1
    end

Now suppose I get an error in the hoisted code, specifically that you got a divide by zero in evaluating x / d.

If you're trying to figure out what went wrong or why d is zero, you really don't care about all the stuff after the t1 assignment. In fact, it's kind of helpful to know that you can just forget about all that code there. What you are interested in is that how d got its value and that's still around in the deparsed code.

Sure, you might be puzzled about the assignment to t1 since you didn't write t1 anywhere, so that's why it would be nice to say, hey, this code was hoisted or derived those lines below.

I maintain that even without that info, deparsing the rewritten code will show you the uses of t1. In sum, being able to see the rewritten code in a higher level language, rather than assembly or VM instructions, is a win.

So although perhaps winding up in a similar place as as Zellweger, deparsing has a different point of view. The important thing here is just to describe what's there first, and then possibly unmangle or explain how the code was mangled second. In the Perl deparser which does peephole optimization, it is possible to easily undo that. But in Python some of those peephole optimizations are just covered by additional grammar rules.

I believe that if you just present the code just the way the computer sees it, and if the programmer has reference to the original source code, the programmer would be smart enough to grok why the two are equivalent. And she might decide to rewrite her program along the lines of the transformed program if she thought that would make for a better program.

As an example, in Python when I deparse:

   if x:
       return x
   else:
       return f(x)

the deparser produces the equivalent form:

       return x or f(x)

You may decide that you like this formulation better. On the other hand, if you had:

   JUMP_ABSOLUTE = 100
   ...
   if op == JUMP_ABSOLUTE:

And this deparses to:

   if op == 100:

I'd probably stick with the longer form. But my point is that in either case, a programmer would be able to figure out what was going on without being told. And if you didn't have any tool, that's a process you might go through