Track source-level program state when debug info is present #1552

Open
jryans opened this issue Oct 5, 2022 · 3 comments
jryans commented Oct 5, 2022

Context

KLEE tracks program state at the LLVM IR level. For some applications, it would be helpful to know how this maps back to some source-level state in whichever language was compiled to IR.

For example, the following C function...

int example(int n) {
  int y = 0;
  for (unsigned int i = 0; i < n; i++) {
    y += 4 + n;
  }
  return y;
}

...becomes something like the following IR using Clang 13 (-O1)...

define i32 @example(i32 %0) local_unnamed_addr #0 {
  %2 = icmp eq i32 %0, 0
  br i1 %2, label %9, label %3

3:                                                ; preds = %1
  %4 = add i32 %0, -1
  %5 = add i32 %0, 4
  %6 = mul i32 %4, %5
  %7 = add i32 %6, %0
  %8 = add i32 %7, 4
  br label %9

9:                                                ; preds = %3, %1
  %10 = phi i32 [ 0, %1 ], [ %8, %3 ]
  ret i32 %10
}

...which makes no mention of source-level variables like y, so KLEE is unable to follow them as it executes. It also means KLEE cannot report errors in terms of source-level variables.
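As a rough illustration of how far the optimised IR has drifted from the source, the sketch below transcribes the IR by hand into C and checks it against the original loop. The helpers `example_closed_form` and `check_agreement` are hypothetical names introduced here, not anything from KLEE or Clang, and the use of unsigned arithmetic is an assumption made to mirror the wrapping behaviour of the IR's add and mul.

```c
#include <assert.h>

/* The function from the issue, unchanged. */
int example(int n) {
  int y = 0;
  for (unsigned int i = 0; i < n; i++) {
    y += 4 + n;
  }
  return y;
}

/* Hypothetical helper: a hand transcription of the -O1 IR shown above.
   Each temporary is named after the IR value it stands for. */
int example_closed_form(int n) {
  unsigned int v0 = (unsigned int)n; /* %0, the argument */
  if (v0 == 0) {                     /* %2 = icmp eq i32 %0, 0 */
    return 0;                        /* phi takes 0 from the entry block */
  }
  unsigned int v4 = v0 - 1;          /* %4 = add i32 %0, -1 */
  unsigned int v5 = v0 + 4;          /* %5 = add i32 %0, 4  */
  unsigned int v6 = v4 * v5;         /* %6 = mul i32 %4, %5 */
  unsigned int v7 = v6 + v0;         /* %7 = add i32 %6, %0 */
  unsigned int v8 = v7 + 4;          /* %8 = add i32 %7, 4  */
  return (int)v8;                    /* phi takes %8 from block 3 */
}

/* Hypothetical helper: assert that both versions agree for 0..max_n.
   Keep max_n small enough that y does not overflow in the loop version. */
void check_agreement(int max_n) {
  for (int n = 0; n <= max_n; n++) {
    assert(example(n) == example_closed_form(n));
  }
}
```

Calling `check_agreement(1000)` from a `main` exercises both versions. The loop and the variable y are gone from the second function entirely, which is the situation KLEE sees when it executes the IR.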

Desired outcome

Compilers like Clang can add debug info to the LLVM IR (enabled via the -g flag), which traditionally is emitted to a native binary and then read by debuggers like GDB, LLDB, etc. While current KLEE does use the file / line / column annotations in debug info when reporting stack traces, it could go further. As a future enhancement, it would be great for KLEE to use the variable debug info to map its IR-level program state up to source-level constructs when reporting to the user.
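To make the idea of "mapping IR-level state up to source-level constructs" a bit more concrete, here is a toy sketch in C. It is not KLEE's or LLVM's data structure: the `var_binding` record, the `lookup_source_var` function, and the bindings table are all hypothetical, and the table contents are invented for illustration rather than taken from real Clang output. It also assumes straight-line code with instructions numbered from 0, which real IR with branches does not satisfy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: "from instruction index `at` onward, the source
   variable `var` is held in the IR value `ir_value`". Loosely modelled on
   the kind of fact a variable debug annotation conveys. */
struct var_binding {
  size_t at;
  const char *var;
  const char *ir_value;
};

/* Return the IR value bound to `var` at program point `pc`, or NULL if the
   variable has no known location there. Assumes `recs` is sorted by `at`. */
const char *lookup_source_var(const struct var_binding *recs, size_t count,
                              const char *var, size_t pc) {
  const char *found = NULL;
  for (size_t i = 0; i < count && recs[i].at <= pc; i++) {
    if (strcmp(recs[i].var, var) == 0) {
      found = recs[i].ir_value; /* a later record overrides an earlier one */
    }
  }
  return found;
}

/* Invented bindings for the example function: n lives in %0 throughout,
   and y is only recoverable once the final phi (%10) has been computed. */
static const struct var_binding example_bindings[] = {
  {0, "n", "%0"},
  {9, "y", "%10"},
};

#define EXAMPLE_BINDING_COUNT \
  (sizeof example_bindings / sizeof example_bindings[0])
```

With a table like this, a report at the `ret` instruction could say "y is the value of %10", while a report in the middle of the function would have to admit that y has no single IR value at that point.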

Workaround

While it's not the same as a real mapping of variables using debug info, you can get a modestly better view if your compiler names IR values based on source-level constructs. With Clang, you can add -fno-discard-value-names to achieve this, which gives something like the following...

define i32 @example(i32 %n) local_unnamed_addr #0 {
entry:
  %cmp7.not = icmp eq i32 %n, 0
  br i1 %cmp7.not, label %for.cond.cleanup, label %for.cond.cleanup.loopexit

for.cond.cleanup.loopexit:                        ; preds = %entry
  %0 = add i32 %n, -1
  %1 = add i32 %n, 4
  %2 = mul i32 %0, %1
  %3 = add i32 %2, %n
  %4 = add i32 %3, 4
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  %y.0.lcssa = phi i32 [ 0, %entry ], [ %4, %for.cond.cleanup.loopexit ]
  ret i32 %y.0.lcssa
}

...where some of the IR values (such as %n for the function argument) appear with their source-level names. To be clear, this changes only the names. An unoptimised version would also have a %y IR value for the source-level variable y, but the optimiser removed that value, so the name no longer appears here (only the derived %y.0.lcssa remains). Source-level variables move through numerous IR values and memory locations during computation, so this value naming workaround is not enough to follow source-level program state.

jryans commented Oct 5, 2022

I am currently working on this source-level support in KLEE as part of my ongoing research. I hope to eventually contribute it back here once it's ready for general use.

MartinNowack commented

@jryans That sounds super interesting.

Just to clarify, KLEE supports debug information as long as your bitcode is compiled with it, i.e. clang-13 -O1 -g -c -emit-llvm would emit debug information as part of the IR as well, i.e. stack traces will contain the correct file/line(/column) information.

But I guess you are more focusing on the variable names? You plan to utilise the llvm.dbg.* intrinsics (https://llvm.org/docs/SourceLevelDebugging.html#format-common-intrinsics) in a more sophisticated way and map them to specific variables?

Sounds great and useful! 😄

jryans commented Oct 6, 2022

> Just to clarify, KLEE supports debug information as long as your bitcode is compiled with it, i.e. clang-13 -O1 -g -c -emit-llvm would emit debug information as part of the IR as well, i.e. stack traces will contain the correct file/line(/column) information.

Ah of course, I forgot about this use of debug info when writing up the issue. 😅 I have edited my original post to acknowledge this existing support as part of stack trace reporting, so hopefully that will avoid any confusion. 🙂

> But I guess you are more focusing on the variable names? You plan to utilise the llvm.dbg.* intrinsics (llvm.org/docs/SourceLevelDebugging.html#format-common-intrinsics) in a more sophisticated way and map them to specific variables?

Yes, exactly. Glad to hear it sounds useful! 😄
