Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clang produces confusing debug location for end of loops #91285

Open
mmozeiko opened this issue May 7, 2024 · 7 comments
Open

clang produces confusing debug location for end of loops #91285

mmozeiko opened this issue May 7, 2024 · 7 comments

Comments

@mmozeiko
Copy link

mmozeiko commented May 7, 2024

For simple loops like this:

int x = 0;
while (x == 0)
{
    x = 1;
} // <-- end of loop

clang emits branch at end of loop to jump back to loop condition.
Debug information for this branch encodes location of beginning of loop - while location.

This is with v18.1.5, but happens with earlier versions too (v17 and v16). No optimizations are enabled.
Here's the complete program: https://godbolt.org/z/E7x4b5Whb

As you can see br label %while.cond, !dbg !18, !llvm.loop !23 enodes same !dbg !18 as other instructions, which means line 4, not line 7. The AST encodes WhileStmt location as <line:4:5, line:7:5> - so clearly it has information about end of loop at line 7.

This leads to poor debugging experience.

For example, if you put breakpoint on line 4 and try to step (F10) in VisualStudio debugger, then you will need to press F10 two times to get out of while loop. Because first time it hits breakpoint of "jmp" instruction at end of loop.

Or in other words - if you put breakpoint on line 4 and run this program with F5, then it will hit line 4 breakpoint 3 times, when it should do that only 2 times.

cl.exe compiles correct location, line 7, for "jmp" instruction at end of loop.

I think branch emitted for end of loop (in this case WhileStmt, but for other type of loops too) should encode end of loop location, not beginning of loop:

EmitBranch(LoopHeader.getBlock());

Here's a screenshot of clang compiled exe when I put breakpoint on line 4. You can see how it automatically puts second breakpoint on jmp, and also mixed source-disassembly window shows while (x == 0) for jmp instruction.

image

Here's a screenshot of cl.exe compiled code, showing only one breakpoint on line 4, and proper line 7 location for jmp instruction.

image

@github-actions github-actions bot added the clang Clang issues not falling into any other category label May 7, 2024
@EugeneZelenko EugeneZelenko added clang:codegen debuginfo and removed clang Clang issues not falling into any other category labels May 7, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented May 7, 2024

@llvm/issue-subscribers-debuginfo

Author: Mārtiņš Možeiko (mmozeiko)

For simple loops like this: ``` int x = 0; while (x == 0) { x = 1; } // <-- end of loop ``` clang emits branch at end of loop to jump back to loop condition. Debug information for this branch encodes location of beginning of loop - while location.

Here's the complete program: https://godbolt.org/z/E7x4b5Whb

As you can see br label %while.cond, !dbg !18, !llvm.loop !23 enodes same !dbg !18 as other instructions, which means line 4, not line 7. The AST encodes WhileStmt location as &lt;line:4:5, line:7:5&gt; - so clearly it has information about end of loop at line 7.

This leads to poor debugging experience.

For example, if you put breakpoint on line 4 and try to step (F10) in VisualStudio debugger, then you will need to press F10 two times to get out of while loop. Because first time it hits breakpoint of "jmp" instruction at end of loop.

Or in other words - if you put breakpoint on line 4 and run this program with F5, then it will hit line 4 breakpoint 3 times, when it should do that only 2 times.

cl.exe compiles correct location, line 7, for "jmp" instruction at end of loop.

I think branch emitted for end of loop (in this case WhileStmt, but for other type of loops too) should encode end of loop location, not beginning of loop:

EmitBranch(LoopHeader.getBlock());

Here's a screenshot of clang compiled exe when I put breakpoint on line 4. You can see how it automatically puts second breakpoint on jmp, and also mixed source-disassembly window shows while (x == 0) for jmp instruction.

image

Here's a screenshot of cl.exe compiled code, showing only one breakpoint on line 4, and proper line 7 location for jmp instruction.

image

@llvmbot
Copy link
Collaborator

llvmbot commented May 7, 2024

@llvm/issue-subscribers-clang-codegen

Author: Mārtiņš Možeiko (mmozeiko)

For simple loops like this: ``` int x = 0; while (x == 0) { x = 1; } // <-- end of loop ``` clang emits branch at end of loop to jump back to loop condition. Debug information for this branch encodes location of beginning of loop - while location.

Here's the complete program: https://godbolt.org/z/E7x4b5Whb

As you can see br label %while.cond, !dbg !18, !llvm.loop !23 enodes same !dbg !18 as other instructions, which means line 4, not line 7. The AST encodes WhileStmt location as &lt;line:4:5, line:7:5&gt; - so clearly it has information about end of loop at line 7.

This leads to poor debugging experience.

For example, if you put breakpoint on line 4 and try to step (F10) in VisualStudio debugger, then you will need to press F10 two times to get out of while loop. Because first time it hits breakpoint of "jmp" instruction at end of loop.

Or in other words - if you put breakpoint on line 4 and run this program with F5, then it will hit line 4 breakpoint 3 times, when it should do that only 2 times.

cl.exe compiles correct location, line 7, for "jmp" instruction at end of loop.

I think branch emitted for end of loop (in this case WhileStmt, but for other type of loops too) should encode end of loop location, not beginning of loop:

EmitBranch(LoopHeader.getBlock());

Here's a screenshot of clang compiled exe when I put breakpoint on line 4. You can see how it automatically puts second breakpoint on jmp, and also mixed source-disassembly window shows while (x == 0) for jmp instruction.

image

Here's a screenshot of cl.exe compiled code, showing only one breakpoint on line 4, and proper line 7 location for jmp instruction.

image

@OCHyams
Copy link
Contributor

OCHyams commented May 7, 2024

Hi, thanks for filing this issue.

Breakpoint locations on braces seems to a perennial question (though maybe that is just in relation to functions) but I can't recall the discussions or answers off the top of my head. I think @pogo59 or @dwblaikie may have an opinion on whether this is desirable or not.

A quick observation -

Here's a screenshot of cl.exe compiled code, showing only one breakpoint on line 4, and proper line 7 location for jmp instruction.

I was wondering about what happens if you remove the loop's braces (line 7 is the closing brace). It looks like in that case MSVC (v19.latest on godbolt) attributes the jmp to line 5 (the x = 1 assignment) which makes the stepping smoother but feels like it could be potentially misleading (especially with optimisations enabled). https://godbolt.org/z/7cYc3Gnvo

@mmozeiko
Copy link
Author

mmozeiko commented May 7, 2024

In code without braces I feel like this behavior is fine - attributing jmp to line 5 does not cause any harm. If user puts breakpoint on line 5, they get exactly one breakpoint. No special "ghost" breakpoints like in my initial example appear. You will break exactly once on this line. I feel that's exactly what you would want in unoptimized code - it should be as intuitive as possible.

In optimized code - fine I understand that lines are rearranged and sometimes code will happen to produce multiple breakpoints. I would not be very surprised about that. But all of what I'm talking is about "debug" builds.

@dwblaikie
Copy link
Collaborator

Some previous context and discussions on what line to assign to the jump back to the start of a loop:
#44948
#20238

@adrian-prantl
Copy link
Collaborator

Hi, thanks for filing this issue.

Breakpoint locations on braces seems to a perennial question (though maybe that is just in relation to functions) but I can't recall the discussions or answers off the top of my head. I think @pogo59 or @dwblaikie may have an opinion on whether this is desirable or not.

Associating the closing } with the unconditional jump back to the loop condition sounds correct to me, this is also the location that any cleanups of the loop body should get.

@dwblaikie
Copy link
Collaborator

Be careful not to use the end of the scope, assuming it's the closing brace. If the block lacks braces then the end of the scope will be inside the body, and stepping there might be confusing in some cases (see the related bugs I linked above)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants