Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang adds incorrect frame base information. #64390

Open
GwenTheKween opened this issue Aug 3, 2023 · 15 comments
Open

Clang adds incorrect frame base information. #64390

GwenTheKween opened this issue Aug 3, 2023 · 15 comments

Comments

@GwenTheKween
Copy link

The following code:

int main() {
    return 0;
}

when compiled with clang -O0 -g3 gives the following debug information and assembly:

00000058 000000000000001c 0000005c FDE cie=00000000 pc=0000000000001130..000000000000113f
  DW_CFA_advance_loc: 1 to 0000000000001131
  DW_CFA_def_cfa_offset: 16
  DW_CFA_offset: r6 (rbp) at cfa-16
  DW_CFA_advance_loc: 3 to 0000000000001134
  DW_CFA_def_cfa_register: r6 (rbp)
  DW_CFA_advance_loc: 10 to 000000000000113e
  DW_CFA_def_cfa: r7 (rsp) ofs 8
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop

 <1><23>: Abbrev Number: 2 (DW_TAG_subprogram)
    <24>   DW_AT_low_pc      : (addr_index: 0x0): 1130
    <25>   DW_AT_high_pc     : 0xf
    <29>   DW_AT_frame_base  : 1 byte block: 56         (DW_OP_reg6 (rbp))
    <2b>   DW_AT_name        : (indexed string: 0x3): main
    <2c>   DW_AT_decl_file   : 0
    <2d>   DW_AT_decl_line   : 1
    <2e>   DW_AT_type        : <0x32>
    <32>   DW_AT_external    : 1

0000000000001130 <main>:
    1130:       55                      push   %rbp
    1131:       48 89 e5                mov    %rsp,%rbp
    1134:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    113b:       31 c0                   xor    %eax,%eax
    113d:       5d                      pop    %rbp
    113e:       c3                      ret    

As you can see, the CFA information regarding frame bases is completely correct and perfectly usable, but the DIE for the main function says that the frame information should be obtained from rbp.

This is a problem because when you set a watchpoint (at least in GDB with software watchpoints), when we reach instruction 113e, the difference in rbp makes it look like the variable was changed, because there's no way for the debugger to know that the frame base is no longer valid, and we get a spurious warning of a value change. GCC handles it by making the frame base be DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa) instead.

@llvmbot
Copy link
Collaborator

llvmbot commented Aug 3, 2023

@llvm/issue-subscribers-debuginfo

@GwenTheKween
Copy link
Author

Ping, has anyone been able to take a look at this?

@jmorse
Copy link
Member

jmorse commented Oct 25, 2023

My understanding of this area is that LLVM sets the epilogue_begin flag on the 113d instruction, indicating that past that point the stack frame isn't valid, and then doesn't attempt to maintain variable location information past that point. There's a few different ways we could try to describe variable locations during frame-destruction, however I don't believe there's a strong motivation to pursue any because developers typically don't step on any instructions in the epilogue. Does gdb put a breakpoint on the "ret" instruction instead of "pop %rbp" if you try to put a breakpoint on on any of the lines in the function?

@GwenTheKween
Copy link
Author

The issue is not a breakpoint, but a software watchpoint. With the location as described (4 bytes off of %rbp), when the pop instruction happens, the value of the variable seems to change, because we start analysing a new area of memory.

But also, the information that is being produces around the CFA is good enough, it just isn't used because in the DWARF information the CFA is never specified, instead it says that the location "4 bytes off of rbp" is valid for the whole block. that is, it sets DW_AT_frame_base : 1 byte block: 56 (DW_OP_reg6 (rbp)) instead referencing the CFA like DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa).

@OCHyams
Copy link
Contributor

OCHyams commented Oct 25, 2023

instead it says that the location "4 bytes off of rbp" is valid for the whole block. that is, it sets DW_AT_frame_base : 1 byte block: 56 (DW_OP_reg6 (rbp))

I filed a similar issue a while back #49629. You can see in the comments that there was an attempt to use DW_OP_call_frame_cfa for DW_AT_frame_base but there were some issues with LLDB (or LLDB tests) that meant the author ended up applying a targeted rather than general fix.

I'm going to close this as a duplicate (but I agree that it'd be good to get that fixed).

@OCHyams OCHyams closed this as not planned Won't fix, can't repro, duplicate, stale Oct 25, 2023
@OCHyams
Copy link
Contributor

OCHyams commented Oct 25, 2023

Duplicate of #49629

@OCHyams OCHyams marked this as a duplicate of #49629 Oct 25, 2023
@EugeneZelenko EugeneZelenko added the duplicate Resolved as duplicate label Oct 25, 2023
@GwenTheKween
Copy link
Author

I understand that this is blocked by #49629, but I don't think this is a duplicate. That issue refers to LLDB's inability to deal with the CFA, while this one talk about clang not referring back to it when producing a binary. IMHO, this issue should remain open until clang is doing that.

@OCHyams
Copy link
Contributor

OCHyams commented Oct 26, 2023

The fix mentioned in #49629 (to the compiler) couldn't be applied due to issues with LLDB (because it caused LLDB tests to fail in strange ways).

The linked ticket summary reports missing variable locations due to some compiler internal details, but the reason I thought to mark this ticket as duplicate is because the solution to that problem (mentioned in the first comment) is the same as what you're asking for: have the compiler emit DW_AT_frame_base (DW_OP_call_frame_cfa).

The preferred solution currently appears to be to set DW_AT_frame_base to DW_OP_call_frame_cfa in the parent subroutine DIE and use DW_OP_fbreg to describe the stack location.

That said, #49629's fix has additional requirements, which may be unneeded additional work for your purposes. In addition, although unlikely and undesirable, the compiler issue mentioned ticket could technically be resolved in other ways.

Re-opening, sorry for the additional noise!

@OCHyams OCHyams reopened this Oct 26, 2023
@OCHyams OCHyams removed the duplicate Resolved as duplicate label Oct 26, 2023
@jmorse
Copy link
Member

jmorse commented Oct 26, 2023

The issue is not a breakpoint, but a software watchpoint. With the location as described (4 bytes off of %rbp), when the pop instruction happens, the value of the variable seems to change, because we start analysing a new area of memory.

Ah, I get it, gdb breaks-in when something affects a variable location, which here includes rbp,

The difficulty here is that (AFAIUI) we don't want to force all consumers to have to support an extra part of DWARF, i.e. loading .eh_frame / .debug_frame so that they can get at their variable locations. Additionally, from the top post,

there's no way for the debugger to know that the frame base is no longer valid

I think this is possible with the epilogue_begin flag, which is signalling where frame-based variable locations cease to be valid because we're destroying the frame.

Regardless: LLVM does support producing CFA-based call frames, support was added in https://reviews.llvm.org/D143463, we just usually don't choose to describe stack variables with CFA. Perhaps we can adjust the --debugger-tuning=gdb flag to select CFA-based variable locations, so that users who are definitely using gdb can opt-in to having these locations?

@GwenTheKween
Copy link
Author

The issue is not a breakpoint, but a software watchpoint. With the location as described (4 bytes off of %rbp), when the pop instruction happens, the value of the variable seems to change, because we start analysing a new area of memory.

Ah, I get it, gdb breaks-in when something affects a variable location, which here includes rbp,

Well, the point is to change when the value is changed, but if how GDB calculates the location is changed, it is likely (though not necessary) that the new location has a different value, so it is likely that GDB will think the value is changed, thus breaking.

The difficulty here is that (AFAIUI) we don't want to force all consumers to have to support an extra part of DWARF, i.e. loading .eh_frame / .debug_frame so that they can get at their variable locations. Additionally, from the top post,

Ah, yeah, that makes sense.

there's no way for the debugger to know that the frame base is no longer valid

I think this is possible with the epilogue_begin flag, which is signalling where frame-based variable locations cease to be valid because we're destroying the frame.

Oh, I haven't seen that flag anywhere so I imagined you were talking about something internal to clang. Where is that stored?

Regardless: LLVM does support producing CFA-based call frames, support was added in https://reviews.llvm.org/D143463, we just usually don't choose to describe stack variables with CFA. Perhaps we can adjust the --debugger-tuning=gdb flag to select CFA-based variable locations, so that users who are definitely using gdb can opt-in to having these locations?

debugger-tuning sounds like a great option to work around the "forcing other consumers to support this" issue. I think this would be enough for me to close this bug, if it isn't stopped by those internal compiler problems that OCHyams mentioned.

@JunningWu
Copy link

@Billionai Hi, I have the same problem as yours.
Here is what I have got when I make some tests.
The following is for LLVM:
image
and this is GCC:
image
For me, LLVM(13.0.1) does not support --debugger-tuning=gdb.

@GwenTheKween
Copy link
Author

@JunningWu I'm not familiar with the assembly you posted, but from what I can understand, it seems to be a different issue?

It looks like you are stopping when the variable is being changed, but the line is being reported incorrectly. My issue is that the stack pointer being changed causes GDB to think the variable's value has been changed.

If it is the same issue, I'm tying to add a workaround in GDB, that I expect to make it in for GDB 15, so it will work with it... it would just be better if debugger-tuning=gdb worked, because the workaround has a small chance of suppressing a real change, and if there is a real change, it will be the biggest headache to figure out, and the CFA would allow us to not suppress any changes.

@JunningWu
Copy link

Let me make it more clear.
Here is the debug infomation dumped form llvm and gcc,

 <1><6e>: Abbrev Number: 5 (DW_TAG_subprogram)
    <6f>   DW_AT_low_pc      : 0x10200
    <73>   DW_AT_high_pc     : 0x10282
    <77>   DW_AT_frame_base  : 1 byte block: 58 	(DW_OP_reg8 (s0))
    <79>   DW_AT_name        : (indirect string, offset: 0xfd): Foo_ISR
    <7d>   DW_AT_decl_file   : 2
    <7e>   DW_AT_decl_line   : 42
    <7f>   DW_AT_external    : 1
	
<1><122>: Abbrev Number: 8 (DW_TAG_subprogram)
    <123>   DW_AT_external    : 1
    <124>   DW_AT_name        : (indirect string, offset: 0x44210): Foo_ISR
    <128>   DW_AT_decl_file   : 1
    <129>   DW_AT_decl_line   : 42
    <12a>   DW_AT_decl_column : 18
    <12b>   DW_AT_low_pc      : 0x10200
    <12f>   DW_AT_high_pc     : 0x1026c
    <133>   DW_AT_frame_base  : 1 byte block: 9c 	(DW_OP_call_frame_cfa)
    <135>   DW_AT_GNU_all_call_sites: 1	

I want to set one breakpoint to main.c:line 43, but llvm sets it to 0x10208, which is fall in the prelog code section; the gcc, which using the cfa info, sets the correct position, 0x10222.
The difference is caused by s0 vs cfa, like yours, rbp vs cfa.
I am using the upstream gdb12.1 for riscv, I guess I have to make some changes also.

@GwenTheKween
Copy link
Author

I see. I think I misread your original comment, then, sorry about that. That said, I don't think GDB should be worrying about CFA when you try to set a breakpoint on a given line, and I think there's something else different wrong, either with LLVM debug info or GDB's reading of it.

To avoid spamming LLVM developers while we work this out, I'll email you so we can work on diagnosing what's up with this (or if I'm just missing something in GDB)

@JunningWu
Copy link

JunningWu commented Dec 2, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants