forked from bminor/binutils-gdb
-
Notifications
You must be signed in to change notification settings - Fork 3
Removed dependency on __file__ being set in python module #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
guyush1
requested changes
Jan 8, 2025
Python's freezing doesn't set up the __file__ variable, so we cannot assume it exists and use it.
f3e181e to
34012c4
Compare
guyush1
approved these changes
Jan 9, 2025
guyush1
pushed a commit
that referenced
this pull request
Feb 5, 2025
After the commit: commit b9de07a Date: Thu Oct 10 11:37:34 2024 +0100 gdb: fix handling of DW_AT_entry_pc of inlined subroutines GDB's buildbot CI testing highlighted this assertion failure: (gdb) c Continuing. ../../binutils-gdb/gdb/block.h:203: internal-error: set_entry_pc: Assertion `start >= this->start () && start < this->end ()' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. ----- Backtrace ----- FAIL: gdb.base/break-probes.exp: run til our library loads (GDB internal error) This assertion was in the new function set_entry_pc and is asserting that the default_entry_pc() value is within the blocks start/end range. The default_entry_pc() is the value GDB will use as the entry-pc if the DWARF doesn't specifically override the entry-pc. This value is calculated as: 1. The start address of the first sub-range within the block, if the block has more than 1 range, or 2. The low address (from DW_AT_low_pc) for the block. If the block only has a single range then this means the block was defined with low/high pc attributes (case #2 above). These low/high pc values are what block::start() and block::end() return. This means that by definition, if the block is continuous, the above assert cannot trigger as 'start', the default_entry_pc() would be equivalent to block::start(). This means that, for the assert to trigger, the block must have multiple ranges, and the first address of the first range is not within the blocks low/high address range. This seems wrong. I inspected the state at the time the assert triggered and discovered the block's start() address. Then I removed the assert and restarted GDB. I was now able to inspect the blocks at the offending address: (gdb) maintenance info blocks 0x7ffff7dddaa4 Blocks at 0x7ffff7dddaa4: from objfile: [(objfile *) 0x44a37f0] /lib64/ld-linux-x86-64.so.2 [(block *) 0x46b30c0] 0x7ffff7ddd5a0..0x7ffff7dde8a6 entry pc: 0x7ffff7ddd5a0 is global block symbol count: 4 is contiguous [(block *) 0x46b3020] 0x7ffff7ddd5a0..0x7ffff7dde8a6 entry pc: 0x7ffff7ddd5a0 is static block symbol count: 9 is contiguous [(block *) 0x46b2f70] 0x7ffff7ddda00..0x7ffff7dddac3 entry pc: 0x7ffff7ddda00 function: __GI__dl_find_dso_for_object symbol count: 4 is contiguous [(block *) 0x46b2e10] 0x7ffff7dddaa4..0x7ffff7dddac3 entry pc: 0x7ffff7dddaa4 inline function: __GI__dl_find_dso_for_object symbol count: 5 is contiguous [(block *) 0x46b2a40] 0x7ffff7dddaa4..0x7ffff7dddac3 entry pc: 0x7ffff7dddaa4 symbol count: 1 is contiguous [(block *) 0x46b2970] 0x7ffff7dddaa4..0x7ffff7dddac3 entry pc: 0x7ffff7dddaa4 symbol count: 2 address ranges: 0x7ffff7ddda0e..0x7ffff7ddda77 0x7ffff7ddda90..0x7ffff7ddda96 I've left everything in for context, but the only really interesting bit is the very last block, it's low/high range is: 0x7ffff7dddaa4..0x7ffff7dddac3 but it has separate ranges: 0x7ffff7ddda0e..0x7ffff7ddda77 0x7ffff7ddda90..0x7ffff7ddda96 which are all outside the low/high range. This is what triggers the assert. But why does that block exist at all? What I believe is happening is that we're running into a bug in older versions of GCC. The buildbot failure was with an 8.5 gcc, and Tom de Vries also reported seeing failures when using version 7 and 8 gcc, but not with gcc 9 and onward. Looking at the DWARF I can see that the problematic block is created from this DIE: <4><15efb>: Abbrev Number: 83 (DW_TAG_lexical_block) <15efc> DW_AT_abstract_origin: <0x15e9f> <15efe> DW_AT_low_pc : 0x7ffff7dddaa4 <15f06> DW_AT_high_pc : 31 which links via DW_AT_abstract_origin to: <2><15e9f>: Abbrev Number: 80 (DW_TAG_lexical_block) <15ea0> DW_AT_ranges : 0x38e0 <15ea4> DW_AT_sibling : <0x15eca> And so we can see that <15efb> has got both low/high pc attributes and a ranges attribute. If I widen my checking to parents of DIE <15efb> then I see that they also have DW_AT_abstract_origin, however, there is something interesting going on, the parent DIEs are linking to a different DIE tree than <15efb>. What I believe is happening is this, we have an abstract instance tree, this is rooted at a DW_AT_subprogram, and contains all the blocks, variables, parameters, etc, that you would expect. As this is an abstract instance, then there are no low/high pc attributes, and no ranges attributes in this tree. This makes sense. Now elsewhere we have a DW_TAG_subprogram (not DW_TAG_inlined_subroutine) which links via DW_AT_abstract_origin to the abstract DW_AT_subprogram. This case is documented in the DWARF 5 spec in section 3.3.8.3, and describes an Out-of-Line Instance of an Inlined Subroutine. Within this out of line instance many of the DIE correctly link back, using DW_AT_abstract_origin to the abstract instance tree. This tree also includes the DIE <15e9f>, which is where our problem DIE references. Now, to really confuse things, within this out-of-line instance we have a DW_TAG_inlined_subroutine, which is another instance of the same abstract instance tree! This would seem to indicate a recursive call to the inline function, and the compiler, for some reason, needed to instantiate an out of line instance of this function. And it is within this nested, inlined subroutine, that the problem DIE exists. The problem DIE is referencing the corresponding DIE within the out of line instance tree, but I am convinced this must be a (long fixed) GCC bug, and that the problem DIE should be referencing the DIE within the abstract instance tree. I'm aware that the above is pretty confusing. The actual DWARF would be a around 200 lines long, so I'd like to avoid dumping it in here. But here's my attempt at representing what's going on in a minimal example. The numbers down the side represent the section offset, not the nesting level, and I've removed any attributes that are not relevant: <1> DW_TAG_subprogram <2> DW_TAG_lexical_block <3> DW_TAG_subprogram DW_AT_abstract_origin <1> <4> DW_TAG_lexical_block DW_AT_ranges ... <5> DW_TAG_inlined_subroutine DW_AT_abstract_origin <1> <6> DW_TAG_lexical_block DW_AT_abstract_origin <4> DW_AT_low_pc ... DW_AT_high_pc ... The lexical block at <6> is linking to <4> when it should be linking to <2>. There is one additional thing that we might wonder about, which is, when calculating the low/high pc range for a block, why does GDB not make use of the range information and expand the range beyond the defined low/high values? The answer to this is in dwarf_get_pc_bounds_ranges_or_highlow_pc in dwarf/read.c. This is where the low/high bounds are calculated. What we see is that GDB first checks for a low/high attribute pair, and if that is present, this defines the address range for the block. Only if there is no DW_AT_low_pc do we check for the DW_AT_ranges, and use that to define the extent of the block. And this makes sense, section 3.5 of the DWARF-5 spec says: The lexical block entry may have either a DW_AT_low_pc and DW_AT_high_pc pair of attributes or a DW_AT_ranges attribute whose values encode the contiguous or non-contiguous address ranges, respectively, of the machine instructions generated for the lexical block... Section 3.5 is specifically about lexical blocks, but the same wording, about it being either low/high OR ranges is repeated for other DW_TAG_ types. So this explains why GDB doesn't use the ranges to expand the problem blocks ranges; as the first DIE has low/high addresses, these are used, and the ranges is not consulted. It is only later in dwarf2_record_block_ranges that we create a range based off the low/high pc, and then also process the ranges data, this allows the problem block to exist with ranges that are outside the low/high range. To solve this I considered a number of options: 1. Prevent loading certain attributes from an abstract instance. Section 3.3.8.1 of the DWARF-5 spec talks about which attributes are appropriate to place in an abstract instance. Any attribute that might vary between instances should not appear in an abstract instance. DW_AT_ranges is included as an example in the non-exhaustive list of attributes that should not appear in an abstract instance. Currently in dwarf2_attr (dwarf2/read.c), when we see a DW_AT_abstract_origin attribute, we always follow this to try and find the attribute we are looking for. But we could change this function so that we prevent this following for attributes that we know should not be looked up in an abstract instance. This would solve the problem in this case by preventing us finding the DW_AT_ranges in the incorrect abstract instance. 2. Filter the ranges. Having established a blocks low/high address range in dwarf_get_pc_bounds_ranges_or_highlow_pc, we could allow dwarf2_record_block_ranges to parse the ranges, but we could reject any range that extends outside the blocks defined start and end addresses. For well behaved DWARF where we have either low/high or ranges, then the blocks start/end are defined from the range data, and so, by definition, every range would be acceptable. But in our problem case we would reject all of the invalid ranges. This is my least favourite solution as it feels like rejecting the ranges is tackling the problem too late on. 3. Don't try to parse ranges when we have low/high attributes. This option involves updating dwarf2_record_block_ranges to match the behaviour of dwarf_get_pc_bounds_ranges_or_highlow_pc, and, I believe, to match the DWARF spec: don't try to read range data from DW_AT_ranges if we have low/high pc attributes. In our case this solves the issue because the problematic DIE has the low/high attributes, and it then links to the wrong DIE which happens to have DW_AT_ranges. With this change in place we don't even look for the DW_AT_ranges. If the problem were reversed, and the initial DIE had DW_AT_ranges, but the incorrectly referenced DIE had the low/high pc attributes, we would pick up the wrong addresses, but this wouldn't trigger any asserts. The reason is that dwarf_get_pc_bounds_ranges_or_highlow_pc would also find the low/high addresses from the incorrectly referenced DIE, and so we would just end up with a block which had the wrong address ranges, but the block would be self consistent, which is different to the problem we hit here. In the end, in this commit I went with solution #3, having dwarf_get_pc_bounds_ranges_or_highlow_pc and dwarf2_record_block_ranges be consistent seems sensible. However, I do wonder if in the future we might want to explore solution #1 as an additional safety feature. With this patch in place I'm able to run the gdb.base/break-probes.exp without seeing the assert that CI testing highlighted. I see no regressions when testing on x86-64 GNU/Linux with gcc 9.3.1. Note: the diff in this commit looks big, but it's really just me indenting the code. Approved-By: Tom Tromey <tom@tromey.com>
guyush1
pushed a commit
that referenced
this pull request
Feb 5, 2025
When building gdb with -fsanitize=thread and running test-case
gdb.base/bg-exec-sigint-bp-cond.exp, I run into:
...
==================^M
WARNING: ThreadSanitizer: signal handler spoils errno (pid=25422)^M
#0 handler_wrapper gdb/posix-hdep.c:66^M
#1 decltype ({parm#2}({parm#3}...)) gdb::handle_eintr<>() \
gdbsupport/eintr.h:67^M
#2 gdb::waitpid(int, int*, int) gdbsupport/eintr.h:78^M
#3 run_under_shell gdb/cli/cli-cmds.c:926^M
...
Likewise in:
- tui_sigwinch_handler with test-case gdb.python/tui-window.exp, and
- handle_sighup with test-case gdb.base/quit-live.exp.
Fix this by saving the original errno, and restoring it before returning [1].
Tested on x86_64-linux.
Approved-By: Tom Tromey <tom@tromey.com>
[1] https://www.gnu.org/software/libc/manual/html_node/POSIX-Safety-Concepts.html
guyush1
pushed a commit
that referenced
this pull request
Feb 5, 2025
This commit adds support for a `gstack' command which Fedora has been carrying for many years. gstack is a natural counterpart to the gcore command. Whereas gcore dumps a core file, gstack prints stack traces of a running process. There are many improvements over Fedora's version of this script. The dependency on procfs is gone; gstack will run anywhere gdb runs. The only runtime dependencies are bash and awk. The script includes suggestions from gdb/32325 to include versioning and help. [If this approach to gdb/32325 is acceptable, I could propagate the solution to gcore/gdb-add-index.] I've rewritten the documentation, integrating it into the User Manual. The manpage is now output using this one source. Example run (on x86_64 Fedora 40) $ gstack --help Usage: gstack [-h|--help] [-v|--version] PID Print a stack trace of a running program -h, --help Print this message then exit. -v, --version Print version information then exit. $ gstack -v GNU gstack (GDB) 16.0.50.20241119-git $ gstack 12345678 Process 12345678 not found. $ gstack $(pidof emacs) Thread 6 (Thread 0x7fd5ec1c06c0 (LWP 2491423) "pool-spawner"): #0 0x00007fd6015ca3dd in syscall () at /lib64/libc.so.6 #1 0x00007fd60b31eccd in g_cond_wait () at /lib64/libglib-2.0.so.0 #2 0x00007fd60b28a61b in g_async_queue_pop_intern_unlocked () at /lib64/libglib-2.0.so.0 #3 0x00007fd60b2f1a03 in g_thread_pool_spawn_thread () at /lib64/libglib-2.0.so.0 bminor#4 0x00007fd60b2f0813 in g_thread_proxy () at /lib64/libglib-2.0.so.0 bminor#5 0x00007fd6015486d7 in start_thread () at /lib64/libc.so.6 bminor#6 0x00007fd6015cc60c in clone3 () at /lib64/libc.so.6 bminor#7 0x0000000000000000 in ??? () Thread 5 (Thread 0x7fd5eb9bf6c0 (LWP 2491424) "gmain"): #0 0x00007fd6015be87d in poll () at /lib64/libc.so.6 #1 0x0000000000000001 in ??? () #2 0xffffffff00000001 in ??? () #3 0x0000000000000001 in ??? () bminor#4 0x000000002104cfd0 in ??? () bminor#5 0x00007fd5eb9be320 in ??? () bminor#6 0x00007fd60b321c34 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0 Thread 4 (Thread 0x7fd5eb1be6c0 (LWP 2491425) "gdbus"): #0 0x00007fd6015be87d in poll () at /lib64/libc.so.6 #1 0x0000000020f9b558 in ??? () #2 0xffffffff00000003 in ??? () #3 0x0000000000000003 in ??? () bminor#4 0x00007fd5d8000b90 in ??? () bminor#5 0x00007fd5eb1bd320 in ??? () bminor#6 0x00007fd60b321c34 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0 Thread 3 (Thread 0x7fd5ea9bd6c0 (LWP 2491426) "emacs"): #0 0x00007fd6015ca3dd in syscall () at /lib64/libc.so.6 #1 0x00007fd60b31eccd in g_cond_wait () at /lib64/libglib-2.0.so.0 #2 0x00007fd60b28a61b in g_async_queue_pop_intern_unlocked () at /lib64/libglib-2.0.so.0 #3 0x00007fd60b28a67c in g_async_queue_pop () at /lib64/libglib-2.0.so.0 bminor#4 0x00007fd603f4d0d9 in fc_thread_func () at /lib64/libpangoft2-1.0.so.0 bminor#5 0x00007fd60b2f0813 in g_thread_proxy () at /lib64/libglib-2.0.so.0 bminor#6 0x00007fd6015486d7 in start_thread () at /lib64/libc.so.6 bminor#7 0x00007fd6015cc60c in clone3 () at /lib64/libc.so.6 bminor#8 0x0000000000000000 in ??? () Thread 2 (Thread 0x7fd5e9e6d6c0 (LWP 2491427) "dconf worker"): #0 0x00007fd6015be87d in poll () at /lib64/libc.so.6 #1 0x0000000000000001 in ??? () #2 0xffffffff00000001 in ??? () #3 0x0000000000000001 in ??? () bminor#4 0x00007fd5cc000b90 in ??? () bminor#5 0x00007fd5e9e6c320 in ??? () bminor#6 0x00007fd60b321c34 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0 Thread 1 (Thread 0x7fd5fcc45280 (LWP 2491417) "emacs"): #0 0x00007fd6015c9197 in pselect () at /lib64/libc.so.6 #1 0x0000000000000000 in ??? () Since this is essentially a complete rewrite of the original script and documentation, I've chosen to only keep a 2024 copyright date. Reviewed-By: Eli Zaretskii <eliz@gnu.org> Approved-By: Tom Tromey <tom@tromey.com>
guyush1
pushed a commit
that referenced
this pull request
Feb 5, 2025
On s390x-linux, I run into: ... (gdb) backtrace #0 0x000000000100061a in fun_three () #1 0x000000000100067a in fun_two () #2 0x000003fffdfa9470 in ?? () Backtrace stopped: frame did not save the PC (gdb) FAIL: gdb.base/readnever.exp: backtrace ... This is really due to a problem handling the fun_three frame. When generating a backtrace from fun_two, everying looks ok: ... $ gdb -readnever -q -batch outputs/gdb.base/readnever/readnever \ -ex "b fun_two" \ -ex run \ -ex bt ... #0 0x0000000001000650 in fun_two () #1 0x00000000010006b6 in fun_one () #2 0x00000000010006ee in main () ... For reference the frame info with debug info (without -readnever) looks like this: ... $ gdb -q -batch outputs/gdb.base/readnever/readnever \ -ex "b fun_three" \ -ex run \ -ex "info frame" ... Stack level 0, frame at 0x3fffffff140: pc = 0x1000632 in fun_three (readnever.c:20); saved pc = 0x100067a called by frame at 0x3fffffff1f0 source language c. Arglist at 0x3fffffff140, args: a=10, b=49 '1', c=0x3fffffff29c Locals at 0x3fffffff140, Previous frame's sp in v0 ... But with -readnever, like this instead: ... Stack level 0, frame at 0x0: pc = 0x100061a in fun_three; saved pc = 0x100067a called by frame at 0x3fffffff140 Arglist at 0xffffffffffffffff, args: Locals at 0xffffffffffffffff, Previous frame's sp in r15 ... An obvious difference is the "Previous frame's sp in" v0 vs. r15. Looking at the code: ... 0000000001000608 <fun_three>: 1000608: b3 c1 00 2b ldgr %f2,%r11 100060c: b3 c1 00 0f ldgr %f0,%r15 1000610: e3 f0 ff 50 ff 71 lay %r15,-176(%r15) 1000616: b9 04 00 bf lgr %r11,%r15 ... it becomes clear what is going on. This is an unusual prologue. Rather than saving r11 (frame pointer) and r15 (stack pointer) to stack, instead they're saved into call-clobbered floating point registers. [ For reference, this is the prologue of fun_two: ... 0000000001000640 <fun_two>: 1000640: eb bf f0 58 00 24 stmg %r11,%r15,88(%r15) 1000646: e3 f0 ff 50 ff 71 lay %r15,-176(%r15) 100064c: b9 04 00 bf lgr %r11,%r15 ... where the first instruction stores registers r11 to r15 to stack. ] Gdb fails to properly analyze the prologue, which causes the problems getting the frame info. Fix this by: - adding handling of the ldgr insn [1] in s390_analyze_prologue, and - recognizing the insn as saving a register in s390_prologue_frame_unwind_cache. This gets us instead: ... Stack level 0, frame at 0x0: pc = 0x100061a in fun_three; saved pc = 0x100067a called by frame at 0x3fffffff1f0 Arglist at 0xffffffffffffffff, args: Locals at 0xffffffffffffffff, Previous frame's sp in f0 ... and: ... (gdb) backtrace^M #0 0x000000000100061a in fun_three ()^M #1 0x000000000100067a in fun_two ()^M #2 0x00000000010006b6 in fun_one ()^M #3 0x00000000010006ee in main ()^M (gdb) PASS: gdb.base/readnever.exp: backtrace ... Tested on s390x-linux. PR tdep/32417 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32417 Approved-By: Andreas Arnez <arnez@linux.ibm.com> [1] https://www.ibm.com/support/pages/sites/default/files/2021-05/SA22-7871-10.pdf
guyush1
pushed a commit
that referenced
this pull request
Feb 5, 2025
…read call Commit 7fcdec0 ("GDB: Use gdb::array_view for buffers used in register reading and unwinding") introduces a regression in gdb.base/jit-reader.exp: $ ./gdb -q -nx --data-directory=data-directory testsuite/outputs/gdb.base/jit-reader/jit-reader -ex 'jit-reader-load /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/jit-reader/jit-reader.so' -ex r -batch This GDB supports auto-downloading debuginfo from the following URLs: <https://debuginfod.archlinux.org> Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal] Debuginfod has been disabled. To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit. [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1". Program received signal SIGTRAP, Trace/breakpoint trap. Recursive internal problem. The "Recusive internal problem" part is not good, but it's not the point of this patch. It still means we hit an internal error. The stack trace is: #0 internal_error_loc (file=0x55555ebefb20 "/home/simark/src/binutils-gdb/gdb/frame.c", line=1207, fmt=0x55555ebef500 "%s: Assertion `%s' failed.") at /home/simark/src/binutils-gdb/gdbsupport/errors.cc:53 #1 0x0000555561604d83 in frame_register_unwind (next_frame=..., regnum=16, optimizedp=0x7ffff12e5a20, unavailablep=0x7ffff12e5a30, lvalp=0x7ffff12e5a40, addrp=0x7ffff12e5a60, realnump=0x7ffff12e5a50, buffer=...) at /home/simark/src/binutils-gdb/gdb/frame.c:1207 #2 0x0000555561608334 in deprecated_frame_register_read (frame=..., regnum=16, myaddr=...) at /home/simark/src/binutils-gdb/gdb/frame.c:1496 #3 0x0000555561a74259 in jit_unwind_reg_get_impl (cb=0x7ffff1049ca0, regnum=16) at /home/simark/src/binutils-gdb/gdb/jit.c:988 bminor#4 0x00007fffd26e634e in read_register (callbacks=0x7ffff1049ca0, dw_reg=16, value=0x7fffffffb4c8) at /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/jit-reader.c:100 bminor#5 0x00007fffd26e645f in unwind_frame (self=0x50400000ac10, cbs=0x7ffff1049ca0) at /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/jit-reader.c:143 bminor#6 0x0000555561a74a12 in jit_frame_sniffer (self=0x55556374d040 <jit_frame_unwind>, this_frame=..., cache=0x5210002905f8) at /home/simark/src/binutils-gdb/gdb/jit.c:1042 bminor#7 0x00005555615f499e in frame_unwind_try_unwinder (this_frame=..., this_cache=0x5210002905f8, unwinder=0x55556374d040 <jit_frame_unwind>) at /home/simark/src/binutils-gdb/gdb/frame-unwind.c:138 bminor#8 0x00005555615f512c in frame_unwind_find_by_frame (this_frame=..., this_cache=0x5210002905f8) at /home/simark/src/binutils-gdb/gdb/frame-unwind.c:209 bminor#9 0x00005555616178d0 in get_frame_type (frame=...) at /home/simark/src/binutils-gdb/gdb/frame.c:2996 bminor#10 0x000055556282db03 in do_print_frame_info (uiout=0x511000027500, fp_opts=..., frame=..., print_level=0, print_what=SRC_AND_LOC, print_args=1, set_current_sal=1) at /home/simark/src/binutils-gdb/gdb/stack.c:1033 The problem is that function `jit_unwind_reg_get_impl` passes field `gdb_reg_value::value`, a gdb_byte array of 1 element (used as a flexible array member), as the array view parameter of `deprecated_frame_register_read`. This results in an array view of size 1. The assertion in `frame_register_unwind` that verifies the passed in buffer is larger enough to hold the unwound register value then fails. Fix this by explicitly creating an array view of the right size. Change-Id: Ie170da438ec9085863e7be8b455a067b531635dc Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
guyush1
pushed a commit
that referenced
this pull request
Feb 5, 2025
This resolves the following memory leak reported by ASAN:
Direct leak of 17 byte(s) in 1 object(s) allocated from:
#0 0x3ffb32fbb1d in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x2aa149861cf in xmalloc ../../libiberty/xmalloc.c:149
#2 0x2aa149868ff in xstrdup ../../libiberty/xstrdup.c:34
#3 0x2aa1312391f in s390_machinemode ../../gas/config/tc-s390.c:2241
bminor#4 0x2aa130ddc7b in read_a_source_file ../../gas/read.c:1293
bminor#5 0x2aa1304f7bf in perform_an_assembly_pass ../../gas/as.c:1223
bminor#6 0x2aa1304f7bf in main ../../gas/as.c:1436
bminor#7 0x3ffb282be35 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
bminor#8 0x3ffb282bf33 in __libc_start_main_impl ../csu/libc-start.c:360
bminor#9 0x2aa1305758f (/home/jremus/git/binutils/build-asan/gas/as-new+0x2d5758f) (BuildId: ...)
gas/
* config/tc-s390.c (s390_machinemode): Free mode_string before
returning.
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
guyush1
pushed a commit
that referenced
this pull request
Feb 5, 2025
Simplify the .machine directive parsing logic, so that cpu_string is
always xstrdup'd and can therefore always be xfree'd before returning
to the caller.
This resolves the following memory leak reported by ASAN:
Direct leak of 13 byte(s) in 3 object(s) allocated from:
#0 0x3ff8aafbb1d in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x2aa338861cf in xmalloc ../../libiberty/xmalloc.c:149
#2 0x2aa338868ff in xstrdup ../../libiberty/xstrdup.c:34
#3 0x2aa320253cb in s390_machine ../../gas/config/tc-s390.c:2172
bminor#4 0x2aa31fddc7b in read_a_source_file ../../gas/read.c:1293
bminor#5 0x2aa31f4f7bf in perform_an_assembly_pass ../../gas/as.c:1223
bminor#6 0x2aa31f4f7bf in main ../../gas/as.c:1436
bminor#7 0x3ff8a02be35 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
bminor#8 0x3ff8a02bf33 in __libc_start_main_impl ../csu/libc-start.c:360
bminor#9 0x2aa31f5758f (/home/jremus/git/binutils/build-asan/gas/as-new+0x2d5758f) (BuildId: ...)
While at it add tests with double quoted .machine
"<cpu>[+<extension>...]" values.
gas/
* config/tc-s390.c (s390_machine): Simplify parsing and free
cpu_string before returning.
gas/testsuite/
* gas/s390/machine-parsing-1.l: Add tests with double quoted
values.
* gas/s390/machine-parsing-1.s: Likewise.
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
guyush1
pushed a commit
that referenced
this pull request
Apr 25, 2025
Recent work in the TUI has improved GDB's use of the curses wnoutrefresh and doupdate mechanism, which improves performance by batching together updates and then doing a single set of writes to the screen when doupdate is finally called. The tui_batch_rendering type is a RAII class which, in its destructor, calls doupdate to send the batched updates to the screen. However, if there is no tui_batch_rendering active on the call stack then any wnoutrefresh calls will remain batched but undisplayed until the next time doupdate happens to be called. This problem can be seen in PR gdb/32623. When an inferior is started the 'Starting program' message is not immediately displayed to the user. The 'Starting program' message originates from run_command_1 in infcmd.c, the message is sent to the current_uiout, which will be the TUI ui_out. After the message is sent, ui_out::flush() is called, here's the backtrace when that happens: #0 tui_file::flush (this=0x36e4ab0) at ../../src/gdb/tui/tui-file.c:42 #1 0x0000000001004f4b in pager_file::flush (this=0x36d35f0) at ../../src/gdb/utils.c:1531 #2 0x0000000001004f71 in gdb_flush (stream=0x36d35f0) at ../../src/gdb/utils.c:1539 #3 0x00000000006975ab in cli_ui_out::do_flush (this=0x35a50b0) at ../../src/gdb/cli-out.c:250 bminor#4 0x00000000009fd1f9 in ui_out::flush (this=0x35a50b0) at ../../src/gdb/ui-out.h:263 bminor#5 0x00000000009f56ad in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at ../../src/gdb/infcmd.c:449 bminor#6 0x00000000009f599a in run_command (args=0x0, from_tty=1) at ../../src/gdb/infcmd.c:511 And if we check out tui_file::flush (tui-file.c) we can see that this just calls tui_win_info::refresh_window(), which in turn, just uses wnoutrefresh to batch any pending output. The problem is that, in the above backtrace, there is no tui_batch_rendering active, and so there will be no doupdate call to flush the output to the screen. We could add a tui_batch_rendering into tui_file::flush. And tui_file::write. And tui_file::puts ..... ... but that all seems a bit unnecessary. Instead, I propose that tui_win_info::refresh_window() should be changed. If suppress_output is true (i.e. a tui_batch_rendering is active) then we should continue to call wnoutrefresh(). But if suppress_output is false, meaning that no tui_batch_rendering is in place, then we should call wrefresh(), which immediately writes the output to the screen. Testing but PR gdb/32623 was a little involved. We need to 'run' the inferior and check for the 'Starting program' message. But DejaGNUU can only check for the message once it knows the message should have appeared. But, as the bug is that output is not displayed, we don't have any output hints that the inferior is started yet... In the end, I have the inferior create a file in the test's output directory. Now DejaGNU can send the 'run' command, and wait for the file to appear. Once that happens, we know that the 'Starting program' message should have appeared. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32623 Approved-By: Tom Tromey <tom@tromey.com>
guyush1
pushed a commit
that referenced
this pull request
Apr 25, 2025
Modifying inline-frame-cycle-unwind.exp to use `bt -no-filters` produces the following incorrect backtrace: #0 inline_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:49 #1 normal_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:32 #2 0x000055555555517f in inline_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:50 #3 normal_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:32 Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) FAIL: gdb.base/inline-frame-cycle-unwind.exp: cycle at level 1: backtrace when the unwind is broken at frame 1 The expected output, which we get with `bt`, is: #0 inline_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:49 #1 normal_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:32 Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) PASS: gdb.base/inline-frame-cycle-unwind.exp: cycle at level 1: backtrace when the unwind is broken at frame 1 The cycle checking in `get_prev_frame_maybe_check_cycle` relies on newer frame ids having already been computed and stashed. Unlike other frames, frame #0's id does not get computed immediately. The test passes with `bt` because when applying python frame filters, the call to `bootstrap_python_frame_filters` happens to compute the id of frame #0. When `get_prev_frame_maybe_check_cycle` later tries to stash frame #2's id, the cycle is detected. The test fails with `bt -no-filters` because frame #0's id has not been stashed by the time `get_prev_frame_maybe_check_cycle` tries to stash frame #2's id which succeeds and the cycle is only detected later when trying to stash frame bminor#4's id. Doing `stepi` after the incorrect backtrace would then trigger an assertion failure when trying to stash frame #0's id because it is a duplicate of #2's already stashed id. In `get_prev_frame_always_1`, if this_frame is inline frame 0, then compute and stash its frame id before returning the previous frame. This ensures that the id of inline frame 0 has been stashed before `get_prev_frame_maybe_check_cycle` is called on older frames. The test case has been updated to run both `bt` and `bt -no-filters`. Co-authored-by: Andrew Burgess <aburgess@redhat.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32757
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
Recent work in the TUI has improved GDB's use of the curses wnoutrefresh and doupdate mechanism, which improves performance by batching together updates and then doing a single set of writes to the screen when doupdate is finally called. The tui_batch_rendering type is a RAII class which, in its destructor, calls doupdate to send the batched updates to the screen. However, if there is no tui_batch_rendering active on the call stack then any wnoutrefresh calls will remain batched but undisplayed until the next time doupdate happens to be called. This problem can be seen in PR gdb/32623. When an inferior is started the 'Starting program' message is not immediately displayed to the user. The 'Starting program' message originates from run_command_1 in infcmd.c, the message is sent to the current_uiout, which will be the TUI ui_out. After the message is sent, ui_out::flush() is called, here's the backtrace when that happens: #0 tui_file::flush (this=0x36e4ab0) at ../../src/gdb/tui/tui-file.c:42 guyush1#1 0x0000000001004f4b in pager_file::flush (this=0x36d35f0) at ../../src/gdb/utils.c:1531 guyush1#2 0x0000000001004f71 in gdb_flush (stream=0x36d35f0) at ../../src/gdb/utils.c:1539 guyush1#3 0x00000000006975ab in cli_ui_out::do_flush (this=0x35a50b0) at ../../src/gdb/cli-out.c:250 bminor#4 0x00000000009fd1f9 in ui_out::flush (this=0x35a50b0) at ../../src/gdb/ui-out.h:263 bminor#5 0x00000000009f56ad in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at ../../src/gdb/infcmd.c:449 bminor#6 0x00000000009f599a in run_command (args=0x0, from_tty=1) at ../../src/gdb/infcmd.c:511 And if we check out tui_file::flush (tui-file.c) we can see that this just calls tui_win_info::refresh_window(), which in turn, just uses wnoutrefresh to batch any pending output. The problem is that, in the above backtrace, there is no tui_batch_rendering active, and so there will be no doupdate call to flush the output to the screen. We could add a tui_batch_rendering into tui_file::flush. And tui_file::write. And tui_file::puts ..... ... but that all seems a bit unnecessary. Instead, I propose that tui_win_info::refresh_window() should be changed. If suppress_output is true (i.e. a tui_batch_rendering is active) then we should continue to call wnoutrefresh(). But if suppress_output is false, meaning that no tui_batch_rendering is in place, then we should call wrefresh(), which immediately writes the output to the screen. Testing but PR gdb/32623 was a little involved. We need to 'run' the inferior and check for the 'Starting program' message. But DejaGNUU can only check for the message once it knows the message should have appeared. But, as the bug is that output is not displayed, we don't have any output hints that the inferior is started yet... In the end, I have the inferior create a file in the test's output directory. Now DejaGNU can send the 'run' command, and wait for the file to appear. Once that happens, we know that the 'Starting program' message should have appeared. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32623 Approved-By: Tom Tromey <tom@tromey.com>
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
I noticed that attempting to select a tail-call frame using 'frame function NAME' wouldn't work: (gdb) bt #0 func_that_never_returns () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.base/frame-selection.c:49 guyush1#1 0x0000000000401183 in func_that_tail_calls () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.base/frame-selection.c:59 guyush1#2 0x00000000004011a5 in main () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.base/frame-selection.c:70 (gdb) frame function func_that_tail_calls No frame for function "func_that_tail_calls". (gdb) up guyush1#1 0x0000000000401183 in func_that_tail_calls () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.base/frame-selection.c:59 59 func_that_never_returns (); (gdb) disassemble Dump of assembler code for function func_that_tail_calls: 0x000000000040117a <+0>: push %rbp 0x000000000040117b <+1>: mov %rsp,%rbp 0x000000000040117e <+4>: call 0x40116c <func_that_never_returns> End of assembler dump. (gdb) The problem is that the 'function' mechanism uses get_frame_pc() and then compares the address returned with the bounds of the function we're looking for. So in this case, the bounds of func_that_tail_calls are 0x40117a to 0x401183, with 0x401183 being the first address _after_ the function. However, because func_that_tail_calls ends in a tail call, then the get_frame_pc() is 0x401183, the first address after the function. As a result, GDB fails to realise that frame guyush1#1 is inside the function we're looking for, and the lookup fails. The fix is to use get_frame_address_in_block, which will return an adjusted address, in this case, 0x401182, which is within the function bounds. Now the lookup works: (gdb) frame function func_that_tail_calls guyush1#1 0x0000000000401183 in func_that_tail_calls () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.base/frame-selection.c:59 59 func_that_never_returns (); (gdb) I've extended the gdb.base/frame-selection.exp test to cover this case.
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
Consider the test-case sources main.c and foo.c:
$ cat main.c
extern int foo (void);
int
main (void)
{
return foo ();
}
$ cat foo.c
extern int foo (void);
int
foo (void)
{
return 0;
}
and main.c compiled with debug info, and foo.c without:
$ gcc -g main.c -c
$ gcc foo.c -c
$ gcc -g main.o foo.o
In TUI mode, if we run to foo:
$ gdb -q a.out -tui -ex "b foo" -ex run
it gets us "[ No Source Available ]":
┌─main.c─────────────────────────────────────────┐
│ │
│ │
│ │
│ [ No Source Available ] │
│ │
│ │
└────────────────────────────────────────────────┘
(src) In: foo L?? PC: 0x400566
...
Breakpoint 1, 0x0000000000400566 in foo ()
(gdb)
But after resizing (pressing ctrl-<minus> in the gnome-terminal), we
get instead the source for main.c:
┌─main.c─────────────────────────────────────────┐
│ 3 int │
│ 4 main (void) │
│ 5 { │
│ 6 return foo (); │
│ 7 } │
│ │
│ │
└────────────────────────────────────────────────┘
(src) In: foo L?? PC: 0x400566
...
Breakpoint 1, 0x0000000000400566 in foo ()
(gdb)
which is inappropriate because we're stopped in function foo, which is
not in main.c.
The problem is that, when the window is resized, GDB ends up calling
tui_source_window_base::rerender. The rerender function has three
cases, one for when the window already has some source code
content (which is not the case here), a case for when the inferior is
active, and we have a selected frame (which is the case that applies
here), and a final case for when the inferior is not running.
For the case which we end up in, the source code window has no
content, but the inferior is running, so we have a selected frame, GDB
calls the get_current_source_symtab_and_line() function to get the
symtab_and_line for the current location.
The get_current_source_symtab_and_line() will actually return the last
recorded symtab and line location, not the current symtab and line
location.
What this means, is that, if the current location has no debug
information, get_current_source_symtab_and_line() will return any
previously recorded location, or failing that, the default (main)
location.
This behaviour of get_current_source_symtab_and_line() also causes
problems for the 'list' command. Consider this pure CLI session:
(gdb) break foo
Breakpoint 1 at 0x40110a
(gdb) run
Starting program: /tmp/a.out
Breakpoint 1, 0x000000000040110a in foo ()
(gdb) list
1 extern int foo (void);
2
3 int
4 main (void)
5 {
6 return foo ();
7 }
(gdb) list .
Insufficient debug info for showing source lines at current PC (0x40110a).
(gdb)
However, if we look at how GDB's TUI updates the source window during
a normal stop, we see that GDB does a better job of displaying the
expected contents. Going back to our original example, when we start
GDB with:
$ gdb -q a.out -tui -ex "b foo" -ex run
we do get the "[ No Source Available ]" message as expected. Why is
that?
The answer is that, in this case GDB uses tui_show_frame_info to
update the source window, tui_show_frame_info is called each time a
prompt is displayed, like this:
#0 tui_show_frame_info (fi=...) at ../../src/gdb/tui/tui-status.c:269
guyush1#1 0x0000000000f55975 in tui_refresh_frame_and_register_information () at ../../src/gdb/tui/tui-hooks.c:118
guyush1#2 0x0000000000f55ae8 in tui_before_prompt (current_gdb_prompt=0x31ef930 <top_prompt+16> "(gdb) ") at ../../src/gdb/tui/tui-hooks.c:165
guyush1#3 0x000000000090ea45 in std::_Function_handler<void(char const*), void (*)(char const*)>::_M_invoke (__functor=..., __args#0=@0x7ffc955106b0: 0x31ef930 <top_prompt+16> "(gdb) ") at /usr/include/c++/9/bits/std_function.h:300
bminor#4 0x00000000009020df in std::function<void(char const*)>::operator() (this=0x5281260, __args#0=0x31ef930 <top_prompt+16> "(gdb) ") at /usr/include/c++/9/bits/std_function.h:688
bminor#5 0x0000000000901c35 in gdb::observers::observable<char const*>::notify (this=0x31dda00 <gdb::observers::before_prompt>, args#0=0x31ef930 <top_prompt+16> "(gdb) ") at ../../src/gdb/../gdbsupport/observable.h:166
bminor#6 0x00000000008ffed8 in notify_before_prompt (prompt=0x31ef930 <top_prompt+16> "(gdb) ") at ../../src/gdb/event-top.c:518
bminor#7 0x00000000008fff08 in top_level_prompt () at ../../src/gdb/event-top.c:534
bminor#8 0x00000000008ffdeb in display_gdb_prompt (new_prompt=0x0) at ../../src/gdb/event-top.c:487
If we look at how tui_show_frame_info figures out what source to
display, it doesn't use get_current_source_symtab_and_line(), instead,
it finds a symtab_and_line directly from a frame_info_pt. This means
we are not dependent on get_current_source_symtab_and_line() returning
the current location (which it does not).
I propose that we change tui_source_window_base::rerender() so that,
for the case we are discussing here (the inferior has a selected
frame, but the source window has no contents), we move away from using
get_current_source_symtab_and_line(), and instead use find_frame_sal
instead, like tui_show_frame_info does.
This means that we will always use the inferior's current location.
Tested on x86_64-linux.
Reviewed-By: Tom de Vries <tdevries@suse.de>
Reported-By: Andrew Burgess <aburgess@redhat.com>
Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32614
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
…get_file_names
PR 32742 shows this failing:
$ make check TESTS="gdb.ada/access_to_unbounded_array.exp" RUNTESTFLAGS="--target_board=fission"
Running /home/simark/src/binutils-gdb/gdb/testsuite/gdb.ada/access_to_unbounded_array.exp ...
FAIL: gdb.ada/access_to_unbounded_array.exp: scenario=all: gdb_breakpoint: set breakpoint at foo.adb:23 (GDB internal error)
Or, interactively:
$ ./gdb -q -nx --data-directory=data-directory testsuite/outputs/gdb.ada/access_to_unbounded_array/foo-all -ex 'b foo.adb:23' -batch
/home/simark/src/binutils-gdb/gdb/dwarf2/read.c:19567: internal-error: set_lang: Assertion `old_value == language_unknown || old_value == language_minimal || old_value == lang' failed.
The symptom is that for a given dwarf2_per_cu, the language gets set
twice. First, set to `language_ada`, and then, to `language_minimal`.
It's unexpected for the language of a CU to get changed like this.
The CU at offset 0x0 in the main file looks like:
0x00000000: Compile Unit: length = 0x00000030, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000034)
0x0000000b: DW_TAG_compile_unit
DW_AT_low_pc [DW_FORM_addr] (0x000000000000339a)
DW_AT_high_pc [DW_FORM_data8] (0x0000000000000432)
DW_AT_stmt_list [DW_FORM_sec_offset] (0x00000000)
DW_AT_GNU_dwo_name [DW_FORM_strp] ("b~foo.dwo")
DW_AT_comp_dir [DW_FORM_strp] ("/home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.ada/access_to_unbounded_array")
DW_AT_GNU_pubnames [DW_FORM_flag_present] (true)
DW_AT_GNU_addr_base [DW_FORM_sec_offset] (0x00000000)
DW_AT_GNU_dwo_id [DW_FORM_data8] (0x277aee54e7bd47f7)
This refers to the DWO file b~foo.dwo, whose top-level DIE is:
.debug_info.dwo contents:
0x00000000: Compile Unit: length = 0x00000b63, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000b67)
0x0000000b: DW_TAG_compile_unit
DW_AT_producer [DW_FORM_GNU_str_index] ("GNU Ada 14.2.1 20250207 -fgnat-encodings=minimal -gdwarf-4 -fdebug-types-section -fuse-ld=gold -gnatA -gnatWb -gnatiw -gdwarf-4 -gsplit-dwarf -ggnu-pubnames -gnatws -mtune=generic -march=x86-64")
DW_AT_language [DW_FORM_data1] (DW_LANG_Ada95)
DW_AT_name [DW_FORM_GNU_str_index] ("/home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.ada/access_to_unbounded_array/b~foo.adb")
DW_AT_comp_dir [DW_FORM_GNU_str_index] ("/home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.ada/access_to_unbounded_array")
DW_AT_GNU_dwo_id [DW_FORM_data8] (0xdbeffefab180a2cb)
The thing to note is that the language attribute is only present in the
DIE in the DWO file, not on the DIE in the main file.
The first time the language gets set is here:
#0 dwarf2_per_cu::set_lang (this=0x50f0000044b0, lang=language_ada, dw_lang=DW_LANG_Ada95) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:20788
guyush1#1 0x0000555561666af6 in cutu_reader::prepare_one_comp_unit (this=0x7ffff10bf2b0, cu=0x51700008e000, pretend_language=language_minimal) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:21029
guyush1#2 0x000055556159f740 in cutu_reader::cutu_reader (this=0x7ffff10bf2b0, this_cu=0x50f0000044b0, per_objfile=0x516000066080, abbrev_table=0x510000004640, existing_cu=0x0, skip_partial=false, pretend_language=language_minimal, cache=0x7ffff11b95e0) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:3371
guyush1#3 0x00005555615a547a in process_psymtab_comp_unit (this_cu=0x50f0000044b0, per_objfile=0x516000066080, storage=0x7ffff11b95e0) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:3799
bminor#4 0x00005555615a9292 in cooked_index_worker_debug_info::process_cus (this=0x51700008dc80, task_number=0, first=std::unique_ptr<dwarf2_per_cu> = {...}, end=std::unique_ptr<dwarf2_per_cu> = {...}) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:4122
In this code path (particularly this specific cutu_reader constructir),
the work is done to find and read the DWO file. So the language is
properly identifier as language_ada, all good so far.
The second time the language gets set is:
#0 dwarf2_per_cu::set_lang (this=0x50f0000044b0, lang=language_minimal, dw_lang=0) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:20788
guyush1#1 0x0000555561666af6 in cutu_reader::prepare_one_comp_unit (this=0x7ffff0f42730, cu=0x517000091b80, pretend_language=language_minimal) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:21029
guyush1#2 0x00005555615a1822 in cutu_reader::cutu_reader (this=0x7ffff0f42730, this_cu=0x50f0000044b0, per_objfile=0x516000066080, pretend_language=language_minimal, parent_cu=0x0, dwo_file=0x0) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:3464
guyush1#3 0x000055556158c850 in dw2_get_file_names (this_cu=0x50f0000044b0, per_objfile=0x516000066080) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:1956
bminor#4 0x000055556158f4f5 in dw_expand_symtabs_matching_file_matcher (per_objfile=0x516000066080, file_matcher=...) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:2157
bminor#5 0x00005555616329e2 in cooked_index_functions::expand_symtabs_matching (this=0x50200002ab50, objfile=0x516000065780, file_matcher=..., lookup_name=0x0, symbol_matcher=..., expansion_notify=..., search_flags=..., domain=..., lang_matcher=...) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:15912
bminor#6 0x0000555562ca8a14 in objfile::map_symtabs_matching_filename (this=0x516000065780, name=0x50200002ad90 "break pck.adb", real_path=0x0, callback=...) at /home/smarchi/src/binutils-gdb/gdb/symfile-debug.c:207
bminor#7 0x0000555562d68775 in iterate_over_symtabs (pspace=0x513000005600, name=0x50200002ad90 "break pck.adb", callback=...) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:727
Here, we use the other cutu_reader constructor, the one that does not
look up the DWO file for the passed CU. If a DWO file exists for this
CU, the caller is expected to pass it as a parameter. That cutu_reader
constructor also ends up setting the language of the CU. But because it
didn't read the DWO file, it didn't figure out the language is
language_ada, so it tries to set the language to the default,
language_minimal.
A question is: why do we end up trying to set the CU's language is this
context. This is completely unrelated to what we're trying to do, that
is get the file names from the line table. Setting the language is a
side-effect of just constructing a cutu_reader, which we need to look up
attributes in dw2_get_file_names_reader. There are probably some
cleanups to be done here, to avoid doing useless work like looking up
and setting the CU's language when all we need is an object to help
reading the DIEs and attributes. But that is future work.
The same cutu_reader constructor is used in
`dwarf2_per_cu::ensure_lang`. Since this is the version of cutu_reader
that does not look up the DWO file, it will conclude that the language
is language_minimal and set that as the CU's language. In other words,
`dwarf2_per_cu::ensure_lang` will get the language wrong, pretty ironic.
Fix this by using the other cutu_reader constructor in those two spots.
Pass `per_objfile->get_cu (this_cu)`, as the `existing_cu` parameter. I
think this is necessary, because that constructor has an assert to check
that if `existing_cu` is nullptr, then there must not be an existing
`dwarf2_cu` in the per_objfile.
To avoid getting things wrong like this, I think that the second
cutu_reader constructor should be reserved for the spots that do pass a
non-nullptr dwo_file. The only spot at the moment in
create_cus_hash_table, where we read multiple units from the same DWO
file. In this context, I guess it makes sense for efficiency to get the
dwo_file once and pass it down to cutu_reader. For that constructor,
make the parameters non-optional, add "non-nullptr" asserts, and update
the code to assume the passed values are not nullptr.
What I don't know is if this change is problematic thread-wise, if the
functions I have modified to use the other cutu_reader constructor can
be called concurrently in worker threads. If so, I think it would be
problematic.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32742
Change-Id: I980d16875b9a43ab90e251504714d0d41165c7c8
Approved-By: Tom Tromey <tom@tromey.com>
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
…_all_symtabs Commit 2920415 ("gdb/dwarf: use ranged for loop in some spots") broke some tests notably gdb.base/maint.exp with the fission board. $ ./gdb -nx -q --data-directory=data-directory testsuite/outputs/gdb.base/maint/maint -ex start -ex "maint expand-sym" -batch ... Temporary breakpoint 1, main (argc=1, argv=0x7fffffffdc48, envp=0x7fffffffdc58) at /home/smarchi/src/binutils-gdb/gdb/testsuite/gdb.base/break.c:43 43 if (argc == 12345) { /* an unlikely value < 2^16, in case uninited */ /* set breakpoint 6 here */ /usr/include/c++/14.2.1/debug/safe_iterator.h:392: In function: gnu_debug::_Safe_iterator<_Iterator, _Sequence, _Category>& gnu_debug::_Safe_iterator<_Iterator, _Sequence, _Category>::operator++() [with _Iterator = gnu_cxx:: normal_iterator<std::unique_ptr<dwarf2_per_cu, dwarf2_per_cu_deleter>*, std::vector<std::unique_ptr<dwarf2_per_cu, dwarf2_per_cu_deleter>, std::allocator<std::unique_ptr<dwarf2_per_cu, dwarf2_per_cu_deleter> > > >; _Sequence = std::debug::vector<std::unique_ptr<dwarf2_per_cu, dwarf2_per_cu_deleter> >; _Category = std::forward_iterator_tag] Error: attempt to increment a singular iterator. Note that this is caught because I build with -D_GLIBCXX_DEBUG=1. Otherwise, it might crash more randomly, or just not crash at all (but still be buggy). While iterating on the all_units vector, some type units get added there: #0 add_type_unit (per_bfd=0x51b000044b80, section=0x50e0000c2280, sect_off=0, length=74, sig=4367013491293299229) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:2576 guyush1#1 0x00005555618a3a40 in lookup_dwo_signatured_type (cu=0x51700009b580, sig=4367013491293299229) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:2664 guyush1#2 0x00005555618ee176 in queue_and_load_dwo_tu (dwo_unit=0x521000120e00, cu=0x51700009b580) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:8329 guyush1#3 0x00005555618eeafe in queue_and_load_all_dwo_tus (cu=0x51700009b580) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:8366 bminor#4 0x00005555618966a6 in dw2_do_instantiate_symtab (per_cu=0x50f0000043c0, per_objfile=0x516000065a80, skip_partial=true) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:1695 bminor#5 0x00005555618968d4 in dw2_instantiate_symtab (per_cu=0x50f0000043c0, per_objfile=0x516000065a80, skip_partial=true) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:1719 bminor#6 0x000055556189ac3f in dwarf2_base_index_functions::expand_all_symtabs (this=0x502000024390, objfile=0x516000065780) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:1977 This invalidates the iterator in dwarf2_base_index_functions::expand_all_symtabs, which is caught by the libstdc++ debug mode. I'm not entirely sure that it is correct to append type units from dwo files to the all_units vector like this. The dwarf2_find_containing_comp_unit function expects a precise ordering of the elements of the all_units vector, to be able to do a binary search. Appending a type unit at the end at this point certainly doesn't respect that ordering. For now I'd just like to undo the regression. Do that by using all_units_range in the ranged for loop. I will keep in mind to investigate whether this insertion of type units in all_units after the fact really makes sense or not. Change-Id: Iec131e59281cf2dbd12d3f3d163b59018fdc54da
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
On Debian 12, with gcc 12 and ld 2.40, I get some failures when running:
$ make check TESTS="gdb.base/style.exp" RUNTESTFLAGS="--target_board=fission"
I think I stumble on this bug [1], preventing the test from doing
anything that requires expanding the compilation unit:
$ ./gdb -nx -q --data-directory=data-directory testsuite/outputs/gdb.base/style/style
Reading symbols from testsuite/outputs/gdb.base/style/style...
(gdb) p main
DW_FORM_strp pointing outside of .debug_str section [in module /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/style/style]
(gdb)
The error is thrown here:
#0 0x00007ffff693f0a1 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
guyush1#1 0x0000555569ce6852 in throw_it(return_reason, errors, const char *, typedef __va_list_tag __va_list_tag *) (reason=RETURN_ERROR, error=GENERIC_ERROR, fmt=0x555562a9fc40 "%s pointing outside of %s section [in module %s]", ap=0x7fffffff8df0) at /home/smarchi/src/binutils-gdb/gdbsupport/common-exceptions.cc:203
guyush1#2 0x0000555569ce690f in throw_verror (error=GENERIC_ERROR, fmt=0x555562a9fc40 "%s pointing outside of %s section [in module %s]", ap=0x7fffffff8df0) at /home/smarchi/src/binutils-gdb/gdbsupport/common-exceptions.cc:211
guyush1#3 0x000055556879c0cb in verror (string=0x555562a9fc40 "%s pointing outside of %s section [in module %s]", args=0x7fffffff8df0) at /home/smarchi/src/binutils-gdb/gdb/utils.c:193
bminor#4 0x0000555569cfa88d in error (fmt=0x555562a9fc40 "%s pointing outside of %s section [in module %s]") at /home/smarchi/src/binutils-gdb/gdbsupport/errors.cc:45
bminor#5 0x000055556667dbff in dwarf2_section_info::read_string (this=0x61b000042a08, objfile=0x616000055e80, str_offset=262811, form_name=0x555562886b40 "DW_FORM_strp") at /home/smarchi/src/binutils-gdb/gdb/dwarf2/section.c:211
bminor#6 0x00005555662486b7 in dwarf_decode_macro_bytes (per_objfile=0x616000056180, builder=0x614000006040, abfd=0x6120000f4b40, mac_ptr=0x60300004f5be "", mac_end=0x60300004f5bb "\002\004", current_file=0x62100007ad70, lh=0x60f000028bd0, section=0x61700008ba78, section_is_gnu=1, section_is_dwz=0, offset_size=4, str_section=0x61700008bac8, str_offsets_section=0x61700008baf0, str_offsets_base=std::optional<unsigned long> = {...}, include_hash=..., cu=0x61700008b600) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/macro.c:511
bminor#7 0x000055556624af0e in dwarf_decode_macros (per_objfile=0x616000056180, builder=0x614000006040, section=0x61700008ba78, lh=0x60f000028bd0, offset_size=4, offset=0, str_section=0x61700008bac8, str_offsets_section=0x61700008baf0, str_offsets_base=std::optional<unsigned long> = {...}, section_is_gnu=1, cu=0x61700008b600) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/macro.c:934
bminor#8 0x000055556642cb82 in dwarf_decode_macros (cu=0x61700008b600, offset=0, section_is_gnu=1) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:19435
bminor#9 0x000055556639bd12 in read_file_scope (die=0x6210000885c0, cu=0x61700008b600) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:6366
bminor#10 0x0000555566392d99 in process_die (die=0x6210000885c0, cu=0x61700008b600) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:5310
bminor#11 0x0000555566390d72 in process_full_comp_unit (cu=0x61700008b600, pretend_language=language_minimal) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:5075
The exception is then only caught at the event-loop level
(start_event_loop), causing the whole debug info reading process to be
aborted. I think it's a little harsh, considering that a lot of things
could work even if we failed to read macro information.
Catch the exception inside read_file_scope, print the exception, and
carry on. We could go even more fine-grained: if reading the string for
one macro definition fails, we could continue reading the macro
information. Perhaps it's just that one macro definition that is
broken. However, I don't need this level of granularity, so I haven't
attempted this. Also, my experience is that macro reading fails when
the compiler or linker has a bug, in which case pretty much everything
is messed up.
With this patch, it now looks like:
$ ./gdb -nx -q --data-directory=data-directory testsuite/outputs/gdb.base/style/style
Reading symbols from testsuite/outputs/gdb.base/style/style...
(gdb) p main
While reading section .debug_macro.dwo: DW_FORM_strp pointing outside of .debug_str section [in module /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/style/style]
$1 = {int (int, char **)} 0x684 <main>
(gdb)
In the test I am investigating (gdb.base/style.exp with the fission
board), it allows more tests to run:
-# of expected passes 107
-# of unexpected failures 17
+# of expected passes 448
+# of unexpected failures 19
Of course, we still see the error about the macro information, and some
macro-related tests still fail (those would be kfailed ideally), but
many tests that are not macro-dependent now pass.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111409
Change-Id: I0bdb01f153eff23c63c96ce3f41114bb027e5796
Approved-By: Tom Tromey <tom@tromey.com>
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
Currently GDB's source cache doesn't track whether the entries within the cache are styled or not. This is pretty much fine, the assumption is that any time we are fetching source code, we do so in order to print it to the terminal, so where possible we always want styling applied, and if styling is not applied, then it is because that file cannot be styled for some reason. Changes to 'set style enabled' cause the source cache to be flushed, so future calls to fetch source code will regenerate the cache entries with styling enabled or not as appropriate. But this all assumes that styling is either on or off, and that switching between these two states isn't done very often. However, the Python API allows for individual commands to be executed with styling turned off via gdb.execute(). See commit: commit e5348a7 Date: Thu Feb 13 15:39:31 2025 +0000 gdb/python: new styling argument to gdb.execute Currently the source cache doesn't handle this case. Consider this: (gdb) list main ... snip, styled source code displayed here ... (gdb) python gdb.execute("list main", True, False, False) ... snip, styled source code is still shown here ... In the second case, the final `False` passed to gdb.execute() is asking for unstyled output. The problem is that, `get_source_lines` calls `ensure` to prime the cache for the file in question, then `extract_lines` just pulls the lines of interest from the cached contents. In `ensure`, if there is a cache entry for the desired filename, then that is considered good enough. There is no consideration about whether the cache entry is styled or not. This commit aims to fix this, after this commit, the `ensure` function will make sure that the cache entry used by `get_source_lines` is styled correctly. I think there are two approaches I could take: 1. Allow multiple cache entries for a single file, a styled, and non-styled entry. The `ensure` function would then place the correct cache entry into the last position so that `get_source_lines` would use the correct entry, or 2. Have `ensure` recalculate entries if the required styling mode is different to the styling mode of the current entry. Approach guyush1#1 is better if we are rapidly switching between styling modes, while guyush1#2 might be better if we want to keep more files in the cache and we only rarely switch styling modes. In the end I chose approach guyush1#2, but the good thing is that the changes are all contained within the `ensure` function. If in the future we wanted to change to strategy guyush1#1, this could be done transparently to the rest of GDB. So after this commit, the `ensure` function checks if styling is currently possible or not. If it is not, and the current entry is styled, then the current entry only is dropped from the cache, and a new, unstyled entry is created. Likewise, if the current entry is non-styled, but styling is required, we drop one entry and recalculate. With this change in place, I have updated set_style_enabled (in cli/cli-style.c) so the source cache is no longer flushed when the style settings are changed, the source cache will automatically handle changes to the style settings now. This problem was discovered in PR gdb/32676. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32676 Approved-By: Tom Tromey <tom@tromey.com>
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
Modifying inline-frame-cycle-unwind.exp to use `bt -no-filters` produces the following incorrect backtrace: #0 inline_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:49 guyush1#1 normal_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:32 guyush1#2 0x000055555555517f in inline_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:50 guyush1#3 normal_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:32 Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) FAIL: gdb.base/inline-frame-cycle-unwind.exp: cycle at level 1: backtrace when the unwind is broken at frame 1 The expected output, which we get with `bt`, is: #0 inline_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:49 guyush1#1 normal_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:32 Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) PASS: gdb.base/inline-frame-cycle-unwind.exp: cycle at level 1: backtrace when the unwind is broken at frame 1 The cycle checking in `get_prev_frame_maybe_check_cycle` relies on newer frame ids having already been computed and stashed. Unlike other frames, frame #0's id does not get computed immediately. The test passes with `bt` because when applying python frame filters, the call to `bootstrap_python_frame_filters` happens to compute the id of frame #0. When `get_prev_frame_maybe_check_cycle` later tries to stash frame guyush1#2's id, the cycle is detected. The test fails with `bt -no-filters` because frame #0's id has not been stashed by the time `get_prev_frame_maybe_check_cycle` tries to stash frame guyush1#2's id which succeeds and the cycle is only detected later when trying to stash frame bminor#4's id. Doing `stepi` after the incorrect backtrace would then trigger an assertion failure when trying to stash frame #0's id because it is a duplicate of guyush1#2's already stashed id. In `get_prev_frame_always_1`, if this_frame is inline frame 0, then compute and stash its frame id before returning the previous frame. This ensures that the id of inline frame 0 has been stashed before `get_prev_frame_maybe_check_cycle` is called on older frames. The test case has been updated to run both `bt` and `bt -no-filters`. Co-authored-by: Andrew Burgess <aburgess@redhat.com>
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
Consider this backtrace within GDB: #0 notify_breakpoint_modified (b=0x57d31d0) at ../../src/gdb/breakpoint.c:1083 guyush1#1 0x00000000005b6406 in breakpoint_set_commands (b=0x57d31d0, commands=...) at ../../src/gdb/breakpoint.c:1523 guyush1#2 0x00000000005c8c63 in update_dprintf_command_list (b=0x57d31d0) at ../../src/gdb/breakpoint.c:8641 guyush1#3 0x00000000005d3c4e in dprintf_breakpoint::re_set (this=0x57d31d0) at ../../src/gdb/breakpoint.c:12476 bminor#4 0x00000000005d6347 in breakpoint_re_set () at ../../src/gdb/breakpoint.c:13298 Whenever breakpoint_re_set is called we re-build the commands that the dprintf b/p will execute and store these into the breakpoint. The commands are re-built in update_dprintf_command_list and stored into the breakpoint object in breakpoint_set_commands. Now sometimes these commands can change, dprintf_breakpoint::re_set explains one case where this can occur, and I'm sure there must be others. But in most cases the commands we recalculate will not change. This means that the breakpoint modified event which is emitted from breakpoint_set_commands is redundant. This commit aims to eliminate the redundant breakpoint modified events for dprintf breakpoints. This is done by adding a commands_equal call to the start of breakpoint_set_commands. The commands_equal function is a new function which compares two command_line objects and returns true if they are identical. Using this function we can check if the new commands passed to breakpoint_set_commands are identical to the breakpoint's existing commands. If the new commands are equal then we don't need to change anything on the new breakpoint, and the breakpoint modified event can be skipped. The test for this commit stops at a dlopen() call in the inferior, sets up a dprintf breakpoint, then uses 'next' to step over the dlopen() call. When the library loads GDB call breakpoint_re_set, which calls dprintf_breakpoint::re_set. But in this case we don't expect the calculated command string to change, so we don't expect to see the breakpoint modified event.
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
When building with gcc, with flags -gdwarf-5, -gsplit-dwarf and
-fdebug-types-section, the resulting .dwo files contain multiple
.debug_info.dwo sections. One for each type unit and one for the
compile unit. This is correct, as per DWARF 5, section F.2.3 ("Contents
of the Split DWARF Object Files"):
The split DWARF object files each contain the following sections:
...
.debug_info.dwo (for the compilation unit)
.debug_info.dwo (one COMDAT section for each type unit)
...
GDB currently assumes that there is a single .debug_info.dwo section,
causing unpredictable behavior. For example, sometimes this crash:
==81781==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x508000007a71 at pc 0x58704d32a59c bp 0x7ffc0acc0bb0 sp 0x7ffc0acc0ba0
READ of size 1 at 0x508000007a71 thread T0
#0 0x58704d32a59b in bfd_getl32 /home/smarchi/src/binutils-gdb/bfd/libbfd.c:846
guyush1#1 0x58704ae62dce in read_initial_length(bfd*, unsigned char const*, unsigned int*, bool) /home/smarchi/src/binutils-gdb/gdb/dwarf2/leb.c:92
guyush1#2 0x58704aaf76bf in read_comp_unit_head(comp_unit_head*, unsigned char const*, dwarf2_section_info*, rcuh_kind) /home/smarchi/src/binutils-gdb/gdb/dwarf2/comp-unit-head.c:47
guyush1#3 0x58704aaf8f97 in read_and_check_comp_unit_head(dwarf2_per_objfile*, comp_unit_head*, dwarf2_section_info*, dwarf2_section_info*, unsigned char const*, rcuh_kind) /home/smarchi/src/binutils-gdb/gdb/dwarf2/comp-unit-head.c:193
bminor#4 0x58704b022908 in create_dwo_unit_hash_tables /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:6233
bminor#5 0x58704b0334a5 in open_and_init_dwo_file /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:7588
bminor#6 0x58704b03965a in lookup_dwo_cutu /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:7935
bminor#7 0x58704b03a5b1 in lookup_dwo_comp_unit /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:8009
bminor#8 0x58704aff5b70 in lookup_dwo_unit /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:2802
The first time that locate_dwo_sections gets called for a
.debug_info.dwo section, dwo_sections::info gets initialized properly.
The second time it gets called for a .debug_info.dwo section, the size
field in dwo_sections::info gets overwritten with the size of the second
section. But the buffer remains pointing to the contents of the first
section, because the section is already "read in". So the size does not
match the buffer. And even if it did, we would only keep the
information about one .debug_info.dwo, out of the many.
First, add an assert in locate_dwo_sections to make sure we don't
try to fill in a dwo section info twice. Add the assert to other
functions with the same pattern, while at it.
Then, change dwo_sections::info to be a vector of sections (just like we
do for type sections). Update locate_dwo_sections to append to that
vector when seeing a new .debug_info.dwo section. Update
open_and_init_dwo_file to read the units from each section.
The problem can be observed by running some tests with the
dwarf5-fission-debug-types target board. For example,
gdb.base/condbreak.exp crashes (with the ASan failure shown above)
before the patch and passes after).
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119766
Change-Id: Iedf275768b6057dee4b1542396714f3d89903cf3
Reviewed-By: Tom de Vries <tdevries@suse.de>
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
On arm-linux, with test-case gdb.python/py-missing-objfile.exp I get: ... (gdb) whatis global_exec_var^M type = volatile exec_type^M (gdb) FAIL: $exp: initial sanity check: whatis global_exec_var ... instead of the expected "type = volatile struct exec_type". The problem is that the current language is ASM instead of C, because the inner frame at the point of the core dump has language ASM: ... #0 __libc_do_syscall () at libc-do-syscall.S:47 guyush1#1 0xf7882920 in __pthread_kill_implementation () at pthread_kill.c:43 guyush1#2 0xf784df22 in __GI_raise (sig=sig@entry=6) at raise.c:26 guyush1#3 0xf783f03e in __GI_abort () at abort.c:73 bminor#4 0x009b0538 in dump_core () at py-missing-objfile.c:34 bminor#5 0x009b0598 in main () at py-missing-objfile.c:46 ... Fix this by manually setting the language to C. Tested on arm-linux and x86_64-linux. PR testsuite/32445 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32445
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
I decided to try to build and test gdb on Windows.
I found a page on the wiki [1] suggesting three ways of building gdb:
- MinGW,
- MinGW on Cygwin, and
- Cygwin.
I picked Cygwin, because I've used it before (though not recently).
I managed to install Cygwin and sufficient packages to build gdb and start the
testsuite.
However, testsuite progress ground to a halt at gdb.base/branch-to-self.exp.
[ AFAICT, similar problems reported here [2]. ]
I managed to reproduce this hang by running just the test-case.
I attempted to kill the hanging processes by:
- first killing the inferior process, using the cygwin "kill -9" command, and
- then killing the gdb process, likewise.
But the gdb process remained, and I had to point-and-click my way through task
manager to actually kill the gdb process.
I investigated this by attaching to the hanging gdb process. Looking at the
main thread, I saw it was stopped in a call to WaitForSingleObject, with
the dwMilliseconds parameter set to INFINITE.
The backtrace in more detail:
...
(gdb) bt
#0 0x00007fff196fc044 in ntdll!ZwWaitForSingleObject () from
/cygdrive/c/windows/SYSTEM32/ntdll.dll
guyush1#1 0x00007fff16bbcdcf in WaitForSingleObjectEx () from
/cygdrive/c/windows/System32/KERNELBASE.dll
guyush1#2 0x0000000100998065 in wait_for_single (handle=0x1b8, howlong=4294967295) at
gdb/windows-nat.c:435
guyush1#3 0x0000000100999aa7 in
windows_nat_target::do_synchronously(gdb::function_view<bool ()>)
(this=this@entry=0xa001c6fe0, func=...) at gdb/windows-nat.c:487
bminor#4 0x000000010099a7fb in windows_nat_target::wait_for_debug_event_main_thread
(event=<optimized out>, this=0xa001c6fe0)
at gdb/../gdbsupport/function-view.h:296
bminor#5 windows_nat_target::kill (this=0xa001c6fe0) at gdb/windows-nat.c:2917
bminor#6 0x00000001008f2f86 in target_kill () at gdb/target.c:901
bminor#7 0x000000010091fc46 in kill_or_detach (from_tty=0, inf=0xa000577d0)
at gdb/top.c:1658
bminor#8 quit_force (exit_arg=<optimized out>, from_tty=from_tty@entry=0)
at gdb/top.c:1759
bminor#9 0x00000001004f9ea8 in quit_command (args=args@entry=0x0,
from_tty=from_tty@entry=0) at gdb/cli/cli-cmds.c:483
bminor#10 0x000000010091c6d0 in quit_cover () at gdb/top.c:295
bminor#11 0x00000001005e3d8a in async_disconnect (arg=<optimized out>)
at gdb/event-top.c:1496
bminor#12 0x0000000100499c45 in invoke_async_signal_handlers ()
at gdb/async-event.c:233
bminor#13 0x0000000100eb23d6 in gdb_do_one_event (mstimeout=mstimeout@entry=-1)
at gdbsupport/event-loop.cc:198
bminor#14 0x00000001006df94a in interp::do_one_event (mstimeout=-1,
this=<optimized out>) at gdb/interps.h:87
bminor#15 start_event_loop () at gdb/main.c:402
bminor#16 captured_command_loop () at gdb/main.c:466
bminor#17 0x00000001006e2865 in captured_main (data=0x7ffffcba0) at gdb/main.c:1346
#18 gdb_main (args=args@entry=0x7ffffcc10) at gdb/main.c:1365
#19 0x0000000100f98c70 in main (argc=10, argv=0xa000129f0) at gdb/gdb.c:38
...
In the docs [3], I read that using an INFINITE argument to WaitForSingleObject
might cause a system deadlock.
This prompted me to try this simple change in wait_for_single:
...
while (true)
{
- DWORD r = WaitForSingleObject (handle, howlong);
+ DWORD r = WaitForSingleObject (handle,
+ howlong == INFINITE ? 100 : howlong);
+ if (howlong == INFINITE && r == WAIT_TIMEOUT)
+ continue;
...
with the timeout of 0.1 second estimated to be:
- small enough for gdb to feel reactive, and
- big enough not to consume too much cpu cycles with looping.
And indeed, the test-case, while still failing, now finishes in ~50 seconds.
While there may be an underlying bug that triggers this behaviour, the failure
mode is so severe that I consider it a bug in itself.
Fix this by avoiding calling WaitForSingleObject with INFINITE argument.
Tested on x86_64-cygwin, by running the testsuite past the test-case.
Approved-By: Pedro Alves <pedro@palves.net>
PR tdep/32894
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32894
[1] https://sourceware.org/gdb/wiki/BuildingOnWindows
[2] https://sourceware.org/pipermail/gdb-patches/2025-May/217949.html
[3] https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
This commit fixes a couple of issues relating to the pagination prompt and styling. The pagination prompt is this one: --Type <RET> for more, q to quit, c to continue without paging-- I did try to split this into multiple patches, based on the three issues I describe below, but in the end, the fixes were all too interconnected, so it ended up as one patch that makes two related, but slightly different changes: 1. Within the pager_file class, relying on the m_applied_style attribute of the wrapped m_stream, as is done when calling m_stream->emit_style_escape, is not correct, so stop doing that, and 2. Failing to update m_applied_style within the pager_file class can leave that attribute out of date, which can then lead to styling errors later on, so ensure m_applied_style is always updated. The problems I have seen are: 1. After quitting from a pagination prompt, the next command can incorrectly style its output. This was reported as bug PR gdb/31033, and is fixed by this commit. 2. The pagination prompt itself could be styled. The pagination prompt should always be shown in the default style. 3. After continuing the output at a pagination prompt, GDB can fail to restore the default style the next time the output (within the same command) switches back to the default style. There are tests for all these issues as part of this patch. The pager_file class is a sub-class of wrapped_file, this means that a pager_file is itself a ui_file, while it also manages a pointer to a ui_file object (called m_stream). An instance of pager_file can be installed as the gdb_stdout ui_file object. Output sent to a pager_file is stored within an internal buffer (called m_wrap_buffer) until we have a complete line, when the content is flushed to the wrapped m_stream. If sufficient lines have been written out then the pager_file will present the pagination prompt and allow the user to continue viewing output, or quit the current command. As a pager_file is a ui_file, it has an m_applied_style member variable. The managed stream (m_stream) is also a ui_file, and so also has an m_applied_style member variable. In some places within the pager_file class we attempt to change the current style of the m_stream using calls like this: m_stream->emit_style_escape (style); See pager_file::emit_style_escape, pager_file::prompt_for_continue, and pager_file::puts. These calls will end up in ui_file::emit_style_escape, which tries to skip emitting unnecessary style escapes by checking if the requested style matches the current m_applied_style value. The m_applied_style value is updated by calls to the emit_style_escape function. The problem here is that most of the time pager_file doesn't change the style of m_stream by calling m_stream->emit_style_escape. Most of the time, style changes are performed by pager_file writing the escape sequence into m_wrap_buffer, and then later flushing this buffer to m_stream by calling m_stream->puts. It has to be done this way. Calling m_stream->emit_style_escape would, if it actually changed the style, immediately change the style by emitting an escape sequence. But pager_file doesn't want that, it wants the style change to happen later, when m_wrap_buffer is flushed. To avoid excessive style escape sequences being written into m_wrap_buffer, the pager_file::m_applied_style performs a function similar to the m_applied_style within m_stream, it tracks the current style for the end of m_wrap_buffer, and only allows style escape sequences to be emitted if the style is actually changing. However, a consequence of this is the m_applied_style within m_stream, is not updated, which means it will be out of sync with the actual current style of m_stream. If we then try to make a call to m_stream->emit_style_escape, if the style we are changing too happens to match the out of date style in m_stream->m_applied_style, then the style change will be ignored. And this is indeed what we see in pager_file::prompt_for_continue with the call: m_stream->emit_style_escape (ui_file_style ()); As m_stream->m_applied_style is not being updated, it will always be the default style, however m_stream itself might not actually be in the default style. This call then will not emit an escape sequence as the desired style matches the out of date m_applied_style. The fix in this case is to call m_stream->puts directly, passing in the escape sequence for the desired style. This will result in an immediate change of style for m_stream, which fixes some of the problems described above. In fact, given that m_stream's m_applied_style is always going to be out of sync, I think we should change all of the m_stream->emit_style_escape calls to instead call m_stream->puts. However, just changing to use puts doesn't fix all the problems. I found that, if I run 'apropos time', then quit at the first pagination prompt. If for the next command I run 'maintenance time' I see the expected output: "maintenance time" takes a numeric argument. However, everything after the first double quote is given the command name style rather than only styling the text between the double quotes. Here is GDB's stack while printing the above output: guyush1#2 0x0000000001050d56 in ui_out::vmessage (this=0x7fff1238a150, in_style=..., format=0x1c05af0 "", args=0x7fff1238a288) at ../../src/gdb/ui-out.c:754 guyush1#3 0x000000000104db88 in ui_file::vprintf (this=0x3f9edb0, format=0x1c05ad0 "\"%ps\" takes a numeric argument.\n", args=0x7fff1238a288) at ../../src/gdb/ui-file.c:73 bminor#4 0x00000000010bc754 in gdb_vprintf (stream=0x3f9edb0, format=0x1c05ad0 "\"%ps\" takes a numeric argument.\n", args=0x7fff1238a288) at ../../src/gdb/utils.c:1905 bminor#5 0x00000000010bca20 in gdb_printf (format=0x1c05ad0 "\"%ps\" takes a numeric argument.\n") at ../../src/gdb/utils.c:1945 bminor#6 0x0000000000b6b29e in maintenance_time_display (args=0x0, from_tty=1) at ../../src/gdb/maint.c:128 The interesting frames here are guyush1#3, in here `this` is the pager_file for GDB's stdout, and this passes its m_applied_style to frame guyush1#2 as the `in_style` argument. If the m_applied_style is wrong, then frame guyush1#2 will believe that the wrong style is currently in use as the default style, and so, after printing 'maintenance time' GDB will switch back to the wrong style. So the question is, why is pager_file::m_applied_style wrong? In pager_file::prompt_for_continue, there is an attempt to switch back to the default style using: m_stream->emit_style_escape (ui_file_style ()); If this is changed to a puts call (see above) then this still leaves pager_file::m_applied_style out of date. The right fix in this case is, I think, to instead do this: this->emit_style_escape (ui_file_style ()); this will update pager_file::m_applied_style, and also send the default style to m_stream using a puts call. While writing the tests I noticed that I was getting unnecessary style reset sequences emitted. The problem is that, around pagination, we don't really know what style is currently applied to m_stream. The pager_file::m_applied_style tracks the style at the end of m_wrap_buffer, but this can run ahead of the current m_stream style. For example, if the screen is currently full, such that the next character of output will trigger the pagination prompt, if the next call is actually to pager_file::emit_style_escape, then pager_file::m_applied_style will be updated, but the style of m_stream will remain unchanged. When the next character is written to pager_file::puts then the pagination prompt will be presented, and GDB will try to switch m_stream back to the default style. Whether an escape is emitted or not will depend on the m_applied_style value, which we know is different than the actual style of m_stream. It is, after all, only when m_wrap_buffer is flushed to m_stream that the style of m_stream actually change. And so, this commit also adds pager_file::m_stream_style. This new variable tracks the current style of m_stream. This really is a replacement for m_stream's ui_file::m_applied_style, which is not accessible from pager_file. When content is flushed from m_wrap_buffer to m_stream then the current value of pager_file::m_applied_style becomes the current style of m_stream. But, when m_wrap_buffer is filling up, but before it is flushed, then pager_file::m_applied_style can change, but m_stream_style will remain unchanged. Now in pager_file::emit_style_escape we are able to skip some of the direct calls to m_stream->puts() used to emit style escapes. After all this there are still a few calls to m_stream->emit_style_escape(). These are all in the wrap_here support code. I think that these calls are technically broken, but don't actually cause any issues due to the way styling works in GDB. I certainly haven't been able to trigger any bugs from these calls yet. I plan to "fix" these in the next commit just for completeness. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31033 Approved-By: Tom Tromey <tom@tromey.com>
FanFansfan
pushed a commit
to FanFansfan/binutils-gdb
that referenced
this pull request
Jul 17, 2025
When running gdb.base/foll-fork-syscall.exp with a GDB built with UBSan,
I get:
/home/simark/src/binutils-gdb/gdb/linux-nat.c:1906:28: runtime error: load of value 3200171710, which is not a valid value for type 'target_waitkind'
ERROR: GDB process no longer exists
GDB process exited with wait status 3026417 exp9 0 1
UNRESOLVED: gdb.base/foll-fork-syscall.exp: follow-fork-mode=child: detach-on-fork=on: test_catch_syscall: continue to breakpoint after fork
The error happens here:
#0 __sanitizer::Die () at /usr/src/debug/gcc/gcc/libsanitizer/sanitizer_common/sanitizer_termination.cpp:50
guyush1#1 0x00007ffff600d8dd in __ubsan::__ubsan_handle_load_invalid_value_abort (Data=<optimized out>, Val=<optimized out>) at /usr/src/debug/gcc/gcc/libsanitizer/ubsan/ubsan_handlers.cpp:551
guyush1#2 0x00005555636d37b6 in linux_handle_syscall_trap (lp=0x7cdff1eb1b00, stopping=0) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:1906
guyush1#3 0x00005555636e0991 in linux_nat_filter_event (lwpid=3030627, status=1407) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:3044
bminor#4 0x00005555636e407f in linux_nat_wait_1 (ptid=..., ourstatus=0x7bfff0d6cf18, target_options=...) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:3381
bminor#5 0x00005555636e7795 in linux_nat_target::wait (this=0x5555704d35e0 <the_amd64_linux_nat_target>, ptid=..., ourstatus=0x7bfff0d6cf18, target_options=...) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:3607
bminor#6 0x000055556378fad2 in thread_db_target::wait (this=0x55556af42980 <the_thread_db_target>, ptid=..., ourstatus=0x7bfff0d6cf18, options=...) at /home/simark/src/binutils-gdb/gdb/linux-thread-db.c:1398
bminor#7 0x0000555564811327 in target_wait (ptid=..., status=0x7bfff0d6cf18, options=...) at /home/simark/src/binutils-gdb/gdb/target.c:2593
I believe the problem is that lwp_info::syscall_state is never
initialized. Fix that by initializing it with TARGET_WAITKIND_IGNORE.
This is the value we use elsewhere when resetting this field to mean
"not stopped at a syscall".
Change-Id: I5b76c63d1466d6e63448fced03305fd5ca8294eb
Approved-By: Tom Tromey <tom@tromey.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Python's freezing doesn't set up the file variable, so we cannot assume it exists and use it.