Skip to content

Conversation

@roddyrap
Copy link
Collaborator

@roddyrap roddyrap commented Jan 8, 2025

Python's freezing doesn't set up the file variable, so we cannot assume it exists and use it.

Python's freezing doesn't set up the __file__ variable, so we cannot
assume it exists and use it.
@roddyrap roddyrap force-pushed the python-gdb-file-patch branch from f3e181e to 34012c4 Compare January 9, 2025 19:16
@guyush1 guyush1 merged commit de520d5 into gdb-static Jan 10, 2025
guyush1 pushed a commit that referenced this pull request Feb 5, 2025
After the commit:

  commit b9de07a
  Date:   Thu Oct 10 11:37:34 2024 +0100

      gdb: fix handling of DW_AT_entry_pc of inlined subroutines

GDB's buildbot CI testing highlighted this assertion failure:

  (gdb) c
  Continuing.
  ../../binutils-gdb/gdb/block.h:203: internal-error: set_entry_pc: Assertion `start >= this->start () && start < this->end ()' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  ----- Backtrace -----
  FAIL: gdb.base/break-probes.exp: run til our library loads (GDB internal error)

This assertion was in the new function set_entry_pc and is asserting
that the default_entry_pc() value is within the blocks start/end
range.

The default_entry_pc() is the value GDB will use as the entry-pc if
the DWARF doesn't specifically override the entry-pc.  This value is
calculated as:

  1. The start address of the first sub-range within the block, if the
  block has more than 1 range, or

  2. The low address (from DW_AT_low_pc) for the block.

If the block only has a single range then this means the block was
defined with low/high pc attributes (case #2 above).  These low/high
pc values are what block::start() and block::end() return.  This means
that by definition, if the block is continuous, the above assert
cannot trigger as 'start', the default_entry_pc() would be equivalent
to block::start().

This means that, for the assert to trigger, the block must have
multiple ranges, and the first address of the first range is not
within the blocks low/high address range.  This seems wrong.

I inspected the state at the time the assert triggered and discovered
the block's start() address.  Then I removed the assert and restarted
GDB.  I was now able to inspect the blocks at the offending address:

  (gdb) maintenance info blocks 0x7ffff7dddaa4
  Blocks at 0x7ffff7dddaa4:
    from objfile: [(objfile *) 0x44a37f0] /lib64/ld-linux-x86-64.so.2
 [(block *) 0x46b30c0] 0x7ffff7ddd5a0..0x7ffff7dde8a6
    entry pc: 0x7ffff7ddd5a0
    is global block
    symbol count: 4
    is contiguous
  [(block *) 0x46b3020] 0x7ffff7ddd5a0..0x7ffff7dde8a6
    entry pc: 0x7ffff7ddd5a0
    is static block
    symbol count: 9
    is contiguous
  [(block *) 0x46b2f70] 0x7ffff7ddda00..0x7ffff7dddac3
    entry pc: 0x7ffff7ddda00
    function: __GI__dl_find_dso_for_object
    symbol count: 4
    is contiguous
  [(block *) 0x46b2e10] 0x7ffff7dddaa4..0x7ffff7dddac3
    entry pc: 0x7ffff7dddaa4
    inline function: __GI__dl_find_dso_for_object
    symbol count: 5
    is contiguous
  [(block *) 0x46b2a40] 0x7ffff7dddaa4..0x7ffff7dddac3
    entry pc: 0x7ffff7dddaa4
    symbol count: 1
    is contiguous
  [(block *) 0x46b2970] 0x7ffff7dddaa4..0x7ffff7dddac3
    entry pc: 0x7ffff7dddaa4
    symbol count: 2
    address ranges:
      0x7ffff7ddda0e..0x7ffff7ddda77
      0x7ffff7ddda90..0x7ffff7ddda96

I've left everything in for context, but the only really interesting
bit is the very last block, it's low/high range is:

  0x7ffff7dddaa4..0x7ffff7dddac3

but it has separate ranges:

  0x7ffff7ddda0e..0x7ffff7ddda77
  0x7ffff7ddda90..0x7ffff7ddda96

which are all outside the low/high range.  This is what triggers the
assert.  But why does that block exist at all?

What I believe is happening is that we're running into a bug in older
versions of GCC.  The buildbot failure was with an 8.5 gcc, and Tom de
Vries also reported seeing failures when using version 7 and 8 gcc,
but not with gcc 9 and onward.

Looking at the DWARF I can see that the problematic block is created
from this DIE:

  <4><15efb>: Abbrev Number: 83 (DW_TAG_lexical_block)
     <15efc>   DW_AT_abstract_origin: <0x15e9f>
     <15efe>   DW_AT_low_pc      : 0x7ffff7dddaa4
     <15f06>   DW_AT_high_pc     : 31

which links via DW_AT_abstract_origin to:

  <2><15e9f>: Abbrev Number: 80 (DW_TAG_lexical_block)
     <15ea0>   DW_AT_ranges      : 0x38e0
     <15ea4>   DW_AT_sibling     : <0x15eca>

And so we can see that <15efb> has got both low/high pc attributes and
a ranges attribute.

If I widen my checking to parents of DIE <15efb> then I see that they
also have DW_AT_abstract_origin, however, there is something
interesting going on, the parent DIEs are linking to a different DIE
tree than <15efb>.

What I believe is happening is this, we have an abstract instance
tree, this is rooted at a DW_AT_subprogram, and contains all the
blocks, variables, parameters, etc, that you would expect.  As this is
an abstract instance, then there are no low/high pc attributes, and no
ranges attributes in this tree.  This makes sense.

Now elsewhere we have a DW_TAG_subprogram (not
DW_TAG_inlined_subroutine) which links via
DW_AT_abstract_origin to the abstract DW_AT_subprogram.  This case is
documented in the DWARF 5 spec in section 3.3.8.3, and describes an
Out-of-Line Instance of an Inlined Subroutine.  Within this out of
line instance many of the DIE correctly link back, using
DW_AT_abstract_origin to the abstract instance tree.  This tree also
includes the DIE <15e9f>, which is where our problem DIE references.

Now, to really confuse things, within this out-of-line instance we
have a DW_TAG_inlined_subroutine, which is another instance of the
same abstract instance tree!  This would seem to indicate a recursive
call to the inline function, and the compiler, for some reason, needed
to instantiate an out of line instance of this function.

And it is within this nested, inlined subroutine, that the problem DIE
exists.  The problem DIE is referencing the corresponding DIE within
the out of line instance tree, but I am convinced this must be a (long
fixed) GCC bug, and that the problem DIE should be referencing the DIE
within the abstract instance tree.

I'm aware that the above is pretty confusing.  The actual DWARF would
be a around 200 lines long, so I'd like to avoid dumping it in here.
But here's my attempt at representing what's going on in a minimal
example.  The numbers down the side represent the section offset, not
the nesting level, and I've removed any attributes that are not
relevant:

  <1> DW_TAG_subprogram
  <2>   DW_TAG_lexical_block
  <3> DW_TAG_subprogram
        DW_AT_abstract_origin <1>
  <4>   DW_TAG_lexical_block
          DW_AT_ranges ...
  <5>   DW_TAG_inlined_subroutine
          DW_AT_abstract_origin <1>
  <6>     DW_TAG_lexical_block
            DW_AT_abstract_origin <4>
            DW_AT_low_pc ...
            DW_AT_high_pc ...

The lexical block at <6> is linking to <4> when it should be linking
to <2>.

There is one additional thing that we might wonder about, which is,
when calculating the low/high pc range for a block, why does GDB not
make use of the range information and expand the range beyond the
defined low/high values?

The answer to this is in dwarf_get_pc_bounds_ranges_or_highlow_pc in
dwarf/read.c.  This is where the low/high bounds are calculated.  What
we see is that GDB first checks for a low/high attribute pair, and if
that is present, this defines the address range for the block.  Only
if there is no DW_AT_low_pc do we check for the DW_AT_ranges, and use
that to define the extent of the block.  And this makes sense, section
3.5 of the DWARF-5 spec says:

  The lexical block entry may have either a DW_AT_low_pc and DW_AT_high_pc
  pair of attributes or a DW_AT_ranges attribute whose values encode the
  contiguous or non-contiguous address ranges, respectively, of the machine
  instructions generated for the lexical block...

Section 3.5 is specifically about lexical blocks, but the same
wording, about it being either low/high OR ranges is repeated for
other DW_TAG_ types.

So this explains why GDB doesn't use the ranges to expand the problem
blocks ranges; as the first DIE has low/high addresses, these are
used, and the ranges is not consulted.

It is only later in dwarf2_record_block_ranges that we create a range
based off the low/high pc, and then also process the ranges data, this
allows the problem block to exist with ranges that are outside the
low/high range.

To solve this I considered a number of options:

1. Prevent loading certain attributes from an abstract instance.

Section 3.3.8.1 of the DWARF-5 spec talks about which attributes are
appropriate to place in an abstract instance.  Any attribute that
might vary between instances should not appear in an abstract
instance.  DW_AT_ranges is included as an example in the
non-exhaustive list of attributes that should not appear in an
abstract instance.

Currently in dwarf2_attr (dwarf2/read.c), when we see a
DW_AT_abstract_origin attribute, we always follow this to try and find
the attribute we are looking for.  But we could change this function
so that we prevent this following for attributes that we know should
not be looked up in an abstract instance.  This would solve the
problem in this case by preventing us finding the DW_AT_ranges in the
incorrect abstract instance.

2. Filter the ranges.

Having established a blocks low/high address range in
dwarf_get_pc_bounds_ranges_or_highlow_pc, we could allow
dwarf2_record_block_ranges to parse the ranges, but we could reject
any range that extends outside the blocks defined start and end
addresses.

For well behaved DWARF where we have either low/high or ranges, then
the blocks start/end are defined from the range data, and so, by
definition, every range would be acceptable.

But in our problem case we would reject all of the invalid ranges.

This is my least favourite solution as it feels like rejecting the
ranges is tackling the problem too late on.

3. Don't try to parse ranges when we have low/high attributes.

This option involves updating dwarf2_record_block_ranges to match the
behaviour of dwarf_get_pc_bounds_ranges_or_highlow_pc, and, I believe,
to match the DWARF spec: don't try to read range data from
DW_AT_ranges if we have low/high pc attributes.

In our case this solves the issue because the problematic DIE has the
low/high attributes, and it then links to the wrong DIE which happens
to have DW_AT_ranges.  With this change in place we don't even look
for the DW_AT_ranges.

If the problem were reversed, and the initial DIE had DW_AT_ranges,
but the incorrectly referenced DIE had the low/high pc attributes,
we would pick up the wrong addresses, but this wouldn't trigger any
asserts.  The reason is that dwarf_get_pc_bounds_ranges_or_highlow_pc
would also find the low/high addresses from the incorrectly referenced
DIE, and so we would just end up with a block which had the wrong
address ranges, but the block would be self consistent, which is
different to the problem we hit here.

In the end, in this commit I went with solution #3, having
dwarf_get_pc_bounds_ranges_or_highlow_pc and
dwarf2_record_block_ranges be consistent seems sensible.  However, I
do wonder if in the future we might want to explore solution #1 as an
additional safety feature.

With this patch in place I'm able to run the gdb.base/break-probes.exp
without seeing the assert that CI testing highlighted.  I see no
regressions when testing on x86-64 GNU/Linux with gcc  9.3.1.

Note: the diff in this commit looks big, but it's really just me
indenting the code.

Approved-By: Tom Tromey <tom@tromey.com>
guyush1 pushed a commit that referenced this pull request Feb 5, 2025
When building gdb with -fsanitize=thread and running test-case
gdb.base/bg-exec-sigint-bp-cond.exp, I run into:
...
==================^M
WARNING: ThreadSanitizer: signal handler spoils errno (pid=25422)^M
    #0 handler_wrapper gdb/posix-hdep.c:66^M
    #1 decltype ({parm#2}({parm#3}...)) gdb::handle_eintr<>() \
         gdbsupport/eintr.h:67^M
    #2 gdb::waitpid(int, int*, int) gdbsupport/eintr.h:78^M
    #3 run_under_shell gdb/cli/cli-cmds.c:926^M
...

Likewise in:
- tui_sigwinch_handler with test-case gdb.python/tui-window.exp, and
- handle_sighup with test-case gdb.base/quit-live.exp.

Fix this by saving the original errno, and restoring it before returning [1].

Tested on x86_64-linux.

Approved-By: Tom Tromey <tom@tromey.com>

[1] https://www.gnu.org/software/libc/manual/html_node/POSIX-Safety-Concepts.html
guyush1 pushed a commit that referenced this pull request Feb 5, 2025
This commit adds support for a `gstack' command which Fedora has
been carrying for many years. gstack is a natural counterpart to
the gcore command. Whereas gcore dumps a core file, gstack prints
stack traces of a running process.

There are many improvements over Fedora's version of this script.
The dependency on procfs is gone; gstack will run anywhere gdb
runs. The only runtime dependencies are bash and awk.

The script includes suggestions from gdb/32325 to include
versioning and help. [If this approach to gdb/32325 is acceptable,
I could propagate the solution to gcore/gdb-add-index.]

I've rewritten the documentation, integrating it into the User Manual.
The manpage is now output using this one source.

Example run (on x86_64 Fedora 40)

$ gstack --help
Usage: gstack [-h|--help] [-v|--version] PID
Print a stack trace of a running program

  -h, --help         Print this message then exit.
  -v, --version      Print version information then exit.
$ gstack -v
GNU gstack (GDB) 16.0.50.20241119-git
$ gstack 12345678
Process 12345678 not found.
$ gstack $(pidof emacs)
Thread 6 (Thread 0x7fd5ec1c06c0 (LWP 2491423) "pool-spawner"):
#0  0x00007fd6015ca3dd in syscall () at /lib64/libc.so.6
#1  0x00007fd60b31eccd in g_cond_wait () at /lib64/libglib-2.0.so.0
#2  0x00007fd60b28a61b in g_async_queue_pop_intern_unlocked () at /lib64/libglib-2.0.so.0
#3  0x00007fd60b2f1a03 in g_thread_pool_spawn_thread () at /lib64/libglib-2.0.so.0
bminor#4  0x00007fd60b2f0813 in g_thread_proxy () at /lib64/libglib-2.0.so.0
bminor#5  0x00007fd6015486d7 in start_thread () at /lib64/libc.so.6
bminor#6  0x00007fd6015cc60c in clone3 () at /lib64/libc.so.6
bminor#7  0x0000000000000000 in ??? ()

Thread 5 (Thread 0x7fd5eb9bf6c0 (LWP 2491424) "gmain"):
#0  0x00007fd6015be87d in poll () at /lib64/libc.so.6
#1  0x0000000000000001 in ??? ()
#2  0xffffffff00000001 in ??? ()
#3  0x0000000000000001 in ??? ()
bminor#4  0x000000002104cfd0 in ??? ()
bminor#5  0x00007fd5eb9be320 in ??? ()
bminor#6  0x00007fd60b321c34 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0

Thread 4 (Thread 0x7fd5eb1be6c0 (LWP 2491425) "gdbus"):
#0  0x00007fd6015be87d in poll () at /lib64/libc.so.6
#1  0x0000000020f9b558 in ??? ()
#2  0xffffffff00000003 in ??? ()
#3  0x0000000000000003 in ??? ()
bminor#4  0x00007fd5d8000b90 in ??? ()
bminor#5  0x00007fd5eb1bd320 in ??? ()
bminor#6  0x00007fd60b321c34 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0

Thread 3 (Thread 0x7fd5ea9bd6c0 (LWP 2491426) "emacs"):
#0  0x00007fd6015ca3dd in syscall () at /lib64/libc.so.6
#1  0x00007fd60b31eccd in g_cond_wait () at /lib64/libglib-2.0.so.0
#2  0x00007fd60b28a61b in g_async_queue_pop_intern_unlocked () at /lib64/libglib-2.0.so.0
#3  0x00007fd60b28a67c in g_async_queue_pop () at /lib64/libglib-2.0.so.0
bminor#4  0x00007fd603f4d0d9 in fc_thread_func () at /lib64/libpangoft2-1.0.so.0
bminor#5  0x00007fd60b2f0813 in g_thread_proxy () at /lib64/libglib-2.0.so.0
bminor#6  0x00007fd6015486d7 in start_thread () at /lib64/libc.so.6
bminor#7  0x00007fd6015cc60c in clone3 () at /lib64/libc.so.6
bminor#8  0x0000000000000000 in ??? ()

Thread 2 (Thread 0x7fd5e9e6d6c0 (LWP 2491427) "dconf worker"):
#0  0x00007fd6015be87d in poll () at /lib64/libc.so.6
#1  0x0000000000000001 in ??? ()
#2  0xffffffff00000001 in ??? ()
#3  0x0000000000000001 in ??? ()
bminor#4  0x00007fd5cc000b90 in ??? ()
bminor#5  0x00007fd5e9e6c320 in ??? ()
bminor#6  0x00007fd60b321c34 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0

Thread 1 (Thread 0x7fd5fcc45280 (LWP 2491417) "emacs"):
#0  0x00007fd6015c9197 in pselect () at /lib64/libc.so.6
#1  0x0000000000000000 in ??? ()

Since this is essentially a complete rewrite of the original
script and documentation, I've chosen to only keep a 2024 copyright date.

Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Approved-By: Tom Tromey <tom@tromey.com>
guyush1 pushed a commit that referenced this pull request Feb 5, 2025
On s390x-linux, I run into:
...
 (gdb) backtrace
 #0  0x000000000100061a in fun_three ()
 #1  0x000000000100067a in fun_two ()
 #2  0x000003fffdfa9470 in ?? ()
 Backtrace stopped: frame did not save the PC
 (gdb) FAIL: gdb.base/readnever.exp: backtrace
...

This is really due to a problem handling the fun_three frame.  When generating
a backtrace from fun_two, everying looks ok:
...
 $ gdb -readnever -q -batch outputs/gdb.base/readnever/readnever \
     -ex "b fun_two" \
     -ex run \
     -ex bt
   ...
 #0  0x0000000001000650 in fun_two ()
 #1  0x00000000010006b6 in fun_one ()
 #2  0x00000000010006ee in main ()
...

For reference the frame info with debug info (without -readnever) looks like this:
...
$ gdb -q -batch outputs/gdb.base/readnever/readnever \
    -ex "b fun_three" \
    -ex run \
    -ex "info frame"
  ...
Stack level 0, frame at 0x3fffffff140:
 pc = 0x1000632 in fun_three (readnever.c:20); saved pc = 0x100067a
 called by frame at 0x3fffffff1f0
 source language c.
 Arglist at 0x3fffffff140, args: a=10, b=49 '1', c=0x3fffffff29c
 Locals at 0x3fffffff140, Previous frame's sp in v0
...

But with -readnever, like this instead:
...
Stack level 0, frame at 0x0:
 pc = 0x100061a in fun_three; saved pc = 0x100067a
 called by frame at 0x3fffffff140
 Arglist at 0xffffffffffffffff, args:
 Locals at 0xffffffffffffffff, Previous frame's sp in r15
...

An obvious difference is the "Previous frame's sp in" v0 vs. r15.

Looking at the code:
...
0000000001000608 <fun_three>:
 1000608:	b3 c1 00 2b       	ldgr	%f2,%r11
 100060c:	b3 c1 00 0f       	ldgr	%f0,%r15
 1000610:	e3 f0 ff 50 ff 71 	lay	%r15,-176(%r15)
 1000616:	b9 04 00 bf       	lgr	%r11,%r15
...
it becomes clear what is going on.  This is an unusual prologue.

Rather than saving r11 (frame pointer) and r15 (stack pointer) to stack,
instead they're saved into call-clobbered floating point registers.

[ For reference, this is the prologue of fun_two:
...
0000000001000640 <fun_two>:
 1000640:	eb bf f0 58 00 24 	stmg	%r11,%r15,88(%r15)
 1000646:	e3 f0 ff 50 ff 71 	lay	%r15,-176(%r15)
 100064c:	b9 04 00 bf       	lgr	%r11,%r15
...
where the first instruction stores registers r11 to r15 to stack. ]

Gdb fails to properly analyze the prologue, which causes the problems getting
the frame info.

Fix this by:
- adding handling of the ldgr insn [1] in s390_analyze_prologue, and
- recognizing the insn as saving a register in
  s390_prologue_frame_unwind_cache.

This gets us instead:
...
Stack level 0, frame at 0x0:
 pc = 0x100061a in fun_three; saved pc = 0x100067a
 called by frame at 0x3fffffff1f0
 Arglist at 0xffffffffffffffff, args:
 Locals at 0xffffffffffffffff, Previous frame's sp in f0
...
and:
...
 (gdb) backtrace^M
 #0  0x000000000100061a in fun_three ()^M
 #1  0x000000000100067a in fun_two ()^M
 #2  0x00000000010006b6 in fun_one ()^M
 #3  0x00000000010006ee in main ()^M
 (gdb) PASS: gdb.base/readnever.exp: backtrace
...

Tested on s390x-linux.

PR tdep/32417
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32417

Approved-By: Andreas Arnez <arnez@linux.ibm.com>

[1] https://www.ibm.com/support/pages/sites/default/files/2021-05/SA22-7871-10.pdf
guyush1 pushed a commit that referenced this pull request Feb 5, 2025
…read call

Commit 7fcdec0 ("GDB: Use gdb::array_view for buffers used in
register reading and unwinding") introduces a regression in
gdb.base/jit-reader.exp:

    $ ./gdb -q -nx --data-directory=data-directory testsuite/outputs/gdb.base/jit-reader/jit-reader -ex 'jit-reader-load /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/jit-reader/jit-reader.so' -ex r -batch

    This GDB supports auto-downloading debuginfo from the following URLs:
      <https://debuginfod.archlinux.org>
    Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
    Debuginfod has been disabled.
    To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1".

    Program received signal SIGTRAP, Trace/breakpoint trap.
    Recursive internal problem.

The "Recusive internal problem" part is not good, but it's not the point
of this patch.  It still means we hit an internal error.

The stack trace is:

    #0  internal_error_loc (file=0x55555ebefb20 "/home/simark/src/binutils-gdb/gdb/frame.c", line=1207, fmt=0x55555ebef500 "%s: Assertion `%s' failed.") at /home/simark/src/binutils-gdb/gdbsupport/errors.cc:53
    #1  0x0000555561604d83 in frame_register_unwind (next_frame=..., regnum=16, optimizedp=0x7ffff12e5a20, unavailablep=0x7ffff12e5a30, lvalp=0x7ffff12e5a40, addrp=0x7ffff12e5a60, realnump=0x7ffff12e5a50, buffer=...)
        at /home/simark/src/binutils-gdb/gdb/frame.c:1207
    #2  0x0000555561608334 in deprecated_frame_register_read (frame=..., regnum=16, myaddr=...) at /home/simark/src/binutils-gdb/gdb/frame.c:1496
    #3  0x0000555561a74259 in jit_unwind_reg_get_impl (cb=0x7ffff1049ca0, regnum=16) at /home/simark/src/binutils-gdb/gdb/jit.c:988
    bminor#4  0x00007fffd26e634e in read_register (callbacks=0x7ffff1049ca0, dw_reg=16, value=0x7fffffffb4c8) at /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/jit-reader.c:100
    bminor#5  0x00007fffd26e645f in unwind_frame (self=0x50400000ac10, cbs=0x7ffff1049ca0) at /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/jit-reader.c:143
    bminor#6  0x0000555561a74a12 in jit_frame_sniffer (self=0x55556374d040 <jit_frame_unwind>, this_frame=..., cache=0x5210002905f8) at /home/simark/src/binutils-gdb/gdb/jit.c:1042
    bminor#7  0x00005555615f499e in frame_unwind_try_unwinder (this_frame=..., this_cache=0x5210002905f8, unwinder=0x55556374d040 <jit_frame_unwind>) at /home/simark/src/binutils-gdb/gdb/frame-unwind.c:138
    bminor#8  0x00005555615f512c in frame_unwind_find_by_frame (this_frame=..., this_cache=0x5210002905f8) at /home/simark/src/binutils-gdb/gdb/frame-unwind.c:209
    bminor#9  0x00005555616178d0 in get_frame_type (frame=...) at /home/simark/src/binutils-gdb/gdb/frame.c:2996
    bminor#10 0x000055556282db03 in do_print_frame_info (uiout=0x511000027500, fp_opts=..., frame=..., print_level=0, print_what=SRC_AND_LOC, print_args=1, set_current_sal=1) at /home/simark/src/binutils-gdb/gdb/stack.c:1033

The problem is that function `jit_unwind_reg_get_impl` passes field
`gdb_reg_value::value`, a gdb_byte array of 1 element (used as a
flexible array member), as the array view parameter of
`deprecated_frame_register_read`.  This results in an array view of size
1.  The assertion in `frame_register_unwind` that verifies the passed in
buffer is larger enough to hold the unwound register value then fails.

Fix this by explicitly creating an array view of the right size.

Change-Id: Ie170da438ec9085863e7be8b455a067b531635dc
Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
guyush1 pushed a commit that referenced this pull request Feb 5, 2025
This resolves the following memory leak reported by ASAN:

Direct leak of 17 byte(s) in 1 object(s) allocated from:
    #0 0x3ffb32fbb1d in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x2aa149861cf in xmalloc ../../libiberty/xmalloc.c:149
    #2 0x2aa149868ff in xstrdup ../../libiberty/xstrdup.c:34
    #3 0x2aa1312391f in s390_machinemode ../../gas/config/tc-s390.c:2241
    bminor#4 0x2aa130ddc7b in read_a_source_file ../../gas/read.c:1293
    bminor#5 0x2aa1304f7bf in perform_an_assembly_pass ../../gas/as.c:1223
    bminor#6 0x2aa1304f7bf in main ../../gas/as.c:1436
    bminor#7 0x3ffb282be35 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    bminor#8 0x3ffb282bf33 in __libc_start_main_impl ../csu/libc-start.c:360
    bminor#9 0x2aa1305758f  (/home/jremus/git/binutils/build-asan/gas/as-new+0x2d5758f) (BuildId: ...)

gas/
	* config/tc-s390.c (s390_machinemode): Free mode_string before
	returning.

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
guyush1 pushed a commit that referenced this pull request Feb 5, 2025
Simplify the .machine directive parsing logic, so that cpu_string is
always xstrdup'd and can therefore always be xfree'd before returning
to the caller.

This resolves the following memory leak reported by ASAN:

Direct leak of 13 byte(s) in 3 object(s) allocated from:
    #0 0x3ff8aafbb1d in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x2aa338861cf in xmalloc ../../libiberty/xmalloc.c:149
    #2 0x2aa338868ff in xstrdup ../../libiberty/xstrdup.c:34
    #3 0x2aa320253cb in s390_machine ../../gas/config/tc-s390.c:2172
    bminor#4 0x2aa31fddc7b in read_a_source_file ../../gas/read.c:1293
    bminor#5 0x2aa31f4f7bf in perform_an_assembly_pass ../../gas/as.c:1223
    bminor#6 0x2aa31f4f7bf in main ../../gas/as.c:1436
    bminor#7 0x3ff8a02be35 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    bminor#8 0x3ff8a02bf33 in __libc_start_main_impl ../csu/libc-start.c:360
    bminor#9 0x2aa31f5758f  (/home/jremus/git/binutils/build-asan/gas/as-new+0x2d5758f) (BuildId: ...)

While at it add tests with double quoted .machine
"<cpu>[+<extension>...]" values.

gas/
	* config/tc-s390.c (s390_machine): Simplify parsing and free
	cpu_string before returning.

gas/testsuite/
	* gas/s390/machine-parsing-1.l: Add tests with double quoted
	values.
	* gas/s390/machine-parsing-1.s: Likewise.

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
guyush1 pushed a commit that referenced this pull request Apr 25, 2025
Recent work in the TUI has improved GDB's use of the curses
wnoutrefresh and doupdate mechanism, which improves performance by
batching together updates and then doing a single set of writes to the
screen when doupdate is finally called.

The tui_batch_rendering type is a RAII class which, in its destructor,
calls doupdate to send the batched updates to the screen.

However, if there is no tui_batch_rendering active on the call stack
then any wnoutrefresh calls will remain batched but undisplayed until
the next time doupdate happens to be called.

This problem can be seen in PR gdb/32623.  When an inferior is started
the 'Starting program' message is not immediately displayed to the
user.

The 'Starting program' message originates from run_command_1 in
infcmd.c, the message is sent to the current_uiout, which will be the
TUI ui_out.  After the message is sent, ui_out::flush() is called,
here's the backtrace when that happens:

  #0  tui_file::flush (this=0x36e4ab0) at ../../src/gdb/tui/tui-file.c:42
  #1  0x0000000001004f4b in pager_file::flush (this=0x36d35f0) at ../../src/gdb/utils.c:1531
  #2  0x0000000001004f71 in gdb_flush (stream=0x36d35f0) at ../../src/gdb/utils.c:1539
  #3  0x00000000006975ab in cli_ui_out::do_flush (this=0x35a50b0) at ../../src/gdb/cli-out.c:250
  bminor#4  0x00000000009fd1f9 in ui_out::flush (this=0x35a50b0) at ../../src/gdb/ui-out.h:263
  bminor#5  0x00000000009f56ad in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at ../../src/gdb/infcmd.c:449
  bminor#6  0x00000000009f599a in run_command (args=0x0, from_tty=1) at ../../src/gdb/infcmd.c:511

And if we check out tui_file::flush (tui-file.c) we can see that this
just calls tui_win_info::refresh_window(), which in turn, just uses
wnoutrefresh to batch any pending output.

The problem is that, in the above backtrace, there is no
tui_batch_rendering active, and so there will be no doupdate call to
flush the output to the screen.

We could add a tui_batch_rendering into tui_file::flush.  And
tui_file::write.  And tui_file::puts .....

... but that all seems a bit unnecessary.  Instead, I propose that
tui_win_info::refresh_window() should be changed.  If suppress_output
is true (i.e. a tui_batch_rendering is active) then we should continue
to call wnoutrefresh().  But if suppress_output is false, meaning that
no tui_batch_rendering is in place, then we should call wrefresh(),
which immediately writes the output to the screen.

Testing but PR gdb/32623 was a little involved.  We need to 'run' the
inferior and check for the 'Starting program' message.  But DejaGNUU
can only check for the message once it knows the message should have
appeared.  But, as the bug is that output is not displayed, we don't
have any output hints that the inferior is started yet...

In the end, I have the inferior create a file in the test's output
directory.  Now DejaGNU can send the 'run' command, and wait for the
file to appear.  Once that happens, we know that the 'Starting
program' message should have appeared.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32623

Approved-By: Tom Tromey <tom@tromey.com>
guyush1 pushed a commit that referenced this pull request Apr 25, 2025
Modifying inline-frame-cycle-unwind.exp to use `bt -no-filters` produces
the following incorrect backtrace:

  #0  inline_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:49
  #1  normal_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:32
  #2  0x000055555555517f in inline_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:50
  #3  normal_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:32
  Backtrace stopped: previous frame identical to this frame (corrupt stack?)
  (gdb) FAIL: gdb.base/inline-frame-cycle-unwind.exp: cycle at level 1: backtrace when the unwind is broken at frame 1

The expected output, which we get with `bt`, is:

  #0  inline_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:49
  #1  normal_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:32
  Backtrace stopped: previous frame identical to this frame (corrupt stack?)
  (gdb) PASS: gdb.base/inline-frame-cycle-unwind.exp: cycle at level 1: backtrace when the unwind is broken at frame 1

The cycle checking in `get_prev_frame_maybe_check_cycle` relies on newer
frame ids having already been computed and stashed.  Unlike other
frames, frame #0's id does not get computed immediately.

The test passes with `bt` because when applying python frame filters,
the call to `bootstrap_python_frame_filters` happens to compute the id
of frame #0.  When `get_prev_frame_maybe_check_cycle` later tries to
stash frame #2's id, the cycle is detected.

The test fails with `bt -no-filters` because frame #0's id has not been
stashed by the time `get_prev_frame_maybe_check_cycle` tries to stash
frame #2's id which succeeds and the cycle is only detected later when
trying to stash frame bminor#4's id.  Doing `stepi` after the incorrect
backtrace would then trigger an assertion failure when trying to stash
frame #0's id because it is a duplicate of #2's already stashed id.

In `get_prev_frame_always_1`, if this_frame is inline frame 0, then
compute and stash its frame id before returning the previous frame.
This ensures that the id of inline frame 0 has been stashed before
`get_prev_frame_maybe_check_cycle` is called on older frames.

The test case has been updated to run both `bt` and `bt -no-filters`.

Co-authored-by: Andrew Burgess <aburgess@redhat.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32757
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
Recent work in the TUI has improved GDB's use of the curses
wnoutrefresh and doupdate mechanism, which improves performance by
batching together updates and then doing a single set of writes to the
screen when doupdate is finally called.

The tui_batch_rendering type is a RAII class which, in its destructor,
calls doupdate to send the batched updates to the screen.

However, if there is no tui_batch_rendering active on the call stack
then any wnoutrefresh calls will remain batched but undisplayed until
the next time doupdate happens to be called.

This problem can be seen in PR gdb/32623.  When an inferior is started
the 'Starting program' message is not immediately displayed to the
user.

The 'Starting program' message originates from run_command_1 in
infcmd.c, the message is sent to the current_uiout, which will be the
TUI ui_out.  After the message is sent, ui_out::flush() is called,
here's the backtrace when that happens:

  #0  tui_file::flush (this=0x36e4ab0) at ../../src/gdb/tui/tui-file.c:42
  guyush1#1  0x0000000001004f4b in pager_file::flush (this=0x36d35f0) at ../../src/gdb/utils.c:1531
  guyush1#2  0x0000000001004f71 in gdb_flush (stream=0x36d35f0) at ../../src/gdb/utils.c:1539
  guyush1#3  0x00000000006975ab in cli_ui_out::do_flush (this=0x35a50b0) at ../../src/gdb/cli-out.c:250
  bminor#4  0x00000000009fd1f9 in ui_out::flush (this=0x35a50b0) at ../../src/gdb/ui-out.h:263
  bminor#5  0x00000000009f56ad in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at ../../src/gdb/infcmd.c:449
  bminor#6  0x00000000009f599a in run_command (args=0x0, from_tty=1) at ../../src/gdb/infcmd.c:511

And if we check out tui_file::flush (tui-file.c) we can see that this
just calls tui_win_info::refresh_window(), which in turn, just uses
wnoutrefresh to batch any pending output.

The problem is that, in the above backtrace, there is no
tui_batch_rendering active, and so there will be no doupdate call to
flush the output to the screen.

We could add a tui_batch_rendering into tui_file::flush.  And
tui_file::write.  And tui_file::puts .....

... but that all seems a bit unnecessary.  Instead, I propose that
tui_win_info::refresh_window() should be changed.  If suppress_output
is true (i.e. a tui_batch_rendering is active) then we should continue
to call wnoutrefresh().  But if suppress_output is false, meaning that
no tui_batch_rendering is in place, then we should call wrefresh(),
which immediately writes the output to the screen.

Testing but PR gdb/32623 was a little involved.  We need to 'run' the
inferior and check for the 'Starting program' message.  But DejaGNUU
can only check for the message once it knows the message should have
appeared.  But, as the bug is that output is not displayed, we don't
have any output hints that the inferior is started yet...

In the end, I have the inferior create a file in the test's output
directory.  Now DejaGNU can send the 'run' command, and wait for the
file to appear.  Once that happens, we know that the 'Starting
program' message should have appeared.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32623

Approved-By: Tom Tromey <tom@tromey.com>
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
I noticed that attempting to select a tail-call frame using 'frame
function NAME' wouldn't work:

  (gdb) bt
  #0  func_that_never_returns () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.base/frame-selection.c:49
  guyush1#1  0x0000000000401183 in func_that_tail_calls () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.base/frame-selection.c:59
  guyush1#2  0x00000000004011a5 in main () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.base/frame-selection.c:70
  (gdb) frame function func_that_tail_calls
  No frame for function "func_that_tail_calls".
  (gdb) up
  guyush1#1  0x0000000000401183 in func_that_tail_calls () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.base/frame-selection.c:59
  59	  func_that_never_returns ();
  (gdb) disassemble
  Dump of assembler code for function func_that_tail_calls:
     0x000000000040117a <+0>:	push   %rbp
     0x000000000040117b <+1>:	mov    %rsp,%rbp
     0x000000000040117e <+4>:	call   0x40116c <func_that_never_returns>
  End of assembler dump.
  (gdb)

The problem is that the 'function' mechanism uses get_frame_pc() and
then compares the address returned with the bounds of the function
we're looking for.

So in this case, the bounds of func_that_tail_calls are 0x40117a to
0x401183, with 0x401183 being the first address _after_ the function.

However, because func_that_tail_calls ends in a tail call, then the
get_frame_pc() is 0x401183, the first address after the function.  As
a result, GDB fails to realise that frame guyush1#1 is inside the function
we're looking for, and the lookup fails.

The fix is to use get_frame_address_in_block, which will return an
adjusted address, in this case, 0x401182, which is within the function
bounds.  Now the lookup works:

  (gdb) frame function func_that_tail_calls
  guyush1#1  0x0000000000401183 in func_that_tail_calls () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.base/frame-selection.c:59
  59	  func_that_never_returns ();
  (gdb)

I've extended the gdb.base/frame-selection.exp test to cover this
case.
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
Consider the test-case sources main.c and foo.c:

  $ cat main.c
  extern int foo (void);

  int
  main (void)
  {
    return foo ();
  }
  $ cat foo.c
  extern int foo (void);

  int
  foo (void)
  {
    return 0;
  }

and main.c compiled with debug info, and foo.c without:

  $ gcc -g main.c -c
  $ gcc foo.c -c
  $ gcc -g main.o foo.o

In TUI mode, if we run to foo:

  $ gdb -q a.out -tui -ex "b foo" -ex run

it gets us "[ No Source Available ]":

  ┌─main.c─────────────────────────────────────────┐
  │                                                │
  │                                                │
  │                                                │
  │            [ No Source Available ]             │
  │                                                │
  │                                                │
  └────────────────────────────────────────────────┘
  (src) In: foo                  L??   PC: 0x400566
  ...
  Breakpoint 1, 0x0000000000400566 in foo ()
  (gdb)

But after resizing (pressing ctrl-<minus> in the gnome-terminal), we
get instead the source for main.c:

  ┌─main.c─────────────────────────────────────────┐
  │        3 int                                   │
  │        4 main (void)                           │
  │        5 {                                     │
  │        6   return foo ();                      │
  │        7 }                                     │
  │                                                │
  │                                                │
  └────────────────────────────────────────────────┘
  (src) In: foo                   L??   PC: 0x400566
  ...
  Breakpoint 1, 0x0000000000400566 in foo ()
  (gdb)

which is inappropriate because we're stopped in function foo, which is
not in main.c.

The problem is that, when the window is resized, GDB ends up calling
tui_source_window_base::rerender.  The rerender function has three
cases, one for when the window already has some source code
content (which is not the case here), a case for when the inferior is
active, and we have a selected frame (which is the case that applies
here), and a final case for when the inferior is not running.

For the case which we end up in, the source code window has no
content, but the inferior is running, so we have a selected frame, GDB
calls the get_current_source_symtab_and_line() function to get the
symtab_and_line for the current location.

The get_current_source_symtab_and_line() will actually return the last
recorded symtab and line location, not the current symtab and line
location.

What this means, is that, if the current location has no debug
information, get_current_source_symtab_and_line() will return any
previously recorded location, or failing that, the default (main)
location.

This behaviour of get_current_source_symtab_and_line() also causes
problems for the 'list' command.  Consider this pure CLI session:

  (gdb) break foo
  Breakpoint 1 at 0x40110a
  (gdb) run
  Starting program: /tmp/a.out

  Breakpoint 1, 0x000000000040110a in foo ()
  (gdb) list
  1	extern int foo (void);
  2
  3	int
  4	main (void)
  5	{
  6	  return foo ();
  7	}
  (gdb) list .
  Insufficient debug info for showing source lines at current PC (0x40110a).
  (gdb)

However, if we look at how GDB's TUI updates the source window during
a normal stop, we see that GDB does a better job of displaying the
expected contents.  Going back to our original example, when we start
GDB with:

  $ gdb -q a.out -tui -ex "b foo" -ex run

we do get the "[ No Source Available ]" message as expected.  Why is
that?

The answer is that, in this case GDB uses tui_show_frame_info to
update the source window, tui_show_frame_info is called each time a
prompt is displayed, like this:

  #0  tui_show_frame_info (fi=...) at ../../src/gdb/tui/tui-status.c:269
  guyush1#1  0x0000000000f55975 in tui_refresh_frame_and_register_information () at ../../src/gdb/tui/tui-hooks.c:118
  guyush1#2  0x0000000000f55ae8 in tui_before_prompt (current_gdb_prompt=0x31ef930 <top_prompt+16> "(gdb) ") at ../../src/gdb/tui/tui-hooks.c:165
  guyush1#3  0x000000000090ea45 in std::_Function_handler<void(char const*), void (*)(char const*)>::_M_invoke (__functor=..., __args#0=@0x7ffc955106b0: 0x31ef930 <top_prompt+16> "(gdb) ") at /usr/include/c++/9/bits/std_function.h:300
  bminor#4  0x00000000009020df in std::function<void(char const*)>::operator() (this=0x5281260, __args#0=0x31ef930 <top_prompt+16> "(gdb) ") at /usr/include/c++/9/bits/std_function.h:688
  bminor#5  0x0000000000901c35 in gdb::observers::observable<char const*>::notify (this=0x31dda00 <gdb::observers::before_prompt>, args#0=0x31ef930 <top_prompt+16> "(gdb) ") at ../../src/gdb/../gdbsupport/observable.h:166
  bminor#6  0x00000000008ffed8 in notify_before_prompt (prompt=0x31ef930 <top_prompt+16> "(gdb) ") at ../../src/gdb/event-top.c:518
  bminor#7  0x00000000008fff08 in top_level_prompt () at ../../src/gdb/event-top.c:534
  bminor#8  0x00000000008ffdeb in display_gdb_prompt (new_prompt=0x0) at ../../src/gdb/event-top.c:487

If we look at how tui_show_frame_info figures out what source to
display, it doesn't use get_current_source_symtab_and_line(), instead,
it finds a symtab_and_line directly from a frame_info_pt.  This means
we are not dependent on get_current_source_symtab_and_line() returning
the current location (which it does not).

I propose that we change tui_source_window_base::rerender() so that,
for the case we are discussing here (the inferior has a selected
frame, but the source window has no contents), we move away from using
get_current_source_symtab_and_line(), and instead use find_frame_sal
instead, like tui_show_frame_info does.

This means that we will always use the inferior's current location.

Tested on x86_64-linux.

Reviewed-By: Tom de Vries <tdevries@suse.de>
Reported-By: Andrew Burgess <aburgess@redhat.com>
Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32614
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
…get_file_names

PR 32742 shows this failing:

    $ make check TESTS="gdb.ada/access_to_unbounded_array.exp" RUNTESTFLAGS="--target_board=fission"
    Running /home/simark/src/binutils-gdb/gdb/testsuite/gdb.ada/access_to_unbounded_array.exp ...
    FAIL: gdb.ada/access_to_unbounded_array.exp: scenario=all: gdb_breakpoint: set breakpoint at foo.adb:23 (GDB internal error)

Or, interactively:

    $ ./gdb -q -nx --data-directory=data-directory testsuite/outputs/gdb.ada/access_to_unbounded_array/foo-all -ex 'b foo.adb:23' -batch
    /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:19567: internal-error: set_lang: Assertion `old_value == language_unknown || old_value == language_minimal || old_value == lang' failed.

The symptom is that for a given dwarf2_per_cu, the language gets set
twice.  First, set to `language_ada`, and then, to `language_minimal`.
It's unexpected for the language of a CU to get changed like this.

The CU at offset 0x0 in the main file looks like:

    0x00000000: Compile Unit: length = 0x00000030, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000034)

    0x0000000b: DW_TAG_compile_unit
                  DW_AT_low_pc [DW_FORM_addr]       (0x000000000000339a)
                  DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000432)
                  DW_AT_stmt_list [DW_FORM_sec_offset]      (0x00000000)
                  DW_AT_GNU_dwo_name [DW_FORM_strp] ("b~foo.dwo")
                  DW_AT_comp_dir [DW_FORM_strp]     ("/home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.ada/access_to_unbounded_array")
                  DW_AT_GNU_pubnames [DW_FORM_flag_present] (true)
                  DW_AT_GNU_addr_base [DW_FORM_sec_offset]  (0x00000000)
                  DW_AT_GNU_dwo_id [DW_FORM_data8]  (0x277aee54e7bd47f7)

This refers to the DWO file b~foo.dwo, whose top-level DIE is:

    .debug_info.dwo contents:
    0x00000000: Compile Unit: length = 0x00000b63, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000b67)

    0x0000000b: DW_TAG_compile_unit
                  DW_AT_producer [DW_FORM_GNU_str_index]    ("GNU Ada 14.2.1 20250207 -fgnat-encodings=minimal -gdwarf-4 -fdebug-types-section -fuse-ld=gold -gnatA -gnatWb -gnatiw -gdwarf-4 -gsplit-dwarf -ggnu-pubnames -gnatws -mtune=generic -march=x86-64")
                  DW_AT_language [DW_FORM_data1]    (DW_LANG_Ada95)
                  DW_AT_name [DW_FORM_GNU_str_index]        ("/home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.ada/access_to_unbounded_array/b~foo.adb")
                  DW_AT_comp_dir [DW_FORM_GNU_str_index]    ("/home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.ada/access_to_unbounded_array")
                  DW_AT_GNU_dwo_id [DW_FORM_data8]  (0xdbeffefab180a2cb)

The thing to note is that the language attribute is only present in the
DIE in the DWO file, not on the DIE in the main file.

The first time the language gets set is here:

    #0  dwarf2_per_cu::set_lang (this=0x50f0000044b0, lang=language_ada, dw_lang=DW_LANG_Ada95) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:20788
    guyush1#1  0x0000555561666af6 in cutu_reader::prepare_one_comp_unit (this=0x7ffff10bf2b0, cu=0x51700008e000, pretend_language=language_minimal) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:21029
    guyush1#2  0x000055556159f740 in cutu_reader::cutu_reader (this=0x7ffff10bf2b0, this_cu=0x50f0000044b0, per_objfile=0x516000066080, abbrev_table=0x510000004640, existing_cu=0x0, skip_partial=false, pretend_language=language_minimal, cache=0x7ffff11b95e0) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:3371
    guyush1#3  0x00005555615a547a in process_psymtab_comp_unit (this_cu=0x50f0000044b0, per_objfile=0x516000066080, storage=0x7ffff11b95e0) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:3799
    bminor#4  0x00005555615a9292 in cooked_index_worker_debug_info::process_cus (this=0x51700008dc80, task_number=0, first=std::unique_ptr<dwarf2_per_cu> = {...}, end=std::unique_ptr<dwarf2_per_cu> = {...}) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:4122

In this code path (particularly this specific cutu_reader constructir),
the work is done to find and read the DWO file.  So the language is
properly identifier as language_ada, all good so far.

The second time the language gets set is:

    #0  dwarf2_per_cu::set_lang (this=0x50f0000044b0, lang=language_minimal, dw_lang=0) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:20788
    guyush1#1  0x0000555561666af6 in cutu_reader::prepare_one_comp_unit (this=0x7ffff0f42730, cu=0x517000091b80, pretend_language=language_minimal) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:21029
    guyush1#2  0x00005555615a1822 in cutu_reader::cutu_reader (this=0x7ffff0f42730, this_cu=0x50f0000044b0, per_objfile=0x516000066080, pretend_language=language_minimal, parent_cu=0x0, dwo_file=0x0) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:3464
    guyush1#3  0x000055556158c850 in dw2_get_file_names (this_cu=0x50f0000044b0, per_objfile=0x516000066080) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:1956
    bminor#4  0x000055556158f4f5 in dw_expand_symtabs_matching_file_matcher (per_objfile=0x516000066080, file_matcher=...) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:2157
    bminor#5  0x00005555616329e2 in cooked_index_functions::expand_symtabs_matching (this=0x50200002ab50, objfile=0x516000065780, file_matcher=..., lookup_name=0x0, symbol_matcher=..., expansion_notify=..., search_flags=..., domain=..., lang_matcher=...) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:15912
    bminor#6  0x0000555562ca8a14 in objfile::map_symtabs_matching_filename (this=0x516000065780, name=0x50200002ad90 "break pck.adb", real_path=0x0, callback=...) at /home/smarchi/src/binutils-gdb/gdb/symfile-debug.c:207
    bminor#7  0x0000555562d68775 in iterate_over_symtabs (pspace=0x513000005600, name=0x50200002ad90 "break pck.adb", callback=...) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:727

Here, we use the other cutu_reader constructor, the one that does not
look up the DWO file for the passed CU.  If a DWO file exists for this
CU, the caller is expected to pass it as a parameter.  That cutu_reader
constructor also ends up setting the language of the CU.  But because it
didn't read the DWO file, it didn't figure out the language is
language_ada, so it tries to set the language to the default,
language_minimal.

A question is: why do we end up trying to set the CU's language is this
context.  This is completely unrelated to what we're trying to do, that
is get the file names from the line table.  Setting the language is a
side-effect of just constructing a cutu_reader, which we need to look up
attributes in dw2_get_file_names_reader.  There are probably some
cleanups to be done here, to avoid doing useless work like looking up
and setting the CU's language when all we need is an object to help
reading the DIEs and attributes.  But that is future work.

The same cutu_reader constructor is used in
`dwarf2_per_cu::ensure_lang`.  Since this is the version of cutu_reader
that does not look up the DWO file, it will conclude that the language
is language_minimal and set that as the CU's language.  In other words,
`dwarf2_per_cu::ensure_lang` will get the language wrong, pretty ironic.

Fix this by using the other cutu_reader constructor in those two spots.
Pass `per_objfile->get_cu (this_cu)`, as the `existing_cu` parameter.  I
think this is necessary, because that constructor has an assert to check
that if `existing_cu` is nullptr, then there must not be an existing
`dwarf2_cu` in the per_objfile.

To avoid getting things wrong like this, I think that the second
cutu_reader constructor should be reserved for the spots that do pass a
non-nullptr dwo_file.  The only spot at the moment in
create_cus_hash_table, where we read multiple units from the same DWO
file.  In this context, I guess it makes sense for efficiency to get the
dwo_file once and pass it down to cutu_reader.  For that constructor,
make the parameters non-optional, add "non-nullptr" asserts, and update
the code to assume the passed values are not nullptr.

What I don't know is if this change is problematic thread-wise, if the
functions I have modified to use the other cutu_reader constructor can
be called concurrently in worker threads.  If so, I think it would be
problematic.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32742
Change-Id: I980d16875b9a43ab90e251504714d0d41165c7c8
Approved-By: Tom Tromey <tom@tromey.com>
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
…_all_symtabs

Commit 2920415 ("gdb/dwarf: use ranged for loop in some spots")
broke some tests notably gdb.base/maint.exp with the fission board.

    $ ./gdb -nx -q --data-directory=data-directory testsuite/outputs/gdb.base/maint/maint -ex start -ex "maint expand-sym" -batch
    ...
    Temporary breakpoint 1, main (argc=1, argv=0x7fffffffdc48, envp=0x7fffffffdc58) at /home/smarchi/src/binutils-gdb/gdb/testsuite/gdb.base/break.c:43
    43          if (argc == 12345) {  /* an unlikely value < 2^16, in case uninited */ /* set breakpoint 6 here */
    /usr/include/c++/14.2.1/debug/safe_iterator.h:392:
    In function:
        gnu_debug::_Safe_iterator<_Iterator, _Sequence, _Category>&
        gnu_debug::_Safe_iterator<_Iterator, _Sequence, _Category>::operator++()
        [with _Iterator = gnu_cxx::
        normal_iterator<std::unique_ptr<dwarf2_per_cu, dwarf2_per_cu_deleter>*,
        std::vector<std::unique_ptr<dwarf2_per_cu, dwarf2_per_cu_deleter>,
        std::allocator<std::unique_ptr<dwarf2_per_cu, dwarf2_per_cu_deleter> > >
        >; _Sequence = std::debug::vector<std::unique_ptr<dwarf2_per_cu,
        dwarf2_per_cu_deleter> >; _Category = std::forward_iterator_tag]

    Error: attempt to increment a singular iterator.

Note that this is caught because I build with -D_GLIBCXX_DEBUG=1.
Otherwise, it might crash more randomly, or just not crash at all (but
still be buggy).

While iterating on the all_units vector, some type units get added
there:

    #0  add_type_unit (per_bfd=0x51b000044b80, section=0x50e0000c2280, sect_off=0, length=74, sig=4367013491293299229) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:2576
    guyush1#1  0x00005555618a3a40 in lookup_dwo_signatured_type (cu=0x51700009b580, sig=4367013491293299229) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:2664
    guyush1#2  0x00005555618ee176 in queue_and_load_dwo_tu (dwo_unit=0x521000120e00, cu=0x51700009b580) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:8329
    guyush1#3  0x00005555618eeafe in queue_and_load_all_dwo_tus (cu=0x51700009b580) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:8366
    bminor#4  0x00005555618966a6 in dw2_do_instantiate_symtab (per_cu=0x50f0000043c0, per_objfile=0x516000065a80, skip_partial=true) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:1695
    bminor#5  0x00005555618968d4 in dw2_instantiate_symtab (per_cu=0x50f0000043c0, per_objfile=0x516000065a80, skip_partial=true) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:1719
    bminor#6  0x000055556189ac3f in dwarf2_base_index_functions::expand_all_symtabs (this=0x502000024390, objfile=0x516000065780) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:1977

This invalidates the iterator in
dwarf2_base_index_functions::expand_all_symtabs, which is caught by the
libstdc++ debug mode.

I'm not entirely sure that it is correct to append type units from dwo
files to the all_units vector like this.  The
dwarf2_find_containing_comp_unit function expects a precise ordering of
the elements of the all_units vector, to be able to do a binary search.
Appending a type unit at the end at this point certainly doesn't respect
that ordering.

For now I'd just like to undo the regression.  Do that by using
all_units_range in the ranged for loop.  I will keep in mind to
investigate whether this insertion of type units in all_units after the
fact really makes sense or not.

Change-Id: Iec131e59281cf2dbd12d3f3d163b59018fdc54da
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
On Debian 12, with gcc 12 and ld 2.40, I get some failures when running:

    $ make check TESTS="gdb.base/style.exp" RUNTESTFLAGS="--target_board=fission"

I think I stumble on this bug [1], preventing the test from doing
anything that requires expanding the compilation unit:

    $ ./gdb -nx -q --data-directory=data-directory testsuite/outputs/gdb.base/style/style
    Reading symbols from testsuite/outputs/gdb.base/style/style...
    (gdb) p main
    DW_FORM_strp pointing outside of .debug_str section [in module /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/style/style]
    (gdb)

The error is thrown here:

    #0  0x00007ffff693f0a1 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
    guyush1#1  0x0000555569ce6852 in throw_it(return_reason, errors, const char *, typedef __va_list_tag __va_list_tag *) (reason=RETURN_ERROR, error=GENERIC_ERROR, fmt=0x555562a9fc40 "%s pointing outside of %s section [in module %s]", ap=0x7fffffff8df0) at /home/smarchi/src/binutils-gdb/gdbsupport/common-exceptions.cc:203
    guyush1#2  0x0000555569ce690f in throw_verror (error=GENERIC_ERROR, fmt=0x555562a9fc40 "%s pointing outside of %s section [in module %s]", ap=0x7fffffff8df0) at /home/smarchi/src/binutils-gdb/gdbsupport/common-exceptions.cc:211
    guyush1#3  0x000055556879c0cb in verror (string=0x555562a9fc40 "%s pointing outside of %s section [in module %s]", args=0x7fffffff8df0) at /home/smarchi/src/binutils-gdb/gdb/utils.c:193
    bminor#4  0x0000555569cfa88d in error (fmt=0x555562a9fc40 "%s pointing outside of %s section [in module %s]") at /home/smarchi/src/binutils-gdb/gdbsupport/errors.cc:45
    bminor#5  0x000055556667dbff in dwarf2_section_info::read_string (this=0x61b000042a08, objfile=0x616000055e80, str_offset=262811, form_name=0x555562886b40 "DW_FORM_strp") at /home/smarchi/src/binutils-gdb/gdb/dwarf2/section.c:211
    bminor#6  0x00005555662486b7 in dwarf_decode_macro_bytes (per_objfile=0x616000056180, builder=0x614000006040, abfd=0x6120000f4b40, mac_ptr=0x60300004f5be "", mac_end=0x60300004f5bb "\002\004", current_file=0x62100007ad70, lh=0x60f000028bd0, section=0x61700008ba78, section_is_gnu=1, section_is_dwz=0, offset_size=4, str_section=0x61700008bac8, str_offsets_section=0x61700008baf0, str_offsets_base=std::optional<unsigned long> = {...}, include_hash=..., cu=0x61700008b600) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/macro.c:511
    bminor#7  0x000055556624af0e in dwarf_decode_macros (per_objfile=0x616000056180, builder=0x614000006040, section=0x61700008ba78, lh=0x60f000028bd0, offset_size=4, offset=0, str_section=0x61700008bac8, str_offsets_section=0x61700008baf0, str_offsets_base=std::optional<unsigned long> = {...}, section_is_gnu=1, cu=0x61700008b600) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/macro.c:934
    bminor#8  0x000055556642cb82 in dwarf_decode_macros (cu=0x61700008b600, offset=0, section_is_gnu=1) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:19435
    bminor#9  0x000055556639bd12 in read_file_scope (die=0x6210000885c0, cu=0x61700008b600) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:6366
    bminor#10 0x0000555566392d99 in process_die (die=0x6210000885c0, cu=0x61700008b600) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:5310
    bminor#11 0x0000555566390d72 in process_full_comp_unit (cu=0x61700008b600, pretend_language=language_minimal) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:5075

The exception is then only caught at the event-loop level
(start_event_loop), causing the whole debug info reading process to be
aborted.  I think it's a little harsh, considering that a lot of things
could work even if we failed to read macro information.

Catch the exception inside read_file_scope, print the exception, and
carry on.  We could go even more fine-grained: if reading the string for
one macro definition fails, we could continue reading the macro
information.  Perhaps it's just that one macro definition that is
broken.  However, I don't need this level of granularity, so I haven't
attempted this.  Also, my experience is that macro reading fails when
the compiler or linker has a bug, in which case pretty much everything
is messed up.

With this patch, it now looks like:

    $ ./gdb -nx -q --data-directory=data-directory testsuite/outputs/gdb.base/style/style
    Reading symbols from testsuite/outputs/gdb.base/style/style...
    (gdb) p main
    While reading section .debug_macro.dwo: DW_FORM_strp pointing outside of .debug_str section [in module /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/style/style]
    $1 = {int (int, char **)} 0x684 <main>
    (gdb)

In the test I am investigating (gdb.base/style.exp with the fission
board), it allows more tests to run:

    -# of expected passes           107
    -# of unexpected failures       17
    +# of expected passes           448
    +# of unexpected failures       19

Of course, we still see the error about the macro information, and some
macro-related tests still fail (those would be kfailed ideally), but
many tests that are not macro-dependent now pass.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111409

Change-Id: I0bdb01f153eff23c63c96ce3f41114bb027e5796
Approved-By: Tom Tromey <tom@tromey.com>
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
Currently GDB's source cache doesn't track whether the entries within
the cache are styled or not.  This is pretty much fine, the assumption
is that any time we are fetching source code, we do so in order to
print it to the terminal, so where possible we always want styling
applied, and if styling is not applied, then it is because that file
cannot be styled for some reason.

Changes to 'set style enabled' cause the source cache to be flushed,
so future calls to fetch source code will regenerate the cache entries
with styling enabled or not as appropriate.

But this all assumes that styling is either on or off, and that
switching between these two states isn't done very often.

However, the Python API allows for individual commands to be executed
with styling turned off via gdb.execute().  See commit:

  commit e5348a7
  Date:   Thu Feb 13 15:39:31 2025 +0000

      gdb/python: new styling argument to gdb.execute

Currently the source cache doesn't handle this case. Consider this:

  (gdb) list main
  ... snip, styled source code displayed here ...
  (gdb) python gdb.execute("list main", True, False, False)
  ... snip, styled source code is still shown here ...

In the second case, the final `False` passed to gdb.execute() is
asking for unstyled output.

The problem is that, `get_source_lines` calls `ensure` to prime the
cache for the file in question, then `extract_lines` just pulls the
lines of interest from the cached contents.

In `ensure`, if there is a cache entry for the desired filename, then
that is considered good enough.  There is no consideration about
whether the cache entry is styled or not.

This commit aims to fix this, after this commit, the `ensure` function
will make sure that the cache entry used by `get_source_lines` is
styled correctly.

I think there are two approaches I could take:

  1. Allow multiple cache entries for a single file, a styled, and
     non-styled entry.  The `ensure` function would then place the
     correct cache entry into the last position so that
     `get_source_lines` would use the correct entry, or

  2. Have `ensure` recalculate entries if the required styling mode is
     different to the styling mode of the current entry.

Approach guyush1#1 is better if we are rapidly switching between styling
modes, while guyush1#2 might be better if we want to keep more files in the
cache and we only rarely switch styling modes.

In the end I chose approach guyush1#2, but the good thing is that the changes
are all contained within the `ensure` function.  If in the future we
wanted to change to strategy guyush1#1, this could be done transparently to
the rest of GDB.

So after this commit, the `ensure` function checks if styling is
currently possible or not.  If it is not, and the current entry is
styled, then the current entry only is dropped from the cache, and a
new, unstyled entry is created.  Likewise, if the current entry is
non-styled, but styling is required, we drop one entry and
recalculate.

With this change in place, I have updated set_style_enabled (in
cli/cli-style.c) so the source cache is no longer flushed when the
style settings are changed, the source cache will automatically handle
changes to the style settings now.

This problem was discovered in PR gdb/32676.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32676

Approved-By: Tom Tromey <tom@tromey.com>
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
Modifying inline-frame-cycle-unwind.exp to use `bt -no-filters` produces
the following incorrect backtrace:

  #0  inline_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:49
  guyush1#1  normal_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:32
  guyush1#2  0x000055555555517f in inline_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:50
  guyush1#3  normal_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:32
  Backtrace stopped: previous frame identical to this frame (corrupt stack?)
  (gdb) FAIL: gdb.base/inline-frame-cycle-unwind.exp: cycle at level 1: backtrace when the unwind is broken at frame 1

The expected output, which we get with `bt`, is:

  #0  inline_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:49
  guyush1#1  normal_func () at .../gdb/gdb/testsuite/gdb.base/inline-frame-cycle-unwind.c:32
  Backtrace stopped: previous frame identical to this frame (corrupt stack?)
  (gdb) PASS: gdb.base/inline-frame-cycle-unwind.exp: cycle at level 1: backtrace when the unwind is broken at frame 1

The cycle checking in `get_prev_frame_maybe_check_cycle` relies on newer
frame ids having already been computed and stashed.  Unlike other
frames, frame #0's id does not get computed immediately.

The test passes with `bt` because when applying python frame filters,
the call to `bootstrap_python_frame_filters` happens to compute the id
of frame #0.  When `get_prev_frame_maybe_check_cycle` later tries to
stash frame guyush1#2's id, the cycle is detected.

The test fails with `bt -no-filters` because frame #0's id has not been
stashed by the time `get_prev_frame_maybe_check_cycle` tries to stash
frame guyush1#2's id which succeeds and the cycle is only detected later when
trying to stash frame bminor#4's id.  Doing `stepi` after the incorrect
backtrace would then trigger an assertion failure when trying to stash
frame #0's id because it is a duplicate of guyush1#2's already stashed id.

In `get_prev_frame_always_1`, if this_frame is inline frame 0, then
compute and stash its frame id before returning the previous frame.
This ensures that the id of inline frame 0 has been stashed before
`get_prev_frame_maybe_check_cycle` is called on older frames.

The test case has been updated to run both `bt` and `bt -no-filters`.

Co-authored-by: Andrew Burgess <aburgess@redhat.com>
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
Consider this backtrace within GDB:

  #0  notify_breakpoint_modified (b=0x57d31d0) at ../../src/gdb/breakpoint.c:1083
  guyush1#1  0x00000000005b6406 in breakpoint_set_commands (b=0x57d31d0, commands=...) at ../../src/gdb/breakpoint.c:1523
  guyush1#2  0x00000000005c8c63 in update_dprintf_command_list (b=0x57d31d0) at ../../src/gdb/breakpoint.c:8641
  guyush1#3  0x00000000005d3c4e in dprintf_breakpoint::re_set (this=0x57d31d0) at ../../src/gdb/breakpoint.c:12476
  bminor#4  0x00000000005d6347 in breakpoint_re_set () at ../../src/gdb/breakpoint.c:13298

Whenever breakpoint_re_set is called we re-build the commands that the
dprintf b/p will execute and store these into the breakpoint.  The
commands are re-built in update_dprintf_command_list and stored into
the breakpoint object in breakpoint_set_commands.

Now sometimes these commands can change, dprintf_breakpoint::re_set
explains one case where this can occur, and I'm sure there must be
others.  But in most cases the commands we recalculate will not
change.  This means that the breakpoint modified event which is
emitted from breakpoint_set_commands is redundant.

This commit aims to eliminate the redundant breakpoint modified events
for dprintf breakpoints.  This is done by adding a commands_equal call
to the start of breakpoint_set_commands.

The commands_equal function is a new function which compares two
command_line objects and returns true if they are identical.  Using
this function we can check if the new commands passed to
breakpoint_set_commands are identical to the breakpoint's existing
commands.  If the new commands are equal then we don't need to change
anything on the new breakpoint, and the breakpoint modified event can
be skipped.

The test for this commit stops at a dlopen() call in the inferior,
sets up a dprintf breakpoint, then uses 'next' to step over the
dlopen() call.  When the library loads GDB call breakpoint_re_set,
which calls dprintf_breakpoint::re_set.  But in this case we don't
expect the calculated command string to change, so we don't expect to
see the breakpoint modified event.
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
When building with gcc, with flags -gdwarf-5, -gsplit-dwarf and
-fdebug-types-section, the resulting .dwo files contain multiple
.debug_info.dwo sections.  One for each type unit and one for the
compile unit.  This is correct, as per DWARF 5, section F.2.3 ("Contents
of the Split DWARF Object Files"):

    The split DWARF object files each contain the following sections:

        ...
        .debug_info.dwo (for the compilation unit)
        .debug_info.dwo (one COMDAT section for each type unit)
	...

GDB currently assumes that there is a single .debug_info.dwo section,
causing unpredictable behavior.  For example, sometimes this crash:

    ==81781==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x508000007a71 at pc 0x58704d32a59c bp 0x7ffc0acc0bb0 sp 0x7ffc0acc0ba0
    READ of size 1 at 0x508000007a71 thread T0
        #0 0x58704d32a59b in bfd_getl32 /home/smarchi/src/binutils-gdb/bfd/libbfd.c:846
        guyush1#1 0x58704ae62dce in read_initial_length(bfd*, unsigned char const*, unsigned int*, bool) /home/smarchi/src/binutils-gdb/gdb/dwarf2/leb.c:92
        guyush1#2 0x58704aaf76bf in read_comp_unit_head(comp_unit_head*, unsigned char const*, dwarf2_section_info*, rcuh_kind) /home/smarchi/src/binutils-gdb/gdb/dwarf2/comp-unit-head.c:47
        guyush1#3 0x58704aaf8f97 in read_and_check_comp_unit_head(dwarf2_per_objfile*, comp_unit_head*, dwarf2_section_info*, dwarf2_section_info*, unsigned char const*, rcuh_kind) /home/smarchi/src/binutils-gdb/gdb/dwarf2/comp-unit-head.c:193
        bminor#4 0x58704b022908 in create_dwo_unit_hash_tables /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:6233
        bminor#5 0x58704b0334a5 in open_and_init_dwo_file /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:7588
        bminor#6 0x58704b03965a in lookup_dwo_cutu /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:7935
        bminor#7 0x58704b03a5b1 in lookup_dwo_comp_unit /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:8009
        bminor#8 0x58704aff5b70 in lookup_dwo_unit /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:2802

The first time that locate_dwo_sections gets called for a
.debug_info.dwo section, dwo_sections::info gets initialized properly.
The second time it gets called for a .debug_info.dwo section, the size
field in dwo_sections::info gets overwritten with the size of the second
section.  But the buffer remains pointing to the contents of the first
section, because the section is already "read in".  So the size does not
match the buffer.  And even if it did, we would only keep the
information about one .debug_info.dwo, out of the many.

First, add an assert in locate_dwo_sections to make sure we don't
try to fill in a dwo section info twice.  Add the assert to other
functions with the same pattern, while at it.

Then, change dwo_sections::info to be a vector of sections (just like we
do for type sections).  Update locate_dwo_sections to append to that
vector when seeing a new .debug_info.dwo section.  Update
open_and_init_dwo_file to read the units from each section.

The problem can be observed by running some tests with the
dwarf5-fission-debug-types target board.  For example,
gdb.base/condbreak.exp crashes (with the ASan failure shown above)
before the patch and passes after).

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119766

Change-Id: Iedf275768b6057dee4b1542396714f3d89903cf3
Reviewed-By: Tom de Vries <tdevries@suse.de>
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
On arm-linux, with test-case gdb.python/py-missing-objfile.exp I get:
...
(gdb) whatis global_exec_var^M
type = volatile exec_type^M
(gdb) FAIL: $exp: initial sanity check: whatis global_exec_var
...
instead of the expected "type = volatile struct exec_type".

The problem is that the current language is ASM instead of C, because the
inner frame at the point of the core dump has language ASM:
...
 #0  __libc_do_syscall () at libc-do-syscall.S:47
 guyush1#1  0xf7882920 in __pthread_kill_implementation () at pthread_kill.c:43
 guyush1#2  0xf784df22 in __GI_raise (sig=sig@entry=6) at raise.c:26
 guyush1#3  0xf783f03e in __GI_abort () at abort.c:73
 bminor#4  0x009b0538 in dump_core () at py-missing-objfile.c:34
 bminor#5  0x009b0598 in main () at py-missing-objfile.c:46
...

Fix this by manually setting the language to C.

Tested on arm-linux and x86_64-linux.

PR testsuite/32445
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32445
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
I decided to try to build and test gdb on Windows.

I found a page on the wiki [1] suggesting three ways of building gdb:
- MinGW,
- MinGW on Cygwin, and
- Cygwin.

I picked Cygwin, because I've used it before (though not recently).

I managed to install Cygwin and sufficient packages to build gdb and start the
testsuite.

However, testsuite progress ground to a halt at gdb.base/branch-to-self.exp.
[ AFAICT, similar problems reported here [2]. ]

I managed to reproduce this hang by running just the test-case.

I attempted to kill the hanging processes by:
- first killing the inferior process, using the cygwin "kill -9" command, and
- then killing the gdb process, likewise.

But the gdb process remained, and I had to point-and-click my way through task
manager to actually kill the gdb process.

I investigated this by attaching to the hanging gdb process.  Looking at the
main thread, I saw it was stopped in a call to WaitForSingleObject, with
the dwMilliseconds parameter set to INFINITE.

The backtrace in more detail:
...
(gdb) bt
 #0  0x00007fff196fc044 in ntdll!ZwWaitForSingleObject () from
     /cygdrive/c/windows/SYSTEM32/ntdll.dll
 guyush1#1  0x00007fff16bbcdcf in WaitForSingleObjectEx () from
     /cygdrive/c/windows/System32/KERNELBASE.dll
 guyush1#2  0x0000000100998065 in wait_for_single (handle=0x1b8, howlong=4294967295) at
     gdb/windows-nat.c:435
 guyush1#3  0x0000000100999aa7 in
     windows_nat_target::do_synchronously(gdb::function_view<bool ()>)
       (this=this@entry=0xa001c6fe0, func=...) at gdb/windows-nat.c:487
 bminor#4  0x000000010099a7fb in windows_nat_target::wait_for_debug_event_main_thread
     (event=<optimized out>, this=0xa001c6fe0)
     at gdb/../gdbsupport/function-view.h:296
 bminor#5  windows_nat_target::kill (this=0xa001c6fe0) at gdb/windows-nat.c:2917
 bminor#6  0x00000001008f2f86 in target_kill () at gdb/target.c:901
 bminor#7  0x000000010091fc46 in kill_or_detach (from_tty=0, inf=0xa000577d0)
     at gdb/top.c:1658
 bminor#8  quit_force (exit_arg=<optimized out>, from_tty=from_tty@entry=0)
     at gdb/top.c:1759
 bminor#9  0x00000001004f9ea8 in quit_command (args=args@entry=0x0,
     from_tty=from_tty@entry=0) at gdb/cli/cli-cmds.c:483
 bminor#10 0x000000010091c6d0 in quit_cover () at gdb/top.c:295
 bminor#11 0x00000001005e3d8a in async_disconnect (arg=<optimized out>)
     at gdb/event-top.c:1496
 bminor#12 0x0000000100499c45 in invoke_async_signal_handlers ()
     at gdb/async-event.c:233
 bminor#13 0x0000000100eb23d6 in gdb_do_one_event (mstimeout=mstimeout@entry=-1)
     at gdbsupport/event-loop.cc:198
 bminor#14 0x00000001006df94a in interp::do_one_event (mstimeout=-1,
     this=<optimized out>) at gdb/interps.h:87
 bminor#15 start_event_loop () at gdb/main.c:402
 bminor#16 captured_command_loop () at gdb/main.c:466
 bminor#17 0x00000001006e2865 in captured_main (data=0x7ffffcba0) at gdb/main.c:1346
 #18 gdb_main (args=args@entry=0x7ffffcc10) at gdb/main.c:1365
 #19 0x0000000100f98c70 in main (argc=10, argv=0xa000129f0) at gdb/gdb.c:38
...

In the docs [3], I read that using an INFINITE argument to WaitForSingleObject
might cause a system deadlock.

This prompted me to try this simple change in wait_for_single:
...
   while (true)
     {
-      DWORD r = WaitForSingleObject (handle, howlong);
+      DWORD r = WaitForSingleObject (handle,
+                                     howlong == INFINITE ? 100 : howlong);
+      if (howlong == INFINITE && r == WAIT_TIMEOUT)
+        continue;
...
with the timeout of 0.1 second estimated to be:
- small enough for gdb to feel reactive, and
- big enough not to consume too much cpu cycles with looping.

And indeed, the test-case, while still failing, now finishes in ~50 seconds.

While there may be an underlying bug that triggers this behaviour, the failure
mode is so severe that I consider it a bug in itself.

Fix this by avoiding calling WaitForSingleObject with INFINITE argument.

Tested on x86_64-cygwin, by running the testsuite past the test-case.

Approved-By: Pedro Alves <pedro@palves.net>

PR tdep/32894
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32894

[1] https://sourceware.org/gdb/wiki/BuildingOnWindows
[2] https://sourceware.org/pipermail/gdb-patches/2025-May/217949.html
[3] https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
This commit fixes a couple of issues relating to the pagination
prompt and styling.  The pagination prompt is this one:

  --Type <RET> for more, q to quit, c to continue without paging--

I did try to split this into multiple patches, based on the three
issues I describe below, but in the end, the fixes were all too
interconnected, so it ended up as one patch that makes two related,
but slightly different changes:

  1. Within the pager_file class, relying on the m_applied_style
  attribute of the wrapped m_stream, as is done when calling
  m_stream->emit_style_escape, is not correct, so stop doing that, and

  2. Failing to update m_applied_style within the pager_file class can
  leave that attribute out of date, which can then lead to styling
  errors later on, so ensure m_applied_style is always updated.

The problems I have seen are:

  1. After quitting from a pagination prompt, the next command can
  incorrectly style its output.  This was reported as bug PR
  gdb/31033, and is fixed by this commit.

  2. The pagination prompt itself could be styled.  The pagination
  prompt should always be shown in the default style.

  3. After continuing the output at a pagination prompt, GDB can fail
  to restore the default style the next time the output (within the
  same command) switches back to the default style.

There are tests for all these issues as part of this patch.

The pager_file class is a sub-class of wrapped_file, this means that a
pager_file is itself a ui_file, while it also manages a pointer to a
ui_file object (called m_stream).  An instance of pager_file can be
installed as the gdb_stdout ui_file object.

Output sent to a pager_file is stored within an internal
buffer (called m_wrap_buffer) until we have a complete line, when the
content is flushed to the wrapped m_stream.  If sufficient lines have
been written out then the pager_file will present the pagination
prompt and allow the user to continue viewing output, or quit the
current command.

As a pager_file is a ui_file, it has an m_applied_style member
variable.

The managed stream (m_stream) is also a ui_file, and so also has an
m_applied_style member variable.

In some places within the pager_file class we attempt to change the
current style of the m_stream using calls like this:

  m_stream->emit_style_escape (style);

See pager_file::emit_style_escape, pager_file::prompt_for_continue,
and pager_file::puts.  These calls will end up in
ui_file::emit_style_escape, which tries to skip emitting unnecessary
style escapes by checking if the requested style matches the current
m_applied_style value.

The m_applied_style value is updated by calls to the emit_style_escape
function.

The problem here is that most of the time pager_file doesn't change
the style of m_stream by calling m_stream->emit_style_escape.  Most of
the time, style changes are performed by pager_file writing the escape
sequence into m_wrap_buffer, and then later flushing this buffer to
m_stream by calling m_stream->puts.

It has to be done this way.  Calling m_stream->emit_style_escape
would, if it actually changed the style, immediately change the style
by emitting an escape sequence.  But pager_file doesn't want that, it
wants the style change to happen later, when m_wrap_buffer is
flushed.

To avoid excessive style escape sequences being written into
m_wrap_buffer, the pager_file::m_applied_style performs a function
similar to the m_applied_style within m_stream, it tracks the current
style for the end of m_wrap_buffer, and only allows style escape
sequences to be emitted if the style is actually changing.

However, a consequence of this is the m_applied_style within m_stream,
is not updated, which means it will be out of sync with the actual
current style of m_stream.  If we then try to make a call to
m_stream->emit_style_escape, if the style we are changing too happens
to match the out of date style in m_stream->m_applied_style, then the
style change will be ignored.

And this is indeed what we see in pager_file::prompt_for_continue with
the call:

  m_stream->emit_style_escape (ui_file_style ());

As m_stream->m_applied_style is not being updated, it will always be
the default style, however m_stream itself might not actually be in
the default style.  This call then will not emit an escape sequence as
the desired style matches the out of date m_applied_style.

The fix in this case is to call m_stream->puts directly, passing in
the escape sequence for the desired style.  This will result in an
immediate change of style for m_stream, which fixes some of the
problems described above.

In fact, given that m_stream's m_applied_style is always going to be
out of sync, I think we should change all of the
m_stream->emit_style_escape calls to instead call m_stream->puts.

However, just changing to use puts doesn't fix all the problems.

I found that, if I run 'apropos time', then quit at the first
pagination prompt.  If for the next command I run 'maintenance time' I
see the expected output:

  "maintenance time" takes a numeric argument.

However, everything after the first double quote is given the command
name style rather than only styling the text between the double
quotes.

Here is GDB's stack while printing the above output:

  guyush1#2  0x0000000001050d56 in ui_out::vmessage (this=0x7fff1238a150, in_style=..., format=0x1c05af0 "", args=0x7fff1238a288) at ../../src/gdb/ui-out.c:754
  guyush1#3  0x000000000104db88 in ui_file::vprintf (this=0x3f9edb0, format=0x1c05ad0 "\"%ps\" takes a numeric argument.\n", args=0x7fff1238a288) at ../../src/gdb/ui-file.c:73
  bminor#4  0x00000000010bc754 in gdb_vprintf (stream=0x3f9edb0, format=0x1c05ad0 "\"%ps\" takes a numeric argument.\n", args=0x7fff1238a288) at ../../src/gdb/utils.c:1905
  bminor#5  0x00000000010bca20 in gdb_printf (format=0x1c05ad0 "\"%ps\" takes a numeric argument.\n") at ../../src/gdb/utils.c:1945
  bminor#6  0x0000000000b6b29e in maintenance_time_display (args=0x0, from_tty=1) at ../../src/gdb/maint.c:128

The interesting frames here are guyush1#3, in here `this` is the pager_file
for GDB's stdout, and this passes its m_applied_style to frame guyush1#2 as
the `in_style` argument.

If the m_applied_style is wrong, then frame guyush1#2 will believe that the
wrong style is currently in use as the default style, and so, after
printing 'maintenance time' GDB will switch back to the wrong style.

So the question is, why is pager_file::m_applied_style wrong?

In pager_file::prompt_for_continue, there is an attempt to switch back
to the default style using:

  m_stream->emit_style_escape (ui_file_style ());

If this is changed to a puts call (see above) then this still leaves
pager_file::m_applied_style out of date.

The right fix in this case is, I think, to instead do this:

  this->emit_style_escape (ui_file_style ());

this will update pager_file::m_applied_style, and also send the
default style to m_stream using a puts call.

While writing the tests I noticed that I was getting unnecessary style
reset sequences emitted.

The problem is that, around pagination, we don't really know what
style is currently applied to m_stream.  The
pager_file::m_applied_style tracks the style at the end of
m_wrap_buffer, but this can run ahead of the current m_stream style.
For example, if the screen is currently full, such that the next
character of output will trigger the pagination prompt, if the next
call is actually to pager_file::emit_style_escape, then
pager_file::m_applied_style will be updated, but the style of m_stream
will remain unchanged.  When the next character is written to
pager_file::puts then the pagination prompt will be presented, and GDB
will try to switch m_stream back to the default style.  Whether an
escape is emitted or not will depend on the m_applied_style value,
which we know is different than the actual style of m_stream.

It is, after all, only when m_wrap_buffer is flushed to m_stream that
the style of m_stream actually change.

And so, this commit also adds pager_file::m_stream_style.  This new
variable tracks the current style of m_stream.  This really is a
replacement for m_stream's ui_file::m_applied_style, which is not
accessible from pager_file.

When content is flushed from m_wrap_buffer to m_stream then the
current value of pager_file::m_applied_style becomes the current style
of m_stream.  But, when m_wrap_buffer is filling up, but before it is
flushed, then pager_file::m_applied_style can change, but
m_stream_style will remain unchanged.

Now in pager_file::emit_style_escape we are able to skip some of the
direct calls to m_stream->puts() used to emit style escapes.

After all this there are still a few calls to
m_stream->emit_style_escape().  These are all in the wrap_here support
code.  I think that these calls are technically broken, but don't
actually cause any issues due to the way styling works in GDB.  I
certainly haven't been able to trigger any bugs from these calls yet.
I plan to "fix" these in the next commit just for completeness.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31033

Approved-By: Tom Tromey <tom@tromey.com>
FanFansfan pushed a commit to FanFansfan/binutils-gdb that referenced this pull request Jul 17, 2025
When running gdb.base/foll-fork-syscall.exp with a GDB built with UBSan,
I get:

    /home/simark/src/binutils-gdb/gdb/linux-nat.c:1906:28: runtime error: load of value 3200171710, which is not a valid value for type 'target_waitkind'
    ERROR: GDB process no longer exists
    GDB process exited with wait status 3026417 exp9 0 1
    UNRESOLVED: gdb.base/foll-fork-syscall.exp: follow-fork-mode=child: detach-on-fork=on: test_catch_syscall: continue to breakpoint after fork

The error happens here:

    #0  __sanitizer::Die () at /usr/src/debug/gcc/gcc/libsanitizer/sanitizer_common/sanitizer_termination.cpp:50
    guyush1#1  0x00007ffff600d8dd in __ubsan::__ubsan_handle_load_invalid_value_abort (Data=<optimized out>, Val=<optimized out>) at /usr/src/debug/gcc/gcc/libsanitizer/ubsan/ubsan_handlers.cpp:551
    guyush1#2  0x00005555636d37b6 in linux_handle_syscall_trap (lp=0x7cdff1eb1b00, stopping=0) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:1906
    guyush1#3  0x00005555636e0991 in linux_nat_filter_event (lwpid=3030627, status=1407) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:3044
    bminor#4  0x00005555636e407f in linux_nat_wait_1 (ptid=..., ourstatus=0x7bfff0d6cf18, target_options=...) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:3381
    bminor#5  0x00005555636e7795 in linux_nat_target::wait (this=0x5555704d35e0 <the_amd64_linux_nat_target>, ptid=..., ourstatus=0x7bfff0d6cf18, target_options=...) at /home/simark/src/binutils-gdb/gdb/linux-nat.c:3607
    bminor#6  0x000055556378fad2 in thread_db_target::wait (this=0x55556af42980 <the_thread_db_target>, ptid=..., ourstatus=0x7bfff0d6cf18, options=...) at /home/simark/src/binutils-gdb/gdb/linux-thread-db.c:1398
    bminor#7  0x0000555564811327 in target_wait (ptid=..., status=0x7bfff0d6cf18, options=...) at /home/simark/src/binutils-gdb/gdb/target.c:2593

I believe the problem is that lwp_info::syscall_state is never
initialized.  Fix that by initializing it with TARGET_WAITKIND_IGNORE.
This is the value we use elsewhere when resetting this field to mean
"not stopped at a syscall".

Change-Id: I5b76c63d1466d6e63448fced03305fd5ca8294eb
Approved-By: Tom Tromey <tom@tromey.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants