Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coff-go32-exe: support variable length stub #1

Closed
wants to merge 14 commits into from
Closed

Conversation

jwt27
Copy link
Owner

@jwt27 jwt27 commented Feb 14, 2020

continuation of jwt27/djgpp-cvs#3

@jwt27
Copy link
Owner Author

jwt27 commented Feb 14, 2020

Most things seem to work now, eg size, objdump, objcopy.
strip segfaults regardless of stub size.

@jwt27
Copy link
Owner Author

jwt27 commented Feb 14, 2020

I think this is the problem:
https://github.com/bminor/binutils-gdb/blob/ba2ddec67a1f51c6578f28e611743a2325f7fd46/bfd/coffcode.h#L4076
I'm adjusting bfd_coff_filhsz based on the stub size read from the MZ header. But that is not the size of external_filehdr_go32_exe, so it ends up being only partially allocated.
It seems everything is geared towards the file header being a fixed size. Ideally bfd should either strip off the stub and read the file as a plain coff image, or regard the MZ header as the real file header and the stub and coff image as separate sections.

…ILHDR structure, instead of the actual file header size from coff_backend_info.
@jwt27
Copy link
Owner Author

jwt27 commented Feb 14, 2020

Previous commit fixes the segfault. strip appears to work now, but it writes garbage to the first 192 bytes of the exe.

@jwt27
Copy link
Owner Author

jwt27 commented Feb 15, 2020

d51ec80 fixes the clobbering issue, it was a use-after-free bug. strip seems to work now. However I'm seeing many failed tests on make check-binutils.

@jwt27
Copy link
Owner Author

jwt27 commented Feb 17, 2020

new attempt --> #2

@jwt27 jwt27 closed this Feb 17, 2020
jwt27 pushed a commit that referenced this pull request Mar 16, 2020
Running anything with the fission.exp board fails since commit
c0ab21c ("Replace init_cutu_and_read_dies with a class").
GDB crashes while reading the DWARF info.  cu is NULL in
read_signatured_type:

    Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
    0x000055555780663e in read_signatured_type
    sig_type=0x6210000c3600) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:22782
    22782         gdb_assert (cu->die_hash == NULL);
    (top-gdb) bt
    #0  0x000055555780663e in read_signatured_type (sig_type=0x6210000c3600) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:22782
    #1  0x00005555578062dd in load_full_type_unit (per_cu=0x6210000c3600) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:22758
    #2  0x00005555577c5fb7 in queue_and_load_dwo_tu (slot=0x60600007fc00, info=0x6210000c34e0) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:12674
    #3  0x0000555559934232 in htab_traverse_noresize (htab=0x60b000063670, callback=0x5555577c5e61 <queue_and_load_dwo_tu(void**, void*)>, info=0x6210000c34e0)
        at /home/simark/src/binutils-gdb/libiberty/hashtab.c:775
    bminor#4  0x00005555577c6252 in queue_and_load_all_dwo_tus (per_cu=0x6210000c34e0) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:12701
    bminor#5  0x000055555777ebd8 in dw2_do_instantiate_symtab (per_cu=0x6210000c34e0, skip_partial=false) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:2371
    bminor#6  0x000055555777eea2 in dw2_instantiate_symtab (per_cu=0x6210000c34e0, skip_partial=false) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:2395
    bminor#7  0x0000555557786ab6 in dw2_lookup_symbol (objfile=0x614000007240, block_index=GLOBAL_BLOCK, name=0x602000025310 "main", domain=VAR_DOMAIN)
        at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:3539

After creating the reader object, the reader.cu field should not be
NULL.  By checking the commit previous to the faulty one mentioned
above, I noticed that the cu field is normally set by
init_cu_die_reader, called from read_cutu_die_from_dwo, itself called
from cutu_reader::init_tu_and_read_dwo_dies, itself called from
cutu_reader's constructor.

However, cutu_reader::init_tu_and_read_dwo_dies calls
read_cutu_die_from_dwo, passing a pointer to a local `die_reader_specs`
variable.  So it's the `cu` field of that object that gets set.
cutu_reader itself is a `die_reader_specs` (it inherits from it), and
the intention was most likely to pass `this` to read_cutu_die_from_dwo.
This way, the fields of the cutu_reader object, which
read_signatured_type will use, are set.

With this, I am able to use:

  make check RUNTESTFLAGS='--target_board=fission'

and it looks much better.  There are still some failures to be
investigated, but that's the usual state of the testsuite.

gdb/ChangeLog:

	* dwarf2/read.c (cutu_reader::init_tu_and_read_dwo_dies): Remove
	reader variable, pass `this` to read_cutu_die_from_dwo.
jwt27 pushed a commit that referenced this pull request Mar 16, 2020
I'm running into the following failure (and 17 more like it) in
gdb.base/break-interp.exp:
...
 (gdb) bt^M
 #0  0x00007fde85a3b0c1 in __GI___nanosleep \
   (requested_time=requested_time@entry=0x7ffe5044ee70, \
   remaining=remaining@entry=0x7ffe5044ee70) at nanosleep.c:27^M
 #1  0x00007fde85a3affa in __sleep (seconds=0) at sleep.c:55^M
 #2  0x00007fde8606789c in libfunc (Reading in symbols for libc-start.c...^M
 action=0x7ffe5044fa12 "sleep") at gdb.base/break-interp-lib.c:41^M
 #3  0x0000000000400708 in main ()^M
 Reading in symbols for ../sysdeps/x86_64/start.S...^M
 (gdb) FAIL: gdb.base/break-interp.exp: LDprelinkNOdebugNO: \
   BINprelinkNOdebugNOpieNO: INNER: attach: attach main bt
...

The problem is that the test uses verbose mode to detect the "PIE (Position
Independent Executable) displacement" messages, but the verbose mode also
triggers "Reading in symbols for" messages, which may appear in the middle of
a backtrace (or not, depending on whether debug info is available).

[ In fact, the messages appear in the middle of a backtrace line, which is
PR25613. ]

Fix these FAILs by limiting the scope of verbose to the parts of the test that
need it.

Tested on x86_64-linux.

gdb/testsuite/ChangeLog:

2020-03-11  Tom de Vries  <tdevries@suse.de>

	* gdb.base/break-interp.exp: Limit verbose scope.
jwt27 pushed a commit that referenced this pull request Mar 16, 2020
A patch somewhat like this patch has been in Fedora GDB for well over
a decade.  The Fedora patch was written by Jan Kratochvil.  The Fedora
version prints a warning and attempts to continue.  This version will
error out, fatally.  An earlier version of this patch was more like
the Fedora version than this one.  Simon Marchi recommended use of an
assertion to test for the infinite recursion; I decided to use an
explicit test (with an "if" statement) along with a call to
internal_error() if the condition is met.  This way, I could include
a plea to file a bug report.

It was motivated by a customer reported bug (back in 2006!) which
showed infinite mutual recursion between find_pc_sect_line and
find_pc_line.  Here is a portion of the backtrace from the bug report:

    (gdb) bt
    #0  0x00000000004450a4 in lookup_minimal_symbol_by_pc_section (
	pc=251700325328, section=0x570f500) at gdb/minsyms.c:484
    #1  0x00000000004bbfb2 in find_pc_sect_line (pc=251700325328,
	section=0x570f500, notcurrent=0) at gdb/symtab.c:2057
    #2  0x00000000004bc480 in find_pc_line (pc=251700325328, notcurrent=0)
	at gdb/symtab.c:2232
    #3  0x00000000004bc1ff in find_pc_sect_line (pc=251700325328,
	section=0x570f500, notcurrent=0) at gdb/symtab.c:2081

    ...   (lots and lots of the same two functions with the same parameters)

    #1070 0x00000000004bc480 in find_pc_line (pc=251700325328, notcurrent=0)
	at gdb/symtab.c:2232
    #1071 0x00000000004bc1ff in find_pc_sect_line (pc=251700325328,
	section=0x570f500, notcurrent=0) at gdb/symtab.c:2081
    #1072 0x00000000004bc480 in find_pc_line (pc=251700325328, notcurrent=0)
	at gdb/symtab.c:2232
    #1073 0x00000000004bc1ff in find_pc_sect_line (pc=251700325328,
	section=0x570f500, notcurrent=0) at gdb/symtab.c:2081
    #1074 0x00000000004bc480 in find_pc_line (pc=251700325328, notcurrent=0)
	at gdb/symtab.c:2232
    #1075 0x00000000004bc1ff in find_pc_sect_line (pc=251696794399,
	section=0x59b0df8, notcurrent=0) at gdb/symtab.c:2081
    #1076 0x00000000004bc480 in find_pc_line (pc=251696794399, notcurrent=0)
	at gdb/symtab.c:2232
    #1077 0x000000000055550e in find_frame_sal (frame=0xb3f3e0, sal=0x7fff1d1a8200)
	at gdb/frame.c:1392
    #1078 0x00000000004d86fd in set_current_sal_from_frame (frame=0x1648, center=1)
	at gdb/stack.c:379
    #1079 0x00000000004cf137 in normal_stop () at gdb/infrun.c:3147
    ...

The test case was a large application.  Attempts were made to make a
small(er) test case, but those attempts were not successful.
Therefore, I cannot provide a new test for this patch.

That said, we ought to guard against recursively calling
find_pc_sect_line (via find_pc_line) with the identical PC value that
it had been called with.  Should this happen, infinite recursion (as
shown in the above backtrace) is the result.  This patch prevents
that from happening.

If this should happens, there is a bug somewhere, perhaps in GDB, perhaps
in some other part of the toolchain or a library.  We error out fatally
with a message briefly describing the condition along with a plea to file
a bug report.

I spent some time looking at the surrounding code and commentary which
handle the case of PC being in a stub/trampoline.  It first appeared
in the public GDB repository in April, 1999.  The ChangeLog entry for
this commit is from 1998-12-31.  The relevant portion is:

	(find_pc_sect_line): Return correct information if pc is in import
	or export stub (trampoline).

What's remarkable about the overall ChangeLog entry is that it's over
2500+ lines long!  I believe that this was part of the infamous "HP
merge" (in which insufficient due diligence was given in accepting
a large batch of changes from an outside source).  In the years that
followed, much of this code was either significantly revised or
outright removed.

For this particular case, I'm grateful that extensive comments were
provided by "RT".  (I haven't been able to figure out who RT is/was.)
I've decided against attempting to revise this stub/trampoline handling
code any further than adding Jan's test which prevents an obvious case
of infinite recursion.

I've tested on Fedora 31, x86-64.  I see no regressions.  I've also
searched the logfile for the new message, but as expected, no message
was found (which is good).

gdb/ChangeLog:

	* symtab.c (find_pc_sect_line): Add check which prevents infinite
	recursion.

Change-Id: I595470be6ab5f61ca7e4e9e70c61a252c0deaeaa
jwt27 pushed a commit that referenced this pull request Mar 25, 2020
The type struct compunit_symtab contains two fields (disregarding field next)
that express relations with other compunit_symtabs: user and includes.

These fields are currently not printed with "maint info symtabs" and
"maint print symbols".

Fix this such that for "maint info symtabs" we print:
...
   { ((struct compunit_symtab *) 0x23e8450)
     debugformat DWARF 2
     producer (null)
     dirname (null)
     blockvector ((struct blockvector *) 0x23e8590)
+    user ((struct compunit_symtab *) 0x2336280)
+    ( includes
+      ((struct compunit_symtab *) 0x23e85e0)
+      ((struct compunit_symtab *) 0x23e8960)
+    )
         { symtab <unknown> ((struct symtab *) 0x23e85b0)
           fullname (null)
           linetable ((struct linetable *) 0x0)
         }
   }
...

And for "maint print symbols" we print:
...
-Symtab for file <unknown>
+Symtab for file <unknown> at 0x23e85b0
 Read from object file /data/gdb_versions/devel/a.out (0x233ccf0)
 Language: c

 Blockvector:

 block #000, object at 0x23e8530, 0 syms/buckets in 0x0..0x0
   block #1, object at 0x23e84d0 under 0x23e8530, 0 syms/buckets in 0x0..0x0

+Compunit user: 0x2336300
+Compunit include: 0x23e8900
+Compunit include: 0x23dd970
...
Note: for user and includes we don't list the actual compunit_symtab address,
but instead the corresponding symtab address, which allows us to find that
symtab elsewhere in the output (given that we also now print the address of
symtabs).

gdb/ChangeLog:

2020-03-25  Tom de Vries  <tdevries@suse.de>

	* symtab.h (is_main_symtab_of_compunit_symtab): New function.
	* symmisc.c (dump_symtab_1): Print user and includes fields.
	(maintenance_info_symtabs): Same.
jwt27 pushed a commit that referenced this pull request Apr 2, 2020
When you have a Thumb only PLT then the address in the GOT for PLT0 needs to
have the Thumb bit set since the instruction used in PLTn to get there is
`ldr.w	pc` which is an inter-working instruction:

the PLT sequence in question is

00000120 <foo@plt>:
 120:	f240 0c98 	movw	ip, #152	; 0x98
 124:	f2c0 0c01 	movt	ip, #1
 128:	44fc      	add	ip, pc
 12a:	f8dc f000 	ldr.w	pc, [ip]
 12e:	e7fc      	b.n	12a <foo@plt+0xa>

Disassembly of section .text:

00000130 <bar>:
 130:	b580      	push	{r7, lr}
 132:	af00      	add	r7, sp, #0
 134:	f7ff fff4 	bl	120 <foo@plt>

and previously the linker would generate

Hex dump of section '.got':
 ...
  0x000101b8 40010100 00000000 00000000 10010000 @...............

Which would make it jump and transition out of thumb mode and crash since you
only have thumb mode on such cores.

Now it correctly generates

Hex dump of section '.got':
 ...
  0x000101b8 40010100 00000000 00000000 11010000 @...............

Thanks to Amol for testing patch and to rgujju for reporting it.

bfd/ChangeLog:

	PR ld/16017
	* elf32-arm.c (elf32_arm_populate_plt_entry): Set LSB of the PLT0
	address in the GOT if in thumb only mode.

ld/ChangeLog:

	PR ld/16017
	* testsuite/ld-arm/arm-elf.exp (thumb-plt-got): New.
	* testsuite/ld-arm/thumb-plt-got.d: New test.
jwt27 pushed a commit that referenced this pull request Nov 25, 2021
In PR28004 the following warning / Internal error is reported:
...
$ gdb -q -batch \
    -iex "set sysroot $(pwd -P)/repro" \
    ./repro/gdb \
    ./repro/core \
    -ex bt
  ...
 Program terminated with signal SIGABRT, Aborted.
 #0  0x00007ff8fe8e5d22 in raise () from repro/usr/lib/libc.so.6
 [Current thread is 1 (LWP 1762498)]
 #1  0x00007ff8fe8cf862 in abort () from repro/usr/lib/libc.so.6
 warning: (Internal error: pc 0x7ff8feb2c21d in read in psymtab, \
           but not in symtab.)
 warning: (Internal error: pc 0x7ff8feb2c218 in read in psymtab, \
           but not in symtab.)
  ...
 #2  0x00007ff8feb2c21e in __gnu_debug::_Error_formatter::_M_error() const \
   [clone .cold] (warning: (Internal error: pc 0x7ff8feb2c21d in read in \
   psymtab, but not in symtab.)

) from repro/usr/lib/libstdc++.so.6
...

The warning is about the following:
- in find_pc_sect_compunit_symtab we try to find the address
  (0x7ff8feb2c218 / 0x7ff8feb2c21d) in the symtabs.
- that fails, so we try again in the partial symtabs.
- we find a matching partial symtab
- however, the partial symtab has a full symtab, so
  we should have found a matching symtab in the first step.

The addresses are:
...
(gdb) info sym 0x7ff8feb2c218
__gnu_debug::_Error_formatter::_M_error() const [clone .cold] in \
  section .text of repro/usr/lib/libstdc++.so.6
(gdb) info sym 0x7ff8feb2c21d
__gnu_debug::_Error_formatter::_M_error() const [clone .cold] + 5 in \
  section .text of repro/usr/lib/libstdc++.so.6
...
which correspond to unrelocated addresses 0x9c218 and 0x9c21d:
...
$ nm -C  repro/usr/lib/libstdc++.so.6.0.29 | grep 000000000009c218
000000000009c218 t __gnu_debug::_Error_formatter::_M_error() const \
  [clone .cold]
...
which belong to function __gnu_debug::_Error_formatter::_M_error() in
/build/gcc/src/gcc/libstdc++-v3/src/c++11/debug.cc.

The partial symtab that is found for the addresses is instead the one for
/build/gcc/src/gcc/libstdc++-v3/src/c++98/bitmap_allocator.cc, which is
incorrect.

This happens as follows.

The bitmap_allocator.cc CU has DW_AT_ranges at .debug_rnglist offset 0x4b50:
...
    00004b50 0000000000000000 0000000000000056
    00004b5a 00000000000a4790 00000000000a479c
    00004b64 00000000000a47a0 00000000000a47ac
...

When reading the first range 0x0..0x56, it doesn't trigger the "start address
of zero" complaint here:
...
      /* A not-uncommon case of bad debug info.
         Don't pollute the addrmap with bad data.  */
      if (range_beginning + baseaddr == 0
          && !per_objfile->per_bfd->has_section_at_zero)
        {
          complaint (_(".debug_rnglists entry has start address of zero"
                       " [in module %s]"), objfile_name (objfile));
          continue;
        }
...
because baseaddr != 0, which seems incorrect given that when loading the
shared library individually in gdb (and consequently baseaddr == 0), we do see
the complaint.

Consequently, we run into this case in dwarf2_get_pc_bounds:
...
  if (low == 0 && !per_objfile->per_bfd->has_section_at_zero)
    return PC_BOUNDS_INVALID;
...
which then results in this code in process_psymtab_comp_unit_reader being
called with cu_bounds_kind == PC_BOUNDS_INVALID, which sets the set_addrmap
argument to 1:
...
      scan_partial_symbols (first_die, &lowpc, &highpc,
                            cu_bounds_kind <= PC_BOUNDS_INVALID, cu);
...
and consequently, the CU addrmap gets build using address info from the
functions.

During that process, addrmap_set_empty is called with a range that includes
0x9c218 and 0x9c21d:
...
(gdb) p /x start
$7 = 0x9989c
(gdb) p /x end_inclusive
$8 = 0xb200d
...
but it's called for a function at DIE 0x54153 with DW_AT_ranges at 0x40ae:
...
    000040ae 00000000000b1ee0 00000000000b200e
    000040b9 000000000009989c 00000000000998c4
    000040c3 <End of list>
...
and neither range includes 0x9c218 and 0x9c21d.

This is caused by this code in partial_die_info::read:
...
            if (dwarf2_ranges_read (ranges_offset, &lowpc, &highpc, cu,
                                    nullptr, tag))
             has_pc_info = 1;
...
which pretends that the function is located at addresses 0x9989c..0xb200d,
which is indeed not the case.

This patch fixes the first problem encountered: fix the "start address of
zero" complaint warning by removing the baseaddr part from the condition.
Same for dwarf2_ranges_process.

The effect is that:
- the complaint is triggered, and
- the warning / Internal error is no longer triggered.

This does not fix the observed problem in partial_die_info::read, which is
filed as PR28200.

Tested on x86_64-linux.

Co-Authored-By: Simon Marchi <simon.marchi@polymtl.ca>

gdb/ChangeLog:

2021-07-29  Simon Marchi  <simon.marchi@polymtl.ca>
	    Tom de Vries  <tdevries@suse.de>

	PR symtab/28004
	* gdb/dwarf2/read.c (dwarf2_rnglists_process, dwarf2_ranges_process):
	Fix zero address complaint.
	* gdb/testsuite/gdb.dwarf2/dw2-zero-range-shlib.c: New test.
	* gdb/testsuite/gdb.dwarf2/dw2-zero-range.c: New test.
	* gdb/testsuite/gdb.dwarf2/dw2-zero-range.exp: New file.
jwt27 pushed a commit that referenced this pull request Nov 25, 2021
While working on the testsuite, I ended up noticing that GDB fails to
produce a full backtrace from a thread waiting in pthread_join.  When
selecting the waiting thread and using the 'bt' command, the following
result can be observed:

	(gdb) bt
	#0  0x0000003ff7fccd20 in __futex_abstimed_wait_common64 () from /lib/riscv64-linux-gnu/libpthread.so.0
	#1  0x0000003ff7fc43da in __pthread_clockjoin_ex () from /lib/riscv64-linux-gnu/libpthread.so.0
	Backtrace stopped: frame did not save the PC

On my platform, I do not have debug symbols for glibc, so I need to rely
on prologue analysis in order to unwind stack.

Here is what the function prologue looks like:

	(gdb) disassemble __pthread_clockjoin_ex
	Dump of assembler code for function __pthread_clockjoin_ex:
	   0x0000003ff7fc42de <+0>:     addi    sp,sp,-144
	   0x0000003ff7fc42e0 <+2>:     sd      s5,88(sp)
	   0x0000003ff7fc42e2 <+4>:     auipc   s5,0xd
	   0x0000003ff7fc42e6 <+8>:     ld      s5,-2(s5) # 0x3ff7fd12e0
	   0x0000003ff7fc42ea <+12>:    ld      a5,0(s5)
	   0x0000003ff7fc42ee <+16>:    sd      ra,136(sp)
	   0x0000003ff7fc42f0 <+18>:    sd      s0,128(sp)
	   0x0000003ff7fc42f2 <+20>:    sd      s1,120(sp)
	   0x0000003ff7fc42f4 <+22>:    sd      s2,112(sp)
	   0x0000003ff7fc42f6 <+24>:    sd      s3,104(sp)
	   0x0000003ff7fc42f8 <+26>:    sd      s4,96(sp)
	   0x0000003ff7fc42fa <+28>:    sd      s6,80(sp)
	   0x0000003ff7fc42fc <+30>:    sd      s7,72(sp)
	   0x0000003ff7fc42fe <+32>:    sd      s8,64(sp)
	   0x0000003ff7fc4300 <+34>:    sd      s9,56(sp)
	   0x0000003ff7fc4302 <+36>:    sd      a5,40(sp)

As far as prologue analysis is concerned, the most interesting part is
done at address 0x0000003ff7fc42ee (<+16>): 'sd ra,136(sp)'. This stores
the RA (return address) register on the stack, which is the information
we are looking for in order to identify the caller.

In the current implementation of the prologue scanner, GDB stops when
hitting 0x0000003ff7fc42e6 (<+8>) because it does not know what to do
with the 'ld' instruction.  GDB thinks it reached the end of the
prologue but have not yet reached the important part, which explain
GDB's inability to unwind past this point.

The section of the prologue starting at <+4> until <+12> is used to load
the stack canary[1], which will then be placed on the stack at <+36> at
the end of the prologue.

In order to have the prologue properly handled, this commit proposes to
add support for the ld instruction in the RISC-V prologue scanner.
I guess riscv32 would use lw in such situation so this patch also adds
support for this instruction.

With this patch applied, gdb is now able to unwind past pthread_join:

	(gdb) bt
	#0  0x0000003ff7fccd20 in __futex_abstimed_wait_common64 () from /lib/riscv64-linux-gnu/libpthread.so.0
	#1  0x0000003ff7fc43da in __pthread_clockjoin_ex () from /lib/riscv64-linux-gnu/libpthread.so.0
	#2  0x0000002aaaaaa88e in bar() ()
	#3  0x0000002aaaaaa8c4 in foo() ()
	bminor#4  0x0000002aaaaaa8da in main ()

I have had a look to see if I could reproduce this easily, but in my
simple testcases using '-fstack-protector-all', the canary is loaded
after the RA register is saved.  I do not have a reliable way of
generating a prologue similar to the problematic one so I forged one
instead.

The testsuite have been run on riscv64 ubuntu 21.01 with no regression
observed.

[1] https://en.wikipedia.org/wiki/Buffer_overflow_protection#Canaries
jwt27 pushed a commit that referenced this pull request Nov 25, 2021
The original reproducer for PR28030 required use of a specific
compiler version - gcc-c++-11.1.1-3.fc34 is mentioned in the PR,
though it seems probable that other gcc versions might also be able to
reproduce the bug as well.  This commit introduces a test case which,
using the DWARF assembler, provides a reproducer which is independent
of the compiler version.  (Well, it'll work with whatever compilers
the DWARF assembler works with.)

To the best of my knowledge, it's also the first test case which uses
the DWARF assembler to provide debug info for a shared object.  That
being the case, I provided more than the usual commentary which should
allow this case to be used as a template when a combo shared
library / DWARF assembler test case is required in the future.

I provide some details regarding the bug in a comment near the
beginning of locexpr-dml.exp.

This problem was difficult to reproduce; I found myself constantly
referring to the backtrace while trying to figure out what (else) I
might be missing while trying to create a reproducer.  Below is a
partial backtrace which I include for posterity.

 #0  internal_error (
    file=0xc50110 "/ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/gdbtypes.c", line=5575,
    fmt=0xc520c0 "Unexpected type field location kind: %d")
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdbsupport/errors.cc:51
 #1  0x00000000006ef0c5 in copy_type_recursive (objfile=0x1635930,
    type=0x274c260, copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/gdbtypes.c:5575
 #2  0x00000000006ef382 in copy_type_recursive (objfile=0x1635930,
    type=0x274ca10, copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/gdbtypes.c:5602
 #3  0x0000000000a7409a in preserve_one_value (value=0x24269f0,
    objfile=0x1635930, copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/value.c:2529
 bminor#4  0x000000000072012a in gdbscm_preserve_values (
    extlang=0xc55720 <extension_language_guile>, objfile=0x1635930,
    copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/guile/scm-value.c:94
 bminor#5  0x00000000006a3f82 in preserve_ext_lang_values (objfile=0x1635930,
    copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/extension.c:568
 bminor#6  0x0000000000a7428d in preserve_values (objfile=0x1635930)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/value.c:2579
 bminor#7  0x000000000082d514 in objfile::~objfile (this=0x1635930,
    __in_chrg=<optimized out>)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/objfiles.c:549
 bminor#8  0x0000000000831cc8 in std::_Sp_counted_ptr<objfile*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x1654580)
    at /usr/include/c++/11/bits/shared_ptr_base.h:348
 bminor#9  0x00000000004e6617 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x1654580) at /usr/include/c++/11/bits/shared_ptr_base.h:168
 bminor#10 0x00000000004e1d2f in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x190bb88, __in_chrg=<optimized out>)
    at /usr/include/c++/11/bits/shared_ptr_base.h:705
 bminor#11 0x000000000082feee in std::__shared_ptr<objfile, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x190bb80, __in_chrg=<optimized out>)
    at /usr/include/c++/11/bits/shared_ptr_base.h:1154
 bminor#12 0x000000000082ff0a in std::shared_ptr<objfile>::~shared_ptr (
    this=0x190bb80, __in_chrg=<optimized out>)
    at /usr/include/c++/11/bits/shared_ptr.h:122
 #13 0x000000000085ed7e in __gnu_cxx::new_allocator<std::_List_node<std::shared_ptr<objfile> > >::destroy<std::shared_ptr<objfile> > (this=0x114bc00,
    __p=0x190bb80) at /usr/include/c++/11/ext/new_allocator.h:168
 #14 0x000000000085e88d in std::allocator_traits<std::allocator<std::_List_node<std::shared_ptr<objfile> > > >::destroy<std::shared_ptr<objfile> > (__a=...,
    __p=0x190bb80) at /usr/include/c++/11/bits/alloc_traits.h:531
 #15 0x000000000085e50c in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::_M_erase (this=0x114bc00, __position=
  std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x1635930})
    at /usr/include/c++/11/bits/stl_list.h:1925
 #16 0x000000000085df0e in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::erase (this=0x114bc00, __position=
  std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x1635930})
    at /usr/include/c++/11/bits/list.tcc:158
 #17 0x000000000085c748 in program_space::remove_objfile (this=0x114bbc0,
    objfile=0x1635930)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/progspace.c:210
 #18 0x000000000082d3ae in objfile::unlink (this=0x1635930)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/objfiles.c:487
 #19 0x000000000082e68c in objfile_purge_solibs ()
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/objfiles.c:875
 #20 0x000000000092dd37 in no_shared_libraries (ignored=0x0, from_tty=1)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/solib.c:1236
 #21 0x00000000009a37fe in target_pre_inferior (from_tty=1)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/target.c:2496
 #22 0x00000000007454d6 in run_command_1 (args=0x0, from_tty=1,
    run_how=RUN_NORMAL)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/infcmd.c:437

I'll note a few points regarding this backtrace:

Frame #1 is where the internal error occurs.  It's caused by an
unhandled case for FIELD_LOC_KIND_DWARF_BLOCK.  The fix for this bug
adds support for this case.

Frame #22 - it's a partial backtrace - shows that GDB is attempting to
(re)run the program.  You can see the exact command sequence that was
used for reproducing this problem in the PR (at
https://sourceware.org/bugzilla/show_bug.cgi?id=28030), but in a
nutshell, after starting the program and advancing to the appropriate
source line, GDB was asked to step into libstdc++; a "finish" command
was issued, returning a value.  The fact that a value was returned is
very important.  GDB was then used to step back into libstdc++.  A
breakpoint was set on a source line in the library after which a "run"
command was issued.

Frame #19 shows a call to objfile_purge_solibs.  It's aptly named.

Frame bminor#7 is a call to the destructor for one of the objfile solibs; it
turned out to be the one for libstdc++.

Frames bminor#6 thru #3 show various value preservation frames.  If you look
at preserve_values() in gdb/value.c, the value history is preserved
first, followed by internal variables, followed by values for the
extension languages (python and guile).
jwt27 pushed a commit that referenced this pull request Nov 25, 2021
…es.exp

When running test-case gdb.base/break-probes.exp on ubuntu 18.04.5, we have:
...
 (gdb) bt^M
 #0  0x00007ffff7dd6e12 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #1  0x00007ffff7dedf50 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #2  0x00007ffff7dd5128 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #3  0x00007ffff7dd4098 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 bminor#4  0x0000000000000001 in ?? ()^M
 bminor#5  0x00007fffffffdaac in ?? ()^M
 bminor#6  0x0000000000000000 in ?? ()^M
 (gdb) FAIL: gdb.base/break-probes.exp: ensure using probes
...

The test-case intends to emit an UNTESTED in this case, but fails to do so
because it tries to do it in a regexp clause in a gdb_test_multiple, which
doesn't trigger.  Instead, a default clause triggers which produces the FAIL.

Also the use of UNTESTED is not appropriate, and we should use UNSUPPORTED
instead.

Fix this by silencing the FAIL, and emitting an UNSUPPORTED after the
gdb_test_multiple:
...
 if { ! $using_probes } {
+    unsupported "probes not present on this system"
     return -1
 }
...

Tested on x86_64-linux.
jwt27 pushed a commit that referenced this pull request Nov 25, 2021
When running test-case gdb.base/break-probes.exp on ubuntu 18.04.5, we have:
...
 (gdb) run^M
 Starting program: break-probes^M
 Stopped due to shared library event (no libraries added or removed)^M
 (gdb) bt^M
 #0  0x00007ffff7dd6e12 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #1  0x00007ffff7dedf50 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #2  0x00007ffff7dd5128 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 #3  0x00007ffff7dd4098 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 bminor#4  0x0000000000000001 in ?? ()^M
 bminor#5  0x00007fffffffdaac in ?? ()^M
 bminor#6  0x0000000000000000 in ?? ()^M
 (gdb) UNSUPPORTED: gdb.base/break-probes.exp: probes not present on this system
...

Using the backtrace, the test-case tries to establish that we're stopped in
dl_main, which is used as proof that we're using probes.

However, the backtrace only shows an address, because:
- the dynamic linker contains no minimal symbols and no debug info, and
- gdb is build without --with-separate-debug-dir so it can't find the
  corresponding .debug file, which does contain the mimimal symbols and
  debug info.

Fix this by instead printing the pc and grepping for the value in the
info probes output:
...
(gdb) p /x $pc^M
$1 = 0x7ffff7dd6e12^M
(gdb) info probes^M
Type Provider Name           Where              Object                      ^M
  ...
stap rtld     init_start     0x00007ffff7dd6e12 /lib64/ld-linux-x86-64.so.2 ^M
  ...
(gdb)
...

Tested on x86_64-linux.
jwt27 pushed a commit that referenced this pull request Nov 25, 2021
When running test-case gdb.base/break-interp.exp on ubuntu 18.04.5, we have:
...
 (gdb) bt^M
 #0  0x00007eff7ad5ae12 in ?? () from break-interp-LDprelinkNOdebugNO^M
 #1  0x00007eff7ad71f50 in ?? () from break-interp-LDprelinkNOdebugNO^M
 #2  0x00007eff7ad59128 in ?? () from break-interp-LDprelinkNOdebugNO^M
 #3  0x00007eff7ad58098 in ?? () from break-interp-LDprelinkNOdebugNO^M
 bminor#4  0x0000000000000002 in ?? ()^M
 bminor#5  0x00007fff505d7a32 in ?? ()^M
 bminor#6  0x00007fff505d7a94 in ?? ()^M
 bminor#7  0x0000000000000000 in ?? ()^M
 (gdb) FAIL: gdb.base/break-interp.exp: ldprelink=NO: ldsepdebug=NO: \
         first backtrace: dl bt
...

Using the backtrace, the test-case tries to establish that we're stopped in
dl_main.

However, the backtrace only shows an address, because:
- the dynamic linker contains no minimal symbols and no debug info, and
- gdb is build without --with-separate-debug-dir so it can't find the
  corresponding .debug file, which does contain the mimimal symbols and
  debug info.

As in "[gdb/testsuite] Improve probe detection in gdb.base/break-probes.exp",
fix this by doing info probes and grepping for the address.

Tested on x86_64-linux.
jwt27 pushed a commit that referenced this pull request Nov 25, 2021
I build gdb without xml support using --without-expat, and ran into:
...
(gdb) target remote | vgdb --wait=2 --max-invoke-ms=2500 --pid=22032^M
Remote debugging using | vgdb --wait=2 --max-invoke-ms=2500 --pid=22032^M
relaying data between gdb and process 22032^M
warning: Can not parse XML target description; XML support was disabled at \
  compile time^M
  ...
(gdb) PASS: gdb.base/valgrind-infcall.exp: continue #1
p gdb_test_infcall ()^M
Remote 'g' packet reply is too long (expected 560 bytes, got 800 bytes): ...^M
(gdb) FAIL: gdb.base/valgrind-infcall.exp: p gdb_test_infcall ()
...

After googling the error message with context valgrind gdbserver, I found
indications that the Remote 'g' packet reply error is due to missing xml
support.

And here ( https://www.valgrind.org/docs/manual/manual-core-adv.html ) I
found:
...
GDB version needed for ARM and PPC32/64.

You must use a GDB version which is able to read XML target description sent
by a gdbserver.  This is the standard setup if GDB was configured and built
with the "expat" library.  If your GDB was not configured with XML support, it
will report an error message when using the "target" command.  Debugging will
not work because GDB will then not be able to fetch the registers from the
Valgrind gdbserver.
...

So I guess I'm running into the same problem for x86_64.

Fix this by skipping all gdb.base/valgrind-*.exp tests if xml support is not
available.  Although only the gdb.base/valgrind-infcall*.exp produce fails,
the Remote 'g' packet reply error occurs in all tests, so it seems prudent to
disable them all.

Tested on x86_64-linux.
jwt27 pushed a commit that referenced this pull request Nov 25, 2021
The gdb.multi/multi-term-settings.exp testcase sometimes fails like so:

 Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.multi/multi-term-settings.exp ...
 FAIL: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: stop with control-c (SIGINT)

It's easier to reproduce if you stress the machine at the same time, like e.g.:

  $ stress -c 24

Looking at gdb.log, we see:

 (gdb) attach 60422
 Attaching to program: build/gdb/testsuite/outputs/gdb.multi/multi-term-settings/multi-term-settings, process 60422
 [New Thread 60422.60422]
 Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
 Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so...
 Reading symbols from /lib64/ld-linux-x86-64.so.2...
 (No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
 0x00007f2fc2485334 in __GI___clock_nanosleep (clock_id=<optimized out>, clock_id@entry <mailto:clock_id@entry>=0, flags=flags@entry <mailto:flags@entry>=0, req=req@entry <mailto:req@entry>=0x7ffe23126940, rem=rem@entry <mailto:rem@entry>=0x0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
 78	../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory.
 (gdb) PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: inf2: attach
 set schedule-multiple on
 (gdb) PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: set schedule-multiple on
 info inferiors
   Num  Description       Connection                         Executable
   1    process 60404     1 (extended-remote localhost:2349) build/gdb/testsuite/outputs/gdb.multi/multi-term-settings/multi-term-settings
 * 2    process 60422     1 (extended-remote localhost:2349) build/gdb/testsuite/outputs/gdb.multi/multi-term-settings/multi-term-settings
 (gdb) PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: info inferiors
 pid=60422, count=46
 pid=60422, count=47
 pid=60422, count=48
 pid=60422, count=49
 pid=60422, count=50
 pid=60422, count=51
 pid=60422, count=52
 pid=60422, count=53
 pid=60422, count=54
 pid=60422, count=55
 pid=60422, count=56
 pid=60422, count=57
 pid=60422, count=58
 pid=60422, count=59
 pid=60422, count=60
 pid=60422, count=61
 pid=60422, count=62
 pid=60422, count=63
 pid=60422, count=64
 pid=60422, count=65
 pid=60422, count=66
 pid=60422, count=67
 pid=60422, count=68
 pid=60422, count=69
 pid=60404, count=54
 pid=60404, count=55
 pid=60404, count=56
 pid=60404, count=57
 pid=60404, count=58
 PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: continue
 Quit
 (gdb) FAIL: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: stop with control-c (SIGINT)

If you look at the testcase's sources, you'll see that the intention
is to resumes the program with "continue", wait to see a few of those
"pid=..., count=..." lines, and then interrupt the program with
Ctrl-C.  But somehow, that resulted in GDB printing "Quit", instead of
the Ctrl-C stopping the program with SIGINT.

Here's what is happening:

 #1 - those "pid=..., count=..." lines we see above weren't actually
      output by the inferior after it has been continued (see #1).
      Note that "inf1_how" and "inf2_how" are "attach".  What happened
      is that those "pid=..., count=..." lines were output by the
      inferiors _before_ they were attached to.  We see them at that
      point instead of earlier, because that's where the testcase
      reads from the inferiors' spawn_ids.

 #2 - The testcase mistakenly thinks those "pid=..., count=..." lines
      happened after the continue was processed by GDB, meaning it has
      waited enough, and so sends the Ctrl-C.  GDB hasn't yet passed
      the terminal to the inferior, so the Ctrl-C results in that
      Quit.

The fix here is twofold:

 #1 - flush inferior output right after attaching

 #2 - consume the "Continuing" printed by "continue", indicating the
      inferior has the terminal.  This is the same as done throughout
      the testsuite to handle this exact problem of sending Ctrl-C too
      soon.

gdb/testsuite/ChangeLog:
yyyy-mm-dd  Pedro Alves  <pedro@palves.net <mailto:pedro@palves.net>>

	* gdb.multi/multi-term-settings.exp (create_inferior): Flush
	inferior output.
	(coretest): Use $gdb_test_name.  After issuing "continue", wait
	for "Continuing".

Change-Id: Iba7671dfe1eee6b98d29cfdb05a1b9aa2f9defb9
jwt27 pushed a commit that referenced this pull request Nov 25, 2021
On openSUSE Tumbleweed with glibc-debuginfo installed I get:
...
 (gdb) PASS: gdb.threads/linux-dp.exp: continue to breakpoint: thread 5's print
 where^M
 #0  print_philosopher (n=3, left=33 '!', right=33 '!') at linux-dp.c:105^M
 #1  0x0000000000401628 in philosopher (data=0x40537c) at linux-dp.c:148^M
 #2  0x00007ffff7d56b37 in start_thread (arg=<optimized out>) \
                          at pthread_create.c:435^M
 #3  0x00007ffff7ddb640 in clone3 () \
                          at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81^M
 (gdb) PASS: gdb.threads/linux-dp.exp: first thread-specific breakpoint hit
...
while without debuginfo installed I get instead:
...
 (gdb) PASS: gdb.threads/linux-dp.exp: continue to breakpoint: thread 5's print
 where^M
 #0  print_philosopher (n=3, left=33 '!', right=33 '!') at linux-dp.c:105^M
 #1  0x0000000000401628 in philosopher (data=0x40537c) at linux-dp.c:148^M
 #2  0x00007ffff7d56b37 in start_thread () from /lib64/libc.so.6^M
 #3  0x00007ffff7ddb640 in clone3 () from /lib64/libc.so.6^M
 (gdb) FAIL: gdb.threads/linux-dp.exp: first thread-specific breakpoint hit
...

The problem is that the regexp used:
...
  "\(from .*libpthread\|at pthread_create\|in pthread_create\)"
...
expects the 'from' part to match libpthread, but in glibc 2.34 libpthread has
been merged into libc.

Fix this by updating the regexp.

Tested on x86_64-linux.
jwt27 pushed a commit that referenced this pull request Nov 25, 2021
Currently for a binary compiled normally (without -fsanitize=address) but with
LD_PRELOAD of ASAN one gets:

$ ASAN_OPTIONS=detect_leaks=0:alloc_dealloc_mismatch=1:abort_on_error=1:fast_unwind_on_malloc=0 LD_PRELOAD=/usr/lib64/libasan.so.6 gdb
=================================================================
==1909567==ERROR: AddressSanitizer: alloc-dealloc-mismatch (malloc vs operator delete []) on 0x602000001570
    #0 0x7f1c98e5efa7 in operator delete[](void*) (/usr/lib64/libasan.so.6+0xb0fa7)
...
0x602000001570 is located 0 bytes inside of 2-byte region [0x602000001570,0x602000001572)
allocated by thread T0 here:
    #0 0x7f1c98e5cd1f in __interceptor_malloc (/usr/lib64/libasan.so.6+0xaed1f)
    #1 0x557ee4a42e81 in operator new(unsigned long) (/usr/libexec/gdb+0x74ce81)
SUMMARY: AddressSanitizer: alloc-dealloc-mismatch (/usr/lib64/libasan.so.6+0xb0fa7) in operator delete[](void*)
==1909567==HINT: if you don't care about these errors you may set ASAN_OPTIONS=alloc_dealloc_mismatch=0
==1909567==ABORTING

Despite the code called properly operator new[] and operator delete[].
But GDB's new-op.cc provides its own operator new[] which gets translated into
malloc() (which gets recogized as operatore new(size_t)) but as it does not
translate also operators delete[] Address Sanitizer gets confused.

The question is how many variants of the delete operator need to be provided.
There could be 14 operators new but there are only 4, GDB uses 3 of them.
There could be 16 operators delete but there are only 6, GDB uses 2 of them.
It depends on libraries and compiler which of the operators will get used.
Currently being used:
                 U operator new[](unsigned long)
                 U operator new(unsigned long)
                 U operator new(unsigned long, std::nothrow_t const&)
                 U operator delete[](void*)
                 U operator delete(void*, unsigned long)

Tested on x86_64-linux.
jwt27 pushed a commit that referenced this pull request Nov 25, 2021
This commit fixes Bug 28308, titled "Strange interactions with
dprintf and break/commands":

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28308

Since creating that bug report, I've found a somewhat simpler way of
reproducing the problem.  I've encapsulated it into the GDB test case
which I've created along with this bug fix.  The name of the new test
is gdb.base/dprintf-execution-x-script.exp, I'll demonstrate the
problem using this test case, though for brevity, I've placed all
relevant files in the same directory and have renamed the files to all
start with 'dp-bug' instead of 'dprintf-execution-x-script'.

The script file, named dp-bug.gdb, consists of the following commands:

dprintf increment, "dprintf in increment(), vi=%d\n", vi
break inc_vi
commands
  continue
end
run

Note that the final command in this script is 'run'.  When 'run' is
instead issued interactively, the  bug does not occur.  So, let's look
at the interactive case first in order to see the correct/expected
output:

$ gdb -q -x dp-bug.gdb dp-bug
... eliding buggy output which I'll discuss later ...
(gdb) run
Starting program: /mesquite2/sourceware-git/f34-master/bld/gdb/tmp/dp-bug
vi=0
dprintf in increment(), vi=0

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=1
dprintf in increment(), vi=1

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=2
dprintf in increment(), vi=2

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=3
[Inferior 1 (process 1539210) exited normally]

In this run, in which 'run' was issued from the gdb prompt (instead
of at the end of the script), there are three dprintf messages along
with three 'Breakpoint 2' messages.  This is the correct output.

Now let's look at the output that I snipped above; this is the output
when 'run' is issued from the script loaded via GDB's -x switch:

$ gdb -q -x dp-bug.gdb dp-bug
Reading symbols from dp-bug...
Dprintf 1 at 0x40116e: file dprintf-execution-x-script.c, line 38.
Breakpoint 2 at 0x40113a: file dprintf-execution-x-script.c, line 26.
vi=0
dprintf in increment(), vi=0

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	dprintf-execution-x-script.c: No such file or directory.
vi=1

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=2

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=3
[Inferior 1 (process 1539175) exited normally]

In the output shown above, only the first dprintf message is printed.
The 2nd and 3rd dprintf messages are missing!  However, all three
'Breakpoint 2...' messages are still printed.

Why does this happen?

bpstat_do_actions_1() in gdb/breakpoint.c contains the following
comment and code near the start of the function:

  /* Avoid endless recursion if a `source' command is contained
     in bs->commands.  */
  if (executing_breakpoint_commands)
    return 0;

  scoped_restore save_executing
    = make_scoped_restore (&executing_breakpoint_commands, 1);

Also, as described by this comment prior to the 'async' field
in 'struct ui' in top.h, the main UI starts off in sync mode
when processing command line arguments:

  /* True if the UI is in async mode, false if in sync mode.  If in
     sync mode, a synchronous execution command (e.g, "next") does not
     return until the command is finished.  If in async mode, then
     running a synchronous command returns right after resuming the
     target.  Waiting for the command's completion is later done on
     the top event loop.  For the main UI, this starts out disabled,
     until all the explicit command line arguments (e.g., `gdb -ex
     "start" -ex "next"') are processed.  */

This combination of things, the state of the static global
'executing_breakpoint_commands' plus the state of the async
field in the main UI causes this behavior.

This is a backtrace after hitting the dprintf breakpoint for
the second time when doing 'run' from the script file, i.e.
non-interactively:

Thread 1 "gdb" hit Breakpoint 3, bpstat_do_actions_1 (bsp=0x7fffffffc2b8)
    at /ironwood1/sourceware-git/f34-master/bld/../../worktree-master/gdb/breakpoint.c:4431
4431	  if (executing_breakpoint_commands)

 #0  bpstat_do_actions_1 (bsp=0x7fffffffc2b8)
     at gdb/breakpoint.c:4431
 #1  0x00000000004d8bc6 in dprintf_after_condition_true (bs=0x1538090)
     at gdb/breakpoint.c:13048
 #2  0x00000000004c5caa in bpstat_stop_status (aspace=0x116dbc0, bp_addr=0x40116e, thread=0x137f450, ws=0x7fffffffc718,
     stop_chain=0x1538090) at gdb/breakpoint.c:5498
 #3  0x0000000000768d98 in handle_signal_stop (ecs=0x7fffffffc6f0)
     at gdb/infrun.c:6172
 bminor#4  0x00000000007678d3 in handle_inferior_event (ecs=0x7fffffffc6f0)
     at gdb/infrun.c:5662
 bminor#5  0x0000000000763cd5 in fetch_inferior_event ()
     at gdb/infrun.c:4060
 bminor#6  0x0000000000746d7d in inferior_event_handler (event_type=INF_REG_EVENT)
     at gdb/inf-loop.c:41
 bminor#7  0x00000000007a702f in handle_target_event (error=0, client_data=0x0)
     at gdb/linux-nat.c:4207
 bminor#8  0x0000000000b8cd6e in gdb_wait_for_event (block=block@entry=0)
     at gdbsupport/event-loop.cc:701
 bminor#9  0x0000000000b8d032 in gdb_wait_for_event (block=0)
     at gdbsupport/event-loop.cc:597
 bminor#10 gdb_do_one_event () at gdbsupport/event-loop.cc:212
 bminor#11 0x00000000009d19b6 in wait_sync_command_done ()
     at gdb/top.c:528
 bminor#12 0x00000000009d1a3f in maybe_wait_sync_command_done (was_sync=0)
     at gdb/top.c:545
 #13 0x00000000009d2033 in execute_command (p=0x7fffffffcb18 "", from_tty=0)
     at gdb/top.c:676
 #14 0x0000000000560d5b in execute_control_command_1 (cmd=0x13b9bb0, from_tty=0)
     at gdb/cli/cli-script.c:547
 #15 0x000000000056134a in execute_control_command (cmd=0x13b9bb0, from_tty=0)
     at gdb/cli/cli-script.c:717
 #16 0x00000000004c3bbe in bpstat_do_actions_1 (bsp=0x137f530)
     at gdb/breakpoint.c:4469
 #17 0x00000000004c3d40 in bpstat_do_actions ()
     at gdb/breakpoint.c:4533
 #18 0x00000000006a473a in command_handler (command=0x1399ad0 "run")
     at gdb/event-top.c:624
 #19 0x00000000009d182e in read_command_file (stream=0x113e540)
     at gdb/top.c:443
 #20 0x0000000000563697 in script_from_file (stream=0x113e540, file=0x13bb0b0 "dp-bug.gdb")
     at gdb/cli/cli-script.c:1642
 #21 0x00000000006abd63 in source_gdb_script (extlang=0xc44e80 <extension_language_gdb>, stream=0x113e540,
     file=0x13bb0b0 "dp-bug.gdb") at gdb/extension.c:188
 #22 0x0000000000544400 in source_script_from_stream (stream=0x113e540, file=0x7fffffffd91a "dp-bug.gdb",
     file_to_open=0x13bb0b0 "dp-bug.gdb")
     at gdb/cli/cli-cmds.c:692
 #23 0x0000000000544557 in source_script_with_search (file=0x7fffffffd91a "dp-bug.gdb", from_tty=1, search_path=0)
     at gdb/cli/cli-cmds.c:750
 #24 0x00000000005445cf in source_script (file=0x7fffffffd91a "dp-bug.gdb", from_tty=1)
     at gdb/cli/cli-cmds.c:759
 #25 0x00000000007cf6d9 in catch_command_errors (command=0x5445aa <source_script(char const*, int)>,
     arg=0x7fffffffd91a "dp-bug.gdb", from_tty=1, do_bp_actions=false)
     at gdb/main.c:523
 #26 0x00000000007cf85d in execute_cmdargs (cmdarg_vec=0x7fffffffd1b0, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND,
     ret=0x7fffffffd18c) at gdb/main.c:615
 #27 0x00000000007d0c8e in captured_main_1 (context=0x7fffffffd3f0)
     at gdb/main.c:1322
 #28 0x00000000007d0eba in captured_main (data=0x7fffffffd3f0)
     at gdb/main.c:1343
 #29 0x00000000007d0f25 in gdb_main (args=0x7fffffffd3f0)
     at gdb/main.c:1368
 #30 0x00000000004186dd in main (argc=5, argv=0x7fffffffd508)
     at gdb/gdb.c:32

There are two frames for bpstat_do_actions_1(), one at frame #16 and
the other at frame #0.  The one at frame #16 is processing the actions
for Breakpoint 2, which is a 'continue'.  The one at frame #0 is attempting
to process the dprintf breakpoint action.  However, at this point,
the value of 'executing_breakpoint_commands' is 1, forcing an early
return, i.e. prior to executing the command(s) associated with the dprintf
breakpoint.

For the sake of comparison, this is what the stack looks like when hitting
the dprintf breakpoint for the second time when issuing the 'run'
command from the GDB prompt.

Thread 1 "gdb" hit Breakpoint 3, bpstat_do_actions_1 (bsp=0x7fffffffccd8)
    at /ironwood1/sourceware-git/f34-master/bld/../../worktree-master/gdb/breakpoint.c:4431
4431	  if (executing_breakpoint_commands)

 #0  bpstat_do_actions_1 (bsp=0x7fffffffccd8)
     at gdb/breakpoint.c:4431
 #1  0x00000000004d8bc6 in dprintf_after_condition_true (bs=0x16b0290)
     at gdb/breakpoint.c:13048
 #2  0x00000000004c5caa in bpstat_stop_status (aspace=0x116dbc0, bp_addr=0x40116e, thread=0x13f0e60, ws=0x7fffffffd138,
     stop_chain=0x16b0290) at gdb/breakpoint.c:5498
 #3  0x0000000000768d98 in handle_signal_stop (ecs=0x7fffffffd110)
     at gdb/infrun.c:6172
 bminor#4  0x00000000007678d3 in handle_inferior_event (ecs=0x7fffffffd110)
     at gdb/infrun.c:5662
 bminor#5  0x0000000000763cd5 in fetch_inferior_event ()
     at gdb/infrun.c:4060
 bminor#6  0x0000000000746d7d in inferior_event_handler (event_type=INF_REG_EVENT)
     at gdb/inf-loop.c:41
 bminor#7  0x00000000007a702f in handle_target_event (error=0, client_data=0x0)
     at gdb/linux-nat.c:4207
 bminor#8  0x0000000000b8cd6e in gdb_wait_for_event (block=block@entry=0)
     at gdbsupport/event-loop.cc:701
 bminor#9  0x0000000000b8d032 in gdb_wait_for_event (block=0)
     at gdbsupport/event-loop.cc:597
 bminor#10 gdb_do_one_event () at gdbsupport/event-loop.cc:212
 bminor#11 0x00000000007cf512 in start_event_loop ()
     at gdb/main.c:421
 bminor#12 0x00000000007cf631 in captured_command_loop ()
     at gdb/main.c:481
 #13 0x00000000007d0ebf in captured_main (data=0x7fffffffd3f0)
     at gdb/main.c:1353
 #14 0x00000000007d0f25 in gdb_main (args=0x7fffffffd3f0)
     at gdb/main.c:1368
 #15 0x00000000004186dd in main (argc=5, argv=0x7fffffffd508)
     at gdb/gdb.c:32

This relatively short backtrace is due to the current UI's async field
being set to 1.

Yet another thing to be aware of regarding this problem is the
difference in the way that commands associated to dprintf breakpoints
versus regular breakpoints are handled.  While they both use a command
list associated with the breakpoint, regular breakpoints will place
the commands to be run on the bpstat chain constructed in
bp_stop_status().  These commands are run later on.  For dprintf
breakpoints, commands are run via the 'after_condition_true' function
pointer directly from bpstat_stop_status().  (The 'commands' field in
the bpstat is cleared in dprintf_after_condition_true().  This
prevents the dprintf commands from being run again later on when other
commands on the bpstat chain are processed.)

Another thing that I noticed is that dprintf breakpoints are the only
type of breakpoint which use 'after_condition_true'.  This suggests
that one possible way of fixing this problem, that of making dprintf
breakpoints work more like regular breakpoints, probably won't work.
(I must admit, however, that my understanding of this code isn't
complete enough to say why.  I'll trust that whoever implemented it
had a good reason for doing it this way.)

The comment referenced earlier regarding 'executing_breakpoint_commands'
states that the reason for checking this variable is to avoid
potential endless recursion when a 'source' command appears in
bs->commands.  We know that a dprintf command is constrained to either
1) execution of a GDB printf command, 2) an inferior function call of
a printf-like function, or 3) execution of an agent-printf command.
Therefore, infinite recursion due to a 'source' command cannot happen
when executing commands upon hitting a dprintf breakpoint.

I chose to fix this problem by having dprintf_after_condition_true()
directly call execute_control_commands().  This means that it no
longer attempts to go through bpstat_do_actions_1() avoiding the
infinite recursion check for potential 'source' commands on the
command chain.  I think it simplifies this code a little bit too, a
definite bonus.

Summary:

	* breakpoint.c (dprintf_after_condition_true): Don't call
	bpstat_do_actions_1().  Call execute_control_commands()
	instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant