Make GDB compile in native mode #1

palmer-dabbelt · 2015-09-04T01:21:50Z

It looks like the merge of the native GDB patches didn't go correctly and ended up producing something that doesn't even build. This PR submits some patches that make the RISC-V native GDB port build. I haven't tested anything (and the last patch is a mess), but I thought I'd get this out there just in case anyone was working on it because I might not have time to finish this for a bit.

At least it can't be worse than code that doesn't build... :)

These are the canonical RISC-V tuples, despite the fact that our Linux port doesn't actually report those currently. This allows GDB to build on these sorts of platforms.

This is needed in riscv-linux-tdep.c.

I'm not actually sure if the problem here is in the loop or the array, but GDB threw a warning here.

Without these GDB won't link.

I can't figure out how to do this correctly, but I kind of need GDB so for now I'm just going to disable rv32 support here. The correct fix has to be super easy...

Make GDB compile in native mode

vapier · 2015-09-04T01:28:35Z

looks good, thanks!

The main motivation for this is making non-stop / all-stop behave similarly on initial connection, in order to move in the direction of reimplementing all-stop mode with the remote target always running in non-stop mode. When we connect to a remote target in non-stop mode, we may find threads either running or already stopped. The act of connecting itself does not force threads to stop. To handle that, the remote non-stop connection is currently roughly like this: riscvarchive#1 - Fetch list of remote threads (qXfer:threads:read, qfThreadInfo, etc). All threads are assumed to be running until the target reports an asynchronous stop reply for them. riscvarchive#2 - Fetch the initial set of threads that were already stopped, with the '?' packet. (In non-stop, this is coupled with the vStopped mechanism to be able to retrieve the status of more than one thread.) The stop replies fetched in riscvarchive#2 are placed in the pending stop reply queue, and left for the regular event loop to process. That is, "target remote" finishes and returns _before_ those stops are processed. That means that it's possible to have GDB process further commands before the initial set of stopped threads is reported to the user. E.g., before the patch, note how the prompt is printed before the frame: Remote debugging using :9999 (gdb) [Thread 15296] riscvarchive#1 stopped. 0x0000003615a011f0 in ?? () Even though thread riscvarchive#1 was not running, for a moment, the user can see it as such: $ gdb a.out -ex "set non-stop 1" -ex "tar rem :9999" -ex "info threads" -ex "info registers" Remote debugging using :9999 Id Target Id Frame * 1 Thread 4772 (running) Target is executing. <<<<<<< info registers (gdb) [Thread 4772] riscvarchive#1 stopped. 0x0000003615a011f0 in ?? () To fix that, this commit makes gdb process all threads found already stopped at connection time, before giving the prompt to the user. The fix takes a cue from fork-child.c:startup_inferior [1], and processes the events locally in remote.c, avoiding the whole wait_for_inferior/handle_inferior_event path. I decided to try this approach after noticing that: - several cases in handle_inferior_event miss checking stop_soon. - we don't want to fetch the thread list in normal_stop. and trying to fix them was resulting in sprinkling stop_soon checks in many places, and uglifying normal_stop even more. While with this patch, I'm avoiding changing GDB's output other than when the prompt is printed, I think this approach is more flexible if we do want to change it. And also, it's likely easier to get rid of the MI *running event that is still sent for threads that are initially found stopped, if we want to. This happens to fix the testsuite too. All non-stop tests are racy against "target remote" / gdbserver testing currently. That is, sometimes the tests run, but other times they're just skipped without any indication of PASS/FAIL. When that happens, the logs show: target remote localhost:2346 Remote debugging using localhost:2346 (gdb) [Thread 25418] riscvarchive#1 stopped. 0x0000003615a011f0 in ?? () ^CQuit (gdb) Remote debugging from host 127.0.0.1 Killing process(es): 25418 monitor exit (gdb) Remote connection closed (gdb) testcase /home/pedro/gdb/mygit/build/../src/gdb/testsuite/gdb.threads/multi-create-ns-info-thr.exp completed in 61 seconds The trouble here is that there's output after the prompt, and the regex in question doesn't expect that: -re "Remote debugging using .*$serialport_re.*$gdb_prompt $" { verbose "Set target to $targetname" return 0 } [1] - before startup_inferior was added, we'd go through wait_for_inferior/handle_inferior_event while going through the shell, and that turned out problematic. Tested on x86_64 Fedora 20, gdbserver. gdb/ChangeLog: 2015-08-20 Pedro Alves <palves@redhat.com> * infrun.c (print_target_wait_results): Make extern. * infrun.h (print_target_wait_results): Declare. * remote.c (set_stop_requested_callback): Delete. (process_initial_stop_replies): New function. (remote_start_remote): Use it. (stop_reply_queue_length): New function. gdb/testsuite/ChangeLog: 2015-08-20 Pedro Alves <palves@redhat.com> * gdb.server/connect-stopped-target.c: New file. * gdb.server/connect-stopped-target.exp: New file.

…g-bp.exp Running that test in a loop, I found a gdbserver core dump with the following back trace: Core was generated by `../gdbserver/gdbserver --once --multi :2346'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000000000406ab6 in inferior_regcache_data (inferior=0x0) at src/gdb/gdbserver/inferiors.c:236 236 return inferior->regcache_data; (gdb) up riscvarchive#1 0x0000000000406d7f in get_thread_regcache (thread=0x0, fetch=1) at src/gdb/gdbserver/regcache.c:31 31 regcache = (struct regcache *) inferior_regcache_data (thread); (gdb) bt #0 0x0000000000406ab6 in inferior_regcache_data (inferior=0x0) at src/gdb/gdbserver/inferiors.c:236 riscvarchive#1 0x0000000000406d7f in get_thread_regcache (thread=0x0, fetch=1) at src/gdb/gdbserver/regcache.c:31 riscvarchive#2 0x0000000000409271 in prepare_resume_reply (buf=0x20dd593 "", ptid=..., status=0x20edce0) at src/gdb/gdbserver/remote-utils.c:1147 riscvarchive#3 0x000000000040ab0a in vstop_notif_reply (event=0x20edcc0, own_buf=0x20dd590 "T05") at src/gdb/gdbserver/server.c:183 riscvarchive#4 0x0000000000426b38 in notif_write_event (notif=0x66e6c0 <notif_stop>, own_buf=0x20dd590 "T05") at src/gdb/gdbserver/notif.c:69 riscvarchive#5 0x0000000000426c55 in handle_notif_ack (own_buf=0x20dd590 "T05", packet_len=8) at src/gdb/gdbserver/notif.c:113 riscvarchive#6 0x000000000041118f in handle_v_requests (own_buf=0x20dd590 "T05", packet_len=8, new_packet_len=0x7fff742c77b8) at src/gdb/gdbserver/server.c:2862 riscvarchive#7 0x0000000000413850 in process_serial_event () at src/gdb/gdbserver/server.c:4148 riscvarchive#8 0x0000000000413945 in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:4196 riscvarchive#9 0x000000000041a1ef in handle_file_event (event_file_desc=5) at src/gdb/gdbserver/event-loop.c:429 riscvarchive#10 0x00000000004199b6 in process_event () at src/gdb/gdbserver/event-loop.c:184 riscvarchive#11 0x000000000041a735 in start_event_loop () at src/gdb/gdbserver/event-loop.c:547 riscvarchive#12 0x00000000004123d2 in captured_main (argc=4, argv=0x7fff742c7ac8) at src/gdb/gdbserver/server.c:3562 riscvarchive#13 0x000000000041252e in main (argc=4, argv=0x7fff742c7ac8) at src/gdb/gdbserver/server.c:3631 Clearly this means that a thread pushed a stop reply in the event queue, and then before GDB confused the event, the whole process died, along with its thread. But the pending thread event was left dangling. When GDB fetched that event, gdbserver looked up the corresponding thread, but found NULL; not expecting this, gdbserver crashes when it tries to read this thread's registers. gdb/gdbserver/ 2015-08-21 Pedro Alves <palves@redhat.com> PR gdb/18749 * inferiors.c (remove_thread): Discard any pending stop reply for this thread. * server.c (remove_all_on_match_pid): Rename to ... (remove_all_on_match_ptid): ... this. Work with a filter ptid instead of a pid. (discard_queued_stop_replies): Change parameter to a ptid. Now extern. (handle_v_kill, kill_inferior_callback) (process_serial_event): Adjust. (captured_main): Call initialize_notif before starting the program, thus before threads are created. * server.h (discard_queued_stop_replies): Declare.

@cindex

The vforkdone stop reply misses indicating the thread ID of the vfork parent which the event relates to: @cindex vfork events, remote reply @item vfork The packet indicates that @code{vfork} was called, and @var{r} is the thread ID of the new child process. Refer to @ref{thread-id syntax} for the format of the @var{thread-id} field. This packet is only applicable to targets that support vfork events. @cindex vforkdone events, remote reply @item vforkdone The packet indicates that a child process created by a vfork has either called @code{exec} or terminated, so that the address spaces of the parent and child process are no longer shared. The @var{r} part is ignored. This packet is only applicable to targets that support vforkdone events. Unfortunately, this is not just a documentation issue. GDBserver is really not specifying the thread ID. I noticed because in non-stop mode, gdb complains: [Thread 6089.6089] riscvarchive#1 stopped. #0 0x0000003615a011f0 in ?? () 0x0000003615a011f0 in ?? () (gdb) set debug remote 1 (gdb) c Continuing. Sending packet: $QPassSignals:e;10;14;17;1a;1b;1c;21;24;25;2c;4c;#5f...Packet received: OK Sending packet: $vCont;c:p17c9.17c9#88...Packet received: OK Notification received: Stop:T05vfork:p17ce.17ce;06:40d7ffffff7f0000;07:30d7ffffff7f0000;10:e4c9eb1536000000;thread:p17c9.17c9;core:2; Sending packet: $vStopped#55...Packet received: OK Sending packet: $D;17ce#af...Packet received: OK Sending packet: $vCont;c:p17c9.17c9#88...Packet received: OK Notification received: Stop:T05vforkdone:; No process or thread specified in stop reply: T05vforkdone:; (gdb) This is not non-stop-mode-specific, however. Consider e.g., that in all-stop, you may be debugging more than one process at the same time. You continue, and both processes vfork. So when you next get a T05vforkdone, there's no way to tell which of the parent processes is done with the vfork. Tests will be added later. Tested on x86_64 Fedora 20. gdb/ChangeLog: 2015-09-15 Pedro Alves <palves@redhat.com> PR remote/18965 * remote-utils.c (prepare_resume_reply): Merge TARGET_WAITKIND_VFORK_DONE switch case with the TARGET_WAITKIND_FORKED case. gdb/doc/ChangeLog: 2015-09-15 Pedro Alves <palves@redhat.com> PR remote/18965 * gdb.texinfo (Stop Reply Packets): Explain that vforkdone's 'r' part indicates the thread ID of the parent process.

Assuming displaced stepping is enabled, and a breakpoint is set in the memory region of the scratch pad, things break. One of two cases can happen: riscvarchive#1 - The breakpoint wasn't inserted yet (all threads were stopped), so after setting up the displaced stepping scratch pad with the adjusted copy of the instruction we're trying to single-step, we insert the breakpoint, which corrupts the scratch pad, and the inferior executes the wrong instruction. (Example below.) This is clearly unacceptable. riscvarchive#2 - The breakpoint was already inserted, so setting up the displaced stepping scratch pad overwrites the breakpoint. This is OK in the sense that we already assume that no thread is going to executes the code in the scratch pad range (after initial startup) anyway. This commit addresses both cases by simply punting on displaced stepping if we have a breakpoint in the scratch pad range. The riscvarchive#1 case above explains a few regressions exposed by the AS/NS series on x86: Running ./gdb.dwarf2/callframecfa.exp ... FAIL: gdb.dwarf2/callframecfa.exp: set display for call-frame-cfa FAIL: gdb.dwarf2/callframecfa.exp: step 1 for call-frame-cfa FAIL: gdb.dwarf2/callframecfa.exp: step 2 for call-frame-cfa FAIL: gdb.dwarf2/callframecfa.exp: step 3 for call-frame-cfa FAIL: gdb.dwarf2/callframecfa.exp: step 4 for call-frame-cfa Running ./gdb.dwarf2/typeddwarf.exp ... FAIL: gdb.dwarf2/typeddwarf.exp: continue to breakpoint: continue to typeddwarf.c:53 FAIL: gdb.dwarf2/typeddwarf.exp: check value of x at typeddwarf.c:53 FAIL: gdb.dwarf2/typeddwarf.exp: check value of y at typeddwarf.c:53 FAIL: gdb.dwarf2/typeddwarf.exp: check value of z at typeddwarf.c:53 FAIL: gdb.dwarf2/typeddwarf.exp: continue to breakpoint: continue to typeddwarf.c:73 FAIL: gdb.dwarf2/typeddwarf.exp: check value of w at typeddwarf.c:73 FAIL: gdb.dwarf2/typeddwarf.exp: check value of x at typeddwarf.c:73 FAIL: gdb.dwarf2/typeddwarf.exp: check value of y at typeddwarf.c:73 FAIL: gdb.dwarf2/typeddwarf.exp: check value of z at typeddwarf.c:73 Enabling "maint set target-non-stop on" implies displaced stepping enabled as well, and it's the latter that's to blame here. We can see the same failures with "maint set target-non-stop off + set displaced on". Diffing (good/bad) gdb.log for callframecfa.exp shows: @@ -99,29 +99,29 @@ Breakpoint 2 at 0x80481b0: file q.c, lin continue Continuing. -Breakpoint 2, func (arg=77) at q.c:2 +Breakpoint 2, func (arg=52301) at q.c:2 2 in q.c (gdb) PASS: gdb.dwarf2/callframecfa.exp: continue to breakpoint: continue to breakpoint for call-frame-cfa display arg -1: arg = 77 -(gdb) PASS: gdb.dwarf2/callframecfa.exp: set display for call-frame-cfa +1: arg = 52301 +(gdb) FAIL: gdb.dwarf2/callframecfa.exp: set display for call-frame-cfa The problem is here, when setting up the func call: Breakpoint 1, main (argc=-13345, argv=0x0) at q.c:7 7 in q.c (gdb) disassemble Dump of assembler code for function main: 0x080481bb <+0>: push %ebp 0x080481bc <+1>: mov %esp,%ebp 0x080481be <+3>: sub $0x4,%esp => 0x080481c1 <+6>: movl $0x4d,(%esp) 0x080481c8 <+13>: call 0x80481b0 <func> 0x080481cd <+18>: leave 0x080481ce <+19>: ret End of assembler dump. (gdb) disassemble /r Dump of assembler code for function main: 0x080481bb <+0>: 55 push %ebp 0x080481bc <+1>: 89 e5 mov %esp,%ebp 0x080481be <+3>: 83 ec 04 sub $0x4,%esp => 0x080481c1 <+6>: c7 04 24 4d 00 00 00 movl $0x4d,(%esp) 0x080481c8 <+13>: e8 e3 ff ff ff call 0x80481b0 <func> 0x080481cd <+18>: c9 leave 0x080481ce <+19>: c3 ret End of assembler dump. Note the breakpoint at main is set at 0x080481c1. Right at the instruction that sets up func's argument. Executing that instruction should write 0x4d to the address pointed at by $esp. However, if we stepi, the program manages to write 52301/0xcc4d there instead (0xcc is int3, the x86 breakpoint instruction), because the breakpoint address is 4 bytes inside the scratch pad location, which is 0x080481bd: (gdb) p 0x080481c1 - 0x080481bd $1 = 4 IOW, instead of executing: "c7 04 24 4d 00 00 00" [ movl $0x4d,(%esp) ] the inferior executes: "c7 04 24 4d cc 00 00" [ movl $0xcc4d,(%esp) ] gdb/ChangeLog: 2015-10-30 Pedro Alves <palves@redhat.com> * breakpoint.c (breakpoint_in_range_p) (breakpoint_location_address_range_overlap): New functions. * breakpoint.h (breakpoint_in_range_p): New declaration. * infrun.c (displaced_step_prepare_throw): If there's a breakpoint in the scratch pad range, don't displaced step.

When I build GDB with -fsanitize=address, and run testsuite, some gdb.base/*.exp test triggers the ERROR below, ================================================================= ==7646==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000242810 at pc 0x487844 bp 0x7fffe32e84e0 sp 0x7fffe32e84d8 READ of size 4 at 0x603000242810 thread T0 #0 0x487843 in push_stack_item /home/yao/SourceCode/gnu/gdb/git/gdb/arm-tdep.c:3405 #1 0x48998a in arm_push_dummy_call /home/yao/SourceCode/gnu/gdb/git/gdb/arm-tdep.c:3960 In that path, GDB passes value on stack, in an INT_REGISTER_SIZE slot, but the value contents' length can be less than INT_REGISTER_SIZE, so the contents will be accessed out of the bound. This patch adds an array buf[INT_REGISTER_SIZE], and copy val to buf before writing them to stack. gdb: 2015-11-16 Yao Qi <yao.qi@linaro.org> * arm-tdep.c (arm_push_dummy_call): New array buf. Store regval to buf. Pass buf instead of val to push_stack_item.

Hi, I build GDB with -fsanitize=address, and run testsuite. In gdb.base/callfuncs.exp, I see the following error, p/c fun1() =================================================================^M ==9601==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffee858530 at pc 0x6df079 bp 0x7fffee8583a0 sp 0x7fffee858398 WRITE of size 16 at 0x7fffee858530 thread T0 #0 0x6df078 in regcache_raw_read /home/yao/SourceCode/gnu/gdb/git/gdb/regcache.c:673 #1 0x6dfe1e in regcache_cooked_read /home/yao/SourceCode/gnu/gdb/git/gdb/regcache.c:751 #2 0x4696a3 in aarch64_extract_return_value /home/yao/SourceCode/gnu/gdb/git/gdb/aarch64-tdep.c:1708 #3 0x46ae57 in aarch64_return_value /home/yao/SourceCode/gnu/gdb/git/gdb/aarch64-tdep.c:1918 We are extracting return value from V registers (128 bit), but only allocate X_REGISTER_SIZE-byte array, which isn't sufficient. This patch changes the array to V_REGISTER_SIZE. gdb: 2015-11-16 Yao Qi <yao.qi@linaro.org> * aarch64-tdep.c (aarch64_extract_return_value): Change array buf's length to V_REGISTER_SIZE.

Hi, I build GDB with -fsanitize=address, and run testsuite. In gdb.base/callfuncs.exp, I see the following error, p t_float_values(0.0,0.0) ================================================================= ==8088==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000cb650 at pc 0x6e195c bp 0x7fff164f9770 sp 0x7fff164f9768 READ of size 16 at 0x6020000cb650 thread T0^ #0 0x6e195b in regcache_raw_write /home/yao/SourceCode/gnu/gdb/git/gdb/regcache.c:912 #1 0x6e1e52 in regcache_cooked_write /home/yao/SourceCode/gnu/gdb/git/gdb/regcache.c:945 #2 0x466d69 in pass_in_v /home/yao/SourceCode/gnu/gdb/git/gdb/aarch64-tdep.c:1101 #3 0x467512 in pass_in_v_or_stack /home/yao/SourceCode/gnu/gdb/git/gdb/aarch64-tdep.c:1196 #4 0x467d7d in aarch64_push_dummy_call /home/yao/SourceCode/gnu/gdb/git/gdb/aarch64-tdep.c:1335 The code in pass_in_v read contents from V registers (128 bit), but the data passed through V registers can be less than 128 bit. In this case, float is passed. So writing V registers contents into contents buff will cause overflow. In this patch, we add an array reg[V_REGISTER_SIZE], which is to hold the contents from V registers, and then copy useful bits to buf. gdb: 2015-11-18 Yao Qi <yao.qi@linaro.org> * aarch64-tdep.c (pass_in_v): Add argument len. Add local array reg. Callers updated.

The following issue has been observed on arm-android, trying to step over the following line of code: Put_Line (">>> " & Integer'Image (Message (I))); Below is a copy of the GDB transcript: (gdb) cont Breakpoint 1, q.dump (message=...) at q.adb:11 11 Put_Line (">>> " & Integer'Image (Message (I))); (gdb) next 0x00016000 in system.concat_2.str_concat_2 () The expected behavior for the "next" command is to step over the call to Put_Line and stop at line 12: (gdb) next 12 I := I + 1; What happens during the next step is that the code for line 11 above make a call to system.concat_2.str_concat_2 (to implement the '&' string concatenation operator) before making the call to Put_Line. While stepping, GDB stops eventually stops at the first instruction of that function, and fails to detect that it's a function call from where we were before, and so decides to stop stepping. And the reason why it fails to detect that we landed inside a function call is because it fails to unwind from that function: (gdb) bt #0 0x00016000 in system.concat_2.str_concat_2 () #1 0x0001bc74 in ?? () Debugging GDB, I found that GDB decides to use the ARM unwind info for that function, which contains the following data: 0x16000 <system__concat_2__str_concat_2>: 0x80acb0b0 Compact model index: 0 0xac pop {r4, r5, r6, r7, r8, r14} 0xb0 finish 0xb0 finish But, in fact, using that data is wrong, in this case, because it mentions a pop of 6 registers, and therefore hints at a frame size of 24 bytes. The problem is that, because we're at the first instruction of the function, the 6 registers haven't been pushed to the stack yet. In other words, using the ARM unwind entry above, GDB is tricked into thinking that the frame size is 24 bytes, and that the return address (r14) is available on the stack. One visible manifestation of this issue can been seen by looking at the value of the stack pointer, and the frame's base address: (gdb) p /x $sp $2 = 0xbee427b0 (gdb) info frame Stack level 0, frame at 0xbee427c8: ^^^^^^^^^^ |||||||||| The frame's base address should be equal to the value of the stack pointer at entry. And you eventually get the correct frame address, as well as the correct backtrace if you just single-step one additional instruction, past the push: (gdb) x /i $pc => 0x16000 <system__concat_2__str_concat_2>: push {r4, r5, r6, r7, r8, lr} (gdb) stepi (gdb) bt #0 0x00016004 in system.concat_2.str_concat_2 () #1 0x00012b6c in q.dump (message=...) at q.adb:11 #2 0x00012c3c in q () at q.adb:19 Digging further, I found that GDB tries to use the ARM unwind info only when sure that it is relevant, as explained in the following comment: /* The ARM exception table does not describe unwind information for arbitrary PC values, but is guaranteed to be correct only at call sites. We have to decide here whether we want to use ARM exception table information for this frame, or fall back [...] There is one case where it decides that the info is relevant, described in the following comment: /* We also assume exception information is valid if we're currently blocked in a system call. The system library is supposed to ensure this, so that e.g. pthread cancellation works. For that, it just parses the instruction at the address it believes to be the point of call, and matches it against an "svc" instruction. For instance, for a non-thumb instruction, it is at... get_frame_pc (this_frame) - 4 ... and the code checking looks like the following. if (safe_read_memory_integer (get_frame_pc (this_frame) - 4, 4, byte_order_for_code, &insn) && (insn & 0x0f000000) == 0x0f000000 /* svc */) exc_valid = 1; However, the reason why this doesn't work in our case is that because we are at the first instruction of a function in the innermost frame. That frame can't possibly be making a call, and therefore be stuck on a system call. What the code above ends up doing is checking the instruction just before the start of our function, which in our case is not even an actual instruction, but unlucky for us, happens to match the pattern it is looking for, thus leading GDB to improperly trust the ARM unwinding data. gdb/ChangeLog: * arm-tdep.c (arm_exidx_unwind_sniffer): Do not check for a frame stuck on a system call if the given frame is the innermost frame.

One of our users reported an internal error using the "bt full" command. In their situation, reproducing involved the following scenario: (gdb) frame 1 (gdb) bt full #0 0xf7783430 in __kernel_vsyscall () No symbol table info available. #1 0xf5550aeb in waitpid () at ../sysdeps/unix/syscall-template.S:81 No locals. [...] #6 0x0fe83139 in xxxx (arg=...) [...some locals printed, and then...] <S17b> = [...]/dwarf2loc.c:364: internal-error: dwarf_expr_frame_base: Assertion `framefunc != NULL' failed. As shown above, the error happens while GDB is trying to print the value of <S17b>, which is a local string internally generated by the compiler. For that, it finds that the array lives in memory, and therefore tries to create a struct value for it via: case DWARF_VALUE_MEMORY: { CORE_ADDR address = dwarf_expr_fetch_address (ctx, 0); [...] retval = value_at_lazy (type, address + byte_offset); Unfortunately for us, TYPE happens to be an array whose bounds are dynamic. More precisely, the bounds of our arrays are described in the debugging info as being... <4><2c1985e>: Abbrev Number: 33 (DW_TAG_subrange_type) <2c1985f> DW_AT_type : <0x2c1989c> <2c19863> DW_AT_lower_bound : <0x2c19835> <2c19867> DW_AT_upper_bound : <0x2c19841> ... which are references to a pair of local variables. For instance, the lower bound is a reference to the following DIE <3><2c19835>: Abbrev Number: 32 (DW_TAG_variable) <2c19836> DW_AT_name : [...] <2c1983a> DW_AT_type : <0x2c198b4> <2c1983e> DW_AT_artificial : 1 <2c1983e> DW_AT_location : 2 byte block: 91 58 (DW_OP_fbreg: -40) As a result of the above, value_at_lazy indirectly triggers a resolution of TYPE (via value_from_contents_and_address), which means a resolution of TYPE's bounds, and as seen in the DW_AT_location attribute above for our bounds, computing the bound's location requires the frame (its location expression uses DW_OP_fbreg). Unfortunately for us, value_at_lazy does not get passed a frame, we've lost the relevant frame when we try to resolve the array's bounds. Instead, resolve_dynamic_range gets calls dwarf2_evaluate_property with NULL as the frame: static struct type * resolve_dynamic_range (struct type *dyn_range_type, struct property_addr_info *addr_stack) { [...] if (dwarf2_evaluate_property (prop, NULL, addr_stack, &value)) ^^^^ ... which then handles this by using the selected frame instead: if (frame == NULL && has_stack_frames ()) frame = get_selected_frame (NULL); In our case, the selected frame happens to be frame #1, which is a frame where we have a minimal amount of debugging info, and in particular, no debug info for the function itself. And because of that, when we try to determine the frame's base... static void dwarf_expr_frame_base (void *baton, const gdb_byte **start, size_t * length) { struct dwarf_expr_baton *debaton = (struct dwarf_expr_baton *) baton; const struct block *bl = get_frame_block (debaton->frame, NULL); [...] framefunc = block_linkage_function (bl); ... framefunc ends up being NULL, which triggers the assert in that same function: gdb_assert (framefunc != NULL); This patches avoids the issue by temporarily setting the selected_frame before printing the locals of each frames. This patch also adds a small testcase, which reproduces the same issue, but with a slightly different outcome: (gdb) bt full #0 0x000000000040049a in opaque_routine () No symbol table info available. #1 0x0000000000400532 in main () at wrong_frame_bt_full-main.c:20 my_table_size = 3 my_table = <error reading variable my_table (frame address is not available.)> With this patch, the output becomes: (gdb) bt full [...] my_table = {0, 1, 2} gdb/ChangeLog: * stack.c (print_frame_local_vars): Temporarily set the selected frame to FRAME while printing the frame's local variables. gdb/testsuite/ChangeLog: * gdb.base/wrong_frame_bt_full-main.c: New file. * gdb.base/wrong_frame_bt_full-opaque.c: New file. * gdb.base/wrong_frame_bt_full.exp: New file.

This patch fixes the GDB internal error on AArch64 when running watchpoint-fork.exp top?bt 15 internal_error (file=file@entry=0x79d558 "../../binutils-gdb/gdb/linux-nat.c", line=line@entry=4866, fmt=0x793b20 "%s: Assertion `%s' failed.") at ../../binutils-gdb/gdb/common/errors.c:51 #1 0x0000000000495bc4 in linux_nat_thread_address_space (t=<optimized out>, ptid=<error reading variable: Cannot access memory at address 0x1302>) at ../../binutils-gdb/gdb/linux-nat.c:4866 #2 0x00000000005db2c8 in delegate_thread_address_space (self=<optimized out>, arg1=<error reading variable: Cannot access memory at address 0x1302>) at ../../binutils-gdb/gdb/target-delegates.c:2447 #3 0x00000000005e8c7c in target_thread_address_space (ptid=<error reading variable: Cannot access memory at address 0x1302>) at ../../binutils-gdb/gdb/target.c:2727 #4 0x000000000054eef8 in get_thread_arch_regcache (ptid=..., gdbarch=0xad51e0) at ../../binutils-gdb/gdb/regcache.c:529 #5 0x000000000054efcc in get_thread_regcache (ptid=...) at ../../binutils-gdb/gdb/regcache.c:546 #6 0x000000000054f120 in get_thread_regcache_for_ptid (ptid=...) at ../../binutils-gdb/gdb/regcache.c:560 #7 0x00000000004a2278 in aarch64_point_is_aligned (is_watchpoint=0, addr=34168, len=2) at ../../binutils-gdb/gdb/nat/aarch64-linux-hw-point.c:122 #8 0x00000000004a2e68 in aarch64_handle_breakpoint (type=hw_execute, addr=34168, len=2, is_insert=0, state=0xae8880) at ../../binutils-gdb/gdb/nat/aarch64-linux-hw-point.c:465 #9 0x000000000048edf0 in aarch64_linux_remove_hw_breakpoint (self=<optimized out>, gdbarch=<optimized out>, bp_tgt=<optimized out>) at ../../binutils-gdb/gdb/aarch64-linux-nat.c:657 #10 0x00000000005da8dc in delegate_remove_hw_breakpoint (self=<optimized out>, arg1=<optimized out>, arg2=<optimized out>) at ../../binutils-gdb/gdb/target-delegates.c:492 #11 0x0000000000536a24 in bkpt_remove_location (bl=<optimized out>) at ../../binutils-gdb/gdb/breakpoint.c:13065 #12 0x000000000053351c in remove_breakpoint_1 (bl=0xb3fe70, is=is@entry=mark_inserted) at ../../binutils-gdb/gdb/breakpoint.c:4026 #13 0x000000000053ccc0 in detach_breakpoints (ptid=...) at ../../binutils-gdb/gdb/breakpoint.c:3930 #14 0x00000000005a3ac0 in handle_inferior_event_1 (ecs=0x7ffffff048) at ../../binutils-gdb/gdb/infrun.c:5042 After the fork, GDB will physically remove the breakpoints from the child process (in frame #14), but at that time, GDB doesn't create an inferior yet for child, but inferior_ptid is set to child's ptid (in frame #13). In aarch64_point_is_aligned, we'll get the regcache of current_lwp_ptid to determine if the current process is 32-bit or 64-bit, so the inferior can't be found, and the internal error is caused. I don't find a better fix other than not checking alignment on removing breakpoint. gdb: 2015-11-27 Yao Qi <yao.qi@linaro.org> * nat/aarch64-linux-hw-point.c (aarch64_dr_state_remove_one_point): Don't assert on alignment. (aarch64_handle_breakpoint): Only check alignment when IS_INSERT is true.

There's currently no non-stop equivalent of the all-stop ^C (\003) "packet" that GDB sends when a ctrl-c is pressed while a foreground command is active. There's vCont;t, but that's defined to cause a "signal 0" stop. This fixes many tests that type ^C, when testing with extended-remote with "maint set target-non-stop on". E.g.: Continuing. talk to me baby PASS: gdb.base/interrupt.exp: process is alive a a PASS: gdb.base/interrupt.exp: child process ate our char ^C [Thread 22730.22730] #1 stopped. 0x0000003615ee6650 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:81 81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) (gdb) FAIL: gdb.base/interrupt.exp: send_gdb control C p func1 () gdb/ 2015-11-30 Pedro Alves <palves@redhat.com> * NEWS (New remote packets): Mention vCtrlC. * remote.c (PACKET_vCtrlC): New enum value. (async_remote_interrupt): Call target_interrupt instead of target_stop. (remote_interrupt_as): Remove 'ptid' parameter. (remote_interrupt_ns): New function. (remote_stop): Adjust. (remote_interrupt): If the target is in non-stop mode, try interrupting with vCtrlC. (initialize_remote): Install set remote ctrl-c packet. gdb/doc/ 2015-11-30 Pedro Alves <palves@redhat.com> * gdb.texinfo (Bootstrapping): Add "interrupting remote targets" anchor. (Packets): Document vCtrlC. gdb/gdbserver/ 2015-11-30 Pedro Alves <palves@redhat.com> * server.c (handle_v_requests): Handle vCtrlC.

Testing with the extended-remote board with "maint set target-non-stop on" shows a dprintf-non-stop.exp regression. The issue is simply that the test is expecting output that is only valid for the native target: native: [process 8676] #1 stopped. remote: [Thread 8900.8900] #1 stopped. In order to expose this without "maint set target-non-stop on", this restarts gdb with non-stop mode already enabled. gdb/testsuite/ChangeLog: 2015-11-30 Pedro Alves <palves@redhat.com> * gdb.base/dprintf-non-stop.exp: Use build_executable instead of prepare_for_testing. Start gdb with "set non-stop on" appended to GDBFLAGS. Lax expected stop output.

Hi, AddressSanitizer reports an error like this, (gdb) PASS: gdb.base/call-ar-st.exp: continue to tbreak9 print print_long_arg_list(a, b, c, d, e, f, *struct1, *struct2, *struct3, *struct4, *flags, *flags_combo, *three_char, *five_char, *int_char_combo, *d1, *d2, *d3, *f1, *f2, *f3) ================================================================= ==6236==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200008eb50 at pc 0x89e432 bp 0x7fffa3df9080 sp 0x7fffa3df9078 READ of size 5 at 0x60200008eb50 thread T0 #0 0x89e431 in memory_xfer_partial gdb/target.c:1264 #1 0x89e6c7 in target_xfer_partial gdb/target.c:1320 #2 0x89f267 in target_write_partial gdb/target.c:1595^M #3 0x8a014b in target_write_with_progress gdb/target.c:1889^M #4 0x8a0262 in target_write gdb/target.c:1914^M #5 0x89ee59 in target_write_memory gdb/target.c:1492^M #6 0x9a1c74 in write_memory gdb/corefile.c:393^M #7 0x467ea5 in aarch64_push_dummy_call gdb/aarch64-tdep.c:1388 The problem is that an instance of stack_item_t is created to adjust stack for alignment, the item.len is correct, but item.data is buf, which is wrong, because item.len can be greater than the length of buf. This patch sets item.data to NULL, and only update sp (no inferior memory writes on stack for this item). gdb: 2015-12-17 Yao Qi <yao.qi@linaro.org> * aarch64-tdep.c (struct stack_item_t): Update comments. (pass_on_stack): Set item.data to NULL. (aarch64_push_dummy_call): Call write_memory if si->data isn't NULL.

…al is ours I see a timeout in gdb.base/random-signal.exp, Continuing.^M PASS: gdb.base/random-signal.exp: continue ^CPython Exception <type 'exceptions.KeyboardInterrupt'> <type exceptions.KeyboardInterrupt'>: ^M FAIL: gdb.base/random-signal.exp: stop with control-c (timeout) it can be reproduced by running random-signal.exp with native-gdbserver in a loop, like this, and the fail will be shown in about 20 runs, $ (set -e; while true; do make check RUNTESTFLAGS="--target_board=native-gdbserver random-signal.exp"; done) In the test, the program is being single-stepped for software watchpoint, and in each internal stop, python unwinder sniffer is used, #0 pyuw_sniffer (self=<optimised out>, this_frame=<optimised out>, cache_ptr=0xd554f8) at /home/yao/SourceCode/gnu/gdb/git/gdb/python/py-unwind.c:608 #1 0x00000000006a10ae in frame_unwind_try_unwinder (this_frame=this_frame@entry=0xd554e0, this_cache=this_cache@entry=0xd554f8, unwinder=0xecd540) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame-unwind.c:107 #2 0x00000000006a143f in frame_unwind_find_by_frame (this_frame=this_frame@entry=0xd554e0, this_cache=this_cache@entry=0xd554f8) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame-unwind.c:163 #3 0x000000000069dc6b in compute_frame_id (fi=0xd554e0) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:454 #4 get_prev_frame_if_no_cycle (this_frame=this_frame@entry=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1781 #5 0x000000000069fdb9 in get_prev_frame_always_1 (this_frame=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1955 #6 get_prev_frame_always (this_frame=this_frame@entry=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1971 #7 0x00000000006a04b1 in get_prev_frame (this_frame=this_frame@entry=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:2213 when GDB goes to python extension, or other language extension, the SIGINT handler is changed, and is restored when GDB leaves extension language. GDB only stays in extension language for a very short period in this case, but if ctrl-c is pressed at that moment, python extension will handle the SIGINT, and exceptions.KeyboardInterrupt is shown. Language extension is used in GDB side rather than inferior side, so GDB should only change SIGINT handler for extension language when the terminal is ours (not inferior's). This is what this patch does. With this patch applied, I run random-signal.exp in a loop for 18 hours, and no fail is shown. gdb: 2016-01-08 Yao Qi <yao.qi@linaro.org> * extension.c: Include target.h. (set_active_ext_lang): Only call install_gdb_sigint_handler, check_quit_flag, and set_quit_flag if target_terminal_is_ours returns false. (restore_active_ext_lang): Likewise. * target.c (target_terminal_is_ours): New function. * target.h (target_terminal_is_ours): Declare.

The PR threads/19422 patchset added a new regression. Additionally below it there was already a regression if --with-guile (which is default if Guile is found) was used. racy case #1: (xgdb) PASS: gdb.gdb/selftest.exp: Set xgdb_prompt ^M Thread 1 "xgdb" received signal SIGINT, Interrupt.^M 0x00007ffff583bfdd in poll () from /lib64/libc.so.6^M (gdb) FAIL: gdb.gdb/selftest.exp: send ^C to child process signal SIGINT^M Continuing with signal SIGINT.^M ^C^M Thread 1 "xgdb" received signal SIGINT, Interrupt.^M 0x00007ffff5779da0 in sigprocmask () from /lib64/libc.so.6^M (gdb) PASS: gdb.gdb/selftest.exp: send SIGINT signal to child process backtrace^M errstring=errstring@entry=0x7e0e6c "", mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:240^M errstring=errstring@entry=0x7e0e6c "", mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:240^M (gdb) PASS: gdb.gdb/selftest.exp: backtrace through signal handler racy case #2: (xgdb) PASS: gdb.gdb/selftest.exp: Set xgdb_prompt ^M Thread 1 "xgdb" received signal SIGINT, Interrupt.^M 0x00007ffff583bfdd in poll () from /lib64/libc.so.6^M (gdb) FAIL: gdb.gdb/selftest.exp: send ^C to child process signal SIGINT^M Continuing with signal SIGINT.^M ^C^M Thread 2 "xgdb" received signal SIGINT, Interrupt.^M [Switching to Thread 0x7ffff3b7f700 (LWP 13227)]^M 0x00007ffff6b88b10 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0^M (gdb) PASS: gdb.gdb/selftest.exp: send SIGINT signal to child process backtrace^M (gdb) FAIL: gdb.gdb/selftest.exp: backtrace through signal handler Pedro Alves: Not all targets support thread names, and even those that do, not all use the program name as default thread name -- I think that's only true for GNU/Linux, actually. So I think it's best to not expect that, like: -re "(Thread .*|Program) received signal SIGINT.*$gdb_prompt $" { gdb/testsuite/ChangeLog 2016-01-22 Jan Kratochvil <jan.kratochvil@redhat.com> Fix testsuite compatibility with Guile. * gdb.gdb/selftest.exp (send ^C to child process): Accept also Thread. (thread 1): New test for backtrace through signal handler.

I see GDB crashes in dprintf.exp on aarch64-linux testing, (gdb) PASS: gdb.base/dprintf.exp: agent: break 29 set dprintf-style agent^M (gdb) PASS: gdb.base/dprintf.exp: agent: set dprintf style to agent continue^M Continuing. ASAN:SIGSEGV ================================================================= ==22475==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008 (pc 0x000000494820 sp 0x7fff389b83a0 bp 0x62d000082417 T0) #0 0x49481f in remote_add_target_side_commands /home/yao/SourceCode/gnu/gdb/git/gdb/remote.c:9190^M #1 0x49e576 in remote_add_target_side_commands /home/yao/SourceCode/gnu/gdb/git/gdb/remote.c:9174^M #2 0x49e576 in remote_insert_breakpoint /home/yao/SourceCode/gnu/gdb/git/gdb/remote.c:9240^M #3 0x5278b7 in insert_bp_location /home/yao/SourceCode/gnu/gdb/git/gdb/breakpoint.c:2734^M #4 0x52ac09 in insert_breakpoint_locations /home/yao/SourceCode/gnu/gdb/git/gdb/breakpoint.c:3159^M #5 0x52ac09 in update_global_location_list /home/yao/SourceCode/gnu/gdb/git/gdb/breakpoint.c:12686 the root cause of this problem in this case is about linespec and symtab which produces additional incorrect location and a NULL is added to bp_tgt->tcommands. I posted a patch https://sourceware.org/ml/gdb-patches/2015-12/msg00321.html to fix it in linespec (the fix causes regression), but GDB still shouldn't add NULL into bp_tgt->tcommands. The logic of build_target_command_list looks odd to me. If we get something wrong in parse_cmd_to_aexpr (it returns NULL), we shouldn't continue, instead we should set flag null_command_or_parse_error. This is what this patch does. In the meantime, we find build_target_condition_list has the same problem, so fix it too. gdb: 2016-01-28 Yao Qi <yao.qi@linaro.org> * breakpoint.c (build_target_command_list): Don't call continue if aexpr is NULL. (build_target_condition_list): Likewise.

I see the following error in testing aarch64 GDB debugging arm program. (gdb) PASS: gdb.reverse/readv-reverse.exp: set breakpoint at marker2 continue Continuing. ================================================================= ==32273==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x000000ce4c00 in thread T0 #0 0x2ba5615645c7 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x545c7)^M riscvarchive#1 0x4be8b5 in VEC_CORE_ADDR_cleanup /home/yao/SourceCode/gnu/gdb/git/gdb/common/gdb_vecs.h:34^M riscvarchive#2 0x5e6d95 in do_my_cleanups /home/yao/SourceCode/gnu/gdb/git/gdb/common/cleanups.c:154^M riscvarchive#3 0x64c99a in fetch_inferior_event /home/yao/SourceCode/gnu/gdb/git/gdb/infrun.c:3975^M riscvarchive#4 0x678437 in inferior_event_handler /home/yao/SourceCode/gnu/gdb/git/gdb/inf-loop.c:44^M riscvarchive#5 0x5078f6 in remote_async_serial_handler /home/yao/SourceCode/gnu/gdb/git/gdb/remote.c:13223^M riscvarchive#6 0x4cecfd in run_async_handler_and_reschedule /home/yao/SourceCode/gnu/gdb/git/gdb/ser-base.c:137^M riscvarchive#7 0x676864 in gdb_wait_for_event /home/yao/SourceCode/gnu/gdb/git/gdb/event-loop.c:834^M riscvarchive#8 0x676a27 in gdb_do_one_event /home/yao/SourceCode/gnu/gdb/git/gdb/event-loop.c:323^M riscvarchive#9 0x676aed in start_event_loop /home/yao/SourceCode/gnu/gdb/git/gdb/event-loop.c:347^M riscvarchive#10 0x6706d2 in captured_command_loop /home/yao/SourceCode/gnu/gdb/git/gdb/main.c:318^M riscvarchive#11 0x66db8c in catch_errors /home/yao/SourceCode/gnu/gdb/git/gdb/exceptions.c:240^M riscvarchive#12 0x6716dd in captured_main /home/yao/SourceCode/gnu/gdb/git/gdb/main.c:1157^M riscvarchive#13 0x66db8c in catch_errors /home/yao/SourceCode/gnu/gdb/git/gdb/exceptions.c:240^M riscvarchive#14 0x671b7a in gdb_main /home/yao/SourceCode/gnu/gdb/git/gdb/main.c:1165^M riscvarchive#15 0x467684 in main /home/yao/SourceCode/gnu/gdb/git/gdb/gdb.c:32^M riscvarchive#16 0x2ba563ed7ec4 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21ec4)^M riscvarchive#17 0x4676b2 (/scratch/yao/gdb/build-git/aarch64-linux-gnu/gdb/gdb+0x4676b2) looks we should discard cleanup if function arm_linux_software_single_step returns early, or create cleanup when it is needed. gdb: 2016-02-16 Yao Qi <yao.qi@linaro.org> * arm-linux-tdep.c (arm_linux_software_single_step): Assign 'old_chain' later.

I see the following GDBserver internal error in two cases, gdb/gdbserver/linux-low.c:1922: A problem internal to GDBserver has been detected. unsuspend LWP 17200, suspended=-1 1. step over a breakpoint on fork/vfork syscall instruction, 2. step over a breakpoint on clone syscall instruction and child threads hits a breakpoint, the stack backtrace is #0 internal_error (file=file@entry=0x44c4c0 "gdb/gdbserver/linux-low.c", line=line@entry=1922, fmt=fmt@entry=0x44c7d0 "unsuspend LWP %ld, suspended=%d\n") at gdb/gdbserver/../common/errors.c:51 riscvarchive#1 0x0000000000424014 in lwp_suspended_decr (lwp=<optimised out>, lwp=<optimised out>) at gdb/gdbserver/linux-low.c:1922 riscvarchive#2 0x000000000042403a in unsuspend_one_lwp (entry=<optimised out>, except=0x66e8c0) at gdb/gdbserver/linux-low.c:2885 riscvarchive#3 0x0000000000405f45 in find_inferior (list=<optimised out>, func=func@entry=0x424020 <unsuspend_one_lwp>, arg=arg@entry=0x66e8c0) at gdb/gdbserver/inferiors.c:243 riscvarchive#4 0x00000000004297de in unsuspend_all_lwps (except=0x66e8c0) at gdb/gdbserver/linux-low.c:2895 riscvarchive#5 linux_wait_1 (ptid=..., ourstatus=ourstatus@entry=0x665ec0 <last_status>, target_options=target_options@entry=0) at gdb/gdbserver/linux-low.c:3632 riscvarchive#6 0x000000000042a764 in linux_wait (ptid=..., ourstatus=0x665ec0 <last_status>, target_options=0) at gdb/gdbserver/linux-low.c:3770 riscvarchive#7 0x0000000000411163 in mywait (ptid=..., ourstatus=ourstatus@entry=0x665ec0 <last_status>, options=options@entry=0, connected_wait=connected_wait@entry=1) at gdb/gdbserver/target.c:214 riscvarchive#8 0x000000000040b1f2 in resume (actions=0x66f800, num_actions=1) at gdb/gdbserver/server.c:2757 riscvarchive#9 0x000000000040f660 in handle_v_cont (own_buf=0x66a630 "vCont;c:p45e9.-1") at gdb/gdbserver/server.c:2719 when GDBserver steps over a thread, other threads have been suspended, the "stepping" thread may create new thread, but GDBserver doesn't set it suspend count to 1. When GDBserver unsuspend threads, the child's suspend count goes to -1, and the assert is triggered. In fact, GDBserver has already taken care of suspend count of new thread when GDBserver is suspending all threads except the one GDBserver wants to step over by https://sourceware.org/ml/gdb-patches/2015-07/msg00946.html + /* If we're suspending all threads, leave this one suspended + too. */ + if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS) + { + if (debug_threads) + debug_printf ("HEW: leaving child suspended\n"); + child_lwp->suspended = 1; + } but that is not enough, because new thread is still can be spawned in the thread which is being stepped over. This patch extends the condition that GDBserver set child's suspend count to one if it is suspending threads or stepping over the thread. gdb/gdbserver: 2016-03-03 Yao Qi <yao.qi@linaro.org> PR server/19736 * linux-low.c (handle_extended_wait): Set child suspended if event_lwp->bp_reinsert isn't zero.

Fix this GDB crash: $ gdb -ex "set architecture mips:10000" Segmentation fault (core dumped) Backtrace: Program received signal SIGSEGV, Segmentation fault. 0x0000000000495b1b in mips_gdbarch_init (info=..., arches=0x0) at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/mips-tdep.c:8436 8436 if (bfd_get_flavour (info.abfd) == bfd_target_elf_flavour (top-gdb) bt #0 0x0000000000495b1b in mips_gdbarch_init (info=..., arches=0x0) at .../src/gdb/mips-tdep.c:8436 riscvarchive#1 0x00000000007348a6 in gdbarch_find_by_info (info=...) at .../src/gdb/gdbarch.c:5155 riscvarchive#2 0x000000000073563c in gdbarch_update_p (info=...) at .../src/gdb/arch-utils.c:522 riscvarchive#3 0x0000000000735585 in set_architecture (ignore_args=0x0, from_tty=1, c=0x26bc870) at .../src/gdb/arch-utils.c:496 riscvarchive#4 0x00000000005f29fd in do_sfunc (c=0x26bc870, args=0x0, from_tty=1) at .../src/gdb/cli/cli-decode.c:121 riscvarchive#5 0x00000000005fd3f3 in do_set_command (arg=0x7fffffffdcdd "mips:10000", from_tty=1, c=0x26bc870) at .../src/gdb/cli/cli-setshow.c:455 riscvarchive#6 0x0000000000836157 in execute_command (p=0x7fffffffdcdd "mips:10000", from_tty=1) at .../src/gdb/top.c:460 riscvarchive#7 0x000000000071abfb in catch_command_errors (command=0x835f6b <execute_command>, arg=0x7fffffffdccc "set architecture mips:10000", from_tty=1) at .../src/gdb/main.c:368 riscvarchive#8 0x000000000071bf4f in captured_main (data=0x7fffffffd750) at .../src/gdb/main.c:1132 riscvarchive#9 0x0000000000716737 in catch_errors (func=0x71af44 <captured_main>, func_args=0x7fffffffd750, errstring=0x106b9a1 "", mask=RETURN_MASK_ALL) at .../src/gdb/exceptions.c:240 riscvarchive#10 0x000000000071bfe6 in gdb_main (args=0x7fffffffd750) at .../src/gdb/main.c:1164 riscvarchive#11 0x000000000040a6ad in main (argc=4, argv=0x7fffffffd858) at .../src/gdb/gdb.c:32 (top-gdb) We already check whether info.abfd is NULL before all other bfd_get_flavour calls in the same function. Just this one case was missing. (This was exposed by a WIP test that tries all "set architecture ARCH" values.) gdb/ChangeLog: 2016-03-07 Pedro Alves <palves@redhat.com> * mips-tdep.c (mips_gdbarch_init): Check whether info.abfd is NULL before calling bfd_get_flavour.

Nowadays, GDB can't unwind successfully from epilogue on arm, (gdb) bt #0 0x76ff65a2 in shr1 () from /home/yao/Source/gnu/build/gdb/testsuite/gdb.reverse/shr1.sl riscvarchive#1 0x0000869e in main () at /home/yao/Source/gnu/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.reverse/solib-reverse.c:34 Backtrace stopped: previous frame inner to this frame (corrupt stack?) (gdb) disassemble shr1 Dump of assembler code for function shr1: .... 0x76ff659a <+10>: adds r7, riscvarchive#12 0x76ff659c <+12>: mov sp, r7 0x76ff659e <+14>: ldr.w r7, [sp], riscvarchive#4 0x76ff65a2 <+18>: bx lr End of assembler dump. in this case, prologue unwinder is used. It analyzes the prologue and get the offsets of saved registers to SP. However, in epilogue, the SP has been restored, prologue unwinder gets the registers from the wrong address, and even the frame id is wrong. In reverse debugging, this case (program stops at the last instruction of function) happens quite frequently due to the reverse execution. There are many test fails due to missing epilogue unwinder. This adds epilogue unwinder, but the frame cache is still get by prologue unwinder except that SP is fixed up separately, because SP is restored in epilogue. This patch fixes many fails in solib-precsave.exp, and solib-reverse.exp. gdb: 2016-03-30 Yao Qi <yao.qi@linaro.org> * arm-tdep.c: (arm_make_epilogue_frame_cache): New function. (arm_epilogue_frame_this_id): New function. (arm_epilogue_frame_prev_register): New function. (arm_epilogue_frame_sniffer): New function. (arm_epilogue_frame_unwind): New. (arm_gdbarch_init): Append unwinder arm_epilogue_frame_unwind.

immediate_quit used to be necessary back when prompt_for_continue used blocking fread, but nowadays it uses gdb_readline_wrapper, which is implemented in terms of a nested event loop, which already knows how to react to SIGINT: #0 throw_it (reason=RETURN_QUIT, error=GDB_NO_ERROR, fmt=0x9d6d7e "Quit", ap=0x7fffffffcb88) at .../src/gdb/common/common-exceptions.c:324 riscvarchive#1 0x00000000007bab5d in throw_vquit (fmt=0x9d6d7e "Quit", ap=0x7fffffffcb88) at .../src/gdb/common/common-exceptions.c:366 riscvarchive#2 0x00000000007bac9f in throw_quit (fmt=0x9d6d7e "Quit") at .../src/gdb/common/common-exceptions.c:385 riscvarchive#3 0x0000000000773a2d in quit () at .../src/gdb/utils.c:1039 riscvarchive#4 0x000000000065d81b in async_request_quit (arg=0x0) at .../src/gdb/event-top.c:893 riscvarchive#5 0x000000000065c27b in invoke_async_signal_handlers () at .../src/gdb/event-loop.c:949 riscvarchive#6 0x000000000065aeef in gdb_do_one_event () at .../src/gdb/event-loop.c:280 riscvarchive#7 0x0000000000770838 in gdb_readline_wrapper (prompt=0x7fffffffcd40 "---Type <return> to continue, or q <return> to quit---") at .../src/gdb/top.c:873 The need for the QUIT in stdin_event_handler is then exposed by the gdb.base/double-prompt-target-event-error.exp test, which has: # We're now stopped in a pagination query while handling a # target event (printing where the program stopped). Quitting # the pagination should result in only one prompt being # output. send_gdb "\003p 1\n" Without that change we'd get: Continuing. ---Type <return> to continue, or q <return> to quit---PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: continue to pagination ^CpQuit (gdb) 1 Undefined command: "1". Try "help". (gdb) PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: first prompt ERROR: Undefined command "". UNRESOLVED: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: no double prompt Vs: Continuing. ---Type <return> to continue, or q <return> to quit---PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: continue to pagination ^CQuit (gdb) p 1 $1 = 1 (gdb) PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: first prompt PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: no double prompt gdb/ChangeLog: 2016-04-12 Pedro Alves <palves@redhat.com> * event-top.c (stdin_event_handler): Call QUIT; (prompt_for_continue): Don't run with immediate_quit set.

Simon Marchi tried gdb on OpenBSD, and it immediately segfaults when running a program. Simon tracked down the problem to x86_dr_low.get_status being nullptr at this point: (lldb) print x86_dr_low.get_status (unsigned long (*)()) $0 = 0x0000000000000000 (lldb) bt * thread riscvarchive#1, stop reason = step over * frame #0: 0x0000033b64b764aa gdb`x86_dr_stopped_data_address(state=0x0000033d7162a310, addr_p=0x00007f7ffffc5688) at x86-dregs.c:645:12 frame riscvarchive#1: 0x0000033b64b766de gdb`x86_dr_stopped_by_watchpoint(state=0x0000033d7162a310) at x86-dregs.c:687:10 frame riscvarchive#2: 0x0000033b64ea5f72 gdb`x86_stopped_by_watchpoint() at x86-nat.c:206:10 frame riscvarchive#3: 0x0000033b64637fbb gdb`x86_nat_target<obsd_nat_target>::stopped_by_watchpoint(this=0x0000033b65252820) at x86-nat.h:100:12 frame riscvarchive#4: 0x0000033b64d3ff11 gdb`target_stopped_by_watchpoint() at target.c:468:46 frame riscvarchive#5: 0x0000033b6469b001 gdb`watchpoints_triggered(ws=0x00007f7ffffc61c8) at breakpoint.c:4790:32 frame riscvarchive#6: 0x0000033b64a8bb8b gdb`handle_signal_stop(ecs=0x00007f7ffffc61a0) at infrun.c:6072:29 frame riscvarchive#7: 0x0000033b64a7e3a7 gdb`handle_inferior_event(ecs=0x00007f7ffffc61a0) at infrun.c:5694:7 frame riscvarchive#8: 0x0000033b64a7c1a0 gdb`fetch_inferior_event() at infrun.c:4090:5 frame riscvarchive#9: 0x0000033b64a51921 gdb`inferior_event_handler(event_type=INF_REG_EVENT) at inf-loop.c:41:7 frame riscvarchive#10: 0x0000033b64a827c9 gdb`infrun_async_inferior_event_handler(data=0x0000000000000000) at infrun.c:9384:3 frame riscvarchive#11: 0x0000033b6465bd4f gdb`check_async_event_handlers() at async-event.c:335:4 frame riscvarchive#12: 0x0000033b65070917 gdb`gdb_do_one_event() at event-loop.cc:216:10 frame riscvarchive#13: 0x0000033b64af0db1 gdb`start_event_loop() at main.c:421:13 frame riscvarchive#14: 0x0000033b64aefe9a gdb`captured_command_loop() at main.c:481:3 frame riscvarchive#15: 0x0000033b64aed5c2 gdb`captured_main(data=0x00007f7ffffc6470) at main.c:1353:4 frame riscvarchive#16: 0x0000033b64aed4f2 gdb`gdb_main(args=0x00007f7ffffc6470) at main.c:1368:7 frame riscvarchive#17: 0x0000033b6459d787 gdb`main(argc=5, argv=0x00007f7ffffc6518) at gdb.c:32:10 frame riscvarchive#18: 0x0000033b6459d521 gdb`___start + 321 On BSDs, get_status is set in _initialize_x86_bsd_nat, but only if HAVE_PT_GETDBREGS is defined. PT_GETDBREGS doesn't exist on OpenBSD, so get_status (and the other fields of x86_dr_low) are left as nullptr. OpenBSD doesn't support getting or setting the x86 debug registers, so fix by omitting debug register support entirely on OpenBSD: - Change x86bsd_nat_target to only inherit from x86_nat_target if PT_GETDBREGS is supported. - Don't include x86-nat.o and nat/x86-dregs.o for OpenBSD/amd64. They were already omitted for OpenBSD/i386.

In PR28004 the following warning / Internal error is reported: ... $ gdb -q -batch \ -iex "set sysroot $(pwd -P)/repro" \ ./repro/gdb \ ./repro/core \ -ex bt ... Program terminated with signal SIGABRT, Aborted. #0 0x00007ff8fe8e5d22 in raise () from repro/usr/lib/libc.so.6 [Current thread is 1 (LWP 1762498)] riscvarchive#1 0x00007ff8fe8cf862 in abort () from repro/usr/lib/libc.so.6 warning: (Internal error: pc 0x7ff8feb2c21d in read in psymtab, \ but not in symtab.) warning: (Internal error: pc 0x7ff8feb2c218 in read in psymtab, \ but not in symtab.) ... riscvarchive#2 0x00007ff8feb2c21e in __gnu_debug::_Error_formatter::_M_error() const \ [clone .cold] (warning: (Internal error: pc 0x7ff8feb2c21d in read in \ psymtab, but not in symtab.) ) from repro/usr/lib/libstdc++.so.6 ... The warning is about the following: - in find_pc_sect_compunit_symtab we try to find the address (0x7ff8feb2c218 / 0x7ff8feb2c21d) in the symtabs. - that fails, so we try again in the partial symtabs. - we find a matching partial symtab - however, the partial symtab has a full symtab, so we should have found a matching symtab in the first step. The addresses are: ... (gdb) info sym 0x7ff8feb2c218 __gnu_debug::_Error_formatter::_M_error() const [clone .cold] in \ section .text of repro/usr/lib/libstdc++.so.6 (gdb) info sym 0x7ff8feb2c21d __gnu_debug::_Error_formatter::_M_error() const [clone .cold] + 5 in \ section .text of repro/usr/lib/libstdc++.so.6 ... which correspond to unrelocated addresses 0x9c218 and 0x9c21d: ... $ nm -C repro/usr/lib/libstdc++.so.6.0.29 | grep 000000000009c218 000000000009c218 t __gnu_debug::_Error_formatter::_M_error() const \ [clone .cold] ... which belong to function __gnu_debug::_Error_formatter::_M_error() in /build/gcc/src/gcc/libstdc++-v3/src/c++11/debug.cc. The partial symtab that is found for the addresses is instead the one for /build/gcc/src/gcc/libstdc++-v3/src/c++98/bitmap_allocator.cc, which is incorrect. This happens as follows. The bitmap_allocator.cc CU has DW_AT_ranges at .debug_rnglist offset 0x4b50: ... 00004b50 0000000000000000 0000000000000056 00004b5a 00000000000a4790 00000000000a479c 00004b64 00000000000a47a0 00000000000a47ac ... When reading the first range 0x0..0x56, it doesn't trigger the "start address of zero" complaint here: ... /* A not-uncommon case of bad debug info. Don't pollute the addrmap with bad data. */ if (range_beginning + baseaddr == 0 && !per_objfile->per_bfd->has_section_at_zero) { complaint (_(".debug_rnglists entry has start address of zero" " [in module %s]"), objfile_name (objfile)); continue; } ... because baseaddr != 0, which seems incorrect given that when loading the shared library individually in gdb (and consequently baseaddr == 0), we do see the complaint. Consequently, we run into this case in dwarf2_get_pc_bounds: ... if (low == 0 && !per_objfile->per_bfd->has_section_at_zero) return PC_BOUNDS_INVALID; ... which then results in this code in process_psymtab_comp_unit_reader being called with cu_bounds_kind == PC_BOUNDS_INVALID, which sets the set_addrmap argument to 1: ... scan_partial_symbols (first_die, &lowpc, &highpc, cu_bounds_kind <= PC_BOUNDS_INVALID, cu); ... and consequently, the CU addrmap gets build using address info from the functions. During that process, addrmap_set_empty is called with a range that includes 0x9c218 and 0x9c21d: ... (gdb) p /x start $7 = 0x9989c (gdb) p /x end_inclusive $8 = 0xb200d ... but it's called for a function at DIE 0x54153 with DW_AT_ranges at 0x40ae: ... 000040ae 00000000000b1ee0 00000000000b200e 000040b9 000000000009989c 00000000000998c4 000040c3 <End of list> ... and neither range includes 0x9c218 and 0x9c21d. This is caused by this code in partial_die_info::read: ... if (dwarf2_ranges_read (ranges_offset, &lowpc, &highpc, cu, nullptr, tag)) has_pc_info = 1; ... which pretends that the function is located at addresses 0x9989c..0xb200d, which is indeed not the case. This patch fixes the first problem encountered: fix the "start address of zero" complaint warning by removing the baseaddr part from the condition. Same for dwarf2_ranges_process. The effect is that: - the complaint is triggered, and - the warning / Internal error is no longer triggered. This does not fix the observed problem in partial_die_info::read, which is filed as PR28200. Tested on x86_64-linux. Co-Authored-By: Simon Marchi <simon.marchi@polymtl.ca> gdb/ChangeLog: 2021-07-29 Simon Marchi <simon.marchi@polymtl.ca> Tom de Vries <tdevries@suse.de> PR symtab/28004 * gdb/dwarf2/read.c (dwarf2_rnglists_process, dwarf2_ranges_process): Fix zero address complaint. * gdb/testsuite/gdb.dwarf2/dw2-zero-range-shlib.c: New test. * gdb/testsuite/gdb.dwarf2/dw2-zero-range.c: New test. * gdb/testsuite/gdb.dwarf2/dw2-zero-range.exp: New file.

While working on the testsuite, I ended up noticing that GDB fails to produce a full backtrace from a thread waiting in pthread_join. When selecting the waiting thread and using the 'bt' command, the following result can be observed: (gdb) bt #0 0x0000003ff7fccd20 in __futex_abstimed_wait_common64 () from /lib/riscv64-linux-gnu/libpthread.so.0 riscvarchive#1 0x0000003ff7fc43da in __pthread_clockjoin_ex () from /lib/riscv64-linux-gnu/libpthread.so.0 Backtrace stopped: frame did not save the PC On my platform, I do not have debug symbols for glibc, so I need to rely on prologue analysis in order to unwind stack. Here is what the function prologue looks like: (gdb) disassemble __pthread_clockjoin_ex Dump of assembler code for function __pthread_clockjoin_ex: 0x0000003ff7fc42de <+0>: addi sp,sp,-144 0x0000003ff7fc42e0 <+2>: sd s5,88(sp) 0x0000003ff7fc42e2 <+4>: auipc s5,0xd 0x0000003ff7fc42e6 <+8>: ld s5,-2(s5) # 0x3ff7fd12e0 0x0000003ff7fc42ea <+12>: ld a5,0(s5) 0x0000003ff7fc42ee <+16>: sd ra,136(sp) 0x0000003ff7fc42f0 <+18>: sd s0,128(sp) 0x0000003ff7fc42f2 <+20>: sd s1,120(sp) 0x0000003ff7fc42f4 <+22>: sd s2,112(sp) 0x0000003ff7fc42f6 <+24>: sd s3,104(sp) 0x0000003ff7fc42f8 <+26>: sd s4,96(sp) 0x0000003ff7fc42fa <+28>: sd s6,80(sp) 0x0000003ff7fc42fc <+30>: sd s7,72(sp) 0x0000003ff7fc42fe <+32>: sd s8,64(sp) 0x0000003ff7fc4300 <+34>: sd s9,56(sp) 0x0000003ff7fc4302 <+36>: sd a5,40(sp) As far as prologue analysis is concerned, the most interesting part is done at address 0x0000003ff7fc42ee (<+16>): 'sd ra,136(sp)'. This stores the RA (return address) register on the stack, which is the information we are looking for in order to identify the caller. In the current implementation of the prologue scanner, GDB stops when hitting 0x0000003ff7fc42e6 (<+8>) because it does not know what to do with the 'ld' instruction. GDB thinks it reached the end of the prologue but have not yet reached the important part, which explain GDB's inability to unwind past this point. The section of the prologue starting at <+4> until <+12> is used to load the stack canary[1], which will then be placed on the stack at <+36> at the end of the prologue. In order to have the prologue properly handled, this commit proposes to add support for the ld instruction in the RISC-V prologue scanner. I guess riscv32 would use lw in such situation so this patch also adds support for this instruction. With this patch applied, gdb is now able to unwind past pthread_join: (gdb) bt #0 0x0000003ff7fccd20 in __futex_abstimed_wait_common64 () from /lib/riscv64-linux-gnu/libpthread.so.0 riscvarchive#1 0x0000003ff7fc43da in __pthread_clockjoin_ex () from /lib/riscv64-linux-gnu/libpthread.so.0 riscvarchive#2 0x0000002aaaaaa88e in bar() () riscvarchive#3 0x0000002aaaaaa8c4 in foo() () riscvarchive#4 0x0000002aaaaaa8da in main () I have had a look to see if I could reproduce this easily, but in my simple testcases using '-fstack-protector-all', the canary is loaded after the RA register is saved. I do not have a reliable way of generating a prologue similar to the problematic one so I forged one instead. The testsuite have been run on riscv64 ubuntu 21.01 with no regression observed. [1] https://en.wikipedia.org/wiki/Buffer_overflow_protection#Canaries

The original reproducer for PR28030 required use of a specific compiler version - gcc-c++-11.1.1-3.fc34 is mentioned in the PR, though it seems probable that other gcc versions might also be able to reproduce the bug as well. This commit introduces a test case which, using the DWARF assembler, provides a reproducer which is independent of the compiler version. (Well, it'll work with whatever compilers the DWARF assembler works with.) To the best of my knowledge, it's also the first test case which uses the DWARF assembler to provide debug info for a shared object. That being the case, I provided more than the usual commentary which should allow this case to be used as a template when a combo shared library / DWARF assembler test case is required in the future. I provide some details regarding the bug in a comment near the beginning of locexpr-dml.exp. This problem was difficult to reproduce; I found myself constantly referring to the backtrace while trying to figure out what (else) I might be missing while trying to create a reproducer. Below is a partial backtrace which I include for posterity. #0 internal_error ( file=0xc50110 "/ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/gdbtypes.c", line=5575, fmt=0xc520c0 "Unexpected type field location kind: %d") at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdbsupport/errors.cc:51 riscvarchive#1 0x00000000006ef0c5 in copy_type_recursive (objfile=0x1635930, type=0x274c260, copied_types=0x30bb290) at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/gdbtypes.c:5575 riscvarchive#2 0x00000000006ef382 in copy_type_recursive (objfile=0x1635930, type=0x274ca10, copied_types=0x30bb290) at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/gdbtypes.c:5602 riscvarchive#3 0x0000000000a7409a in preserve_one_value (value=0x24269f0, objfile=0x1635930, copied_types=0x30bb290) at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/value.c:2529 riscvarchive#4 0x000000000072012a in gdbscm_preserve_values ( extlang=0xc55720 <extension_language_guile>, objfile=0x1635930, copied_types=0x30bb290) at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/guile/scm-value.c:94 riscvarchive#5 0x00000000006a3f82 in preserve_ext_lang_values (objfile=0x1635930, copied_types=0x30bb290) at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/extension.c:568 riscvarchive#6 0x0000000000a7428d in preserve_values (objfile=0x1635930) at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/value.c:2579 riscvarchive#7 0x000000000082d514 in objfile::~objfile (this=0x1635930, __in_chrg=<optimized out>) at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/objfiles.c:549 riscvarchive#8 0x0000000000831cc8 in std::_Sp_counted_ptr<objfile*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x1654580) at /usr/include/c++/11/bits/shared_ptr_base.h:348 riscvarchive#9 0x00000000004e6617 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x1654580) at /usr/include/c++/11/bits/shared_ptr_base.h:168 riscvarchive#10 0x00000000004e1d2f in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x190bb88, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:705 riscvarchive#11 0x000000000082feee in std::__shared_ptr<objfile, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x190bb80, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:1154 riscvarchive#12 0x000000000082ff0a in std::shared_ptr<objfile>::~shared_ptr ( this=0x190bb80, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr.h:122 riscvarchive#13 0x000000000085ed7e in __gnu_cxx::new_allocator<std::_List_node<std::shared_ptr<objfile> > >::destroy<std::shared_ptr<objfile> > (this=0x114bc00, __p=0x190bb80) at /usr/include/c++/11/ext/new_allocator.h:168 riscvarchive#14 0x000000000085e88d in std::allocator_traits<std::allocator<std::_List_node<std::shared_ptr<objfile> > > >::destroy<std::shared_ptr<objfile> > (__a=..., __p=0x190bb80) at /usr/include/c++/11/bits/alloc_traits.h:531 riscvarchive#15 0x000000000085e50c in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::_M_erase (this=0x114bc00, __position= std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x1635930}) at /usr/include/c++/11/bits/stl_list.h:1925 riscvarchive#16 0x000000000085df0e in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::erase (this=0x114bc00, __position= std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x1635930}) at /usr/include/c++/11/bits/list.tcc:158 riscvarchive#17 0x000000000085c748 in program_space::remove_objfile (this=0x114bbc0, objfile=0x1635930) at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/progspace.c:210 riscvarchive#18 0x000000000082d3ae in objfile::unlink (this=0x1635930) at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/objfiles.c:487 riscvarchive#19 0x000000000082e68c in objfile_purge_solibs () at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/objfiles.c:875 riscvarchive#20 0x000000000092dd37 in no_shared_libraries (ignored=0x0, from_tty=1) at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/solib.c:1236 riscvarchive#21 0x00000000009a37fe in target_pre_inferior (from_tty=1) at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/target.c:2496 riscvarchive#22 0x00000000007454d6 in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/infcmd.c:437 I'll note a few points regarding this backtrace: Frame riscvarchive#1 is where the internal error occurs. It's caused by an unhandled case for FIELD_LOC_KIND_DWARF_BLOCK. The fix for this bug adds support for this case. Frame riscvarchive#22 - it's a partial backtrace - shows that GDB is attempting to (re)run the program. You can see the exact command sequence that was used for reproducing this problem in the PR (at https://sourceware.org/bugzilla/show_bug.cgi?id=28030), but in a nutshell, after starting the program and advancing to the appropriate source line, GDB was asked to step into libstdc++; a "finish" command was issued, returning a value. The fact that a value was returned is very important. GDB was then used to step back into libstdc++. A breakpoint was set on a source line in the library after which a "run" command was issued. Frame riscvarchive#19 shows a call to objfile_purge_solibs. It's aptly named. Frame riscvarchive#7 is a call to the destructor for one of the objfile solibs; it turned out to be the one for libstdc++. Frames riscvarchive#6 thru riscvarchive#3 show various value preservation frames. If you look at preserve_values() in gdb/value.c, the value history is preserved first, followed by internal variables, followed by values for the extension languages (python and guile).

…es.exp When running test-case gdb.base/break-probes.exp on ubuntu 18.04.5, we have: ... (gdb) bt^M #0 0x00007ffff7dd6e12 in ?? () from /lib64/ld-linux-x86-64.so.2^M riscvarchive#1 0x00007ffff7dedf50 in ?? () from /lib64/ld-linux-x86-64.so.2^M riscvarchive#2 0x00007ffff7dd5128 in ?? () from /lib64/ld-linux-x86-64.so.2^M riscvarchive#3 0x00007ffff7dd4098 in ?? () from /lib64/ld-linux-x86-64.so.2^M riscvarchive#4 0x0000000000000001 in ?? ()^M riscvarchive#5 0x00007fffffffdaac in ?? ()^M riscvarchive#6 0x0000000000000000 in ?? ()^M (gdb) FAIL: gdb.base/break-probes.exp: ensure using probes ... The test-case intends to emit an UNTESTED in this case, but fails to do so because it tries to do it in a regexp clause in a gdb_test_multiple, which doesn't trigger. Instead, a default clause triggers which produces the FAIL. Also the use of UNTESTED is not appropriate, and we should use UNSUPPORTED instead. Fix this by silencing the FAIL, and emitting an UNSUPPORTED after the gdb_test_multiple: ... if { ! $using_probes } { + unsupported "probes not present on this system" return -1 } ... Tested on x86_64-linux.

When running test-case gdb.base/break-probes.exp on ubuntu 18.04.5, we have: ... (gdb) run^M Starting program: break-probes^M Stopped due to shared library event (no libraries added or removed)^M (gdb) bt^M #0 0x00007ffff7dd6e12 in ?? () from /lib64/ld-linux-x86-64.so.2^M riscvarchive#1 0x00007ffff7dedf50 in ?? () from /lib64/ld-linux-x86-64.so.2^M riscvarchive#2 0x00007ffff7dd5128 in ?? () from /lib64/ld-linux-x86-64.so.2^M riscvarchive#3 0x00007ffff7dd4098 in ?? () from /lib64/ld-linux-x86-64.so.2^M riscvarchive#4 0x0000000000000001 in ?? ()^M riscvarchive#5 0x00007fffffffdaac in ?? ()^M riscvarchive#6 0x0000000000000000 in ?? ()^M (gdb) UNSUPPORTED: gdb.base/break-probes.exp: probes not present on this system ... Using the backtrace, the test-case tries to establish that we're stopped in dl_main, which is used as proof that we're using probes. However, the backtrace only shows an address, because: - the dynamic linker contains no minimal symbols and no debug info, and - gdb is build without --with-separate-debug-dir so it can't find the corresponding .debug file, which does contain the mimimal symbols and debug info. Fix this by instead printing the pc and grepping for the value in the info probes output: ... (gdb) p /x $pc^M $1 = 0x7ffff7dd6e12^M (gdb) info probes^M Type Provider Name Where Object ^M ... stap rtld init_start 0x00007ffff7dd6e12 /lib64/ld-linux-x86-64.so.2 ^M ... (gdb) ... Tested on x86_64-linux.

When running test-case gdb.base/break-interp.exp on ubuntu 18.04.5, we have: ... (gdb) bt^M #0 0x00007eff7ad5ae12 in ?? () from break-interp-LDprelinkNOdebugNO^M riscvarchive#1 0x00007eff7ad71f50 in ?? () from break-interp-LDprelinkNOdebugNO^M riscvarchive#2 0x00007eff7ad59128 in ?? () from break-interp-LDprelinkNOdebugNO^M riscvarchive#3 0x00007eff7ad58098 in ?? () from break-interp-LDprelinkNOdebugNO^M riscvarchive#4 0x0000000000000002 in ?? ()^M riscvarchive#5 0x00007fff505d7a32 in ?? ()^M riscvarchive#6 0x00007fff505d7a94 in ?? ()^M riscvarchive#7 0x0000000000000000 in ?? ()^M (gdb) FAIL: gdb.base/break-interp.exp: ldprelink=NO: ldsepdebug=NO: \ first backtrace: dl bt ... Using the backtrace, the test-case tries to establish that we're stopped in dl_main. However, the backtrace only shows an address, because: - the dynamic linker contains no minimal symbols and no debug info, and - gdb is build without --with-separate-debug-dir so it can't find the corresponding .debug file, which does contain the mimimal symbols and debug info. As in "[gdb/testsuite] Improve probe detection in gdb.base/break-probes.exp", fix this by doing info probes and grepping for the address. Tested on x86_64-linux.

I build gdb without xml support using --without-expat, and ran into: ... (gdb) target remote | vgdb --wait=2 --max-invoke-ms=2500 --pid=22032^M Remote debugging using | vgdb --wait=2 --max-invoke-ms=2500 --pid=22032^M relaying data between gdb and process 22032^M warning: Can not parse XML target description; XML support was disabled at \ compile time^M ... (gdb) PASS: gdb.base/valgrind-infcall.exp: continue riscvarchive#1 p gdb_test_infcall ()^M Remote 'g' packet reply is too long (expected 560 bytes, got 800 bytes): ...^M (gdb) FAIL: gdb.base/valgrind-infcall.exp: p gdb_test_infcall () ... After googling the error message with context valgrind gdbserver, I found indications that the Remote 'g' packet reply error is due to missing xml support. And here ( https://www.valgrind.org/docs/manual/manual-core-adv.html ) I found: ... GDB version needed for ARM and PPC32/64. You must use a GDB version which is able to read XML target description sent by a gdbserver. This is the standard setup if GDB was configured and built with the "expat" library. If your GDB was not configured with XML support, it will report an error message when using the "target" command. Debugging will not work because GDB will then not be able to fetch the registers from the Valgrind gdbserver. ... So I guess I'm running into the same problem for x86_64. Fix this by skipping all gdb.base/valgrind-*.exp tests if xml support is not available. Although only the gdb.base/valgrind-infcall*.exp produce fails, the Remote 'g' packet reply error occurs in all tests, so it seems prudent to disable them all. Tested on x86_64-linux.

The gdb.multi/multi-term-settings.exp testcase sometimes fails like so: Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.multi/multi-term-settings.exp ... FAIL: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: stop with control-c (SIGINT) It's easier to reproduce if you stress the machine at the same time, like e.g.: $ stress -c 24 Looking at gdb.log, we see: (gdb) attach 60422 Attaching to program: build/gdb/testsuite/outputs/gdb.multi/multi-term-settings/multi-term-settings, process 60422 [New Thread 60422.60422] Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... Reading symbols from /lib64/ld-linux-x86-64.so.2... (No debugging symbols found in /lib64/ld-linux-x86-64.so.2) 0x00007f2fc2485334 in __GI___clock_nanosleep (clock_id=<optimized out>, clock_id@entry <mailto:clock_id@entry>=0, flags=flags@entry <mailto:flags@entry>=0, req=req@entry <mailto:req@entry>=0x7ffe23126940, rem=rem@entry <mailto:rem@entry>=0x0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. (gdb) PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: inf2: attach set schedule-multiple on (gdb) PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: set schedule-multiple on info inferiors Num Description Connection Executable 1 process 60404 1 (extended-remote localhost:2349) build/gdb/testsuite/outputs/gdb.multi/multi-term-settings/multi-term-settings * 2 process 60422 1 (extended-remote localhost:2349) build/gdb/testsuite/outputs/gdb.multi/multi-term-settings/multi-term-settings (gdb) PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: info inferiors pid=60422, count=46 pid=60422, count=47 pid=60422, count=48 pid=60422, count=49 pid=60422, count=50 pid=60422, count=51 pid=60422, count=52 pid=60422, count=53 pid=60422, count=54 pid=60422, count=55 pid=60422, count=56 pid=60422, count=57 pid=60422, count=58 pid=60422, count=59 pid=60422, count=60 pid=60422, count=61 pid=60422, count=62 pid=60422, count=63 pid=60422, count=64 pid=60422, count=65 pid=60422, count=66 pid=60422, count=67 pid=60422, count=68 pid=60422, count=69 pid=60404, count=54 pid=60404, count=55 pid=60404, count=56 pid=60404, count=57 pid=60404, count=58 PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: continue Quit (gdb) FAIL: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: stop with control-c (SIGINT) If you look at the testcase's sources, you'll see that the intention is to resumes the program with "continue", wait to see a few of those "pid=..., count=..." lines, and then interrupt the program with Ctrl-C. But somehow, that resulted in GDB printing "Quit", instead of the Ctrl-C stopping the program with SIGINT. Here's what is happening: riscvarchive#1 - those "pid=..., count=..." lines we see above weren't actually output by the inferior after it has been continued (see riscvarchive#1). Note that "inf1_how" and "inf2_how" are "attach". What happened is that those "pid=..., count=..." lines were output by the inferiors _before_ they were attached to. We see them at that point instead of earlier, because that's where the testcase reads from the inferiors' spawn_ids. riscvarchive#2 - The testcase mistakenly thinks those "pid=..., count=..." lines happened after the continue was processed by GDB, meaning it has waited enough, and so sends the Ctrl-C. GDB hasn't yet passed the terminal to the inferior, so the Ctrl-C results in that Quit. The fix here is twofold: riscvarchive#1 - flush inferior output right after attaching riscvarchive#2 - consume the "Continuing" printed by "continue", indicating the inferior has the terminal. This is the same as done throughout the testsuite to handle this exact problem of sending Ctrl-C too soon. gdb/testsuite/ChangeLog: yyyy-mm-dd Pedro Alves <pedro@palves.net <mailto:pedro@palves.net>> * gdb.multi/multi-term-settings.exp (create_inferior): Flush inferior output. (coretest): Use $gdb_test_name. After issuing "continue", wait for "Continuing". Change-Id: Iba7671dfe1eee6b98d29cfdb05a1b9aa2f9defb9

On openSUSE Tumbleweed with glibc-debuginfo installed I get: ... (gdb) PASS: gdb.threads/linux-dp.exp: continue to breakpoint: thread 5's print where^M #0 print_philosopher (n=3, left=33 '!', right=33 '!') at linux-dp.c:105^M #1 0x0000000000401628 in philosopher (data=0x40537c) at linux-dp.c:148^M #2 0x00007ffff7d56b37 in start_thread (arg=<optimized out>) \ at pthread_create.c:435^M #3 0x00007ffff7ddb640 in clone3 () \ at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81^M (gdb) PASS: gdb.threads/linux-dp.exp: first thread-specific breakpoint hit ... while without debuginfo installed I get instead: ... (gdb) PASS: gdb.threads/linux-dp.exp: continue to breakpoint: thread 5's print where^M #0 print_philosopher (n=3, left=33 '!', right=33 '!') at linux-dp.c:105^M #1 0x0000000000401628 in philosopher (data=0x40537c) at linux-dp.c:148^M #2 0x00007ffff7d56b37 in start_thread () from /lib64/libc.so.6^M #3 0x00007ffff7ddb640 in clone3 () from /lib64/libc.so.6^M (gdb) FAIL: gdb.threads/linux-dp.exp: first thread-specific breakpoint hit ... The problem is that the regexp used: ... "$from .*libpthread\|at pthread_create\|in pthread_create$" ... expects the 'from' part to match libpthread, but in glibc 2.34 libpthread has been merged into libc. Fix this by updating the regexp. Tested on x86_64-linux.

Currently for a binary compiled normally (without -fsanitize=address) but with LD_PRELOAD of ASAN one gets: $ ASAN_OPTIONS=detect_leaks=0:alloc_dealloc_mismatch=1:abort_on_error=1:fast_unwind_on_malloc=0 LD_PRELOAD=/usr/lib64/libasan.so.6 gdb ================================================================= ==1909567==ERROR: AddressSanitizer: alloc-dealloc-mismatch (malloc vs operator delete []) on 0x602000001570 #0 0x7f1c98e5efa7 in operator delete[](void*) (/usr/lib64/libasan.so.6+0xb0fa7) ... 0x602000001570 is located 0 bytes inside of 2-byte region [0x602000001570,0x602000001572) allocated by thread T0 here: #0 0x7f1c98e5cd1f in __interceptor_malloc (/usr/lib64/libasan.so.6+0xaed1f) #1 0x557ee4a42e81 in operator new(unsigned long) (/usr/libexec/gdb+0x74ce81) SUMMARY: AddressSanitizer: alloc-dealloc-mismatch (/usr/lib64/libasan.so.6+0xb0fa7) in operator delete[](void*) ==1909567==HINT: if you don't care about these errors you may set ASAN_OPTIONS=alloc_dealloc_mismatch=0 ==1909567==ABORTING Despite the code called properly operator new[] and operator delete[]. But GDB's new-op.cc provides its own operator new[] which gets translated into malloc() (which gets recogized as operatore new(size_t)) but as it does not translate also operators delete[] Address Sanitizer gets confused. The question is how many variants of the delete operator need to be provided. There could be 14 operators new but there are only 4, GDB uses 3 of them. There could be 16 operators delete but there are only 6, GDB uses 2 of them. It depends on libraries and compiler which of the operators will get used. Currently being used: U operator new[](unsigned long) U operator new(unsigned long) U operator new(unsigned long, std::nothrow_t const&) U operator delete[](void*) U operator delete(void*, unsigned long) Tested on x86_64-linux.

This commit fixes Bug 28308, titled "Strange interactions with dprintf and break/commands": Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28308 Since creating that bug report, I've found a somewhat simpler way of reproducing the problem. I've encapsulated it into the GDB test case which I've created along with this bug fix. The name of the new test is gdb.base/dprintf-execution-x-script.exp, I'll demonstrate the problem using this test case, though for brevity, I've placed all relevant files in the same directory and have renamed the files to all start with 'dp-bug' instead of 'dprintf-execution-x-script'. The script file, named dp-bug.gdb, consists of the following commands: dprintf increment, "dprintf in increment(), vi=%d\n", vi break inc_vi commands continue end run Note that the final command in this script is 'run'. When 'run' is instead issued interactively, the bug does not occur. So, let's look at the interactive case first in order to see the correct/expected output: $ gdb -q -x dp-bug.gdb dp-bug ... eliding buggy output which I'll discuss later ... (gdb) run Starting program: /mesquite2/sourceware-git/f34-master/bld/gdb/tmp/dp-bug vi=0 dprintf in increment(), vi=0 Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26 26 in dprintf-execution-x-script.c vi=1 dprintf in increment(), vi=1 Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26 26 in dprintf-execution-x-script.c vi=2 dprintf in increment(), vi=2 Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26 26 in dprintf-execution-x-script.c vi=3 [Inferior 1 (process 1539210) exited normally] In this run, in which 'run' was issued from the gdb prompt (instead of at the end of the script), there are three dprintf messages along with three 'Breakpoint 2' messages. This is the correct output. Now let's look at the output that I snipped above; this is the output when 'run' is issued from the script loaded via GDB's -x switch: $ gdb -q -x dp-bug.gdb dp-bug Reading symbols from dp-bug... Dprintf 1 at 0x40116e: file dprintf-execution-x-script.c, line 38. Breakpoint 2 at 0x40113a: file dprintf-execution-x-script.c, line 26. vi=0 dprintf in increment(), vi=0 Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26 26 dprintf-execution-x-script.c: No such file or directory. vi=1 Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26 26 in dprintf-execution-x-script.c vi=2 Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26 26 in dprintf-execution-x-script.c vi=3 [Inferior 1 (process 1539175) exited normally] In the output shown above, only the first dprintf message is printed. The 2nd and 3rd dprintf messages are missing! However, all three 'Breakpoint 2...' messages are still printed. Why does this happen? bpstat_do_actions_1() in gdb/breakpoint.c contains the following comment and code near the start of the function: /* Avoid endless recursion if a `source' command is contained in bs->commands. */ if (executing_breakpoint_commands) return 0; scoped_restore save_executing = make_scoped_restore (&executing_breakpoint_commands, 1); Also, as described by this comment prior to the 'async' field in 'struct ui' in top.h, the main UI starts off in sync mode when processing command line arguments: /* True if the UI is in async mode, false if in sync mode. If in sync mode, a synchronous execution command (e.g, "next") does not return until the command is finished. If in async mode, then running a synchronous command returns right after resuming the target. Waiting for the command's completion is later done on the top event loop. For the main UI, this starts out disabled, until all the explicit command line arguments (e.g., `gdb -ex "start" -ex "next"') are processed. */ This combination of things, the state of the static global 'executing_breakpoint_commands' plus the state of the async field in the main UI causes this behavior. This is a backtrace after hitting the dprintf breakpoint for the second time when doing 'run' from the script file, i.e. non-interactively: Thread 1 "gdb" hit Breakpoint 3, bpstat_do_actions_1 (bsp=0x7fffffffc2b8) at /ironwood1/sourceware-git/f34-master/bld/../../worktree-master/gdb/breakpoint.c:4431 4431 if (executing_breakpoint_commands) #0 bpstat_do_actions_1 (bsp=0x7fffffffc2b8) at gdb/breakpoint.c:4431 #1 0x00000000004d8bc6 in dprintf_after_condition_true (bs=0x1538090) at gdb/breakpoint.c:13048 #2 0x00000000004c5caa in bpstat_stop_status (aspace=0x116dbc0, bp_addr=0x40116e, thread=0x137f450, ws=0x7fffffffc718, stop_chain=0x1538090) at gdb/breakpoint.c:5498 #3 0x0000000000768d98 in handle_signal_stop (ecs=0x7fffffffc6f0) at gdb/infrun.c:6172 #4 0x00000000007678d3 in handle_inferior_event (ecs=0x7fffffffc6f0) at gdb/infrun.c:5662 #5 0x0000000000763cd5 in fetch_inferior_event () at gdb/infrun.c:4060 #6 0x0000000000746d7d in inferior_event_handler (event_type=INF_REG_EVENT) at gdb/inf-loop.c:41 #7 0x00000000007a702f in handle_target_event (error=0, client_data=0x0) at gdb/linux-nat.c:4207 #8 0x0000000000b8cd6e in gdb_wait_for_event (block=block@entry=0) at gdbsupport/event-loop.cc:701 #9 0x0000000000b8d032 in gdb_wait_for_event (block=0) at gdbsupport/event-loop.cc:597 #10 gdb_do_one_event () at gdbsupport/event-loop.cc:212 #11 0x00000000009d19b6 in wait_sync_command_done () at gdb/top.c:528 #12 0x00000000009d1a3f in maybe_wait_sync_command_done (was_sync=0) at gdb/top.c:545 #13 0x00000000009d2033 in execute_command (p=0x7fffffffcb18 "", from_tty=0) at gdb/top.c:676 #14 0x0000000000560d5b in execute_control_command_1 (cmd=0x13b9bb0, from_tty=0) at gdb/cli/cli-script.c:547 #15 0x000000000056134a in execute_control_command (cmd=0x13b9bb0, from_tty=0) at gdb/cli/cli-script.c:717 #16 0x00000000004c3bbe in bpstat_do_actions_1 (bsp=0x137f530) at gdb/breakpoint.c:4469 #17 0x00000000004c3d40 in bpstat_do_actions () at gdb/breakpoint.c:4533 #18 0x00000000006a473a in command_handler (command=0x1399ad0 "run") at gdb/event-top.c:624 #19 0x00000000009d182e in read_command_file (stream=0x113e540) at gdb/top.c:443 #20 0x0000000000563697 in script_from_file (stream=0x113e540, file=0x13bb0b0 "dp-bug.gdb") at gdb/cli/cli-script.c:1642 #21 0x00000000006abd63 in source_gdb_script (extlang=0xc44e80 <extension_language_gdb>, stream=0x113e540, file=0x13bb0b0 "dp-bug.gdb") at gdb/extension.c:188 #22 0x0000000000544400 in source_script_from_stream (stream=0x113e540, file=0x7fffffffd91a "dp-bug.gdb", file_to_open=0x13bb0b0 "dp-bug.gdb") at gdb/cli/cli-cmds.c:692 #23 0x0000000000544557 in source_script_with_search (file=0x7fffffffd91a "dp-bug.gdb", from_tty=1, search_path=0) at gdb/cli/cli-cmds.c:750 #24 0x00000000005445cf in source_script (file=0x7fffffffd91a "dp-bug.gdb", from_tty=1) at gdb/cli/cli-cmds.c:759 #25 0x00000000007cf6d9 in catch_command_errors (command=0x5445aa <source_script(char const*, int)>, arg=0x7fffffffd91a "dp-bug.gdb", from_tty=1, do_bp_actions=false) at gdb/main.c:523 #26 0x00000000007cf85d in execute_cmdargs (cmdarg_vec=0x7fffffffd1b0, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND, ret=0x7fffffffd18c) at gdb/main.c:615 #27 0x00000000007d0c8e in captured_main_1 (context=0x7fffffffd3f0) at gdb/main.c:1322 #28 0x00000000007d0eba in captured_main (data=0x7fffffffd3f0) at gdb/main.c:1343 #29 0x00000000007d0f25 in gdb_main (args=0x7fffffffd3f0) at gdb/main.c:1368 #30 0x00000000004186dd in main (argc=5, argv=0x7fffffffd508) at gdb/gdb.c:32 There are two frames for bpstat_do_actions_1(), one at frame #16 and the other at frame #0. The one at frame #16 is processing the actions for Breakpoint 2, which is a 'continue'. The one at frame #0 is attempting to process the dprintf breakpoint action. However, at this point, the value of 'executing_breakpoint_commands' is 1, forcing an early return, i.e. prior to executing the command(s) associated with the dprintf breakpoint. For the sake of comparison, this is what the stack looks like when hitting the dprintf breakpoint for the second time when issuing the 'run' command from the GDB prompt. Thread 1 "gdb" hit Breakpoint 3, bpstat_do_actions_1 (bsp=0x7fffffffccd8) at /ironwood1/sourceware-git/f34-master/bld/../../worktree-master/gdb/breakpoint.c:4431 4431 if (executing_breakpoint_commands) #0 bpstat_do_actions_1 (bsp=0x7fffffffccd8) at gdb/breakpoint.c:4431 #1 0x00000000004d8bc6 in dprintf_after_condition_true (bs=0x16b0290) at gdb/breakpoint.c:13048 #2 0x00000000004c5caa in bpstat_stop_status (aspace=0x116dbc0, bp_addr=0x40116e, thread=0x13f0e60, ws=0x7fffffffd138, stop_chain=0x16b0290) at gdb/breakpoint.c:5498 #3 0x0000000000768d98 in handle_signal_stop (ecs=0x7fffffffd110) at gdb/infrun.c:6172 #4 0x00000000007678d3 in handle_inferior_event (ecs=0x7fffffffd110) at gdb/infrun.c:5662 #5 0x0000000000763cd5 in fetch_inferior_event () at gdb/infrun.c:4060 #6 0x0000000000746d7d in inferior_event_handler (event_type=INF_REG_EVENT) at gdb/inf-loop.c:41 #7 0x00000000007a702f in handle_target_event (error=0, client_data=0x0) at gdb/linux-nat.c:4207 #8 0x0000000000b8cd6e in gdb_wait_for_event (block=block@entry=0) at gdbsupport/event-loop.cc:701 #9 0x0000000000b8d032 in gdb_wait_for_event (block=0) at gdbsupport/event-loop.cc:597 #10 gdb_do_one_event () at gdbsupport/event-loop.cc:212 #11 0x00000000007cf512 in start_event_loop () at gdb/main.c:421 #12 0x00000000007cf631 in captured_command_loop () at gdb/main.c:481 #13 0x00000000007d0ebf in captured_main (data=0x7fffffffd3f0) at gdb/main.c:1353 #14 0x00000000007d0f25 in gdb_main (args=0x7fffffffd3f0) at gdb/main.c:1368 #15 0x00000000004186dd in main (argc=5, argv=0x7fffffffd508) at gdb/gdb.c:32 This relatively short backtrace is due to the current UI's async field being set to 1. Yet another thing to be aware of regarding this problem is the difference in the way that commands associated to dprintf breakpoints versus regular breakpoints are handled. While they both use a command list associated with the breakpoint, regular breakpoints will place the commands to be run on the bpstat chain constructed in bp_stop_status(). These commands are run later on. For dprintf breakpoints, commands are run via the 'after_condition_true' function pointer directly from bpstat_stop_status(). (The 'commands' field in the bpstat is cleared in dprintf_after_condition_true(). This prevents the dprintf commands from being run again later on when other commands on the bpstat chain are processed.) Another thing that I noticed is that dprintf breakpoints are the only type of breakpoint which use 'after_condition_true'. This suggests that one possible way of fixing this problem, that of making dprintf breakpoints work more like regular breakpoints, probably won't work. (I must admit, however, that my understanding of this code isn't complete enough to say why. I'll trust that whoever implemented it had a good reason for doing it this way.) The comment referenced earlier regarding 'executing_breakpoint_commands' states that the reason for checking this variable is to avoid potential endless recursion when a 'source' command appears in bs->commands. We know that a dprintf command is constrained to either 1) execution of a GDB printf command, 2) an inferior function call of a printf-like function, or 3) execution of an agent-printf command. Therefore, infinite recursion due to a 'source' command cannot happen when executing commands upon hitting a dprintf breakpoint. I chose to fix this problem by having dprintf_after_condition_true() directly call execute_control_commands(). This means that it no longer attempts to go through bpstat_do_actions_1() avoiding the infinite recursion check for potential 'source' commands on the command chain. I think it simplifies this code a little bit too, a definite bonus. Summary: * breakpoint.c (dprintf_after_condition_true): Don't call bpstat_do_actions_1(). Call execute_control_commands() instead.

On Fedora 35, $ readelf -d /usr/bin/npc caused readelf to run out of stack since load_separate_debug_info returned the input main file as the separate debug info: (gdb) bt #0 load_separate_debug_info ( main_filename=main_filename@entry=0x510f50 "/export/home/hjl/.cache/debuginfod_client/dcc33c51c49e7dafc178fdb5cf8bd8946f965295/debuginfo", xlink=xlink@entry=0x4e5180 <debug_displays+4480>, parse_func=parse_func@entry=0x431550 <parse_gnu_debuglink>, check_func=check_func@entry=0x432ae0 <check_gnu_debuglink>, func_data=func_data@entry=0x7fffffffdb60, file=file@entry=0x51d430) at /export/gnu/import/git/sources/binutils-gdb/binutils/dwarf.c:11057 #1 0x000000000043328d in check_for_and_load_links (file=0x51d430, filename=0x510f50 "/export/home/hjl/.cache/debuginfod_client/dcc33c51c49e7dafc178fdb5cf8bd8946f965295/debuginfo") at /export/gnu/import/git/sources/binutils-gdb/binutils/dwarf.c:11381 #2 0x00000000004332ae in check_for_and_load_links (file=0x51b070, filename=0x518dd0 "/export/home/hjl/.cache/debuginfod_client/dcc33c51c49e7dafc178fdb5cf8bd8946f965295/debuginfo") Return NULL if the separate debug info is the same as the input main file to avoid infinite recursion. PR binutils/28679 * dwarf.c (load_separate_debug_info): Don't return the input main file.

Fedora Rawhide is now using gcc-12.0. As part of updating to the gcc-12.0 package set, Rawhide is also now using a version of libgcc_s which lacks a .data section. This causes gdb to fail in the following fashion while debugging a program (such as gdb) which uses libgcc_s: (top-gdb) run Starting program: rawhide-master/bld/gdb/gdb ... objfiles.h:467: internal-error: sect_index_data not initialized A problem internal to GDB has been detected, further debugging may prove unreliable. ... I snipped the backtrace from the above output. Instead, here's a portion of a backtrace obtained using GDB's backtrace command. (Obviously, in order to obtain it, I used a GDB which has been patched with this commit.) #0 internal_error ( file=0xc6a508 "gdb/objfiles.h", line=467, fmt=0xc6a4e8 "sect_index_data not initialized") at gdbsupport/errors.cc:51 #1 0x00000000005f9651 in objfile::data_section_offset (this=0x4fa48f0) at gdb/objfiles.h:467 #2 0x000000000097c5f8 in relocate_address (address=0x17244, objfile=0x4fa48f0) at gdb/stap-probe.c:1333 #3 0x000000000097c630 in stap_probe::get_relocated_address (this=0xa1a17a0, objfile=0x4fa48f0) at gdb/stap-probe.c:1341 #4 0x00000000004d7025 in create_exception_master_breakpoint_probe ( objfile=0x4fa48f0) at gdb/breakpoint.c:3505 #5 0x00000000004d7426 in create_exception_master_breakpoint () at gdb/breakpoint.c:3575 #6 0x00000000004efcc1 in breakpoint_re_set () at gdb/breakpoint.c:13407 #7 0x0000000000956998 in solib_add (pattern=0x0, from_tty=0, readsyms=1) at gdb/solib.c:1001 #8 0x00000000009576a8 in handle_solib_event () at gdb/solib.c:1269 ... The function 'relocate_address' in gdb/stap-probe.c attempts to do its "relocation" by using objfile->data_section_offset(). That method, data_section_offset() is defined as follows in objfiles.h: CORE_ADDR data_section_offset () const { return section_offsets[SECT_OFF_DATA (this)]; } The internal error occurs when the SECT_OFF_DATA macro finds that the 'sect_index_data' field is -1: #define SECT_OFF_DATA(objfile) \ ((objfile->sect_index_data == -1) \ ? (internal_error (__FILE__, __LINE__, \ _("sect_index_data not initialized")), -1) \ : objfile->sect_index_data) relocate_address() is obtaining the section offset in order to compute a relocated address. For some ABIs, such as the System V ABI, the section offsets will all be the same. So for those ABIs, it doesn't matter which offset is used. However, other ABIs, such as the FDPIC ABI, will have different offsets for the various sections. Thus, for those ABIs, it is vital that this and other relocation code use the correct offset. In stap_probe::get_relocated_address, the address to which to add the offset (thus forming the relocated address) is obtained via this->get_address (); get_address is a getter for m_address in probe.h. It's documented/defined as follows (also in probe.h): /* The address where the probe is inserted, relative to SECT_OFF_TEXT. */ CORE_ADDR m_address; (Thanks to Tom Tromey for this observation.) So, based on this, the current use of data_section_offset / SECT_OFF_DATA is wrong. This relocation code should have been using text_section_offset / SECT_OFF_TEXT all along. That being the case, I've adjusted the stap-probe.c relocation code accordingly. Searching the sources turned up one other use of data_section_offset, in gdb/dtrace-probe.c, so I've updated that code as well. The same reasoning presented above applies to this case too. Summary: * gdb/dtrace-probe.c (dtrace_probe::get_relocated_address): Use method text_section_offset instead of data_section_offset. * gdb/stap-probe.c (relocate_address): Likewise.

g++ 11.1.0 has a bug where it will emit a negative DW_AT_data_member_location in some cases: $ cat test.cpp #include <memory> int main() { std::unique_ptr<int> ptr; } $ g++ -g test.cpp $ llvm-dwarfdump -F a.out ... 0x00000964: DW_TAG_member DW_AT_name [DW_FORM_strp] ("_M_head_impl") DW_AT_decl_file [DW_FORM_data1] ("/usr/include/c++/11.1.0/tuple") DW_AT_decl_line [DW_FORM_data1] (125) DW_AT_decl_column [DW_FORM_data1] (0x27) DW_AT_type [DW_FORM_ref4] (0x0000067a "default_delete<int>") DW_AT_data_member_location [DW_FORM_sdata] (-1) ... This leads to a GDB crash (when built with ASan, otherwise probably garbage results), since it tries to read just before (to the left, in ASan speak) of the value's buffer: ==888645==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000c52af at pc 0x7f711b239f4b bp 0x7fff356bd470 sp 0x7fff356bcc18 READ of size 1 at 0x6020000c52af thread T0 #0 0x7f711b239f4a in __interceptor_memcpy /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 #1 0x555c4977efa1 in value_contents_copy_raw /home/simark/src/binutils-gdb/gdb/value.c:1347 #2 0x555c497909cd in value_primitive_field(value*, long, int, type*) /home/simark/src/binutils-gdb/gdb/value.c:3126 #3 0x555c478f2eaa in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:333 #4 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 #5 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 #6 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 #7 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 #8 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 #9 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 #10 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383 #11 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438 #12 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632 #13 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048 #14 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151 #15 0x555c478f2fcb in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:335 #16 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 #17 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 #18 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383 #19 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438 #20 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632 #21 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048 #22 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151 #23 0x555c478f2fcb in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:335 #24 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383 #25 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438 #26 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632 #27 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048 #28 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151 #29 0x555c4760f04c in c_value_print(value*, ui_file*, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:587 #30 0x555c483ff954 in language_defn::value_print(value*, ui_file*, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:614 #31 0x555c49759f61 in value_print(value*, ui_file*, value_print_options const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1189 #32 0x555c48950f70 in print_formatted /home/simark/src/binutils-gdb/gdb/printcmd.c:337 #33 0x555c48958eda in print_value(value*, value_print_options const&) /home/simark/src/binutils-gdb/gdb/printcmd.c:1258 #34 0x555c48959891 in print_command_1 /home/simark/src/binutils-gdb/gdb/printcmd.c:1367 #35 0x555c4895a3df in print_command /home/simark/src/binutils-gdb/gdb/printcmd.c:1458 #36 0x555c4767f974 in do_simple_func /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:97 #37 0x555c47692e25 in cmd_func(cmd_list_element*, char const*, int) /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:2475 #38 0x555c4936107e in execute_command(char const*, int) /home/simark/src/binutils-gdb/gdb/top.c:670 #39 0x555c485f1bff in catch_command_errors /home/simark/src/binutils-gdb/gdb/main.c:523 #40 0x555c485f249c in execute_cmdargs /home/simark/src/binutils-gdb/gdb/main.c:618 #41 0x555c485f6677 in captured_main_1 /home/simark/src/binutils-gdb/gdb/main.c:1317 #42 0x555c485f6c83 in captured_main /home/simark/src/binutils-gdb/gdb/main.c:1338 #43 0x555c485f6d65 in gdb_main(captured_main_args*) /home/simark/src/binutils-gdb/gdb/main.c:1363 #44 0x555c46e41ba8 in main /home/simark/src/binutils-gdb/gdb/gdb.c:32 #45 0x7f71198bcb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24) #46 0x555c46e4197d in _start (/home/simark/build/binutils-gdb-one-target/gdb/gdb+0x77f197d) 0x6020000c52af is located 1 bytes to the left of 8-byte region [0x6020000c52b0,0x6020000c52b8) allocated by thread T0 here: #0 0x7f711b2b7459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154 #1 0x555c470acdc9 in xcalloc /home/simark/src/binutils-gdb/gdb/alloc.c:100 #2 0x555c49b775cd in xzalloc(unsigned long) /home/simark/src/binutils-gdb/gdbsupport/common-utils.cc:29 #3 0x555c4977bdeb in allocate_value_contents /home/simark/src/binutils-gdb/gdb/value.c:1029 #4 0x555c4977be25 in allocate_value(type*) /home/simark/src/binutils-gdb/gdb/value.c:1040 #5 0x555c4979030d in value_primitive_field(value*, long, int, type*) /home/simark/src/binutils-gdb/gdb/value.c:3092 #6 0x555c478f6280 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:501 #7 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 #8 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 #9 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 #10 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 #11 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 #12 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383 #13 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438 #14 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632 #15 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048 #16 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151 #17 0x555c478f2fcb in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:335 #18 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 #19 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 #20 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383 #21 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438 #22 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632 #23 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048 #24 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151 #25 0x555c478f2fcb in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:335 #26 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383 #27 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438 #28 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632 #29 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048 Since there are some binaries with this in the wild, I think it would be useful for GDB to work around this. I did the obvious simple thing, if the DW_AT_data_member_location's value is -1, replace it with 0. I added a producer check to only apply this fixup for GCC 11. The idea is that if some other compiler ever uses a DW_AT_data_member_location value of -1 by mistake, we don't know (before analyzing the bug at least) if they did mean 0 or some other value. So I wouldn't want to apply the fixup in that case. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28063 Change-Id: Ieef3459b0b9bbce8bdad838ba83b4b64e7269d42

Starting with commit commit 1da5d0e Date: Tue Jan 4 08:02:24 2022 -0700 Change how Python architecture and language are handled we see a failure in gdb.threads/killed-outside.exp: ... Executing on target: kill -9 16622 (timeout = 300) builtin_spawn -ignore SIGHUP kill -9 16622 continue Continuing. Couldn't get registers: No such process. (gdb) [Thread 0x7ffff77c2700 (LWP 16626) exited] Program terminated with signal SIGKILL, Killed. The program no longer exists. FAIL: gdb.threads/killed-outside.exp: prompt after first continue (timeout) This is not a regression but a failure due to a change in GDB's output. Prior to the aforementioned commit, GDB has been printing the "Couldn't get registers: No such process." message twice. The second one came from (top-gdb) bt #0 amd64_linux_nat_target::fetch_registers (this=0x555557f31440 <the_amd64_linux_nat_target>, regcache=0x555558805ce0, regnum=16) at /gdb-up/gdb/amd64-linux-nat.c:225 #1 0x000055555640ac5f in target_ops::fetch_registers (this=0x555557d636d0 <the_thread_db_target>, arg0=0x555558805ce0, arg1=16) at /gdb-up/gdb/target-delegates.c:502 #2 0x000055555641a647 in target_fetch_registers (regcache=0x555558805ce0, regno=16) at /gdb-up/gdb/target.c:3945 #3 0x0000555556278e68 in regcache::raw_update (this=0x555558805ce0, regnum=16) at /gdb-up/gdb/regcache.c:587 #4 0x0000555556278f14 in readable_regcache::raw_read (this=0x555558805ce0, regnum=16, buf=0x555558881950 "") at /gdb-up/gdb/regcache.c:601 #5 0x00005555562792aa in readable_regcache::cooked_read (this=0x555558805ce0, regnum=16, buf=0x555558881950 "") at /gdb-up/gdb/regcache.c:690 #6 0x000055555627965e in readable_regcache::cooked_read_value (this=0x555558805ce0, regnum=16) at /gdb-up/gdb/regcache.c:748 #7 0x0000555556352a37 in sentinel_frame_prev_register (this_frame=0x555558181090, this_prologue_cache=0x5555581810a8, regnum=16) at /gdb-up/gdb/sentinel-frame.c:53 #8 0x0000555555fa4773 in frame_unwind_register_value (next_frame=0x555558181090, regnum=16) at /gdb-up/gdb/frame.c:1235 #9 0x0000555555fa420d in frame_register_unwind (next_frame=0x555558181090, regnum=16, optimizedp=0x7fffffffd570, unavailablep=0x7fffffffd574, lvalp=0x7fffffffd57c, addrp=0x7fffffffd580, realnump=0x7fffffffd578, bufferp=0x7fffffffd5b0 "") at /gdb-up/gdb/frame.c:1143 #10 0x0000555555fa455f in frame_unwind_register (next_frame=0x555558181090, regnum=16, buf=0x7fffffffd5b0 "") at /gdb-up/gdb/frame.c:1199 #11 0x00005555560178e2 in i386_unwind_pc (gdbarch=0x5555587c4a70, next_frame=0x555558181090) at /gdb-up/gdb/i386-tdep.c:1972 #12 0x0000555555cd2b9d in gdbarch_unwind_pc (gdbarch=0x5555587c4a70, next_frame=0x555558181090) at /gdb-up/gdb/gdbarch.c:3007 #13 0x0000555555fa3a5b in frame_unwind_pc (this_frame=0x555558181090) at /gdb-up/gdb/frame.c:948 #14 0x0000555555fa7621 in get_frame_pc (frame=0x555558181160) at /gdb-up/gdb/frame.c:2572 #15 0x0000555555fa7706 in get_frame_address_in_block (this_frame=0x555558181160) at /gdb-up/gdb/frame.c:2602 #16 0x0000555555fa77d0 in get_frame_address_in_block_if_available (this_frame=0x555558181160, pc=0x7fffffffd708) at /gdb-up/gdb/frame.c:2665 #17 0x0000555555fa5f8d in select_frame (fi=0x555558181160) at /gdb-up/gdb/frame.c:1890 #18 0x0000555555fa5bab in lookup_selected_frame (a_frame_id=..., frame_level=-1) at /gdb-up/gdb/frame.c:1720 #19 0x0000555555fa5e47 in get_selected_frame (message=0x0) at /gdb-up/gdb/frame.c:1810 #20 0x0000555555cc9c6e in get_current_arch () at /gdb-up/gdb/arch-utils.c:848 #21 0x000055555625b239 in gdbpy_before_prompt_hook (extlang=0x555557451f20 <extension_language_python>, current_gdb_prompt=0x555557f4d890 <top_prompt+16> "(gdb) ") at /gdb-up/gdb/python/python.c:1063 #22 0x0000555555f7cfbb in ext_lang_before_prompt (current_gdb_prompt=0x555557f4d890 <top_prompt+16> "(gdb) ") at /gdb-up/gdb/extension.c:922 #23 0x0000555555f7d442 in std::_Function_handler<void (char const*), void (*)(char const*)>::_M_invoke(std::_Any_data const&, char const*&&) (__functor=..., __args#0=@0x7fffffffd900: 0x555557f4d890 <top_prompt+16> "(gdb) ") at /usr/include/c++/7/bits/std_function.h:316 #24 0x0000555555f752dd in std::function<void (char const*)>::operator()(char const*) const (this=0x55555817d838, __args#0=0x555557f4d890 <top_prompt+16> "(gdb) ") at /usr/include/c++/7/bits/std_function.h:706 #25 0x0000555555f75100 in gdb::observers::observable<char const*>::notify (this=0x555557f49060 <gdb::observers::before_prompt>, args#0=0x555557f4d890 <top_prompt+16> "(gdb) ") at /gdb-up/gdb/../gdbsupport/observable.h:150 #26 0x0000555555f736dc in top_level_prompt () at /gdb-up/gdb/event-top.c:444 #27 0x0000555555f735ba in display_gdb_prompt (new_prompt=0x0) at /gdb-up/gdb/event-top.c:411 #28 0x00005555564611a7 in tui_on_command_error () at /gdb-up/gdb/tui/tui-interp.c:205 #29 0x0000555555c2173f in std::_Function_handler<void (), void (*)()>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/7/bits/std_function.h:316 #30 0x0000555555e10c20 in std::function<void ()>::operator()() const (this=0x5555580f9028) at /usr/include/c++/7/bits/std_function.h:706 #31 0x0000555555e10973 in gdb::observers::observable<>::notify() const (this=0x555557f48d20 <gdb::observers::command_error>) at /gdb-up/gdb/../gdbsupport/observable.h:150 #32 0x00005555560e9b3f in start_event_loop () at /gdb-up/gdb/main.c:438 #33 0x00005555560e9bcc in captured_command_loop () at /gdb-up/gdb/main.c:481 #34 0x00005555560eb616 in captured_main (data=0x7fffffffddd0) at /gdb-up/gdb/main.c:1348 #35 0x00005555560eb67c in gdb_main (args=0x7fffffffddd0) at /gdb-up/gdb/main.c:1363 #36 0x0000555555c1b6b3 in main (argc=12, argv=0x7fffffffded8) at /gdb-up/gdb/gdb.c:32 Commit 1da5d0e eliminated the call to 'get_current_arch' in 'gdbpy_before_prompt_hook'. Hence, the second instance of "Couldn't get registers: No such process." does not appear anymore. Fix the failure by updating the regular expression in the test.

…ync." Commit 14b3360 ("do_target_wait_1: Clear TARGET_WNOHANG if the target isn't async.") broke some multi-target tests, such as gdb.multi/multi-target-info-inferiors.exp. The symptom is that execution just hangs at some point. What happens is: 1. One remote inferior is started, and now sits stopped at a breakpoint. It is not "async" at this point (but it "can async"). 2. We run a native inferior, the event loop gets woken up by the native target's fd. 3. In do_target_wait, we randomly choose an inferior to call target_wait on first, it happens to be the remote inferior. 4. Because the target is currently not "async", we clear TARGET_WNOHANG, resulting in synchronous wait. We therefore block here: #0 0x00007fe9540dbb4d in select () from /usr/lib/libc.so.6 #1 0x000055fc7e821da7 in gdb_select (n=15, readfds=0x7ffdb77c1fb0, writefds=0x0, exceptfds=0x7ffdb77c2050, timeout=0x7ffdb77c1f90) at /home/simark/src/binutils-gdb/gdb/posix-hdep.c:31 #2 0x000055fc7ddef905 in interruptible_select (n=15, readfds=0x7ffdb77c1fb0, writefds=0x0, exceptfds=0x7ffdb77c2050, timeout=0x7ffdb77c1f90) at /home/simark/src/binutils-gdb/gdb/event-top.c:1134 riscvarchive#3 0x000055fc7eda58e4 in ser_base_wait_for (scb=0x6250002e4100, timeout=1) at /home/simark/src/binutils-gdb/gdb/ser-base.c:240 riscvarchive#4 0x000055fc7eda66ba in do_ser_base_readchar (scb=0x6250002e4100, timeout=-1) at /home/simark/src/binutils-gdb/gdb/ser-base.c:365 riscvarchive#5 0x000055fc7eda6ff6 in generic_readchar (scb=0x6250002e4100, timeout=-1, do_readchar=0x55fc7eda663c <do_ser_base_readchar(serial*, int)>) at /home/simark/src/binutils-gdb/gdb/ser-base.c:444 riscvarchive#6 0x000055fc7eda718a in ser_base_readchar (scb=0x6250002e4100, timeout=-1) at /home/simark/src/binutils-gdb/gdb/ser-base.c:471 riscvarchive#7 0x000055fc7edb1ecd in serial_readchar (scb=0x6250002e4100, timeout=-1) at /home/simark/src/binutils-gdb/gdb/serial.c:393 riscvarchive#8 0x000055fc7ec48b8f in remote_target::readchar (this=0x617000038780, timeout=-1) at /home/simark/src/binutils-gdb/gdb/remote.c:9446 riscvarchive#9 0x000055fc7ec4da82 in remote_target::getpkt_or_notif_sane_1 (this=0x617000038780, buf=0x6170000387a8, forever=1, expecting_notif=1, is_notif=0x7ffdb77c24f0) at /home/simark/src/binutils-gdb/gdb/remote.c:9928 riscvarchive#10 0x000055fc7ec4f045 in remote_target::getpkt_or_notif_sane (this=0x617000038780, buf=0x6170000387a8, forever=1, is_notif=0x7ffdb77c24f0) at /home/simark/src/binutils-gdb/gdb/remote.c:10037 riscvarchive#11 0x000055fc7ec354d4 in remote_target::wait_ns (this=0x617000038780, ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/remote.c:8147 riscvarchive#12 0x000055fc7ec38aa1 in remote_target::wait (this=0x617000038780, ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/remote.c:8337 riscvarchive#13 0x000055fc7f1409ce in target_wait (ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/target.c:2612 riscvarchive#14 0x000055fc7e19da98 in do_target_wait_1 (inf=0x617000038080, ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/infrun.c:3636 riscvarchive#15 0x000055fc7e19e26b in operator() (__closure=0x7ffdb77c2f90, inf=0x617000038080) at /home/simark/src/binutils-gdb/gdb/infrun.c:3697 riscvarchive#16 0x000055fc7e19f0c4 in do_target_wait (ecs=0x7ffdb77c33a0, options=...) at /home/simark/src/binutils-gdb/gdb/infrun.c:3716 riscvarchive#17 0x000055fc7e1a31f7 in fetch_inferior_event () at /home/simark/src/binutils-gdb/gdb/infrun.c:4061 Before the aforementioned commit, we would not have cleared TARGET_WNOHANG, the remote target's wait would have returned nothing, and we would have consumed the native target's event. After applying this revert, the testsuite state looks as good as before for me on Ubuntu 20.04 amd64. Change-Id: Ic17a1642935cabcc16c25cb6899d52e12c2f5c3f

The current zombie leader detection code in linux-nat.c has a race -- if a multi-threaded inferior exits just before check_zombie_leaders finds that the leader is now zombie via checking /proc/PID/status, check_zombie_leaders deletes the leader, assuming we won't get an event for that exit (which we won't in some scenarios, but not in this one). That might seem mostly harmless, but it has some downsides: - later when we continue pulling events out of the kernel, we will collect the exit event of the non-leader threads, and once we see the last lwp in our list exit, we return _that_ lwp's exit code as whole-process exit code to infrun, instead of the leader's exit code. - this can cause a hang in stop_all_threads in infrun.c. Say there are 2 threads in the process. stop_all_threads stops each of those threads, and then waits for two stop or exit events, one for each thread. If the whole process exits, and check_zombie_leaders hits the false-positive case, linux-nat.c will only return one event to GDB (the whole-process exit returned when we see the last thread, the non-leader thread, exit), making stop_all_threads hang forever waiting for a second event that will never come. However, in this false-positive scenario, where the whole process is exiting, as opposed to just the leader (with pthread_exit(), for example), we _will_ get an exit event shortly for the leader, after we collect the exit event of all the other non-leader threads. Or put another way, we _always_ get an event for the leader after we see it become zombie. I tried a number of approaches to fix this: riscvarchive#1 - My first thought to address the race was to make GDB always report the whole-process exit status for the leader thread, not for whatever is the last lwp in the list. We _always_ get a final exit (or exec) event for the leader, and when the race triggers, we're not collecting it. riscvarchive#2 - My second thought was to try to plug the race in the first place. I thought of making GDB call waitpid/WNOHANG for all non-leader threads immediately when the zombie leader is detected, assuming there would be an exit event pending for each of them waiting to be collected. Turns out that that doesn't work -- you can see the leader become zombie _before_ the kernel kills all other threads. Waitpid in that small time window returns 0, indicating no-event. Thankfully we hit that race window all the time, which avoided trading one race for another. Looking at the non-leader thread's status in /proc doesn't help either, the threads are still in running state for a bit, for the same reason. riscvarchive#3 - My next attempt, which seemed promising, was to synchronously stop and wait for the stop for each of the non-leader threads. For the scenario in question, this will collect all the exit statuses of the non-leader threads. Then, if we are left with only the zombie leader in the lwp list, it means we either have a normal while-process exit or an exec, in which case we should not delete the leader. If _only_ the leader exited, like in gdb.threads/leader-exit.exp, then after pausing threads, we will still have at least one live non-leader thread in the list, and so we delete the leader lwp. I got this working and polished, and it was only after staring at the kernel code to convince myself that this would really work (and it would, for the scenario I considered), that I realized I had failed to account for one scenario -- if any non-leader thread is _already_ stopped when some thread triggers a group exit, like e.g., if you have some threads stopped and then resume just one thread with scheduler-locking or non-stop, and that thread exits the process. I also played with PTRACE_EVENT_EXIT, see if it would help in any way to plug the race, and I couldn't find a way that it would result in any practical difference compared to looking at /proc/PID/status, with respect to having a race. So I concluded that there's no way to plug the race, we just have to deal with it. Which means, going back to approach riscvarchive#1. That is the approach taken by this patch. Change-Id: I6309fd4727da8c67951f9cea557724b77e8ee979

…pported When parsing the ptid out of a reply package, if the multi-process extensions are not supported, use current_inferior's pid as the pid of the reported thread, instead of inferior_ptid. This is needed because the inferior_ptid may be null_ptid although a legit context exists, due to a prior context switch via switch_to_inferior_no_thread. Below is a scenario that illustrates what could go wrong. First, setup a multi-target scenario. This is needed, because in a multi-target setting, the inferior_ptid is cleared out before waiting on targets. The second inferior below sits on top of a remote target. Multi-process packets are disabled. $ # First, spawn a process with PID 26253 to attach to later. $ gdb-up a.out Reading symbols from a.out... (gdb) maint set target-non-stop on (gdb) set remote multiprocess-feature-packet off (gdb) start ... (gdb) add-inferior -no-connection [New inferior 2] Added inferior 2 (gdb) inferior 2 [Switching to inferior 2 [<null>] (<noexec>)] (gdb) target extended-remote | gdbserver --multi - Remote debugging using | gdbserver --multi - Remote debugging using stdio (gdb) attach 26253 Attaching to Remote target Attached; pid = 26253 [New Thread 26253] [New inferior 3] Reading /tmp/a.out from remote target... ... [New Thread 26253] ... Reading /usr/local/lib/debug/....debug from remote target... >>> GDB seems to hang here. After attaching to a process and reading some library files, GDB seems to hang. One interesting thing to note is that [New Thread 26253] appears twice. We also see [New inferior 3] Running the same scenario with "debug infrun on" reveals more details. ... (gdb) attach 26253 [infrun] scoped_disable_commit_resumed: reason=attaching Attaching to Remote target Attached; pid = 26253 [New Thread 26253] [infrun] infrun_async: enable=1 [infrun] attach_command: immediately after attach: [infrun] attach_command: thread 26253.26253.0, executing = 1, resumed = 0, state = RUNNING [infrun] clear_proceed_status_thread: 26253.26253.0 [infrun] reset: reason=attaching [infrun] maybe_set_commit_resumed_all_targets: not requesting commit-resumed for target native, no resumed threads [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target extended-remote [infrun] fetch_inferior_event: enter [infrun] scoped_disable_commit_resumed: reason=handling event [infrun] do_target_wait: Found 2 inferiors, starting at riscvarchive#1 [infrun] random_pending_event_thread: None found. [infrun] print_target_wait_results: target_wait (-1.0.0 [Thread 0], status) = [infrun] print_target_wait_results: 26253.26253.0 [Thread 26253], [infrun] print_target_wait_results: status->kind = STOPPED, sig = GDB_SIGNAL_0 [infrun] handle_inferior_event: status->kind = STOPPED, sig = GDB_SIGNAL_0 [infrun] start_step_over: enter [infrun] start_step_over: stealing global queue of threads to step, length = 0 [infrun] operator(): step-over queue now empty [infrun] start_step_over: exit [infrun] context_switch: Switching context from 0.0.0 to 26253.26253.0 [infrun] handle_signal_stop: stop_pc=0x7f849d8cf151 [infrun] stop_waiting: stop_waiting [infrun] stop_all_threads: starting [infrun] stop_all_threads: pass=0, iterations=0 [New inferior 3] Reading /tmp/a.out from remote target... warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead. Reading /tmp/a.out from remote target... Reading symbols from target:/tmp/a.out... [New Thread 26253] [infrun] stop_all_threads: 4723.4723.0 not executing [infrun] stop_all_threads: 26253.26253.0 not executing [infrun] stop_all_threads: 42000.26253.0 executing, need stop [infrun] print_target_wait_results: target_wait (-1.0.0 [Thread 0], status) = [infrun] print_target_wait_results: -1.0.0 [Thread 0], [infrun] print_target_wait_results: status->kind = IGNORE [infrun] print_target_wait_results: target_wait (-1.0.0 [Thread 0], status) = [infrun] print_target_wait_results: -1.0.0 [Thread 0], [infrun] print_target_wait_results: status->kind = IGNORE GDB tried to stop Thread 42000.26253.0, which does not exist, and we are waiting for a stop event that will never happen. The PID in '42000.26253.0', namely 42000, is the PID of magic_null_ptid. It comes from gdb/remote.c:read_ptid: /* Since the stub is not sending a process id, then default to what's in inferior_ptid, unless it's null at this point. If so, then since there's no way to know the pid of the reported threads, use the magic number. */ if (inferior_ptid == null_ptid) pid = magic_null_ptid.pid (); else pid = inferior_ptid.pid (); if (obuf) *obuf = pp; return ptid_t (pid, tid); Because multi-process was turned off, GDB did not parse an explicitly specified PID. Furthermore, inferior_ptid == null_ptid, and eventually GDB picked the PID from magic_null_ptid. If target-non-stop is not turned on at the beginning, the same bug reveals itself as a duplicated thread as shown below. # Same setup as above, without 'maint set target-non-stop on'. ... (gdb) attach 26253 Attaching to Remote target Attached; pid = 26253 [New inferior 3] ... [New Thread 26253] ... (gdb) info threads Id Target Id Frame 1.1 process 13517 "a.out" main () at test.c:3 * 2.1 Thread 26253 "a.out" 0x00007f12750c5151 in read () from target:/lib/x86_64-linux-gnu/libc.so.6 3.1 Thread 26253 "a.out" Remote 'g' packet reply is too long (expected 560 bytes, got 2496 bytes): 00feffffffffffff000a3a75127f000051510c75127f0000000400000000000060d24ef6af5500000000000000000000680d000000000000b85b31e3fc7f0000c0283a75127f000000e55b75127f000010d04ef6af550000460200000000000060c73975127f0000a0d23975127f0000000a3a75127f0000000000000000000051510c75127f000046020000330000002b0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007f03000000000000ffff0000000000000000000000000000000000000000000080143a75127f000080143a75127f000040143a75127f000040143a75127f00007d0000007e0000007f00000080000000300c3a75127f0000300c3a75127f00000e000000000000000e0000000000000000000000000000000000000000000000ffffffffffffffffffffffffffffffff0400000004000000040000000400000020143a75127f000020143a75127f000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000801f0000000000000000000000e55b75127f0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 (gdb) Fix the problem by preferring current_inferior()'s pid instead of magic_null_ptid. Regression-tested on X86-64 Linux. Co-authored-by: Aleksandar Paunovic <aleksandar.paunovic@intel.com>

I see some failures, at least in gdb.multi/multi-re-run.exp and gdb.threads/interrupted-hand-call.exp. Running `stress -C $(nproc)` at the same time as the test makes those tests relatively frequent. Let's take gdb.multi/multi-re-run.exp as an example. The failure looks like this, an unexpected "no resumed": continue Continuing. No unwaited-for children left. (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=2: iter=1: continue until exit The situation is: - Inferior 1 is stopped somewhere, it won't really play a role here. - Inferior 2 has 2 threads, both stopped. - We resume inferior 2, the leader thread is expected to exit, making the process exit. From GDB's perspective, a failing run looks like this: [infrun] fetch_inferior_event: enter [infrun] scoped_disable_commit_resumed: reason=handling event [infrun] do_target_wait: Found 2 inferiors, starting at riscvarchive#1 [infrun] random_pending_event_thread: None found. [remote] wait: enter [remote] Packet received: T0506:20dcffffff7f0000;07:20dcffffff7f0000;10:9551555555550000;thread:pae4cd.ae4cd;core:e; [remote] wait: exit [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) = [infrun] print_target_wait_results: 713933.713933.0 [Thread 713933.713933], [infrun] print_target_wait_results: status->kind = STOPPED, sig = GDB_SIGNAL_TRAP [infrun] handle_inferior_event: status->kind = STOPPED, sig = GDB_SIGNAL_TRAP [infrun] clear_step_over_info: clearing step over info [infrun] context_switch: Switching context from 0.0.0 to 713933.713933.0 [infrun] handle_signal_stop: stop_pc=0x555555555195 [infrun] start_step_over: enter [infrun] start_step_over: stealing global queue of threads to step, length = 0 [infrun] operator(): step-over queue now empty [infrun] start_step_over: exit [infrun] process_event_stop_test: no stepping, continue [remote] Sending packet: $Z0,555555555194,1#8e [remote] Packet received: OK [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [713933.713933.0] at 0x555555555195 [remote] Sending packet: $QPassSignals:e;10;14;17;1a;1b;1c;21;24;25;2c;4c;97;#0a [remote] Packet received: OK [remote] Sending packet: $vCont;c:pae4cd.-1#9f [infrun] prepare_to_wait: prepare_to_wait [infrun] reset: reason=handling event [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target extended-remote [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote [infrun] fetch_inferior_event: exit [infrun] fetch_inferior_event: enter [infrun] scoped_disable_commit_resumed: reason=handling event [infrun] do_target_wait: Found 2 inferiors, starting at #0 [infrun] random_pending_event_thread: None found. [remote] wait: enter [remote] Packet received: N [remote] wait: exit [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) = [infrun] print_target_wait_results: -1.0.0 [process -1], [infrun] print_target_wait_results: status->kind = NO_RESUMED [infrun] handle_inferior_event: status->kind = NO_RESUMED [remote] Sending packet: $Hgp0.0#ad [remote] Packet received: OK [remote] Sending packet: $qXfer:threads:read::0,1000#92 [remote] Packet received: l<threads>\n<thread id="pae4cb.ae4cb" core="3" name="multi-re-run-1" handle="40c7c6f7ff7f0000"/>\n<thread id="pae4cb.ae4cc" core="2" name="multi-re-run-1" handle="40b6c6f7ff7f0000"/>\n<thread id="pae4cd.ae4ce" core="1" name="multi-re-run-2" handle="40b6c6f7ff7f0000"/>\n</threads>\n [infrun] stop_waiting: stop_waiting [remote] Sending packet: $qXfer:threads:read::0,1000#92 [remote] Packet received: l<threads>\n<thread id="pae4cb.ae4cb" core="3" name="multi-re-run-1" handle="40c7c6f7ff7f0000"/>\n<thread id="pae4cb.ae4cc" core="2" name="multi-re-run-1" handle="40b6c6f7ff7f0000"/>\n<thread id="pae4cd.ae4ce" core="1" name="multi-re-run-2" handle="40b6c6f7ff7f0000"/>\n</threads>\n [infrun] infrun_async: enable=0 [infrun] reset: reason=handling event [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target extended-remote [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote [infrun] fetch_inferior_event: exit We can see that we resume the inferior with vCont;c, but got NO_RESUMED. When the test passes, we get an EXITED status to indicate the process has exited. From GDBserver's point of view, it looks like this. The logs contain some logging I added and that are part of this patch. [remote] getpkt: getpkt ("vCont;c:pae4cf.-1"); [no ack sent] [threads] resume: enter [threads] thread_needs_step_over: Need step over [LWP 713931]? Ignoring, should remain stopped [threads] thread_needs_step_over: Need step over [LWP 713932]? Ignoring, should remain stopped [threads] get_pc: pc is 0x555555555195 [threads] thread_needs_step_over: Need step over [LWP 713935]? No, no breakpoint found at 0x555555555195 [threads] get_pc: pc is 0x7ffff7d35a95 [threads] thread_needs_step_over: Need step over [LWP 713936]? No, no breakpoint found at 0x7ffff7d35a95 [threads] resume: Resuming, no pending status or step over needed [threads] resume_one_thread: resuming LWP 713935 [threads] proceed_one_lwp: lwp 713935 [threads] resume_one_lwp_throw: continue from pc 0x555555555195 [threads] resume_one_lwp_throw: Resuming lwp 713935 (continue, signal 0, stop not expected) [threads] resume_one_lwp_throw: NOW ptid=713935.713935.0 stopped=0 resumed=0 [threads] resume_one_thread: resuming LWP 713936 [threads] proceed_one_lwp: lwp 713936 [threads] resume_one_lwp_throw: continue from pc 0x7ffff7d35a95 [threads] resume_one_lwp_throw: Resuming lwp 713936 (continue, signal 0, stop not expected) [threads] resume_one_lwp_throw: ptrace errno = 3 (No such process) [threads] resume: exit [threads] wait_1: enter [threads] wait_1: [<all threads>] [threads] wait_for_event_filtered: waitpid(-1, ...) returned 0, ERRNO-OK [threads] resume_stopped_resumed_lwps: resuming stopped-resumed LWP LWP 713935.713936 at 7ffff7d35a95: step=0 [threads] resume_one_lwp_throw: continue from pc 0x7ffff7d35a95 [threads] resume_one_lwp_throw: Resuming lwp 713936 (continue, signal 0, stop not expected) [threads] resume_one_lwp_throw: ptrace errno = 3 (No such process) [threads] operator(): check_zombie_leaders: leader_pid=713931, leader_lp!=NULL=1, num_lwps=2, zombie=0 [threads] operator(): check_zombie_leaders: leader_pid=713935, leader_lp!=NULL=1, num_lwps=2, zombie=1 [threads] operator(): Thread group leader 713935 zombie (it exited, or another thread execd). [threads] delete_lwp: deleting 713935 [threads] wait_for_event_filtered: exit (no unwaited-for LWP) sigchld_handler [threads] wait_1: ret = null_ptid, TARGET_WAITKIND_NO_RESUMED [threads] wait_1: exit What happens is: - We resume the leader (713935) successfully. - The leader exits. - We resume the secondary thread (713936), we get ESRCH. This is expected this the leader has exited. - resume_one_lwp_throw throws, it's caught by resume_one_lwp. - resume_one_lwp checks with check_ptrace_stopped_lwp_gone that the failure can be explained by the LWP becoming zombie, and swallows the error. - Note that this means that the secondary lwp still has stopped==1. - wait_1 is called, probably because linux_process_target::resume marks the async pipe at the end. - The exit event isn't ready yet, probably because the machine is under load, so waitpid returns nothing. - check_zombie_leaders detects that the leader is zombie and deletes - We try to find a resumed (non-stopped) LWP to get an event from, there's none since the leader (that was resumed) is now deleted, and the secondary thread is still marked stopped. wait_for_event_filtered returns -1, causing wait_1 to return NO_RESUMED. What I notice here is that there is some kind of race between the availability of the process' exit notification and the call to wait_1 that results from marking the async pipe at the end of resume. I think what we want from this wait_1 invocation is to keep waiting, as we will eventually get thread exit notifications for both of our threads. The fix I came up with is to mark the secondary thread as !stopped (or resumed) when we fail to resume it. This makes wait_1 see that there is at least one resume lwp, so it won't return NO_RESUMED. I think this makes sense to consider it resumed, because we are going to receive an exit event for it. Here's the GDBserver logs with the fix applied: [threads] resume: enter [threads] thread_needs_step_over: Need step over [LWP 724595]? Ignoring, should remain stopped [threads] thread_needs_step_over: Need step over [LWP 724596]? Ignoring, should remain stopped [threads] get_pc: pc is 0x555555555195 [threads] thread_needs_step_over: Need step over [LWP 724597]? No, no breakpoint found at 0x555555555195 [threads] get_pc: pc is 0x7ffff7d35a95 [threads] thread_needs_step_over: Need step over [LWP 724598]? No, no breakpoint found at 0x7ffff7d35a95 [threads] resume: Resuming, no pending status or step over needed [threads] resume_one_thread: resuming LWP 724597 [threads] proceed_one_lwp: lwp 724597 [threads] resume_one_lwp_throw: continue from pc 0x555555555195 [threads] resume_one_lwp_throw: Resuming lwp 724597 (continue, signal 0, stop not expected) [threads] resume_one_lwp_throw: NOW ptid=724597.724597.0 stopped=0 resumed=0 [threads] resume_one_thread: resuming LWP 724598 [threads] proceed_one_lwp: lwp 724598 [threads] resume_one_lwp_throw: continue from pc 0x7ffff7d35a95 [threads] resume_one_lwp_throw: Resuming lwp 724598 (continue, signal 0, stop not expected) [threads] resume_one_lwp_throw: ptrace errno = 3 (No such process) [threads] resume: exit [threads] wait_1: enter [threads] wait_1: [<all threads>] sigchld_handler [threads] wait_for_event_filtered: waitpid(-1, ...) returned 0, ERRNO-OK [threads] operator(): check_zombie_leaders: leader_pid=724595, leader_lp!=NULL=1, num_lwps=2, zombie=0 [threads] operator(): check_zombie_leaders: leader_pid=724597, leader_lp!=NULL=1, num_lwps=2, zombie=1 [threads] operator(): Thread group leader 724597 zombie (it exited, or another thread execd). [threads] delete_lwp: deleting 724597 [threads] wait_for_event_filtered: sigsuspend'ing sigchld_handler [threads] wait_for_event_filtered: waitpid(-1, ...) returned 724598, ERRNO-OK [threads] wait_for_event_filtered: waitpid 724598 received 0 (exited) [threads] filter_event: 724598 exited [threads] wait_for_event_filtered: waitpid(-1, ...) returned 724597, ERRNO-OK [threads] wait_for_event_filtered: waitpid 724597 received 0 (exited) [threads] wait_for_event_filtered: waitpid(-1, ...) returned 0, ERRNO-OK sigchld_handler [threads] wait_1: ret = LWP 724597.724598, exited with retcode 0 [threads] wait_1: exit Change-Id: Idf0bdb4cb0313f1b49e4864071650cc83fb3c100

Bug 28980 shows that trying to value_copy an entirely optimized out value causes an internal error. The original bug report involves MI and some Python pretty printer, and is quite difficult to reproduce, but another easy way to reproduce (that is believed to be equivalent) was proposed: $ ./gdb -q -nx --data-directory=data-directory -ex "py print(gdb.Value(gdb.Value(5).type.optimized_out()))" /home/smarchi/src/binutils-gdb/gdb/value.c:1731: internal-error: value_copy: Assertion `arg->contents != nullptr' failed. This is caused by 5f8ab46 ("gdb: constify parameter of value_copy"). It added an assertion that the contents buffer is allocated if the value is not lazy: if (!value_lazy (val)) { gdb_assert (arg->contents != nullptr); This was based on the comment on value::contents, which suggest that this is the case: /* Actual contents of the value. Target byte-order. NULL or not valid if lazy is nonzero. */ gdb::unique_xmalloc_ptr<gdb_byte> contents; However, it turns out that it can also be nullptr also if the value is entirely optimized out, for example on exit of allocate_optimized_out_value. That function creates a lazy value, marks the entire value as optimized out, and then clears the lazy flag. But contents remains nullptr. This wasn't a problem for value_copy before, because it was calling value_contents_all_raw on the input value, which caused contents to be allocated before doing the copy. This means that the input value to value_copy did not have its contents allocated on entry, but had it allocated on exit. The result value had it allocated on exit. And that we copied bytes for an entirely optimized out value (i.e. meaningless bytes). From here I see two choices: 1. respect the documented invariant that contents is nullptr only and only if the value is lazy, which means making allocate_optimized_out_value allocate contents 2. extend the cases where contents can be nullptr to also include values that are entirely optimized out (note that you could still have some entirely optimized out values that do have contents allocated, it depends on how they were created) and adjust value_copy accordingly Choice riscvarchive#1 is safe, but less efficient: it's not very useful to allocate a buffer for an entirely optimized out value. It's even a bit less efficient than what we had initially, because values coming out of allocate_optimized_out_value would now always get their contents allocated. Choice riscvarchive#2 would be more efficient than what we had before: giving an optimized out value without allocated contents to value_copy would result in an optimized out value without allocated contents (and the input value would still be without allocated contents on exit). But it's more risky, since it's difficult to ensure that all users of the contents (through the various_contents* accessors) are all fine with that new invariant. In this patch, I opt for choice riscvarchive#2, since I think it is a better direction than choice riscvarchive#1. riscvarchive#1 would be a pessimization, and if we go this way, I doubt that it will ever be revisited, it will just stay that way forever. Add a selftest to test this. I initially started to write it as a Python test (since the reproducer is in Python), but a selftest is more straightforward. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28980 Change-Id: I6e2f5c0ea804fafa041fcc4345d47064b5900ed7

While working on a different patch, I triggered an assertion from the initialize_current_architecture code, specifically from one of the *_gdbarch_init functions in a *-tdep.c file. This exposes a couple of issues with GDB. This is easy enough to reproduce by adding 'gdb_assert (false)' into a suitable function. For example, I added a line into i386_gdbarch_init and can see the following issue. I start GDB and immediately hit the assert, the output is as you'd expect, except for the very last line: $ ./gdb/gdb --data-directory ./gdb/data-directory/ ../../src.dev-1/gdb/i386-tdep.c:8455: internal-error: i386_gdbarch_init: Assertion `false' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. ----- Backtrace ----- ... snip ... --------------------- ../../src.dev-1/gdb/i386-tdep.c:8455: internal-error: i386_gdbarch_init: Assertion `false' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) ../../src.dev-1/gdb/ser-event.c:212:16: runtime error: member access within null pointer of type 'struct serial' Something goes wrong when we try to query the user. Note, I configured GDB with --enable-ubsan, I suspect that without this the above "error" would actually just be a crash. The backtrace from ser-event.c:212 looks like this: (gdb) bt 10 #0 serial_event_clear (event=0x675c020) at ../../src/gdb/ser-event.c:212 riscvarchive#1 0x0000000000769456 in invoke_async_signal_handlers () at ../../src/gdb/async-event.c:211 riscvarchive#2 0x000000000295049b in gdb_do_one_event () at ../../src/gdbsupport/event-loop.cc:194 riscvarchive#3 0x0000000001f015f8 in gdb_readline_wrapper ( prompt=0x67135c0 "../../src/gdb/i386-tdep.c:8455: internal-error: i386_gdbarch_init: Assertion `false' failed.\nA problem internal to GDB has been detected,\nfurther debugging may prove unreliable.\nQuit this debugg"...) at ../../src/gdb/top.c:1141 riscvarchive#4 0x0000000002118b64 in defaulted_query(const char *, char, typedef __va_list_tag __va_list_tag *) ( ctlstr=0x2e4eb68 "%s\nQuit this debugging session? ", defchar=0 '\000', args=0x7fffffffa6e0) at ../../src/gdb/utils.c:934 riscvarchive#5 0x0000000002118f72 in query (ctlstr=0x2e4eb68 "%s\nQuit this debugging session? ") at ../../src/gdb/utils.c:1026 riscvarchive#6 0x00000000021170f6 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=0x6107bc0 <internal_error_problem>, file=0x2b976c8 "../../src/gdb/i386-tdep.c", line=8455, fmt=0x2b96d7f "%s: Assertion `%s' failed.", ap=0x7fffffffa8e8) at ../../src/gdb/utils.c:417 riscvarchive#7 0x00000000021175a0 in internal_verror (file=0x2b976c8 "../../src/gdb/i386-tdep.c", line=8455, fmt=0x2b96d7f "%s: Assertion `%s' failed.", ap=0x7fffffffa8e8) at ../../src/gdb/utils.c:485 riscvarchive#8 0x00000000029503b3 in internal_error (file=0x2b976c8 "../../src/gdb/i386-tdep.c", line=8455, fmt=0x2b96d7f "%s: Assertion `%s' failed.") at ../../src/gdbsupport/errors.cc:55 riscvarchive#9 0x000000000122d5b6 in i386_gdbarch_init (info=..., arches=0x0) at ../../src/gdb/i386-tdep.c:8455 (More stack frames follow...) It turns out that the problem is that the async event handler mechanism has been invoked, but this has not yet been initialized. If we look at gdb_init (in gdb/top.c) we can indeed see the call to gdb_init_signals is after the call to initialize_current_architecture. If I reorder the calls, moving gdb_init_signals earlier, then the initial error is resolved, however, things are still broken. I now see the same "Quit this debugging session? (y or n)" prompt, but when I provide an answer and press return GDB immediately crashes. So what's going on now? The next problem is that the call_readline field within the current_ui structure is not initialized, and this callback is invoked to process the reply I entered. The problem is that call_readline is setup as a result of calling set_top_level_interpreter, which is called from captured_main_1. Unfortunately, set_top_level_interpreter is called after gdb_init is called. I wondered how to solve this problem for a while, however, I don't know if there's an easy "just reorder some lines" solution here. Looking through captured_main_1 there seems to be a bunch of dependencies between printing various things, parsing config files, and setting up the interpreter. I'm sure there is a solution hiding in there somewhere.... I'm just not sure I want to spend any longer looking for it. So. I propose a simpler solution, more of a hack/work-around. In utils.c we already have a function filtered_printing_initialized, this is checked in a few places within internal_vproblem. In some of these cases the call gates whether or not GDB will query the user. My proposal is to add a new readline_initialized function, which checks if the current_ui has had readline initialized yet. If this is not the case then we should not attempt to query the user. After this change GDB prints the error message, the backtrace, and then aborts (including dumping core). This actually seems pretty sane as, if GDB has not yet made it through the initialization then it doesn't make much sense to allow the user to say "no, I don't want to quit the debug session" (I think).

The variable right_lib_flags is not being set correctly to define RIGHT. The value RIGHT is needed to force the address of the library functions lib1_func3 and lib2_func4 to occur at different address in the wrong and right libraries. With RIGHT defined correctly, functions lib1_func3 and lib2_func4 occur at different addresses the test runs correctly on Powerpc. The test needs the lib2 addresses to be different in the right and wrong cases. That is the point of introducing function lib2_spacer with the ifdef RIGHT compiler directive. On Intel, the ARRAY_SIZE of 1 versus 8192 is sufficient to get the dynamic linker to move the addresses of the library. You can also get the same effect on PowerPC but you must use a value much larger than 8192. The key thing is that the test was not properly setting RIGHT to defined to get the lib2_spacer function on Intel and Powerpc. Without the patch, we have the Intel backtrace for the bad libraries: backtrace #0 break_here () at /home/ ... /gdb/testsuite/gdb.base/solib-search.c:30 riscvarchive#1 0x00007ffff7fae156 in ?? () riscvarchive#2 0x00007fffffffc150 in ?? () riscvarchive#3 0x00007ffff7fbb156 in ?? () riscvarchive#4 0x00007fffffffc160 in ?? () riscvarchive#5 0x00007ffff7fae146 in ?? () riscvarchive#6 0x00007fffffffc170 in ?? () riscvarchive#7 0x00007ffff7fbb146 in ?? () riscvarchive#8 0x00007fffffffc180 in ?? () riscvarchive#9 0x0000555555555156 in main () at /home/ ... /binutils-gdb/gdb/testsuite/gdb.base/solib-search.c:23 Backtrace stopped: previous frame inner to this frame (corrupt stack?) (gdb) PASS: gdb.base/solib-search.exp: backtrace (with wrong libs) (data collection) The backtrace on Intel with the good libraries is: backtrace #0 break_here () at /.../binutils-gdb/gdb/testsuite/gdb.base/solib-search.c:30 riscvarchive#1 0x00007ffff7fae156 in lib2_func4 () at /.../binutils-gdb/gdb/testsuite/gdb.base/solib-search-lib2.c:49 riscvarchive#2 0x00007ffff7fbb156 in lib1_func3 () at /.../gdb.base/solib-search-lib1.c:49 riscvarchive#3 0x00007ffff7fae146 in lib2_func2 () at /.../testsuite/gdb.base/solib-search-lib2.c:30 riscvarchive#4 0x00007ffff7fbb146 in lib1_func1 () at /.../gdb.base/solib-search-lib1.c:30 riscvarchive#5 0x0000555555555156 in main () at /...solib-search.c:23 (gdb) PASS: gdb.base/solib-search.exp: backtrace (with right libs) (data collection) PASS: gdb.base/solib-search.exp: backtrace (with right libs) In one case the backtrace is correct and the other it is wrong on Intel. This is due to the fact that the ARRAY_SIZE caused the dynamic linker to move the library function addresses around. I believe it has to do with the default size of the data and code sections used by the dynamic linker. So without the patch the backtrace on PowerPC looks like: backtrace #0 break_here () at /.../solib-search.c:30 riscvarchive#1 0x00007ffff7f007f4 in lib2_func4 () at /.../solib-search-lib2.c:49 riscvarchive#2 0x00007ffff7f307f4 in lib1_func3 () at /.../solib-search-lib1.c:49 riscvarchive#3 0x00007ffff7f007ac in lib2_func2 () at /.../solib-search-lib2.c:30 riscvarchive#4 0x00007ffff7f307ac in lib1_func1 () at /.../solib-search-lib1.c:30 riscvarchive#5 0x000000001000074c in main () at /.../solib-search.c:23 for both the good and bad libraries. The patch fixes defining RIGHT in solib-search-lib1.c and solib-search- lib2.c. Note, without the patch the lib1_spacer and lib2_spacer functions do not show up in the object dump of the Intel or Powerpc libraries as it should. The patch fixes that by making sure RIGHT gets defined. Now with the patch the backtrace for the bad library on PowerPC looks like: backtrace #0 break_here () at /.../solib-search.c:30 riscvarchive#1 0x00007ffff7f0083c in __glink_PLTresolve () from /.../solib-search-lib2.so Backtrace stopped: frame did not save the PC And the backtrace for the good libraries on PowerPC looks like: backtrace #0 break_here () at /.../solib-search.c:30 riscvarchive#1 0x00007ffff7f0083c in lib2_func4 () at /.../solib-search-lib2.c:49 riscvarchive#2 0x00007ffff7f3083c in lib1_func3 () at /.../solib-search-lib1.c:49 riscvarchive#3 0x00007ffff7f007cc in lib2_func2 () at /.../solib-search-lib2.c:30 riscvarchive#4 0x00007ffff7f307cc in lib1_func1 () at /.../solib-search-lib1.c:30 riscvarchive#5 0x000000001000074c in main () at /.../solib-search.c:23 (gdb) PASS: gdb.base/solib-search.exp: backtrace (with right libs) (data collection) PASS: gdb.base/solib-search.exp: backtrace (with right libs) The issue then is on Power where the ARRAY_SIZE of 1 versus 8192 is not sufficient to cause the dymanic linker to allocate the libraries at different addresses. I don't claim to understand the specifics of how the dynamic linker works and what the default size is for the data and code sections are. My guess is by default PowerPC allocates a larger data size by default, which is large enough to hold array[8192]. The default size of the data section allocated by the dynamic linker on Intel is not large enough to hold array[8192] thus causing the code section on Intel to have to move when the large array is defined. Note on PowerPC, if you make ARRAY_SIZE big enough, then you will cause the library addresses to occur at different addresses as the larger data section forces the code section to a different address. That was actually my original fix for the program until I spoke with Doug Evans who originally wrote the test. Doug noticed that RIGHT was not getting defined as he originally intended in the test. With the patch to fix the definition of RIGHT, PowerPC has a bad and a good backtrace because the address of lib1_func3 and lib2_func4 both move because lib1_spacer and lib2_spacer are now defined before lib1_func3 and lib2_func4. Without the patch, the lib1_spacer and lib2_spacer function doesn't show up in the binary for the correct or incorrect library on Intel or PowerPC. With the patch, RIGHT gets defined as originally intended for the test on both architectures and lib1_spacer and lib2_spacer function show up in the binaries on both architectures changing the other function addresses as intended thus causing the test work as intended on PowerPC.

… failing to attach Running $ ../gdbserver/gdbserver --once --attach :1234 539436 with ASan while /proc/sys/kernel/yama/ptrace_scope is set to 1 (prevents attaching) shows that we fail to free some platform-specific objects tied to the process_info (process_info_private and arch_process_info): Direct leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7f6b558b3fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154 riscvarchive#1 0x562eaf15d04a in xcalloc /home/simark/src/binutils-gdb/gdbserver/../gdb/alloc.c:100 riscvarchive#2 0x562eaf251548 in xcnew<process_info_private> /home/simark/src/binutils-gdb/gdbserver/../gdbsupport/poison.h:122 riscvarchive#3 0x562eaf22810c in linux_process_target::add_linux_process_no_mem_file(int, int) /home/simark/src/binutils-gdb/gdbserver/linux-low.cc:426 riscvarchive#4 0x562eaf22d33f in linux_process_target::attach(unsigned long) /home/simark/src/binutils-gdb/gdbserver/linux-low.cc:1132 riscvarchive#5 0x562eaf1a7222 in attach_inferior /home/simark/src/binutils-gdb/gdbserver/server.cc:308 riscvarchive#6 0x562eaf1c1016 in captured_main /home/simark/src/binutils-gdb/gdbserver/server.cc:3949 riscvarchive#7 0x562eaf1c1d60 in main /home/simark/src/binutils-gdb/gdbserver/server.cc:4084 riscvarchive#8 0x7f6b552f630f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f) Indirect leak of 56 byte(s) in 1 object(s) allocated from: #0 0x7f6b558b3fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154 riscvarchive#1 0x562eaf15d04a in xcalloc /home/simark/src/binutils-gdb/gdbserver/../gdb/alloc.c:100 riscvarchive#2 0x562eaf2a0d79 in xcnew<arch_process_info> /home/simark/src/binutils-gdb/gdbserver/../gdbsupport/poison.h:122 riscvarchive#3 0x562eaf295e2c in x86_target::low_new_process() /home/simark/src/binutils-gdb/gdbserver/linux-x86-low.cc:723 riscvarchive#4 0x562eaf22819b in linux_process_target::add_linux_process_no_mem_file(int, int) /home/simark/src/binutils-gdb/gdbserver/linux-low.cc:428 riscvarchive#5 0x562eaf22d33f in linux_process_target::attach(unsigned long) /home/simark/src/binutils-gdb/gdbserver/linux-low.cc:1132 riscvarchive#6 0x562eaf1a7222 in attach_inferior /home/simark/src/binutils-gdb/gdbserver/server.cc:308 riscvarchive#7 0x562eaf1c1016 in captured_main /home/simark/src/binutils-gdb/gdbserver/server.cc:3949 riscvarchive#8 0x562eaf1c1d60 in main /home/simark/src/binutils-gdb/gdbserver/server.cc:4084 riscvarchive#9 0x7f6b552f630f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f) Those objects are deleted by linux_process_target::mourn, but that is not called if we fail to attach, we only call remove_process. I initially fixed this by making linux_process_target::attach call linux_process_target::mourn on failure (before calling error). But this isn't done anywhere else (including in GDB) so it would just be confusing to do things differently here. Instead, add a linux_process_target::remove_linux_process helper method (which calls remove_process), and call that instead of remove_process in the Linux target. Move the free-ing of the extra data from the mourn method to that new method. Change-Id: I277059a69d5f08087a7f3ef0b8f1792a1fcf7a85

…nalled Commit 152a174 ("gdb: prune inferiors at end of fetch_inferior_event, fix intermittent failure of gdb.threads/fork-plus-threads.exp") introduced some follow-fork-related test failures, such as: info inferiors^M Num Description Connection Executable ^M * 1 process 634972 1 (native) /home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork ^M 2 process 634975 1 (native) /home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork ^M (gdb) PASS: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: info inferiors inferior 2^M [Switching to inferior 2 [process 634975] (/home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork)]^M [Switching to thread 2.1 (Thread 0x7ffff7c9a740 (LWP 634975))]^M #0 0x00007ffff7d7abf7 in _Fork () from /usr/lib/libc.so.6^M (gdb) PASS: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: inferior 2 continue^M Continuing.^M [Inferior 2 (process 634975) exited normally]^M [Switching to Thread 0x7ffff7c9a740 (LWP 634972)]^M (gdb) PASS: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: continue until exit at continue unfollowed inferior to end break callee^M Breakpoint 2 at 0x555555555160: file /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/foll-fork.c, line 9.^M (gdb) FAIL: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: break callee What happens here is: - inferior 2 is selected - we continue, leading to inferior 2's exit - we set breakpoint, expect 2 locations, but only one location is resolved Reading between the lines, we understand that inferior 2 got pruned, when it shouldn't have been. The issue can be reproduced by hand with: $ ./gdb -q --data-directory=data-directory testsuite/outputs/gdb.base/foll-fork/foll-fork -ex "set detach-on-fork off" -ex start -ex "next 2" -ex "inferior 2" -ex "set debug infrun" ... Temporary breakpoint 1, main () at /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/foll-fork.c:14 14 int v = 5; [New inferior 2 (process 637627)] [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1". 17 if (pid == 0) /* set breakpoint here */ [Switching to inferior 2 [process 637627] (/home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork)] [Switching to thread 2.1 (Thread 0x7ffff7c9a740 (LWP 637627))] #0 0x00007ffff7d7abf7 in _Fork () from /usr/lib/libc.so.6 (gdb) continue Continuing. [infrun] clear_proceed_status_thread: 637627.637627.0 [infrun] proceed: enter [infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT [infrun] scoped_disable_commit_resumed: reason=proceeding [infrun] start_step_over: enter [infrun] start_step_over: stealing global queue of threads to step, length = 0 [infrun] operator(): step-over queue now empty [infrun] start_step_over: exit [infrun] proceed: start: resuming threads, all-stop-on-top-of-non-stop [infrun] proceed: resuming 637627.637627.0 [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [637627.637627.0] at 0x7ffff7d7abf7 [infrun] do_target_resume: resume_ptid=637627.637627.0, step=0, sig=GDB_SIGNAL_0 [infrun] infrun_async: enable=1 [infrun] prepare_to_wait: prepare_to_wait [infrun] proceed: end: resuming threads, all-stop-on-top-of-non-stop [infrun] reset: reason=proceeding [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target native [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target native [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target native [infrun] proceed: exit [infrun] fetch_inferior_event: enter [infrun] scoped_disable_commit_resumed: reason=handling event [infrun] do_target_wait: Found 2 inferiors, starting at riscvarchive#1 [infrun] random_pending_event_thread: None found. [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) = [infrun] print_target_wait_results: 637627.637627.0 [process 637627], [infrun] print_target_wait_results: status->kind = EXITED, exit_status = 0 [infrun] handle_inferior_event: status->kind = EXITED, exit_status = 0 [Inferior 2 (process 637627) exited normally] [infrun] stop_waiting: stop_waiting [infrun] stop_all_threads: start: reason=presenting stop to user in all-stop, inf=-1 [infrun] stop_all_threads: pass=0, iterations=0 [infrun] stop_all_threads: 637624.637624.0 not executing [infrun] stop_all_threads: pass=1, iterations=1 [infrun] stop_all_threads: 637624.637624.0 not executing [infrun] stop_all_threads: done [infrun] stop_all_threads: end: reason=presenting stop to user in all-stop, inf=-1 [Switching to Thread 0x7ffff7c9a740 (LWP 637624)] [infrun] infrun_async: enable=0 [infrun] reset: reason=handling event [infrun] maybe_set_commit_resumed_all_targets: not requesting commit-resumed for target native, no resumed threads (gdb) [infrun] fetch_inferior_event: exit (gdb) info inferiors Num Description Connection Executable * 1 process 637624 1 (native) /home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork (gdb) i th Id Target Id Frame * 1 Thread 0x7ffff7c9a740 (LWP 637624) "foll-fork" main () at /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/foll-fork.c:17 After handling the EXITED event for inferior 2, inferior 2 should have stayed the current inferior, which should have prevented it from getting pruned. When debugging, we find that when getting at the prune_inferiors call, the current inferior is inferior 1. Further debugging shows that prior to the call to clean_up_just_stopped_threads_fsms, the current inferior is inferior 2, and after, it's inferior 1. Then, back in fetch_inferior_event, the restore_thread object is disabled, due to: /* If we got a TARGET_WAITKIND_NO_RESUMED event, then the previously selected thread is gone. We have two choices - switch to no thread selected, or restore the previously selected thread (now exited). We chose the later, just because that's what GDB used to do. After this, "info threads" says "The current thread <Thread ID 2> has terminated." instead of "No thread selected.". */ if (!non_stop && cmd_done && ecs->ws.kind () != TARGET_WAITKIND_NO_RESUMED) restore_thread.dont_restore (); So in the end, inferior 1 stays current, and inferior 2 gets wrongfully pruned. I'd say clean_up_just_stopped_threads_fsms is the culprit here. It actually attempts to restore the event_thread to be current at the end, after the loop (I presume the current thread on entry is always supposed to be the event thread). But in this case, the event is of kind EXITED, and ecs->event_thread is not set, so the current inferior isn't restored. Fix that by using scoped_restore_current_thread. If there is no current thread, scoped_restore_current_thread will still restore the current inferior, and that's what we want. Random note: the thread_info object for inferior 2's thread is never freed. It is held (by refcount) by the restore_thread object in fetch_inferior_event, while the inferior's thread list gets cleared, in the exit event processing. When the refcount reaches 0 (when the restore_thread object is destroyed), there's nothing that actually deletes the thread_info object. And I think that nothing in GDB points to it anymore, so it leaks. I don't want to fix that in this patch, but thought it would be good to mention it, in case somebody has an idea for how to fix that. Change-Id: Ibc7df543e2c46aad5f3b9250b28c3fb5912be4e8

Commit 152a174 ("gdb: prune inferiors at end of fetch_inferior_event, fix intermittent failure of gdb.threads/fork-plus-threads.exp") broke some tests with the native-gdbserver board, such as: (gdb) PASS: gdb.base/step-over-syscall.exp: detach-on-fork=off: follow-fork=child: break cond on target : vfork: break marker continue^M Continuing.^M terminate called after throwing an instance of 'gdb_exception_error'^M I can manually reproduce the issue by running (just the commands that the test does as a one liner): $ ./gdb -q --data-directory=data-directory \ testsuite/outputs/gdb.base/step-over-syscall/step-over-vfork \ -ex "tar rem | ../gdbserver/gdbserver - testsuite/outputs/gdb.base/step-over-syscall/step-over-vfork" \ -ex "b main" \ -ex c \ -ex "d 1" \ -ex "set displaced-stepping off" \ -ex "b *0x7ffff7d7ac5a if main == 0" \ -ex "set detach-on-fork off" \ -ex "set follow-fork-mode child" \ -ex c \ -ex "inferior 1" \ -ex "b marker" \ -ex c ... where 0x7ffff7d7ac5a is the exact address of the vfork syscall (which can be found by looking at gdb.log). The important part of the above is that a vfork syscall creates inferior 2, then inferior 2 executes until exit, then we switch back to inferior 1 and try to resume it. The uncaught exception happens here: riscvarchive#4 0x00005596969d81a9 in error (fmt=0x559692da9e40 "Cannot execute this command while the target is running.\nUse the \"interrupt\" command to stop the target\nand then try again.") at /home/simark/src/binutils-gdb/gdbsupport/errors.cc:43 riscvarchive#5 0x0000559695af6f66 in remote_target::putpkt_binary (this=0x617000038080, buf=0x559692da4380 "qSymbol::", cnt=9) at /home/simark/src/binutils-gdb/gdb/remote.c:9560 riscvarchive#6 0x0000559695af6aaf in remote_target::putpkt (this=0x617000038080, buf=0x559692da4380 "qSymbol::") at /home/simark/src/binutils-gdb/gdb/remote.c:9518 riscvarchive#7 0x0000559695ab50dc in remote_target::remote_check_symbols (this=0x617000038080) at /home/simark/src/binutils-gdb/gdb/remote.c:5141 riscvarchive#8 0x0000559695b3cccf in remote_new_objfile (objfile=0x0) at /home/simark/src/binutils-gdb/gdb/remote.c:14600 riscvarchive#9 0x0000559693bc52a9 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x61b0000167f8: 0x559695b3cb1d <remote_new_objfile(objfile*)>) at /usr/include/c++/11.2.0/bits/invoke.h:61 riscvarchive#10 0x0000559693bb2848 in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x61b0000167f8: 0x559695b3cb1d <remote_new_objfile(objfile*)>) at /usr/include/c++/11.2.0/bits/invoke.h:111 riscvarchive#11 0x0000559693b8dddf in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7ffe0bae0590: 0x0) at /usr/include/c++/11.2.0/bits/std_function.h:291 riscvarchive#12 0x00005596956374b2 in std::function<void (objfile*)>::operator()(objfile*) const (this=0x61b0000167f8, __args#0=0x0) at /usr/include/c++/11.2.0/bits/std_function.h:560 riscvarchive#13 0x0000559695633c64 in gdb::observers::observable<objfile*>::notify (this=0x55969ef5c480 <gdb::observers::new_objfile>, args#0=0x0) at /home/simark/src/binutils-gdb/gdb/../gdbsupport/observable.h:150 riscvarchive#14 0x0000559695df6cc2 in clear_symtab_users (add_flags=...) at /home/simark/src/binutils-gdb/gdb/symfile.c:2873 riscvarchive#15 0x000055969574c263 in program_space::~program_space (this=0x6120000c8a40, __in_chrg=<optimized out>) at /home/simark/src/binutils-gdb/gdb/progspace.c:154 riscvarchive#16 0x0000559694fc086b in delete_inferior (inf=0x61700003bf80) at /home/simark/src/binutils-gdb/gdb/inferior.c:205 riscvarchive#17 0x0000559694fc341f in prune_inferiors () at /home/simark/src/binutils-gdb/gdb/inferior.c:390 riscvarchive#18 0x0000559695017ada in fetch_inferior_event () at /home/simark/src/binutils-gdb/gdb/infrun.c:4293 riscvarchive#19 0x0000559694f629e6 in inferior_event_handler (event_type=INF_REG_EVENT) at /home/simark/src/binutils-gdb/gdb/inf-loop.c:41 riscvarchive#20 0x0000559695b3b0e3 in remote_async_serial_handler (scb=0x6250001ef100, context=0x6170000380a8) at /home/simark/src/binutils-gdb/gdb/remote.c:14466 riscvarchive#21 0x0000559695c59eb7 in run_async_handler_and_reschedule (scb=0x6250001ef100) at /home/simark/src/binutils-gdb/gdb/ser-base.c:138 riscvarchive#22 0x0000559695c5a42a in fd_event (error=0, context=0x6250001ef100) at /home/simark/src/binutils-gdb/gdb/ser-base.c:189 riscvarchive#23 0x00005596969d9ebf in handle_file_event (file_ptr=0x60700005af40, ready_mask=1) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:574 riscvarchive#24 0x00005596969da7fa in gdb_wait_for_event (block=0) at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:700 riscvarchive#25 0x00005596969d8539 in gdb_do_one_event () at /home/simark/src/binutils-gdb/gdbsupport/event-loop.cc:212 If I enable "set debug infrun" just before the last continue, we see: (gdb) continue Continuing. [infrun] clear_proceed_status_thread: 965604.965604.0 [infrun] proceed: enter [infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT [infrun] scoped_disable_commit_resumed: reason=proceeding [infrun] start_step_over: enter [infrun] start_step_over: stealing global queue of threads to step, length = 0 [infrun] operator(): step-over queue now empty [infrun] start_step_over: exit [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [965604.965604.0] at 0x7ffff7d7ac5c [infrun] do_target_resume: resume_ptid=965604.0.0, step=0, sig=GDB_SIGNAL_0 [infrun] prepare_to_wait: prepare_to_wait [infrun] reset: reason=proceeding [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target remote [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target remote [infrun] proceed: exit [infrun] fetch_inferior_event: enter [infrun] scoped_disable_commit_resumed: reason=handling event [infrun] do_target_wait: Found 2 inferiors, starting at riscvarchive#1 [infrun] random_pending_event_thread: None found. [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) = [infrun] print_target_wait_results: 965604.965604.0 [Thread 965604.965604], [infrun] print_target_wait_results: status->kind = VFORK_DONE [infrun] handle_inferior_event: status->kind = VFORK_DONE [infrun] context_switch: Switching context from 0.0.0 to 965604.965604.0 [infrun] handle_vfork_done: not waiting for a vfork-done event [infrun] start_step_over: enter [infrun] start_step_over: stealing global queue of threads to step, length = 0 [infrun] operator(): step-over queue now empty [infrun] start_step_over: exit [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [965604.965604.0] at 0x7ffff7d7ac5c [infrun] do_target_resume: resume_ptid=965604.0.0, step=0, sig=GDB_SIGNAL_0 [infrun] prepare_to_wait: prepare_to_wait [infrun] reset: reason=handling event [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target remote [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target remote terminate called after throwing an instance of 'gdb_exception_error' What happens is: - After doing the "continue" on inferior 1, the remote target gives us a VFORK_DONE event. The core ignores it and resumes inferior 1. - Since prune_inferiors is now called after each handled event, in fetch_inferior_event, it is called after we handled that VFORK_DONE event and resumed inferior 1. - Inferior 2 is pruned, which (see backtrace above) causes its program space to be deleted, which clears the symtabs for that program space, which calls the new_objfile observable and remote_new_objfile observer (with a nullptr objfile, to indicate that the previously loaded symbols have been discarded), which calls remote_check_symbols. remote_check_symbols is the function that sends the qSymbol packet, to let the remote side ask for symbol addresses. The problem is that the remote target is working in all-stop / sync mode and is currently resumed. It has sent a vCont packet to resume the target and is waiting for a stop reply. It can't send any packets in the mean time. That causes the exception to be thrown. This wasn't a problem before, when prune_inferiors was called in normal_stop, because it was always called at a time the target was not resumed. An important observation here is that the new_objfile observable is invoked for a change in inferior 2's program space (inferior 2's program space is the current program space). Inferior 2 isn't bound to any process on the remote side (it has exited, that's why it's being pruned). It doesn't make sense to try to send a qSymbol packet for a process that doesn't exist on the remote side. remote_check_symbols actually attempts to avoid that: /* The remote side has no concept of inferiors that aren't running yet, it only knows about running processes. If we're connected but our current inferior is not running, we should not invite the remote target to request symbol lookups related to its (unrelated) current process. */ if (!target_has_execution ()) return; The problem here is that while inferior 2's program space is the current program space, inferior 1 is the current inferior. So the check above passes, since inferior has execution. We therefore try to send a qSymbol packet for inferior 1 in reaction to a change in inferior 2's program space, that's wrong. This exposes a conceptual flaw in remote_new_objfile. The "new_objfile" event concerns a specific program space, which can concern multiple inferiors, as inferiors can share a program space. We shouldn't consider the current inferior at all, but instead all inferiors bound to the affected program space. Especially since the current inferior can be unrelated to the current program space at that point. To be clear, we are in this state because ~program_space sets itself as the current program space, but there is no more inferior having that program space to switch to, inferior 2 has already been unlinked. To fix this, make remote_new_objfile iterate on all inferiors bound to the affected program space. Remove the target_has_execution check from remote_check_symbols, replace it with an assert. All callers must ensure that the current inferior has execution before calling it. Change-Id: Ica643145bcc03115248290fd310cadab8ec8371c

Luis noticed that the recent changes to gdbserver to make it track process and threads independently regressed a few gdb.multi/*.exp tests for aarch64-linux. We started seeing the following internal error for gdb.multi/multi-target-continue.exp for example: Starting program: binutils-gdb/gdb/testsuite/outputs/gdb.multi/multi-target-continue/multi-target-continue ^M Error in re-setting breakpoint 2: Remote connection closed^M ../../../repos/binutils-gdb/gdb/thread.c:85: internal-error: inferior_thread: Assertion `current_thread_ != nullptr' failed.^M A problem internal to GDB has been detected,^M further debugging may prove unreliable. A backtrace looks like: #0 thread_regcache_data (thread=thread@entry=0x0) at ../../../repos/binutils-gdb/gdbserver/inferiors.cc:120 riscvarchive#1 0x0000aaaaaaabf0e8 in get_thread_regcache (thread=0x0, fetch=fetch@entry=0) at ../../../repos/binutils-gdb/gdbserver/regcache.cc:31 riscvarchive#2 0x0000aaaaaaad785c in is_64bit_tdesc () at ../../../repos/binutils-gdb/gdbserver/linux-aarch64-low.cc:194 riscvarchive#3 0x0000aaaaaaad8a48 in aarch64_target::sw_breakpoint_from_kind (this=<optimized out>, kind=4, size=0xffffffffef04) at ../../../repos/binutils-gdb/gdbserver/linux-aarch64-low.cc:3226 riscvarchive#4 0x0000aaaaaaabe220 in bp_size (bp=0xaaaaaab6f3d0) at ../../../repos/binutils-gdb/gdbserver/mem-break.cc:226 riscvarchive#5 check_mem_read (mem_addr=187649984471104, buf=buf@entry=0xaaaaaab625d0 "\006", mem_len=mem_len@entry=56) at ../../../repos/binutils-gdb/gdbserver/mem-break.cc:1862 riscvarchive#6 0x0000aaaaaaacc660 in read_inferior_memory (memaddr=<optimized out>, myaddr=0xaaaaaab625d0 "\006", len=56) at ../../../repos/binutils-gdb/gdbserver/target.cc:93 riscvarchive#7 0x0000aaaaaaac3d9c in gdb_read_memory (len=56, myaddr=0xaaaaaab625d0 "\006", memaddr=187649984471104) at ../../../repos/binutils-gdb/gdbserver/server.cc:1071 riscvarchive#8 gdb_read_memory (memaddr=187649984471104, myaddr=0xaaaaaab625d0 "\006", len=56) at ../../../repos/binutils-gdb/gdbserver/server.cc:1048 riscvarchive#9 0x0000aaaaaaac82a4 in process_serial_event () at ../../../repos/binutils-gdb/gdbserver/server.cc:4307 riscvarchive#10 handle_serial_event (err=<optimized out>, client_data=<optimized out>) at ../../../repos/binutils-gdb/gdbserver/server.cc:4520 riscvarchive#11 0x0000aaaaaaafbcd0 in gdb_wait_for_event (block=block@entry=1) at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:700 riscvarchive#12 0x0000aaaaaaafc0b0 in gdb_wait_for_event (block=1) at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:596 riscvarchive#13 gdb_do_one_event () at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:237 riscvarchive#14 0x0000aaaaaaacacb0 in start_event_loop () at ../../../repos/binutils-gdb/gdbserver/server.cc:3518 riscvarchive#15 captured_main (argc=4, argv=<optimized out>) at ../../../repos/binutils-gdb/gdbserver/server.cc:3998 riscvarchive#16 0x0000aaaaaaab66dc in main (argc=<optimized out>, argv=<optimized out>) at ../../../repos/binutils-gdb/gdbserver/server.cc:4084 This sequence of functions is invoked due to a series of conditions: 1 - The probe-based breakpoint mechanism failed (for some reason) so ... 2 - ... gdbserver has to know what type of architecture it is dealing with so it can pick the right breakpoint kind, so it wants to check if we have a 64-bit target. 3 - To determine the size of a register, we currently fetch the current thread's register cache, and the current thread pointer is now nullptr. In riscvarchive#3, the current thread is nullptr because gdb_read_memory clears it on purpose, via set_desired_process, exactly to expose code relying on the current thread when it shouldn't. It was always possible to end up in this situation (when the current thread exits), but it was harder to reproduce before. This commit fixes it by tweaking is_64bit_tdesc to look at the current process's tdesc instead of the current thread's tdesc. Note that the thread's tdesc is itself filled from the process's tdesc, so this should be equivalent: struct regcache * get_thread_regcache (struct thread_info *thread, int fetch) { struct regcache *regcache; regcache = thread_regcache_data (thread); ... if (regcache == NULL) { struct process_info *proc = get_thread_process (thread); gdb_assert (proc->tdesc != NULL); regcache = new_register_cache (proc->tdesc); set_thread_regcache_data (thread, regcache); } ... Change-Id: Ibc809d7345e70a2f058b522bdc5cdbdca97e2cdc

$ objdump -d outputs/gdb.base/large-frame/large-frame-O2 0000000120000b20 <func>: 120000b20: 67bdbff0 daddiu sp,sp,-16400 120000b24: ffbc4000 sd gp,16384(sp) 120000b28: 3c1c0002 lui gp,0x2 120000b2c: 679c8210 daddiu gp,gp,-32240 120000b30: 0399e02d daddu gp,gp,t9 120000b34: df998058 ld t9,-32680(gp) 120000b38: ffbf4008 sd ra,16392(sp) 120000b3c: 0411ffd8 bal 120000aa0 <blah> ... The disassembly of the above func function shows that we may use instructions such as daddiu/daddu, so add "daddiu $gp,$gp,n", "daddu $gp,$gp,$t9" and "daddu $gp,$t9,$gp" to the mips32_scan_prologue function to fix the large-frame.exp test case. Before applying the patch: backtrace #0 blah (a=0xfffffee220) at .../gdb/testsuite/gdb.base/large-frame-1.c:24 riscvarchive#1 0x0000000120000b44 in func () Backtrace stopped: frame did not save the PC (gdb) FAIL: gdb.base/large-frame.exp: optimize=-O2: backtrace # of expected passes 5 # of unexpected failures 1 After applying the patch: # of expected passes 6 Signed-off-by: Youling Tang <tangyouling@loongson.cn>

Simon reported that the recent change to make GDB and GDBserver avoid reading shell registers caused a GDBserver regression, caught with ASan while running gdb.server/non-existing-program.exp: $ /home/smarchi/build/binutils-gdb/gdb/testsuite/../../gdb/../gdbserver/gdbserver stdio non-existing-program ================================================================= ==127719==ERROR: AddressSanitizer: heap-use-after-free on address 0x60f0000000e9 at pc 0x55bcbfa301f4 bp 0x7ffd238a7320 sp 0x7ffd238a7310 WRITE of size 1 at 0x60f0000000e9 thread T0 #0 0x55bcbfa301f3 in scoped_restore_tmpl<bool>::~scoped_restore_tmpl() /home/smarchi/src/binutils-gdb/gdbserver/../gdbsupport/scoped_restore.h:86 #1 0x55bcbfa2ffe9 in post_fork_inferior(int, char const*) /home/smarchi/src/binutils-gdb/gdbserver/fork-child.cc:120 #2 0x55bcbf9c9199 in linux_process_target::create_inferior(char const*, std::__debug::vector<char*, std::allocator<char*> > const&) /home/smarchi/src/binutils-gdb/gdbserver/linux-low.cc:991 #3 0x55bcbf954549 in captured_main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:3941 #4 0x55bcbf9552f0 in main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:4084 #5 0x7ff9d663b0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2) #6 0x55bcbf8ef2bd in _start (/home/smarchi/build/binutils-gdb/gdbserver/gdbserver+0x1352bd) 0x60f0000000e9 is located 169 bytes inside of 176-byte region [0x60f000000040,0x60f0000000f0) freed by thread T0 here: #0 0x7ff9d6c6f0c7 in operator delete(void*) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:160 #1 0x55bcbf910d00 in remove_process(process_info*) /home/smarchi/src/binutils-gdb/gdbserver/inferiors.cc:164 #2 0x55bcbf9c4ac7 in linux_process_target::remove_linux_process(process_info*) /home/smarchi/src/binutils-gdb/gdbserver/linux-low.cc:454 #3 0x55bcbf9cdaa6 in linux_process_target::mourn(process_info*) /home/smarchi/src/binutils-gdb/gdbserver/linux-low.cc:1599 #4 0x55bcbf988dc4 in target_mourn_inferior(ptid_t) /home/smarchi/src/binutils-gdb/gdbserver/target.cc:205 #5 0x55bcbfa32020 in startup_inferior(process_stratum_target*, int, int, target_waitstatus*, ptid_t*) /home/smarchi/src/binutils-gdb/gdbserver/../gdb/nat/fork-inferior.c:515 #6 0x55bcbfa2fdeb in post_fork_inferior(int, char const*) /home/smarchi/src/binutils-gdb/gdbserver/fork-child.cc:111 #7 0x55bcbf9c9199 in linux_process_target::create_inferior(char const*, std::__debug::vector<char*, std::allocator<char*> > const&) /home/smarchi/src/binutils-gdb/gdbserver/linux-low.cc:991 #8 0x55bcbf954549 in captured_main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:3941 #9 0x55bcbf9552f0 in main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:4084 #10 0x7ff9d663b0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2) previously allocated by thread T0 here: #0 0x7ff9d6c6e5a7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99 #1 0x55bcbf910ad0 in add_process(int, int) /home/smarchi/src/binutils-gdb/gdbserver/inferiors.cc:144 #2 0x55bcbf9c477d in linux_process_target::add_linux_process_no_mem_file(int, int) /home/smarchi/src/binutils-gdb/gdbserver/linux-low.cc:425 #3 0x55bcbf9c8f4c in linux_process_target::create_inferior(char const*, std::__debug::vector<char*, std::allocator<char*> > const&) /home/smarchi/src/binutils-gdb/gdbserver/linux-low.cc:985 #4 0x55bcbf954549 in captured_main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:3941 #5 0x55bcbf9552f0 in main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:4084 #6 0x7ff9d663b0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2) Above we see that in the non-existing-program case, the process gets deleted before the starting_up flag gets restored to false. This happens because startup_inferior calls target_mourn_inferior before throwing an error, and in GDBserver, unlike in GDB, mourning deletes the process. Fix this by not using a scoped_restore to manage the starting_up flag, since we should only clear it when startup_inferior doesn't throw. Change-Id: I67325d6f81c64de4e89e20e4ec4556f57eac7f6c

When building gdb with -fsanitize=thread and gcc 12, and running test-case gdb.dwarf2/dwz.exp, we run into a data race between thread T2 and the main thread in the same write: ... Write of size 1 at 0x7b200000300c:^M #0 cutu_reader::cutu_reader(dwarf2_per_cu_data*, dwarf2_per_objfile*, \ abbrev_table*, dwarf2_cu*, bool, abbrev_cache*) gdb/dwarf2/read.c:6252 \ (gdb+0x82f3b3)^M ... which is here: ... this_cu->dwarf_version = cu->header.version; ... Both writes are called from the parallel for in dwarf2_build_psymtabs_hard, this one directly: ... #1 process_psymtab_comp_unit gdb/dwarf2/read.c:6774 (gdb+0x8304d7)^M #2 operator() gdb/dwarf2/read.c:7098 (gdb+0x8317be)^M #3 operator() gdbsupport/parallel-for.h:163 (gdb+0x872380)^M ... and this via the PU import: ... #1 cooked_indexer::ensure_cu_exists(cutu_reader*, dwarf2_per_objfile*, \ sect_offset, bool, bool) gdb/dwarf2/read.c:17964 (gdb+0x85c43b)^M #2 cooked_indexer::index_imported_unit(cutu_reader*, unsigned char const*, \ abbrev_info const*) gdb/dwarf2/read.c:18248 (gdb+0x85d8ff)^M #3 cooked_indexer::index_dies(cutu_reader*, unsigned char const*, \ cooked_index_entry const*, bool) gdb/dwarf2/read.c:18302 (gdb+0x85dcdb)^M #4 cooked_indexer::make_index(cutu_reader*) gdb/dwarf2/read.c:18443 \ (gdb+0x85e68a)^M #5 process_psymtab_comp_unit gdb/dwarf2/read.c:6812 (gdb+0x830879)^M #6 operator() gdb/dwarf2/read.c:7098 (gdb+0x8317be)^M #7 operator() gdbsupport/parallel-for.h:171 (gdb+0x8723e2)^M ... Fix this by setting the field earlier, in read_comp_units_from_section. The write in cutu_reader::cutu_reader() is still needed, in case read_comp_units_from_section is not used (run the test-case with say, target board cc-with-gdb-index). Make the write conditional, such that it doesn't trigger if the field is already set by read_comp_units_from_section. Instead, verify that the field already has the value that we're trying to set it to. Move this logic into into a member function set_version (in analogy to the already present member function version) to make sure it's used consistenly, and make the field private in order to enforce access through the member functions, and rename it to m_dwarf_version. While we're at it, make sure that the version is set before read, to avoid say returning true for "per_cu.version () < 5" if "per_cu.version () == 0". Tested on x86_64-linux.

palmer-dabbelt added 5 commits September 2, 2015 16:07

Support --host=riscv{32,64}-unknown-linux-gnu

477f351

These are the canonical RISC-V tuples, despite the fact that our Linux port doesn't actually report those currently. This allows GDB to build on these sorts of platforms.

Move riscv_isa_regsize() to riscv-tdep.h

0889c00

This is needed in riscv-linux-tdep.c.

Fix an off-by-one in register walking

cdf2208

I'm not actually sure if the problem here is in the loop or the array, but GDB threw a warning here.

Add some missing objects to the RISC-V Linux target

b01d7ed

Without these GDB won't link.

Work around a merge change by disabling rv32

4f0023e

I can't figure out how to do this correctly, but I kind of need GDB so for now I'm just going to disable rv32 support here. The correct fix has to be super easy...

vapier added a commit that referenced this pull request Sep 4, 2015

Merge pull request #1 from palmer-dabbelt/native

c8f0d74

Make GDB compile in native mode

vapier merged commit c8f0d74 into riscvarchive:riscv-gdb Sep 4, 2015

palmer-dabbelt deleted the native branch November 9, 2015 07:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make GDB compile in native mode #1

Make GDB compile in native mode #1

palmer-dabbelt commented Sep 4, 2015

vapier commented Sep 4, 2015

Make GDB compile in native mode #1

Make GDB compile in native mode #1

Conversation

palmer-dabbelt commented Sep 4, 2015

vapier commented Sep 4, 2015