[lldb, gdb-remote] Bug in SendContinuePacketAndWaitForResponse (with Patch/Fix proposal) #59641

Open
mf-RDP opened this issue Dec 21, 2022 · 6 comments

@mf-RDP

mf-RDP commented Dec 21, 2022

Hi all,

I think I discovered an issue in the async thread for GDB remote.
When stepping at high speed (maximum keyboard repeat rate, roughly 30 'step' commands per second) through a multithreaded program, the async thread that calls SendContinuePacketAndWaitForResponse occasionally, in very rare cases, receives not the T05 stop-reply packet but the normal response to a query packet sent by another thread. This of course results in a dropped connection.

Below is an example log output.

I tried to fix this; please see the attached code. I don't think the fix is practically applicable, as it drops packet bytes whenever it accidentally receives a normal response while expecting a T05 packet, but it does make the problem go away.
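
Conceptually, the workaround boils down to something like this (simplified sketch with hypothetical names, not the attached diff itself):

#include <cstring>
#include <functional>
#include <string>

// Conceptual sketch only (hypothetical names): read_payload is assumed to
// return the payload of the next packet, i.e. the text between '$' and '#'.
std::string WaitForStopReply(const std::function<std::string()> &read_payload) {
  for (;;) {
    std::string payload = read_payload();
    // T/S/W/X introduce the stop-reply families. Anything else seen here is
    // a stray response that belonged to another request; dropping it hides
    // the symptom but loses that reply for the thread that asked for it,
    // which is why this is not a proper fix.
    if (!payload.empty() && std::strchr("TSWX", payload[0]) != nullptr)
      return payload;
  }
}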

Unfortunately, I am unsure what the root cause could be. As far as I can see, the threads are supposed to lock the underlying network connection before using it.

Many Thanks
S.

[LOG]

 Fetching extended information for thread 0001
 <  37> send packet: $jThreadExtendedInfo:{"thread":1}]#35
 < 749> read packet: $[{"tid":1,"metype":0,"mecount":0,"medata":[],"name":"Main Thread","qname":"default","qkind":"concurrent","reason":"breakpoint","qaddr":0,"registers":["0","79FF0100C0000000","1","F7FFFFFFBF000000","2","F6FFFFFFBF000000","3","BEFA0100C0000000"],"memory":[]}],{"tid":2,"metype":0,"mecount":0,"medata":[],"name":"Worker Thread","qname":"default","qkind":"concurrent","reason":"breakpoint","qaddr":0,"registers":["0","B2A30000C0000000","1","00000000C0000000","2","FFFFFFFFBF000000","3",null],"memory":[]}],{"tid":3,"metype":0,"mecount":0,"medata":[],"name":"Worker Thread","qname":"default","qkind":"concurrent","reason":"breakpoint","qaddr":0,"registers":["0","B2A30000C0000000","1","00000000C0000000","2","FFFFFFFFBF000000","3",null],"memory":[]}]]#35
 <  20> send packet: $qSWasmCallStack2#00
 < 935> read packet: ${"trace_messages":[{"message":""}],{"message":"call: _start[] returns[]"}],{"message":"stack_depth: 0"}],{"message":"call_state: none_or_executing"}],{"message":""}],{"message":"call: __main_void[] returns[int32]"}],{"message":"stack_depth: 0"}],{"message":"call_state: none_or_executing"}],{"message":""}],{"message":"call: __main_argc_argv[int32=[1] int32=[89104]] returns[int32]"}],{"message":"stack_depth: 0"}],{"message":"call_state: none_or_executing"}],{"message":""}],{"message":"call: main[int32=[0] int32=[0]] returns[int32]"}],{"message":"stack_depth: 0"}],{"message":"call_state: none_or_executing"}],{"message":""}],{"message":"call: _Z18debugger_terminatev[] returns[]"}],{"message":"stack_depth: 0"}],{"message":"call_state: none_or_executing"}],{"message":""}],{"message":"call: _Z11printcolorsv[] returns[]"}],{"message":"stack_depth: 0"}],{"message":"call_state: none_or_executing"}],{"message":"[truncated]"}]]}]#e7
 <  19> send packet: $p2;thread:0002;#8c
 <  25> read packet: $C00000A3B2,C000000000#3a
 Fetching extended information for thread 0002
 <  37> send packet: $jThreadExtendedInfo:{"thread":2}]#36
 <  20> read packet: $FFFFFFFFBF000000#d8
 <  20> send packet: $qSWasmCallStack3#01
 < 209> read packet: ${"trace_messages":[{"message":""}],{"message":"call: _Z14infiniteloop_APv[int32=[3366064]] returns[int32]"}],{"message":"stack_depth: 0"}],{"message":"call_state: state_memorysafe_bit"}],{"message":""}]]}]#bb
 <  19> send packet: $p2;thread:0003;#8d
 <  25> read packet: $C00000A3B2,C000000000#3a
 Fetching extended information for thread 0003
 <  37> send packet: $jThreadExtendedInfo:{"thread":3}]#37
 <  20> read packet: $FFFFFFFFBF000000#d8
 <  17> send packet: $xc00001ff6e,b#c1
 < 206> read packet: ${"trace_messages":[{"message":""}],{"message":"call: _Z14infiniteloop_APv[int32=[3456176]] returns[int32]"}],{"message":"stack_depth: 0"}],{"message":"call_state: none_or_executing"}],{"message":""}]]}]#84
 Check if need to update ignored signals
 Resuming thread: 0001 with state: running.
 Resuming thread: 0002 with state: suspended.
 Resuming thread: 0003 with state: suspended.
 ProcessGDBRemote::Resume()
 ProcessGDBRemote::lldb_private::process_gdb_remote::ProcessGDBRemote::AsyncThread(pid = 2) Got an event of type: 1...
 ProcessGDBRemote::lldb_private::process_gdb_remote::ProcessGDBRemote::AsyncThread(pid = 2) got eBroadcastBitAsyncContinue: vCont;c:1
 GDBRemoteClientBase::ContinueLock::lldb_private::process_gdb_remote::GDBRemoteClientBase::ContinueLock::lock() resuming with vCont;c:1
 <  13> send packet: $vCont;c:1#13
 <  15> read packet: $0b0340200e210f411c2116#40
 GDBRemoteClientBase::lldb_private::process_gdb_remote::GDBRemoteClientBase::SendContinuePacketAndWaitForResponse () got packet: ��@ �!�A�!�        
 GDBRemoteClientBase::lldb_private::process_gdb_remote::GDBRemoteClientBase::SendContinuePacketAndWaitForResponse () unrecognized async packet
 ProcessGDBRemote::lldb_private::process_gdb_remote::ProcessGDBRemote::AsyncThread(pid = 2) thread exiting...

gdbremote_diff.txt

@mf-RDP
Author

mf-RDP commented Dec 21, 2022

If anyone has any questions, please ask. I will try to assist, although my knowledge of the gdb-remote internals is of course very limited.

@llvmbot
Collaborator

llvmbot commented Dec 21, 2022

@llvm/issue-subscribers-lldb

@labath
Collaborator

labath commented Dec 21, 2022

We already have code to prevent this (see the Lock and ContinueLock classes in GDBRemoteClientBase.cpp), but of course it's possible they have bugs.
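
For reference, a very rough sketch of the invariant those classes are meant to enforce (illustrative only, not the actual implementation):

#include <mutex>
#include <string>

// Whoever holds the mutex owns the connection for one complete send/receive
// exchange, and the thread that sent vCont keeps it until the stop reply
// arrives, so no other thread can read a reply that was not meant for it.
class ClientSketch {
  std::mutex m_comm_mutex;

public:
  std::string SendPacketAndWaitForReply(const std::string &payload) {
    std::unique_lock<std::mutex> lock(m_comm_mutex, std::try_to_lock);
    if (!lock.owns_lock())
      return ""; // caller logs "didn't get sequence mutex" and retries later
    SendRaw(payload);
    return ReadRaw(); // can only be the reply to `payload`
  }

private:
  void SendRaw(const std::string &) {}        // placeholder transport
  std::string ReadRaw() { return "$OK#9a"; }  // placeholder transport
};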

Looking at your log snippet, I get the feeling that the problems start long before the async resume thread does anything:

< 37> send packet: $jThreadExtendedInfo:{"thread":1}]#35
< 749> read packet: $[{"tid":1,"metype":0,"mecount":0,"medata":[],"name":"Main Thread","qname":"default","qkind":"concurrent","reason":"breakpoint","qaddr":0,"registers":["0","79FF0100C0000000","1","F7FFFFFFBF000000","2","F6FFFFFFBF000000","3","BEFA0100C0000000"],"memory":[]}],{"tid":2,"metype":0,"mecount":0,"medata":[],"name":"Worker Thread","qname":"default","qkind":"concurrent","reason":"breakpoint","qaddr":0,"registers":["0","B2A30000C0000000","1","00000000C0000000","2","FFFFFFFFBF000000","3",null],"memory":[]}],{"tid":3,"metype":0,"mecount":0,"medata":[],"name":"Worker Thread","qname":"default","qkind":"concurrent","reason":"breakpoint","qaddr":0,"registers":["0","B2A30000C0000000","1","00000000C0000000","2","FFFFFFFFBF000000","3",null],"memory":[]}]]#35
< 20> send packet: $qSWasmCallStack2#00
< 935> read packet: ${"trace_messages":[{"message":""}],{"message":"call: _start[] returns[]"}],{"message":"stack_depth: 0"}],{"message":"call_state: none_or_executing"}],{"message":""}],{"message":"call: __main_void[] returns[int32]"}],{"message":"stack_depth: 0"}],{"message":"call_state: none_or_executing"}],{"message":""}],{"message":"call: __main_argc_argv[int32=[1] int32=[89104]] returns[int32]"}],{"message":"stack_depth: 0"}],{"message":"call_state: none_or_executing"}],{"message":""}],{"message":"call: main[int32=[0] int32=[0]] returns[int32]"}],{"message":"stack_depth: 0"}],{"message":"call_state: none_or_executing"}],{"message":""}],{"message":"call: _Z18debugger_terminatev[] returns[]"}],{"message":"stack_depth: 0"}],{"message":"call_state: none_or_executing"}],{"message":""}],{"message":"call: _Z11printcolorsv[] returns[]"}],{"message":"stack_depth: 0"}],{"message":"call_state: none_or_executing"}],{"message":"[truncated]"}]]}]#e7
< 19> send packet: $p2;thread:0002;#8c
< 25> read packet: $C00000A3B2,C000000000#3a

This doesn't look like a correct response to the p packet.

Fetching extended information for thread 0002
< 37> send packet: $jThreadExtendedInfo:{"thread":2}]#36
< 20> read packet: $FFFFFFFFBF000000#d8

Ditto for jThreadExtendedInfo. Could this be the response to p instead?

< 20> send packet: $qSWasmCallStack3#01
< 209> read packet: ${"trace_messages":[{"message":""}],{"message":"call: _Z14infiniteloop_APv[int32=[3366064]] returns[int32]"}],{"message":"stack_depth: 0"}],{"message":"call_state: state_memorysafe_bit"}],{"message":""}]]}]#bb
< 19> send packet: $p2;thread:0003;#8d
< 25> read packet: $C00000A3B2,C000000000#3a

same here

Fetching extended information for thread 0003
< 37> send packet: $jThreadExtendedInfo:{"thread":3}]#37
< 20> read packet: $FFFFFFFFBF000000#d8

and here

< 17> send packet: $xc00001ff6e,b#c1
< 206> read packet: ${"trace_messages":[{"message":""}],{"message":"call: _Z14infiniteloop_APv[int32=

Is this a response to jThreadExtendedInfo instead?

@mf-RDP
Author

mf-RDP commented Dec 21, 2022

Hi Pavel,

thanks for looking at this so promptly. You are absolutely right: the whole send/receive sequence in the log looks mixed up. I did not really notice, because I looked primarily at my own log (the other end, the remote stub).

I will investigate this and report back to you.

In the meantime, I found and fixed an issue on my end that could potentially have corrupted memory within lldb. Although I think it was not really related to what I'm reporting, I need to investigate further and come back.
The whole thing is a bit hard to track down, as it only appears randomly and very rarely, under high packet throughput.

Many Thanks

@mf-RDP
Author

mf-RDP commented Dec 21, 2022

OK, I tracked one case down a bit better. For the moment, please see the attached text file; line 50 is the point where things seem to get out of sync. As of now, I have no idea what on my end could trigger this behaviour.
better_sample.txt

@mf-RDP
Author

mf-RDP commented Dec 22, 2022

Hi again,

I tried to rule out anything that could be related to my end. I also disabled the replies for the JSON thread packets jThreadsInfo and jThreadExtendedInfo (everything still runs fine, since I implement all the other required packets).

The issue remains. It appears I could narrow it down a bit: [Edit: unfortunately it does not narrow down to only this case. I'm also seeing other combinations, but this one accounts for about 90% of the errors I get.]

The send/receive sequence really seems to get out of sync once lldb sends a qThreadStopInfoX packet but receives a %T05 stop-reply notification instead of a $T05 normal reply.
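
For reference, a minimal sketch of the distinction as I understand it (hypothetical names, and assuming the stub really frames the extra stop reply as a '%' notification):

#include <string>

// A '$' packet answers the request that was just sent, while a '%' packet
// is asynchronous and has to be queued separately. Treating a '%' packet as
// the reply to qThreadStopInfo leaves the client one packet behind for
// everything that follows.
enum class PacketKind { Reply, Notification, Junk };

PacketKind Classify(const std::string &raw) {
  if (raw.empty())
    return PacketKind::Junk;
  if (raw[0] == '$')
    return PacketKind::Reply;        // e.g. "$T05thread:1;#xx"
  if (raw[0] == '%')
    return PacketKind::Notification; // e.g. "%Stop:T05thread:1;#xx"
  return PacketKind::Junk;
}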

It still runs a bit afterwards but will error out soon.

Please see attached log#2 as well.

Note: one thing I notice is that I get the following message rather often, although it does not seem to do any harm:
GDBRemoteCommunicationClient::lldb_private::process_gdb_remote::GDBRemoteCommunicationClient::SendThreadSpecificPacketAndWaitForResponse: Didn't get sequence mutex for p2 packet.
It is always the 'p2' packet.

Many Thanks

I have also attached a large, complete log; the problem arises in lines 10584 - 10586.

One more edit: maybe the problems result from lldb receiving two T05 stop replies. Confirmed: strictly preventing the stub from sending more than a single stop reply after one run/continue/vCont keeps the issue from appearing. So I think this can be considered resolved, although an error message in the log or something similar would be great, as the problem is hard to identify.
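
A minimal sketch of such a guard on the stub side (simplified, hypothetical names, not my actual stub code):

// Arm the flag on every resume packet and refuse to emit a second stop
// reply until the next resume.
struct StubState {
  bool stop_reply_pending = false;
};

void OnResumePacket(StubState &s) {   // vCont / c / s received
  s.stop_reply_pending = true;
}

bool MaySendStopReply(StubState &s) { // checked before emitting $T05...
  if (!s.stop_reply_pending)
    return false;                     // this resume was already answered
  s.stop_reply_pending = false;
  return true;
}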

One other thing I came across is an assertion in ProcessGDBRemote::WillPublicStop(): the const size n is != 0 when it is read, but by the time the for loop iterates it is 0. So some other thread must have done something to the StructuredData::Array *thread_infos.
Turning off support for the JSON threads info packet solved this.
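
For illustration only (generic C++, not LLDB's actual code), the pattern that avoids this kind of hazard is to take a snapshot of the shared data under a lock before iterating:

#include <memory>
#include <mutex>
#include <vector>

// If another thread can clear or replace the shared array between reading
// its size and iterating over it, the loop observes a size of 0 mid-flight.
// Copying the shared_ptr under a lock gives the loop a stable view.
std::mutex g_infos_mutex;
std::shared_ptr<std::vector<int>> g_thread_infos;

void IterateSnapshot() {
  std::shared_ptr<std::vector<int>> snapshot;
  {
    std::lock_guard<std::mutex> guard(g_infos_mutex);
    snapshot = g_thread_infos; // the copy cannot change size under us
  }
  if (!snapshot)
    return;
  for (int info : *snapshot)
    (void)info; // process one entry
}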

log2.txt
lldb_log.txt (large)
