New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix debugger re-entry while detaching #599
Conversation
d4f1ec9
to
4c2ca52
Compare
Alright, I'll test this out in a bit and let you know if it fixes the issue. |
Ok, I'm still testing manually as we speak, but if you also test no harm :) |
Ok, testing in master by adding Can you confirm the branch fixes your issue too? |
4c2ca52
to
85a9601
Compare
Sure, let me pull the branch on my local Duktape repo and I'll test it. |
Detaching and reattaching at the debugger statement seems to work now. However now there's a different issue: I can't detach+reattach while the the target is running (it causes a detach like the original bug). |
Hmm, I wonder why. Could you post a debug log about that? The reattach should automatically make the client paused but maybe something goes wrong with that. It should be unrelated to |
Okay, something weird is going on here, minisphere is resuming execution instead of waiting for the debugger to reattach even though I ran it with the
|
The segfault points to a larger issue (I couldn't get it to happen again), but here's the log from the point I forcibly terminated SSJ and then reattached:
|
Hmm, it'd be nice if you could add a debug log to the |
New log, from first attach to minisphere termination:
|
It looks like it immediately detached after |
In case it's relevant, here's my reattach code: |
Pushed one more assert to test with - it might be that we're in "detaching" state because an error which happens here: we'd then exit the message loop without reattaching here: which would be incorrect. I added an assert there to see if this is indeed what's happening in your case. If so, it's much clearer what to fix. |
Okay, I'll try it out with assertions enabled and see what happens. |
|
That's actually a different assert that you hit this time. But it's a similar case: an earlier Added one more assert (and a potential fix). I'd first like to see where the root cause is. It'd then probably be best to allow |
Well, no assertions were triggered this time, but it still detaches upon the second reattach. |
|
@fatcerberus I'm having a hard time reproducing the same behavior - could you add debug prints to your read, write, and detached callbacks so it'd be easier to see when those calls happen in relation to Duktape logging (ideally it'd be the same stream i.e. I'll read a bit more code soonish and try to figure out what's going on. |
Hmm, if I read the Minisphere code correctly, you're calling also |
Even with a |
I don't know the exact reason for that detach-reattach cycle - I think I did it to ensure that I was always starting from a clean slate. I'll run minisphere under the MSVC debugger and see if I can trace the code path later. |
That detach+attach is not explicitly supported and I was assuming you'd only call attach in the initial attach or detached callback which is why the log looked weird at first:
There's a "detach1" even in the initial attach because it is preceded by a detach. And because Anyway, this log entry looks weird to me too (it's in the "reattach" position of the commented log):
The only relevant call site for that is
Commenting a bit on that snippet:
I might be able to understand better if you re-ran this test with the few added debug commits I pushed, and if you could debug print from the read, write, and detached callbacks :) |
The heap is freed because, when run with the However that shouldn't happen either--it should be waiting up to 30 seconds for a new connection before terminating. I'm not sure why but it's terminating immediately rather than waiting. |
@harold-b A longjmp could most likely cause that. It would've had to happen between setting In any case, with the fix to NULL all the I/O callbacks the first read error would have NULLed all callbacks other than the detached callback. This would be better behavior but still not correct because the latter half of detach1() would still have been skipped due to However, I'm not sure why a longjmp() would happen between that short span of code, other than e.g. out-of-memory. But that's hard to imagine because the Detaching notify consists of just three integers, i.e. no strings or other dynamic data. So unless there's a way to reproduce it's quite difficult to do anything about it - I can't explain that behavior by simply reading the code. At least right now, I can retry tomorrow :-) |
So unless there's a way to reproduce it's quite difficult to do anything about it - I can't explain that behavior by simply reading the code. At least right now, I can retry tomorrow :-) Oh no worries, I'm not asking for a fix, since I could hardly even reproduce it. I was just trying to point out that I couldn't understand the state of that bug give how the function was set up to protect itself! |
@harold-b @svaarala |
@fatcerberus I can't imagine why NULLing the other callbacks would cause any (at least harmful) side effects, but I won't merge this branch until there's been some more testing on the changes. I'll do my best with what I can test with once I've had a chance to finalize cleaning up the original fix. |
@fatcerberus Yeah, everything is working perfectly now though. Thanks again to both of you. |
I'm assuming the minimal fix (NULL'ing only write_cb) is what will be merged to master for 1.4.1? |
Not master, I meant the 1.4.x maintenance branch. |
Stepping back a bit, there have been several small bugs related to the debugger state handling. That indicates pretty clearly that the state handling is not very well structured. I'll try to keep an eye on that as the code is worked on. The next bigger thing would be to add setjmp() protection to guard against unlikely causes such as out of memory (which are not handled now), and maybe I can try to restructure some of the code a bit in the process. Anyway, if any new issues arise (especially something reproducible), I'll be eager to investigate and fix :-) @fatcerberus Pros and cons to each approach, but I'd favor the minimal fix for 1.4.1 and the cleaner fix for master. No fix is needed prior to 1.4 because there's no Detaching callback there (nor detach1/detach2 split for that matter; and of course no support for debugger reattach in detached callback :). |
I always found the concept of bugs in a debugger amusing. It's half the fun of developing SSJ, I always get such a kick out of telling people my debugger is buggy. :) |
a288a3e
to
e3ec6f0
Compare
* Set SIGPIPE handler to SIG_IGN when signal setup is enabled. This avoids hard exits due to SIGPIPE when debugger transport connections break. * Add --reattach option to allow the auto-reattach test to be done without modifying duk_cmdline.c.
e3ec6f0
to
08b2cf4
Compare
Fix two separate bugs: * Potential to re-enter the debugger recursively when a detach happened inside the debugger message loop. * Potential for detach initiated from outside the debugger message loop to be never finalized. The re-entry bug was caused by `dbg_processing` being set to 0 in detach1() while in the debugger message loop. This caused the potential for the debugger code to be re-entered recursively before the detach was finalized using detach2(), which could have various harmful side effects. The fix is to keep `dbg_processing` set to 1 at all times when still in the debugger message loop. The detach finalization bug was triggered by a debug transport callback called outside the debugger message loop indicating the transport connection had broken (this would concretely happen e.g. for a periodic Status notify). This caused detach1() to run which, among other things, sets dbg_read_cb to NULL and dbg_detaching to 1. The intent is then for detach2() to finalize the detach (this is separated from detach1() to allow reattach-during-detached-callback to be handled correctly). However, when dbg_read_cb was set to NULL outside the debugger message loop, the message loop would never be entered again -- and only the message loop has handling for finalizing a detach using detach2(). The fix for this is to allow the executor to enter the debugger message loop when a detach is pending (dbg_read_cb == NULL and dbg_detaching == 1) so that the detach can be correctly finalized. The message loop detach2() handling now also supports the corner case where user code reattaches in the detached callback but a detach is immediately triggered while still inside the detached callback. This may happen because `duk_debugger_attach()` writes the debugger protocol handshake line inline inside the API call using the transport write callback, and that callback may fail and indicate a transport error. The fix is to then issue another detached callback etc, until we are either cleanly detached or cleanly attached and can proceed. Finally, the control flow was changed so that the `dbg_processing` flag is set and cleared inside the debugger message loop now rather than in several separate call sites outside of it. This should be less error prone. Both bugs fixed in this commit were introduced in Duktape 1.4.0 as part of adding the Detaching notify and splitting detach handling into detach1() and detach2(). As such, the fixes don't need to be backported to Duktape 1.3.x etc.
Fix a bug in `duk_debug_write_byte()` where a transport write callback error (indicating a transport detached/broken condition) did not NULL the transport `dbg_write_cb` as intended. As a result, the detach1() handler (which runs immediately afterwards) would try to write a "Detaching" notify and invoke the transport write callback after it already returned an error. This is a violation of the debug transport contract. Fixed by making duk_debug_write_byte() simply call duk_debug_write_bytes() which has correct handling for the write_cb NULLing. This reduces call sites at the cost of being a little less optimal. However, the same approach is already used for read helpers so this change makes the handling symmetric.
NULL all debugger transport I/O callbacks (i.e. callbacks other than detaching_cb) whenever the transport indicates an error has occurred. By the API contract we're not allowed to call any of them after transport read/write error so it's safest to NULL them all on a transport error. This is not strictly necessary because only the write_cb is used by detach1() currently before NULLing the callbacks anyway. However, NULLing them early on transport error ensures there's no potential to call them incorrectly even when the code is reorganized later.
08b2cf4
to
ca6c628
Compare
I've cleaned up the fixes, improved commit messages, squashed and rebased now and I'd be ready to merge this from my point of view. I can't reproduce any issues with the updates - but it'd be great if you could double check that from the tip. @fatcerberus One issue in particular: while the debugger is running Duktape will log these two log lines around 5 times a second (total 10 log lines per second or so):
Do you still get way more than that? There's a change in this behavior because I tried to simplify the executor handling of Status notify sending. It's now folded into the debugger message loop which now sends a Status if the state is marked dirty, even when no messages end up being processed. This may technically increase the peek() callback rate slightly - but there should be very little practical difference: the earlier code would send a status when dirty without entering the debugger loop, but that dirty state could only really be set by (a) becoming paused (in which case we enter the message loop anyway) or (2) the timer based periodic debugger poll check (in which case we enter the message loop too). For me the debug logging rate is reasonable, and I've double checked that the Unix example debug transport peek() rate is also reasonable for me. |
Let me try it out at I'll let you know. The thing is, the Windows terminal emulator is very slow, and gets progressively worse as the number of visible lines increases. This makes Duktape literally unusable for me with |
Yes, But is |
Just tested it. Yes, that's reasonable. At one point when we were testing fixes it was literally printing those two lines nonstop, it's much better now. |
Yeah, that was the intermediate version where status sending was moved to the message loop, but the message loop wouldn't actually send a Status unless a message was processed. As a result, the state dirty -flag would remain set and we'd re-enter the message loop over and over again. Anyway, that's fixed and if the branch is still working correctly I'll re-read the final diff and merge. |
I tested various scenarios: Stopped at breakpoint, single-step, debugger statement, disconnect-while-running... and detached and reattached several times, everything looks to be working correctly. |
Fix debugger re-entry while detaching
Only backport the recursive re-entry into debugger after detach issue.
Fixes #597. Possibly related to debugger read callback re-entry issue described in #591.
Tasks: