Skip to content
This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

Not restart debugging #1942

Closed
changliao1025 opened this issue Nov 21, 2019 · 4 comments
Closed

Not restart debugging #1942

changliao1025 opened this issue Nov 21, 2019 · 4 comments

Comments

@changliao1025
Copy link

Issue Type: Bug

Debug a Python script;
Stop at a breakpoint;
Then hit restart, but nothing happened.

Extension version: 2019.11.49689
VS Code version: Code 1.40.1 (8795a9889db74563ddd43eb0a897a2384129a619, 2019-11-13T16:47:44.719Z)
OS version: Darwin x64 18.7.0
Remote OS version: Linux x64 3.10.0-957.5.1.el7.x86_64

System Info
Item Value
CPUs Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz (8 x 4200)
GPU Status 2d_canvas: enabled
flash_3d: enabled
flash_stage3d: enabled
flash_stage3d_baseline: enabled
gpu_compositing: enabled
metal: disabled_off
multiple_raster_threads: enabled_on
oop_rasterization: disabled_off
protected_video_decode: unavailable_off
rasterization: enabled
skia_renderer: disabled_off
surface_control: disabled_off
surface_synchronization: enabled_on
video_decode: enabled
viz_display_compositor: enabled_on
viz_hit_test_surface_layer: disabled_off
webgl: enabled
webgl2: enabled
Load (avg) 2, 2, 2
Memory (System) 32.00GB (15.43GB free)
Process Argv
Screen Reader no
VM 0%
Item Value
Remote SSH: compy
OS Linux x64 3.10.0-957.5.1.el7.x86_64
CPUs Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (80 x 1000)
Memory (System) 187.36GB (100.51GB free)
VM 0%
@karthiknadig karthiknadig transferred this issue from microsoft/vscode-python Nov 21, 2019
@int19h
Copy link
Contributor

int19h commented Nov 21, 2019

There are no errors or really anything unusual in ptvsd logs here, but this gets logged by VSC in Output -> Log (Window):

[2019-11-21 15:30:23.432] [renderer1] [error] timeout after 500 ms: Error: timeout after 500 ms
    at t.RawDebugSession.handleErrorResponse (file:///C:/Program Files/Microsoft VS Code/resources/app/out/vs/workbench/workbench.desktop.main.js:2798:1003)
    at file:///C:/Program Files/Microsoft VS Code/resources/app/out/vs/workbench/workbench.desktop.main.js:2798:434
    at async t.RawDebugSession.shutdown (file:///C:/Program Files/Microsoft VS Code/resources/app/out/vs/workbench/workbench.desktop.main.js:2796:536)
    at async R.terminate (file:///C:/Program Files/Microsoft VS Code/resources/app/out/vs/workbench/workbench.desktop.main.js:5173:637)
    at async q.restartSession (file:///C:/Program Files/Microsoft VS Code/resources/app/out/vs/workbench/workbench.desktop.main.js:5197:851)

which corresponds to:
https://github.com/microsoft/vscode/blob/a318b97373602cf81fc146ff93547f5fae7cd417/src/vs/workbench/contrib/debug/browser/rawDebugSession.ts#L478-L491

So, the problem is that IDE gives us 500 ms to terminate, and after that considers it an error. The problem is that it takes us longer to fully handle a graceful shutdown - in my repro, around 600 ms from the moment we receive "disconnect", to the moment we respond to it.

The error itself occurs when using "Stop" as well, but in that case the only side effect is it being logged by VSC. However, if using "Restart", VSC will simply not proceed if it failed to stop the process. It also doesn't display any error message, so if you don't have the log window opened, there's no indication as to what went wrong.

@int19h
Copy link
Contributor

int19h commented Nov 21, 2019

We can reduce the timeout for process exit to match what VSC expects, but this means that we will almost always be force-killing the debuggee process from the launcher if "Stop" or "Restart" is used, instead of letting pydevd terminate the process from inside in response to "disconnect" with "terminateDebuggee":true that we send to it. At the minimum, this means that child processes might not get cleaned up. @fabioz, is there anything else on the pydevd shutdown path that would result in some adverse effects if it doesn't get a chance to run to completion?

@fabioz
Copy link
Contributor

fabioz commented Nov 22, 2019

@int19h the timeout is probably due to: https://github.com/microsoft/ptvsd/blob/master/src/ptvsd/_vendored/pydevd/pydevd.py#L1983

The shutdown procedure in pydevd is the following: it shuts down the the writer thread socket (https://github.com/microsoft/ptvsd/blob/master/src/ptvsd/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py#L401) and waits for the client (in this case ptvsd) to close its own socket so that the reader thread finishes (https://github.com/microsoft/ptvsd/blob/master/src/ptvsd/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py#L293) -- and we wait for all threads to finish given that timeout.

This is done to ensure that all messages are received by the client before the shutdown.

So, I think that the first thing is making sure that ptvsd is closing the socket on its side when pydevd shuts the socket down in its side (maybe it's already done, so, just double checking here) and the second thing is making that timeout (https://github.com/microsoft/ptvsd/blob/master/src/ptvsd/_vendored/pydevd/pydevd.py#L1983) lower in the pydevd side -- probably 0.3 since it seems we have 0.1 of overhead and on another 0.1 we'd have to hard-kill pydevd.

As a note, this was all done to ensure messages are sent -- i.e. in #1699, which was closed and reopened.

I think that making that timeout lower may increase the likelihood that messages aren't received (especially on tests, if there are lots of tests running in parallel I think that it may be likely that under heavy load we may hit those limits, which may be the reason why we have this occurrences in the ci -- to be sure we could raise those limits to a big value... say, 20 seconds when running on the ci to see if we ever find the issue related to partial messages or not, so, maybe we could read the default timeout from an environment variable -- default would be 0.3 seconds and in the ci it'd be 20 seconds).

What do you think?

@int19h
Copy link
Contributor

int19h commented Nov 22, 2019

It's getting closed on the adapter side, but it's a bit convoluted. Here's the log showing the sequence, starting from when the adapter gets "disconnect" from the IDE:

D+00004.032: IDE[1] --> {
                 "seq": 15,
                 "type": "request",
                 "command": "disconnect",
                 "arguments": {
                     "restart": false
                 }
             }

I+00004.032: /handling #15 request "disconnect" from IDE[1]/
             IDE requested "disconnect"; finalizing Session-1.

D+00004.032: /handling #15 request "disconnect" from IDE[1]/
             Server[1] <-- {
                 "seq": 17,
                 "type": "request",
                 "command": "disconnect",
                 "arguments": {
                     "terminateDebuggee": true
                 }
             }

D+00004.032: Server[1] --> {
                 "pydevd_cmd_id": 502,
                 "seq": 44,
                 "type": "response",
                 "request_seq": 17,
                 "success": true,
                 "command": "disconnect"
             }

I+00004.032: /handling #15 request "disconnect" from IDE[1]/
             Session-1 waiting for "exited" event...

D+00004.032: Server[pid=23624] --> {
                 "pydevd_cmd_id": 129,
                 "seq": 46,
                 "type": "event",
                 "event": "terminated",
                 "body": {}
             }

D+00004.032: /handling #46 event "terminated" from Server[pid=23624]/
             Closing Server[pid=23624] message stream

D+00004.032: Exiting message loop for channel Server[pid=23624]: [WinError 10038] An operation was attempted on something that is not a socket

I+00004.032: All debug servers disconnected; waiting for remaining sessions...

D+00004.563: Launcher[1] --> {
                 "seq": 5,
                 "type": "event",
                 "event": "exited",
                 "body": {
                     "exitCode": 0
                 }
             }

D+00004.563: Launcher[1] --> {
                 "seq": 6,
                 "type": "event",
                 "event": "terminated"
             }

D+00004.563: /handling #15 request "disconnect" from IDE[1]/
             Launcher[1] <-- {
                 "seq": 2,
                 "type": "request",
                 "command": "terminate"
             }

D+00004.563: Launcher[1] --> {
                 "seq": 7,
                 "type": "response",
                 "request_seq": 2,
                 "success": true,
                 "command": "terminate"
             }

I+00004.563: /handling #15 request "disconnect" from IDE[1]/
             Session-1 waiting for Launcher[1] to disconnect...

D+00004.563: /handling #6 event "terminated" from Launcher[1]/
             IDE[1] <-- {
                 "seq": 24,
                 "type": "event",
                 "event": "exited",
                 "body": {
                     "exitCode": 0
                 }
             }

D+00004.563: /handling #6 event "terminated" from Launcher[1]/
             Closing Launcher[1] message stream

D+00004.563: Exiting message loop for channel Launcher[1]: [WinError 10038] An operation was attempted on something that is not a socket

D+00004.563: /handling #15 request "disconnect" from IDE[1]/
             IDE[1] <-- {
                 "seq": 25,
                 "type": "event",
                 "event": "terminated"
             }

I+00004.563: /handling #15 request "disconnect" from IDE[1]/
             Session-1 finalized.

D+00004.563: /handling #15 request "disconnect" from IDE[1]/
             IDE[1] <-- {
                 "seq": 26,
                 "type": "response",
                 "request_seq": 15,
                 "success": true,
                 "command": "disconnect"
             }

So first of all it passes "disconnect" over to the server, but adds "terminateDebuggee", since "launch" implies that. Then it waits for "exited" event from the launcher (which just sits and waits on the process handle, and sends that event when that wait is done) - but in the meantime continues processing messages from the server. The server sends "terminated" event shortly after, indicating that message queue is flushed; the adapter closes the socket to the server in response. Note that the time stamps here all indicate rapid processing.

The next thing is the launcher reporting that the debuggee has exited, and it's where the 500ms delay first appears. And it does appear to correspond to the 500ms timeout in pydevd code that you linked to, but it's not clear to me what else could be blocking it to trigger that timeout...

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants