
bug: debugger freezes when loading 1M goroutines #129

Open · ramya-rao-a opened this issue May 28, 2020 · 12 comments
Labels: debug/callstack (callstack handling related debug issues) · Debug (Issues related to the debugging functionality of the extension)

@ramya-rao-a (Contributor)
From microsoft/vscode-go#2562 by @aarzilli:

Debug https://github.com/aarzilli/delve_client_testing/blob/master/20bigstuff/main.go.

I expected a slight slowdown, but the debugger should keep working. What happened is that the debugger freezes, slowing down the whole system. Possible causes include VSCode-go trying to read the 100MB string, the 10 million stack frames, or the 1 million goroutines. Possibly all three.

@stamblerre stamblerre added the "Debug" label (Issues related to the debugging functionality of the extension) May 29, 2020
@stamblerre stamblerre changed the title from "Debugger freezes" to "bug: debugger freezes with 1000000 goroutines" Jun 3, 2020
@stamblerre stamblerre changed the title from "bug: debugger freezes with 1000000 goroutines" to "bug: debugger freezes on pathological code" Jun 3, 2020
@stamblerre stamblerre changed the title from "bug: debugger freezes on pathological code" to "bug: debugger freezes" Jun 3, 2020
@stamblerre (Contributor)

@aarzilli: Can dlv on the command-line handle this, or is this issue specific to VS Code Go?

@aarzilli (Contributor) commented Jun 3, 2020

Yes, both the command-line client and gdlv work with this. I haven't checked GoLand in a year and I don't remember if they fixed it. What this test does is check that reasonable limits are passed to ListGoroutines, Eval and Stacktrace. You will probably run into the fact that DAP doesn't have a way, in the protocol, to deal with too many goroutines.
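
For illustration, here is a minimal sketch of what "reasonable limits" can look like for a client of Delve's JSON-RPC API, assuming the service/rpc2 and service/api packages; exact signatures vary across Delve versions, so treat this as an approximation rather than the extension's actual code:

package main

import (
	"fmt"

	"github.com/go-delve/delve/service/api"
	"github.com/go-delve/delve/service/rpc2"
)

func main() {
	// Assumes a headless server: dlv debug --headless --listen=127.0.0.1:8181
	c := rpc2.NewClient("127.0.0.1:8181")

	// ListGoroutines is paged (start, count): ask for at most 100 goroutines.
	gs, nextg, err := c.ListGoroutines(0, 100)
	fmt.Println(len(gs), nextg, err)

	// Cap the stack depth at 50 instead of walking all 10 million frames.
	frames, err := c.Stacktrace(1, 50, 0, nil)
	fmt.Println(len(frames), err)

	// Bound how much of each variable is loaded; with MaxArrayValues 64 the
	// 100MB buffer comes back almost instantly.
	cfg := api.LoadConfig{
		FollowPointers:     true,
		MaxVariableRecurse: 1,
		MaxStringLen:       64,
		MaxArrayValues:     64,
		MaxStructFields:    -1,
	}
	v, err := c.EvalVariable(api.EvalScope{GoroutineID: -1}, "buf", cfg)
	fmt.Println(v, err)
}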

@hyangah hyangah added this to Needs triage in Debug Sep 17, 2020
@polinasok (Contributor)

I tried this with the dlv cli. The debug session makes it to the breakpoint with no issues.

$ dlv debug issue129.go 
Type 'help' for list of commands.
(dlv) c
about to stop
> main.done() ./issue129.go:20 (PC: 0x10c4d5f)
    15:         sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
    16:         sh.Data = slh.Data
    17:         sh.Len = slh.Len
    18:         fmt.Printf("about to stop\n")
    19:         runtime.Breakpoint()
=>  20:         fmt.Printf("this\n")
    21:         fmt.Printf("should\n")
    22:         fmt.Printf("be\n")
    23:         fmt.Printf("fast\n")
    24: }
    25:
(dlv) 

At that point, if we issue the same commands that VS Code would issue automatically, I see the following behavior:

(dlv) goroutines

This slowly prints 10K goroutines at a time with long pauses in-between. I killed the process after getting to 150000 in about 5 minutes.

(dlv) print buf
[]uint8 len: 100000000, cap: 100000000, [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...+99999936 more]

This returns immediately. Only 64 bytes are loaded.
Stack also prints quickly as it truncates.

(dlv) stack
 0  0x00000000010c4d5f in main.done
    at ./issue129.go:20
 1  0x00000000010c4e5c in main.makeStack
    at ./issue129.go:28
 2  0x00000000010c4e77 in main.makeStack
    at ./issue129.go:31
 [... frames 3 through 50 are identical: main.makeStack at ./issue129.go:31 ...]
(truncated)
(dlv) 

@polinasok (Contributor) commented Nov 2, 2020

When running the same program in vscode:

about to stop
2020-11-02T09:21:46-08:00 debug layer=rpc (async 3) -> rpc2.CommandOut{"State":{"Running":false,"Recording":false,"currentThread":{"id":8099903,"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false},"goroutineID":1,"ReturnValues":null},"currentGoroutine":{"id":1,"currentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"userCurrentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"goStatementLoc":{"pc":17178566,"file":"/usr/local/go/src/runtime/asm_amd64.s","line":220,"function":{"name":"runtime.rt0_go","value":17178256,"type":0,"goType":0,"optimized":true}},"startLoc":{"pc":16998528,"file":"/usr/local/go/src/runtime/proc.go","line":113,"function":{"name":"runtime.main","value":16998528,"type":0,"goType":0,"optimized":true}},"threadID":8099903,"unreadable":""},"Threads":[{"id":8099856,"pc":16861268,"file":"/usr/local/go/src/runtime/mbitmap.go","line":456,"function":{"name":"runtime.heapBits.next","value":16861216,"type":0,"goType":0,"optimized":true},"goroutineID":1000034,"ReturnValues":null},{"id":8099900,"pc":140735118227286,"file":"","line":0,"goroutineID":0,"ReturnValues":null},{"id":8099901,"pc":140735118227586,"file":"","line":0,"goroutineID":0,"ReturnValues":null},{"id":8099902,"pc":16908520,"file":"/usr/local/go/src/runtime/mgcmark.go","line":1260,"function":{"name":"runtime.scanobject","value":16908224,"type":0,"goType":0,"optimized":true},"goroutineID":1000018,"ReturnValues":null},{"id":8099903,"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false},"goroutineID":1,"ReturnValues":null},{"id":8099904,"pc":16862226,"file":"\u003cautogenerated\u003e","line":1,"function":{"name":"runtime.heapBits.bits","value":16862208,"type":0,"goType":0,"optimized":true},"goroutineID":1000049,"ReturnValues":null}],"NextInProgress":false,"exited":false,"exitStatus":0,"When":""}} error: ""
continue state {"Running":false,"Recording":false,"currentThread":{"id":8099903,"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false},"goroutineID":1,"ReturnValues":null},"currentGoroutine":{"id":1,"currentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"userCurrentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"goStatementLoc":{"pc":17178566,"file":"/usr/local/go/src/runtime/asm_amd64.s","line":220,"function":{"name":"runtime.rt0_go","value":17178256,"type":0,"goType":0,"optimized":true}},"startLoc":{"pc":16998528,"file":"/usr/local/go/src/runtime/proc.go","line":113,"function":{"name":"runtime.main","value":16998528,"type":0,"goType":0,"optimized":true}},"threadID":8099903,"unreadable":""},"Threads":[{"id":8099856,"pc":16861268,"file":"/usr/local/go/src/runtime/mbitmap.go","line":456,"function":{"name":"runtime.heapBits.next","value":16861216,"type":0,"goType":0,"optimized":true},"goroutineID":1000034,"ReturnValues":null},{"id":8099900,"pc":140735118227286,"file":"","line":0,"goroutineID":0,"ReturnValues":null},{"id":8099901,"pc":140735118227586,"file":"","line":0,"goroutineID":0,"ReturnValues":null},{"id":8099902,"pc":16908520,"[...]
handleReenterDebug(breakpoint).
2020-11-02T09:21:46-08:00 debug layer=rpc <- RPCServer.ListGoroutines(rpc2.ListGoroutinesIn{"Start":0,"Count":1})
2020-11-02T09:21:46-08:00 debug layer=rpc -> *rpc2.ListGoroutinesOut{"Goroutines":[{"id":1,"currentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"userCurrentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"goStatementLoc":{"pc":17178566,"file":"/usr/local/go/src/runtime/asm_amd64.s","line":220,"function":{"name":"runtime.rt0_go","value":17178256,"type":0,"goType":0,"optimized":true}},"startLoc":{"pc":16998528,"file":"/usr/local/go/src/runtime/proc.go","line":113,"function":{"name":"runtime.main","value":16998528,"type":0,"goType":0,"optimized":true}},"threadID":8099903,"unreadable":""}],"Nextg":1} error: ""
To client: {"seq":0,"type":"event","event":"stopped","body":{"reason":"breakpoint","threadId":1,"allThreadsStopped":true}}
StoppedEvent("breakpoint")
From client: threads(undefined)
2020-11-02T09:21:46-08:00 debug layer=rpc (async 6) <- RPCServer.State(rpc2.StateIn{"NonBlocking":true})
2020-11-02T09:21:46-08:00 debug layer=rpc (async 6) -> rpc2.StateOut{"State":{"Running":false,"Recording":false,"currentThread":{"id":8099903,"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false},"goroutineID":1,"ReturnValues":null},"currentGoroutine":{"id":1,"currentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"userCurrentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"goStatementLoc":{"pc":17178566,"file":"/usr/local/go/src/runtime/asm_amd64.s","line":220,"function":{"name":"runtime.rt0_go","value":17178256,"type":0,"goType":0,"optimized":true}},"startLoc":{"pc":16998528,"file":"/usr/local/go/src/runtime/proc.go","line":113,"function":{"name":"runtime.main","value":16998528,"type":0,"goType":0,"optimized":true}},"threadID":8099903,"unreadable":""},"Threads":[{"id":8099904,"pc":16862226,"file":"\u003cautogenerated\u003e","line":1,"function":{"name":"runtime.heapBits.bits","value":16862208,"type":0,"goType":0,"optimized":true},"goroutineID":1000049,"ReturnValues":null},{"id":8099856,"pc":16861268,"file":"/usr/local/go/src/runtime/mbitmap.go","line":456,"function":{"name":"runtime.heapBits.next","value":16861216,"type":0,"goType":0,"optimized":true},"goroutineID":1000034,"ReturnValues":null},{"id":8099900,"pc":140735118227286,"file":"","line":0,"goroutineID":0,"ReturnValues":null},{"id":8099901,"pc":140735118227586,"file":"","line":0,"goroutineID":0,"ReturnValues":null},{"id":8099902,"pc":16908520,"file":"/usr/local/go/src/runtime/mgcmark.go","line":1260,"function":{"name":"runtime.scanobject","value":16908224,"type":0,"goType":0,"optimized":true},"goroutineID":1000018,"ReturnValues":null},{"id":8099903,"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false},"goroutineID":1,"ReturnValues":null}],"NextInProgress":false,"exited":false,"exitStatus":0,"When":""}} error: ""
ThreadsRequest
To client: {"seq":0,"type":"event","event":"output","body":{"category":"stderr","output":"2020-11-02T09:21:46-08:00 debug layer=rpc <- RPCServer.ListGoroutines(rpc2.ListGoroutinesIn{\"Start\":0,\"Count\":0})\n"}}
2020-11-02T09:21:46-08:00 debug layer=rpc <- RPCServer.ListGoroutines(rpc2.ListGoroutinesIn{"Start":0,"Count":0})

Things get stuck when we request all the goroutines. This happens before loading the stacktrace or variables, so at least at this stage those requests do not contribute to the issue.

Unlike the StackTrace request, the Threads request does not offer any settings for partial retrieval, and there is no feature in the UI to retrieve more threads or a specific thread.

Realistically, a user won't be able to examine 1M goroutines by hand. So why don't we set a limit internally and retrieve only the selected goroutine plus N more goroutines?
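
As a sketch of that idea (everything here is hypothetical: the cap, the helper, and the stand-in types are illustrative, not actual vscode-go code):

package main

import "fmt"

// Goroutine stands in for the relevant part of Delve's api.Goroutine.
type Goroutine struct{ ID int }

const maxGoroutines = 1000 // hypothetical internal limit, not a real setting

// cappedGoroutines fetches at most maxGoroutines through a paged lister and
// guarantees the selected goroutine is present, fetching it individually if
// it did not make the first page.
func cappedGoroutines(
	list func(start, count int) ([]Goroutine, error),
	fetch func(id int) (Goroutine, error),
	selectedID int,
) ([]Goroutine, error) {
	gs, err := list(0, maxGoroutines)
	if err != nil {
		return nil, err
	}
	for _, g := range gs {
		if g.ID == selectedID {
			return gs, nil // already included
		}
	}
	sel, err := fetch(selectedID)
	if err != nil {
		return nil, err
	}
	return append([]Goroutine{sel}, gs...), nil
}

func main() {
	list := func(start, count int) ([]Goroutine, error) { return []Goroutine{{2}, {3}}, nil }
	fetch := func(id int) (Goroutine, error) { return Goroutine{ID: id}, nil }
	gs, _ := cappedGoroutines(list, fetch, 1)
	fmt.Println(gs) // [{1} {2} {3}]
}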

@polinasok polinasok changed the title bug: debugger freezes bug: debugger freezes when loading 1M goroutines Nov 9, 2020
@polinasok (Contributor)

When truncating the threads list, we should signal to the user that this happened by writing to the DEBUG console with an OutputEvent. If paging is added to DAP (microsoft/debug-adapter-protocol#159) and then to the vscode UI, we can revisit this design. Ideally the CALL STACK view would display something like "showing N threads out of M" with a button to load more and/or a search field to look up an individual thread.
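
For reference, emitting such a notice from a Go-based adapter could look roughly like this, assuming the github.com/google/go-dap message types; the message text and count are made up for the example:

package main

import (
	"fmt"
	"os"

	"github.com/google/go-dap"
)

func main() {
	e := &dap.OutputEvent{
		Event: dap.Event{
			ProtocolMessage: dap.ProtocolMessage{Seq: 0, Type: "event"},
			Event:           "output",
		},
		Body: dap.OutputEventBody{
			Category: "console", // shows up in the DEBUG CONSOLE
			Output:   fmt.Sprintf("Too many goroutines, only loaded %d\n", 1000),
		},
	}
	// A real adapter writes to the client connection; stdout is just for the demo.
	if err := dap.WriteProtocolMessage(os.Stdout, e); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}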

@polinasok (Contributor) commented Nov 12, 2020

When trimming the list of threads to return, we should favor the threads that are stopped on breakpoints in user code over those parked in runtime.gopark, because those are the ones users are most likely to examine closely, checking their stacks and variables. This is related to #130 and #118.
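
A hypothetical ranking pass for that trimming; the Goroutine shape and the runtime-prefix heuristic are stand-ins, not the adapter's actual logic:

package main

import (
	"fmt"
	"sort"
	"strings"
)

// Goroutine mirrors just the fields we need from Delve's api.Goroutine.
type Goroutine struct {
	ID        int
	CurrentFn string // e.g. "main.worker" or "runtime.gopark"
}

// parked reports whether a goroutine is sitting in the runtime rather than
// stopped in user code.
func parked(g Goroutine) bool { return strings.HasPrefix(g.CurrentFn, "runtime.") }

func main() {
	gs := []Goroutine{
		{1, "runtime.gopark"},
		{2, "main.worker"},
		{3, "runtime.gopark"},
		{4, "main.done"},
	}
	// Stable sort: user-code goroutines first, runtime-parked ones last.
	sort.SliceStable(gs, func(i, j int) bool { return !parked(gs[i]) && parked(gs[j]) })
	fmt.Println(gs[:2]) // after trimming to 2, the user-code goroutines survive
}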

@polinasok (Contributor) commented Nov 12, 2020

Btw, by reducing the number of goroutines in the repro program to something small, we can observe whether the 100MB string or the 10 million stack frames cause any additional issues. They don't.

Even though we don't have supportsDelayedStackTraceLoading explicitly set, we essentially support stackTrace paging by setting totalFrames to a number larger than the number of stack frames included in the response. StackTrace requests come with 20 levels by default. And by default launch.json sets stackTraceDepth to 50, which we pass to delve to limit the number of stack frames we load. The UI then offers a "Load All Stack Frames" link to issue a subsequent request.
In the log:

From client: stackTrace({"threadId":1,"startFrame":0,"levels":20})
To client: {"seq":0,"type":"response","request_seq":12,"command":"stackTrace","success":true,"body":{"totalFrames":51, "stackFrames":[...includes 20 frames...]}}
From client: stackTrace({"threadId":1,"startFrame":20,"levels":31})
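
The arithmetic behind this exchange is plain slicing against the advertised total; a minimal sketch with illustrative names:

package main

import "fmt"

// page clamps a DAP stackTrace request (startFrame, levels) against the
// advertised totalFrames and returns the slice bounds to serve.
func page(totalFrames, startFrame, levels int) (lo, hi int) {
	if startFrame > totalFrames {
		startFrame = totalFrames
	}
	end := startFrame + levels
	if end > totalFrames {
		end = totalFrames
	}
	return startFrame, end
}

func main() {
	fmt.Println(page(51, 0, 20))  // 0 20  -> first response carries 20 frames
	fmt.Println(page(51, 20, 31)) // 20 51 -> follow-up fetches the remaining 31
}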

As for the large "string", the current adapter supports maxArrayValues (64 by default), so that's all that is loaded.

@polinasok (Contributor)

Suggestion from @hyangah: "Is it possible that in the threads request, DAP sends only the goroutine each underlying thread saw when it stopped, and doesn't send any long-standing, unscheduled goroutines?"

Looks like we can get this either by looping over the threads and picking those whose GoroutineID is not 0, or by looping over the goroutines and picking those with an associated thread that was executing them (ThreadID not 0). I think this should be a superset of the goroutines stopped at breakpoints.
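
Either direction is a simple filter. A sketch over stand-in types that mirror just the relevant api.Thread/api.Goroutine fields:

package main

import "fmt"

type Thread struct{ ID, GoroutineID int }
type Goroutine struct{ ID, ThreadID int }

// onThread keeps only goroutines currently executing on an OS thread.
func onThread(gs []Goroutine) []Goroutine {
	var out []Goroutine
	for _, g := range gs {
		if g.ThreadID != 0 {
			out = append(out, g)
		}
	}
	return out
}

// viaThreads makes the same selection from the thread side.
func viaThreads(ts []Thread) []int {
	var ids []int
	for _, t := range ts {
		if t.GoroutineID != 0 {
			ids = append(ids, t.GoroutineID)
		}
	}
	return ids
}

func main() {
	fmt.Println(onThread([]Goroutine{{1, 8099903}, {2, 0}}))      // [{1 8099903}]
	fmt.Println(viaThreads([]Thread{{8099903, 1}, {8099900, 0}})) // [1]
}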

@hyangah (Contributor) commented Nov 13, 2020

Only if we have an alternative way to present grouped, filtered, or paged goroutines. :-( Otherwise, bugs like deadlocks, where the interesting goroutines are not on threads, will be impossible to debug. Let's see how the issue in the DAP repo moves.

@polinasok (Contributor)

It's true: displaying a subset could hide interesting goroutines. But at the same time, displaying 1M goroutines, even paged, won't be of much help either without additional searching or filtering features in the UI; it is not realistic to study them manually in order, even if we support labels. And even if paging is added to DAP, we would still need to request UI changes for it in the editor.

How would you debug deadlocks and such with the delve cli? You would run into the same issues, wouldn't you?

I guess we could do this with expression evaluation like we do with call, but that would be quite hacky. The backend function behind goroutines does support paging, but doesn't support filtering by label or other criteria. The backend function behind goroutine only searches by id. We would have to load everything and search ourselves, or update the backend to add advanced search features. The same functionality could then be reused if DAP and the UI expose searching/filtering. But then we open the Pandora's box of supporting a custom request language via the expression prompt and hacking return values to print long multi-line result strings, which is not really what it was meant for.

@aarzilli (Contributor)

"update the backend to add advanced search features"

This isn't a bad idea, in general.

@gopherbot gopherbot added this to the Untriaged milestone Apr 8, 2021
@hyangah hyangah modified the milestones: Untriaged, Backlog Apr 14, 2021
suzmue added a commit to suzmue/delve that referenced this issue Jul 16, 2021
…request

This adds a cap and a log message if there are many goroutines. This will help
prevent the debugger from freezing, but does not yet address making sure the
interesting goroutines are the ones that are returned.

Updates golang/vscode-go#129
derekparker pushed a commit to go-delve/delve that referenced this issue Jul 21, 2021
…request (#2595)

This adds a cap and a log message if there are many goroutines. This will help
prevent the debugger from freezing, but does not yet address making sure the
interesting goroutines are the ones that are returned.

Updates golang/vscode-go#129
@polinasok polinasok added the "debug/callstack" label (callstack handling related debug issues) Apr 8, 2022
@polinasok (Contributor) commented Apr 9, 2022

The editor no longer freezes because we limit the number of goroutines and notify the user via the DEBUG CONSOLE:
We do make sure the selected goroutine is included in the list.
To get to other interesting goroutines (e.g. running ones) the user might need to use dlv config goroutineFilters (see the dlv documentation).
Currently there is no other way to alter the list returned by default to get to goroutines that did not make the cut.
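
For example, to keep only running goroutines, one might enter the following in the DEBUG CONSOLE (filter syntax per the dlv documentation; shown purely as an illustration):

dlv config goroutineFilters -w running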

We filed a paging issue in Nov 2020. That triggered a desire to investigate alternative protocol and related vscode UI options, but there is no ETA for these improvements.

In the meantime, our options include:

  • adding maxGoroutines to launch.json with a warning to users to increase at their own risk (Delve's maxGoroutines should be configurable via go.delveConfig #2041)
  • adding dlv config maxGoroutines also with a warning
  • promoting dlv config goroutineFilters via detailed documentation in vscode-go (expanding upon dlv documentation)
  • printing Use 'dlv config goroutineFilters' to DEBUG CONSOLE when stopped
  • setting dlv config goroutineFilters to -w running by default
  • adding more powerful filters via dlv config ourselves

@hyangah hyangah removed the DA: DlvDAP label Jun 7, 2022
Projects: Debug · Low priority