Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

service/dap: support pause request #2466

Merged
merged 7 commits into from
May 17, 2021
Merged

Conversation

polinasok
Copy link
Collaborator

@polinasok polinasok commented May 5, 2021

Updates #1515
Updates golang/vscode-go#1474

@polinasok polinasok force-pushed the PauseRequest branch 4 times, most recently from fc99f10 to 2036797 Compare May 5, 2021 23:22
@polinasok
Copy link
Collaborator Author

I am having all sorts of issues with the test. Everything works locally, but as always TeamCity brings some surprises. The test uses loopprog.go, which doesn't do anything fancy. Yet have seen the following failures. Could they be related to halting? E.g. I am not checking state and issue a halt command request even if things are not running. Should I be?

### continue to breakpoint at line 16, then request next
2021-05-05T23:26:40Z debug layer=debugger switching to goroutine 1
2021-05-05T23:26:40Z debug layer=debugger nexting
2021-05-05T23:26:40Z debug layer=dap "next" command stopped - reason "next finished", location /delve/_fixtures/loopprog.go:17
### issue a halt request - which I expected to be a no-op because nothing was running
2021-05-05T23:26:40Z debug layer=dap [<- from client]{"seq":8,"type":"request","command":"pause","arguments":{"threadId":1}}
2021-05-05T23:26:40Z debug layer=debugger halting
2021-05-05T23:26:40Z debug layer=dap [-> to client]{"seq":0,"type":"response","request_seq":8,"success":true,"command":"pause"}
2021-05-05T23:26:40Z debug layer=dap [<- from client]{"seq":9,"type":"request","command":"stackTrace","arguments":{"threadId":1,"levels":20,"format":{}}}
2021-05-05T23:26:40Z debug layer=dap [-> to client]{"seq":0,"type":"response","request_seq":9,"success":true,"command":"stackTrace","body":{"stackFrames":[{"id":1003,"name":"main.main","source":{"name":"loopprog.go","path":"/delve/_fixtures/loopprog.go"},"line":17,"column":0},{"id":1004,"name":"runtime.main","source":{"name":"proc.go","path":"/usr/local/go/go1.14.15/src/runtime/proc.go"},"line":203,"column":0},{"id":1005,"name":"runtime.goexit","source":{"name":"asm_amd64.s","path":"/usr/local/go/go1.14.15/src/runtime/asm_amd64.s"},"line":1373,"column":0}],"totalFrames":3}}
### then a continue request, which failed miserably - is the previous halt somehow to blame?
2021-05-05T23:26:40Z debug layer=dap [<- from client]{"seq":10,"type":"request","command":"continue","arguments":{"threadId":0}}
2021-05-05T23:26:40Z debug layer=dap [-> to client]{"seq":0,"type":"response","request_seq":10,"success":true,"command":"continue","body":{"allThreadsContinued":true}}
2021-05-05T23:26:40Z debug layer=debugger continuing
SIGILL: illegal instruction
PC=0x4ae0dc m=0 sigcode=2
# breakpoint, next, halt (at a next stop), continue, halt, continue, halt, continue
# eventually another continue hit a fatal error
2021-05-05T23:46:52Z debug layer=debugger continuing
fatal error: unexpected signal during runtime execution
2021-05-05T23:46:52Z debug layer=dap [<- from client]{"seq":15,"type":"request","command":"pause","arguments":{"threadId":-2}}
2021-05-05T23:46:52Z debug layer=debugger halting
2021-05-05T23:46:52Z debug layer=dap "continue" command stopped - reason "manual", location /usr/local/go/go1.15.11/src/runtime/panic.go:1162
2021-05-05T23:46:52Z debug layer=dap [-> to client]{"seq":0,"type":"event","event":"stopped","body":{"reason":"fatal error","allThreadsStopped":true}}

@polinasok
Copy link
Collaborator Author

cc @hyangah @suzmue

@aarzilli
Copy link
Member

aarzilli commented May 6, 2021

The test has no reason to be more complicated than starting loopprog, waiting for a random timeout and sending a single pause request.

@polinasok
Copy link
Collaborator Author

polinasok commented May 6, 2021

The test has no reason to be more complicated than starting loopprog, waiting for a random timeout and sending a single pause request.

I figured that this would be the price of a non-flaky test. However, I wanted to keep it as-is for now so we can have a discussion as to why there have been all the issues that I have observed. I experimented with it quite a bit - adding (e.g. 2036797) and removing random time-outs and such and it did not make things better. I do think that the issue is the repeated continue/pause sequence and also calling pause when already at a stop. I had the loop because I wanted to both get the case when the main goroutine pauses and when there is no selected goroutine. I can take it out without reducing test coverage. But I would not want to take out the case of pausing while already in paused stated. Shouldn't that be working?

@aarzilli
Copy link
Member

aarzilli commented May 6, 2021

If you want to test something about the proc package you should write a test for the proc package, there's no reason to make things more complicated by going through a rpc layer.

@polinasok
Copy link
Collaborator Author

polinasok commented May 6, 2021

If you want to test something about the proc package you should write a test for the proc package, there's no reason to make things more complicated by going through a rpc layer.

My goal was not to test the proc package. I wanted to see how the dap layer handled different outcomes of halt. Does it handle halt errors? Does it handle cases where nothing is running and so nothing is interrupted and no stopped event is sent? What do things look like when there is no selected goroutine when we stop? In doing so I came across proc behavior that surprised me, so I wanted to learn more about it, to understand better the inner-workings of the codebase and the runtime I am working with.

In particular, should it be valid to call command halt when the debugger state is not running? Or should the client or the dap server detect that nothing is running and not call it? This would not only impact pause, but also disconnect.

@aarzilli
Copy link
Member

aarzilli commented May 6, 2021

My goal was not to test the proc package. I wanted to see how the dap layer handled different outcomes of halt. Does it handle halt errors? Does it handle cases where nothing is running and so nothing is interrupted and no stopped event is sent? What do things look like when there is no selected goroutine when we stop? In doing so I came across proc behavior that surprised me, so I wanted to learn more about it, to understand better the inner-workings of the codebase and the runtime I am working with.

Got it. But none of this is being tested here, this is a test about what happens if pause and resume are used in rapid succession.

In particular, should it be valid to call command halt when the debugger state is not running?

It should be but I don't think anyone has looked into it closely and it isn't something that's being tested in the wild either.

@polinasok
Copy link
Collaborator Author

My goal was not to test the proc package. I wanted to see how the dap layer handled different outcomes of halt. Does it handle halt errors? Does it handle cases where nothing is running and so nothing is interrupted and no stopped event is sent? What do things look like when there is no selected goroutine when we stop? In doing so I came across proc behavior that surprised me, so I wanted to learn more about it, to understand better the inner-workings of the codebase and the runtime I am working with.

Got it. But none of this is being tested here, this is a test about what happens if pause and resume are used in rapid succession.

Some of the test iterations that failed on TeamCity for me were sleeping between the successions, so they were not so rapid, but it did not help. Maybe the value I choose was still too small. I will try something more comparable to what a human would do. I just found the rapidly printing numbers polluting the output log very annoying. Any objections to changing this fixture to printing less often?

In particular, should it be valid to call command halt when the debugger state is not running?

It should be but I don't think anyone has looked into it closely and it isn't something that's being tested in the wild either.

It is used in the wild because both legacy vscode-go and dlv dap issue a halt as part of disconnect, without checking if the process is running. You can also run into double continue or double pause scenarios with multiple parallel clients or experimental and not fully fleshed out features like call, where the UI is not always correctly reflecting the running/stopped state, exposing the wrong controls. Obviously, that's not a big concern in reality.

@aarzilli
Copy link
Member

aarzilli commented May 6, 2021

The way manual stops work is pretty simple, we just send a SIGTRAP signal to the process, simulating what a software breakpoint would do. I don't know what the kernel does with it if the thread happens to be already in a trace-stopped state, I've never checked. I've also never checked what happens if the thread happens to have just exited trace-stop but hasn't been scheduled yet. It might be that something strange happens, it might be that there is a bug in our code.

The behavior is also very likely to be different between different OSes.

I'd be interested in looking into this more but I don't think this should be part of the test for this PR, because it's a problem in a different component, because it's unlikely to be fixed quickly given that it seems pretty low impact and because it's going to be easier to diagnose with fewer layers in between.

It is used in the wild because both legacy vscode-go and dlv dap issue a halt as part of disconnect, without checking if the process is running

If that's a concern I guess it's easier to fix now that you can call IsRunning, but if this was a common problem I think we would have heard something from users.

@polinasok
Copy link
Collaborator Author

If that's a concern I guess it's easier to fix now that you can call IsRunning, but if this was a common problem I think we would have heard something from users.

Likely so. Although if things are disconnecting and fail in the process, one might not even notice because either way all is gone in a fraction of a second. But yes, it's likely not a big deal because clearly in the common case where it matters, it hasn't bothered users yet.

@aarzilli
Copy link
Member

aarzilli commented May 7, 2021

I don't think this and the detach thing are related. I think the way we detach while killing the process has never been that great and sometimes it returns an error for stupid reasons.

@polinasok
Copy link
Collaborator Author

I don't think this and the detach thing are related. I think the way we detach while killing the process has never been that great and sometimes it returns an error for stupid reasons.

Fair enough. I will rework the test, but I think you can look at the implementation already. Looking forward to your review and thank you for the discussion.

suzmue and others added 5 commits May 6, 2021 23:56
…go-delve#2435)

The client can specify certain configurations in the initialize request.
For example, pathFormat determines the pathFormat. We do not currently
support configuring pathFormat, linesStartAt1, or columnsStartAt1, so
we report an error if the client attempts to set these to an
unsupported value.
Bintray is shutting down and the URL we used to install mingw is no
longer available. Use chocolatey instead.
…ve#2301)

Adds the low-level support for watchpoints (aka data breakpoints) to
the native linux/amd64 backend.

Does not add user interface or functioning support for watchpoints
on stack variables.

Updates go-delve#279
Copy link
Collaborator Author

@polinasok polinasok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pause test is now passing.
TestAttachStopOnEntry is failing. It has not been impacted by the pause change, but I have experienced it flake every once in a while in an unexpected way. I can adjust the test in a separate PR since it's not related.

2021-05-07T07:09:57Z debug layer=debugger continuing
2021-05-07T07:09:57Z debug layer=dap [<- from client]{"seq":13,"type":"request","command":"disconnect","arguments":{"terminateDebuggee":true}}
2021-05-07T07:09:57Z debug layer=debugger halting
2021-05-07T07:09:57Z debug layer=dap "continue" command stopped - reason "manual", location /usr/local/go/go1.16.4/src/runtime/sys_linux_amd64.s:159
2021-05-07T07:09:57Z debug layer=dap [-> to client]{"seq":0,"type":"event","event":"stopped","body":{"reason":"pause","threadId":1,"allThreadsStopped":true}}
2021-05-07T07:09:57Z debug layer=dap [-> to client]{"seq":0,"type":"event","event":"output","body":{"category":"console","output":"Detaching and terminating target process\n","source":{}}}
2021-05-07T07:09:57Z debug layer=debugger detaching
2021-05-07T07:09:57Z error layer=dap no such process
2021-05-07T07:09:57Z debug layer=dap Error while disconnecting: no such process


// Pause will be a no-op at a breakpoint: there will be no additional stopped events
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played with this some more using personal builds. This pause at breakpoint, not the stress testing, appears to be causing consistent fatal errors during the continue following right after. I removed it so the test doesn't fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which platform?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Member

@aarzilli aarzilli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aarzilli
Copy link
Member

aarzilli commented May 7, 2021

Pause test is now passing.
TestAttachStopOnEntry is failing. It has not been impacted by the pause change, but I have experienced it flake every once in a while in an unexpected way. I can adjust the test in a separate PR since it's not related.

I think this is the problem with how we do the Detach(kill=true).

@polinasok
Copy link
Collaborator Author

Pause test is now passing.
TestAttachStopOnEntry is failing. It has not been impacted by the pause change, but I have experienced it flake every once in a while in an unexpected way. I can adjust the test in a separate PR since it's not related.

I think this is the problem with how we do the Detach(kill=true).

#2476

@polinasok
Copy link
Collaborator Author

@derekparker Is there anything else that needs to be addressed, so this can be merged? Thank you.

@derekparker
Copy link
Member

@polinasok this LGTM, please just rebase and then I will merge today.

@polinasok
Copy link
Collaborator Author

@derekparker Rebase done

@derekparker derekparker merged commit c10b222 into go-delve:master May 17, 2021
@polinasok polinasok deleted the PauseRequest branch May 17, 2021 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants