Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/gopls: improve the UX when the gopls daemon panics #42072

Open
findleyr opened this issue Oct 19, 2020 · 6 comments
Open

x/tools/gopls: improve the UX when the gopls daemon panics #42072

findleyr opened this issue Oct 19, 2020 · 6 comments

Comments

@findleyr
Copy link
Contributor

@findleyr findleyr commented Oct 19, 2020

Sometimes gopls panics. Hopefully this is becoming increasingly rare, but by definition panics are unexpected and disruptive to the user experience. vscode-go improves the UX with special handling to identify and capture panics, but for other editor plugins the disruption can be completly opaque: gopls just stops working. Particularly when running gopls as a daemon, stderr is often /dev/null, and finding the panicking stack involves re-running the daemon and trying to reproduce the panic.

We can do better, by ensuring that stderr is always captured and doing our best to make it discoverable. Here is one way to achieve this:

  • If os.Stderr is /dev/null:
    • create a tempfile with 0600 permissions (conservative permissions since the caller didn't ask us to do this)
    • Redirect stderr to this temp file (using syscall.Dup2 for POSIX systems, etc.)
    • defer a cleanup func that redirects back to /dev/null and then deletes the tempfile
  • In the daemon, if os.Stderr is a normal file, tell the gopls forwarder about it during handshake.
  • In the forwarder, track whether "shutdown" or "exit" have been called by the LSP client. Detect abnormal daemon disconnection when "shutdown" or "exit" haven't been called.
  • On abnormal exit from the daemon, before closing the client connection, inject a message to the effect of "The gopls daemon exited abnormally. Please check the output at path/to/stderr and consider filing an issue at golang.org/issues"

The net effect of this should be that when the gopls daemon panics the deferred cleanup will not run and the temporary stderr will be persisted. Then, the LSP client will get a ShowMessage notification explaining what went wrong, and suggesting next steps.

CC @myitcv @leitzler @bhcleek, who might have opinions on this UX as plugin maintainers.

@gopherbot gopherbot added this to the Unreleased milestone Oct 19, 2020
@bhcleek
Copy link
Contributor

@bhcleek bhcleek commented Oct 19, 2020

FWIW, I don't think vim-go needs any special handling in gopls; I probably just enhance vim-go to handle a gopls panic better. It's not too hard to do; I just need to prioritize it.

@myitcv
Copy link
Member

@myitcv myitcv commented Oct 20, 2020

Thanks very much for raising this.

vscode-go improves the UX with special handling to identify and capture panics

Isn't the issue here that, when working in remote mode, the panic doesn't make its way to the LSP client?

What does VSCode do in that situation? Presumably it suffers the same problem?

If os.Stderr is /dev/null:

In the case of remote mode, isn't os.Stderr always going to be /dev/null for the daemon? Or have I misunderstood something?

In the daemon, if os.Stderr is a normal file, tell the gopls forwarder about it during handshake

Related to the previous point, isn't it generally going to be the case that the daemon will tell the forwarded about the location of its (the daemon's) stderr log file? Because it would seem to be the exception that the forwarder would tell the daemon where to log (because the forwarder will, generally, have a shorter life than the daemon).

On abnormal exit from the daemon, before closing the client connection...

I assume you are talking about the forwarded detecting this condition of abnormal exit, right? And hence talking about the LSP client connection? (as opposed to the forwarder as a client of the daemon)

Just to confirm one point here. We (govim) are treating the forwarded closing the connection to the client (i.e. us, govim) as abnormal. Because in normal operation it would only ever be the LSP client initiating the shutdown. That remains "correct" I assume?

The net effect of this should be that when the gopls daemon panics the deferred cleanup will not run and the temporary stderr will be persisted

This sounds like a real improvement.

Is there some way that the LSP client can also learn about the location of the remote (temporary) stderr log? That would be a nice thing for us to be able to add to GOVIMLogfilePaths in any case.

FWIW we are discussing/working on better restart logic in govim/govim#963. Indeed, per that issue, it would be good to have some test-only way of triggered a panic in the daemon so that we can test the handling via govim.

cc @cespare

@findleyr
Copy link
Contributor Author

@findleyr findleyr commented Oct 20, 2020

Isn't the issue here that, when working in remote mode, the panic doesn't make its way to the LSP client?

That's one issue. In my opinion it is also an issue that each client has to implement their own special handling for panics. For generic LSP clients this is not really possible, which is why I want to implement more graceful handling on our side.

In the case of remote mode, isn't os.Stderr always going to be /dev/null for the daemon

Not in general, because the daemon can be started manually. But if using -remote=auto, yes.

Related to the previous point, isn't it generally going to be the case that the daemon will tell the forwarded about the location of its (the daemon's) stderr log file

Yes, that is the expectation. It's not enough to have whatever gopls instance starts the daemon set the logfile, because the daemon can be manually managed. If I do gopls serve -listen=:8095 2> /tmp/gopls.daemon.stderr, I would expect that the daemon tell each of its clients about /tmp/gopls.daemon.stderr.

stderr log file

Just to be clear, there's a difference between logfile and stderr. We already tell each forwarder gopls about the daemon's log file, but this file is configured via log.SetOutput, which means it doesn't capture panics.

I assume you are talking about the forwarded detecting this condition of abnormal exit, right? And hence talking about the LSP client connection?

Yes, I should have been more clear in this section (especially for any readers who don't know what the forwarder is). We have the following diagram:

LSP Client --stdin/stdout--> gopls forwarder --tcp/uds--> daemon

Where the gopls forwarder is just responsible for discovering the daemon and forwarding the LSP. But because it operates at LSP level rather than net.Conn level, it is able to do some intercepting of messages along the way. The only place where we currently do this is to set the process environment, but I am proposing that we also use this layer to track whether the LSP client has properly shutdown. This allows us to detect when the daemon exits abnormally. In this case, we have an opportunity to send a final notification to the LSP client explaining what went wrong. I've tested all this and it works.

Because in normal operation it would only ever be the LSP client initiating the shutdown.

Yes, that's how it is supposed to work.

Is there some way that the LSP client can also learn about the location of the remote (temporary) stderr log

We could do this. The forwarder will have this information because it is exchanged in the handshake. Do you have a preference for how this information is surfaced?

test-only way of triggered a panic in the daemon

Can you just SIGTERM the daemon? From the perspective of govim that should be equivalent.

@myitcv
Copy link
Member

@myitcv myitcv commented Oct 20, 2020

... which is why I want to implement more graceful handling on our side

Ah, that clarifies things, thank you.

Not in general, because the daemon can be started manually. But if using -remote=auto, yes.

Good point.

I would expect that the daemon tell each of its clients about /tmp/gopls.daemon.stderr.

That makes it clear, thank you.

there's a difference between logfile and stderr

Thanks for clarifying.

Do you have a preference for how this information is surfaced?

Other than "something structured", no 😄

Can you just SIGTERM the daemon?

Won't that entirely side-step all the infrastructure we're putting place for panic handling? It seems to me we're looking to test that as part of any test suite here.

@findleyr
Copy link
Contributor Author

@findleyr findleyr commented Oct 20, 2020

Won't that entirely side-step all the infrastructure we're putting place for panic handling?

No, there's nothing special about panics, and this should work equally well for any other 'crashes'. But if we need to we can add a non-standard request to force a panic.

Other than "something structured", no 😄

For my own debugging purposes, I have a gopls -remote=<...> inspect sessions command that queries the information exchanged in the handshake as JSON. Would shelling out to a command like this work for you?

@bhcleek, regarding this:

FWIW, I don't think vim-go needs any special handling in gopls; I probably just enhance vim-go to handle a gopls panic better.

Per the above discussion, I think fundamentally when using -remote=auto there's no easy way for LSP clients to implement graceful handling on their own.

Unless there's specific objection, I am planning to go ahead with this plan. We can iterate as needed, but I think the core of it -- making sure we're capturing STDERR -- will be necessary no matter what.

@myitcv
Copy link
Member

@myitcv myitcv commented Oct 20, 2020

No, there's nothing special about panics

Right, I see what you're saying - this is effectively something that we as LSP clients can consider "tested". As you say, we can always add a non-standard request in case it becomes necessary to get coverage on this (the SIGTERM approach certainly works well enough for testing our handling of restarting logic - thanks for pointing that out)

@findleyr findleyr self-assigned this Oct 20, 2020
@stamblerre stamblerre moved this from Critical to In progress in vscode-go: gopls by default Nov 10, 2020
@findleyr findleyr moved this from In progress to Non-critical in vscode-go: gopls by default Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.