x/tools/gopls: improve the UX when the gopls daemon panics #42072
Sometimes gopls panics. Hopefully this is becoming increasingly rare, but by definition panics are unexpected and disruptive to the user experience. vscode-go improves the UX with special handling to identify and capture panics, but for other editor plugins the disruption can be completely opaque: gopls just stops working. Particularly when running gopls as a daemon, stderr is often /dev/null, and finding the panicking stack involves re-running the daemon and trying to reproduce the panic.
We can do better, by ensuring that stderr is always captured and doing our best to make it discoverable. Here is one way to achieve this:
The net effect of this should be that when the gopls daemon panics the deferred cleanup will not run and the temporary stderr will be persisted. Then, the LSP client will get a
Thanks very much for raising this.
Isn't the issue here that, when working in remote mode, the panic doesn't make its way to the LSP client?
What does VSCode do in that situation? Presumably it suffers the same problem?
In the case of remote mode, isn't
Related to the previous point, isn't it generally going to be the case that the daemon will tell the forwarder about the location of its (the daemon's) stderr log file? Because it would seem to be the exception that the forwarder would tell the daemon where to log (because the forwarder will, generally, have a shorter life than the daemon).
I assume you are talking about the forwarder detecting this condition of abnormal exit, right? And hence talking about the LSP client connection? (as opposed to the forwarder as a client of the daemon)
Just to confirm one point here. We (
This sounds like a real improvement.
Is there some way that the LSP client can also learn about the location of the remote (temporary) stderr log? That would be a nice thing for us to be able to add to
FWIW we are discussing/working on better restart logic in govim/govim#963. Indeed, per that issue, it would be good to have some test-only way of triggering a panic in the daemon so that we can test the handling via
That's one issue. In my opinion it is also an issue that each client has to implement their own special handling for panics. For generic LSP clients this is not really possible, which is why I want to implement more graceful handling on our side.
Not in general, because the daemon can be started manually. But if using
Yes, that is the expectation. It's not enough to have whatever gopls instance starts the daemon set the logfile, because the daemon can be manually managed. If I do
Just to be clear, there's a difference between logfile and stderr. We already tell each forwarder gopls about the daemon's log file, but this file is configured via
Yes, I should have been more clear in this section (especially for any readers who don't know what the forwarder is). We have the following diagram:
Where the gopls forwarder is just responsible for discovering the daemon and forwarding the LSP. But because it operates at LSP level rather than net.Conn level, it is able to do some intercepting of messages along the way. The only place where we currently do this is to set the process environment, but I am proposing that we also use this layer to track whether the LSP client has properly shut down. This allows us to detect when the daemon exits abnormally. In this case, we have an opportunity to send a final notification to the LSP client explaining what went wrong. I've tested all this and it works.
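To make the tracking idea concrete, here is a minimal sketch of how a forwarder could note whether the client sent the LSP `shutdown` request, and decide whether a final message is warranted when the daemon connection drops. This is not the gopls implementation: `shutdownTracker`, `observe`, `abnormal`, `finalMessage`, the message wording, and the example path are all hypothetical, and delivering the text via a `window/showMessage` notification is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// shutdownTracker is a hypothetical helper, not gopls's actual type. It
// records whether the LSP client has sent the "shutdown" request.
type shutdownTracker struct {
	mu       sync.Mutex
	shutdown bool
}

// observe is called with the method name of each client-to-server message
// that the forwarder relays.
func (s *shutdownTracker) observe(method string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if method == "shutdown" {
		s.shutdown = true
	}
}

// abnormal reports whether losing the daemon connection now would be
// unexpected, i.e. the client never asked to shut down.
func (s *shutdownTracker) abnormal() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return !s.shutdown
}

// finalMessage returns the text for a last notification to the client, or
// false if the daemon's exit was expected and nothing should be sent.
func finalMessage(s *shutdownTracker, stderrPath string) (string, bool) {
	if !s.abnormal() {
		return "", false
	}
	return fmt.Sprintf("gopls daemon exited unexpectedly; its stderr was saved to %s", stderrPath), true
}

func main() {
	tracker := &shutdownTracker{}
	tracker.observe("initialize")
	tracker.observe("textDocument/didOpen")
	// Suppose the daemon connection drops here, before any "shutdown" request.
	if msg, ok := finalMessage(tracker, "/tmp/example-stderr.log"); ok {
		fmt.Println(msg)
	}
}
```

Because the check depends only on which requests were relayed, it treats a panic, a kill signal, and any other unexpected exit the same way.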
Yes, that's how it is supposed to work.
We could do this. The forwarder will have this information because it is exchanged in the handshake. Do you have a preference for how this information is surfaced?
Can you just SIGTERM the daemon? From the perspective of govim that should be equivalent.
Ah, that clarifies things, thank you.
That makes it clear, thank you.
Thanks for clarifying.
Other than "something structured", no 😄
Won't that entirely side-step all the infrastructure we're putting in place for panic handling? It seems to me we're looking to test that as part of any test suite here.
No, there's nothing special about panics, and this should work equally well for any other 'crashes'. But if we need to we can add a non-standard request to force a panic.
For my own debugging purposes, I have a
@bhcleek, regarding this:
Per the above discussion, I think fundamentally when using
Unless there's specific objection, I am planning to go ahead with this plan. We can iterate as needed, but I think the core of it -- making sure we're capturing STDERR -- will be necessary no matter what.
Right, I see what you're saying - this is effectively something that we as LSP clients can consider "tested". As you say, we can always add a non-standard request in case it becomes necessary to get coverage on this (the SIGTERM approach certainly works well enough for testing our handling of restarting logic - thanks for pointing that out)