Hardening more of Servo against panic? #24167
And before @nox jumps in to say it, the Erlang philosophy of let-the-processes-crash-and-have-a-small-trusted-service-to-restart-them runs into problems for services like our script threads, which use libraries like SpiderMonkey that hold lots of unserializable in-memory state. :(
I don't understand what that means, @asajeffrey. Killing and restarting gets rid of in-memory state; that's the point. We should make sure not to panic in places where SM wouldn't be able to recover, but a script thread panicking shouldn't bring down Servo. The point of Erlang is that crashing leaves don't bring down your core; it's not so much about restarting in a way that perfectly restores the state. For example, a Web server written in Erlang shouldn't go down just because a request provoked a crash: the process handling the request will crash and not be restarted, and the rest of the system will happily continue doing whatever it is doing.
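The "leaves crash, core survives" idea can be sketched with plain `std::thread`: a panic unwinds only the thread it happens on, and `JoinHandle::join` turns it into an `Err` the parent can log. This is a minimal illustration, not Servo code; `run_leaf` is a hypothetical helper, and it assumes the default `panic = "unwind"` strategy (with `panic = "abort"` the whole process goes down).

```rust
use std::thread;

/// Hypothetical helper: runs `job` on its own thread and reports whether it
/// finished normally. A panic inside `job` unwinds only that thread;
/// `join` converts it into an `Err` carrying the panic message.
fn run_leaf<F>(job: F) -> Result<u32, String>
where
    F: FnOnce() -> u32 + Send + 'static,
{
    thread::spawn(job).join().map_err(|payload| {
        // Panic payloads are usually `&str` or `String`.
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}

fn main() {
    // One "request" crashes; the others complete and the caller keeps running.
    let results: Vec<_> = (0..3u32)
        .map(|i| run_leaf(move || if i == 1 { panic!("request {} failed", i) } else { i * 10 }))
        .collect();
    for r in &results {
        println!("{:?}", r);
    }
}
```

Nothing restarts the crashed request here, which matches the Web-server example: the leaf simply goes away.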
One other wrinkle we've had around panic is the delayed panic caused by channel-based communication. That is, if the receiving end of a channel is in a thread that panics but we do not tear down, the thread holding the sending end will panic the moment it attempts to send a message. IPC channels may have been hardened against this, but my memory is fuzzy. I feel like there are several issues here:
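The delayed panic described in the comment above can be reproduced with `std::sync::mpsc`: `Sender::send` returns `Err(SendError(msg))` once the receiver has been dropped, so a sender that calls `.unwrap()` panics only when it next sends. This sketch uses in-process channels only and makes no claim about how `ipc-channel` behaves; `try_send` and `demo` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// Sends `msg`, logging and returning `false` instead of panicking when the
/// receiver is gone. Calling `.unwrap()` here would turn the receiver's crash
/// into a second, delayed panic on the sending side.
fn try_send(tx: &mpsc::Sender<String>, msg: &str) -> bool {
    match tx.send(msg.to_string()) {
        Ok(()) => true,
        Err(mpsc::SendError(lost)) => {
            eprintln!("receiver is gone; dropping message {:?}", lost);
            false
        }
    }
}

/// Spawns a receiver that panics after its first message, then shows that the
/// sender only finds out on its next `send`.
fn demo() -> (bool, bool) {
    let (tx, rx) = mpsc::channel::<String>();
    let receiver = thread::spawn(move || {
        let _first = rx.recv().unwrap();
        // Unwinding drops `rx`, which closes the channel.
        panic!("receiver crashed");
    });
    let first = try_send(&tx, "hello");
    let _ = receiver.join(); // wait until the receiver has crashed
    let second = try_send(&tx, "are you there?");
    (first, second)
}

fn main() {
    println!("{:?}", demo());
}
```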
I'm not sure how far we will get with a custom panic handler that logs and resumes, but I'd definitely be interested to find out, too!
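A "log and resume" handler could start from two standard-library pieces: `std::panic::set_hook` to replace the default panic message with a log line, and `std::panic::catch_unwind` to resume with a fallback value. This is a sketch under assumptions: `install_logging_hook` and `log_and_resume` are hypothetical, the log format is invented, and `catch_unwind` does nothing for aborts or for state the panicking code left half-updated.

```rust
use std::panic;

/// Replaces the default panic output with a single log line that names where
/// the panic happened. The hook runs before unwinding starts.
fn install_logging_hook() {
    panic::set_hook(Box::new(|info| {
        let location = info
            .location()
            .map(|l| format!("{}:{}", l.file(), l.line()))
            .unwrap_or_else(|| "unknown location".to_string());
        eprintln!("[panic logged] at {}", location);
    }));
}

/// Runs `f`; if it panics, logs the recovery and returns `fallback` instead.
fn log_and_resume<F: FnOnce() -> i32 + panic::UnwindSafe>(f: F, fallback: i32) -> i32 {
    panic::catch_unwind(f).unwrap_or_else(|_| {
        eprintln!("recovered from panic; using fallback value {}", fallback);
        fallback
    })
}

fn main() {
    install_logging_hook();
    let value = log_and_resume(|| panic!("GL error"), -1);
    println!("continuing with {}", value);
}
```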
An In the constellation we take care never to

It might be easier to harden Servo in a coarse-grained manner, at the level of processes, rather than trying to catch panics and do something about them inside processes. I personally like hard expectations where they're possible, as in script, because if the tests on CI pass, I don't go in and read the logs to see whether there was a warning. I do look when there's a panic. A central component like the constellation is probably best placed to restart other components if they fail, and script, for example, should probably not include logic for "restarting WebGL" or something similar. On the other hand, it could send a message to the constellation if a send to WebGL fails, and then panic itself (or run the "shut down script thread" logic).
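The "report upward, don't restart locally" pattern from the comment above might look like this: the component treats a failed send to its peer as a signal to notify its supervisor and then shut itself down, leaving any restart decision to the supervisor. Every name here (`SupervisorMsg`, `ScriptComponent`, `send_to_webgl`) is hypothetical and not Servo's actual constellation API.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical message a component sends to its supervising constellation.
#[derive(Debug, PartialEq)]
enum SupervisorMsg {
    PeerFailed { reporter: &'static str, peer: &'static str },
}

/// Hypothetical stand-in for a script thread that talks to a WebGL thread.
struct ScriptComponent {
    webgl: Sender<String>,
    supervisor: Sender<SupervisorMsg>,
}

impl ScriptComponent {
    /// Returns `false` when the caller should run its shutdown logic.
    fn send_to_webgl(&self, cmd: &str) -> bool {
        if self.webgl.send(cmd.to_string()).is_ok() {
            return true;
        }
        // No attempt to restart WebGL here: report upward and let the
        // supervisor decide. Ignore the result, since there is nothing more
        // to do if the supervisor is gone too.
        let _ = self.supervisor.send(SupervisorMsg::PeerFailed {
            reporter: "script",
            peer: "webgl",
        });
        false
    }
}
```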
The constellation is now basically the main process of the browser, managing all tabs, which in multiprocess mode run in their own content processes. See
It would be good to write up how we handle the tab-crashed scenario without tearing down the whole browser. Based on every other system where I've had teams handle this scenario (e.g., add-ins in Visual Studio, user code execution in an interpreter), I'm a bit nervous if we don't have an architectural boundary for this and are relying on developer heroics to handle an edge case on every channel that may or may not correspond to a crashed-tab boundary. This matters not just for continuing to run, but for ensuring we don't have zombie thread cycles off continuing to run some poor abandoned portion of the code. Apologies if this is already nailed down and I'm asking for things that have already been done; I'll admit that I (like the wiki!) have not kept fully up to date with all changes, though I do try to at least skim all PRs and issues that go through.
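One way to picture an architectural crash boundary that also prevents zombie threads: every thread belonging to one tab shares a shutdown flag, and a drop guard flips that flag if its thread unwinds, so the rest of the group stops instead of running on abandoned. This is a hypothetical sketch (`TabGroup` and its methods are invented for illustration); it is not how Servo's constellation currently works, and it relies on unwinding, so it does not help under `panic = "abort"`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical crash boundary: all threads of one tab share a shutdown flag.
struct TabGroup {
    shutdown: Arc<AtomicBool>,
    workers: Vec<JoinHandle<()>>,
}

impl TabGroup {
    fn new() -> Self {
        TabGroup { shutdown: Arc::new(AtomicBool::new(false)), workers: Vec::new() }
    }

    /// Spawns a worker that calls `step` until the group shuts down.
    /// If `step` panics, the guard sets the flag while the thread unwinds.
    fn spawn<F: FnMut() + Send + 'static>(&mut self, mut step: F) {
        let flag = Arc::clone(&self.shutdown);
        self.workers.push(thread::spawn(move || {
            struct Guard(Arc<AtomicBool>);
            impl Drop for Guard {
                fn drop(&mut self) {
                    if thread::panicking() {
                        self.0.store(true, Ordering::SeqCst);
                    }
                }
            }
            let _guard = Guard(Arc::clone(&flag));
            while !flag.load(Ordering::SeqCst) {
                step();
                thread::sleep(Duration::from_millis(1));
            }
        }));
    }

    /// Waits for every worker and returns how many of them panicked.
    fn join(self) -> usize {
        self.workers.into_iter().map(JoinHandle::join).filter(Result::is_err).count()
    }
}
```

A supervisor would then own one `TabGroup` per tab, so tearing down a crashed tab means joining one group rather than auditing every channel.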
@nox: the problem is that (at least at the moment) if a script thread panics, it tears down the thread, and there's nothing we can do to have SM recover from that. We either need to not panic in the first place, or have a panic handler that tries to keep the script thread alive.
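"A panic handler that tries to keep the script thread alive" could, in its simplest form, wrap each event-loop message in `catch_unwind` so one bad message is logged and skipped. This is a hypothetical sketch (`run_event_loop` and its `i32` messages are invented). The `AssertUnwindSafe` wrapper is exactly the promise that is hard to keep in practice: it asserts the handler's state is still usable after a panic, which is what the SpiderMonkey concern above calls into question.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc::Receiver;

/// Hypothetical event loop: each message is handled under `catch_unwind`, so a
/// panic in one handler is logged and the loop continues. Returns
/// `(handled, failed)` counts once every sender has been dropped.
fn run_event_loop(rx: Receiver<i32>, mut handle: impl FnMut(i32)) -> (usize, usize) {
    let (mut handled, mut failed) = (0, 0);
    for msg in rx {
        // AssertUnwindSafe: we are *claiming* `handle`'s state survives a panic.
        // For real engine state (e.g. a JS runtime) that claim may not hold.
        match panic::catch_unwind(AssertUnwindSafe(|| handle(msg))) {
            Ok(()) => handled += 1,
            Err(_) => {
                eprintln!("handler panicked on message {}; continuing", msg);
                failed += 1;
            }
        }
    }
    (handled, failed)
}
```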
@larsbergstrom what I'd dearly love is some static analysis that ensures panic-freedom, but we're probably quite a way out from that, at least partly due to third-party code. The constellation and compositor are meant to be hardened, so in the worst case we should be able to show a crash report page. Not that this will make end users super happy.
So currently we don't actually do it, and off the top of my head, I can imagine the following workflow:
We have something like that at the embedder level, but it's not going to work in multiprocess mode, since it's only set up in the "main process". See Line 49 in 5003387
See servo/components/script/script_thread.rs Line 2838 in ec1da1d
Actually, I think it would be more correct, and much easier, to reload only the top-level pipeline, since that will then load any child pipelines "normally". Instead of trying to somehow re-create the exact state of the page when it crashed, we just reload the page (it might actually have changed in the meantime and no longer contain the same iframes, especially if those frames were ads). If a cross-origin iframe crashes (in a different process from the page that embeds it), do we reload it?
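The "reload only the top-level pipeline" proposal amounts to walking parent links from the crashed pipeline up to its root and reloading that root, letting the normal load recreate the children. The sketch assumes a hypothetical `PipelineId = u32` and a simple child-to-parent map forming a tree (a cycle would loop forever); Servo's real pipeline and browsing-context types differ. It deliberately leaves the open cross-origin iframe question unanswered.

```rust
use std::collections::HashMap;

/// Hypothetical pipeline id; Servo's real type differs.
type PipelineId = u32;

/// Follows child -> parent links from the crashed pipeline to its top-level
/// pipeline, which is the one to reload. Assumes the map describes a tree.
fn pipeline_to_reload(parents: &HashMap<PipelineId, PipelineId>, crashed: PipelineId) -> PipelineId {
    let mut current = crashed;
    while let Some(&parent) = parents.get(&current) {
        current = parent;
    }
    current
}
```

Usage: with `parents = {2 → 1, 3 → 2}`, a crash in pipeline 3 yields 1, and reloading 1 rebuilds whatever iframes the freshly loaded page actually contains.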
At the moment, most of Servo will panic under error conditions, some of which (e.g. GL errors) arise in crates that are out of our control.
This is appropriate behaviour for use cases where we'd like to see the bug reports, but not necessarily what end users want. Should we provide some hardening for use cases where it's more appropriate to log the error and try to recover? Perhaps behind a pref or feature gate?
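A pref or feature gate could select between the two behaviours at each error site. In this sketch `ErrorPolicy` and `handle_error` are hypothetical names; a real version might read the policy from a pref at startup, or compute it with `cfg!(feature = "...")` for a compile-time gate.

```rust
/// Hypothetical policy switch, e.g. driven by a pref or a Cargo feature.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ErrorPolicy {
    /// Developer builds: fail loudly so the bug gets reported.
    Panic,
    /// End-user builds: log the error and fall back to a default.
    LogAndRecover,
}

/// Unwraps `result`, either panicking or logging and returning `fallback`,
/// depending on `policy`.
fn handle_error<T, E: std::fmt::Display>(result: Result<T, E>, fallback: T, policy: ErrorPolicy) -> T {
    match result {
        Ok(value) => value,
        Err(err) => match policy {
            ErrorPolicy::Panic => panic!("unrecoverable error: {}", err),
            ErrorPolicy::LogAndRecover => {
                eprintln!("error (recovering): {}", err);
                fallback
            }
        },
    }
}
```

This only covers errors that surface as `Result`; panics raised inside third-party crates would still need something like `catch_unwind` around the call.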