Removed panicking when frame or pipeline lookup fails. #10082
@bors-servo try |
Removed panicking when frame or pipeline lookup fails. Removed the methods `pipeline(id)`, `pipeline_mut(id)`, `frame(id)` and `frame_mut(id)` from constellation, which panicked when the table lookup failed. Race conditions were triggering these panics, e.g. visiting google.com and resizing the page would panic, most likely because an iframe was added and then removed, with the `DOMLoad` event arriving after the iframe had been removed. This patch fixes #10017 and #8769 (although in non-webrender builds there's now a different panic, see #10017 (comment)). There are a few `TODO` items in the initial commit, for cases where it's not completely obvious what to do in the case of failure.
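As a rough illustration of the change described above, here is a minimal sketch contrasting a panicking accessor with a lookup that tolerates a missing entry. The types `PipelineId`, `Pipeline` and `Constellation` below are simplified, hypothetical stand-ins, and `handle_dom_load` is an invented name; none of this is the actual code in `constellation.rs`.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the constellation's types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PipelineId(pub u32);

#[derive(Debug)]
pub struct Pipeline {
    pub url: String,
}

pub struct Constellation {
    pub pipelines: HashMap<PipelineId, Pipeline>,
}

impl Constellation {
    // Old style: an accessor that panics when the table lookup fails.
    pub fn pipeline(&self, id: PipelineId) -> &Pipeline {
        self.pipelines.get(&id).expect("unable to find pipeline")
    }

    // New style: a failed lookup is reported and the event is dropped.
    // Returns true if the event reached a live pipeline.
    pub fn handle_dom_load(&self, id: PipelineId) -> bool {
        match self.pipelines.get(&id) {
            Some(pipeline) => {
                println!("DOMLoad delivered to {}", pipeline.url);
                true
            }
            None => {
                eprintln!("DOMLoad for closed pipeline {:?}; ignoring", id);
                false
            }
        }
    }
}
```

With this shape, a late `DOMLoad` for an already-removed pipeline produces a message on stderr instead of taking down the whole constellation.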
|
|
|
We might want to land #8641 before this PR. |
|
This seems like a scary change. In the original implementation I had those conditions as panics because I was fairly certain that if they do cause a panic, they signal a logic problem elsewhere. I think that making these not panic is likely to hide / mask other genuine bugs. Ideally I'd like to see an exact sequence of events / messages that results in these race conditions. Thoughts @larsbergstrom ? |
|
@glennw Thanks for bringing this to my attention! I totally agree, and the reason these are scary is that when I first started working on them, pipeline-related shutdown issues resulted in crashes: either the compositor or script task can get into a really horrific state (w.r.t. their native resources) if they are not shut down correctly, sometimes even leading to segfaults. I realize that the shutdown code is pretty terrifying (cleaning up our shutdown segfaults was my first task, then glenn got it early on to handle the intermittents, and now it appears to be one of yours). That said, now that we have session types available, I'd be totally open to instead sitting down and writing out all of the protocols and attempting to codify the shutdown sequence more explicitly. |
|
@glennw yup, pretty scary! The problem is that the current code is causing panics pretty regularly (e.g. load up google.com and resize the window). The panic is caused by an unexpected sequence of events (e.g. where an iframe is loaded then removed, but its DOMLoad event arrives after the iframe's resources have been reclaimed). I think the problem is that the code requires certain sequences of events to be impossible, but the events are concurrent, so I don't see any way to enforce arrival order. I agree that we need to rethink the protocols more thoroughly, and take a long hard look at the compositor; the question is what to do in the short term. @larsbergstrom I'm not 100% sure session types will address the issue, but I'm certainly open to giving it a shot! |
|
@larsbergstrom In this case it's not the shutdown sequence that's causing the problem, it's an iframe being removed, and the constellation panicking because it can't send messages to the iframe. This is why I don't know whether session types will fix the problem, since it's caused by lookup failures in the pipeline hash table. |
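The ordering problem described here can be sketched with a small deterministic replay. The `Msg` enum and `stale_dom_loads` function are hypothetical names invented for illustration; the real constellation messages are richer than this.

```rust
use std::collections::HashSet;

// Hypothetical message type; the u32 is an illustrative iframe/pipeline id.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Msg {
    IframeAdded(u32),
    IframeRemoved(u32),
    DomLoad(u32),
}

// Replays messages in arrival order and returns the ids of DomLoad events
// that arrived for an iframe that was not (or no longer) in the table.
pub fn stale_dom_loads(events: &[Msg]) -> Vec<u32> {
    let mut live = HashSet::new();
    let mut stale = Vec::new();
    for &event in events {
        match event {
            Msg::IframeAdded(id) => {
                live.insert(id);
            }
            Msg::IframeRemoved(id) => {
                live.remove(&id);
            }
            Msg::DomLoad(id) => {
                // With a panicking lookup, this branch is where the crash happens.
                if !live.contains(&id) {
                    stale.push(id);
                }
            }
        }
    }
    stale
}
```

Replaying `[IframeAdded(1), DomLoad(1), IframeRemoved(1)]` yields no stale events, while `[IframeAdded(1), IframeRemoved(1), DomLoad(1)]` yields one. Since the senders run concurrently, both orders are possible.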
|
I agree with the general unease around silently ignoring frames that have unexpectedly gone missing. I do sympathize with wanting to improve browser robustness even in the presence of logic bugs though. Perhaps we could loudly print errors in these cases? |
|
@pcwalton we can include warning messages. These aren't exactly logic bugs though, just messages arriving in odd orders, which is pretty much par for the course on any concurrent system. |
|
Relevant discussion on irc: http://logs.glob.uno/?c=mozilla%23servo#c388656 |
|
I did some digging in constellation.rs, categorizing the potential causes of panic; the results are in https://public.etherpad-mozilla.org/p/servo-threads. TL;DR summary: there are four potential sources of panic:
This PR only addresses (1), which still leaves a lot of (4). |
|
Some more irc chat, this time about what to do in the chan.send(...).unwrap() case: http://logs.glob.uno/?c=mozilla%23servo#c389192 Also @jdm points out (http://logs.glob.uno/?c=mozilla%23servo#c389143) that soldiering on in the presence of script thread failure can have nasty interactions with rooting, see #6462. |
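For the `chan.send(...).unwrap()` case, one possible alternative is to inspect the `Result` rather than unwrapping it. This is only a sketch using `std::sync::mpsc`; the helper name `try_notify` is an assumption, and it is not the approach agreed on in the linked discussion.

```rust
use std::sync::mpsc::Sender;

// Sends a message and reports failure instead of panicking.
// `send` only fails when the receiving end has been dropped,
// e.g. because the thread that owned it has exited.
pub fn try_notify(chan: &Sender<String>, msg: &str) -> bool {
    match chan.send(msg.to_string()) {
        Ok(()) => true,
        Err(err) => {
            eprintln!("send failed, receiver is gone: {}", err);
            false
        }
    }
}
```

Whether soldiering on after a failed send is actually safe is a separate question, as the rooting concern above points out.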
|
Not entirely sure what happened there, github reckons I unassigned @glennw, no such luck :) |
DemiMarie commented Mar 22, 2016
|
One approach is to try to use processes instead of threads whenever possible. In that case, there should be no (or much less) need to clean up – the OS will handle that if the process just dies. |
|
@drbo true, but there's still some clean-up required, e.g. when a pipeline crashes the constellation needs to update its pipeline table, redirect the page to about:failure, etc. |
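A minimal sketch of that clean-up, assuming a hypothetical table that maps pipeline ids to the URL they are showing. The struct layout, the `next_id` counter and the `handle_crash` name are all invented for illustration.

```rust
use std::collections::HashMap;

// Hypothetical, simplified constellation state.
pub struct Constellation {
    // pipeline id -> URL currently loaded in that pipeline
    pub pipelines: HashMap<u32, String>,
    pub next_id: u32,
}

impl Constellation {
    // Drops the crashed pipeline from the table and registers a replacement
    // showing about:failure. Returns the replacement id, or None if the
    // crashed pipeline was already gone (a failed lookup, not a panic).
    pub fn handle_crash(&mut self, crashed: u32) -> Option<u32> {
        self.pipelines.remove(&crashed)?;
        let replacement = self.next_id;
        self.next_id += 1;
        self.pipelines.insert(replacement, "about:failure".to_string());
        Some(replacement)
    }
}
```

This bookkeeping is needed whether the crashed pipeline ran in a thread or in a separate process.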
a4c9bbf to 7bb3c08
|
Processes vs. threads is a red herring since we use unwinding when threads abnormally exit. The resource cleanup isn't the issue. |
0297acb to d188dd7
|
The latest commit removes the TODO(ajeffrey) comments, and adds debug messages in each case where pipeline/frame lookup fails. @pcwalton: Is |
|
Review status: 0 of 1 files reviewed at latest revision, 10 unresolved discussions, some commit checks failed. components/compositing/constellation.rs, line 267 [r7] (raw file): Comments from the review on Reviewable.io |
f079391 to df82a5b
|
Review status: 0 of 1 files reviewed at latest revision, 10 unresolved discussions. components/compositing/constellation.rs, line 637 [r4] (raw file): components/compositing/constellation.rs, line 892 [r4] (raw file): components/compositing/constellation.rs, line 267 [r7] (raw file): Comments from the review on Reviewable.io |
|
@bors-servo r+ |
|
Can someone tl;dr this for me? What's the impact on browserhtml? What happens for the legitimate "unable to find frame/pipeline" bugs? Do they get ignored? Are they reported via debug logs? Do they still panic Servo? If not, do they need to be reported with a crash message within the browser? |
|
They are ignored, and should not bring down Servo any longer. If they don't appear in the terminal already, they can be surfaced via The main impact on browser.html is that it should be harder for a panic that occurs in a script/layout/paint thread to bring down the rest of the browser now. |
|
Ok. So we probably want to show that within browserhtml: #10334 |
|
We should think about what log level to issue these messages at. At the moment it's debug, but that probably should be increased. |
This was apparently fixed by servo#10082.

asajeffrey commented Mar 18, 2016
Removed the methods `pipeline(id)`, `pipeline_mut(id)`, `frame(id)` and `frame_mut(id)` from constellation, which panicked when the table lookup failed.
Race conditions were triggering these panics, e.g. visiting google.com and resizing the page would panic, most likely because an iframe was added and then removed, with the `DOMLoad` event arriving after the iframe had been removed.
This patch fixes #10017 and #8769 (although in non-webrender builds there's now a different panic, see #10017 (comment)).
There are a few `TODO` items in the initial commit, for cases where it's not completely obvious what to do in the case of failure.