Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compositor: Handle synchronous messages while shutting down #31733

Merged
merged 1 commit into from Mar 19, 2024

Conversation

mrobinson
Copy link
Member

During the shutdown process, various threads (such as the
font cache thread) may be finishing up their work. If those threads make
synchronous requests to the compositor, answer them -- even if the
results will be unused. This is at least enough processing for them to
finish their work and exit cleanly.

This addresses crashes that are sometimes seen at exit, particuarly when
the font cache thread tries to register a font during shutdown.

In addition, this change also removes an unused compositor message.


  • ./mach build -d does not report any errors
  • ./mach test-tidy does not report any errors
  • There are tests for these changes but the tests are all the WPT. This should lead to less flaky crashes when integrated.

@mrobinson mrobinson requested a review from gterzian March 18, 2024 12:45
@mrobinson
Copy link
Member Author

mrobinson commented Mar 18, 2024

Here's an example of the failure that this addresses:

https://github.com/servo/servo/actions/runs/8331571618/job/22800303069

│ Shutting down the Constellation after generating an output file or exit flag specified
  │ called `Result::unwrap()` on an `Err` value: RecvError (thread FontCache, at components/servo/lib.rs:1031)
  │    0: servoshell::backtrace::print
  │    1: servoshell::main::{{closure}}
  │    2: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
  │              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/alloc/src/boxed.rs:2021:9
  │       std::panicking::rust_panic_with_hook
  │              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:735:13
  │    3: std::panicking::begin_panic_handler::{{closure}}
  │              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:609:13
  │    4: std::sys_common::backtrace::__rust_end_short_backtrace
  │              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys_common/backtrace.rs:170:18
  │    5: rust_begin_unwind
  │              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:597:5
  │    6: core::panicking::panic_fmt
  │              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/core/src/panicking.rs:72:14
  │    7: core::result::unwrap_failed
  │              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/core/src/result.rs:1652:5
  │    8: <servo::FontCacheWR as gfx_traits::WebrenderApi>::add_font
  │    9: gfx::font_cache_thread::FontCache::run
  │   10: std::sys_common::backtrace::__rust_begin_short_backtrace
  │   11: core::ops::function::FnOnce::call_once{{vtable.shim}}
  │   12: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
  │              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/alloc/src/boxed.rs:2007:9
  │       <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
  │              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/alloc/src/boxed.rs:2007:9
  │       std::sys::unix::thread::Thread::new::thread_start
  │              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys/unix/thread.rs:108:17
  │   13: <unknown>
  │   14: <unknown>
  │ mozilla::detail::MutexImpl::~MutexImpl: pthread_mutex_destroy failed: Device or resource busy
  │ Caught signal 11thread panicked while processing panic. aborting.
  │ Redirecting call to abort() to mozalloc_abort

Copy link
Member

@gterzian gterzian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some nits.

@@ -917,6 +899,74 @@ impl<Window: WindowMethods + ?Sized> IOCompositor<Window> {
}
}

/// Handle messages sent to the compositor during the shutdown process. In general,
/// the things the compositor can do in this state are limited. It's very important to
/// answer any synchronuos messages though as other threads might be waiting on the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

/// during this time.
///
/// When that involves generating WebRender ids our approach here is to simply
/// generate them, but assume they will never be used, since once shuttind down the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

(CompositorMsg::ShutdownComplete, _) => {
match msg {
CompositorMsg::ShutdownComplete => {
warn!("Received `ShutdownComplete` while not shutting down.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shutdown flow appears to be:

  1. EmbedderEvent::Quit handled by maybe_start_shutting_down. This makes sure the state is set to ShutdownState::ShuttingDown, and sends the exit message to the constellation.
  2. Constellation sends CompositorMsg::ShutdownComplete when it is done shutting down, after a blocking exit call to the font cache thread.

So receiving this message in the ShutdownState::NotShuttingDown state appears impossible, so I think should panic here, or at least error. It appears to me we don't get many issues based on errors, but we do get a lot of issues when servo crashes. So panicking appears to result in more action being taken to address the underlying problems, hence I prefer it. But I know you prefer to log an error so let's at least do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this situation is highly unexpected, but in the end it's a non-fatal error as we are essentially about to quit anyway. This was handled with a warning before this change as well. Regarding crashing versus recovering from errors, I think that we cannot trust the messages that come from other threads / processes, as they may be experiencing runtime errors or be controlled by an attacker. Instead of immediately crashing, I think there is some benefit to attempting to properly handle when this happens. We should obviously be on top of any warnings printed during the execution of Servo, as they are only printed in very unexpected situations.

In my experience working on other browsers though, they are very complex pieces of software. If you turn on debug assertions, it's quite likely that you can hit one simply by browsing the web or running the tests. If the browser crashed every time that happened, it would be pretty unusable.


// When we are shutting_down, we need to avoid performing operations
// such as Paint that may crash because we have begun tearing down
// the rest of our resources.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this comment is still valid, I think it makes sense to copy it to the new location where messages are ignored while shutting down.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment might be invalid now. It refers to Paint which is a pre-WebRender concept as far as my memory serves. The main issue here is that I think we don't want to keep compositing because the other parts of Servo are in a transient state.

},
CompositorMsg::NewWebRenderFrameReady(_) => {
// Subtract from the number of pending frames, but do not do any compositing.
self.pending_frames -= 1;
Copy link
Member

@gterzian gterzian Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the current code does this even after shutdown has completed, so there is a change of logic, but the change seems to make more sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good point that I had overlooked. I agree that it doesn't make sense to keep subtracting one from the pending_frames when shutdown is complete because the count isn't used any longer.

During the shutdown process, various threads (such as the
font cache thread) may be finishing up their work. If those threads make
synchronous requests to the compositor, answer them -- even if the
results will be unused. This is at least enough processing for them to
finish their work and exit cleanly.

This addresses crashes that are sometimes seen at exit, particuarly when
the font cache thread tries to register a font during shutdown.

In addition, this change also removes an unused compositor message.
@mrobinson mrobinson added this pull request to the merge queue Mar 19, 2024
@mrobinson
Copy link
Member Author

@gterzian Thank you for the review!

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 19, 2024
@mrobinson mrobinson added this pull request to the merge queue Mar 19, 2024
@mrobinson
Copy link
Member Author

Looks like this failed due to infrastructure issues.

Merged via the queue into servo:main with commit 05d9373 Mar 19, 2024
9 checks passed
@mrobinson mrobinson deleted the compositor-shutdown branch March 19, 2024 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants