Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of get_cx can fail after clearing JS runtime #21866

Open
jdm opened this issue Oct 4, 2018 · 8 comments
Open

Use of get_cx can fail after clearing JS runtime #21866

jdm opened this issue Oct 4, 2018 · 8 comments
Labels
A-content/dom Interacting with the DOM from web content I-panic Servo encounters a panic.

Comments

@jdm
Copy link
Member

jdm commented Oct 4, 2018

Using window.get_cx can trigger a script thread panic when it runs as part of a task that affects a global that already had its runtime cleared. We can avoid this by cancelling all outstanding tasks for the global as part of Window::clear_js_runtime.

@jdm jdm added A-content/dom Interacting with the DOM from web content I-panic Servo encounters a panic. labels Oct 4, 2018
@jdm
Copy link
Member Author

jdm commented Oct 4, 2018

An example of what this looks like:

  ▶ CRASH [expected OK] /_mozilla/mozilla/postmessage_closed.html
  │ 
  │ VMware, Inc.
  │ softpipe
  │ 3.3 (Core Profile) Mesa 18.1.0-devel
  │ called `Option::unwrap()` on a `None` value (thread ScriptThread PipelineId { namespace_id: PipelineNamespaceId(1), index: PipelineIndex(1) }, at libcore/option.rs:345)
  │ stack backtrace:
  │    0:     0x7f458ade275c - backtrace::backtrace::trace::h25a855f1b024854d
  │    1:     0x7f458ade2482 - <backtrace::capture::Backtrace as core::default::Default>::default::habdbfa508190f944
  │    2:     0x7f458ade24f8 - backtrace::capture::Backtrace::new::h4d43a9c81c9f7d2b
  │    3:     0x7f4587d14879 - servo::main::{{closure}}::h4f3daf96d0e7b2ea
  │    4:     0x7f458adf8083 - std::panicking::rust_panic_with_hook::h96bd948766a721a4
  │                         at libstd/panicking.rs:481
  │    5:     0x7f458adf7c29 - std::panicking::continue_panic_fmt::h79dd527b10c49814
  │                         at libstd/panicking.rs:391
  │    6:     0x7f458adf7b25 - rust_begin_unwind
  │                         at libstd/panicking.rs:326
  │    7:     0x7f458ae2fa4b - core::panicking::panic_fmt::ha9838fa819cc4c3a
  │                         at libcore/panicking.rs:77
  │    8:     0x7f458ae2f97a - core::panicking::panic::h285e7c5e0188ee3f
  │                         at libcore/panicking.rs:52
  │    9:     0x7f4588d1e015 - <script::dom::window::Window::post_message::post_serialised_message<F> as script::task::TaskOnce>::run_once::hc24a6f1083c58812
  │   10:     0x7f458854e6a3 - <T as script::task::TaskBox>::run_box::h7decbd20c577411a
  │   11:     0x7f458869022d - script::script_thread::ScriptThread::handle_msg_from_script::ha5cb4dc8dc384b38
  │   12:     0x7f4588682def - _ZN6script13script_thread12ScriptThread11handle_msgs17ha3193724cf5b3cbfE.llvm.16639006869903642791
  │   13:     0x7f458867f801 - script::script_thread::ScriptThread::start::hd0d0c5cc418f3718
  │   14:     0x7f458881f20f - std::sys_common::backtrace::__rust_begin_short_backtrace::h56d875724efd5dee
  │   15:     0x7f4588d23e35 - _ZN3std9panicking3try7do_call17h110851039340f857E.llvm.10071600851121277610
  │   16:     0x7f458ae20f79 - __rust_maybe_catch_panic
  │                         at libpanic_unwind/lib.rs:103
  │   17:     0x7f4588a2aa60 - <F as alloc::boxed::FnBox<A>>::call_box::h88161e523050cfb5
  │   18:     0x7f458ae108fa - <alloc::boxed::Box<(dyn alloc::boxed::FnBox<A, Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$RP$$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h91e1a0628cdb6b64
  │                         at /checkout/src/liballoc/boxed.rs:656
  │                          - std::sys_common::thread::start_thread::h8710783cd11c6158
  │                         at libstd/sys_common/thread.rs:24
  │   19:     0x7f458adf8855 - std::sys::unix::thread::Thread::new::thread_start::h8ecb5f3a76b46e8e
  │                         at libstd/sys/unix/thread.rs:90
  │   20:     0x7f458608b183 - start_thread
  │   21:     0x7f458431703c - clone
  │   22:                0x0 - <unknown>
  │ ERROR 2018-10-04T00:02:01Z: servo: called `Option::unwrap()` on a `None` value
  └ Pipeline failed in hard-fail mode.  Crashing!

@asajeffrey
Copy link
Member

Related to #21419, and in general lots of code which assumes browsing contexts are never discarded.

@AgustinCB
Copy link

How can I reproduce this on local?

@AgustinCB
Copy link

Also, when you mean the global, do you mean all the tasks submitted through GlobalScope.pipeline_id?

@jdm
Copy link
Member Author

jdm commented Oct 4, 2018

You may be lucky enough to be able to trigger this (potentially) intermittent failure by running ./mach test-wpt tests/wpt/mozilla/tests/mozilla/postmessage_closed.html. And when I talk about the global, I mean tasks that get queued that make use of a Window object and end up calling the get_cx method for some reason when they are later executed.

@AgustinCB
Copy link

Sorry, I have more questions:

How can I get a list of such a tasks? I have been taking a look at the window and globalscope files but didn't find references.

@jdm
Copy link
Member Author

jdm commented Oct 4, 2018

There's no way to get such a list without looking at each use of CommonScriptMsg::Task, as far as I can tell. It shouldn't be necessary to modify any of them, though; we built our task cancellation mechanism that you've been improving to enable this use case - if we use the API that marks all task sources as cancelled then any outstanding tasks for those sources will be ignored.

@gterzian
Copy link
Member

gterzian commented Oct 5, 2018

@AgustinCB Here are a few more pointers, hopefully they're correct and useful:

  1. It appears the test is first using postMessage on an iframe, and then removing the iframe from the DOM tree of the page where it is embedded.
  2. It appears to be a same origin iframe, hence postMessage will end up calling Window.post_message
  3. post_message will enqueue a task using the DOMManipulation task-source, here
    let _ = self.script_chan.send(CommonScriptMsg::Task(
  4. When this task is executed by the script_thread here get_cx is called, see
    let cx = this.get_cx();
  5. Removing the iframe from the tree, will, I think, end up calling unbind_from_tree, which will among others send a ScriptMsg::RemoveIFrame message to the constellation.
  6. It's also noteworthy that the iframe will block waiting on the answer from the constellation at
    let exited_pipeline_ids = receiver.recv().unwrap();
    . Since this "iframe" is same-origin, it is running in the same script-thread as the rest of the page, and this will thus block the entire script-thread, which will not be able to handle other tasks in the meantime.
  7. While the script-thread is blocked because of 6, the constellation is not, and will do the work related to "closing the iframe", which is found at handle_remove_iframe_msg.
  8. This will include "closing a browsing context", which is a workflow that ends-up sending a ConstellationControlMsg::ExitPipeline back to the script-thread were this iframe is running.
  9. The script-thread will handle this message in handle_exit_pipeline_msg, which also calls window.clear_js_runtime

So the way to think about how this is processed is that:

  1. PostMessage enqueues a task on a script-thread.
  2. remove ends up sending a ConstellationControlMsg::ExitPipeline to that same script-thread, while that script-thread is actually blocked, hence it won't for example handle the task related to 1.
  3. The script-thread handles running tasks, and handling message from the constellation, as part of the same "run loop" as part of handle_msgs
  4. The result of that both the "PostMessage" task, and the "Exit Pipeline" message, are sort of enqueued and waiting on being handled by this script-thread at the same time(since the script-thread will block at 2, meaning it won't be able to handled the "PostMessage" before the "Exit Pipeline" message has been received).
  5. At that point, it's a question of which one is handled first. If the "Exit Pipeline" is handled first, we'll get a crash when the "PostMessage" is handled next. There is some randomness there, because the select! statement that is used to "choose" a message either from the task-queue, or from the constellation queue of messages, will pick one fairly randomly. See
    let mut event = select! {
  6. The solution to this problem is to ensure that as part of handle_exit_pipeline_msg, all tasks for the window corresponding to the pipeline that is exited, are marked as cancelled.
  7. The weird thing, is that it appears to be done already inside clear_js_runtime at
    self.ignore_all_events();
  8. One thing I can think of, is to try to used a more strict Ordering at
    flag.store(true, Ordering::Relaxed);
  9. Because when a script-thread runs a task(see
    MainThreadScriptMsg::Common(CommonScriptMsg::Task(_, task, _, _)) => task.run_box(),
    )
  10. The task will check whether it has been cancelled using
    .map_or(false, |cancelled| cancelled.load(Ordering::SeqCst))

Hope this helps(and that if changing the ordering really is the fix, it didn't spoil the fun)!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-content/dom Interacting with the DOM from web content I-panic Servo encounters a panic.
Projects
None yet
Development

No branches or pull requests

4 participants