Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Main thread complexity #23899

Open
asajeffrey opened this issue Jul 31, 2019 · 8 comments
Open

Main thread complexity #23899

asajeffrey opened this issue Jul 31, 2019 · 8 comments
Labels
A-content/webgl 3d canvas API A-embedding A-webrender A-xr-zomg AR/VR/MR B-meta This issue tracks the status of multiple, related pieces of work I-cleanup No impact; the issue is one of maintainability or tidiness. I-refactor No impact; the issue is one of maintainability or tidiness. Proposed solution requires refactoring.

Comments

@asajeffrey
Copy link
Member

asajeffrey commented Jul 31, 2019

This is intended to be a meta-bug for centralizing the discussions about the complexity of our main thread.

Originally, the main thread ran the embedder (and in particular the window event loop) and webrender, and this wasn't too bad since the only source of blocking on events was the embedder. Webrender blocks on the WebGL thread, but that wasn't too bad when it was off in another thread.

Unfortunately, as Servo has been ported to different devices, we've ended up with more and more cases of things which need to be run on the main thread, including WebXR and WebGL (on devices where GL contexts which share textures need to live on the same thread). This is causing problems.


@asajeffrey asajeffrey added I-refactor No impact; the issue is one of maintainability or tidiness. Proposed solution requires refactoring. A-embedding I-cleanup No impact; the issue is one of maintainability or tidiness. A-content/webgl 3d canvas API A-webrender A-xr-zomg AR/VR/MR labels Jul 31, 2019
@asajeffrey asajeffrey added the B-meta This issue tracks the status of multiple, related pieces of work label Jul 31, 2019
@asajeffrey
Copy link
Member Author

@asajeffrey
Copy link
Member Author

In particular, there is discussion starting at #23865 (comment) about whether async/await and futures 0.3 is a good fit for Servo's main thread.

@gterzian
Copy link
Member

gterzian commented Jul 31, 2019

I think the main problem is that

pub fn perform_updates(&mut self) -> Result<(), &'static str> {

is running various sub-loops.

Since each sub-loops needs to run for different reasons, "waking-up" or "animation mode" is used to achieve a regular heartbeat on each of these sub-loops.

However, since every-time a sub-loop runs, it doesn't know exactly why, it cannot block on messages and otherwise run in a "robust" manner, instead needing to use try_recv or timeouts to try to "catch potentially incoming messages".

The irony of "animation mode" is that there is no lack of "source of events" for the main-thread, in the form of incoming messages from script. In fact, each frame requires messaging from and to script over IPC.

The problem is that those messages are handled by "sub-loops", hence the main-thread needs to keep running so as to keep them lively, even though the flow of messages in itself could be turned into a robust signalling mechanism(which would have happened already if each sub-loop ran in its own thread of execution).

A side-effect of these sub-loops, is that it's possible for the main-thread to become unresponsive to UI events as a result of running such sub-loop, with even the possibility existing that script could put the main-thread in such state on purpose and keep it there indefinitely(example from an issue noted above: servo/webxr#34).


The root of the problem is the interface of embedder_traits::EventLoopWaker, which interestingly these days is actually a re-exported webxr_api::MainThreadWaker(which points towards the need to unify "waking" across use-cases).

wake doesn't come with a specific event. Requiring "running Servo" each time a call occurs.

wake is also implemented, in ipc mode at least, by sending a message, received by a router running in the "main-process" alongside the main-thread, which then calls the embedder-specific "wake"(which is safe to call from another thread, but isn't available across process).

That's really interesting if you think about it, since the reason why the main-thread is awoken, is because another ipc message has been sent and it needs to be handled by a sub-loop.


My proposed solution, is to simply make every communication with the main-thread happen by sending an EmbedderMsg on an ipc channel. Yes that means no more webxr_api::SessionMsg::Raf, instead being replaced by something like EmbedderMsg::XrRaf.

Then, all those messages would be handled by a router thread, running in the same process as the "main-thread".

Each message would result in wake being called, but with a specific DeviceEvent.

it would then be the responsibility of the embedder code(running on the main-thread) to run an event-loop and handle these events, interleaved with UI input.

When handling such events, the embedder could do two things:

  1. Call in to system specific APIs, like swapping buffers or "making itself current".
  2. Call into Servo specific "main-thread Servo" APIs, for example acquiring a webgl lock, or sending a frame to script.

1 and 2 could be interleaved in various sequences, since this would be basically single-threaded sequential code.

The main difference with now is that the Servo code wouldn't have to do anything device-specific.

Instead, Servo would only be called into to do stuff the device cannot do, like communicating a frame back to script, and in the context of the device work having been done(or being assumed to soon complete). For example, in the case of a raf, Servo would only be called into to send a frame to script. Swapping buffers would not be the work of Servo.

it would also mean that the main-thread is basically running a device-specific render loop, in the context of which Servo is called into from time to time.

WebXr/WebgGL would then become part of the API into "main-thread Servo", which the render loop can call into when necessary. Compare to the current situation where each device requires implementing a "webXr backend", in addition to a render loop. And the "backend" simply calls into device apis. I think the render-loop should essentially be the backend, calling into main-thread Servo when necessary.

Essentially the opposite of needing to wake-up the device so as to "make Servo run". Instead, the device render loop is awoken with specific events, with the ability to call into Servo as part of handling these events.

There is a 1-to-1 relation between a "wake-up" and a "main-thread operation".

Finally, since the render-loop is the main-thread, I think it could allow full flexibility for the embedding code to do what's right platform-wise.

@asajeffrey
Copy link
Member Author

@gterzian I'm confused as to how sending an EmbedderMsg on an ipc channel would wake up the main event loop, in the case of an embedding like glutin where the embedder is blocked waiting on a UI event. Is the idea that we'd have a wrapper around an ipc channel that would send the EmbedderMsg then call the embedder waker?

@gterzian
Copy link
Member

gterzian commented Aug 1, 2019

I'm confused as to how sending an EmbedderMsg on an ipc channel would wake up the main event loop, in the case of an embedding like glutin where the embedder is blocked waiting on a UI event.

@asajeffrey It could be the same as is currently done with

https://github.com/servo/webxr/blob/fcbdddb66e2d307d86ff109d79001c7a45de79db/webxr-api/registry.rs#L63

With the difference that "waking" would not be done when another message is sent, the message would result in the wake-up.

So for example:

// Sends an IPC message.
let _ = self.sender.send(RegistryMsg::SupportsSession(mode, dest));
// Also sends an IPC message.
self.waker.wake();

would become

// Internally sends (PipelineId, EmbedderMsg::SuppportsXrMode(mode, dest))
self.embedder.supports_session(mode, dest); 

// Elsewhere
// Internally sends (PipelineId, EmbedderMsg::XrRaf(dest))
self.embedder.request_xr_animation_frame(dest)

then, in the "main-proces", there is a router thread that does:

for (pipeline_id, msg) in ipc_receiver.recv() {
    match msg {
        EmbedderMsg::SuppportsXrMode(mode, dest) => {
            // Here we don't need to go through the main-thread
            let _ = dest.send(self.xr_registry.supports_session(mode));
        },
        EmbedderMsg::XrRaf(dest) => {
           // This is a bit unfortunate, and I think we could do it better.
           self.xr_callbacks.lock().push_back((pipeline_id, dest));
           // Here, we want to run something on the main-thread
           self.waker.wake(DeviceEvent::XrRaf(pipeline_id));
       }
    }
}

Then, this calls into the device

void BrowserPage::WakeUp(DeviceEvent event) {
    switch (event.type) {
        case DeviceEvent::XrRaf: {
            auto pipelineId = event.pipelineId;
            RunOnGLThread([=] { mServo->XrRaf(pipelineId); });
        }
    }
}

in "main-thread Servo"

pub fn xr_raf(&mut self, pipeline_id: PipelineId) {
    let frame = self.device.wait_for_animation_frame();
    if let Some((dest_pipeline, dest)) = self.xr_callbacks.lock().pop_front() {
        assert_eq!(dest_pipeline, pipeline_id);
        let timestamp = self.timestamp;
        let _ = dest.send((timestamp, frame));
    }
}

There's a bit of a problem because the dest callback has to be passed to the main-thread. We could do it with some raw pointer with the embedding code, but I guess some shared-state, or maybe a channel, would be safer. I think we could also re-structure the flow a bit so that there is just one "callback" per session, which is setup at the beginning of the session and not sent along for each call.

The above also shows how the router thread can be a bit more than just a router, since some operations do not need to occur on the main-thread. It can do basically half the work, hold some state, and so on. It should never block.

Finally, I know earlier I said "Servo shouldn't do anything device specific", and now looking at https://github.com/servo/webxr/blob/master/webxr/googlevr/device.rs I see that something like wait_for_animation_frame can get pretty complicated and specific.

So maybe we want to keep the Device apis, the various backends, and so on. So the main change is really that the render loop drives the main-thread, and is itself driven either by UI events, or by the router thread handling EmbedderMsg.

Also, we can keep perform_updates to support "legacy main-thread", all the while not adding new stuff to it, and removing "sub-loops" one at the time, moving them into using EmbedderMsg and being driven by the main-loop one event at a time.

@gterzian
Copy link
Member

gterzian commented Aug 1, 2019

More specifically re glutin, it does seem to be a very complicated loop, and even there I think you could simply share an Arc<Mutex<VecDequeu<DeviceEvent>>(Not glutin DeviceEvent) between the EventLoopWaker and the App, and then internally call self.proxy.wakeup(), and then pop_front on the shared queue when glutin::Event::Awakened comes in.

Then the actual glutin::Event::WindowEvent would be handled as they come in.

It could look something like:

struct HeadedEventLoopWaker {
    proxy: Arc<glutin::EventsLoopProxy>,
    event_queue: Sender<DeviceEvent>,
}

impl EventLoopWaker for HeadedEventLoopWaker {
    fn wake(&self, event: DeviceEvent) {
        self.sender.send(event);
        // kick the OS event loop awake.
        if let Ok(err) = self.proxy.wakeup() {
            warn!("Failed to wake up event loop ({}).", err);
        }
    }
}

// In run_loop
self.events_loop.borrow_mut().run_forever(|e| {
    let event = match e {
        glutin::Event::Awakened => {
            // Can we count on not receiving spurious Event::Awakened?
            self.receiver.recv().unwrap()
        },
        glutin::Event::WindowEvent => //translate to a DeviceEvent
    };
    self.handle_event(event)
});

Or find some other way to plug a specific event into the wake-up...

It's not great, and I think the glutin port is more a case of supporting legacy and at least having a semblance of a window on desktop to run tests and so on. I'm more excited by the stuff going on in support these days...

@delan
Copy link
Member

delan commented Oct 24, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-content/webgl 3d canvas API A-embedding A-webrender A-xr-zomg AR/VR/MR B-meta This issue tracks the status of multiple, related pieces of work I-cleanup No impact; the issue is one of maintainability or tidiness. I-refactor No impact; the issue is one of maintainability or tidiness. Proposed solution requires refactoring.
Projects
None yet
Development

No branches or pull requests

3 participants