Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not wait for rAF to be requested in render loop #25343

Merged
merged 2 commits into from Jan 2, 2020

Conversation

@Manishearth
Copy link
Member

Manishearth commented Dec 19, 2019

Servo-side of servo/webxr#114

r? @jdm

@highfive
Copy link

highfive commented Dec 19, 2019

Heads up! This PR modifies the following files:

  • @asajeffrey: components/script/dom/xrsession.rs, components/script/dom/xr.rs
  • @KiChjang: components/script/dom/xrsession.rs, components/script/dom/xr.rs
@highfive
Copy link

highfive commented Dec 19, 2019

warning Warning warning

  • These commits modify script code, but no tests are modified. Please consider adding a test!
self.raf_sender.borrow().is_none(),
"RAF loop already set up"
);
fn setup_raf_loop(&self, frame_receiver: IpcReceiver<Frame>) {
let this = Trusted::new(self);
let global = self.global();
let window = global.as_window();
let (task_source, canceller) = window
.task_manager()
.dom_manipulation_task_source_with_canceller();

This comment has been minimized.

@jdm

jdm Dec 19, 2019

Member

To reduce the time between when we're signalled with a new frame and when we run the rAF callback, we should use a high-priority task source. The spec doesn't actually specify a particular ordering, so we should be able to get away with jumping the queue. For example, we could use a dedicated sender/receiver pair and use try_recv on it before calling select! in ScriptThread::handle_msgs.

This comment has been minimized.

@gterzian

gterzian Jan 1, 2020

Member

I think that's a good idea, and maybe it could be further improved by not going through the router to receive the frame.

I think it could be done through:

  1. Whenever a callback is added:
    • set a awaiting_frame flag on the script-thread.
    • send a MainThreadScriptMsg::WakeUp to ensure the main loop doesn't block on the select
  2. After the microtask checkpoint:
    • remove the awaiting_frame flag, and, if it was true:
      • block on the frame receiver, and
      • immediately run the callbacks when receiving it.
  3. This would require also running 2(and the microtask checkpoint) if sequential is empty(in the case that the only event is the MainThreadScriptMsg::WakeUp, since it will be filtered out by the task-queue). It also requires Xr to always send a message on the frame receiver to unblock the script-thread.

So that could be a way to block directly on a IpcReceiver inside handle_msgs, with the benefit being always running the frame callbacks immediately after you receive the frame, and not going through the router.

Then I guess the script-thread could make the xr_frame_ pair an ipc-channel where the sender would be cloned inside request_new_xr_frame and send to Xr?

This comment has been minimized.

@gterzian

gterzian Jan 1, 2020

Member

Ah ok I see you also need a way to call from the script-thread into the session, so instead of an awaiting_frame flag, you could have an Option<SessionAwaitingFrame> on the script-thread, which would be set by request_new_xr_frame and taken before optionally blocking on the frame receiver from within handle_msgs. I assume you can have only one active Xr session per script-thread?

This comment has been minimized.

@gterzian

gterzian Jan 1, 2020

Member

Like https://github.com/gterzian/servo/tree/restructure_xr_frame_handling
If it seems like a good idea I could just open a PR as a follow-up.

This comment has been minimized.

@Manishearth

Manishearth Jan 1, 2020

Author Member

You can have multiple online sessions, but only one immersive

We first need to figure out why this is breaking the tests. I bet it's because of a race between data getting sent and read from the mock device.

This comment has been minimized.

@gterzian

gterzian Jan 2, 2020

Member

I think the problem with the timeouts is that with the current changes, the call to render_animation_frame from script only occurs if the session is immersive(and the changes in Xr from servo/webxr#114 makes the render loop dependent on rendering a frame).

Previously, a new frame would be requested at the end of running callbacks, regardless of whether the session was immersive.

This comment has been minimized.

@Manishearth

Manishearth Jan 2, 2020

Author Member

That was indeed the issue, was poking at this earlier. Fixed now.

Opening a followup PR would be great, but we should test if it affects FPS.

@Manishearth
Copy link
Member Author

Manishearth commented Dec 19, 2019

The FPS values seem to slightly go up from 12-15 to 15-20.

Timing values (in microseconds, I forgot what a nanosecond was and divided wrong) between the time the openxr backend constructs the frame ("Frame") and the time we trigger callbacks in script ("Now") :

Now: 1720318156	 Frame: 1720311936	Delta: 6219.375
Now: 1720390996	 Frame: 1720389289	Delta: 1707.292
Now: 1720446786	 Frame: 1720442530	Delta: 4256.355
Now: 1720528122	 Frame: 1720523149	Delta: 4973.020
Now: 1720584494	 Frame: 1720575796	Delta: 8697.812
Now: 1720657133	 Frame: 1720655931	Delta: 1202.447
Now: 1720714996	 Frame: 1720713575	Delta: 1421.093
Now: 1720773981	 Frame: 1720772679	Delta: 1302.292
Now: 1720830164	 Frame: 1720828616	Delta: 1547.604
Now: 1720890760	 Frame: 1720889245	Delta: 1515.781
Now: 1720945515	 Frame: 1720943989	Delta: 1525.729
Now: 1721007255	 Frame: 1721005985	Delta: 1269.844
Now: 1721062167	 Frame: 1721059961	Delta: 2205.833
Now: 1721123669	 Frame: 1721122519	Delta: 1150.000
Now: 1721181460	 Frame: 1721178199	Delta: 3261.302
Now: 1721240545	 Frame: 1721239333	Delta: 1211.979
Now: 1721309603	 Frame: 1721303655	Delta: 5947.812
The thread 0x1a7c has exited with code 0 (0x0).
Now: 1721374806	 Frame: 1721372548	Delta: 2258.073
Now: 1721415320	 Frame: 1721413298	Delta: 2022.031
Now: 1721474244	 Frame: 1721472577	Delta: 1667.136
Now: 1721516310	 Frame: 1721515056	Delta: 1253.802

Seems like we're losing around 2-6ms on just the communication delay. Digging further into this

@Manishearth
Copy link
Member Author

Manishearth commented Dec 19, 2019

More timings. Delta is between us getting a frame from openxr and us seeing it in raf_callback in script. Delta_task is between the script process router getting the frame message and the script thread running it. Ignore the last one.

It seems like we're losing time in waiting for the script router to pick up the IPC message, which bolsters my initial hunch that IPC is slow for some reason.

Now: 30135.365	 Frame: 30133.465	Delta: 1.900	Delta_task: 0.707	Delta_rafc: 0.029
Now: 30187.074	 Frame: 30185.543	Delta: 1.530	Delta_task: 0.237	Delta_rafc: 0.015
Now: 30235.812	 Frame: 30234.467	Delta: 1.345	Delta_task: 0.171	Delta_rafc: 0.014
Now: 30287.797	 Frame: 30285.883	Delta: 1.913	Delta_task: 0.227	Delta_rafc: 0.015
Now: 30335.949	 Frame: 30334.205	Delta: 1.744	Delta_task: 0.399	Delta_rafc: 0.015
Now: 30387.547	 Frame: 30385.756	Delta: 1.789	Delta_task: 0.231	Delta_rafc: 0.016
Now: 30437.031	 Frame: 30435.457	Delta: 1.575	Delta_task: 0.188	Delta_rafc: 0.016
Now: 30487.953	 Frame: 30485.637	Delta: 2.317	Delta_task: 0.230	Delta_rafc: 0.015
Now: 30537.453	 Frame: 30535.949	Delta: 1.505	Delta_task: 0.214	Delta_rafc: 0.012
Now: 30604.316	 Frame: 30602.482	Delta: 1.835	Delta_task: 0.210	Delta_rafc: 0.015
Now: 30657.307	 Frame: 30652.455	Delta: 4.852	Delta_task: 0.705	Delta_rafc: 0.016
Now: 30737.090	 Frame: 30735.533	Delta: 1.555	Delta_task: 0.240	Delta_rafc: 0.016
Now: 30784.846	 Frame: 30783.484	Delta: 1.361	Delta_task: 0.157	Delta_rafc: 0.010
@Manishearth
Copy link
Member Author

Manishearth commented Dec 19, 2019

Hmm, there's actually a consistent 1ms gap between the time we wait() and the time we construct our frame. Something's fishy there.

That's not the only contributing factor though.

@asajeffrey
Copy link
Member

asajeffrey commented Dec 19, 2019

2-6ms is a lot of extra time!

@Manishearth
Copy link
Member Author

Manishearth commented Dec 19, 2019

So locate_views is taking around 0.2ms, .locate() is taking around 0.15ms, sync_actions is 0.044ms, and the input .frame() calls are all together taking 0.6ms (it's a total of 4 locate() calls, essentially).

@asajeffrey
Copy link
Member

asajeffrey commented Dec 19, 2019

Hmm, that all sounds like stuff that's out of our control? The only thing I can think of is decoupling the calls to input .frame(), since if that information is a frame late it probably won't matter so much?

@Manishearth
Copy link
Member Author

Manishearth commented Dec 19, 2019

It might be that we're using the APIs incorrectly, too. I tried using unbounded spaces as recommended, that didn't fix it. Going for jdm's scheduling patch now to see if I can at least fix the communication overhead, which is still a nontrivial chunk.

@jdm
Copy link
Member

jdm commented Dec 19, 2019

Perhaps we should look into calling begin_frame before render_animation_frame, to be more like the SDK sample at https://github.com/microsoft/OpenXR-SDK-VisualStudio/blob/4378fbda334809852331fccf2ffa9b4979e0684f/samples/BasicXrApp/OpenXrProgram.cpp#L609-L612.

@Manishearth
Copy link
Member Author

Manishearth commented Dec 20, 2019

Timings with @jdm's patch. Delta is the time between frame creation and frame reception in script, delta_task is the time between the script process receiving it and the script thread receiving it. Callback time is the time taken to execute raf callbacks (which we can expect to be somewhat large)

Now: 33839.398	 Frame: 33837.664	Delta: 1.736	Delta_task: 0.172	Callback time: 2.842
Now: 33895.418	 Frame: 33892.207	Delta: 3.213	Delta_task: 1.973	Callback time: 2.785
Now: 33939.176	 Frame: 33937.449	Delta: 1.726	Delta_task: 0.235	Callback time: 3.556
Now: 33993.762	 Frame: 33992.102	Delta: 1.660	Delta_task: 0.149	Callback time: 2.864
Now: 34055.547	 Frame: 34053.891	Delta: 1.654	Delta_task: 0.218	Callback time: 4.446
Now: 34127.477	 Frame: 34125.398	Delta: 2.076	Delta_task: 0.879	Callback time: 3.595
Now: 34187.035	 Frame: 34185.129	Delta: 1.906	Delta_task: 0.329	Callback time: 4.338
Now: 34244.059	 Frame: 34242.164	Delta: 1.893	Delta_task: 0.146	Callback time: 3.102

There are also a bunch of entries with much larger deltas sprinkled throughout. Some are in a sequence:

Now: 34850.398	 Frame: 34842.527	Delta: 7.875	Delta_task: 0.152	Callback time: 3.159
Now: 35625.762	 Frame: 35620.098	Delta: 5.662	Delta_task: 0.152	Callback time: 3.357

the larger deltas are paired with slightly higher time taken in all the openxr stuff in wait_for_animation_frame, but it doesn't fully account for it. It's possible we're doing a lot of computation elsewhere that's just slowing everything down.

(Perhaps it's because we're still rendering the webpage but not displaying it?)

@jdm
Copy link
Member

jdm commented Dec 20, 2019

Other interesting things to time include:

I wonder if we could send the swap buffer message, start layout, wait for layout to complete, then join with the swap buffer completion response to avoid blocking twice sequentially...

@jdm
Copy link
Member

jdm commented Dec 20, 2019

Mmm, some of those may not actually make sense here; I didn't realize we call swap_buffers directly in immersive mode.

@jdm
Copy link
Member

jdm commented Dec 20, 2019

That being said, we do run layout, and we do dirty the canvas, and both of those involve blocking the script thread on receiving a response from another thread, so they could contribute to the deltas.

@asajeffrey
Copy link
Member

asajeffrey commented Dec 20, 2019

Running layout should only block if the document is dirty or animations are running? Which hopefully pages aren't doing once they've entered immersive mode.

@jdm
Copy link
Member

jdm commented Dec 20, 2019

We mark the canvas dirty right after we swap the base layer's buffer:

// If the canvas element is attached to the DOM, it is now dirty,
// and we need to trigger a reflow.
base_layer
.Context()
.Canvas()
.upcast::<Node>()
.dirty(NodeDamage::OtherNodeDamage);

@Manishearth
Copy link
Member Author

Manishearth commented Dec 20, 2019

which makes sense for the test backend, and not for any other.

@asajeffrey
Copy link
Member

asajeffrey commented Dec 20, 2019

Ah, that's a holdover from the pre-surfman days.

@Manishearth
Copy link
Member Author

Manishearth commented Dec 20, 2019

We should probably merge these since I was seeing an improvement from 12-15FPS to 15-20FPS.

@Manishearth
Copy link
Member Author

Manishearth commented Dec 23, 2019

Removing the dirty-marking does not impact FPS visibly. Worth doing anyway.

@Manishearth Manishearth marked this pull request as ready for review Dec 23, 2019
@jdm
Copy link
Member

jdm commented Dec 24, 2019

This is still marked as a draft, fyi.

bors-servo added a commit to servo/webxr that referenced this pull request Dec 24, 2019
Do not wait for rAF to be requested

This does not make the render loop _completely_ independent of script: it still waits for script to submit a frame. If we don't, frames will be dropped (and I'm not sure what strategy to use there).

I'm going to test this soon and see if it actually improves our framerate.

Servo side: servo/servo#25343

r? @jdm
@Manishearth
Copy link
Member Author

Manishearth commented Dec 24, 2019

Hm? doesn't seem to be.

But i need to update this to deal with the other one landing

@jdm
jdm approved these changes Dec 24, 2019
@Manishearth Manishearth force-pushed the Manishearth:autoraf branch from 6cfd2cd to 5a23e6d Dec 24, 2019
@bors-servo
Copy link
Contributor

bors-servo commented Dec 24, 2019

💔 Test failed - status-taskcluster

@jdm
Copy link
Member

jdm commented Dec 24, 2019

Oops:

  ▶ TIMEOUT [expected OK] /webxr/xrSession_requestAnimationFrame_getViewerPose.https.html
  │ 
  │ 
  │ 
  └ 
  ▶ Unexpected subtest result in /webxr/xrSession_requestAnimationFrame_getViewerPose.https.html:
  │ TIMEOUT [expected PASS] XRFrame getViewerPose updates on the next frame for non-immersive sessions
  └   → Test timed out
  ▶ Unexpected subtest result in /webxr/xrSession_requestAnimationFrame_getViewerPose.https.html:
  └ NOTRUN [expected PASS] XRFrame getViewerPose updates on the next frame for immersive sessions
  ▶ TIMEOUT [expected OK] /webxr/render_state_vertical_fov_inline.https.html
  │ 
  │ 
  │ 
  └ 
  ▶ Unexpected subtest result in /webxr/render_state_vertical_fov_inline.https.html:
  │ TIMEOUT [expected PASS] inlineVerticalFieldOfView is set appropriately on inline sessions
  └   → Test timed out
  ▶ TIMEOUT [expected OK] /webxr/xrStationaryReferenceSpace_floorlevel_updates.https.html
  │ 
  │ 
  │ 
  └ 
  ▶ Unexpected subtest result in /webxr/xrStationaryReferenceSpace_floorlevel_updates.https.html:
  │ TIMEOUT [expected FAIL] 'floor-level' XRStationaryReferenceSpace updates properly when the transform changes for non-immersive sessions
  └   → Test timed out
  ▶ TIMEOUT [expected OK] /webxr/xrSession_viewer_referenceSpace.https.html
@Manishearth
Copy link
Member Author

Manishearth commented Dec 24, 2019

Oh, it's possible that the synchronization of mock data no longer works with this.

@Manishearth
Copy link
Member Author

Manishearth commented Jan 2, 2020

@bors-servo r=jdm

@bors-servo
Copy link
Contributor

bors-servo commented Jan 2, 2020

💡 This pull request was already approved, no need to approve it again.

  • This pull request previously failed. You should add more commits to fix the bug, or use retry to trigger a build again.
@bors-servo
Copy link
Contributor

bors-servo commented Jan 2, 2020

📌 Commit 1b5f0ed has been approved by jdm

@bors-servo
Copy link
Contributor

bors-servo commented Jan 2, 2020

Testing commit 1b5f0ed with merge fc92b13...

bors-servo added a commit that referenced this pull request Jan 2, 2020
Do not wait for rAF to be requested in render loop

Servo-side of servo/webxr#114

r? @jdm
@Manishearth Manishearth force-pushed the Manishearth:autoraf branch from 1b5f0ed to 666ee5c Jan 2, 2020
@Manishearth
Copy link
Member Author

Manishearth commented Jan 2, 2020

@bors-servo r-

oops, push didn't work

@Manishearth
Copy link
Member Author

Manishearth commented Jan 2, 2020

@bors-servo try- retry r=jdm

@bors-servo
Copy link
Contributor

bors-servo commented Jan 2, 2020

📌 Commit 666ee5c has been approved by jdm

@bors-servo
Copy link
Contributor

bors-servo commented Jan 2, 2020

Testing commit 666ee5c with merge 7281336...

bors-servo added a commit that referenced this pull request Jan 2, 2020
Do not wait for rAF to be requested in render loop

Servo-side of servo/webxr#114

r? @jdm
@bors-servo
Copy link
Contributor

bors-servo commented Jan 2, 2020

☀️ Test successful - status-taskcluster
Approved by: jdm
Pushing 7281336 to master...

@bors-servo bors-servo merged commit 666ee5c into servo:master Jan 2, 2020
2 checks passed
2 checks passed
Community-TC (pull_request) TaskGroup: success
Details
homu Test successful
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants
You can’t perform that action at this time.