XRFrameOfReference needed for Projection Matrices on some backends? #412
Comments
One quick note to add... with option 2, calling …
I support proposal option (2), as I think it simplifies several of the APIs in addition to this one. For example, with AR's hit-test we were thinking we would need to supply a frame-of-reference to some of the calls, and we could instead use the current frame-of-reference. One question I would have is that #409 calls for this API:

…

The idea behind that parameter being optional is to support inline sessions, I believe - so how would one express this in the new design? Does not setting a frame-of-reference imply an inline (frame-of-reference-free) session?
Through the work I've been doing on #396, I've been thinking about how we would properly support diorama-style experiences within a 2D page being viewed in a headset. I'm generally leaning in the direction of needing to add a new …

Given those things, I'm inclined towards option B. We'll need to be very crisp about when the active FoR can be changed and when those changes are applied, but I think that's manageable.
I'm generally not a fan of APIs that require setting global state at the right times with complicated rules about when things take effect. Web applications composed of multiple sub-components typically struggle with state leakage. In the future, we will likely have video and HTML layers. I expect applications that use these will go for long periods of time without having "frame" callback opportunities to set global state so that subsequent events return expected numbers. Forcing them to do so seems wrong to me. To me, option A seems more straightforward to explain to people and reason about.
Very interesting! Option 2 seems like it's worth exploring as an API simplification. We'd talked separately about adding some sort of …
While there may be developer simplicity benefits here, I don't think this buys us out of pose syncs for inactive frame instances that the app is still keeping alive. A developer can still hold onto two …

Note that full correctness in the current API design requires a UA to support relating the positions of two known anchors in disjoint map fragments to one another (…).

If we explore that path, we should see how we'd then extend the WebXR API to support the scenarios discussed in #384. For example, if we consider planes to always be static, …

An option there could be to double down on an …
Perhaps then we'd rename …

Lots to think about!
One related note that this change calls to mind - we should generally aim to avoid the term …

One thing I like about option 2 is that it naturally removes the entire notion of …:

```webidl
interface XRFrame {
  readonly attribute XRSession session;
  // No more views array here
  XRViewPose? getViewPose(XRFrameOfReference frameOfRef);
  XRInputPose? getInputPose(XRInputSource inputSource, XRFrameOfReference frameOfRef);
};

interface XRView {
  readonly attribute XREye eye;
  readonly attribute Float32Array projectionMatrix;
  readonly attribute Float32Array viewMatrix; // View matrix just becomes another property of the XRView
};

interface XRViewPose {
  readonly attribute boolean emulatedPosition;
  readonly attribute Float32Array poseModelMatrix;
  readonly attribute FrozenArray<XRView> views; // Views are now here
};
```
One interesting gotcha with option 2 is when we do get to multiple compositor layers. If a secondary WebGL layer is rendering a controller, it may wish to reproject based on predicted controller motion, rather than head motion. If so, it may want to render that second layer relative to a different root coordinate frame in some way. If we push too strongly on having a single active frame of reference without allowing apps to override that for a given API call, we may block off some paths for rendering multiple layers. This leans me back towards preferring a "default frame of reference" model where functions do take an XRCoordinateSystem/XRFrameOfReference but it can often be omitted to use your base frame. This gives more flexibility than a "single active frame of reference" model where apps are always handed poses in the active frame, with no chance to override. Otherwise, we may see apps juggling a mutable "current active" global frame of reference to render different layers of their scene, which seems messier.
Fixed by #422 |
From the OpenXR SIGGRAPH presentation (https://www.khronos.org/assets/uploads/developers/library/2018-siggraph/04-OpenXR-SIGGRAPH_Aug2018.pdf), on page 43 - "Viewport Projections":

In this slide we can see that OpenXR (as it stands now) has a function called `xrGetViewportProjections` that returns more-or-less everything that we report in our `XRFrame`, including the information necessary to build view and projection matrices. Because it's returning view information, it's easy to surmise that the equivalent of an `XRFrameOfReference` (what OpenXR calls an `XRSpace`) needs to be provided to the function in order for it to return the right information. This also means, however, that projection matrices aren't available until the point the `XRFrameOfReference` is passed in.

This is a point of incompatibility with WebXR as it stands today. Currently the `XRFrame` reports an array of `XRView`s prior to any frame of reference being specified, which in turn contains projection matrices as a property. With a lot of native backends that seems fine, as projection parameters are frequently treated as static. Despite that, it's not too surprising that some APIs may want to combine it into the same function call that gets the other space-dependent frame rendering data. In any case, we definitely want to address a known incompatibility with a native API.

There are two straightforward ways that I can see to address this:
Move view info into the call that takes a FrameOfRef
Probably would justify a method rename, but in essence we'd simply be moving the views array under what is today the `getDevicePose` call:

IDL
And the render loop we describe in the explainer doesn't change too much:
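The render-loop code block didn't survive the copy here, but under option 1 it would presumably look something like this. This is only a sketch: `makeFrameCallback` is a made-up helper, and the shape of `getViewPose` and `views` follows the IDL quoted in the comments above rather than any shipped spec.

```javascript
// Hypothetical option-1 render loop: the per-eye views now come back from
// the same call that takes the frame of reference, instead of living on
// the XRFrame itself. Names here are illustrative, not spec'd.
function makeFrameCallback(frameOfRef, drawView) {
  return function onXRFrame(time, frame) {
    // Keep the loop going.
    frame.session.requestAnimationFrame(makeFrameCallback(frameOfRef, drawView));

    // One call now yields both the head pose and the per-eye views.
    const pose = frame.getViewPose(frameOfRef);
    if (!pose) return; // Tracking lost this frame; skip rendering.

    for (const view of pose.views) {
      drawView(view.projectionMatrix, view.viewMatrix);
    }
  };
}
```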
And that actually strikes me as a decent clarification of the API (modulo maybe the function name changing). It also has a nice side effect of reducing the number of JS function calls you're making each frame, which is a minor but notable plus from an efficiency standpoint.
But, I'm also wondering if there's some sense in taking it a bit further?
Switch to a single-Frame-of-Reference-at-a-time system
This is more radical, but I'm starting to think it may be possible now, whereas previously I didn't. Given the changes proposed in #409, along with conversations on similar topics at the AR F2F, it's becoming apparent that Anchors (or similar mechanisms in the future) won't be something you query user poses/view matrices against, and instead will be something that you query for their position in relation to the larger Frame of Reference. That means that even in large-scale AR scenarios you're likely to only have one Frame of Reference at any given time.
So... what would it look like if we fully embraced that idea for the sake of API clarity?
IDL
And now the frame loop is even simpler, since we don't have to poll the pose with the FoR every frame.
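The code block for that simpler loop is also missing here, but the single-active-FoR idea might look roughly like the sketch below. `activeFrameOfReference` and a `views` array directly on the frame are invented names for this illustration; nothing here is part of the actual WebXR API.

```javascript
// Hypothetical option-2 loop: the session carries one active frame of
// reference, set once up front, so per-frame code never passes a FoR and
// never polls a pose against one. All names are illustrative.
function startRenderLoop(session, frameOfRef, drawView) {
  session.activeFrameOfReference = frameOfRef; // Set once, outside the loop.

  function onXRFrame(time, frame) {
    session.requestAnimationFrame(onXRFrame);

    // Views are already expressed relative to the active FoR.
    for (const view of frame.views) {
      drawView(view.projectionMatrix, view.viewMatrix);
    }
  }

  session.requestAnimationFrame(onXRFrame);
}
```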
This would also potentially have some efficiency/performance benefits for the browser, since we would now only have to sync pose data for the known active Frame of Reference, and in the future anchors would know in advance which Frame of Reference they will be queried against. That makes the IPC some browsers need to do a lot less messy. It also happens to map a bit better to how systems like Oculus' PC SDK work, and would probably reduce developer confusion as well, especially when learning the API for the first time.
In order to make this approach practical we'd want to ensure that any legitimate cases where you'd want to use multiple frames of reference at once were adequately addressed, but at this point the only one I'm really aware of is if you're using Nell's proposed 'unbounded' frame of reference and want to transition between your current one and a newly recentered one to avoid precision issues. If support for that case can be built into the `XRFrameOfReference` itself, though, it may become a non-issue.

Another potential issue that we'd have to work around is what happens to in-flight events during a Frame of Reference switch? (Thanks @NellWaliczek for pointing this issue out.) For example, if there's an input event that's been triggered by the native backend (which, again, may be in a different process than the JS code) but the FoR is changed before the event is fired, what should we do? We could say that the change doesn't take effect until the next XRFrame fires, which may lead to developers misunderstanding a few events here and there; we could force the browser to re-compute the event poses prior to firing (some systems may make that easy, others may not); or we could limit when the Frame of Reference can be changed somehow. I don't have a clear answer, but I do think it's a tractable problem.
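The "change doesn't take effect until the next XRFrame" option could be sketched as a small state machine: requested FoR switches are queued and only applied at the frame boundary, so anything dispatched mid-frame still sees the old frame of reference. `FrameOfReferenceSlot` and its method names are invented for this illustration.

```javascript
// Sketch of the deferred-switch policy: request() can be called at any
// time, but the switch only lands when the UA begins the next frame.
class FrameOfReferenceSlot {
  constructor(initial) {
    this.current = initial; // FoR that events/poses are reported against.
    this.pending = null;    // Switch requested by the app, not yet applied.
  }

  request(next) {
    // App-facing: queue a switch; does not affect in-flight events.
    this.pending = next;
  }

  beginFrame() {
    // UA-facing: apply any queued switch before firing the XRFrame.
    if (this.pending !== null) {
      this.current = this.pending;
      this.pending = null;
    }
    return this.current;
  }
}
```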
Would love to hear opinions on this, especially from people who have worked with AR systems closely. Thanks!