
session options: immersive? #388

Closed
blairmacintyre opened this issue Aug 19, 2018 · 52 comments
Comments

@blairmacintyre
Contributor

There are a few different issues that are touching on session options, but I have a basic question I'd like to discuss directly.

What does immersive mean? In the current doc, I find
A session is considered to be an immersive session if its output is displayed to the user in a way that makes the user feel the content is present in the same space with them, shown at the proper scale. Sessions are considered non-immersive (sometimes referred to as inline) if their output is displayed as an element in an HTML document.

But this isn't satisfactory. What does "same space" mean? What is "proper scale" and why does it have anything to do with the kind of session you are getting? And why is non-immersive "sometimes referred to as inline"?

I'm initially interested in this from the viewpoint of handheld AR: is fullscreen AR (of the sort we see on handhelds using ARKit and ARCore) immersive? Why or why not? Is it "non-immersive" only when showed "inline" (e.g., not full screen)?

My initial expectation is that full-screen video-mixed handheld AR is immersive, but I'd like clarification. Is there a difference between full-screen video-mixed handheld AR and AR or VR on HMDs, aside from mono-vs-stereo? Clearly there are different capabilities that devs will want to know about (touchscreen vs not, for example).

@speigg

speigg commented Aug 19, 2018

You’re right, the way the term “immersive” is defined and used here seems a bit fuzzy and/or contradictory. Traditionally, “immersive” might be synonymous with endocentric display (graphics that are displayed as if seen from the user’s physical eyes), which seems to be the definition we are using here... except that definition doesn’t work for the typical handheld AR usecase.

It seems like the closest concept would actually be something like “fullscreen”. If so, it might make sense to define an immersive session as a session which attempts to maximize the use of the available pixels on a given display (by controlling where the layer is rendered and how it is composited, and perhaps hiding any other non-immersive content if necessary). Or perhaps we just rename it to a “fullscreen” session?

Either way, I don’t think “same space” or “scale” should be a factor here, because (1) ensuring that the scene is truly in the “same space” on a video see-through handheld display would require a large FOV back-facing camera and the ability to track the user’s face, and (2) why shouldn’t the UA allow the user to zoom in/out (effectively scaling the space / projection matrix), if they desire to do so?

@toji
Member

toji commented Aug 21, 2018

Agreed upon re-reading the text that this is an insufficient and confusing definition. Let's be real here: "immersive" effectively means "in a headset", but I'm not sure how to state that in such a way that it wouldn't inadvertently exclude things like CAVE systems or similar not-quite-headset-based tech.

One term that I've latched onto recently that may prove instructive is "inline", which is effectively the opposite of an immersive session: One that is displayed as an element on a browser page.

@Matt-Greenslade

Maybe 'inline' means the content is embedded in a viewer or app that is 2D and not mapped on top of the real environment, and 'immersive' means the content is expanded or launched into a 3D mapping overlaying the real environment. It could be inline with the potential to be immersive on a click, of course.

@DRx3D

DRx3D commented Aug 21, 2018 via email

@blairmacintyre
Contributor Author

@toji if immersive means in a head mount, then it should say that, and we shouldn't be using a vague term like "immersive". And this raises the very real question: why are we encouraging people to build things that will ONLY work in headsets? Isn't the point of the web to have the content work in various ways? Why would we want things to only work on a handheld vs. only work on a headset?

This feels like exactly the kind of path that things like media queries were created to avoid, in the desktop-vs-handheld world. We don't want people saying "this page is for handhelds" and "this page is for desktops". They should react. Mice+keyboard is different than touch, but sites have learned to "deal" (with the help of frameworks).

Similarly, headset (head-coupled stereo + controllers) is different than full-screen handheld (flatscreen + touch) is different, but it's pretty trivial to see how frameworks could let developers adapt to these two common scenarios.

My contention is that a site should work on a headset or on a phone with something like ARKit/ARCore.

"Inline" is fine; except I don't see any of the things we are building supporting "inline". It implies that the content is in a page, and (for example) that page is a normal page, could be scrollable, etc.

Full-screen handheld AR is not "inline"
[Image: full-screen handheld AR]

@blairmacintyre
Contributor Author

@DRx3D I don't think we should get hung up on this sort of pondering of what "immersive" means w.r.t. the real world. (sorry, people have been talking about "immersive" and "presence" and so on for decades, I don't see this as useful when we're talking at this level.)

I was expecting that "immersive" in the API to refer to situations where the device's display was consumed by the visual experience. So, headsets are immersive, and fullscreen graphics on a 2D display is immersive. The implication, to me, is that this is more than just "div covering screen" as the platform may implement it such that it uses prediction, higher framerates, etc., to give a better experience.

I was expecting "inline" referred to a page rendering the content in a div/canvas, and that it would be displayed in the page as the author saw fit. It could get at the tracking info, for example, but would not benefit from the platform rendering capabilities. We've seen "inline" in many WebVR demos, especially on mobile, where the polyfill used the orientation APIs to match some 3D to phone motion.

@toji
Member

toji commented Aug 21, 2018

There's nothing about the API shape that was intended to encourage only one type of content or another. We do, however, need a method for allowing developers to choose where and how their content is displayed. At the moment the immersive flag is the mechanism to support that choice. It's pretty trivial to write content that works well in both immersive and inline modes, since the configuration of the session is a pretty minor part of the overall code and we've done a good job at providing reasonable abstractions to the rendering and input mechanisms.

Also, I find there to be exactly zero difference between "fullscreen AR" and AR content shown in an element that's not big enough to cover the entire screen. Mechanically they are exactly the same, with the only difference being the size the developer chooses for the element (or maybe use of the fullscreen API). I consider them "inline" because in both cases they are presented as elements within the page. If the developer has chosen to make the AR content the only element in the page then so be it.

A final point I want to make is that simply saying "immersive means content is displayed in a headset" is not accurate, because you may be displaying inline content in a browser that's displayed in the headset, and that wouldn't be the same thing.

@NellWaliczek
Member

Yeah, this is definitely not a straightforward naming issue partially because it's the concepts themselves that are slightly unclear. Heck, originally "immersive" was "exclusive" but that wasn't the right mental model either.
The key differentiator for me is around the interaction model. Are you putting 2D buttons on a screen (or something that is vaguely screenlike) or are your UI elements 3D objects in your scene? Developers need to be able to make that distinction so they build appropriate interaction models.

@DRx3D

DRx3D commented Aug 21, 2018 via email

@blairmacintyre
Contributor Author

@NellWaliczek I agree. I see this as something the web page REACTS to, not something they request. "Oh, the session I've gotten doesn't have a 2D display, so I need to put buttons in 3D" or "Oh, the session I've gotten has a 2D display, I can do buttons on the screen and/or buttons in 3D".
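The "react, don't request" pattern described here could be sketched as follows. The `has2DDisplay` flag is a hypothetical session property invented for illustration; no shipped WebXR API exposes it under that name.

```javascript
// Sketch of reacting to the session the UA handed back, rather than
// requesting a particular display up front. `has2DDisplay` is a
// hypothetical capability flag, not a real XRSession property.
function pickButtonPlacement(session) {
  if (!session.has2DDisplay) {
    // No 2D surface (e.g. an HMD): UI elements must be 3D objects in the scene.
    return "3d-in-scene";
  }
  // A 2D display (phone / magic window): screen-space buttons work,
  // possibly alongside 3D ones.
  return "screen-space";
}
```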

@toji

There's nothing about the API shape that was intended to encourage only one type of content or another. We do, however, need a method for allowing developers to choose where and how their content is displayed. At the moment the immersive flag is the mechanism to support that choice. It's pretty trivial to write content that works well in both immersive and inline modes, since the configuration of the session is a pretty minor part of the overall code and we've done a good job at providing reasonable abstractions to the rendering and input mechanisms.

I think we may be talking past each other. Developers need to know what the capabilities of their device are (as @NellWaliczek so clearly points out) so they know what to do.

But that's not the same as choosing the display mode they ask for in session options: if "immersive" means "headset" and "immersive" is a flag I can choose to require of my session, then haven't I limited my app to immersive?

Also, I find there to be exactly zero difference between "fullscreen AR" and AR content shown in an element that's not big enough to cover the entire screen. Mechanically they are exactly the same, with the only difference being the size the developer chooses for the element (or maybe use of the fullscreen API). I consider them "inline" because in both cases they are presented as elements within the page. If the developer has chosen to make the AR content the only element in the page then so be it.

I think this may depend on how these things are implemented, which is probably biasing both of our interpretations of this. I expect there to be a difference:

  • in the "inline" case, as a developer I am creating a 2D DOM that happens to contain a canvas that has AR content in it. I can do with it whatever I can do with a canvas, including (perhaps) applying CSS3 3D transforms to it (for example).
  • in the "immersive" case, as a developer I would expect that I'm rendering in the 3D canvas and the UA is rendering it fullscreen. The canvas is not being put by me into a 2D DOM, and I can't muck around with where it appears, etc.

Now, practically, in a browser on a phone, it may be that the difference is minor if the user was going to full-screen their inline DOM element.

But, if you consider the 2D-page-in-a-browser situation (e.g., Edge pages on the wall in Hololens or Windows MR; browser pages floating in space in Firefox Reality or the Oculus Browser), then the "inline" case still has the canvas displayed in the 2D panel, while the "immersive" case switches to 3D immersive mode.

To me, from a programmer perspective, there are still just 2 modes: "inline" which says "render the 3D into a canvas and display in my 2D page" and "immersive" which says "render in 3D on the display, in whatever way that means on this display".

There needs to be some way of understanding what the characteristics of the session are, but I think those should be features of the session that has gotten created, not creation options.

Here's another reason why that matters. I'm on a nice modern smartphone that could support Daydream/GearVR and ARCore. I would love to have a web browser that, when a page I'm on says it wants to do an immersive "VR" session, gives me the choice to do it in the HMD, or to do 2D "VR" (i.e., fullscreen 3D graphics, not stereo, perhaps using ARCore for full 6D motion tracking). From the viewpoint of the webpage, they will get a session back, and it will have 1 or 2 cameras, and it will support a 2D touch screen or not.

That should be up to the user (hey, maybe they have a daydream or not; maybe they have one but don't want to use it right now, etc).

@blairmacintyre
Contributor Author

@DRx3D

So if I have an AR lens system that shows 3D models, then it is immersive. What happens if I add 2D icons and sensors to the display? Does that make it non-immersive?

I'm not sure what "AR lens system" means; HMD?

Regardless, I don't see the inclusion of 2D icons and sensors changing whether it is immersive or not, but I may not be understanding what you are asking.

I am asking these questions because readers and future users of the document will probably not have the same level of experience and understanding as this group. Using terms that are not clear (at least in today's language) will cause problems in the future, especially as the model is extended to environments that we cannot even yet conceive.

I agree; that's why I brought this topic up, because the language in the document is unclear.

@speigg

speigg commented Aug 21, 2018

Agreed upon re-reading the text that this is an insufficient and confusing definition. Let's be real here: "immersive" effectively means "in a headset", but I'm not sure how to state that in such a way that it wouldn't inadvertently exclude things like CAVE systems or similar not-quite-headset-based tech.

@toji I believe the terms "endocentric" (views that are in alignment with the user's natural viewing frustum, such as HMDs, CAVE, etc) vs "exocentric" (opaque handheld and/or stationary displays presenting content with viewing frustums that are external to the user) can be used to describe what you are saying here.

Nevertheless, my understanding is that 'inline' sessions and 'immersive' sessions may have different capabilities and affordances... i.e., an 'inline' session may lack support for specific features (6DOF tracking, video see-through, object tracking, anchors, etc.), and presentation of XR content is entirely under application control. On the other hand, an 'immersive' session would support these kinds of additional features, and the presentation of XR content would be more under user/UA control (rather than application control). To me, this suggests that there is a need for defining both of these session types and interaction modes on handheld (non-stereo) XR devices.

@speigg

speigg commented Aug 21, 2018

There needs to be some way of understanding what the characteristics of the session are, but I think those should be features of the session that has gotten created, not creation options.

@blairmacintyre Yes, there is that too. One approach is that all XR apps implicitly support an 'immersive' mode (for XR-first browsers, this means XR apps would probably launch in an 'immersive' mode). Moreover, it would be nice if the user could seamlessly switch between 'inline' and 'immersive' interaction/presentation modes.

Perhaps an app can simply request an "AR" vs "VR" session, and then (after receiving a session) can check how the layer will be presented — whether it should be presented 'inline' by the application, or as part of an 'immersive' interface managed by the UA. But I think it should be “easier” for an app to support an ‘immersive’ mode than an ‘inline’ mode (less setup, no need to muck around with the DOM, etc.)

@blairmacintyre
Contributor Author

Something that came up when I was trying to explain my position on this to @TrevorFSmith, which might also have to do with why @toji doesn't necessarily buy into what I'm saying.

Let's ignore the word "immersive". When I look at the API, I see two "cases":

  • one where the page is getting access to some sensor info (perhaps) but is rendering inside its page
  • one where the page is getting access to sensor info, and is rendering "in the world" around the user

In the latter case, this may be perceptually immersive headworn display AR/VR, or it may be magic-window style AR/VR. While both of these demand quite different UIs, etc., my hope/expectation is that a programmer would not explicitly have to request headworn or magic window; rather they would say I want to take over rendering (somehow).

They would clearly need to be able to understand the context of the rendering session that results from this (touchscreen or not, interaction device capabilities, stereo or mono, etc.) in order to create the best UI.

They might even decide "screw this, I'm not going support handheld phone AR/VR, so I'm going to pop up a bit of graphics saying 'sorry, only hmd's need apply, hit "ok" to exit session'".

But, I would prefer them to react to a situation they don't want to deal with (potentially by testing for the capabilities they need) than have to request (in turn) the sorts of situations they might handle, until they get one that's OK.

So, in this context, I think of "immersive" as the second situation, and "inline" as the first. Perhaps we should have stuck with "exclusive." 😄

The discussion on the call this week makes my envisioned situation easier: if we request a session with few or no parameters, and then test for capabilities, the approach happens naturally:

  • request session
  • check for support for AR (i.e., do I need to render a background or not)?
  • support touchscreen?
  • perhaps "supports DOM overlay"? (not sure how we want to help people know if they can overlay 2D DOM elements on the view, or if the WebXR renderer has taken completely over and they can only render into the canvas)?
  • 6D or 3D wand?
  • etc
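Blair's request-then-test flow could look something like the following sketch. Every capability flag here (`supportsAR`, `supportsTouchscreen`, `supportsDOMOverlay`, `has6DOFController`) is hypothetical, invented for illustration; none of these names exist in the shipped WebXR Device API.

```javascript
// Hypothetical request-then-detect flow: request a session with few or
// no parameters, then adapt to whatever capabilities it reports.
// All flag names below are invented for illustration.
function configureApp(session) {
  return {
    // AR session: the UA composites over the real world, so skip the background.
    drawBackground: !session.supportsAR,
    // DOM overlay available: 2D HTML UI can sit on top of the view.
    useDOMOverlay: !!session.supportsDOMOverlay,
    // Fall back from a 6DoF wand to screen-tap, then to gaze selection.
    selection: session.has6DOFController
      ? "wand-ray"
      : session.supportsTouchscreen
        ? "screen-tap"
        : "gaze",
  };
}

// e.g. a handheld ARCore phone: AR, touchscreen, DOM overlay, no tracked wand.
const phoneConfig = configureApp({
  supportsAR: true,
  supportsTouchscreen: true,
  supportsDOMOverlay: true,
  has6DOFController: false,
});
```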

@TrevorFSmith

TrevorFSmith commented Aug 23, 2018

I think we're combining two dimensions of choice and it's leading to terminology confusion:

Here's how I break it down:

Display mode

  • flat: elements in a page layout
  • portal: magic window or aquarium
  • immersive: headset, CAVE, and dome

Content type

  • page: DOM elements using existing web layouts
  • overlay: DOM and 3D elements on the display plane (usually in portal display mode)
  • spatial: 3D positioned visual and auditory information

Web coders need to understand what display mode they're using and what content types they should create, and those are somewhat separate concerns.

@TrevorFSmith

TrevorFSmith commented Aug 23, 2018

Display modes

[Image: three display modes]

Control types

[Image: three control types]

[Image: overlay and spatial controls in portal display mode]

@blairmacintyre
Contributor Author

From the viewpoint of developers, especially when they consider what kind of UI to create, those differentiations are good. Clearly, developers need to understand the context.

But the tension I see is in differentiating between what the developer can/should/needs-to request, and what they discover about the device.

I see no reason for a developer to differentiate their webXR request between portal and immersive: it's unlikely that any individual display supports both, and the differences they encounter in creating UIs are something that they can deal with (or not). They need to know what situation they have. They may choose to say "I'm sorry dear user, I have not created a portal UI, only an HMD/controller one, so I'm not going to show you anything", but I think it's MUCH more likely that all the frameworks that evolve will make it easy to at least provide a trivial UI for all of these situations (even if it sucks).

In contrast, flat/page vs immersive/portal is definitely something they would want to request. On a phone, that's the difference between full-screen graphics and a possibly scrollable page (or a non-scrollable UI with the 3D scene inset in a part of the screen). On an HMD, that could differentiate between a 2D page placed in the 3D world (with 3D content embedded in it), and the full-world rendering we associate with immersive viewing.

The only tension I see in terms of session options is the idea of having the "immersive flag" mean your "immersive display" mode. Perhaps we should have left the flag as "exclusive". Right now, we have no way of differentiating between "flat/page" and "portal/magic window", if "immersive == immersive" 😄

@blairmacintyre
Contributor Author

Here's the analogy I use. Right now, web pages do NOT simply fail on a device that hasn't been accounted for.

I might go to a website on my phone that has no custom phone UI, and it looks terrible and might not work well. But I can zoom, or painfully get the information I want. This is increasingly rare, though, as more and more frameworks and tools provide something. My Georgia Tech lab's website has a mobile version built into the Wordpress style I chose, which I didn't even know about until I accidentally went to it.

We need the same for WebXR. By default, sites should work everywhere, where "work" means not having something say "this device doesn't support webxr" when it does.

The solution we are close to is:

  • ask for AR (overlay/merge) or VR (render everything); regardless of what's available, it is trivially easy to work in both situations since the basics are the same
  • ask for "take over the device display" or "ability to render in a DIV". Every UA should support some form of both

Even if the content is nonsensical: frameworks will likely evolve to communicate what's missing, and perhaps even provide reasonable fallbacks.
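Blair's two orthogonal requests could be expressed as option objects along these lines. The key names (`content`, `presentation`) and values are invented for illustration only; they do not match any shipped session-creation API.

```javascript
// The two independent axes of the proposal, as hypothetical session
// options (key names invented for illustration):
//   content:      "ar" (overlay/merge) vs "vr" (render everything)
//   presentation: "exclusive" (take over the display) vs "inline" (render in a DIV)
function describeRequest(opts) {
  const content =
    opts.content === "ar" ? "composite over the real world" : "render the whole scene";
  const surface =
    opts.presentation === "exclusive"
      ? "UA takes over the display"
      : "app renders into a page element";
  return `${content}; ${surface}`;
}

// Full-screen handheld AR and headset AR would both make this request:
const fullscreenAR = describeRequest({ content: "ar", presentation: "exclusive" });
```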

@TrevorFSmith

TrevorFSmith commented Aug 23, 2018

Blair wrote:

I see no reason for a developer to differentiate their webXR request between portal and immersive

The most persuasive argument I heard from you yesterday (and please tell me if I'm misrepresenting) is that on the existing web, when you use a handheld browser to visit a site that is not responsive and is designed for a desktop-sized screen, the handheld browser does its best to render the site, using pinch zoom and other tricks. So, why shouldn't a handheld browser that offers only flat and portal display modes do the same thing and offer the user the best possible experience for sites that are only designed for immersive display mode?

After sleeping on it, I think the problem is that I don't believe it's possible for browsers to automatically offer reasonable experiences across portal and immersive display modes. Because we're using the black box of a WebGL context, there's not enough information for a browser in portal mode to give the user a usable display and interaction into an experience that is designed only for immersive mode. Especially as new input methods that only work in immersive display mode come online (e.g. hand gestures and eye tracking), there is no equivalent to pinch zooming that can give the user reasonable access.

@speigg

speigg commented Aug 23, 2018

Also, the only situation in which an outputContext should really be necessary is the “inline” display mode... so why make an app provide this if it wants to support the “portal” display mode? I’d imagine that a lot of apps won’t care at all about the scaled down “inline” use-case (what Trevor calls “flat” display mode), and would likely not bother with it at all... which seems fine to me, as long as the simplest path allows for both “portal” and “immersive” display (as Trevor defines them).

@blairmacintyre
Contributor Author

After sleeping on it, I think the problem is that I don't believe it's possible for browsers to automatically offer reasonable experiences across portal and immersive display modes. Because we're using the black box of a WebGL context, there's not enough information for a browser in portal mode to give the user a usable display and interaction into an experience that is designed only for immersive mode. Especially as new input methods that only work in immersive display mode come online (e.g. hand gestures and eye tracking), there is no equivalent to pinch zooming that can give the user reasonable access.

Perhaps, perhaps not.

Will all UIs work on all devices? No. Especially not for custom UI and interaction techniques. Do all touch-enabled websites work on the desktop (where "pinch zoom" isn't supported)? Of course not. Do modern web frameworks support developers building applications that work across different devices, by providing alternatives and making it straightforward to build functional (even if not awesome) fallback UIs? Of course they do.

Is it easy to imagine different UAs evolving compatibility hooks and interactions to simulate common interaction modes they don't support? Sure. More importantly, I think there are a few basic interactions (selection, for example) that are easy to implement on both. On Hololens, Microsoft opted for a few standard gestures (bloom, air tap) that will be easy to implement everywhere, even if they don't take advantage of the device completely.

I firmly believe that a large proportion of apps (90%?) will be more than capable of running on a range of devices, with a relatively simple framework providing adaptive UIs.

Will this be possible if we encourage patterns where developers create content that refuses to run in modalities they don't support? No. Especially if the "easy" case for developers is "pick your mode and only support that", people will do that. And then there will be no pressure on framework developers to support reacting to different modes. On the other hand, if the easy case is "dedicated takeover" or "in page", and the dedicated mode runs on both handheld and headmount, we'll quickly have at least basic support for common UI metaphors on both.

My issue with the API surface, and the patterns it encourages or even enforces, is that we're making an up-front decision to enter an undesirable long-term situation where the majority of content works on either HMDs or handhelds, but not both.

@blairmacintyre
Contributor Author

At the end of the day, I'd much rather have a crappy phone UI, where all menus and interactors are "in 3D in the world" and I do pointing and selection by tapping on the screen (instead of waving a wand or pointing a finger), than not be able to use the "AR app designed for an HMD" at all on my phone.

It's pretty easy to imagine a phone-based UA pretending to have a wand floating a foot in front of the screen, pointing forward and slightly up (or even being "movable") so that it can work with wand-based UIs, if this becomes a problem.

@speigg

speigg commented Aug 23, 2018

@TrevorFSmith wrote:

The most persuasive argument I heard from you yesterday (and please tell me if I'm misrepresenting) is that on the existing web when you use a handheld browser to visit a site that is not responsive and is designed for a desktop sized screen, the handheld browser does its best to render the site, using pinch zoom and other tricks. So, why shouldn't a handheld browser that offers only flat and portal display modes do the same thing and offer the user the best possible experience for sites that are only designed for immersive display mode. [...] there's not enough information for a browser in portal mode to give the user a usable display and interaction into an experience that is designed only for immersive mode.

I don’t imagine “immersive” mode on a handheld device being a way that the browser deals with apps designed for HMDs. I think Blair was just using the term “immersive” inclusively (to include what you call “portal” display mode). Apps would still have to adapt their UI for either handheld or HMD display modes accordingly (using best practices for each). The issue, IMO, is not whether the UA can force an app designed for HMDs to work on a non-HMD device, but whether an app developer is forced (encouraged) to design their app to work for both HMD and non-HMD XR devices.

@TrevorFSmith

Gheric wrote:

I think Blair was just using the term “immersive” inclusively (to include what you call “portal” display mode).

Yes, I think it's not helpful to use "immersive" to include handheld displays when as far as I can tell nobody else is using it that way. As far as I'm concerned, there's no such thing as "immersive" on a handheld device because "immersive" refers to a display type and not a content type. The user is literally not "immersed" in the display, they are holding it at arm's length.

I suspect that the confusion is that you and Blair are using "immersive" to mean spatial content that is located around the user, so you two are talking past others in the group.

@speigg

speigg commented Aug 23, 2018

@TrevorFSmith one reason for using the term ‘immersive’ inclusively is so that all apps can use the “simple” path to supporting XR on different displays (not needing to provide an outputContext). As Blair said, perhaps the session creation parameter here should have remained “exclusive”.

@TrevorFSmith

Blair wrote:

At the end of the day, I'd much rather have a crappy phone UI, where all menus and interactors are "in 3D in the world" and I do pointing and selection by tapping on the screen (instead of waving a wand or pointing a finger), than not be able to use the "AR app designed for an HMD" at all on my phone.

If laser pointing were the only input type we could expect to use in immersive displays, then I might agree with that path. Since we already have input devices and gestures with more complexity, and we already have immersive locomotion and display tricks that simply won't work in portal display mode, it doesn't seem possible for UAs to actually ship what you're suggesting.

It's better to admit that portal and immersive display modes are inherently different and creators must address them separately in order to be in any way usable. The fallback for sites with immersive designs isn't portal display mode, it's flat display mode.

@speigg

speigg commented Aug 23, 2018

Perhaps a better option is for the session creation parameter to be inverted to “inline”, with portal/immersive display modes being the default. Though more ideally, the session creation options will go away entirely (based on discussion in other threads), and this issue becomes moot.

@blairmacintyre
Contributor Author

@TrevorFSmith I guess we'll agree to disagree. I created a pile of demos over the years (with Argon) that had basic immersive and touch screen UIs, and it "just wasn't that bad". So, I am more optimistic, I guess.

@ddorwin
Contributor

ddorwin commented Aug 23, 2018

On the topic of whether "Portal Display" should be "immersive mode":

There is currently no "full-screen graphics" mode in WebXR or elsewhere. The Fullscreen API allows a subset of the page (as specified by an HTML element) to be rendered in fullscreen. The entire contents of the element might not even be displayed (i.e., try the Fullscreen button on https://permission.site on a phone). The author gets to decide whether the element occupies the full screen.

Note that both the Fullscreen API and WebXR immersive sessions require a user gesture, so applications will likely want to support "Flat Display" to provide a preview on page load. Also, because fullscreen is just changing how an HTML element is displayed, it is simple to go in and out of "Portal Display." Immersive sessions, on the other hand, may require different graphics adapters, will likely involve the user changing displays (including placing a phone in a headset), and can be rendered at the same time as "Flat Display" or "Portal Display."

Both "Flat Display" and "Portal Display" also allow DOM elements to be displayed to the user in addition to the "XR content." If "Portal Display" were an immersive session, this would not be the case. One implication of this is that all smartphone AR applications would need to recreate all their UI in WebGL. (The upside of that is that they would be more prepared for "Immersive Display" in AR headsets, but most probably wouldn't bother and would just use fullscreen to simulate, which is exactly what we have today.)

@TrevorFSmith

@blairmacintyre
My argument isn't that it's impossible for authors to create experiences that work in both portal and immersive display modes. My argument is that it's not possible for UAs to fudge portal access to immersive experiences using the equivalent of pinch-to-zoom like tricks. The author has to do the work and so they need to make different things happen for different display modes.

Immersive display and input hardware has moved beyond the Argon (or WebXR Viewer) style of handheld-only AR, so authors have to explicitly and separately support portal and immersive display modes.

For example, TiltBrush, written for an immersive display mode and dual tracked inputs, can't be fudged by the UA to work in portal display mode. The author can write the app so that it supports both display modes, but they have to do so explicitly.

@speigg

speigg commented Aug 23, 2018

Anyway, if the issue is that apps should be allowed to request sessions for only certain display modes, I disagree, but for argument's sake, it still shouldn't be the "simplest" path. By default, the API should allow applications to receive a session that works for all display modes (or at least both "portal" and "immersive" display modes).

This should make everyone here happy:

// I only want to support immersive display mode
let xrSession = xrDevice.requestSession({displayModes: ["immersive"]});

// I only want to support portal display mode
let xrSession = xrDevice.requestSession({displayModes: ["portal"]});

// I only want to support immersive and portal display modes
let xrSession = xrDevice.requestSession({displayModes: ["immersive", "portal"]});

// I only want to support inline display mode
let xrSession = xrDevice.requestSession({displayModes: ["inline"]}); // or "flat"

// all your display are belong to me... (I support all display modes)
let xrSession = xrDevice.requestSession();
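
For what it's worth, the negotiation implied by this option could be modeled as something like the following sketch (the mode names, the default set, and the selection policy are all hypothetical; in practice the UA/user, not this function, would pick among the candidates):

```javascript
// Hypothetical set of display modes; not part of any shipped API.
const ALL_MODES = ['immersive', 'portal', 'inline'];

function resolveDisplayMode(supportedByDevice, requestedModes) {
  // Omitting displayModes means the app accepts any mode the device offers.
  const requested = requestedModes || ALL_MODES;
  const candidates = requested.filter((m) => supportedByDevice.includes(m));
  if (candidates.length === 0) {
    // Mirrors the promise rejection requestSession would produce.
    throw new Error('NotSupportedError: no mutually supported display mode');
  }
  // Stand-in policy: take the device's preferred mode among the candidates.
  return supportedByDevice.find((m) => candidates.includes(m));
}
```

With this shape, the "path of least resistance" call with no options works on every class of device, while apps that genuinely only care about one mode can still say so.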

@ddorwin
Contributor

ddorwin commented Aug 24, 2018

Additional thoughts on "immersive" and related aspects of session creation:

  • At the most basic level, the purpose of the "immersive" distinction is that you would render non-immersive immediately and have a button to start an immersive session. (This was much clearer with WebVR's requestPresent().)
  • Immersive WebXR sessions are really a parallel platform separate from the (2D) web platform.
    • Nothing rendered in the 2D web can be displayed in or over the immersive session.
    • Non-immersive/inline is just a <canvas> that is part of the 2D web platform.
      • Apps that rely on this will not translate well to immersive sessions.
  • I think "exclusive" is potentially a better term.
    • I think this originally meant exclusive access to the device.
    • It is really exclusive responsibility for rendering.
    • There is probably a better term along those lines.
  • Because immersive sessions really are this special case, a presentation paradigm does seem to make sense.
  • At the f2f, we discussed the idea that immersive/presentation could really be an upgrade of an existing session. In most cases it will be as there is no reason to mirror.
    • In that case, maybe we should bring back something like requestPresent() on XRSession.
  • The current requestSession() design allows mirroring to work like magic window. Maybe whether to continue rendering to the <canvas> could be an option for requestPresent()
  • I think there may be other modes of rendering in the future.
    • Thus, a bool and decision at session creation may not be the best option, especially if we want apps to generally work across all clients.
    • Other possible modes include rendering outside a floating browser window and non-headset-based augmentation.
    • I think methods for rendering in different ways might be better: i.e., requestExclusivePresentation() and requestExternalPresentation()
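
A toy model of the "presentation as an upgrade" idea discussed above, with a hypothetical requestPresent() that takes exclusive responsibility for rendering and optionally keeps mirroring to the page canvas (none of these names are part of any shipped API):

```javascript
// Toy state model only; real sessions involve devices, layers, and frames.
class ToySession {
  constructor() {
    this.mode = 'inline';        // starts as a canvas in the 2D page
    this.mirrorToCanvas = true;  // inline output is the canvas itself
  }
  requestPresent({ keepCanvasOutput = false } = {}) {
    // Upgrading takes exclusive responsibility for rendering; whether the
    // page canvas keeps mirroring becomes an option of the upgrade.
    this.mode = 'exclusive';
    this.mirrorToCanvas = keepCanvasOutput;
    return Promise.resolve(this);
  }
}
```

This captures why a separate bool at session creation feels limiting: the upgrade is a runtime transition, and other future transitions (e.g., a hypothetical requestExternalPresentation()) could hang off the same session object.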

@blairmacintyre
Contributor Author

thanks @ddorwin. Some comments on your thoughts:

At the most basic level, the purpose of the "immersive" distinction is that you would render non-immersive immediately and have a button to start an immersive session. (This was much clearer with WebVR's requestPresent().)

That's a reasonable way to think about it, yes, that's what I was thinking. It holds up on the various "setups" I can imagine (desktop + external; phone; phone + headset adaptor; standalone headset with 2D-views-floating-in-3D).

Immersive WebXR sessions are really a parallel platform separate from the (2D) web platform.

  • Nothing rendered in the 2D web can be displayed in or over the immersive session.
  • Non-immersive/inline is just a <canvas> that is part of the 2D web platform.
    • Apps that rely on this will not translate well to immersive sessions.
  • I think "exclusive" is potentially a better term.
    • I think this originally meant exclusive access to the device.
    • It is really exclusive responsibility for rendering.
    • There is probably a better term along those lines.

Yes.

Because immersive sessions really are this special case, a presentation paradigm does seem to make sense.

Whether it's "presentation" or not, it is a significant action. For example, it could be UA/user driven, like we did in Argon. There hasn't been much interest in that here, primarily (I think) because we would then require all UAs to create new native interfaces to give the user the ability to switch to/from immersive mode.

Regardless of whether it's app or user driven, it does feel like an explicit action.

The only exception to this is if you follow a link while already in immersive mode; it would be reasonable for the destination page to enter immersive mode automatically (assuming some form of UA permission / control so the user knows what's happening).

At the f2f, we discussed the idea that immersive/presentation could really be an upgrade of an existing session. In most cases it will be as there is no reason to mirror.
In that case, maybe we should bring back something like requestPresent() on XRSession.

Brandon's proposed session rework addresses this, I think, if we want it to; "requestPresent" could be viewed as akin to the other capability requests.

The current requestSession() design allows mirroring to work like magic window. Maybe whether to continue rendering to the <canvas> could be an option for requestPresent()

  • I think there may be other modes of rendering in the future.
  • Thus, a bool and decision at session creation may not be the best option, especially if we want apps to generally work across all clients.

Yes, that's actually my overall issue.

  • Other possible modes include rendering outside a floating browser window and non-headset-based augmentation.

Interesting; I hadn't considered the "rendering outside a floating browser window" (by which I assume you mean the ongoing discussions elsewhere about ways to pop 3D content out of a page) as a WebXR mode. How that would work is unclear to me ... but it's worth considering.

  • I think methods for rendering in different ways might be better: i.e., requestExclusivePresentation() and requestExternalPresentation()

Perhaps. Might be worth walking through some cases.

@blairmacintyre
Contributor Author

After going through all of this, I think the issue is still open, but clearer. So let me try again.

I think that any UA, HMD or not, should be able to support something they call "immersive" mode. What it means is that the page renders 3D only, and takes over the display. No mix of DOM and non-DOM content.

"Inline" means that the content is in a canvas and that canvas can be put in the DOM and used like any other canvas, mixed with content, etc.

A phone could support both without an HMD, at the discretion of the UA.

The only issue I see here is that there is no guarantee that there is only one version of each, or that both immersive and inline support the same mix of AR and non-AR (i.e., overlaid on the world somehow, or not).

For example:

  • On Android, a UA could support inline AR, but not AR in an HMD. It might support "immersive" AR as a fullscreen presentation (with no DOM) or allow inline AR to go fullscreen and mix in DOM.
  • A browser in an AR HMD might support immersive AR but not AR inline
  • A desktop browser might have multiple HMDs plugged in. It could simultaneously support AR and VR HMDs (e.g., I might have a Meta and a Vive plugged in), but probably doesn't support inline AR (although perhaps a UA would implement AR if the attached HMD had a camera).

Thinking about my current project (WebXR Viewer):

  • it's unclear if I can effectively support inline AR (although I can use the "inject video frames into the JavaScript" approach that I use to do CV right now, and inject video into a DOM element, I suppose)
  • our current "display video natively, overlay 3D" would be an "immersive" mode. The fact that it technically allows DOM content to be displayed is "unsupported", and we would likely want to not support that.

@toji
Member

toji commented Aug 28, 2018

Woo boy. There's a lot going on in this thread since I last sat down to look at it, and while I've tried to read everything I'm sure I haven't grokked everyone's positions correctly, so forgive me if I say something that was already refuted by an earlier comment.

What I want to try to nail down is the core display modes we want to enable, since that doesn't seem to have come to a resolution above. I've seen this listed multiple times as three distinct things, which @speigg referred to as immersive, inline, and portal. (And thanks @TrevorFSmith for the infographics on this subject. That was educational.)

I feel pretty strongly that we shouldn't re-invent pieces of the web platform we don't have to. Given that the Fullscreen API already exists, and that our "inline" page output is done via a canvas element, it's already possible to achieve the "portal" effect described above by simply making the canvas element fullscreen. Thus I'm heavily inclined to say we shouldn't have any notion of "portal" mode in the API. (That's not to say that it may not be useful to use that verbiage in tutorials, support libraries, or other supplementary material. It just wouldn't be a formal concept in the API.)

So with that said, let me speak to @blairmacintyre's most recent post.

I think that any UA, HMD or not, should be able to support something they call "immersive" mode. What it means is that the page renders 3D only, and takes over the display.

There's room for UA choice here, but I worry that this would be surprising behavior for most people. Let's assume that no matter how an "immersive" mode manifests, it'll require user activation (which typically == a button). So if we tell the page that they can use immersive mode, they add a button which the user clicks on, and the result is that the page simply goes fullscreen. That's probably not meeting user or developer expectations. You could make a case for that being a valid interpretation, but I have a hard time seeing most browsers following it.

"Inline" means that the content is in a canvas and that canvas can be put in the DOM and used like any other canvas, mixed with content, etc.

That's exactly how I've been viewing "inline" content.

it's unclear if I can effectively support inline AR

our current "display video natively, overlay 3D" would be an "immersive" mode.

Could you expand on these? I'm beginning to think that there's an implementation limitation in another browser or library I'm not aware of that's feeding into this discussion.

@speigg

speigg commented Aug 28, 2018

@toji wrote:

Given that the Fullscreen API already exists, and that our "inline" page output is done via a canvas element, it's already possible to achieve the "portal" effect described above by simply making the canvas element fullscreen.

Possible, yes, but as currently specced this requires creating a canvas/outputContext, placing the canvas in the DOM appropriately (and perhaps calling the fullscreen API as you said), for what should arguably be a much simpler situation than rendering to an immersive (e.g., HMD) display. The primary argument as I understand it is that the initialization of an XR session should not be fragmented between different types of XR devices, as it encourages the development of tools and applications for one kind of device, and not another (particularly if "immersive" sessions are somehow perceived as superior, more important, or more full-featured). Supporting "inline" display takes extra work, and that's fine, but for that reason it should be an additional (optional) feature of an XRSession. Likewise, the simplest default path for creating a ("non-inline") XRSession and rendering graphics to the display should work for any XR device—whether the resulting display mode is "immersive" or "portal" (and if both modes are supported, it should be up to the UA/user to decide which one is appropriate).

@TrevorFSmith

I'm still failing to understand the stance that portal and immersive display modes can be irrelevant to authors. Each mode requires radically different interaction and content design. In portal mode, authors will need to create overlay controls, indicators for how the user should move the handset to help SLAM, and a whole host of other features that are essentially different work than what needs to happen in immersive display mode.

@toji
Member

toji commented Aug 28, 2018

Oh, I forgot to mention @speigg: Thanks for introducing me to the terms "endocentric" and "exocentric"! I wasn't aware of phrases for those concepts previously. It seems to me, based on the Wikipedia entries I linked, that both terms describe a variant of what I would consider an "immersive" display for the purposes of the API. The distinction is in how they achieve it. Specifically, a CAVE is used as an example of an exocentric environment.

as currently specced this requires creating an canvas/outputContext, placing the canvas in the DOM appropriately (and perhaps calling the fullscreen API as you said)—for what should arguably be a much simpler situation than rendering to an immersive (e.g., HMD) display.

I see where you're coming from on this a bit better now, but I'm still not sure I support the idea of this as a core API concept, because it IS still just an alternate way of getting the same effect as a fullscreen inline canvas. I would be 100% fully on board with tools like A-Frame providing it as an easy-to-use display mode, though.

I'm still failing to understand the stance that portal and immersive display modes can be irrelevant to authors. Each mode requires radically different interaction and content design.

I'm with Trevor here. It would certainly be possible to build some basic UIs that were immersive-centric and which would continue to work inline due to our current input model. I have a hard time seeing many larger projects being satisfied with that, though, when there's ease of use, accessibility, and developer familiarity benefits to going with an overlay UI when showing inline content.

Which leads to another point: I think we need to be prepared for the fact that, at least initially, we're going to see a fair number of potential users that only care about phone AR. It's got both buzzword appeal and a larger potential user base than VR, plus it doesn't require the user to "mode switch" by donning a headset. I'm absolutely a believer in both AR and VR, and I want to see people make content that scales to various environments as cleanly as possible. However, if we start off saying, in effect, "You must design your app to be usable by this modest slice of VR users in order to access this much larger market of phone AR users, even though there's no technical reason for that dependency" we'll drive content that otherwise would have happily lived on the web to native apps that have no such restriction. We must allow developers to say "I only care about use case X", or we've failed at a pretty fundamental aspect of our API design.

Don't get me wrong: I feel we should definitely make it as easy as possible to create responsive XR content. (With any luck the hardware ecosystem will start to encourage that anyway.) I just don't see how we could enforce it without driving developers away.

@speigg

speigg commented Aug 28, 2018

@TrevorFSmith wrote:

I'm still failing to understand the stance that portal and immersive display modes can be irrelevant to authors. Each mode requires radically different interaction and content design. In portal mode, authors will need to create overlay controls, indicators for how the user should move the handset to help SLAM, and a whole host of other features that are essentially different work than what needs to happen in immersive display mode.

Portal vs immersive display modes have different design implications, which can (and probably should) be handled by an application-level toolkit that helps authors do the right thing in each circumstance (much like I imagine your PotassiumES framework does), in the same way that modern UI frameworks give developers the tools needed to adapt the layout and presentation of their content for different screen sizes and input capabilities.

@toji wrote

I see where you're coming from on this a bit better now, but I'm still not sure I support the idea of this as a core API concept, because it IS still just an alternate way of getting the same effect as a fullscreen inline canvas. I would be 100% fully on board with tools like A-Frame providing it as a easy-to-use display mode, though.

I think I am still failing to communicate the problem... essentially, as a developer, I'm either in control of certain things, or the system/UA/user is in control (inversion of control). The assumption with an "inline" display mode is that the app remains in full control over how to render (including where to place the canvas on the screen), while with the "immersive" display mode the system dictates how the app must render and controls how that content is presented on the display.

This distinction is important, because the semantics of "who is in control" has implications for the kinds of user interfaces the UA is able to embed the XR content within, and allows the UA to give the user direct control over how XR content is viewed (which is important for an XR-first browser). For "inline" display mode, the XR layer is already embedded and presented within the DOM, and there isn't much the UA can do to "enforce" a different layout. With "immersive" display mode, the UA can enforce how and where content is rendered to the display.

Speaking to the future, this has implications beyond what a single app (or application-level framework, like A-Frame) is able to do. The missing piece here is a display mode for handheld devices that has the same "inversion of control" semantics that the "immersive" display mode currently employs (with regard to rendering, at least).

So this is not really about the fullscreen “effect” and whether or not that is possible for the app to do on its own... rather it’s about the semantics of application controlled displays vs UA controlled displays, both extremes of which have their uses.

@speigg

speigg commented Aug 28, 2018

@toji wrote:

Oh, I forgot to mention @speigg: Thanks for introducing me to the terms "endocentric" and "exocentric"! I wasn't aware of phrases for those concepts previously. It seems to me that based on the Wikipedia entries I linked, though, that both of the terms describe a variant of what I would consider an "immersive" display for the purposes of the API. The distinction is in how they achieve it. Specifically, a CAVE is used as an example of an exocentric environment.

Sure! BTW, a CAVE system (or a projected AR system, like RoomAlive) can be egocentric if it tracks the user’s head (so content can be rendered from the user’s perspective). Likewise, a handheld display can be egocentric if it tracks the user’s head (and renders content with the appropriate off-axis perspective projection matrix), and even AR can be done egocentrically on a handheld device by using a “virtual transparency” technique.

Edit: I see the Wikipedia entry you are referring to, about “exocentric vs endocentric environments”. That article uses these two words to refer to physical location of the display (essentially, on the user’s head or not). This doesn’t seem like quite a useful distinction as egocentric vs exocentric rendering, which is what I meant. See the classic paper by Milgram on categorization of mixed reality displays for the terminology I am referring to.

@speigg

speigg commented Aug 28, 2018

@toji wrote:

We must allow developers to say "I only care about use case X", or we've failed at a pretty fundamental aspect of our API design.

That’s totally fine. Giving authors the option to explicitly exclude (or include) support for certain display modes does not preclude:

  1. a display mode with “inversion of control” rendering/presentation semantics on handheld devices (same “inversion of control” rendering semantics as “immersive” display mode, but let’s call it “portal” display mode to distinguish from rendering on HMDs and such).
  2. a default “path of least resistance” session initialization in which both “immersive” and “portal” display modes are implicitly considered supported by the app.

@blairmacintyre
Contributor Author

Reading through the above, I think people are talking past each other.

I don't think either @speigg or I holds a "stance that portal and immersive display modes can be irrelevant to authors." Asserting that would be silly. I'm sorry if I'm somehow phrasing things in a way that makes it sound like that. Obviously, creating good APIs for different sorts of platforms will require non-trivial work.

Similarly, I don't think anyone asserted that "we start off saying, in effect, 'You must design your app to be usable by this modest slice of VR users in order to access this much larger market of phone AR users, even though there's no technical reason for that dependency'". Nothing I (or @speigg) have proposed prevents developers from saying "I only care about use case X"; I completely agree that if we don't support this, "we've failed at a pretty fundamental aspect of our API design".

So, let me try again.

As (@ddorwin? Someone?) observed, "immersive" is really "exclusive presentation". Changing to "immersive" might have been a mistake. The gist of what @toji says about "inline" mode I tend to agree with, in that it's the right way to add XR content/elements to an existing website. And I completely agree with @TrevorFSmith and @toji when they argue that the bulk of the UI (and, thus, aspects of the app organization, and so on) for a site that wants to support both handheld AR/VR and HMD AR/VR will need some non-trivial differences.

And I agree that even if a handheld AR/VR app chooses to implement their UI entirely in WebGL, ignoring the DOM, the 3D UI for a touch screen will need to be different than for an HMD plus 3D interaction; having built simple demos that work in all 3 ways (touchscreen + DOM, touchscreen + WebGL UI, touchscreen + HMD), I can verify it's non-trivial. But I also agree with @speigg that over time, some of the pain of this will be handled by frameworks, at least for common cases.

What I'm arguing for is something else.

The only thing I disagree with @toji on is if "immersive" (or, instead, "exclusive presentation") makes sense on handhelds. I think it does, for a few reasons.

  • different handheld browsers treat fullscreen differently. Some don't even let you do fullscreen, so relying on this seems destined to make fullscreen AR on handhelds "hit or miss". I would expect that most people wanting to do AR with ARKit/ARCore are thinking about fullscreen AR, and really want to know it's possible, without some butt-ugly URL bar hanging around.
  • I am dubious that building a DOM structure and making some element fullscreen will have the same performance as a dedicated "this is just a single 3D rendering context, nothing else will be rendered, we are taking over the display to do 3D" rendering implementation. Surely telling the browser that all DOM rendering can be stopped and we're just doing 3D opens the door to performance improvements, more so than "this one DOM element is now fullscreen". I'd love to hear from people implementing inside the browser here.
  • similarly, right now rAF on head-mounted displays runs faster in immersive mode than rAF in the DOM. It seems plausible to me that a handheld AR platform could support a higher-performance rendering setup for just AR that supports higher frame rates and other features, beyond what could be done with fullscreen 3D canvases
  • beyond these things, it seems like there are two modes (integrate with the DOM, with 3D in a canvas; dedicated display that takes over everything). Different browsers will implement these differently, but given that both are well defined, it makes sense to support both explicitly when possible. If I know I want full-screen AR, purely with 3D and WebGL, why do I have to go through a bunch of crap with setting up DOM elements, requesting fullscreen, and not knowing if it will even work (i.e., give me fullscreen) on the device?

I'm not arguing against supporting inline+fullscreen, I'm arguing against NOT SUPPORTING "dedicated display" mode. Obviously, users would need to know that they are on a handheld w/ touchscreen, not an HMD, just as they will want to know the nature of the controllers available to them on an HMD (just buttons/joysticks, 3DOF, or 6DOF, fingers or hardware device, 1 or 2 hands?)

Perhaps we need to revert from "immersive" to "exclusive" again! 😆

@blairmacintyre
Contributor Author

blairmacintyre commented Aug 29, 2018

One more thing I wanted to call out from @toji's post above:

There's room for UA choice here, but I worry that this would be surprising behavior for most people. Let's assume that no matter how an "immersive" mode manifests, it'll require user activation (which typically == a button). So if we tell the page that they can use immersive mode, they add a button which the user clicks on, and the result is that the page simply goes fullscreen. That's probably not meeting user or developer expectations. You could make a case for that being a valid interpretation, but I have a hard time seeing most browsers following it.

I don't think there would be confusion, and we could probably talk about this at the F2F or outside this. We are clearly thinking about this differently. Perhaps because of the switch to "immersive" as the language, and the fact that for you this implies "HMD". If we used the term "dedicated rendering", and provided a way for the developer to know if immersive was going to an HMD or to "full screen on the phone" (so they could, for example, create the appropriate icon, or even not provide such an option), would that ease your concern?

(It seems that no matter how many times I say "immersive doesn't mean HMD to me", and that when I talk about fullscreen immersive on handhelds I mean "dedicated non-DOM-bound rendering", it's not being heard.)

@speigg

speigg commented Aug 29, 2018

@blairmacintyre wrote:

(It seems that no matter how many times I say "immersive doesn't mean HMD to me", so when I talk about fullscreen immersive on handhelds I mean "dedicated non-DOM-bound rendering" it's not being heard)

Right. If "immersive" must mean "HMD-like", then the name of the "immersive-web" group seems quite limited in scope, and I would have to revoke my original proposal (issue #320) to rename the "exclusive" option to "immersive", as my intent was not to limit the "exclusive"/"immersive" display mode to HMDs.

@danzeeeman

danzeeeman commented Sep 4, 2018 via email

@NellWaliczek NellWaliczek added this to the TPAC 2018 milestone Sep 12, 2018
@toji toji mentioned this issue Sep 14, 2018
@NellWaliczek NellWaliczek added the agenda Request discussion in the next telecon/FTF label Oct 10, 2018
@toji
Member

toji commented Oct 31, 2018

This was discussed a bit at TPAC (Search for "Issue 388"). One of the points that was brought up was that the framing of the issue has shifted somewhat with the decision to remove the ability to do inline AR, which makes a lot of this conversation less relevant.

Given the length of the issue itself and the spec moving forward in the meantime, I'm inclined to close this down in favor of more targeted discussion of the current mode names and surrounding text if needed.

@toji toji closed this as completed Oct 31, 2018
@cwilso cwilso removed agenda Request discussion in the next telecon/FTF labels Jan 16, 2019
@cwilso cwilso modified the milestones: Spec-Complete for 1.0, 1.0 Apr 30, 2019
10 participants