Spec language precludes non-visual uses #815
Comments
I am building maps using VR audio. At this point it is 3D web audio, but when WebXR is enabled, I will be connecting position data, localizing the user's position into the VR map, and connecting that to the audio listener.
Also, many of my users will use devices that don't have screens, such as:
Also, the spec should make clear that "animation" does not mean "the manipulation of electronic images by means of a computer in order to create moving images," but rather "the state of being full of life or vigor; liveliness." The same goes for "view": it should mean "regard in a particular light or with a particular attitude" rather than "the ability to see something or to be seen from a particular place." Everything should be written as if the user could access the object or scene from any sensory modality.
Do you think it would be useful to broaden devices from Headset Devices to immersive devices? |
Re: "Audio AR", my impression is that it's referring to something similar to the Bose AR glasses. Based on the information that I've been able to find about those kinds of uses I'm not entirely sure how they perform their location-based functions. I would actually be surprised if it was built around any form of real positional tracking, and am guessing it's more along the lines of Google Lens which surfaces information based on a captured image with little understanding of the device's precise location. In any case, I'd love to know more about how existing or upcoming non-visual AR devices work so we can better evaluate what the appropriate interactions are with WebXR. Now, ignoring the above questions about how current hardware works, if we assume that we discover a device that provides precise positional tracking capabilities but has no visual output component we can brainstorm how that would theoretically work out. While it's not clear how web content would surface itself on such a device, it seems safe to say that traditional Regardless, given the relative scarcity of this style of hardware today and the large number of unknowns around it I don't see any pressing need to move to support this style of content just yet. It's absolutely a topic that the Working Group should follow with great interest, though! Marking this issue as part of the "Future" milestone so we don't lose track of it. |
It's possible to have a camera, GPS, accelerometer, Bluetooth, Wi-Fi, and any number of sensors in a device without a screen. One example is an Android-based phone that isn't explicitly made for VR, but can access the web and can do location tracking.
Web content is accessed through both Braille and text to speech from a screen reader.
What's your definition of "traditional immersive-vr style content"? I am actually not aware of any VR devices that have only visual output. All the devices I'm aware of have either audio only, or audio and visual.
I apologize, because I think two subjects have been confusingly conflated here. My comments were primarily aimed at theorizing about how developers could create content specifically for "audio first" devices if desired. Allowing developers to target content to a specific form factor is something we've heard repeatedly from developers is important, and this scenario would be no different. (And I will admit that, due to the circumstances in which I wrote my comment, I actually didn't even see your previous comments till just now. Sorry!)

As API developers we see this as distinct from the accessibility considerations you describe for a variety of reasons, primarily to avoid the kind of discriminatory content that you mentioned. We definitely want to provide a variety of ways for developers to make content which has a visual component more accessible to nonvisual users. And while we don't prevent developers from adding audio to their VR experiences, it's not as fundamental to the API's use as the rendering mechanics, and it definitely should not be relied on as the sole source of accessibility.

We've had some conversations about this in the past (I'm trying to dig them up to link here), and there's been some recent threads started about visual accessibility (immersive-web/proposals#54) as well. It's a tough issue, and one we take seriously, but it also isn't the intended topic of this specific issue.
I think this is the topic we're discussing in this issue. Why do the rendering mechanics seem to require visual feedback? Can't we separate visual, auditory, and tactile rendering into their own render mechanics, separate from the main loop? Currently I see very little about audio or tactile rendering in the existing documents. I would like to see:
Hello from Mozilla!

This is a very interesting concept. I would love to explore the kind of experiences that you may enable with an API like WebXR without visuals. Do you have some particular ideas in mind that could help give context? One that comes to mind for me is "Papa Sangre": https://en.wikipedia.org/wiki/Papa_Sangre. Perhaps a similar story could be more immersive if expanded into a non-visual, room-scale experience.

Thanks for sharing your perspective. I would like to learn more.

Cheers,
- Kearwood "Kip" Gilbert
Auditory XR displays

There are 700+ games that can be played completely using audio at audiogames.net. Papa Sangre is one very good example. These are the papers off the top of my head from the last 2 years of ICAD. I'll get a few other sonification researchers to give their input. Each of the above papers has an extensive literature review that gives even more examples.

Edit: For AR, here is a project using computer vision and other sensors to provide turn-by-turn navigation indoors using audio: https://link.springer.com/chapter/10.1007/978-3-319-94274-2_13

Tactile

TouchX is an interactive XR tactile device. If you do a search for "haptic gloves", you'll find hundreds of examples. The Woojer Vest is a haptic vest.

Haptic-only experiences

Accessing the internet with only a haptic glove. Some other experiences that need to be done through XR touch include:
... Even if you used your sight for most of the above activities, I guarantee someone, like me, will only use touch.

Reading in modes other than visual

Often the question is: "How is one going to access the web with just an audio or haptic interface?"
This is a really good topic of discussion. Building on your last comment @frastlin, another application for auditory displays is data analysis and data exploration. Wanda L. Diaz Merced is a visually-impaired researcher who worked on auditory display of physics data with NASA. Her research was with low-dimensional data, but spatialization is a popular area of sonification research with benefits similar to 3D data visualization, allowing for the mapping of additional dimensions or data relationships to space. Sometimes the spatial relationship in the sound is related to actual spatial relationships in the data, as in this paper on meteorological data sonification, but it can also be used in a representational way, or merely to increase performance in audio process monitoring tasks.

For a significant chunk of this research, accessibility is an added benefit. Most of the research in this area is for enhancing data analysis and process monitoring for all users. Even users who take advantage of visual displays are researching audio-only and audio-first immersive technology. The accessibility benefits are significant of course, and sonification is a topic of research for equal access in education (1), (2), which makes support for auditory display in immersive web technologies exciting as on-line education becomes a norm. It would be great for immersive tech on the web to go even further than traditional tech in this direction.
I'd caution against creating too hard a line between visual and audio XR. There will be times when both sighted and non-sighted people will want to experience the same XR space, and either to share that experience in real time, or be able to compare experiences later on. There will also be times when an XR space is entirely visual or entirely audio of course. The language in the spec (understandably) emphasises the visual, but I think @ddorwin is right in saying that some slight changes to the language could gently protect the spec from inadvertently restricting future possibilities.
Exactly, I would like to see:
1. Language in the general spec switched from visual to a-modal.
2. Examples given of XR experiences in modalities other than just visual, including auditory only, visual and auditory, and maybe visual, auditory, and tactile.

I don't think this is extremely radical, and my hope is that 90% of the content will be multisensory. What I would like to see is a recognition that an XR experience could be visual, auditory, tactile, or any combination of the senses.
For the webidl portion, perhaps a non-visual XRSession could be created without an XRLayer. It would also be interesting to explore what kind of XRLayer derivatives would support additional sensory feedback devices. This could also have some implications on modality, perhaps with an additional XRSessionMode to indicate the capability of new kinds of immersion with such devices. It would be of great benefit to have someone with such direct experience to guide these kinds of choices.

I suspect that such changes could be made additively without breaking compatibility with the existing core WebXR spec. Would anyone be interested in (or opposed to) creating an incubation repo to continue this discussion more openly?

Cheers,
- Kip
Sure, another repo for this may be useful.

What do you mean without an XR layer? It would just be without the visuals. I have it on my short list of things to do to get a WebXR app working on my iPhone through WebXR Viewer (https://apps.apple.com/us/app/webxr-viewer/id1295998056).

I will want access to all the APIs and tracking info a WebXR session gives; I just won't be using WebGL for anything, and will instead be making the UI out of the Web Audio API and aria-live regions, with an optional Web Speech API for those without a screen reader.

After I make my first app, I can give more guidance on what could be changed.
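As an illustration of the kind of non-visual UI described above, here is a minimal sketch assuming a page that contains an aria-live region; the element id and the `announce` helper are illustrative, not part of any spec.

```js
// Hypothetical helper: announce a message through an aria-live region
// (picked up by screen readers), with optional spoken output via the
// Web Speech API for users who aren't running a screen reader.
const liveRegion = document.getElementById('announcements'); // <div aria-live="polite">

function announce(message, { speak = false } = {}) {
  // Changing the text of an aria-live region makes screen readers report
  // the new text without moving focus.
  liveRegion.textContent = message;

  if (speak && 'speechSynthesis' in window) {
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(message));
  }
}

announce('You are facing the fountain. The cafe entrance is on your left.');
```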
I can provide yet another example of an audio-only virtual reality experience using the API. I am working on an audio-only WebXR experience using Bose AR. It is a musical composition of mine which is "epikinetic", i.e., your body's movement causes effects besides merely translating and reorienting your avatar; in this case, the music progresses forward through its developments and changes properties depending on your motion.
More specifically, I mean without an XRWebGLLayer, allowing you to use WebXR without having to create a WebGL context: https://immersive-web.github.io/webxr/#xrwebgllayer-interface

The language of the spec says that we would not be able to get an active XRFrame if XRSession.baseLayer is not set to an XRWebGLLayer. I am proposing that we explore options of allowing non-visual XRSessions, without an XRWebGLLayer, that can still get poses from an XRFrame for use with WebAudio.

Of course, it would also be possible to use WebXR while rendering nothing but a black screen. I would like to know if allowing usage of WebXR without the rendering requirements would perhaps enable this to be used in more scenarios, such as on hardware that has no display or GPU present at all.
Yes, removing the GL layer, or not requiring it, would be perfect.
Hello,
Should I start going through the spec and pushing changes and adding examples? |
I've filed a new issue to track evaluating how the API can/should interact with audio-only devices at #892 so that this thread can stay focused on ensuring the spec doesn't require a visual component, which has gotten a lot more discussion thus far.
Yes, this would be the right path for non-visual uses of the API today. For historical reasons there's a couple of points within the API that indicate that a baseLayer (an XRWebGLLayer) must be set before poses will be delivered. The API has evolved a fair amount since then, and that concern no longer really applies. The primary reason why is that we have some level of user consent baked into the API now for any scenario where the page might have otherwise been able to silently initiate device tracking. As such, the requirement for a baseLayer could plausibly be relaxed in the future. In the meantime, it's pretty easy to create an XRWebGLLayer that is never actually rendered to, which satisfies the requirement while the content stays non-visual.
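A rough sketch of that workaround, assuming a browser with core WebXR support; `updateAudioListener` is a hypothetical application function, and the canvas exists only to satisfy the baseLayer requirement.

```js
// A "dummy" XRWebGLLayer is created purely to satisfy the spec's baseLayer
// requirement; nothing is ever drawn to it, and the frame loop only reads poses.
async function startNonVisualSession() {
  const session = await navigator.xr.requestSession('immersive-vr');

  const canvas = document.createElement('canvas');
  const gl = canvas.getContext('webgl', { xrCompatible: true });
  session.updateRenderState({ baseLayer: new XRWebGLLayer(session, gl) });

  const refSpace = await session.requestReferenceSpace('local');

  session.requestAnimationFrame(function onFrame(time, frame) {
    const pose = frame.getViewerPose(refSpace);
    if (pose) {
      // No WebGL rendering happens here; the pose drives audio, haptics,
      // or other non-visual output instead.
      updateAudioListener(pose.transform); // hypothetical application function
    }
    frame.session.requestAnimationFrame(onFrame);
  });
}
```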
OK, so starting the language discussion, what is an XR device?
To update that definition, here are a couple possibilities:
I think something like the first definition you listed would be appropriate, and it probably deserves some further explanation as well. (I'm realizing now that we probably don't ever define the term "immersive"? Oops.) Maybe something like this:
I like it! So would you like to make the change, or should I?
Should be:
The pose should not be only for cameras, but for listener objects as well.
I'm wondering if this whole section should be under a subheading called "Viewer Tracking with WebGL", because the discussion should be focused on viewer tracking as a whole and not just on updating viewer tracking in a WebGL context. Many 3D libraries like Babylon also move the audio listener object along with the camera, so the user is going to need to be aware if their library does that.
Should be:
Pull requests are definitely welcome regarding this issue! Any minor issues can be worked out in the review process. A few specific comments:
Yes, though be aware that it would be a (minor) backwards compat issue and there's not going to be much appetite for actively addressing it right away, especially since in the meantime the path of setting up a dummy layer offers a way forward for non-visual content.
Having a section regarding interop with WebAudio would be great for the explainer! (It's not going to be the type of thing that we'll be able to surface in the spec itself, though.)
This sounds like a topic for discussion in #390.
Let's be careful here, because listeners should absolutely not be placed at the transforms described by the individual XRViews, which represent per-eye viewpoints. Instead, when integrating with audio APIs, the viewer pose's transform should be used to place the listener (see the sketch after this comment).
Not that I'm aware of, nor do I think the browser is particularly interested in communicating that due to fingerprinting concerns. I think a written/spoken disclaimer that the experience won't work as intended without headphones is the most reliable way forward here.
There's ongoing discussions about what's appropriate to allow as a required/optional feature. I don't have a clear answer on that right now.
These are not related concepts. The near and far planes are explicitly related to the projection matrix math done for WebGL and should have no effect on audio. Audio attenuation is wholly the responsibility of the WebAudio API and, to my knowledge, is a content-specific choice rather than a device intrinsic (the sketch after this comment shows this being configured on a PannerNode).
Again, this falls outside the scope of WebXR, and should be facilitated by WebAudio (likely in conjunction with a library). WebXR has no concept of a 3D scene or rendered geometry or anything like that. It is a mechanism for surfacing the device's sensor data in a way that enables developers to present their content appropriately, and it facilitates outputting visuals to the hardware because that's not adequately covered by any existing web APIs. Anything beyond that is the responsibility of the developer. Libraries like A-Frame and Babylon are more opinionated about how their content is represented, and thus are a better place to define audio interactions like this.
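A minimal sketch illustrating both points above (placing the listener at the viewer transform rather than the per-eye views, and attenuation being a content-level WebAudio choice), assuming an AudioContext, an XRFrame, and a reference space are already available; the function names are illustrative.

```js
// Place the Web Audio listener at the viewer transform (not at the per-eye
// XRViews), and configure attenuation in content via a PannerNode.
function updateListenerFromViewerPose(audioCtx, frame, refSpace) {
  const pose = frame.getViewerPose(refSpace);
  if (!pose) return;

  const position = pose.transform.position;
  const listener = audioCtx.listener;

  // Newer implementations expose AudioParams; older ones only setPosition().
  if (listener.positionX) {
    listener.positionX.value = position.x;
    listener.positionY.value = position.y;
    listener.positionZ.value = position.z;
  } else {
    listener.setPosition(position.x, position.y, position.z);
  }
  // Deriving the listener's forward/up vectors from pose.transform.orientation
  // is left to a math library (e.g. gl-matrix) and omitted here.
}

// Attenuation is a content-level choice made entirely in WebAudio:
function createSpatialSource(audioCtx) {
  const panner = new PannerNode(audioCtx, {
    panningModel: 'HRTF',
    distanceModel: 'inverse',
    refDistance: 1,
    maxDistance: 50,
    rolloffFactor: 1,
  });
  panner.connect(audioCtx.destination);
  return panner; // connect an audio source node into this panner
}
```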
I submitted a PR for many of the changes we talked about for explainer.md. |
@frastlin I'm reviewing this fascinating thread and am wondering what came out of this discussion and the various PRs etc. that you suggested. I'd appreciate it if you could post a brief update, or ping me on joconnor(at)w3.org. Many of your original points relate to work we are doing in the Research Questions Task Force; for example, what you are suggesting is in line with our thinking around the idea of "modal muting", where visual modes may not be needed or consumed by a device or user agent, but can still be in sync when expressed as functions of time, in terms of shared spaces used in immersive environments. It would also be great to get your input into our work :-)
While I know that @frastlin is well aware of this - something for others involved in the development of WebXR specs to consider is that the term "screen reader" is a misnomer. That is only part of what they do - they also facilitate navigation and interaction.
@RealJoshue108 None of my PRs have been accepted. #925 was marked as unsubstantive, I'm not sure what to do with #927 to determine affiliation, and #930 still needs some testing.

Google's AR with animals is one example where not connecting the XR position and listener position has been of major detriment to the experience. When I move around and turn my head, the sound never changes. This is exactly what I mean when I say that the AR and VR listeners need to have an easy way to sync with one another. I would love to interact with the AR animal that is in our room, but because it's too difficult to sync audio and visuals, it was not done. Please can we fix this before WebXR becomes more prevalent?
@RealJoshue108 Currently, there is no semantic method of interacting with XR content in the browser, so screen reader navigation functions are turned off.
@frastlin Have you looked at the DOM Overlays API spec? This is promising, as it allows HTML and other code like ARIA attributes or potentially even personalization semantics to be embedded within an XR environment.
@RealJoshue108 This is very useful for overlays, and if combobox or edit-field HTML elements can be the overlay and grab the focus of the screen reader, then it would work really well. It's never, ever a good idea to have non-screen-reader users mucking about in ARIA; it's like programming CSS without a screen. I would highly recommend either new elements, or new versions of the existing elements, for overlays.

There also needs to be some kind of access to the meshes that is nonvisual. Similar to how HTML declares elements on a page, there need to be similar elements for the XR space. That way, screen readers or other user agents can add their own tools for interacting with the XR meshes or objects that don't require the creator to know anything about their users' interaction patterns.
To fix these problems, there needs to be a mesh and object DOM, along with collision events. There needs to be a name requirement for the objects, and there needs to be some kind of way to show the environment. This cannot be left up to engines like A-Frame; otherwise I can wave goodbye to any XR that's not built with nonvisual users in mind.
@frastlin wrote:
The current DOM overlay specification doesn't have any restrictions on HTML elements, so form input elements should work as expected. Typically, the application would use a transparent DIV element as the overlay, and this would then contain other elements placed within that DIV. For example, this stock ticker experiment uses text input and select elements. And this model-viewer example with annotations has DOM nodes that move along with the model. I don't have access to a dedicated screen reader, but Android's built-in "TalkBack" accessibility feature appears to work as expected for the content of the DOM layer. This of course doesn't solve the overall problem of making applications accessible since the WebGL layer is separate.
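For reference, a minimal sketch of opting into the DOM Overlays module with ordinary form controls inside the overlay root; the element ids and the HTML content shown in comments are illustrative, not taken from the examples mentioned above.

```js
// The overlay root is an ordinary DOM subtree, so form controls and ARIA
// attributes inside it keep their normal semantics for assistive technology.
// Markup assumed on the page:
// <div id="xr-overlay">
//   <label for="search">Find object</label>
//   <input id="search" type="text">
// </div>
async function startAROverlaySession() {
  const overlayRoot = document.getElementById('xr-overlay');
  const session = await navigator.xr.requestSession('immersive-ar', {
    requiredFeatures: ['dom-overlay'],
    domOverlay: { root: overlayRoot },
  });
  return session;
}
```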
This is unfortunately not easy. WebGL doesn't inherently have a concept of meshes or scene graphs. The API basically provides access to programmable shader pipelines that produce screen pixels, and there aren't any semantic hooks at the WebGL API level that seem suitable for annotating objects. There have been multiple discussions of declarative 3D based on a DOM-style scene graph, but as far as I know this hasn't been getting much traction.
Yeah, we had milestoned them as Future since we didn't see them as necessary for CR. @toji and I can still review them, though.
I don't think devices typically surface this property. Fine control over HRTF parameters is something that would have to happen through the WebAudio API, I think.
Devices often have a calibration for this; we take in this calibration information in the form of the viewer eye offsets.
There are rough plans for declarative XR, but they're probably something that will take a while to get to.
As it stands, this requires some heavy collaboration between WebXR and the WebAudio people. Our rough plan is to do this as a separate WebXR module, not as a part of this one. The hope is that XR frameworks make this easy to do, and in my understanding many of them already do.
@Manishearth I am shocked that head-mounted displays don't surface a calibration for the headphones. The Web Audio API has a default head size already, so if you give the audio listener a position somewhere around the user's head, it will use that default.
Unfortunately all we know is where the eyes are. There's a third point called the "viewer" that is typically roughly the nose (or midpoint of the eyes) but there's no requirement it be attached to any specific point on the face.
Right, that's something that should be filed on the WebAudio API IMO. But it's not very useful unless there is a way of getting this data from devices.
If you know the position of each eye, can't you figure out what is in the center of the two? |
Sure, but that's not going to tell you the head size.
All you need with the current Web Audio API is that center point; the size of the head is already estimated. If there is a way to obtain the head size from the unit, that is better, but it's not needed with the Web Audio API today.
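A short sketch of that midpoint calculation, assuming an XRViewerPose with one or two views; the helper name is illustrative.

```js
// Approximate the center of the head as the midpoint of the per-eye view
// positions; with a single (mono) view this simply returns that view's position.
function headCenter(viewerPose) {
  const views = viewerPose.views;
  const avg = (axis) =>
    views.reduce((sum, view) => sum + view.transform.position[axis], 0) / views.length;
  return { x: avg('x'), y: avg('y'), z: avg('z') };
}
```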
Right, all I was saying was that there's not enough information to know the size of the head. Anyway, I've filed immersive-web/proposals#59 for a potential module that integrates WebXR and WebAudio.
There are XR use cases (e.g., "audio AR") that could build on poses and other capabilities exposed by core WebXR (and future extensions). The current spec language, though, appears to require visual devices. The superficial issues can probably be addressed with a bit of rewording, though there may be some more complex issues as well.
Some of the most obvious examples revolve around the word "imagery":
More complex issues might include the XR Compositor, assumptions about XRWebGLLayer, and definitions and/or assumptions about XRView. While AR is out of scope for the first version of the core spec, it would be nice if the definitions weren't technically incompatible with such use cases and form factors.