
Detecting if an XRInputSource is an auxiliary or a primary input source #1358

Open
hybridherbst opened this issue Feb 5, 2024 · 50 comments · Fixed by immersive-web/webxr-hand-input#121 · May be fixed by immersive-web/webxr-hand-input#123

Comments

@hybridherbst

The spec just states the definitions of auxiliary and primary input sources:

An XR input source is a primary input source if it supports a primary action.
An XR input source is an auxiliary input source if it does not support a primary action

but it does not provide a mechanism for applications to query whether an XRInputSource supports a primary action.

Is there such a mechanism, and if not, what is the recommended approach
for applications to distinguish between auxiliary and primary input sources?

Use case description:

  • hand tracking on Quest OS does support select events, so hands are a "primary input source" there.
  • hand tracking on Vision OS does not support select events, so hands are an "auxiliary input source" there.
  • we can emit wrapped events on Vision OS based on thumb-index distance, but then we risk sending duplicated events on Quest OS (both the wrapped event and then the system event).

Potential workaround:

  • treat all sources as auxiliary; these potentially emit wrapper events
  • once a source has received a selectstart or squeezestart event, mark it as primary and stop emitting wrapper events.
    While this would kind of work, it still risks sending duplicate events the first time.
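A minimal sketch of that workaround, assuming the WebXR Hand Input joint names, an illustrative 2 cm pinch threshold, and a hypothetical onSyntheticSelect() callback (none of these specifics come from the spec):

```js
// Only synthesize a "pinch select" for hand sources that have never fired a
// native selectstart/squeezestart; once they do, trust the system events.
const nativePrimary = new Set();
session.addEventListener('selectstart', (e) => nativePrimary.add(e.inputSource));
session.addEventListener('squeezestart', (e) => nativePrimary.add(e.inputSource));

function pollSyntheticPinch(frame, referenceSpace) {
  for (const source of session.inputSources) {
    if (!source.hand || nativePrimary.has(source)) continue;
    const thumb = frame.getJointPose(source.hand.get('thumb-tip'), referenceSpace);
    const index = frame.getJointPose(source.hand.get('index-finger-tip'), referenceSpace);
    if (!thumb || !index) continue;
    const dx = thumb.transform.position.x - index.transform.position.x;
    const dy = thumb.transform.position.y - index.transform.position.y;
    const dz = thumb.transform.position.z - index.transform.position.z;
    if (Math.hypot(dx, dy, dz) < 0.02) {
      onSyntheticSelect(source); // hypothetical app-level handler
    }
  }
}
```

As noted above, this still risks one duplicated event on devices where hands do fire native select events.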
@Manishearth
Contributor

I believe the intent of that is as an internal spec convenience.

I'm not really convinced the use case of "we wish to wrap and supplement device events" is something designed to be supported in this regard, and using the notion of primary input devices to do so feels brittle. The proposal here solves the problem for these devices specifically, not in general.

I think wrapping until you know not to seems like an okay call to make.

I also think this can be solved in the Hands API via profiles: it does seem to make sense to expose "primary input capable hand" vs otherwise as a difference in the profiles string.

Unfortunately the current default hands profile is "generic-hand-select", which seems to imply a primary input action; not sure if we should change the default or do something else.

@hybridherbst
Author

Thanks for the comment. With "wrapping" I don't mean "pretending this is a WebXR event" – I just mean: applications need to detect "hand selection" and that needs to work independent of whether the XRInputSource hand has a select event or not.

So to summarize:

  • there's no current mechanism to distinguish between these
    • maybe in the future with more diverse hand input profiles
  • we will have to live with the "double events"

If I were to add this to the spec, would this be valid wording:
"Input sources should be treated as auxiliary until the first primary action has happened, then they should be treated as primary."

@Manishearth
Contributor

Manishearth commented Feb 5, 2024

"Input sources should be treated as auxiliary until the first primary action has happened, then they should be treated as primary."

No, I don't think that's accurate. That is an engineering decision based on a specific use case and does not belong in the standard.

applications need to detect "hand selection" and that needs to work independent of whether the XRInputSource hand has a select event or not.

I guess part of my position is that platforms like Vision should expose a select event if that is part of the OS behavior around hands. It's not conformant of them to not have any primary input sources whatsoever: devices with input sources are required to have at least one primary one.

There's little point attempting to address nonconformance with more spec work.

There's a valid angle for devices that have a primary input but also support hands (I do not believe that is the case here). In general this API is designed under the principle of matching device norms so if a device doesn't typically consider hand input a selection then apps shouldn't either, and apps wishing to do so can expect some manual tracking. That's a discussion that can happen when there is actually a device with these characteristics.

@hybridherbst
Author

hybridherbst commented Feb 5, 2024

That is an engineering decision based on a specific use case

I disagree – the spec notes what auxiliary and primary input sources are but does not note how to distinguish between them. That makes it ambiguous and impossible to detect what is what.

It's not conformant of them to not have any primary input sources whatsoever

I agree and believe this is a bug in VisionOS; however, their choice may be to expose a transient pointer (with eye tracking) later (which would be the primary input source), while people still want to use their hands to select stuff.
In that case there could even be multiple input sources active at the same time – the transient one and the hand – and there would still need to be a mechanism to detect which of these is a "primary" source and which is not.

@Manishearth
Contributor

I disagree – the spec notes what auxiliary and primary input sources are but does not note how to distinguish between them

The spec is allowed to have internal affordances to make spec writing easier. A term being defined has zero implication on whether it ought to be exposed. Were "it's defined in the spec" a reason in and of itself to expose things in the API then a bunch of the internal privacy-relevant concepts could be exposed too.

The discussion here is "should the fact that a hand input can trigger selections be exposed by the API". If tomorrow we remove or redefine the term from the spec, which we are allowed to do, that wouldn't and shouldn't change the nature of this discussion, which is about functionality, not a specific spec term.

however, their choice may be to expose a transient pointer (with eye tracking) later (which would be the primary input source) and people still want to use their hands to select stuff

I addressed that in an edit to my comment above: in that case the WebXR API defaults to matching device behavior, and expects apps to do the same. There's a valid argument to be made about making it easier for apps to diverge, but I don't think it can be made until there is an actual device with this behavior, and it is against the spirit of this standard so still something that's not a slam dunk.

@AdaRoseCannon
Member

Unfortunately the current default hands profile is "generic-hand-select", which seems to imply a primary input action; not sure if we should change the default or do something else.

In visionOS WebXR the profiles array for the hand is ["generic-hand"] because it does not fire a select event.
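For reference, a hedged sketch of the profile-based heuristic discussed above. It assumes browsers expose "generic-hand-select" only when hands fire native select events, which is exactly the convention under discussion here:

```js
// Heuristic only: a hand input source advertising the select-capable profile
// (e.g. Quest) is expected to fire native select events; visionOS reportedly
// exposes just ["generic-hand"] for hands that don't.
function handFiresNativeSelect(inputSource) {
  return !!inputSource.hand &&
         inputSource.profiles.includes('generic-hand-select');
}
```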

@Manishearth
Contributor

@AdaRoseCannon should we update the spec to include that and allow it as an option?

@AdaRoseCannon
Member

That might be sensible. It's odd because generic-hand is already included in the WebXR input profiles repo.

@hybridherbst
Author

@AdaRoseCannon thanks for clarifying! The spec notes that

The device MUST support at least one primary input source.

but it seems that hands are the only input source on visionOS WebXR, and it's not a primary input source. Am I missing something?

@Manishearth
Contributor

I actually think that line should probably be changed. Not all devices have input sources in the first place, and that's otherwise spec conformant.

I think it should instead be "for devices with input sources, at least one of them SHOULD be a primary input source"

@toji
Member

toji commented Feb 7, 2024

I don't think we need to change the primary input source requirement, simply because it should be valid to have the primary input source be transient. (This is the case for handheld AR devices, IIRC). It's somewhat unique for a device like the Vision Pro to expose persistent auxiliary inputs and a transient primary input, but I don't think that's problematic from a spec perspective. It may break assumptions that some apps have made.

I remember discussing the reasons why the hands weren't considered the source of the select events with Ada in the past and being satisfied with the reasoning; I just don't recall it at the moment.

@cabanier
Member

cabanier commented Feb 7, 2024

Looking at our code, we emit "oculus-hand", "generic-hand" and "generic-hand-select".
Does VSP just emit "generic-hand"? Is Quest browser still allowed to emit "generic-hand"?

@Manishearth
Contributor

@cabanier continuing that discussion on the PR

@hybridherbst
Author

hybridherbst commented Feb 7, 2024

@cabanier Yes, I can confirm that AVP only returns "generic-hand".

@toji the AVP currently, to the best of my understanding, does not have "persistent auxiliary inputs and a transient primary input". There is no primary input as far as I'm aware. The assumption it breaks is that there is always a primary input source (a MUST as per the spec, at least right now).

@cabanier
Member

cabanier commented Feb 7, 2024

@Manishearth's new PR allows both profiles to be exposed. This matches both implementations, so I'm good with that change.
This will allow you to disambiguate between VSP and other browsers.

@toji
Member

toji commented Feb 8, 2024

The assumption it breaks is that there is always a primary input source (a MUST as per the spec, at least right now).

That conflicts with my understanding of the input model from prior conversations with @AdaRoseCannon. That said, I haven't used the AVP yet and it may have been that our discussion centered around future plans that have not yet been implemented. Perhaps Ada can help clarify?

@AdaRoseCannon
Member

In the initial release of visionOS there was no primary input source, visionOS 1.1 beta (now available) has transient-pointer inputs which are primary input sources.

@cabanier
Member

cabanier commented Feb 8, 2024

In the initial release of visionOS there was no primary input source, visionOS 1.1 beta (now available) has transient-pointer inputs which are primary input sources.

Interesting! We have some devices here that we'll update to visionOS 1.1 beta.
Do you have any sample sites that work well with transient-pointer? We have it as an experimental feature and if it works well, we will enable it by default so our behavior will match.

@AdaRoseCannon
Member

A THREE.js demo which works well is: https://threejs.org/examples/?q=drag#webxr_xr_dragging but don't enable hand-tracking since THREE.js demos typically only look at the first two inputs and ignore events from other inputs.

Brandon's dinosaur demo also works well, although with a similar caveat.

@cabanier
Member

cabanier commented Feb 8, 2024

I just tried it and created a recording:
https://github.com/immersive-web/webxr/assets/1513308/e1247e4b-1985-4a0e-a562-51d6aeb65f06

I will see if it matches Vision Pro.

THREE.js demos typically only look at the first two inputs and ignore events from other inputs.

Are you planning on exposing more than 2 input sources?
I've been thinking about doing the same since we can now track hands and controllers at the same time. I assumed this would need a new feature, or a new secondaryInputSources attribute.

@toji
Member

toji commented Feb 8, 2024

This is getting a little off topic for the thread, but would you want to expose hands and controllers as separate inputs? A single XRInputSource can have both a hand and a gamepad.

(EDIT: I guess the input profiles start to get messy if you combine them, but it still wouldn't be out-of-spec)

@cabanier
Member

cabanier commented Feb 8, 2024

I believe so, because if you expose hands and a transient input source, it would be weird if the ray space of the hand suddenly jumps and becomes a transient input source.

@AdaRoseCannon
Member

AdaRoseCannon commented Feb 8, 2024

I just tried it and created a recording

Looks correct to me.

Are you planning on exposing more than 2 input sources?
I've been thinking about doing the same since we can now track hands and controllers at the same time. I assumed this would need a new feature, or a new secondaryInputSources attribute.

In visionOS 1.1, if you enable hand-tracking, the transient-inputs appear after the hand-inputs, as elements 2 and 3 in the inputSources array.

I assumed this would need a new feature, or a new secondaryInputSources attribute.

We have events for new inputs being added, which can be used to detect the new inputs. I personally don't believe we need another way to inform developers to expect more than two inputs.
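The events referred to here are the session's inputsourceschange events. A small sketch of reacting to them instead of assuming a fixed pair of inputs (the attach/detach helpers are placeholders):

```js
session.addEventListener('inputsourceschange', (event) => {
  for (const source of event.added) {
    // e.g. hand inputs first, then transient-pointer inputs appearing later
    attachVisualsFor(source);   // placeholder: set up per-source models/state
  }
  for (const source of event.removed) {
    detachVisualsFor(source);   // placeholder: tear down per-source state
  }
});
```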

@cabanier
Member

cabanier commented Feb 8, 2024

I assumed this would need a new feature, or a new secondaryInputSources attribute.

We have events for new inputs being added, which can be used to detect the new inputs. I personally don't believe we need another way to inform developers to expect more than two inputs.

I was mostly concerned about broken experiences. I assume you didn't find issues in your testing?

@cabanier
Member

cabanier commented Feb 8, 2024

@AdaRoseCannon Are there any experiences that work correctly with hands and transient input?
@toji Should we move this to a different issue?

@cabanier
Member

I worry that adding input sources is confusing for authors and might break certain experiences.

Since every site needs to be updated anyway, maybe we can introduce a new attribute (secondaryInputSources?) that contains all the input sources that don't generate input events.

/agenda should we move secondary input sources to their own attribute?

probot-label bot added the agenda (Request discussion in the next telecon/FTF) label on Feb 12, 2024
@hybridherbst
Author

I think there are a few cases where it won't be clear which thing is "secondary" and it highly depends on the application.

Example: if Quest had a mode where both hands and controllers are tracked at the same time, there could be up to 6 active input sources:

  • 2 hands (with or without select events)
  • 2 controllers (with select events)
  • 2 transient pointers
  • of which e.g. 4 could be active simultaneously, which would still be allowed according to the spec if I'm not mistaken.

I think instead of a way to see which input sources may be designated "primary" or "secondary" by the OS, it may be better to have a way to identify which input events are caused by the same physical action (e.g. "physical left hand has caused this transient pointer and that hand selection") so that application developers can decide if they want to e.g. only allow one event from the same physical source.

@cabanier
Member

I don't think it's enough to disambiguate the events.
For instance, if a headset could track controllers and hands at the same time, what is the primary input?

If the user is holding the controllers, the controllers are primary and the hands are secondary.
However, if they put the controllers down, the hands become primary and the controllers are now secondary.

WebXR allows you to inspect the gamepad or look at finger distance so we need to find a way to let authors know what the input state is. Just surfacing everything will be confusing.

@Manishearth
Contributor

Since every site needs to be updated anyway,

Hold on, does it? I don't think we're requiring any major changes here.

@cabanier
Member

Since every site needs to be updated anyway,

Hold on, does it? I don't think we're requiring any major changes here.

AFAIK no site today supports more than 2 input sources, so they need to be updated to get support for hands and transient-input.

@toji
Member

toji commented Feb 12, 2024

The primary issue here is that libraries haven't been following the design patterns of the API.

The API fires select events on the session, and if you're listening for that and firing off interactions based on it then you'll be fine for the types of interactions the Vision Pro is proposing (because it's fundamentally the same as how mobile AR input works today). But if you've abstracted the input handling to surface select events from the input sources themselves AND trained your users through example code and library shape to generally only bother to track two inputs at a time (as Three has done) then you're going to have a bad time.
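A sketch of that session-level pattern (referenceSpace and handleSelect() are application-side assumptions, not part of the API):

```js
// Listen on the session rather than on a fixed set of input sources, and
// resolve the target ray at event time. This works equally for controllers,
// hands that fire select, and transient inputs such as taps or gaze+pinch.
session.addEventListener('select', (event) => {
  const pose = event.frame.getPose(event.inputSource.targetRaySpace, referenceSpace);
  if (pose) {
    handleSelect(event.inputSource, pose.transform); // app-specific interaction
  }
});
```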

It's truly unfortunate that we've ended up in a situation where most content has optimized itself for a very specific set of inputs when the API itself is ostensibly agnostic to them (and I'm as guilty as anyone when it comes to the non-sample content I've built) but I don't think that we should be OK with making breaking changes to the API because of the choices of the libraries built atop it.

@Manishearth
Contributor

I agree: if content isn't following the spec's model of things (as it can choose to do), I don't think adding more things will make it change its mind. Content had the option to treat these things in a more agnostic way; it still does.

@cabanier
Member

The primary issue here is that libraries haven't been following the design patterns of the API.

Indeed, libraries such as aframe have been generating their own events based on either the gamepad or finger distance. Nobody looks at more than 2 inputs, so every experience that requests hands will be broken on the new Vision Pro release.
The hands spec mentions that it can be used for gesture recognition, so we can't really fault developers for using it as a design pattern.

(By "broken" I mean that Vision Pro's intent to use gaze/transient-input as the input is not honored)

It's truly unfortunate that we've ended up in a situation where most content has optimized itself for a very specific set of inputs when the API itself is ostensibly agnostic to them (and I'm as guilty as anyone when it comes to the non-sample content I've built) but I don't think that we should be OK with making breaking changes to the API because of the choices of the libraries built atop it.

My point is that things are already broken. If an experience requests hands and does its own event generation, it will be broken once Vision Pro ships the next version of its OS.
I'm seeing that all the developers on Discord are updating their experiences to do their own event generation, and that new logic will break in the near future because input is supposed to come from gaze.

My proposal to move secondary inputs to their own attribute will fix this and reduce confusion about what the primary input is. (See my hands and controllers example above.)
The only drawback is that existing experiences that request hands will only have transient-input.

@hybridherbst
Author

hybridherbst commented Feb 13, 2024

The primary issue here is that libraries haven't been following the design patterns of the API.

As both library implementor and library user, I can only partially agree. Yes, three.js handles it very minimalistically, as they often do, and that has already caused a number of problems (that are often promptly resolved when they actually happen).
Needle Engine for example handles any number of inputs, so I believe the next AVP OS update will "just work" for the most part.

However, I don't think the spec and API explain or handle:

  • close-range interactions that don't use select events (a finger poking at a UI element)
  • close-range interactions that may or may not have select events depending on device (a hand being pinched directly on an object in close range)
  • multi-source cases (a controller producing select events and a hand holding the same controller producing select events)

For example, the spec does not state that there is always an exact mapping of "one physical thing must only have one primary input source"; there could be more than one select event caused by the same physical action ("bending my finger") as per the spec, even if no device (that I'm aware of) does this today. I'm not sure if this is intended, and I'm not sure how anyone could build something entirely "future-proof" given this ambiguity.

I understand that cases like this are seen as "out of scope" for the spec, since they can be implemented on top of what the API returns. Yet, library users expect those cases to be handled or at least want to understand how to handle them. I don't think that counts as "not following the design patterns".

@cabanier
Member

I understand that cases like this are seen as "out of scope" for the spec, since they can be implemented on top of what the API returns. Yet, library users expect those cases to be handled or at least want to understand how to handle them. I don't think that counts as "not following the design patterns".

I agree. Putting every tracked item in inputSources and leaving it up to authors is not a good indicator of how to handle multiple tracked items. (Basing it on the name of the input profile feels like a hack.)

Even the name "inputSources" is confusing, since on Vision Pro hands are NOT considered input; gaze is.
Likewise on Quest, if you hold controllers, your hands are NOT input; and if you put the controllers down, the controllers should stop being input.

Maybe instead of secondaryInputSources, we should call it trackedSources instead.

@cabanier
Member

cabanier commented Feb 15, 2024

As an experiment, I added support for detached controllers to our WebXR implementation so you will now always get hands and controllers at the same time in the inputSources array.

I can report that every WebXR experience that I tested and that uses controllers was broken in some way by that change. Some only worked if I put the controllers down, others rendered the wrong controllers on top of each other, and a couple completely stopped rendering because of JavaScript errors.
This is a clear indication that we can't just add entries to inputSources.

@toji
Member

toji commented Feb 15, 2024

I guess I'm confused by how that's supposed to improve the situation vs. where we're at now. If we continue to use the input system as originally designed many apps will need to update their input handling patterns to account for new devices. If we introduce a new secondary input array... many apps will still need to update their input handling patterns to account for new devices?

@cabanier
Member

I guess I'm confused by how that's supposed to improve the situation vs. where we're at now. If we continue to use the input system as originally designed many apps will need to update their input handling patterns to account for new devices. If we introduce a new secondary input array... many apps will still need to update their input handling patterns to account for new devices?

I'm saying:
If we continue to use the input system as originally designed many apps will break

I want to add support for concurrent hands and controllers but I can't make a change that breaks every site.

@toji
Member

toji commented Feb 15, 2024

I understand that position but I'm trying to consider both the Quest's and the Vision Pro's use cases.

Quest, by virtue of being a first mover in the space and the most popular device to date, has a lot of existing content built specifically to target it using abstractions that only really panned out for systems with Quest-like inputs. It's understandable that you're reluctant to break those apps. And I'm not suggesting that we do break them! (I still feel like we can and should expose hand poses and gamepad inputs on the same XRInputSource, but that's a slightly different topic).

An input system like Vision Pro's, however, will already be broken in those apps from day 1, so it's not a choice between breaking apps or not. They're just broken. So unless pages want to ignore Vision Pro users (or have been effectively abandoned by their creators, which we know is common) they'll have to update one way or the other. If updates are going to be mandatory to work on a given piece of hardware then I'd rather not invent new API surface to support it if what we already have serves the purpose.

Now, put bluntly I think that this is something Apple brought on themselves. I'm not a big fan of the limitations imposed by their input system, even if I understand the logic behind it. And I do think that if compatibility with existing apps is a high priority for Safari then there's probably reasonable paths that can be taken to introduce a not-particularly-magical-but-at-least-functional mode where hands emulate single button controllers. But those types of decisions aren't the sort of thing that this group is in the business of imposing on implementations.

@cabanier
Member

cabanier commented Feb 15, 2024

An input system like Vision Pro's, however, will already be broken in those apps from day 1, so it's not a choice between breaking apps or not. They're just broken.

I don't believe that is the case. Only sites that request hand tracking will be broken since they won't look at more than 2 input sources.
My proposal will fix these sites because now hands will not be in the inputSources array anymore. Those sites should now work with transient-input, although they would no longer display hands.

So unless pages want to ignore Vision Pro users (or have been effectively abandoned by their creators, which we know is common) they'll have to update one way or the other. If updates are going to be mandatory to work on a given piece of hardware then I'd rather not invent new API surface to support it if what we already have serves the purpose.

How do you propose that I surface concurrent hands and controllers? How can I indicate to the author whether the hands or the controllers are the primary input?

Now, put bluntly I think that this is something Apple brought on themselves. I'm not a big fan of the limitations imposed by their input system, even if I understand the logic behind it. And I do think that if compatibility with existing apps is a high priority for Safari then there's probably reasonable paths that can be taken to introduce a not-particularly-magical-but-at-least-functional mode where hands emulate single button controllers. But those types of decisions aren't the sort of thing that this group is in the business of imposing on implementations.

Correct. Quest surfaces hands as single-button controllers if hand tracking is not requested, and this seems to work on a majority of sites.

@Manishearth
Contributor

How do you propose that I surface concurrent hands and controllers

They should be the same input source with both a hand and a gamepad attribute, yes? The spec was designed with this use case in mind.

@cabanier
Member

How do you propose that I surface concurrent hands and controllers

They should be the same input source with both a hand and a gamepad attribute, yes? The spec was designed with this use case in mind.

No, they are different input sources. 2 hands and 2 controllers.

@Manishearth
Contributor

Oh, I understand now. Not just the case of a hand grasping a controller.

@toji
Member

toji commented Feb 27, 2024

Re: surfacing both hands and controllers at the same time.

I was listening to a podcast today that described the new Meta feature Rik has been referring to and they brought up a hypothetical use case of someone strapping controllers to their feet in order to have both hands and tracked feet in something like VRChat. I also extrapolate that out to accessories or attachments where the controller is somewhere other than in the user's hand.

While that would be technically tricky to pull off and I certainly expect it to not be used that way in the majority of cases, it did solidify in my mind the type of scenario where you really do want to treat the hands and the controllers as completely separate input sources, and not just two different data streams on a single source.

Given that, I still do have questions about how an input is determined to be "primary" or "secondary" by the system. I guess that if controllers are present they would generally be considered to be the primary input, though that assumption breaks in the (fairly unorthodox) leg tracking scenario mentioned above.

I also wonder if it's enough from a compatibility standpoint to simply update the spec and state that any primary inputs should appear before any secondary inputs in the input array? (This is distinct from any potential identifiers that might be added to the input object itself). That could get messy with devices like the Vision Pro, though, in which the primary transient input firing a select event might trigger a flurry of devices being removed, added, removed, and added once again.

@cabanier
Member

Re: surfacing both hands and controllers at the same time.

I was listening to a podcast today that described the new Meta feature Rik has been referring to and they brought up a hypothetical use case of someone strapping controllers to their feet in order to have both hands and tracked feet in something like VRChat. I also extrapolate that out to accessories or attachments where the controller is somewhere other than in the user's hand.

One other useful feature is that you can still draw the controllers when you're using hands. Otherwise, if you want to pick up the controllers again, you have to remember where you put them, or lift the headset up to see them.

Given that, I still do have questions about how an input is determined to be "primary" or "secondary" by the system. I guess that if controllers are present they would generally be considered to be the primary input, though that assumption breaks in the (fairly unorthodox) leg tracking scenario mentioned above.

The system knows if you're touching the controllers with your hands. If you're not touching them, hands become the primary input.

I also wonder if it's enough from a compatibility standpoint to simply update the spec and state that any primary inputs should appear before any secondary inputs in the input array? (This is distinct from any potential identifiers that might be added to the input object itself). That could get messy with devices like the Vision Pro, though, in which the primary transient input firing a select event might trigger a flurry of devices being removed, added, removed, and added once again.

Adding and removing controllers is not compatible with current WebXR implementations. We had to remove simultaneous hands and controllers last week because so many websites broke :-\

Options are:

  • move the secondary inputs to another attribute
  • have a session feature along with a boolean on the input source to say it's primary.

I would even consider that controllers must be static by default (i.e. there are always only one or two controllers) since so much content is relying on that behavior.

@toji
Member

toji commented Feb 28, 2024

Thanks for the clarifications!

Adding and removing controllers is not compatible with current WebXR implementations.

Could you expand on that a bit? The spec certainly allows for it, and I believe that at the very least the Blink code in Chromium will handle the events properly. Are you referring to how the input devices are detected on the backend, or do you mean that libraries are ignoring the input change events?

@cabanier
Copy link
Member

Adding and removing controllers is not compatible with current WebXR implementations.

Could you expand on that a bit? The spec certainly allows for it, and I believe that at the very least the Blink code in Chromium will handle the events properly. Are you referring to how the input devices are detected on the backend, or do you mean that libraries are ignoring the input change events?

Sorry, I meant to say "WebXR experiences". AFAIK browsers are doing the right thing.
A good number of experiences expect 2 controllers that are always connected.

@Maksims

Maksims commented Mar 2, 2024

Developers use engines' recommended patterns for using the WebXR APIs with regard to input sources. Some engines provide a good async abstraction over input sources; some don't.

In PlayCanvas I've designed an async approach: while you can statically access a list of current input sources, developers are encouraged to use an event-based approach and react to input sources being added/removed. Based on input source capabilities (target ray mode, handedness, hand, etc.), developers then either add models, do raycasts, etc.

That way PlayCanvas apps actually work pretty well with multimodal input, with controllers being switched to hands in real time, and with other async add/remove scenarios.

Having hands and controllers at the same time would work also.

There are some potential issues I can see:

  1. Experiences optimised for 2 hands/controllers might use a hand model that is either IK'ed using hand joints or animated based on a controller. So developers would need to check whether there is more than one input source with the same handedness, and react accordingly (see the sketch after this list).
  2. Primary select action: when holding a controller, can it be falsely triggered by the hand too? I would assume that if the underlying system detects that a controller is in the hand, it will provide joint information based not only on CV reconstruction but also on controller button state and transforms, while suppressing the hand's select action.
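For the first point, a small sketch of grouping sources by handedness so an app can decide which one drives the visible hand model; the preference rule at the end is just an example:

```js
// Group all current input sources by handedness ("left" / "right" / "none").
function sourcesByHandedness(session) {
  const groups = { left: [], right: [], none: [] };
  for (const source of session.inputSources) {
    groups[source.handedness].push(source);
  }
  return groups;
}

// Example rule: if a controller (gamepad) and a hand share a handedness,
// let the controller drive input and use the hand only for visuals.
const left = sourcesByHandedness(session).left;
const leftDriver = left.find((s) => s.gamepad) ?? left[0];
```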

If Meta sees that a lot of experiences would be broken that way, it could be an additional session feature, so that developers can opt in to it. Of course, on by default would be better, but as mentioned above that has consequences.

Also, it would be very useful to know whether a hand-type input source is holding a controller, whether a controller is being held, and how input sources relate to each other, e.g. inputSource.related: XRInputSource | null.

There is definitely value in providing all the input source and hand information at the same time. Other input trackers would be awesome too! This opens possibilities for experiments and more creative uses of controllers.

@cabanier
Member

cabanier commented Mar 2, 2024

@toji, @mrdoob, @AdaRoseCannon, Brandel and I had a meeting this week to do a deep dive into this problem space.
I volunteered to update the spec with the trackedSources attribute. It will be discussed at the face-to-face in Bellevue at the end of March.
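To make the proposal concrete, a purely hypothetical sketch of how a trackedSources attribute might be consumed if adopted; neither the attribute name nor its shape exists in any shipped implementation at this point:

```js
// Hypothetical: inputSources would keep only sources that generate input
// events, while other tracked items (hands while controllers are primary,
// detached controllers, etc.) would be listed separately.
function drawAllTrackedItems(session) {
  for (const source of session.inputSources) {
    drawAsInput(source);           // placeholder renderer, fires select/squeeze
  }
  for (const source of session.trackedSources ?? []) {
    drawAsTrackedOnly(source);     // placeholder renderer, pose only
  }
}
```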

@Yonet removed the agenda (Request discussion in the next telecon/FTF) label on Mar 5, 2024