Simplified V1 Input Proposal #319

@toji


Background

Much of this is a re-focusing of the previously produced Input Explainer, so little here should come as a surprise.

For those who don't already know, we are unable to continue with the design put forward in that document because of concerns about compatibility with upcoming (and as of this writing unreleased) VR/AR standards. I won't dive into a comprehensive evaluation of those incompatibilities, since the standard in question is still unreleased, but will broadly note that it's not known at this time whether it will allow the full input device state to be queried in the manner the previous explainer would require.

Given that, and given that we would prefer to have users begin using the WebXR Device API as soon as is reasonable without being blocked on third parties, I propose that we re-focus on exposing a minimal but broadly compatible subset of the previously discussed functionality, with some clear ideas of how it could evolve to fit a variety of underlying input systems in the future.

Requirements

The "simple" proposal from the previous explainer just allowed developers to listen for basic point-and-click events from a source of VR input, which is enough to enable basic button-based UIs. This is "good enough" for video players, galleries, some simple games, etc. It is insufficient for more complex uses like A-Painter-style art apps, complex games, or really anything that involves direct manipulation of objects.

That's regrettable, but a limitation that I feel is worth accepting for the moment in order to enable the significant percentage of simpler content that we see on the web today.

So, what we need to enable that level of input is:

  • An event that fires when the input's primary interaction method occurs. It's important that this be treated as a user activation event.
  • Frame-to-frame tracking of the input ray

I would also propose that, since this would be all we offer initially, we make this just a teensy bit more useful and future-proof by adding:

  • Notifications of the primary interaction method starting and ending (i.e. button down/button up)

This would allow a bit more nuance in the interactions allowed by the system, giving the option to drag items around, for example.
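As a rough sketch of the drag interaction this enables, here's a tiny state tracker driven by the proposed start/end notifications. Everything here (the class, `rayPick`-style hit objects, a mutable `position`) is illustrative, not part of the proposal:

```javascript
// Minimal drag-state tracker built on the proposed selectstart/selectend
// notifications. All names here are illustrative, not part of the proposal.
class DragTracker {
  constructor() {
    this.draggedObject = null;
  }

  // Called from a "selectstart" handler with whatever the pointer ray hit.
  onSelectStart(hitObject) {
    this.draggedObject = hitObject;
  }

  // Called each frame while dragging; moves the held object with the pointer.
  onPointerMove(pointerPosition) {
    if (this.draggedObject) {
      this.draggedObject.position = pointerPosition;
    }
  }

  // Called from a "selectend" handler; releases and returns the object.
  onSelectEnd() {
    const released = this.draggedObject;
    this.draggedObject = null;
    return released;
  }
}
```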

Proposal

I find it easier to talk about these things when looking at an interface, so I'll start with a proposed IDL:

enum XRHandedness {
  "",
  "left",
  "right"
};

interface XRInputSource {
  readonly attribute XRHandedness handedness;
};

interface XRInputPose {
  readonly attribute Float32Array? gripMatrix;
  readonly attribute Float32Array? pointerMatrix;
};

//
// Extensions to existing interfaces
//

// Aside: I really think we should consider renaming this to just XRFrame
partial interface XRPresentationFrame {
  XRInputPose? getInputPose(XRInputSource inputSource, XRCoordinateSystem coordinateSystem);
};

partial interface XRSession {
  attribute EventHandler onselect;
  attribute EventHandler onselectstart;
  attribute EventHandler onselectend;

  attribute EventHandler oninputdeviceschange;

  FrozenArray<XRInputSource> getInputDevices();
};

//
// Events
//

[Constructor(DOMString type, XRInputSourceEventInit eventInitDict)]
interface XRInputSourceEvent : Event {
  readonly attribute XRPresentationFrame frame;
  readonly attribute XRInputSource inputSource;
};

dictionary XRInputSourceEventInit : EventInit {
  required XRPresentationFrame frame;
  required XRInputSource inputSource;
};

Tracking and rendering

Let's dive into tracking first, since it's relatively straightforward. xrSession.getInputDevices() returns a list of any tracked controllers. This does not include the user's head in the case of gaze tracking devices like Cardboard. By themselves these objects do basically nothing useful.
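The one piece of state they do carry is handedness, which apps may want to sort on (e.g. to attach tools to a specific hand). A sketch of bucketing devices by it — the helper name is mine, but the handedness values come from the IDL above:

```javascript
// Group input devices by the proposed XRHandedness values ("", "left",
// "right"). Works on any array of objects with a `handedness` attribute,
// such as the result of xrSession.getInputDevices().
function groupByHandedness(inputDevices) {
  const groups = { "": [], left: [], right: [] };
  for (const device of inputDevices) {
    groups[device.handedness].push(device);
  }
  return groups;
}
```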

Each frame the developer can iterate through the list and call xrFrame.getInputPose(inputDevices[i], frameOfReference) to get the pose of the input in the given coordinate system, synced to the head pose delivered by the same frame. This can be used to render some sort of input representation frame-to-frame. (Note: I'm not including anything that describes a controller mesh for practicality reasons. We can investigate that later. In the meantime apps will just have to use app-specific or generic resources.)

The input device would be rendered using the gripMatrix, as that's what should be used to render things that are held in the hand.

Pointers are a little more subtle. In many cases we want to render a ray coming off the controllers, but not off the user's head. However, if the device is gaze-based we do still want to draw a gaze cursor, and if the session is a magic window context we don't want to draw any cursor at all. So a bit of logic is needed to handle that. When pointers are drawn they should be drawn using the pointerMatrix, which may differ from the gripMatrix for ergonomic reasons.

The basic pattern ends up looking like:

function drawInputs(xrFrame) {
  let inputDevices = xrSession.getInputDevices();

  if (inputDevices.length) {
    // If input devices are reported always draw them with pointer rays/cursors.
    for (let inputDevice of inputDevices) {
      let inputPose = xrFrame.getInputPose(inputDevice, xrFrameOfRef);
      if (inputPose) {
        drawAController(inputPose.gripMatrix);
        drawAPointer(inputPose.pointerMatrix);
        drawACursor(inputPose.pointerMatrix, cursorDistance);
      }
    }
  } else if (xrSession.exclusive) {
    // Render a gaze cursor for exclusive sessions with no input devices.
    let devicePose = xrFrame.getDevicePose(xrFrameOfRef);
    drawACursor(devicePose.poseModelMatrix, cursorDistance);
  }

  // Render nothing for non-exclusive sessions with no input devices.
}

I'd expect that we'll get a Three.js library real quick that adds simple controller visualization to your scene and does all the right things in this regard.

Primary input events

Handling primary input events is the other half of this proposal. A quick recap of what that means, copy-pasted from the previous explainer:

The exact inputs that trigger these events are controlled by the UA and dependent on the hardware that the user has. For example, to trigger a "select" event on a variety of potential hardware:

  • On Daydream, the user would click the controller touchpad.
  • On HoloLens, the user would perform a tap with their index finger or say the system "Select" keyword.
  • On a Vive controller, Oculus Touch or Windows MR controller, the user would pull the trigger.
  • On Cardboard the user would press the headset's button.

To listen for any of the above the developer adds listeners for the "select", "selectstart", or "selectend" events. When any of them fires, the event will supply an XRPresentationFrame that's used to query input and head poses. The frame will not contain any views, so it can't be used for rendering. It also provides an XRInputSource that represents the input device that generated the event. This may be one of the devices returned by xrSession.getInputDevices() (in the case of a tracked controller) or one that's not exposed anywhere else (in the case of a headset button, air tap, or magic window touch).

xrSession.addEventListener("select", onXRSelect);

function onXRSelect(event) {
  let inputPose = event.frame.getInputPose(event.inputSource, xrFrameOfRef);

  if (inputPose) {
    // Ray cast into scene with the pointer to determine if anything was hit.
    let selectedObject = scene.rayPick(inputPose.pointerMatrix);
    if (selectedObject) {
      onObjectSelected(selectedObject);
    }
  }
}

The exact interpretation of the pointer is dependent on the source that generates the event:

  • For tracked controllers it's a ray originating at the tip of the controller.
  • For gaze cursors it's a ray originating at the center of the user's head and pointing in the direction of their gaze.
  • For magic window clicks the ray would originate at the graphics near plane and project out from directly under the cursor/touch point.
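In all three cases the ray can be recovered from the pointerMatrix. Assuming the matrices use the usual column-major WebGL layout with -Z as "forward" (the proposal doesn't pin this down, so treat that as an assumption), the extraction looks like:

```javascript
// Extract a ray (origin + direction) from a 4x4 column-major pointer
// matrix, assuming the WebGL convention where -Z is "forward".
function rayFromPointerMatrix(m) {
  // Translation lives in elements 12-14 of a column-major matrix.
  const origin = [m[12], m[13], m[14]];
  // The third column is the local Z axis; negate it to point "forward",
  // then normalize in case the matrix carries scale.
  const dir = [-m[8], -m[9], -m[10]];
  const len = Math.hypot(dir[0], dir[1], dir[2]);
  const direction = dir.map((c) => c / len);
  return { origin, direction };
}
```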

Use cases

The above capabilities give developers enough to handle the following (non-comprehensive) scenarios:

  • Dance Tonite-style passive viewing experiences
  • Video players (clicking buttons)
  • Image Galleries (clicking to expand, click and drag to scroll. No touchpad scrolling)
  • Matterport-style navigation (click to teleport)
  • Partial SketchFab-style use (click to teleport, but no touchpad scaling)
  • Simple shooter games (would need to use in-world buttons to switch weapons)
  • Simple painting apps (would need to just assume presence of 6DoF controllers; tool changes would need to use in-world UI)

Obviously we'd like to enable more robust usage, but this does allow a pretty wide range of apps in the most broadly compatible way we can manage.

Future directions

So that's the extent of the current proposal, but it's good to have an idea of how we could extend it in the future. A few thoughts on that:

The current Gamepad API maintainers would like us to continue using it in conjunction with VR, and have expressed a willingness to refactor the API if necessary to make it more generally useful. If we wanted to go that direction (and were confident we could map it to all relevant native APIs) I would propose that we either expose Gamepad objects on the XRInputSource or make XRInputSource inherit from Gamepad (We would drop the pose extensions and displayId).
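If XRInputSource did expose a Gamepad, reading button state might look like the following. To be clear, this is purely hypothetical: neither a `gamepad` attribute nor any button layout is specified anywhere, so the attribute name and button index here are my own placeholders:

```javascript
// Hypothetical: read the primary trigger from a Gamepad-shaped object
// hanging off an input source. Neither the `gamepad` attribute nor the
// button index is specified; this just illustrates the shape of the idea.
function readTrigger(inputSource) {
  const gamepad = inputSource.gamepad;
  if (!gamepad || gamepad.buttons.length === 0) {
    return { pressed: false, value: 0 };
  }
  const trigger = gamepad.buttons[0];
  return { pressed: trigger.pressed, value: trigger.value };
}
```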

But if that's not practical, which is a very real possibility, my general line of thinking is to add a way to query inputs by name or alias to receive back an object that can be used both for state polling of that element and input event listening. Something like this:

interface XRInputAction : EventTarget {
  attribute EventHandler onchange;
  attribute EventHandler onclick;

  readonly attribute boolean pressed;
  readonly attribute boolean touched;
  readonly attribute double  value;
  readonly attribute double? xAxis;
  readonly attribute double? yAxis;
};

partial interface XRInputSource {
  XRInputAction? getAction(DOMString action);
};

interface XRInputActionEvent : Event {
  readonly attribute XRPresentationFrame frame;
  readonly attribute XRInputSource inputSource;
  readonly attribute XRInputAction action;
};

This could then be used like so to get the same effect as the "select" event documented earlier:

let selectAction = xrInputSource.getAction("select");
if (selectAction) {
  selectAction.addEventListener("click", onXRSelect);
}
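State polling would pair naturally with per-frame edge detection, e.g. firing app logic only on the transition frames. A sketch — only the `pressed` attribute comes from the sketched XRInputAction IDL; the tracker class itself is mine:

```javascript
// Detect press/release edges by polling an action's `pressed` attribute
// once per frame. Only `pressed` comes from the sketched XRInputAction
// IDL; the class itself is illustrative.
class ActionEdgeDetector {
  constructor() {
    this.wasPressed = false;
  }

  // Call once per frame with the action's current state. Returns
  // "press", "release", or null when nothing changed.
  poll(action) {
    const edge =
      action.pressed === this.wasPressed
        ? null
        : action.pressed
        ? "press"
        : "release";
    this.wasPressed = action.pressed;
    return edge;
  }
}
```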
