feat(addons/agenthands): hands + orb that gesture and point as the agent speaks by salmanmkc · Pull Request #416 · google/xrblocks

salmanmkc · 2026-06-26T13:37:02Z

adds a free-standing pair of agent hands and a glowing orb that gesture while the agent talks and physically point at real things in the room. inspired by the AgentHands paper (CHI 2026, https://www.duruofei.com/papers/Liu_AgentHands-GeneratingInteractiveHandsGesturesForSpatiallyGroundedAgentConversationsInXR_CHI2026.pdf).

the paper's nice insight is that an agent feels a lot more present when it can gesture and actually point at things in your space, not just talk at you. It lays out a clean way to drive that from an llm (inline gesture markup), grounds the gestures to real objects, and keeps the embodiment deliberately minimal: a calm orb as the locus of attention plus translucent hands, no face, so it doesn't tip into uncanny.

Without a key the demo plays a short scripted monologue so you can see the gestures. add ?key=... and it's the full loop: you talk (speech recognizer), gemini replies with inline gesture markup, the reply is spoken (TTS) and the hands gesture in sync with the actual spoken words, pointing at real detected objects when it refers to them. runs on desktop through the simulator, and in headset via a small spatial control panel (talk / scan).

new addon src/addons/agenthands:

AgentHand: one posable hand rig. loads the webxr generic-hand glb, poses the bones toward a SimulatorHandPose, and can aim its index finger at a world point (pivoting from the wrist so close-range pointing stays accurate). a layered motion offset lets gestures move the hand on top of the pose and aim.
AgentHands: the left + right pair. dispatches gestures and points, picks the pointing hand by which side the target is on, and has beat / wave / iconic size / count motions.
AgentGestures: parses inline markup from the reply, [gesture:thumbs_up], [point:the lamp], [wave], [beat], [size:big], [count:2], into poses, motions and point targets, anchored to the spoken text so each fires on the right word.
AgentHead: the agent's presence, a semi-transparent blue orb with a drifting particle shell. breathes while idle, pulses while speaking, gazes at whatever it's pointing at. matches the paper's embodiment (orb as the locus of attention, no face), and the hands are the same translucent blue.
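
A minimal sketch of how inline markup like this could be stripped and anchored, assuming every tag has the form `[name]` or `[name:arg]`. The names `parseMarkup`, `ParsedGesture` and `ParsedReply` are hypothetical and are not the addon's actual API; the real parser also maps tags onto poses and motions, which is left out here.

```typescript
// Hypothetical types; the addon's real parser may differ.
interface ParsedGesture {
  kind: string; // e.g. "gesture", "point", "wave"
  arg?: string; // e.g. "thumbs_up", "the lamp", "2"
  index: number; // character offset into the cleaned speech text
}

interface ParsedReply {
  speech: string;
  gestures: ParsedGesture[];
}

function parseMarkup(reply: string): ParsedReply {
  // Assumed tag grammar: [name] or [name:arg].
  const tag = /\[([a-z_]+)(?::([^\]]*))?\]/g;
  const gestures: ParsedGesture[] = [];
  let speech = "";
  let last = 0;

  const append = (segment: string): void => {
    // Avoid the double space left behind when a tag sat between two spaces.
    if (speech === "" || /\s$/.test(speech)) {
      segment = segment.replace(/^\s+/, "");
    }
    speech += segment;
  };

  let m: RegExpExecArray | null;
  while ((m = tag.exec(reply)) !== null) {
    append(reply.slice(last, m.index));
    // Anchor the gesture to the cleaned text, not to the raw reply.
    gestures.push({ kind: m[1], arg: m[2]?.trim(), index: speech.length });
    last = m.index + m[0].length;
  }
  append(reply.slice(last));
  return { speech, gestures };
}
```

Anchoring to the cleaned text rather than the raw reply matters because the TTS engine only ever sees the cleaned text, so its character offsets refer to that string.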

one SDK change: SpeechSynthesizer.onBoundaryCallback, fired on each word boundary with the character index into the spoken text, so callers can sync visuals (here, gestures) to the actual spoken words. optional, off by default.
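
A rough sketch of how a caller might use such a boundary hook to fire gestures on the right word. Everything here is hypothetical (`AnchoredGesture`, `makeBoundaryHandler`); the only thing taken from the description is that the callback receives a character index into the spoken text. How the returned handler gets assigned to `SpeechSynthesizer.onBoundaryCallback` is deliberately not shown, since the exact signature is not spelled out here.

```typescript
// Hypothetical shape: a gesture anchored at a character offset in the speech.
interface AnchoredGesture {
  name: string;
  index: number;
}

// Returns a handler that fires each gesture exactly once, as soon as the
// reported character index reaches or passes the gesture's anchor.
function makeBoundaryHandler(
  gestures: AnchoredGesture[],
  fire: (g: AnchoredGesture) => void,
): (charIndex: number) => void {
  const pending = gestures.slice().sort((a, b) => a.index - b.index);
  let next = 0;
  return (charIndex: number): void => {
    while (next < pending.length && pending[next].index <= charIndex) {
      fire(pending[next]);
      next++;
    }
  };
}
```

Using `<=` rather than strict equality means a gesture anchored mid-word or on whitespace still fires at the next reported boundary instead of being skipped.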

pointing is grounded with a lightweight raycast against the depth mesh, so the demo only needs gemini and no extra detection deps. one thing that ended up different from the paper: there, you register objects one at a time in a dedicated mode (look at a thing, say "register this", and it builds a rich oriented box with face and region labels that lands in a registry). Here the agent detects the whole room in a single pass and keeps re-grounding in the background as you move, so the grounding isn't a one-time observation, it stays current as you walk around without any manual step. the tradeoff is that it's coarser: a point per object rather than the paper's region-level boxes. the richer 3D object-detection addon that grew alongside this (objects3d) is going up as its own PR.

Why bring it into xrblocks: the paper's system is a unity study app on a galaxy xr headset. this ports the idea to the open web on three.js / webxr, packaged as a reusable addon so it's something you can drop into an app rather than a one-off, and it runs in the desktop simulator so you can try the whole loop without a headset. the gesture taxonomy, the markup-driven control, and the minimal orb-plus-hands embodiment all follow the paper; the main new plumbing is syncing gestures to the spoken words through the small SpeechSynthesizer hook above.

Worth being upfront: the per-hand gesture state machine is simpler, there are fewer gesture types, and timing comes from tts word boundaries rather than the paper's per-word energy model. pointing leans on depth-mesh quality and 2D detection instead of a full scene mesh, and there's no user-gaze input yet (the orb gazes, but we don't read where you look). most of the tuning happened in the simulator, so the in-headset path is wired but less exercised. plenty of room to grow it (more gestures, gaze, better grounding), which is part of why it's an addon.

colocated vitest specs for the addon cover the hand rig, the pair, the gesture parser and the orb. lint, prettier and build are clean.

Load the WebXR generic-hand glb as a free-standing pair of hands (not the user's tracked input), pose it with the simulator pose library, and cycle through gestures. Proves the standalone rig + pose animation before building the AgentHands feature.

A free-standing pair of agent hands raised in front of the user that gesture as a scripted line is 'spoken': each line's [gesture:...] markup is parsed and played in sequence, then the hands relax. Runs without a key; the same pipeline is driven by Gemini Live next.

The orb's decorative core, halo, and point shell were raycastable, so the reticle hit the dense point cloud (a few cm apart) before the UI behind it. That made intersections[0] a stray point and the spatial panel buttons never received hover or select. No-op their raycast so the orb is presence-only.

Adds AgentHand.orient(parentQuaternion) plus AgentHands.orient/clearOrientation so callers can override the resting tilt for a gesture (e.g. present an emblematic pose upright) and clear it again, mirroring how aimAt works.

Thumbs up, thumbs down, victory, rock and fist inherited the palms-up rest tilt, so a thumbs-up thumb pointed at the user instead of the ceiling. Cancel the rest tilt in the hand-root parent frame for those poses so they read upright and facing the user; open/relaxed keeps the rest tilt. Also stop pointing when any non-point gesture fires so the per-frame re-aim can't fight the pose.

… clash

THREE.Object3D already defines a count property, so an AgentHands.count method tripped a TS2416/2425 warning that failed the build under --failAfterWarnings. showCount also reads consistently next to showSize.

ruofeidu · 2026-06-26T22:18:43Z

Thank you Salman for the contribution! Do you have a recording of this?

I'm just back from a trip recently and a bit slow catching up.

salmanmkc · 2026-06-27T13:07:19Z

> Thank you Salman for the contribution! Do you have a recording of this?
>
> I'm just back from a trip recently and a bit slow catching up.

I will get one done for you soon! Hope you had a good trip!

The idle/hover fill colors (#2a2a2a -> #3a3a3a) differed by only a few percent of brightness, so hovering a control button produced no perceptible change and the buttons felt unresponsive. Use a dark chip for idle and a clear purple for hover (with a brighter click flash) so the highlight reads unmistakably.

The scanned depth mesh (walls/floor) is in the scene for occlusion, so the reticle's whole-scene raycast also hits it. Standing within ~1m of a wall makes the wall the closest hit, so it grabs hover from the control panel and the buttons stop responding. No-op the depth mesh's raycast so the reticle skips it, and restore the real raycast briefly inside groundPoint_ so the agent's own pointing still grounds on the geometry.
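
A small sketch of the no-op-and-restore pattern described above. It uses a structural stand-in for three.js's `Object3D.raycast(raycaster, intersects)` method so it runs without three.js installed; `disableRaycast` and `withRealRaycast` are hypothetical helper names, not functions from the demo.

```typescript
// Stand-in for the raycast method three.js objects expose.
type RaycastFn = (raycaster: unknown, intersects: unknown[]) => void;

interface Raycastable {
  raycast: RaycastFn;
}

const noopRaycast: RaycastFn = () => {};

// Make the object invisible to raycasts and hand back the original method.
function disableRaycast(obj: Raycastable): RaycastFn {
  const real = obj.raycast;
  obj.raycast = noopRaycast;
  return real;
}

// Temporarily restore the real raycast for the duration of `fn`, for example
// while the agent grounds its own pointing on the geometry.
function withRealRaycast<T>(obj: Raycastable, real: RaycastFn, fn: () => T): T {
  obj.raycast = real;
  try {
    return fn();
  } finally {
    obj.raycast = noopRaycast;
  }
}
```

The `finally` ensures the mesh goes back to being skipped by the reticle even if the grounding query throws.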

With the hands now facing the user by default, the special rest-tilt cancellation for thumbs-up/victory/etc. is unnecessary. Remove the REST_CANCEL_QUAT quaternion, the UPRIGHT_POSES set, and the per-pose orient call so every pose just uses the default facing.

salmanmkc added 30 commits June 26, 2026 21:15

agenthands: add AgentHand, a standalone posable hand rig

cf5e203

Loads the WebXR generic-hand glb as a free-standing hand (not the user's tracked input) and animates its bones toward a SimulatorHandPose using the simulator pose library. The bone-lerp step is a pure, tested helper.

agenthands: add AgentHands pair manager + gesture API

1064f63

Owns a left/right AgentHand, loads both, and animates them toward their current poses each frame. gesture(pose, hand?) sets one or both hands; rest() relaxes them.

agenthands: parse gesture markup from agent replies

b7239f6

Add a gesture->pose vocabulary and parseAgentGestures(), which strips [gesture:point] style markup from the agent's text and returns the cleaned speech plus the ordered gestures anchored to where they occur. Pure + tested.

demos: add talk button and genai importmap to agent_hands

77d7c06

demos: wire interactive gemini mode into agent_hands

169cfa2

demos: add spatial control panel to agent_hands for immersive parity

bf09af3

demos: add gemini key-entry overlay to agent_hands

fdef3eb

agenthands: parse optional point target from gesture markup

f42b53b

agenthands: aim a hand's index finger at a world position

c16e133

agenthands: add pointAt to the hands pair

a41e2c9

demos: ground agent_hands pointing in detected objects

5defd34

agenthands: reach the hand toward the object it points at

a5adc1e

demos: load 3D detection deps in agent_hands importmap

ea595c3

demos: point agent_hands at oriented 3D boxes via Object3DDetector

3d30df6

demos: scan the room off the conversation path so replies stay fast

5ebe410

demos: ground agent_hands pointing with a lightweight raycast (drop objects3d dependency)

fc2152c

demos: trim agent_hands importmap back to gemini only

a991ee0

demos: auto-rescan agent_hands in the background as the view moves

402e74d

demos: serialize agent_hands scanning and chat to stop detection JSON leaking into replies

f609c55

agenthands: measure pointing direction before restoring the live rotation so re-aiming is absolute

13b852c

agenthands: pick the pointing hand in the pair's local frame

50fe417
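
One way to read "in the pair's local frame" is to project the target onto the pair's own sideways axis instead of comparing world X. The sketch below assumes exactly that; `pickPointingHand`, the `pairRight` parameter and the convention that positive local X means the right hand are all assumptions, not taken from the addon.

```typescript
type Vec3 = { x: number; y: number; z: number };

const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;

// `pairRight` is the pair's local +X axis expressed in world space.
// Assumed convention: a target on the positive side goes to the right hand.
function pickPointingHand(target: Vec3, pairOrigin: Vec3, pairRight: Vec3): "left" | "right" {
  const localX = dot(sub(target, pairOrigin), pairRight);
  return localX >= 0 ? "right" : "left";
}
```

When the pair is rotated (for instance head-anchored while the user turns around), a target with positive world X can sit on the pair's left, which is why a world-space comparison would pick the wrong hand.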

agenthands: anchor gesture index to the normalized speech text

4245679

demos: clear stale objects on empty scans and guard overlapping TTS

3f046e2

agenthands: expose the index fingertip world position

2c16c1f

sound: let SpeechSynthesizer report word boundaries via onBoundaryCallback

3b10211

demos: head-anchor the agent hands to the user with idle sway

147991f

demos: draw a pointer ray and target ring while the agent points

a3a48db

demos: lean the agent hands toward the object they point at

734a1d9

salmanmkc added 18 commits June 26, 2026 21:15

demos: rest the agent hands palms-toward-user instead of showing their backs

a5292df

agenthands: add AgentHead, a glowing orb presence between the hands

d95a76d

demos: float the agent orb above the hands, pulsing when speaking and gazing at targets

d60b72a

agenthands: add a layered motion offset so gestures can move the hand on top of pose and aim

ff8937c

agenthands: add beat, wave, iconic size and count motion gestures

fbf846d

agenthands: parse motion markup ([wave], [beat], [size:big], [count:2])

9fea082

demos: play motion gestures and teach the agent the richer gesture taxonomy

6735bd3

agenthands: aim from the wrist so close-range pointing stays accurate

c405fb7

agenthands: relax the hand that isn't pointing so only one points at once

66f2325

demos: rest the agent hands palms-up toward the ceiling

3333022

agenthands: render the hands as semi-transparent blue, matching the paper's embodiment

72be062

Also make the hand meshes non-raycastable so they don't intercept the UI selection beam reaching panels behind them.

agenthands: stop the aim thrashing when a point target sits on the hand

0035e8e

When the target is within the wrist's reach offset the aim direction is ill-conditioned, so re-aiming each frame swung the hand wildly. Skip the re-aim in that regime and hold the current orientation.
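
A minimal sketch of that guard, assuming the aim is computed as a unit vector from the wrist to the target. `aimDirection` and the `reachOffset` parameter are hypothetical names; the caller is expected to keep the current orientation whenever `null` comes back.

```typescript
type Vec3 = { x: number; y: number; z: number };

// Returns the unit direction from wrist to target, or null when the target
// is within `reachOffset` of the wrist and the direction is ill-conditioned.
function aimDirection(wrist: Vec3, target: Vec3, reachOffset: number): Vec3 | null {
  const dx = target.x - wrist.x;
  const dy = target.y - wrist.y;
  const dz = target.z - wrist.z;
  const len = Math.sqrt(dx * dx + dy * dy + dz * dz);
  if (len <= reachOffset) {
    return null; // too close: skip the re-aim and hold the current orientation
  }
  return { x: dx / len, y: dy / len, z: dz / len };
}
```

Close to the wrist, a few millimetres of target jitter can swing the normalized direction by a large angle, which is the thrashing the commit describes.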

demos: keep the agent_hands pointer ray and ring out of the reticle

9b47ff4

The fingertip-to-target line and the target ring are decorative; no-op their raycast too so they can't steal hover from the control panel while pointing.

demos: remove the agent_hands spike now that the real demo exists

2562821

The spike was the first throwaway hand-rig prototype. The full agent_hands demo and the agenthands addon supersede it, so drop the scaffold.

salmanmkc mentioned this pull request Jun 26, 2026

feat(addons/objects3d): reusable 3D object detection (2D detect + depth into oriented boxes) #417

Open

Merge branch 'main' into agenthands

c0367ca

Merge branch 'main' into agenthands

829b114

salmanmkc added 7 commits June 28, 2026 10:38

agent_hands: rest the hands level instead of palms-up

e33da99

The hands rested in a palms-up 'offering' tilt (REST_TILT_X = pi/2). Drop the tilt so they rest in a neutral, level pose in front of the user, which reads more naturally and removes the need to cancel the tilt for upright gestures.

agent_hands: face the hands toward the user

23a12b8

The agent-hand model's neutral orientation faces away from the user, so poses showed their edge/back. Add a half-turn about vertical in the head-anchor (and the initial rest rotation) so the hands present their front to the user.

agent_hands: wave with flat open fingers

0e067e2

The wave used the RELAXED pose, which curls the fingers slightly so the wave looked like a half-closed hand. Switch the waving hand to NEUTRAL (all joints flat) for an open-palm greeting.

agent_hands: point outward when no object is targeted

ca4bbcc

A point gesture that did not resolve to a detected object fell through to the bare POINTING pose, which (with the hands facing the user) aimed the finger back at the user. Aim such points ~1.5 m ahead into the room instead.
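
A sketch of such a fallback target, assuming "ahead" means along some forward direction supplied by the caller (for example the user's view direction, which is an assumption here). `fallbackPointTarget` and `FALLBACK_DISTANCE_M` are hypothetical names; only the ~1.5 m figure comes from the commit message.

```typescript
type Vec3 = { x: number; y: number; z: number };

// Assumed default, taken from the "~1.5 m ahead" in the commit message.
const FALLBACK_DISTANCE_M = 1.5;

// Returns a point `distance` metres from `origin` along `forward`.
// `forward` does not need to be normalized.
function fallbackPointTarget(
  origin: Vec3,
  forward: Vec3,
  distance: number = FALLBACK_DISTANCE_M,
): Vec3 {
  const len = Math.sqrt(forward.x * forward.x + forward.y * forward.y + forward.z * forward.z);
  if (len === 0) {
    return { x: origin.x, y: origin.y, z: origin.z }; // no usable direction
  }
  const s = distance / len;
  return {
    x: origin.x + forward.x * s,
    y: origin.y + forward.y * s,
    z: origin.z + forward.z * s,
  };
}
```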

salmanmkc wants to merge 60 commits into google:main from salmanmkc:agenthands