Track 2 — Voice & Productivity · sponsored by Gradium & TinyFish Improve workplace tasks with voice AI: meetings, workflows, collaboration, automation.
TinyFish Go is a side-scrolling underwater runner where every fish on screen is a real AI agent doing real work for you in the background. Your job, as the human, is to keep those agents alive with your voice while they run — because if you get distracted by your phone, Slack, or a TikTok rabbit hole, the agents crash into those distractions and your work dies with them.
It is both a game and a demo of a thesis:
People increasingly hand tasks off to AI agents ("book this flight", "pull these prices", "draft these emails"). While the agent works, the human has nothing to do. Minutes later the user has forgotten the agent is even running, is nine reels deep on Instagram, and the agent has silently finished / errored / asked for input that will never come. What if the waiting period itself was a productivity game?
- Create a quest. The user speaks or types prompts for one or more TinyFish agents ("grab the current lithium spot price", "summarize my unread Slack threads", "draft a follow-up for the Acme deal"). Each quest gets its own color-coded fish.
- Pick your distractions. Before diving in, the user selects which real-world distractions this run will include — phone, chat bubble, email, snacks, game controller, notification pop-up, etc. These become the obstacles the fish must dodge. Default underwater obstacles (rocks, coral, reef, anchor, seaweed) are always present.
- Swim. The game starts, the fish swim right at a constant speed, and the world scrolls past. The TinyFish SDK begins streaming agent progress into a floating Quest Sidebar on the right — you can literally watch each agent think step by step.
- Voice-control the fish. The user keeps saying things like
"green up","blue down","purple up"to nudge each fish one "level" and weave it around obstacles. A live transcript + fuzzy matcher in the sidebar show what was heard (and what the system thinks you meant —"lew up"still resolves toblue upif blue is in play). - Finish or crash. When an agent's
tinyfish.agent.streamemitsCOMPLETE, its fish triumphantly exits stage right. If the fish hits a distraction first, the agent is aborted with acollisionreason and marked failed on its card. Last fish standing ends the run.
The meta-joke: you are now doing something while the AI works. If you stop paying attention, you see exactly what you lost.
Honest inventory of the current prototype. Everything below is wired up and
type-clean; npm run dev gives you the full loop end-to-end with mocked
agents.
- Side-scrolling runner with parallax water backgrounds, a fixed-timestep
update loop, and interpolated render (
lib/hooks/useGameLoop.ts,lib/game/world.ts,app/game/GameCanvas.tsx). - Multiple fish at once.
world.fishesholds oneFishEntityper active quest, each with its own color, input state, and lifecycle (alive → exiting → doneon quest complete, oralive → deadon collision).lib/game/fish.ts. - Spec-based obstacle system.
lib/game/obstacleCatalog.tsowns 16 obstacle specs:- 8 defaults always in the pool: 4 rock variants (all floor-anchored — rocks don't float), anchor, reef, seaweed (sway), coral.
- 8 "distraction" customs (
alarm_clock,chat_bubble,coffee_cup,email,game_controller,junk_food_snack,smartphone,notification) that the user opts into via the picker. Spawned mid-water with a free y-band; chat + notification sway slightly to sell the "floating interruption" vibe.
- Dynamic spawn pool.
lib/game/patternPool.tscombinesDEFAULT_PATTERNSwith one tiny pattern per selected custom, and the spawner (lib/game/spawner.ts) readsspawner.patternslive so the first spawned beat already reflects the user's choices.rock_anypattern sugar picks a randomrock_1..rock_4sprite per instance.
lib/tinyfish/mockSdk.ts— drop-in mock shaped like the real SDK (client.agent.stream({ url, goal })returning anAsyncIterableofSTARTED | PROGRESS | COMPLETEevents).lib/tinyfish/questScripts.ts— 8 canned quests (one per fish color) with hand-authored URLs, goals, planned steps, and summaries. Mirrors the real TinyFish response shape so swapping it for the network SDK is a one-file change.lib/tinyfish/timing.ts— STEP/COMPLETE delays deliberately in the minutes range so quests feel like real long-running agents, not a fake demo.lib/tinyfish/questRuntime.ts— React context that runs every stream, exposesruns, handlesabortQuest(id, 'collision'), and is the single source of truth the sidebar + fish exit transitions read from.
- Create Quest stage (
app/game/ui/QuestCreator.tsx) — drag fish in from the swatch rail, type a prompt into the chat bubble floating above each fish's head (cosmetic; mock runs key off color). Input fields now correctly accept space/arrow keys without the game's global keybindings stealing them. - Obstacle Picker stage (
app/game/ui/ObstaclePicker.tsx) — second step in the run-setup flow. Shows the 8 "always active" defaults on top, a stage of user-selected customs in the middle, and a swatch rail to toggle customs on/off. Back preserves the QuestCreator draft. - Quest Sidebar (
app/game/ui/QuestSidebar.tsx) — streaming card per active agent (Figma node3-8): progress steps with checkboxes as the stream emitsPROGRESSevents, final summary onCOMPLETE, error state on collision. Bottom panel shows live voice status: listening indicator, active agent swatches, interim transcript (▸ blew up), and a pulsing "Heard" chip for the last fuzzy-matched command. - Draggable Camera Feed (
app/game/ui/CameraFeed.tsx) — webcam window that defaults bottom-right (out of the sidebar's way), persists its position tolocalStorage, can be minimized or closed. Currently only shows the user's face. This is the hook for the next-phase vision work. - HUD, Menu, Pause, Game Over overlays — all wired.
- Voice commands (
lib/hooks/useVoiceCommands.ts) — custom layer overreact-speech-recognition. Parses raw transcripts (no commands API to avoid interim/final double-firing), uses Levenshtein fuzzy matching for colors restricted to the currently-active fish palette, and stricter exact-ish matching for verbs (up | down | stop). One-level pulse movement: each voice command setsstate.pulseEndsAt = now + 220msso the fish nudges exactly one step instead of continuously drifting. A 1200ms per-(color, verb) cooldown deduplicates interim + final matches. Sonar ring around the fish flashes for 900ms per command, in that fish's color, for immediate visual confirmation. - Keyboard (
lib/hooks/useKeyboardControls.ts) — Arrow / WASD / Space / Tab-focus-cycle. Skips handling when the event originates from an editable element so QuestCreator prompts type normally, and overrides any active voice pulse when a physical key is pressed. - Touch (
lib/hooks/useTouchControls.ts) — tap-top-half / tap-bottom-half to nudge the focused fish, with the same "physical input beats voice pulse" rule.
The whole point of Track 2 is voice × productivity, and the whole point of the distraction metaphor is that it should match the user's actual room. The prioritized backlog is what moves us from "clever canned demo" to "this is actually a productivity tool that runs during your agent runs".
-
Voice-driven Create Quest. Add a mic button inside each fish's chat bubble in
QuestCreator.tsx(and on the global header for "add + describe next fish"). Push-to-talk captures the prompt via the samegradiumspeeach-to-text (https://docs.gradium.ai/api-reference/endpoint/stt-websocket.md) pipeline we already use in-game. For prompts longer than a few seconds, bounce through a cleanup step (Gemini / Whisper) to get punctuation and casing right. Fall back to the current text input for environments without mic permission. Optional stretch: a voice "start" command once all fish have prompts. -
Live vision → custom obstacle injection. The big idea. Today customs are picked from a fixed library of 8 distractions. We want the camera feed (already mounted in
CameraFeed.tsx) to stream frames to a vision model — Gemini 2.5 / 3.x Live is the target — with a system prompt along the lines of:You are watching the user's desk during a productivity game. Every few seconds, tell me if a new real-world distraction is visible (phone, coffee cup, snack, smartwatch, secondary monitor, another person, pet, TV playing in the background...). Call
addObstacle({kind})with a kind from this whitelist: [smartphone, coffee_cup, junk_food_snack,chat_bubble, notification, game_controller, email, alarm_clock]. Only call once per visually-distinct appearance.
The
addObstacle(kind)function call maps to an existingObstacleSpecId. When invoked mid-run we push a custom one-shot pattern intoworld.spawner.patternsso the real distraction starts spawning within a few seconds. This is where Track 2's thesis pays off — the game literally watches you get distracted and throws that distraction into your path.Same pipeline should run inside
ObstaclePicker.tsxas an optional "scan my desk" button — one camera sweep, one Gemini call, and it pre-selects the custom obstacles it saw. If the user has a phone + a coffee in frame, those light up automatically in the picker. -
Gradium sign-up + API key wiring. Stub a minimal onboarding flow before first play: store a Gradium API key in
localStorage(scaffold only — we don't have real auth yet), show a "Connect Gradium" tile on the menu, and gate the in-fish speech features on that key being present. Nothing calls Gradium yet at this phase; this just sets up the contract.
-
Regenerate coin art to the Gradium logo. Replace the three coin frames in
public/game/images/coin_0..2.pngwith a spinning Gradium token. Purely visual, but it turns the coin into the brand currency that powers the talking fish. -
Coin = speech credit. Right now coins just add +1 to the score. Make each fish carry a per-fish
gradiumBalance. Picking up a coin credits only the nearest alive fish. The fish uses balance to speak:- Idle chatter (throttled) while it's working: "I'm almost done", "Still
reading the page", "Waiting on a redirect" — sampled from a cheap
summarization of the TinyFish stream's last
PROGRESS.purpose. - Contextual barks when the user saves it: "Thanks, that was close."
- Salty ones on collision: "Really? You let me hit a phone?"
- When balance hits zero: "Can't hear you — need more Gradium…"
(TTS'd). If balance stays at zero for N seconds, that fish's quest
ends with a
needs_creditsstatus on its sidebar card.
- Idle chatter (throttled) while it's working: "I'm almost done", "Still
reading the page", "Waiting on a redirect" — sampled from a cheap
summarization of the TinyFish stream's last
-
Fish voice + TinyFish progress narration. Each fish periodically peeks at
runtime.runs[questId].events.at(-1)and, if it's been quiet for a while, TTS's the current step purpose ("Extracting first two products and prices…"). Uses Gradium for the voice generation once (3) is live. ElevenLabs / browserspeechSynthesisare acceptable fallbacks.
-
Real
@tiny-fish/sdk. Replacelib/tinyfish/mockSdk.tswith the real client.questScripts.tsalready holds realurl+goalpairs, andquestRuntime.tsalready consumes anAsyncIterable<RunEvent>, so this should be a single-file swap + an API key read from env. -
Real Gradium integration for voice + coin economy. End state: one wallet, spend on live TTS + optionally on STT model upgrades.
-
Persistence + multiplayer nicety. Save quest history, replays, leaderboards ("longest undistracted agent run"), share a run as a clip.
- A "dodge training" mode — no agents, just obstacles, for warming up before a big task.
- Voice commands beyond
up/down/stop:"focus green","pause","abort blue","what's blue doing?". - Gradium-powered intent parsing so the user can just say
"stop distracting me with snacks"and the vision pipeline demotes that obstacle kind for the rest of the session. - Connect to real workplace surfaces (Slack, Gmail, calendars) so quests aren't synthetic examples.
app/
game/
GameShell.tsx // phase machine: menu → creating → picking → playing → paused → gameover
GameCanvas.tsx // canvas, game loop, collision, spawner reset
ui/
QuestCreator.tsx // Stage A: pick fish + (voice-)type prompts
ObstaclePicker.tsx // Stage B: pick which real-world distractions to spawn
QuestSidebar.tsx // streaming agent cards + live voice panel
CameraFeed.tsx // draggable webcam (future: vision pipeline target)
HUD.tsx, Menu.tsx, PauseOverlay.tsx, GameOverOverlay.tsx, LoadingScreen.tsx
lib/
game/
obstacleCatalog.ts // 16 specs (8 default + 8 custom)
obstacles.ts // Obstacle struct + hitbox math (spec-driven)
patterns.ts // DEFAULT_PATTERNS
patternPool.ts // buildPatternPool(selectedCustomIds)
spawner.ts // live pattern pool, reachability check
world.ts, fish.ts, render.ts, input.ts, collision.ts, parallax.ts, ...
tinyfish/
questScripts.ts // 8 canned quests matching the real SDK shape
mockSdk.ts // streaming mock; swap for real SDK last
questRuntime.ts // React context running every stream
timing.ts // minute-scale delays so runs feel real
hooks/
useVoiceCommands.ts // fuzzy voice → input registry (pulse movement)
useKeyboardControls.ts, useTouchControls.ts, useGameLoop.ts, useAssetLoader.ts, ...
Key invariants worth keeping as the codebase grows:
- Spec-driven obstacles. Every spawned obstacle carries an
ObstacleSpecId; sprite + hitbox come fromOBSTACLE_CATALOG. Adding a new distraction = one catalog entry + one image inpublic/game/images/obstacles/custom/. The vision pipeline in Phase 1 item (2) should only ever emit ids that already exist in this catalog — adding the corresponding image + spec is the human checkpoint. - Input registry is the single mutation surface. Keyboard, touch, and
voice all write to the same
InputRegistry. Voice adds apulseEndsAttimer that the game loop auto-releases; physical input clears the pulse. New input sources (e.g. eye-tracking, MIDI, foot-pedal) plug in the same way. - Agent runtime is decoupled from rendering.
QuestRuntimeis the only thing that knows about streams. The game readsruntime.runs[questId]; the sidebar reads the same. Swapping the mock SDK for the real one affects exactly one file.
npm install
npm run devOpen http://localhost:3000.
Grant microphone permission when prompted — voice is the primary control scheme. Keyboard (Arrows / WASD / Space, Tab to cycle focused fish) works as a fallback during development.
window.__registry— the liveInputRegistry. Inspect__registry.states.blueto see per-color{ up, down, pulseEndsAt, commandFlashAt }.- Console is intentionally chatty during voice processing
(
[voice] env,interim,final,command applied,skip (cooldown)).
npm run build
npm run startBuilt for a voice-AI hackathon under the Gradium + TinyFish Voice & Productivity track. The game side riffs on an existing Construct 2 Flappy Fish codebase (used only as an asset quarry — sprites, backgrounds, audio); the React/Canvas runtime, agent streams, voice pipeline, and UI are all new.




