ehewes/TinyFish-Go

TinyFish Go — a voice-driven productivity game for the AI-agent era

Track 2: Voice & Productivity · sponsored by Gradium & TinyFish

Improve workplace tasks with voice AI: meetings, workflows, collaboration, automation.

TinyFish Go is a side-scrolling underwater runner where every fish on screen is a real AI agent doing real work for you in the background. Your job, as the human, is to keep those agents alive with your voice while they run — because if you get distracted by your phone, Slack, or a TikTok rabbit hole, the agents crash into those distractions and your work dies with them.

It is both a game and a demo of a thesis:

People increasingly hand tasks off to AI agents ("book this flight", "pull these prices", "draft these emails"). While the agent works, the human has nothing to do. Minutes later the user has forgotten the agent is even running, is nine reels deep on Instagram, and the agent has silently finished / errored / asked for input that will never come. What if the waiting period itself was a productivity game?


Screenshots

Main menu

Multiple sessions

Multiple sessions running side by side

Agent orchestration

Agent orchestration and live quest streaming

Add your distractions

Pick the real-world distractions that become obstacles

Phone controller

Phone-as-controller pairing and multiplayer


The core loop

  1. Create a quest. The user speaks or types prompts for one or more TinyFish agents ("grab the current lithium spot price", "summarize my unread Slack threads", "draft a follow-up for the Acme deal"). Each quest gets its own color-coded fish.
  2. Pick your distractions. Before diving in, the user selects which real-world distractions this run will include — phone, chat bubble, email, snacks, game controller, notification pop-up, etc. These become the obstacles the fish must dodge. Default underwater obstacles (rocks, coral, reef, anchor, seaweed) are always present.
  3. Swim. The game starts, the fish swim right at a constant speed, and the world scrolls past. The TinyFish SDK begins streaming agent progress into a floating Quest Sidebar on the right — you can literally watch each agent think step by step.
  4. Voice-control the fish. The user keeps saying things like "green up", "blue down", "purple up" to nudge each fish one "level" and weave it around obstacles. A live transcript + fuzzy matcher in the sidebar show what was heard (and what the system thinks you meant — "lew up" still resolves to blue up if blue is in play).
  5. Finish or crash. When an agent's tinyfish.agent.stream emits COMPLETE, its fish triumphantly exits stage right. If the fish hits a distraction first, the agent is aborted with a collision reason and marked failed on its card. Last fish standing ends the run.

The meta-joke: you are now doing something while the AI works. If you stop paying attention, you see exactly what you lost.


What's already built

Honest inventory of the current prototype. Everything below is wired up and type-clean; npm run dev gives you the full loop end-to-end with mocked agents.

Gameplay engine (React + Canvas)

  • Side-scrolling runner with parallax water backgrounds, a fixed-timestep update loop, and interpolated render (lib/hooks/useGameLoop.ts, lib/game/world.ts, app/game/GameCanvas.tsx).
  • Multiple fish at once. world.fishes holds one FishEntity per active quest, each with its own color, input state, and lifecycle (alive → exiting → done on quest complete, or alive → dead on collision). lib/game/fish.ts.
  • Spec-based obstacle system. lib/game/obstacleCatalog.ts owns 16 obstacle specs:
    • 8 defaults always in the pool: 4 rock variants (all floor-anchored — rocks don't float), anchor, reef, seaweed (sway), coral.
    • 8 "distraction" customs (alarm_clock, chat_bubble, coffee_cup, email, game_controller, junk_food_snack, smartphone, notification) that the user opts into via the picker. Spawned mid-water with a free y-band; chat + notification sway slightly to sell the "floating interruption" vibe.
  • Dynamic spawn pool. lib/game/patternPool.ts combines DEFAULT_PATTERNS with one tiny pattern per selected custom, and the spawner (lib/game/spawner.ts) reads spawner.patterns live so the first spawned beat already reflects the user's choices. rock_any pattern sugar picks a random rock_1..rock_4 sprite per instance.
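A minimal sketch of how the dynamic spawn pool described above could be assembled. The SpawnPattern shape, the sample default patterns, and the resolveSpecId helper are assumptions for illustration; only buildPatternPool, DEFAULT_PATTERNS, rock_any, and rock_1..rock_4 are named in this README, and the real code in lib/game may differ.

```typescript
// Hypothetical shapes; the real types in lib/game may differ.
type ObstacleSpecId = string;

interface SpawnPattern {
  id: string;
  obstacles: ObstacleSpecId[]; // spec ids spawned by this pattern
}

const ROCK_VARIANTS: ObstacleSpecId[] = ["rock_1", "rock_2", "rock_3", "rock_4"];

// Stand-in for DEFAULT_PATTERNS from lib/game/patterns.ts (contents assumed).
const DEFAULT_PATTERNS: SpawnPattern[] = [
  { id: "single_rock", obstacles: ["rock_any"] },
  { id: "reef_and_seaweed", obstacles: ["reef", "seaweed"] },
];

// Defaults plus one tiny single-obstacle pattern per selected custom.
function buildPatternPool(selectedCustomIds: ObstacleSpecId[]): SpawnPattern[] {
  const customPatterns = selectedCustomIds.map((id) => ({
    id: `custom_${id}`,
    obstacles: [id],
  }));
  return [...DEFAULT_PATTERNS, ...customPatterns];
}

// Expand the rock_any sugar into a concrete rock sprite for one instance.
// `random` is injectable so the choice can be made deterministic in tests.
function resolveSpecId(
  id: ObstacleSpecId,
  random: () => number = Math.random
): ObstacleSpecId {
  if (id !== "rock_any") return id;
  const index = Math.floor(random() * ROCK_VARIANTS.length);
  return ROCK_VARIANTS[index] ?? "rock_1";
}
```

Because the spawner reads its pattern list live, rebuilding the pool with buildPatternPool(selectedCustomIds) before the run starts is enough for the first spawned beat to include the chosen distractions.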

Agent simulation (mock @tiny-fish/sdk)

  • lib/tinyfish/mockSdk.ts — drop-in mock shaped like the real SDK (client.agent.stream({ url, goal }) returning an AsyncIterable of STARTED | PROGRESS | COMPLETE events).
  • lib/tinyfish/questScripts.ts — 8 canned quests (one per fish color) with hand-authored URLs, goals, planned steps, and summaries. Mirrors the real TinyFish response shape so swapping it for the network SDK is a one-file change.
  • lib/tinyfish/timing.ts — STEP/COMPLETE delays deliberately in the minutes range so quests feel like real long-running agents, not a fake demo.
  • lib/tinyfish/questRuntime.ts — React context that runs every stream, exposes runs, handles abortQuest(id, 'collision'), and is the single source of truth the sidebar + fish exit transitions read from.
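The streaming mock can be pictured as an async generator. The sketch below is an assumption about its shape, not the contents of mockSdk.ts: the RunEvent fields, the QuestScript type, the AbortFlag object, and the mockStream name are all hypothetical, and the real event payloads may carry more data.

```typescript
// Event shape assumed from the STARTED | PROGRESS | COMPLETE description above.
type RunEvent =
  | { type: "STARTED" }
  | { type: "PROGRESS"; purpose: string }
  | { type: "COMPLETE"; summary: string };

// Hypothetical canned script: planned steps plus a final summary.
interface QuestScript {
  steps: string[];
  summary: string;
}

// Hypothetical abort handle; the runtime would set `reason` on collision.
interface AbortFlag {
  reason?: string;
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Emits STARTED, one PROGRESS per step, then COMPLETE.
// If the flag is set while waiting, the stream ends without COMPLETE.
async function* mockStream(
  script: QuestScript,
  stepDelayMs: number,
  abort: AbortFlag = {}
): AsyncGenerator<RunEvent> {
  yield { type: "STARTED" };
  for (const purpose of script.steps) {
    await sleep(stepDelayMs);
    if (abort.reason) return;
    yield { type: "PROGRESS", purpose };
  }
  await sleep(stepDelayMs);
  if (abort.reason) return;
  yield { type: "COMPLETE", summary: script.summary };
}
```

A consumer would iterate with for await (const event of mockStream(script, delayMs, abort)) and mark the run failed if the loop ends before a COMPLETE event arrives.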

UI

  • Create Quest stage (app/game/ui/QuestCreator.tsx) — drag fish in from the swatch rail, type a prompt into the chat bubble floating above each fish's head (cosmetic; mock runs key off color). Input fields now correctly accept space/arrow keys without the game's global keybindings stealing them.
  • Obstacle Picker stage (app/game/ui/ObstaclePicker.tsx) — second step in the run-setup flow. Shows the 8 "always active" defaults on top, a stage of user-selected customs in the middle, and a swatch rail to toggle customs on/off. Back preserves the QuestCreator draft.
  • Quest Sidebar (app/game/ui/QuestSidebar.tsx) — streaming card per active agent (Figma node 3-8): progress steps with checkboxes as the stream emits PROGRESS events, final summary on COMPLETE, error state on collision. Bottom panel shows live voice status: listening indicator, active agent swatches, interim transcript (▸ blew up), and a pulsing "Heard" chip for the last fuzzy-matched command.
  • Draggable Camera Feed (app/game/ui/CameraFeed.tsx) — webcam window that defaults bottom-right (out of the sidebar's way), persists its position to localStorage, can be minimized or closed. Currently only shows the user's face. This is the hook for the next-phase vision work.
  • HUD, Menu, Pause, Game Over overlays — all wired.

Input

  • Voice commands (lib/hooks/useVoiceCommands.ts) — custom layer over react-speech-recognition. Parses raw transcripts (no commands API to avoid interim/final double-firing), uses Levenshtein fuzzy matching for colors restricted to the currently-active fish palette, and stricter exact-ish matching for verbs (up | down | stop). One-level pulse movement: each voice command sets state.pulseEndsAt = now + 220ms so the fish nudges exactly one step instead of continuously drifting. A 1200ms per-(color, verb) cooldown deduplicates interim + final matches. Sonar ring around the fish flashes for 900ms per command, in that fish's color, for immediate visual confirmation.
  • Keyboard (lib/hooks/useKeyboardControls.ts) — Arrow / WASD / Space / Tab-focus-cycle. Skips handling when the event originates from an editable element so QuestCreator prompts type normally, and overrides any active voice pulse when a physical key is pressed.
  • Touch (lib/hooks/useTouchControls.ts) — tap-top-half / tap-bottom-half to nudge the focused fish, with the same "physical input beats voice pulse" rule.
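The voice parsing rules above (fuzzy color, strict verb, per-(color, verb) cooldown) can be sketched as follows. This is an illustration under stated assumptions, not the code in useVoiceCommands.ts: the distance threshold of 2, the "color word followed by verb word" scan, and all names here are hypothetical. Only the 1200ms cooldown and the up | down | stop verbs come from this README.

```typescript
type Verb = "up" | "down" | "stop";

interface VoiceCommand {
  color: string;
  verb: Verb;
}

const VERBS: Verb[] = ["up", "down", "stop"];
const MAX_COLOR_DISTANCE = 2; // assumed threshold, not taken from the project
const COOLDOWN_MS = 1200;

// Classic two-row Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Look for "<color-ish word> <exact verb>"; colors are matched fuzzily,
// but only against the fish currently in play.
function parseCommand(transcript: string, activeColors: string[]): VoiceCommand | null {
  const words = transcript.toLowerCase().split(/\s+/).filter(Boolean);
  for (let i = 0; i + 1 < words.length; i++) {
    const verb = VERBS.find((v) => v === words[i + 1]);
    if (!verb) continue;
    let best: string | null = null;
    let bestDistance = MAX_COLOR_DISTANCE + 1;
    for (const color of activeColors) {
      const distance = levenshtein(words[i], color);
      if (distance < bestDistance) {
        best = color;
        bestDistance = distance;
      }
    }
    if (best !== null) return { color: best, verb };
  }
  return null;
}

// Drops repeats of the same (color, verb) inside the cooldown window,
// which is how an interim and a final transcript collapse into one command.
class CommandCooldown {
  private lastFired = new Map<string, number>();

  accept(command: VoiceCommand, now: number): boolean {
    const key = `${command.color}:${command.verb}`;
    const last = this.lastFired.get(key);
    if (last !== undefined && now - last < COOLDOWN_MS) return false;
    this.lastFired.set(key, now);
    return true;
  }
}
```

Restricting candidates to activeColors keeps the fuzzy match from resolving to a fish that is not in play, which is why a mishearing such as "blew up" can safely map to blue.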

What needs to change — roadmap

The whole point of Track 2 is voice × productivity, and the whole point of the distraction metaphor is that it should match the user's actual room. The prioritized backlog is what moves us from "clever canned demo" to "this is actually a productivity tool that runs during your agent runs".

Phase 1 — the voice & vision features that make it feel real (do these first)

  1. Voice-driven Create Quest. Add a mic button inside each fish's chat bubble in QuestCreator.tsx (and on the global header for "add + describe next fish"). Push-to-talk captures the prompt via the same Gradium speech-to-text (https://docs.gradium.ai/api-reference/endpoint/stt-websocket.md) pipeline we already use in-game. For prompts longer than a few seconds, bounce through a cleanup step (Gemini / Whisper) to get punctuation and casing right. Fall back to the current text input for environments without mic permission. Optional stretch: a voice "start" command once all fish have prompts.

  2. Live vision → custom obstacle injection. The big idea. Today customs are picked from a fixed library of 8 distractions. We want the camera feed (already mounted in CameraFeed.tsx) to stream frames to a vision model — Gemini 2.5 / 3.x Live is the target — with a system prompt along the lines of:

    You are watching the user's desk during a productivity game. Every few seconds, tell me if a new real-world distraction is visible (phone, coffee cup, snack, smartwatch, secondary monitor, another person, pet, TV playing in the background...). Call addObstacle({kind}) with a kind from this whitelist: [smartphone, coffee_cup, junk_food_snack, chat_bubble, notification, game_controller, email, alarm_clock]. Only call once per visually-distinct appearance.

    The addObstacle(kind) function call maps to an existing ObstacleSpecId. When invoked mid-run we push a custom one-shot pattern into world.spawner.patterns so the real distraction starts spawning within a few seconds. This is where Track 2's thesis pays off — the game literally watches you get distracted and throws that distraction into your path.

    Same pipeline should run inside ObstaclePicker.tsx as an optional "scan my desk" button — one camera sweep, one Gemini call, and it pre-selects the custom obstacles it saw. If the user has a phone + a coffee in frame, those light up automatically in the picker.

  3. Gradium sign-up + API key wiring. Stub a minimal onboarding flow before first play: store a Gradium API key in localStorage (scaffold only — we don't have real auth yet), show a "Connect Gradium" tile on the menu, and gate the in-fish speech features on that key being present. Nothing calls Gradium yet at this phase; this just sets up the contract.
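For item 2 above, the handler behind the model's addObstacle({kind}) call could be guarded as in this sketch. The whitelist is copied from the system prompt; the Spawner and SpawnPattern shapes, the oneShot flag, and the generated pattern id are assumptions made for illustration, since this feature is not built yet.

```typescript
// Whitelist copied from the system prompt in item 2.
const VISION_WHITELIST = [
  "smartphone",
  "coffee_cup",
  "junk_food_snack",
  "chat_bubble",
  "notification",
  "game_controller",
  "email",
  "alarm_clock",
] as const;

type CustomObstacleId = (typeof VISION_WHITELIST)[number];

// Hypothetical shapes standing in for world.spawner and its patterns.
interface SpawnPattern {
  id: string;
  obstacles: string[];
  oneShot?: boolean;
}

interface Spawner {
  patterns: SpawnPattern[];
}

function isWhitelisted(kind: string): kind is CustomObstacleId {
  return (VISION_WHITELIST as readonly string[]).indexOf(kind) !== -1;
}

// Returns true if a one-shot pattern was queued, false if the model
// proposed a kind that has no matching catalog entry.
function addObstacle(spawner: Spawner, kind: string): boolean {
  if (!isWhitelisted(kind)) return false;
  spawner.patterns.push({
    id: `vision_${kind}_${spawner.patterns.length}`,
    obstacles: [kind],
    oneShot: true,
  });
  return true;
}
```

Rejecting unknown kinds on the client, rather than trusting the prompt alone, keeps the model from introducing an obstacle that has no sprite or hitbox.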

Phase 2 — Gradium coin economy & fish voice

  1. Regenerate coin art to the Gradium logo. Replace the three coin frames in public/game/images/coin_0..2.png with a spinning Gradium token. Purely visual, but it turns the coin into the brand currency that powers the talking fish.

  2. Coin = speech credit. Right now coins just add +1 to the score. Make each fish carry a per-fish gradiumBalance. Picking up a coin credits only the nearest alive fish. The fish uses balance to speak:

    • Idle chatter (throttled) while it's working: "I'm almost done", "Still reading the page", "Waiting on a redirect" — sampled from a cheap summarization of the TinyFish stream's last PROGRESS.purpose.
    • Contextual barks when the user saves it: "Thanks, that was close."
    • Salty ones on collision: "Really? You let me hit a phone?"
    • When balance hits zero: "Can't hear you — need more Gradium…" (TTS'd). If balance stays at zero for N seconds, that fish's quest ends with a needs_credits status on its sidebar card.
  3. Fish voice + TinyFish progress narration. Each fish periodically peeks at runtime.runs[questId].events.at(-1) and, if it's been quiet for a while, TTS's the current step purpose ("Extracting first two products and prices…"). Uses Gradium for the voice generation once (3) is live. ElevenLabs / browser speechSynthesis are acceptable fallbacks.
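The coin-as-speech-credit rule from item 2 might look like the sketch below. Everything here is hypothetical, since Phase 2 is not implemented: the Fish and Coin shapes, the squared-distance comparison, and the one-credit cost per utterance are assumptions. Only the per-fish gradiumBalance field and the "nearest alive fish" rule come from the roadmap.

```typescript
// Hypothetical entity shapes; only gradiumBalance is named in the roadmap.
interface Fish {
  id: string;
  x: number;
  y: number;
  alive: boolean;
  gradiumBalance: number;
}

interface Coin {
  x: number;
  y: number;
}

// Credits only the nearest alive fish and returns it (null if none are alive).
function creditNearestAliveFish(fishes: Fish[], coin: Coin, amount = 1): Fish | null {
  let nearest: Fish | null = null;
  let bestDistanceSq = Infinity;
  for (const fish of fishes) {
    if (!fish.alive) continue;
    const dx = fish.x - coin.x;
    const dy = fish.y - coin.y;
    const distanceSq = dx * dx + dy * dy;
    if (distanceSq < bestDistanceSq) {
      nearest = fish;
      bestDistanceSq = distanceSq;
    }
  }
  if (nearest) nearest.gradiumBalance += amount;
  return nearest;
}

// Spends balance for one utterance; returns false when the fish cannot afford it.
function trySpeak(fish: Fish, cost = 1): boolean {
  if (fish.gradiumBalance < cost) return false;
  fish.gradiumBalance -= cost;
  return true;
}
```

A caller would run creditNearestAliveFish on coin pickup and gate each TTS line on trySpeak, starting the needs_credits countdown when it returns false.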

Phase 3 — swap the simulation for the real thing (do last)

  1. Real @tiny-fish/sdk. Replace lib/tinyfish/mockSdk.ts with the real client. questScripts.ts already holds real url + goal pairs, and questRuntime.ts already consumes an AsyncIterable<RunEvent>, so this should be a single-file swap + an API key read from env.

  2. Real Gradium integration for voice + coin economy. End state: one wallet, spend on live TTS + optionally on STT model upgrades.

  3. Persistence + multiplayer nicety. Save quest history, replays, leaderboards ("longest undistracted agent run"), share a run as a clip.

Nice-to-haves / parking lot

  • A "dodge training" mode — no agents, just obstacles, for warming up before a big task.
  • Voice commands beyond up/down/stop: "focus green", "pause", "abort blue", "what's blue doing?".
  • Gradium-powered intent parsing so the user can just say "stop distracting me with snacks" and the vision pipeline demotes that obstacle kind for the rest of the session.
  • Connect to real workplace surfaces (Slack, Gmail, calendars) so quests aren't synthetic examples.

Architecture at a glance

app/
  game/
    GameShell.tsx        // phase machine: menu → creating → picking → playing → paused → gameover
    GameCanvas.tsx       // canvas, game loop, collision, spawner reset
    ui/
      QuestCreator.tsx   // Stage A: pick fish + (voice-)type prompts
      ObstaclePicker.tsx // Stage B: pick which real-world distractions to spawn
      QuestSidebar.tsx   // streaming agent cards + live voice panel
      CameraFeed.tsx     // draggable webcam (future: vision pipeline target)
      HUD.tsx, Menu.tsx, PauseOverlay.tsx, GameOverOverlay.tsx, LoadingScreen.tsx

lib/
  game/
    obstacleCatalog.ts   // 16 specs (8 default + 8 custom)
    obstacles.ts         // Obstacle struct + hitbox math (spec-driven)
    patterns.ts          // DEFAULT_PATTERNS
    patternPool.ts       // buildPatternPool(selectedCustomIds)
    spawner.ts           // live pattern pool, reachability check
    world.ts, fish.ts, render.ts, input.ts, collision.ts, parallax.ts, ...
  tinyfish/
    questScripts.ts      // 8 canned quests matching the real SDK shape
    mockSdk.ts           // streaming mock; swap for real SDK last
    questRuntime.ts      // React context running every stream
    timing.ts            // minute-scale delays so runs feel real
  hooks/
    useVoiceCommands.ts  // fuzzy voice → input registry (pulse movement)
    useKeyboardControls.ts, useTouchControls.ts, useGameLoop.ts, useAssetLoader.ts, ...

Key invariants worth keeping as the codebase grows:

  • Spec-driven obstacles. Every spawned obstacle carries an ObstacleSpecId; sprite + hitbox come from OBSTACLE_CATALOG. Adding a new distraction = one catalog entry + one image in public/game/images/obstacles/custom/. The vision pipeline in Phase 1 item (2) should only ever emit ids that already exist in this catalog — adding the corresponding image + spec is the human checkpoint.
  • Input registry is the single mutation surface. Keyboard, touch, and voice all write to the same InputRegistry. Voice adds a pulseEndsAt timer that the game loop auto-releases; physical input clears the pulse. New input sources (e.g. eye-tracking, MIDI, foot-pedal) plug in the same way.
  • Agent runtime is decoupled from rendering. QuestRuntime is the only thing that knows about streams. The game reads runtime.runs[questId]; the sidebar reads the same. Swapping the mock SDK for the real one affects exactly one file.
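The input-registry invariant can be illustrated with a small sketch. This is an assumed shape, not the project's InputRegistry: the method names voicePulse, setKey, and tick are hypothetical, and the commandFlashAt field is omitted. The 220ms pulse length and the rule that physical input clears the pulse come from the Input section.

```typescript
const PULSE_MS = 220;

type Direction = "up" | "down";

interface InputState {
  up: boolean;
  down: boolean;
  pulseEndsAt: number | null;
}

class InputRegistry {
  readonly states: Record<string, InputState> = {};

  private stateFor(color: string): InputState {
    let state = this.states[color];
    if (!state) {
      state = { up: false, down: false, pulseEndsAt: null };
      this.states[color] = state;
    }
    return state;
  }

  // Voice: hold the direction for one pulse, then let tick() release it.
  voicePulse(color: string, direction: Direction, now: number): void {
    const state = this.stateFor(color);
    state.up = direction === "up";
    state.down = direction === "down";
    state.pulseEndsAt = now + PULSE_MS;
  }

  // Keyboard / touch: physical input cancels any active voice pulse first.
  setKey(color: string, direction: Direction, pressed: boolean): void {
    const state = this.stateFor(color);
    if (state.pulseEndsAt !== null) {
      state.up = false;
      state.down = false;
      state.pulseEndsAt = null;
    }
    state[direction] = pressed;
  }

  // Called once per game-loop update to auto-release expired pulses.
  tick(now: number): void {
    for (const color of Object.keys(this.states)) {
      const state = this.states[color];
      if (state.pulseEndsAt !== null && now >= state.pulseEndsAt) {
        state.up = false;
        state.down = false;
        state.pulseEndsAt = null;
      }
    }
  }
}
```

Because every source writes through the same object, a new input device only needs to call the equivalent of setKey or voicePulse; the game loop never has to know where the command came from.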

Getting started

npm install
npm run dev

Open http://localhost:3000.

Grant microphone permission when prompted — voice is the primary control scheme. Keyboard (Arrows / WASD / Space, Tab to cycle focused fish) works as a fallback during development.

Useful DevTools bits

  • window.__registry — the live InputRegistry. Inspect __registry.states.blue to see per-color { up, down, pulseEndsAt, commandFlashAt }.
  • Console is intentionally chatty during voice processing ([voice] env, interim, final, command applied, skip (cooldown)).

Build

npm run build
npm run start

Credits / why this exists

Built for a voice-AI hackathon under the Gradium + TinyFish Voice & Productivity track. The game side riffs on an existing Construct 2 Flappy Fish codebase (used only as an asset quarry — sprites, backgrounds, audio); the React/Canvas runtime, agent streams, voice pipeline, and UI are all new.
