Agent Vision for Unity

⚠️ WORK IN PROGRESS, NOT PRODUCTION READY: This project currently uses only static screenshots for visual verification. It is not a viable product for agentic Unity development assistance; it is an exploration of what agent-Unity integration could look like. Real agentic development assistance would require live video streaming, DOM-like scene inspection, and bidirectional control, none of which are fully implemented here.

Like Playwright for web apps, but for Unity games. Periodic screenshots and JSON state are made available to an external agent for visual verification and regression testing.

What Each Script Does

Unity Side (C#) — drop into any Unity project

Script	What it does
`AgentVisionBootstrap.cs`	Automatically creates a `GameViewRecorder` and `GameStateLogger` when a scene loads. Add this to any GameObject and it will start capturing on Play.
`GameViewRecorder.cs`	Captures a JPG screenshot of the Game view every 0.5 seconds and on every mouse click. Saves them to `persistentDataPath/AgentVision/Frames/<timestamp>/`.
`GameStateLogger.cs`	Writes a JSON snapshot of game state alongside each recorded frame. Base version logs timestamp, scene name, and FPS only. To log game-specific state (HP, cards, enemies), subclass it and override `BuildGameJson()`. See "Custom Game State" section below.
`UnityWebhookBridge.cs`	An Editor script that sends an HTTP POST to `localhost:8765/webhook` whenever you press Play or Stop in Unity. This tells the server when gameplay starts and ends.
`UnityBridgeNotifier.cs`	An Editor script that writes a `status.json` file to `%LocalAppData%/AI_Bridge/` on Play/Stop. This is a file-based fallback if the HTTP server isn't running.
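
A consumer of the file-based fallback could read the status file roughly as follows. This is a minimal sketch: the keys inside `status.json` are not documented here, so the code simply returns whatever the file contains, and `read_bridge_status` and `default_status_path` are hypothetical helper names, not part of the repo.

```python
import json
import os
from pathlib import Path
from typing import Optional


def default_status_path() -> Path:
    """Location of status.json as described in the table above (Windows)."""
    # LOCALAPPDATA is set by Windows; fall back to the current directory elsewhere.
    base = os.environ.get("LOCALAPPDATA", ".")
    return Path(base) / "AI_Bridge" / "status.json"


def read_bridge_status(path: Optional[Path] = None):
    """Return the parsed status file, or None if it is missing or unreadable."""
    path = path or default_status_path()
    try:
        # utf-8-sig accepts files written with or without a byte-order mark.
        with open(path, "r", encoding="utf-8-sig") as fh:
            return json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        # The file may not exist yet, or Unity may be in the middle of writing it.
        return None
```

Returning `None` instead of raising lets a polling agent treat "no file yet" and "file half-written" the same way and simply retry.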

Python Side — the observation and analysis tools

Script	What it does
`agent_server.py`	Runs an HTTP server on port 8765. Receives webhook events from Unity, captures screenshots of the Unity window every 2 seconds, and serves them via GET endpoints (`/status`, `/frame`, `/events`). This is the main entry point — start this first.
`agent_client.py`	A command-line tool to query the server. Run `python agent_client.py full` to get the latest event, screenshot, and pixel diagnosis all at once.
`capture_unity_view.py`	Finds the Unity window by title, takes a screenshot of its client area, and analyzes pixels for pink/black/brightness. Used as a library by other scripts.
`run_agent_vision.py`	A unified CLI with four modes: `capture` (one screenshot), `session` (record many frames at a target FPS), `analyze` (extract keyframes from a session), `watch` (continuous capture loop).
`session_recorder.py`	Records numbered frames (frame_00000.jpg, frame_00001.jpg, ...) into a timestamped folder. Optionally encodes them to MP4 via ffmpeg. Stop with `stop_recording.py`.
`analyze_session.py`	Reads a session folder, picks the most distinct keyframes, runs pixel diagnosis on each, and writes a `report.json` summary.
`vision_daemon.py`	Runs a loop that captures a screenshot and overwrites `Log/current.png` every N seconds. A simpler alternative to `agent_server.py`'s built-in vision thread.
`watch.py`	Minimal loop that captures a screenshot and prints the diagnosis (bright/pink/black) to the terminal every few seconds.
`auto_diag.py`	Runs a detect-fix-retest loop: captures a screenshot, diagnoses it, applies a fix (shader, camera background, etc.), then captures again to verify. ⚠️ Game-specific — contains project-specific fix functions. Adapt for your own game.
`unity_input.py`	Sends mouse clicks, drags, and keyboard presses to the Unity window using Win32 `SendInput`. For programmatic gameplay control.
`windows_capture.py`	Takes a screenshot using Win32 GDI calls with no PIL dependency. A fallback capture method.
`stop_recording.py`	Creates a `STOP` sentinel file that tells `session_recorder.py` to finish recording.
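
The sentinel-file handshake between the recorder and the stop script can be sketched as below. This illustrates the general pattern, not the code in `session_recorder.py`: the sentinel's location inside the session folder, the `capture_frame` callback, and the `max_frames` safety limit are all assumptions.

```python
import time
from pathlib import Path
from typing import Callable

STOP_NAME = "STOP"  # sentinel file name taken from the table above


def request_stop(session_dir: Path) -> None:
    """What a stop script could do: create an empty sentinel file."""
    (session_dir / STOP_NAME).touch()


def record_until_stopped(session_dir: Path,
                         capture_frame: Callable[[Path], None],
                         fps: float = 10.0,
                         max_frames: int = 1000) -> int:
    """Capture numbered frames until the sentinel appears; return the count."""
    session_dir.mkdir(parents=True, exist_ok=True)
    stop_file = session_dir / STOP_NAME
    count = 0
    while count < max_frames and not stop_file.exists():
        # capture_frame is a placeholder for the real screenshot routine.
        capture_frame(session_dir / f"frame_{count:05d}.jpg")
        count += 1
        time.sleep(1.0 / fps)
    # Remove the sentinel so the next session starts cleanly.
    stop_file.unlink(missing_ok=True)
    return count
```

A file sentinel keeps the two processes decoupled: the stop script needs no socket or process handle, only write access to the folder.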

Process Flow

┌─────────────────────────────────────────────────────────────┐
│                        SETUP                                │
│                                                             │
│  1. Copy the Assets/ folder into your Unity project          │
│  2. Add AgentVisionBootstrap to a GameObject in your scene   │
│  3. pip install pillow                                      │
│  4. python agent_server.py          ← start this first      │
│  5. Press Play in Unity            ← game starts            │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                    DURING GAMEPLAY                           │
│                                                             │
│  Unity (C#)                      Python (server)            │
│  ┌─────────────────────┐         ┌──────────────────┐      │
│  │ GameViewRecorder     │         │ Listens on :8765 │      │
│  │  captures frame.jpg  │──file──>│                  │      │
│  │  every 0.5s + click  │         │  /webhook  ←── POST    │
│  │                      │         │  /status   ──> GET      │
│  │ GameStateLogger      │         │  /frame    ──> GET      │
│  │  writes state.json   │──file──>│  /events   ──> GET      │
│  │                      │         │                  │      │
│  │ UnityWebhookBridge   │────────>│  (POST on       │      │
│  │  POST on Play/Stop   │  HTTP  │   Play/Stop)     │      │
│  └─────────────────────┘         └────────┬─────────┘      │
│                                            │                │
└────────────────────────────────────────────┼────────────────┘
                                             │
                                             ▼
┌─────────────────────────────────────────────────────────────┐
│                    AI AGENT QUERY                            │
│                                                             │
│  python agent_client.py full                                │
│                                                             │
│  Returns:                                                   │
│    • Latest event (play_started / play_stopped)             │
│    • Pixel diagnosis (brightness, pink%, black%)            │
│    • Screenshot saved to Log/current_frame.png              │
│                                                             │
│  OR query individual endpoints:                             │
│    python agent_client.py status   → event + diagnosis     │
│    python agent_client.py frame    → save screenshot only   │
│    python agent_client.py events   → last 20 webhook events│
└─────────────────────────────────────────────────────────────┘

Requirements

Dependency	Version	Install
Unity	2022.3+ (URP recommended)	https://unity.com/download
Python	3.8+	https://python.org
Pillow	10+	`pip install pillow`
Windows	10/11	Required for Win32 screenshot capture

No other Python packages are needed. The server uses only the standard library (http.server, json, threading).

Step-by-Step Reproduction

1. Copy Unity scripts into your project

The repo folder structure matches Unity's convention:

YourUnityProject/
  Assets/
    Editor/
      UnityWebhookBridge.cs      ← Editor-only: POSTs on Play/Stop
      UnityWebhookBridge.cs.meta
      UnityBridgeNotifier.cs      ← Editor-only: writes status.json
      UnityBridgeNotifier.cs.meta
    Scripts/
      AgentVision/
        AgentVisionBootstrap.cs  ← Add this to a GameObject
        AgentVisionBootstrap.cs.meta
        GameViewRecorder.cs      ← Auto-started by Bootstrap
        GameViewRecorder.cs.meta
        GameStateLogger.cs       ← Auto-started by Bootstrap
        GameStateLogger.cs.meta

Copy the Assets/ folder from this repo into your Unity project's Assets/ folder. The .meta files are included so Unity preserves GUIDs.

2. Add to your scene

Open any scene in Unity
Create an empty GameObject (right-click hierarchy → Create Empty)
Name it AgentVision
In the Inspector, click Add Component → search for AgentVisionBootstrap
The captureInterval (default 0.5s) and autoStart (default true) are configurable in the Inspector

3. Install Python dependencies

pip install pillow

4. Start the server

python agent_server.py

You should see:

[AgentServer] Running on http://127.0.0.1:8765
  POST /webhook   ← Unity sends events here
  GET  /status    → Latest event + screenshot diagnosis
  GET  /frame     → Base64 PNG of last capture
  GET  /events    → Last 20 events

If your Unity window has a custom title (not "Unity"), specify it:

python agent_server.py --title "My Game Title"

5. Press Play in Unity

The Console should show:

[GameViewRecorder] Output: C:\...\AgentVision\Frames\2026-04-28_16-34-43
[GameViewRecorder] Capturing every 0.5s as JPG
[AgentVisionBootstrap] Vision pipeline started automatically.
[WebhookBridge] Sent play_started to http://127.0.0.1:8765/webhook

6. Query from another terminal

python agent_client.py full

Expected output:

=== STATUS ===
Server time: 1777419319.055
Latest event: play_started at 2026-04-28T16:34:43Z
Scene: MyScene
Frame exists: True
Diagnosis: bright=90.6 pink=0.0 black=0.07
  > OK

=== FRAME ===
Frame saved to: Log/current_frame.png
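
An agent that prefers to call the server directly rather than shell out to `agent_client.py` could use the standard library alone. The sketch below assumes the GET endpoints return JSON bodies, which this README does not spell out; `endpoint_url` and `get_json` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8765"  # address printed by agent_server.py


def endpoint_url(name: str, base: str = BASE_URL) -> str:
    """Build the URL for one of the GET endpoints (status, frame, events)."""
    return f"{base.rstrip('/')}/{name.lstrip('/')}"


def get_json(name: str, base: str = BASE_URL, timeout: float = 5.0):
    """GET an endpoint and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(endpoint_url(name, base), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `get_json("status")` would return whatever the `/status` endpoint reports; inspect the result before relying on specific keys.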

7. Stop playing

Press Stop in Unity. The server receives play_stopped with needs_ai_attention: true.
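
To exercise the server without opening Unity, the webhook can be simulated from Python. This is a sketch under assumptions: the payload key names (`event`, `needs_ai_attention`) are inferred from the description above and may not match what `UnityWebhookBridge.cs` actually sends.

```python
import json
import urllib.request

WEBHOOK_URL = "http://127.0.0.1:8765/webhook"


def build_webhook_request(event: str, needs_ai_attention: bool = False,
                          url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Build a JSON POST resembling the Play/Stop notification."""
    # Key names are assumptions; check UnityWebhookBridge.cs for the real payload.
    payload = {"event": event, "needs_ai_attention": needs_ai_attention}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(event: str, needs_ai_attention: bool = False) -> int:
    """Send the event to a running agent_server.py and return the HTTP status."""
    req = build_webhook_request(event, needs_ai_attention)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

For example, `send_event("play_stopped", needs_ai_attention=True)` would mimic pressing Stop, provided the server accepts this payload shape.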

Where Files Are Saved

What	Where
Screenshot frames	`%USERPROFILE%/AppData/LocalLow/<Company>/<Project>/AgentVision/Frames/<timestamp>/` (Unity's `persistentDataPath` on Windows)
State JSON files	Same folder as frames (`state_XXXXX.json`)
Bridge status.json	`%LocalAppData%/AI_Bridge/status.json`
Server screenshot	`Log/current.png` (in the directory where `agent_server.py` runs)

<Company> defaults to DefaultCompany; <Project> is your Unity project name (e.g. slay-the-spire-mock).
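
An agent reading frames straight from disk needs to find the newest session folder and match each frame to its state file. The sketch below takes the Frames folder as a parameter rather than guessing its location. The `frame_*.jpg` naming for recorder output is an assumption (only `state_XXXXX.json` is documented above), and both function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def latest_session(frames_root: Path) -> Optional[Path]:
    """Return the newest <timestamp> folder under Frames/, or None if empty."""
    sessions = [p for p in frames_root.iterdir() if p.is_dir()]
    # Names such as 2026-04-28_16-34-43 sort chronologically as plain text.
    return max(sessions, key=lambda p: p.name, default=None)


def pair_frames_with_state(session: Path) -> List[Tuple[Path, Optional[Path]]]:
    """Match each frame image to the state file with the same numeric suffix."""
    pairs = []
    for frame in sorted(session.glob("frame_*.jpg")):  # frame naming is assumed
        index = frame.stem.split("_", 1)[1]
        state = session / f"state_{index}.json"
        pairs.append((frame, state if state.exists() else None))
    return pairs
```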

Optional: Session Recording

# Record a session at 10 FPS
python run_agent_vision.py session --fps 10 --title "YourUnityWindowTitle"

# Stop recording
python stop_recording.py

# Analyze keyframes from the session
python run_agent_vision.py analyze --dir "path/to/session/folder"
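
For intuition about what keyframe extraction involves, here is one simple approach: keep a frame only when it differs enough from the last frame kept. This is an illustrative sketch, not the algorithm in `analyze_session.py`. Each frame is represented by a hypothetical "signature" (a flat sequence of numbers such as downscaled grayscale pixel values), and the threshold of 10 is arbitrary.

```python
from typing import List, Sequence


def frame_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized frame signatures."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pick_keyframes(signatures: List[Sequence[float]],
                   threshold: float = 10.0) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame."""
    if not signatures:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(signatures)):
        if frame_distance(signatures[i], signatures[kept[-1]]) >= threshold:
            kept.append(i)
    return kept
```

With Pillow, a signature could be produced by `Image.open(path).convert("L").resize((32, 32)).tobytes()`, which yields one brightness value per pixel.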

Optional: Auto-Diagnosis Loop

# Captures, diagnoses, and attempts to fix common rendering issues
python auto_diag.py

Note: auto_diag.py is game-specific. It contains hardcoded fix functions for the Slay the Spire mock project. Adapt the try_fix() function for your own game.

Self-Diagnosis

Each captured frame is analyzed for rendering failures:

Pink pixels → shader fallback (missing URP material)
Black pixels (>60%) → nothing rendered (camera/rendering bug)
Overexposed (>240 avg) → material/lighting issue
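
The three checks above can be sketched in plain Python over a sequence of RGB tuples. The 60% black and 240 brightness thresholds come from the list above. The per-pixel definitions of "pink" and "black" and the 1% pink cutoff are assumptions, since the actual values used by `capture_unity_view.py` are not stated here.

```python
from typing import Dict, Iterable, Tuple

RGB = Tuple[int, int, int]


def diagnose_pixels(pixels: Iterable[RGB]) -> Dict[str, object]:
    """Classify a frame from its RGB pixels using the thresholds listed above."""
    total = pink = black = 0
    brightness_sum = 0.0
    for r, g, b in pixels:
        total += 1
        brightness_sum += (r + g + b) / 3
        if r > 200 and g < 80 and b > 200:   # assumed definition of "pink"
            pink += 1
        elif r < 16 and g < 16 and b < 16:   # assumed definition of "black"
            black += 1
    if total == 0:
        raise ValueError("no pixels to analyze")
    bright = brightness_sum / total
    pink_pct = 100.0 * pink / total
    black_pct = 100.0 * black / total
    if pink_pct > 1.0:            # assumed cutoff; the README gives none
        verdict = "shader fallback"
    elif black_pct > 60.0:
        verdict = "nothing rendered"
    elif bright > 240.0:
        verdict = "overexposed"
    else:
        verdict = "OK"
    return {"bright": bright, "pink": pink_pct, "black": black_pct,
            "verdict": verdict}
```

With Pillow, the pixel tuples could come from an RGB image, for example by calling `getpixel((x, y))` over a downscaled copy to keep the loop cheap.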

Custom Game State

The base GameStateLogger only logs timestamp, scene name, and FPS. To log your game's state (HP, enemies, cards, etc.), create a subclass:

using AgentVision;
using UnityEngine;

public class MyGameStateLogger : GameStateLogger
{
    protected override string BuildGameJson()
    {
        // Example: return your game's state as JSON.
        // Player and GameManager are placeholders for your own classes.
        return $"{{ \"playerHp\": {Player.Instance.hp}, \"score\": {GameManager.score} }}";
    }
}

Then replace go.AddComponent<GameStateLogger>() in AgentVisionBootstrap.cs with go.AddComponent<MyGameStateLogger>().

Note: All C# scripts use the AgentVision namespace. You can change this to match your project's conventions.
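
Because `BuildGameJson()` in the example assembles JSON by string interpolation, an unescaped quote or a missing comma can silently produce invalid files. A small Python check on the consuming side can catch this early. The sketch assumes custom keys appear at the top level of each `state_XXXXX.json` file, which may not match how the logger merges them; `check_state_files` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def check_state_files(session: Path,
                      required_keys: Iterable[str] = ()) -> Dict[str, List[str]]:
    """Return a mapping of file name -> problems for every state_*.json file."""
    required = list(required_keys)
    problems: Dict[str, List[str]] = {}
    for path in sorted(session.glob("state_*.json")):
        try:
            # utf-8-sig accepts files written with or without a byte-order mark.
            data = json.loads(path.read_text(encoding="utf-8-sig"))
        except json.JSONDecodeError as exc:
            problems[path.name] = [f"invalid JSON: {exc.msg}"]
            continue
        missing = [k for k in required
                   if not (isinstance(data, dict) and k in data)]
        if missing:
            problems[path.name] = [f"missing key: {k}" for k in missing]
    return problems
```

An empty result means every state file parsed and contained the requested keys, for example `check_state_files(session_dir, ["playerHp", "score"])`.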

Making GIFs from Sessions

python -c "
from PIL import Image; import glob
frames = sorted(glob.glob('path/to/Frames/frame_*.jpg'))
images = [Image.open(f).resize((640,480)) for f in frames[:60]]
images[0].save('gameplay.gif', save_all=True, append_images=images[1:], duration=500, loop=0)
"

Troubleshooting

Problem	Solution
No screenshots captured	Make sure `AgentVisionBootstrap` is on a GameObject in the active scene
Screenshots are all black	The window capture looks for a window whose title contains "Unity". Use the `--title` flag if your window title differs
`agent_client.py` can't connect	Make sure `agent_server.py` is running first on port 8765
Compile error: `FindAnyObjectByType` not found	The scripts use `FindAnyObjectByType`, which requires Unity 2022.3+. Update Unity, or replace the call with the older `FindObjectOfType`
State JSON only shows `timestamp` and `scene`	That's the base logger. Subclass `GameStateLogger` and override `BuildGameJson()` to add game state

Credits

Code: Generated by glm-5.1:cloud via Ollama Cloud
Design, input, feedback, testing: Kevin Cho (kevinkicho)

License

MIT License — see LICENSE.
