⚠️ WORK IN PROGRESS: NOT PRODUCTION READY. This project currently relies only on static screenshots for visual verification. It is not a viable product for agentic Unity development assistance; it is an exploration of what agent-Unity integration could look like. Real agentic development assistance would require live video streaming, DOM-like scene inspection, and bidirectional control, none of which is fully implemented here.
Like Playwright for web apps, but for Unity games: screenshots and JSON state snapshots are captured during play and exposed to an external agent for visual verification and regression testing.
| Unity script (C#) | What it does |
|---|---|
| `AgentVisionBootstrap.cs` | Automatically creates a `GameViewRecorder` and a `GameStateLogger` when a scene loads. Add it to any GameObject and capturing starts on Play. |
| `GameViewRecorder.cs` | Captures a JPG screenshot of the Game view every 0.5 seconds and on every mouse click, saving them to `persistentDataPath/AgentVision/Frames/<timestamp>/`. |
| `GameStateLogger.cs` | Writes a JSON snapshot of game state alongside each recorded frame. The base version logs only timestamp, scene name, and FPS. To log game-specific state (HP, cards, enemies), subclass it and override `BuildGameJson()`; see the custom game state example below. |
| `UnityWebhookBridge.cs` | Editor script that sends an HTTP POST to `localhost:8765/webhook` whenever you press Play or Stop in Unity, so the server knows when gameplay starts and ends. |
| `UnityBridgeNotifier.cs` | Editor script that writes a `status.json` file to `%LocalAppData%/AI_Bridge/` on Play/Stop. This is a file-based fallback for when the HTTP server isn't running. |
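When `agent_server.py` isn't running, an external script could poll the file written by `UnityBridgeNotifier.cs` instead. The sketch below assumes only that the file is JSON at `%LocalAppData%/AI_Bridge/status.json`; it does not assume any field names, and both helper names are hypothetical, not part of this repo.

```python
import json
import os
from pathlib import Path
from typing import Optional


def bridge_status_path(local_appdata: Optional[str] = None) -> Path:
    """Build %LocalAppData%/AI_Bridge/status.json.

    `local_appdata` overrides the LOCALAPPDATA environment variable,
    which is handy for tests or non-default setups.
    """
    base = local_appdata or os.environ.get("LOCALAPPDATA", "")
    return Path(base) / "AI_Bridge" / "status.json"


def read_bridge_status(path: Path) -> Optional[dict]:
    """Return the parsed status, or None if the file is missing or half-written."""
    try:
        # utf-8-sig accepts files with or without a UTF-8 byte-order mark.
        return json.loads(path.read_text(encoding="utf-8-sig"))
    except FileNotFoundError:
        return None
    except json.JSONDecodeError:
        # Unity may be rewriting the file at this moment; try again later.
        return None
```

Calling `read_bridge_status(bridge_status_path())` on a timer and comparing successive results is enough to notice Play/Stop transitions; returning `None` for a half-written file keeps the poller from crashing mid-write.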
| Python script | What it does |
|---|---|
| `agent_server.py` | Runs an HTTP server on port 8765. Receives webhook events from Unity, captures screenshots of the Unity window every 2 seconds, and serves them via GET endpoints (`/status`, `/frame`, `/events`). This is the main entry point; start it first. |
| `agent_client.py` | Command-line tool for querying the server. Run `python agent_client.py full` to get the latest event, screenshot, and pixel diagnosis at once. |
| `capture_unity_view.py` | Finds the Unity window by title, screenshots its client area, and analyzes the pixels for pink, black, and brightness. Used as a library by other scripts. |
| `run_agent_vision.py` | Unified CLI with four modes: `capture` (one screenshot), `session` (record many frames at a target FPS), `analyze` (extract keyframes from a session), and `watch` (continuous capture loop). |
| `session_recorder.py` | Records numbered frames (`frame_00000.jpg`, `frame_00001.jpg`, ...) into a timestamped folder and optionally encodes them to MP4 via ffmpeg. Stop it with `stop_recording.py`. |
| `analyze_session.py` | Reads a session folder, picks the most distinct keyframes, runs pixel diagnosis on each, and writes a `report.json` summary. |
| `vision_daemon.py` | Loops, capturing a screenshot and overwriting `Log/current.png` every N seconds. A simpler alternative to the vision thread built into `agent_server.py`. |
| `watch.py` | Minimal loop that captures a screenshot and prints the diagnosis (bright/pink/black) to the terminal every few seconds. |
| `auto_diag.py` | Runs a detect-fix-retest loop: captures a screenshot, diagnoses it, applies a fix (shader, camera background, etc.), then captures again to verify. |
| `unity_input.py` | Sends mouse clicks, drags, and key presses to the Unity window using Win32 `SendInput`, for programmatic gameplay control. |
| `windows_capture.py` | Takes a screenshot using Win32 GDI calls with no PIL dependency. A fallback capture method. |
| `stop_recording.py` | Creates a `STOP` sentinel file that tells `session_recorder.py` to finish recording. |
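The `session_recorder.py` / `stop_recording.py` pair coordinates through a sentinel file. Here is a minimal sketch of that pattern, assuming (hypothetically) that the `STOP` file lives inside the session folder and that `capture` is any callback that saves one screenshot to a path; the real scripts may place the sentinel elsewhere.

```python
import time
from pathlib import Path
from typing import Callable


def frame_name(index: int) -> str:
    """Zero-padded name following the frame_00000.jpg convention."""
    return f"frame_{index:05d}.jpg"


def record_until_stop(session_dir: Path, capture: Callable[[Path], None],
                      fps: float = 10.0, max_frames: int = 100_000) -> int:
    """Call capture(path) about `fps` times per second until session_dir/STOP exists.

    Returns the number of frames captured.
    """
    session_dir.mkdir(parents=True, exist_ok=True)
    sentinel = session_dir / "STOP"  # hypothetical sentinel location
    interval = 1.0 / fps
    count = 0
    while count < max_frames and not sentinel.exists():
        started = time.monotonic()
        capture(session_dir / frame_name(count))
        count += 1
        # Sleep only for whatever is left of this frame's time budget.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
    return count
```

Stopping from another process is then just `(session_dir / "STOP").touch()`. Delete the file before reusing the folder, or the next run stops immediately.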
```
┌─────────────────────────────────────────────────────────────┐
│ SETUP                                                       │
│                                                             │
│ 1. Copy the Assets/ folder into your Unity project          │
│ 2. Add AgentVisionBootstrap to a GameObject in your scene   │
│ 3. pip install pillow                                       │
│ 4. python agent_server.py    ← start this first             │
│ 5. Press Play in Unity       ← game starts                  │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ DURING GAMEPLAY                                             │
│                                                             │
│  Unity (C#)                      Python (server)            │
│ ┌─────────────────────┐         ┌────────────────────┐      │
│ │ GameViewRecorder    │         │ Listens on :8765   │      │
│ │ captures frame.jpg  │──file──>│                    │      │
│ │ every 0.5s + click  │         │ /webhook ←── POST  │      │
│ │                     │         │ /status  ──> GET   │      │
│ │ GameStateLogger     │         │ /frame   ──> GET   │      │
│ │ writes state.json   │──file──>│ /events  ──> GET   │      │
│ │                     │         │                    │      │
│ │ UnityWebhookBridge  │────────>│ (POST on           │      │
│ │ POST on Play/Stop   │  HTTP   │  Play/Stop)        │      │
│ └─────────────────────┘         └─────────┬──────────┘      │
│                                           │                 │
└───────────────────────────────────────────┼─────────────────┘
                                            │
                                            ▼
┌─────────────────────────────────────────────────────────────┐
│ AI AGENT QUERY                                              │
│                                                             │
│  python agent_client.py full                                │
│                                                             │
│  Returns:                                                   │
│  • Latest event (play_started / play_stopped)               │
│  • Pixel diagnosis (brightness, pink%, black%)              │
│  • Screenshot saved to Log/current_frame.png                │
│                                                             │
│  OR query individual endpoints:                             │
│  python agent_client.py status  → event + diagnosis         │
│  python agent_client.py frame   → save screenshot only      │
│  python agent_client.py events  → last 20 webhook events    │
└─────────────────────────────────────────────────────────────┘
```
| Dependency | Version | Install |
|---|---|---|
| Unity | 2022.3+ (URP recommended) | https://unity.com/download |
| Python | 3.8+ | https://python.org |
| Pillow | 10+ | pip install pillow |
| Windows | 10/11 | Required for Win32 screenshot capture |
No other Python packages are needed. The server uses only the standard library (http.server, json, threading).
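To illustrate how far the standard library goes, here is a minimal sketch of a `/webhook` receiver and `/events` endpoint built only on `http.server`, `json`, and `threading`. It is not `agent_server.py` itself: it skips `/status`, `/frame`, and the screenshot thread, and it stores whatever JSON body Unity posts without assuming a schema.

```python
import json
import threading
from collections import deque
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

EVENTS = deque(maxlen=20)  # keeps only the last 20 events, like GET /events
LOCK = threading.Lock()


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/webhook":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        with LOCK:
            EVENTS.append(event)
        self._send_json({"ok": True})

    def do_GET(self):
        if self.path != "/events":
            self.send_error(404)
            return
        with LOCK:
            snapshot = list(EVENTS)
        self._send_json(snapshot)

    def _send_json(self, obj):
        body = json.dumps(obj).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def serve(port: int = 8765) -> ThreadingHTTPServer:
    """Start the server on 127.0.0.1 in a background daemon thread."""
    server = ThreadingHTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

`serve(8765)` listens where `UnityWebhookBridge.cs` posts; `serve(0)` picks a free port, which is convenient for tests. `deque(maxlen=20)` drops the oldest event automatically, so the "last 20 events" cap needs no extra bookkeeping.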
The repo folder structure matches Unity's convention:
```
YourUnityProject/
  Assets/
    Editor/
      UnityWebhookBridge.cs         ← Editor-only: POSTs on Play/Stop
      UnityWebhookBridge.cs.meta
      UnityBridgeNotifier.cs        ← Editor-only: writes status.json
      UnityBridgeNotifier.cs.meta
    Scripts/
      AgentVision/
        AgentVisionBootstrap.cs     ← Add this to a GameObject
        AgentVisionBootstrap.cs.meta
        GameViewRecorder.cs         ← Auto-started by Bootstrap
        GameViewRecorder.cs.meta
        GameStateLogger.cs          ← Auto-started by Bootstrap
        GameStateLogger.cs.meta
```
Copy the contents of this repo's `Assets/` folder into your Unity project's `Assets/` folder. The `.meta` files are included so Unity preserves the scripts' GUIDs.
- Open any scene in Unity.
- Create an empty GameObject (right-click in the Hierarchy → Create Empty).
- Name it `AgentVision`.
- In the Inspector, click Add Component and search for `AgentVisionBootstrap`.
- The `captureInterval` (default 0.5s) and `autoStart` (default true) fields are configurable in the Inspector.
```bash
pip install pillow
python agent_server.py
```

You should see:

```
[AgentServer] Running on http://127.0.0.1:8765
POST /webhook ← Unity sends events here
GET /status → Latest event + screenshot diagnosis
GET /frame → Base64 PNG of last capture
GET /events → Last 20 events
```
If your Unity window's title does not contain "Unity", pass the title explicitly:

```bash
python agent_server.py --title "My Game Title"
```

Press Play in Unity. The Unity Console should show:

```
[GameViewRecorder] Output: C:\...\AgentVision\Frames\2026-04-28_16-34-43
[GameViewRecorder] Capturing every 0.5s as JPG
[AgentVisionBootstrap] Vision pipeline started automatically.
[WebhookBridge] Sent play_started to http://127.0.0.1:8765/webhook
```
```bash
python agent_client.py full
```

Expected output:

```
=== STATUS ===
Server time: 1777419319.055
Latest event: play_started at 2026-04-28T16:34:43Z
Scene: MyScene
Frame exists: True
Diagnosis: bright=90.6 pink=0.0 black=0.07
> OK
=== FRAME ===
Frame saved to: Log/current_frame.png
```
Press Stop in Unity. The server receives play_stopped with needs_ai_attention: true.
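An agent that wants to react to that Stop event could poll `GET /events`. This sketch assumes, hypothetically, that the endpoint returns a JSON list ordered oldest-first and that each event is an object with `event` and `needs_ai_attention` keys; check the real payload with `python agent_client.py events` and adjust the keys to match.

```python
import json
import time
import urllib.request
from typing import List, Optional


def latest_attention_event(events: List[dict]) -> Optional[dict]:
    """Return the newest play_stopped event flagged needs_ai_attention, if any.

    Assumed event shape (hypothetical): {"event": ..., "needs_ai_attention": ...}.
    """
    for ev in reversed(events):
        if ev.get("event") == "play_stopped" and ev.get("needs_ai_attention") is True:
            return ev
    return None


def wait_for_attention(base_url: str = "http://127.0.0.1:8765",
                       timeout: float = 60.0, poll: float = 1.0) -> Optional[dict]:
    """Poll GET /events until a flagged play_stopped event appears or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{base_url}/events", timeout=5) as resp:
            ev = latest_attention_event(json.loads(resp.read()))
        if ev is not None:
            return ev
        time.sleep(poll)
    return None
```

A stale `play_stopped` left over from an earlier session would also match, so a real agent would probably record the event count (or a timestamp) before pressing Play and only accept newer events.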
| What | Where |
|---|---|
| Screenshot frames | `%UserProfile%/AppData/LocalLow/<Company>/<Project>/AgentVision/Frames/<timestamp>/` (under Unity's `Application.persistentDataPath`) |
| State JSON files | Same folder as frames (state_XXXXX.json) |
| Bridge status.json | %LocalAppData%/AI_Bridge/status.json |
| Server screenshot | Log/current.png (in the directory where agent_server.py runs) |
`<Company>` defaults to `DefaultCompany`, and `<Project>` is the Product Name from Player Settings, which defaults to your Unity project name (e.g. `slay-the-spire-mock`).
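To find the most recent capture session programmatically, pick the newest `<timestamp>` folder under the `Frames` directory from the table above. This sketch relies on the timestamp format seen in the Console log (`2026-04-28_16-34-43`), which sorts chronologically as a plain string; `latest_session` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path
from typing import Optional


def latest_session(frames_root: Path) -> Optional[Path]:
    """Return the newest <timestamp> session folder, or None if there is none.

    Relies on names like 2026-04-28_16-34-43, which sort chronologically
    as plain strings because every field is zero-padded.
    """
    if not frames_root.is_dir():
        return None
    sessions = [p for p in frames_root.iterdir() if p.is_dir()]
    return max(sessions, key=lambda p: p.name, default=None)
```

Sorting by name rather than by modification time keeps the result stable even if a folder is copied or touched later.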
```bash
# Record a session at 10 FPS
python run_agent_vision.py session --fps 10 --title "YourUnityWindowTitle"

# Stop recording
python stop_recording.py

# Analyze keyframes from the session
python run_agent_vision.py analyze --dir "path/to/session/folder"
```

```bash
# Capture, diagnose, and attempt to fix common rendering issues
python auto_diag.py
```

Note: `auto_diag.py` is game-specific. It contains hardcoded fix functions for the Slay the Spire mock project; adapt the `try_fix()` function for your own game.
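The detect-fix-retest idea behind `auto_diag.py` can be written independently of any one game. In this sketch `diagnose` and the entries of `fixes` are hypothetical callbacks (a screenshot-and-analyze step, and per-problem repairs like those in `try_fix()`), so the loop itself stays game-agnostic.

```python
from typing import Callable, Dict, List, Set, Tuple


def detect_fix_retest(diagnose: Callable[[], Set[str]],
                      fixes: Dict[str, Callable[[], None]],
                      max_rounds: int = 3) -> Tuple[bool, List[str]]:
    """Diagnose, apply known fixes, and re-diagnose until clean or out of rounds.

    diagnose() returns a set of problem labels (e.g. {"pink"}); fixes maps a
    label to a function that tries to repair it. Returns (clean, log).
    """
    log: List[str] = []
    for round_no in range(max_rounds + 1):
        problems = diagnose()
        log.append(f"check {round_no}: {sorted(problems) or 'clean'}")
        if not problems:
            return True, log
        if round_no == max_rounds:
            break  # no fix attempts left; report failure
        applied = False
        for problem in sorted(problems):
            fix = fixes.get(problem)
            if fix is not None:
                fix()
                applied = True
                log.append(f"applied fix: {problem}")
        if not applied:
            break  # nothing we know how to fix, so retesting cannot help
    return False, log
```

Every fix is followed by a fresh diagnosis, so a fix only counts as successful once a new capture comes back clean; stopping when no known fix applies avoids wasting rounds on problems the script cannot address.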
Each captured frame is analyzed for rendering failures:
- Pink (magenta) pixels → shader fallback (missing URP material)
- Black pixels covering more than 60% of the frame → nothing rendered (camera/rendering bug)
- Overexposure (average brightness above 240) → material/lighting issue
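A pure-Python sketch of those three checks, operating on any iterable of `(r, g, b)` tuples. The "pink" and "near black" color cutoffs, the 1% pink threshold, and the use of a plain channel mean for brightness are assumptions chosen for illustration; only the 60% black and 240 average-brightness limits come from the list above.

```python
from typing import Dict, Iterable, List, Tuple

Pixel = Tuple[int, int, int]


def is_pink(r: int, g: int, b: int) -> bool:
    # Assumed cutoff: close to the magenta (255, 0, 255) of Unity's error shader.
    return r > 200 and b > 200 and g < 80


def is_black(r: int, g: int, b: int) -> bool:
    # Assumed "near black" cutoff.
    return r < 10 and g < 10 and b < 10


def diagnose(pixels: Iterable[Pixel]) -> Dict[str, object]:
    """Return mean brightness, pink %, black %, and a list of suspected issues."""
    total = pink = black = 0
    brightness_sum = 0.0
    for r, g, b in pixels:
        total += 1
        pink += is_pink(r, g, b)
        black += is_black(r, g, b)
        brightness_sum += (r + g + b) / 3
    if total == 0:
        raise ValueError("diagnose() needs at least one pixel")
    bright = round(brightness_sum / total, 1)
    pink_pct = round(100 * pink / total, 2)
    black_pct = round(100 * black / total, 2)
    issues: List[str] = []
    if pink_pct > 1.0:  # assumed threshold to ignore a few stray pixels
        issues.append("pink: shader fallback (missing URP material?)")
    if black_pct > 60.0:
        issues.append("black: nothing rendered (camera/rendering bug?)")
    if bright > 240:
        issues.append("overexposed: material/lighting issue?")
    return {"bright": bright, "pink": pink_pct, "black": black_pct, "issues": issues}
```

With Pillow, `diagnose(Image.open("Log/current.png").convert("RGB").getdata())` feeds it a screenshot; downscaling first (for example with `.resize((160, 90))`) keeps the pure-Python loop fast.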
The base GameStateLogger only logs timestamp, scene name, and FPS. To log your game's state (HP, enemies, cards, etc.), create a subclass:
```csharp
using AgentVision;
using UnityEngine;

// Player and GameManager are placeholders for your own game's types.
public class MyGameStateLogger : GameStateLogger
{
    protected override string BuildGameJson()
    {
        // Return your game's state as a JSON object string.
        return $"{{ \"playerHp\": {Player.Instance.hp}, \"score\": {GameManager.score} }}";
    }
}
```

Then replace `go.AddComponent<GameStateLogger>()` in `AgentVisionBootstrap.cs` with `go.AddComponent<MyGameStateLogger>()`.
Note: All C# scripts use the AgentVision namespace. You can change this to match your project's conventions.
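On the agent side, each state file can be matched with its frame. This sketch assumes frames are named `frame_<n>.jpg` and states `state_<n>.json` with the same index; the state naming comes from the output locations table, but the frame naming for the Unity recorder is an assumption, so adjust the patterns to what you see on disk.

```python
import json
import re
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Assumed naming: frame_<n>.jpg and state_<n>.json share the index <n>.
FRAME_RE = re.compile(r"^frame_(\d+)\.jpg$")
STATE_RE = re.compile(r"^state_(\d+)\.json$")


def _by_index(paths: Iterable[Path], pattern) -> Dict[int, Path]:
    """Map the numeric index in each matching filename to its path."""
    found: Dict[int, Path] = {}
    for path in paths:
        match = pattern.match(path.name)
        if match:
            found[int(match.group(1))] = path
    return found


def pair_frames_with_state(session: Path) -> List[Tuple[Path, dict]]:
    """Return (frame path, parsed state) pairs in index order.

    Frames without a state file (and vice versa) are skipped.
    """
    frames = _by_index(session.iterdir(), FRAME_RE)
    states = _by_index(session.iterdir(), STATE_RE)
    pairs = []
    for index in sorted(frames.keys() & states.keys()):
        state = json.loads(states[index].read_text(encoding="utf-8-sig"))
        pairs.append((frames[index], state))
    return pairs
```

Matching on the parsed integer rather than the raw string means `frame_12.jpg` and `state_00012.json` would still pair up if the padding ever differs.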
To turn recorded frames into an animated GIF (first 60 frames, 500 ms per frame, resized to 640×480):

```bash
python -c "
from PIL import Image; import glob
frames = sorted(glob.glob('path/to/Frames/frame_*.jpg'))
images = [Image.open(f).resize((640,480)) for f in frames[:60]]
images[0].save('gameplay.gif', save_all=True, append_images=images[1:], duration=500, loop=0)
"
```

| Problem | Solution |
|---|---|
| No screenshots captured | Make sure `AgentVisionBootstrap` is on a GameObject in the active scene |
| Screenshots are all black | The window capture looks for a window title containing "Unity". Use the `--title` flag if your window title differs |
| `agent_client.py` can't connect | Make sure `agent_server.py` is running first on port 8765 |
| Compile error: `FindObjectOfType` is obsolete | The scripts use `FindAnyObjectByType`, which requires Unity 2022.3+. Update Unity or replace it with `FindObjectOfType` |
| State JSON only shows timestamp and scene | That's the base logger. Subclass `GameStateLogger` and override `BuildGameJson()` to add game state |
- Code: Generated by glm-5.1:cloud via Ollama Cloud
- Design, input, feedback, testing: Kevin Cho (kevinkicho)
MIT License — see LICENSE.