AI-powered esports coaching — upload a gameplay video, get timestamped coaching tips from Gemini 3.
Competitive gaming has 640M+ viewers and a $2.4B esports industry, but improving as a player remains expensive and inaccessible. Human coaching costs $15–100+/hour. Players watch their own replays but can't reliably spot their own mistakes. Existing AI tools (Trophi.ai, Omnic.ai, iTero) are locked to a single game and require proprietary data pipelines.
- Upload a gameplay video (MP4) and optional replay file
- Gemini 3 Pro watches the video and generates timestamped coaching tips via a 2-agent pipeline (Observer → Validator)
- Tips are verified against video evidence — hallucinated tips are removed via confidence scoring
- Get results: spoken audio coaching, shareable links, and follow-up chat with the video still in context
Key differentiators:
- Game-agnostic: Works from video alone. Supports AoE2 and CS2 today, extensible to any game.
- Multi-agent verification: Observer generates tips; Validator cross-checks each one against the video. Only verified tips are returned.
- No game API needed: Unlike competitors, works from raw video — no proprietary data pipeline required.
| # | Feature | How Forging Uses It |
|---|---|---|
| 1 | File API | Upload gameplay videos (MP4, up to 700MB). Video uploaded once, reused across entire agent chain. |
| 2 | Multimodal Video Analysis | Gemini 3 Pro watches gameplay video alongside replay data. Identifies mistakes and timestamps them. |
| 3 | Thinking (Extended Reasoning) | thinking_level="high" for Observer and Validator agents. Enables deeper analysis and careful hallucination detection. |
| 4 | Interactions API | previous_interaction_id chains Observer → Validator → Chat. Video and context persist on Gemini's servers — no re-upload. Server-managed thought signatures preserve reasoning context across the agent chain. |
| 5 | Text-to-Speech | Gemini 2.5 Flash TTS with "Charon" voice generates spoken coaching tips as MP3 audio. |
| 6 | Structured Output | Native response_schema enforcement ensures deterministic tip format at every pipeline step — no regex JSON extraction needed. |
| Model | Purpose |
|---|---|
gemini-3-pro-preview |
Analysis, validation, follow-up chat |
gemini-3-flash-preview |
Round detection (fast) |
gemini-2.5-flash-preview-tts |
Voice coaching audio |
flowchart TD
User["User uploads video + replay"] --> GCS["Google Cloud Storage"]
User --> FileAPI["Gemini File API"]
GCS --> Observer
FileAPI --> Observer
subgraph Pipeline ["2-Agent Analysis Pipeline"]
Observer["Observer Agent<br/>(Gemini 3 Pro)<br/>video + replay + thinking=high<br/>structured output"] -- "interaction_id" --> Validator["Validator Agent<br/>(Gemini 3 Pro)<br/>chained context + thinking=high<br/>structured output"]
end
Validator --> TTS["TTS Generation<br/>(Gemini 2.5 Flash TTS)<br/>Charon voice → MP3"]
Validator -- "interaction_id" --> Chat["Follow-up Chat<br/>(Gemini 3 Pro)<br/>video still in context"]
Validator --> Results["Firestore + GCS"]
TTS --> Results
Results --> Frontend["Next.js Frontend"]
Chat --> Frontend
The Interactions API is central to the architecture: the gameplay video is uploaded once to Gemini's File API and stays in context across the entire Observer → Validator → Chat chain via previous_interaction_id. This eliminates redundant video re-uploads and preserves reasoning context (thought signatures) across multi-agent steps. When a user asks a follow-up question in chat, the conversation chains from the Validator's interaction — inheriting the full pipeline context without re-sending anything.
Video Analysis:
- Upload gameplay videos (MP4, up to 700MB, 30 min)
- Timestamped coaching tips with clickable video timestamps
- AI-generated spoken coaching (TTS audio per tip)
- Follow-up chat with AI coach (video stays in context)
- Multi-agent verification eliminates hallucinated tips
Platform:
- Shareable analysis links
- Community carousel
- CS2 round navigation with death-time awareness
- Replay file parsing for ground-truth data
- Age of Empires II: Definitive Edition —
.aoe2recordreplays + video - Counter-Strike 2 —
.demdemos + video
Architecture is game-agnostic. Adding a game = implementing an Observer + Validator agent pair. No game API needed.
- Frontend: Next.js 16, React 19, TypeScript, Tailwind CSS 4
- Backend: Python 3.12, FastAPI, uvicorn
- AI: Gemini 3 Pro, Gemini 3 Flash, Gemini 2.5 Flash TTS
- Parsing:
- AoE2: mgz
- CS2: demoparser2, awpy
- Cloud: Google Cloud Run, Cloud Storage, Firestore
- Node.js 20+
- Python 3.11+
- pnpm
- Gemini API key (get one at Google AI Studio)
-
Clone the repository
git clone git@github.com:lucaslencinas/forging.git cd forging -
Set up the backend
cd backend python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate pip install -r requirements.txt cp .env.example .env # Edit .env and add your GEMINI_API_KEY
-
Run the backend
uvicorn main:app --reload --port 8080
-
Set up the frontend (in a new terminal)
cd frontend pnpm install cp .env.example .env.local -
Run the frontend
pnpm dev
Backend API: https://forging-backend-nht57oxpca-uc.a.run.app
# Gemini API (required)
GEMINI_API_KEY=your_gemini_api_key
# Optional: Multiple API keys for rate limit fallback (comma-separated)
# GEMINI_API_KEYS=key1,key2,key3
# Server config
ALLOWED_ORIGINS=http://localhost:3000
LOG_LEVEL=INFO
# GCS config (only needed for video uploads - uses demo mode without it)
# GOOGLE_CLOUD_PROJECT=your-project-id
# GCS_BUCKET_NAME=your-bucket-name
# GCP_LOCAL_ACCOUNT=your-email@gmail.comNEXT_PUBLIC_API_URL=http://localhost:8080
The video upload feature requires GCP credentials (Cloud Storage). Without GCP setup:
- Replay analysis works fully — Just needs a Gemini API key
- Video upload is disabled — Requires GCS bucket access
For hackathon judges: Use the live demo to test video features, or run locally for replay-only analysis.
forging/
├── frontend/ # Next.js application
│ ├── src/
│ │ ├── app/ # App router pages
│ │ ├── components/ # React components
│ │ │ ├── VideoPlayer.tsx # Video player with seek
│ │ │ ├── TimestampedTips.tsx # Clickable coaching tips
│ │ │ └── VideoAnalysisResults.tsx
│ │ ├── hooks/ # Custom hooks
│ │ │ └── useVideoUpload.ts # GCS upload with progress
│ │ └── types/ # Generated API types
│ └── package.json
├── backend/ # Python FastAPI application
│ ├── main.py # API entry point
│ ├── models.py # Pydantic models
│ ├── analyze.py # CLI tool
│ ├── services/
│ │ ├── pipelines/ # Game-specific analysis pipelines
│ │ │ ├── base.py # Base pipeline class
│ │ │ ├── factory.py # Pipeline factory
│ │ │ ├── aoe2/ # AoE2: observer, validator
│ │ │ └── cs2/ # CS2: observer, validator, round_detector
│ │ ├── agents/ # Base agent class (Interactions API)
│ │ │ └── base.py
│ │ ├── tts.py # Gemini TTS audio generation
│ │ ├── gcs.py # Cloud Storage
│ │ ├── firestore.py # Firestore persistence
│ │ ├── thumbnail.py # Thumbnail generation
│ │ ├── llm/ # LLM provider abstraction
│ │ │ ├── base.py # Abstract provider class
│ │ │ ├── gemini.py # Gemini + File API
│ │ │ └── factory.py # Provider auto-selection
│ │ ├── knowledge/ # Game-specific coaching knowledge
│ │ │ ├── aoe2.py
│ │ │ └── cs2.py
│ │ └── parsers/ # Replay file parsers
│ │ ├── aoe2.py
│ │ └── cs2.py
│ └── requirements.txt
├── .github/workflows/ # CI/CD pipelines
│ ├── deploy-backend.yml
│ └── deploy-frontend.yml
└── deploy/ # GCS CORS config
This project's architecture is informed by recent research on scaling agent systems:
-
Towards a science of scaling agent systems: When and why agent systems work (Google Research, 2025)
Key insight: Multi-agent AI systems don't universally improve performance. Their effectiveness depends on task characteristics and coordination architecture. For parallelizable tasks (like analyzing different aspects of gameplay), centralized coordination can improve performance by up to 80%. However, independent agents can amplify errors up to 17x without proper orchestration.
MIT
Built for the Gemini API Developer Competition