AI-Native Viral Reel Producer Built on AgentField
Article URL or topic phrase → 1080×1920 vertical reel with word-burst karaoke, in about 80 seconds at ≈$0.10 per reel. Open source, one API call, OpenRouter-only — Gemini 3.1 Flash TTS + Gemini 2.5 Flash Image + ken-burns motion. Flip on Veo 3.1 Lite i2v for full motion at ≈$1.20.
Trigger it with the af CLI (requires af ≥ 0.1.87) — it streams live progress and prints the result:
# URL → reel
af call reel-af.reel_article_to_reel --in '{"url": "https://arxiv.org/abs/2509.25541"}'
# Topic → reel (runs the 4-hunter → critic → 3-narrator → judge cascade)
af call reel-af.reel_topic_to_reel --in '{"topic": "the placebo effect"}'

Prefer raw HTTP? Hit the API directly with curl:
# URL → reel
curl -X POST http://localhost:8080/api/v1/execute/async/reel-af.reel_article_to_reel \
-H "Content-Type: application/json" \
-d '{"input": {"url": "https://arxiv.org/abs/2509.25541"}}'
# Topic → reel (runs the 4-hunter → critic → 3-narrator → judge cascade)
curl -X POST http://localhost:8080/api/v1/execute/async/reel-af.reel_topic_to_reel \
-H "Content-Type: application/json" \
-d '{"input": {"topic": "the placebo effect"}}'

Other pipelines fight TTS sync drift, over-pause and kill retention, generate literal-but-boring visuals, or front-load the hook with no curiosity gap. reels-af is the multi-reasoner answer: 18 specialized reasoners run through the AgentField control plane to extract the essence, write a Hook→Mechanism→Payoff script, hunt a viral angle (topic mode), synthesize sample-accurate audio, plan beats and cards in parallel, generate per-beat first frames + motion, and stitch everything in a single ffmpeg pass.
Two entry points, one downstream pipeline. Drop in a URL when you have a source. Drop in a topic when you just have a thread to pull on.
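Both entry points take the same request envelope, so the curl calls above translate directly to other clients. The following is a minimal Python sketch using only the standard library; the base URL, path, and payload shape are copied from the curl examples, while `build_request` is a hypothetical helper name. The response format is not described here, so the sketch stops at building the request and leaves the send step commented out.

```python
import json
import urllib.request

# Base URL and path are copied from the curl examples; adjust for your deployment.
BASE_URL = "http://localhost:8080"
EXECUTE_PATH = "/api/v1/execute/async/"


def build_request(reasoner: str, payload: dict) -> urllib.request.Request:
    """Build the POST request for one of the two entry reasoners (hypothetical helper)."""
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}{EXECUTE_PATH}{reasoner}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


article_req = build_request(
    "reel-af.reel_article_to_reel", {"url": "https://arxiv.org/abs/2509.25541"}
)
topic_req = build_request(
    "reel-af.reel_topic_to_reel", {"topic": "the placebo effect"}
)

# Sending needs a running control plane, so it is left commented out.
# The response shape is an unknown here, so it is simply printed raw.
# with urllib.request.urlopen(topic_req, timeout=30) as resp:
#     print(resp.read().decode("utf-8"))
```

Only the reasoner name and the inner payload differ between the two paths, which is why a single helper covers both.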
Every reel:
- reel.mp4: 1080×1920 vertical, 20-25s, H.264 + AAC, ready to upload
- Word-burst karaoke: one word at a time, 170px bottom-center, sample-accurate
- Per-beat first frames: Gemini Flash Image stills, content-mode-aware style
- Motion: ken-burns by default (free); Veo 3.1 Lite i2v when REEL_AF_USE_VEO=true
- Optional editorial accents: UPPERCASE callouts for numbers, names, jargon glosses
- result.json: hook variant, beats, voice, hunter rankings, judge verdict, per-phase timings
Topic runs also produce: 12 candidate essences (4 hunters × 3), critic rankings on novelty/specificity/hookability/narratability, 3 narrator drafts, the pairwise judge verdict.
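The counts above follow from the shape of the cascade. A minimal sketch with stub reasoners shows where the 12 candidates, 3 finalists, and 3 drafts come from. Every function here is a hypothetical stand-in (the real reasoners are LLM calls routed through AgentField), and the scoring is a placeholder rather than the critic's actual rubric.

```python
import asyncio

ANGLES = ["specific_figure", "reversal", "temporal", "cross_domain"]


async def hunter(topic: str, angle: str) -> list[dict]:
    """Stub hunter: returns 3 candidate essences for one angle."""
    return [
        {"angle": angle, "claim": f"{topic} / {angle} #{i}", "score": 0.0}
        for i in range(3)
    ]


async def critic(candidates: list[dict]) -> list[dict]:
    """Stub critic: scores every candidate and keeps the top 3."""
    for idx, cand in enumerate(candidates):
        cand["score"] = float(idx % 5)  # placeholder score, not a real rubric
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:3]


async def narrator(essence: dict) -> dict:
    """Stub narrator: one draft script per shortlisted essence."""
    return {"essence": essence, "script": f"Draft for {essence['claim']}"}


async def judge(drafts: list[dict]) -> dict:
    """Stub judge: a real one would compare drafts pairwise."""
    return max(drafts, key=lambda d: d["essence"]["score"])


async def topic_cascade(topic: str) -> dict:
    # Fan out the four hunters in parallel, then flatten their batches.
    per_hunter = await asyncio.gather(*(hunter(topic, a) for a in ANGLES))
    candidates = [c for batch in per_hunter for c in batch]
    top3 = await critic(candidates)
    # Fan out one narrator per shortlisted essence.
    drafts = list(await asyncio.gather(*(narrator(e) for e in top3)))
    winner = await judge(drafts)
    return {"candidates": candidates, "top3": top3, "drafts": drafts, "winner": winner}


result = asyncio.run(topic_cascade("fingerprints"))
print(len(result["candidates"]), len(result["top3"]), len(result["drafts"]))  # 12 3 3
```

The two `asyncio.gather` calls mirror the hunter and narrator fans; the critic and judge are the sequential joins between them.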
Sample sidecar:
{
"source": "topic",
"topic": "fingerprints",
"video_path": "output/topic-2ec74c00/reel.mp4",
"duration_s": 22.4,
"tease": "Why do we have fingerprints?",
"reveal": "In 2009, a biomechanics study led by Georges Debrégeas found that fingerprints reduce friction on smooth surfaces. They channel moisture away like tire treads — but their real purpose is to amplify vibrations our fingertips' hyper-sensitive touch receptors can feel.",
"payoff": "Fingerprints aren't for holding on. They're for feeling.",
"open_style": "question",
"chosen_essence": {
"core_claim": "Fingerprints amplify vibrations to enable hyper-sensitive touch perception, not grip",
"angle": "specific_figure",
"novelty_pitch": "Most assume fingerprints exist for grip — the 2009 Debrégeas paper showed they're actually a vibration-amplification system for touch sensing"
},
"winner_composite": 8.4,
"winner_why": "Specific named researcher + counter-intuitive reversal + clean payoff that callbacks the tease.",
"beat_count": 4,
"card_count": 18,
"accent_count": 2,
"timings_s": {
"hunt": 8.1, "critic": 4.2, "narrate": 7.5, "judge": 3.1,
"tts": 12.4, "plan": 1.1, "visual_accent": 6.8, "media": 38.2, "stitch": 4.6,
"total": 86.0
}
}

Three reels, each from a single function call.
| Reel | Input | Notes |
|---|---|---|
| preview-fingerprints.mp4 | topic → "fingerprints" | Hunter cascade landed on the 2009 Debrégeas paper. Delayed-reveal: the answer arrives at beat 2. |
| preview-placebo.mp4 | topic → "placebo effect" | Specific-figure angle won the critic round: Ted Kaptchuk's 2010 open-label IBS study. |
| preview-reasoning.mp4 | article → arXiv paper | Scientific mode auto-activated: tighter pacing (175 WPM), paper-specific terms defined inline. |
ui-cascade.mp4
The AgentField control plane rendering the 18-reasoner DAG live. Each node is one reasoner — its prompt, inputs, outputs, latency, cost. A single topic_to_reel invocation lights up the 4-hunter fan, the critic, the 3-narrator fan, the judge, then the shared downstream — about 80 seconds end-to-end.
| Layer | Tool | What it brings |
|---|---|---|
| Runtime | AgentField | Async-parallel reasoner orchestration. 18 reasoners per reel; depth-3 DAG (4 hunters → critic → 3 narrators → judge → 6 downstream phases); every node visible in the workflow graph. |
| Reasoning | OpenRouter → DeepSeek V4 Pro | One env var swaps the whole stack to any OpenRouter model. |
| TTS | Gemini 3.1 Flash TTS | 200+ inline audio tags for delivery direction. Sentence-by-sentence in parallel, ffprobe-measured, atempo=1.35 sped, native-wave concatenated → sample-accurate sentence boundaries. |
| Image | Gemini 2.5 Flash Image | One 720×1280 first frame per beat, content-mode-aware style. |
| Motion | ffmpeg zoompan (default) / Veo 3.1 Lite i2v (REEL_AF_USE_VEO=true) | Ken-burns animation of the still is free and ships by default. Flip the env var to Veo for real i2v motion at ~$0.05/sec of video. |
| Subtitles | libass + pysubs2 | Word-burst (one word at a time, 170px, bottom-center) + optional Layer-2 accent overlays in the opposite third of the frame. |
| Stitch | ffmpeg concat filter (single pass) | concat + libass burn + AAC mux in one invocation. Sample-accurate; no per-shot priming drift. |
Default config — ken-burns motion from generated first frames, OpenRouter list prices verified 2026-05:
| Path | Reasoners | Wall time | Cost / reel |
|---|---|---|---|
| article_to_reel (URL → reel) | 10 | ~70-90s | ~$0.08 |
| topic_to_reel (topic → reel) | 18 | ~85-110s | ~$0.10 |
The topic path is slightly slower and slightly pricier because of the 4-hunter → critic → 3-narrator → judge cascade. Cost split per reel:
| Stage | Pricing (OpenRouter list) | Cost / reel |
|---|---|---|
| Gemini 2.5 Flash Image (first frames) | $0.30/M in, $2.50/M out | ~$0.02 |
| Gemini 3.1 Flash TTS (4-6 sentences) | $1/M in, $20/M out | ~$0.015 |
| DeepSeek V4 Pro reasoning (10-18 calls) | $0.435/M in, $0.87/M out | ~$0.02 |
| Ken-burns motion (local ffmpeg) | free | $0 |
Upgrade to full Veo i2v motion by setting REEL_AF_USE_VEO=true. Veo 3.1 Lite at $0.05/sec adds ~$1.10/reel (about 22s of generated video across the beats), bringing the total to ~$1.20/reel. Worth it for premium output; the ken-burns default is fine for high-volume runs.
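As a rough check on these figures, the arithmetic can be written out. This sketch assumes about 22 seconds of generated video per reel, which is what $1.10 at $0.05/sec works out to; `reel_cost` is a hypothetical helper, and the base cost is the default topic-path estimate quoted in the table.

```python
BASE_COST_USD = 0.10    # default topic-path estimate quoted above
VEO_USD_PER_SEC = 0.05  # Veo 3.1 Lite list price quoted above


def reel_cost(video_seconds: float, use_veo: bool) -> float:
    """Estimate per-reel cost in USD, rounded to cents (hypothetical helper)."""
    veo = VEO_USD_PER_SEC * video_seconds if use_veo else 0.0
    return round(BASE_COST_USD + veo, 2)


print(reel_cost(22, use_veo=False))  # 0.1
print(reel_cost(22, use_veo=True))   # 1.2
```

Because Veo is billed per second, the upgrade cost scales with reel length rather than with the number of beats.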
Track actual numbers via the OpenRouter activity dashboard and the timings_s block in result.json.
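One way to read the timings_s block is to rank phases by their share of wall time. The sketch below uses the values from the sample sidecar shown earlier, inlined so it runs without a reel on disk; in practice you would load the result.json from your own run directory instead. In that sample the per-phase timings add up to the reported total.

```python
import json

# Timings copied from the sample sidecar. To use a real run, replace this with
# json.load() on the result.json inside your own output directory.
sidecar = json.loads("""
{"timings_s": {"hunt": 8.1, "critic": 4.2, "narrate": 7.5, "judge": 3.1,
               "tts": 12.4, "plan": 1.1, "visual_accent": 6.8, "media": 38.2,
               "stitch": 4.6, "total": 86.0}}
""")

timings = dict(sidecar["timings_s"])
total = timings.pop("total")
slowest = max(timings, key=timings.get)

# Print phases from slowest to fastest with their share of the total.
for phase, secs in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{phase:>14} {secs:6.1f}s {secs / total:6.1%}")

print("slowest phase:", slowest)
```

For the sample run, media generation is the dominant phase, so that is the first place to look when a reel takes longer than expected.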
Two paths, six phases. Both converge on the same downstream after phase 02.
- Intake: article_to_reel runs one harness call to extract the surprising claim + mechanism + evidence + content_mode. topic_to_reel fans out four hunters (specific_figure / reversal / temporal / cross_domain) → 12 candidates → critic picks the top 3 → 3 narrators write delayed-reveal scripts → pairwise judge picks the winner.
- Script: one .ai() call produces a ScriptDraft (Hook → Mechanism → Payoff + inline TTS tags). A schema validator requires the final clause to echo a hook keyword, creating the loop.
- Audio: sentences synthesize in parallel, are ffprobe-measured, sped up via atempo=1.35, then native-wave concatenated. Sentence boundaries are sample-accurate; words inside each sentence are distributed by syllable count. No ASR in the loop.
- Plan: two parallel deterministic helpers (cards for subtitle layout, beats for visual planning) and two parallel LLM fan-outs (per-beat image prompts, per-beat optional accents). Cards and beats render onto the same reel but don't gate each other, so there are no shot-too-long failures.
- Render: one first-frame image per beat (Gemini Flash Image), then either local ken-burns animation (default) or a Veo i2v call per beat (REEL_AF_USE_VEO=true). Per-beat fallback: image fail → placeholder; Veo fail → ken-burns.
- Stitch: one ffmpeg invocation: concat filter (sample-accurate) + libass burn (word-burst + accents) + AAC mux of the full TTS WAV. One encode, no priming drift.
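The Audio phase's timing model can be sketched as follows: sentence boundaries come from the measured durations divided by the speed factor, and each word gets a slice of its sentence proportional to its syllable count. This is an illustration under stated assumptions, not the project's code; `count_syllables` is a crude vowel-group heuristic chosen for the sketch, and `word_timings` is a hypothetical name.

```python
import re

SPEED = 1.35  # atempo factor from the Audio phase


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, minimum 1. An assumption for this sketch."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def word_timings(
    sentences: list[str], raw_durations: list[float]
) -> list[tuple[str, float, float]]:
    """Return (word, start, end) for every word across all sentences."""
    out = []
    cursor = 0.0  # start of the current sentence on the final timeline
    for text, raw in zip(sentences, raw_durations):
        duration = raw / SPEED  # sentence length after the speed-up
        words = text.split()
        weights = [count_syllables(w) for w in words]
        total = sum(weights)
        start = cursor
        for word, weight in zip(words, weights):
            end = start + duration * weight / total
            out.append((word, start, end))
            start = end
        # The next sentence starts on the measured boundary, so rounding
        # error inside one sentence never leaks into the next.
        cursor += duration
    return out


timings = word_timings(
    ["Why do we have fingerprints?", "They are for feeling."],
    [2.7, 2.025],  # hypothetical ffprobe-measured durations in seconds
)
for word, start, end in timings:
    print(f"{start:5.2f} to {end:5.2f}  {word}")
```

Anchoring each sentence to its measured boundary is what keeps drift from accumulating; only the word positions inside a sentence are estimated.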
The architectural choice that earns the engagement: video is decoupled from word timing. Cards drive subtitles, beats drive visuals, audio is master.
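The two-tier per-beat fallback from the Render phase can be sketched in a few lines. The callables below are hypothetical stand-ins for the image, Veo, and ken-burns steps, and the placeholder filename is an assumption; the point is only the control flow, where a failed beat degrades instead of failing the reel.

```python
from typing import Callable, Optional


def render_beat(
    beat: str,
    make_image: Callable[[str], str],
    make_veo_clip: Optional[Callable[[str], str]],
    ken_burns: Callable[[str], str],
    placeholder: str = "placeholder.png",  # assumed filename for this sketch
) -> tuple[str, str]:
    """Return (clip_path, mode) for one beat without raising on provider errors."""
    try:
        image = make_image(beat)
    except Exception:
        image = placeholder  # tier 1: image fail -> placeholder
    if make_veo_clip is not None:
        try:
            return make_veo_clip(image), "veo"
        except Exception:
            pass  # tier 2: Veo fail -> ken-burns
    return ken_burns(image), "ken_burns"


# Stub providers to exercise the fallback paths.
def ok_image(beat: str) -> str:
    return f"{beat}.png"


def bad_veo(image: str) -> str:
    raise RuntimeError("moderation false positive")


def zoom(image: str) -> str:
    return image.replace(".png", ".mp4")


print(render_beat("beat-1", ok_image, bad_veo, zoom))  # ('beat-1.mp4', 'ken_burns')
```

Passing `None` for the Veo callable models the default configuration, where every beat goes straight to ken-burns.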
git clone https://github.com/Agent-Field/reels-af
cd reels-af
cp .env.example .env # paste OPENROUTER_API_KEY
docker compose up --build

Open http://localhost:8080/ui/ to watch the 18-reasoner DAG execute live. Fire either of the curl commands above; outputs land under ./output/<run-id>/reel.mp4 with a result.json sidecar.
Without Docker — local CLI:
uv sync
brew install ffmpeg # macOS (apt install ffmpeg fonts-montserrat on Linux)
cp .env.example .env
reel-af article "https://arxiv.org/abs/2509.25541"
reel-af topic "the placebo effect"

Swap any model or flip on Veo motion with one env var:
REEL_AF_USE_VEO=true docker compose up --build
REEL_AF_MODEL=openrouter/anthropic/claude-sonnet-4 docker compose up --build

# Topics: the hunter cascade finds an angle
reel-af topic "the placebo effect"
reel-af topic "the dark forest hypothesis"
reel-af topic "octopus cognition"
reel-af topic "the Antikythera mechanism"
# Articles: direct from the source
reel-af article "https://arxiv.org/abs/2509.25541"
reel-af article "https://en.wikipedia.org/wiki/Tardigrade"

Most behaviour is driven by environment variables; see .env.example for the full list.
| Env var | Default | What it controls |
|---|---|---|
| OPENROUTER_API_KEY | (none) | Required. Get one at openrouter.ai and load $5+ in credits (~50 reels at default config). |
| REEL_AF_USE_VEO | false | Set to true for Veo 3.1 Lite i2v motion (~$1.10 extra per reel). Default ken-burns mode animates the generated stills locally. |
| REEL_AF_MODEL | openrouter/deepseek/deepseek-v4-pro | Reasoning model for every .ai() call. Any OpenRouter model works. |
| REEL_AF_TTS_MODEL | google/gemini-3.1-flash-tts-preview | TTS model. Gemini Flash is the only one supporting inline audio tags. |
| REEL_AF_IMAGE_MODEL | openrouter/google/gemini-2.5-flash-image | First-frame image generator. Swap for Flux, Imagen, etc. |
| REEL_AF_VIDEO_MODEL | openrouter/google/veo-3.1-lite | Veo model used when REEL_AF_USE_VEO=true. |
| AGENT_NODE_ID | reel-af | Node id registered with the AgentField control plane. |
| AGENTFIELD_SERVER | http://localhost:8080 | Control-plane URL (Docker compose wires this automatically). |
| AGENTFIELD_LLM_CALL_TIMEOUT | 120 | Per-call timeout in seconds. |
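For scripts that wrap the pipeline, the defaults in this table can be mirrored in a small loader. The sketch below is an assumption about how one might read these variables, not the project's own config code: `ReelConfig` and `load_config` are hypothetical names, and treating only a case-insensitive "true" as enabling Veo is a choice made for the sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ReelConfig:
    api_key: str
    use_veo: bool
    model: str
    tts_model: str
    image_model: str
    video_model: str
    llm_timeout_s: int


def load_config(env: Optional[Mapping[str, str]] = None) -> ReelConfig:
    """Read the documented env vars, falling back to the defaults in the table."""
    env = os.environ if env is None else env
    api_key = env.get("OPENROUTER_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY not set in env.")
    return ReelConfig(
        api_key=api_key,
        # Assumption: only a case-insensitive "true" turns Veo on.
        use_veo=env.get("REEL_AF_USE_VEO", "false").strip().lower() == "true",
        model=env.get("REEL_AF_MODEL", "openrouter/deepseek/deepseek-v4-pro"),
        tts_model=env.get("REEL_AF_TTS_MODEL", "google/gemini-3.1-flash-tts-preview"),
        image_model=env.get(
            "REEL_AF_IMAGE_MODEL", "openrouter/google/gemini-2.5-flash-image"
        ),
        video_model=env.get("REEL_AF_VIDEO_MODEL", "openrouter/google/veo-3.1-lite"),
        llm_timeout_s=int(env.get("AGENTFIELD_LLM_CALL_TIMEOUT", "120")),
    )


# A plain dict stands in for the process environment in this example.
cfg = load_config({"OPENROUTER_API_KEY": "sk-example", "REEL_AF_USE_VEO": "true"})
print(cfg.use_veo, cfg.model)  # True openrouter/deepseek/deepseek-v4-pro
```

Accepting a mapping instead of reading `os.environ` directly keeps the loader easy to test without touching the real environment.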
Voice, pacing, and tone are picked in code (render/tts.py:_VOICE_BY_TONE and the _AUDIO_SPEED_FACTOR constant). Edit those to dial the delivery.
"OPENROUTER_API_KEY not set in env." — paste your key into .env. The Docker container reads it via docker-compose.yml; the CLI reads it via python-dotenv.
"ffmpeg / ffprobe not found on PATH." — brew install ffmpeg on macOS, apt install ffmpeg on Linux. The Docker build already includes it.
Subtitles look like sans-serif blocks instead of Montserrat. — install Montserrat Bold (brew install --cask font-montserrat on macOS, apt install fonts-montserrat on Linux). The renderer falls back to DejaVu Sans Bold if Montserrat isn't found.
A single beat's video came out as a still ken-burns instead of motion. — Veo i2v hit a content-moderation false positive or transient error on that beat. The pipeline's two-tier fallback rendered the beat as a still + slow zoom so the reel still assembles. Re-run that beat by re-running the whole reel, or accept the fallback (often visually fine).
Reel runs longer than 25 seconds. — Gemini occasionally honors a stray [pause] or punctuation cluster too literally. Check result.json.timings_s.tts; if it's >20s, the narration likely picked up an extra tag. Re-run — temperature variance usually resolves it.
Custom font: drop a .ttf somewhere libass can see and update the candidate list in render/stitch.py:_FONT_CANDIDATES.
Shipped:
- Two entry reasoners: article_to_reel, topic_to_reel
- 4-hunter angle-constrained essence generation + critic + pairwise judge
- Delayed-reveal narration (tease → common_belief → reveal → payoff) with schema-level loop-back validation
- Sample-accurate sentence-by-sentence TTS — no ASR, no Whisper
- Word-burst karaoke (170px, bottom-center, libass)
- Optional editorial accents — 6 patterns (number / named_entity / jargon_translation / hook_title_card / reaction / list_marker)
- Ken-burns motion default + Veo 3.1 Lite i2v upgrade via env var
- Two-tier per-beat fallback (image fail → placeholder; Veo fail → ken-burns)
- Single-pass ffmpeg stitch (concat + libass + AAC)
- Content-mode style switch (cinematic-doc / clinical-lab)
- Docker compose stack with the AgentField control plane bundled
- OpenRouter-only — no Whisper, no local models, no platform lock-in
In progress:
- Voice cloning via OpenRouter-compatible TTS providers
- B-roll insertion from a stock-footage retriever
- Multi-language output (auto-translated script + native-voice TTS)
- Real-time preview while reasoners are running
- Direct publish to TikTok / Reels / Shorts via Buffer-style API
Built on the open-source work of:
- AgentField: async-parallel multi-reasoner runtime
- OpenRouter: single endpoint for the entire model stack (reasoning + TTS + image + video)
- Google DeepMind: Gemini 3.1 Flash TTS, Gemini 2.5 Flash Image, Veo 3.1 Lite
- DeepSeek: DeepSeek V4 Pro reasoning model
- libass + pysubs2: industry-standard ASS subtitle rendering
- FFmpeg: the single-pass stitch engine
- readability-lxml: clean article extraction
- Montserrat: the karaoke typeface
Apache License 2.0 — see LICENSE.
- SEC-AF: AI-native security auditor
- PR-AF: agentic PR reviewer
- Contract-AF: legal contract risk analyzer
- Roboscribe-AF: multi-agent annotation for robotic demonstrations
- Reactive-Atlas: MongoDB to AI enrichment pipeline

