
mongrlz/reels-af

 
 


REELS-AF

AI-Native Viral Reel Producer Built on AgentField


Sample Reels · See It Run · Cost · How It Works · Quick Start · Customize

Article URL or topic phrase → 1080×1920 vertical reel with word-burst karaoke, in about 80 seconds at ≈$0.10 per reel. Open source, one API call, OpenRouter-only — Gemini 3.1 Flash TTS + Gemini 2.5 Flash Image + ken-burns motion. Flip on Veo 3.1 Lite i2v for full motion at ≈$1.20.

reels-af: every URL, every topic, a reel

One-Call DX

Trigger it with the af CLI (requires af ≥ 0.1.87) — it streams live progress and prints the result:

# URL → reel
af call reel-af.reel_article_to_reel --in '{"url": "https://arxiv.org/abs/2509.25541"}'

# Topic → reel (runs the 4-hunter → critic → 3-narrator → judge cascade)
af call reel-af.reel_topic_to_reel --in '{"topic": "the placebo effect"}'

Prefer raw HTTP? Hit the API directly with curl:

# URL → reel
curl -X POST http://localhost:8080/api/v1/execute/async/reel-af.reel_article_to_reel \
  -H "Content-Type: application/json" \
  -d '{"input": {"url": "https://arxiv.org/abs/2509.25541"}}'

# Topic → reel (runs the 4-hunter → critic → 3-narrator → judge cascade)
curl -X POST http://localhost:8080/api/v1/execute/async/reel-af.reel_topic_to_reel \
  -H "Content-Type: application/json" \
  -d '{"input": {"topic": "the placebo effect"}}'
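The same async call can be made from Python. The sketch below mirrors the curl examples using only the standard library; the endpoint path and the `{"input": ...}` envelope come from those examples, while the helper names `build_request` and `submit` are hypothetical and the response is assumed (not confirmed) to be JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # control-plane URL used in the curl examples


def build_request(reasoner: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for an async run of reel-af.<reasoner>."""
    url = f"{base_url}/api/v1/execute/async/reel-af.{reasoner}"
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(reasoner: str, payload: dict, timeout_s: float = 30.0) -> dict:
    """Send the request and decode the reply (assumed to be a JSON object)."""
    with urllib.request.urlopen(build_request(reasoner, payload), timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the control plane running locally:
#   submit("reel_topic_to_reel", {"topic": "the placebo effect"})
#   submit("reel_article_to_reel", {"url": "https://arxiv.org/abs/2509.25541"})
```

Keeping request construction separate from sending makes the payload shape easy to inspect without a running server.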

Other pipelines fight TTS sync drift, over-pause and kill retention, generate literal-but-boring visuals, or front-load the hook with no curiosity gap. reels-af is the multi-reasoner answer: 18 specialized reasoners run through the AgentField control plane to extract the essence, write a Hook→Mechanism→Payoff script, hunt a viral angle (topic mode), synthesize sample-accurate audio, plan beats and cards in parallel, generate per-beat first frames + motion, and stitch everything in a single ffmpeg pass.

Two entry points, one downstream pipeline. Drop in a URL when you have a source. Drop in a topic when you just have a thread to pull on.


What you get

Every reel:

  • reel.mp4 — 1080×1920 vertical, 20-25s, H.264 + AAC, ready to upload
  • Word-burst karaoke — one word at a time, 170px bottom-center, sample-accurate
  • Per-beat first frames — Gemini Flash Image stills, content-mode-aware style
  • Motion — ken-burns by default (free); Veo 3.1 Lite i2v when REEL_AF_USE_VEO=true
  • Optional editorial accents — UPPERCASE callouts for numbers, names, jargon glosses
  • result.json — hook variant, beats, voice, hunter rankings, judge verdict, per-phase timings

Topic runs also produce 12 candidate essences (4 hunters × 3), critic rankings on novelty, specificity, hookability, and narratability, 3 narrator drafts, and the pairwise judge verdict.
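To make the critic step concrete, here is a minimal sketch of ranking candidates on the four criteria and keeping the top 3. In the project the critic is an LLM reasoner; the `Candidate` class, the 0-10 scale, and the equal-weight average are all assumptions made for illustration.

```python
from dataclasses import dataclass

CRITERIA = ("novelty", "specificity", "hookability", "narratability")


@dataclass(frozen=True)
class Candidate:
    core_claim: str
    angle: str    # e.g. specific_figure, reversal, temporal, cross_domain
    scores: dict  # criterion -> score on an assumed 0-10 scale


def composite(candidate: Candidate) -> float:
    """Equal-weight mean of the four criteria (the weighting is an assumption)."""
    return sum(candidate.scores[name] for name in CRITERIA) / len(CRITERIA)


def top_k(candidates: list, k: int = 3) -> list:
    """Return the k best candidates, highest composite first."""
    return sorted(candidates, key=composite, reverse=True)[:k]
```

With 12 candidates from the hunters, `top_k(candidates)` would hand three essences to the narrators.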

Sample sidecar:

{
  "source": "topic",
  "topic": "fingerprints",
  "video_path": "output/topic-2ec74c00/reel.mp4",
  "duration_s": 22.4,
  "tease": "Why do we have fingerprints?",
  "reveal": "In 2009, a biomechanics study led by Georges Debrégeas found that fingerprints reduce friction on smooth surfaces. They channel moisture away like tire treads — but their real purpose is to amplify vibrations our fingertips' hyper-sensitive touch receptors can feel.",
  "payoff": "Fingerprints aren't for holding on. They're for feeling.",
  "open_style": "question",
  "chosen_essence": {
    "core_claim": "Fingerprints amplify vibrations to enable hyper-sensitive touch perception, not grip",
    "angle": "specific_figure",
    "novelty_pitch": "Most assume fingerprints exist for grip — the 2009 Debrégeas paper showed they're actually a vibration-amplification system for touch sensing"
  },
  "winner_composite": 8.4,
  "winner_why": "Specific named researcher + counter-intuitive reversal + clean payoff that callbacks the tease.",
  "beat_count": 4,
  "card_count": 18,
  "accent_count": 2,
  "timings_s": {
    "hunt": 8.1, "critic": 4.2, "narrate": 7.5, "judge": 3.1,
    "tts": 12.4, "plan": 1.1, "visual_accent": 6.8, "media": 38.2, "stitch": 4.6,
    "total": 86.0
  }
}
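The `timings_s` block is handy for spotting the slowest phase. A small sketch, assuming the sidecar keeps the shape shown above (per-phase seconds plus a `total` key); `load_result` and `summarize_timings` are hypothetical helper names.

```python
import json


def load_result(path: str) -> dict:
    """Read a result.json sidecar from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_timings(result: dict) -> dict:
    """Report the slowest phase, its share of wall time, and any untracked time."""
    timings = result["timings_s"]
    total = timings["total"]
    phases = {name: secs for name, secs in timings.items() if name != "total"}
    slowest = max(phases, key=phases.get)
    return {
        "slowest_phase": slowest,
        "slowest_share": round(phases[slowest] / total, 3),
        "unaccounted_s": round(total - sum(phases.values()), 3),
    }


# timings_s copied from the sample sidecar above
sample = {
    "timings_s": {
        "hunt": 8.1, "critic": 4.2, "narrate": 7.5, "judge": 3.1,
        "tts": 12.4, "plan": 1.1, "visual_accent": 6.8, "media": 38.2, "stitch": 4.6,
        "total": 86.0,
    }
}
summary = summarize_timings(sample)
```

For the sample run, media generation dominates the wall time and the listed phases add up to the reported total.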

Sample reels

Three reels, each from a single function call.

preview-fingerprints.mp4
topic → "fingerprints"
Hunter cascade landed on the 2009 Debrégeas paper. Delayed-reveal: the answer arrives at beat 2.

preview-placebo.mp4
topic → "placebo effect"
Specific-figure angle won the critic round: Ted Kaptchuk's 2010 open-label IBS study.

preview-reasoning.mp4
article → arXiv paper
Scientific mode auto-activated: tighter pacing (175 WPM), paper-specific terms defined inline.

See it run

ui-cascade.mp4

The AgentField control plane rendering the 18-reasoner DAG live. Each node is one reasoner — its prompt, inputs, outputs, latency, cost. A single topic_to_reel invocation lights up the 4-hunter fan, the critic, the 3-narrator fan, the judge, then the shared downstream — about 80 seconds end-to-end.


What powers it

| Layer | Tool | What it brings |
| --- | --- | --- |
| Runtime | AgentField | Async-parallel reasoner orchestration. 18 reasoners per reel; depth-3 DAG (4 hunters → critic → 3 narrators → judge → 6 downstream phases); every node visible in the workflow graph. |
| Reasoning | OpenRouter → DeepSeek V4 Pro | One env var swaps the whole stack to any OpenRouter model. |
| TTS | Gemini 3.1 Flash TTS | 200+ inline audio tags for delivery direction. Sentence-by-sentence in parallel, ffprobe-measured, atempo=1.35 sped, native-wave concatenated → sample-accurate sentence boundaries. |
| Image | Gemini 2.5 Flash Image | One 720×1280 first frame per beat, content-mode-aware style. |
| Motion | ffmpeg zoompan (default) / Veo 3.1 Lite i2v (REEL_AF_USE_VEO=true) | Ken-burns animation of the still is free and ships by default. Flip the env var to Veo for real i2v motion at ~$0.05/sec of video. |
| Subtitles | libass + pysubs2 | Word-burst (one word at a time, 170px, bottom-center) + optional Layer-2 accent overlays in the opposite third of the frame. |
| Stitch | ffmpeg concat filter (single pass) | concat + libass burn + AAC mux in one invocation. Sample-accurate; no per-shot priming drift. |
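As an illustration of the default motion layer, the sketch below builds an ffmpeg command that animates one still with the `zoompan` filter. The project's actual filter graph is not shown in this README, so the zoom range, frame rate, clip length, and the helper name `ken_burns_command` are assumptions; only the use of ffmpeg zoompan and the 1080×1920 output size come from the text.

```python
def ken_burns_command(image_path, out_path, seconds=6.0, fps=25,
                      size=(1080, 1920), max_zoom=1.15):
    """Return an ffmpeg argv that slowly zooms into the centre of a still image."""
    frames = int(round(seconds * fps))
    step = (max_zoom - 1.0) / frames  # zoom increment per output frame
    width, height = size
    zoompan = (
        f"zoompan=z='min(zoom+{step:.6f},{max_zoom})'"
        ":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"  # keep the crop centred
        f":d={frames}:s={width}x{height}:fps={fps}"
    )
    return [
        "ffmpeg", "-y",
        "-i", image_path,
        "-vf", zoompan,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        out_path,
    ]


# Usage (requires ffmpeg on PATH):
#   import subprocess
#   subprocess.run(ken_burns_command("beat_01.png", "beat_01.mp4"), check=True)
```

Passing the command as an argument list avoids shell quoting problems with the filter expression.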

Cost and timing

Default config — ken-burns motion from generated first frames, OpenRouter list prices verified 2026-05:

| Path | Reasoners | Wall time | Cost / reel |
| --- | --- | --- | --- |
| article_to_reel (URL → reel) | 10 | ~70-90s | ~$0.08 |
| topic_to_reel (topic → reel) | 18 | ~85-110s | ~$0.10 |

The topic path is slightly slower and slightly pricier because of the 4-hunter → critic → 3-narrator → judge cascade. Cost split per reel:

| Stage | Pricing (OpenRouter list) | Cost / reel |
| --- | --- | --- |
| Gemini 2.5 Flash Image (first frames) | $0.30/M in, $2.50/M out | ~$0.02 |
| Gemini 3.1 Flash TTS (4-6 sentences) | $1/M in, $20/M out | ~$0.015 |
| DeepSeek V4 Pro reasoning (10-18 calls) | $0.435/M in, $0.87/M out | ~$0.02 |
| Ken-burns motion (local ffmpeg) | free | $0 |

Upgrade to full Veo i2v motion by setting REEL_AF_USE_VEO=true. Veo 3.1 Lite at $0.05/sec adds ~$1.10/reel (about 22s of generated video across the beats), bringing the total to ~$1.20/reel. Worth it for premium output; the ken-burns default is fine for high-volume runs.

Track actual numbers via the OpenRouter activity dashboard and the timings_s block in result.json.
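For quick budgeting, the figures in this section reduce to a small formula: base cost per path plus $0.05 per second of Veo video. The sketch below is a rough estimator, not the project's own accounting; the function names are hypothetical, and the 22 seconds of Veo video used in the usage comment is an assumption derived from the ~$1.10 figure (1.10 / 0.05).

```python
import math

BASE_COST_USD = {"article": 0.08, "topic": 0.10}  # default ken-burns config
VEO_USD_PER_SECOND = 0.05                          # Veo 3.1 Lite list price quoted above


def estimate_cost(path: str, veo_seconds: float = 0.0) -> float:
    """Approximate cost of one reel; veo_seconds=0 means ken-burns only."""
    return BASE_COST_USD[path] + VEO_USD_PER_SECOND * veo_seconds


def reels_for_budget(budget_usd: float, path: str = "topic", veo_seconds: float = 0.0) -> int:
    """How many whole reels a credit balance covers at the estimated cost."""
    return math.floor(budget_usd / estimate_cost(path, veo_seconds) + 1e-9)


# Usage:
#   estimate_cost("topic")                      # ken-burns default
#   estimate_cost("topic", veo_seconds=22)      # full Veo motion
#   reels_for_budget(5.0)                       # reels covered by $5 of credits
```

Actual spend depends on token counts and clip lengths, so treat the output as a ballpark and check the OpenRouter dashboard for real numbers.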


How it works

reels-af 6-phase two-path multi-reasoner pipeline

Two paths, six phases. Both converge on the same downstream after phase 02.

  1. Intake: article_to_reel runs one harness call to extract the surprising claim, mechanism, evidence, and content_mode. topic_to_reel fans out four hunters (specific_figure / reversal / temporal / cross_domain) → 12 candidates → critic picks the top 3 → 3 narrators write delayed-reveal scripts → pairwise judge picks the winner.
  2. Script: one .ai() call produces a ScriptDraft (Hook → Mechanism → Payoff + inline TTS tags). A schema validator requires the final clause to echo a hook keyword, which closes the loop back to the opening.
  3. Audio: sentences synthesize in parallel, are ffprobe-measured, sped up via atempo=1.35, then native-wave concatenated. Sentence boundaries are sample-accurate; words inside each sentence are distributed by syllable count. No ASR in the loop.
  4. Plan: two parallel deterministic helpers (cards for subtitle layout, beats for visual planning) and two parallel LLM fan-outs (per-beat image prompts, per-beat optional accents). Cards and beats render onto the same reel but don't gate each other, so there are no shot-too-long failures.
  5. Render: one first-frame image per beat (Gemini Flash Image), then either local ken-burns animation (default) or a Veo i2v call per beat (REEL_AF_USE_VEO=true). Per-beat fallback: image fail → placeholder; Veo fail → ken-burns.
  6. Stitch: one ffmpeg invocation: concat filter (sample-accurate) + libass burn (word-burst + accents) + AAC mux of the full TTS WAV. One encode, no priming drift.
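The loop-back check from the Script phase can be pictured as a keyword overlap test between the hook and the closing line. This is a minimal sketch, not the project's validator: the stopword list, the four-letter minimum, exact-word matching, and the names `keywords` and `echoes_hook` are all assumptions.

```python
import re

# Small illustrative stopword list; a real validator would likely use a fuller one.
STOPWORDS = {"what", "when", "where", "which", "have", "that", "this",
             "with", "from", "your", "does", "they", "their", "about"}


def keywords(text: str) -> set:
    """Lower-cased content words of at least four letters."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) >= 4 and w not in STOPWORDS}


def echoes_hook(hook: str, closing: str) -> bool:
    """True when the closing line reuses at least one hook keyword."""
    return bool(keywords(hook) & keywords(closing))


# Strings taken from the sample sidecar earlier in this README.
tease = "Why do we have fingerprints?"
payoff = "Fingerprints aren't for holding on. They're for feeling."
loop_closed = echoes_hook(tease, payoff)
```

Exact matching is deliberately strict here; a looser variant could compare word stems so that "fingerprint" also counts as an echo of "fingerprints".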

The architectural choice that earns the engagement: video is decoupled from word timing. Cards drive subtitles, beats drive visuals, audio is master.
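Because audio is the master clock, word timings can be derived from sentence durations alone. The sketch below shows one way to do that under stated assumptions: durations are measured before the atempo=1.35 speed-up, and syllables are approximated by counting vowel groups, which is a crude heuristic rather than the project's actual counter.

```python
import re

SPEED = 1.35  # atempo factor quoted in the Audio phase


def sentence_boundaries(measured_s, speed=SPEED):
    """Cumulative (start, end) per sentence after the speed-up is applied."""
    bounds, cursor = [], 0.0
    for duration in measured_s:
        end = cursor + duration / speed
        bounds.append((cursor, end))
        cursor = end
    return bounds


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups (at least one per word)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def word_timings(sentence: str, start_s: float, end_s: float):
    """Split a sentence's time span across its words in proportion to syllables."""
    words = sentence.split()
    weights = [count_syllables(w) for w in words]
    total = sum(weights)
    timings, cursor, seen = [], start_s, 0
    for word, weight in zip(words, weights):
        seen += weight
        word_end = start_s + (end_s - start_s) * seen / total
        timings.append((word, cursor, word_end))
        cursor = word_end
    return timings
```

Computing each word's end from the running syllable total, rather than adding per-word durations, keeps the last word ending exactly on the sentence boundary.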


Run it yourself

git clone https://github.com/Agent-Field/reels-af
cd reels-af
cp .env.example .env       # paste OPENROUTER_API_KEY
docker compose up --build

Open http://localhost:8080/ui/ to watch the 18-reasoner DAG execute live. Fire either of the curls from One-Call DX above — outputs land under ./output/<run-id>/reel.mp4 with a result.json sidecar.

Without Docker — local CLI:

uv sync
brew install ffmpeg            # macOS  (apt install ffmpeg fonts-montserrat on Linux)
cp .env.example .env

reel-af article "https://arxiv.org/abs/2509.25541"
reel-af topic   "the placebo effect"

Swap any model or flip on Veo motion with one env var:

REEL_AF_USE_VEO=true docker compose up --build
REEL_AF_MODEL=openrouter/anthropic/claude-sonnet-4 docker compose up --build

Try a few

# Topics — the hunter cascade finds an angle
reel-af topic "the placebo effect"
reel-af topic "the dark forest hypothesis"
reel-af topic "octopus cognition"
reel-af topic "the Antikythera mechanism"

# Articles — direct from the source
reel-af article "https://arxiv.org/abs/2509.25541"
reel-af article "https://en.wikipedia.org/wiki/Tardigrade"

Customize

Most behaviour is driven by environment variables; see .env.example for the full list.

| Env var | Default | What it controls |
| --- | --- | --- |
| OPENROUTER_API_KEY | (none) | Required. Get one at openrouter.ai and load $5+ in credits (~50 reels at default config). |
| REEL_AF_USE_VEO | false | Set to true for Veo 3.1 Lite i2v motion (~$1.10 extra per reel). Default ken-burns mode animates the generated stills locally. |
| REEL_AF_MODEL | openrouter/deepseek/deepseek-v4-pro | Reasoning model for every .ai() call. Any OpenRouter model works. |
| REEL_AF_TTS_MODEL | google/gemini-3.1-flash-tts-preview | TTS model. Gemini Flash is the only one supporting inline audio tags. |
| REEL_AF_IMAGE_MODEL | openrouter/google/gemini-2.5-flash-image | First-frame image generator. Swap for Flux, Imagen, etc. |
| REEL_AF_VIDEO_MODEL | openrouter/google/veo-3.1-lite | Veo model used when REEL_AF_USE_VEO=true. |
| AGENT_NODE_ID | reel-af | Node id registered with the AgentField control plane. |
| AGENTFIELD_SERVER | http://localhost:8080 | Control-plane URL (Docker compose wires this automatically). |
| AGENTFIELD_LLM_CALL_TIMEOUT | 120 | Per-call timeout in seconds. |
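For scripts that wrap the pipeline, the variables above can be read with their documented defaults as follows. This is a sketch of one possible loader, not the project's own configuration code; `load_settings` is a hypothetical name, and treating only the literal string "true" as enabling Veo is an assumption.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "REEL_AF_USE_VEO": "false",
    "REEL_AF_MODEL": "openrouter/deepseek/deepseek-v4-pro",
    "REEL_AF_TTS_MODEL": "google/gemini-3.1-flash-tts-preview",
    "REEL_AF_IMAGE_MODEL": "openrouter/google/gemini-2.5-flash-image",
    "REEL_AF_VIDEO_MODEL": "openrouter/google/veo-3.1-lite",
    "AGENT_NODE_ID": "reel-af",
    "AGENTFIELD_SERVER": "http://localhost:8080",
    "AGENTFIELD_LLM_CALL_TIMEOUT": "120",
}


def load_settings(env=None) -> dict:
    """Merge the environment over the defaults and coerce the typed values."""
    env = dict(os.environ) if env is None else env
    if not env.get("OPENROUTER_API_KEY"):
        raise RuntimeError("OPENROUTER_API_KEY not set in env.")
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings["OPENROUTER_API_KEY"] = env["OPENROUTER_API_KEY"]
    settings["REEL_AF_USE_VEO"] = settings["REEL_AF_USE_VEO"].strip().lower() == "true"
    settings["AGENTFIELD_LLM_CALL_TIMEOUT"] = float(settings["AGENTFIELD_LLM_CALL_TIMEOUT"])
    return settings
```

Passing a plain dict as `env` makes the loader easy to exercise without touching the real environment.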

Voice, pacing, and tone are picked in code (render/tts.py:_VOICE_BY_TONE and the _AUDIO_SPEED_FACTOR constant). Edit those to dial the delivery.


Troubleshooting

"OPENROUTER_API_KEY not set in env." — paste your key into .env. The Docker container reads it via docker-compose.yml; the CLI reads it via python-dotenv.

"ffmpeg / ffprobe not found on PATH." Install ffmpeg: brew install ffmpeg on macOS, apt install ffmpeg on Linux. The Docker build already includes it.

Subtitles look like sans-serif blocks instead of Montserrat. — install Montserrat Bold (brew install --cask font-montserrat on macOS, apt install fonts-montserrat on Linux). The renderer falls back to DejaVu Sans Bold if Montserrat isn't found.

A single beat's video came out as a still ken-burns instead of motion. — Veo i2v hit a content-moderation false positive or transient error on that beat. The pipeline's two-tier fallback rendered the beat as a still + slow zoom so the reel still assembles. Re-run that beat by re-running the whole reel, or accept the fallback (often visually fine).
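The two-tier fallback behind this behaviour can be sketched as below. The callables `generate_image`, `veo_clip`, and `ken_burns` are hypothetical stand-ins for the real render steps, and skipping Veo when the image step already failed is an assumption, not something this README confirms.

```python
def render_beat(beat_prompt, generate_image, veo_clip, ken_burns,
                use_veo=False, placeholder="placeholder.png"):
    """Render one beat, degrading gracefully instead of failing the whole reel.

    Returns (clip, mode) where mode is "veo" or "ken-burns".
    """
    # Tier 1: image fail -> placeholder frame.
    try:
        frame = generate_image(beat_prompt)
        image_ok = True
    except Exception:
        frame, image_ok = placeholder, False

    # Tier 2: Veo fail -> ken-burns on the same frame.
    if use_veo and image_ok:
        try:
            return veo_clip(frame), "veo"
        except Exception:
            pass  # e.g. moderation false positive or transient error

    return ken_burns(frame), "ken-burns"
```

Returning the mode alongside the clip makes it easy to record which beats fell back, so a mixed reel can be spotted from the run output.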

Reel runs longer than 25 seconds. — Gemini occasionally honors a stray [pause] or punctuation cluster too literally. Check result.json.timings_s.tts; if it's >20s, the narration likely picked up an extra tag. Re-run — temperature variance usually resolves it.

Custom font: drop a .ttf somewhere libass can see and update the candidate list in render/stitch.py:_FONT_CANDIDATES.


Features

Shipped:

  • Two entry reasoners — article_to_reel, topic_to_reel
  • 4-hunter angle-constrained essence generation + critic + pairwise judge
  • Delayed-reveal narration (tease → common_belief → reveal → payoff) with schema-level loop-back validation
  • Sample-accurate sentence-by-sentence TTS — no ASR, no Whisper
  • Word-burst karaoke (170px, bottom-center, libass)
  • Optional editorial accents — 6 patterns (number / named_entity / jargon_translation / hook_title_card / reaction / list_marker)
  • Ken-burns motion default + Veo 3.1 Lite i2v upgrade via env var
  • Two-tier per-beat fallback (image fail → placeholder; Veo fail → ken-burns)
  • Single-pass ffmpeg stitch (concat + libass + AAC)
  • Content-mode style switch (cinematic-doc / clinical-lab)
  • Docker compose stack with the AgentField control plane bundled
  • OpenRouter-only — no Whisper, no local models, no platform lock-in

In progress:

  • Voice cloning via OpenRouter-compatible TTS providers
  • B-roll insertion from a stock-footage retriever
  • Multi-language output (auto-translated script + native-voice TTS)
  • Real-time preview while reasoners are running
  • Direct publish to TikTok / Reels / Shorts via Buffer-style API

Acknowledgments

Built on the open-source work of:


License

Apache License 2.0 — see LICENSE.


Other projects on AgentField

