Releases: lcy362/agnes-video-generator

v2.2 — Image-to-Image End Frames + Stability Enhancements

19 Jun 03:59

Release v2.2 — Image-to-Image End Frames + Stability Enhancements

Release date: 2026-06-19

Overview

v2.2 introduces the i2i (Image-to-Image) end frame pipeline, enabling visual consistency across creative video scenes. This release also delivers comprehensive stability fixes from the second code review batch, a global rate limiter, unified API retry logic, and i18n improvements.


i2i End Frame Pipeline

Six-batch feature implementation for visual consistency across scenes:

  • Batch 1+2 — Image model unified to agnes-image-2.1-flash, i2i array API, character reference image size normalization
  • Batch 3 — Character appearance persistence across scenes, programmatic prompt injection
  • Batch 4 — Prompt structure optimization, facial detail requirements in character reference prompts
  • Batch 5 — Multi-image guided i2i end frames, visual chain linking across scenes
  • Batch 6 — Keyframes fallback branch synchronization, full 6/6 batches complete
  • Creative videos now default to i2i end frames enabled, narrator subtitles disabled

Stability & Bug Fixes

Code Review Batch 2 Fixes (P1-P13)

| ID | Fix |
| --- | --- |
| P1 | Video concatenation sync blocking → async |
| P2 | active_pipelines concurrent race condition |
| P3 | Custom end frame not applied |
| P4 | Manuscript step key alignment |
| P5 | chat_json robustness |
| P6 | Resource leaks |
| P7 | Parameter validation improvements |
| P8 | Prompt injection protection |
| P9 | SilentTTS return code handling |
| P10 | Subtitle silent degradation on failure |
| P11 | LLM retry logic |
| P12 | URL cache expiry |
| P13 | Temp filename uniqueness |
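
As an illustration of the P1 fix (sync blocking → async), the sketch below shows one common way to keep an asyncio event loop responsive while a slow, synchronous step runs: offloading it to a worker thread with `asyncio.to_thread`. The function names and the simulated workload are hypothetical; the project's concatenator may use a different mechanism.

```python
import asyncio
import time


def concatenate_blocking(clips: list[str]) -> str:
    """Hypothetical stand-in for a slow, synchronous concatenation step."""
    time.sleep(0.2)  # simulates blocking work such as an external encoder call
    return "+".join(clips)


async def concatenate_async(clips: list[str]) -> str:
    # Run the blocking function in a worker thread so the event loop stays free.
    return await asyncio.to_thread(concatenate_blocking, clips)


async def heartbeat(ticks: list[float]) -> None:
    # Records timestamps while concatenation runs; this only makes progress
    # if the event loop is not blocked.
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def main() -> tuple[str, int]:
    ticks: list[float] = []
    result, _ = await asyncio.gather(
        concatenate_async(["a.mp4", "b.mp4"]),
        heartbeat(ticks),
    )
    return result, len(ticks)
```

Running `asyncio.run(main())` completes both coroutines concurrently; calling `concatenate_blocking` directly inside a coroutine would instead stall every other task for the duration of the call.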

Other Fixes

  • Resume crash: _upload_image_to_host method name error, _run_pipeline task_id undefined, load() creating empty directories
  • Concatenator AttributeError — video concatenation failure path
  • Global rate limiter — token bucket (16 req/min) shared across Chat + Image + Video APIs
  • API retry — exponential backoff for 429/5xx errors across all three API modules
  • Regression runner — 404 polling detection, --quick manifest mode, resume enhancements
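
The global rate limiter above is described as a token bucket shared by the Chat, Image, and Video APIs at 16 requests per minute. A minimal sketch of that technique follows. The class name, the burst capacity of 16, and the polling interval are assumptions for illustration, not the contents of core/api/rate_limiter.py.

```python
import threading
import time


class TokenBucket:
    """Thread-safe token bucket (illustrative sketch, not the project's code)."""

    def __init__(self, rate_per_min: float = 16, capacity: float = 16, clock=time.monotonic):
        self.rate = rate_per_min / 60.0  # tokens added per second
        self.capacity = capacity         # assumed burst size
        self.tokens = capacity
        self.clock = clock
        self.updated = clock()
        self.lock = threading.Lock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last update, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False instead of blocking."""
        with self.lock:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def acquire(self) -> None:
        """Block (by polling) until a token is available."""
        while not self.try_acquire():
            time.sleep(0.05)


# A single shared instance would be used by all three API clients,
# so the 16 req/min budget applies to their combined traffic.
GLOBAL_LIMITER = TokenBucket()
```

Each API wrapper would call `GLOBAL_LIMITER.acquire()` before sending a request. Sharing one bucket, rather than one per module, is what makes the limit global.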

i18n Improvements

  • Duration parsing now supports all 7 languages (zh/en/ru/ja/ko/ms/id)
  • User requirements and visual style defaults localized per language
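
To illustrate what duration parsing across the seven languages can involve, here is a rough sketch that extracts a duration in seconds from free text. The unit vocabulary, the function name, and the choice to sum multiple matches are all assumptions. The project's own parser appears to live in the frontend (static/index.html) and may work differently.

```python
import re

# Unit words per language (zh/en/ru/ja/ko/ms/id). This vocabulary is an
# assumption for illustration and is not taken from the project.
_SECOND_UNITS = ["seconds", "second", "secs", "sec", "s",
                 "秒", "секунд", "сек", "초", "saat", "detik"]
_MINUTE_UNITS = ["minutes", "minute", "mins", "min",
                 "分钟", "分", "минут", "мин", "분", "minit", "menit"]

_UNITS = {u: 1 for u in _SECOND_UNITS} | {u: 60 for u in _MINUTE_UNITS}

# Longest units first so "seconds" wins over "s" and "分钟" over "分".
# The lookahead stops "s" from matching the start of an unrelated Latin word.
_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*("
    + "|".join(sorted(map(re.escape, _UNITS), key=len, reverse=True))
    + r")(?![a-z])",
    re.IGNORECASE,
)


def parse_duration_seconds(text: str):
    """Return the total duration in whole seconds, or None if none is found."""
    total = 0.0
    found = False
    for number, unit in _PATTERN.findall(text):
        total += float(number) * _UNITS[unit.lower()]
        found = True
    return round(total) if found else None
```

Matching on unit stems such as "минут" lets inflected forms ("минуты") parse without listing every ending.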

Documentation

  • docs/bug_fix_plan.md — comprehensive bug fix plan (added)
  • docs/regression_test_plan.md — updated scenarios and flow rules
  • AGENTS.md — synced rate limiter architecture, runner resume strategy
  • docs/release-notes/ — v2.0 and v2.1 release notes (added)
  • Fixed the official website link label (it no longer reads "Live Demo")

Stats

23 files changed, 1,189 insertions(+), 479 deletions(-)

Key Files

| File | Description |
| --- | --- |
| core/api/rate_limiter.py | New: global token bucket rate limiter |
| core/api/agnes_chat.py | LLM retry + JSON mode improvements |
| core/api/agnes_image.py | i2i array API + ref image support |
| core/api/agnes_video.py | Retry logic + 429 handling |
| core/pipelines/creative_video.py | i2i end frame pipeline integration |
| core/screenwriter.py | Character appearance persistence |
| core/compositor/concatenator.py | Async refactor + bug fixes |
| core/task_manager.py | Resume crash fixes + backward compat |
| server.py | Rate limiter integration + endpoint fixes |
| static/index.html | i18n duration parse + style defaults |
| scripts/regression_runner.py | Resume + quick-verify enhancements |
| docs/bug_fix_plan.md | New: bug fix tracking |
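
The retry logic and 429 handling listed for the API modules is described elsewhere in these notes as exponential backoff for 429/5xx errors. A minimal sketch of that pattern follows. `ApiError`, `call_with_backoff`, the attempt count, the delays, and the 10% jitter are hypothetical choices, not the project's actual values.

```python
import random
import time

# Status codes treated as transient: rate limiting plus all server errors.
RETRYABLE_STATUS = {429} | set(range(500, 600))


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call func(), retrying on 429/5xx with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except ApiError as exc:
            last_attempt = attempt == max_attempts - 1
            # Non-retryable errors (e.g. 400) and exhausted budgets propagate.
            if exc.status not in RETRYABLE_STATUS or last_attempt:
                raise
            # Delay doubles each attempt: 1s, 2s, 4s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The small random jitter spreads out retries from concurrent tasks so they do not all hit the API again at the same instant.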

Upgrade Notes

From v2.1:

git pull
./start.sh

2.1 version release

16 Jun 10:46

Release v2.1 — Code Review Fixes + Regression Test Framework + Quality Improvements

Release date: 2026-06-16

Overview

v2.1 focuses on code quality and engineering robustness. All 24 issues from the full code review have been fixed, and an automated regression test framework has been introduced to ensure long-term stability.


Code Review Fixes

Based on docs/code_review_report.md, all 24 issues resolved:

High Severity (H1-H6)

  • H1 — API Key hardcoded in agnes_chat.py → unified read from config.py
  • H2 — Path traversal in server.py file upload → safe path join with os.path.basename
  • H3 — Missing font fallback in concatenator.py subtitle overlay → resolve_font_path CJK fallback
  • H4 — Shell injection in processor.py → list arguments instead of shell=True
  • H5 — moviepy write_videofile log leakage in subtitle.py → redirect to devnull
  • H6 — JSON parse failure in screenwriter.py → LLM retry with fallback parsing
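
As an example of the H2 fix, the sketch below shows a safe path join built around `os.path.basename`, with an extra containment check. The function name and the additional checks are assumptions; the actual code in server.py may be simpler.

```python
import os


def safe_upload_path(upload_dir: str, client_filename: str) -> str:
    """Return a path inside upload_dir for a client-supplied filename."""
    # Normalize Windows separators so basename also strips "..\\" prefixes.
    name = os.path.basename(client_filename.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError("invalid filename")
    root = os.path.realpath(upload_dir)
    target = os.path.realpath(os.path.join(root, name))
    # Defense in depth: the resolved path must still sit under the upload root.
    if os.path.commonpath([root, target]) != root:
        raise ValueError("path escapes upload directory")
    return target
```

With this approach a filename such as `../../etc/passwd` is reduced to `passwd` and stored inside the upload directory rather than being resolved relative to it.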

Medium Severity (M1-M10)

  • Index / bounds safety (M1-M3)
  • Overly broad exception handling → granular catch (M4-M5)
  • Task directory path normalization (M6)
  • Unified HTTP timeouts (M7)
  • Task state race condition (M8)
  • TTS file handle leak (M9)
  • Frontend i18n variable shadowing (M10)

Low Severity (L1-L8)

  • Automated unit test framework (L1)
  • Typo fixes (L2-L3)
  • Redundant documentation cleanup (L4-L5)
  • AGENTS.md alignment with code (L6)
  • Dead file cleanup (L7-L8)

Regression Test Framework

  • Concurrent execution of 9 scenarios (3 simple + 4 creative + 2 manuscript)
  • Weighted semaphore for parallelism control (total weight ≤ 10, 50% API headroom)
  • Incremental JSON report + Markdown readable report
  • Resume / quick-verify modes
  • --cleanup safe artifact removal
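
The weighted semaphore mentioned above can be sketched with `asyncio.Condition`: each scenario acquires a weight, and the combined weight of running scenarios never exceeds the cap of 10. The class, the helper function, and the per-scenario weights used in the usage note are hypothetical; only the cap comes from these notes.

```python
import asyncio


class WeightedSemaphore:
    """Async semaphore where each holder consumes a caller-chosen weight."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.in_use = 0
        self._cond = asyncio.Condition()

    async def acquire(self, weight: int) -> None:
        if weight > self.capacity:
            raise ValueError("weight exceeds total capacity")
        async with self._cond:
            # Wait until adding this weight keeps the total within capacity.
            await self._cond.wait_for(lambda: self.in_use + weight <= self.capacity)
            self.in_use += weight

    async def release(self, weight: int) -> None:
        async with self._cond:
            self.in_use -= weight
            self._cond.notify_all()


async def run_scenarios(scenarios: dict[str, int], capacity: int = 10) -> int:
    """Run all scenarios concurrently; return the peak weight observed."""
    sem = WeightedSemaphore(capacity)
    peak = 0

    async def run_one(name: str, weight: int) -> None:
        nonlocal peak
        await sem.acquire(weight)
        try:
            peak = max(peak, sem.in_use)
            await asyncio.sleep(0.01)  # placeholder for the real scenario
        finally:
            await sem.release(weight)

    await asyncio.gather(*(run_one(n, w) for n, w in scenarios.items()))
    return peak
```

For instance, with assumed weights of 1 per simple, 3 per creative, and 2 per manuscript scenario, the 9 scenarios total 19, so some wait until earlier ones release their share.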

Endpoint Verification (E1-E9)

All 9 endpoints are auto-verified, including the homepage, config, the three task creation endpoints, task query, resume, and stop.

Artifact Verification (F1-F7, R1-R10)

  • final_video.mp4 existence + non-empty + duration + resolution
  • Audio track + whisper ASR speech content matching
  • SRT subtitle entry validation
  • Resume checkpoint completeness

Other Improvements

  • Subtitle multi-line wrapping — dynamic max_chars_per_line, CJK punctuation break priority, method="caption" rendering
  • TTS — auto 2.5x volume boost, edge case error handling
  • Concatenator — single-video shortcut optimization, subtitle overlay failure degradation (non-blocking)
  • start.sh — auto venv creation, dependency install, macOS browser auto-open
  • Requirements — pinned edge_tts>=7.0.0, srt>=3.5.0, moviepy>=2.0.0
  • Config — API Key clear functionality, enhanced font path fallback
  • Static analysis integration — each Taskfile includes ruff + mypy checks
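
The subtitle wrapping item above combines a dynamic per-line character limit with a preference for breaking at CJK punctuation. A rough sketch of both ideas follows. The function names, the 90% usable-width rule, and the punctuation set are assumptions for illustration only.

```python
# Punctuation marks preferred as break points (assumed set).
CJK_BREAK_CHARS = ",。!?;:、"


def max_chars_for_width(video_width: int, font_size: int) -> int:
    """Estimate characters per line, assuming square glyphs and 90% usable width."""
    usable = video_width * 9 // 10
    return max(1, usable // font_size)


def wrap_subtitle(text: str, max_chars: int) -> list[str]:
    """Split text into lines of at most max_chars, preferring punctuation breaks."""
    lines = []
    rest = text.strip()
    while len(rest) > max_chars:
        window = rest[:max_chars]
        # Break just after the last punctuation mark inside the window, if any.
        cut = max(window.rfind(ch) for ch in CJK_BREAK_CHARS) + 1
        if cut <= 0:
            cut = max_chars  # no punctuation found: hard break at the limit
        lines.append(rest[:cut])
        rest = rest[cut:].lstrip()
    if rest:
        lines.append(rest)
    return lines
```

Breaking after punctuation keeps clauses intact on one line, which tends to read better than a hard cut in the middle of a phrase.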

Stats

26 files changed, 1,611 insertions(+), 235 deletions(-)

New / Deleted Files

| File | Action | Description |
| --- | --- | --- |
| docs/code_review_report.md | +added | 24 code review issues documented |
| docs/release-notes/release_notes_v2.0.md | +added | v2.0 release notes |
| docs/release-notes/release_notes_v2.1.md | +added | v2.1 release notes |
| tests/test_core.py | +added | 428-line automated unit test suite |
| test_ref.png / test_end.png | +added | Regression test assets |
| _test_reset.py | -deleted | Deprecated test script |
| start.sh | refactored | One-click startup with auto venv + deps + browser |

Upgrade Notes

From v2.0:

git pull
.venv/bin/pip install -r requirements.txt
./start.sh

Run regression tests:

.venv/bin/python scripts/regression_runner.py --auto-start

2.0 version release

16 Jun 10:46

Release v2.0 — Three-Pipeline Architecture + Multilingual Web UI

Release date: 2026-06-15

Overview

v2.0 is a complete architectural refactor from a single-file script to an engineered application with three distinct video generation pipelines, a four-layer backend, WebSocket real-time progress, and a 7-language internationalized frontend.


Features

Three Task Types

  • Simple Video — Single prompt → single video, exposing all 9 Agnes API parameters (t2v/i2v/ti2vid/keyframes)
  • Creative Video — AI screenwriter → storyboards → per-scene videos → edge_tts narration → fine-grained subtitles → concatenation
  • Manuscript Video — Long text splitting → AI scene prompt → per-paragraph videos → unified TTS+subtitles → concatenation

Architecture

  • core/api/ — Agnes Chat / Image / Video API wrappers with retry and polling
  • core/audio/ — edge_tts engine (word-level timestamps) + SRT subtitle generation + moviepy overlay
  • core/compositor/ — Video concatenation, scaling, frame extraction, silent audio generation
  • core/pipelines/ — Three pipeline implementations (simple / creative / manuscript)
  • models/ — Pydantic v2 data models with persistent task state serialization

Web UI

  • Three-tab frontend (Simple / Creative / Manuscript), Tailwind CDN single-page
  • 7 languages: 中文 / English / Русский / 日本語 / 한국어 / Bahasa Melayu / Bahasa Indonesia
  • WebSocket real-time progress push
  • Task pause, resume, and stop

Subtitle System

  • edge_tts word-level timestamps → fine-grained SRT grouping
  • CJK multi-line wrapping (break at punctuation)
  • method="caption" rendering, supports stroke / background / position customization
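
To make the first step of the subtitle system concrete, the sketch below groups word-level timings into cues and formats them as SRT text. The `Word` type, the grouping thresholds, and the space-joined cue text are assumptions. The project works from edge_tts timestamps and may group differently, especially for CJK text, which is normally joined without spaces.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def group_words(words, max_chars=20, max_gap=0.6):
    """Group word timings into cues, splitting on length or long pauses."""
    cues, current = [], []
    for word in words:
        if current:
            # Length of the cue if this word were appended with a space.
            length = sum(len(w.text) for w in current) + len(current) + len(word.text)
            gap = word.start - current[-1].end
            if length > max_chars or gap > max_gap:
                cues.append(current)
                current = []
        current.append(word)
    if current:
        cues.append(current)
    return cues


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues) -> str:
    """Render grouped cues as SRT text with 1-based indices."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        start, end = format_timestamp(cue[0].start), format_timestamp(cue[-1].end)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

Each cue starts at its first word and ends at its last, so subtitles track the narration closely instead of spanning a whole sentence.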

Other

  • One-click startup script start.sh
  • docs/system_design.md system design document
  • 3 demo videos embedded in README

Stats

40 files changed, 11,268 insertions(+), 2,792 deletions(-)

New Files

| File | Description |
| --- | --- |
| core/pipelines/ | Three pipeline types (simple / creative / manuscript) |
| core/api/ | Agnes API wrapper layer |
| core/audio/ | TTS + subtitle engine |
| core/compositor/ | Video compositing / processing |
| models/task.py | Three task subtype data models |
| scripts/regression_runner.py | Regression test script |
| docs/system_design.md | System design document |
| docs/regression_test_plan.md | Test plan |

Upgrade Notes

  • Python 3.10+ required
  • New dependencies: edge_tts>=6.1.0, srt>=3.5.0
  • Run ./start.sh for one-click startup, or .venv/bin/pip install -r requirements.txt && .venv/bin/python server.py