Assembly: Agentic Narratives

This is a barebones setup with basic scaffolding for agentic narrative explorations with continuous, semi-autonomous multimedia content generation. The prompts are intentionally minimal and should be enhanced and adapted to specific creative tasks.

The system uses LLM agents to orchestrate narrative generation, which feeds into a pipeline for image, video, and audio synthesis. Visual generation uses Runware.ai cloud service for its optimal price/performance ratio. Audio generation uses Chatterbox text-to-speech and MMAudio video-to-audio methods (code included in this repo).

Backends

Agent Types

  • LMStudio Agent (agt_llm.py) - OpenAI-compatible API for local models (Llama, Qwen, Mistral, etc.) via LMStudio or cloud (GPT-5, GPT-4o-mini; API key required). Supports custom tool calling with automatic malformed JSON recovery.
  • Google ADK Agent (agt_adk.py) - Native Gemini integration via Google's Agent Development Kit (API key required). Supports Google Search as a built-in tool.
  • Claude SDK Agent (agt_claude.py) - Native Claude integration via Anthropic API (key required). Supports custom tool calling.
  • LangChain/LangGraph Agent (agt_lang.py) - LangChain implementation with optional graph-based execution on LangGraph (API key required). Supports custom tool calling.
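As a sketch of what the LMStudio backend's transport looks like, a minimal OpenAI-compatible chat request can be built with the standard library alone. The endpoint path and port below are LMStudio defaults; the function names are illustrative and not taken from agt_llm.py:

```python
import json
import urllib.request

def build_request(host: str, model: str, messages: list) -> urllib.request.Request:
    """Build a chat-completion request for an OpenAI-compatible server."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"http://{host}:1234/v1/chat/completions",  # LMStudio's default port
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def chat(host: str, model: str, prompt: str) -> str:
    """Send one user prompt and return the assistant's reply text."""
    req = build_request(host, model, [{"role": "user", "content": prompt}])
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

The same request shape works against the OpenAI cloud endpoint with an Authorization header added, which is why one agent class can cover both local and cloud models.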

Media Generation

  • Text-to-Image (T2I) via Runware API (cloud, API key required) - Flux, Imagen, GPT, NanoBanana, HiDream, etc.
  • Image-to-Video (I2V) via Runware API (cloud, API key required) - Seedance, Kling, Veo, and others
  • Text-to-Speech (TTS) via ChatterBox (local) - Multi-speaker neural TTS with voice cloning
  • Audio Mixing via MMAudio (local) - Video-to-audio generation

Internal Features

Async Processing (base.py)
All time-consuming operations (LLM calls, TTS, image/video generation, audio mixing) run asynchronously, enabling concurrent execution and a responsive pipeline flow.
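The concurrency pattern can be sketched with asyncio; the function names below are placeholders for the real generation calls, not code from base.py:

```python
import asyncio

async def gen_image(prompt: str) -> str:
    await asyncio.sleep(0.01)          # stands in for a Runware API call
    return f"img:{prompt}"

async def gen_speech(text: str) -> str:
    await asyncio.sleep(0.01)          # stands in for a local TTS run
    return f"wav:{text}"

async def make_scene(prompt: str, script: str) -> list:
    # Both slow operations start immediately and complete concurrently
    return await asyncio.gather(gen_image(prompt), gen_speech(script))

image, audio = asyncio.run(make_scene("a foggy harbor", "The ship left at dawn."))
```

With real network and GPU latencies, gathering independent tasks like this is what keeps the pipeline moving while media renders.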

State & Context Management (base.py, author.py, chat.py)
Centralized state with JSON persistence serves as working memory. The context for each agent call is selectively composed from the relevant portions of state, keeping it compact and up to date for the task at hand.
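A minimal sketch of this pattern; class and method names here are illustrative, and the real implementation lives in base.py:

```python
import json
from pathlib import Path

class StoryState:
    """Centralized working memory, persisted to JSON after every change."""

    def __init__(self, path: str):
        self.path = Path(path)
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def update(self, key: str, value):
        self.data[key] = value
        self.path.write_text(json.dumps(self.data, indent=2))  # persist immediately

    def context_for(self, *keys) -> dict:
        """Compose a call-specific context from only the relevant state slices."""
        return {k: self.data[k] for k in keys if k in self.data}
```

Because every update is written through to disk, any step can be resumed from the JSON file, and each agent call sees only the slices it needs rather than the whole accumulated state.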

QA Evaluation Loop (--evals N)
Each agent call can be followed by an evaluation pass using a separate LLM call. The evaluator scores the output and provides feedback; if the output is not approved, the agent retries with the feedback incorporated. This significantly improves output quality at the cost of additional API calls.
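The shape of that loop can be sketched as follows; the signatures are illustrative, not the actual interface behind the --evals flag:

```python
def run_with_evals(agent, evaluator, prompt: str, evals: int = 2) -> str:
    """Generate, score with a separate LLM pass, retry with feedback if rejected."""
    feedback, output = "", ""
    for _ in range(evals + 1):
        # Fold the previous critique back into the prompt on retries
        full = prompt + (f"\n\nReviewer feedback: {feedback}" if feedback else "")
        output = agent(full)
        approved, feedback = evaluator(output)  # e.g. (score >= threshold, critique)
        if approved:
            break
    return output
```

The key design point is that the evaluator is a second, independent LLM call, so it is not biased toward approving its own generation.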

Built-in Agentic Tools (tools.py)

  • search - Web search via Brave Search API for grounding with current information
  • fetch_url_content - URL fetching with automatic HTML-to-markdown conversion
  • @tool decorator - Auto-generates OpenAI-compatible tool schemas from function signatures and docstrings
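The @tool idea can be sketched in a few lines: the JSON schema that OpenAI-style tool calling expects is derived from the function signature and docstring. This is a simplified reimplementation for illustration, not the code in tools.py:

```python
import inspect

def tool(fn):
    """Derive an OpenAI-style tool schema from a function's signature and docstring."""
    py_to_json = {int: "integer", float: "number", str: "string", bool: "boolean"}
    params = {
        name: {"type": py_to_json.get(p.annotation, "string")}
        for name, p in inspect.signature(fn).parameters.items()
    }
    fn.schema = {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": params,
                "required": list(params),
            },
        },
    }
    return fn

@tool
def search(query: str, count: int) -> str:
    """Web search via Brave Search API."""
    ...
```

Keeping the schema next to the function means adding a new agent tool is just writing a well-annotated Python function.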

Sequential Thinking Tool (tool_think.py)
Provides structured chain-of-thought reasoning for smaller LLMs that lack native reasoning capabilities. The tool exposes a sequential_thinking function that tracks thought sequences, supports branching (exploring alternatives) and revision (reconsidering previous thoughts). The tool auto-manages numbering since small models often lose track of sequence counts.
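Its core bookkeeping can be sketched as follows; the class and field names are assumptions for illustration, and tool_think.py defines the real interface:

```python
class ThoughtTracker:
    """Tracks a chain of thoughts; numbering is managed here, not by the model."""

    def __init__(self):
        self.thoughts = []

    def sequential_thinking(self, thought: str, revises: int = None,
                            branch_from: int = None) -> dict:
        entry = {
            "number": len(self.thoughts) + 1,  # auto-numbered: small models lose count
            "thought": thought,
            "revises": revises,                # number of a thought being reconsidered
            "branch_from": branch_from,        # number this alternative departs from
        }
        self.thoughts.append(entry)
        return entry
```

Because the tracker owns the counter, the model only ever supplies the thought text and optional back-references, which is exactly the part small models handle well.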

Malformed JSON Recovery (agt_llm.py)
The LMStudio agent automatically recovers from malformed tool calls - common with smaller models that struggle with strict JSON formatting. Invalid responses trigger automatic retries with corrected parsing.
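A simplified version of that recovery path; the actual heuristics in agt_llm.py may differ:

```python
import json
import re

def parse_tool_call(raw: str):
    """Parse a tool-call payload, recovering from common small-model slips."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Strip markdown fences the model may have wrapped around the JSON
    cleaned = re.sub(r"^```(?:json)?|```$", "", raw.strip(), flags=re.MULTILINE).strip()
    # Fall back to the outermost {...} span in the response
    match = re.search(r"\{.*\}", cleaned, flags=re.DOTALL)
    if match:
        cleaned = match.group(0)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None  # caller re-prompts the model instead of crashing
```

Returning None rather than raising lets the agent loop treat a garbled tool call like a failed evaluation: re-ask the model instead of aborting the run.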

Workflows

Story Mode

# Text-only generation
python src/author.py -txt ...

# Generation with sound and visuals
python src/author.py -vt runware -imod flux -vmod seedprof -iref 8 ...

# Resume from saved state and config
python src/author.py -json _out/log.json -arg _out/config.txt 

Instructions for hierarchical storytelling content generation (see data/prompts/story):

  • arch-init - Initialize story structure, settings, and characters (+ auto-generate reference images for visual consistency)
  • writ-chap - Generate chapter outlines from the overall narrative arc
  • writ-scen - Create detailed scenes within each chapter
  • writ-frag - Write narrative fragments (the actual prose)
  • writ-voc - Generate voice-over scripts (+ auto-convert to speech via TTS)
  • writ-vis - Generate visual descriptions (+ auto-create images (T2I), animate to video (I2V), mix with audio)
  • arch-upd - Update global story state based on generated content

Chat Mode

# Text-only generation, fixed personas
python src/chat.py -txt -pers fix ...

# Generation with sound and visuals, evolving personas
python src/chat.py -pers evo -vt runware -t2v -vmod pruna ...

# Resume from saved state and config
python src/chat.py -json _out/log.json -arg _out/config.txt

Instructions for linear multi-persona discussions (see data/prompts/chat):

  • arch-init - Create debate personas with distinct perspectives, styles, and voice profiles (+ auto-generate ref images)
  • writ-frag - Generate each persona's elaborated inner thoughts for their turn
  • writ-voc - Condense thoughts into short spoken remarks (+ auto-convert to speech via TTS)
  • writ-vis - Generate visual representation of the current speaker/moment (+ auto-create images/videos with audio)
  • arch-upd - Update persona states and debate direction based on progress
  • curat-dir-anti - Analyze debate and select provocative direction
  • curat-pers-anti - Select or create persona for controversial position
  • curat-dir-evo - Analyze debate and forecast next thought direction
  • curat-pers-evo - Select or create persona for thought direction

See JSON schemas in data folder for complete structure. Both workflows save state to log.json after each step, enabling resume from any point.

Other Usage

# Use local LLM via LMStudio
... -a lms -tmod openai/gpt-oss-20b -lh localhost

# Use OpenAI cloud
... -a lms -tmod gpt-4o-mini

# Use Gemini via Google ADK
... -a adk -tmod gemini-2.5-flash

# Use Claude via Anthropic SDK
... -a claude -tmod claude-sonnet-4-6

# Use LangChain/LangGraph
... -a langchain -tmod gpt-4o-mini

# Extract discussion transcript
python src/readlog.py -i _out/log.json -o discussion.txt

Key Arguments

  • -t/--in_txt - topic starter (text or filename)

  • -docs - optional documents to build the narrative on

  • -a/--agent - Agent type: lms (LMStudio/OpenAI), adk (Google ADK), claude (Anthropic), langchain (LangChain/LangGraph)

  • -tmod/--txt_model - LLM model name

  • -lh/--llm_host - LMStudio server host

  • -json/--load_json - Resume from JSON state file

  • -vt/--vis_type - Visual backend: runware, comfy, wan, walk

  • -imod/--img_model - Image model (e.g., flux for Flux 2 Pro)

  • -vmod/--vid_model - Video model (e.g., seedprof for Seedance Pro Fast)

  • -isz/--img_size - Image dimensions (e.g., 1344-752)

  • -vsz/--vid_size - Video dimensions (e.g., 864-480)

  • -fps - Video framerate

  • -iref/--img_refs - Number of reference images for consistency

  • --use_thinking - Enable extended thinking (Claude backend)

  • --use_graph - Use LangGraph execution (LangChain backend)

  • --db_url - Database URL for persistent sessions (ADK/LangGraph backends)

  • -txt/--txtonly - Text-only mode, skip media

  • -o/--out_dir - Output directory (default: _out)

  • -v/--verbose - Verbose output
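The flags above map naturally onto argparse; a trimmed sketch covering only flags listed in this README, with assumed defaults (the real parser in the repo certainly defines more):

```python
import argparse

def make_parser() -> argparse.ArgumentParser:
    """Trimmed sketch of the CLI surface described above."""
    p = argparse.ArgumentParser(prog="author.py")
    p.add_argument("-t", "--in_txt", help="topic starter (text or filename)")
    p.add_argument("-a", "--agent", default="lms",
                   choices=["lms", "adk", "claude", "langchain"])
    p.add_argument("-tmod", "--txt_model", help="LLM model name")
    p.add_argument("-vt", "--vis_type", choices=["runware", "comfy", "wan", "walk"])
    p.add_argument("-json", "--load_json", help="resume from JSON state file")
    p.add_argument("-txt", "--txtonly", action="store_true", help="skip media")
    p.add_argument("-o", "--out_dir", default="_out")
    return p
```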

Files

assembly/
├── src/                    # Core library
│   ├── author.py           # Story generation pipeline
│   ├── chat.py             # Interactive chat/debate pipeline
│   ├── base.py             # Infrastructure (MediaGen, StoryState, PromptMan, etc.)
│   ├── agt_llm.py          # LMStudio/OpenAI agent with tool calling
│   ├── agt_adk.py          # Google ADK (Gemini) agent
│   ├── agt_claude.py       # Anthropic Claude agent
│   ├── agt_lang.py         # LangChain/LangGraph agent
│   ├── tools.py            # Tool system
│   ├── tool_think.py       # Sequential thinking tool for small LLMs
│   ├── api_runware.py      # Runware cloud integration (T2I, I2V)
│   ├── sound.py            # Audio mixing (MMAudio)
│   ├── tts.py              # Text-to-speech (ChatterBox)
│   ├── readlog.py          # Extract discussion from log.json
│   └── util.py             # Utilities
├── _in/                    # Input documents (source materials)
├── _out/                   # Generated outputs
│   └── join_vids.bat       # FFmpeg video concatenation
├── data/                   # JSON schemas, templates, databases, etc.
│   └── prompts/            # Markdown prompt templates
│       ├── story/          # Story mode prompts
│       └── chat/           # Chat/debate mode prompts
└── au.bat                  # Base wrapper script
