Assembly: Agentic Narratives

This is a barebones setup with basic scaffolding for agentic narrative explorations with continuous, semi-autonomous multimedia content generation. The prompts are intentionally minimal and should be enhanced and adapted to specific creative tasks.

The system uses LLM agents to orchestrate narrative generation, which feeds into a pipeline for image, video, and audio synthesis. Visual generation uses Runware.ai cloud service for its optimal price/performance ratio. Audio generation uses Chatterbox text-to-speech and MMAudio video-to-audio methods (code included in this repo).

Backends

Agent Types

  • LMStudio Agent (agt_llm.py) - OpenAI-compatible API for local models (Llama, Qwen, Mistral, etc.) via LMStudio or cloud (GPT-5, GPT-4o-mini; API key required). Supports custom tool calling with automatic malformed JSON recovery.
  • Google ADK Agent (agt_adk.py) - Native Gemini integration via Google's Agent Development Kit (API key required). Supports Google Search as a built-in tool.
  • Claude SDK Agent (agt_claude.py) - Native Claude integration via Anthropic API (key required). Supports custom tool calling.
  • LangChain/LangGraph Agent (agt_lang.py) - LangChain implementation with optional graph-based execution on LangGraph (API key required). Supports custom tool calling.
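As a sketch of what the LMStudio backend's transport looks like, a minimal OpenAI-compatible chat request can be built with the standard library alone. The endpoint path and port below are LMStudio defaults; the function names are illustrative and not taken from agt_llm.py:

```python
import json
import urllib.request

def build_request(host: str, model: str, messages: list) -> urllib.request.Request:
    """Build a chat-completion request for an OpenAI-compatible server."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"http://{host}:1234/v1/chat/completions",  # LMStudio's default port
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def chat(host: str, model: str, prompt: str) -> str:
    """Send one user prompt and return the assistant's reply text."""
    req = build_request(host, model, [{"role": "user", "content": prompt}])
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

The same request shape works against the OpenAI cloud endpoint with an Authorization header added, which is why one agent class can cover both local and cloud models.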

Media Generation

  • Text-to-Image (T2I) via Runware API (cloud, API key required) - Flux, Imagen, GPT, NanoBanana, HiDream, etc.
  • Image-to-Video (I2V) via Runware API (cloud, API key required) - Seedance, Kling, Veo, and others
  • Text-to-Speech (TTS) via ChatterBox (local) - Multi-speaker neural TTS with voice cloning
  • Audio Mixing via MMAudio (local) - Video-to-audio generation

Internal Features

Async Processing (base.py)
All time-consuming operations (LLM calls, TTS, image/video generation, audio mixing) run asynchronously, enabling concurrent execution and a responsive pipeline flow.
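The concurrency pattern can be sketched with asyncio; the function names below are placeholders for the real generation calls, not code from base.py:

```python
import asyncio

async def gen_image(prompt: str) -> str:
    await asyncio.sleep(0.01)          # stands in for a Runware API call
    return f"img:{prompt}"

async def gen_speech(text: str) -> str:
    await asyncio.sleep(0.01)          # stands in for a local TTS run
    return f"wav:{text}"

async def make_scene(prompt: str, script: str) -> list:
    # Both slow operations start immediately and complete concurrently
    return await asyncio.gather(gen_image(prompt), gen_speech(script))

image, audio = asyncio.run(make_scene("a foggy harbor", "The ship left at dawn."))
```

With real network and GPU latencies, gathering independent tasks like this is what keeps the pipeline moving while media renders.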

State & Context Management (base.py, author.py, chat.py)
Centralized state with JSON persistence serves as working memory. The context for each agent call is selectively composed from the relevant portions of state, keeping it compact and up to date for the task at hand.
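A minimal sketch of this pattern; class and method names here are illustrative, and the real implementation lives in base.py:

```python
import json
from pathlib import Path

class StoryState:
    """Centralized working memory, persisted to JSON after every change."""

    def __init__(self, path: str):
        self.path = Path(path)
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def update(self, key: str, value):
        self.data[key] = value
        self.path.write_text(json.dumps(self.data, indent=2))  # persist immediately

    def context_for(self, *keys) -> dict:
        """Compose a call-specific context from only the relevant state slices."""
        return {k: self.data[k] for k in keys if k in self.data}
```

Because every update is written through to disk, any step can be resumed from the JSON file, and each agent call sees only the slices it needs rather than the whole accumulated state.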

QA Evaluation Loop (--evals N)
Each agent call can be followed by an evaluation pass using a separate LLM call. The evaluator scores the output and provides feedback; if the output is not approved, the agent retries with the feedback incorporated. This significantly improves output quality at the cost of additional API calls.
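The shape of that loop can be sketched as follows; the signatures are illustrative, not the actual interface behind the --evals flag:

```python
def run_with_evals(agent, evaluator, prompt: str, evals: int = 2) -> str:
    """Generate, score with a separate LLM pass, retry with feedback if rejected."""
    feedback, output = "", ""
    for _ in range(evals + 1):
        # Fold the previous critique back into the prompt on retries
        full = prompt + (f"\n\nReviewer feedback: {feedback}" if feedback else "")
        output = agent(full)
        approved, feedback = evaluator(output)  # e.g. (score >= threshold, critique)
        if approved:
            break
    return output
```

The key design point is that the evaluator is a second, independent LLM call, so it is not biased toward approving its own generation.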

Built-in Agentic Tools (tools.py)

  • search - Web search via Brave Search API for grounding with current information
  • fetch_url_content - URL fetching with automatic HTML-to-markdown conversion
  • @tool decorator - Auto-generates OpenAI-compatible tool schemas from function signatures and docstrings
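The @tool idea can be sketched in a few lines: the JSON schema that OpenAI-style tool calling expects is derived from the function signature and docstring. This is a simplified reimplementation for illustration, not the code in tools.py:

```python
import inspect

def tool(fn):
    """Derive an OpenAI-style tool schema from a function's signature and docstring."""
    py_to_json = {int: "integer", float: "number", str: "string", bool: "boolean"}
    params = {
        name: {"type": py_to_json.get(p.annotation, "string")}
        for name, p in inspect.signature(fn).parameters.items()
    }
    fn.schema = {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": params,
                "required": list(params),
            },
        },
    }
    return fn

@tool
def search(query: str, count: int) -> str:
    """Web search via Brave Search API."""
    ...
```

Keeping the schema next to the function means adding a new agent tool is just writing a well-annotated Python function.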

Sequential Thinking Tool (tool_think.py)
Provides structured chain-of-thought reasoning for smaller LLMs that lack native reasoning capabilities. The tool exposes a sequential_thinking function that tracks thought sequences, supports branching (exploring alternatives) and revision (reconsidering previous thoughts). The tool auto-manages numbering since small models often lose track of sequence counts.
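Its core bookkeeping can be sketched as follows; the class and field names are assumptions for illustration, and tool_think.py defines the real interface:

```python
class ThoughtTracker:
    """Tracks a chain of thoughts; numbering is managed here, not by the model."""

    def __init__(self):
        self.thoughts = []

    def sequential_thinking(self, thought: str, revises: int = None,
                            branch_from: int = None) -> dict:
        entry = {
            "number": len(self.thoughts) + 1,  # auto-numbered: small models lose count
            "thought": thought,
            "revises": revises,                # number of a thought being reconsidered
            "branch_from": branch_from,        # number this alternative departs from
        }
        self.thoughts.append(entry)
        return entry
```

Because the tracker owns the counter, the model only ever supplies the thought text and optional back-references, which is exactly the part small models handle well.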

Malformed JSON Recovery (agt_llm.py)
The LMStudio agent automatically recovers from malformed tool calls - common with smaller models that struggle with strict JSON formatting. Invalid responses trigger automatic retries with corrected parsing.
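A simplified version of that recovery path; the actual heuristics in agt_llm.py may differ:

```python
import json
import re

def parse_tool_call(raw: str):
    """Parse a tool-call payload, recovering from common small-model slips."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Strip markdown fences the model may have wrapped around the JSON
    cleaned = re.sub(r"^```(?:json)?|```$", "", raw.strip(), flags=re.MULTILINE).strip()
    # Fall back to the outermost {...} span in the response
    match = re.search(r"\{.*\}", cleaned, flags=re.DOTALL)
    if match:
        cleaned = match.group(0)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None  # caller re-prompts the model instead of crashing
```

Returning None rather than raising lets the agent loop treat a garbled tool call like a failed evaluation: re-ask the model instead of aborting the run.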

Workflows

Story Mode

# Text-only generation
python src/author.py -txt ...

# Generation with sound and visuals
python src/author.py -vt runware -imod flux -vmod seedprof -iref 8 ...

# Resume from saved state and config
python src/author.py -json _out/log.json -arg _out/config.txt 

Instructions for hierarchical storytelling content generation (see data/prompts/story):

  • arch-init - Initialize story structure, settings, and characters (+ auto-generate reference images for visual consistency)
  • writ-chap - Generate chapter outlines from the overall narrative arc
  • writ-scen - Create detailed scenes within each chapter
  • writ-frag - Write narrative fragments (the actual prose)
  • writ-voc - Generate voice-over scripts (+ auto-convert to speech via TTS)
  • writ-vis - Generate visual descriptions (+ auto-create images (T2I), animate to video (I2V), mix with audio)
  • arch-upd - Update global story state based on generated content

Chat Mode

# Text-only generation, fixed personas
python src/chat.py -txt -pers fix ...

# Generation with sound and visuals, evolving personas
python src/chat.py -pers evo -vt runware -t2v -vmod pruna ...

# Resume from saved state and config
python src/chat.py -json _out/log.json -arg _out/config.txt

Instructions for linear multi-persona discussions (see data/prompts/chat):

  • arch-init - Create debate personas with distinct perspectives, styles, and voice profiles (+ auto-generate ref images)
  • writ-frag - Generate each persona's elaborated inner thoughts for their turn
  • writ-voc - Condense thoughts into short spoken remarks (+ auto-convert to speech via TTS)
  • writ-vis - Generate visual representation of the current speaker/moment (+ auto-create images/videos with audio)
  • arch-upd - Update persona states and debate direction based on progress
  • curat-dir-anti - Analyze debate and select provocative direction
  • curat-pers-anti - Select or create persona for controversial position
  • curat-dir-evo - Analyze debate and forecast next thought direction
  • curat-pers-evo - Select or create persona for thought direction

See JSON schemas in data folder for complete structure. Both workflows save state to log.json after each step, enabling resume from any point.

Other Usage

# Use local LLM via LMStudio
... -a lms -tmod openai/gpt-oss-20b -lh localhost

# Use OpenAI cloud
... -a lms -tmod gpt-4o-mini

# Use Gemini via Google ADK
... -a adk -tmod gemini-2.5-flash

# Use Claude via Anthropic SDK
... -a claude -tmod claude-sonnet-4-6

# Use LangChain/LangGraph
... -a langchain -tmod gpt-4o-mini

# Extract discussion transcript
python src/readlog.py -i _out/log.json -o discussion.txt

Key Arguments

  • -t/--in_txt - topic starter (text or filename)

  • -docs - optional documents to build the narrative on

  • -a/--agent - Agent type: lms (LMStudio/OpenAI), adk (Google ADK), claude (Anthropic), langchain (LangChain/LangGraph)

  • -tmod/--txt_model - LLM model name

  • -lh/--llm_host - LMStudio server host

  • -json/--load_json - Resume from JSON state file

  • -vt/--vis_type - Visual backend: runware, comfy, wan, walk

  • -imod/--img_model - Image model (e.g., flux for Flux 2 Pro)

  • -vmod/--vid_model - Video model (e.g., seedprof for Seedance Pro Fast)

  • -isz/--img_size - Image dimensions (e.g., 1344-752)

  • -vsz/--vid_size - Video dimensions (e.g., 864-480)

  • -fps - Video framerate

  • -iref/--img_refs - Number of reference images for consistency

  • --use_thinking - Enable extended thinking (Claude backend)

  • --use_graph - Use LangGraph execution (LangChain backend)

  • --db_url - Database URL for persistent sessions (ADK/LangGraph backends)

  • -txt/--txtonly - Text-only mode, skip media

  • -o/--out_dir - Output directory (default: _out)

  • -v/--verbose - Verbose output
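The flags above map naturally onto argparse; a trimmed sketch covering only flags listed in this README, with assumed defaults (the real parser in the repo certainly defines more):

```python
import argparse

def make_parser() -> argparse.ArgumentParser:
    """Trimmed sketch of the CLI surface described above."""
    p = argparse.ArgumentParser(prog="author.py")
    p.add_argument("-t", "--in_txt", help="topic starter (text or filename)")
    p.add_argument("-a", "--agent", default="lms",
                   choices=["lms", "adk", "claude", "langchain"])
    p.add_argument("-tmod", "--txt_model", help="LLM model name")
    p.add_argument("-vt", "--vis_type", choices=["runware", "comfy", "wan", "walk"])
    p.add_argument("-json", "--load_json", help="resume from JSON state file")
    p.add_argument("-txt", "--txtonly", action="store_true", help="skip media")
    p.add_argument("-o", "--out_dir", default="_out")
    return p
```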

Files

assembly/
├── src/                    # Core library
│   ├── author.py           # Story generation pipeline
│   ├── chat.py             # Interactive chat/debate pipeline
│   ├── base.py             # Infrastructure (MediaGen, StoryState, PromptMan, etc.)
│   ├── agt_llm.py          # LMStudio/OpenAI agent with tool calling
│   ├── agt_adk.py          # Google ADK (Gemini) agent
│   ├── agt_claude.py       # Anthropic Claude agent
│   ├── agt_lang.py         # LangChain/LangGraph agent
│   ├── tools.py            # Tool system
│   ├── tool_think.py       # Sequential thinking tool for small LLMs
│   ├── api_runware.py      # Runware cloud integration (T2I, I2V)
│   ├── sound.py            # Audio mixing (MMAudio)
│   ├── tts.py              # Text-to-speech (ChatterBox)
│   ├── readlog.py          # Extract discussion from log.json
│   └── util.py             # Utilities
├── _in/                    # Input documents (source materials)
├── _out/                   # Generated outputs
│   └── join_vids.bat       # FFmpeg video concatenation
├── data/                   # JSON schemas, templates, databases, etc.
│   └── prompts/            # Markdown prompt templates
│       ├── story/          # Story mode prompts
│       └── chat/           # Chat/debate mode prompts
└── au.bat                  # Base wrapper script
