Veta is an autonomous agent designed to mass-produce high-quality, viral-style short videos (YouTube Shorts, TikTok, Reels) from structured data. It combines state-of-the-art TTS, generative AI for logic/vision, and dynamic video editing into a single pipeline.
- Input: Accepts a simple JSON file (`config/input.json` by default) defining multiple topics.
- Workflow: Processes topics sequentially, managing detailed asset generation for each.
- Engine: ComfyUI with Flux.1 Schnell.
- Generative Workflow: Creates high-quality, custom images for each segment based on the script context.
- Workflow File: Uses `config/flux_schnell_workflow.json`.
- Engine: Powered by Kokoro TTS (v1.0).
- Style: Uses the `af_heart` voice profile at 1.0x speed for natural, high-energy narration.
- Audio Processing: Normalizes audio levels and ensures clean segment transitions.
- Resolution: Native 1080x1920 (9:16) Vertical Video.
- Supersampling: Renders internally at 2160x3840 (2x) before downscaling to eliminate shimmer/aliasing during zooms.
- Vision-Enhanced Refinement: Uses Llama 3.2 Vision to review generated images against the script and intelligently refine prompts if quality is low or if the user rejects them.
- Effects:
- Ken Burns: Randomized smooth pans and zooms for every static image.
- Stabilization: Applies `deshake` filters to ensure smooth motion.
- Solid Backgrounds: Uses professional solid black backgrounds for letterboxing.
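The Ken Burns and supersampling steps can be sketched as an ffmpeg `zoompan` filter chain. The helper below is illustrative only (not Veta's actual renderer code): it builds a filter string that zooms in at a small randomized per-frame rate on a 2x supersampled canvas, then downscales to the final 1080x1920 frame to suppress shimmer.

```python
import random

def build_ken_burns_filter(duration_s: float, fps: int = 30,
                           out_w: int = 1080, out_h: int = 1920) -> str:
    """Build an ffmpeg zoompan filter string for a randomized Ken Burns zoom.

    Renders on a 2x supersampled canvas and downscales afterwards,
    mirroring the anti-aliasing approach described above.
    """
    frames = int(duration_s * fps)
    zoom_rate = random.uniform(0.0008, 0.0020)  # per-frame zoom increment
    zoompan = (
        f"zoompan=z='min(zoom+{zoom_rate:.4f},1.3)'"
        f":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"  # keep the crop centered
        f":d={frames}:s={out_w * 2}x{out_h * 2}:fps={fps}"
    )
    # Downscale from the 2x supersampled frame to the final resolution.
    return f"{zoompan},scale={out_w}:{out_h}"
```

The resulting string would be passed to ffmpeg via `-vf`; randomizing the zoom rate per segment keeps consecutive scenes from feeling identical.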
- Transcription: Uses OpenAI Whisper (base model) for accurate word-level timestamps.
- Styling: Generates .ass subtitles with "Influencer" styling (Montserrat Black font, Karaoke effects).
- Burn-in: Hardcodes subtitles into the final video using `ffmpeg`.
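The karaoke effect relies on ASS `{\kNN}` timing tags, where `NN` is the highlight duration in centiseconds. A minimal sketch of turning Whisper-style word timestamps into a karaoke subtitle line (`to_karaoke_line` is a hypothetical helper, not Veta's actual code):

```python
def to_karaoke_line(words: list[dict]) -> str:
    """Convert Whisper-style word timestamps into an ASS karaoke line.

    Each word gets a {\kNN} tag holding its duration in centiseconds,
    so the burned-in subtitle highlights word by word.
    """
    parts = []
    for w in words:
        dur_cs = round((w["end"] - w["start"]) * 100)  # seconds -> centiseconds
        parts.append(f"{{\\k{dur_cs}}}{w['word'].strip()}")
    return " ".join(parts)
```

A line like this would then be placed in an `.ass` event styled with the Montserrat Black "Influencer" look before being burned in with `ffmpeg`.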
Here is a sample generated by Veta:
This guide assumes you are running a modern Linux distribution (Ubuntu 22.04+ or similar).
- System Tools:

  ```bash
  sudo apt update
  sudo apt install git python3 python3-venv python3-pip ffmpeg
  ```
- Optional but Recommended: Install uv (Fast Package Manager)

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```
- Ollama (LLM Engine): Install Ollama to run the Llama 3.1 model locally.

  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  ollama pull llama3.1:8b
  ollama pull gemma3:4b
  ```
- ComfyUI (Image Generation Engine): You need a local instance of ComfyUI running.
  - Follow the ComfyUI Installation Guide.
  - Model: Download the Flux.1 Schnell checkpoint and place it in `ComfyUI/models/checkpoints/`.
  - Running: Start ComfyUI (usually `python main.py`). It typically runs at `http://127.0.0.1:8188`.
- Clone the Repository:

  ```bash
  git clone https://github.com/your-username/veta.git
  cd veta
  ```
- Set Up Virtual Environment: It is highly recommended to use a virtual environment to avoid conflicts.

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  ```
- Install Python Dependencies:

  ```bash
  # Using standard pip
  pip install -r requirements.txt
  # OR using uv (faster)
  uv pip install -r requirements.txt
  ```

  Note: If you encounter issues with `whisper`, ensure you have `openai-whisper` installed, not the package named `whisper`.
- Configuration: Create a `.env` file in the root directory:

  ```bash
  touch .env
  ```

  Add the following content to `.env`:

  ```env
  # ComfyUI Configuration
  COMFYUI_URL=127.0.0.1:8188

  # Optional: Pixabay API Key for Stock Images (Fallback)
  PIXABAY_API_KEY=your_pixabay_api_key_here

  # Optional: Ollama Vision Model
  OLLAMA_VISION_MODEL=llama3.2-vision
  ```
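A minimal sketch of how these settings might be consumed at runtime. This is illustrative only: the real project may load the `.env` file via python-dotenv, and `load_settings` is a hypothetical name.

```python
import os

def load_settings() -> dict:
    """Read Veta's configuration from the environment, with sane defaults.

    Assumes the .env file has already been loaded into the process
    environment (e.g. by python-dotenv).
    """
    return {
        "comfyui_url": os.environ.get("COMFYUI_URL", "127.0.0.1:8188"),
        "pixabay_api_key": os.environ.get("PIXABAY_API_KEY"),  # optional fallback
        "vision_model": os.environ.get("OLLAMA_VISION_MODEL", "llama3.2-vision"),
    }
```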
Create or edit `config/input.json`. This file controls what videos are generated.
Format:
```json
[
  {
    "topic": "The Search Engine Shift",
    "hook": "Stop Googling Everything",
    "script": "Stop Googling everything. Seriously. For twenty years we have been using search engines the same way..."
  },
  {
    "topic": "The Dead Internet Theory",
    "script": "Have you ever felt like the internet is empty? Like you are the only real person left..."
  }
]
```

- topic: Unique identifier for the video (used for folder naming).
- hook: The headline displayed on the video (optional, defaults to topic).
- script: Full voiceover text. The AI will automatically segment this into scenes.
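These constraints can be checked when the file is loaded. A hedged sketch (the function name and error handling are assumptions, not Veta's actual code):

```python
import json

def load_topics(path: str) -> list[dict]:
    """Load the topic list, validate required fields, and default hook to topic."""
    with open(path, encoding="utf-8") as f:
        topics = json.load(f)
    for entry in topics:
        if "topic" not in entry or "script" not in entry:
            raise ValueError(f"Each entry needs 'topic' and 'script': {entry}")
        entry.setdefault("hook", entry["topic"])  # hook is optional
    return topics
```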
With your virtual environment activated:
```bash
python3 main.py --input_file config/input.json
```

The agent runs in an interactive mode:
- Script Generation: It segments your script.
- Prompt Generation: It creates image prompts.
- Review: It will generate images and ask for your approval.
- [a] Approve: Keeps the image.
- [r] Reject: Prompts you for feedback to regenerate.
- [s] Skip: Auto-approves the image and continues.
If you stop the process (Ctrl+C) or if it crashes, Veta saves your progress automatically. When you run the same command again:
```bash
python3 main.py --input_file config/input.json
```

It will detect the existing checkpoint and ask:
- [r] Resume: Continues exactly where it left off.
- [n] New: Deletes the checkpoint and starts fresh.
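The resume behavior can be implemented with a small JSON state file. The sketch below is an assumption about how it might work; the checkpoint path and schema are hypothetical:

```python
import json
from pathlib import Path

def save_checkpoint(state: dict, path: Path) -> None:
    """Persist progress (e.g. current topic and segment index) to disk."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state), encoding="utf-8")

def load_checkpoint(path: Path):
    """Return saved progress as a dict, or None when starting fresh."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return None
```

On startup, a `None` result means no checkpoint exists; otherwise the agent can offer the Resume/New prompt described above.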
- Final Videos: `output/{Topic_Name}/final_video_captioned.mp4`
- Temporary Files: `output/temp/{Topic_Name}/`
  - Contains raw audio, generated images, and segment videos.
  - These are auto-deleted after successful generation to save space.
```mermaid
graph TD
    JSON[input.json] --> Main[main.py]
    Main --> Graph[LangGraph Workflow]

    subgraph "Agents & Tools"
        Graph --> Script["Script Writer\n(Ollama Llama 3.1)"]
        Script --> VisualDir["Visual Director\n(Llama 3.1 + Vision)"]
        VisualDir --> Audio["Audio Gen\n(Kokoro TTS)"]
        VisualDir --> Visual["Visual Gen\n(ComfyUI Flux)"]
        Audio --> Wav["Segment.wav"]
        Visual --> Img["Segment.jpg"]
        Wav & Img --> Render["Renderer\n(ffmpeg)"]
        Render --> Caps["Caption Engine\n(Whisper + pysubs2)"]
    end

    Caps --> Final[Final Video]
```
Currently, the agents are hardcoded to use `llama3.1:8b`. To use a different local model (e.g., `gemma2` or `mistral`):
- Pull the model: `ollama pull <model_name>`
- Edit Files: Update the model string in:
  - `src/agents/script_writer.py`
  - `src/agents/visual_director.py`
The review capability uses `llama3.2-vision` by default. To change this:
- Pull the model: `ollama pull <model_name>`
- Update `.env`: `OLLAMA_VISION_MODEL=llava`
To use a different ComfyUI workflow (e.g., for SDXL or a Realism LoRA):
- Save your workflow: Export it as API Format (JSON) from ComfyUI.
- Replace File: Overwrite `config/flux_schnell_workflow.json` OR update the path in `src/tools/image_tools.py`.
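Once a workflow is exported in API format, it can be submitted by POSTing `{"prompt": <workflow>}` to ComfyUI's `/prompt` endpoint. A stdlib-only sketch; the helper names are illustrative, not Veta's actual image tool code:

```python
import json
import urllib.request

def build_prompt_request(workflow: dict, host: str = "127.0.0.1:8188"):
    """Build the POST request that queues an API-format workflow on ComfyUI."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def queue_workflow(workflow: dict, host: str = "127.0.0.1:8188") -> bytes:
    """Send the workflow to a running ComfyUI instance and return its response."""
    with urllib.request.urlopen(build_prompt_request(workflow, host)) as resp:
        return resp.read()
```

Note that the regular "Save" in the ComfyUI menu produces UI-format JSON; only the API-format export will work with `/prompt`.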
1. `AttributeError: module 'whisper' has no attribute 'load_model'`
This means you installed the wrong whisper package.
Fix:

```bash
pip uninstall whisper
pip install openai-whisper
```

2. ffmpeg not found
Ensure ffmpeg is installed system-wide.
```bash
sudo apt install ffmpeg
ffmpeg -version  # Verify installation
```

3. ComfyUI Connection Refused
Ensure ComfyUI is running in a separate terminal window and verify the URL in your `.env` file matches its output (default `127.0.0.1:8188`).
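A quick way to diagnose this is a plain TCP check against the address in `COMFYUI_URL`. This helper is illustrative, not part of Veta:

```python
import socket

def comfyui_reachable(url: str = "127.0.0.1:8188", timeout: float = 2.0) -> bool:
    """Return True if something is listening where COMFYUI_URL points."""
    host, _, port = url.partition(":")
    try:
        with socket.create_connection((host, int(port or 8188)), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, bad host, etc.
        return False
```

If this returns False, start ComfyUI first; if it returns True but Veta still fails, double-check that `COMFYUI_URL` has no `http://` prefix mismatch with how the code builds its request URLs.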