# Hookswap: AI-Generated Video Hooks

Generate scroll-stopping video hook introductions using AI. This workflow creates hook variants tailored to a specific industry and U.S. state, complete with AI-generated visuals, voice narration, and captions.

## Problem

You need video hooks that capture attention in the first few seconds, each tailored to a specific audience (industry and target state).

| Use case | Input | Output |
|----------|-------|--------|
| Marketing videos | Original video + industry | Multiple hook variants |
| Regional targeting | Video + target states | State-specific hooks |
| Social media | Long-form content | Scroll-stopping intros |

## Solution

**What's in this recipe:**

1. **Transcribe** the original video for context
2. **Generate hooks** tailored to industry and location
3. **Create visuals** matching each hook's psychology
4. **Add narration** with AI text-to-speech
5. **Assemble** the final videos with captions
6. **Export** in multiple aspect ratios (9:16, 1:1, 16:9)

Every step is a declarative computed column, so inserting a new video generates its hooks automatically.

### Setup

In [None]:
%pip install -qU pixeltable openai fal-client httpx

In [None]:
import os
import getpass

# OpenAI for GPT-4o, DALL-E, Whisper, and TTS
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')

# fal.ai for LumaAI image-to-video (required for Step 6)
if 'FAL_API_KEY' not in os.environ:
    os.environ['FAL_API_KEY'] = getpass.getpass('fal.ai API Key (for LumaAI): ')

# ElevenLabs for high-quality TTS (optional, falls back to OpenAI)
if 'ELEVENLABS_API_KEY' not in os.environ:
    key = getpass.getpass('ElevenLabs API Key (press Enter to skip): ')
    if key:
        os.environ['ELEVENLABS_API_KEY'] = key

In [None]:
import pixeltable as pxt
from pixeltable.functions import openai, fal
from pixeltable.functions.video import with_audio, concat_videos

In [None]:
# Create fresh directory for our hookswap demo
pxt.drop_dir('hookswap', force=True)
pxt.create_dir('hookswap')

## Custom UDFs

We'll create a few custom functions for ElevenLabs TTS, LumaAI video generation, and aspect ratio transformations.

### ElevenLabs Text-to-Speech

In [None]:
import httpx
from pixeltable.utils.local_store import TempStore

@pxt.udf
async def elevenlabs_tts(
    text: str,
    voice_id: str = 'EXAVITQu4vr4xnSDxMaL',  # "Sarah" (authoritative_female in VOICE_MAP)
    model_id: str = 'eleven_multilingual_v2'
) -> pxt.Audio:
    """
    Generate speech using ElevenLabs API.
    
    Args:
        text: Text to synthesize
        voice_id: ElevenLabs voice ID (default: Sarah)
        model_id: Model to use for synthesis
    
    Returns:
        Audio file path
    """
    api_key = os.environ.get('ELEVENLABS_API_KEY')
    if not api_key:
        raise ValueError('ELEVENLABS_API_KEY not set')
    
    url = f'https://api.elevenlabs.io/v1/text-to-speech/{voice_id}'
    headers = {
        'Accept': 'audio/mpeg',
        'Content-Type': 'application/json',
        'xi-api-key': api_key
    }
    data = {
        'text': text,
        'model_id': model_id,
        'voice_settings': {
            'stability': 0.5,
            'similarity_boost': 0.75
        }
    }
    
    async with httpx.AsyncClient() as client:
        response = await client.post(url, json=data, headers=headers, timeout=60.0)
        response.raise_for_status()
        
        output_path = TempStore.create_path(extension='.mp3')
        with open(output_path, 'wb') as f:
            f.write(response.content)
        
        return str(output_path)

### Voice Selection Based on Hook Tone

In [None]:
# ElevenLabs voice mapping by tone
VOICE_MAP = {
    'authoritative_male': '29vD33N1CtxCmqQRPOHJ',    # Drew
    'authoritative_female': 'EXAVITQu4vr4xnSDxMaL',  # Sarah
    'warm_male': 'ErXwobaYiN019PkySvjV',             # Antoni
    'warm_female': '21m00Tcm4TlvDq8ikWAM',           # Rachel
    'urgent_male': 'VR6AewLTigWG4xSOukaG',           # Arnold
    'urgent_female': 'ThT5KcBeYPX3keUQqHPh',         # Dorothy
    'friendly_male': 'TxGEqnHWrfWFTfGW9XjX',         # Josh
    'friendly_female': 'MF3mGyEYCl7XYWbV9V6O',       # Emily
}

@pxt.udf
def get_voice_id(tone: str, gender: str = 'female') -> str:
    """
    Get ElevenLabs voice ID based on tone and gender.
    
    Args:
        tone: One of 'authoritative', 'warm', 'urgent', 'friendly'
        gender: 'male' or 'female'
    
    Returns:
        ElevenLabs voice ID
    """
    key = f"{tone}_{gender}"
    return VOICE_MAP.get(key, VOICE_MAP['warm_female'])

### LumaAI Image-to-Video via fal.ai

In [None]:
import PIL.Image
import base64
import io

@pxt.udf
def image_to_data_url(image: PIL.Image.Image) -> str:
    """Convert PIL Image to base64 data URL for API calls."""
    buffer = io.BytesIO()
    image.save(buffer, format='PNG')
    b64 = base64.b64encode(buffer.getvalue()).decode('utf-8')
    return f'data:image/png;base64,{b64}'
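
To see what the helper above produces without calling any API, the sketch below builds a data URL from raw bytes and decodes it back. It uses only the standard library; `to_data_url` and `from_data_url` are illustrative names, not part of Pixeltable or fal.ai.

```python
import base64


def to_data_url(payload: bytes, mime: str = 'image/png') -> str:
    """Encode raw bytes as a base64 data URL."""
    b64 = base64.b64encode(payload).decode('ascii')
    return f'data:{mime};base64,{b64}'


def from_data_url(url: str) -> tuple[str, bytes]:
    """Split a base64 data URL back into (mime type, raw bytes)."""
    header, b64 = url.split(',', 1)
    if not header.startswith('data:') or not header.endswith(';base64'):
        raise ValueError('not a base64 data URL')
    mime = header[len('data:'):-len(';base64')]
    return mime, base64.b64decode(b64)


# The 8-byte PNG signature stands in for a real image here.
png_signature = b'\x89PNG\r\n\x1a\n'
url = to_data_url(png_signature)
mime, decoded = from_data_url(url)
```

Base64 encoding grows the payload by roughly a third, so a large PNG turns into a long request body; hosting the image and passing a regular URL may be preferable for big frames.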

In [None]:
from typing import Optional

@pxt.udf
def extract_video_url(fal_response: Optional[dict]) -> Optional[str]:
    """Extract the video URL from a fal.ai LumaAI response, or None if absent."""
    if fal_response and 'video' in fal_response:
        return fal_response['video']['url']
    return None

### Aspect Ratio Transformations

In [None]:
import subprocess
from pixeltable.utils.local_store import TempStore

@pxt.udf
def add_blur_padding(
    video: pxt.Video,
    target_ratio: str = '1:1'
) -> pxt.Video:
    """
    Add blurred padding to video for different aspect ratios.
    
    Args:
        video: Input video (assumed 9:16 vertical)
        target_ratio: '1:1' for square or '16:9' for landscape
    
    Returns:
        Video with blur-padded background
    """
    output_path = str(TempStore.create_path(extension='.mp4'))
    
    if target_ratio == '1:1':
        # Square: 1080x1080
        filter_complex = (
            "[0:v]split=2[blur][fg];"
            "[blur]scale=1080:1080:force_original_aspect_ratio=increase,"
            "crop=1080:1080,boxblur=20:20[bg];"
            "[fg]scale=-1:1080:force_original_aspect_ratio=decrease[scaled];"
            "[bg][scaled]overlay=(W-w)/2:(H-h)/2"
        )
    elif target_ratio == '16:9':
        # Landscape: 1920x1080
        filter_complex = (
            "[0:v]split=2[blur][fg];"
            "[blur]scale=1920:1080:force_original_aspect_ratio=increase,"
            "crop=1920:1080,boxblur=20:20[bg];"
            "[fg]scale=-1:1080:force_original_aspect_ratio=decrease[scaled];"
            "[bg][scaled]overlay=(W-w)/2:(H-h)/2"
        )
    else:
        raise ValueError(f"Unsupported ratio: {target_ratio}")
    
    cmd = [
        'ffmpeg', '-y',
        '-i', str(video),
        '-filter_complex', filter_complex,
        '-c:v', 'libx264',
        '-crf', '23',
        '-c:a', 'aac',
        '-loglevel', 'error',
        output_path
    ]
    
    subprocess.run(cmd, check=True, capture_output=True)
    return output_path
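
The two branches above differ only in the canvas size, so the filter graph can be read as a function of width and height. The sketch below builds the same string from a ratio label. `CANVAS_SIZES` and `blur_pad_filter` are hypothetical names used for illustration, and the block only constructs the string without invoking ffmpeg.

```python
# Canvas sizes taken from the two branches of add_blur_padding.
CANVAS_SIZES = {'1:1': (1080, 1080), '16:9': (1920, 1080)}


def blur_pad_filter(target_ratio: str) -> str:
    """Return the ffmpeg filter graph for a blur-padded canvas."""
    if target_ratio not in CANVAS_SIZES:
        raise ValueError(f'Unsupported ratio: {target_ratio}')
    width, height = CANVAS_SIZES[target_ratio]
    return (
        '[0:v]split=2[blur][fg];'
        # Background: fill the canvas, crop the overflow, then blur.
        f'[blur]scale={width}:{height}:force_original_aspect_ratio=increase,'
        f'crop={width}:{height},boxblur=20:20[bg];'
        # Foreground: fit the canvas height, keeping the aspect ratio.
        f'[fg]scale=-1:{height}:force_original_aspect_ratio=decrease[scaled];'
        # Center the foreground on the blurred background.
        '[bg][scaled]overlay=(W-w)/2:(H-h)/2'
    )


square = blur_pad_filter('1:1')
landscape = blur_pad_filter('16:9')
```

Supporting another ratio then becomes a one-line change to the size table instead of a new branch.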

## Create the Video Pipeline

### Step 1: Main Videos Table

In [None]:
# Create the main videos table
videos = pxt.create_table(
    'hookswap.videos',
    {
        'video': pxt.Video,
        'industry': pxt.String,      # e.g., 'insurance', 'fitness', 'real_estate'
        'target_state': pxt.String,  # e.g., 'California', 'Texas', 'Florida'
    }
)

print("Created videos table")

### Step 2: Extract Audio and Transcribe

In [None]:
# Extract audio from video
videos.add_computed_column(
    audio=videos.video.extract_audio(format='mp3')
)

# Transcribe with word-level timestamps
videos.add_computed_column(
    transcription=openai.transcriptions(
        videos.audio,
        model='whisper-1',
        model_kwargs={
            'response_format': 'verbose_json',
            'timestamp_granularities': ['word']
        }
    )
)

# Extract just the text for context
videos.add_computed_column(
    transcript_text=videos.transcription['text']
)

print("Added transcription columns")

### Step 3: Generate Hook Text

In [None]:
# Build the hook generation prompt
@pxt.udf
def build_hook_prompt(industry: str, state: str, transcript: str) -> list:
    """
    Build the prompt for GPT-4o to generate a hook.
    """
    # State-specific context
    state_context = {
        'California': 'wildfires, Prop 13 property tax, earthquake risk, high cost of living',
        'Texas': 'hailstorms, homestead exemption, no state income tax, flooding',
        'Florida': 'hurricanes, flood zones, rising insurance costs, sinkholes',
        'New York': 'high property taxes, rent control, winter storms, aging infrastructure',
        'Arizona': 'extreme heat, dust storms, water scarcity, rapid growth',
    }
    
    context = state_context.get(state, 'local regulations and regional challenges')
    
    system_prompt = f"""You are an expert at writing scroll-stopping video hooks for {industry} content.
Your hooks should:
- Be 2-3 sentences max (under 15 seconds when spoken)
- Create immediate urgency or curiosity
- Reference specific local context when relevant
- Use psychology: fear of missing out, social proof, or problem-agitation

For {state}, consider: {context}"""

    user_prompt = f"""Write a compelling video hook for this {industry} video targeting {state} audiences.

Original video transcript (for context):
{transcript[:500]}...

Return ONLY the hook text, nothing else."""

    return [
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': user_prompt}
    ]
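
The prompt above slices the transcript at 500 characters and always appends an ellipsis, even when nothing was cut. One possible refinement, shown here as a sketch, cuts at a word boundary and marks truncation only when it actually happened. `clip_transcript` is a hypothetical helper, not something this notebook defines.

```python
def clip_transcript(transcript: str, limit: int = 500) -> str:
    """Trim a transcript to roughly `limit` characters, cutting at a word boundary."""
    text = ' '.join(transcript.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    clipped = text[:limit]
    # Drop a trailing partial word unless the cut landed exactly on a boundary.
    if text[limit] != ' ' and ' ' in clipped:
        clipped = clipped.rsplit(' ', 1)[0]
    return clipped.rstrip() + '...'


short = clip_transcript('Welcome to Bangkok.')
long = clip_transcript('word ' * 200, limit=22)
```

Short transcripts pass through unchanged, so the model does not see a misleading ellipsis after a complete sentence.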

In [None]:
# Generate the hook text
videos.add_computed_column(
    hook_messages=build_hook_prompt(videos.industry, videos.target_state, videos.transcript_text)
)

videos.add_computed_column(
    hook_response=openai.chat_completions(
        videos.hook_messages,
        model='gpt-4o'
    )
)

videos.add_computed_column(
    hook_text=videos.hook_response['choices'][0]['message']['content']
)

print("Added hook generation columns")

### Step 4: Analyze Hook Tone and Select Voice

In [None]:
@pxt.udf
def build_tone_prompt(hook_text: str) -> list:
    """Build prompt to analyze hook tone."""
    return [
        {
            'role': 'system',
            'content': 'Analyze the emotional tone of the given text. Return ONLY one word: authoritative, warm, urgent, or friendly.'
        },
        {
            'role': 'user',
            'content': hook_text
        }
    ]

videos.add_computed_column(
    tone_messages=build_tone_prompt(videos.hook_text)
)

videos.add_computed_column(
    tone_response=openai.chat_completions(
        videos.tone_messages,
        model='gpt-4o-mini'
    )
)

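# Normalize the one-word reply. col_type is given explicitly because
# str.strip and str.lower carry no return-type hints to infer it from.
videos.add_computed_column(
    hook_tone=videos.tone_response['choices'][0]['message']['content']
        .apply(str.strip, col_type=pxt.String)
        .apply(str.lower, col_type=pxt.String)
)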

videos.add_computed_column(
    voice_id=get_voice_id(videos.hook_tone, 'female')
)

print("Added tone analysis and voice selection")
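
A chat model may not return a bare lowercase word even when asked to; replies such as `Urgent.` or `Tone: warm` are possible. The sketch below shows one way to reduce a free-form reply to one of the four supported tones before the voice lookup. `normalize_tone` and its `warm` fallback are assumptions made for illustration (the fallback mirrors the `warm_female` default in `get_voice_id`).

```python
import re

TONES = ('authoritative', 'warm', 'urgent', 'friendly')


def normalize_tone(reply: str, default: str = 'warm') -> str:
    """Map a free-form model reply to one of the supported tones."""
    # Lowercase, then keep only runs of letters so punctuation is ignored.
    words = re.findall(r'[a-z]+', reply.lower())
    for word in words:
        if word in TONES:
            return word
    return default


examples = [normalize_tone(r) for r in ('Urgent.', ' Tone: warm\n', 'sarcastic')]
```

Wrapped in `@pxt.udf`, a function along these lines could stand in for the chained string calls on the `hook_tone` column.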

### Step 5: Generate Visual Concept and Image

In [None]:
@pxt.udf
def build_visual_prompt(hook_text: str, industry: str, state: str) -> list:
    """Build prompt for visual concept generation."""
    return [
        {
            'role': 'system',
            'content': f"""You are a visual director creating imagery for {industry} video content.
Generate a concise 30-50 word visual description for a vertical video frame.
Include: main subject, environment, lighting, mood, any state-specific elements for {state}.
The image should evoke the emotion of the hook without containing text."""
        },
        {
            'role': 'user',
            'content': f'Create a visual concept for this hook: "{hook_text}"'
        }
    ]

videos.add_computed_column(
    visual_messages=build_visual_prompt(videos.hook_text, videos.industry, videos.target_state)
)

videos.add_computed_column(
    visual_response=openai.chat_completions(
        videos.visual_messages,
        model='gpt-4o'
    )
)

videos.add_computed_column(
    visual_concept=videos.visual_response['choices'][0]['message']['content']
)

print("Added visual concept generation")

In [None]:
# Generate the hook image with DALL-E
videos.add_computed_column(
    hook_image=openai.image_generations(
        videos.visual_concept,
        model='dall-e-3',
        model_kwargs={
            'size': '1024x1792',  # Vertical format
            'quality': 'standard'
        }
    )
)

print("Added image generation")

### Step 6: Convert Image to Video with LumaAI

In [None]:
# Convert image to data URL for fal.ai
videos.add_computed_column(
    hook_image_url=image_to_data_url(videos.hook_image)
)

# Generate video from image using LumaAI via fal.ai
videos.add_computed_column(
    luma_response=fal.run(
        input={
            'prompt': 'Subtle cinematic motion, gentle camera movement, professional video quality',
            'image_url': videos.hook_image_url,
            'aspect_ratio': '9:16',
            'loop': False
        },
        app='fal-ai/luma-dream-machine/image-to-video'
    )
)

# Extract the video URL and convert to Video type
videos.add_computed_column(
    hook_video_url=extract_video_url(videos.luma_response)
)

videos.add_computed_column(
    hook_video=videos.hook_video_url.astype(pxt.Video)
)

print("Added image-to-video conversion")

### Step 7: Generate Text-to-Speech

In [None]:
# Use ElevenLabs if available, otherwise fall back to OpenAI
use_elevenlabs = bool(os.environ.get('ELEVENLABS_API_KEY'))

if use_elevenlabs:
    videos.add_computed_column(
        hook_audio=elevenlabs_tts(videos.hook_text, videos.voice_id)
    )
    print("Using ElevenLabs for TTS")
else:
    videos.add_computed_column(
        hook_audio=openai.speech(
            videos.hook_text,
            model='tts-1',
            voice='nova'  # Warm, engaging voice
        )
    )
    print("Using OpenAI for TTS (ElevenLabs not configured)")

### Step 8: Assemble Hook Video with Audio

In [None]:
# Combine hook video with TTS audio
videos.add_computed_column(
    assembled_hook=with_audio(
        videos.hook_video,
        videos.hook_audio
    )
)

print("Added audio assembly")

### Step 9: Add Captions

In [None]:
# Add text overlay as captions (simplified - full SRT would need more processing)
videos.add_computed_column(
    captioned_hook=videos.assembled_hook.overlay_text(
        videos.hook_text,
        font_size=36,
        color='white',
        box=True,
        box_color='black',
        box_opacity=0.7,
        box_border=[8, 16],
        vertical_align='bottom',
        vertical_margin=100
    )
)

print("Added captions")
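
The overlay above shows the whole hook text for the full clip. For timed captions, one option is to group word-level timestamps into SRT cues. The sketch below assumes a list of `{'word', 'start', 'end'}` dictionaries, the shape Whisper's `verbose_json` output uses for word granularity. Note that the `transcription` column in this notebook covers the original video, so the hook narration would need its own transcription first. `format_timestamp`, `words_to_srt`, and `max_words` are hypothetical names.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f'{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}'


def words_to_srt(words: list[dict], max_words: int = 4) -> str:
    """Group timed words into numbered SRT cues of up to `max_words` words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_timestamp(chunk[0]['start'])
        end = format_timestamp(chunk[-1]['end'])
        text = ' '.join(w['word'].strip() for w in chunk)
        cues.append(f'{len(cues) + 1}\n{start} --> {end}\n{text}\n')
    return '\n'.join(cues)


# Made-up timings for illustration only.
sample_words = [
    {'word': 'Wildfire', 'start': 0.0, 'end': 0.42},
    {'word': 'season', 'start': 0.42, 'end': 0.9},
    {'word': 'is', 'start': 0.9, 'end': 1.05},
    {'word': 'coming', 'start': 1.05, 'end': 1.5},
    {'word': 'early', 'start': 1.5, 'end': 2.0},
]
srt = words_to_srt(sample_words)
```

Short cues of three to five words tend to suit vertical video, which is why `max_words` defaults to 4 here; treat the value as a starting point to tune.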

### Step 10: Stitch Hook to Original Video

In [None]:
# Concatenate hook with original video
videos.add_computed_column(
    final_video=concat_videos([videos.captioned_hook, videos.video])
)

print("Added final video stitching")

### Step 11: Generate Aspect Ratio Variants

In [None]:
# Square version (1:1) for Instagram feed
videos.add_computed_column(
    final_video_square=add_blur_padding(videos.final_video, '1:1')
)

# Landscape version (16:9) for YouTube
videos.add_computed_column(
    final_video_landscape=add_blur_padding(videos.final_video, '16:9')
)

print("Added aspect ratio variants")

## Demo: Process a Sample Video

Let's insert a sample video and watch the pipeline in action.

In [None]:
# Insert a sample video
# Using a short sample video URL - replace with your own video
sample_video_url = 'https://github.com/pixeltable/pixeltable/raw/release/docs/resources/bangkok.mp4'

videos.insert([
    {
        'video': sample_video_url,
        'industry': 'travel',
        'target_state': 'California'
    }
])

print("Inserted sample video; computed columns were populated during insert")

In [None]:
# Check the generated hook text
videos.select(
    videos.industry,
    videos.target_state,
    videos.hook_text,
    videos.hook_tone,
    videos.visual_concept
).head()

In [None]:
# View the generated hook image
videos.select(
    videos.hook_text,
    videos.hook_image
).head()

In [None]:
# View the final videos
videos.select(
    videos.industry,
    videos.target_state,
    videos.final_video,
    videos.final_video_square,
    videos.final_video_landscape
).head()

## Generate Multiple Hooks

Add the same video with different target states to generate multiple hook variants.

In [None]:
# Generate hooks for multiple states
videos.insert([
    {'video': sample_video_url, 'industry': 'travel', 'target_state': 'Texas'},
    {'video': sample_video_url, 'industry': 'travel', 'target_state': 'Florida'},
])

print("Added 2 more state variants")

In [None]:
# Compare all hook variants
videos.select(
    videos.target_state,
    videos.hook_text,
    videos.hook_tone
).collect()

## Summary

This notebook demonstrated a complete AI video hook generation pipeline:

1. **Transcription**: Extract audio and transcribe with word-level timestamps
2. **Hook Generation**: GPT-4o creates state/industry-specific hooks
3. **Voice Selection**: Analyze tone and select appropriate voice
4. **Visual Creation**: Generate a visual concept with GPT-4o and render it with DALL-E 3
5. **Video Synthesis**: Convert image to video with LumaAI
6. **Audio Synthesis**: Generate narration with ElevenLabs/OpenAI TTS
7. **Assembly**: Combine video, audio, and captions
8. **Final Output**: Stitch to original and create aspect ratio variants

All steps are **declarative computed columns**: insert a new video and the entire pipeline runs automatically.

### Next Steps

- Add more industries and state contexts
- Implement A/B testing by generating multiple hook styles
- Add embedding-based similarity search to find hooks similar to your best performers
- Connect to a frontend for batch processing