# 🎬 Social Video Engine

**AI-powered social video generator: Qwen3-TTS + Remotion**

```
Story Script → Qwen3-TTS (voiceover) → Remotion (React animations) → FFmpeg (merge) → Final MP4
```

**Cost: $0.00 per video.** Runs on Colab's free GPU.

---
## Step 1: Install Everything
⏱️ ~5-7 min on first run

In [None]:
%%bash
set -e

echo "📦 Node.js..."
if ! command -v node &>/dev/null; then
  curl -fsSL https://deb.nodesource.com/setup_20.x | bash - > /dev/null 2>&1
  apt-get install -y nodejs > /dev/null 2>&1
fi
echo "  $(node -v) / npm $(npm -v)"

echo "📦 Chrome dependencies..."
apt-get update -qq > /dev/null 2>&1
apt-get install -y -qq libnss3 libdbus-1-3 libatk1.0-0 libgbm-dev libasound2 \
  libxrandr2 libxkbcommon-dev libxfixes3 libxcomposite1 libxdamage1 \
  libatk-bridge2.0-0 libpango-1.0-0 libcairo2 libcups2 > /dev/null 2>&1
echo "  ✅ deps installed"

echo "📦 FFmpeg: $(ffmpeg -version 2>&1 | head -1)"
echo "✅ System ready"

In [None]:
import os
REPO = "/content/social-video-engine"

# Clone or update repo
if not os.path.exists(REPO):
    !git clone https://github.com/redwanJemal/social-video-engine.git {REPO}
else:
    !cd {REPO} && git pull

# Install + upgrade Remotion to latest
!cd {REPO} && npm install @remotion/cli@latest @remotion/bundler@latest @remotion/renderer@latest remotion@latest --legacy-peer-deps 2>&1 | tail -3

# Download Remotion's Chrome Headless Shell
!cd {REPO} && npx remotion browser ensure 2>&1 | tail -3

print("\n✅ Video engine ready")

In [None]:
# Install Qwen3-TTS
!pip install -q -U qwen-tts soundfile numpy

# Flash attention (optional, speeds up TTS)
!MAX_JOBS=2 pip install -q flash-attn --no-build-isolation 2>/dev/null || echo "flash-attn skipped (still works fine)"

import torch
print(f"🖥️ GPU: {torch.cuda.get_device_name(0)} ({torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB)")
print("✅ TTS ready")

---
## Step 2: Load TTS Model

In [None]:
import torch, soundfile as sf, numpy as np
from qwen_tts import Qwen3TTSModel
from IPython.display import Audio, display

MODEL = "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice"

print(f"Loading {MODEL}...")
try:
    # Prefer FlashAttention 2 when the optional flash-attn install succeeded
    model = Qwen3TTSModel.from_pretrained(MODEL, device_map="cuda:0", dtype=torch.bfloat16, attn_implementation="flash_attention_2")
except Exception:
    # Fall back to the default attention implementation
    model = Qwen3TTSModel.from_pretrained(MODEL, device_map="cuda:0", dtype=torch.bfloat16)

print(f"✅ Loaded | Speakers: {model.get_supported_speakers()}")

---
## Step 3: Define Your Video

**Templates:** `intro` · `kinetic-text` · `stat-card` · `list-reveal` · `quote-card` · `cta`

**Speakers:** `Ryan` (dynamic M) · `Aiden` (sunny M) · `Vivian` (bright F) · `Serena` (warm F)

**Themes:** `midnight` · `ocean` · `sunset` · `forest` · `noir` · `fire`

In [None]:
VIDEO_CONFIG = {
    "theme": {
        "name": "Midnight",
        "bgGradient": ["#0f0c29", "#302b63"],
        "textColor": "#ffffff",
        "accentColor": "#f5576c",
        "fontFamily": "sans-serif"
    },
    "scenes": [
        {
            "type": "intro",
            "duration": 90,
            "props": {"hook": "Stop scrolling.", "subtitle": "This changes everything"},
            "tts": {"text": "Stop scrolling. This is going to change everything you know about productivity.", "speaker": "Ryan", "instruct": "Dramatic, attention-grabbing, confident."}
        },
        {
            "type": "kinetic-text",
            "duration": 120,
            "props": {"lines": ["Most people waste", "3 HOURS a day", "on tasks AI can do", "in 3 MINUTES"], "accentLineIndex": 1, "animation": "slide-up"},
            "tts": {"text": "Most people waste three hours every single day on tasks that AI can finish in just three minutes.", "speaker": "Ryan", "instruct": "Building intensity, emphasize the contrast."}
        },
        {
            "type": "stat-card",
            "duration": 120,
            "props": {"title": "The Numbers Don't Lie", "stats": [{"value": "87%", "label": "Time Saved"}, {"value": "10x", "label": "More Output"}, {"value": "$0", "label": "Extra Cost"}]},
            "tts": {"text": "Eighty-seven percent time saved. Ten times more output. And it costs you absolutely nothing extra.", "speaker": "Ryan", "instruct": "Confident, impressive."}
        },
        {
            "type": "list-reveal",
            "duration": 150,
            "props": {"title": "Top 3 AI Tools", "items": ["ChatGPT for writing", "Midjourney for design", "Cursor for coding"], "icon": "🚀", "numbered": True},
            "tts": {"text": "Here are the top three AI tools you need right now. Number one: ChatGPT for writing. Number two: Midjourney for design. And number three: Cursor for coding.", "speaker": "Ryan", "instruct": "Enthusiastic, clear pauses between items."}
        },
        {
            "type": "quote-card",
            "duration": 120,
            "props": {"quote": "AI won't replace you. But someone using AI will.", "author": "Tech Industry", "role": "Common saying"},
            "tts": {"text": "Remember this: AI won't replace you. But someone using AI, definitely will.", "speaker": "Ryan", "instruct": "Thoughtful pause before the punchline, serious tone."}
        },
        {
            "type": "cta",
            "duration": 90,
            "props": {"headline": "Start Today", "subtext": "Follow for more AI tips", "buttonText": "Follow →"},
            "tts": {"text": "Follow for more AI tips that actually save you time. See you in the next one.", "speaker": "Ryan", "instruct": "Warm, inviting, friendly call to action."}
        }
    ]
}

total = sum(s['duration'] for s in VIDEO_CONFIG['scenes'])
print(f"📋 {len(VIDEO_CONFIG['scenes'])} scenes · {total} frames · {total/30:.1f}s")
for i, s in enumerate(VIDEO_CONFIG['scenes']):
    print(f"  {i+1}. [{s['type']}] {s['duration']/30:.1f}s - {s['tts']['speaker']}")
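
Before spending GPU time on voiceover, it can help to sanity-check the config. The sketch below is a hypothetical helper, not part of the repo: `validate_config`, `KNOWN_TEMPLATES`, and `KNOWN_SPEAKERS` are names introduced here, and the allowed values are copied from the template and speaker lists above. It assumes every scene should carry a positive frame count and non-empty TTS text.

```python
# Allowed values copied from the lists in this step; the helper itself is hypothetical.
KNOWN_TEMPLATES = {"intro", "kinetic-text", "stat-card", "list-reveal", "quote-card", "cta"}
KNOWN_SPEAKERS = {"Ryan", "Aiden", "Vivian", "Serena"}


def validate_config(config, templates=KNOWN_TEMPLATES, speakers=KNOWN_SPEAKERS):
    """Return a list of readable problems; an empty list means nothing was flagged."""
    problems = []
    for i, scene in enumerate(config.get("scenes", []), start=1):
        if scene.get("type") not in templates:
            problems.append(f"scene {i}: unknown template {scene.get('type')!r}")
        duration = scene.get("duration")
        if not isinstance(duration, int) or duration <= 0:
            problems.append(f"scene {i}: duration must be a positive frame count")
        tts = scene.get("tts", {})
        if not tts.get("text"):
            problems.append(f"scene {i}: missing tts text")
        if tts.get("speaker") not in speakers:
            problems.append(f"scene {i}: unknown speaker {tts.get('speaker')!r}")
    return problems
```

Calling `validate_config(VIDEO_CONFIG)` on the example config should report no problems. To accept every voice the loaded model reports, one option is `speakers=set(model.get_supported_speakers())`, assuming that call returns names spelled the same way as in the config.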

---
## Step 4: Generate Voiceover

In [None]:
import os
AUDIO_DIR = f"{REPO}/public/audio"
os.makedirs(AUDIO_DIR, exist_ok=True)

SAMPLE_RATE = None  # will be set from first generation

for i, scene in enumerate(VIDEO_CONFIG['scenes']):
    tts = scene['tts']
    print(f"🎙️ Scene {i+1}/{len(VIDEO_CONFIG['scenes'])} [{scene['type']}] {tts['speaker']}")
    
    wavs, sr = model.generate_custom_voice(
        text=tts['text'], language='English',
        speaker=tts['speaker'], instruct=tts.get('instruct', '')
    )
    if SAMPLE_RATE is None:
        SAMPLE_RATE = sr
    
    audio_file = f"{AUDIO_DIR}/scene_{i:03d}.wav"
    sf.write(audio_file, wavs[0], sr, subtype='PCM_16')  # PCM_16 for max compatibility
    dur = len(wavs[0]) / sr
    scene['_audio_dur'] = dur
    print(f"   ✅ {dur:.1f}s")

print("\n🔊 Preview last scene:")
display(Audio(wavs[0], rate=sr))
print(f"\n✅ All audio generated (sample rate: {SAMPLE_RATE} Hz)")

---
## Step 5: Adjust Durations + Build Full Audio Track

In [None]:
FPS = 30
PAD = 15  # 0.5s padding per scene

audio_parts = []
print("⏱️ Matching scene durations to audio:\n")

for i, scene in enumerate(VIDEO_CONFIG['scenes']):
    audio_dur = scene.get('_audio_dur', scene['duration'] / FPS)
    needed_frames = int(audio_dur * FPS) + PAD
    scene['duration'] = max(needed_frames, scene['duration'])
    scene_dur_s = scene['duration'] / FPS
    
    # Read audio and pad to match scene duration
    audio_file = f"{AUDIO_DIR}/scene_{i:03d}.wav"
    data, sr = sf.read(audio_file, dtype='float32')
    target_len = int(scene_dur_s * sr)
    
    if len(data) < target_len:
        data = np.concatenate([data, np.zeros(target_len - len(data), dtype=np.float32)])
    else:
        data = data[:target_len]
    
    audio_parts.append(data)
    print(f"  Scene {i+1}: audio {audio_dur:.1f}s → video {scene_dur_s:.1f}s")

# Concatenate and save as PCM_16 WAV (universal compatibility)
full_audio = np.concatenate(audio_parts)
FULL_AUDIO = f"{AUDIO_DIR}/full_narration.wav"
sf.write(FULL_AUDIO, full_audio, sr, subtype='PCM_16')

total_frames = sum(s['duration'] for s in VIDEO_CONFIG['scenes'])
print(f"\n📊 Video: {total_frames} frames = {total_frames/FPS:.1f}s")
print(f"📊 Audio: {len(full_audio)/sr:.1f}s ({os.path.getsize(FULL_AUDIO)/1e6:.1f} MB)")
print("\n🔊 Full audio preview:")
display(Audio(full_audio, rate=sr))
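
The duration matching in this step comes down to two small pieces of arithmetic, sketched below in isolation. `scene_frames` and `target_samples` are hypothetical names introduced here. Unlike the cell above, which truncates with `int()`, this version works from integer sample counts and rounds up, so a clip that ends partway through a frame still gets that frame. The 24000 Hz rate in the usage note is only an illustrative value; the real rate is whatever the TTS model returned.

```python
def scene_frames(num_samples, sample_rate, min_frames, fps=30, pad_frames=15):
    """Frames needed to fit the narration plus padding, never fewer than min_frames."""
    # Integer ceiling division avoids float rounding surprises.
    audio_frames = -(-num_samples * fps // sample_rate)
    return max(audio_frames + pad_frames, min_frames)


def target_samples(frames, sample_rate, fps=30):
    """Number of audio samples that span `frames` video frames (rounded down)."""
    return frames * sample_rate // fps
```

For example, 108000 samples at 24000 Hz is 4.5 s of speech, which is 135 frames at 30 fps; adding 15 frames of padding gives 150 frames, and `target_samples(150, 24000)` gives the 120000 samples the padded audio should contain.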

---
## Step 6: Render Video (Remotion)

In [None]:
import json

CONFIG_FILE = f"{REPO}/render-config.json"
VIDEO_SILENT = f"{REPO}/out/video-silent.mp4"
os.makedirs(f"{REPO}/out", exist_ok=True)

# Write Remotion config (only template props, no TTS fields)
remotion_config = {
    "theme": VIDEO_CONFIG["theme"],
    "scenes": [{"type": s["type"], "duration": s["duration"], "props": s["props"]} for s in VIDEO_CONFIG["scenes"]]
}
with open(CONFIG_FILE, 'w') as f:
    json.dump(remotion_config, f, indent=2)

total = sum(s['duration'] for s in remotion_config['scenes'])
print(f"🎬 Rendering {total} frames ({total/30:.1f}s)...\n")

# Use Remotion CLI (handles browser internally)
!cd {REPO} && npx remotion render src/index.ts SocialVideo {VIDEO_SILENT} \
  --props render-config.json \
  --enable-multi-process-on-linux \
  --log=warning \
  2>&1

if os.path.exists(VIDEO_SILENT):
    mb = os.path.getsize(VIDEO_SILENT) / 1e6
    print(f"\n✅ Silent video: {mb:.1f} MB")
else:
    print("\n❌ Render failed")
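
The file-existence check above can report success when an output from an earlier run is still on disk. One possible alternative, sketched here, is to launch the same Remotion command through `subprocess` and look at the exit code instead. `build_render_command` and `render_silent_video` are hypothetical helper names; the entry point, composition ID, and flags are copied from the cell above, and the sketch assumes `npx` is on the PATH.

```python
import subprocess


def build_render_command(output_path, props_file="render-config.json"):
    """Argument list mirroring the `npx remotion render` call used in this step."""
    return [
        "npx", "remotion", "render", "src/index.ts", "SocialVideo", output_path,
        "--props", props_file,
        "--enable-multi-process-on-linux",
        "--log=warning",
    ]


def render_silent_video(repo_dir, output_path):
    """Run the render from the repo directory; True only if the CLI exits with code 0."""
    result = subprocess.run(build_render_command(output_path), cwd=repo_dir)
    return result.returncode == 0
```

In the notebook this could be called as `render_silent_video(REPO, VIDEO_SILENT)`, with the return value deciding which message to print.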

---
## Step 7: Merge Audio + Video

In [None]:
import subprocess

FINAL = f"{REPO}/out/final-video.mp4"

# First verify audio file is valid
print("🔍 Checking audio...")
r = subprocess.run(['ffprobe', '-v', 'error', '-show_entries', 'stream=codec_type,sample_rate,channels,duration',
                    '-of', 'compact', FULL_AUDIO], capture_output=True, text=True)
print(f"   {r.stdout.strip()}")

print("🔍 Checking video...")
r = subprocess.run(['ffprobe', '-v', 'error', '-show_entries', 'stream=codec_type,width,height,duration',
                    '-of', 'compact', VIDEO_SILENT], capture_output=True, text=True)
print(f"   {r.stdout.strip()}")

# Merge: convert audio to AAC, copy video stream
print("\n🔊 Merging...")
cmd = [
    'ffmpeg', '-y',
    '-i', VIDEO_SILENT,       # video input
    '-i', FULL_AUDIO,         # audio input
    '-map', '0:v:0',          # take video from first input
    '-map', '1:a:0',          # take audio from second input
    '-c:v', 'copy',           # copy video (no re-encode)
    '-c:a', 'aac',            # encode audio as AAC
    '-b:a', '192k',           # audio bitrate
    '-ac', '1',               # mono (TTS is mono)
    '-ar', '44100',           # standard sample rate
    '-shortest',              # match shortest stream
    FINAL
]
r = subprocess.run(cmd, capture_output=True, text=True)

if os.path.exists(FINAL):
    mb = os.path.getsize(FINAL) / 1e6
    # Verify the final file has both streams
    r2 = subprocess.run(['ffprobe', '-v', 'error', '-show_entries', 'stream=codec_type',
                         '-of', 'csv=p=0', FINAL], capture_output=True, text=True)
    streams = r2.stdout.strip().split('\n')
    has_video = 'video' in streams
    has_audio = 'audio' in streams
    print(f"\n✅ Final: {FINAL} ({mb:.1f} MB)")
    print(f"   Video: {'✅' if has_video else '❌'} | Audio: {'✅' if has_audio else '❌'}")
    if not has_audio:
        print(f"\n⚠️ Audio stream missing! FFmpeg stderr:\n{r.stderr[-500:]}")
else:
    print(f"❌ Merge failed:\n{r.stderr[-500:]}")
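
The stream check in this step relies on `ffprobe` printing one codec type per line. A small parser makes that assumption explicit and is easy to test without a video file. `stream_types` and `has_audio_and_video` are hypothetical names introduced here; stripping a trailing comma is a defensive guess about the CSV writer, not documented behaviour this notebook depends on.

```python
def stream_types(ffprobe_output):
    """Parse `-show_entries stream=codec_type -of csv=p=0` output into a set of types."""
    return {
        line.strip().rstrip(",")
        for line in ffprobe_output.splitlines()
        if line.strip()
    }


def has_audio_and_video(ffprobe_output):
    """True when both a video and an audio stream were reported."""
    return {"video", "audio"} <= stream_types(ffprobe_output)
```

With the variables from the cell above, `has_audio_and_video(r2.stdout)` would be a single check equivalent to `has_video and has_audio`.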

---
## Step 8: Preview + Download

In [None]:
from IPython.display import HTML
from base64 import b64encode

if os.path.exists(FINAL) and os.path.getsize(FINAL) > 100000:
    # Read inside a context manager so the file handle is closed promptly
    with open(FINAL, 'rb') as fh:
        b64 = b64encode(fh.read()).decode()
    display(HTML(f'<video width="360" height="640" controls><source src="data:video/mp4;base64,{b64}" type="video/mp4"></video>'))
else:
    print("Video too small or missing")

In [None]:
from google.colab import files
files.download(FINAL)

---
## 💡 Quick Reference

**To re-render with changes:** edit Step 3, then run Steps 4→8

| Speaker | Voice | Best For |
|---------|-------|----------|
| Ryan | Dynamic male | Energy, narration |
| Aiden | Sunny American male | Casual, friendly |
| Vivian | Bright young female | Punchy, engaging |
| Serena | Warm gentle female | Calm, storytelling |
| Ono_Anna | Playful female | Japanese |
| Sohee | Warm female | Korean |