<a href="https://colab.research.google.com/github/mshumer/sora-extend/blob/main/Sora_Extend.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sora 2 — AI‑Planned, Scene‑Exact Prompts with Continuity (Chained >12s)

Built by [Matt Shumer](https://x.com/mattshumer_).

Pipeline:
1) Use an LLM (“GPT‑5 Thinking”) to plan N scene prompts from a base idea. The LLM is prompted to do this intelligently to enable continuity.
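2) Render each segment with Sora 2; for continuity, pass reference frames extracted from the segments rendered so far as `input_reference` (with `REFERENCE_FRAMES_COUNT = 1`, this is just the prior segment's **final frame**).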
3) Concatenate segments into a single MP4.

In [None]:
# @title 1) Install dependencies

import sys, subprocess, importlib, site

# Install everything upfront
print("Installing dependencies...")
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "-U", "moviepy==1.0.3", "imageio", "imageio-ffmpeg", "openai", "requests", "opencv-python-headless"])

# Reload site packages so newly installed packages are found
importlib.reload(site)
site.main()

print("Dependencies installed successfully")

# 2) Config

Fill in:
- OPENAI_API_KEY
- SECONDS_PER_SEGMENT (options: 4, 8, 12)
- NUM_GENERATIONS (the total number of segments to generate and concatenate; the total length is `SECONDS_PER_SEGMENT * NUM_GENERATIONS` seconds)
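
As a quick sanity check on these two settings, the sketch below validates a segment length against the options listed above and works out how many generations are needed to cover a target runtime. The helper `plan_duration` and its `target_seconds` parameter are illustrative assumptions and do not appear elsewhere in this notebook.

```python
import math

ALLOWED_SEGMENT_SECONDS = (4, 8, 12)  # the options listed above


def plan_duration(target_seconds: int, seconds_per_segment: int) -> tuple[int, int]:
    """Return (num_generations, total_seconds) covering at least target_seconds.

    Hypothetical helper for illustration; not part of the notebook.
    """
    if seconds_per_segment not in ALLOWED_SEGMENT_SECONDS:
        raise ValueError(f"seconds_per_segment must be one of {ALLOWED_SEGMENT_SECONDS}")
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    # Round up so the combined video is never shorter than requested.
    num_generations = math.ceil(target_seconds / seconds_per_segment)
    return num_generations, num_generations * seconds_per_segment


print(plan_duration(30, 8))  # (4, 32)
print(plan_duration(16, 8))  # (2, 16)
```

Because the segment length is fixed, a 30-second target with 8-second segments overshoots to 32 seconds.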

In [None]:
# @title 2) Imports and logging setup

import os, re, io, json, time, math, mimetypes, shutil, textwrap, tempfile
from pathlib import Path
import requests
import cv2
from moviepy.editor import VideoFileClip, concatenate_videoclips
from IPython.display import Video as IPyVideo, display
from openai import OpenAI

import logging
from datetime import datetime

print("All imports configured successfully")

In [None]:
os.environ["OPENAI_API_KEY"] = "Your API Key"

client = OpenAI()

# ---------- Planner (text model) ----------
PLANNER_MODEL = os.environ.get("PLANNER_MODEL", "gpt-5")

# ---------- Sora (video model) ----------
SORA_MODEL = "sora-2"
SIZE       = "1280x720"

# ---------- Your project inputs ----------
BASE_PROMPT = "Gameplay footage of a game releasing in 2027, a car driving through a futuristic city"
SECONDS_PER_SEGMENT  = 8
NUM_GENERATIONS      = 2

# ---------- Multi-frame continuity settings ----------
REFERENCE_FRAMES_COUNT = 8  # Number of frames to extract (1, 8, 32, or any number)
REFERENCE_FRAMES_STRATEGY = "evenly_spaced"  # "evenly_spaced" or "end_of_file"
REFERENCE_FRAMES_FORMAT = "webp"  # "webp" or "png" for animated output

# ---------- Retry settings ----------
INITIAL_RETRY_DELAY = 2  # Initial delay in seconds
MAX_RETRY_DELAY = 60  # Maximum delay between retries in seconds
MAX_TOTAL_RETRY_TIME = 7200  # Maximum total time to retry (2 hours)

# Polling cadence
POLL_INTERVAL_SEC = 2
PRINT_PROGRESS_BAR = True

# API setup
API_BASE = "https://api.openai.com/v1"
HEADERS_JSON = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}", "Content-Type": "application/json"}
HEADERS_AUTH = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

# Create unique output directory
now = datetime.now()
human_timestamp = now.strftime("%Y-%m-%d %I-%M %p")
prompt_words = " ".join(re.sub(r'[^a-z0-9\s]', '', BASE_PROMPT.lower()).split()[:5]).title()
folder_name = f"Sora Extend {human_timestamp} {prompt_words}"
OUT_DIR = Path(folder_name)
OUT_DIR.mkdir(parents=True, exist_ok=True)

# Setup logging
log_timestamp = now.strftime("%Y%m%d_%H%M%S")
log_prompt_words = "_".join(re.sub(r'[^a-z0-9\s]', '', BASE_PROMPT.lower()).split()[:5])
log_filename = f"sora_extend_{log_timestamp}_{log_prompt_words}.md"
log_dir = OUT_DIR / "logs"
log_dir.mkdir(parents=True, exist_ok=True)
log_path = log_dir / log_filename

# Initialize markdown log
with open(log_path, "w", encoding="utf-8") as f:
    f.write(f"""# Sora Extend Execution Log (Multi-Frame)
**Started:** {now.strftime("%Y-%m-%d %H:%M:%S")}  
**Prompt:** {BASE_PROMPT[:100]}{'...' if len(BASE_PROMPT) > 100 else ''}

## Configuration
- **Model:** {SORA_MODEL}
- **Planner Model:** {PLANNER_MODEL}
- **Resolution:** {SIZE}
- **Segment Length:** {SECONDS_PER_SEGMENT} seconds
- **Number of Segments:** {NUM_GENERATIONS}
- **Total Duration:** {SECONDS_PER_SEGMENT * NUM_GENERATIONS} seconds
- **Reference Frames:** {REFERENCE_FRAMES_COUNT} frames
- **Frame Strategy:** {REFERENCE_FRAMES_STRATEGY}
- **Frame Format:** {REFERENCE_FRAMES_FORMAT}
- **Max Total Retry Time:** {MAX_TOTAL_RETRY_TIME}s ({MAX_TOTAL_RETRY_TIME/3600:.1f} hours)
- **Output Directory:** `{OUT_DIR}`

---

""")

def log_md(content):
    """Append content to markdown log file"""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(content + "\n")

log_md(f"## Phase 1: Dependencies Installation")
log_md(f"✓ All dependencies installed successfully\n")
log_md(f"---\n")

print(f"Output folder: {OUT_DIR}")
print(f"Multi-frame mode: {REFERENCE_FRAMES_COUNT} frames ({REFERENCE_FRAMES_STRATEGY})")
print(f"Logging to: {log_path}")

# 3) The planner system prompt

We’ll ask the planner model to output a clean JSON object with one prompt per generation.
The prompts contain context and the actual shot details, maximizing continuity.

This isn't super optimized and was a first pass done by GPT. If people like this notebook, let me know on X, and I'll improve this!
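
To make the expected JSON shape concrete, here is a minimal sketch of a checker for planner output. The function `validate_plan` is a hypothetical helper, and it is stricter than the notebook, which overwrites `seconds` rather than rejecting a mismatch.

```python
import json


def validate_plan(raw_json: str, expected_segments: int, seconds_per_segment: int) -> list[dict]:
    """Parse planner output and check it has one usable prompt per generation.

    Hypothetical helper for illustration; not part of the notebook.
    """
    data = json.loads(raw_json)
    segments = data.get("segments")
    if not isinstance(segments, list):
        raise ValueError("'segments' must be a list")
    if len(segments) != expected_segments:
        raise ValueError(f"expected {expected_segments} segments, got {len(segments)}")
    for idx, seg in enumerate(segments, start=1):
        prompt = seg.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            raise ValueError(f"segment {idx} has no prompt text")
        if seg.get("seconds") != seconds_per_segment:
            raise ValueError(f"segment {idx} has seconds={seg.get('seconds')!r}")
    return segments


sample = json.dumps({
    "segments": [
        {"title": "Generation 1", "seconds": 8, "prompt": "Wide shot of a neon city street."},
        {"title": "Generation 2", "seconds": 8,
         "prompt": "Context: continues the previous shot.\n\nPrompt: The car turns left."},
    ]
})
print(len(validate_plan(sample, expected_segments=2, seconds_per_segment=8)))  # 2
```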

In [None]:
PLANNER_SYSTEM_INSTRUCTIONS = r"""
You are a senior prompt director for Sora 2. Your job is to transform:
- a Base prompt (broad idea),
- a fixed generation length per segment (seconds),
- and a total number of generations (N),

into **N crystal-clear shot prompts** with **maximum continuity** across segments.

IMPORTANT CONTEXT ABOUT MULTI-FRAME CONTINUITY:
- We are using {REFERENCE_FRAMES_COUNT} reference frames per segment
- Frame extraction strategy: {REFERENCE_FRAMES_STRATEGY}
- If "evenly_spaced": Frames show the progression of ALL previous video content from start to end
- If "end_of_file": Frames show the final moments/motion at the end of the previous segment

Your prompts must instruct Sora to USE these reference frames:
- For "evenly_spaced" mode: Tell Sora to "use the provided reference frames to understand the motion and progression throughout the entire scene so far"
- For "end_of_file" mode: Tell Sora to "use the provided reference frames to understand the motion at the end of the previous clip"

Rules:
1) Return **valid JSON** only. Structure:
   {{
     "segments": [
       {{
         "title": "Generation 1",
         "seconds": 6,
         "prompt": "<prompt block to send into Sora>"
       }},
       ...
     ]
   }}
   - `seconds` MUST equal the given generation length for ALL segments.
   - `prompt` should include a **Context** section for model guidance AND a **Prompt** line for the shot itself,
     exactly like in the example below.
2) Continuity:
   - Segment 1 starts fresh from the BASE PROMPT.
   - Segment k (k>1) must **begin exactly at the final frame** of segment k-1.
   - Maintain consistent visual style, tone, lighting, and subject identity unless explicitly told to change.
   - EXPLICITLY instruct Sora to use the reference frames for continuity
3) Safety & platform constraints:
   - Do not depict real people (including public figures) or copyrighted characters.
   - Avoid copyrighted music and avoid exact trademark/logos if policy disallows them; use brand-safe wording.
   - Keep content suitable for general audiences.
4) Output only JSON (no Markdown, no backticks).
5) Keep the **Context** lines inside the prompt text (they're for the AI, not visible).
6) Make the writing specific and cinematic; describe camera, lighting, motion, and subject focus succinctly.

Below is an **EXAMPLE (verbatim)** of exactly how to structure prompts with context and continuity:

Example:
Base prompt: "Intro video for the iPhone 19"
Generation length: 6 seconds each
Total generations: 3

Clearly defined prompts with maximum continuity and context:

### Generation 1:

<prompt>
First shot introducing the new iPhone 19. Initially, the screen is completely dark. The phone, positioned vertically and facing directly forward, emerges slowly and dramatically out of darkness, gradually illuminated from the center of the screen outward, showcasing a vibrant, colorful, dynamic wallpaper on its edge-to-edge glass display. The style is futuristic, sleek, and premium, appropriate for an official Apple product reveal.
</prompt>

---

### Generation 2:

<prompt>
Context (not visible in video, only for AI guidance):

* You are creating the second part of an official intro video for Apple's new iPhone 19.
* The previous 6-second scene ended with the phone facing directly forward, clearly displaying its vibrant front screen and colorful wallpaper.
* Use the provided reference frames to understand the motion and progression throughout the entire scene so far.

Prompt: Second shot begins exactly from the final frame of the previous scene, showing the front of the iPhone 19 with its vibrant, colorful display clearly visible. Now, smoothly rotate the phone horizontally, turning it from the front to reveal the back side. Focus specifically on the advanced triple-lens camera module, clearly highlighting its premium materials, reflective metallic surfaces, and detailed lenses. Maintain consistent dramatic lighting, sleek visual style, and luxurious feel matching the official Apple product introduction theme.
</prompt>

---

### Generation 3:

<prompt>
Context (not visible in video, only for AI guidance):

* You are creating the third and final part of an official intro video for Apple's new iPhone 19.
* The previous 6-second scene ended clearly showing the back of the iPhone 19, focusing specifically on its advanced triple-lens camera module.
* Use the provided reference frames to understand the motion and progression throughout the entire scene so far.

Prompt: Final shot begins exactly from the final frame of the previous scene, clearly displaying the back side of the iPhone 19, with special emphasis on the triple-lens camera module. Now, have a user's hand gently pick up the phone, naturally rotating it from the back to the front and bringing it upward toward their face. Clearly show the phone smoothly and quickly unlocking via Face ID recognition, transitioning immediately to a vibrant home screen filled with updated app icons. Finish the scene by subtly fading the home screen into the iconic Apple logo. Keep the visual style consistent, premium, and elegant, suitable for an official Apple product launch.
</prompt>

---

Notice how we broke up the initial prompt into multiple prompts that provide context and continuity so this all works seamlessly.
""".strip()

# Format the instructions with actual config values
PLANNER_SYSTEM_INSTRUCTIONS = PLANNER_SYSTEM_INSTRUCTIONS.format(
    REFERENCE_FRAMES_COUNT=REFERENCE_FRAMES_COUNT,
    REFERENCE_FRAMES_STRATEGY=REFERENCE_FRAMES_STRATEGY
)

# 4) Planner: ask the LLM to generate prompts (JSON)

In [None]:
def plan_prompts_with_ai(base_prompt: str, seconds_per_segment: int, num_generations: int):
    """
    Calls the Responses API to produce a JSON object:
    {
      "segments": [
        {"title": "...", "seconds": <int>, "prompt": "<full prompt block>"},
        ...
      ]
    }
    """
    log_md(f"## Phase 2: Prompt Planning ({PLANNER_MODEL})")
    start_time = time.time()
    log_md(f"**Started:** {datetime.now().strftime('%H:%M:%S')}")
    log_md(f"\n**Base Prompt:**")
    log_md(f"```\n{base_prompt}\n```\n")
    
    # Compose a single plain-text input with the variables:
    user_input = f"""
BASE PROMPT: {base_prompt}

GENERATION LENGTH (seconds): {seconds_per_segment}
TOTAL GENERATIONS: {num_generations}

Return exactly {num_generations} segments.
""".strip()

    # Minimal Responses API call; see docs & library readme for details.
    # (If your account lacks the requested model, change PLANNER_MODEL accordingly.)
    resp = client.responses.create(
        model=PLANNER_MODEL,
        instructions=PLANNER_SYSTEM_INSTRUCTIONS,
        input=user_input,
    )
    
    elapsed = time.time() - start_time
    log_md(f"**Response received:** {datetime.now().strftime('%H:%M:%S')} ({elapsed:.1f}s)")

    text = getattr(resp, "output_text", None) or ""
    if not text:
        # Fallback: collect from structured blocks if needed
        # (Different SDK versions may put text in resp.output or in content items.)
        try:
            # Attempt to reconstruct from generic fields
            text = json.dumps(resp.to_dict())
        except Exception:
            raise RuntimeError("Planner returned no text; try changing PLANNER_MODEL.")

    # Extract the JSON object: the greedy match spans from the first '{' to the last '}'.
    m = re.search(r'\{[\s\S]*\}', text)
    if not m:
        raise ValueError("Planner did not return JSON. Inspect response and adjust instructions.")
    data = json.loads(m.group(0))

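    # Basic validation and enforcement
    segments = data.get("segments", [])
    if len(segments) < num_generations:
        # Fail early instead of silently rendering a shorter video.
        raise ValueError(
            f"Planner returned {len(segments)} segments; expected {num_generations}."
        )
    # Drop any extras so we render exactly the requested number.
    segments = segments[:num_generations]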

    # Force durations to the requested number (some models might deviate)
    for seg in segments:
        seg["seconds"] = int(seconds_per_segment)

    # Log generated segments
    log_md(f"\n### Generated Segments:\n")
    for i, seg in enumerate(segments, start=1):
        log_md(f"#### Segment {i}: \"{seg.get('title', 'Untitled')}\" ({seg['seconds']}s)")
        log_md(f"```\n{seg['prompt']}\n```\n")
    
    log_md(f"---\n")
    
    return segments

segments_plan = plan_prompts_with_ai(BASE_PROMPT, SECONDS_PER_SEGMENT, NUM_GENERATIONS)

print("AI‑planned segments:\n")
for i, seg in enumerate(segments_plan, start=1):
    print(f"[{i:02d}] {seg['seconds']}s — {seg.get('title','(untitled)')}")
    print(seg["prompt"])
    print("-" * 80)
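
The planner cell pulls JSON out of the model's reply with a greedy regular expression. The sketch below shows how that pattern behaves and contrasts it with one possible alternative built on the standard library's `json.JSONDecoder.raw_decode`. Both function names are illustrative assumptions, and the sample strings are made up.

```python
import json
import re


def extract_json_greedy(text: str) -> dict:
    """Same pattern as the planner cell: first '{' through the last '}'."""
    match = re.search(r"\{[\s\S]*\}", text)
    if match is None:
        raise ValueError("no JSON object found")
    return json.loads(match.group(0))


def extract_json_first_object(text: str) -> dict:
    """Alternative: decode only the first complete JSON object, ignoring what follows."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


wrapped = 'Here is the plan:\n{"segments": [{"seconds": 8}]}\nLet me know if you want changes.'
trailing = '{"segments": []} (note: {placeholders} were removed)'

print(extract_json_greedy(wrapped))         # {'segments': [{'seconds': 8}]}
print(extract_json_first_object(trailing))  # {'segments': []}
```

The greedy pattern tolerates text before and after the object, but it raises on `trailing` because the match runs through the later brace. `raw_decode` stops at the end of the first object.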

# 5) Sora helpers (create → poll → download → extract reference frames)
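
Before the helpers, here is a minimal standalone sketch of how the two `REFERENCE_FRAMES_STRATEGY` options pick frame indices. It needs no video files. The function `select_frame_indices` is a hypothetical name, and this sketch rounds to the nearest index, a choice made here so floating-point error cannot drop the final frame.

```python
def select_frame_indices(total_frames: int, frame_count: int, strategy: str) -> list[int]:
    """Pick which frame indices to keep for the two configured strategies.

    Hypothetical helper for illustration; not part of the notebook.
    """
    if total_frames <= 0 or frame_count <= 0:
        raise ValueError("total_frames and frame_count must be positive")
    if frame_count >= total_frames:
        return list(range(total_frames))  # fewer frames than requested: keep all
    if strategy == "evenly_spaced":
        if frame_count == 1:
            return [total_frames - 1]  # a single frame means the final frame
        step = (total_frames - 1) / (frame_count - 1)
        return [round(i * step) for i in range(frame_count)]
    if strategy == "end_of_file":
        return list(range(total_frames - frame_count, total_frames))
    raise ValueError(f"unknown strategy: {strategy!r}")


print(select_frame_indices(100, 4, "evenly_spaced"))  # [0, 33, 66, 99]
print(select_frame_indices(100, 4, "end_of_file"))    # [96, 97, 98, 99]
```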

In [None]:
import json, mimetypes
from pathlib import Path
from typing import Optional, List
import requests
from PIL import Image

API_BASE = "https://api.openai.com/v1"
HEADERS_AUTH = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

def guess_mime(path: Path) -> str:
    t = mimetypes.guess_type(str(path))[0]
    return t or "application/octet-stream"

def _dump_error(resp: requests.Response):
    rid = resp.headers.get("x-request-id", "<none>")
    try:
        body = resp.json()
    except Exception:
        body = resp.text
    return f"HTTP {resp.status_code} (request-id: {rid})\n{body}"

def extract_reference_frames(video_paths: List[Path], frame_count: int, strategy: str, out_path: Path) -> Path:
    """
    Extract reference frames from ALL videos combined and create animated WebP/PNG.
    
    Args:
        video_paths: List of video file paths to extract from (in order)
        frame_count: Number of frames to extract
        strategy: "evenly_spaced" or "end_of_file"
        out_path: Output path for animated file
    
    Returns:
        Path to created animated file
    """
    log_md(f"  - Extracting {frame_count} frames ({strategy}) from {len(video_paths)} video(s)")
    
    all_frames = []
    total_frame_count = 0
    
    # Load all videos and count total frames
    caps = [cv2.VideoCapture(str(p)) for p in video_paths]
    frame_counts = [int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) for cap in caps]
    total_frame_count = sum(frame_counts)
    
    log_md(f"  - Total frames available: {total_frame_count}")
    
    # Determine which frames to extract
    if frame_count >= total_frame_count:
        # Extract ALL frames
        frame_indices = list(range(total_frame_count))
        log_md(f"  - Extracting ALL {total_frame_count} frames")
    elif strategy == "evenly_spaced":
        # Always include first and last, distribute rest evenly
        if frame_count == 1:
            frame_indices = [total_frame_count - 1]
        elif frame_count == 2:
            frame_indices = [0, total_frame_count - 1]
        else:
            step = (total_frame_count - 1) / (frame_count - 1)
            # round() rather than int() so float error cannot truncate the last index
            frame_indices = [round(i * step) for i in range(frame_count)]
    else:  # end_of_file
        # Extract last N frames
        start_frame = max(0, total_frame_count - frame_count)
        frame_indices = list(range(start_frame, total_frame_count))
    
    log_md(f"  - Frame indices: {frame_indices[:5]}{'...' if len(frame_indices) > 5 else ''}")
    
    # Extract frames
    global_frame_idx = 0
    for video_idx, (cap, frame_count_in_video) in enumerate(zip(caps, frame_counts)):
        for local_frame_idx in range(frame_count_in_video):
            ret, frame = cap.read()
            if not ret:
                break
            
            if global_frame_idx in frame_indices:
                # Convert BGR to RGB for PIL
                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                pil_image = Image.fromarray(frame_rgb)
                all_frames.append(pil_image)
            
            global_frame_idx += 1
    
    # Close all captures
    for cap in caps:
        cap.release()
    
    if not all_frames:
        raise RuntimeError("No frames extracted")
    
    log_md(f"  - Extracted {len(all_frames)} frames successfully")
    
    # Create animated file
    out_path.parent.mkdir(parents=True, exist_ok=True)
    
    if REFERENCE_FRAMES_FORMAT == "webp":
        # Save as animated WebP
        all_frames[0].save(
            str(out_path),
            save_all=True,
            append_images=all_frames[1:],
            duration=100,  # 100ms per frame = 10fps
            loop=0
        )
    else:  # png (APNG)
        # Save as animated PNG
        all_frames[0].save(
            str(out_path),
            save_all=True,
            append_images=all_frames[1:],
            duration=100,
            loop=0,
            format="PNG"
        )
    
    file_size = out_path.stat().st_size / 1024
    log_md(f"  - Created animated {REFERENCE_FRAMES_FORMAT.upper()}: `{out_path.name}` ({file_size:.1f} KB)")
    
    return out_path


def create_video(prompt: str, size: str, seconds: int, model: str, input_reference: Optional[Path]):
    """
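    Create a video job, retrying with exponential backoff on rate limits (429),
    server errors (5xx), and network errors. Moderation blocks and other 4xx
    errors raise immediately.
    Retries until MAX_TOTAL_RETRY_TIME is reached (no attempt limit).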
    """
    files = {
        "model":   (None, model),
        "prompt":  (None, prompt),
        "seconds": (None, str(seconds)),
    }
    if size:
        files["size"] = (None, size)

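    if input_reference is not None:
        mime = guess_mime(input_reference)
        # Read the bytes once so every retry re-sends the full file; an open
        # file handle would sit at EOF after the first attempt and never be closed.
        ref_bytes = Path(input_reference).read_bytes()
        files["input_reference"] = (Path(input_reference).name, ref_bytes, mime)
        log_md(f"  - Using input reference: `{input_reference.name}` ({len(ref_bytes) / 1024:.1f} KB)")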

    retry_start_time = time.time()
    attempt = 0
    
    while True:
        elapsed_total = time.time() - retry_start_time
        if elapsed_total > MAX_TOTAL_RETRY_TIME:
            log_md(f"  - ❌ **TIMEOUT:** Exceeded max retry time ({MAX_TOTAL_RETRY_TIME}s / {MAX_TOTAL_RETRY_TIME/3600:.1f}h)")
            raise RuntimeError(f"Failed after {elapsed_total:.1f}s (max: {MAX_TOTAL_RETRY_TIME}s)")
        
        try:
            r = requests.post(f"{API_BASE}/videos", headers=HEADERS_AUTH, files=files, timeout=300)
            if r.status_code >= 400:
                error_msg = _dump_error(r)
                
                # Check if this is a retryable error
                is_rate_limit = r.status_code == 429
                is_server_error = r.status_code >= 500
                is_moderation = "moderation" in error_msg.lower()
                
                if is_moderation:
                    # Don't retry moderation blocks
                    log_md(f"  - ❌ **MODERATION ERROR:** {error_msg}")
                    raise RuntimeError("Moderation block (not retrying):\n" + error_msg)
                
                if is_rate_limit or is_server_error:
                    delay = min(INITIAL_RETRY_DELAY * (2 ** attempt), MAX_RETRY_DELAY)
                    remaining_time = MAX_TOTAL_RETRY_TIME - elapsed_total
                    log_md(f"  - ⚠️ **Attempt {attempt + 1} failed** (HTTP {r.status_code}): Retrying in {delay}s... ({remaining_time:.0f}s remaining)")
                    print(f"  Rate limit/server error, retrying in {delay}s... (attempt {attempt + 1}, {remaining_time:.0f}s remaining)")
                    time.sleep(delay)
                    attempt += 1
                    continue
                
                # Non-retryable error
                log_md(f"  - ❌ **ERROR:** {error_msg}")
                raise RuntimeError("Create video failed:\n" + error_msg)
            
            # Success!
            if attempt > 0:
                log_md(f"  - ✓ **Succeeded** on attempt {attempt + 1} after {elapsed_total:.1f}s")
            return r.json()
            
        except requests.exceptions.RequestException as e:
            delay = min(INITIAL_RETRY_DELAY * (2 ** attempt), MAX_RETRY_DELAY)
            remaining_time = MAX_TOTAL_RETRY_TIME - elapsed_total
            log_md(f"  - ⚠️ **Network error** (attempt {attempt + 1}): {str(e)} - Retrying in {delay}s... ({remaining_time:.0f}s remaining)")
            print(f"  Network error, retrying in {delay}s... (attempt {attempt + 1}, {remaining_time:.0f}s remaining)")
            time.sleep(delay)
            attempt += 1
            continue


def retrieve_video(video_id: str):
    r = requests.get(f"{API_BASE}/videos/{video_id}", headers=HEADERS_AUTH, timeout=60)
    if r.status_code >= 400:
        raise RuntimeError("Retrieve video failed:\n" + _dump_error(r))
    return r.json()

def download_video_content(video_id: str, out_path: Path, variant: str = "video"):
    with requests.get(
        f"{API_BASE}/videos/{video_id}/content",
        headers=HEADERS_AUTH,
        params={"variant": variant},
        stream=True,
        timeout=600,
    ) as r:
        if r.status_code >= 400:
            raise RuntimeError("Download failed:\n" + _dump_error(r))
        with open(out_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                if chunk: f.write(chunk)
    return out_path

def poll_until_complete(job: dict, poll_interval=POLL_INTERVAL_SEC, segment_num=1):
    video = job
    vid = video["id"]
    start_poll_time = time.time()
    last_progress = -1

    def bar(pct: float, width=30):
        filled = int(max(0.0, min(100.0, pct)) / 100 * width)
        return "=" * filled + "-" * (width - filled)

    while video.get("status") in ("queued", "in_progress"):
        if PRINT_PROGRESS_BAR:
            pct = float(video.get("progress", 0) or 0)
            status_text = "Queued" if video["status"] == "queued" else "Processing"
            print(f"\r{status_text}: [{bar(pct)}] {pct:5.1f}%", end="")
            
            # Log progress at 25% intervals
            if int(pct / 25) > int(last_progress / 25):
                elapsed = time.time() - start_poll_time
                log_md(f"  - Progress: {pct:.1f}% ({status_text}, {elapsed:.1f}s elapsed)")
                last_progress = pct
                
        time.sleep(poll_interval)
        video = retrieve_video(vid)

    if PRINT_PROGRESS_BAR:
        print()

    total_time = time.time() - start_poll_time
    
    if video.get("status") != "completed":
        msg = (video.get("error") or {}).get("message", f"Job {vid} failed")
        log_md(f"  - ❌ **FAILED** after {total_time:.1f}s: {msg}")
        raise RuntimeError(msg)
    
    log_md(f"  - ✓ **Completed** in {total_time:.1f}s")
    return video
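
To see what the retry settings imply for `create_video`, the sketch below lists the sleep durations a capped exponential backoff would produce. The function `backoff_delays` is a hypothetical helper, and it counts sleep time only, so real request latency would exhaust the budget sooner.

```python
def backoff_delays(initial_delay: float, max_delay: float, max_total_time: float) -> list[float]:
    """List the sleep durations of a capped exponential backoff within a time budget.

    Hypothetical helper for illustration; not part of the notebook.
    """
    if initial_delay <= 0:
        raise ValueError("initial_delay must be positive")
    delays = []
    elapsed = 0.0
    attempt = 0
    while True:
        # Double the delay on each attempt, but never exceed the cap.
        delay = min(initial_delay * (2 ** attempt), max_delay)
        if elapsed + delay > max_total_time:
            break
        delays.append(delay)
        elapsed += delay
        attempt += 1
    return delays


schedule = backoff_delays(initial_delay=2, max_delay=60, max_total_time=7200)
print(schedule[:7])  # [2, 4, 8, 16, 32, 60, 60]
```

With the defaults in the config cell, the delay reaches the 60-second cap on the sixth attempt and stays there for the rest of the two-hour budget.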

# 6) Chain generator (use planner output; continuity via reference frames)
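
The control flow of the chain can be sketched without any API calls. In the example below, `chain_segments`, `fake_render`, and `fake_reference` are hypothetical stand-ins that only build strings. They show how each segment after the first receives a reference built from everything rendered before it.

```python
from typing import Callable, Optional


def chain_segments(
    prompts: list[str],
    render: Callable[[str, Optional[str]], str],
    make_reference: Callable[[list[str]], str],
) -> list[str]:
    """Render prompts in order, feeding each one a reference built from earlier output."""
    outputs: list[str] = []
    reference: Optional[str] = None  # the first segment starts with no reference
    for index, prompt in enumerate(prompts):
        outputs.append(render(prompt, reference))
        if index < len(prompts) - 1:  # no reference is needed after the last segment
            reference = make_reference(outputs)
    return outputs


# Stand-ins that only build strings, so the loop can be run offline.
def fake_render(prompt: str, reference: Optional[str]) -> str:
    return f"video({prompt}, ref={reference})"


def fake_reference(videos: list[str]) -> str:
    return f"frames_from_{len(videos)}_videos"


for clip in chain_segments(["shot A", "shot B", "shot C"], fake_render, fake_reference):
    print(clip)
# video(shot A, ref=None)
# video(shot B, ref=frames_from_1_videos)
# video(shot C, ref=frames_from_2_videos)
```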

In [None]:
def chain_generate_sora(segments, size: str, model: str):
    """
    segments: list of {"title": str, "seconds": int, "prompt": str}
    Returns list of video segment Paths.

    Multi-frame mode:
    - Extract frames from ALL segments generated so far (cumulative)
    - Use animated WebP/PNG with multiple reference frames
    """
    log_md(f"## Phase 3: Video Generation\n")
    total_start = time.time()

    input_ref = None
    segment_paths = []

    for i, seg in enumerate(segments, start=1):
        secs   = int(seg["seconds"])
        prompt = seg["prompt"]

        log_md(f"### Segment {i}/{len(segments)}")
        log_md(f"**Title:** {seg.get('title', 'Untitled')}")
        log_md(f"**Duration:** {secs} seconds")
        log_md(f"**Started:** {datetime.now().strftime('%H:%M:%S')}")
        log_md(f"\n**Prompt:**")
        log_md(f"```\n{prompt}\n```\n")

        print(f"\n=== Generating Segment {i}/{len(segments)} — {secs}s ===")
        seg_start = time.time()
        job = create_video(prompt=prompt, size=size, seconds=secs, model=model, input_reference=input_ref)
        log_md(f"  - **Job ID:** `{job['id']}`")
        log_md(f"  - **Status:** {job['status']}")
        print("Started job:", job["id"], "| status:", job["status"])

        completed = poll_until_complete(job, segment_num=i)

        seg_path = OUT_DIR / f"segment_{i:02d}.mp4"
        download_video_content(completed["id"], seg_path, variant="video")
        file_size_mb = seg_path.stat().st_size / (1024 * 1024)
        log_md(f"  - **Downloaded:** `{seg_path.name}` ({file_size_mb:.2f} MB)")
        print("Saved", seg_path)
        segment_paths.append(seg_path)

        # Extract reference frames from ALL segments so far for the next segment
        if i < len(segments):  # Don't extract for last segment
            ref_path = OUT_DIR / f"reference_frames_{i:02d}_to_{i+1:02d}.{REFERENCE_FRAMES_FORMAT}"
            log_md(f"\n**Extracting reference frames for next segment:**")
            input_ref = extract_reference_frames(
                video_paths=segment_paths,  # All segments accumulated so far
                frame_count=REFERENCE_FRAMES_COUNT,
                strategy=REFERENCE_FRAMES_STRATEGY,
                out_path=ref_path
            )
            print(f"Extracted {REFERENCE_FRAMES_COUNT} reference frames ->", ref_path)

        seg_duration = time.time() - seg_start
        log_md(f"  - **Total time for segment:** {seg_duration:.1f}s\n")

    total_duration = time.time() - total_start
    log_md(f"**Total video generation time:** {total_duration:.1f}s\n")
    log_md(f"---\n")

    return segment_paths


def concatenate_segments(segment_paths, out_path: Path) -> Path:
    log_md(f"## Phase 4: Video Concatenation\n")
    log_md(f"**Started:** {datetime.now().strftime('%H:%M:%S')}")
    log_md(f"\n**Input files:**")

    concat_start = time.time()
    clips = [VideoFileClip(str(p)) for p in segment_paths]

    for i, (path, clip) in enumerate(zip(segment_paths, clips), 1):
        size_mb = path.stat().st_size / (1024 * 1024)
        log_md(f"  {i}. `{path.name}` ({size_mb:.2f} MB, {clip.duration:.2f}s @ {clip.fps:.1f} fps)")

    target_fps = clips[0].fps or 24
    log_md(f"\n**Target FPS:** {target_fps}")
    log_md(f"**Output file:** `{out_path.name}`\n")

    result = concatenate_videoclips(clips, method="compose")
    result.write_videofile(
        str(out_path),
        codec="libx264",
        audio_codec="aac",
        fps=target_fps,
        preset="medium",
        threads=0
    )
    for c in clips:
        c.close()

    concat_duration = time.time() - concat_start
    final_size_mb = out_path.stat().st_size / (1024 * 1024)
    log_md(f"✓ **Concatenation complete**")
    log_md(f"  - **Final file:** `{out_path.name}` ({final_size_mb:.2f} MB)")
    log_md(f"  - **Duration:** {result.duration:.2f} seconds")
    log_md(f"  - **Time taken:** {concat_duration:.1f}s\n")
    log_md(f"---\n")

    return out_path

# 7) Run the whole pipeline

In [None]:
# 1) (Already ran) Plan prompts with AI -> segments_plan
# 2) Generate with Sora 2 in a chain
execution_start = time.time()
segment_paths = chain_generate_sora(segments_plan, size=SIZE, model=SORA_MODEL)

# 3) Concatenate
final_path = OUT_DIR / "combined.mp4"
concatenate_segments(segment_paths, final_path)
print("\nWrote combined video:", final_path)

# 4) Write summary to log
total_execution = time.time() - execution_start
log_md(f"## Summary\n")
log_md(f"✓ **Execution completed successfully**")
log_md(f"**Finished:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
log_md(f"**Total execution time:** {total_execution:.1f}s ({total_execution/60:.1f} minutes)")
log_md(f"**Final output:** `{final_path}`")
log_md(f"**Total segments generated:** {len(segment_paths)}")
log_md(f"**Total video duration:** {SECONDS_PER_SEGMENT * NUM_GENERATIONS} seconds\n")

print(f"\n{'='*60}")
print(f"✓ EXECUTION COMPLETE")
print(f"{'='*60}")
print(f"Log file: {log_path}")
print(f"Video output: {final_path}")
print(f"Total time: {total_execution:.1f}s")

# 5) Inline preview
display(IPyVideo(str(final_path), embed=True, width=768))
