# Podcast Studio – Jupyter Notebook Overview

This Jupyter notebook implements an **end-to-end podcast generation pipeline**.  
Given one or more **URLs**, it:

1. Fetches and cleans text from web pages
2. Generates a podcast script using OpenAI
3. Converts the script to audio in chunks
4. Merges the audio chunks into a single file
5. Exposes the workflow through a **Gradio web interface**





## Step 1 – Setup & OpenAI API Connectivity Test

This cell is a **sanity check** for the prototype.  
Its goal is to confirm that the environment and OpenAI API connection are working **before** building the rest of the pipeline.

---

### What this step does

- Loads environment variables from a `.env` file
- Reads the OpenAI API key from the environment instead of hard-coding it in the notebook
- Initializes the OpenAI client
- Sends a minimal test prompt to the model
- Prints the model’s response
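
If the key is missing, `os.getenv` returns `None` and the failure only shows up later as an authentication error. The sketch below is one possible guard, not part of the original cell; the helper name `require_env` is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper: fails early instead of passing None to the client.
    """
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in the shell."
        )
    return value


# Usage, after load_dotenv() has run:
# client = OpenAI(api_key=require_env("OPENAI_API_KEY"))
```

Calling it right after `load_dotenv()` makes a missing or empty key fail with a readable message before any request is sent.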



In [5]:
from dotenv import load_dotenv
import os
load_dotenv()
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def say_hello():
    response = client.chat.completions.create(
        model="gpt-5-nano",  # or another available model
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello to me in one short sentence."}
        ]
    )
    # Get the text of the assistant's reply
    return response.choices[0].message.content

if __name__ == "__main__":
    hello_message = say_hello()
    print(hello_message)


Hello there!


## Step 2 – Web Article Fetching & Text Extraction

This step is responsible for **turning web pages into clean, readable text** that can be used as input for the podcast script.  

### What it does
- Accepts a list of URLs pointing to online articles.
- Downloads each web page and extracts the main textual content.
- Removes HTML, scripts, navigation menus, ads, and other noise.
- Prints a preview of the first part of the text to check that extraction worked correctly.
- Includes error handling to ensure that if one URL fails, the process continues with the others.

### Why this step matters
- Provides the **raw material** for the podcast script generation.
- Ensures the input to the AI is clean and readable.
- Acts as a foundation for all subsequent steps in the pipeline (script generation, audio creation, and UI interaction).
- Allows developers to validate and debug web content extraction quickly during prototype development.
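
The "one URL fails, the rest continue" behaviour can also be sketched so that failures are collected rather than only printed. This is an illustrative variant, not the notebook's implementation: `fetch_all`, `fetch_with_newspaper`, and the rule that an empty extraction counts as a failure are all assumptions.

```python
from typing import Callable


def fetch_all(
    urls: list[str],
    fetch_one: Callable[[str], str],
) -> tuple[list[dict[str, str]], list[dict[str, str]]]:
    """Fetch every URL, collecting successes and failures separately."""
    articles: list[dict[str, str]] = []
    failures: list[dict[str, str]] = []
    for url in urls:
        try:
            text = fetch_one(url).strip()
            if not text:
                # Assumption: an empty extraction is treated as a failure too.
                raise ValueError("no text extracted")
            articles.append({"url": url, "text": text})
        except Exception as exc:
            # Record the error and keep going; one bad URL must not stop the rest.
            failures.append({"url": url, "error": str(exc)})
    return articles, failures


def fetch_with_newspaper(url: str) -> str:
    """Fetcher mirroring the notebook cell: download, parse, return the text."""
    from newspaper import Article  # imported lazily so the sketch loads without it

    article = Article(url)
    article.download()
    article.parse()
    return article.text


# Usage:
# articles, failures = fetch_all(urls, fetch_with_newspaper)
```

Passing the fetcher in as a parameter keeps the loop testable without network access.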


In [6]:
from newspaper import Article

urls = [
    "https://aeon.co/essays/how-the-fall-of-the-roman-empire-paved-the-road-to-modernity",
    "https://www.bbc.co.uk/history/ancient/romans/fallofrome_article_01.shtml",
]

def fetch_and_print(urls):
    for url in urls:
        try:
            article = Article(url)
            article.download()
            article.parse()

            print(f"----- Text content from: {url} -----\n")
            print(article.text[:2000])
            print("\n\n")

        except Exception as e:
            print(f"Error fetching {url}: {e}")

if __name__ == "__main__":
    fetch_and_print(urls)

----- Text content from: https://aeon.co/essays/how-the-fall-of-the-roman-empire-paved-the-road-to-modernity -----

For an empire that collapsed more than 1,500 years ago, ancient Rome maintains a powerful presence. About 1 billion people speak languages derived from Latin; Roman law shapes modern norms; and Roman architecture has been widely imitated. Christianity, which the empire embraced in its sunset years, remains the world’s largest religion. Yet all these enduring influences pale against Rome’s most important legacy: its fall. Had its empire not unravelled, or had it been replaced by a similarly overpowering successor, the world wouldn’t have become modern.

This isn’t the way that we ordinarily think about an event that has been lamented pretty much ever since it happened. In the late 18th century, in his monumental work The History of the Decline and Fall of the Roman Empire (1776-1788), the British historian Edward Gibbon called it ‘the greatest, perhaps, and most awful scen

## Step 3 – End-to-End Podcast Script and Audio Generation (Prototype)

This step combines **web content fetching, AI script writing, and text-to-speech generation** into a single prototype workflow. It demonstrates how the notebook transforms raw web articles into a playable podcast segment.

---

### What it does

1. **Fetch articles**  
   - Iterates through a list of URLs and downloads their content.  
   - Extracts the main text from each web page.  
   - Collects the articles in a structured format for later processing.  

2. **Generate a podcast script**  
   - Takes the extracted articles and concatenates them as sources.  
   - Creates a detailed prompt for the OpenAI model, specifying:  
     - Target audience (general curious listeners)  
     - Tone (engaging, slightly narrative)  
     - Episode structure (hook, context, key ideas, recap, reflective closing)  
     - Optional host cues ([PAUSE], [MUSIC IN/OUT])  
   - Sends the prompt to the OpenAI API to generate an original podcast script, **rephrasing ideas** rather than copying them.  

3. **Convert the script to audio**  
   - Sends the first 4,000 characters of the script to a text-to-speech model in a single request and receives the audio back.  
   - Saves the generated audio as a `.mp3` file for playback.  

4. **Prototype output**  
   - Prints the generated script for review.  
   - Produces a playable audio file representing a short podcast episode.  

---

### Why this step matters in the prototype

- Shows a **complete proof-of-concept**: URL → Script → Audio.  
- Validates integration between **web scraping**, **LLM-based script generation**, and **TTS**.  
- Allows testing and iteration of prompt design, script structure, and voice generation.  
- Forms the foundation for handling **multiple chunks**, more complex scripts, and eventual UI integration.  

---

This cell essentially demonstrates the **core functionality of the podcast studio** in a single, executable workflow.
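
Because the prompt allows host cues such as `[PAUSE]` and `[MUSIC IN]`, and the script is passed to TTS as plain text, the voice may read those cues aloud. The sketch below shows one way to remove them before synthesis. It is an assumption, not part of the original cell: the helper name `strip_host_cues` and the cue pattern are hypothetical and only cover cues that start with `PAUSE` or `MUSIC`.

```python
import re

# Matches bracketed cues such as [PAUSE], [MUSIC IN], [MUSIC OUT], including
# longer variants that carry a description inside the brackets.
CUE_RE = re.compile(r"\[(?:PAUSE|MUSIC)[^\]]*\]", re.IGNORECASE)


def strip_host_cues(script: str) -> str:
    """Remove host cues so they are not sent to the TTS model."""
    text = CUE_RE.sub("", script)
    text = re.sub(r"[ \t]{2,}", " ", text)  # collapse doubled spaces left behind
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip()


# Usage:
# tts_input = strip_host_cues(script)[:4000]
```

The printed script can keep the cues for human review while only the cleaned text goes to the speech request.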


In [10]:
import os
from newspaper import Article
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

urls = [
    "https://aeon.co/essays/how-the-fall-of-the-roman-empire-paved-the-road-to-modernity",
    "https://www.bbc.co.uk/history/ancient/romans/fallofrome_article_01.shtml",
]

def fetch_articles(urls):
    articles = []
    for url in urls:
        try:
            article = Article(url)
            article.download()
            article.parse()

            articles.append({
                "url": url,
                "text": article.text
            })

        except Exception as e:
            print(f"Error fetching {url}: {e}")

    return articles


def generate_podcast_script(articles):
    sources_text = ""
    for i, a in enumerate(articles, start=1):
        sources_text += f"\nSOURCE {i} ({a['url']}):\n{a['text']}\n"

    prompt = f"""
You are a podcast scriptwriter.

Task:
Using the sources below, write an ORIGINAL podcast script suitable for a
5–10 minute episode.

Guidelines:
- Target audience: curious general listeners
- Tone: clear, engaging, slightly narrative
- Do NOT copy sentences or paragraphs from the sources
- Rephrase ideas in your own words
- Structure:
  1. Hook (30–45 seconds)
  2. Context and background
  3. 3–5 key ideas with explanations
  4. Short recap
  5. Reflective closing question

Optional:
- Light host cues like [PAUSE], [MUSIC IN], [MUSIC OUT]

Sources:
{sources_text}
"""

    response = client.chat.completions.create(
        model="gpt-5-nano",
        messages=[
            {"role": "system", "content": "You write engaging podcast scripts."},
            {"role": "user", "content": prompt},
        ]
    )

    return response.choices[0].message.content


if __name__ == "__main__":
    articles = fetch_articles(urls)
    script = generate_podcast_script(articles)

    print("\n===== GENERATED PODCAST SCRIPT =====\n")
    print(script)

    # Prototype shortcut: send only the first 4,000 characters to TTS in one request.
    speech = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=script[:4000],
    )

    output_file = "podcast_episode.mp3"

    with open(output_file, "wb") as f:
        f.write(speech.read())

    print(f"Audio file created: {output_file}")



===== GENERATED PODCAST SCRIPT =====

[MUSIC IN — a light, reflective tune fades in]

Host: Imagine standing at the edge of a vast, once-mighty empire. The roads you’ve known fade into ruins, the coin you once used barely clinks in your pocket, and new rulers arrive with unfamiliar banners. It sounds like a catastrophe. But what if that very moment—the collapse—wasn’t the end of something, but the opening of something else entirely? A doorway through which a very different kind of world could begin to take shape. Welcome to Across the Ruins, where we ask how the fall of an empire helped spark a path to our modern world.

[PAUSE, MUSIC FADES]

Context and background

Host: We’re talking about the fall of Rome in the western half of its empire—a transition that didn’t just rewrite maps, but rewired how power, money, and ideas circulated. By the late 5th century, central Rome’s grip loosened. Germanic kingdoms emerged on old soil, and the “Roman state” as a single, centralized machine ga

## Step 4 – Full Prototype Pipeline with Chunking, TTS, and Audio Concatenation

This cell represents **an almost complete version of the prototype**, combining all prior steps and adding robust handling for long scripts, multiple audio chunks, and final podcast assembly.

---

### What it does

1. **Configuration & Setup**
   - Loads environment variables and initializes the OpenAI client.
   - Defines input URLs, output directories, TTS model/voice, and chunk size limits.
   - Prepares folders for storing temporary chunk files and the final episode.

2. **Web Article Fetching**
   - Downloads and parses each URL into clean text.
   - Stores articles in a structured format for script generation.
   - Includes error handling to skip problematic URLs without halting the pipeline.

3. **Podcast Script Generation**
   - Aggregates all fetched articles into a prompt for the LLM.
   - Defines a detailed podcast structure, tone, and narrative style:
     - Hook, context, 3–5 key ideas, recap, reflective closing.
     - Emphasizes storytelling and lessons from history.
   - Sends the prompt to the OpenAI model to produce an original script.

4. **Chunking the Script**
   - Splits the script into manageable segments for TTS.
   - Ensures chunks do not exceed TTS model limits.
   - Respects paragraph and sentence boundaries when possible.
   - Produces a list of text chunks ready for audio conversion.

5. **Text-to-Speech (TTS)**
   - Converts each chunk into an individual MP3 file.
   - Uses the TTS model and voice set in the configuration (`TTS_MODEL`, `TTS_VOICE`), so either can be changed in one place.
   - Saves each chunk to a dedicated output directory.

6. **Audio Concatenation**
   - Combines all chunked MP3 files into a single episode.
   - Uses **ffmpeg** if available for robust and fast concatenation.
   - Falls back to **pydub** if ffmpeg is not installed.
   - Ensures the final audio file is continuous and playable.

7. **Prototype Output**
   - Prints a preview of the generated script.
   - Shows the number of chunks and progress during TTS generation.
   - Confirms the path of the final audio file once completed.

---

### Why this step matters in the prototype

- Demonstrates a **near-complete workflow** from URLs to a finished podcast episode.
- Handles **long scripts** and ensures TTS limitations do not break the pipeline.
- Separates concerns:
  - Fetching articles
  - Script generation
  - Chunking
  - Audio creation
  - Final assembly
- Provides a clear structure for moving from **notebook prototyping** to **modular Python scripts**.

This step essentially represents the **core engine of the AI podcast studio**, capable of generating full episodes from multiple sources in an automated, reproducible way.
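
The chunking guarantees described above (no chunk over the limit, no text lost) can be checked with a small helper before spending TTS calls. This is a sketch under assumptions: `verify_chunks` is a hypothetical name, and it assumes the chunker only changes whitespace, which is why both sides are compared with whitespace removed.

```python
import re


def verify_chunks(original: str, chunks: list[str], max_chars: int) -> list[str]:
    """Return a list of problems found; an empty list means the chunks look sane."""
    problems: list[str] = []
    for i, chunk in enumerate(chunks, start=1):
        if not chunk.strip():
            problems.append(f"chunk {i} is empty")
        if len(chunk) > max_chars:
            problems.append(f"chunk {i} has {len(chunk)} chars (limit {max_chars})")

    def squash(s: str) -> str:
        # Chunking may change whitespace but should not drop or reorder text.
        return re.sub(r"\s+", "", s)

    if squash("".join(chunks)) != squash(original):
        problems.append("chunk text does not match the original script")
    return problems


# Usage with the cell's chunker:
# problems = verify_chunks(script, chunk_text(script), MAX_CHARS_PER_CHUNK)
# if problems:
#     raise ValueError("; ".join(problems))
```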


In [20]:
import os
import re
import shutil
import subprocess
from pathlib import Path

from dotenv import load_dotenv
from newspaper import Article
from openai import OpenAI

# -----------------------------
# Config
# -----------------------------
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

urls = [
    "https://aeon.co/essays/how-the-fall-of-the-roman-empire-paved-the-road-to-modernity",
    "https://www.bbc.co.uk/history/ancient/romans/fallofrome_article_01.shtml",
    "https://www.khanacademy.org/humanities/whp-origins/x23c41635548726c4:regional-webs/x23c41635548726c4:why-do-empires-collapse-5-2/a/article-the-fall-of-rome"
]

OUTPUT_DIR = Path("podcast_output")
CHUNKS_DIR = OUTPUT_DIR / "chunks"
FINAL_AUDIO = OUTPUT_DIR / "podcast_episode_final.mp3"

TTS_MODEL = "tts-1"
TTS_VOICE = "onyx"

# Keep headroom for TTS limits
MAX_CHARS_PER_CHUNK = 3500


# -----------------------------
# Fetch + Script generation
# -----------------------------
def fetch_articles(urls):
    articles = []
    for url in urls:
        try:
            article = Article(url)
            article.download()
            article.parse()
            articles.append({"url": url, "text": article.text})
        except Exception as e:
            print(f"Error fetching {url}: {e}")
    return articles


def generate_podcast_script(articles):
    sources_text = ""
    for i, a in enumerate(articles, start=1):
        sources_text += f"\nSOURCE {i} ({a['url']}):\n{a['text']}\n"

    prompt = f"""

You are a podcast scriptwriter and narrator.
Task: Using the sources below, write an ORIGINAL podcast script about the fall of the Roman Empire, suitable for an 8–10 minute episode.
Guidelines for the script:
Audience: curious general listeners
Tone: clear, engaging, slightly narrative, cinematic
Do NOT copy sentences or paragraphs; rephrase ideas in your own words
Highlight lessons from history – show how events from the past may provide insights into today
Structure:
Hook (30–45 seconds) – grab attention with a dramatic scene and the idea that history repeats itself; for example:
Introduce Rome as one of the most advanced and significant civilizations ever
Emphasize that understanding its decline can give clues about patterns in our world today
Pose a reflective question to the listener: “Could the lessons of ancient Rome help us understand the challenges of our own time?”
Context and background – explain the empire’s peak, structure, significance, and key players
3–5 key ideas – focus on the main causes and consequences of Rome’s fall, explained clearly and with storytelling flair
Short recap – summarize the main points
Reflective closing question – tie back to learning from history and its relevance today
Narration style:
Speak in a natural, clear, and engaging voice
Use expressive tone, pacing, and pauses to emphasize dramatic moments
Ensure the listener can follow the story easily while staying engaged
End result: A script ready for a podcast episode that explains the fall of the Roman Empire, its key players, and the lessons we can learn today.

Sources:
{sources_text}
"""

    response = client.chat.completions.create(
        model="gpt-5-nano",
        messages=[
            {"role": "system", "content": "You write engaging podcast scripts."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content


# -----------------------------
# Chunking
# -----------------------------
def normalize_whitespace(text: str) -> str:
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def chunk_text(text: str, max_chars: int = MAX_CHARS_PER_CHUNK) -> list[str]:
    """
    Chunk by paragraphs, then by sentences if needed.
    Avoids cutting mid-sentence when possible.
    Hard-slices only as a last resort.
    """
    text = normalize_whitespace(text)
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    chunks = []
    current = ""

    def flush():
        nonlocal current
        if current.strip():
            chunks.append(current.strip())
        current = ""

    sentence_split = re.compile(r"(?<=[.!?])\s+")

    for p in paragraphs:
        parts = [p]
        if len(p) > max_chars:
            parts = sentence_split.split(p)

        for part in parts:
            part = part.strip()
            if not part:
                continue

            candidate = part if not current else current + "\n\n" + part

            if len(candidate) <= max_chars:
                current = candidate
            else:
                flush()

                if len(part) > max_chars:
                    for i in range(0, len(part), max_chars):
                        piece = part[i : i + max_chars].strip()
                        if piece:
                            chunks.append(piece)
                else:
                    current = part

    flush()
    return chunks


# -----------------------------
# TTS
# -----------------------------
def tts_to_mp3(text: str, out_path: Path):
    speech = client.audio.speech.create(
        model=TTS_MODEL,
        voice=TTS_VOICE,
        input=text,
    )
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "wb") as f:
        f.write(speech.read())


# -----------------------------
# Audio concat (absolute paths)
# -----------------------------
def ffmpeg_available() -> bool:
    return shutil.which("ffmpeg") is not None


def _ffmpeg_concat_escape(path: Path) -> str:
    # concat demuxer lines are: file '...'
    # use absolute posix path + escape single quotes
    p = path.resolve().as_posix()
    return p.replace("'", r"'\''")


def concat_mp3_ffmpeg(mp3_files: list[Path], out_file: Path):
    """
    Robust concat:
      1) Try stream-copy concat (fast, may fail for MP3 chunks).
      2) Retry with re-encode (reliable).
    Uses absolute paths in concat_list to avoid path duplication issues.
    """
    out_file.parent.mkdir(parents=True, exist_ok=True)
    list_file = out_file.parent / "concat_list.txt"

    # Ensure all files exist before calling ffmpeg
    missing = [p for p in mp3_files if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Missing chunk files: {missing}")

    with open(list_file, "w", encoding="utf-8") as f:
        for p in mp3_files:
            f.write(f"file '{_ffmpeg_concat_escape(p)}'\n")

    def run(cmd: list[str]):
        res = subprocess.run(cmd, capture_output=True, text=True)
        if res.returncode != 0:
            raise RuntimeError(
                "ffmpeg concat failed.\n"
                f"CMD: {' '.join(cmd)}\n\n"
                f"STDOUT:\n{res.stdout}\n\nSTDERR:\n{res.stderr}"
            )
        return res

    # 1) Stream-copy
    try:
        run([
            "ffmpeg", "-y",
            "-f", "concat", "-safe", "0",
            "-i", str(list_file),
            "-c", "copy",
            str(out_file),
        ])
        return
    except RuntimeError as e:
        print(str(e))
        print("\nRetrying with re-encode (more reliable)...\n")

    # 2) Re-encode
    run([
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", str(list_file),
        "-vn",
        "-c:a", "libmp3lame",
        "-q:a", "2",
        "-ar", "44100",
        "-ac", "2",
        str(out_file),
    ])


def concat_mp3_pydub(mp3_files: list[Path], out_file: Path):
    from pydub import AudioSegment  # pip install pydub

    combined = AudioSegment.empty()
    for p in mp3_files:
        combined += AudioSegment.from_file(p, format="mp3")

    out_file.parent.mkdir(parents=True, exist_ok=True)
    combined.export(out_file, format="mp3")


# -----------------------------
# Main
# -----------------------------
def main():
    OUTPUT_DIR.mkdir(exist_ok=True)
    CHUNKS_DIR.mkdir(parents=True, exist_ok=True)

    articles = fetch_articles(urls)
    if not articles:
        raise RuntimeError("No articles fetched. Check URLs / connectivity.")

    script = generate_podcast_script(articles)

    print("\n===== GENERATED PODCAST SCRIPT (preview) =====\n")
    # Print only the first 2,000 characters so the output stays a preview
    print(script[:2000] + ("\n...\n" if len(script) > 2000 else ""))

    chunks = chunk_text(script, max_chars=MAX_CHARS_PER_CHUNK)
    print(f"\nChunked script into {len(chunks)} parts (<= {MAX_CHARS_PER_CHUNK} chars each).")


    mp3_files = []
    for idx, chunk in enumerate(chunks, start=1):
        mp3_path = CHUNKS_DIR / f"chunk_{idx:02d}.mp3"
        print(f"Generating audio for chunk {idx}/{len(chunks)} -> {mp3_path}")
        tts_to_mp3(chunk, mp3_path)
        mp3_files.append(mp3_path)

    print("\nCombining chunks into final audio...")

    if ffmpeg_available():
        concat_mp3_ffmpeg(mp3_files, FINAL_AUDIO)
    else:
        print("ffmpeg not found in PATH. Trying pydub fallback...")
        concat_mp3_pydub(mp3_files, FINAL_AUDIO)

    print(f"\nDone. Final audio file created: {FINAL_AUDIO.as_posix()}")

if __name__ == "__main__":
    main()



===== GENERATED PODCAST SCRIPT (preview) =====

Hook

SFX: a faint wind across stone, then the creak of a timber door. A distant hammer on an anvil, markets waking, horses neighing.

NARRATOR: Imagine a city built on stone and speech: Rome at its zenith, a tapestry of law, roads, temples, and a language that would echo for centuries. It’s not just a museum showpiece. It’s a living machine of power, culture, and engineering that once stitched a world together. And then—dramatically, almost overnight—the machine falters. Armies shift, money vanishes from the mint, and a city that once sent legions across continents becomes a shadow of its former self. What went wrong? And more hauntingly: if a superpower can fall, what patterns in our own world might be repeating themselves right now?

Rome was not merely a city or a collection of rulers. It was one of humanity’s greatest experiments in scale, law, and unity. Its fall didn’t just mark the end of an empire; it seeded a long, uneven ascen

## Step 5 – Two-Voice Podcast Experiment

This cell introduces a **proof-of-concept for multi-voice AI podcasts**, where two “speakers” (host and cohost) speak in turn. It demonstrates how to:

- Parse a dialogue script with multiple speakers
- Generate separate TTS audio segments per speaker
- Insert short pauses for natural pacing
- Concatenate all segments into a final podcast file

---

### What it does

1. **Configuration & Setup**
   - Initializes the OpenAI client and loads environment variables.
   - Defines TTS model, voices for host and cohost, output directories, and pause duration between turns.
   - Prepares folders for temporary segment files and the final audio.

2. **Short Test Script**
   - A simple dialogue script with `[HOST]` and `[COHOST]` tags.
   - Serves as a prototype to test multi-voice rendering and segment handling.

3. **Dialogue Parsing**
   - Splits the script into structured speaker turns.
   - Each turn contains the speaker identity and the corresponding text.
   - Ensures that text for each speaker is clearly separated for TTS generation.

4. **TTS Segment Generation**
   - Converts each speaker turn into a separate MP3 file using the specified voice.
   - Saves segments in a dedicated directory.
   - Inserts silent audio segments to create natural pauses between turns.

5. **Audio Concatenation**
   - Combines all speaker segments and pauses into a single, continuous MP3.
   - Uses `ffmpeg` to robustly handle concatenation and re-encoding.
   - Ensures timing, audio quality, and speaker order are preserved.

6. **Prototype Output**
   - Prints progress for each segment as it is generated.
   - Produces a final multi-voice MP3 file for listening.
   - Validates that multiple voices and pacing can work in a single episode.

---

### Why this step matters in the prototype

- Demonstrates **multi-voice AI capabilities**, moving beyond single-voice narration.
- Provides a template for **host/cohost dialogue** in podcasts.
- Allows testing of **timing, pacing, and natural transitions**.
- Forms the foundation for future features like:
  - Multiple guest voices
  - Dynamic speaker assignments
  - More complex conversational scripts

This step extends the prototype from linear, single-voice narration to **interactive, multi-speaker episodes**.
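
The pacing logic, a pause after every turn except the last, can be isolated into a small function. The sketch below restates what the main loop of the cell does; the helper names `interleave_pauses` and `total_pause_ms` are hypothetical.

```python
from pathlib import Path


def interleave_pauses(segments: list[Path], pause: Path) -> list[Path]:
    """Return the segments with the pause file inserted between consecutive turns."""
    sequence: list[Path] = []
    for i, segment in enumerate(segments):
        if i > 0:
            sequence.append(pause)  # pause goes between turns, never at the ends
        sequence.append(segment)
    return sequence


def total_pause_ms(turns: int, pause_ms: int) -> int:
    """Silence added to the episode: one pause per gap between turns."""
    return max(turns - 1, 0) * pause_ms


# Usage:
# mp3_sequence = interleave_pauses(segment_paths, silence_path)
```

For the five-turn test script with a 450 ms pause, this gives four pauses, or 1,800 ms of added silence. The same silent file is reused for every gap, so it only needs to be generated once.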


In [22]:
import os
import re
import shutil
import subprocess
from pathlib import Path

from dotenv import load_dotenv
from openai import OpenAI

# -----------------------------
# Config
# -----------------------------
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# TTS model: set this to what your account supports.
# Try: "gpt-4o-mini-tts" or fallback to "tts-1"
TTS_MODEL = "gpt-4o-mini-tts"
HOST_VOICE = "alloy"
COHOST_VOICE = "cedar"

OUTPUT_DIR = Path("tts_two_voice_test")
SEG_DIR = OUTPUT_DIR / "segments"
FINAL_AUDIO = OUTPUT_DIR / "two_voice_test_final.mp3"

PAUSE_MS_BETWEEN_TURNS = 450  # pacing pause


# -----------------------------
# Short test script (two speakers)
# -----------------------------
TEST_SCRIPT = """
[HOST]
Hey! Welcome back. Quick experiment: can AI do a natural two-voice podcast?

[COHOST]
Let’s find out. I’ll ask questions, you keep us on track.

[HOST]
Deal. First topic: why we should keep sentences short for text to speech.

[COHOST]
Because long sentences sound like someone reading an essay out loud.

[HOST]
Exactly. Okay, let’s render this and stitch it together.
""".strip()


# -----------------------------
# Helpers
# -----------------------------
def ffmpeg_available() -> bool:
    return shutil.which("ffmpeg") is not None


def run(cmd: list[str]) -> None:
    res = subprocess.run(cmd, capture_output=True, text=True)
    if res.returncode != 0:
        raise RuntimeError(
            "Command failed.\n"
            f"CMD: {' '.join(cmd)}\n\n"
            f"STDOUT:\n{res.stdout}\n\nSTDERR:\n{res.stderr}"
        )


def parse_dialogue(script: str):
    """
    Parses blocks like:
      [HOST]
      text...
      [COHOST]
      text...
    Returns: list of {"speaker": "...", "text": "..."}
    """
    lines = [ln.rstrip() for ln in script.splitlines()]
    blocks = []
    current_speaker = None
    current_text = []

    tag_re = re.compile(r"^\[(HOST|COHOST)\]\s*$")

    def flush():
        nonlocal current_speaker, current_text
        if current_speaker and any(t.strip() for t in current_text):
            blocks.append(
                {"speaker": current_speaker, "text": "\n".join(current_text).strip()}
            )
        current_text = []

    for ln in lines:
        m = tag_re.match(ln.strip())
        if m:
            flush()
            current_speaker = m.group(1)
        else:
            if current_speaker is not None:
                current_text.append(ln)

    flush()
    return blocks


def tts_segment(text: str, voice: str, out_path: Path):
    """
    Generates one MP3 segment for a single speaker turn.
    """
    out_path.parent.mkdir(parents=True, exist_ok=True)

    speech = client.audio.speech.create(
        model=TTS_MODEL,
        voice=voice,
        input=text,
    )

    with open(out_path, "wb") as f:
        f.write(speech.read())


def make_silence_mp3(duration_ms: int, out_path: Path):
    """
    Creates a silent MP3 using ffmpeg.
    """
    out_path.parent.mkdir(parents=True, exist_ok=True)
    seconds = max(duration_ms / 1000.0, 0.01)

    run([
        "ffmpeg", "-y",
        "-f", "lavfi",
        "-i", "anullsrc=r=44100:cl=stereo",
        "-t", f"{seconds:.3f}",
        "-c:a", "libmp3lame",
        "-q:a", "2",
        str(out_path),
    ])


def concat_mp3_with_reencode(mp3_files: list[Path], out_file: Path):
    """
    Concats MP3s robustly using ffmpeg concat demuxer and re-encodes.
    Uses absolute paths to avoid path issues.
    """
    out_file.parent.mkdir(parents=True, exist_ok=True)
    list_file = out_file.parent / "concat_list.txt"

    for p in mp3_files:
        if not p.exists():
            raise FileNotFoundError(f"Missing file: {p}")

    def esc(p: Path) -> str:
        return p.resolve().as_posix().replace("'", r"'\''")

    with open(list_file, "w", encoding="utf-8") as f:
        for p in mp3_files:
            f.write(f"file '{esc(p)}'\n")

    run([
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", str(list_file),
        "-vn",
        "-c:a", "libmp3lame",
        "-q:a", "2",
        "-ar", "44100",
        "-ac", "2",
        str(out_file),
    ])


# -----------------------------
# Main
# -----------------------------
def main():
    if not ffmpeg_available():
        raise RuntimeError("ffmpeg not found. Install ffmpeg and try again.")

    OUTPUT_DIR.mkdir(exist_ok=True)
    SEG_DIR.mkdir(parents=True, exist_ok=True)

    turns = parse_dialogue(TEST_SCRIPT)
    if not turns:
        raise RuntimeError("No dialogue turns parsed. Check your [HOST]/[COHOST] tags.")

    voice_map = {"HOST": HOST_VOICE, "COHOST": COHOST_VOICE}

    mp3_sequence = []
    silence_path = SEG_DIR / f"pause_{PAUSE_MS_BETWEEN_TURNS}ms.mp3"
    make_silence_mp3(PAUSE_MS_BETWEEN_TURNS, silence_path)

    print(f"Generating {len(turns)} segments with two voices using model={TTS_MODEL}...")

    for i, turn in enumerate(turns, start=1):
        speaker = turn["speaker"]
        text = turn["text"]

        out_mp3 = SEG_DIR / f"seg_{i:02d}_{speaker.lower()}.mp3"
        tts_segment(text=text, voice=voice_map[speaker], out_path=out_mp3)
        mp3_sequence.append(out_mp3)

        # Add pacing pause after each turn except the last
        if i != len(turns):
            mp3_sequence.append(silence_path)

        print(f"  {i:02d}. {speaker} -> {out_mp3.name}")

    print("Concatenating into final audio...")
    concat_mp3_with_reencode(mp3_sequence, FINAL_AUDIO)

    print(f"Done: {FINAL_AUDIO}")


if __name__ == "__main__":
    main()

Generating 5 segments with two voices using model=gpt-4o-mini-tts...
  01. HOST -> seg_01_host.mp3
  02. COHOST -> seg_02_cohost.mp3
  03. HOST -> seg_03_host.mp3
  04. COHOST -> seg_04_cohost.mp3
  05. HOST -> seg_05_host.mp3
Concatenating into final audio...
Done: tts_two_voice_test/two_voice_test_final.mp3


## Step 6 – Refactored Modular Pipeline

This cell represents a **clean, modular, and reusable version** of the podcast prototype, designed to be flexible and production-ready. It consolidates all previous steps—article fetching, script generation, chunking, TTS, and audio concatenation—into a single callable function.

---

### Key Features

1. **Flexible Inputs**
   - Accepts either pre-fetched article texts (`urls_text`) or a list of URLs to fetch (`fetch_urls`).
   - Allows targeting a specific episode length (`minutes`) and adjusting TTS chunk sizes (`max_chars`).
   - Supports specifying different TTS models and voices independently from script generation.

2. **Script Generation**
   - Generates an original podcast script using OpenAI models.
   - Emphasizes engaging storytelling, cinematic narration, and lessons from history.
   - Handles both the classic “fall of Rome” narrative and user-provided content.

3. **Chunking**
   - Splits the script into manageable segments for TTS.
   - Avoids cutting mid-sentence when possible.
   - Ensures each chunk respects model input limits.

4. **Text-to-Speech (TTS)**
   - Converts each chunk into an MP3 file.
   - Supports any OpenAI TTS model and voice combination.
   - Stores temporary audio files in a dedicated chunk directory.

5. **Audio Concatenation**
   - Combines all chunks into a single final episode.
   - Uses `ffmpeg` if available, falling back to `pydub` otherwise.
   - Produces a smooth, continuous MP3 suitable for podcast publishing.

6. **Refactored Pipeline Function**
   - `run_pipeline(...)` serves as the **single entry point** for end-to-end processing.
   - Returns the generated script, final audio file path, and optional metadata.
   - Makes the workflow reusable in notebooks, scripts, or production applications.

7. **Example Main**
   - Demonstrates running the pipeline with a set of URLs.
   - Prints a preview of the generated script and confirms final audio creation.
   - Shows backward compatibility with the previous workflow.
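
The chunking behaviour described above can be illustrated with a simplified sketch. This is not the notebook's `chunk_text` function: the name `pack_sentences` is hypothetical, it works on sentences only (no paragraph pass), and it keeps an over-long sentence whole instead of hard-slicing it.

```python
import re

# Split after sentence-ending punctuation that is followed by whitespace.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def pack_sentences(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    Simplification: a single sentence longer than max_chars is kept whole.
    """
    chunks: list[str] = []
    current = ""
    for sentence in SENTENCE_BOUNDARY.split(text.strip()):
        candidate = sentence if not current else f"{current} {sentence}"
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first sentence of a new chunk.
            current = candidate
        else:
            # Adding the sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


sample = "Rome rose. Rome ruled for centuries. Then it fell."
chunks = pack_sentences(sample, max_chars=30)
print(chunks)
```

With a limit of 30 characters each sentence ends up in its own chunk; a larger limit lets neighbouring sentences share a chunk, which means fewer TTS requests.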

---

### Why this step matters

- Provides a **modular interface** that separates configuration, fetching, TTS, and concatenation.
- Allows **reuse across multiple projects** without rewriting core logic.
- Simplifies experimentation with:
  - Different source content
  - TTS voices and models
  - Episode lengths and chunk sizes
- Forms the foundation for a **scalable AI podcast generation system**, capable of both prototyping and production deployment.
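
When experimenting with episode lengths and chunk sizes, it can help to estimate how many TTS requests a configuration is likely to need. The sketch below is a rough planning aid, not part of the notebook: `estimate_tts_calls` is a hypothetical helper, the 150 words-per-minute figure mirrors the default in the code cell, and the average of 6 characters per word (including the space) is an assumption. The real count also depends on where paragraph and sentence boundaries fall.

```python
import math
from itertools import product

WORDS_PER_MINUTE = 150   # same default as _estimate_words_target in the code cell
AVG_CHARS_PER_WORD = 6   # ASSUMPTION: rough English average, including the space


def estimate_tts_calls(minutes: int, max_chars: int) -> int:
    """Estimate how many TTS chunks a script of the target length needs."""
    words = max(1, minutes) * WORDS_PER_MINUTE
    chars = words * AVG_CHARS_PER_WORD
    return math.ceil(chars / max_chars)


# Hypothetical values to sweep; none of them are prescribed by the notebook.
minutes_options = [1, 5, 10]
max_chars_options = [2000, 3500]

for minutes, max_chars in product(minutes_options, max_chars_options):
    calls = estimate_tts_calls(minutes, max_chars)
    print(f"minutes={minutes:>2} max_chars={max_chars} -> ~{calls} TTS call(s)")
```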


In [1]:
import os
import re
import shutil
import subprocess
from pathlib import Path
from typing import Any, Optional

from dotenv import load_dotenv
from newspaper import Article
from openai import OpenAI


# -----------------------------
# Config (defaults)
# -----------------------------
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

OUTPUT_DIR = Path("podcast_output")
CHUNKS_DIR = OUTPUT_DIR / "chunks"
FINAL_AUDIO_DEFAULT = OUTPUT_DIR / "podcast_episode_final.mp3"

# Keep headroom below the 4096-character input limit of OpenAI's speech endpoint
MAX_CHARS_PER_CHUNK_DEFAULT = 3500


# -----------------------------
# Fetch + Script generation
# -----------------------------
def fetch_articles(urls: list[str]) -> list[dict[str, str]]:
    articles: list[dict[str, str]] = []
    for url in urls:
        try:
            article = Article(url)
            article.download()
            article.parse()
            articles.append({"url": url, "text": article.text})
        except Exception as e:
            print(f"Error fetching {url}: {e}")
    return articles


def _estimate_words_target(minutes: int, wpm: int = 150) -> int:
    return max(1, minutes) * wpm


def generate_podcast_script(
    articles: list[dict[str, str]],
    minutes: int,
    model: str = "gpt-4o-mini",
) -> str:
    sources_text = ""
    for i, a in enumerate(articles, start=1):
        sources_text += f"\nSOURCE {i} ({a['url']}):\n{a['text']}\n"

    # Let the model target duration, but also give a rough words target as guidance.
    words_target = _estimate_words_target(minutes)

    prompt = f"""
You are a podcast scriptwriter and narrator.

Task: Using the sources below, write an original podcast script about the fall of the Roman Empire, suitable for an episode of about {minutes} minutes.

Write only the spoken narration.
Do not include headers, labels, categories, brackets, stage directions, or performance notes of any kind.

Use simple, clear English that a general audience can follow. Rephrase ideas fully in your own words; do not copy sentences from the sources.

Make the narration engaging and cinematic using spoken language only.
Create emphasis and pacing naturally through:

Sentence length

Paragraph breaks

Punctuation such as commas, ellipses, and short sentences

Do not describe pauses, voice changes, or cadence explicitly.
If it would not be spoken aloud by a human narrator, it must not appear in the text.

Focus on the fall of the Roman Empire and its lessons. Emphasize that history often repeats itself, and that understanding Rome’s decline can offer insight into challenges societies face today.

Target length: about {minutes} minutes.
Aim for roughly {words_target} words (±10%).

The script should naturally include:

A strong opening hook that presents Rome as one of the most advanced and influential civilizations in history, and introduces the idea that its decline still matters today. Include the reflective question:
“Could the lessons of ancient Rome help us understand the challenges of our own time?”

Background and context explaining Rome’s peak, its structure, its significance, and key figures or institutions.

Three to five main causes and consequences of the fall, explained in a clear, story-driven way.

A brief recap summarizing the main ideas.

A reflective closing that connects Rome’s story to lessons for the modern world.

End result: a smooth, immersive, ready-to-read script designed for a text-to-speech system, where every word is intended to be spoken aloud.
Sources:
{sources_text}
"""

    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You write engaging podcast scripts."},
            {"role": "user", "content": prompt},
        ],
    )
    script = response.choices[0].message.content or ""
    return script.strip()


# -----------------------------
# Chunking
# -----------------------------
def normalize_whitespace(text: str) -> str:
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def chunk_text(text: str, max_chars: int) -> list[str]:
    """
    Chunk by paragraphs, then by sentences if needed.
    Avoids cutting mid-sentence when possible.
    Hard-slices only as a last resort.
    """
    text = normalize_whitespace(text)
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    chunks: list[str] = []
    current = ""

    def flush():
        nonlocal current
        if current.strip():
            chunks.append(current.strip())
        current = ""

    sentence_split = re.compile(r"(?<=[.!?])\s+")

    for p in paragraphs:
        parts = [p]
        if len(p) > max_chars:
            parts = sentence_split.split(p)

        for part in parts:
            part = part.strip()
            if not part:
                continue

            candidate = part if not current else current + "\n\n" + part

            if len(candidate) <= max_chars:
                current = candidate
            else:
                flush()

                if len(part) > max_chars:
                    for i in range(0, len(part), max_chars):
                        piece = part[i : i + max_chars].strip()
                        if piece:
                            chunks.append(piece)
                else:
                    current = part

    flush()
    return chunks


# -----------------------------
# TTS
# -----------------------------
def tts_to_mp3(text: str, out_path: Path, tts_model: str, tts_voice: str):
    speech = client.audio.speech.create(
        model=tts_model,
        voice=tts_voice,
        input=text,
    )
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "wb") as f:
        f.write(speech.read())


# -----------------------------
# Audio concat (absolute paths)
# -----------------------------
def ffmpeg_available() -> bool:
    return shutil.which("ffmpeg") is not None


def _ffmpeg_concat_escape(path: Path) -> str:
    p = path.resolve().as_posix()
    return p.replace("'", r"'\''")


def concat_mp3_ffmpeg(mp3_files: list[Path], out_file: Path):
    """
    Robust concat:
      1) Try stream-copy concat (fast, may fail for MP3 chunks).
      2) Retry with re-encode (reliable).
    Uses absolute paths in concat_list to avoid path duplication issues.
    """
    out_file.parent.mkdir(parents=True, exist_ok=True)
    list_file = out_file.parent / "concat_list.txt"

    missing = [p for p in mp3_files if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Missing chunk files: {missing}")

    with open(list_file, "w", encoding="utf-8") as f:
        for p in mp3_files:
            f.write(f"file '{_ffmpeg_concat_escape(p)}'\n")

    def run(cmd: list[str]):
        res = subprocess.run(cmd, capture_output=True, text=True)
        if res.returncode != 0:
            raise RuntimeError(
                "ffmpeg concat failed.\n"
                f"CMD: {' '.join(cmd)}\n\n"
                f"STDOUT:\n{res.stdout}\n\nSTDERR:\n{res.stderr}"
            )
        return res

    try:
        run([
            "ffmpeg", "-y",
            "-f", "concat", "-safe", "0",
            "-i", str(list_file),
            "-c", "copy",
            str(out_file),
        ])
        return
    except RuntimeError as e:
        print(str(e))
        print("\nRetrying with re-encode (more reliable)...\n")

    run([
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", str(list_file),
        "-vn",
        "-c:a", "libmp3lame",
        "-q:a", "2",
        "-ar", "44100",
        "-ac", "2",
        str(out_file),
    ])


def concat_mp3_pydub(mp3_files: list[Path], out_file: Path):
    from pydub import AudioSegment  # pip install pydub

    combined = AudioSegment.empty()
    for p in mp3_files:
        combined += AudioSegment.from_file(p, format="mp3")

    out_file.parent.mkdir(parents=True, exist_ok=True)
    combined.export(out_file, format="mp3")


def concat_mp3(mp3_files: list[Path], out_file: Path):
    if ffmpeg_available():
        concat_mp3_ffmpeg(mp3_files, out_file)
    else:
        print("ffmpeg not found in PATH. Trying pydub fallback...")
        concat_mp3_pydub(mp3_files, out_file)


# -----------------------------
# Refactored pipeline entrypoint
# -----------------------------
def run_pipeline(
    urls_text: list[str],
    minutes: int,
    tts_model: str,
    tts_voice: str,
    max_chars: int,
    *,
    output_dir: Path = OUTPUT_DIR,
    chunks_dir: Optional[Path] = None,
    final_audio_path: Optional[Path] = None,
    fetch_urls: Optional[list[str]] = None,
    script_model: str = "gpt-5-nano",
) -> tuple[str, Path, Any]:
    """
    Typical call:
        script, final_audio_path, _ = run_pipeline(urls_text, minutes, tts_model, tts_voice, max_chars)

    Inputs:
      - urls_text: list[str] of source article texts already fetched (strings).
                  If you want the function to fetch itself, pass fetch_urls=[...]
                  and urls_text can be [].
      - minutes: target episode length
      - tts_model / tts_voice: passed to OpenAI audio.speech.create(...)
      - max_chars: chunk size for TTS

    Returns:
      - script (str)
      - final_audio_path (Path)
      - metadata (Any): currently None, reserved for future use
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    chunks_dir = chunks_dir or (output_dir / "chunks")
    chunks_dir.mkdir(parents=True, exist_ok=True)

    final_audio_path = final_audio_path or (output_dir / "podcast_episode_final.mp3")

    # 1) Sources: either use provided texts or fetch from URLs
    articles: list[dict[str, str]] = []
    if fetch_urls:
        fetched = fetch_articles(fetch_urls)
        if not fetched:
            raise RuntimeError("No articles fetched. Check URLs / connectivity.")
        articles = fetched
    else:
        if not urls_text:
            raise ValueError("urls_text is empty. Provide texts or pass fetch_urls=[...] to fetch.")
        # Preserve the original script generator interface by faking URLs.
        articles = [{"url": f"provided_text_{i+1}", "text": t} for i, t in enumerate(urls_text)]

    # 2) Script
    script = generate_podcast_script(articles, minutes=minutes, model=script_model)

    # 3) Chunk
    chunks = chunk_text(script, max_chars=max_chars)

    # 4) TTS each chunk
    mp3_files: list[Path] = []
    for idx, chunk in enumerate(chunks, start=1):
        mp3_path = chunks_dir / f"chunk_{idx:02d}.mp3"
        tts_to_mp3(chunk, mp3_path, tts_model=tts_model, tts_voice=tts_voice)
        mp3_files.append(mp3_path)

    # 5) Concat
    concat_mp3(mp3_files, final_audio_path)

    return script, final_audio_path, None


# -----------------------------
# Example old-style main (optional)
# -----------------------------
def main():
    urls = [
        "https://www.nationalgeographic.com/history/article/fall-of-ancient-roman-empire",
        "https://www.bbc.co.uk/history/ancient/romans/fallofrome_article_01.shtml",
        "https://www.britannica.com/place/Roman-Empire/Height-and-decline-of-imperial-Rome",
        "https://en.wikipedia.org/wiki/Fall_of_the_Western_Roman_Empire",
    ]

    # Old flow: fetch URLs, then run_pipeline via fetch_urls
    script, final_audio_path, _ = run_pipeline(
        urls_text=[],
        minutes=1,
        tts_model="tts-1",
        tts_voice="sage",
        max_chars=MAX_CHARS_PER_CHUNK_DEFAULT,
        fetch_urls=urls,
    )

    print("\n===== GENERATED PODCAST SCRIPT (preview) =====\n")
    print(script[:2000] + ("\n...\n" if len(script) > 2000 else ""))
    print(f"\nDone. Final audio file created: {final_audio_path.as_posix()}")


if __name__ == "__main__":
    main()



===== GENERATED PODCAST SCRIPT (preview) =====

Rome was one of the most advanced civilizations in history. It built mighty cities, wrote laws, and linked distant lands. "Could the lessons of ancient Rome help us understand the challenges of our own time?"

At its height, Rome spread across Europe, Asia, and Africa. Its strength came from a capable administration, busy cities, and a disciplined army. Diocletian split power; Constantine moved the capital toward the East.

Three forces pushed Rome toward collapse. Climate shifts and plagues cut harvests. Invaders and migrations strained defenses and taxes. Political chaos and costly wars weakened leaders, letting rival groups carve kingdoms.

Western Rome faded into new kingdoms; the East endured as Byzantium. Yet Rome’s legacy lived on in law, architecture, and literacy, reminding us that decline is a long, uneven process, not a moment.

Recap: peak power, slow unraveling, and hard-won lessons. Crises test nations. The fall of Rome inv
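
The concatenation helper in the code cell above hands `ffmpeg` a list file in which every line has the form `file '<path>'`. The sketch below shows only how such a line is built and how a single quote inside a path is escaped. The helper name `concat_list_line` and the example paths are hypothetical, and `PurePosixPath` is used so that the example runs without touching the file system.

```python
from pathlib import PurePosixPath


def concat_list_line(path: PurePosixPath) -> str:
    """Format one entry for an ffmpeg concat-demuxer list file."""
    # Inside single quotes, a literal ' is written as '\''
    # (close the quote, add an escaped quote, reopen the quote).
    escaped = str(path).replace("'", r"'\''")
    return f"file '{escaped}'"


# Hypothetical chunk paths, for illustration only.
paths = [
    PurePosixPath("/tmp/podcast_output/chunks/chunk_01.mp3"),
    PurePosixPath("/tmp/podcast_output/chunks/rome's_end.mp3"),
]
print("\n".join(concat_list_line(p) for p in paths))
```

Writing absolute paths into the list file, as the notebook does, avoids `ffmpeg` resolving entries relative to the list file's own directory.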

## Step 7 – Gradio User Interface for Podcast Generation

This cell wraps the full pipeline in a **web-based interface** using Gradio, allowing users to generate podcast episodes without touching the code.

---

### Key Features

1. **Input Configuration**
   - **Article URLs**: Users paste one URL per line to serve as sources.
   - **Target Length**: Specify the desired episode duration in minutes.
   - **TTS Model and Voice**: Choose which text-to-speech model and voice to use.
   - **Chunk Size**: Adjust how the script is split into chunks for TTS.

2. **Outputs**
   - **Script Preview**: Displays the generated podcast script in a textbox.
   - **Audio Player**: Provides a playable MP3 of the final episode.

3. **Controls**
   - **Generate Episode**: Calls the `run_pipeline` function with the provided inputs, generating both the script and audio.
   - **Clear**: Resets the script and audio output.

4. **UI Layout and Styling**
   - Responsive design with a sidebar for configuration and a main panel for outputs.
   - Custom CSS to improve readability, spacing, and responsive behavior.
   - Two-column layout: narrow left column for settings, wider right column for outputs.

5. **Integration with Pipeline**
   - Uses `roman_run` as the callback function, which internally calls `run_pipeline` (Step 6).
   - Checks that the `OPENAI_API_KEY` environment variable is set before running.
   - Returns both the generated script and the path to the final audio file.
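
Because the URL field is a single multi-line textbox, its value arrives as one string and has to be split into individual URLs before any fetching can happen. A minimal sketch of that parsing step follows. The helper name `parse_url_lines` and the `example.com` addresses are hypothetical, and the `http(s)` prefix check is an assumed validation rule rather than something the notebook defines.

```python
def parse_url_lines(raw: str) -> list[str]:
    """Turn a multi-line textbox value into a clean list of URLs."""
    urls: list[str] = []
    for line in raw.splitlines():
        url = line.strip()
        # Skip blank lines, non-http(s) entries, and duplicates.
        if url.startswith(("http://", "https://")) and url not in urls:
            urls.append(url)
    return urls


raw_input = """
https://example.com/article-one

  https://example.com/article-two
not a url
https://example.com/article-one
"""
print(parse_url_lines(raw_input))
```

A list produced this way has the shape expected by the `fetch_urls` parameter of `run_pipeline` (Step 6).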

---

### Why this step matters

- Provides a **friendly, interactive interface** for non-technical users to generate AI-powered podcast episodes.
- Makes the workflow **accessible in a browser**, so others can try it without installing Python themselves (the notebook still runs on the host machine).
- Complements the modular pipeline by exposing all key parameters (sources, length, TTS model, voice, chunk size) in one place.
- Enables quick experimentation with different source articles, episode lengths, and voices.


In [2]:
import os
import gradio as gr

DEFAULT_URLS_TEXT = "\n".join([
    "https://www.nationalgeographic.com/history/article/fall-of-ancient-roman-empire",
    "https://www.bbc.co.uk/history/ancient/romans/fallofrome_article_01.shtml",
    # "https://www.britannica.com/place/Roman-Empire/Height-and-decline-of-imperial-Rome",
    # "https://en.wikipedia.org/wiki/Fall_of_the_Western_Roman_Empire",
])

APP_CSS = """
#app {
  max-width: 1100px;
  margin: 0 auto;
  padding: 32px 24px;
}

.section {
  margin-top: 20px;
}

.sidebar .section:first-child {
  margin-top: 0px;
}

.primary-action button {
  min-height: 44px;
  font-weight: 600;
  width: 100%;
}

.secondary-action button {
  background: transparent !important;
  border: 1px solid #E5E7EB !important;
  color: #374151 !important;
  width: 100%;
  min-height: 44px;
}

/* Output emphasis */
#script_out textarea {
  line-height: 1.6 !important;
  font-size: 14.5px !important;
}

/* Make audio span full width comfortably */
#audio_out {
  width: 100%;
}

/* Slightly tighten the left column controls */
.sidebar label, .sidebar .gr-label {
  margin-bottom: 6px !important;
}

/* Responsive: stack on small screens */
@media (max-width: 900px) {
  #main_row {
    flex-direction: column !important;
  }
}
"""

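def roman_run(urls_text: str, minutes: int, tts_model: str, tts_voice: str, max_chars: int):
    if not os.getenv("OPENAI_API_KEY"):
        raise gr.Error("OPENAI_API_KEY not found.")
    # The textbox holds URLs (one per line), not article text, so pass them as fetch_urls.
    urls = [line.strip() for line in urls_text.splitlines() if line.strip()]
    if not urls:
        raise gr.Error("Provide at least one article URL.")
    # Cast numeric widget values to int; chunk_text uses max_chars in range().
    script, final_audio_path, _ = run_pipeline(
        [], int(minutes), tts_model, tts_voice, int(max_chars), fetch_urls=urls
    )
    return script, str(final_audio_path)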


with gr.Blocks(
    theme=gr.themes.Soft(),
    css=APP_CSS,
    title="Fall of Rome Podcast Generator"
) as demo:

    with gr.Column(elem_id="app"):

        gr.Markdown(
            """
            # Fall of Rome Podcast Generator
            Paste article URLs, choose a target length, and generate a narrated podcast episode.
            """
        )

        with gr.Row(elem_id="main_row", elem_classes=["section"]):
            # LEFT: Configuration (narrower)
            with gr.Column(scale=4, elem_classes=["sidebar"]):
                with gr.Column(elem_classes=["section"]):
                    gr.Markdown("## Sources")
                    urls_text = gr.Textbox(
                        label="Article URLs (one per line)",
                        value=DEFAULT_URLS_TEXT,
                        lines=3,
                        placeholder="Paste one URL per line..."
                    )

                with gr.Column(elem_classes=["section"]):
                    gr.Markdown("## Settings")
                    minutes = gr.Number(
                        label="Target length (minutes)",
                        value=10,
                        precision=0,
                        minimum=1,
                        maximum=20
                    )
                    tts_model = gr.Textbox(label="TTS model", value="tts-1")
                    tts_voice = gr.Dropdown(
                        label="Voice",
                        choices=["alloy", "onyx", "nova", "shimmer", "echo", "fable", "ash", "sage", "coral"],
                        value="onyx"
                    )
                    max_chars = gr.Slider(
                        label="Chunk size (characters)",
                        minimum=1200,
                        maximum=4000,  # OpenAI's speech endpoint caps input at 4096 characters
                        step=100,
                        value=3500
                    )

                with gr.Row(elem_classes=["section"]):
                    with gr.Column(scale=1, elem_classes=["primary-action"]):
                        generate_btn = gr.Button("Generate episode", variant="primary")
                    with gr.Column(scale=1, elem_classes=["secondary-action"]):
                        clear_btn = gr.Button("Clear")

            # RIGHT: Outputs (wider)
            with gr.Column(scale=7):
                with gr.Column(elem_classes=["section"]):
                    gr.Markdown("## Script")
                    script_out = gr.Textbox(
                        elem_id="script_out",
                        lines=7,
                        placeholder="Generated script will appear here…"
                    )

                with gr.Column(elem_classes=["section"]):
                    gr.Markdown("## Audio")
                    audio_out = gr.Audio(
                        elem_id="audio_out",
                        type="filepath"
                    )

        generate_btn.click(
            fn=roman_run,
            inputs=[urls_text, minutes, tts_model, tts_voice, max_chars],
            outputs=[script_out, audio_out]
        )

        clear_btn.click(
            fn=lambda: ("", None),
            outputs=[script_out, audio_out]
        )

demo.launch()




* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.


