<a href="https://colab.research.google.com/github/sakethpragallapati/AI-Podcast-Generator-from-Notes/blob/main/PODCAST_generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PROJECT OVERVIEW: PDF → SUMMARY → PODCAST
Pipeline:
*   Upload a .pdf or .txt file (like lecture notes).
*   Extract text automatically.
*   Clean and summarize using Long-T5.
*   Turn summary into podcast script (Gemini).
*   Generate natural TTS audio (ElevenLabs).

# 1. Imports & Upload
Import essential libraries for file handling and PDF reading.

In [None]:
import os
from google.colab import files
from pypdf import PdfReader
from io import BytesIO

In [None]:
print("Please upload a .pdf or a .txt file (e.g., sample notes).")
# Prompts user to upload a file interactively.
uploaded = files.upload()

Please upload a .pdf or a .txt file (e.g., sample notes).


Saving CaseBasedReasoning.pdf to CaseBasedReasoning (1).pdf


In [None]:
# Get uploaded filename and its binary data.
filename = next(iter(uploaded))
file_data = uploaded[filename]

# 2. File Type & Extraction
Detects file type (PDF/TXT) and extracts content accordingly.

In [None]:
def extract_content_from_file(filename, file_data):
    """Detects file type and extracts content using the appropriate method."""

    if filename.lower().endswith('.txt'):
        try:
            text_content = file_data.decode('utf-8')
            return f"--- TXT FILE CONTENT ---\n\n{text_content}"
        except Exception as e:
            return f"An error occurred while reading the TXT file: {e}"

    elif filename.lower().endswith('.pdf'):
        file_stream = BytesIO(file_data)
        full_text = ""
        try:
            reader = PdfReader(file_stream)

            for i, page in enumerate(reader.pages):
                text = page.extract_text()
                full_text += f"--- PAGE {i + 1} ---\n"

                if text and text.strip():
                    full_text += text.strip() + "\n\n"
                else:
                    full_text += "!!! WARNING: No selectable text found on this page. !!!\n\n"

            return full_text.strip()

        except Exception as e:
            return f"An error occurred during PDF extraction: {e}"

    else:
        return "!!! ERROR: Unsupported file type. Please upload a .pdf or .txt file. !!!"

In [None]:
# Extract text content from uploaded file.
extracted_content = extract_content_from_file(filename, file_data)
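
`extract_content_from_file` reports failures by returning a message string rather than raising, so a failed extraction would reach the summarizer as if it were lecture notes. The guard below is a minimal sketch of one way to stop early. It assumes the exact message prefixes used in the function above; `ERROR_PREFIXES` and `ensure_extraction_ok` are hypothetical names that are not part of the notebook.

```python
# Hypothetical guard; the prefixes are copied from the messages that
# extract_content_from_file returns when something goes wrong.
ERROR_PREFIXES = (
    "An error occurred while reading the TXT file:",
    "An error occurred during PDF extraction:",
    "!!! ERROR: Unsupported file type.",
)


def ensure_extraction_ok(content):
    """Return content unchanged, or raise ValueError if it is an error message."""
    if content.startswith(ERROR_PREFIXES):
        raise ValueError(content)
    return content
```

It could be applied right after extraction, for example `extracted_content = ensure_extraction_ok(extracted_content)`.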


# 3. TEXT CLEANING STEP
Removes page markers and extra newlines for smoother input to summarizer.

In [None]:
import re

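def clean_extracted_text(raw_text):
    """Strip extraction markers and flatten newlines into spaces."""
    # Drop the page and TXT headers added by extract_content_from_file.
    text = re.sub(r'--- (?:PAGE \d+|TXT FILE CONTENT) ---', '', raw_text)
    # Drop the placeholder inserted for pages without selectable text.
    text = text.replace('!!! WARNING: No selectable text found on this page. !!!', '')
    # Collapse runs of newlines into single spaces.
    text = re.sub(r'\n+', ' ', text)
    return text.strip()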

cleaned_text = clean_extracted_text(extracted_content)

In [None]:
cleaned_text

"Naveen Pragallapati  Case-Based Reasoning (CBR)  Case-Based Reasoning is an approach in machine learning where new problems are solved by  referencing or adapƟng soluƟons from previously encountered cases or examples. Unlike  methods that generalize from training data to build a model, CBR relies on a memory of  speciﬁc instances.  Key Steps in Case-Based Reasoning  1. Retrieve: Given a new problem, retrieve the most similar cases from memory. This involves  deﬁning a similarity measure that can match the new case with those stored in memory.     2. Reuse: Once a similar case or set of cases is retrieved, reuse the soluƟon (or parts of it) for  the new problem. This step may require adaptaƟon if the old soluƟon doesn't exactly ﬁt the  new context.  3. Revise: AŌer applying the retrieved soluƟon, test it. If necessary, revise or adapt the  soluƟon to improve it for the current problem.  4. Retain: Once the soluƟon is successfully applied, it’s stored as a new case in memory for  future

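The output above still contains ligature debris from the PDF font, such as "adapƟng" for "adapting", "AŌer" for "After", and the single-character "ﬁ" ligature. The sketch below shows one possible repair step. The two-entry mapping is an assumption inferred only from the words visible in this output, `repair_ligatures` is a hypothetical helper, and other PDFs may garble ligatures differently.

```python
import unicodedata

# Assumed mapping, inferred from words in the sample output
# ("adap\u019fng" -> "adapting", "A\u014cer" -> "After").
LIGATURE_FIXES = {
    "\u019f": "ti",  # shown where the "ti" ligature was
    "\u014c": "ft",  # shown where the "ft" ligature was
}


def repair_ligatures(text):
    """Expand standard ligatures via NFKC, then apply the assumed fixes."""
    # NFKC turns compatibility ligatures such as U+FB01 into plain "fi".
    text = unicodedata.normalize("NFKC", text)
    for bad, good in LIGATURE_FIXES.items():
        text = text.replace(bad, good)
    return text
```

Because the mapping replaces these characters everywhere, it would also alter text that legitimately contains them, so it is only suitable when the source is known to be plain English.
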
# 4. Load Summarization Model
Load pretrained Long-T5 model and tokenizer for long-text summarization.

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

In [None]:
model_name = "pszemraj/long-t5-tglobal-base-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

Some weights of LongT5ForConditionalGeneration were not initialized from the model checkpoint at pszemraj/long-t5-tglobal-base-16384-book-summary and are newly initialized: ['decoder.embed_tokens.weight', 'encoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Recovering a missing tied weight decoder.embed_tokens.weight from a legacy LongT5 checkpoint. Consider saving decoder.embed_tokens.weight in your checkpoint or updating the config (tie_word_embeddings=true).
Recovering a missing tied weight encoder.embed_tokens.weight from a legacy LongT5 checkpoint. Consider saving encoder.embed_tokens.weight in your checkpoint or updating the config (tie_word_embeddings=true).


In [None]:
# Create summarization pipeline.
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)

Device set to use cuda:0


# 5. Summarization with custom prompt

In [None]:
instruction = """
    Summarize the following text into a **comprehensive, clear, and naturally flowing narrative** suitable for spoken delivery (like a podcast or audio summary).

    Your goals:
    - Capture **all major points, key ideas, and important explanations** from the text — do not skip significant details.
    - Maintain **logical flow and context** so the listener can follow the complete picture.
    - Avoid repetition, meta-comments, or references to "the text" or "this summary".
    - Ensure the tone feels **natural, engaging, and informative**, as if explaining clearly to an attentive audience.
    - Use **smooth transitions** between topics to maintain coherence.
    - Keep **factual accuracy** and include **relevant examples or reasoning** when they help understanding.
    - Target a length of **250–300 words** for shorter sections or chapters.
    - Avoid bullet points, numbered lists, or formatting.
    - Write in continuous prose, as though the narrator is explaining the material conversationally and confidently.

    Now, summarize the following text:
    """

In [None]:
prompt = instruction + "\n\nText:\n" + cleaned_text

In [None]:
summary = summarizer(
    prompt,
    max_length=360,
    min_length=200,
    temperature=0.7,
    do_sample=True,
    repetition_penalty=1.2,
)
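
The call above sends the whole document in a single request. The checkpoint name suggests an input window of 16,384 tokens, so much longer notes may need to be split first. The helper below is a rough sketch of that step: `chunk_words` and the 3,000-word default are illustrative assumptions, and a word count is only a crude proxy for the tokenizer's token count.

```python
def chunk_words(text, max_words=3000):
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

Each chunk could then be passed to `summarizer` on its own and the partial summaries joined before the script-writing step.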

In [None]:
# Extract the summarized text.
summary_text = summary[0]['summary_text']

In [None]:
summary_text

"In this chapter, Dr. Manson explains how to summarize a text using case-based reasoning, or CBR. This approach uses a set of pre-existing cases to automatically learn about new problems. It's particularly useful in situations where historical cases don't fit into a strict categorization of what's appropriate for a given task. For example, if a patient needs to diagnose a new paxent, CBR can use past cases to predict what'll happen when the patient re-enters the patient's care. This is especially useful for situations with limited resources, such as medical diagnoses or troubleshooting. CBR also works well in situations with lots of historical cases because it can easily adapt past cases for new problems, making it ideal for situations where no strict generalizan rule is followed. In other words, it's perfect for situations in which cases aren't strictly generalized."

# 6. Podcast script via GEMINI model
Converts the summary into a natural conversation script between Alex and Jordan.

In [None]:
from google import genai

In [None]:
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")
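
Pasting a real key into the notebook makes it easy to leak when the notebook is shared. One alternative is to read the key from an environment variable, as in the sketch below. `read_api_key` is a hypothetical helper and the variable name `GEMINI_API_KEY` used in the usage note is an assumption, not something the notebook defines.

```python
import os


def read_api_key(env_var):
    """Return the key stored in env_var, or raise a clear error if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With the variable set, the client could be created as `genai.Client(api_key=read_api_key("GEMINI_API_KEY"))`.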

In [None]:
prompt = f"""
You are a professional podcast scriptwriter for a top-rated **tech and education show**.

Write a **3–4 minute** (around **450–550 words**) **natural, engaging, and idea-driven conversation** between two hosts — **Alex** and **Jordan** — as they **talk to each other** about the topic below.

**Summary:**
{summary_text}

**Guidelines:**
- The conversation should **begin with a warm, natural podcast-style welcome**, where Alex and Jordan greet listeners briefly and introduce the topic in a catchy way.
- This must feel like a **real conversation between the two hosts**, not an explanation to an audience.
- They **speak to each other**, react, question, clarify, and build on each other’s points.
- Keep the discussion **focused, insightful, and content-rich** — no filler or small talk.
- Each speaker’s line must begin exactly as:
    - Alex: ...
    - Jordan: ...
- Alternate naturally between them throughout.
- Use a **conversational yet intelligent** tone — two knowledgeable people exploring ideas together.
- Structure the conversation to cover:
    1. **Hook / Introduction** – start with a welcome and an interesting opening.
    2. **Origins or context** – where the idea or concept comes from.
    3. **Core explanation or evolution** – how it works or developed.
    4. **Breakthroughs, examples, or applications** – what makes it interesting or important.
    5. **Challenges or implications** – any ethical or practical issues.
    6. **Future vision or closing thoughts** – end with a reflective or inspiring takeaway.
- No stage directions, emotional cues, or bracketed text.
- Each line should be **short (1–2 sentences)** for TTS clarity.
- Maintain **smooth rhythm and conversational flow**, like a genuine back-and-forth between thoughtful co-hosts.

**Output format (exactly like this):**
Alex: ...
Jordan: ...
Alex: ...
Jordan: ...

This script will be **fed directly into ElevenLabs TTS**, so ensure clear line breaks, alternating turns, and natural flow between Alex and Jordan.
"""

In [None]:
response = client.models.generate_content(
    model="gemini-2.5-flash", contents=prompt
)

In [None]:
# Store podcast script text.
podcast_script = response.text

In [None]:
podcast_script

'Alex: Welcome back to the show, everyone! Today, Jordan, I want to dive into something fascinating about how we learn and adapt, especially in the world of technology. It’s all about leveraging our past experiences.\n\nJordan: Absolutely, Alex. And when we talk about past experiences, a really intriguing concept comes to mind for me: Case-Based Reasoning, or CBR. It’s essentially how systems learn from history.\n\nAlex: Exactly! It’s like how we, as humans, often deal with a new problem by thinking, "Have I seen something like this before?" We draw parallels and adapt past solutions.\n\nJordan: That’s a perfect way to put it. In essence, CBR takes this human intuition and formalizes it for computational systems. It’s not about creating universal laws, but about finding similar instances.\n\nAlex: Right. Dr. Manson\'s work highlights this, explaining how CBR allows us to summarize text, or really, any complex problem, by looking at a collection of pre-existing cases. Each case is a pas
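
The prompt asks for lines that start with `Alex:` or `Jordan:` and alternate between the two hosts, but the model's reply is not guaranteed to follow that. The sketch below checks a script against those two rules before it is sent to TTS. `check_script_format` is a hypothetical helper, and treating blank lines as harmless is an assumption based on the output shown above.

```python
import re

# A labelled line: one of the two host names, a colon, then some text.
SPEAKER_LINE = re.compile(r"^(Alex|Jordan):\s*\S")


def check_script_format(script_text):
    """Return a list of problems; an empty list means the format looks right."""
    problems = []
    previous = None
    for number, line in enumerate(script_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines between turns are fine
        match = SPEAKER_LINE.match(line)
        if not match:
            problems.append(f"line {number}: missing 'Alex:' or 'Jordan:' label")
            continue
        speaker = match.group(1)
        if speaker == previous:
            problems.append(f"line {number}: {speaker} speaks twice in a row")
        previous = speaker
    return problems
```

An empty result from `check_script_format(podcast_script)` suggests the script is safe to parse; otherwise the listed lines can be inspected or the script regenerated.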

# 7. TEXT-TO-SPEECH with ELEVENLABS
Converts podcast script dialogue into natural-sounding speech.

In [None]:
import re
from elevenlabs.client import ElevenLabs
from IPython.display import Audio
from io import BytesIO

In [None]:
# Initialize ElevenLabs API client (replace with your API key).
client = ElevenLabs(api_key="YOUR_ELEVENLABS_API_KEY")

In [None]:
# Voice mapping for each speaker.
VOICE_MAP = {
    "ALEX": "90ipbRoKi4CpHXvKVtl0",
    "JORDAN": "UzYWd2rD2PPFPjXRG3Ul",
}

# 8. Parse script & generate audio
Splits the script by speakers and matches them to their respective voices.

In [None]:
def parse_script_to_dialogue(script_text, voice_map):
    """
    Parses a multi-speaker script string into a list of (voice_id, text) tuples.

    It handles:
    1. Extracting the speaker label (e.g., Alex, Jordan).
    2. Removing text inside square brackets [like this] (emotional cues).
    3. Ignoring lines that are just audio cues (SFX).
    """
    dialogue_list = []

    pattern = re.compile(r"^\s*([A-Za-z]+):\s*(.*)$", re.MULTILINE)
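    # Non-greedy match so each whole [cue] or (cue) is removed, not just empty or one-character ones.
    cue_pattern = re.compile(r"\[.*?\]|\(.*?\)")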

    for match in pattern.finditer(script_text):
        speaker_label = match.group(1).upper()
        raw_text = match.group(2).strip()
        voice_id = voice_map.get(speaker_label)
        if voice_id:
            clean_text = cue_pattern.sub('', raw_text).strip()
            if clean_text:
                dialogue_list.append((voice_id, clean_text))

        else:
            print(f"[Warning: Speaker '{speaker_label}' not found in VOICE_MAP. Dialogue skipped.]")

    return dialogue_list

# Parse the generated script.
dialogue = parse_script_to_dialogue(podcast_script, VOICE_MAP)
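
To see what the parsing step does to a single turn, the sketch below applies the same speaker pattern to a two-line sample. The sample script is made up for illustration, and the cue pattern assumes a non-greedy `.*?` so that whole bracketed or parenthesized cues are stripped.

```python
import re

# Same speaker pattern as parse_script_to_dialogue; the cue pattern assumes
# a non-greedy ".*?" so that complete [cues] and (cues) are removed.
speaker_pattern = re.compile(r"^\s*([A-Za-z]+):\s*(.*)$", re.MULTILINE)
cue_pattern = re.compile(r"\[.*?\]|\(.*?\)")

# Made-up sample in the format the Gemini prompt asks for.
sample_script = "Alex: [laughs] Welcome back!\n\nJordan: Glad to be here. (pause)\n"

# Upper-case the speaker label and strip cues from the spoken text.
turns = [
    (match.group(1).upper(), cue_pattern.sub("", match.group(2)).strip())
    for match in speaker_pattern.finditer(sample_script)
]
print(turns)
# [('ALEX', 'Welcome back!'), ('JORDAN', 'Glad to be here.')]
```

The upper-cased labels are what `VOICE_MAP` is keyed on, which is why a script using any other speaker name triggers the warning branch.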

# 9. Combine & play final podcast
Generates speech for each dialogue line and merges into one audio stream.

In [None]:
combined_audio = BytesIO()

for voice_id, text in dialogue:
    speaker_name = next(name for name, vid in VOICE_MAP.items() if vid == voice_id)
    print(f"Generating [{speaker_name}]: '{text[:60]}...'")

    audio_stream = client.text_to_speech.convert(
        voice_id=voice_id,
        model_id="eleven_multilingual_v2",
        text=text
    )

    for chunk in audio_stream:
        if chunk:
            combined_audio.write(chunk)

# Reset stream position and play audio in Colab.
combined_audio.seek(0)
print("\n--- Playback Ready ---")
# The buffer holds encoded audio bytes; rate is only needed for raw sample arrays.
Audio(combined_audio.read())

Generating [ALEX]: 'Welcome back to the show, everyone! Today, Jordan, I want to...'
Generating [JORDAN]: 'Absolutely, Alex. And when we talk about past experiences, a...'
Generating [ALEX]: 'Exactly! It’s like how we, as humans, often deal with a new ...'
Generating [JORDAN]: 'That’s a perfect way to put it. In essence, CBR takes this h...'
Generating [ALEX]: 'Right. Dr. Manson's work highlights this, explaining how CBR...'
Generating [JORDAN]: 'So, instead of trying to fit a new situation into a rigid, p...'
Generating [ALEX]: 'It makes so much sense, especially in fields where strict ge...'
Generating [JORDAN]: 'That's a brilliant example. A doctor can use historical pati...'
Generating [ALEX]: 'And it’s not just medical. The summary mentions troubleshoot...'
Generating [JORDAN]: 'Exactly. That ability to adapt past cases for new problems i...'
Generating [ALEX]: 'You're essentially building a knowledge base of lived experi...'
Generating [JORDAN]: 'It’s not just about quantity, tho