<a href="https://colab.research.google.com/github/s11khushboo/podcast-studio/blob/main/podcast.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install openai gradio PyPDF2 requests python-dotenv

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [3]:
import os
import uuid
import requests
import PyPDF2
import gradio as gr

from openai import OpenAI

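# Read the API key from the OPENAI_API_KEY environment variable;
# the fallback string is only a placeholder and must be replaced to run the notebook
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "OPENAI_API_KEY")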

MODEL_NAME = "gpt-3.5-turbo"
TTS_MODEL = "gpt-4o-mini-tts"

client = OpenAI(api_key=OPENAI_API_KEY)

os.makedirs("outputs/audio", exist_ok=True)
os.makedirs("outputs/scripts", exist_ok=True)


In [4]:
def load_text_input(text: str) -> str:
    return text.strip()


def load_pdf(file_obj) -> str:
    text = ""
    reader = PyPDF2.PdfReader(file_obj)
    for page in reader.pages:
        extracted = page.extract_text()
        if extracted:
            text += extracted + "\n"
    return text


def load_url(url: str) -> str:
    r = requests.get(url, timeout=10)
    r.raise_for_status()  # fail early on 4xx/5xx instead of narrating an error page
    return r.text[:15000]  # safety truncate
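
Note that `load_url` returns the raw response body, so for an article page the model receives HTML markup, scripts, and styles along with the article text. A minimal sketch of one way to reduce that noise, using only the standard library's `html.parser`. The names `html_to_text` and `_VisibleTextParser` are hypothetical helpers introduced for illustration, not part of the original notebook, and the approach is a rough heuristic rather than a full article extractor:

```python
from html.parser import HTMLParser


class _VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of non-visible tags."""

    SKIPPED_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def html_to_text(html: str, max_chars: int = 15000) -> str:
    """Hypothetical helper: strip markup, then apply the same safety truncation."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]
```

With this in place, `load_url` could return `html_to_text(r.text)` instead of `r.text[:15000]`, so the truncation budget is spent on readable text rather than markup.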


In [5]:
SYSTEM_PROMPT = """
You are a professional podcast script writer.

Turn the provided content into an engaging spoken podcast episode.
Style rules:
- Conversational
- Clear structure
- Friendly tone
- Include intro, main insights, and closing summary
- Avoid bullet points; use natural speech
"""

def transform_to_podcast_script(raw_text: str) -> str:
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": raw_text}
        ],
        temperature=0.7
    )

    return response.choices[0].message.content


In [6]:
def format_script(script: str) -> str:
    intro = "Welcome to today’s AI-generated podcast episode.\n\n"
    outro = "\n\nThat wraps up today’s episode. Thanks for listening."
    return intro + script + outro


In [7]:
def generate_audio(script: str) -> str:
    file_path = f"outputs/audio/{uuid.uuid4()}.mp3"

    with client.audio.speech.with_streaming_response.create(
        model=TTS_MODEL,
        voice="alloy",
        input=script
    ) as response:
        response.stream_to_file(file_path)

    return file_path
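
`generate_audio` sends the whole script in a single request. The speech endpoint caps the length of `input` per request, so a long script may be rejected. The sketch below splits a script into smaller pieces at sentence boundaries. The function name `split_for_tts` is hypothetical, and the default of 4000 characters is an assumed conservative value, not a documented limit for `gpt-4o-mini-tts`; check the current API reference before relying on it.

```python
import re


def split_for_tts(script: str, max_chars: int = 4000) -> list[str]:
    """Split a script into chunks of at most max_chars, preferring sentence ends."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())

    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to `generate_audio` separately, producing one MP3 file per chunk; joining those files into a single episode is left out of this sketch.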


In [8]:
def run_pipeline_from_text(text: str):
    raw = load_text_input(text)
    script = transform_to_podcast_script(raw)
    formatted = format_script(script)

    script_file = f"outputs/scripts/{uuid.uuid4()}.txt"
    with open(script_file, "w", encoding="utf-8") as f:
        f.write(formatted)

    audio_path = generate_audio(formatted)

    return formatted, audio_path


def run_pipeline_from_pdf(file_obj):
    raw = load_pdf(file_obj)
    return run_pipeline_from_text(raw)


def run_pipeline_from_url(url: str):
    raw = load_url(url)
    return run_pipeline_from_text(raw)
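
All three entry points funnel into `run_pipeline_from_text`, which calls the model even when the loader produced nothing useful. `load_pdf`, for instance, returns an empty string for a PDF with no extractable text layer, such as a scanned document. A minimal guard, assuming a hypothetical helper named `require_usable_text` and an arbitrary illustrative threshold of 50 characters:

```python
def require_usable_text(raw: str, min_chars: int = 50) -> str:
    """Return the stripped text, or raise if there is too little to work with.

    Hypothetical helper; min_chars is an illustrative threshold, not a tuned value.
    """
    cleaned = raw.strip()
    if len(cleaned) < min_chars:
        raise ValueError(
            f"Source text has {len(cleaned)} characters after stripping; "
            f"expected at least {min_chars}."
        )
    return cleaned
```

Calling it at the top of `run_pipeline_from_text`, in place of `load_text_input(text)`, would stop the pipeline before any API request is made.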


In [9]:
sample_text = """
Large Language Models are transforming how people interact with technology.
They can summarize documents, generate code, answer questions, and create media.
"""

script, audio = run_pipeline_from_text(sample_text)

print(script)
audio


Welcome to today’s AI-generated podcast episode.

[Intro music fades out]

Welcome back to [Podcast Name]! I’m your host, [Your Name], and today we’re diving into a topic that’s been buzzing around the tech world: Large Language Models, or LLMs for short. If you’ve ever wondered how this fascinating technology is shifting the way we interact with our devices and the digital landscape, you’re in the right place!

So, let’s get right into it. Large Language Models are more than just impressive bits of code; they’re revolutionizing the way we communicate with technology. Imagine a world where you can simply ask your computer to summarize a lengthy document, generate lines of code, or even create unique pieces of media, all with just a few words. It’s pretty incredible, isn’t it?

Now, let’s break this down a bit. First off, LLMs are trained on vast amounts of text data. This training allows them to understand and generate human-like text. So when you type in a quest

'outputs/audio/8fd547af-40fd-4e90-8045-96cb9a107f82.mp3'

In [10]:
def gradio_text_flow(text):
    return run_pipeline_from_text(text)


def gradio_pdf_flow(pdf_file):
    return run_pipeline_from_pdf(pdf_file)


def gradio_url_flow(url):
    return run_pipeline_from_url(url)


with gr.Blocks(title="Podcast Studio") as demo:
    gr.Markdown("# 🎙️ Podcast Studio: AI Podcast Generator")

    with gr.Tab("Text Input"):
        text_in = gr.Textbox(lines=12, label="Paste Content")
        text_btn = gr.Button("Generate Podcast")
        text_script = gr.Textbox(label="Podcast Script")
        text_audio = gr.Audio(label="Audio Output")

        text_btn.click(gradio_text_flow, text_in, [text_script, text_audio])

    with gr.Tab("PDF Upload"):
        pdf_in = gr.File(label="Upload PDF")
        pdf_btn = gr.Button("Generate Podcast")
        pdf_script = gr.Textbox(label="Podcast Script")
        pdf_audio = gr.Audio(label="Audio Output")

        pdf_btn.click(gradio_pdf_flow, pdf_in, [pdf_script, pdf_audio])

    with gr.Tab("URL Article"):
        url_in = gr.Textbox(label="Article URL")
        url_btn = gr.Button("Generate Podcast")
        url_script = gr.Textbox(label="Podcast Script")
        url_audio = gr.Audio(label="Audio Output")

        url_btn.click(gradio_url_flow, url_in, [url_script, url_audio])


demo.launch()


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://d698015c406dfb1aed.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


