# Summarize podcasts and audio

Transcribe audio files and generate summaries automatically using Whisper and LLMs.


## Problem

You have podcast episodes, meeting recordings, or interviews that need both transcription and summarization. Doing this manually is time-consuming and doesn't scale.

| Content | Duration | Need |
|---------|----------|------|
| Podcast episodes | 60 min | Episode summary + key points |
| Meeting recordings | 30 min | Action items + decisions |
| Interviews | 45 min | Main topics + quotes |


## Solution

**What's in this recipe:**
- Transcribe audio with Whisper (runs locally)
- Generate summaries with an LLM
- Chain transcription → summarization automatically

The pipeline transcribes each audio file first, then summarizes the resulting transcript. Both steps run automatically whenever you insert new audio files.


### Setup


In [None]:
%pip install -qU pixeltable openai-whisper openai


In [None]:
import os
import getpass

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')


In [None]:
import pixeltable as pxt
from pixeltable.functions import whisper, openai


In [None]:
# Start from a clean directory: drop 'podcast_demo' (and anything in it) if it already exists
pxt.drop_dir('podcast_demo', force=True)
pxt.create_dir('podcast_demo')


### Create the pipeline

Create a table with audio input, then add computed columns for transcription and summarization:


In [None]:
# Create table for audio files
podcasts = pxt.create_table(
    'podcast_demo.episodes',
    {'title': pxt.String, 'audio': pxt.Audio}
)


In [None]:
# Step 1: Transcribe with local Whisper (uses GPU if available); 'base.en' is an English-only model
podcasts.add_computed_column(
    transcription=whisper.transcribe(podcasts.audio, model='base.en')
)


In [None]:
# Extract the text from transcription result
podcasts.add_computed_column(
    transcript_text=podcasts.transcription.text
)
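
For orientation, the result Whisper returns is a dictionary, which is why `.text` picks out the full transcript. The sketch below uses a trimmed, made-up stand-in for that dictionary (the sample text and timestamps are illustrative, not real model output, and real segments carry more fields). `format_segments` is a hypothetical helper for this example only, not part of Pixeltable or Whisper.

```python
# Trimmed, illustrative stand-in for a Whisper transcription result.
# All values here are made up for the example.
sample_result = {
    'text': ' We choose to go to the moon.',
    'language': 'en',
    'segments': [
        {'id': 0, 'start': 0.0, 'end': 2.5, 'text': ' We choose to go'},
        {'id': 1, 'start': 2.5, 'end': 4.0, 'text': ' to the moon.'},
    ],
}


def format_segments(result: dict) -> list[str]:
    """Render each segment as '[start-end] text' (hypothetical helper)."""
    return [
        f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text'].strip()}"
        for seg in result['segments']
    ]


# The 'text' key holds the whole transcript; 'segments' holds timed pieces.
transcript_text = sample_result['text'].strip()
timestamped_lines = format_segments(sample_result)
```

The `segments` list is worth keeping in mind if you later want timestamps alongside quotes; this recipe only uses `text`.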


In [None]:
# Step 2: Summarize the transcript with OpenAI
summary_prompt = '''Summarize this transcript in 2-3 sentences, then list 3 key points.

Transcript:
''' + podcasts.transcript_text

podcasts.add_computed_column(
    summary_response=openai.chat_completions(
        messages=[{'role': 'user', 'content': summary_prompt}],
        model='gpt-4o-mini'
    )
)


In [None]:
# Extract summary text from response
podcasts.add_computed_column(
    summary=podcasts.summary_response.choices[0].message.content
)
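
The path `choices[0].message.content` mirrors the layout of a chat completion response. As a rough illustration, the sketch below walks the same path over a plain dictionary; the response is a trimmed, made-up stand-in (real responses include more fields such as usage statistics).

```python
# Trimmed, illustrative stand-in for a chat completion response.
# The content string is made up for the example.
sample_response = {
    'model': 'gpt-4o-mini',
    'choices': [
        {
            'index': 0,
            'finish_reason': 'stop',
            'message': {
                'role': 'assistant',
                'content': 'A short summary.\n\n1. First key point',
            },
        }
    ],
}

# Same path as the computed column: choices[0].message.content
summary_text = sample_response['choices'][0]['message']['content']
```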


### Process audio files

Insert audio files and watch the pipeline run automatically:


In [None]:
# Insert a short sample clip, referenced by URL
audio_url = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/audio/jfk_rice_moon_speech_excerpt.mp3'

podcasts.insert([{
    'title': 'JFK Moon Speech Excerpt',
    'audio': audio_url
}])


In [None]:
# View transcript
podcasts.select(podcasts.title, podcasts.transcript_text).collect()


In [None]:
# View summary
for row in podcasts.select(podcasts.title, podcasts.summary).collect():
    print(f"Title: {row['title']}")
    print(f"\nSummary:\n{row['summary']}")


## Explanation

**Pipeline architecture:**

```
Audio → Whisper transcription → Transcript text → LLM summarization → Summary
```

Each step is a computed column that depends on the previous one. When you insert a new audio file, all steps run automatically in sequence.
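
To make the dependency chain concrete, here is a toy model in plain Python. It is only a sketch of the idea, not how Pixeltable is implemented: `ToyTable` is a hypothetical class, and the two lambdas are stand-ins for the Whisper and LLM calls.

```python
from typing import Callable

Row = dict[str, object]


class ToyTable:
    """Toy table whose computed columns run in the order they were added."""

    def __init__(self) -> None:
        self.computed: list[tuple[str, Callable[[Row], object]]] = []
        self.rows: list[Row] = []

    def add_computed_column(self, name: str, fn: Callable[[Row], object]) -> None:
        # Register a column whose value is derived from the rest of the row.
        self.computed.append((name, fn))

    def insert(self, row: Row) -> None:
        # Fill in each computed column in order, so later columns
        # can read the values produced by earlier ones.
        new_row = dict(row)
        for name, fn in self.computed:
            new_row[name] = fn(new_row)
        self.rows.append(new_row)


table = ToyTable()
# Stand-in for transcription: depends on the 'audio' input column.
table.add_computed_column('transcript_text', lambda row: f"transcript of {row['audio']}")
# Stand-in for summarization: depends on the computed 'transcript_text' column.
table.add_computed_column('summary', lambda row: str(row['transcript_text']).upper())

table.insert({'title': 'Episode 1', 'audio': 'ep1.mp3'})
```

After the insert, the single row holds both derived values, even though the caller supplied only `title` and `audio`.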

**Whisper model options:**

| Model | Parameters | Speed | Accuracy |
|-------|------|-------|----------|
| `tiny.en` | 39M | Fastest | Good for clear speech |
| `base.en` | 74M | Fast | Balanced |
| `small.en` | 244M | Medium | Better accuracy |
| `medium.en` | 769M | Slow | High accuracy |

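For production with varied audio quality, use `small.en` or larger. The `.en` models are English-only; for other languages, use the multilingual variants (`tiny`, `base`, `small`, `medium`).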
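
If you want to encode this guidance in code, one possible approach is a small lookup over the table above. `pick_whisper_model` is a hypothetical helper, and the selection rule (largest model within a parameter budget, with `small.en` as the floor for varied audio) is an assumption made for this sketch, not an official recommendation.

```python
# Parameter counts in millions, copied from the table above.
WHISPER_MODELS = [
    ('tiny.en', 39),
    ('base.en', 74),
    ('small.en', 244),
    ('medium.en', 769),
]

# Smallest model suggested for varied audio quality (small.en).
VARIED_AUDIO_MIN_PARAMS = 244


def pick_whisper_model(max_params_millions: int, varied_audio: bool = False) -> str:
    """Return the largest model within the budget (hypothetical helper)."""
    candidates = [
        (name, params)
        for name, params in WHISPER_MODELS
        if params <= max_params_millions
        and (not varied_audio or params >= VARIED_AUDIO_MIN_PARAMS)
    ]
    if not candidates:
        raise ValueError('No Whisper model satisfies the given constraints')
    # Pick the candidate with the highest parameter count.
    return max(candidates, key=lambda item: item[1])[0]


demo_model = pick_whisper_model(100)
production_model = pick_whisper_model(1000, varied_audio=True)
```

The returned name can then be passed as the `model` argument of `whisper.transcribe`.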


## See also

- [Transcribe audio](./audio-transcribe.ipynb) - Basic audio transcription
- [Summarize text](./text-summarize.ipynb) - Text summarization patterns
