# Transcribe meetings with speaker identification

Automatically transcribe audio and identify who said what using speaker diarization.


## Problem

You have recordings of meetings, interviews, or podcasts and need transcripts that show who said what. Standard transcription produces only text, so you lose track of who was speaking.

| Recording type | Need |
|----------------|------|
| Meeting recordings | Attribute comments to participants |
| Interview audio | Separate interviewer from interviewee |
| Podcasts | Identify hosts vs guests |
| Customer calls | Track agent vs customer |


## Solution

**What's in this recipe:**
- Transcribe audio with WhisperX
- Enable speaker diarization to identify speakers
- Extract speaker-labeled segments

Use Pixeltable's `whisperx.transcribe()` function with `diarize=True` to produce transcripts in which each segment carries a speaker label.


### Setup


```python
%pip install -qU pixeltable whisperx
```


```python
import pixeltable as pxt
from pixeltable.functions import whisperx
```


```python
# Create a fresh directory
pxt.drop_dir('diarization_demo', force=True)
pxt.create_dir('diarization_demo')
```


### Create a meetings table


```python
# Create a table for meeting recordings
meetings = pxt.create_table(
    'diarization_demo.meetings',
    {'audio': pxt.Audio, 'meeting_name': pxt.String}
)
```


```python
# Insert a sample audio file (replace with your own recordings)
meetings.insert([
    {
        'audio': 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/audio/short-clip.mp3',
        'meeting_name': 'Sample Meeting'
    }
])
```


### Add transcription with speaker diarization


```python
# Add a computed column with speaker diarization
# Note: Requires HF_TOKEN environment variable for pyannote models
meetings.add_computed_column(
    transcript=whisperx.transcribe(
        meetings.audio,
        model='tiny.en',       # Use 'large-v3' for production
        diarize=True,          # Enable speaker identification
        min_speakers=1,        # Minimum expected speakers
        max_speakers=4         # Maximum expected speakers
    )
)
```


```python
# View the transcript with speaker labels
result = meetings.select(meetings.meeting_name, meetings.transcript).collect()

for row in result:
    print(f"Meeting: {row['meeting_name']}")
    print("-" * 40)

    # Extract segments with speaker labels
    segments = row['transcript'].get('segments', [])
    for seg in segments:
        speaker = seg.get('speaker', 'Unknown')
        text = seg.get('text', '').strip()
        start = seg.get('start', 0)
        print(f"[{start:.1f}s] {speaker}: {text}")
```
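A diarized transcript can contain several consecutive segments from the same speaker, which makes the printout above choppy. As a minimal sketch, the hypothetical helper `merge_speaker_turns` below merges such runs into speaker turns. It assumes only the segment keys used in this recipe (`speaker`, `text`, `start`, `end`) and treats segments without a `speaker` key as `Unknown`, matching the loop above.

```python
from typing import Any


def merge_speaker_turns(transcript: dict[str, Any]) -> list[dict[str, Any]]:
    """Merge consecutive segments from the same speaker into single turns.

    Hypothetical helper: assumes segments are dicts with 'text', 'start',
    'end', and an optional 'speaker' key (missing speakers become 'Unknown').
    """
    turns: list[dict[str, Any]] = []
    for seg in transcript.get('segments', []):
        speaker = seg.get('speaker', 'Unknown')
        text = seg.get('text', '').strip()
        if turns and turns[-1]['speaker'] == speaker:
            # Same speaker as the previous turn: append text and extend the end time
            turns[-1]['text'] = f"{turns[-1]['text']} {text}".strip()
            turns[-1]['end'] = seg.get('end', turns[-1]['end'])
        else:
            # Speaker changed: start a new turn
            start = seg.get('start', 0.0)
            turns.append({
                'speaker': speaker,
                'text': text,
                'start': start,
                'end': seg.get('end', start),
            })
    return turns


# Usage with a hand-written transcript in the same shape as the WhisperX output
sample = {
    'segments': [
        {'speaker': 'SPEAKER_00', 'text': ' Hi everyone.', 'start': 0.0, 'end': 1.2},
        {'speaker': 'SPEAKER_00', 'text': ' Let us begin.', 'start': 1.2, 'end': 2.5},
        {'speaker': 'SPEAKER_01', 'text': ' Sounds good.', 'start': 2.6, 'end': 3.4},
    ]
}
for turn in merge_speaker_turns(sample):
    print(f"[{turn['start']:.1f}-{turn['end']:.1f}s] {turn['speaker']}: {turn['text']}")
```

In the notebook you could call it on `row['transcript']` inside the loop above instead of iterating over raw segments.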


### Extract speaker-specific text


```python
# Create a UDF to extract text by speaker
@pxt.udf
def get_speaker_text(transcript: dict, speaker_id: str) -> str:
    """Extract all text from a specific speaker."""
    segments = transcript.get('segments', [])
    speaker_text = [
        seg.get('text', '').strip()
        for seg in segments
        if seg.get('speaker') == speaker_id
    ]
    return ' '.join(speaker_text)

# Add columns for each speaker's contributions
meetings.add_computed_column(speaker_0_text=get_speaker_text(meetings.transcript, 'SPEAKER_00'))
meetings.add_computed_column(speaker_1_text=get_speaker_text(meetings.transcript, 'SPEAKER_01'))
```
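Hardcoding `SPEAKER_00` and `SPEAKER_01` silently drops any additional speakers. A hypothetical alternative, sketched below in plain Python, collects the text for every label the diarizer produced in one pass. Wiring it into the table (for example by decorating it with `@pxt.udf` like `get_speaker_text`, with a plain `dict` return annotation, and adding it as a single computed column) is an assumption not shown here.

```python
from collections import defaultdict
from typing import Any


def text_by_speaker(transcript: dict[str, Any]) -> dict[str, str]:
    """Collect all text per speaker label in a single pass.

    Hypothetical helper: segments without a 'speaker' key are grouped
    under 'Unknown'; empty segments are skipped.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for seg in transcript.get('segments', []):
        text = seg.get('text', '').strip()
        if text:
            grouped[seg.get('speaker', 'Unknown')].append(text)
    # Join each speaker's segments in the order they appear in 'segments'
    return {speaker: ' '.join(parts) for speaker, parts in grouped.items()}


sample = {
    'segments': [
        {'speaker': 'SPEAKER_00', 'text': ' Welcome, everyone.', 'start': 0.0, 'end': 1.5},
        {'speaker': 'SPEAKER_01', 'text': ' Thanks for having me.', 'start': 1.6, 'end': 3.0},
        {'speaker': 'SPEAKER_02', 'text': ' Glad to join.', 'start': 3.1, 'end': 4.0},
        {'speaker': 'SPEAKER_00', 'text': ' Let us start.', 'start': 4.1, 'end': 5.0},
    ]
}
print(text_by_speaker(sample))
```

Unlike the two fixed columns, this also captures `SPEAKER_02` and any segments the diarizer could not attribute.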


```python
# View speaker-separated text
meetings.select(
    meetings.meeting_name,
    meetings.speaker_0_text,
    meetings.speaker_1_text
).collect()
```
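Diarization labels such as `SPEAKER_00` are anonymous and assigned per recording, so the same person can receive a different label in another file. Once you know who is who in a given recording, a hypothetical helper like `rename_speakers` below can replace labels with names. The `names` mapping is something you supply by hand; only segment-level labels are changed.

```python
from typing import Any


def rename_speakers(transcript: dict[str, Any], names: dict[str, str]) -> dict[str, Any]:
    """Return a copy of the transcript with segment speaker labels replaced by names.

    Hypothetical helper: labels not present in `names` are kept, and any other
    keys in the transcript are copied through unchanged.
    """
    segments = []
    for seg in transcript.get('segments', []):
        new_seg = dict(seg)  # shallow copy so the input transcript is not modified
        if 'speaker' in new_seg:
            new_seg['speaker'] = names.get(new_seg['speaker'], new_seg['speaker'])
        segments.append(new_seg)
    return {**transcript, 'segments': segments}


sample = {
    'segments': [
        {'speaker': 'SPEAKER_00', 'text': 'Welcome, everyone.', 'start': 0.0, 'end': 1.5},
        {'speaker': 'SPEAKER_01', 'text': 'Thanks for having me.', 'start': 1.6, 'end': 3.0},
        {'text': 'Unattributed remark.', 'start': 3.1, 'end': 4.0},
    ]
}
# Hypothetical mapping, filled in by hand after listening to the recording
renamed = rename_speakers(sample, {'SPEAKER_00': 'Host'})
for seg in renamed['segments']:
    print(seg.get('speaker', 'Unknown'), '-', seg['text'])
```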


## Explanation

**WhisperX diarization parameters:**

| Parameter | Description |
|-----------|-------------|
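| `diarize=True` | Enable speaker diarization (adds a `speaker` label to segments) |
| `min_speakers` | Lower bound on the number of speakers |
| `max_speakers` | Upper bound on the number of speakers |
| `num_speakers` | Exact number of speakers, if known (use instead of `min_speakers`/`max_speakers`) |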

**Output structure:**

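The transcript is a dict whose `segments` list pairs each piece of text with a speaker label and start/end timestamps in seconds. The structure below is simplified; segments that the diarizer could not attribute to a speaker have no `speaker` key.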

```python
{
    'segments': [
        {'speaker': 'SPEAKER_00', 'text': '...', 'start': 0.0, 'end': 2.5},
        {'speaker': 'SPEAKER_01', 'text': '...', 'start': 2.5, 'end': 5.0},
    ]
}
```
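Because each segment carries `start` and `end` timestamps, per-speaker statistics reduce to a short pass over `segments`. The hypothetical helper `speaking_time` below is a minimal sketch that totals speaking time per label for a transcript shaped like the example structure above; it skips segments missing either timestamp.

```python
from collections import defaultdict
from typing import Any


def speaking_time(transcript: dict[str, Any]) -> dict[str, float]:
    """Sum segment durations (end - start, in seconds) per speaker label.

    Hypothetical helper: segments without a 'speaker' key count as 'Unknown'.
    """
    totals: dict[str, float] = defaultdict(float)
    for seg in transcript.get('segments', []):
        start, end = seg.get('start'), seg.get('end')
        if start is None or end is None:
            continue  # skip segments without both timestamps
        totals[seg.get('speaker', 'Unknown')] += max(0.0, end - start)
    return dict(totals)


transcript = {
    'segments': [
        {'speaker': 'SPEAKER_00', 'text': '...', 'start': 0.0, 'end': 2.5},
        {'speaker': 'SPEAKER_01', 'text': '...', 'start': 2.5, 'end': 5.0},
        {'speaker': 'SPEAKER_00', 'text': '...', 'start': 5.0, 'end': 6.0},
    ]
}
print(speaking_time(transcript))
```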

**Model selection:**

| Model | Speed | Accuracy | Use case |
|-------|-------|----------|----------|
| `tiny.en` | Fast | Lower | Testing, English only |
| `base.en` | Medium | Good | English production |
| `large-v3` | Slow | Best | Multi-language, high accuracy |

**Requirements:**
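- Set the `HF_TOKEN` environment variable to a Hugging Face access token for the pyannote diarization models; these models are gated, so you must also accept their user conditions on Hugging Face
- A GPU is recommended for larger models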
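A missing token typically surfaces only when the diarization step actually runs, so a quick check before calling `add_computed_column` can save a failed run. A minimal sketch, assuming the token is read from the `HF_TOKEN` environment variable as stated above; the helper name is hypothetical, and it only checks that a value is present, not that it is valid.

```python
import os


def diarization_token_available() -> bool:
    """Return True if HF_TOKEN is set to a non-blank value (hypothetical helper).

    This does not validate the token or check access to the gated models.
    """
    return bool(os.environ.get('HF_TOKEN', '').strip())


if not diarization_token_available():
    print('HF_TOKEN is not set; diarize=True will likely fail to load the pyannote models.')
```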


## See also

- [Transcribe audio files](https://docs.pixeltable.com/howto/cookbooks/audio/audio-transcribe) - Basic transcription without diarization
- [Summarize podcasts](https://docs.pixeltable.com/howto/cookbooks/audio/audio-summarize-podcast) - Transcribe and summarize
