Overview
Speaker diarization models answer the question "who spoke when?" by segmenting and clustering an audio recording by speaker identity. The pyannote suite includes end-to-end diarization pipelines (3.x generation) as well as individual segmentation and overlapped-speech detection components that underpin them.
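As a rough illustration of what a "who spoke when?" answer looks like as data, the sketch below represents a diarization result as a list of speaker turns and looks up the active speakers at a given time. The `Turn` type, the `speakers_at` helper, and the `SPEAKER_00`-style labels are hypothetical names chosen for this example; they are not taken from the pyannote API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """One speaker turn: a half-open time interval [start, end) in seconds."""
    start: float
    end: float
    speaker: str


def speakers_at(turns, t):
    """Return the sorted labels of every speaker active at time t.

    More than one label is returned when turns overlap (overlapped speech).
    """
    return sorted({turn.speaker for turn in turns if turn.start <= t < turn.end})


# Hypothetical diarization result; the labels and times are made up.
turns = [
    Turn(0.0, 4.2, "SPEAKER_00"),
    Turn(3.8, 9.5, "SPEAKER_01"),   # starts before SPEAKER_00 finishes
    Turn(9.5, 12.0, "SPEAKER_00"),
]

print(speakers_at(turns, 1.0))
print(speakers_at(turns, 4.0))
```

Returning a list rather than a single label keeps overlapped speech visible to downstream consumers instead of forcing one speaker per instant.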
Agent Scenarios
- Meeting notes agent: produce transcripts annotated with speaker turns ("Alice: ...", "Bob: ...") for post-meeting summaries and action item extraction
- Call center analytics agent: separate agent voice from customer voice to compute per-speaker metrics (talk ratio, interruption rate, sentiment)
- Podcast / media production agent: auto-generate speaker-labeled chapters or subtitles for multi-speaker recordings
- Legal / compliance agent: create speaker-attributed transcripts of depositions or earnings calls for searchable archiving
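The per-speaker metrics mentioned for the call center scenario could be derived from diarization output along the following lines. This is a minimal sketch under stated assumptions: turns are plain `(start, end, speaker)` tuples in seconds, talk ratio is each speaker's share of total speech time (overlapped speech counts toward every speaker involved), and an interruption is counted when a speaker starts a turn while a different speaker's turn is still in progress. The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def talk_ratio(turns):
    """Return each speaker's share of the total speech time (values sum to 1)."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {speaker: duration / grand_total for speaker, duration in totals.items()}


def count_interruptions(turns):
    """Count, per speaker, the turns that begin inside another speaker's turn."""
    counts = defaultdict(int)
    ordered = sorted(turns, key=lambda turn: turn[0])
    for i, (start, _end, speaker) in enumerate(ordered):
        for prev_start, prev_end, prev_speaker in ordered[:i]:
            # Assumed definition: strictly inside a different speaker's turn.
            if prev_speaker != speaker and prev_start < start < prev_end:
                counts[speaker] += 1
                break  # count each interrupting turn at most once
    return dict(counts)


# Hypothetical two-party call; times are in seconds.
call = [
    (0.0, 4.0, "agent"),
    (3.5, 9.5, "customer"),   # begins while the agent is still talking
    (10.0, 12.0, "agent"),
]

print(talk_ratio(call))
print(count_interruptions(call))
```

Other definitions of interruption are possible, for example requiring a minimum overlap duration; the threshold-free version here is only the simplest one to state.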
ModelKit Integration
Models must pass the full wmk pipeline on all execution providers (EPs):
wmk config → wmk build (ONNX export) → wmk perf → wmk eval
Acceptance Criteria