
[Task] speaker-diarization model support #287

@DingmaomaoBJTU

Overview

Speaker diarization models answer the question "who spoke when?" by segmenting and clustering an audio recording by speaker identity. The pyannote suite includes end-to-end diarization pipelines (3.x generation) as well as individual segmentation and overlapped-speech detection components that underpin them.
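As a rough illustration of what a "who spoke when" result looks like, the sketch below models diarization output as a list of speaker turns and looks up who is active at a given time. The `Turn` type, the `speakers_at` helper, and the `SPEAKER_00`-style labels are hypothetical names chosen for this example; they are not taken from pyannote or wmk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """One contiguous stretch of speech attributed to a single speaker (hypothetical type)."""
    start: float  # seconds from the beginning of the recording
    end: float    # seconds, exclusive
    speaker: str  # anonymous label; diarization does not know real names


def speakers_at(turns, t):
    """Return the sorted labels of all speakers active at time t.

    More than one label means overlapped speech at that instant.
    """
    return sorted({turn.speaker for turn in turns if turn.start <= t < turn.end})


# Illustrative data only: two speakers with a short overlap around 4 s.
turns = [
    Turn(0.0, 4.2, "SPEAKER_00"),
    Turn(3.8, 9.5, "SPEAKER_01"),
    Turn(9.5, 12.0, "SPEAKER_00"),
]
```

Here `speakers_at(turns, 4.0)` returns both labels, which is the situation the overlapped-speech detection component is meant to flag.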

Agent Scenarios

  • Meeting notes agent: produce transcripts annotated with speaker turns ("Alice: ...", "Bob: ...") for post-meeting summaries and action item extraction
  • Call center analytics agent: separate agent voice from customer voice to compute per-speaker metrics (talk ratio, interruption rate, sentiment)
  • Podcast / media production agent: auto-generate speaker-labeled chapters or subtitles for multi-speaker recordings
  • Legal / compliance agent: create speaker-attributed transcripts of depositions or earnings calls for searchable archiving
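Two of the scenarios above (per-speaker talk ratio and speaker-labeled transcripts) can be sketched as simple post-processing over diarization turns. This is a minimal sketch that assumes turns arrive as `(start, end, speaker)` tuples and that a separate speech recognizer supplies word timestamps as `(start, end, text)` tuples; the function names `talk_ratio` and `attribute_words` are hypothetical and not part of any library mentioned in this issue.

```python
from collections import defaultdict


def talk_ratio(turns):
    """Return each speaker's share of total speaking time.

    turns: iterable of (start, end, speaker) tuples, times in seconds.
    """
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {speaker: duration / grand_total for speaker, duration in totals.items()}


def attribute_words(words, turns):
    """Assign each word to the turn it overlaps most, then merge
    consecutive words from the same speaker into one transcript line.

    words: iterable of (start, end, text) tuples from an ASR system (assumed input).
    turns: non-empty list of (start, end, speaker) tuples.
    """
    lines = []
    for w_start, w_end, text in words:
        # Overlap length is negative when the word lies outside the turn.
        best = max(turns, key=lambda t: min(w_end, t[1]) - max(w_start, t[0]))
        speaker = best[2]
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)
        else:
            lines.append((speaker, [text]))
    return [f"{speaker}: {' '.join(tokens)}" for speaker, tokens in lines]


# Illustrative data only.
turns = [(0.0, 3.0, "SPEAKER_00"), (3.0, 5.0, "SPEAKER_01")]
words = [(0.2, 0.6, "Hello"), (0.7, 1.1, "everyone"), (3.1, 3.5, "Hi")]
```

Mapping anonymous labels such as `SPEAKER_00` to names like "Alice" is a separate step that diarization alone does not provide.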

ModelKit Integration

Each model must pass the full wmk pipeline on all execution providers (EPs):

wmk config → wmk build (ONNX export) → wmk perf → wmk eval

Acceptance Criteria

  • pyannote/speaker-diarization-3.1
  • pyannote/speaker-diarization-3.0
  • pyannote/speaker-diarization-community-1
  • pyannote/speaker-diarization
  • pyannote/segmentation-3.0
  • pyannote/segmentation
  • pyannote/overlapped-speech-detection
