feat: Switch from Sortformer to PyAnnote offline diarization by r3dbars · Pull Request #55 · r3dbars/transcripted

r3dbars · 2026-03-20T02:30:06Z

Summary

Removes the 4-speaker architectural limit by switching post-recording diarization from Sortformer to PyAnnote Community-1 (segmentation + WeSpeaker + VBx clustering) via FluidAudio's OfflineDiarizerManager
Keeps Sortformer streaming available via diarizeStreaming() for a future real-time preview, making this a hybrid dual-pipeline approach
Fixes duplicate speaker names in transcript breakdown when PyAnnote over-segments one person into multiple clusters (consolidates stats after naming)
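
To illustrate the consolidation fix, here is a minimal sketch of merging per-speaker stats that end up with the same display name after naming. The `SpeakerStat` type, its fields, and the `consolidate` function are hypothetical names chosen for this example; the PR does not show the actual breakdown types.

```swift
// Hypothetical per-speaker stats row; not the project's actual type.
struct SpeakerStat: Equatable {
    var name: String
    var wordCount: Int
    var speakingSeconds: Double
}

/// Merges rows that share a display name, preserving first-seen order.
/// Intended to run after naming, when two clusters may resolve to one person.
func consolidate(_ stats: [SpeakerStat]) -> [SpeakerStat] {
    var merged: [SpeakerStat] = []
    var indexByName: [String: Int] = [:]
    for stat in stats {
        if let i = indexByName[stat.name] {
            // Same name seen before: fold this cluster's totals into that row.
            merged[i].wordCount += stat.wordCount
            merged[i].speakingSeconds += stat.speakingSeconds
        } else {
            indexByName[stat.name] = merged.count
            merged.append(stat)
        }
    }
    return merged
}
```

Running the merge after naming rather than before matters here: two clusters only become known duplicates once both have been assigned the same name.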

Changes

DiarizationService.swift replaces SortformerService.swift and provides diarizeOffline() (PyAnnote) and diarizeStreaming() (Sortformer)
EmbeddingClusterer.postProcess() gains a skipPairwiseMerge param (VBx already handles merging)
speakerIdFromString() handles PyAnnote's "S0"/"S1" speaker ID format
Updated engine strings, log messages, onboarding UI, README, and tests
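
As a rough sketch of the speaker ID handling, the function below accepts PyAnnote-style labels such as "S0" and "S1". The name `parseSpeakerId`, the `Int?` return type, and the acceptance of bare integers like "2" are assumptions made for this example; the project's speakerIdFromString() may have a different signature and accept other formats.

```swift
/// Hypothetical parser for diarizer speaker labels.
/// Accepts "S<digits>" (PyAnnote style) and, as an assumption, bare digits.
/// Returns nil for anything else.
func parseSpeakerId(_ raw: String) -> Int? {
    // Strip a single leading "S" if present.
    let digits = raw.hasPrefix("S") ? String(raw.dropFirst()) : raw
    // Require one or more ASCII digits so inputs like "S-1" or "S+2" are rejected.
    guard !digits.isEmpty,
          digits.allSatisfy({ $0.isASCII && $0.isNumber }),
          let id = Int(digits) else {
        return nil
    }
    return id
}
```

Returning an optional instead of a default index keeps an unrecognized label from being silently attributed to speaker 0.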

What's unchanged

SpeakerSegment struct, SpeakerDatabase, and all embedding/matching logic: both pipelines produce the same 256-dim WeSpeaker embeddings
Familiar Voices, Qwen naming, transcript format
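
The reason the matching stack can stay untouched is that it only ever sees a fixed-size embedding vector, regardless of which pipeline produced it. The sketch below shows that idea with cosine similarity, which is an assumption here: the PR does not say which metric SpeakerDatabase uses, and `cosineSimilarity`, `bestMatch`, and the 0.7 threshold are hypothetical.

```swift
/// Cosine similarity between two equal-length vectors; 0 if either has zero norm.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embeddings must have the same dimension")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denom = (normA * normB).squareRoot()
    return denom > 0 ? dot / denom : 0
}

/// Returns the known speaker whose embedding is most similar to `query`,
/// or nil if no candidate reaches `threshold` (hypothetical default).
func bestMatch(for query: [Float],
               in known: [String: [Float]],
               threshold: Float = 0.7) -> String? {
    var best: (name: String, score: Float)? = nil
    for (name, embedding) in known {
        let score = cosineSimilarity(query, embedding)
        if score >= threshold, score > (best?.score ?? -Float.infinity) {
            best = (name, score)
        }
    }
    return best?.name
}
```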

Test plan

Closes #40

🤖 Generated with Claude Code

…actions

- Single-click recording from collapsed idle pill with subtle mic icon hint
- Remove auto-collapse during recording; timer and stop button always visible
- Compact dialogue format for transcript detail (colored left borders, full-width 12pt text)
- Actionable success state with Copy/Open buttons, 8s display after speaker naming
- Speaker naming escape hatch (X button + Escape key after 3s guard)
- Simplified transcript tray footer (transcript count replaces Connect Agent)
- Detail footer overflow menu with Agent + Open in Finder
- Right-click context menu on pill (Record, Transcripts, Settings, Quit)
- Better badge visibility (18px failed badge, 12px processing dot, red border tint, glow pulse)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove the 4-speaker architectural limit by switching the post-recording diarization pipeline from Sortformer (T×4 output matrix) to PyAnnote Community-1 (segmentation + WeSpeaker + VBx clustering) via FluidAudio's OfflineDiarizerManager. Both pipelines produce identical 256-dim WeSpeaker embeddings, so the entire speaker identification stack is unchanged.

- Add DiarizationService (replaces SortformerService) with dual-pipeline support: diarizeOffline() for PyAnnote, diarizeStreaming() for Sortformer
- Add skipPairwiseMerge parameter to EmbeddingClusterer.postProcess() since PyAnnote's VBx already handles speaker merging
- Handle PyAnnote's "S0"/"S1" speaker ID format in speakerIdFromString()
- Consolidate duplicate speaker names in transcript breakdown when PyAnnote over-segments one person into multiple clusters
- Update all references, log messages, engine strings, and README

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
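
The skipPairwiseMerge parameter can be pictured as a flag that bypasses one stage of post-processing. The sketch below is only an illustration of that control flow: the `Cluster` type, the greedy merge, the 0.8 threshold, and the assumption that centroids are unit-normalized (so a dot product equals cosine similarity) are all hypothetical and are not taken from EmbeddingClusterer.

```swift
// Hypothetical cluster: the segments assigned to it plus a unit-length centroid.
struct Cluster {
    var segmentIndices: [Int]
    var centroid: [Float]
}

/// Dot product; equals cosine similarity when both inputs are unit-normalized.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
}

/// Greedy stand-in for a pairwise merge pass. When `skipPairwiseMerge` is true
/// (e.g. the upstream clustering already merged speakers), clusters pass through.
func postProcess(_ clusters: [Cluster],
                 skipPairwiseMerge: Bool,
                 mergeThreshold: Float = 0.8) -> [Cluster] {
    guard !skipPairwiseMerge else { return clusters }
    var merged: [Cluster] = []
    for cluster in clusters {
        if let i = merged.firstIndex(where: {
            dot($0.centroid, cluster.centroid) >= mergeThreshold
        }) {
            // Similar enough to an existing cluster: absorb its segments.
            merged[i].segmentIndices += cluster.segmentIndices
        } else {
            merged.append(cluster)
        }
    }
    return merged
}
```

Skipping the pass for the offline pipeline avoids merging a second time on top of VBx, which could collapse speakers that VBx deliberately kept apart.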

r3dbars and others added 2 commits March 19, 2026 20:51

r3dbars merged commit 1abadcd into main Mar 20, 2026

r3dbars deleted the feat/pyannote-offline-diarization branch April 3, 2026 00:30

r3dbars merged 2 commits into main from feat/pyannote-offline-diarization
