feat: Switch from Sortformer to PyAnnote offline diarization #55

Merged: r3dbars merged 2 commits into main from feat/pyannote-offline-diarization on Mar 20, 2026

r3dbars (Owner) commented Mar 20, 2026

Summary

  • Removes the 4-speaker architectural limit by switching post-recording diarization from Sortformer to PyAnnote Community-1 (segmentation + WeSpeaker + VBx clustering) via FluidAudio's OfflineDiarizerManager
  • Keeps Sortformer streaming for future real-time preview via diarizeStreaming() — hybrid dual-pipeline approach
  • Fixes duplicate speaker names in transcript breakdown when PyAnnote over-segments one person into multiple clusters (consolidates stats after naming)
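
The consolidation step in the last bullet could look roughly like the sketch below. It assumes a per-speaker stats record and a merge keyed on the final display name. `SpeakerStat` and `consolidateByName` are hypothetical names for illustration and are not taken from this PR.

```swift
// Hypothetical per-speaker stats; the real types in the app may differ.
struct SpeakerStat: Equatable {
    var name: String
    var speakingSeconds: Double
    var segmentCount: Int
}

/// Merges entries that share a display name, keeping first-appearance order.
/// Intended to run after naming, so two clusters that were both named
/// "Alice" collapse into one row of the breakdown.
func consolidateByName(_ stats: [SpeakerStat]) -> [SpeakerStat] {
    var merged: [SpeakerStat] = []
    var indexByName: [String: Int] = [:]
    for stat in stats {
        if let i = indexByName[stat.name] {
            // Name already seen: fold this cluster's totals into the existing row.
            merged[i].speakingSeconds += stat.speakingSeconds
            merged[i].segmentCount += stat.segmentCount
        } else {
            indexByName[stat.name] = merged.count
            merged.append(stat)
        }
    }
    return merged
}

// Two clusters ended up with the same name after naming.
let named = [
    SpeakerStat(name: "Alice", speakingSeconds: 40, segmentCount: 5),
    SpeakerStat(name: "Bob", speakingSeconds: 30, segmentCount: 4),
    SpeakerStat(name: "Alice", speakingSeconds: 10, segmentCount: 2),
]
let breakdown = consolidateByName(named)
```

Merging after naming rather than before means clusters are only combined when they resolve to the same name, so the diarizer's cluster boundaries are otherwise left alone.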

Changes

  • SortformerService.swift → DiarizationService.swift, with diarizeOffline() + diarizeStreaming()
  • EmbeddingClusterer.postProcess() gains skipPairwiseMerge param (VBx already handles merging)
  • Handles PyAnnote's "S0"/"S1" speaker ID format
  • Updated engine strings, log messages, onboarding UI, README, and tests
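
As an illustration of the "S0"/"S1" handling, here is a minimal parser sketch. It assumes the label is an "S" prefix followed by a non-negative integer and that plain numeric strings should still be accepted. `parseSpeakerIndex` is a hypothetical stand-in and is not the PR's actual `speakerIdFromString()`.

```swift
/// Hypothetical parser for diarizer speaker labels.
/// Accepts "S0"/"S1"-style labels and, as an assumption, bare integers.
func parseSpeakerIndex(_ raw: String) -> Int? {
    // Strip the "S" prefix when present, otherwise parse the string as-is.
    let digits = raw.hasPrefix("S") ? raw.dropFirst() : Substring(raw)
    guard let index = Int(digits), index >= 0 else { return nil }
    return index
}

let first = parseSpeakerIndex("S0")
let twelfth = parseSpeakerIndex("S12")
let bare = parseSpeakerIndex("3")
let invalid = parseSpeakerIndex("speaker")
```

Returning `nil` for unrecognised labels lets the caller decide how to handle an unexpected format instead of silently mapping it to speaker 0.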

What's unchanged

  • SpeakerSegment struct, SpeakerDatabase, all embedding/matching logic — same 256-dim WeSpeaker embeddings from both pipelines
  • Familiar Voices, Qwen naming, transcript format
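
Because both pipelines emit embeddings of the same dimensionality, a matching step along the following lines would not need to change. This is only a sketch: the PR does not state the similarity metric, so cosine similarity is an assumption here, and `cosineSimilarity`, `bestMatch`, and the threshold value are hypothetical.

```swift
/// Cosine similarity of two equal-length vectors.
/// Returns nil when the lengths differ, the input is empty, or a norm is zero.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float? {
    guard a.count == b.count, !a.isEmpty else { return nil }
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return nil }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

/// Returns the name of the most similar stored profile at or above `threshold`.
func bestMatch(
    for embedding: [Float],
    in profiles: [(name: String, embedding: [Float])],
    threshold: Float
) -> String? {
    var best: (name: String, score: Float)? = nil
    for profile in profiles {
        guard let score = cosineSimilarity(embedding, profile.embedding) else { continue }
        if score >= threshold, score > (best?.score ?? -Float.infinity) {
            best = (name: profile.name, score: score)
        }
    }
    return best?.name
}

// Toy 3-dim vectors stand in for the 256-dim embeddings.
let profiles: [(name: String, embedding: [Float])] = [
    (name: "Alice", embedding: [1, 0, 0]),
    (name: "Bob", embedding: [0, 1, 0]),
]
let match = bestMatch(for: [0.9, 0.1, 0], in: profiles, threshold: 0.8)
```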

Test plan

  • Build succeeds with no errors
  • App launches, both Sortformer streaming + PyAnnote offline models load
  • 2-min recording with 4+ speakers: PyAnnote detects speakers beyond Sortformer's 4-speaker cap
  • 5.5-min recording: 3 speakers detected, 67 segments, processed in 4.9s (68x realtime)
  • Speaker DB matching works (existing profiles matched at 0.854 similarity)
  • Qwen naming infers correct names from PyAnnote output
  • Duplicate speaker consolidation works when PyAnnote over-segments
  • YAML frontmatter shows diarization_engine: pyannote_offline
  • Footer shows Parakeet + PyAnnote (local)
  • Longer recording (15+ min) to validate threshold tuning

Closes #40

🤖 Generated with Claude Code

r3dbars and others added 2 commits March 19, 2026 20:51
…actions

- Single-click recording from collapsed idle pill with subtle mic icon hint
- Remove auto-collapse during recording — timer and stop button always visible
- Compact dialogue format for transcript detail (colored left borders, full-width 12pt text)
- Actionable success state with Copy/Open buttons, 8s display after speaker naming
- Speaker naming escape hatch (X button + Escape key after 3s guard)
- Simplified transcript tray footer (transcript count replaces Connect Agent)
- Detail footer overflow menu with Agent + Open in Finder
- Right-click context menu on pill (Record, Transcripts, Settings, Quit)
- Better badge visibility (18px failed badge, 12px processing dot, red border tint, glow pulse)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove the 4-speaker architectural limit by switching the post-recording
diarization pipeline from Sortformer (T×4 output matrix) to PyAnnote
Community-1 (segmentation + WeSpeaker + VBx clustering) via FluidAudio's
OfflineDiarizerManager. Both pipelines produce identical 256-dim WeSpeaker
embeddings, so the entire speaker identification stack is unchanged.

- Add DiarizationService (replaces SortformerService) with dual-pipeline
  support: diarizeOffline() for PyAnnote, diarizeStreaming() for Sortformer
- Add skipPairwiseMerge parameter to EmbeddingClusterer.postProcess() since
  PyAnnote's VBx already handles speaker merging
- Handle PyAnnote's "S0"/"S1" speaker ID format in speakerIdFromString()
- Consolidate duplicate speaker names in transcript breakdown when PyAnnote
  over-segments one person into multiple clusters
- Update all references, log messages, engine strings, and README
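
To make the `skipPairwiseMerge` idea concrete, the sketch below shows a post-processing function that returns the clustering untouched when the flag is set. It assumes a greedy single-pass merge driven by a caller-supplied similarity function. The signature is hypothetical and much simpler than whatever `EmbeddingClusterer.postProcess()` actually takes.

```swift
/// Hypothetical post-processing step. Returns a map from each cluster ID to
/// the ID it should be reported as. Cluster IDs are assumed to be unique.
func postProcess(
    clusterIDs: [Int],
    similarity: (Int, Int) -> Float,
    mergeThreshold: Float,
    skipPairwiseMerge: Bool = false
) -> [Int: Int] {
    // Start with every cluster mapped to itself.
    var mapping = Dictionary(uniqueKeysWithValues: clusterIDs.map { ($0, $0) })
    // When the upstream clusterer has already merged speakers, leave it alone.
    guard !skipPairwiseMerge else { return mapping }

    for (i, a) in clusterIDs.enumerated() {
        // Only consider later clusters that have not been merged yet.
        for b in clusterIDs[(i + 1)...]
        where mapping[b] == b && similarity(a, b) >= mergeThreshold {
            // Fold b into whatever a currently maps to.
            mapping[b] = mapping[a]
        }
    }
    return mapping
}

// Toy similarity: only clusters 0 and 1 look alike.
let sim: (Int, Int) -> Float = { a, b in (a == 0 && b == 1) ? 0.9 : 0.1 }
let merged = postProcess(clusterIDs: [0, 1, 2], similarity: sim, mergeThreshold: 0.8)
let untouched = postProcess(
    clusterIDs: [0, 1, 2], similarity: sim, mergeThreshold: 0.8, skipPairwiseMerge: true
)
```

Defaulting the flag to `false` keeps existing call sites on the merging path, so only the offline path has to opt out.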

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
r3dbars merged commit 1abadcd into main on Mar 20, 2026
r3dbars deleted the feat/pyannote-offline-diarization branch on April 3, 2026 00:30

Development

Successfully merging this pull request may close these issues.

Switch from Sortformer to PyAnnote diarization (remove 4-speaker limit)
