feat: Switch from Sortformer to PyAnnote offline diarization #55
Merged
…actions

- Single-click recording from collapsed idle pill with subtle mic icon hint
- Remove auto-collapse during recording; timer and stop button always visible
- Compact dialogue format for transcript detail (colored left borders, full-width 12pt text)
- Actionable success state with Copy/Open buttons, 8s display after speaker naming
- Speaker naming escape hatch (X button + Escape key after 3s guard)
- Simplified transcript tray footer (transcript count replaces Connect Agent)
- Detail footer overflow menu with Agent + Open in Finder
- Right-click context menu on pill (Record, Transcripts, Settings, Quit)
- Better badge visibility (18px failed badge, 12px processing dot, red border tint, glow pulse)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the 4-speaker architectural limit by switching the post-recording diarization pipeline from Sortformer (T×4 output matrix) to PyAnnote Community-1 (segmentation + WeSpeaker + VBx clustering) via FluidAudio's OfflineDiarizerManager. Both pipelines produce identical 256-dim WeSpeaker embeddings, so the entire speaker identification stack is unchanged.

- Add DiarizationService (replaces SortformerService) with dual-pipeline support: diarizeOffline() for PyAnnote, diarizeStreaming() for Sortformer
- Add skipPairwiseMerge parameter to EmbeddingClusterer.postProcess() since PyAnnote's VBx already handles speaker merging
- Handle PyAnnote's "S0"/"S1" speaker ID format in speakerIdFromString()
- Consolidate duplicate speaker names in transcript breakdown when PyAnnote over-segments one person into multiple clusters
- Update all references, log messages, engine strings, and README

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
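The duplicate-name consolidation mentioned in this commit could look roughly like the sketch below. It is an illustration only: `NamedSegment` and `talkTimeByName` are hypothetical names, not the PR's actual `SpeakerSegment` type or breakdown code. It assumes each segment already carries the display name assigned to its cluster, so two clusters named after the same person fold into one entry.

```swift
import Foundation

// Hypothetical type for illustration; not the PR's actual SpeakerSegment.
struct NamedSegment {
    let speakerName: String
    let start: Double   // seconds
    let end: Double     // seconds
}

// Sum talk time per display name, so that two clusters which were both
// named "Alice" collapse into a single breakdown entry.
func talkTimeByName(_ segments: [NamedSegment]) -> [(name: String, seconds: Double)] {
    var totals: [String: Double] = [:]
    for segment in segments {
        // Guard against malformed segments whose end precedes their start.
        totals[segment.speakerName, default: 0] += max(0, segment.end - segment.start)
    }
    // Longest talk time first; ties are broken by name for deterministic output.
    return totals
        .map { (name: $0.key, seconds: $0.value) }
        .sorted { $0.seconds != $1.seconds ? $0.seconds > $1.seconds : $0.name < $1.name }
}
```

Grouping by name rather than by cluster ID keeps the clustering output untouched and only changes how the breakdown is presented.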
Summary
- Switch post-recording diarization to PyAnnote via FluidAudio's OfflineDiarizerManager
- Keep Sortformer for streaming via diarizeStreaming(): hybrid dual-pipeline approach

Changes

- SortformerService.swift → DiarizationService.swift with diarizeOffline() + diarizeStreaming()
- EmbeddingClusterer.postProcess() gains skipPairwiseMerge param (VBx already handles merging)
- Handle PyAnnote's "S0"/"S1" speaker ID format

What's unchanged

- SpeakerSegment struct, SpeakerDatabase, all embedding/matching logic: same 256-dim WeSpeaker embeddings from both pipelines

Test plan

- … diarization_engine: pyannote_offline
- … Parakeet + PyAnnote (local)

Closes #40
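As a rough illustration of the "S0"/"S1" speaker ID handling, a parser might look like the sketch below. The function name `speakerIdFromString` comes from the PR, but the signature, the `Int?` return type, and the fallback for bare integer strings are assumptions; the merged implementation may differ.

```swift
import Foundation

// Hypothetical sketch; the real speakerIdFromString() in the PR may differ.
// Accepts PyAnnote-style labels such as "S0" or "S1", plus a bare integer
// string (an assumed fallback), and returns the numeric speaker index.
func speakerIdFromString(_ raw: String) -> Int? {
    let trimmed = raw.trimmingCharacters(in: .whitespaces)
    // Strip a single leading "S" (the PyAnnote prefix) if present.
    let digits = trimmed.hasPrefix("S") ? String(trimmed.dropFirst()) : trimmed
    // Int(_:) returns nil for empty or non-numeric input.
    guard let id = Int(digits), id >= 0 else { return nil }
    return id
}
```

Returning an optional lets callers decide how to treat an unrecognized label instead of silently mapping it to speaker 0.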
🤖 Generated with Claude Code