feat(asr): add Apple Speech provider (macOS 26+) by erning · Pull Request #40 · missuo/koe

erning · 2026-03-31T04:19:59Z

Summary

Add Apple Speech as a new on-device ASR provider using Apple's SpeechAnalyzer and SpeechTranscriber APIs (macOS 26+). Zero-config, zero-download speech recognition with system-managed language assets.

- New KoeAppleSpeech Swift package bridges Apple's Speech framework to Rust via C FFI
- Audio flows as PCM16 → AsyncStream<AnalyzerInput> → SpeechAnalyzer → progressive transcription
- Results accumulated using Apple's official result.isFinal model: finalized segments build a stable prefix, volatile segments show in-progress recognition, with no string-overlap heuristics
- Dictionary entries passed as contextualStrings for vocabulary bias
- Speech assets managed by macOS via AssetInventory (download/release/status through FFI)
- Setup Wizard: language picker, asset status indicator, download/release buttons
- Speech Recognition permission: requested at startup, checked defensively at session start, shown in menu bar
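
The finalized/volatile accumulation described above lives in the Swift `AppleSpeechManager`, whose code is not shown here. As a rough illustration of the model only, it can be sketched in Rust as follows. The type and method names are hypothetical, and the sketch assumes each result carries its text plus an `is_final` flag.

```rust
/// Hypothetical accumulator illustrating the finalized + volatile model.
/// Finalized segments are appended to a stable prefix; the latest
/// volatile (non-final) segment replaces the previous volatile one.
#[derive(Debug, Default)]
pub struct TranscriptAccumulator {
    finalized: String,
    volatile: String,
}

impl TranscriptAccumulator {
    pub fn new() -> Self {
        Self::default()
    }

    /// Apply one recognizer result and return the text to display.
    pub fn push(&mut self, text: &str, is_final: bool) -> String {
        if is_final {
            // A final segment extends the stable prefix and supersedes
            // whatever in-progress text was shown for it.
            self.finalized.push_str(text);
            self.volatile.clear();
        } else {
            // A volatile segment overwrites the previous guess.
            self.volatile.clear();
            self.volatile.push_str(text);
        }
        self.current()
    }

    /// Stable prefix followed by the in-progress tail.
    pub fn current(&self) -> String {
        format!("{}{}", self.finalized, self.volatile)
    }
}
```

Because the stable prefix only ever grows and the volatile tail is replaced wholesale, no comparison between successive strings is needed.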

Implementation

| Layer | Detail |
| --- | --- |
| Swift (`KoeAppleSpeech`) | `AppleSpeechManager`: session lifecycle, audio bridging, `finalizedTranscript + volatileTranscript` accumulation; `CBridge`: `@_cdecl` FFI for session control and asset management |
| Rust (`koe-asr`) | `AppleSpeechProvider`: FFI wrapper, PCM routing, tokio mpsc event channel |
| Rust (`koe-core`) | Provider creation from `asr.provider = "apple-speech"` config, locale and dictionary wiring |
| Obj-C (`KoeApp`) | Setup Wizard UI (locale popup, asset status, download/release), permission flow, status bar permission item |

Runtime requirements

- Minimum deployment target: macOS 14.0 (unchanged)
- Apple Speech requires: macOS 26.0+. All code paths are gated with @available(macOS 26.0, *), so the provider is invisible on older systems
- New permission: NSSpeechRecognitionUsageDescription in Info.plist (only needed for Apple Speech)
- Feature flag: apple-speech (enabled by default, excludable with --no-default-features)
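
For readers unfamiliar with Cargo feature gating, here is a minimal sketch of how a build-time check on the `apple-speech` feature could look on the Rust side. The function names and the `"other-provider"` placeholder are hypothetical and do not come from the PR; only the feature name does.

```rust
/// True only when the crate was built with the `apple-speech` feature.
pub fn apple_speech_compiled_in() -> bool {
    cfg!(feature = "apple-speech")
}

/// Provider names available in this build.
/// "other-provider" is a placeholder, not a real koe provider name.
pub fn compiled_providers() -> Vec<&'static str> {
    let mut providers = vec!["other-provider"];
    if apple_speech_compiled_in() {
        providers.push("apple-speech");
    }
    providers
}
```

Building with `--no-default-features` would make `apple_speech_compiled_in()` return `false`, so the provider never appears in the list.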

App bundle changes

- Size increase: ~100–200 KB (compiled Swift, no embedded models)
- No bundled models: speech assets are system-managed, downloaded on-demand
- New framework: Speech.framework (system framework, always present)
- Zero new third-party dependencies

Test plan

Add KoeAppleSpeech Swift package with SpeechAnalyzer + SpeechTranscriber (macOS 26+) for zero-config on-device speech recognition. Audio is fed as Int16 PCM via an AsyncStream bridge. Dictionary entries are passed as contextual strings. Asset status is checked, and assets are auto-installed, before session start. Rust side: AppleSpeechProvider implements the AsrProvider trait with an FFI bridge following the same pattern as KoeMLX, feature-gated behind the apple-speech flag in koe-asr. Swift FFI includes session management (start/feed/stop/cancel) and asset management (is_available/asset_status/install_asset/release_asset/supported_locales) for Setup Wizard integration.

Add apple-speech feature flag, AppleSpeechAsrConfig (locale, default zh-Hans), provider dispatch match arm, and DEFAULT_CONFIG_YAML section. Dictionary entries passed as contextual strings for vocabulary bias.
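
A rough sketch of what this wiring might look like, assuming a config struct with a single `locale` field. Only the names `AppleSpeechAsrConfig`, `apple-speech`, and the `zh-Hans` default come from the commit message; `SelectedProvider` and `select_provider` are hypothetical stand-ins for the real dispatch code, which is not shown in this PR text.

```rust
/// Config section for the Apple Speech provider.
/// The field layout is an assumption; the PR text only states that the
/// struct has a locale that defaults to "zh-Hans".
#[derive(Debug, Clone, PartialEq)]
pub struct AppleSpeechAsrConfig {
    pub locale: String,
}

impl Default for AppleSpeechAsrConfig {
    fn default() -> Self {
        Self { locale: "zh-Hans".to_string() }
    }
}

/// Hypothetical result of provider dispatch.
#[derive(Debug, PartialEq)]
pub enum SelectedProvider {
    AppleSpeech {
        locale: String,
        /// Dictionary entries forwarded as contextual strings.
        contextual_strings: Vec<String>,
    },
    Unsupported(String),
}

/// Hypothetical dispatch on the `asr.provider` config value.
pub fn select_provider(
    name: &str,
    cfg: &AppleSpeechAsrConfig,
    dictionary: &[String],
) -> SelectedProvider {
    match name {
        "apple-speech" => SelectedProvider::AppleSpeech {
            locale: cfg.locale.clone(),
            contextual_strings: dictionary.to_vec(),
        },
        other => SelectedProvider::Unsupported(other.to_string()),
    }
}
```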

Add KoeAppleSpeech package reference and Speech framework to both Koe and Koe-x86 targets. Add NSSpeechRecognitionUsageDescription to Info.plist. Add speech recognition permission check/request methods to SPPermissionManager.


Add Apple Speech (On-Device) to ASR provider popup with @available guard. Dynamic locale list from SpeechTranscriber.supportedLocales, sorted by localized display name. Asset status display with download button (auto-download on Save if not installed). Locale picker replaces model UI when selected. Status shows "Installed — managed by macOS" with secondary color hint.

DESIGN.md: add section 30.5 (architecture, audio flow, availability, locale handling), update provider lists, feature flags, setup wizard, permissions, and summary. README.md: add to provider list, config example, permissions table, architecture diagram, ASR pipeline, and Local ASR section.

missuo

Review: PR #40 — feat(asr): add Apple Speech provider (macOS 26+)

Overall: Well-structured, follows existing patterns (KoeMLX). A few concerns:

Architecture

- Good: Follows the same FFI pattern as KoeMLX (Swift → C → Rust), generation-based session management, callback locking
- Good: @available(macOS 26.0, *) gating throughout, invisible on older systems
- Good: Asset management FFI for Setup Wizard integration
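
For context, "generation-based session management" usually means tagging each session with a monotonically increasing counter so that callbacks from a stopped or replaced session can be recognized and dropped. The sketch below illustrates that general idea under this assumption; `SessionGate` and its methods are hypothetical and are not taken from the PR's code.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical gate that tracks which session generation is current.
pub struct SessionGate {
    current: AtomicU64,
}

impl SessionGate {
    pub const fn new() -> Self {
        Self { current: AtomicU64::new(0) }
    }

    /// Begin a new session and return its generation number.
    pub fn start(&self) -> u64 {
        self.current.fetch_add(1, Ordering::SeqCst) + 1
    }

    /// Invalidate the active session (for example on stop or cancel).
    pub fn invalidate(&self) {
        self.current.fetch_add(1, Ordering::SeqCst);
    }

    /// A callback should be delivered only if its generation is current.
    pub fn is_current(&self, generation: u64) -> bool {
        self.current.load(Ordering::SeqCst) == generation
    }
}
```

A callback captured by an earlier session would carry its own generation and check `is_current` before emitting an event, so late results from a cancelled session are ignored.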

Concerns

- Semaphore + Task pattern in CBridge.swift: _supportedLocalesImpl, _assetStatusImpl, and _releaseAssetImpl all use DispatchSemaphore.wait() to block the calling thread while spawning a Task. If called from the main thread, this can deadlock when the async work needs main-thread access. Consider documenting that these must be called from a background thread, or use a different synchronization pattern.
- SFSpeechRecognizer.requestAuthorization with semaphore in koeAppleSpeechStartSession: same deadlock risk if called from the main thread.
- Memory safety of event_tx_ptr: The leaked Box<Sender> pattern works but is fragile. If connect() is called twice without close(), the first sender leaks. The reclaim_sender in close() and Drop helps, but connect() should call reclaim_sender() at the top to handle re-connection.
- audioFormat force-unwrap: AVAudioFormat(...)! in AppleSpeechManager will crash if the format is unsupported. This is unlikely for 16 kHz mono Int16, but a guard with an error callback would be safer.
- Missing Definite events: The provider only emits Interim (type 0) and Final (type 2). result.isFinal is used to accumulate finalized segments internally, but no Definite (type 1) events are sent upstream. This means the core's TranscriptAggregator won't receive definite confirmations. Is this intentional?
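
To make the event_tx_ptr suggestion concrete, here is a minimal sketch of reclaiming a leaked sender at the top of connect(). It is not the PR's code: `Provider` and `Event` are hypothetical, and `std::sync::mpsc` stands in for the tokio mpsc channel the real provider uses. It also assumes the FFI side is no longer using the old pointer when it is reclaimed.

```rust
use std::ptr;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical event type; the real event enum is not shown in the PR text.
pub type Event = String;

pub struct Provider {
    /// Raw pointer to a leaked `Box<Sender<Event>>`, of the kind that
    /// would be handed to an FFI callback as its context argument.
    event_tx_ptr: *mut Sender<Event>,
}

impl Provider {
    pub fn new() -> Self {
        Self { event_tx_ptr: ptr::null_mut() }
    }

    pub fn connect(&mut self) -> Receiver<Event> {
        // Reclaim any sender left over from an earlier connect() so a
        // second call without close() does not leak it.
        self.reclaim_sender();
        let (tx, rx) = channel();
        self.event_tx_ptr = Box::into_raw(Box::new(tx));
        rx
    }

    pub fn close(&mut self) {
        self.reclaim_sender();
    }

    fn reclaim_sender(&mut self) {
        if !self.event_tx_ptr.is_null() {
            // SAFETY: the pointer was produced by Box::into_raw in
            // connect() and is nulled right after, so it is freed once.
            unsafe { drop(Box::from_raw(self.event_tx_ptr)) };
            self.event_tx_ptr = ptr::null_mut();
        }
    }
}

impl Drop for Provider {
    fn drop(&mut self) {
        self.reclaim_sender();
    }
}
```

With this shape, calling connect() twice disconnects the first receiver instead of leaking its sender, and close() and Drop remain idempotent.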

Minor

- .build/ gitignore addition is good (SPM build directory)
- apple-speech feature flag name is consistent with sherpa-onnx convention

erning added 4 commits April 1, 2026 12:47

feat(core): wire apple-speech provider into koe-core

0205753

Add apple-speech feature flag, AppleSpeechAsrConfig (locale, default zh-Hans), provider dispatch match arm, and DEFAULT_CONFIG_YAML section. Dictionary entries passed as contextual strings for vocabulary bias.

erning force-pushed the feature/apple-speech branch from bc767fa to 5a53176 · April 1, 2026 04:47

erning force-pushed the feature/apple-speech branch from 6cda2b3 to c94c598 · April 1, 2026 08:44

missuo reviewed Apr 6, 2026


missuo merged commit f9e3a86 into missuo:main Apr 6, 2026

erning deleted the feature/apple-speech branch April 7, 2026 02:06

missuo merged 5 commits into missuo:main from erning:feature/apple-speech
