✨ feat: add Qwen ASR provider support #22
Add AliyunAsrProvider for the Qwen3-ASR-Flash-Realtime API:
- Implement a WebSocket-based realtime ASR client
- Support session management and audio buffer streaming
- Add configuration options for sample rate and language
- Update provider trait and config for multi-provider support
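Audio buffer streaming can be sketched as splitting captured PCM into fixed-duration chunks before each one is sent over the WebSocket. This is a minimal illustration that assumes 16-bit mono PCM. The `chunk_pcm` and `chunk_size_bytes` helpers and the 100 ms chunk size are hypothetical, not taken from the provider code, and the encoding and WebSocket send steps are omitted.

```rust
/// Bytes per sample for 16-bit PCM (assumption: the client streams PCM16 mono).
const BYTES_PER_SAMPLE: usize = 2;

/// Hypothetical helper: bytes covering `chunk_ms` milliseconds of mono PCM16
/// audio at `sample_rate` Hz. Always a whole number of samples.
fn chunk_size_bytes(sample_rate: u32, chunk_ms: u32) -> usize {
    sample_rate as usize * chunk_ms as usize / 1000 * BYTES_PER_SAMPLE
}

/// Split a PCM buffer into chunks of at most `chunk_ms` milliseconds each.
/// The last chunk may be shorter. Each chunk would then be encoded and sent
/// as one audio-append message (not shown here).
fn chunk_pcm(pcm: &[u8], sample_rate: u32, chunk_ms: u32) -> Vec<&[u8]> {
    let size = chunk_size_bytes(sample_rate, chunk_ms).max(BYTES_PER_SAMPLE);
    pcm.chunks(size).collect()
}

fn main() {
    // One second of silence at 16 kHz mono PCM16 is 32,000 bytes.
    let pcm = vec![0u8; 32_000];
    let chunks = chunk_pcm(&pcm, 16_000, 100);
    println!("{} chunks of {} bytes", chunks.len(), chunks[0].len());
}
```

Because the division by 1000 happens before multiplying by the sample width, every full chunk ends on a sample boundary, so a 16-bit sample is never split across two messages.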
- Use aggregator.best_text() for final transcript display
- Handle ASR error events during finalization
- Fix ASR settings UI: API Key alignment and spacing
- Update version to 1.1.0 (build 11)
Add pending_events queue to properly handle multiple events returned from a single server message (e.g., Definite + Final).
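The pending_events idea can be sketched with a `VecDeque`: one server message may decode into several events, which are queued and handed out one per call. This is an illustrative sketch; the `AsrEvent` variants and the `EventSource` type are hypothetical stand-ins, not the crate's real types.

```rust
use std::collections::VecDeque;

/// Hypothetical event type; the real provider defines its own.
#[derive(Debug, Clone, PartialEq)]
enum AsrEvent {
    Interim(String),
    Definite(String),
    Final(String),
}

/// Hypothetical wrapper around a stream of decoded server messages.
struct EventSource {
    /// Each inner Vec stands for the events decoded from ONE server message.
    messages: VecDeque<Vec<AsrEvent>>,
    /// Events already decoded but not yet returned to the caller.
    pending_events: VecDeque<AsrEvent>,
}

impl EventSource {
    fn new(messages: Vec<Vec<AsrEvent>>) -> Self {
        Self { messages: messages.into(), pending_events: VecDeque::new() }
    }

    /// Return the next event. Queued events are drained first, so a message
    /// that yields e.g. Definite + Final never loses its second event.
    fn next_event(&mut self) -> Option<AsrEvent> {
        loop {
            if let Some(ev) = self.pending_events.pop_front() {
                return Some(ev);
            }
            // Queue is empty: decode the next server message, if any.
            let batch = self.messages.pop_front()?;
            self.pending_events.extend(batch);
        }
    }
}

fn main() {
    let mut src = EventSource::new(vec![
        vec![AsrEvent::Interim("hel".into())],
        vec![AsrEvent::Definite("hello".into()), AsrEvent::Final("hello".into())],
    ]);
    while let Some(ev) = src.next_event() {
        println!("{:?}", ev);
    }
}
```

Returning only the first event of each message and discarding the rest is exactly the bug a queue like this prevents.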
Add a "Test Connection" button to the ASR configuration panel for both Doubao and Aliyun providers. Users can now verify their API keys before saving the configuration.
Features:
- WebSocket connection test for Doubao ASR
- WebSocket connection test for Aliyun ASR
- Chinese error messages for common failure cases
- 10-second timeout protection
- UI state management (disable button during test)
🤖 Generated with Claude via Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
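The 10-second timeout protection can be sketched with only the standard library: run the connection attempt on a worker thread and wait on a channel with `recv_timeout`. This is a Rust illustration under assumptions; the actual feature lives in the Objective-C settings UI, and `run_with_timeout` and `TestOutcome` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Timeout used by the connection test (10 seconds, per the commit message).
const CONNECTION_TEST_TIMEOUT: Duration = Duration::from_secs(10);

#[derive(Debug, PartialEq)]
enum TestOutcome<T> {
    Completed(T),
    TimedOut,
    WorkerDied,
}

/// Run `attempt` on a worker thread and wait at most `timeout` for its result.
fn run_with_timeout<T, F>(attempt: F, timeout: Duration) -> TestOutcome<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore send errors: the receiver is gone if we already timed out.
        let _ = tx.send(attempt());
    });
    match rx.recv_timeout(timeout) {
        Ok(value) => TestOutcome::Completed(value),
        Err(mpsc::RecvTimeoutError::Timeout) => TestOutcome::TimedOut,
        // The worker panicked and dropped the sender before sending a result.
        Err(mpsc::RecvTimeoutError::Disconnected) => TestOutcome::WorkerDied,
    }
}

fn main() {
    // A fake "connection attempt" that succeeds immediately.
    let outcome = run_with_timeout(|| Ok::<(), String>(()), CONNECTION_TEST_TIMEOUT);
    println!("{:?}", outcome);
}
```

Note that a timed-out worker thread is not cancelled; it keeps running in the background and its late result is simply dropped, so the UI can re-enable the Test button as soon as the timeout fires.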
- Add text merging logic to combine final and interim results
- Cache best_text to avoid repeated computation
- Fix ASR event draining to capture trailing segments
- Add api_test for transcript aggregation
🤖 Generated with Claude via Claude Code + Compound Engineering v2.53.0
Co-Authored-By: Claude (Claude Code context) <noreply@anthropic.com>
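The merging and caching described above can be sketched as an aggregator that keeps committed final segments plus the latest interim hypothesis, and memoizes the combined text until new input arrives. This is a hypothetical sketch; the field names and the plain-concatenation rule are assumptions, not the project's actual TranscriptAggregator.

```rust
/// Hypothetical transcript aggregator (not the project's actual type).
#[derive(Default)]
struct TranscriptAggregator {
    /// Segments the server has marked as final, in order.
    finals: Vec<String>,
    /// Latest interim hypothesis for the segment still in progress.
    interim: String,
    /// Cached result of `best_text`; cleared whenever the input changes.
    cached: Option<String>,
}

impl TranscriptAggregator {
    fn push_interim(&mut self, text: &str) {
        // An interim result replaces the previous one instead of appending.
        self.interim = text.to_string();
        self.cached = None;
    }

    fn push_final(&mut self, text: &str) {
        self.finals.push(text.to_string());
        // The in-progress segment is now committed.
        self.interim.clear();
        self.cached = None;
    }

    /// Merge final segments with the current interim text, reusing the cache.
    /// Plain concatenation suits Chinese text; other languages may need spaces.
    fn best_text(&mut self) -> &str {
        if self.cached.is_none() {
            let mut merged = self.finals.concat();
            merged.push_str(&self.interim);
            self.cached = Some(merged);
        }
        self.cached.as_deref().unwrap()
    }
}

fn main() {
    let mut agg = TranscriptAggregator::default();
    agg.push_final("Hello, ");
    agg.push_interim("wor");
    agg.push_interim("world");
    println!("{}", agg.best_text()); // prints "Hello, world"
}
```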
- Add Test button next to Provider dropdown in ASR settings
- Implement WebSocket connection test for Doubao ASR
- Implement WebSocket connection test for Aliyun ASR
- Show localized Chinese error messages for connection failures
- Adjust ASR panel height for better layout
🤖 Generated with Claude via Claude Code + Compound Engineering v2.53.0
Co-Authored-By: Claude (Claude Code context) <noreply@anthropic.com>
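Aliyun ASR interim results contain the accumulated text + stash. Using them directly caused the UI to show the first segment, then the second segment, and then the merged text. Switched to aggregating all text results through aggregator.best_text(), so the preview text is correctly deduplicated and merged.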
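The duplication comes from treating each cumulative interim as a new segment. A minimal sketch of the difference, assuming (as the commit message states) that an interim result's display text is its text followed by its stash; the function names are hypothetical.

```rust
/// Display text of one interim result: accumulated `text` plus tentative `stash`.
fn interim_display(text: &str, stash: &str) -> String {
    format!("{text}{stash}")
}

/// Buggy approach: append every interim to the preview. Because each interim
/// already contains the earlier text, content is repeated.
fn preview_appending(interims: &[(&str, &str)]) -> String {
    interims.iter().map(|(t, s)| interim_display(t, s)).collect()
}

/// Fixed approach: show only the latest interim (in the real code this is
/// merged with final segments via aggregator.best_text()).
fn preview_latest(interims: &[(&str, &str)]) -> String {
    interims
        .last()
        .map(|(t, s)| interim_display(t, s))
        .unwrap_or_default()
}

fn main() {
    // Two interims for one utterance; the second repeats the first's text.
    let interims = [("Hello", " wor"), ("Hello world", " today")];
    println!("{}", preview_appending(&interims)); // prints "Hello worHello world today"
    println!("{}", preview_latest(&interims)); // prints "Hello world today"
}
```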
- Increase threshold from 0.0 to 0.5 to filter out ambient noise
- Reduce prefix_padding_ms from 300 to 100 to minimize noise inclusion
- Extract VAD parameters into named constants for clarity
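Extracting the VAD parameters into named constants might look like the following. The values come from the commit message; the `server_vad` turn_detection JSON shape is an assumption based on the OpenAI-style realtime session format, and the constant and function names are hypothetical.

```rust
/// VAD activation threshold; raised from 0.0 to filter out ambient noise.
const VAD_THRESHOLD: f32 = 0.5;
/// Audio kept before detected speech; reduced from 300 ms to limit noise.
const VAD_PREFIX_PADDING_MS: u32 = 100;

/// Build the (assumed) turn_detection section of a session update message.
fn turn_detection_json() -> String {
    format!(
        r#"{{"type":"server_vad","threshold":{},"prefix_padding_ms":{}}}"#,
        VAD_THRESHOLD, VAD_PREFIX_PADDING_MS
    )
}

fn main() {
    println!("{}", turn_detection_json());
}
```

Keeping the tuning values as constants means a later adjustment touches one line, and the reason for each value can live next to it.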
…list and update-feed.json)
1. Rename provider from aliyun to qwen for clarity
2. Remove NSLog debug statements from SPSetupWizardWindowController.m
3. Revert TranscriptAggregator changes to avoid affecting Doubao
4. Restore hotkey cancel-key logic to the original condition
5. Restore wait_for_final condition to the original
6. Restore config.rs formatting from main
7. Update QwenAsrConfig with Default derive
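A config struct that derives Default while still honoring the documented `zh` language default could be sketched as below. The fields mirror the asr.qwen.api_key and asr.qwen.language keys from the PR description; making `language` an `Option` and the `language_or_default` helper are assumptions, not the PR's actual code.

```rust
/// Language used when asr.qwen.language is not set (per the PR description).
const DEFAULT_QWEN_LANGUAGE: &str = "zh";

/// Hypothetical shape of the Qwen-specific config.
#[derive(Debug, Clone, Default, PartialEq)]
struct QwenAsrConfig {
    /// DashScope API key (asr.qwen.api_key).
    api_key: String,
    /// Recognition language (asr.qwen.language); None means "use the default".
    language: Option<String>,
}

impl QwenAsrConfig {
    fn language_or_default(&self) -> &str {
        self.language.as_deref().unwrap_or(DEFAULT_QWEN_LANGUAGE)
    }
}

fn main() {
    let cfg = QwenAsrConfig::default();
    println!("language={}", cfg.language_or_default()); // prints "language=zh"
}
```

A derived Default would otherwise give an empty language string; wrapping the field in `Option` keeps the derive while making the fallback explicit.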
- Remove two failing tests that expected unsupported behavior (transcript merging after final result)
- Add basic test coverage for QwenAsrProvider
- Import QwenAsrProvider in integration tests
missuo
left a comment
Review
Core functionality looks good
The Qwen ASR provider implementation, config integration, and UI changes for provider switching + test connection are solid.
Unnecessary changes
- Formatting-only diffs: The following files have changes that are purely rustfmt reformatting, unrelated to the Qwen feature:
  - koe-core/src/llm/openai_compatible.rs: format! macro line wrapping
  - koe-core/src/prompt.rs: log::warn! and if statement reformatting
  - koe-core/src/session.rs: Session::new params and log::debug! reformatting
  - koe-core/src/dictionary.rs: log::info! reformatting
  - koe-asr/src/doubao.rs: .unwrap_or(false) line merge
  Please revert these; they add noise to the diff and make git blame harder to use.
- NSLog removals: 4 NSLog statements removed in SPSetupWizardWindowController.m (config save, dictionary save, prompt save, "settings saved"). Unrelated to Qwen; should be a separate commit if intended.
- Dead code in hotkey loading: triggerFound and cancelFound variables are assigned but never read. Remove them.
- Hand-rolled base64 encoder: qwen.rs contains a ~40-line manual base64 implementation, but Cargo.toml already adds base64 = "0.22" as a dependency. Just use the crate.
- .gitignore adding docs/: Unrelated to this feature; should be a separate commit.
Suggestions
- AsrConfig pollution: The language field was added to the shared AsrConfig struct, but only Qwen uses it. Consider having each provider handle its own config internally, or at least document that this field is Qwen-specific.
- Duplicate error handling in test connection: testDoubaoConnection and testQwenConnection share a lot of identical error-matching logic. Consider extracting a common helper.
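One way to act on the AsrConfig suggestion is to give each provider its own config type and select between them with an enum, so Qwen needs neither a language field on a shared struct nor to reuse access_key. This is a hypothetical sketch; the Doubao field names are taken from the review (url, app_key, access_key, resource_id), and the type names are illustrative.

```rust
/// Doubao settings (field names from the review; the type is illustrative).
#[derive(Debug, Clone, Default)]
struct DoubaoConfig {
    url: String,
    app_key: String,
    access_key: String,
    resource_id: String,
}

/// Qwen settings; only Qwen carries a language option.
#[derive(Debug, Clone, Default)]
struct QwenConfig {
    api_key: String,
    language: String,
}

/// Each variant owns exactly the fields its provider needs.
#[derive(Debug, Clone)]
enum AsrProviderConfig {
    Doubao(DoubaoConfig),
    Qwen(QwenConfig),
}

impl AsrProviderConfig {
    fn provider_name(&self) -> &'static str {
        match self {
            AsrProviderConfig::Doubao(_) => "doubao",
            AsrProviderConfig::Qwen(_) => "qwen",
        }
    }

    /// The credential the provider authenticates with.
    fn credential(&self) -> &str {
        match self {
            AsrProviderConfig::Doubao(c) => &c.access_key,
            AsrProviderConfig::Qwen(c) => &c.api_key,
        }
    }
}

fn main() {
    let cfg = AsrProviderConfig::Qwen(QwenConfig {
        api_key: "sk-example".to_string(),
        language: "zh".to_string(),
    });
    println!("{} uses key {}", cfg.provider_name(), cfg.credential());
}
```

With this layout, adding a third provider means adding a variant and its own struct rather than widening a Doubao-shaped one.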
- Revert formatting-only changes in unrelated files:
  - koe-core/src/llm/openai_compatible.rs
  - koe-core/src/prompt.rs
  - koe-core/src/session.rs
  - koe-core/src/dictionary.rs
  - koe-asr/src/doubao.rs
- Revert NSLog removals in SPSetupWizardWindowController.m
- Remove dead code (triggerFound/cancelFound variables)
- Replace hand-rolled base64 with base64 crate (0.22)
- Revert .gitignore docs/ change
missuo
left a comment
Re-review after feedback commit (3b407a9)
All previously flagged issues have been addressed:
- Formatting-only diffs reverted
- .gitignore and NSLog removals reverted
- Dead triggerFound/cancelFound variables removed
- Hand-rolled base64 replaced with the base64 crate
Remaining minor notes (non-blocking)
- Hotkey loading refactor: The hotkey section in loadValuesForPane: still has cosmetic changes (splitting into triggerKeyRaw/cancelKeyRaw intermediate variables, adding blank lines and a comment). These are harmless but not strictly necessary for this PR.
- AsrConfig as a shared struct: Qwen stuffs its api_key into the access_key field and leaves url, app_key, and resource_id as empty strings. This works but is a bit of a code smell: the shared config struct is effectively Doubao-shaped, and Qwen has to work around it. Fine for now; worth revisiting if a third provider is added.
- ObjC test connection code duplication: testDoubaoConnection and testQwenConnection share ~80% identical error-handling logic. A shared helper like - (void)handleAsrTestError:(NSError *)error would reduce this.
LGTM — the core Qwen ASR implementation and UI integration look good. Approving.
BTW, please do not include any emojis in commit messages next time, and please follow the contribution requirements.
In addition, there should be no comments in Chinese. In principle, there should be no test section.
Summary
Adds an Aliyun Qwen (Tongyi Qianwen) realtime speech recognition provider that performs speech-to-text through the DashScope API.
Changes
Backend (Rust)
Frontend (Objective-C)
Configuration
- asr.qwen.api_key config option
- asr.qwen.language config option (default: zh)
Testing
Usage
Select the Qwen provider in the setup wizard and enter a DashScope API Key to start using it.
Related