
✨ feat: add Qwen ASR provider support #22

Merged
missuo merged 13 commits into missuo:main from nmvr2600:feat/aliyun-asr
Mar 28, 2026

Conversation

@nmvr2600

Contributor

Summary

Adds an Alibaba Cloud Qwen (Tongyi Qianwen) real-time speech recognition provider that uses the DashScope API for speech-to-text.

Changes

Backend (Rust)

  • koe-asr/src/qwen.rs: add QwenAsrProvider, which implements the WebSocket Realtime API
  • koe-asr/src/lib.rs: export QwenAsrProvider
  • koe-asr/Cargo.toml: add the base64 dependency
  • koe-core/src/lib.rs: integrate the Qwen provider into the ASR session
  • koe-core/src/config.rs: add the QwenAsrConfig configuration struct

Frontend (Objective-C)

  • SPSetupWizardWindowController.m:
    • Add a Qwen API Key configuration UI
    • Support switching between the Doubao and Qwen providers
    • Add an ASR connection test

Configuration

  • Add the asr.qwen.api_key option
  • Add the asr.qwen.language option (default: zh)
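
For illustration, the two options above could map onto a struct along these lines. This is a minimal sketch, not the PR's code: only the names QwenAsrConfig, api_key, and language and the default zh come from the description. The manual Default impl and the is_configured helper are assumptions chosen to make the default visible.

```rust
/// Hypothetical shape of the `asr.qwen` section; the real struct may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct QwenAsrConfig {
    /// DashScope API key (`asr.qwen.api_key`).
    pub api_key: String,
    /// Recognition language (`asr.qwen.language`).
    pub language: String,
}

impl Default for QwenAsrConfig {
    fn default() -> Self {
        Self {
            // No key until the user enters one in the setup wizard.
            api_key: String::new(),
            // Default stated in the PR description.
            language: "zh".to_string(),
        }
    }
}

impl QwenAsrConfig {
    /// Assumed helper: the config is usable once a non-blank key is present.
    pub fn is_configured(&self) -> bool {
        !self.api_key.trim().is_empty()
    }
}
```

With this shape, QwenAsrConfig::default() yields language "zh" and reports itself as not configured until a key is set.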

Testing

  • ✅ All existing tests pass
  • ✅ Added basic tests for the Qwen provider
  • ✅ Fixed 2 failing TranscriptAggregator tests

Usage

Select the Qwen provider in the setup wizard and enter a DashScope API key to use it.

Related

nmvr2600 and others added 12 commits March 27, 2026 16:48
Add AliyunAsrProvider for Qwen3-ASR-Flash-Realtime API:
- Implement WebSocket-based realtime ASR client
- Support session management and audio buffer streaming
- Add configuration options for sample rate and language
- Update provider trait and config for multi-provider support
- Use aggregator.best_text() for final transcript display
- Handle ASR error events during finalization
- Fix ASR settings UI: API Key alignment and spacing
- Update version to 1.1.0 (build 11)
Add pending_events queue to properly handle multiple events
returned from a single server message (e.g., Definite + Final).
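
The pending_events idea in the commit above can be sketched roughly as follows. Everything except the pending_events name and the Definite/Final event kinds is hypothetical (AsrEvent, ServerMessage, EventReader, and their fields), so treat this as one possible shape rather than the provider's actual code.

```rust
use std::collections::VecDeque;

/// Hypothetical event type; the project's real enum may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum AsrEvent {
    Interim(String),
    Definite(String),
    Final(String),
}

/// Hypothetical decoded server frame that can imply more than one event.
pub struct ServerMessage {
    pub definite: Option<String>,
    pub is_final: bool,
    pub text: String,
}

#[derive(Default)]
pub struct EventReader {
    pending_events: VecDeque<AsrEvent>,
}

impl EventReader {
    /// Queue every event implied by one server message, in order.
    pub fn ingest(&mut self, msg: ServerMessage) {
        if let Some(sentence) = msg.definite {
            self.pending_events.push_back(AsrEvent::Definite(sentence));
        }
        if msg.is_final {
            self.pending_events.push_back(AsrEvent::Final(msg.text));
        } else if !msg.text.is_empty() {
            self.pending_events.push_back(AsrEvent::Interim(msg.text));
        }
    }

    /// Return queued events one at a time so none are dropped.
    pub fn next_event(&mut self) -> Option<AsrEvent> {
        self.pending_events.pop_front()
    }
}
```

A caller would drain next_event() until it returns None before reading the next frame from the socket, which keeps a Definite + Final pair from collapsing into a single event.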
Add "Test Connection" button to ASR configuration panel for both
Doubao and Aliyun providers. Users can now verify their API keys
before saving the configuration.

Features:
- WebSocket connection test for Doubao ASR
- WebSocket connection test for Aliyun ASR
- Chinese error messages for common failure cases
- 10-second timeout protection
- UI state management (disable button during test)

🤖 Generated with Claude via Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
- Add text merging logic to combine final and interim results
- Cache best_text to avoid repeated computation
- Fix ASR event draining to capture trailing segments
- Add api_test for transcript aggregation

🤖 Generated with Claude via Claude Code + Compound Engineering v2.53.0

Co-Authored-By: Claude (Claude Code context) <noreply@anthropic.com>
- Add Test button next to Provider dropdown in ASR settings
- Implement WebSocket connection test for Doubao ASR
- Implement WebSocket connection test for Aliyun ASR
- Show localized Chinese error messages for connection failures
- Adjust ASR panel height for better layout

🤖 Generated with Claude via Claude Code + Compound Engineering v2.53.0

Co-Authored-By: Claude (Claude Code context) <noreply@anthropic.com>
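Aliyun ASR interim results contain the cumulative text + stash. Using them directly makes the UI show the first segment, then the second segment, and then the two merged.

Use aggregator.best_text() to aggregate all text results so that the preview text is deduplicated and merged correctly.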
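
To make the preview problem concrete, here is a small sketch of the general approach: committed segments are kept separately, and each cumulative interim result replaces the previous one instead of being appended. PreviewAggregator and its methods are hypothetical and are not the project's TranscriptAggregator. Only the best_text name is borrowed from the commit message.

```rust
/// Hypothetical aggregator used only to illustrate the idea.
#[derive(Default)]
pub struct PreviewAggregator {
    /// Segments the server has already finalized.
    finals: Vec<String>,
    /// Latest cumulative interim text for the segment in progress.
    interim: String,
}

impl PreviewAggregator {
    /// Interim results are cumulative, so the newest one replaces the old one.
    pub fn on_interim(&mut self, text: &str) {
        self.interim = text.to_string();
    }

    /// A final result commits the segment and clears the interim text.
    pub fn on_final(&mut self, text: &str) {
        self.finals.push(text.to_string());
        self.interim.clear();
    }

    /// Single source of truth for the preview: finals followed by the interim.
    pub fn best_text(&self) -> String {
        let mut out = self.finals.concat();
        out.push_str(&self.interim);
        out
    }
}
```

Because the UI only ever renders best_text(), a segment cannot appear once as interim text and again as a finalized segment.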
- Increase threshold from 0.0 to 0.5 to filter out ambient noise
- Reduce prefix_padding_ms from 300 to 100 to minimize noise inclusion
- Extract VAD parameters into named constants for clarity
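
As a rough sketch of what named VAD constants might look like: the values 0.5 and 100 and the prefix_padding_ms key come from the commit message, while the constant names, the threshold key, and the vad_settings_json helper are assumptions for illustration.

```rust
/// Speech-detection threshold; the commit raises it from 0.0 to 0.5.
pub const VAD_THRESHOLD: f32 = 0.5;
/// Audio kept before detected speech, in milliseconds; lowered from 300 to 100.
pub const VAD_PREFIX_PADDING_MS: u32 = 100;

/// Assumed helper: render the two VAD parameters as a JSON fragment.
pub fn vad_settings_json() -> String {
    format!(
        "{{\"threshold\":{:.1},\"prefix_padding_ms\":{}}}",
        VAD_THRESHOLD, VAD_PREFIX_PADDING_MS
    )
}
```

Keeping the values in one place means a later tuning pass changes two constants instead of hunting for literals inside the session setup code.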
1. Rename provider from aliyun to qwen for clarity
2. Remove NSLog debug statements from SPSetupWizardWindowController.m
3. Revert TranscriptAggregator changes to avoid affecting Doubao
4. Restore hotkey cancel-key logic to original condition
5. Restore wait_for_final condition to original
6. Restore config.rs formatting from main
7. Update QwenAsrConfig with Default derive
- Remove two failing tests that expected unsupported behavior
  (transcript merging after final result)
- Add basic test coverage for QwenAsrProvider
- Import QwenAsrProvider in integration tests

@missuo missuo left a comment

Owner

Review

Core functionality looks good

The Qwen ASR provider implementation, config integration, and UI changes for provider switching + test connection are solid.

Unnecessary changes

  1. Formatting-only diffs — The following files have changes that are purely rustfmt reformatting, unrelated to the Qwen feature:

    • koe-core/src/llm/openai_compatible.rs: format! macro line wrapping
    • koe-core/src/prompt.rs: log::warn! and if statement reformatting
    • koe-core/src/session.rs: Session::new params and log::debug! reformatting
    • koe-core/src/dictionary.rs: log::info! reformatting
    • koe-asr/src/doubao.rs: .unwrap_or(false) line merge

    Please revert these — they add noise to the diff and make git blame harder to use.

  2. NSLog removals — 4 NSLog statements removed in SPSetupWizardWindowController.m (config save, dictionary save, prompt save, "settings saved"). Unrelated to Qwen — should be a separate commit if intended.

  3. Dead code in hotkey loading: triggerFound and cancelFound variables are assigned but never read. Remove them.

  4. Hand-rolled base64 encoder: qwen.rs contains a ~40-line manual base64 implementation, but Cargo.toml already adds base64 = "0.22" as a dependency. Just use the crate.

  5. .gitignore adding docs/ — Unrelated to this feature, should be a separate commit.

Suggestions

  • AsrConfig pollution — The language field was added to the shared AsrConfig struct, but only Qwen uses it. Consider having each provider handle its own config internally, or at least document that this field is Qwen-specific.
  • Duplicate error handling in test connection: testDoubaoConnection and testQwenConnection share a lot of identical error-matching logic. Consider extracting a common helper.

- Revert formatting-only changes in unrelated files:
  - koe-core/src/llm/openai_compatible.rs
  - koe-core/src/prompt.rs
  - koe-core/src/session.rs
  - koe-core/src/dictionary.rs
  - koe-asr/src/doubao.rs
- Revert NSLog removals in SPSetupWizardWindowController.m
- Remove dead code (triggerFound/cancelFound variables)
- Replace hand-rolled base64 with base64 crate (0.22)
- Revert .gitignore docs/ change

@missuo missuo left a comment

Owner

Re-review after feedback commit (3b407a9)

All previously flagged issues have been addressed:

  • Formatting-only diffs reverted
  • .gitignore and NSLog removals reverted
  • Dead triggerFound/cancelFound variables removed
  • Hand-rolled base64 replaced with the base64 crate

Remaining minor notes (non-blocking)

  1. Hotkey loading refactor — The hotkey section in loadValuesForPane: still has cosmetic changes (splitting into triggerKeyRaw/cancelKeyRaw intermediate variables, adding blank lines and a comment). These are harmless but not strictly necessary for this PR.

  2. AsrConfig as a shared struct — Qwen stuffs its api_key into the access_key field and leaves url, app_key, resource_id as empty strings. This works but is a bit of a code smell — the shared config struct is effectively Doubao-shaped and Qwen has to work around it. Fine for now, worth revisiting if a third provider is added.

  3. ObjC test connection code duplication: testDoubaoConnection and testQwenConnection share ~80% identical error-handling logic. A shared helper like - (void)handleAsrTestError:(NSError *)error would reduce this.
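
On note 2, one direction for a future refactor is a per-provider config enum, so Qwen no longer has to fill Doubao-shaped fields with empty strings. This is only a sketch: AsrProviderConfig and its methods are hypothetical, and the field names are taken from the review text rather than from the code.

```rust
/// Hypothetical replacement for a single shared, Doubao-shaped AsrConfig.
#[derive(Debug, Clone, PartialEq)]
pub enum AsrProviderConfig {
    Doubao {
        url: String,
        app_key: String,
        access_key: String,
        resource_id: String,
    },
    Qwen {
        api_key: String,
        language: String,
    },
}

impl AsrProviderConfig {
    /// Provider identifier, e.g. for logging or config lookup.
    pub fn provider_name(&self) -> &'static str {
        match self {
            AsrProviderConfig::Doubao { .. } => "doubao",
            AsrProviderConfig::Qwen { .. } => "qwen",
        }
    }

    /// The secret each provider authenticates with.
    pub fn credential(&self) -> &str {
        match self {
            AsrProviderConfig::Doubao { access_key, .. } => access_key.as_str(),
            AsrProviderConfig::Qwen { api_key, .. } => api_key.as_str(),
        }
    }
}
```

Each variant carries only the fields its provider needs, and adding a third provider becomes a new variant rather than another set of unused fields.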

LGTM — the core Qwen ASR implementation and UI integration look good. Approving.

@missuo missuo merged commit 9cfdcd9 into missuo:main Mar 28, 2026
@missuo

missuo commented Mar 28, 2026

Owner

BTW, please do not include any emojis in commit messages next time, and please follow the contribution requirements.

@missuo

missuo commented Mar 28, 2026

Owner

In addition, there should be no comments in Chinese. In principle, there should be no part of the test.
