Skip to content

feat: add local VL model directory support for PaddleOCR native backend#50

Merged
hubo1989 merged 11 commits into
mainfrom
feat/paddleocr-vlm-provider
Feb 28, 2026
Merged

feat: add local VL model directory support for PaddleOCR native backend#50
hubo1989 merged 11 commits into
mainfrom
feat/paddleocr-vlm-provider

Conversation

@hubo1989
Copy link
Copy Markdown
Owner

@hubo1989 hubo1989 commented Feb 28, 2026

Summary

  • Add localVLModelDir configuration to support native backend with local PaddleOCR-VL models
  • Fix error message display to show correct model info based on provider type
  • Fix localization key mismatch (error.ocr.recognition.failederror.ocr.failed)

Changes

  • Add localVLModelDir property to PaddleOCREngine.Configuration
  • Support --vl_rec_backend native --vl_rec_model_dir when MLX-VLM is disabled but local model path is set
  • Add UI input field for local model directory (shows when MLX-VLM is unchecked in precise mode)
  • Fix TranslationFlowController.showErrorAlert() to show provider-appropriate model info
  • Add localization strings for local model directory in English and Chinese

Test Plan

  1. Select PaddleOCR mode as "Precise"
  2. Uncheck "Use MLX-VLM"
  3. Enter local model path (e.g., ~/.paddlex/official_models/PaddleOCR-VL-1.5)
  4. Verify OCR works with native backend

🤖 Generated with Droid

Summary by CodeRabbit

发布说明

  • 新功能

    • 添加 PaddleOCR 作为新的 OCR/VLM 提供者,支持本地/云与快速/精确两种模式,并集成 MLX‑VLM 支持与可用性检测。
    • 在设置中新增完整 PaddleOCR 配置界面及凭据管理(密钥存储/删除/读取)。
  • 改进

    • 屏幕录制权限检查改为异步双重验证以提高准确性。
    • 翻译流程的错误消息键和展示方式已更新以改善错误呈现。
  • 本地化

    • 增加英文与简体中文的 PaddleOCR 相关翻译条目。

Hubert and others added 9 commits February 28, 2026 14:13
- Add PaddleOCRVLMProvider implementing VLMProviderProtocol
- Support two modes:
  - Fast mode: paddleocr ocr command (~1s)
  - Precise mode: paddleocr doc_parser VL-1.5 (~12s)
- Add PaddleOCRMode enum with settings UI
- Add cloud API configuration options (baseURL, API key)
- Fix JSON parsing for numpy arrays and float formats
- Update localization (en/zh-Hans)
- Fix screen recording permission check using SCShareableContent

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Remove redundant raw values from PaddleOCRMode enum
- Add PaddleOCR settings reset in resetToDefaults()
- Remove redundant MainActor.run wrapper in SettingsViewModel
- Add static defaultBaseURL constant to avoid force unwrap
- Add CJK-aware separator in merged(with:) for proper spacing
- Replace window counting heuristic with async SCShareableContent check in OnboardingViewModel

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- testPaddleOCRConnection: check cloud mode first with URL/API validation
- checkPermissions: use permissionCheckTask to avoid race conditions
- paddleOCRCloudAPIKey: store in Keychain instead of UserDefaults
- analyze(image:): check PaddleOCREngine availability only for local mode
- isAvailable: mode-aware check (cloud checks URL, local checks installation)
- weightedConfidence: handle divide-by-zero edge case

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Fix concurrency violation: capture paddleOCRCloudAPIKey before Task.detached
- Extract Keychain constants: add serviceIdentifier and paddleOCRAccount to KeychainService
- Fix CJK separator: check last char of first string and first char of second string
- Update AppSettings to use shared KeychainService constants

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Add paddleOCRUseMLXVLM, paddleOCRMLXVLMServerURL, paddleOCRMLXVLMModelName settings
- Update PaddleOCREngine.Configuration with MLX-VLM fields
- Add --vl_rec_backend, --vl_rec_server_url, --vl_rec_api_model_name args
- Add UI settings (visible when precise mode and not using cloud)
- Add English and Chinese localizations

Usage:
1. Install: pip install "mlx-vlm>=0.3.11"
2. Start server: mlx_vlm.server --port 8111
3. Enable in Settings > PaddleOCR > Use MLX-VLM

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Add isMLXVLMServerRunning and isCheckingMLXVLMServer state
- Add checkMLXVLMServerStatus() method to verify server connectivity
- Add UI with status indicator and refresh button
- Add English and Chinese localizations for status messages

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Add MLX-VLM check in refreshPaddleOCRStatus()
- Add onAppear handler in PaddleOCRStatusSection

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Keys should not include format specifier %@

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Add localVLModelDir configuration to PaddleOCREngine.Configuration
- Support native backend with --vl_rec_model_dir when MLX-VLM is disabled
- Add UI input field for local model directory (shows when MLX-VLM unchecked in precise mode)
- Fix error message to show correct model info based on provider type
- Fix localization key from error.ocr.recognition.failed to error.ocr.failed
- Add localization strings for local model directory in EN/CN

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 28, 2026

📝 Walkthrough

Walkthrough

本 PR 引入 PaddleOCR 作为新的 VLM 提供商(本地/云/MLX‑VLM),扩展设置、模型与 Keychain 支持,并将屏幕录制权限检查从纯同步改为以 SCShareableContent 为准的异步验证,同时在多处使用该异步检查。

Changes

Cohort / File(s) Summary
权限检查与捕获
ScreenTranslate/Features/Capture/CaptureManager.swift, ScreenTranslate/Features/Capture/ScreenDetector.swift, ScreenTranslate/Features/Onboarding/OnboardingViewModel.swift
将屏幕录制权限检查改为两阶段:先 CGPreflightScreenCaptureAccess() 快速判断,再通过 SCShareableContent.current 实际验证(async/await);增加 hasPermissionSync 同步备用接口,更新所有调用点为异步检查。
PaddleOCR 引擎与解析
ScreenTranslate/Services/PaddleOCREngine.swift, ScreenTranslate/Services/PaddleOCRVLMProvider.swift
新增 PaddleOCREngine 的 fast/precise 模式与 MLX‑VLM 支持;扩展命令行参数与 JSON 解析逻辑;新增 PaddleOCRVLMProvider 实现 VLMProvider,包含配置构建、OCR 结果到 ScreenAnalysisResult 的转换、行分组与 CJK 文本拼接。
VLM 提供商枚举与工厂
ScreenTranslate/Models/VLMProviderType.swift, ScreenTranslate/Services/ScreenCoderEngine.swift
在 VLMProviderType 中添加 paddleocr 枚举值,并在 ScreenCoderEngine 的工厂中返回 PaddleOCRVLMProvider() 分支。
应用设置与视图模型
ScreenTranslate/Models/AppSettings.swift, ScreenTranslate/Features/Settings/SettingsViewModel.swift, ScreenTranslate/Features/Settings/EngineSettingsTab.swift
在 AppSettings 添加入 PaddleOCR 配置项(mode、云/MLX/VL 本地路径、API key 存储触发),SettingsViewModel 增加对应绑定与 MLX‑VLM 健康检查、连通性测试;EngineSettingsTab 新增 PaddleOCRStatusSection 与相关 UI 分支。
本地化
ScreenTranslate/Resources/en.lproj/Localizable.strings, ScreenTranslate/Resources/zh-Hans.lproj/Localizable.strings
新增大量 PaddleOCR 相关字符串与 VLM 提供商说明,调整若干 translationFlow.error 键以移除旧的占位符格式。
凭证存储(Keychain)
ScreenTranslate/Services/Security/KeychainService.swift
新增通用 serviceIdentifier 与 PaddleOCR 专用 account,新增保存/读取/删除 PaddleOCR 云 API Key 的方法并复用内部通用保存逻辑。
翻译流程错误信息
ScreenTranslate/Features/TranslationFlow/TranslationFlowController.swift
在分析失败的错误信息中引入 provider 感知的 model 信息选择逻辑(PaddleOCR + MLX‑VLM 时使用 paddleOCRMLXVLMModelName,否则按 provider 决定)。
项目元数据
.omd/project-memory.json, .omd/sessions/...json
添加项目记忆与会话记录的 JSON 文件(工具/元数据文件)。

Sequence Diagram(s)

sequenceDiagram
  participant UI as rgba(60,130,160,0.5) UI/Onboarding
  participant CaptureMgr as rgba(80,160,100,0.5) CaptureManager
  participant ScreenDet as rgba(200,120,60,0.5) ScreenDetector
  participant SCShare as rgba(180,80,200,0.5) SCShareableContent

  UI->>CaptureMgr: 请求捕获/检查权限 (await)
  CaptureMgr->>ScreenDet: hasPermission() (async)
  ScreenDet->>SCShare: CGPreflight 快速检查
  alt CGPreflight = false
    ScreenDet-->>CaptureMgr: 返回 false
  else CGPreflight = true
    ScreenDet->>SCShare: 请求 SCShareableContent.current (async)
    SCShare-->>ScreenDet: 返回内容或抛错
    ScreenDet-->>CaptureMgr: 返回最终权限结果
  end
  CaptureMgr-->>UI: 权限结果 / 执行捕获
Loading
sequenceDiagram
  participant UI as rgba(60,130,160,0.5) UI/TranslationRequest
  participant EngineFactory as rgba(80,160,100,0.5) ScreenCoderEngine
  participant Provider as rgba(200,120,60,0.5) PaddleOCRVLMProvider
  participant PaddleEngine as rgba(180,80,200,0.5) PaddleOCREngine
  participant MLXVLM as rgba(120,100,220,0.5) MLX-VLM Server

  UI->>EngineFactory: 请求分析 (provider = paddleocr)
  EngineFactory->>Provider: 返回 PaddleOCRVLMProvider
  Provider->>PaddleEngine: buildConfiguration() (读取 AppSettings)
  alt useCloud
    Provider->>PaddleEngine: 调用云端接口
  else useMLXVLM
    Provider->>MLXVLM: 调用 MLX-VLM Server for VL
    MLXVLM-->>PaddleEngine: 返回 VL 输出
    PaddleEngine-->>Provider: 返回精确解析结果
  else local
    Provider->>PaddleEngine: 调用本地命令/二进制
  end
  Provider-->>UI: 返回 ScreenAnalysisResult
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

兔子诗

🐰 新增 PaddleOCR,识别更敏捷,
云端本地与 MLX,灵活又稳健,
权限异步更真实,捕获流程更清晰,
兔兔跳跃翻译田野,文字花开成片。

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 79.07% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed 标题准确反映了PR的主要变更内容:为PaddleOCR原生后端添加本地VL模型目录支持。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/paddleocr-vlm-provider

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (8)
ScreenTranslate/Services/PaddleOCREngine.swift (2)

615-617: 删除无效代码。

content.replacingOccurrences(of: "[]", with: "[]") 是一个空操作,没有任何效果。

♻️ 建议删除无效代码
             content = content.replacingOccurrences(of: "[,", with: "[")
             content = content.replacingOccurrences(of: ",]", with: "]")
-            // Handle edge case of empty nested arrays
-            content = content.replacingOccurrences(of: "[]", with: "[]")
             return content
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Services/PaddleOCREngine.swift` around lines 615 - 617, The
line content = content.replacingOccurrences(of: "[]", with: "[]") in
PaddleOCREngine (near the block handling "edge case of empty nested arrays") is
a no-op and should be removed; update the surrounding code in the same function
so the early-return still occurs for truly empty content (keep the comment and
the return content statement if needed) and delete only the redundant
replacingOccurrences call to avoid misleading code.

264-270: 考虑对 localVLModelDir 进行路径展开。

用户可能会输入类似 ~/.paddlex/official_models/PaddleOCR-VL-1.5 的路径(如 PR 测试计划所述),但 ~ 不会被 shell 自动展开。由于命令通过 zsh 执行,这可能可以正常工作,但建议在 Swift 端显式展开以确保可靠性。

♻️ 建议的路径展开修复
             } else if !config.localVLModelDir.isEmpty {
                 // Use native backend with local model
+                let expandedPath = (config.localVLModelDir as NSString).expandingTildeInPath
                 args += [
                     "--vl_rec_backend", "native",
-                    "--vl_rec_model_dir", config.localVLModelDir
+                    "--vl_rec_model_dir", expandedPath
                 ]
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Services/PaddleOCREngine.swift` around lines 264 - 270,
Expand any leading tilde in config.localVLModelDir before adding it to args;
replace uses of config.localVLModelDir in the PaddleOCREngine argument-building
branch (the block that sets "--vl_rec_model_dir") with an expanded path (e.g.,
use NSString(...).expandingTildeInPath or URL-based expansion) so the final
argument passes a full filesystem path rather than a "~" form.
ScreenTranslate/Features/Capture/CaptureManager.swift (1)

91-93: 建议与 ScreenDetector 保持一致。

CaptureManager.hasPermission 现在使用 SCShareableContent.current 进行更可靠的权限验证,但 ScreenDetector.hasPermission (lines 128-134) 仍然只使用 CGPreflightScreenCaptureAccess()。考虑统一两处的权限检查逻辑以确保一致性。

Also applies to: 160-162

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Features/Capture/CaptureManager.swift` around lines 91 - 93,
Unify the permission-check logic by changing ScreenDetector.hasPermission (and
the other similar check around the second occurrence) to use the same
SCShareableContent.current-based approach as CaptureManager.hasPermission
instead of CGPreflightScreenCaptureAccess(); update the method to perform the
async query to SCShareableContent.current, mirror the same success/failure
handling and return/error semantics used by CaptureManager.hasPermission, and
ensure any callers awaiting permission are updated to the async pattern.
ScreenTranslate/Services/Security/KeychainService.swift (1)

322-370: 存在代码重复,考虑抽取通用方法。

savePaddleOCRCredentials 的实现与 saveCredentials(apiKey:forCompatibleId:) 几乎完全相同,仅账户名称不同。虽然为了保持一致性当前实现是可接受的,但可以考虑将通用逻辑抽取为私有辅助方法以减少重复。

♻️ 可选的重构方案
// 私有辅助方法
private func saveCredentialsInternal(apiKey: String, account: String, label: String) throws {
    // 共享的 Keychain 保存逻辑
}

// 公共方法使用辅助方法
func savePaddleOCRCredentials(apiKey: String) throws {
    try saveCredentialsInternal(apiKey: apiKey, account: Self.paddleOCRAccount, label: "PaddleOCR cloud")
}

func saveCredentials(apiKey: String, forCompatibleId compatibleId: String) throws {
    try saveCredentialsInternal(apiKey: apiKey, account: compatibleId, label: "compatible engine \(compatibleId)")
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Services/Security/KeychainService.swift` around lines 322 -
370, The savePaddleOCRCredentials implementation duplicates the keychain save
logic in saveCredentials(apiKey:forCompatibleId:); extract the shared logic into
a private helper (e.g., saveCredentialsInternal(apiKey:account:label:)) that
accepts the apiKey, account string (use Self.paddleOCRAccount or compatibleId)
and a descriptive label, perform the JSON encoding,
SecItemCopyMatching/SecItemUpdate/SecItemAdd handling and KeychainError handling
there, and then have savePaddleOCRCredentials and
saveCredentials(apiKey:forCompatibleId:) simply call that helper; keep
references to StoredCredentials, KeychainError, and the SecItem APIs unchanged.
ScreenTranslate/Services/PaddleOCRVLMProvider.swift (2)

53-71: analyze 方法实现正确,但错误信息可本地化。

方法逻辑清晰:先构建配置,检查本地可用性(仅限本地模式),然后执行 OCR 并转换结果。建议将硬编码的安装提示字符串提取为本地化字符串,以支持多语言。

♻️ 建议使用本地化字符串

考虑将安装提示信息提取为本地化字符串:

             guard await PaddleOCREngine.shared.isAvailable else {
                 throw VLMProviderError.invalidConfiguration(
-                    "PaddleOCR is not installed. Install it using: pip3 install paddleocr paddlepaddle"
+                    NSLocalizedString("error.paddleocr.not.installed", comment: "PaddleOCR not installed error")
                 )
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Services/PaddleOCRVLMProvider.swift` around lines 53 - 71,
The hardcoded error message in analyze is not localized; replace the literal
install prompt passed to VLMProviderError.invalidConfiguration with a localized
string (e.g. use NSLocalizedString or a Localization helper) and add a
localization key like "PaddleOCRNotInstalled" that contains the install
instruction ("Install it using: pip3 install paddleocr paddlepaddle") so
PaddleOCREngine.shared.isAvailable, analyze, and the thrown
VLMProviderError.invalidConfiguration all use the localized message.

232-244: CJK 字符检测覆盖主要范围,但可能遗漏部分字符。

当前实现覆盖了 CJK 统一汉字、扩展 A、平假名、片假名和韩文音节。可考虑添加更多范围以提高准确性,但当前实现对于主要使用场景已足够。

♻️ 可选:扩展 CJK 字符范围

可考虑添加以下范围:

  • CJK 扩展 B-F: U+20000-U+2FA1F
  • 全角标点: U+3000-U+303F
  • 日文标点符号
         return (0x4E00...0x9FFF).contains(scalar) ||
                (0x3400...0x4DBF).contains(scalar) ||
                (0x3040...0x309F).contains(scalar) ||
                (0x30A0...0x30FF).contains(scalar) ||
-               (0xAC00...0xD7AF).contains(scalar)
+               (0xAC00...0xD7AF).contains(scalar) ||
+               (0x3000...0x303F).contains(scalar)  // CJK 符号和标点
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Services/PaddleOCRVLMProvider.swift` around lines 232 - 244,
The isCJKChar(_:) function currently checks common CJK blocks but omits extended
ranges and fullwidth punctuation; update isCJKChar to include additional Unicode
scalar ranges such as CJK Symbols and Punctuation (U+3000–U+303F), Fullwidth
Forms (U+FF00–U+FFEF), and the CJK Extension planes (U+20000–U+2FA1F) so
characters from extensions B–F and fullwidth/japanese punctuation are detected;
adjust logic in isCJKChar to test these ranges against char.unicodeScalars
(handling scalars > 0xFFFF correctly) and return true when any of these ranges
match.
ScreenTranslate/Models/AppSettings.swift (1)

506-519: 重置逻辑完整,但 Task.detached 中错误被静默忽略。

重置 PaddleOCR 设置的逻辑是完整的,包括清除 Keychain 凭证。但 try? 会静默忽略删除 Keychain 凭证时的错误,建议至少记录错误日志以便调试。

♻️ 建议添加错误日志
         // Delete PaddleOCR cloud API key from Keychain
         Task.detached {
-            try? await KeychainService.shared.deletePaddleOCRCredentials()
+            do {
+                try await KeychainService.shared.deletePaddleOCRCredentials()
+            } catch {
+                Logger.settings.warning("Failed to delete PaddleOCR credentials from Keychain: \(error)")
+            }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Models/AppSettings.swift` around lines 506 - 519, The
Task.detached block that calls
KeychainService.shared.deletePaddleOCRCredentials() currently swallows errors
with try?, change it to await the call inside a do-catch (or use Task { } with
try and catch) and log any caught error instead of ignoring it; update the block
that references KeychainService.shared.deletePaddleOCRCredentials(), paddleOCR
settings reset, and Task.detached to perform do { try await
KeychainService.shared.deletePaddleOCRCredentials() } catch { <log the error via
the project's logger or os_log/print with context> } so deletion failures are
recorded.
ScreenTranslate/Features/Settings/SettingsViewModel.swift (1)

869-896: MLX-VLM 服务状态检查的 HTTP 状态码判断可能过于宽松。

当前逻辑将所有 statusCode < 500 都视为服务运行中,这包括了 4xx 客户端错误(如 400 Bad Request、404 Not Found)。对于健康检查,通常只有 2xx 状态码才表示服务正常运行。

♻️ 建议收紧状态码判断
                 if let httpResponse = response as? HTTPURLResponse {
-                    isRunning = httpResponse.statusCode < 500
+                    // 2xx 或 3xx 表示服务可用
+                    isRunning = (200..<400).contains(httpResponse.statusCode)
                 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift` around lines 869 -
896, The health-check treats any HTTP statusCode < 500 as "running", which
incorrectly includes 4xx errors; update the Task.detached block that uses
paddleOCRMLXVLMServerURL to consider the server running only for 2xx responses
(e.g., check if let httpResponse = response as? HTTPURLResponse { isRunning =
(200...299).contains(httpResponse.statusCode) }), leaving the MainActor updates
to isMLXVLMServerRunning and isCheckingMLXVLMServer unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@ScreenTranslate/Features/Capture/CaptureManager.swift`:
- Around line 91-93: Unify the permission-check logic by changing
ScreenDetector.hasPermission (and the other similar check around the second
occurrence) to use the same SCShareableContent.current-based approach as
CaptureManager.hasPermission instead of CGPreflightScreenCaptureAccess(); update
the method to perform the async query to SCShareableContent.current, mirror the
same success/failure handling and return/error semantics used by
CaptureManager.hasPermission, and ensure any callers awaiting permission are
updated to the async pattern.

In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift`:
- Around line 869-896: The health-check treats any HTTP statusCode < 500 as
"running", which incorrectly includes 4xx errors; update the Task.detached block
that uses paddleOCRMLXVLMServerURL to consider the server running only for 2xx
responses (e.g., check if let httpResponse = response as? HTTPURLResponse {
isRunning = (200...299).contains(httpResponse.statusCode) }), leaving the
MainActor updates to isMLXVLMServerRunning and isCheckingMLXVLMServer unchanged.

In `@ScreenTranslate/Models/AppSettings.swift`:
- Around line 506-519: The Task.detached block that calls
KeychainService.shared.deletePaddleOCRCredentials() currently swallows errors
with try?, change it to await the call inside a do-catch (or use Task { } with
try and catch) and log any caught error instead of ignoring it; update the block
that references KeychainService.shared.deletePaddleOCRCredentials(), paddleOCR
settings reset, and Task.detached to perform do { try await
KeychainService.shared.deletePaddleOCRCredentials() } catch { <log the error via
the project's logger or os_log/print with context> } so deletion failures are
recorded.

In `@ScreenTranslate/Services/PaddleOCREngine.swift`:
- Around line 615-617: The line content = content.replacingOccurrences(of: "[]",
with: "[]") in PaddleOCREngine (near the block handling "edge case of empty
nested arrays") is a no-op and should be removed; update the surrounding code in
the same function so the early-return still occurs for truly empty content (keep
the comment and the return content statement if needed) and delete only the
redundant replacingOccurrences call to avoid misleading code.
- Around line 264-270: Expand any leading tilde in config.localVLModelDir before
adding it to args; replace uses of config.localVLModelDir in the PaddleOCREngine
argument-building branch (the block that sets "--vl_rec_model_dir") with an
expanded path (e.g., use NSString(...).expandingTildeInPath or URL-based
expansion) so the final argument passes a full filesystem path rather than a "~"
form.

In `@ScreenTranslate/Services/PaddleOCRVLMProvider.swift`:
- Around line 53-71: The hardcoded error message in analyze is not localized;
replace the literal install prompt passed to
VLMProviderError.invalidConfiguration with a localized string (e.g. use
NSLocalizedString or a Localization helper) and add a localization key like
"PaddleOCRNotInstalled" that contains the install instruction ("Install it
using: pip3 install paddleocr paddlepaddle") so
PaddleOCREngine.shared.isAvailable, analyze, and the thrown
VLMProviderError.invalidConfiguration all use the localized message.
- Around line 232-244: The isCJKChar(_:) function currently checks common CJK
blocks but omits extended ranges and fullwidth punctuation; update isCJKChar to
include additional Unicode scalar ranges such as CJK Symbols and Punctuation
(U+3000–U+303F), Fullwidth Forms (U+FF00–U+FFEF), and the CJK Extension planes
(U+20000–U+2FA1F) so characters from extensions B–F and fullwidth/japanese
punctuation are detected; adjust logic in isCJKChar to test these ranges against
char.unicodeScalars (handling scalars > 0xFFFF correctly) and return true when
any of these ranges match.

In `@ScreenTranslate/Services/Security/KeychainService.swift`:
- Around line 322-370: The savePaddleOCRCredentials implementation duplicates
the keychain save logic in saveCredentials(apiKey:forCompatibleId:); extract the
shared logic into a private helper (e.g.,
saveCredentialsInternal(apiKey:account:label:)) that accepts the apiKey, account
string (use Self.paddleOCRAccount or compatibleId) and a descriptive label,
perform the JSON encoding, SecItemCopyMatching/SecItemUpdate/SecItemAdd handling
and KeychainError handling there, and then have savePaddleOCRCredentials and
saveCredentials(apiKey:forCompatibleId:) simply call that helper; keep
references to StoredCredentials, KeychainError, and the SecItem APIs unchanged.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c5eb8dd and 2e1c0d1.

📒 Files selected for processing (13)
  • ScreenTranslate/Features/Capture/CaptureManager.swift
  • ScreenTranslate/Features/Onboarding/OnboardingViewModel.swift
  • ScreenTranslate/Features/Settings/EngineSettingsTab.swift
  • ScreenTranslate/Features/Settings/SettingsViewModel.swift
  • ScreenTranslate/Features/TranslationFlow/TranslationFlowController.swift
  • ScreenTranslate/Models/AppSettings.swift
  • ScreenTranslate/Models/VLMProviderType.swift
  • ScreenTranslate/Resources/en.lproj/Localizable.strings
  • ScreenTranslate/Resources/zh-Hans.lproj/Localizable.strings
  • ScreenTranslate/Services/PaddleOCREngine.swift
  • ScreenTranslate/Services/PaddleOCRVLMProvider.swift
  • ScreenTranslate/Services/ScreenCoderEngine.swift
  • ScreenTranslate/Services/Security/KeychainService.swift

- Fix ScreenDetector.hasPermission to use SCShareableContent.current
- Fix health-check to only accept 2xx responses (not <500)
- Add error logging for keychain deletion failures
- Remove no-op replacingOccurrences in PaddleOCREngine
- Add tilde expansion for localVLModelDir path
- Localize PaddleOCR not installed error message
- Extend isCJKChar with CJK punctuation, fullwidth forms, extensions B-F
- Extract saveCredentialsInternal helper to eliminate duplicate keychain logic

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (1)
ScreenTranslate/Features/Capture/ScreenDetector.swift (1)

128-145: cachedPermissionStatus 被写入但从未被读取。

方法设置了 cachedPermissionStatus 的值,但该缓存从未在后续权限检查中被使用。如果缓存是有意设计的,应在执行检查前先读取缓存值;如果不需要缓存,建议移除该属性以避免产生死代码。

♻️ 建议:使用缓存或移除该属性

方案 1:移除未使用的缓存属性(如果不需要缓存)

-    /// Cached permission status - only check once to avoid repeated dialogs
-    private var cachedPermissionStatus: Bool?

并移除方法中所有对 cachedPermissionStatus 的赋值。

方案 2:实际使用缓存(如果需要避免重复检查)

 func hasPermission(silent: Bool = true) async -> Bool {
+    // Return cached result if available
+    if let cached = cachedPermissionStatus {
+        if !silent { print("[ScreenDetector] Permission check: using cached result \(cached)") }
+        return cached
+    }
+
     // Quick check first using CGPreflightScreenCaptureAccess
     guard CGPreflightScreenCaptureAccess() else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Features/Capture/ScreenDetector.swift` around lines 128 -
145, cachedPermissionStatus 在 hasPermission(_:silent:) 中被写入但未读取,导致死代码; either
remove the cachedPermissionStatus property and its assignments from
ScreenDetector and leave hasPermission to always perform the checks, or actually
use the cache by reading cachedPermissionStatus at the start of hasPermission
(e.g. if let cached = cachedPermissionStatus { return cached }) and only perform
CGPreflightScreenCaptureAccess() / try await SCShareableContent.current when
cache is nil, then assign cachedPermissionStatus = true/false before returning;
update references to the symbols cachedPermissionStatus, hasPermission,
CGPreflightScreenCaptureAccess, and SCShareableContent.current accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.omd/project-memory.json:
- Line 4: The projectMemory file currently contains a hardcoded local absolute
path in the projectRoot key which exposes a developer's local filesystem; update
the .omd/project-memory.json to remove or neutralize that sensitive value by
replacing the absolute path stored in the "projectRoot" key with a non-sensitive
relative path or a placeholder (e.g. "<PROJECT_ROOT>") and ensure this file is
not committed for local-specific values (add .omd/project-memory.json to
.gitignore or switch to templated/configured generation at runtime); also scrub
the repository history or create a new commit that removes the leaked path so
the sensitive data is not retained in the repo.
- Around line 153-175: The .omd project-memory hotPaths contains incorrect
"type" values (entries where "path" points to specific Swift source files like
AppSettings.swift, PaddleOCRVLMProvider.swift, SettingsViewModel.swift are
marked "directory" instead of "file"); update those JSON entries under the
"hotPaths" array to set "type":"file" (or regenerate the project-memory with the
.omd tool so types are correct), then remove the generated project-memory from
version control and add the .omd/project-memory.json pattern to .gitignore so
this unreliable auto-generated data is not committed again.
- Around line 1-244: The .omd/project-memory.json file is generated metadata and
must not be tracked; remove .omd/ from the PR and stop tracking these files by
adding the .omd/ directory to .gitignore and removing the tracked file from the
index, then commit the change; reference the tracked file
".omd/project-memory.json" and ensure .gitignore contains an entry for ".omd/"
(consistent with other ignored dirs like .claude/ .specify/ .ralph-tui/) so
future autogenerated .omd files are not committed.

In `@ScreenTranslate/Models/AppSettings.swift`:
- Around line 318-331: The didSet for paddleOCRCloudAPIKey currently always
spawns a save Task.detached which races with separate delete Tasks; modify the
didSet on paddleOCRCloudAPIKey to treat an empty capturedKey as a delete
operation (call the KeychainService.deletePaddleOCRCredentials or equivalent)
and non-empty as save (call KeychainService.shared.savePaddleOCRCredentials), so
all persistence/removal logic is centralized in the paddleOCRCloudAPIKey didSet
(using Task.detached for the single operation), and remove any separate delete
Task calls (e.g., in resetToDefaults) so assignments like paddleOCRCloudAPIKey =
"" go through this unified path.

In `@ScreenTranslate/Resources/en.lproj/Localizable.strings`:
- Line 606: Add the missing compatibility localization keys so the app doesn't
fall back to raw keys: add entries for "error.paddleocr.not.installed" and
"error.paddleocr.recovery" with the same value as "error.paddleocr.notInstalled"
(the keys to update/insert are error.paddleocr.notInstalled,
error.paddleocr.not.installed, and error.paddleocr.recovery) so any code reading
either key will get the localized message.

In `@ScreenTranslate/Resources/zh-Hans.lproj/Localizable.strings`:
- Line 606: The zh-Hans localization uses the key error.paddleocr.notInstalled
but the codebase still references error.paddleocr.not.installed (and the
.recovery variant), causing missing translations; update the Localizable.strings
entry to preserve the original keys for compatibility by adding/keeping entries
for error.paddleocr.not.installed and error.paddleocr.not.installed.recovery
(with the same translated value as error.paddleocr.notInstalled) so both key
forms resolve to the correct Chinese message.

In `@ScreenTranslate/Services/PaddleOCREngine.swift`:
- Around line 48-55: The new cloud configuration (useCloud, cloudBaseURL,
cloudAPIKey) is exposed but recognize still unconditionally checks local
installation and returns notInstalled; update recognize to branch on useCloud:
when useCloud == true skip the local-install check and call the cloud
recognition path using cloudBaseURL/cloudAPIKey, otherwise keep the existing
local flow (including the local availability check and local OCR calls). Ensure
error handling and return types remain the same so cloud errors propagate as
before.
- Around line 264-271: The code appends a user-controlled expandedPath (from
config.localVLModelDir) into args and later joins args into a shell string,
which breaks paths with spaces and exposes command-injection risk; modify the
caller to execute the subprocess with an argument array (no shell interpolation)
and ensure PaddleOCREngine's use of args (and the code constructing it,
including the expandedPath variable and the "--vl_rec_model_dir" entry) is
passed directly to the process execution API that accepts [String] rather than
being joined into a single shell command string so arguments are not re-parsed
by a shell.

---

Nitpick comments:
In `@ScreenTranslate/Features/Capture/ScreenDetector.swift`:
- Around line 128-145: cachedPermissionStatus 在 hasPermission(_:silent:)
中被写入但未读取,导致死代码; either remove the cachedPermissionStatus property and its
assignments from ScreenDetector and leave hasPermission to always perform the
checks, or actually use the cache by reading cachedPermissionStatus at the start
of hasPermission (e.g. if let cached = cachedPermissionStatus { return cached })
and only perform CGPreflightScreenCaptureAccess() / try await
SCShareableContent.current when cache is nil, then assign cachedPermissionStatus
= true/false before returning; update references to the symbols
cachedPermissionStatus, hasPermission, CGPreflightScreenCaptureAccess, and
SCShareableContent.current accordingly.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e1c0d1 and 7a6f100.

📒 Files selected for processing (10)
  • .omd/project-memory.json
  • .omd/sessions/0c203f54-c10d-4417-8115-005c18e9036b.json
  • ScreenTranslate/Features/Capture/ScreenDetector.swift
  • ScreenTranslate/Features/Settings/SettingsViewModel.swift
  • ScreenTranslate/Models/AppSettings.swift
  • ScreenTranslate/Resources/en.lproj/Localizable.strings
  • ScreenTranslate/Resources/zh-Hans.lproj/Localizable.strings
  • ScreenTranslate/Services/PaddleOCREngine.swift
  • ScreenTranslate/Services/PaddleOCRVLMProvider.swift
  • ScreenTranslate/Services/Security/KeychainService.swift
✅ Files skipped from review due to trivial changes (1)
  • .omd/sessions/0c203f54-c10d-4417-8115-005c18e9036b.json

Comment thread .omd/project-memory.json
Comment on lines +1 to +244
{
"version": "1.0.0",
"lastScanned": 1772269310504,
"projectRoot": "/Users/hubo/.superset/worktrees/screentranslate/featadd-translate-engine",
"techStack": {
"languages": [],
"frameworks": [],
"packageManager": null,
"runtime": null
},
"build": {
"buildCommand": null,
"testCommand": null,
"lintCommand": null,
"devCommand": null,
"scripts": {}
},
"conventions": {
"namingStyle": null,
"importStyle": null,
"testPattern": null,
"fileOrganization": null
},
"structure": {
"isMonorepo": false,
"workspaces": [],
"mainDirectories": [
"docs"
],
"gitBranches": {
"defaultBranch": "main",
"branchingStrategy": null
}
},
"customNotes": [],
"directoryMap": {
"Build": {
"path": "Build",
"purpose": "Build output",
"fileCount": 1,
"lastAccessed": 1772269310491,
"keyFiles": []
},
"ScreenTranslate": {
"path": "ScreenTranslate",
"purpose": null,
"fileCount": 0,
"lastAccessed": 1772269310492,
"keyFiles": []
},
"ScreenTranslate.xcodeproj": {
"path": "ScreenTranslate.xcodeproj",
"purpose": null,
"fileCount": 1,
"lastAccessed": 1772269310492,
"keyFiles": [
"project.pbxproj"
]
},
"ScreenTranslateTests": {
"path": "ScreenTranslateTests",
"purpose": null,
"fileCount": 5,
"lastAccessed": 1772269310493,
"keyFiles": [
"KeyboardShortcutTests.swift",
"README.md",
"ScreenTranslateErrorTests.swift",
"ShortcutRecordingTypeTests.swift",
"TextTranslationErrorTests.swift"
]
},
"docs": {
"path": "docs",
"purpose": "Documentation",
"fileCount": 6,
"lastAccessed": 1772269310493,
"keyFiles": [
"README.md",
"api-reference.md",
"architecture.md",
"components.md",
"developer-guide.md"
]
},
"skills": {
"path": "skills",
"purpose": null,
"fileCount": 0,
"lastAccessed": 1772269310494,
"keyFiles": []
},
"tasks": {
"path": "tasks",
"purpose": null,
"fileCount": 6,
"lastAccessed": 1772269310494,
"keyFiles": [
"prd-.md",
"prd-macos-screentranslate.md",
"prd-screencoder-kiss-translator.md",
"prd-screencoder.md",
"prd-text-translation.json"
]
},
"ScreenTranslate/App": {
"path": "ScreenTranslate/App",
"purpose": "Application code",
"fileCount": 2,
"lastAccessed": 1772269310495,
"keyFiles": [
"AppDelegate.swift",
"ScreenTranslateApp.swift"
]
},
"ScreenTranslate/Models": {
"path": "ScreenTranslate/Models",
"purpose": "Data models",
"fileCount": 23,
"lastAccessed": 1772269310495,
"keyFiles": [
"Annotation.swift",
"AppLanguage.swift",
"AppSettings.swift"
]
},
"ScreenTranslate/Services": {
"path": "ScreenTranslate/Services",
"purpose": "Business logic services",
"fileCount": 26,
"lastAccessed": 1772269310495,
"keyFiles": [
"AccessibilityPermissionChecker.swift",
"AppleTranslationProvider.swift",
"ClaudeVLMProvider.swift"
]
}
},
"hotPaths": [
{
"path": "ScreenTranslate/Services/PaddleOCREngine.swift",
"accessCount": 17,
"lastAccessed": 1772277198204,
"type": "file"
},
{
"path": "ScreenTranslate/Services/Security/KeychainService.swift",
"accessCount": 14,
"lastAccessed": 1772277575721,
"type": "file"
},
{
"path": "ScreenTranslate/Models/AppSettings.swift",
"accessCount": 13,
"lastAccessed": 1772277135092,
"type": "directory"
},
{
"path": "ScreenTranslate/Resources/en.lproj/Localizable.strings",
"accessCount": 6,
"lastAccessed": 1772277251555,
"type": "file"
},
{
"path": "ScreenTranslate/Services/PaddleOCRVLMProvider.swift",
"accessCount": 6,
"lastAccessed": 1772277354512,
"type": "directory"
},
{
"path": "ScreenTranslate/Features/Settings/SettingsViewModel.swift",
"accessCount": 5,
"lastAccessed": 1772277092578,
"type": "directory"
},
{
"path": "ScreenTranslate/Resources",
"accessCount": 4,
"lastAccessed": 1772271181502,
"type": "directory"
},
{
"path": "ScreenTranslate/Features/Settings/EngineSettingsTab.swift",
"accessCount": 4,
"lastAccessed": 1772274724758,
"type": "directory"
},
{
"path": "ScreenTranslate/Resources/zh-Hans.lproj/Localizable.strings",
"accessCount": 4,
"lastAccessed": 1772277283269,
"type": "file"
},
{
"path": "ScreenTranslate/Features/TranslationFlow/TranslationFlowController.swift",
"accessCount": 3,
"lastAccessed": 1772270086752,
"type": "file"
},
{
"path": "ScreenTranslate",
"accessCount": 3,
"lastAccessed": 1772271133277,
"type": "directory"
},
{
"path": "ScreenTranslate/Features/Capture/ScreenDetector.swift",
"accessCount": 3,
"lastAccessed": 1772277067534,
"type": "directory"
},
{
"path": "ScreenTranslate/Features/Capture/CaptureManager.swift",
"accessCount": 2,
"lastAccessed": 1772277030873,
"type": "directory"
},
{
"path": "",
"accessCount": 1,
"lastAccessed": 1772269949125,
"type": "directory"
},
{
"path": "ScreenTranslate/Errors/ScreenTranslateError.swift",
"accessCount": 1,
"lastAccessed": 1772269960584,
"type": "file"
},
{
"path": "ScreenTranslate/Models/VLMProviderType.swift",
"accessCount": 1,
"lastAccessed": 1772269960585,
"type": "file"
},
{
"path": "ScreenTranslate/Services/ScreenCoderEngine.swift",
"accessCount": 1,
"lastAccessed": 1772270035460,
"type": "file"
}
],
"userDirectives": []
} No newline at end of file
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if .gitignore exists and contains .omd rules
echo "=== Checking .gitignore ==="
if [ -f .gitignore ]; then
  echo ".gitignore exists. Checking for .omd rules:"
  grep -n "\.omd\|omd" .gitignore || echo "No .omd rules found"
else
  echo ".gitignore does not exist"
fi

echo -e "\n=== Checking if .omd directory exists ==="
if [ -d .omd ]; then
  echo ".omd directory exists. Contents:"
  find .omd -type f | head -20
else
  echo ".omd directory does not exist"
fi

echo -e "\n=== Checking file details ==="
if [ -f .omd/project-memory.json ]; then
  echo ".omd/project-memory.json exists"
  wc -l .omd/project-memory.json
  echo -e "\nFirst 10 lines:"
  head -10 .omd/project-memory.json
fi

Repository: hubo1989/ScreenTranslate

Length of output: 687


🏁 Script executed:

#!/bin/bash
# List all top-level files to see what's actually in the repo
echo "=== Top-level files/directories ==="
ls -la | head -20

echo -e "\n=== Check git status if in a repo ==="
if [ -d .git ]; then
  echo "Repository found. Checking git status:"
  git status --short | head -20
fi

Repository: hubo1989/ScreenTranslate

Length of output: 1355


🏁 Script executed:

#!/bin/bash
# Check if .omd files are staged or tracked in git
echo "=== Check if .omd files are tracked/staged in git ==="
git ls-files .omd 2>/dev/null || echo "No .omd files tracked in git"

echo -e "\n=== Check git status for .omd directory ==="
git status .omd 2>/dev/null || echo "Not in git tracking"

echo -e "\n=== Check .gitignore content ==="
cat .gitignore

Repository: hubo1989/ScreenTranslate

Length of output: 1409


不应将 .omd/ 目录提交到版本控制。

.omd/project-memory.json 和相关会话文件是开发工具自动生成的元数据,包含硬编码的开发者本地路径 /Users/hubo/.superset/worktrees/screentranslate/featadd-translate-engine、机器特定的访问时间戳和访问计数等开发环境信息。这类文件当前已被追踪在 git 中,会导致:

  1. 泄露开发者本地环境信息
  2. 每位开发者的访问模式不同,频繁产生合并冲突
  3. 对生产代码无任何价值,增加仓库噪音

建议将 .omd/ 添加到 .gitignore(与现有的 .claude/.specify/.ralph-tui/ 等配置一致)并从此 PR 中移除该文件:

# Development tools (Claude Code specific)
.claude/
.specify/
+ .omd/
.ralph-tui/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.omd/project-memory.json around lines 1 - 244, The .omd/project-memory.json
file is generated metadata and must not be tracked; remove .omd/ from the PR and
stop tracking these files by adding the .omd/ directory to .gitignore and
removing the tracked file from the index, then commit the change; reference the
tracked file ".omd/project-memory.json" and ensure .gitignore contains an entry
for ".omd/" (consistent with other ignored dirs like .claude/ .specify/
.ralph-tui/) so future autogenerated .omd files are not committed.

Comment thread .omd/project-memory.json
{
"version": "1.0.0",
"lastScanned": 1772269310504,
"projectRoot": "/Users/hubo/.superset/worktrees/screentranslate/featadd-translate-engine",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

硬编码的本地路径泄露了开发者环境信息。

projectRoot 包含完整的本地用户目录路径 /Users/hubo/.superset/worktrees/...,这属于敏感的开发环境信息,不应出现在公开仓库中。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.omd/project-memory.json at line 4, The projectMemory file currently
contains a hardcoded local absolute path in the projectRoot key which exposes a
developer's local filesystem; update the .omd/project-memory.json to remove or
neutralize that sensitive value by replacing the absolute path stored in the
"projectRoot" key with a non-sensitive relative path or a placeholder (e.g.
"<PROJECT_ROOT>") and ensure this file is not committed for local-specific
values (add .omd/project-memory.json to .gitignore or switch to
templated/configured generation at runtime); also scrub the repository history
or create a new commit that removes the leaked path so the sensitive data is not
retained in the repo.

Comment thread .omd/project-memory.json
Comment on lines +153 to +175
"path": "ScreenTranslate/Models/AppSettings.swift",
"accessCount": 13,
"lastAccessed": 1772277135092,
"type": "directory"
},
{
"path": "ScreenTranslate/Resources/en.lproj/Localizable.strings",
"accessCount": 6,
"lastAccessed": 1772277251555,
"type": "file"
},
{
"path": "ScreenTranslate/Services/PaddleOCRVLMProvider.swift",
"accessCount": 6,
"lastAccessed": 1772277354512,
"type": "directory"
},
{
"path": "ScreenTranslate/Features/Settings/SettingsViewModel.swift",
"accessCount": 5,
"lastAccessed": 1772277092578,
"type": "directory"
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

hotPathstype 字段存在数据不一致。

部分文件路径被标记为 "type": "directory",例如:

  • 第 157 行:AppSettings.swift"type": "directory"
  • 第 169 行:PaddleOCRVLMProvider.swift"type": "directory"
  • 第 175 行:SettingsViewModel.swift"type": "directory"

这是 .omd 工具的数据错误,进一步说明此文件不应被版本控制——它包含不可靠的自动生成数据。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.omd/project-memory.json around lines 153 - 175, The .omd project-memory
hotPaths contains incorrect "type" values (entries where "path" points to
specific Swift source files like AppSettings.swift, PaddleOCRVLMProvider.swift,
SettingsViewModel.swift are marked "directory" instead of "file"); update those
JSON entries under the "hotPaths" array to set "type":"file" (or regenerate the
project-memory with the .omd tool so types are correct), then remove the
generated project-memory from version control and add the
.omd/project-memory.json pattern to .gitignore so this unreliable auto-generated
data is not committed again.

Comment on lines +318 to +331
var paddleOCRCloudAPIKey: String {
didSet {
// Capture the value on the actor before spawning detached task
let capturedKey = paddleOCRCloudAPIKey
// Save to Keychain asynchronously
Task.detached {
do {
try await KeychainService.shared.savePaddleOCRCredentials(apiKey: capturedKey)
} catch {
Logger.settings.error("Failed to save PaddleOCR cloud API key to Keychain: \(error)")
}
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n ScreenTranslate/Models/AppSettings.swift | sed -n '315,335p'

Repository: hubo1989/ScreenTranslate

Length of output: 1042


🏁 Script executed:

cat -n ScreenTranslate/Models/AppSettings.swift | sed -n '500,525p'

Repository: hubo1989/ScreenTranslate

Length of output: 1376


🏁 Script executed:

rg -n 'class KeychainService|func savePaddleOCRCredentials|func deletePaddleOCRCredentials' --type swift

Repository: hubo1989/ScreenTranslate

Length of output: 293


🏁 Script executed:

cat -n ScreenTranslate/Services/Security/KeychainService.swift | sed -n '296,310p'

Repository: hubo1989/ScreenTranslate

Length of output: 701


🏁 Script executed:

cat -n ScreenTranslate/Services/Security/KeychainService.swift | sed -n '339,360p'

Repository: hubo1989/ScreenTranslate

Length of output: 902


🏁 Script executed:

rg -n 'actor AppSettings|class AppSettings' ScreenTranslate/Models/AppSettings.swift | head -5

Repository: hubo1989/ScreenTranslate

Length of output: 96


🏁 Script executed:

rg -n 'nonisolated' ScreenTranslate/Models/AppSettings.swift | head -10

Repository: hubo1989/ScreenTranslate

Length of output: 50


🏁 Script executed:

cat -n ScreenTranslate/Models/AppSettings.swift | sed -n '1,30p'

Repository: hubo1989/ScreenTranslate

Length of output: 1174


🏁 Script executed:

rg -n '@MainActor|nonisolated.*var paddleOCRCloudAPIKey' ScreenTranslate/Models/AppSettings.swift | head -15

Repository: hubo1989/ScreenTranslate

Length of output: 81


🏁 Script executed:

grep -n 'class KeychainService\|actor KeychainService' ScreenTranslate/Services/Security/KeychainService.swift | head -3

Repository: hubo1989/ScreenTranslate

Length of output: 94


🏁 Script executed:

cat -n ScreenTranslate/Services/Security/KeychainService.swift | sed -n '1,50p'

Repository: hubo1989/ScreenTranslate

Length of output: 1897


🏁 Script executed:

rg -n 'paddleOCRCloudAPIKey\s*=' ScreenTranslate/Models/AppSettings.swift

Repository: hubo1989/ScreenTranslate

Length of output: 256


🏁 Script executed:

cat -n ScreenTranslate/Services/Security/KeychainService.swift | sed -n '296,360p'

Repository: hubo1989/ScreenTranslate

Length of output: 2686


🏁 Script executed:

cat -n ScreenTranslate/Models/AppSettings.swift | sed -n '32,40p'

Repository: hubo1989/ScreenTranslate

Length of output: 310


🏁 Script executed:

rg -n 'loadPaddleOCRAPIKeyFromKeychain' ScreenTranslate/Models/AppSettings.swift -A 5

Repository: hubo1989/ScreenTranslate

Length of output: 931


🏁 Script executed:

cat -n ScreenTranslate/Models/AppSettings.swift | sed -n '580,600p'

Repository: hubo1989/ScreenTranslate

Length of output: 1051


🏁 Script executed:

rg -n 'Task\.detached.*paddleOCRCloudAPIKey|paddleOCRCloudAPIKey.*=' ScreenTranslate/Models/AppSettings.swift | head -20

Repository: hubo1989/ScreenTranslate

Length of output: 256


🏁 Script executed:

rg -n 'saveCredentialsInternal' ScreenTranslate/Services/Security/KeychainService.swift | head -3

Repository: hubo1989/ScreenTranslate

Length of output: 265


🏁 Script executed:

cat -n ScreenTranslate/Services/Security/KeychainService.swift | sed -n '60,120p'

Repository: hubo1989/ScreenTranslate

Length of output: 3446


🏁 Script executed:

cat -n ScreenTranslate/Models/AppSettings.swift | sed -n '495,520p'

Repository: hubo1989/ScreenTranslate

Length of output: 1236


Keychain 保存与删除存在竞态条件,可能导致空值残留

Line 510 设置 paddleOCRCloudAPIKey = "" 触发 didSet(Line 323-329),使用 Task.detached 异步保存空字符串。同时 Line 512-518 的 Task.detached 调用删除操作。两个后台任务并发执行,KeychainService actor 无法保证执行顺序:若保存在删除后完成,Keychain 会残留空值而非真正删除。

建议在 didSet 中统一处理空值作为删除操作

修复建议(统一删除和保存路径)
     var paddleOCRCloudAPIKey: String {
         didSet {
-            // Capture the value on the actor before spawning detached task
             let capturedKey = paddleOCRCloudAPIKey
-            // Save to Keychain asynchronously
             Task.detached {
                 do {
-                    try await KeychainService.shared.savePaddleOCRCredentials(apiKey: capturedKey)
+                    if capturedKey.isEmpty {
+                        try await KeychainService.shared.deletePaddleOCRCredentials()
+                    } else {
+                        try await KeychainService.shared.savePaddleOCRCredentials(apiKey: capturedKey)
+                    }
                 } catch {
                     Logger.settings.error("Failed to save PaddleOCR cloud API key to Keychain: \(error)")
                 }
             }
         }
     }
@@
-        // Delete PaddleOCR cloud API key from Keychain
-        Task.detached {
-            do {
-                try await KeychainService.shared.deletePaddleOCRCredentials()
-            } catch {
-                Logger.settings.error("Failed to delete PaddleOCR credentials from keychain: \(error.localizedDescription)")
-            }
-        }

这样在 resetToDefaults 中只需 paddleOCRCloudAPIKey = "" 一次赋值,由单一代码路径处理删除逻辑,消除竞态。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Models/AppSettings.swift` around lines 318 - 331, The didSet
for paddleOCRCloudAPIKey currently always spawns a save Task.detached which
races with separate delete Tasks; modify the didSet on paddleOCRCloudAPIKey to
treat an empty capturedKey as a delete operation (call the
KeychainService.deletePaddleOCRCredentials or equivalent) and non-empty as save
(call KeychainService.shared.savePaddleOCRCredentials), so all
persistence/removal logic is centralized in the paddleOCRCloudAPIKey didSet
(using Task.detached for the single operation), and remove any separate delete
Task calls (e.g., in resetToDefaults) so assignments like paddleOCRCloudAPIKey =
"" go through this unified path.

"settings.paddleocr.mlxVLMNotRunning" = "Not Running";
"settings.paddleocr.localVLModelDir" = "Local Model Directory";
"settings.paddleocr.localVLModelDir.hint" = "Path to local PaddleOCR-VL model (e.g. ~/.paddlex/official_models/PaddleOCR-VL-1.5)";
"error.paddleocr.notInstalled" = "PaddleOCR is not installed. Install it using: pip3 install paddleocr paddlepaddle";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

补齐兼容 key,避免 PaddleOCR 未安装提示在部分路径丢失

Line 606 新增的是 error.paddleocr.notInstalled,但代码中仍有 error.paddleocr.not.installed / .recovery 的读取路径。建议补齐兼容 key,避免回退显示原始 key。

🔧 建议修复
 "error.paddleocr.notInstalled" = "PaddleOCR is not installed. Install it using: pip3 install paddleocr paddlepaddle";
+"error.paddleocr.not.installed" = "PaddleOCR is not installed. Install it using: pip3 install paddleocr paddlepaddle";
+"error.paddleocr.not.installed.recovery" = "Install PaddleOCR first, then try again.";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Resources/en.lproj/Localizable.strings` at line 606, Add the
missing compatibility localization keys so the app doesn't fall back to raw
keys: add entries for "error.paddleocr.not.installed" and
"error.paddleocr.recovery" with the same value as "error.paddleocr.notInstalled"
(the keys to update/insert are error.paddleocr.notInstalled,
error.paddleocr.not.installed, and error.paddleocr.recovery) so any code reading
either key will get the localized message.

"settings.paddleocr.mlxVLMNotRunning" = "未运行";
"settings.paddleocr.localVLModelDir" = "本地模型目录";
"settings.paddleocr.localVLModelDir.hint" = "本地 PaddleOCR-VL 模型路径(如 ~/.paddlex/official_models/PaddleOCR-VL-1.5)";
"error.paddleocr.notInstalled" = "PaddleOCR 未安装。请使用以下命令安装:pip3 install paddleocr paddlepaddle";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

统一 PaddleOCR 未安装错误 key,避免本地化失效

Line 606 使用了 error.paddleocr.notInstalled,但当前代码里还有 error.paddleocr.not.installed(含 .recovery)路径;这会导致部分场景显示 key 字面量而不是翻译文本。建议同时保留兼容 key。

🔧 建议修复
 "error.paddleocr.notInstalled" = "PaddleOCR 未安装。请使用以下命令安装:pip3 install paddleocr paddlepaddle";
+"error.paddleocr.not.installed" = "PaddleOCR 未安装。请使用以下命令安装:pip3 install paddleocr paddlepaddle";
+"error.paddleocr.not.installed.recovery" = "请先安装 PaddleOCR 后重试。";
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"error.paddleocr.notInstalled" = "PaddleOCR 未安装。请使用以下命令安装:pip3 install paddleocr paddlepaddle";
"error.paddleocr.notInstalled" = "PaddleOCR 未安装。请使用以下命令安装:pip3 install paddleocr paddlepaddle";
"error.paddleocr.not.installed" = "PaddleOCR 未安装。请使用以下命令安装:pip3 install paddleocr paddlepaddle";
"error.paddleocr.not.installed.recovery" = "请先安装 PaddleOCR 后重试。";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Resources/zh-Hans.lproj/Localizable.strings` at line 606, The
zh-Hans localization uses the key error.paddleocr.notInstalled but the codebase
still references error.paddleocr.not.installed (and the .recovery variant),
causing missing translations; update the Localizable.strings entry to preserve
the original keys for compatibility by adding/keeping entries for
error.paddleocr.not.installed and error.paddleocr.not.installed.recovery (with
the same translated value as error.paddleocr.notInstalled) so both key forms
resolve to the correct Chinese message.

Comment on lines +48 to +55
/// Whether to use cloud API
var useCloud: Bool

/// Cloud API base URL
var cloudBaseURL: String

/// Cloud API key
var cloudAPIKey: String
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

云端模式配置已暴露,但识别流程仍被本地安装强制拦截

你在 Line 48-55 增加了云端配置,但 recognize 里仍无条件检查本地可用性(Line 144)。这会让“仅云端”场景直接报 notInstalled,功能上不可用。

🔧 建议修复
-        guard isAvailable else {
-            throw PaddleOCREngineError.notInstalled
-        }
+        if !config.useCloud {
+            guard isAvailable else {
+                throw PaddleOCREngineError.notInstalled
+            }
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Services/PaddleOCREngine.swift` around lines 48 - 55, The new
cloud configuration (useCloud, cloudBaseURL, cloudAPIKey) is exposed but
recognize still unconditionally checks local installation and returns
notInstalled; update recognize to branch on useCloud: when useCloud == true skip
the local-install check and call the cloud recognition path using
cloudBaseURL/cloudAPIKey, otherwise keep the existing local flow (including the
local availability check and local OCR calls). Ensure error handling and return
types remain the same so cloud errors propagate as before.

Comment on lines +264 to +271
} else if !config.localVLModelDir.isEmpty {
// Use native backend with local model
// Expand tilde in path (e.g., ~/.paddlex -> /Users/xxx/.paddlex)
let expandedPath = NSString(string: config.localVLModelDir).expandingTildeInPath
args += [
"--vl_rec_backend", "native",
"--vl_rec_model_dir", expandedPath
]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find the file and check its size
find . -name "PaddleOCREngine.swift" -type f

Repository: hubo1989/ScreenTranslate

Length of output: 116


🏁 Script executed:

# Read the relevant sections around lines 264-271 and 280-286
if [ -f "ScreenTranslate/Services/PaddleOCREngine.swift" ]; then
  echo "=== Lines 260-275 ===" && sed -n '260,275p' "ScreenTranslate/Services/PaddleOCREngine.swift"
  echo ""
  echo "=== Lines 280-295 ===" && sed -n '280,295p' "ScreenTranslate/Services/PaddleOCREngine.swift"
fi

Repository: hubo1989/ScreenTranslate

Length of output: 1480


🏁 Script executed:

# Search for the command execution pattern with shell concatenation
rg -n -A5 'Process\(\)' "ScreenTranslate/Services/PaddleOCREngine.swift" | head -50

Repository: hubo1989/ScreenTranslate

Length of output: 383


🏁 Script executed:

# Look for how arguments are passed and if shell is involved
rg -n -B2 -A8 'joined\(separator: " "\)|executableURL|task\.arguments' "ScreenTranslate/Services/PaddleOCREngine.swift" | head -80

Repository: hubo1989/ScreenTranslate

Length of output: 866


Shell 命令拼接导致路径解析失败与命令注入风险

第 280 行将参数数组用空格拼接进 shell 命令,第 267-271 行的用户配置路径(包含 expandedPath)会直接受影响。路径中包含空格(如 /Users/xxx/.paddlex)时会被 shell 错误分割,特殊字符可导致命令注入。

应改为直接传递参数数组,避免 shell 解析:

修复建议
-        let fullCommand = "\(executablePath) \(arguments.joined(separator: " "))"
-        Logger.ocr.info("Executing: \(fullCommand)")
+        Logger.ocr.info("Executing PaddleOCR with \(arguments.count) arguments")

         let task = Process()
-        task.executableURL = URL(fileURLWithPath: "/bin/zsh")
-        task.arguments = ["-c", fullCommand]
+        task.executableURL = URL(fileURLWithPath: executablePath)
+        task.arguments = arguments
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Services/PaddleOCREngine.swift` around lines 264 - 271, The
code appends a user-controlled expandedPath (from config.localVLModelDir) into
args and later joins args into a shell string, which breaks paths with spaces
and exposes command-injection risk; modify the caller to execute the subprocess
with an argument array (no shell interpolation) and ensure PaddleOCREngine's use
of args (and the code constructing it, including the expandedPath variable and
the "--vl_rec_model_dir" entry) is passed directly to the process execution API
that accepts [String] rather than being joined into a single shell command
string so arguments are not re-parsed by a shell.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@hubo1989 hubo1989 merged commit dc5ff5c into main Feb 28, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant