feat: add PaddleOCR as VLM provider with fast/precise modes#49
Conversation
- Add PaddleOCRVLMProvider implementing VLMProviderProtocol - Support two modes: - Fast mode: paddleocr ocr command (~1s) - Precise mode: paddleocr doc_parser VL-1.5 (~12s) - Add PaddleOCRMode enum with settings UI - Add cloud API configuration options (baseURL, API key) - Fix JSON parsing for numpy arrays and float formats - Update localization (en/zh-Hans) - Fix screen recording permission check using SCShareableContent Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (3)
📝 WalkthroughWalkthrough此 PR 引入 PaddleOCR 作为新的本地 VLM 提供者,扩展了设置/UI、模型与引擎实现,并将屏幕录制权限检查从同步改为基于 ScreenCaptureKit 的异步验证,同时保留快速同步预检方法。 Changes
Sequence Diagram(s)sequenceDiagram
participant UI as Settings UI
participant VM as SettingsViewModel
participant Settings as AppSettings
participant Provider as PaddleOCRVLMProvider
participant Engine as PaddleOCREngine
UI->>VM: 用户更改 PaddleOCR 配置或触发连接测试
VM->>Settings: 更新并持久化 paddleOCRMode / useCloud / baseURL / apiKey
Settings-->>VM: 返回新配置
alt 连接测试
UI->>VM: 请求 testPaddleOCRConnection()
VM->>Engine: 查询 PaddleOCREngine.shared.isAvailable
Engine-->>VM: 返回可用或不可用
VM-->>UI: 显示测试结果/错误
end
alt OCR 分析流程
UI->>Provider: 调用 analyze(image:)
Provider->>Provider: await isAvailable
Provider->>Settings: buildConfiguration()(MainActor)
Settings-->>Provider: 返回 PaddleOCREngine.Configuration
Provider->>Engine: 调用 recognize(image, config:)
Engine->>Engine: 根据 mode 执行 fast/precise 流程并返回 OCRResult
Engine-->>Provider: 返回 OCRResult
Provider->>Provider: 转换为 ScreenAnalysisResult(含行合并)
Provider-->>UI: 返回分析结果
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ScreenTranslate/Models/AppSettings.swift (1)
436-472:⚠️ Potential issue | 🟡 Minor缺少 PaddleOCR 设置的重置逻辑
resetToDefaults()方法未包含 PaddleOCR 相关设置的重置。当用户选择"恢复默认设置"时,PaddleOCR 的模式、云端配置等会保留原值。🐛 建议在 resetToDefaults() 中添加
parallelEngines = [.apple, .mtranServer] compatibleProviderConfigs = [] + // Reset PaddleOCR configuration + paddleOCRMode = .fast + paddleOCRUseCloud = false + paddleOCRCloudBaseURL = "" + paddleOCRCloudAPIKey = "" }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Models/AppSettings.swift` around lines 436 - 472, The resetToDefaults() method currently restores many fields but omits PaddleOCR settings; update resetToDefaults() to explicitly reset all PaddleOCR-related properties (e.g. paddleOCREnabled, paddleOCRMode, paddleOCRUseCloud, paddleOCRCloudConfig, paddleOCRModelPath, paddleOCRLanguages or similarly named properties in AppSettings) to their intended defaults (for example disable or set .local mode, clear cloud config to nil, set default model/path and default language list or thresholds) so that PaddleOCR state is fully reverted when calling resetToDefaults().
🧹 Nitpick comments (7)
ScreenTranslate/Features/Capture/CaptureManager.swift (1)
49-57: 可复用同一份SCShareableContent,避免每次截图重复异步调用Line 56 已获取一次
SCShareableContent.current,后续getSCDisplay(Line 249)又会获取一次。该重复调用会增加截图路径延迟,和文件内声明的低延迟目标有冲突风险。建议将权限校验与显示查找合并为一次内容获取并复用结果。参考改法(复用 shareable content)
- var hasPermission: Bool { - get async { - guard CGPreflightScreenCaptureAccess() else { - return false - } - do { - _ = try await SCShareableContent.current - return true - } catch { - return false - } - } - } + private func loadShareableContentIfPermitted() async throws -> SCShareableContent { + guard CGPreflightScreenCaptureAccess() else { + throw ScreenTranslateError.permissionDenied + } + return try await SCShareableContent.current + } - guard await hasPermission else { - throw ScreenTranslateError.permissionDenied - } + let scContent = try await loadShareableContentIfPermitted() - let scDisplay = try await getSCDisplay(for: display) + let scDisplay = try getSCDisplay(for: display, in: scContent)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Features/Capture/CaptureManager.swift` around lines 49 - 57, The code currently calls SCShareableContent.current twice (once in the async getter that checks permissions and again in getSCDisplay), causing redundant async work and added latency; change the logic so the permission preflight and the SCShareableContent fetch are done once and the resulting SCShareableContent is reused: in the permission-checking getter (the async get) obtain SCShareableContent.current and return a boolean based on that result, but also store or return the acquired SCShareableContent (e.g., cache it on the CaptureManager instance or refactor the getter to return the content) so that getSCDisplay uses that cached SCShareableContent instead of calling SCShareableContent.current again; update functions that currently call SCShareableContent.current (notably getSCDisplay) to accept/use the cached SCShareableContent and ensure thread-safety for the cache.ScreenTranslate/Models/AppSettings.swift (1)
6-8: 枚举原始值冗余SwiftLint 提示:当枚举 case 名称与其原始值相同时,可以省略显式赋值。
♻️ 建议修改
enum PaddleOCRMode: String, Codable, CaseIterable, Sendable { - case fast = "fast" - case precise = "precise" + case fast + case precise🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Models/AppSettings.swift` around lines 6 - 8, The enum PaddleOCRMode declares String raw values identical to each case; remove the redundant explicit assignments in the PaddleOCRMode declaration (keep "enum PaddleOCRMode: String, Codable, CaseIterable, Sendable" and change the cases to simply "case fast" and "case precise") so the compiler/sanitizers infer the raw values automatically.ScreenTranslate/Services/PaddleOCRVLMProvider.swift (2)
26-30: 避免强制解包SwiftLint 警告:
URL(string: "http://localhost")!使用了强制解包。虽然此 URL 始终有效,但可以使用更安全的写法。♻️ 建议修改
self.configuration = VLMProviderConfiguration( apiKey: "", - baseURL: URL(string: "http://localhost")!, + baseURL: URL(string: "http://localhost") ?? URL(fileURLWithPath: "/"), modelName: "paddleocr" )或者定义一个静态常量:
private static let dummyBaseURL = URL(string: "http://localhost")!🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Services/PaddleOCRVLMProvider.swift` around lines 26 - 30, The code in PaddleOCRVLMProvider sets configuration.baseURL using a forced unwrap URL(string: "http://localhost")!, which triggers SwiftLint; replace this by introducing a safe static constant (e.g., private static let dummyBaseURL = URL(string: "http://localhost")!) and use that constant when constructing VLMProviderConfiguration in the PaddleOCRVLMProvider initializer (or alternatively use a non-forced initializer pattern such as guard let url = URL(string: "http://localhost") else { fatalError(...) } and pass url), updating the references to self.configuration and VLMProviderConfiguration to use the new url variable or dummyBaseURL to avoid inline force-unwrapping.
169-188: 中日韩文本合并时的空格处理
merged(with:)方法在合并文本时使用空格分隔 (text + " " + other.text)。对于中日韩文本,词与词之间通常不需要空格,这可能导致输出不自然。建议考虑根据文本内容智能判断是否添加空格,或者在 PaddleOCR 检测到中文语言时使用不同的分隔符。
♻️ 可能的改进方向
func merged(with other: OCRText) -> MergedLine { // 检测是否为 CJK 文本(简化逻辑) let needsSpace = !isCJKText(text) || !isCJKText(other.text) let separator = needsSpace ? " " : "" let combinedText = text + separator + other.text // ... } private func isCJKText(_ text: String) -> Bool { guard let firstChar = text.first else { return false } let scalar = firstChar.unicodeScalars.first! return (0x4E00...0x9FFF).contains(scalar.value) || (0x3040...0x30FF).contains(scalar.value) || (0xAC00...0xD7AF).contains(scalar.value) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Services/PaddleOCRVLMProvider.swift` around lines 169 - 188, The merged(with:) implementation in OCRText always inserts a space between text segments which produces incorrect spacing for CJK (Chinese/Japanese/Korean) scripts; update merged(with:) to conditionally insert a separator by detecting whether each segment is CJK (e.g., add a private helper isCJKText(_:) that checks the first character's Unicode scalar ranges) and use an empty separator when both sides are CJK, otherwise use a single space; keep the existing boundingBox.union(...) and confidence weighting logic and return a MergedLine with the adjusted combinedText.ScreenTranslate/Services/PaddleOCREngine.swift (1)
48-55: 添加 TODO 注释说明云端 API 功能的计划状态
useCloud、cloudBaseURL和cloudAPIKey字段已在配置中声明并从用户设置中赋值,但在executePaddleOCR或buildArguments方法中并未实际使用。这些字段在 EngineSettingsTab 中有对应的 UI 配置,表明云端 API 支持是计划中的功能。建议在配置字段处添加 TODO 注释,明确说明何时计划实现这一功能,以及具体如何集成云端 API。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Services/PaddleOCREngine.swift` around lines 48 - 55, 在 PaddleOCREngine.swift 中声明的字段 useCloud、cloudBaseURL 和 cloudAPIKey 尚未在 executePaddleOCR 或 buildArguments 中使用,且在 EngineSettingsTab 中已有 UI 配置,需在这些字段上添加明确的 TODO 注释说明云端 API 功能为计划中的特性、预期实现时间点和大致集成方式(例如:当 useCloud 为 true 时在 buildArguments 中切换到云端请求并在 executePaddleOCR 中发起带 cloudAPIKey 的网络调用,或说明需新增 networking helper);在注释中引用 useCloud、cloudBaseURL、cloudAPIKey、executePaddleOCR、buildArguments 和 EngineSettingsTab 以便未来实现者定位。ScreenTranslate/Features/Settings/SettingsViewModel.swift (2)
897-910: Cloud 模式下的连接测试可能不完整。当前实现仅检查本地 PaddleOCR 是否可用,但当用户启用
paddleOCRUseCloud时,Cloud API 的配置(paddleOCRCloudBaseURL、paddleOCRCloudAPIKey)未被验证。建议:当云模式开启时,增加对云端 API 的连通性测试。
♻️ 建议增加云模式测试
/// Tests PaddleOCR availability private func testPaddleOCRConnection() async throws -> (success: Bool, message: String) { + // If cloud mode is enabled, test cloud API connectivity + if paddleOCRUseCloud { + guard let url = URL(string: paddleOCRCloudBaseURL), !paddleOCRCloudBaseURL.isEmpty else { + throw VLMProviderError.invalidConfiguration("Invalid cloud base URL") + } + if paddleOCRCloudAPIKey.isEmpty { + throw VLMProviderError.invalidConfiguration("Cloud API key is required") + } + // TODO: Add actual cloud API connectivity test + return (true, "PaddleOCR Cloud API configured") + } + + // Local mode: check if PaddleOCR is installed let isAvailable = await PaddleOCREngine.shared.isAvailable if isAvailable { return (true, "PaddleOCR is ready") } else { throw VLMProviderError.invalidConfiguration("PaddleOCR is not installed") } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift` around lines 897 - 910, testPaddleOCRConnection() only checks local availability (PaddleOCREngine.shared.isAvailable) and ignores paddleOCRUseCloud, paddleOCRCloudBaseURL and paddleOCRCloudAPIKey; update testPaddleOCRConnection() to detect when paddleOCRUseCloud is true and then validate paddleOCRCloudBaseURL and paddleOCRCloudAPIKey are non-empty and perform a lightweight connectivity/health request to the cloud endpoint (use the configured paddleOCRCloudBaseURL and include paddleOCRCloudAPIKey in headers/query as expected) returning (true, "PaddleOCR cloud ready") on success or throwing VLMProviderError.invalidConfiguration with a descriptive message on failure; keep the existing local check path for when paddleOCRUseCloud is false and reuse PaddleOCREngine.shared where applicable.
404-411:MainActor.run在这里是冗余的。由于
SettingsViewModel类已标记为@MainActor,且Task会继承 actor 上下文,内部的await MainActor.run {}是多余的。♻️ 建议简化
// Check screen recording permission using ScreenCaptureKit Task { let granted = await checkScreenRecordingPermission() - await MainActor.run { - self.hasScreenRecordingPermission = granted - self.isCheckingPermissions = false - } + self.hasScreenRecordingPermission = granted + self.isCheckingPermissions = false }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift` around lines 404 - 411, Remove the redundant MainActor.run block inside the Task in SettingsViewModel: since the class is annotated `@MainActor` and the Task inherits that context, directly assign the result of await checkScreenRecordingPermission() to self.hasScreenRecordingPermission and set self.isCheckingPermissions = false inside the Task body (remove await MainActor.run { ... }); this targets the Task block that calls checkScreenRecordingPermission() and updates hasScreenRecordingPermission and isCheckingPermissions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ScreenTranslate/Features/Capture/CaptureManager.swift`:
- Around line 55-60: The hasPermission method currently swallows all errors from
awaiting SCShareableContent.current and always returns false, which hides
non-permission failures; change hasPermission to rethrow the underlying error
instead of returning false for all exceptions (mirror the pattern used in
getSCDisplay), i.e., catch errors from SCShareableContent.current and throw a
ScreenTranslateError.captureFailure(underlying: error) (or only map genuine
permission-denied errors to false) so callers of hasPermission (the code paths
that interpret its result as permissionDenied) can distinguish network/system
failures from actual permission denial.
In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift`:
- Around line 414-429: The OnboardingViewModel currently uses a heuristic
(counting windows) to infer screen-recording permission which can give false
positives; replace that heuristic with the same reliable check used in
SettingsViewModel by calling the async checkScreenRecordingPermission flow (use
CGPreflightScreenCaptureAccess() then await SCShareableContent.current) or
refactor that logic into a shared helper (e.g., a new
ScreenRecordingPermissionChecker.async func) and have both OnboardingViewModel
and SettingsViewModel call the shared method (or, if Onboarding must remain
synchronous, provide a small sync wrapper that performs the same async check and
returns the result).
---
Outside diff comments:
In `@ScreenTranslate/Models/AppSettings.swift`:
- Around line 436-472: The resetToDefaults() method currently restores many
fields but omits PaddleOCR settings; update resetToDefaults() to explicitly
reset all PaddleOCR-related properties (e.g. paddleOCREnabled, paddleOCRMode,
paddleOCRUseCloud, paddleOCRCloudConfig, paddleOCRModelPath, paddleOCRLanguages
or similarly named properties in AppSettings) to their intended defaults (for
example disable or set .local mode, clear cloud config to nil, set default
model/path and default language list or thresholds) so that PaddleOCR state is
fully reverted when calling resetToDefaults().
---
Nitpick comments:
In `@ScreenTranslate/Features/Capture/CaptureManager.swift`:
- Around line 49-57: The code currently calls SCShareableContent.current twice
(once in the async getter that checks permissions and again in getSCDisplay),
causing redundant async work and added latency; change the logic so the
permission preflight and the SCShareableContent fetch are done once and the
resulting SCShareableContent is reused: in the permission-checking getter (the
async get) obtain SCShareableContent.current and return a boolean based on that
result, but also store or return the acquired SCShareableContent (e.g., cache it
on the CaptureManager instance or refactor the getter to return the content) so
that getSCDisplay uses that cached SCShareableContent instead of calling
SCShareableContent.current again; update functions that currently call
SCShareableContent.current (notably getSCDisplay) to accept/use the cached
SCShareableContent and ensure thread-safety for the cache.
In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift`:
- Around line 897-910: testPaddleOCRConnection() only checks local availability
(PaddleOCREngine.shared.isAvailable) and ignores paddleOCRUseCloud,
paddleOCRCloudBaseURL and paddleOCRCloudAPIKey; update testPaddleOCRConnection()
to detect when paddleOCRUseCloud is true and then validate paddleOCRCloudBaseURL
and paddleOCRCloudAPIKey are non-empty and perform a lightweight
connectivity/health request to the cloud endpoint (use the configured
paddleOCRCloudBaseURL and include paddleOCRCloudAPIKey in headers/query as
expected) returning (true, "PaddleOCR cloud ready") on success or throwing
VLMProviderError.invalidConfiguration with a descriptive message on failure;
keep the existing local check path for when paddleOCRUseCloud is false and reuse
PaddleOCREngine.shared where applicable.
- Around line 404-411: Remove the redundant MainActor.run block inside the Task
in SettingsViewModel: since the class is annotated `@MainActor` and the Task
inherits that context, directly assign the result of await
checkScreenRecordingPermission() to self.hasScreenRecordingPermission and set
self.isCheckingPermissions = false inside the Task body (remove await
MainActor.run { ... }); this targets the Task block that calls
checkScreenRecordingPermission() and updates hasScreenRecordingPermission and
isCheckingPermissions.
In `@ScreenTranslate/Models/AppSettings.swift`:
- Around line 6-8: The enum PaddleOCRMode declares String raw values identical
to each case; remove the redundant explicit assignments in the PaddleOCRMode
declaration (keep "enum PaddleOCRMode: String, Codable, CaseIterable, Sendable"
and change the cases to simply "case fast" and "case precise") so the
compiler/sanitizers infer the raw values automatically.
In `@ScreenTranslate/Services/PaddleOCREngine.swift`:
- Around line 48-55: 在 PaddleOCREngine.swift 中声明的字段 useCloud、cloudBaseURL 和
cloudAPIKey 尚未在 executePaddleOCR 或 buildArguments 中使用,且在 EngineSettingsTab 中已有
UI 配置,需在这些字段上添加明确的 TODO 注释说明云端 API 功能为计划中的特性、预期实现时间点和大致集成方式(例如:当 useCloud 为 true
时在 buildArguments 中切换到云端请求并在 executePaddleOCR 中发起带 cloudAPIKey 的网络调用,或说明需新增
networking helper);在注释中引用
useCloud、cloudBaseURL、cloudAPIKey、executePaddleOCR、buildArguments 和
EngineSettingsTab 以便未来实现者定位。
In `@ScreenTranslate/Services/PaddleOCRVLMProvider.swift`:
- Around line 26-30: The code in PaddleOCRVLMProvider sets configuration.baseURL
using a forced unwrap URL(string: "http://localhost")!, which triggers
SwiftLint; replace this by introducing a safe static constant (e.g., private
static let dummyBaseURL = URL(string: "http://localhost")!) and use that
constant when constructing VLMProviderConfiguration in the PaddleOCRVLMProvider
initializer (or alternatively use a non-forced initializer pattern such as guard
let url = URL(string: "http://localhost") else { fatalError(...) } and pass
url), updating the references to self.configuration and VLMProviderConfiguration
to use the new url variable or dummyBaseURL to avoid inline force-unwrapping.
- Around line 169-188: The merged(with:) implementation in OCRText always
inserts a space between text segments which produces incorrect spacing for CJK
(Chinese/Japanese/Korean) scripts; update merged(with:) to conditionally insert
a separator by detecting whether each segment is CJK (e.g., add a private helper
isCJKText(_:) that checks the first character's Unicode scalar ranges) and use
an empty separator when both sides are CJK, otherwise use a single space; keep
the existing boundingBox.union(...) and confidence weighting logic and return a
MergedLine with the adjusted combinedText.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
ScreenTranslate/Features/Capture/CaptureManager.swiftScreenTranslate/Features/Settings/EngineSettingsTab.swiftScreenTranslate/Features/Settings/SettingsViewModel.swiftScreenTranslate/Models/AppSettings.swiftScreenTranslate/Models/VLMProviderType.swiftScreenTranslate/Resources/en.lproj/Localizable.stringsScreenTranslate/Resources/zh-Hans.lproj/Localizable.stringsScreenTranslate/Services/PaddleOCREngine.swiftScreenTranslate/Services/PaddleOCRVLMProvider.swiftScreenTranslate/Services/ScreenCoderEngine.swift
| do { | ||
| _ = try await SCShareableContent.current | ||
| return true | ||
| } catch { | ||
| return false | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n ScreenTranslate/Features/Capture/CaptureManager.swift | sed -n '40,70p'Repository: hubo1989/ScreenTranslate
Length of output: 1254
🏁 Script executed:
cat -n ScreenTranslate/Features/Capture/CaptureManager.swift | sed -n '85,100p'Repository: hubo1989/ScreenTranslate
Length of output: 745
🏁 Script executed:
cat -n ScreenTranslate/Features/Capture/CaptureManager.swift | sed -n '155,170p'Repository: hubo1989/ScreenTranslate
Length of output: 712
🏁 Script executed:
rg -n "hasPermission|permissionDenied|SCShareableContent\.current" ScreenTranslate/Features/Capture/CaptureManager.swiftRepository: hubo1989/ScreenTranslate
Length of output: 470
🏁 Script executed:
cat -n ScreenTranslate/Features/Capture/CaptureManager.swift | sed -n '195,260p'Repository: hubo1989/ScreenTranslate
Length of output: 3136
🏁 Script executed:
rg -n "getSCDisplay|throw|catch" ScreenTranslate/Features/Capture/CaptureManager.swift | head -20Repository: hubo1989/ScreenTranslate
Length of output: 1374
🏁 Script executed:
rg -n "ScreenTranslateError" ScreenTranslate/Features/Capture/CaptureManager.swiftRepository: hubo1989/ScreenTranslate
Length of output: 1103
hasPermission 将所有异常统一映射为 false,掩盖实际错误信息
第 55-60 行的 catch 块捕获 SCShareableContent.current 的所有异常并无区别地返回 false,导致第 92 行和 161 行的调用方都将其解释为权限拒绝(permissionDenied)。这会将网络错误、系统级故障等非权限问题误报为权限问题。
对比 getSCDisplay 方法(第 246-258 行)的做法,它在同一调用上使用了正确的错误处理模式:throw ScreenTranslateError.captureFailure(underlying: error),保留底层错误信息供调用方判断。建议 hasPermission 也按此模式优化,区分权限错误和其他故障,避免错误信息丢失。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ScreenTranslate/Features/Capture/CaptureManager.swift` around lines 55 - 60,
The hasPermission method currently swallows all errors from awaiting
SCShareableContent.current and always returns false, which hides non-permission
failures; change hasPermission to rethrow the underlying error instead of
returning false for all exceptions (mirror the pattern used in getSCDisplay),
i.e., catch errors from SCShareableContent.current and throw a
ScreenTranslateError.captureFailure(underlying: error) (or only map genuine
permission-denied errors to false) so callers of hasPermission (the code paths
that interpret its result as permissionDenied) can distinguish network/system
failures from actual permission denial.
- Remove redundant raw values from PaddleOCRMode enum - Add PaddleOCR settings reset in resetToDefaults() - Remove redundant MainActor.run wrapper in SettingsViewModel - Add static defaultBaseURL constant to avoid force unwrap - Add CJK-aware separator in merged(with:) for proper spacing - Replace window counting heuristic with async SCShareableContent check in OnboardingViewModel Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift`:
- Around line 901-907: testPaddleOCRConnection() currently always checks local
availability (PaddleOCREngine.shared.isAvailable) and throws "not installed",
which ignores the paddleOCRUseCloud path; update the function to first check the
paddleOCRUseCloud flag and, if true, validate the cloud configuration (call the
existing cloud validation/test helper or perform the cloud API/auth check) and
return success if cloud is reachable, otherwise return/throw a clear
VLMProviderError.invalidConfiguration with cloud-specific message; only when
paddleOCRUseCloud is false (or cloud check fails and fallback intended) then
check PaddleOCREngine.shared.isAvailable and return the local-ready result or
the existing invalidConfiguration("PaddleOCR is not installed") error.
- Around line 404-409: checkPermissions() currently spawns untracked Tasks which
can overlap and overwrite state; change it to mirror startPermissionCheck() by
using the existing permissionCheckTask property: if permissionCheckTask exists
cancel it, then assign a new Task to permissionCheckTask that awaits
checkScreenRecordingPermission(), updates hasScreenRecordingPermission and
isCheckingPermissions, and clears permissionCheckTask when done; ensure you
cancel the previous task before creating a new one so repeated calls don’t
produce racey state updates.
In `@ScreenTranslate/Models/AppSettings.swift`:
- Around line 311-314: The paddleOCRCloudAPIKey is currently persisted in
UserDefaults (via save and Keys.paddleOCRCloudAPIKey) which stores the
credential in plaintext; change storage to the system Keychain API (or a
Keychain wrapper used in the project) instead of save/UserDefaults, update the
paddleOCRCloudAPIKey getter/setter to read/write from the Keychain (use the same
key identifier Keys.paddleOCRCloudAPIKey), and ensure any settings reset/clear
method in AppSettings also deletes the Keychain entry so the credential is
removed when settings are reset.
In `@ScreenTranslate/Services/PaddleOCRVLMProvider.swift`:
- Around line 44-50: The analyze(image:) function currently checks
PaddleOCREngine.shared.isAvailable before reading configuration, which blocks
useCloud=true paths; change the flow to first build/read the provider
configuration (determine useCloud), and only perform the
PaddleOCREngine.shared.isAvailable guard when the config indicates local mode
(useCloud == false); apply the same change to the other availability check in
this file so local-only availability is enforced only for local mode.
- Around line 181-185: The current weightedConfidence calculation can divide by
zero when text.count + other.text.count == 0; add a guard so if totalLength == 0
you assign a safe default (e.g., 0.0) to weightedConfidence instead of
performing the division, otherwise compute the weighted average as before using
totalLength, confidence and other.confidence (variables: totalLength, text,
other.text, weightedConfidence, confidence, other.confidence).
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
ScreenTranslate/Features/Onboarding/OnboardingViewModel.swiftScreenTranslate/Features/Settings/SettingsViewModel.swiftScreenTranslate/Models/AppSettings.swiftScreenTranslate/Services/PaddleOCRVLMProvider.swift
✅ Files skipped from review due to trivial changes (1)
- ScreenTranslate/Features/Onboarding/OnboardingViewModel.swift
- testPaddleOCRConnection: check cloud mode first with URL/API validation - checkPermissions: use permissionCheckTask to avoid race conditions - paddleOCRCloudAPIKey: store in Keychain instead of UserDefaults - analyze(image:): check PaddleOCREngine availability only for local mode - isAvailable: mode-aware check (cloud checks URL, local checks installation) - weightedConfidence: handle divide-by-zero edge case Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
ScreenTranslate/Services/Security/KeychainService.swift (1)
316-401: 建议提取重复的账户标识符为常量。
"paddleocr_cloud"字符串在三个方法中重复出现(第 317、369、401 行)。如果将来需要修改此标识符,可能会遗漏某处。♻️ 建议的重构
+ /// Account identifier for PaddleOCR cloud credentials + private let paddleOCRAccount = "paddleocr_cloud" + func savePaddleOCRCredentials(apiKey: String) throws { - let account = "paddleocr_cloud" + let account = paddleOCRAccount ... } func getPaddleOCRCredentials() -> String? { - let account = "paddleocr_cloud" + let account = paddleOCRAccount ... } func deletePaddleOCRCredentials() throws { - let account = "paddleocr_cloud" + let account = paddleOCRAccount ... }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Services/Security/KeychainService.swift` around lines 316 - 401, The literal "paddleocr_cloud" is duplicated across savePaddleOCRCredentials, getPaddleOCRCredentials, and deletePaddleOCRCredentials; extract it into a single constant (e.g. private let paddleOCRAccount = "paddleocr_cloud" or a static property on KeychainService) and replace each use of the string with that constant to ensure a single source of truth and easier future changes.ScreenTranslate/Services/PaddleOCRVLMProvider.swift (1)
211-217: CJK 检测逻辑可考虑优化。当前
separator方法检测每个文本片段的首字符来判断是否为 CJK。对于混合内容(如 "Hello世界"),检测首字符可能不够准确。更精确的做法是检查第一段的末字符和第二段的首字符。不过对于 OCR 文本合并的场景,当前实现已足够实用。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Services/PaddleOCRVLMProvider.swift` around lines 211 - 217, The separator(for:and:) currently determines CJK-ness by inspecting each string (which in the code used first-character checks); change it to inspect the last character of the first string and the first character of the second string instead (use isCJKText on those characters) so mixed-content cases like "Hello世界" are handled more accurately; update separator(for:and:) to compute firstLast = first.last (as String/Character) and secondFirst = second.first and call isCJKText(firstLast) and isCJKText(secondFirst) to decide between "" and " ".ScreenTranslate/Models/AppSettings.swift (1)
538-560: 将硬编码的 Keychain 标识符提取为共享常量。服务标识符
"com.screentranslate.credentials"和账户标识符"paddleocr_cloud"在此处硬编码(第 539-540 行),与KeychainService中的定义重复。如果KeychainService的值更新,此处不会同步,可能导致凭据读取失败。建议将这些常量定义在共享位置(如
KeychainConstants或类似的配置文件),供AppSettings和KeychainService同时使用。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Models/AppSettings.swift` around lines 538 - 560, The hard-coded Keychain identifiers in loadPaddleOCRAPIKeyFromKeychain() (service "com.screentranslate.credentials" and account "paddleocr_cloud") must be replaced with shared constants to avoid duplication with KeychainService; create or use an existing KeychainConstants (or similar) exposing SERVICE and ACCOUNT (or specific names like PADDLE_OCR_SERVICE / PADDLE_OCR_ACCOUNT), import/extend that constant container, and update loadPaddleOCRAPIKeyFromKeychain() to reference those constants instead of string literals so both AppSettings and KeychainService use the same source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ScreenTranslate/Models/AppSettings.swift`:
- Around line 314-323: AppSettings is `@MainActor-isolated` so reading
self.paddleOCRCloudAPIKey inside Task.detached is a concurrency violation;
capture the value on the actor before spawning the detached task and pass that
captured value into the closure, then call
KeychainService.shared.savePaddleOCRCredentials(apiKey: capturedValue) inside
the Task.detached and handle errors as before (references: paddleOCRCloudAPIKey,
AppSettings, Task.detached, KeychainService.shared.savePaddleOCRCredentials).
---
Nitpick comments:
In `@ScreenTranslate/Models/AppSettings.swift`:
- Around line 538-560: The hard-coded Keychain identifiers in
loadPaddleOCRAPIKeyFromKeychain() (service "com.screentranslate.credentials" and
account "paddleocr_cloud") must be replaced with shared constants to avoid
duplication with KeychainService; create or use an existing KeychainConstants
(or similar) exposing SERVICE and ACCOUNT (or specific names like
PADDLE_OCR_SERVICE / PADDLE_OCR_ACCOUNT), import/extend that constant container,
and update loadPaddleOCRAPIKeyFromKeychain() to reference those constants
instead of string literals so both AppSettings and KeychainService use the same
source of truth.
In `@ScreenTranslate/Services/PaddleOCRVLMProvider.swift`:
- Around line 211-217: The separator(for:and:) currently determines CJK-ness by
inspecting each string (which in the code used first-character checks); change
it to inspect the last character of the first string and the first character of
the second string instead (use isCJKText on those characters) so mixed-content
cases like "Hello世界" are handled more accurately; update separator(for:and:) to
compute firstLast = first.last (as String/Character) and secondFirst =
second.first and call isCJKText(firstLast) and isCJKText(secondFirst) to decide
between "" and " ".
In `@ScreenTranslate/Services/Security/KeychainService.swift`:
- Around line 316-401: The literal "paddleocr_cloud" is duplicated across
savePaddleOCRCredentials, getPaddleOCRCredentials, and
deletePaddleOCRCredentials; extract it into a single constant (e.g. private let
paddleOCRAccount = "paddleocr_cloud" or a static property on KeychainService)
and replace each use of the string with that constant to ensure a single source
of truth and easier future changes.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
ScreenTranslate/Features/Settings/SettingsViewModel.swiftScreenTranslate/Models/AppSettings.swiftScreenTranslate/Services/PaddleOCRVLMProvider.swiftScreenTranslate/Services/Security/KeychainService.swift
- Fix concurrency violation: capture paddleOCRCloudAPIKey before Task.detached - Extract Keychain constants: add serviceIdentifier and paddleOCRAccount to KeychainService - Fix CJK separator: check last char of first string and first char of second string - Update AppSettings to use shared KeychainService constants Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…nd (#50) * feat: add PaddleOCR as VLM provider with fast/precise modes - Add PaddleOCRVLMProvider implementing VLMProviderProtocol - Support two modes: - Fast mode: paddleocr ocr command (~1s) - Precise mode: paddleocr doc_parser VL-1.5 (~12s) - Add PaddleOCRMode enum with settings UI - Add cloud API configuration options (baseURL, API key) - Fix JSON parsing for numpy arrays and float formats - Update localization (en/zh-Hans) - Fix screen recording permission check using SCShareableContent Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com> * fix: address code review feedback on PR #49 - Remove redundant raw values from PaddleOCRMode enum - Add PaddleOCR settings reset in resetToDefaults() - Remove redundant MainActor.run wrapper in SettingsViewModel - Add static defaultBaseURL constant to avoid force unwrap - Add CJK-aware separator in merged(with:) for proper spacing - Replace window counting heuristic with async SCShareableContent check in OnboardingViewModel Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com> * fix: address second round code review feedback - testPaddleOCRConnection: check cloud mode first with URL/API validation - checkPermissions: use permissionCheckTask to avoid race conditions - paddleOCRCloudAPIKey: store in Keychain instead of UserDefaults - analyze(image:): check PaddleOCREngine availability only for local mode - isAvailable: mode-aware check (cloud checks URL, local checks installation) - weightedConfidence: handle divide-by-zero edge case Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com> * fix: address third round code review feedback - Fix concurrency violation: capture paddleOCRCloudAPIKey before Task.detached - Extract Keychain constants: add serviceIdentifier and paddleOCRAccount to KeychainService - Fix CJK separator: check last char of first string and first char of second string - Update AppSettings to use shared KeychainService constants Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com> * feat: add MLX-VLM inference framework support for Apple Silicon - Add paddleOCRUseMLXVLM, paddleOCRMLXVLMServerURL, paddleOCRMLXVLMModelName settings - Update PaddleOCREngine.Configuration with MLX-VLM fields - Add --vl_rec_backend, --vl_rec_server_url, --vl_rec_api_model_name args - Add UI settings (visible when precise mode and not using cloud) - Add English and Chinese localizations Usage: 1. Install: pip install "mlx-vlm>=0.3.11" 2. Start server: mlx_vlm.server --port 8111 3. Enable in Settings > PaddleOCR > Use MLX-VLM Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com> * feat: add MLX-VLM server status detection - Add isMLXVLMServerRunning and isCheckingMLXVLMServer state - Add checkMLXVLMServerStatus() method to verify server connectivity - Add UI with status indicator and refresh button - Add English and Chinese localizations for status messages Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com> * fix: auto-check MLX-VLM server status on settings load - Add MLX-VLM check in refreshPaddleOCRStatus() - Add onAppear handler in PaddleOCRStatusSection Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com> * fix: correct localization key names for error messages Keys should not include format specifier %@ Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com> * feat: add local VL model directory support for PaddleOCR native backend - Add localVLModelDir configuration to PaddleOCREngine.Configuration - Support native backend with --vl_rec_model_dir when MLX-VLM is disabled - Add UI input field for local model directory (shows when MLX-VLM unchecked in precise mode) - Fix error message to show correct model info based on provider type - Fix localization key from error.ocr.recognition.failed to error.ocr.failed - Add localization strings for local model directory in EN/CN Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com> * refactor: address code review nitpicks for PaddleOCR integration - Fix ScreenDetector.hasPermission to use SCShareableContent.current - Fix health-check to only accept 2xx responses (not <500) - Add error logging for keychain deletion failures - Remove no-op replacingOccurrences in PaddleOCREngine - Add tilde expansion for localVLModelDir path - Localize PaddleOCR not installed error message - Extend isCJKChar with CJK punctuation, fullwidth forms, extensions B-F - Extract saveCredentialsInternal helper to eliminate duplicate keychain logic Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com> --------- Co-authored-by: Hubert <hubo@HubertdeMacBook-Pro.local> Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Summary
paddleocr ocrcommand (~1s)paddleocr doc_parser VL-1.5(~12s, higher accuracy)Changes
PaddleOCRVLMProvider.swift: New VLM provider implementationPaddleOCREngine.swift: Support for dual modes and cloud APIAppSettings.swift: PaddleOCRMode enum and settingsEngineSettingsTab.swift: Settings UI with mode pickerLocalizable.strings: English and Chinese translationsTest Plan
Summary by CodeRabbit
发布说明
新功能
改进
本地化