
Add cloud/local mode switching for GLM OCR #67

Merged
hubo1989 merged 11 commits into main from fixlanguage_error
Mar 20, 2026

Conversation


@hubo1989 hubo1989 commented Mar 20, 2026

Summary

  • keep GLM OCR as a single provider with cloud and local MLX-VLM modes
  • route cloud mode to layout_parsing and local mode to OpenAI-compatible chat/completions
  • add settings persistence, UI mode switching, connectivity checks, and tests for both response formats
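
The cloud/local routing in the bullets above can be sketched as follows. GLMOCRMode and the endpoint paths come from this PR's description, but the function shape is a hypothetical simplification, not the real provider's API:

```swift
import Foundation

// Hypothetical sketch of the mode-to-endpoint routing; the real
// GLMOCRVLMProvider builds full POST requests rather than bare URLs.
enum GLMOCRMode: String {
    case cloud
    case local
}

func requestURL(for mode: GLMOCRMode, baseURL: URL) -> URL {
    switch mode {
    case .cloud:
        // Cloud mode talks to the layout_parsing endpoint.
        return baseURL.appendingPathComponent("layout_parsing")
    case .local:
        // Local MLX-VLM mode uses an OpenAI-compatible chat endpoint.
        return baseURL.appendingPathComponent("chat").appendingPathComponent("completions")
    }
}
```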

Testing

  • xcodebuild build -project ScreenTranslate.xcodeproj -scheme ScreenTranslate -destination 'platform=macOS'
  • xcodebuild test -project ScreenTranslate.xcodeproj -scheme ScreenTranslate -destination 'platform=macOS' -only-testing:ScreenTranslateTests/GLMOCRVLMProviderTests

Summary by CodeRabbit

  • New Features

    • Added GLM OCR as a new VLM provider with cloud/local modes, mode-aware settings, and connectivity checks
    • Added OCR fallback recovery during the analysis stage to improve text-extraction robustness
    • Settings now show mode-specific API key hints and provider descriptions; automatic language detection now passes an empty source language when set to "Auto"
  • Localization

    • Added multi-language UI strings for GLM OCR and fixed several translation key names
  • Tests

    • Added unit tests for GLM OCR parsing and fallback logic
  • Bug Fixes

    • Improved Chinese locale matching (bare "zh" is now treated as Simplified Chinese)
    • Adjusted translation filtering to no longer fall back to the original result when no text remains


coderabbitai bot commented Mar 20, 2026

📝 Walkthrough


Introduces support for a GLM-OCR VLM provider (cloud/local), including the provider implementation, settings persistence and UI, connection testing, an OCR recovery flow, model/localization updates, unit test coverage, and integration with the existing VLM routing/defaults.

Changes

Cohort / File(s) / Summary

GLM-OCR provider implementation
ScreenTranslate/Services/GLMOCRVLMProvider.swift
Adds GLMOCRVLMProvider: supports cloud/local modes; builds POST requests, executes network calls, parses layout/chat responses, falls back between JSON and plain text, normalizes bounding boxes, and maps errors.

VLM types and routing
ScreenTranslate/Models/VLMProviderType.swift, ScreenTranslate/Services/ScreenCoderEngine.swift
Adds a glmOCR case to VLMProviderType; overloads providerDescription/defaultBaseURL/defaultModelName/requiresAPIKey to accept GLMOCRMode; incorporates glmOCRMode into provider creation, cache hashing, and default resolution.

App settings and persistence / settings UI
ScreenTranslate/Models/AppSettings.swift, ScreenTranslate/Features/Settings/SettingsViewModel.swift, ScreenTranslate/Features/Settings/EngineSettingsTab.swift
Adds GLMOCRMode (cloud/local) with matching localization keys; AppSettings gains mode-specific persistence keys and read/write logic; SettingsViewModel exposes glmOCRMode, currentVLMRequiresAPIKey, and currentVLMProviderDescription; the settings UI shows a mode picker and adjusts the API-key text and descriptions per mode.

Analysis/translation flow and OCR fallback
ScreenTranslate/Features/TranslationFlow/TranslationFlowController.swift, ScreenTranslate/Models/ScreenAnalysisResult.swift
Adds TextSegment.isLikelyOCRPromptLeakage and containsOnlyPromptLeakage plus filtering based on them; filteredForTranslation() now filters both noise and prompt leakage (no longer falling back on empty results); adds recoverAnalysisResultIfNeeded, which invokes the OCR fallback to replace the analysis result when it contains only leakage.

Language / localization
ScreenTranslate/Models/AppLanguage.swift, ScreenTranslate/Resources/*/Localizable.strings
Recognizes generic zh as Simplified Chinese, adds multi-language GLM-OCR localization entries, and fixes several textTranslation.error.translationFailed key names.

Connection testing and translation service
ScreenTranslate/Services/TranslationService.swift, ScreenTranslate/Features/Settings/SettingsViewModel.swift
Improves testConnection error handling/logging; SettingsViewModel's VLM test path dispatches to a GLM-OCR-specific connection test function based on glmOCRMode.

Unit tests
ScreenTranslateTests/GLMOCRVLMProviderTests.swift, ScreenTranslateTests/TranslationPipelineRegressionTests.swift
Adds GLM-OCR response parsing and local-parsing tests; adds regression tests for the prompt-leakage heuristic and OCR recovery behavior.
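
The prompt-leakage heuristic named in the flow cohort above can be illustrated with a minimal sketch. The marker strings and the simplified TextSegment type are assumptions, not the PR's actual implementation:

```swift
import Foundation

// Hypothetical sketch of isLikelyOCRPromptLeakage / containsOnlyPromptLeakage:
// flag segments that look like the model echoing its prompt or JSON schema
// instead of on-screen text.
struct TextSegment {
    let text: String

    var isLikelyOCRPromptLeakage: Bool {
        let lowered = text.lowercased()
        // Schema/prompt fragments a VLM sometimes emits verbatim (assumed list).
        let markers = ["\"segments\"", "\"boundingbox\"", "return only valid json", "```"]
        return markers.contains { lowered.contains($0) }
    }
}

func containsOnlyPromptLeakage(_ segments: [TextSegment]) -> Bool {
    // An empty result is "no text", not leakage; only all-leaky results qualify.
    !segments.isEmpty && segments.allSatisfy { $0.isLikelyOCRPromptLeakage }
}
```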

Sequence Diagram(s)

sequenceDiagram
    participant UI as User/Settings UI
    participant SettingsVM as SettingsViewModel
    participant AppSettings as AppSettings
    participant ScreenCoderEngine as ScreenCoderEngine
    participant GLMOCRProvider as GLMOCRVLMProvider
    participant Network as Network

    UI->>SettingsVM: Switch glmOCRMode (cloud ↔ local)
    SettingsVM->>AppSettings: Persist glmOCRMode
    AppSettings-->>SettingsVM: Return resolved baseURL/modelName
    UI->>SettingsVM: Trigger "Test VLM connection"
    SettingsVM->>ScreenCoderEngine: testVLMAPI()
    ScreenCoderEngine->>GLMOCRProvider: Create GLMOCRVLMProvider (with mode)
    GLMOCRProvider->>Network: Send request (/layout_parsing or /chat/completions)
    alt cloud
        Network-->>GLMOCRProvider: Return layout_parsing response
        GLMOCRProvider->>GLMOCRProvider: Parse layout -> ScreenAnalysisResult
    else local
        Network-->>GLMOCRProvider: Return chat response
        GLMOCRProvider->>GLMOCRProvider: Extract/parse JSON or plain-text fallback -> ScreenAnalysisResult
    end
    GLMOCRProvider-->>ScreenCoderEngine: Return success/failure
    ScreenCoderEngine-->>SettingsVM: Forward test result
sequenceDiagram
    participant TranslationFlow as TranslationFlowController
    participant ScreenCoderEngine as ScreenCoderEngine
    participant Analysis as ScreenAnalysisResult
    participant OCRService as OCRService

    TranslationFlow->>ScreenCoderEngine: analyze(image)
    ScreenCoderEngine-->>TranslationFlow: analysisResult
    TranslationFlow->>Analysis: containsOnlyPromptLeakage?
    alt only prompt leakage
        TranslationFlow->>OCRService: recognize(image) (OCR fallback)
        OCRService-->>TranslationFlow: ocrResult
        TranslationFlow->>Analysis: init(ocrResult) -> recoveredResult
        TranslationFlow->>TranslationFlow: Continue with recoveredResult
    else has valid text
        TranslationFlow->>TranslationFlow: Use original analysisResult
    end
    TranslationFlow->>Analysis: filteredForTranslation()
    Analysis-->>TranslationFlow: filteredResult (may be empty -> throws)
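
The recovery branch in the second diagram can be condensed into a small helper. This is a synchronous, simplified stand-in for the PR's recoverAnalysisResultIfNeeded; AnalysisResult and its fields are hypothetical types, not the project's real API:

```swift
// Simplified stand-in types for illustration only.
struct AnalysisResult {
    var texts: [String]
    var looksLikePromptLeakage: Bool
}

// Run the OCR fallback only when the VLM result is pure prompt/schema
// leakage; otherwise keep the original analysis untouched.
func recoverAnalysisResultIfNeeded(
    _ result: AnalysisResult,
    ocrFallback: () throws -> AnalysisResult
) rethrows -> AnalysisResult {
    guard result.looksLikePromptLeakage else { return result }
    return try ocrFallback()
}
```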

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Poem

🐰 In the cloud, or local beneath a bare branch,
Characters hop like the gleam of carrots,
I sniff out the layout's secrets and noise,
Hop by hop, turning leakage into text,
GLM-OCR, this rabbit applauds you.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.99%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and accurately summarizes the core change: adding cloud/local mode switching for GLM OCR.



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 537c337f13

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1075 to +1081
case .cloud:
    request = try GLMOCRVLMProvider.makeLayoutParsingRequest(
        baseURL: baseURL,
        apiKey: apiKey,
        modelName: modelName,
        fileDataURI: GLMOCRVLMProvider.connectionTestImageDataURI,
        timeout: 10

P2: Avoid charging users when they press the GLM cloud test button

In cloud mode, testGLMOCRConnection builds a real layout_parsing request with connectionTestImageDataURI instead of using a lightweight capability check. That means every click on “Test API” performs an OCR inference, so users who are just validating credentials can still consume quota, incur cost, or hit rate limits; the other providers’ tests only probe metadata endpoints.
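
A lightweight capability check along the lines this comment suggests might look like the sketch below. The /models path and header conventions mirror common OpenAI-compatible APIs and are assumptions, not GLM's documented test endpoint:

```swift
import Foundation

// Hypothetical probe request: validates reachability and credentials via a
// metadata endpoint, with no image payload and no inference cost.
func makeCapabilityProbeRequest(baseURL: URL, apiKey: String) -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("models"))
    request.httpMethod = "GET"  // no OCR inference is triggered
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.timeoutInterval = 10
    return request
}
```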


Comment on lines +1104 to +1108
if let modelsResponse = try? decoder.decode(GLMOCRLocalModelsResponse.self, from: data),
   !modelsResponse.data.contains(where: { $0.id == modelName }) {
    let availableModels = modelsResponse.data.map(\.id).joined(separator: ", ")
    let message = availableModels.isEmpty ? "No models reported by local server." : "Available models: \(availableModels)"
    return (true, message)

P2: Fail the local GLM test when the configured model is missing

If the local server responds to /models but does not advertise modelName, this path still returns success: true, so the settings UI shows a green success state even though GLMOCRVLMProvider.analyze will later send that unavailable model to /chat/completions and fail on first use. This affects any local setup where the loaded model ID differs from the saved setting.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (7)
ScreenTranslate/Models/AppLanguage.swift (1)

74-91: Consider handling zh language codes without a script/region.

In the current implementation, a bare zh identifier does not match any language case because:

  • it fails the special Simplified Chinese check (which requires a zh-hans prefix or an exact zh-cn/zh-sg match)
  • it also misses the generic match (which checks zh-hans)

Some systems may report only zh as the preferred language, which would cause an unexpected fallback to English.

♻️ Optional fix
 static func from(localeIdentifier: String) -> AppLanguage? {
     let normalizedIdentifier = localeIdentifier.replacingOccurrences(of: "_", with: "-").lowercased()

     if normalizedIdentifier.hasPrefix("zh-hans")
         || normalizedIdentifier == "zh-cn"
-        || normalizedIdentifier == "zh-sg" {
+        || normalizedIdentifier == "zh-sg"
+        || normalizedIdentifier == "zh" {
         return .simplifiedChinese
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Models/AppLanguage.swift` around lines 74 - 91, The
from(localeIdentifier:) matcher currently skips plain "zh" so add a branch that
treats "zh" (after normalizing via normalizedIdentifier) as simplified Chinese;
specifically, update the conditional that returns .simplifiedChinese to also
check normalizedIdentifier == "zh" (in addition to the existing zh-hans / zh-cn
/ zh-sg checks), so the function (and symbols normalizedIdentifier,
from(localeIdentifier:), .simplifiedChinese, allCases, .system) will correctly
map a bare "zh" locale to simplified Chinese.
ScreenTranslate/Services/ScreenCoderEngine.swift (1)

173-188: Consider merging the multiple MainActor.run calls to improve performance.

The configurationHash() method currently makes five separate await MainActor.run calls, each of which incurs an actor hop. Consider merging them into a single call that returns a tuple of all the needed values.

♻️ Suggested optimization
 private func configurationHash() async -> Int {
-    let settings = await MainActor.run { AppSettings.shared }
-    
-    let providerType = await MainActor.run { settings.vlmProvider }
-    let apiKey = await MainActor.run { settings.vlmAPIKey }
-    let baseURL = await MainActor.run { settings.vlmBaseURL }
-    let modelName = await MainActor.run { settings.vlmModelName }
-    let glmOCRMode = await MainActor.run { settings.glmOCRMode }
+    let (providerType, apiKey, baseURL, modelName, glmOCRMode) = await MainActor.run {
+        let settings = AppSettings.shared
+        return (settings.vlmProvider, settings.vlmAPIKey, settings.vlmBaseURL, settings.vlmModelName, settings.glmOCRMode)
+    }
     
     var hasher = Hasher()
     hasher.combine(providerType)
     hasher.combine(glmOCRMode)
     hasher.combine(apiKey)
     hasher.combine(baseURL)
     hasher.combine(modelName)
     return hasher.finalize()
 }

Note: the createProvider(for:) method (lines 132-137) has a similar optimization opportunity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Services/ScreenCoderEngine.swift` around lines 173 - 188, The
multiple await MainActor.run hops in configurationHash() (reading
AppSettings.shared.vlmProvider, vlmAPIKey, vlmBaseURL, vlmModelName, glmOCRMode)
should be collapsed into a single await MainActor.run that reads and returns all
needed values (e.g., a tuple or small struct) and then use those locals to feed
the Hasher; do the same optimization in createProvider(for:) by reading all
required AppSettings.shared properties with one MainActor.run call to avoid
repeated actor hops.
ScreenTranslate/Services/GLMOCRVLMProvider.swift (2)

131-136: Remove the trailing commas from the collection literals.

SwiftLint flags the trailing commas on lines 133 and 135.

♻️ Fix the trailing commas
                     .text(VLMPromptTemplate.localModelUserPrompt + "\nReturn only valid JSON."),
-                        .imageURL(GLMOCRLocalImageURL(url: fileDataURI)),
-                    ])
+                        .imageURL(GLMOCRLocalImageURL(url: fileDataURI))
+                    ]
                 ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift` around lines 131 - 136,
Remove the trailing commas inside the array passed to .vision: specifically drop
the comma after .text(VLMPromptTemplate.localModelUserPrompt + "\nReturn only
valid JSON.") and the comma after .imageURL(GLMOCRLocalImageURL(url:
fileDataURI)) so the collection literal has no trailing commas; update the array
items in the .vision([ ... ]) call accordingly.

17-20: Static URL constants use force unwrapping.

Although these URL strings are valid literals and force unwrapping is safe here, SwiftLint flags it. To silence the warning you could do the following:

♻️ Optional improvement
-    static let defaultBaseURL = URL(string: "https://open.bigmodel.cn/api/paas/v4")!
-    static let defaultModel = "glm-ocr"
-    static let defaultLocalBaseURL = URL(string: "http://127.0.0.1:18081")!
+    // swiftlint:disable:next force_unwrapping
+    static let defaultBaseURL = URL(string: "https://open.bigmodel.cn/api/paas/v4")!
+    static let defaultModel = "glm-ocr"
+    // swiftlint:disable:next force_unwrapping
+    static let defaultLocalBaseURL = URL(string: "http://127.0.0.1:18081")!
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift` around lines 17 - 20,
Replace the force-unwrapped URL initializers with safe, non-optional
initializers using a closure that validates URL(string:) and calls fatalError on
failure; specifically update static let defaultBaseURL and static let
defaultLocalBaseURL (and any similar constants like defaultModel stays
unchanged) to use a closure form such as: static let defaultBaseURL: URL = {
guard let url = URL(string: "https://open.bigmodel.cn/api/paas/v4") else {
fatalError("Invalid URL literal for defaultBaseURL") } return url }() and do the
same pattern for defaultLocalBaseURL to satisfy SwiftLint without changing
runtime behavior.
ScreenTranslateTests/TranslationPipelineRegressionTests.swift (1)

34-35: Remove the trailing commas from the collection literals.

SwiftLint flags the trailing commas on lines 34 and 49. They do not affect behavior, but removing them keeps the code style consistent.

♻️ Fix the trailing commas
                     boundingBox: .zero,
                     confidence: 0.99
-                ),
+                )
             ],

The same applies to line 49:

                     confidence: 0.94
-                ),
+                )
             ],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslateTests/TranslationPipelineRegressionTests.swift` around lines
34 - 35, two collection literals contain trailing commas (the two locations
reported by SwiftLint); remove the trailing commas from those array/dictionary
literals to match the code style, then run SwiftLint and the tests to confirm
there are no formatting warnings or build problems.
ScreenTranslate/Services/TranslationService.swift (1)

404-409: Using try? in testConnection swallows all error information.

When resolvedProvider(for:) fails, the error is logged, but the caller cannot tell whether the connection test failed because of misconfiguration, a network problem, or something else. This hurts the user experience: users only see that the test failed without knowing why.

Consider preserving more detailed error information for debugging or user display:

♻️ Optional improvement
-        guard let provider = try? await resolvedProvider(for: engine) else {
-            logger.error("Failed to resolve provider for \(engine.rawValue)")
+        let provider: any TranslationProvider
+        do {
+            provider = try await resolvedProvider(for: engine)
+        } catch {
+            logger.error("Failed to resolve provider for \(engine.rawValue): \(error.localizedDescription)")
             return false
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Services/TranslationService.swift` around lines 404 - 409,
The current testConnection implementation swallows errors by using try? when
calling resolvedProvider(for:), losing error details; change it to use a do {
let provider = try await resolvedProvider(for: engine); return await
provider.checkConnection() } catch { logger.error("Failed to resolve provider
for \(engine.rawValue): \(error)"); either rethrow the error (make
testConnection throw) or return a Result/throwable failure so callers get the
real error information — update the testConnection signature and callers
accordingly, and ensure resolvedProvider(for:) and provider.checkConnection()
errors are preserved and logged.
ScreenTranslate/Features/Settings/SettingsViewModel.swift (1)

1102-1109: Returning success when the model is missing in local mode may confuse users.

When the configured model name does not exist on the local server, the method returns success = true along with the list of available models. The list is useful, but reporting "success" can mislead users into thinking the configuration is correct.

Consider returning a warning status or a clearer message when the model is not found:

♻️ Suggested improvement
             if mode == .local {
                 let decoder = JSONDecoder()
                 if let modelsResponse = try? decoder.decode(GLMOCRLocalModelsResponse.self, from: data),
                    !modelsResponse.data.contains(where: { $0.id == modelName }) {
                     let availableModels = modelsResponse.data.map(\.id).joined(separator: ", ")
                     let message = availableModels.isEmpty ? "No models reported by local server." : "Available models: \(availableModels)"
-                    return (true, message)
+                    return (false, "Model '\(modelName)' not found. \(message)")
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift` around lines 1102
- 1109, The current logic in SettingsViewModel (in the local mode branch
decoding GLMOCRLocalModelsResponse) returns (true, message) when the configured
modelName is not found, which is misleading; change it to indicate failure or a
warning instead of success — e.g., return (false, "Model '\(modelName)' not
found. " + (availableModels.isEmpty ? "No models reported by local server." :
"Available models: \(availableModels)")) or update the method to return a
distinct status/warning enum and a clear message so callers can distinguish
"model missing" from a successful validation; update any callers of this method
to handle the new false/warning status accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ScreenTranslate/Models/AppSettings.swift`:
- Line 131: The persistence keys for GLM OCR endpoints/models are not separated
by mode, so switching glmOCRMode causes cloud/local settings to overwrite each
other; update AppSettings to define and use distinct keys (e.g., add keys like
glmCloudBaseURL, glmCloudModelName and glmLocalBaseURL, glmLocalModelName) and
change the getters/setters that currently use vlmBaseURL/vlmModelName to
read/write the mode-specific keys based on glmOCRMode; also update the restore
logic referenced in SettingsViewModel (around the restore/default backfill) to
fetch/backfill from the correct mode-specific keys so cloud and local
endpoints/models persist independently.

In `@ScreenTranslate/Models/ScreenAnalysisResult.swift`:
- Around line 202-206: The filteredForTranslation() implementation in
ScreenAnalysisResult currently falls back to the original segments when
filtering yields no results; remove that fallback so it returns an empty
ScreenAnalysisResult(segments: [], imageSize: imageSize) when all segments are
classified as noise, allowing TranslationFlowController (which expects an empty
result to throw noTextFound) to work correctly—if false positives are a concern,
refine the noise detection predicate (isLikelyTranslationNoise) instead of
restoring the original segments.

In `@ScreenTranslate/Resources/de.lproj/Localizable.strings`:
- Line 640: The German description for the string key
"vlm.provider.glmocr.description" still contains the English phrase "layout
parsing"; replace it in the value with a suitable German wording (e.g.
"Layout-Analyse" or "Layout-Parsing", depending on project convention) so the
whole description reads consistently in German, and store the updated string
under the key "vlm.provider.glmocr.description".

In `@ScreenTranslate/Resources/it.lproj/Localizable.strings`:
- Line 640: The Italian translation for the key
"vlm.provider.glmocr.description" contains the untranslated English phrase
"layout parsing"; replace it with a fully localized Italian phrase (e.g.,
"analisi del layout" or another consistent term used by adjacent provider
descriptions) so the value reads entirely in Italian and matches the style of
neighboring provider strings; update the value for
"vlm.provider.glmocr.description" accordingly in Localizable.strings.

In `@ScreenTranslateTests/GLMOCRVLMProviderTests.swift`:
- Around line 56-59: Add a safety check before indexing result.segments: assert
the expected count (e.g., XCTAssertEqual(result.segments.count, 2)) and then
safely unwrap the elements (use guard with return or XCTUnwrap) before accessing
segments[0] / segments[1]; update the assertions that call assertRect on
segments[0].boundingBox and segments[1].boundingBox to use the safely unwrapped
variables so an earlier assertion failure prevents an index-out-of-bounds crash
in GLMOCRVLMProviderTests.swift.

---

Nitpick comments:
In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift`:
- Around line 1102-1109: The current logic in SettingsViewModel (in the local
mode branch decoding GLMOCRLocalModelsResponse) returns (true, message) when the
configured modelName is not found, which is misleading; change it to indicate
failure or a warning instead of success — e.g., return (false, "Model
'\(modelName)' not found. " + (availableModels.isEmpty ? "No models reported by
local server." : "Available models: \(availableModels)")) or update the method
to return a distinct status/warning enum and a clear message so callers can
distinguish "model missing" from a successful validation; update any callers of
this method to handle the new false/warning status accordingly.

In `@ScreenTranslate/Models/AppLanguage.swift`:
- Around line 74-91: The from(localeIdentifier:) matcher currently skips plain
"zh" so add a branch that treats "zh" (after normalizing via
normalizedIdentifier) as simplified Chinese; specifically, update the
conditional that returns .simplifiedChinese to also check normalizedIdentifier
== "zh" (in addition to the existing zh-hans / zh-cn / zh-sg checks), so the
function (and symbols normalizedIdentifier, from(localeIdentifier:),
.simplifiedChinese, allCases, .system) will correctly map a bare "zh" locale to
simplified Chinese.

In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift`:
- Around line 131-136: Remove the trailing commas inside the array passed to
.vision: specifically drop the comma after
.text(VLMPromptTemplate.localModelUserPrompt + "\nReturn only valid JSON.") and
the comma after .imageURL(GLMOCRLocalImageURL(url: fileDataURI)) so the
collection literal has no trailing commas; update the array items in the
.vision([ ... ]) call accordingly.
- Around line 17-20: Replace the force-unwrapped URL initializers with safe,
non-optional initializers using a closure that validates URL(string:) and calls
fatalError on failure; specifically update static let defaultBaseURL and static
let defaultLocalBaseURL (and any similar constants like defaultModel stays
unchanged) to use a closure form such as: static let defaultBaseURL: URL = {
guard let url = URL(string: "https://open.bigmodel.cn/api/paas/v4") else {
fatalError("Invalid URL literal for defaultBaseURL") } return url }() and do the
same pattern for defaultLocalBaseURL to satisfy SwiftLint without changing
runtime behavior.

In `@ScreenTranslate/Services/ScreenCoderEngine.swift`:
- Around line 173-188: The multiple await MainActor.run hops in
configurationHash() (reading AppSettings.shared.vlmProvider, vlmAPIKey,
vlmBaseURL, vlmModelName, glmOCRMode) should be collapsed into a single await
MainActor.run that reads and returns all needed values (e.g., a tuple or small
struct) and then use those locals to feed the Hasher; do the same optimization
in createProvider(for:) by reading all required AppSettings.shared properties
with one MainActor.run call to avoid repeated actor hops.

In `@ScreenTranslate/Services/TranslationService.swift`:
- Around line 404-409: The current testConnection implementation swallows errors
by using try? when calling resolvedProvider(for:), losing error details; change
it to use a do { let provider = try await resolvedProvider(for: engine); return
await provider.checkConnection() } catch { logger.error("Failed to resolve
provider for \(engine.rawValue): \(error)"); either rethrow the error (make
testConnection throw) or return a Result/throwable failure so callers get the
real error information — update the testConnection signature and callers
accordingly, and ensure resolvedProvider(for:) and provider.checkConnection()
errors are preserved and logged.

In `@ScreenTranslateTests/TranslationPipelineRegressionTests.swift`:
- Around line 34-35: Two collection literals in the test file contain trailing
commas (the two locations reported by SwiftLint); remove the trailing commas
from those array/dictionary literals to match the code style, then run
SwiftLint and the tests to confirm there are no formatting warnings or build
problems.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2e72d1ab-a57a-4921-8861-3fd06333a02d

📥 Commits

Reviewing files that changed from the base of the PR and between bd835d2 and 537c337.

📒 Files selected for processing (32)
  • ScreenTranslate.xcodeproj/project.pbxproj
  • ScreenTranslate/Features/Onboarding/OnboardingViewModel.swift
  • ScreenTranslate/Features/Settings/EngineSettingsTab.swift
  • ScreenTranslate/Features/Settings/SettingsViewModel.swift
  • ScreenTranslate/Features/TextTranslation/TextTranslationPopupWindow.swift
  • ScreenTranslate/Features/TranslationFlow/TranslationFlowController.swift
  • ScreenTranslate/Models/AppLanguage.swift
  • ScreenTranslate/Models/AppSettings.swift
  • ScreenTranslate/Models/ScreenAnalysisResult.swift
  • ScreenTranslate/Models/VLMProviderType.swift
  • ScreenTranslate/Resources/de.lproj/Localizable.strings
  • ScreenTranslate/Resources/en.lproj/Localizable.strings
  • ScreenTranslate/Resources/es.lproj/Localizable.strings
  • ScreenTranslate/Resources/fr.lproj/Localizable.strings
  • ScreenTranslate/Resources/it.lproj/Localizable.strings
  • ScreenTranslate/Resources/ja.lproj/Localizable.strings
  • ScreenTranslate/Resources/ko.lproj/Localizable.strings
  • ScreenTranslate/Resources/pt.lproj/Localizable.strings
  • ScreenTranslate/Resources/ru.lproj/Localizable.strings
  • ScreenTranslate/Resources/zh-Hans.lproj/Localizable.strings
  • ScreenTranslate/Services/AppleTranslationProvider.swift
  • ScreenTranslate/Services/GLMOCRVLMProvider.swift
  • ScreenTranslate/Services/ScreenCoderEngine.swift
  • ScreenTranslate/Services/Translation/Providers/CompatibleTranslationProvider.swift
  • ScreenTranslate/Services/Translation/Providers/LLMTranslationProvider.swift
  • ScreenTranslate/Services/TranslationEngine.swift
  • ScreenTranslate/Services/TranslationService.swift
  • ScreenTranslateTests/GLMOCRVLMProviderTests.swift
  • ScreenTranslateTests/KeyboardShortcutTests.swift
  • ScreenTranslateTests/ScreenTranslateErrorTests.swift
  • ScreenTranslateTests/ShortcutRecordingTypeTests.swift
  • ScreenTranslateTests/TranslationPipelineRegressionTests.swift


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2d3676a439


Comment on lines +170 to +171
analysisResult = try await Self.recoverAnalysisResultIfNeeded(initialAnalysis) {
    try await OCRService.shared.recognize(image)

P1: Use a fixed OCR fallback for prompt-leak recovery

This only runs when recoverAnalysisResultIfNeeded detects prompt/schema leakage, but the fallback now goes through OCRService.shared.recognize, which honors the user’s selected OCR engine. If that engine is paddleOCR and PaddleOCR is unavailable, the recovery path throws engineNotAvailable and the whole translation fails instead of falling back gracefully. A GLM-specific parsing issue should not become a hard failure because of an unrelated OCR-engine setting.
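
One way to implement the fix this comment proposes is to pin the recovery path to an engine that is always available. OCREngine and its cases below are illustrative stand-ins for the project's real types, not its actual API:

```swift
// Hypothetical engine enum; Vision stands in for the built-in macOS OCR
// that is always available, PaddleOCR for the optional engine.
enum OCREngine {
    case vision
    case paddleOCR
}

// Prompt-leak recovery should not hard-fail because of an unrelated OCR
// setting: honor the user's engine only when it is actually usable.
func recoveryEngine(userChoice: OCREngine, isPaddleAvailable: Bool) -> OCREngine {
    if userChoice == .paddleOCR && !isPaddleAvailable {
        return .vision
    }
    return userChoice
}
```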


Comment on lines +368 to +372
.filter { line in
    guard !line.isEmpty else { return false }
    if line == "```" || line.hasPrefix("```") { return false }
    if ["{", "}", "[", "]"].contains(line) { return false }
    return true

P1: Filter schema tokens from GLM plain-text fallback

This fallback is entered precisely when the local GLM server did not return valid JSON. Unlike OpenAIVLMProvider.parsePlainTextResponse, it only strips code fences and lone braces, so pseudo-JSON lines like "segments": [ or "boundingBox": { are converted into real TextSegments. In mixed malformed responses, users will still get translated prompt/schema garbage overlaid on the screenshot instead of just the recovered text.
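
A stricter filter along the lines suggested here could also reject schema-key lines. The regex and marker list below sketch the idea and are assumptions, not the project's code:

```swift
import Foundation

// Hypothetical plain-text fallback filter: besides code fences and lone
// braces, drop pseudo-JSON scaffolding such as "segments": [ so malformed
// responses do not leak schema text into TextSegments.
func recoverableLines(from raw: String) -> [String] {
    // Matches lines that start with a quoted key followed by a colon.
    let schemaKey = try! NSRegularExpression(pattern: "^\"[A-Za-z]+\"\\s*:")
    return raw
        .split(separator: "\n")
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { line in
            guard !line.isEmpty else { return false }
            if line.hasPrefix("```") { return false }
            if ["{", "}", "[", "]", "},", "],"].contains(line) { return false }
            let range = NSRange(line.startIndex..., in: line)
            if schemaKey.firstMatch(in: line, range: range) != nil { return false }
            return true
        }
}
```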


Comment on lines +1113 to +1116
case 404, 405:
    if mode == .cloud {
        return (true, "GLM OCR cloud endpoint is reachable.")
    }

P2: Fail the GLM cloud test on 404/405 responses

In cloud mode the settings test probes baseURL/models, but real GLM OCR requests later go to baseURL/layout_parsing. Returning success on any 404/405 means a mistyped base URL or proxy that merely lacks /models still shows a green test result even though every actual analysis call will fail. This makes the new cloud-mode validation misleading for setup and troubleshooting.
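
A stricter interpretation of the probe's status codes, as this comment suggests, might look like the following sketch; the messages and exact handling are illustrative, not the project's implementation:

```swift
// Hypothetical status handling for the cloud connectivity probe: a 404/405
// means the probed path does not exist, so real layout_parsing calls would
// likely fail too — report failure instead of a green result.
func interpretCloudProbe(statusCode: Int) -> (success: Bool, message: String) {
    switch statusCode {
    case 200..<300:
        return (true, "GLM OCR cloud endpoint is reachable.")
    case 401, 403:
        return (false, "Endpoint reachable, but the API key was rejected.")
    case 404, 405:
        return (false, "Endpoint not found; check the base URL.")
    default:
        return (false, "Unexpected status code \(statusCode).")
    }
}
```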



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
ScreenTranslate/Features/Settings/SettingsViewModel.swift (1)

334-357: ⚠️ Potential issue | 🟠 Major

This still inherits the previous provider's custom URL / model.

shouldUpdateBaseURL / shouldUpdateModel are only backfilled when the current value is empty or equals the old provider's default. If the user previously configured a custom endpoint/model for OpenAI/Claude, switching to .glmOCR keeps that configuration; ScreenTranslate/Services/ScreenCoderEngine.swift:137-153 then creates the GLM provider with those stale values and hits the wrong endpoint/model. Switching providers should load the target provider's persisted/default configuration rather than reusing the previous provider's custom values.

💡 Minimal fix example
-            let previousProvider = settings.vlmProvider
-            let previousMode = settings.glmOCRMode
-            let shouldUpdateBaseURL = vlmBaseURL.isEmpty || vlmBaseURL == previousProvider.defaultBaseURL(glmOCRMode: previousMode)
-            let shouldUpdateModel = vlmModelName.isEmpty || vlmModelName == previousProvider.defaultModelName(glmOCRMode: previousMode)
-
             settings.vlmProvider = newValue
-
-            if shouldUpdateBaseURL {
-                if newValue == .glmOCR {
-                    vlmBaseURL = settings.storedGLMOCRBaseURL(for: settings.glmOCRMode)
-                        ?? newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode)
-                } else {
-                    vlmBaseURL = newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode)
-                }
-            }
-
-            if shouldUpdateModel {
-                if newValue == .glmOCR {
-                    vlmModelName = settings.storedGLMOCRModelName(for: settings.glmOCRMode)
-                        ?? newValue.defaultModelName(glmOCRMode: settings.glmOCRMode)
-                } else {
-                    vlmModelName = newValue.defaultModelName(glmOCRMode: settings.glmOCRMode)
-                }
-            }
+            vlmBaseURL = newValue == .glmOCR
+                ? (settings.storedGLMOCRBaseURL(for: settings.glmOCRMode)
+                    ?? newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode))
+                : newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode)
+
+            vlmModelName = newValue == .glmOCR
+                ? (settings.storedGLMOCRModelName(for: settings.glmOCRMode)
+                    ?? newValue.defaultModelName(glmOCRMode: settings.glmOCRMode))
+                : newValue.defaultModelName(glmOCRMode: settings.glmOCRMode)

If per-provider custom values need to be preserved later, consider adding per-provider persistence buckets as a follow-up.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift` around lines 334 -
357, the current logic can still carry over the previous provider's custom
vlmBaseURL/vlmModelName when switching to a new provider
(shouldUpdateBaseURL/shouldUpdateModel compare against the old provider's
defaults). Instead, when settings.vlmProvider changes, load the target
provider's persisted or default configuration directly. Specifically: in
SettingsViewModel, after settings.vlmProvider = newValue, compute the backfill
based on newValue (not previousProvider/previousMode), and when
shouldUpdateBaseURL/shouldUpdateModel are true set vlmBaseURL =
settings.storedGLMOCRBaseURL(for: settings.glmOCRMode) ??
newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode) and vlmModelName =
settings.storedGLMOCRModelName(for: settings.glmOCRMode) ??
newValue.defaultModelName(glmOCRMode: settings.glmOCRMode) (or unconditionally
overwrite with the target provider's persisted/default values), so the old
provider's custom endpoint/model is not reused (reference symbols:
settings.vlmProvider, previousProvider, shouldUpdateBaseURL, shouldUpdateModel,
vlmBaseURL, vlmModelName, storedGLMOCRBaseURL, storedGLMOCRModelName,
defaultBaseURL, defaultModelName).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift`:
- Around line 1068-1125: The connectivity check in testGLMOCRConnection is too
permissive: cloud treats 404/405 as success and local treats any 200 as success
even if /models JSON can't be decoded. Update testGLMOCRConnection to validate
the response shape for both modes (decode GLMOCRLocalModelsResponse for local
and verify expected cloud response schema) or, preferably, perform the same
lightweight probe used by GLMOCRVLMProvider (e.g., call the actual parsing
endpoint like /layout_parsing or a minimal /chat/completions request that the
provider uses) and assert the returned status/JSON matches the provider's
expected format; keep function name testGLMOCRConnection and reuse
GLMOCRVLMProvider request construction/expected response checks so the UI only
shows success when the real service will work.

In `@ScreenTranslate/Models/AppSettings.swift`:
- Around line 597-601: resetToDefaults() currently only resets
vlmBaseURL/vlmModelName and leaves glmOCRCloudBaseURL, glmOCRCloudModelName,
glmOCRLocalBaseURL, glmOCRLocalModelName untouched; update resetToDefaults() to
also clear or set those four properties back to their default values (matching
VLMProviderType.glmOCR defaults or empty as intended) so that when vlmProvider
is set to .glmOCR later, SettingsViewModel won't restore stale endpoint/model
values; locate resetToDefaults() and the properties glmOCRCloudBaseURL,
glmOCRCloudModelName, glmOCRLocalBaseURL, glmOCRLocalModelName to apply the
changes.

In `@ScreenTranslate/Models/ScreenAnalysisResult.swift`:
- Around line 202-204: The filteredForTranslation() in ScreenAnalysisResult
currently filters only segments where isLikelyTranslationNoise is true; update
it to also exclude segments flagged by isLikelyOCRPromptLeakage so prompt/schema
fragments are not passed into the translation flow (see
TranslationFlowController logic at lines ~305-316); locate the
filteredForTranslation() method and change the filter predicate to require both
!segment.isLikelyTranslationNoise and !segment.isLikelyOCRPromptLeakage so only
real text segments reach translation.

In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift`:
- Around line 164-170: The error currently embeds the raw OCR/model response
into the VLMProviderError.parsingFailed message (in the
GLMOCRLayoutParsingResponse decode catch), which can leak sensitive screenshot
text; remove the raw payload from the thrown error and only include the decoding
error details (e.g., error.localizedDescription). If you need raw data for
debugging, log it only behind a debug flag or to a restricted debug logger with
explicit redaction. Apply the same change to the other catch block referenced
(the similar decode around lines 193–199) so neither throws include raw response
content.

---

Duplicate comments:
In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift`:
- Around line 334-357: The current logic can still carry over the previous
provider's custom vlmBaseURL/vlmModelName when switching to a new provider
(shouldUpdateBaseURL/shouldUpdateModel compare against the old provider's
defaults); instead, when settings.vlmProvider changes, load the target
provider's persisted or default configuration directly. Specifically: in
SettingsViewModel, after settings.vlmProvider = newValue, compute the backfill
based on newValue (not previousProvider/previousMode), and when
shouldUpdateBaseURL/shouldUpdateModel are true set vlmBaseURL =
settings.storedGLMOCRBaseURL(for: settings.glmOCRMode) ??
newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode) and vlmModelName =
settings.storedGLMOCRModelName(for: settings.glmOCRMode) ??
newValue.defaultModelName(glmOCRMode: settings.glmOCRMode) (or unconditionally
overwrite with the target provider's persisted/default values), so the old
provider's custom endpoint/model is not reused (reference symbols:
settings.vlmProvider, previousProvider, shouldUpdateBaseURL, shouldUpdateModel,
vlmBaseURL, vlmModelName, storedGLMOCRBaseURL, storedGLMOCRModelName,
defaultBaseURL, defaultModelName).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cccc4dd1-c6ad-4931-a80e-7399d74596f0

📥 Commits

Reviewing files that changed from the base of the PR and between 537c337 and 2d3676a.

📒 Files selected for processing (11)
  • ScreenTranslate/Features/Settings/SettingsViewModel.swift
  • ScreenTranslate/Models/AppLanguage.swift
  • ScreenTranslate/Models/AppSettings.swift
  • ScreenTranslate/Models/ScreenAnalysisResult.swift
  • ScreenTranslate/Resources/de.lproj/Localizable.strings
  • ScreenTranslate/Resources/it.lproj/Localizable.strings
  • ScreenTranslate/Services/GLMOCRVLMProvider.swift
  • ScreenTranslate/Services/ScreenCoderEngine.swift
  • ScreenTranslate/Services/TranslationService.swift
  • ScreenTranslateTests/GLMOCRVLMProviderTests.swift
  • ScreenTranslateTests/TranslationPipelineRegressionTests.swift
✅ Files skipped from review due to trivial changes (3)
  • ScreenTranslate/Resources/de.lproj/Localizable.strings
  • ScreenTranslateTests/GLMOCRVLMProviderTests.swift
  • ScreenTranslate/Resources/it.lproj/Localizable.strings


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a770e49e6b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +335 to +338
vlmBaseURL = newValue == .glmOCR
    ? (settings.storedGLMOCRBaseURL(for: settings.glmOCRMode)
        ?? newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode))
    : newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode)

P1: Preserve custom provider settings when switching VLMs

This setter now overwrites vlmBaseURL unconditionally, and the same pattern immediately below does it for vlmModelName. Before this change, custom endpoints/models survived provider toggles unless the field still held the old default; now a user who configured a custom OpenAI/Claude/Ollama endpoint loses it as soon as they inspect another provider. Because AppSettings only has GLM-specific restore keys (storedGLMOCRBaseURL/ModelName), switching back can only recover the built-in defaults, so the previous configuration is permanently clobbered.
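Per-provider persistence buckets would let each provider keep its own custom endpoint/model across switches. A minimal sketch, assuming UserDefaults-backed storage; the store type and key names are illustrative and not part of the PR:

```swift
import Foundation

// Hypothetical per-provider config store; keys are namespaced by provider ID.
struct ProviderConfigStore {
    let defaults = UserDefaults.standard

    func save(baseURL: String, model: String, providerID: String) {
        defaults.set(baseURL, forKey: "vlm.\(providerID).baseURL")
        defaults.set(model, forKey: "vlm.\(providerID).model")
    }

    // Falls back to the provider's built-in defaults when nothing was persisted.
    func restore(providerID: String,
                 defaultBaseURL: String,
                 defaultModel: String) -> (baseURL: String, model: String) {
        (defaults.string(forKey: "vlm.\(providerID).baseURL") ?? defaultBaseURL,
         defaults.string(forKey: "vlm.\(providerID).model") ?? defaultModel)
    }
}
```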


Comment on lines +309 to +310
guard analysisResult.containsOnlyPromptLeakage else {
    return analysisResult

P2: Trigger OCR recovery when leakage leaves no translatable text

This fallback only runs when every segment matches isLikelyOCRPromptLeakage. In practice malformed GLM replies can mix leaked schema lines with one numeric/date fragment like 2024 or 25%; containsOnlyPromptLeakage is false here, but ScreenAnalysisResult.filteredForTranslation() later strips both the leakage and the noise-only fragment, and performTranslation turns that into .noTextFound. So the new prompt-leak recovery still misses common mixed outputs instead of falling back to OCR.
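A leakage check that ignores noise-only fragments could look like the sketch below. It reuses the PR's isLikelyTranslationNoise / isLikelyOCRPromptLeakage names; the property name itself is a hypothetical addition:

```swift
// Hypothetical variant: decide "only leakage" on the noise-filtered set, so a
// stray "2024" or "25%" fragment no longer blocks the OCR fallback.
extension ScreenAnalysisResult {
    var containsOnlyPromptLeakageIgnoringNoise: Bool {
        let meaningful = segments.filter { !$0.isLikelyTranslationNoise }
        guard !meaningful.isEmpty else { return false }
        return meaningful.allSatisfy(\.isLikelyOCRPromptLeakage)
    }
}
```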



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift`:
- Around line 334-344: When changing the VLM provider or mode (the setter that
assigns settings.vlmProvider and updates vlmBaseURL and vlmModelName, and the
similar block later), reset any previous connectivity state by clearing
vlmTestResult and setting vlmTestSuccess = nil/false (or the viewmodel's neutral
state) so old success indicators don't persist when switching provider/mode;
locate the setters that update vlmBaseURL and vlmModelName and add explicit
resets for vlmTestResult and vlmTestSuccess immediately after those assignments.

In `@ScreenTranslate/Models/ScreenAnalysisResult.swift`:
- Around line 209-212: containsOnlyPromptLeakage currently checks raw segments
with segments.allSatisfy(\.isLikelyOCRPromptLeakage), which is short-circuited
by ordinary translation-noise segments and prevents downstream OCR fallback;
change the predicate to operate on the translation-filtered set (use
filteredForTranslation() or segments.filter { !$0.isLikelyTranslationNoise })
and then test allSatisfy(\.isLikelyOCRPromptLeakage) so that noise is ignored
when deciding if the result contains only prompt/schema leakage (update the
containsOnlyPromptLeakage computed property).

In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift`:
- Around line 127-130: The current implementation adds Authorization whenever
apiKey is non-empty, so in local mode the shared settings.vlmAPIKey is still
forwarded to the local/compatible endpoint. Modify GLMOCRVLMProvider so that
before setting Authorization it checks the run mode (e.g. via
configuration.mode or a new boolean such as configuration.isLocal/isLocalMode)
and only calls request.setValue("Bearer \(trimmedAPIKey)", forHTTPHeaderField:
"Authorization") when the mode is not local and trimmedAPIKey is non-empty.
Also confirm that ScreenCoderEngine needs no change when creating the provider
or passing settings.vlmAPIKey, or pass a flag at the call site that lets the
provider detect local mode, to avoid leaking the cloud key locally.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f4957d58-cdce-483d-8160-665ca376eae8

📥 Commits

Reviewing files that changed from the base of the PR and between 2d3676a and a770e49.

📒 Files selected for processing (4)
  • ScreenTranslate/Features/Settings/SettingsViewModel.swift
  • ScreenTranslate/Models/AppSettings.swift
  • ScreenTranslate/Models/ScreenAnalysisResult.swift
  • ScreenTranslate/Services/GLMOCRVLMProvider.swift


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
ScreenTranslate/Models/ScreenAnalysisResult.swift (1)

107-152: The isLikelyOCRPromptLeakage implementation is clear and the detection logic is sound.

The tiered strong/weak signal detection scheme is well designed and effectively identifies prompt/schema fragments accidentally returned by the VLM.

Static analysis flags trailing commas on lines 129 and 143; remove them to satisfy SwiftLint:

🔧 Suggested fix: remove trailing commas
-            "返回json",
+            "返回json"
         ]

(on line 129 change `"返回json",` to `"返回json"`, and on line 143 change `"归一化",` to `"归一化"`)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Models/ScreenAnalysisResult.swift` around lines 107 - 152,
Remove the trailing commas from the string arrays in the
isLikelyOCRPromptLeakage property: edit the strongSignals array (remove the
trailing comma after the final entry "返回json") and the weakSignals array (remove
the trailing comma after the final entry "归一化") so both arrays end with the last
string literal without a comma to satisfy SwiftLint/style rules.
ScreenTranslate/Services/GLMOCRVLMProvider.swift (1)

114-147: Local-mode requests correctly no longer forward the cloud API key.

makeLocalChatRequest does not set an Authorization header, which matches the local-mode design of "no API key required" and avoids the risk of accidentally sending the cloud key to a local endpoint.

Static analysis flags a trailing comma on line 140; remove it:

🔧 Suggested fix: remove trailing comma
-                ),
+                )
             ],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift` around lines 114 - 147,
Remove the trailing comma after the last GLMOCRLocalChatMessage in the messages
array inside makeLocalChatRequest; locate the messages: [...] array in the
makeLocalChatRequest function and delete the extra comma following the final
GLMOCRLocalChatMessage entry so the array closing bracket is immediately
followed by the next property (maxTokens).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift`:
- Around line 234-250: The parseAPIErrorMessage function currently returns the
raw response body as a UTF-8 string on fallback, which can leak sensitive OCR
content; update parseAPIErrorMessage(from:mode:) so that after attempting to
decode GLMOCRAPIErrorResponse and GLMOCRLocalErrorResponse it does NOT return
the raw Data, but instead returns a safe generic fallback (e.g. "Unknown API
error" or nil) or a sanitized/trimmed brief message; keep the
structured-decoding logic for .cloud and .local (using GLMOCRAPIErrorResponse,
GLMOCRLocalErrorResponse) but replace the final String(data:encoding:) return
with the safe fallback to avoid exposing raw response contents.

---

Nitpick comments:
In `@ScreenTranslate/Models/ScreenAnalysisResult.swift`:
- Around line 107-152: Remove the trailing commas from the string arrays in the
isLikelyOCRPromptLeakage property: edit the strongSignals array (remove the
trailing comma after the final entry "返回json") and the weakSignals array (remove
the trailing comma after the final entry "归一化") so both arrays end with the last
string literal without a comma to satisfy SwiftLint/style rules.

In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift`:
- Around line 114-147: Remove the trailing comma after the last
GLMOCRLocalChatMessage in the messages array inside makeLocalChatRequest; locate
the messages: [...] array in the makeLocalChatRequest function and delete the
extra comma following the final GLMOCRLocalChatMessage entry so the array
closing bracket is immediately followed by the next property (maxTokens).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a89e54d3-ddf2-457b-8862-75d9b830be42

📥 Commits

Reviewing files that changed from the base of the PR and between a770e49 and eb2b6b8.

📒 Files selected for processing (3)
  • ScreenTranslate/Features/Settings/SettingsViewModel.swift
  • ScreenTranslate/Models/ScreenAnalysisResult.swift
  • ScreenTranslate/Services/GLMOCRVLMProvider.swift

Comment on lines +234 to +250
private func parseAPIErrorMessage(from data: Data, mode: GLMOCRMode) -> String? {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    switch mode {
    case .cloud:
        if let response = try? decoder.decode(GLMOCRAPIErrorResponse.self, from: data),
           let message = response.error.message,
           !message.isEmpty {
            return message
        }
    case .local:
        if let response = try? decoder.decode(GLMOCRLocalErrorResponse.self, from: data) {
            return response.error.message
        }
    }
    return String(data: data, encoding: .utf8)
}

⚠️ Potential issue | 🟡 Minor

The fallback branch of parseAPIErrorMessage can leak sensitive information.

When the structured error cannot be decoded, line 249 returns the raw response data as a string. If the response body contains OCR-recognized screen content or other sensitive data, it can be exposed through the error message in logs or the UI.

💡 Suggested change
         }
-        return String(data: data, encoding: .utf8)
+        // Do not return raw response content, to avoid leaking sensitive data
+        return nil
     }

Alternatively, supply a generic fallback message at the call site instead of using the raw response.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private func parseAPIErrorMessage(from data: Data, mode: GLMOCRMode) -> String? {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    switch mode {
    case .cloud:
        if let response = try? decoder.decode(GLMOCRAPIErrorResponse.self, from: data),
           let message = response.error.message,
           !message.isEmpty {
            return message
        }
    case .local:
        if let response = try? decoder.decode(GLMOCRLocalErrorResponse.self, from: data) {
            return response.error.message
        }
    }
    return String(data: data, encoding: .utf8)
}
private func parseAPIErrorMessage(from data: Data, mode: GLMOCRMode) -> String? {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    switch mode {
    case .cloud:
        if let response = try? decoder.decode(GLMOCRAPIErrorResponse.self, from: data),
           let message = response.error.message,
           !message.isEmpty {
            return message
        }
    case .local:
        if let response = try? decoder.decode(GLMOCRLocalErrorResponse.self, from: data) {
            return response.error.message
        }
    }
    // Do not return raw response content, to avoid leaking sensitive data
    return nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift` around lines 234 - 250, The
parseAPIErrorMessage function currently returns the raw response body as a UTF-8
string on fallback, which can leak sensitive OCR content; update
parseAPIErrorMessage(from:mode:) so that after attempting to decode
GLMOCRAPIErrorResponse and GLMOCRLocalErrorResponse it does NOT return the raw
Data, but instead returns a safe generic fallback (e.g. "Unknown API error" or
nil) or a sanitized/trimmed brief message; keep the structured-decoding logic
for .cloud and .local (using GLMOCRAPIErrorResponse, GLMOCRLocalErrorResponse)
but replace the final String(data:encoding:) return with the safe fallback to
avoid exposing raw response contents.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb2b6b8a37


Comment on lines +121 to +125
let endpoint = baseURL.appendingPathComponent("chat/completions")
var request = URLRequest(url: endpoint)
request.httpMethod = "POST"
request.timeoutInterval = timeout
request.setValue("application/json", forHTTPHeaderField: "Content-Type")

P2: Forward optional API keys to local GLM requests

This breaks self-hosted GLM/OpenAI-compatible servers that still require auth (for example, a reverse proxy or vLLM instance with a bearer token): the settings UI exposes an API-key field for local mode, but makeLocalChatRequest never uses its apiKey argument to set Authorization. In that setup both testGLMOCRConnection and real analyze calls will fail with 401s even though the user entered valid credentials.
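If optional auth for local servers is desired, the header could be attached only when the user actually entered a key. A minimal sketch; the helper name is illustrative and not part of the PR:

```swift
import Foundation

// Hypothetical helper: forward a bearer token only when one was supplied, so
// auth-protected self-hosted servers work without leaking a cloud key by default.
func applyAuthorizationIfProvided(_ request: inout URLRequest, apiKey: String?) {
    guard let key = apiKey?.trimmingCharacters(in: .whitespacesAndNewlines),
          !key.isEmpty else { return }
    request.setValue("Bearer \(key)", forHTTPHeaderField: "Authorization")
}
```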


Comment on lines +199 to +202
do {
    return try parseLocalContent(content).toScreenAnalysisResult(imageSize: fallbackImageSize)
} catch {
    throw VLMProviderError.parsingFailed("Failed to parse local GLM OCR content: \(error.localizedDescription)")

P2: Handle truncated local GLM completions before parsing

This shows up on dense screenshots/documents. makeLocalChatRequest limits local completions to 4096 tokens, but parseLocalResponse immediately treats the first message.content as final and never checks finish_reason or requests continuation. When the server stops with a length limit, local GLM will either throw on malformed JSON or return only a partial plain-text recovery, unlike OpenAIVLMProvider.analyzeWithContinuation and ClaudeVLMProvider which already continue truncated responses.
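Continuation handling for length-limited local completions could follow the sketch below. The ChatChoice shape, the "length" finish reason value, and the continuation closure are assumptions for illustration, not the provider's actual API:

```swift
// Hypothetical types standing in for the local chat response.
struct ChatChoice {
    var content: String
    var finishReason: String?
}

// Keep asking the model to continue while the reply was cut off by the token limit.
func collectLocalContent(first: ChatChoice,
                         continueRequest: (String) async throws -> ChatChoice) async throws -> String {
    var content = first.content
    var choice = first
    var rounds = 0
    while choice.finishReason == "length", rounds < 3 {   // cap continuation rounds
        choice = try await continueRequest(content)       // continue from the partial output
        content += choice.content
        rounds += 1
    }
    return content
}
```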


@hubo1989 hubo1989 merged commit 2da1bf9 into main Mar 20, 2026
1 check passed
