Add cloud/local mode switching for GLM OCR #67
📝 Walkthrough

Adds cloud/local support for a GLM-OCR VLM provider: the provider implementation, settings persistence and UI, connection testing, an OCR recovery flow, model and localization updates, unit-test coverage, and wiring into the existing VLM routing and defaults.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as User / Settings UI
    participant SettingsVM as SettingsViewModel
    participant AppSettings as AppSettings
    participant ScreenCoderEngine as ScreenCoderEngine
    participant GLMOCRProvider as GLMOCRVLMProvider
    participant Network as Network
    UI->>SettingsVM: toggle glmOCRMode (cloud <-> local)
    SettingsVM->>AppSettings: persist glmOCRMode
    AppSettings-->>SettingsVM: return resolved baseURL/modelName
    UI->>SettingsVM: trigger "Test VLM connection"
    SettingsVM->>ScreenCoderEngine: testVLMAPI()
    ScreenCoderEngine->>GLMOCRProvider: create GLMOCRVLMProvider (with mode)
    GLMOCRProvider->>Network: send request (/layout_parsing or /chat/completions)
    alt cloud
        Network-->>GLMOCRProvider: layout_parsing response
        GLMOCRProvider->>GLMOCRProvider: parse layout -> ScreenAnalysisResult
    else local
        Network-->>GLMOCRProvider: chat response
        GLMOCRProvider->>GLMOCRProvider: extract/parse JSON or plain-text fallback -> ScreenAnalysisResult
    end
    GLMOCRProvider-->>ScreenCoderEngine: report success/failure
    ScreenCoderEngine-->>SettingsVM: forward test result
```
```mermaid
sequenceDiagram
    participant TranslationFlow as TranslationFlowController
    participant ScreenCoderEngine as ScreenCoderEngine
    participant Analysis as ScreenAnalysisResult
    participant OCRService as OCRService
    TranslationFlow->>ScreenCoderEngine: analyze(image)
    ScreenCoderEngine-->>TranslationFlow: analysisResult
    TranslationFlow->>Analysis: containsOnlyPromptLeakage?
    alt only prompt leakage
        TranslationFlow->>OCRService: recognize(image) (OCR fallback)
        OCRService-->>TranslationFlow: ocrResult
        TranslationFlow->>Analysis: init(ocrResult) -> recoveredResult
        TranslationFlow->>TranslationFlow: continue with recoveredResult
    else has valid text
        TranslationFlow->>TranslationFlow: use original analysisResult
    end
    TranslationFlow->>Analysis: filteredForTranslation()
    Analysis-->>TranslationFlow: filteredResult (may be empty -> throws)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 537c337f13
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```swift
case .cloud:
    request = try GLMOCRVLMProvider.makeLayoutParsingRequest(
        baseURL: baseURL,
        apiKey: apiKey,
        modelName: modelName,
        fileDataURI: GLMOCRVLMProvider.connectionTestImageDataURI,
        timeout: 10
```
Avoid charging users when they press the GLM cloud test button
In cloud mode, testGLMOCRConnection builds a real layout_parsing request with connectionTestImageDataURI instead of using a lightweight capability check. That means every click on “Test API” performs an OCR inference, so users who are just validating credentials can still consume quota, incur cost, or hit rate limits; the other providers’ tests only probe metadata endpoints.
```swift
if let modelsResponse = try? decoder.decode(GLMOCRLocalModelsResponse.self, from: data),
   !modelsResponse.data.contains(where: { $0.id == modelName }) {
    let availableModels = modelsResponse.data.map(\.id).joined(separator: ", ")
    let message = availableModels.isEmpty ? "No models reported by local server." : "Available models: \(availableModels)"
    return (true, message)
```
Fail the local GLM test when the configured model is missing
If the local server responds to /models but does not advertise modelName, this path still returns success: true, so the settings UI shows a green success state even though GLMOCRVLMProvider.analyze will later send that unavailable model to /chat/completions and fail on first use. This affects any local setup where the loaded model ID differs from the saved setting.
Actionable comments posted: 5
🧹 Nitpick comments (7)
ScreenTranslate/Models/AppLanguage.swift (1)
74-91: Consider handling `zh` language codes without a script or region. In the current implementation a bare `zh` identifier matches no language case:

- it fails the Simplified Chinese check (which requires a `zh-hans` prefix or an exact `zh-cn`/`zh-sg` match)
- it also misses the generic match (which checks for `zh-hans`)

Some systems report only `zh` as the preferred language, which causes an unexpected fallback to English.

♻️ Optional fix

```diff
 static func from(localeIdentifier: String) -> AppLanguage? {
     let normalizedIdentifier = localeIdentifier.replacingOccurrences(of: "_", with: "-").lowercased()
     if normalizedIdentifier.hasPrefix("zh-hans")
         || normalizedIdentifier == "zh-cn"
-        || normalizedIdentifier == "zh-sg" {
+        || normalizedIdentifier == "zh-sg"
+        || normalizedIdentifier == "zh" {
         return .simplifiedChinese
     }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Models/AppLanguage.swift` around lines 74 - 91: the from(localeIdentifier:) matcher currently skips plain "zh", so add a branch that treats "zh" (after normalizing via normalizedIdentifier) as Simplified Chinese; specifically, update the conditional that returns .simplifiedChinese to also check normalizedIdentifier == "zh" (in addition to the existing zh-hans / zh-cn / zh-sg checks) so a bare "zh" locale maps correctly to Simplified Chinese.

ScreenTranslate/Services/ScreenCoderEngine.swift (1)
173-188: Consider merging the multiple `MainActor.run` calls for performance. The current `configurationHash()` method makes five separate `await MainActor.run` calls, and each one incurs an actor hop. They can be collapsed into a single call that returns a tuple of all the needed values.

♻️ Suggested optimization

```diff
 private func configurationHash() async -> Int {
-    let settings = await MainActor.run { AppSettings.shared }
-
-    let providerType = await MainActor.run { settings.vlmProvider }
-    let apiKey = await MainActor.run { settings.vlmAPIKey }
-    let baseURL = await MainActor.run { settings.vlmBaseURL }
-    let modelName = await MainActor.run { settings.vlmModelName }
-    let glmOCRMode = await MainActor.run { settings.glmOCRMode }
+    let (providerType, apiKey, baseURL, modelName, glmOCRMode) = await MainActor.run {
+        let settings = AppSettings.shared
+        return (settings.vlmProvider, settings.vlmAPIKey, settings.vlmBaseURL, settings.vlmModelName, settings.glmOCRMode)
+    }

     var hasher = Hasher()
     hasher.combine(providerType)
     hasher.combine(glmOCRMode)
     hasher.combine(apiKey)
     hasher.combine(baseURL)
     hasher.combine(modelName)
     return hasher.finalize()
 }
```

Note: `createProvider(for:)` (lines 132-137) has a similar optimization opportunity.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Services/ScreenCoderEngine.swift` around lines 173 - 188: the multiple await MainActor.run hops in configurationHash() (reading AppSettings.shared.vlmProvider, vlmAPIKey, vlmBaseURL, vlmModelName, glmOCRMode) should be collapsed into a single await MainActor.run that reads and returns all needed values (e.g., a tuple or small struct), then use those locals to feed the Hasher; do the same optimization in createProvider(for:) by reading all required AppSettings.shared properties with one MainActor.run call to avoid repeated actor hops.

ScreenTranslate/Services/GLMOCRVLMProvider.swift (2)
131-136: Remove trailing commas from collection literals. SwiftLint flags the trailing commas on lines 133 and 135.

♻️ Fix

```diff
     .text(VLMPromptTemplate.localModelUserPrompt + "\nReturn only valid JSON."),
-    .imageURL(GLMOCRLocalImageURL(url: fileDataURI)),
-])
+    .imageURL(GLMOCRLocalImageURL(url: fileDataURI))
+]
 ),
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift` around lines 131 - 136: remove the trailing commas inside the array passed to .vision — specifically drop the comma after .text(VLMPromptTemplate.localModelUserPrompt + "\nReturn only valid JSON.") and the comma after .imageURL(GLMOCRLocalImageURL(url: fileDataURI)) so the collection literal has no trailing commas; update the array items in the .vision([ ... ]) call accordingly.
17-20: Force-unwrapped static URL constants. These URL strings are valid literals and the force unwraps are safe here, but SwiftLint flags them. To silence the warning:

♻️ Optional improvement

```diff
-static let defaultBaseURL = URL(string: "https://open.bigmodel.cn/api/paas/v4")!
-static let defaultModel = "glm-ocr"
-static let defaultLocalBaseURL = URL(string: "http://127.0.0.1:18081")!
+// swiftlint:disable:next force_unwrapping
+static let defaultBaseURL = URL(string: "https://open.bigmodel.cn/api/paas/v4")!
+static let defaultModel = "glm-ocr"
+// swiftlint:disable:next force_unwrapping
+static let defaultLocalBaseURL = URL(string: "http://127.0.0.1:18081")!
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift` around lines 17 - 20: replace the force-unwrapped URL initializers with safe, non-optional initializers using a closure that validates URL(string:) and calls fatalError on failure; specifically update static let defaultBaseURL and static let defaultLocalBaseURL (defaultModel stays unchanged) to use a closure form such as: static let defaultBaseURL: URL = { guard let url = URL(string: "https://open.bigmodel.cn/api/paas/v4") else { fatalError("Invalid URL literal for defaultBaseURL") } return url }() and do the same for defaultLocalBaseURL to satisfy SwiftLint without changing runtime behavior.

ScreenTranslateTests/TranslationPipelineRegressionTests.swift (1)
34-35: Remove trailing commas from collection literals. SwiftLint flags trailing commas on lines 34 and 49. This does not affect behavior, but removing them keeps the code style consistent.

♻️ Fix

```diff
     boundingBox: .zero,
     confidence: 0.99
-),
+)
 ],
```

The same applies to line 49:

```diff
     confidence: 0.94
-),
+)
 ],
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ScreenTranslateTests/TranslationPipelineRegressionTests.swift` around lines 34 - 35: two collection literals in the test file contain trailing commas (the two locations SwiftLint reports); locate the two literal lists in the test cases, remove the extra trailing commas, then run SwiftLint and the tests to confirm there are no style warnings or compile issues.

ScreenTranslate/Services/TranslationService.swift (1)
404-409: `try?` in `testConnection` swallows all error information. When `resolvedProvider(for:)` fails, a log entry is written, but the caller cannot tell whether the connection test failed because of a configuration error, a network problem, or something else. Users only see that the test failed, with no reason, which degrades the experience.

Consider preserving more detailed error information for debugging or for display to the user:

♻️ Optional improvement

```diff
-guard let provider = try? await resolvedProvider(for: engine) else {
-    logger.error("Failed to resolve provider for \(engine.rawValue)")
+let provider: any TranslationProvider
+do {
+    provider = try await resolvedProvider(for: engine)
+} catch {
+    logger.error("Failed to resolve provider for \(engine.rawValue): \(error.localizedDescription)")
     return false
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Services/TranslationService.swift` around lines 404 - 409: the current testConnection implementation swallows errors by using try? when calling resolvedProvider(for:), losing error details; change it to use do { let provider = try await resolvedProvider(for: engine); return await provider.checkConnection() } catch { logger.error("Failed to resolve provider for \(engine.rawValue): \(error)") }; either rethrow the error (make testConnection throw) or return a Result/throwable failure so callers get the real error information — update the testConnection signature and callers accordingly, and ensure resolvedProvider(for:) and provider.checkConnection() errors are preserved and logged.

ScreenTranslate/Features/Settings/SettingsViewModel.swift (1)
1102-1109: Returning success when the configured model is missing in local mode can confuse users. When the configured model name does not exist on the local server, the method returns `success = true` and shows the list of available models. The list is useful information, but reporting "success" can mislead users into believing the configuration is correct.

Consider returning a warning state or a clearer message when the model is not found:

♻️ Suggested improvement

```diff
 if mode == .local {
     let decoder = JSONDecoder()
     if let modelsResponse = try? decoder.decode(GLMOCRLocalModelsResponse.self, from: data),
        !modelsResponse.data.contains(where: { $0.id == modelName }) {
         let availableModels = modelsResponse.data.map(\.id).joined(separator: ", ")
         let message = availableModels.isEmpty ? "No models reported by local server." : "Available models: \(availableModels)"
-        return (true, message)
+        return (false, "Model '\(modelName)' not found. \(message)")
     }
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift` around lines 1102 - 1109: the current logic in the local-mode branch decoding GLMOCRLocalModelsResponse returns (true, message) when the configured modelName is not found, which is misleading; change it to indicate failure or a warning instead of success — e.g., return (false, "Model '\(modelName)' not found. " + (availableModels.isEmpty ? "No models reported by local server." : "Available models: \(availableModels)")) or update the method to return a distinct status/warning enum and a clear message so callers can distinguish "model missing" from a successful validation; update any callers of this method to handle the new false/warning status accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ScreenTranslate/Models/AppSettings.swift`:
- Line 131: The persistence keys for GLM OCR endpoints/models are not separated
by mode, so switching glmOCRMode causes cloud/local settings to overwrite each
other; update AppSettings to define and use distinct keys (e.g., add keys like
glmCloudBaseURL, glmCloudModelName and glmLocalBaseURL, glmLocalModelName) and
change the getters/setters that currently use vlmBaseURL/vlmModelName to
read/write the mode-specific keys based on glmOCRMode; also update the restore
logic referenced in SettingsViewModel (around the restore/default backfill) to
fetch/backfill from the correct mode-specific keys so cloud and local
endpoints/models persist independently.
In `@ScreenTranslate/Models/ScreenAnalysisResult.swift`:
- Around line 202-206: The filteredForTranslation() implementation in
ScreenAnalysisResult currently falls back to the original segments when
filtering yields no results; remove that fallback so it returns an empty
ScreenAnalysisResult(segments: [], imageSize: imageSize) when all segments are
classified as noise, allowing TranslationFlowController (which expects an empty
result to throw noTextFound) to work correctly—if false positives are a concern,
refine the noise detection predicate (isLikelyTranslationNoise) instead of
restoring the original segments.
In `@ScreenTranslate/Resources/de.lproj/Localizable.strings`:
- Line 640: Die deutschsprachige Beschreibung für den String-Schlüssel
"vlm.provider.glmocr.description" enthält noch den englischen Ausdruck "layout
parsing"; ersetze diesen im Wert durch eine passende deutsche Formulierung (z.
B. "Layout-Analyse" oder "Layout-Parsing" je nach Projektkonvention) so die
komplette Beschreibung konsistent auf Deutsch ist, und speichere die
aktualisierte Zeichenfolge unter dem Schlüssel
"vlm.provider.glmocr.description".
In `@ScreenTranslate/Resources/it.lproj/Localizable.strings`:
- Line 640: The Italian translation for the key
"vlm.provider.glmocr.description" contains the untranslated English phrase
"layout parsing"; replace it with a fully localized Italian phrase (e.g.,
"analisi del layout" or another consistent term used by adjacent provider
descriptions) so the value reads entirely in Italian and matches the style of
neighboring provider strings; update the value for
"vlm.provider.glmocr.description" accordingly in Localizable.strings.
In `@ScreenTranslateTests/GLMOCRVLMProviderTests.swift`:
- Around line 56-59: Add a safety check before indexing result.segments: assert
the expected count (e.g., XCTAssertEqual(result.segments.count, 2)) and then
safely unwrap the elements (use guard with return or XCTUnwrap) before accessing
segments[0] / segments[1]; update the assertions that call assertRect on
segments[0].boundingBox and segments[1].boundingBox to use the safely unwrapped
variables so an earlier assertion failure prevents an index-out-of-bounds crash
in GLMOCRVLMProviderTests.swift.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2e72d1ab-a57a-4921-8861-3fd06333a02d
📒 Files selected for processing (32)
- ScreenTranslate.xcodeproj/project.pbxproj
- ScreenTranslate/Features/Onboarding/OnboardingViewModel.swift
- ScreenTranslate/Features/Settings/EngineSettingsTab.swift
- ScreenTranslate/Features/Settings/SettingsViewModel.swift
- ScreenTranslate/Features/TextTranslation/TextTranslationPopupWindow.swift
- ScreenTranslate/Features/TranslationFlow/TranslationFlowController.swift
- ScreenTranslate/Models/AppLanguage.swift
- ScreenTranslate/Models/AppSettings.swift
- ScreenTranslate/Models/ScreenAnalysisResult.swift
- ScreenTranslate/Models/VLMProviderType.swift
- ScreenTranslate/Resources/de.lproj/Localizable.strings
- ScreenTranslate/Resources/en.lproj/Localizable.strings
- ScreenTranslate/Resources/es.lproj/Localizable.strings
- ScreenTranslate/Resources/fr.lproj/Localizable.strings
- ScreenTranslate/Resources/it.lproj/Localizable.strings
- ScreenTranslate/Resources/ja.lproj/Localizable.strings
- ScreenTranslate/Resources/ko.lproj/Localizable.strings
- ScreenTranslate/Resources/pt.lproj/Localizable.strings
- ScreenTranslate/Resources/ru.lproj/Localizable.strings
- ScreenTranslate/Resources/zh-Hans.lproj/Localizable.strings
- ScreenTranslate/Services/AppleTranslationProvider.swift
- ScreenTranslate/Services/GLMOCRVLMProvider.swift
- ScreenTranslate/Services/ScreenCoderEngine.swift
- ScreenTranslate/Services/Translation/Providers/CompatibleTranslationProvider.swift
- ScreenTranslate/Services/Translation/Providers/LLMTranslationProvider.swift
- ScreenTranslate/Services/TranslationEngine.swift
- ScreenTranslate/Services/TranslationService.swift
- ScreenTranslateTests/GLMOCRVLMProviderTests.swift
- ScreenTranslateTests/KeyboardShortcutTests.swift
- ScreenTranslateTests/ScreenTranslateErrorTests.swift
- ScreenTranslateTests/ShortcutRecordingTypeTests.swift
- ScreenTranslateTests/TranslationPipelineRegressionTests.swift
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2d3676a439
```swift
analysisResult = try await Self.recoverAnalysisResultIfNeeded(initialAnalysis) {
    try await OCRService.shared.recognize(image)
```
Use a fixed OCR fallback for prompt-leak recovery
This only runs when recoverAnalysisResultIfNeeded detects prompt/schema leakage, but the fallback now goes through OCRService.shared.recognize, which honors the user’s selected OCR engine. If that engine is paddleOCR and PaddleOCR is unavailable, the recovery path throws engineNotAvailable and the whole translation fails instead of falling back gracefully. A GLM-specific parsing issue should not become a hard failure because of an unrelated OCR-engine setting.
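One way to address this, sketched below with an illustrative helper named `recoverText` (not part of the PR): degrade to a fixed built-in recognizer whenever the preferred engine fails, so a GLM parsing issue never turns into a hard failure because of an unrelated OCR-engine setting.

```swift
enum OCRError: Error { case engineNotAvailable }

/// Hypothetical recovery helper: try the user-selected OCR engine first,
/// and if it fails for any reason (e.g. PaddleOCR not installed), fall
/// back to a fixed built-in recognizer instead of aborting the translation.
func recoverText(preferred: () throws -> [String],
                 fixedFallback: () throws -> [String]) -> [String] {
    if let lines = try? preferred() {
        return lines
    }
    // Any preferred-engine failure degrades to the fixed recognizer;
    // an empty result is still better than a thrown engineNotAvailable.
    return (try? fixedFallback()) ?? []
}
```

In the PR's flow, `preferred` would wrap `OCRService.shared.recognize(image)` and `fixedFallback` a Vision-based system recognizer; both closures here are stand-ins for those calls.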
````swift
.filter { line in
    guard !line.isEmpty else { return false }
    if line == "```" || line.hasPrefix("```") { return false }
    if ["{", "}", "[", "]"].contains(line) { return false }
    return true
````
Filter schema tokens from GLM plain-text fallback
This fallback is entered precisely when the local GLM server did not return valid JSON. Unlike OpenAIVLMProvider.parsePlainTextResponse, it only strips code fences and lone braces, so pseudo-JSON lines like "segments": [ or "boundingBox": { are converted into real TextSegments. In mixed malformed responses, users will still get translated prompt/schema garbage overlaid on the screenshot instead of just the recovered text.
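A stricter fallback filter along the lines of `OpenAIVLMProvider.parsePlainTextResponse` could drop pseudo-JSON lines as well. A minimal sketch — the key list and function name are illustrative, not the PR's actual implementation:

````swift
import Foundation

/// Hypothetical stricter plain-text fallback: besides code fences and lone
/// braces, drop lines that are just leaked JSON-schema keys or numbers,
/// such as `"segments": [` or `"x": 0.12,`.
func filterSchemaTokens(from raw: String) -> [String] {
    let schemaKeys: Set<String> = ["segments", "text", "boundingbox",
                                   "confidence", "x", "y", "width", "height"]
    let junk = CharacterSet(charactersIn: "\"':,{}[] 0123456789.")
    return raw
        .components(separatedBy: .newlines)
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { line in
            guard !line.isEmpty, !line.hasPrefix("```") else { return false }
            // Peel quotes, colons, braces, and bare numbers off both ends;
            // what remains is either real screen text or a schema key.
            let core = line.trimmingCharacters(in: junk)
            return !core.isEmpty && !schemaKeys.contains(core.lowercased())
        }
}
````

Lines consisting only of punctuation and digits are dropped too, which trades a little recall (a screen line that is literally just a number) for not overlaying schema garbage on the screenshot.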
```swift
case 404, 405:
    if mode == .cloud {
        return (true, "GLM OCR cloud endpoint is reachable.")
    }
```
Fail the GLM cloud test on 404/405 responses
In cloud mode the settings test probes baseURL/models, but real GLM OCR requests later go to baseURL/layout_parsing. Returning success on any 404/405 means a mistyped base URL or proxy that merely lacks /models still shows a green test result even though every actual analysis call will fail. This makes the new cloud-mode validation misleading for setup and troubleshooting.
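One way to make the probe honest is to aim it at the endpoint real requests use and treat 404/405 as a configuration failure. A sketch of the status interpretation only (the enum and function names are illustrative, not the PR's code):

```swift
enum GLMOCRMode { case cloud, local }

/// Hypothetical interpretation of the "Test API" probe status when the
/// probe targets the same route real requests use (/layout_parsing in
/// cloud mode), so a missing route fails the test instead of passing it.
func interpretProbeStatus(_ status: Int, mode: GLMOCRMode) -> (success: Bool, message: String) {
    switch status {
    case 200..<300:
        return (true, "Endpoint reachable.")
    case 401, 403:
        // The route exists but rejected the credentials.
        return (false, "Authentication failed - check the API key.")
    case 404, 405:
        // Previously treated as success in cloud mode; since real analysis
        // calls would hit the same missing route, report a bad URL instead.
        let hint = mode == .cloud ? "base URL" : "local server URL"
        return (false, "Endpoint not found - check the \(hint).")
    default:
        return (false, "Unexpected HTTP status \(status).")
    }
}
```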
Actionable comments posted: 4
♻️ Duplicate comments (1)
ScreenTranslate/Features/Settings/SettingsViewModel.swift (1)
334-357: ⚠️ Potential issue | 🟠 Major — This still inherits the previous provider's custom URL/model. `shouldUpdateBaseURL`/`shouldUpdateModel` only backfill when the current value is empty or equals the old provider's default. If the user previously configured a custom endpoint/model for OpenAI/Claude, switching to `.glmOCR` keeps that configuration; `ScreenTranslate/Services/ScreenCoderEngine.swift:137-153` then creates the GLM provider from those stale values and hits the wrong endpoint/model. Switching providers should load the target provider's persisted/default configuration, not reuse the previous provider's custom values.

💡 Minimal fix sketch

```diff
-let previousProvider = settings.vlmProvider
-let previousMode = settings.glmOCRMode
-let shouldUpdateBaseURL = vlmBaseURL.isEmpty || vlmBaseURL == previousProvider.defaultBaseURL(glmOCRMode: previousMode)
-let shouldUpdateModel = vlmModelName.isEmpty || vlmModelName == previousProvider.defaultModelName(glmOCRMode: previousMode)
 settings.vlmProvider = newValue
-
-if shouldUpdateBaseURL {
-    if newValue == .glmOCR {
-        vlmBaseURL = settings.storedGLMOCRBaseURL(for: settings.glmOCRMode)
-            ?? newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode)
-    } else {
-        vlmBaseURL = newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode)
-    }
-}
-
-if shouldUpdateModel {
-    if newValue == .glmOCR {
-        vlmModelName = settings.storedGLMOCRModelName(for: settings.glmOCRMode)
-            ?? newValue.defaultModelName(glmOCRMode: settings.glmOCRMode)
-    } else {
-        vlmModelName = newValue.defaultModelName(glmOCRMode: settings.glmOCRMode)
-    }
-}
+vlmBaseURL = newValue == .glmOCR
+    ? (settings.storedGLMOCRBaseURL(for: settings.glmOCRMode)
+        ?? newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode))
+    : newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode)
+
+vlmModelName = newValue == .glmOCR
+    ? (settings.storedGLMOCRModelName(for: settings.glmOCRMode)
+        ?? newValue.defaultModelName(glmOCRMode: settings.glmOCRMode))
+    : newValue.defaultModelName(glmOCRMode: settings.glmOCRMode)
```

If per-provider custom values should survive later, consider adding provider-scoped persistence as a follow-up.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift` around lines 334 - 357: the current logic can still carry the previous provider's custom vlmBaseURL/vlmModelName over when switching providers (shouldUpdateBaseURL/shouldUpdateModel are keyed to the old provider's defaults); instead, when settings.vlmProvider changes, load the target provider's persisted or default configuration. Concretely: after settings.vlmProvider = newValue, compute the backfill based on newValue (not previousProvider/previousMode), and when updating set vlmBaseURL = settings.storedGLMOCRBaseURL(for: settings.glmOCRMode) ?? newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode) and vlmModelName = settings.storedGLMOCRModelName(for: settings.glmOCRMode) ?? newValue.defaultModelName(glmOCRMode: settings.glmOCRMode) (or unconditionally overwrite with the target provider's persisted/default values), ensuring the old provider's custom endpoint/model is never reused (relevant symbols: settings.vlmProvider, previousProvider, shouldUpdateBaseURL, shouldUpdateModel, vlmBaseURL, vlmModelName, storedGLMOCRBaseURL, storedGLMOCRModelName, defaultBaseURL, defaultModelName).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift`:
- Around line 1068-1125: The connectivity check in testGLMOCRConnection is too
permissive: cloud treats 404/405 as success and local treats any 200 as success
even if /models JSON can't be decoded. Update testGLMOCRConnection to validate
the response shape for both modes (decode GLMOCRLocalModelsResponse for local
and verify expected cloud response schema) or, preferably, perform the same
lightweight probe used by GLMOCRVLMProvider (e.g., call the actual parsing
endpoint like /layout_parsing or a minimal /chat/completions request that the
provider uses) and assert the returned status/JSON matches the provider's
expected format; keep function name testGLMOCRConnection and reuse
GLMOCRVLMProvider request construction/expected response checks so the UI only
shows success when the real service will work.
In `@ScreenTranslate/Models/AppSettings.swift`:
- Around line 597-601: resetToDefaults() currently only resets
vlmBaseURL/vlmModelName and leaves glmOCRCloudBaseURL, glmOCRCloudModelName,
glmOCRLocalBaseURL, glmOCRLocalModelName untouched; update resetToDefaults() to
also clear or set those four properties back to their default values (matching
VLMProviderType.glmOCR defaults or empty as intended) so that when vlmProvider
is set to .glmOCR later, SettingsViewModel won't restore stale endpoint/model
values; locate resetToDefaults() and the properties glmOCRCloudBaseURL,
glmOCRCloudModelName, glmOCRLocalBaseURL, glmOCRLocalModelName to apply the
changes.
In `@ScreenTranslate/Models/ScreenAnalysisResult.swift`:
- Around line 202-204: The filteredForTranslation() in ScreenAnalysisResult
currently filters only segments where isLikelyTranslationNoise is true; update
it to also exclude segments flagged by isLikelyOCRPromptLeakage so prompt/schema
fragments are not passed into the translation flow (see
TranslationFlowController logic at lines ~305-316); locate the
filteredForTranslation() method and change the filter predicate to require both
!segment.isLikelyTranslationNoise and !segment.isLikelyOCRPromptLeakage so only
real text segments reach translation.
In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift`:
- Around line 164-170: The error currently embeds the raw OCR/model response
into the VLMProviderError.parsingFailed message (in the
GLMOCRLayoutParsingResponse decode catch), which can leak sensitive screenshot
text; remove the raw payload from the thrown error and only include the decoding
error details (e.g., error.localizedDescription). If you need raw data for
debugging, log it only behind a debug flag or to a restricted debug logger with
explicit redaction. Apply the same change to the other catch block referenced
(the similar decode around lines 193–199) so neither throws include raw response
content.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cccc4dd1-c6ad-4931-a80e-7399d74596f0
📒 Files selected for processing (11)
- ScreenTranslate/Features/Settings/SettingsViewModel.swift
- ScreenTranslate/Models/AppLanguage.swift
- ScreenTranslate/Models/AppSettings.swift
- ScreenTranslate/Models/ScreenAnalysisResult.swift
- ScreenTranslate/Resources/de.lproj/Localizable.strings
- ScreenTranslate/Resources/it.lproj/Localizable.strings
- ScreenTranslate/Services/GLMOCRVLMProvider.swift
- ScreenTranslate/Services/ScreenCoderEngine.swift
- ScreenTranslate/Services/TranslationService.swift
- ScreenTranslateTests/GLMOCRVLMProviderTests.swift
- ScreenTranslateTests/TranslationPipelineRegressionTests.swift
✅ Files skipped from review due to trivial changes (3)
- ScreenTranslate/Resources/de.lproj/Localizable.strings
- ScreenTranslateTests/GLMOCRVLMProviderTests.swift
- ScreenTranslate/Resources/it.lproj/Localizable.strings
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a770e49e6b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```swift
vlmBaseURL = newValue == .glmOCR
    ? (settings.storedGLMOCRBaseURL(for: settings.glmOCRMode)
        ?? newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode))
    : newValue.defaultBaseURL(glmOCRMode: settings.glmOCRMode)
```
Preserve custom provider settings when switching VLMs
This setter now overwrites vlmBaseURL unconditionally, and the same pattern immediately below does it for vlmModelName. Before this change, custom endpoints/models survived provider toggles unless the field still held the old default; now a user who configured a custom OpenAI/Claude/Ollama endpoint loses it as soon as they inspect another provider. Because AppSettings only has GLM-specific restore keys (storedGLMOCRBaseURL/ModelName), switching back can only recover the built-in defaults, so the previous configuration is permanently clobbered.
```swift
guard analysisResult.containsOnlyPromptLeakage else {
    return analysisResult
```
Trigger OCR recovery when leakage leaves no translatable text
This fallback only runs when every segment matches isLikelyOCRPromptLeakage. In practice malformed GLM replies can mix leaked schema lines with one numeric/date fragment like 2024 or 25%; containsOnlyPromptLeakage is false here, but ScreenAnalysisResult.filteredForTranslation() later strips both the leakage and the noise-only fragment, and performTranslation turns that into .noTextFound. So the new prompt-leak recovery still misses common mixed outputs instead of falling back to OCR.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ScreenTranslate/Features/Settings/SettingsViewModel.swift`:
- Around line 334-344: When changing the VLM provider or mode (the setter that
assigns settings.vlmProvider and updates vlmBaseURL and vlmModelName, and the
similar block later), reset any previous connectivity state by clearing
vlmTestResult and setting vlmTestSuccess = nil/false (or the viewmodel's neutral
state) so old success indicators don't persist when switching provider/mode;
locate the setters that update vlmBaseURL and vlmModelName and add explicit
resets for vlmTestResult and vlmTestSuccess immediately after those assignments.
In `@ScreenTranslate/Models/ScreenAnalysisResult.swift`:
- Around line 209-212: containsOnlyPromptLeakage currently checks raw segments
with segments.allSatisfy(\.isLikelyOCRPromptLeakage), which is short-circuited
by ordinary translation-noise segments and prevents downstream OCR fallback;
change the predicate to operate on the translation-filtered set (use
filteredForTranslation() or segments.filter { !$0.isLikelyTranslationNoise })
and then test allSatisfy(\.isLikelyOCRPromptLeakage) so that noise is ignored
when deciding if the result contains only prompt/schema leakage (update the
containsOnlyPromptLeakage computed property).
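The revised predicate described above can be sketched as follows. `Segment` and its flags are illustrative stand-ins for the real model; the key change is filtering out noise before the all-leakage check.

```swift
import Foundation

// Hypothetical stand-in for the real segment type and its heuristics.
struct Segment {
    let text: String
    let isLikelyTranslationNoise: Bool
    let isLikelyOCRPromptLeakage: Bool
}

// Ignore noise-only fragments (dates, percentages) before deciding whether
// everything that remains is leaked prompt/schema text. Note: if ALL
// segments are noise, allSatisfy is vacuously true, which also routes the
// result to the OCR fallback.
func containsOnlyPromptLeakage(_ segments: [Segment]) -> Bool {
    let meaningful = segments.filter { !$0.isLikelyTranslationNoise }
    return meaningful.allSatisfy(\.isLikelyOCRPromptLeakage)
}
```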
In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift`:
- Around line 127-130: The current implementation adds an Authorization header for any non-empty apiKey, so in local mode it still forwards the shared settings.vlmAPIKey to the local/compatible endpoint. Change the GLMOCRVLMProvider logic to check the run mode before setting Authorization (for example via configuration.mode, or a new boolean flag such as configuration.isLocal/isLocalMode), and only call request.setValue("Bearer \(trimmedAPIKey)", forHTTPHeaderField: "Authorization") when the mode is not local and trimmedAPIKey is non-empty. Also confirm that ScreenCoderEngine needs no change where it creates the provider and passes settings.vlmAPIKey, or pass a flag at the call site that lets the provider identify local mode, so the cloud key is never leaked to a local endpoint.
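The mode check suggested in this comment might be sketched as a small helper. Names (`applyAuthorization`, `GLMOCRMode`) are illustrative, not the provider's actual API.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

enum GLMOCRMode { case cloud, local }

// Attach the shared cloud key only in cloud mode, so local endpoints
// never receive it by accident.
func applyAuthorization(to request: inout URLRequest, apiKey: String, mode: GLMOCRMode) {
    let trimmedAPIKey = apiKey.trimmingCharacters(in: .whitespacesAndNewlines)
    guard mode == .cloud, !trimmedAPIKey.isEmpty else { return }
    request.setValue("Bearer \(trimmedAPIKey)", forHTTPHeaderField: "Authorization")
}
```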
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f4957d58-cdce-483d-8160-665ca376eae8
📒 Files selected for processing (4)
- ScreenTranslate/Features/Settings/SettingsViewModel.swift
- ScreenTranslate/Models/AppSettings.swift
- ScreenTranslate/Models/ScreenAnalysisResult.swift
- ScreenTranslate/Services/GLMOCRVLMProvider.swift
Actionable comments posted: 1
🧹 Nitpick comments (2)
ScreenTranslate/Models/ScreenAnalysisResult.swift (1)
107-152: The isLikelyOCRPromptLeakage implementation is clear and its detection logic is sound. The layered strong-signal/weak-signal design works well for catching prompt/schema fragments that the VLM returns by accident.
Static analysis flags trailing commas on lines 129 and 143; remove them to satisfy SwiftLint:
🔧 Suggested fix: remove the trailing commas
```diff
-    "返回json",
+    "返回json"
```
```diff
-    "归一化",
+    "归一化"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Models/ScreenAnalysisResult.swift` around lines 107 - 152, Remove the trailing commas from the string arrays in the isLikelyOCRPromptLeakage property: edit the strongSignals array (remove the trailing comma after the final entry "返回json") and the weakSignals array (remove the trailing comma after the final entry "归一化") so both arrays end with the last string literal without a comma to satisfy SwiftLint/style rules.
ScreenTranslate/Services/GLMOCRVLMProvider.swift (1)
114-147: Local-mode requests correctly stop forwarding the cloud API key.
makeLocalChatRequest does not set the Authorization header, which matches the local-mode design of "no API key required" and avoids accidentally sending the cloud key to a local endpoint. Static analysis flags a trailing comma on line 140; remove it:
🔧 Suggested fix: remove the trailing comma
```diff
-            ),
+            )
         ],
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift` around lines 114 - 147, Remove the trailing comma after the last GLMOCRLocalChatMessage in the messages array inside makeLocalChatRequest; locate the messages: [...] array in the makeLocalChatRequest function and delete the extra comma following the final GLMOCRLocalChatMessage entry so the array closing bracket is immediately followed by the next property (maxTokens).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift`:
- Around line 234-250: The parseAPIErrorMessage function currently returns the
raw response body as a UTF-8 string on fallback, which can leak sensitive OCR
content; update parseAPIErrorMessage(from:mode:) so that after attempting to
decode GLMOCRAPIErrorResponse and GLMOCRLocalErrorResponse it does NOT return
the raw Data, but instead returns a safe generic fallback (e.g. "Unknown API
error" or nil) or a sanitized/trimmed brief message; keep the
structured-decoding logic for .cloud and .local (using GLMOCRAPIErrorResponse,
GLMOCRLocalErrorResponse) but replace the final String(data:encoding:) return
with the safe fallback to avoid exposing raw response contents.
---
Nitpick comments:
In `@ScreenTranslate/Models/ScreenAnalysisResult.swift`:
- Around line 107-152: Remove the trailing commas from the string arrays in the
isLikelyOCRPromptLeakage property: edit the strongSignals array (remove the
trailing comma after the final entry "返回json") and the weakSignals array (remove
the trailing comma after the final entry "归一化") so both arrays end with the last
string literal without a comma to satisfy SwiftLint/style rules.
In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift`:
- Around line 114-147: Remove the trailing comma after the last
GLMOCRLocalChatMessage in the messages array inside makeLocalChatRequest; locate
the messages: [...] array in the makeLocalChatRequest function and delete the
extra comma following the final GLMOCRLocalChatMessage entry so the array
closing bracket is immediately followed by the next property (maxTokens).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a89e54d3-ddf2-457b-8862-75d9b830be42
📒 Files selected for processing (3)
- ScreenTranslate/Features/Settings/SettingsViewModel.swift
- ScreenTranslate/Models/ScreenAnalysisResult.swift
- ScreenTranslate/Services/GLMOCRVLMProvider.swift
```swift
private func parseAPIErrorMessage(from data: Data, mode: GLMOCRMode) -> String? {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    switch mode {
    case .cloud:
        if let response = try? decoder.decode(GLMOCRAPIErrorResponse.self, from: data),
           let message = response.error.message,
           !message.isEmpty {
            return message
        }
    case .local:
        if let response = try? decoder.decode(GLMOCRLocalErrorResponse.self, from: data) {
            return response.error.message
        }
    }
    return String(data: data, encoding: .utf8)
}
```
The fallback branch of parseAPIErrorMessage can leak sensitive information.
Line 249 returns the raw response data as a string when structured error parsing fails. If the response body contains OCR-recognized screen content or other sensitive information, it can be exposed through the error message in logs or the UI.
💡 Suggested change
```diff
     }
-    return String(data: data, encoding: .utf8)
+    // Do not return raw response content, to avoid leaking sensitive data
+    return nil
 }
```
Alternatively, provide a generic fallback message at the call site instead of using the raw response.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```swift
private func parseAPIErrorMessage(from data: Data, mode: GLMOCRMode) -> String? {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    switch mode {
    case .cloud:
        if let response = try? decoder.decode(GLMOCRAPIErrorResponse.self, from: data),
           let message = response.error.message,
           !message.isEmpty {
            return message
        }
    case .local:
        if let response = try? decoder.decode(GLMOCRLocalErrorResponse.self, from: data) {
            return response.error.message
        }
    }
    // Do not return raw response content, to avoid leaking sensitive data
    return nil
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ScreenTranslate/Services/GLMOCRVLMProvider.swift` around lines 234 - 250, The
parseAPIErrorMessage function currently returns the raw response body as a UTF-8
string on fallback, which can leak sensitive OCR content; update
parseAPIErrorMessage(from:mode:) so that after attempting to decode
GLMOCRAPIErrorResponse and GLMOCRLocalErrorResponse it does NOT return the raw
Data, but instead returns a safe generic fallback (e.g. "Unknown API error" or
nil) or a sanitized/trimmed brief message; keep the structured-decoding logic
for .cloud and .local (using GLMOCRAPIErrorResponse, GLMOCRLocalErrorResponse)
but replace the final String(data:encoding:) return with the safe fallback to
avoid exposing raw response contents.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eb2b6b8a37
```swift
let endpoint = baseURL.appendingPathComponent("chat/completions")
var request = URLRequest(url: endpoint)
request.httpMethod = "POST"
request.timeoutInterval = timeout
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
```
Forward optional API keys to local GLM requests
This breaks self-hosted GLM/OpenAI-compatible servers that still require auth (for example, a reverse proxy or vLLM instance with a bearer token): the settings UI exposes an API-key field for local mode, but makeLocalChatRequest never uses its apiKey argument to set Authorization. In that setup both testGLMOCRConnection and real analyze calls will fail with 401s even though the user entered valid credentials.
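Codex's point here — forward the key only when the user actually entered one — could be sketched like this. The helper name is illustrative; it is not the provider's real API.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Forward an optional bearer token to local/self-hosted endpoints, but
// only when a non-blank key was explicitly configured.
func applyOptionalAuthorization(to request: inout URLRequest, apiKey: String?) {
    guard let key = apiKey?.trimmingCharacters(in: .whitespacesAndNewlines),
          !key.isEmpty else { return }
    request.setValue("Bearer \(key)", forHTTPHeaderField: "Authorization")
}
```

With this, a vLLM instance or reverse proxy that requires auth still works in local mode, while a keyless local server simply sees no Authorization header.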
```swift
do {
    return try parseLocalContent(content).toScreenAnalysisResult(imageSize: fallbackImageSize)
} catch {
    throw VLMProviderError.parsingFailed("Failed to parse local GLM OCR content: \(error.localizedDescription)")
```
Handle truncated local GLM completions before parsing
This shows up on dense screenshots/documents. makeLocalChatRequest limits local completions to 4096 tokens, but parseLocalResponse immediately treats the first message.content as final and never checks finish_reason or requests continuation. When the server stops with a length limit, local GLM will either throw on malformed JSON or return only a partial plain-text recovery, unlike OpenAIVLMProvider.analyzeWithContinuation and ClaudeVLMProvider which already continue truncated responses.
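A continuation loop of the kind this comment asks for might look like the sketch below. The `ChatTurn` type and `requestNext` transport closure are stand-ins; the real fix would wire this into `parseLocalResponse` and check the server's `finish_reason` field.

```swift
// Keep requesting continuations while the server stops on a length limit,
// so truncated local completions are stitched back together before parsing.
struct ChatTurn {
    let content: String
    let finishReason: String?
}

func collectFullContent(maxRounds: Int = 3,
                        requestNext: (String) -> ChatTurn) -> String {
    var accumulated = ""
    for _ in 0..<maxRounds {
        let turn = requestNext(accumulated)
        accumulated += turn.content
        // Any stop reason other than "length" means the reply is complete.
        guard turn.finishReason == "length" else { break }
    }
    return accumulated
}
```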
Summary
Testing
Summary by CodeRabbit
New Features
Localization
Tests
Fixes