feat(wiki): make knowledge-base processing failures visible (error-code pipeline + silent sub-step warnings + cross-KB failure center) #437
Merged
Wiki ingestion failures only reached the UI as a free-text error_message (often a raw English exception, sometimes null), so users had to read the server logs to understand what went wrong. The pipeline already classifies failures into a stable vocabulary (WikiProcessingService#classifyErrorCode), but the code never left the job layer.

- add error_code column to mate_wiki_raw_material (h2/mysql/kingbase V162)
- persist the classified code on every failure transition; clear code + message on success via FieldStrategy.ALWAYS so a re-run starts clean
- include errorCode in the RAW_FAILED SSE payload and the listRaw response
- frontend: stop dropping the SSE error fields (a null message no longer leaves a blank "failed" badge); render a localized friendly hint keyed by errorCode, keeping the raw message as the hover tooltip for triage
- i18n zh-CN / en-US error-code map
- tests: classifyErrorCode vocabulary + end-to-end code propagation

Foundation for mateaix#436. Silent sub-step failures (async embedding / entity extraction / vision) and the centralized cross-KB failure view are follow-ups.
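The persist-on-failure, clear-on-success behavior described in this commit can be sketched in plain Java. This is a minimal illustration, not the project's code: `RawMaterialRow`, `ErrorCodeSketch.classify`, the status strings, and the message-matching heuristics are all hypothetical, and only the code names come from the PR description. The real `classifyErrorCode` may classify on entirely different signals.

```java
import java.util.Locale;
import java.util.concurrent.TimeoutException;

/** Code names taken from the PR description; everything else here is illustrative. */
enum WikiErrorCode {
    AUTH_ERROR, BILLING, MODEL_NOT_FOUND, RATE_LIMIT, TIMEOUT,
    SERVER_ERROR, CONTENT_FILTER, NO_CONTENT, EMPTY_RESULT, UNKNOWN
}

final class ErrorCodeSketch {
    private ErrorCodeSketch() {}

    /** Hypothetical heuristics: map a failure to a stable code, never returning null. */
    static WikiErrorCode classify(Throwable cause) {
        if (cause instanceof TimeoutException) {
            return WikiErrorCode.TIMEOUT;
        }
        String msg = cause.getMessage() == null
                ? ""
                : cause.getMessage().toLowerCase(Locale.ROOT);
        if (msg.contains("401") || msg.contains("unauthorized")) return WikiErrorCode.AUTH_ERROR;
        if (msg.contains("429") || msg.contains("rate limit")) return WikiErrorCode.RATE_LIMIT;
        if (msg.contains("timed out")) return WikiErrorCode.TIMEOUT;
        return WikiErrorCode.UNKNOWN;
    }
}

/** In-memory stand-in for one mate_wiki_raw_material row. */
final class RawMaterialRow {
    String status = "pending";
    String errorCode;     // nullable
    String errorMessage;  // nullable

    /** Failure transition: always persist a code, even when the message is null. */
    void markFailed(Throwable cause) {
        status = "failed";
        errorCode = ErrorCodeSketch.classify(cause).name();
        errorMessage = cause.getMessage();
    }

    /** Success transition: clear both fields so a re-run starts clean. */
    void markCompleted() {
        status = "completed";
        errorCode = null;
        errorMessage = null;
    }
}
```

Because `classify` never returns null, a failed row always has a code for the frontend to key a localized hint on, even when the exception carried no message.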
…rnings

Embedding and entity-graph extraction run async *after* a material is already marked completed/partial. When they failed, the row stayed "completed" but was silently degraded (e.g. not semantically searchable), and the only trace was a server log line: exactly the "you can only see it in the logs" gap.

- add warning_code / warning_message columns to mate_wiki_raw_material (h2/mysql/kingbase V163), mirroring the error_code/message pair
- WikiRawMaterialService#recordWarning flags a degraded row without changing its status; clearFailureState() resets error + warning on every re-run path (required now that these columns are FieldStrategy.ALWAYS)
- emit EMBEDDING_FAILED / ENTITY_EXTRACTION_FAILED on the eager and lazy ingest paths, persisted and pushed live via a new raw.warning SSE event
- listRaw returns the warning fields
- frontend: apply raw.warning live, render a localized ⚠ warning chip on otherwise-successful rows, raw text kept as the tooltip
- i18n zh-CN / en-US warning-code map
- tests: recordWarning / clearFailureState state machine + async embedding-failure surfaces a warning (not a failed status)

Completes the foundation half of mateaix#436.
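The warning state machine in this commit (a warning never changes the status, and a re-run clears error and warning together) can be illustrated with a small in-memory model. `DegradableRow` and its fields are hypothetical stand-ins for the real entity and service; only the method names `recordWarning` and `clearFailureState` and the code `EMBEDDING_FAILED` come from the commit message.

```java
/** In-memory stand-in for one raw-material row; not the project's entity class. */
final class DegradableRow {
    String status;
    String errorCode;
    String errorMessage;
    String warningCode;
    String warningMessage;

    DegradableRow(String status) {
        this.status = status;
    }

    /** Flag a degraded row. The status is deliberately left untouched. */
    void recordWarning(String code, String message) {
        this.warningCode = code;
        this.warningMessage = message;
    }

    /** Reset error and warning fields together, e.g. when a re-run picks the row up. */
    void clearFailureState() {
        this.errorCode = null;
        this.errorMessage = null;
        this.warningCode = null;
        this.warningMessage = null;
    }

    /** A row counts as degraded whenever a warning code is present. */
    boolean isDegraded() {
        return warningCode != null;
    }
}
```

Keeping the warning separate from the status lets a row stay "completed" for normal use while still carrying a visible marker that a sub-step such as embedding did not succeed.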
Wiki ingest is mostly background work, so a failure in one KB was invisible unless you happened to be on that KB's page. This adds an operator-facing, cross-KB view of everything needing attention (failed / partial / degraded), reusing the existing notification-center pattern (mirrors failedCrons).

Backend
- WikiRawMaterialMapper.countFailures / listFailures: one shared NEEDS_ATTENTION predicate (failed | partial | warning_code present), joined to the KB for display name + workspace
- GET /wiki/admin/failures (platform-admin only, since it spans every workspace)
- NotificationSummary gains failedWikiJobs (admin-only, like stuckAgents)

Frontend
- useNotificationCenter + NotificationSummary carry failedWikiJobs
- sidebar NavBadge on the Wiki nav item (admin) drives attention to it
- WikiFailureCenter: a collapsible cross-KB list in the library view with localized friendly hints (reuses the errorCode/warningCode i18n maps) and one-click open into the owning KB
- i18n zh-CN / en-US

Tests: H2 E2E pins the NEEDS_ATTENTION predicate (failed/partial/warning in, clean/pending out) and the KB-name join.

Implements the centralized-view half of mateaix#436.
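The NEEDS_ATTENTION rule (failed, partial, or a warning code present) lives in the mapper's SQL in the PR. The same logic can be sketched as an in-memory Java predicate. `MaterialView`, the literal status strings, and treating an empty warning code as absent are assumptions made for this illustration.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical read-only view of a raw-material row. */
final class MaterialView {
    final String id;
    final String status;       // assumed values: pending, completed, partial, failed
    final String warningCode;  // nullable

    MaterialView(String id, String status, String warningCode) {
        this.id = id;
        this.status = status;
        this.warningCode = warningCode;
    }
}

final class NeedsAttention {
    private NeedsAttention() {}

    /** failed | partial | warning_code present */
    static final Predicate<MaterialView> NEEDS_ATTENTION = m ->
            "failed".equals(m.status)
                    || "partial".equals(m.status)
                    || (m.warningCode != null && !m.warningCode.isEmpty());

    /** Ids of the rows an operator should look at, in input order. */
    static List<String> idsNeedingAttention(List<MaterialView> rows) {
        return rows.stream()
                .filter(NEEDS_ATTENTION)
                .map(m -> m.id)
                .collect(Collectors.toList());
    }
}
```

Sharing one predicate between the count and the list is what keeps the sidebar badge number and the failure-center rows from drifting apart.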
…re center

- wiki.md (zh/en): new "failure visibility" section: error_code vocabulary, non-blocking warnings, the full progress SSE event table (incl. raw.warning), and the cross-KB admin failure center + failedWikiJobs notification count
- wiki.md raw_material row: note the new error/warning columns
- api.md (zh/en): add GET /api/v1/wiki/admin/failures to the endpoint index
Owner
Thanks for the contribution 🙏 The error-code pipeline,
Minor suggestions (non-blocking; can be picked up later):
This was referenced Jun 28, 2026
Contributor
Author
Thanks for the review and the merge! I have followed up on both minor suggestions in a separate PR, #448:
Pure style cleanup, zero behavior change 🙏
ncw1992120 added a commit to ncw1992120/mateclaw that referenced this pull request Jun 28, 2026

BLOCKERS:
- prefix column VARCHAR(6) → VARCHAR(12) across all 3 migration dialects; KbApiKeyService.create() produces 8 chars (mck_ + 4 random), VARCHAR(6) would silently truncate on H2 and throw on MySQL strict mode
- Rename migration V162 → V164 to avoid collision with merged mateaix#437 (V162=wiki_raw_material_error_code, V163=wiki_raw_material_warning) and fix stale V161 references in h2/kingbase comments

NITS:
- SecurityConfig/WebMvcConfig: replace inline FQN with import + simple name
- parseScopes: add .map(String::trim) so ' kb:read' matches correctly
- Remove ?token= SSE query fallback in KbOpenApiAuthFilter (P0-A has no SSE endpoint; the key would leak into access/proxy logs; R5)
- Move kb-open-api-design.md from repo root to rfcs/ (contains RFC-090 internal reference that would be exposed by sync-opensource)
- KbApiKeyEntity Javadoc: 'first 4 chars' → 'first 8 chars (mck_ + 4)' to match actual behavior
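The parseScopes nit is easiest to see with a small example. The sketch below assumes scopes arrive as a comma-separated string, which the commit message does not state; `ScopeParsingSketch` and the `kb:write` scope are hypothetical, and only `.map(String::trim)` and `kb:read` are taken from the commit message.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

final class ScopeParsingSketch {
    private ScopeParsingSketch() {}

    /** Split on commas (assumed delimiter), trim each entry, drop empties. */
    static Set<String> parseScopes(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Collections.emptySet();
        }
        return Arrays.stream(raw.split(","))
                .map(String::trim)          // without this, " kb:read" != "kb:read"
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}
```

With the trim step, `parseScopes("kb:write, kb:read")` contains `kb:read`; without it, the second entry would keep its leading space and an exact-match scope check would fail.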
mateaix pushed a commit that referenced this pull request Jun 28, 2026

…vadoc)

Pure style cleanup, zero behavior change: replace inline FQN return type in WikiRawMaterialService.listFailures with an import + simple name, and translate the new WikiRawMaterialEntity field Javadocs to English.
Closes #436.
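Purpose

Knowledge-base (Wiki) ingestion is mostly asynchronous background work, so when it fails the error rarely reaches the user: it is either visible only in the server logs or shown as an unintelligible raw English exception. A user who is not on the failing KB's page notices nothing at all.

This PR closes the loop on "making knowledge-base processing failures visible to users": structured error pipeline wired end to end → friendly localized hints → silent sub-steps made visible → centralized cross-KB failure center. All 3 commits address the same issue and are kept layered for easier review.

Changes

1. Error pipeline wired end to end (commit 1)
- mate_wiki_raw_material gains error_code (h2/mysql/kingbase V162).
- Reuses the existing WikiProcessingService#classifyErrorCode (AUTH_ERROR/BILLING/MODEL_NOT_FOUND/RATE_LIMIT/TIMEOUT/SERVER_ERROR/CONTENT_FILTER/NO_CONTENT/EMPTY_RESULT/UNKNOWN) to record a code at every failure point.
- error_code/error_message switch to FieldStrategy.ALWAYS: both are cleared on the transition to success, which fixes the latent bug where a stale error survived reprocessing.
- The RAW_FAILED SSE event and the listRaw response both carry errorCode.
- raw.failed no longer drops the error fields (a null message no longer produces a blank "failed" badge); a localized friendly hint is rendered by errorCode, with the raw exception string collapsed into the hover detail.

2. Silent sub-steps made visible (commit 2)
- Adds warning_code/warning_message (V163).
- When async sub-steps such as embedding or entity-graph extraction failed after the material was already completed, the previous behavior was only a log.warn; the material still showed "completed" while actually being degraded (e.g. not semantically searchable). These failures now record a non-blocking warning, pushed live through the new raw.warning SSE event.
- clearFailureState() clears error + warning in one place (needed because these columns are ALWAYS).

3. Centralized cross-KB failure center (commit 3)
- WikiRawMaterialMapper.countFailures / listFailures: share the NEEDS_ATTENTION predicate (failed | partial | warning) and JOIN the KB for its name + workspace.
- GET /wiki/admin/failures: platform admins (ROLE_ADMIN, spans workspaces).
- NotificationSummary gains failedWikiJobs (admin-only, modeled on stuckAgents) → reuses the notification-center pattern, with a NavBadge on the Wiki sidebar item.
- WikiFailureCenter.vue: a collapsible cross-KB failure list in the library view, with friendly i18n and one-click entry into the owning KB.

Tests
- WikiProcessingServiceErrorCodeTest: errorCode vocabulary classification + end-to-end propagation + an async embedding failure only raises a warning and does not flip the row to failed.
- WikiRawMaterialFailureStateTest: the 4-arg status update persists the code / recordWarning leaves status unchanged / claim clears error + warning.
- WikiRawMaterialFailuresMapperE2ETest (H2): pins the NEEDS_ATTENTION predicate (failed/partial/warning in, clean/pending out) + the KB name JOIN.
- vue-tsc --noEmit passes with clean types.

Upgrade impact

Purely additive and backward compatible: the new columns are all nullable, existing data behaves as before, and no backfill is needed.

Trade-offs

NO_CONTENT hard failure (already covered by commit 1), which does not belong to the "completed but degraded" scenario.

Notes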