refactor: reposition wiki module as llm knowledge base and consolidate powermem integration #151
Merged
webup merged 7 commits on May 13, 2026
The module is an LLM-curated knowledge base built around automated
ingest, derivation, and synthesis — not a hand-authored wiki. User-facing
copy is updated to reflect that, and this positioning informs follow-up
decisions such as not adding page-edit/CRUD endpoints.
Changes:
- packages/web/src/locales/main/{en,zh,ja}.ts: rename nav.wiki, wiki.title,
wiki.subtitle, and related labels; reword copy that implied hand-editing
(e.g. "compile into the wiki" → "ingest as a knowledge source"); drop
stale references to PowerMem from user-facing strings where the backing
engine is already surfaced by the stats panel.
- packages/web/src/modules/wiki/__tests__/WikiPage.test.tsx: update
assertions to match the new strings.
- README.md / README_CN.md / README_JP.md: v0.4.0 roadmap entry renamed
from "LLM Wiki" to "LLM Knowledge module" in all three languages.
Code identifiers, routes (/wiki, /api/wiki/*), file paths, and TypeScript
types remain unchanged — renaming them is high-cost, zero-user-value churn
that would bloat the diff without any observable improvement.
… plugin
Two layers used to own the same on-disk wiki vault: wikiService derived
the path from the OpenClaw profile selection, the PowerMem plugin derived
it from the managed data root. They converged for the default profile but
could drift under non-trivial configurations (named profile without a
data-root override, differing env handling), and each layer wrote the
vault scaffold (SCHEMA.md, .meta/*.json, index.md, log.md) independently.
Changes:
- packages/backend/src/openclawVaultPaths.ts (new): single source of truth
for vault-root resolution and directory/meta scaffolding. Precedence:
explicit override > CLAWMASTER_WIKI_ROOT env > OPENCLAW_STATE_DIR env >
data-root-derived state dir > profile selection. Exports
resolveWikiVaultRoot, resolveWikiVaultLayout, ensureWikiVaultStructure,
and WIKI_SCHEMA_MARKDOWN.
- packages/backend/src/openclawVaultPaths.test.ts (new): matrix tests
across override / env / data-root / profile inputs, plus a parity check
asserting the plugin's join(resolveOpenclawWorkspaceDir(ctx), '..', 'wiki')
lands on the same vault root as the shared helper.
- packages/backend/src/services/wikiService.ts: delete the local
resolveOpenclawStateDir + WIKI_SCHEMA_TEMPLATE + inline directory ensure
logic; delegate resolveWikiPaths and ensureWikiVault to the shared module.
Drop the stealth SCHEMA.md force-overwrite block (obsolete migration).
- plugins/memory-clawmaster-powermem/index.ts: honor CLAWMASTER_WIKI_ROOT
in resolveWikiVaultRoot for parity with the backend (no other behavior
change).
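The precedence chain described for the shared resolver can be sketched roughly as follows. This is a minimal illustration, not the contents of openclawVaultPaths.ts: the input shape, the `sketchResolveWikiVaultRoot` name, the `wiki` directory sitting directly under the state dir, the `openclaw` subdirectory under the data root, and the `.openclaw` / `.openclaw-<profile>` naming are all assumptions made for the example.

```typescript
import * as path from "node:path";

// Hypothetical input shape; the real resolver's signature is not shown in this PR.
interface VaultRootInputs {
  vaultRootOverride?: string;
  dataRootOverride?: string;
  profileName?: string;
  homeDir: string;
  env: Record<string, string | undefined>;
}

// Assumed layout constants, used for illustration only.
const ASSUMED_STATE_SUBDIR = "openclaw";
const ASSUMED_WIKI_DIRNAME = "wiki";

// Treat empty or whitespace-only values as "not set".
function nonEmpty(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Walks the precedence chain from most to least specific input.
function sketchResolveWikiVaultRoot(input: VaultRootInputs): string {
  // 1. Explicit override wins outright.
  const explicit = nonEmpty(input.vaultRootOverride);
  if (explicit) return path.resolve(explicit);

  // 2. CLAWMASTER_WIKI_ROOT names the vault root directly.
  const envRoot = nonEmpty(input.env.CLAWMASTER_WIKI_ROOT);
  if (envRoot) return path.resolve(envRoot);

  // 3. OPENCLAW_STATE_DIR names the state dir; the vault is assumed to sit inside it.
  const envState = nonEmpty(input.env.OPENCLAW_STATE_DIR);
  if (envState) return path.join(path.resolve(envState), ASSUMED_WIKI_DIRNAME);

  // 4. State dir derived from the managed data root (derivation assumed here).
  const dataRoot = nonEmpty(input.dataRootOverride);
  if (dataRoot) {
    return path.join(path.resolve(dataRoot), ASSUMED_STATE_SUBDIR, ASSUMED_WIKI_DIRNAME);
  }

  // 5. Fall back to the profile selection (directory naming assumed here).
  const profile = nonEmpty(input.profileName);
  const stateDirName = profile ? `.openclaw-${profile}` : ".openclaw";
  return path.join(path.resolve(input.homeDir), stateDirName, ASSUMED_WIKI_DIRNAME);
}
```

Passing `env` in as a parameter rather than reading `process.env` inside the function keeps the matrix of override / env / data-root / profile cases easy to test.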
generatedFromSourceIds, relatedPages, and relatedPageIds now persist as
YAML inline arrays on disk (e.g. `generatedFromSourceIds: ["a", "b"]`)
instead of pipe-delimited strings. External markdown editors render these
as native lists and can query them with dataview-style tools without
treating the value as a single opaque string.

The in-memory shape remains Record<string, string>: parseFrontmatter
pipe-joins array values on read so existing call sites keep working, and
renderFrontmatter splits them back out on write for any key in
LIST_FRONTMATTER_KEYS. A small set of helpers (readListFrontmatter,
serializeFrontmatterList, normalizeFrontmatterValueForWrite) replaces the
ad-hoc parsePipeList / serializePipeList pair.

Source pages no longer write a body-level "## Extracted Wiki Links"
generated block. The derived page titles now flow into a `relatedPages`
frontmatter array; re-ingest of any legacy page strips the old block via
removeGeneratedBlock before writing. summarizeParsedPages unions
frontmatter relatedPages into the page link graph so backlinks and orphan
detection still see the derived-entity edges.

The shared SCHEMA.md template documents the new array convention. The
wikiService test asserts the raw on-disk file carries YAML-array syntax
and that the legacy body section is absent.
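The read/write asymmetry above (arrays on disk, pipe-joined strings in memory) can be illustrated with a small sketch. The function names here are hypothetical stand-ins, not the PR's parseFrontmatter / renderFrontmatter. It also assumes a bare `|` separator and that the inline arrays hold only double-quoted strings, which makes them valid JSON as well as YAML.

```typescript
// Keys whose values are lists on disk; mirrors the three fields named in the commit.
const SKETCH_LIST_KEYS = new Set(["generatedFromSourceIds", "relatedPages", "relatedPageIds"]);

// Read side: accept either the new inline-array form or the legacy pipe form,
// and always hand back a pipe-joined string.
function sketchParseValue(raw: string): string {
  const trimmed = raw.trim();
  if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
    try {
      // Assumption: items are double-quoted strings, so JSON.parse can read them.
      const parsed: unknown = JSON.parse(trimmed);
      if (Array.isArray(parsed)) return parsed.map(String).join("|");
    } catch {
      // Not JSON-compatible; fall through and keep the raw text.
    }
  }
  return trimmed;
}

// Parses one `key: value` frontmatter line into a [key, value] pair.
function sketchParseLine(line: string): [string, string] {
  const colon = line.indexOf(":");
  if (colon < 0) return [line.trim(), ""];
  return [line.slice(0, colon).trim(), sketchParseValue(line.slice(colon + 1))];
}

// Write side: list keys are split on the pipe and emitted as an inline array;
// every other key is written through unchanged.
function sketchRenderLine(key: string, value: string): string {
  if (!SKETCH_LIST_KEYS.has(key)) return `${key}: ${value}`;
  const items = value
    .split("|")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
  return `${key}: ${JSON.stringify(items)}`;
}

// A legacy pipe-delimited line is upgraded on the next write.
const [legacyKey, legacyValue] = sketchParseLine("relatedPages: Alpha|Beta");
console.log(sketchRenderLine(legacyKey, legacyValue)); // relatedPages: ["Alpha","Beta"]
```

Because the read side normalizes both formats to the same in-memory string, call sites that split on the pipe do not need to know which format a given page used on disk.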
…solution
wikiLlm.ts previously decided between gateway-fetch and infer-model by
comparing globalThis.fetch against a module-load-time reference. Any
APM agent, polyfill, or test harness that wraps fetch would silently
divert production traffic to the gateway-fetch path, bypassing the
intended CLI transport.
Changes:
- Drop the nativeFetch capture and shouldUseMockedFetchTransport logic.
Transport resolution is now: explicit test override > WIKI_LLM_USE_GATEWAY
env ('1' → gateway-fetch) > default (infer-model). WIKI_LLM_USE_GATEWAY
is the only opt-in outside of tests.
- Introduce setWikiLlmTransportForTests(transport|null) as the primary
test seam. setWikiLlmUseGatewayFetchForTests is kept as a deprecated
shim so existing test files compile unchanged during the transition.
- Add tests asserting: default routes to infer-model without calling
fetch; WIKI_LLM_USE_GATEWAY=1 forces gateway-fetch; explicit override
beats the env.
- Replace bare catch {} blocks in wikiService.ts LLM call sites with
logWikiLlmFailure() calls that emit console.warn with the error
message and a structured context object. Warning code payloads are
preserved.
- Simplify sanitizeWikiBody: the two CLAWMASTER-GENERATED-specific
comment strips are subsumed by the single generic comment-strip regex.
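The new resolution order (explicit test override, then the WIKI_LLM_USE_GATEWAY env, then the default) can be sketched as below. The names are illustrative stand-ins for the wikiLlm.ts internals, and the env is passed in as a parameter only to keep the sketch self-contained.

```typescript
type SketchTransport = "gateway-fetch" | "infer-model";

// Module-level test seam; null means "no override".
let sketchTransportOverride: SketchTransport | null = null;

// Stand-in for setWikiLlmTransportForTests(transport | null).
function setSketchTransportForTests(transport: SketchTransport | null): void {
  sketchTransportOverride = transport;
}

// Resolution order: explicit test override > WIKI_LLM_USE_GATEWAY === "1" > default.
// Nothing here inspects globalThis.fetch, so wrapping fetch cannot change the result.
function resolveSketchTransport(env: Record<string, string | undefined>): SketchTransport {
  if (sketchTransportOverride !== null) return sketchTransportOverride;
  if (env.WIKI_LLM_USE_GATEWAY === "1") return "gateway-fetch";
  return "infer-model";
}
```

The point of the design is that the decision depends only on inputs the caller controls explicitly, so an APM agent or polyfill that replaces `fetch` has no way to divert traffic.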
ingestWikiSource called listWikiPages(context) inside every iteration of
the derived-page upsert loop, an O(N×M) filesystem scan per ingest.

Fix: compute knownPages once before the loop and pass the snapshot to
upsertDerivedPage. A mutable copy is kept and extended after each
successful upsert so later suggestions in the same batch see freshly
created pages without a re-scan. The duplicate listWikiPages on the
skipped-fingerprint fast-path now reuses the same snapshot.

Also truncate and sanitize contradiction check inputs to
MAX_CONTRADICTION_PAGE_CHARS (3000 chars) per page, using the same
sanitizer as query synthesis, to cap per-pair token spend in a dense
vault. A test asserts the user message delivered to the LLM is < 7500
chars even when each source page is 10 000 chars of body text.
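The snapshot pattern described in the fix can be sketched as follows. This is a simplified, synchronous illustration with hypothetical names and a minimal page type; the real ingestWikiSource and upsertDerivedPage signatures are not shown in this PR.

```typescript
// Minimal stand-in for a wiki page record.
interface SketchPage {
  title: string;
}

// Scans once, then threads a mutable snapshot through every upsert.
function sketchUpsertDerivedPages(
  suggestions: readonly string[],
  listPages: () => SketchPage[],
  upsertOne: (title: string, known: readonly SketchPage[]) => SketchPage | null,
): SketchPage[] {
  // One scan before the loop instead of one scan per suggestion.
  const knownPages: SketchPage[] = [...listPages()];

  for (const title of suggestions) {
    const created = upsertOne(title, knownPages);
    // Extend the snapshot so later suggestions in the batch see the new page.
    if (created !== null) knownPages.push(created);
  }
  return knownPages;
}

// Usage: count scans and record how many pages each upsert could see.
let scanCount = 0;
const visibleCounts: number[] = [];
sketchUpsertDerivedPages(
  ["Beta", "Gamma"],
  () => {
    scanCount += 1;
    return [{ title: "Alpha" }];
  },
  (title, known) => {
    visibleCounts.push(known.length);
    return { title };
  },
);
console.log(scanCount, visibleCounts); // 1 [ 1, 2 ]
```

Copying the scan result before mutating it keeps the caller's array untouched, which matters if the same listing is reused elsewhere in the ingest.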
…nstall flag
- plugins/memory-clawmaster-powermem/index.ts: rename
buildAutoRecallLogForTest → formatAutoRecallLog. The function is called
in production code; the ForTest suffix was misleading. Update the call
site and the test import.
- packages/backend/src/services/managedMemoryBridge.ts: expand the inline
comment on dangerouslyForceUnsafeInstall to explain which specific check
it bypasses (resetManagedMemory access + direct openclaw/plugin-sdk
imports) and note what a narrower alternative would look like when
OpenClaw adds one.
- tests/ui/wiki-powermem-proof-helper.ts: add a fail-fast check before
syncManagedMemoryBridge runs: if the resolved --home path is the real
user home or any subdirectory of it, throw rather than risk writing into
the live OpenClaw profile.
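A fail-fast guard of the kind added to the proof helper might look like the sketch below. The function names are hypothetical, and the check is purely lexical: it does not resolve symlinks, which a stricter version could do with `fs.realpathSync`.

```typescript
import * as os from "node:os";
import * as nodePath from "node:path";

// True when `candidate` is `parent` itself or lies anywhere beneath it.
function isSameOrInside(parent: string, candidate: string): boolean {
  const rel = nodePath.relative(nodePath.resolve(parent), nodePath.resolve(candidate));
  if (rel === "") return true; // identical paths
  // A relative path that climbs out of `parent` starts with a ".." segment.
  const climbsOut = rel === ".." || rel.startsWith(`..${nodePath.sep}`);
  // An absolute result (e.g. a different Windows drive) is also outside.
  return !climbsOut && !nodePath.isAbsolute(rel);
}

// Throws before any write happens if the proof home overlaps the real user home.
function assertIsolatedProofHome(resolvedHome: string, realHome: string = os.homedir()): void {
  if (isSameOrInside(realHome, resolvedHome)) {
    throw new Error(
      `Refusing to run: --home (${resolvedHome}) is the real user home or a subdirectory of it.`,
    );
  }
}
```

Comparing on whole path segments rather than with a plain string prefix avoids treating a sibling such as `/home/user2` as if it were inside `/home/user`.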
1. Plumb managedMemoryContext.dataRootOverride into resolveWikiPaths so
the shared vault-root resolver is exercised at runtime, not only in
unit tests. Before this change, the Stage 2 dataRootOverride support
in resolveWikiVaultLayout was dead code from the wikiService call
path. Add a unit test in openclawVaultPaths.test.ts and an integration
test in wikiService.test.ts asserting the vault root resolves
correctly when only dataRootOverride (no vaultRootOverride) is set.
2. Add a multi-element readListFrontmatter round-trip test. The previous
YAML-array tests only exercised single-element arrays. The new test
ingests a source page where the LLM returns two suggestions and asserts
that relatedPages persists as a two-element YAML inline array on disk
while the in-memory representation stays pipe-joined.
3. Replace setWikiLlmUseGatewayFetchForTests with
setWikiLlmTransportForTests('gateway-fetch') in the contradiction-
truncation test and in afterEach, ahead of the deprecated shim's
removal.
What
Six-stage refactor addressing review findings from PR #138.
Stage 1: User-facing rename
- Routes (`/wiki`), API routes, TypeScript types, and file paths are unchanged
Stage 2: Shared vault path + structure
- New `packages/backend/src/openclawVaultPaths.ts`: single resolver for vault root, layout, and directory scaffolding
- Shared by `wikiService` and the PowerMem plugin
- Drop the obsolete `SCHEMA.md` force-overwrite migration block from `ensureWikiVault`
- Path-resolution tests, including with `dataRootOverride` and on Windows/WSL paths
- PowerMem plugin honors the `CLAWMASTER_WIKI_ROOT` env for parity
Stage 3: YAML-array frontmatter migration
- `generatedFromSourceIds` and `relatedPages` now persist as YAML inline arrays on disk (e.g. `["a","b"]`) instead of pipe-delimited strings, readable by dataview-style tools in Obsidian/Foam
- In-memory shape stays `Record<string, string>` (pipe-joined); `parseFrontmatter` reads either format; `renderMarkdownWithFrontmatter` always writes arrays for list keys
- Source pages no longer write the `## Extracted Wiki Links` generated block; related titles flow into `relatedPages` frontmatter; re-ingest strips any legacy block on first write
- `summarizeParsedPages` unions frontmatter `relatedPages` into the body-link graph so backlinks and orphan detection still see derived-entity edges
- Shared `SCHEMA.md` template documents the new array convention
Stage 4: wikiLlm transport hardening
- Replace the `globalThis.fetch !== nativeFetch` identity check with explicit resolution: test override → `WIKI_LLM_USE_GATEWAY=1` env → default `infer-model`
- New test seam `setWikiLlmTransportForTests(transport|null)`; `setWikiLlmUseGatewayFetchForTests` kept as a deprecated shim
- Tests: default routes to `infer-model` without touching `fetch`; env opt-in works; explicit override beats env
- Replace bare `catch {}` silent swallows in wikiService LLM call sites with `logWikiLlmFailure()` that emits `console.warn` with error message + structured context
- Simplify `sanitizeWikiBody` by removing the two CLAWMASTER-GENERATED-specific passes (subsumed by the generic `<!--…-->` strip)
Stage 5: Ingest/search perf and contradiction safety
- Hoist `listWikiPages` out of the derived-page upsert loop (O(N×M) → O(N+M)); mutable snapshot extended after each successful upsert so later suggestions in the same batch see new pages without a re-scan
- Reuse the `knownPages` snapshot on the skipped-fingerprint fast-path
- Truncate contradiction check inputs to `MAX_CONTRADICTION_PAGE_CHARS` (3000 chars) per page via `sanitizeContentForSynthesis` (same path used for query synthesis)
Stage 6: Cleanup and safety
- Rename `buildAutoRecallLogForTest` → `formatAutoRecallLog` (production call site; the `ForTest` suffix was misleading)
- Expand the comment on `dangerouslyForceUnsafeInstall` in `managedMemoryBridge.ts` to name which specific check it bypasses and what a narrower alternative looks like
- Proof helper (`tests/ui/wiki-powermem-proof-helper.ts`): fail-fast guard that throws if the resolved `--home` is the real user home or a subdirectory of it, preventing accidental writes to a live profile
Why
Testing
Vault format migration note
The `generatedFromSourceIds` and `relatedPages` fields now emit YAML arrays on disk. The read path tolerates both formats indefinitely. A rollback to a pre-migration binary will read new-format pages correctly because `parseFrontmatter` handles `[...]` by pipe-joining.