refactor: reposition wiki module as llm knowledge base and consolidate powermem integration #151

Merged: webup merged 7 commits into develop from refactor/knowledge-reposition-and-powermem-consolidation on May 13, 2026

webup commented May 12, 2026

What

Six-stage refactor addressing review findings from PR #138.

Stage 1 — User-facing rename

  • Rename "Wiki" → "Knowledge" / "知识库" / "ナレッジ" across all three locales and READMEs
  • Reword copy to frame the module as an LLM-curated knowledge base, not a hand-authored wiki
  • URL (/wiki), API routes, TypeScript types, and file paths are unchanged

Stage 2 — Shared vault path + structure

  • New packages/backend/src/openclawVaultPaths.ts: single resolver for vault root, layout, and directory scaffolding
  • Eliminates named-profile vault-root divergence between wikiService and the PowerMem plugin
  • Deletes the stealth SCHEMA.md force-overwrite migration block from ensureWikiVault
  • 15-case parity test asserting backend and plugin resolve to the same vault root across default/dev/named profiles, with/without dataRootOverride, and on Windows/WSL paths
  • PowerMem plugin now honors CLAWMASTER_WIKI_ROOT env for parity

Stage 3 — YAML-array frontmatter migration

  • generatedFromSourceIds and relatedPages now persist as YAML inline arrays on disk (e.g. ["a","b"]) instead of pipe-delimited strings — readable by dataview-style tools in Obsidian/Foam
  • Internal representation stays Record<string, string> (pipe-joined); parseFrontmatter reads either format; renderMarkdownWithFrontmatter always writes arrays for list keys
  • Source pages no longer write a body-level ## Extracted Wiki Links generated block; related titles flow into relatedPages frontmatter; re-ingest strips any legacy block on first write
  • summarizeParsedPages unions frontmatter relatedPages into the body-link graph so backlinks and orphan detection still see derived-entity edges
  • SCHEMA.md template documents the new array convention
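
The dual-format read path and array-only write path described in Stage 3 can be sketched as below. This is a minimal illustration and not the actual wikiService code: `parseFrontmatterLines`, `renderFrontmatterLines`, and `readListValue` are hypothetical names, the key set is limited to the two keys named above, and inline arrays are assumed to be JSON-compatible (quoted strings only).

```typescript
// Hypothetical sketch: names and bodies are assumptions, not the real helpers.
const LIST_FRONTMATTER_KEYS = new Set<string>(["generatedFromSourceIds", "relatedPages"]);

type Frontmatter = Record<string, string>;

// Read a list value in either format and return the pipe-joined in-memory shape.
function readListValue(raw: string): string {
  const trimmed = raw.trim();
  if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
    // Assumption: inline arrays contain quoted strings, so JSON.parse can read them.
    const items = JSON.parse(trimmed) as unknown[];
    return items.map((item) => String(item)).join("|");
  }
  return trimmed; // legacy pipe-delimited string, kept as-is
}

function parseFrontmatterLines(lines: string[]): Frontmatter {
  const out: Frontmatter = {};
  for (const line of lines) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // not a key/value line
    const key = line.slice(0, idx).trim();
    const raw = line.slice(idx + 1);
    out[key] = LIST_FRONTMATTER_KEYS.has(key) ? readListValue(raw) : raw.trim();
  }
  return out;
}

// Always write list keys as YAML inline arrays; other keys pass through unchanged.
function renderFrontmatterLines(frontmatter: Frontmatter): string[] {
  return Object.entries(frontmatter).map(([key, value]) => {
    if (!LIST_FRONTMATTER_KEYS.has(key)) return `${key}: ${value}`;
    const items = value
      .split("|")
      .map((item) => item.trim())
      .filter((item) => item.length > 0);
    return `${key}: ${JSON.stringify(items)}`;
  });
}
```

Because call sites only ever see the pipe-joined string, the on-disk format can change without touching them; the cost is that a list item containing `|` cannot be represented in memory.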

Stage 4 — wikiLlm transport hardening

  • Replaces globalThis.fetch !== nativeFetch identity check with explicit resolution: test override → WIKI_LLM_USE_GATEWAY=1 env → default infer-model
  • Introduces setWikiLlmTransportForTests(transport|null); setWikiLlmUseGatewayFetchForTests kept as a deprecated shim
  • 3 new transport tests: default routes to infer-model without touching fetch; env opt-in works; explicit override beats env
  • Replaces all catch {} silent swallows in wikiService LLM call sites with logWikiLlmFailure() that emits console.warn with error message + structured context
  • Simplifies sanitizeWikiBody by removing the two CLAWMASTER-GENERATED-specific passes (subsumed by the generic <!--…--> strip)
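
The resolution order in Stage 4 (test override, then `WIKI_LLM_USE_GATEWAY=1`, then the `infer-model` default) could look roughly like the sketch below. The transport names and `setWikiLlmTransportForTests` come from this PR; `resolveWikiLlmTransport` and passing the environment in as a parameter are assumptions made to keep the example self-contained.

```typescript
type WikiLlmTransport = "infer-model" | "gateway-fetch";

// Test seam: null means "no override".
let transportOverrideForTests: WikiLlmTransport | null = null;

function setWikiLlmTransportForTests(transport: WikiLlmTransport | null): void {
  transportOverrideForTests = transport;
}

// Hypothetical resolver; the real code presumably reads process.env directly.
function resolveWikiLlmTransport(env: Record<string, string | undefined>): WikiLlmTransport {
  // 1. An explicit test override always wins.
  if (transportOverrideForTests !== null) return transportOverrideForTests;
  // 2. The only opt-in outside of tests is the env flag set to exactly "1".
  if (env["WIKI_LLM_USE_GATEWAY"] === "1") return "gateway-fetch";
  // 3. Default: the CLI-backed infer-model transport.
  return "infer-model";
}
```

Nothing in this path inspects `globalThis.fetch`, so a wrapped or polyfilled fetch cannot change the outcome.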

Stage 5 — Ingest/search perf and contradiction safety

  • Hoists listWikiPages out of the derived-page upsert loop (O(N×M) → O(N+M)); mutable snapshot extended after each successful upsert so later suggestions in the same batch see new pages without a re-scan
  • Reuses the outer knownPages snapshot on the skipped-fingerprint fast-path
  • Truncates contradiction check inputs to MAX_CONTRADICTION_PAGE_CHARS (3 000 chars) per page via sanitizeContentForSynthesis — same path used for query synthesis
  • Test asserts the user message delivered to the LLM is < 7 500 chars even when source pages are 10 000 chars each
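
The hoisting change in Stage 5 amounts to scanning once and extending a mutable snapshot as pages are created. The sketch below is synchronous for brevity (the real functions are presumably async filesystem calls), and `PageRef`, `upsertDerivedPagesSketch`, and the injected callbacks are hypothetical stand-ins for `listWikiPages` and `upsertDerivedPage`.

```typescript
interface PageRef {
  id: string;
  title: string;
}

// Hypothetical sketch of the hoisted loop; callbacks stand in for the real services.
function upsertDerivedPagesSketch(
  suggestedTitles: string[],
  listPages: () => PageRef[],
  upsertPage: (title: string, knownPages: readonly PageRef[]) => PageRef | null,
): PageRef[] {
  // One scan before the loop instead of one scan per suggestion.
  const knownPages: PageRef[] = [...listPages()];
  for (const title of suggestedTitles) {
    const page = upsertPage(title, knownPages);
    // Extend the snapshot so later suggestions see pages created earlier in this batch.
    if (page !== null && !knownPages.some((known) => known.id === page.id)) {
      knownPages.push(page);
    }
  }
  return knownPages;
}
```

The snapshot only grows on successful upserts, so a failed upsert (modelled here as `null`) leaves later iterations with the same view they would have had after a re-scan.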

Stage 6 — Cleanup and safety

  • Rename buildAutoRecallLogForTest → formatAutoRecallLog (production call site; the ForTest suffix was misleading)
  • Expand inline comment on dangerouslyForceUnsafeInstall in managedMemoryBridge.ts to name which specific check it bypasses and what a narrower alternative looks like
  • Proof helper (tests/ui/wiki-powermem-proof-helper.ts): fail-fast guard that throws if the resolved --home is the real user home or a subdirectory of it, preventing accidental writes to a live profile
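
A fail-fast guard of the kind described for the proof helper might be written as follows. This is a sketch under assumptions: `assertSafeProofHome` and `isInsideOrEqual` are hypothetical names, and the real home directory is passed in as a parameter (in practice it would likely come from `os.homedir()`).

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// True when `candidate` is `parent` itself or lies anywhere beneath it.
function isInsideOrEqual(parent: string, candidate: string): boolean {
  const rel = relative(resolve(parent), resolve(candidate));
  if (rel === "") return true; // same directory
  if (isAbsolute(rel)) return false; // e.g. a different drive on Windows
  return rel !== ".." && !rel.startsWith(`..${sep}`);
}

// Hypothetical guard: refuse to run against the real user home or anything under it.
function assertSafeProofHome(candidateHome: string, realHome: string): void {
  if (isInsideOrEqual(realHome, candidateHome)) {
    throw new Error(
      `Refusing to use --home ${candidateHome}: it is inside the real user home ${realHome}`,
    );
  }
}
```

Comparing via `relative` rather than a string prefix avoids treating a sibling such as `/home/alice2` as a subdirectory of `/home/alice`.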

Why

  • The review of PR #138 (feat: add gateway-backed wiki llm workflows) uncovered a named-profile vault-root divergence, brittle transport selection, O(N²) filesystem scans, silent error swallowing, and a user-facing "Wiki" label that overpromises hand-editing features that don't exist (see discussion for the no-CRUD positioning decision).
  • The frontmatter array migration makes the vault more usable from Obsidian/Foam/dataview without any file-format break: old pages read fine, new writes are clean.

Testing

# Backend unit tests (includes 15-case vault path parity, 7 wikiLlm transport, 20 wikiService)
cd packages/backend && node ../../node_modules/tsx/dist/cli.mjs --test src/**/*.test.ts

# Plugin tests
node node_modules/tsx/dist/cli.mjs --test plugins/memory-clawmaster-powermem/*.test.ts

# Web tests
npm test --workspace=@openclaw-manager/web -- --run

# Build both packages
npm run build --workspace=@openclaw-manager/web
npm run build --workspace=@openclaw-manager/backend

Vault format migration note

generatedFromSourceIds and relatedPages fields now emit YAML arrays on disk. The read path tolerates both formats indefinitely. A rollback to a pre-migration binary will read new-format pages correctly because parseFrontmatter handles [...] by pipe-joining.

webup added 7 commits May 9, 2026 22:42
The module is an LLM-curated knowledge base built around automated
ingest, derivation, and synthesis — not a hand-authored wiki. User-facing
copy is updated to reflect that, and this positioning informs follow-up
decisions such as not adding page-edit/CRUD endpoints.

Changes:
- packages/web/src/locales/main/{en,zh,ja}.ts: rename nav.wiki, wiki.title,
  wiki.subtitle, and related labels; reword copy that implied hand-editing
  (e.g. "compile into the wiki" → "ingest as a knowledge source"); drop
  stale references to PowerMem from user-facing strings where the backing
  engine is already surfaced by the stats panel.
- packages/web/src/modules/wiki/__tests__/WikiPage.test.tsx: update
  assertions to match the new strings.
- README.md / README_CN.md / README_JP.md: v0.4.0 roadmap entry renamed
  from "LLM Wiki" to "LLM Knowledge module" in all three languages.

Code identifiers, routes (/wiki, /api/wiki/*), file paths, and TypeScript
types remain unchanged — renaming them is high-cost, zero-user-value churn
that would bloat the diff without any observable improvement.

… plugin

Two layers used to own the same on-disk wiki vault: wikiService derived
the path from the OpenClaw profile selection, while the PowerMem plugin
derived it from the managed data root. They converged for the default profile but
could drift under non-trivial configurations (named profile without a
data-root override, differing env handling), and each layer wrote the
vault scaffold (SCHEMA.md, .meta/*.json, index.md, log.md) independently.

Changes:
- packages/backend/src/openclawVaultPaths.ts (new): single source of
  truth for vault-root resolution and directory/meta scaffolding.
  Precedence: explicit override > CLAWMASTER_WIKI_ROOT env >
  OPENCLAW_STATE_DIR env > data-root-derived state dir > profile
  selection. Exports resolveWikiVaultRoot, resolveWikiVaultLayout,
  ensureWikiVaultStructure, and WIKI_SCHEMA_MARKDOWN.
- packages/backend/src/openclawVaultPaths.test.ts (new): matrix tests
  across override / env / data-root / profile inputs, plus a parity
  check asserting the plugin's join(resolveOpenclawWorkspaceDir(ctx),
  '..', 'wiki') lands on the same vault root as the shared helper.
- packages/backend/src/services/wikiService.ts: delete the local
  resolveOpenclawStateDir + WIKI_SCHEMA_TEMPLATE + inline directory
  ensure logic; delegate resolveWikiPaths and ensureWikiVault to the
  shared module. Drop the stealth SCHEMA.md force-overwrite block
  (obsolete migration).
- plugins/memory-clawmaster-powermem/index.ts: honor
  CLAWMASTER_WIKI_ROOT in resolveWikiVaultRoot for parity with the
  backend (no other behavior change).
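
The precedence chain listed for openclawVaultPaths.ts can be illustrated with a first-match resolver. This is a sketch, not the exported `resolveWikiVaultRoot`: the input shape, `resolveVaultRootSketch`, and `wikiUnder` are hypothetical, and placing the vault in a `wiki` directory directly under the state dir is an assumption inferred from the plugin's `join(resolveOpenclawWorkspaceDir(ctx), '..', 'wiki')` parity check.

```typescript
interface VaultRootInputs {
  vaultRootOverride?: string;
  env: Record<string, string | undefined>;
  dataRootStateDir?: string; // state dir derived from a data-root override, if any
  profileStateDir: string; // state dir implied by the selected profile
}

function nonEmpty(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Assumption: the vault lives in "<stateDir>/wiki".
function wikiUnder(stateDir: string): string {
  return `${stateDir.replace(/[\\/]+$/, "")}/wiki`;
}

// First defined input wins, in the order given in the commit message.
function resolveVaultRootSketch(inputs: VaultRootInputs): string {
  const override = nonEmpty(inputs.vaultRootOverride);
  if (override) return override;
  const envRoot = nonEmpty(inputs.env["CLAWMASTER_WIKI_ROOT"]);
  if (envRoot) return envRoot;
  const envStateDir = nonEmpty(inputs.env["OPENCLAW_STATE_DIR"]);
  if (envStateDir) return wikiUnder(envStateDir);
  const dataStateDir = nonEmpty(inputs.dataRootStateDir);
  if (dataStateDir) return wikiUnder(dataStateDir);
  return wikiUnder(inputs.profileStateDir);
}
```

Having the backend and the plugin both go through one function like this is what makes a parity test possible: any input combination has exactly one answer.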

generatedFromSourceIds, relatedPages, and relatedPageIds now persist as
YAML inline arrays on disk (e.g. `generatedFromSourceIds: ["a", "b"]`)
instead of pipe-delimited strings. External markdown editors render
these as native lists and can query them with dataview-style tools
without treating the value as a single opaque string.

The in-memory shape remains Record<string, string> — parseFrontmatter
pipe-joins array values on read so existing call sites keep working, and
renderFrontmatter splits them back out on write for any key in
LIST_FRONTMATTER_KEYS. A small set of helpers (readListFrontmatter,
serializeFrontmatterList, normalizeFrontmatterValueForWrite) replaces
the ad-hoc parsePipeList / serializePipeList pair.

Source pages no longer write a body-level "## Extracted Wiki Links"
generated block. The derived page titles now flow into a `relatedPages`
frontmatter array; re-ingest of any legacy page strips the old block via
removeGeneratedBlock before writing. summarizeParsedPages unions
frontmatter relatedPages into the page link graph so backlinks and
orphan detection still see the derived-entity edges.
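
To show why the union matters, the sketch below merges frontmatter `relatedPages` into each page's outgoing links before computing orphans. The `ParsedPage` shape, the function names, and the definition of an orphan as a page with no inbound links are assumptions for illustration; the real `summarizeParsedPages` may differ.

```typescript
interface ParsedPage {
  title: string;
  bodyLinks: string[]; // titles linked from the markdown body
  frontmatter: Record<string, string>; // list values are pipe-joined in memory
}

// Union body links with the frontmatter relatedPages list, without duplicates.
function outgoingLinks(page: ParsedPage): string[] {
  const related = (page.frontmatter["relatedPages"] ?? "")
    .split("|")
    .map((title) => title.trim())
    .filter((title) => title.length > 0);
  return Array.from(new Set([...page.bodyLinks, ...related]));
}

// Assumption: an orphan is a page that no other page links to.
function findOrphans(pages: ParsedPage[]): string[] {
  const linked = new Set<string>();
  for (const page of pages) {
    for (const target of outgoingLinks(page)) {
      if (target !== page.title) linked.add(target);
    }
  }
  return pages.map((page) => page.title).filter((title) => !linked.has(title));
}
```

With the body-level block removed, a source page may have no body links at all; without the union, every page derived from it would be reported as an orphan.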

The shared SCHEMA.md template documents the new array convention. The
wikiService test asserts the raw on-disk file carries YAML-array syntax
and that the legacy body section is absent.

…solution

wikiLlm.ts previously decided between gateway-fetch and infer-model by
comparing globalThis.fetch against a module-load-time reference. Any
APM agent, polyfill, or test harness that wraps fetch would silently
divert production traffic to the gateway-fetch path, bypassing the
intended CLI transport.

Changes:
- Drop the nativeFetch capture and shouldUseMockedFetchTransport logic.
  Transport resolution is now: explicit test override > WIKI_LLM_USE_GATEWAY
  env ('1' → gateway-fetch) > default (infer-model). WIKI_LLM_USE_GATEWAY
  is the only opt-in outside of tests.
- Introduce setWikiLlmTransportForTests(transport|null) as the primary
  test seam. setWikiLlmUseGatewayFetchForTests is kept as a deprecated
  shim so existing test files compile unchanged during the transition.
- Add tests asserting: default routes to infer-model without calling
  fetch; WIKI_LLM_USE_GATEWAY=1 forces gateway-fetch; explicit override
  beats the env.
- Replace bare catch {} blocks in wikiService.ts LLM call sites with
  logWikiLlmFailure() calls that emit console.warn with the error
  message and a structured context object. Warning code payloads are
  preserved.
- Simplify sanitizeWikiBody: the two CLAWMASTER-GENERATED-specific
  comment strips are subsumed by the single generic comment-strip regex.
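
Replacing a bare `catch {}` with a logged fallback could look like the sketch below. `logWikiLlmFailure` is named in this PR, but its signature here is an assumption, as are `suggestWithFallback`, the injected `warn` parameter, and the `llm_suggestions_unavailable` warning code.

```typescript
interface WikiLlmFailureContext {
  operation: string;
  [key: string]: unknown;
}

type WarnFn = (message: string, context: WikiLlmFailureContext) => void;

function describeError(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

// Assumed signature: emits the error message plus a structured context object.
function logWikiLlmFailure(
  error: unknown,
  context: WikiLlmFailureContext,
  warn: WarnFn = console.warn,
): void {
  warn(`[wiki-llm] ${context.operation} failed: ${describeError(error)}`, context);
}

// Hypothetical call site: same fallback and warning code as before, but no silent swallow.
function suggestWithFallback(
  callLlm: () => string[],
  warnings: string[],
  warn: WarnFn = console.warn,
): string[] {
  try {
    return callLlm();
  } catch (error) {
    logWikiLlmFailure(error, { operation: "suggest-derived-pages" }, warn);
    warnings.push("llm_suggestions_unavailable"); // hypothetical warning code
    return [];
  }
}
```

The caller-visible behavior (empty result plus a warning code) is unchanged; the only difference is that the failure now leaves a trace in the logs.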

ingestWikiSource called listWikiPages(context) inside every iteration of
the derived-page upsert loop — an O(N×M) filesystem scan per ingest.
Fix: compute knownPages once before the loop and pass the snapshot to
upsertDerivedPage. A mutable copy is kept and extended after each
successful upsert so later suggestions in the same batch see freshly
created pages without a re-scan. The duplicate listWikiPages on the
skipped-fingerprint fast-path now reuses the same snapshot.

Also truncate and sanitize contradiction check inputs to
MAX_CONTRADICTION_PAGE_CHARS (3000 chars) per page — same sanitizer used
for query synthesis — to cap per-pair token spend in a dense vault.
A test asserts the user message delivered to the LLM is < 7500 chars
even when each source page is 10 000 chars of body text.
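
The per-page cap can be sketched as below. `MAX_CONTRADICTION_PAGE_CHARS` and its value come from this commit; `truncateForContradictionCheck` is a simplified stand-in for `sanitizeContentForSynthesis` (which presumably does more than truncate), and the message layout in `buildContradictionUserMessage` is hypothetical.

```typescript
const MAX_CONTRADICTION_PAGE_CHARS = 3000;

interface PageInput {
  title: string;
  body: string;
}

// Stand-in for the real sanitizer: trims and caps the body, nothing more.
function truncateForContradictionCheck(body: string): string {
  const trimmed = body.trim();
  return trimmed.length > MAX_CONTRADICTION_PAGE_CHARS
    ? trimmed.slice(0, MAX_CONTRADICTION_PAGE_CHARS)
    : trimmed;
}

// Hypothetical prompt layout for a pairwise contradiction check.
function buildContradictionUserMessage(a: PageInput, b: PageInput): string {
  return [
    `Page A: ${a.title}`,
    truncateForContradictionCheck(a.body),
    `Page B: ${b.title}`,
    truncateForContradictionCheck(b.body),
  ].join("\n\n");
}
```

Two capped bodies contribute at most 6 000 characters, which leaves headroom under the 7 500-character bound asserted by the test for titles and framing text.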

…nstall flag

- plugins/memory-clawmaster-powermem/index.ts: rename
  buildAutoRecallLogForTest → formatAutoRecallLog. The function is called
  in production code; the ForTest suffix was misleading. Update the call
  site and the test import.
- packages/backend/src/services/managedMemoryBridge.ts: expand the
  inline comment on dangerouslyForceUnsafeInstall to explain which
  specific check it bypasses (resetManagedMemory access + direct
  openclaw/plugin-sdk imports) and note what a narrower alternative
  would look like when OpenClaw adds one.
- tests/ui/wiki-powermem-proof-helper.ts: add a fail-fast check before
  syncManagedMemoryBridge runs — if the resolved --home path is the real
  user home or any subdirectory of it, throw rather than risk writing
  into the live OpenClaw profile.

1. Plumb managedMemoryContext.dataRootOverride into resolveWikiPaths so
   the shared vault-root resolver is exercised at runtime, not only in
   unit tests. Before this change, the Stage 2 dataRootOverride support
   in resolveWikiVaultLayout was dead code from the wikiService call
   path. Add a unit test in openclawVaultPaths.test.ts and an integration
   test in wikiService.test.ts asserting the vault root resolves
   correctly when only dataRootOverride (no vaultRootOverride) is set.

2. Add a multi-element readListFrontmatter round-trip test. The previous
   YAML-array tests only exercised single-element arrays. The new test
   ingests a source page where the LLM returns two suggestions and asserts
   that relatedPages persists as a two-element YAML inline array on disk
   while the in-memory representation stays pipe-joined.

3. Replace setWikiLlmUseGatewayFetchForTests with
   setWikiLlmTransportForTests('gateway-fetch') in the contradiction-
   truncation test and in afterEach, ahead of the deprecated shim's
   removal.

webup merged commit 78ac07d into develop on May 13, 2026
10 checks passed
webup deleted the refactor/knowledge-reposition-and-powermem-consolidation branch on May 13, 2026 02:55