Make Knowledge Reappear
XKB is a personal knowledge lifecycle system. It turns local notes, bookmarks, videos, repositories, papers, and conversation memory into structured cards, searchable indexes, distilled wiki topics, and an interactive knowledge graph. The core idea is semantic active recall — knowledge should resurface when it becomes useful, not sit in an archive waiting to be manually found.

Every day we consume notes, articles, videos, repositories, papers, conversations, and threads. We save them because they feel important. Six months later — we cannot find them, cannot recall them, and have no idea what we actually learned.
Most tools stop at capture: save a bookmark, store a note, tag a source. XKB starts after capture. It turns raw material into reusable knowledge, connects it to what you already know, and lets agents retrieve it at the moment of need. Knowledge should know when you need it.
XKB is built on a different premise: knowledge has a lifecycle. The goal is not to archive more — it is to make what you have already consumed reappear at the right moment and gradually sediment into durable understanding.
XKB can run in layers. You do not need to install the full XBrain/GBrain stack on day one.
| Mode | Best for | Requires | Retrieval |
|---|---|---|---|
| Lite | First run, local notes, small libraries | Python + an LLM provider | search_index.json keyword search |
| Enhanced | Better semantic recall without services | Lite + GEMINI_API_KEY | flat vector_index.json fallback |
| Full / XBrain | Daily use, larger libraries, agent workflows | OpenClaw + GBrain/XBrain | Postgres/pgvector hybrid RRF search |
Recommended path:
- Start with Lite mode and ingest a few local notes.
- Add Gemini embeddings only if you need semantic fallback search.
- Install XBrain/GBrain once the library is large enough to need hybrid search and durable jobs.
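The layered modes can be read as a simple decision on what is configured. The following is a minimal sketch, not XKB's actual logic: `pick_mode` and the `settings` dict are hypothetical, and the mapping assumes Full mode needs both `gbrain_dir` and `GEMINI_API_KEY`.

```python
def pick_mode(settings):
    """Pick a retrieval mode from available settings (illustrative mapping of the mode table)."""
    has_gemini = bool(settings.get("GEMINI_API_KEY"))
    has_gbrain = bool(settings.get("gbrain_dir"))
    if has_gbrain and has_gemini:
        return "full"      # Postgres/pgvector hybrid RRF search via XBrain
    if has_gemini:
        return "enhanced"  # flat vector_index.json fallback
    return "lite"          # search_index.json keyword search


print(pick_mode({"GEMINI_API_KEY": "your-gemini-key"}))
```

Treating a missing Gemini key as Lite, even when a GBrain directory is set, is an assumption made here because embeddings depend on that key.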
Your personal cards, indexes, graph data, and runtime state should stay outside git. This repo contains the tooling and templates, not your private knowledge base.
# 1. Clone
git clone https://github.com/Hidicence/x-knowledge-base
cd x-knowledge-base
# 2. Choose a workspace for your private data
export OPENCLAW_WORKSPACE="$HOME/.openclaw/workspace"
mkdir -p "$OPENCLAW_WORKSPACE/memory/cards" "$OPENCLAW_WORKSPACE/memory/bookmarks"
# 3. Configure an LLM provider
cp .env.example .env
# Edit .env or export LLM_MODEL / LLM_API_URL / LLM_API_KEY in your shell.
# 4. Ingest a local markdown folder
python3 scripts/local_ingest.py /path/to/notes --category notes
# 5. Build the fallback search index
bash scripts/build_search_index.sh
# 6. Ask or search
bash scripts/search_bookmarks.sh "agent memory"
python3 scripts/xkb_ask.py "What patterns are emerging in my notes?"
This mode does not require X/Twitter cookies, GBrain, Postgres, Bun, or OpenClaw cron.
XKB is designed so the public repo can stay clean while your knowledge base remains local.
Do not commit:
- .env or real files under .secrets/
- X/Twitter cookies such as BIRD_AUTH_TOKEN / BIRD_CT0
- API keys such as LLM_API_KEY, GEMINI_API_KEY, OPENAI_API_KEY
- generated personal data: memory/cards/, memory/bookmarks/search_index.json, memory/bookmarks/vector_index.json, memory/x-knowledge-base/wiki/
- generated demo output: $OPENCLAW_WORKSPACE/memory/x-knowledge-base/demo/graph-data.json
- runtime queues, logs, caches, PM2 dumps, or machine-specific paths
Safe to commit:
- source scripts
- config templates and examples
- docs
- sample graph/schema files
- .env.example / .secrets/*.example placeholders
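The commit rules above can also be checked mechanically. This is a minimal sketch, assuming hypothetical names (`BLOCKED_PATTERNS`, `ALLOWED_PATTERNS`, `find_blocked`); the patterns mirror the lists above and are not the repo's actual .gitignore.

```python
from fnmatch import fnmatch

# Hypothetical patterns mirroring the "Do not commit" list; not the repo's real ignore rules.
BLOCKED_PATTERNS = [
    ".env",
    ".secrets/*",
    "memory/cards/*",
    "memory/bookmarks/search_index.json",
    "memory/bookmarks/vector_index.json",
    "memory/x-knowledge-base/wiki/*",
]
# Placeholders that are explicitly safe to commit.
ALLOWED_PATTERNS = [".env.example", ".secrets/*.example"]


def find_blocked(paths):
    """Return the paths that match a blocked pattern and no allowed pattern."""
    blocked = []
    for path in paths:
        if any(fnmatch(path, ok) for ok in ALLOWED_PATTERNS):
            continue
        if any(fnmatch(path, bad) for bad in BLOCKED_PATTERNS):
            blocked.append(path)
    return blocked


print(find_blocked([".env", ".env.example", "scripts/local_ingest.py"]))
```

One way to use it is to pass in the file names printed by `git diff --cached --name-only` before committing.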
Before publishing changes, run:
git status --short
git diff --check
python3 scripts/health_check_pipeline.py
For index hygiene specifically:
python3 scripts/prune_duplicate_index_rows.py --dry-run
python3 scripts/sync_enriched_index.py --dry-run # if available in your version
Input sources
├── Local notes / markdown → local_ingest.py
├── YouTube playlists → fetch_youtube_playlist.py
├── GitHub forks/stars → fetch_github_repos.py
├── PubMed / academic papers → fetch_pubmed.py + local_ingest.py
├── Conversation memory → distill_memory_to_wiki.py
└── X/Twitter bookmarks → xkb_minion_submit.py / xkb_minion_worker.py
│
▼
scripts/_card_prompt.py ← shared by ALL ingest scripts
(unified 9-section card format, same prompt regardless of source)
│
▼ (all LLM calls go through scripts/_llm.py)
│
┌─────────────────────────────────────────────────────────────┐
│ Knowledge Artifacts (permanent, gitignored) │
│ │
│ memory/cards/*.md structured 9-section cards │
│ memory/x-knowledge-base/wiki/topics/ distilled long-term knowledge │
└─────────────────────────────────────────────────────────────┘
│
▼ on every card write (auto)
┌─────────────────────────────────────────────────────────────┐
│ Primary Retrieval — XBrain │
│ (XKB's semantic search layer, powered by GBrain) │
│ │
│ • pgvector + Postgres-backed GBrain │
│ • Gemini embeddings │
│ • RRF hybrid search (vector + keyword) │
│ • Minions job runtime for durable internal pipelines │
│ • xbrain_recall.py ← used automatically by all scripts │
└─────────────────────────────────────────────────────────────┘
│ falls back to when XBrain unavailable
▼
┌─────────────────────────────────────────────────────────────┐
│ Fallback Retrieval │
│ │
│ search_index.json keyword + summary search │
│ vector_index.json flat Gemini vector index │
│ build_vector_index.py rebuilds flat index on demand │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Wiki Layer (memory/x-knowledge-base/wiki/topics/*.md) │
│ │
│ sync_cards_to_wiki.py external bookmark knowledge │
│ distill_memory_to_wiki.py conversation memory insights │
│ (daily cron, auto-staged) │
└─────────────────────────────────────────────────────────┘
│
▼
xkb_ask.py / Active Recall Layer
Two-layer recall: wiki topics (synthesized) → cards (XBrain hybrid search)
│
▼
demo/xkb-demo-ui/ ← Interactive graph explorer (Next.js)
Knowledge Graph | Chat | Evidence Panel
| # | Section | Purpose |
|---|---|---|
| 1 | Core Question & Conclusion | What question does this answer? What is the conclusion? |
| 2 | Claim Level | Attested / Scholarship / Inference — how reliable? |
| 3 | Key Arguments | 3–5 key arguments extracted from the source |
| 4 | False Friends | Terms with specific technical meaning in this context |
| 5 | Surprises | What might surprise a knowledgeable reader? |
| 6 | Relation to Existing Knowledge | How does this relate to existing cards? |
| 7 | Bilingual Summary | ZH + EN (used for search index) |
| 8 | Value to User | Actionable directions, relevant projects |
| 9 | Source | Source URL and related links |
One format, every source. A YouTube video, a GitHub repo, and a PubMed paper all produce the same card structure.
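To make the shared structure concrete, here is a minimal sketch of the nine sections as a Python dataclass. The class name, field names, and validation rules are illustrative assumptions; the actual cards are markdown files produced by `_card_prompt.py`.

```python
from dataclasses import dataclass, field

CLAIM_LEVELS = ("Attested", "Scholarship", "Inference")


@dataclass
class Card:
    """Hypothetical in-memory view of a 9-section card; field names are illustrative."""
    core_question_and_conclusion: str
    claim_level: str
    key_arguments: list
    false_friends: list = field(default_factory=list)
    surprises: list = field(default_factory=list)
    relation_to_existing: str = ""
    bilingual_summary: dict = field(default_factory=dict)  # e.g. {"zh": "...", "en": "..."}
    value_to_user: str = ""
    source: str = ""

    def problems(self):
        """Return a list of human-readable validation problems (empty if none)."""
        issues = []
        if self.claim_level not in CLAIM_LEVELS:
            issues.append(f"unknown claim level: {self.claim_level}")
        if not 3 <= len(self.key_arguments) <= 5:
            issues.append("expected 3-5 key arguments")
        if not {"zh", "en"} <= set(self.bilingual_summary):
            issues.append("bilingual summary needs zh and en")
        return issues
```

Because every source maps to the same fields, a check like `problems()` could run identically on a card from a video, a repo, or a paper.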
XKB uses a single unified LLM config, config/llm.json. All scripts share the same model, so there are no scattered environment variables.
{
"model": "openai-codex/gpt-5.4"
}
Available model formats (anything supported by openclaw capability model run):
| Value | Provider |
|---|---|
| openai-codex/gpt-5.4 | ChatGPT via OpenClaw OAuth |
| openai-codex/gpt-5.4-mini | ChatGPT Mini via OpenClaw OAuth |
| MiniMax-M2.7 | MiniMax via API key |
| MiniMax-M2.5 | MiniMax M2.5 via API key |
How it works: All scripts call scripts/_llm.py, which invokes openclaw capability model run. OpenClaw handles all auth (OAuth token refresh, API keys) automatically. Scripts no longer need to manage API keys.
Embedding is separate. Semantic vector search uses Gemini (GEMINI_API_KEY) and is not affected by config/llm.json.
If you are not using OpenClaw, override the model via environment variables:
export LLM_MODEL="MiniMax-M2.5"
export LLM_API_URL="https://api.minimax.io/anthropic"
export LLM_API_KEY="your-minimax-key"
The LLM_MODEL env var takes priority over config/llm.json.
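The precedence rule can be sketched as follows. This is an illustration only: `load_config` and `resolve_model` are hypothetical helpers, not the functions in `scripts/_llm.py`, and the sketch assumes the config file holds a single `"model"` key as in the JSON example.

```python
import json
import os
from pathlib import Path


def load_config(path="config/llm.json"):
    """Read the JSON config file, or return an empty dict if it is missing."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.is_file() else {}


def resolve_model(config, env=None):
    """Return the model name: the LLM_MODEL env var first, then the config's "model" key."""
    env = os.environ if env is None else env
    return env.get("LLM_MODEL") or config.get("model")


print(resolve_model(load_config()))
```

An empty `LLM_MODEL` value falls through to the config file here; whether the real scripts treat an empty string the same way is an assumption.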
X/Twitter bookmark enrichment now runs on a Minions-native queue pipeline by default on top of Postgres-backed GBrain. This replaces the old cron-spawn scan-worker pattern for the main bookmark enrichment path.
- scripts/xkb_minion_submit.py
  - scans unenriched bookmarks
  - submits one idempotent Minion job per bookmark
  - intended to run from cron (for example, hourly)
- scripts/xkb_minion_worker.py
  - long-lived worker daemon
  - claims jobs from minion_jobs
  - runs LLM enrichment
  - writes final cards and updates job state
Old pattern:
- spawned a fresh Python process every 10 minutes
- produced zombie openclaw-infer subprocesses under load
- was harder to observe, retry, and recover safely
Current Minions-native pattern:
- one long-lived worker daemon
- sequential processing by default (one job at a time)
- built-in timeout
- retry + exponential backoff
- idempotent submissions keyed by bookmark/card id
- observable via gbrain jobs list
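Two of the properties above, idempotent submission and retry with exponential backoff, can be sketched in a few lines. This is not the Minions implementation: the key format, the in-memory `queue` dict, the `"waiting"` state name, and the default delays are all assumptions for illustration.

```python
import hashlib


def job_key(bookmark_id):
    """Derive a stable idempotency key so resubmitting the same bookmark maps to one job."""
    digest = hashlib.sha256(bookmark_id.encode("utf-8")).hexdigest()[:16]
    return "xkb-enrich-" + digest


def submit(queue, bookmark_id):
    """Add a job only if its key is not already queued; return True if newly added."""
    key = job_key(bookmark_id)
    if key in queue:
        return False
    queue[key] = {"bookmark_id": bookmark_id, "state": "waiting", "attempts": 0}
    return True


def backoff_delays(max_attempts, base_seconds=2.0, cap_seconds=60.0):
    """Delay before each retry: base * 2**n for n = 0, 1, ..., capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * 2 ** n) for n in range(max_attempts)]


print(backoff_delays(5))
```

Keying the job on the bookmark id means a cron run that rescans the same bookmark does not enqueue duplicate work.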
Validated in this environment:
- worker successfully claims jobs
- LLM inference starts correctly
- jobs move through expected states
- active → dead flow was observed under an intentionally short timeout in test mode
- production timeout can be set to 300s per job
# see job states
gbrain jobs list
# verify Minions health
gbrain jobs smoke
The wiki is the distilled output layer, a readable, long-term knowledge base built from two sources:
| Source | Script | What it adds |
|---|---|---|
| External bookmarks | sync_cards_to_wiki.py | Synthesized insights from cards via absorb gate |
| Conversation memory | distill_memory_to_wiki.py | Decisions, workflows, principles from daily memory logs |
The wiki lives at wiki/ inside the skill directory. The workspace symlinks to it:
~/.openclaw/workspace/wiki/ → ~/.openclaw/workspace/memory/x-knowledge-base/wiki/ (symlink)
This prevents dual-wiki drift: every tool reads from one place.
distill_memory_to_wiki.py reads recent memory/YYYY-MM-DD.md logs, uses an LLM to extract insights worth long-term preservation, and either stages them for review or applies them to wiki topic pages.
# Preview what would be extracted from the last 3 days
python3 scripts/distill_memory_to_wiki.py --dry-run --days 3
# Stage candidates for review
python3 scripts/distill_memory_to_wiki.py --stage --days 2
# Apply all staged candidates (auto-approve)
python3 scripts/distill_memory_to_wiki.py --apply \
--staging-file $OPENCLAW_WORKSPACE/memory/x-knowledge-base/wiki/_staging/YYYY-MM-DD-candidates.md \
--approve-all
Cron jobs run this automatically at 15:30 and 21:30 UTC+8 daily.
python3 scripts/health_check_pipeline.py
Checks three things:
- workspace/wiki is a symlink to the canonical wiki (not a duplicate)
- Recall reads from the correct wiki path
- search_index.json summary coverage ≥ 70%, age < 26h; vector index freshness
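The third check combines two thresholds: summary coverage and index age. A minimal sketch of that rule follows, assuming a hypothetical index shape (a list of entries with a `"summary"` field) and a hypothetical function name; it is not the code in `health_check_pipeline.py`.

```python
import time


def index_is_healthy(entries, index_mtime, now=None, min_coverage=0.70, max_age_hours=26):
    """Check summary coverage (>= 70%) and index age (< 26h) for a list of index entries."""
    now = time.time() if now is None else now
    with_summary = sum(1 for entry in entries if entry.get("summary"))
    coverage = with_summary / len(entries) if entries else 0.0
    age_hours = (now - index_mtime) / 3600
    return coverage >= min_coverage and age_hours < max_age_hours
```

In practice `index_mtime` could come from `os.path.getmtime` on the index file. Treating an empty index as unhealthy is a choice made for this sketch.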
When a user sends a message, XKB uses two-layer recall:
- Layer 1 (wiki topics, memory/x-knowledge-base/wiki/topics/*.md): synthesized, durable knowledge. Answers conceptual questions.
- Layer 2 (cards, via XBrain hybrid search with fallback to search_index.json): raw evidence. Provides specific citations and sources.
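The two layers and the fallback path can be sketched as a small function that takes the search backends as parameters. Everything named here (`BackendUnavailable`, `two_layer_recall`, the three callables, the result keys) is hypothetical; the real scripts wire this up through `xbrain_recall.py`.

```python
class BackendUnavailable(Exception):
    """Raised by a search backend that cannot be reached (hypothetical)."""


def two_layer_recall(query, search_wiki, search_cards_primary, search_cards_fallback):
    """Collect wiki topics first, then card evidence, falling back if the primary search fails."""
    topics = search_wiki(query)  # Layer 1: synthesized wiki topics
    try:
        cards = search_cards_primary(query)  # Layer 2: hybrid card search
        backend = "xbrain"
    except BackendUnavailable:
        cards = search_cards_fallback(query)  # Layer 2 fallback: keyword index
        backend = "keyword"
    return {"topics": topics, "cards": cards, "card_backend": backend}
```

Passing the backends in as callables keeps the ordering logic testable without a running database.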
# Ask a question over your knowledge base
python3 scripts/xkb_ask.py "What are alternatives to RAG?"
python3 scripts/xkb_ask.py "What is the absorb gate?" --format chat
python3 scripts/xkb_ask.py "agent memory design" --json
Add to .claude/settings.json:
{
"mcpServers": {
"xkb-recall": {
"command": "python3",
"args": ["/path/to/x-knowledge-base/scripts/xkb_recall_server.py"],
"env": { "OPENCLAW_WORKSPACE": "/path/to/workspace" }
}
}
}
A draft roadmap for the next phase lives here:
docs/xkb-vnext-roadmap-draft.md
Short version:
- keep wiki as the human-readable product layer
- place graph/relations in the structured knowledge layer below wiki
- treat Minions as the default execution substrate for large-scale internal workflows
- now that bookmark enrichment is already Minions-native, focus next on knowledge governance: confidence, staleness, supersession, typed relationships
Once Lite mode works, you can enable the full stack.
# 1. Clone into your OpenClaw skills directory
git clone https://github.com/Hidicence/x-knowledge-base \
~/.openclaw/workspace/skills/x-knowledge-base
cd ~/.openclaw/workspace/skills/x-knowledge-base
# 2. Configure model/auth in OpenClaw
# Add env keys to ~/.openclaw/openclaw.json, for example:
# { "env": { "GEMINI_API_KEY": "...", "LLM_API_KEY": "..." } }
# 3. Optional: install XBrain/GBrain hybrid search
bash scripts/setup_xbrain.sh
# 4. Run health check
python3 scripts/health_check_pipeline.py
# 5. Run the demo
bash scripts/xkb_demo.sh
export LLM_API_KEY="your-minimax-or-openai-key"
export LLM_API_URL="https://api.minimax.io/anthropic/v1"
export LLM_MODEL="MiniMax-M2.7"
export OPENCLAW_WORKSPACE="$HOME/.openclaw/workspace"
export GEMINI_API_KEY="your-gemini-key"
bash scripts/setup_xbrain.sh
bash scripts/xkb_demo.sh
If you prefer not to use the setup script:
# 1. Install Bun https://bun.sh
curl -fsSL https://bun.sh/install | bash
# 2. Clone GBrain runtime
git clone https://github.com/garrytan/gbrain ~/gbrain
cd ~/gbrain && bun install && bun run src/cli.ts init
# 3. Tell XKB where to find it
# Add to ~/.openclaw/openclaw.json → "env":
# "gbrain_dir": "/absolute/path/to/gbrain"
# "GEMINI_API_KEY": "your-key" ← required for embeddings
# 4. Verify
python3 scripts/xbrain_recall.py "test query"
All scripts share _card_prompt.py and _llm.py: one prompt, one LLM call, one card format.
| Script | Source | What it does |
|---|---|---|
| run_scan_worker.py | X/Twitter | Scans bookmarks for unenriched files → cards |
| run_bookmark_worker.py | X/Twitter queue | Processes tiege-queue.json one item at a time |
| fetch_youtube_playlist.py | YouTube | Playlist subtitles → knowledge cards |
| fetch_github_repos.py | GitHub | Forks/stars → repo-level knowledge cards |
| local_ingest.py | Local / PubMed | Markdown/txt/papers → cards |
| fetch_pubmed.py | PubMed Central | Fetch open-access papers as markdown |
| _card_prompt.py | (shared) | Unified prompt, card format, summary extraction |
| _llm.py | (shared) | Unified LLM call via openclaw capability model run |
| Script | What it does |
|---|---|
| sync_enriched_index.py | Backfill summaries/tags from enriched cards into search_index.json |
| build_vector_index.py | Build/update flat JSON vector index (fallback when XBrain unavailable) |
| xbrain_recall.py | XBrain search bridge: hybrid RRF (pgvector + keyword); auto-used by all recall scripts |
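Hybrid RRF merges a vector ranking and a keyword ranking by summing reciprocal ranks. The sketch below shows the general Reciprocal Rank Fusion technique with the commonly used constant k = 60; the function name is hypothetical, and whether GBrain uses this constant or this tie-breaking is an assumption.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank), rank from 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]   # illustrative ids from a vector search
keyword_hits = ["b", "d", "a"]  # illustrative ids from a keyword search
print(rrf_fuse([vector_hits, keyword_hits]))  # → ['b', 'a', 'd', 'c']
```

Cards that appear in both rankings ("a" and "b") rise above cards found by only one method, which is why the fusion works without comparing raw scores across the two searches.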
| Script | What it does |
|---|---|
| sync_cards_to_wiki.py | Cards → wiki topic pages via LLM absorb gate |
| distill_memory_to_wiki.py | Daily memory logs → wiki topic insights (stage/apply workflow) |
| sync_cards_to_wiki.py --review | Review pending absorb decisions |
| lint_wiki.py | Validate wiki structure, detect gap topics |
| topic_guide_generator.py | Generate new wiki topic stubs |
| suggest_topic_map.py | Suggest topic map updates from uncovered cards |
| Script | What it does |
|---|---|
| xkb_ask.py | Natural-language Q&A: wiki (Layer 1) → cards via XBrain hybrid search (Layer 2) |
| recall_for_conversation.py | Conversation-triggered recall (wiki + XBrain card search) |
| continuity_recall.py | MEMORY.md + wiki lookup for session continuity |
| contrarian_recall.py | Surfaces warnings, failures, counter-examples |
| action_recall.py | Action-oriented recall (what to do next) |
| xkb_recall_server.py | MCP server exposing recall as a tool |
| Script | What it does |
|---|---|
| health_check_pipeline.py | Wiki symlink integrity, recall source path, index freshness |
| status_knowledge_pipeline.py | Full pipeline status in one view |
| health_check.py | Semantic conflict detection, gap analysis |
demo/
├── xkb-demo-ui/ Next.js app — three-column explorer
│ ├── app/page.tsx Main layout: graph | chat | evidence
│ ├── components/
│ │ ├── KnowledgeGraph.tsx Force-directed graph (react-force-graph-2d)
│ │ ├── ChatPanel.tsx Natural-language Q&A via xkb_ask.py
│ │ └── EvidencePanel.tsx Source cards + wiki references
│ └── public/
│ └── graph-data.sample.json schema reference
└── generate_graph.py Builds graph-data.json from search_index.json into $OPENCLAW_WORKSPACE/memory/x-knowledge-base/demo/
Run the demo:
python3 demo/generate_graph.py
cd demo/xkb-demo-ui && npm install && npm run dev
# → http://localhost:3000
graph-data.json is generated under $OPENCLAW_WORKSPACE/memory/x-knowledge-base/demo/. It is personal runtime data and should not be committed.
# Local notes
python3 scripts/local_ingest.py notes/ --category learning
# X/Twitter bookmarks
python3 scripts/run_scan_worker.py --limit 20
# YouTube playlists
python3 scripts/fetch_youtube_playlist.py --playlist "URL"
# GitHub repos
python3 scripts/fetch_github_repos.py --forks --stars
# PubMed papers
python3 scripts/fetch_pubmed.py "antimicrobial resistance" --limit 20 --out /tmp/papers
python3 scripts/local_ingest.py /tmp/papers/ --category research --tag pubmed
# Backfill summaries from enriched cards into search_index.json (always run this)
python3 scripts/sync_enriched_index.py
# Only needed if XBrain is not configured (fallback mode)
python3 scripts/build_vector_index.py --incremental
XBrain (primary): every ingest script auto-pushes cards to XBrain on write. xbrain_recall.py is used automatically by all recall scripts, with no extra steps. Set gbrain_dir in ~/.openclaw/openclaw.json to point at your GBrain runtime directory.
Fallback: if XBrain is unavailable, recall falls back to search_index.json keyword search automatically.
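For the flat vector index built by build_vector_index.py, search amounts to a brute-force similarity scan. The sketch below shows that general technique with cosine similarity; the index shape (`{card_id: vector}`) and the function names are assumptions, not the actual vector_index.json schema.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flat_search(query_vec, index, top_k=3):
    """Rank every card in a flat {card_id: vector} index by similarity to the query."""
    scored = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [card_id for card_id, _ in scored[:top_k]]


index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}  # toy 2-d vectors
print(flat_search([1.0, 0.1], index, top_k=2))  # → ['a', 'c']
```

A full scan like this is linear in the number of cards, which fits the small-library case the fallback is meant for.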
# Sync external knowledge (bookmark cards → wiki topics)
python3 scripts/sync_cards_to_wiki.py --apply --limit 20
# Distill conversation memory into wiki topics
python3 scripts/distill_memory_to_wiki.py --stage --days 3
python3 scripts/distill_memory_to_wiki.py --apply \
--staging-file $OPENCLAW_WORKSPACE/memory/x-knowledge-base/wiki/_staging/YYYY-MM-DD-candidates.md --approve-all
python3 scripts/xkb_ask.py "What are the alternatives to RAG?"
python3 scripts/health_check_pipeline.py
Expected output:
✅ wiki_canonical workspace/wiki → memory/x-knowledge-base/wiki (symlink correct)
✅ recall_wiki_source Recall reads from canonical wiki
✅ index_freshness summary coverage: 212/270 (79%) | enriched: 218 | vectors: 471
When running with OpenClaw, the full pipeline runs automatically:
| Schedule | Job | What it does |
|---|---|---|
| 13:30 UTC+8 | daily:xkb-ingestion-batch | Ingest new X/Twitter bookmarks → cards → auto-push to XBrain → sync_enriched_index |
| 15:30 UTC+8 | daily:wiki-distill-afternoon | Distill today's memory into wiki candidates |
| 21:30 UTC+8 | daily:wiki-distill-evening | Second distillation pass, apply high-confidence candidates |
The pipeline ensures that after each ingestion run:
- Each card is auto-pushed to XBrain on write, so hybrid RRF search is immediately available
- sync_enriched_index.py backfills summaries into the fallback search index
- New insights from conversations are automatically staged for wiki inclusion
- Python 3.10+
- Node.js 18+ (demo UI only)
- OpenClaw (recommended): handles all LLM auth and cron automation
- GEMINI_API_KEY: required for XBrain semantic embeddings; set in ~/.openclaw/openclaw.json
- Bun + GBrain runtime (optional): powers XBrain hybrid search (pgvector/PGLite + RRF); set gbrain_dir in openclaw.json to activate. Falls back to keyword search if not configured.
| Version | Status | What it delivered |
|---|---|---|
| v0.1 | ✅ | Bookmark ingestion, knowledge cards, keyword search |
| v0.2 | ✅ | Multi-layer extraction, enrichment worker, vector index |
| v0.3 | ✅ | Wiki pipeline: absorb gate, topic pages, memory distillation |
| v0.4 | ✅ | Local notes ingest, ask layer, demo mode, auto topic-map |
| v0.5 | ✅ | Absorb gate explainability, review-decisions log |
| v0.6 | ✅ | Active Recall Layer: proactive recall, MCP server, telemetry |
| v0.7 | ✅ | Claim levels, False Friends, bilingual summaries, academic PDF pipeline |
| v0.8 | ✅ | Unified ingest pipeline (_card_prompt.py); demo UI (graph + chat) |
| v0.9 | ✅ | Two-layer recall (wiki first); unified LLM config; memory→wiki distillation pipeline; single canonical wiki; pipeline health check |
| v1.0 | ✅ | XBrain hybrid search (pgvector + RRF) fully integrated across all ingest scripts; unified path resolution; graceful fallback to keyword search |
| v1.1 | 🔜 | Active Recall quality upgrade — soft-trigger re-ranking; Claim level surfaced in recall output; trigger strategy expansion beyond rule-based regex |
| v1.2 | 🔜 | Agent-to-Agent knowledge exchange — standardized card format (9-section + Claim level) as exchange unit over A2A protocol; receive_card MCP tool; XBrain as local digestion layer for received cards |
- One card format, many sources. Every source produces the same 9-section card.
- Layers, not one database. Working memory, consolidation, capture, and output are separate problems.
- Quality gates over quantity. The absorb gate keeps the wiki as a distilled output layer.
- Understanding over summarization. Cards answer what question this solves, not what it says.
- Single source of truth. One canonical wiki path, one LLM config file — no scattered settings.
- OpenClaw handles auth. Scripts call openclaw capability model run; token management is not their problem.
- Graceful degradation. XBrain hybrid search is the primary retrieval path; keyword fallback activates automatically when XBrain is unavailable. Nothing breaks.
- Personal data stays local. Graph data, cards, and wiki are gitignored.
Start with SKILL.md and docs/xkb-wiki-architecture.md.
PRs and issues welcome. Your knowledge deserves to be remembered.