jayala-wt/mcp-knowledge-engine

MCP Knowledge Engine

A knowledge management system built on SQLite FTS5 with multi-model coordination, epistemic decision capture, and a memory recall/confirm protocol. Designed for self-hosted environments where multiple AI models (Claude, GPT, Codex, Qwen) collaborate through a shared knowledge graph.

What It Does

  • Full-text search over a document corpus using SQLite FTS5 with Porter stemming and Unicode tokenization
  • Document tiering (hot/warm/cold) with automatic promotion, decay, and archival based on access patterns
  • Multi-model devloop coordination -- multiple AI models log structured runs, artifacts, and decisions to a shared database, exposed over SSE for cross-tool interop
  • Epistemic decision capture -- logs reasoning decisions with TAR (Tool-Assisted Rate) and TMR (Tool-Missing Rate) metrics to measure when AI tools help vs. when they are absent
  • Memory recall/confirm protocol -- retrieval with reinforcement: query documents, then confirm which results were useful, closing a feedback loop that drives tier promotion

Architecture

tools/                    # MCP tool modules (17 tools)
  knowledge_tools.py      # FTS5 search, status, reindex, OCR queue, context marking
  memory_tools.py         # recall + confirm (retrieval with reinforcement)
  devloop_tools.py        # multi-model run logging, artifact storage, search
  decision_capture_tools.py  # epistemic decision logging + metrics

server/
  devloop_sse_server.py   # FastMCP SSE bridge (3 tools: memory, devloop_write, devloop_propose)

scripts/
  index_knowledge.py      # Document indexer: walks directories, extracts text, chunks, FTS5 upsert
  normalize_tiers.py      # Tier rebalancing: promotes hot, decays cold, archives noise
  knowledge_cron.sh       # Cron wrapper for scheduled reindexing
  run_ocr.py              # OCR pipeline for scanned documents (Tesseract)

ui/
  decision_capture/       # Flask blueprint for decision capture UI
  ayala_sigil/            # Flask blueprint for knowledge sigil visualization

FTS5 Index

Documents are split into 2000-character chunks with a 200-character overlap and stored in an FTS5 virtual table with Porter stemming. Each document carries metadata (category, entity, year, file path) and a computed priority score based on:

  • Category weight (legal/tax/hr = high priority)
  • Recency (current year boost)
  • Entity relevance (configurable per deployment)
  • Access frequency (reinforcement from recall/confirm)
  • File path keyword matches (operating agreements, tax forms, etc.)
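
The chunking and indexing described above can be sketched as follows. This is a minimal illustration, not the code in scripts/index_knowledge.py: the table name `chunks`, its columns, and the helper names are hypothetical, and it assumes Python's `sqlite3` module is linked against a SQLite build with FTS5 enabled.

```python
import sqlite3

CHUNK_SIZE = 2000     # characters per chunk (from the text above)
CHUNK_OVERLAP = 200   # characters shared by consecutive chunks (from the text above)


def chunk_text(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into fixed-size windows that overlap by `overlap` characters."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def build_index(conn):
    # Hypothetical schema: metadata columns are UNINDEXED, so they are stored
    # alongside each chunk but not tokenized.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS chunks USING fts5("
        "body, category UNINDEXED, file_path UNINDEXED, "
        "tokenize = 'porter unicode61')"
    )


def add_document(conn, text, category, file_path):
    """Chunk one document and insert every chunk; returns the chunk count."""
    rows = [(chunk, category, file_path) for chunk in chunk_text(text)]
    conn.executemany(
        "INSERT INTO chunks (body, category, file_path) VALUES (?, ?, ?)", rows
    )
    return len(rows)


def search(conn, query, limit=5):
    """Return (file_path, category) pairs ordered by FTS5's built-in rank."""
    cur = conn.execute(
        "SELECT file_path, category FROM chunks "
        "WHERE chunks MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cur.fetchall()
```

Because the tokenizer wraps `unicode61` in `porter`, a query for "runs" also matches a chunk containing "running". The priority score listed above is not modeled here, since its weights are deployment-specific.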

Hot / Warm / Cold Tiering

Documents move between tiers based on usage:

| Tier | Behavior |
| --- | --- |
| Hot | Pinned in memory, instant recall, quality_score >= 80 |
| Warm | Standard FTS5 retrieval, moderate access patterns |
| Cold | Archived after sustained low access, retrievable on demand |

Decay is applied on every search: documents that are not accessed gradually lose quality score. Documents that receive memory.confirm calls gain quality score and are promoted to a higher tier.
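
One way to picture the decay and promotion rules is the sketch below. Only the hot threshold (quality_score >= 80) comes from the table above; the cold cutoff, the decay step, the confirm boost, and all function names are assumptions chosen for illustration, not the values used by scripts/normalize_tiers.py.

```python
HOT_THRESHOLD = 80.0     # from the tier table: hot requires quality_score >= 80
COLD_THRESHOLD = 20.0    # assumption: the cold cutoff is not stated in this README
DECAY_PER_SEARCH = 0.5   # assumption: score lost per search by untouched documents
CONFIRM_BOOST = 10.0     # assumption: score gained per memory.confirm


def tier_for(score):
    """Map a quality score to a tier name."""
    if score >= HOT_THRESHOLD:
        return "hot"
    if score < COLD_THRESHOLD:
        return "cold"
    return "warm"


def decay_on_search(scores, accessed_ids):
    """Return new scores after one search; documents not accessed lose score."""
    return {
        doc_id: score if doc_id in accessed_ids
        else max(0.0, score - DECAY_PER_SEARCH)
        for doc_id, score in scores.items()
    }


def confirm(scores, doc_id):
    """Return new scores after a confirmation, capped at 100."""
    updated = dict(scores)
    updated[doc_id] = min(100.0, scores[doc_id] + CONFIRM_BOOST)
    return updated
```

With these assumed constants, a warm document at 75 becomes hot after one confirmation, while an ignored document at 20 drops to cold after a single search.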

Multi-Model Devloop

The devloop system coordinates multiple AI models through a shared SQLite database:

  • Runs -- each model session creates a run with origin tracking (claude/chatgpt/codex/qwen)
  • Artifacts -- structured outputs attached to runs (code, analysis, decisions)
  • Search -- full-text search across all model outputs
  • SSE Bridge -- FastMCP server exposes devloop tools over Server-Sent Events for cross-tool access
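
A shared SQLite database for runs and artifacts could look roughly like this. The schema, column names, and helper functions are hypothetical stand-ins for devloop.run_start, devloop.add_artifact, and devloop.latest. Only the origin values come from the list above.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; the real devloop tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id INTEGER PRIMARY KEY,
    origin TEXT NOT NULL
        CHECK (origin IN ('claude', 'chatgpt', 'codex', 'qwen')),
    started_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS artifacts (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES runs(id),
    kind TEXT NOT NULL,
    content TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def run_start(conn, origin):
    """Create a run tagged with the originating model and return its id."""
    cur = conn.execute(
        "INSERT INTO runs (origin, started_at) VALUES (?, ?)",
        (origin, datetime.now(timezone.utc).isoformat()),
    )
    return cur.lastrowid


def add_artifact(conn, run_id, kind, content):
    """Attach a structured output (code, analysis, decision) to a run."""
    cur = conn.execute(
        "INSERT INTO artifacts (run_id, kind, content) VALUES (?, ?, ?)",
        (run_id, kind, content),
    )
    return cur.lastrowid


def latest(conn, limit=5):
    """Return the newest artifacts together with the origin of their run."""
    cur = conn.execute(
        "SELECT a.id, r.origin, a.kind, a.content "
        "FROM artifacts a JOIN runs r ON r.id = a.run_id "
        "ORDER BY a.id DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()
```

Keeping the origin on the run rather than on each artifact means every artifact inherits its model attribution through the join.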

Decision Capture (Epistemic Instrumentation)

Captures reasoning decisions with structured metadata:

  • Decision type: tool-assisted, reconstruction-only, hybrid
  • Confidence scores: pre/post decision confidence
  • TAR/TMR metrics: Tool-Assisted Rate vs. Tool-Missing Rate, i.e. how often tools supplied the answer vs. how often the model had to reconstruct without them
  • Sigil corrections: links decisions to epistemic correction documents

MCP Tool Reference

Knowledge (6 tools)

| Tool | Description |
| --- | --- |
| knowledge.status | Index stats: doc count, tier distribution, staleness |
| knowledge.search | FTS5 search with priority scoring and stochastic recall |
| knowledge.bootstrap_context | Load hot-tier context for session start (deprecated; use memory.recall) |
| knowledge.reindex | Re-index documents from source directories |
| knowledge.ocr_queue | Queue scanned documents for OCR processing |
| knowledge.context_mark | Record which documents were consulted (deprecated; use memory.confirm) |

Memory (2 tools)

| Tool | Description |
| --- | --- |
| memory.recall | Query knowledge base with reinforcement tracking |
| memory.confirm | Confirm which recalled documents were useful (closes feedback loop) |

Devloop (6 tools)

| Tool | Description |
| --- | --- |
| devloop.run_start | Start a new multi-model coordination run |
| devloop.log | Log structured entry to current run |
| devloop.add_artifact | Attach artifact (code/analysis/decision) to a run |
| devloop.latest | Get latest runs and artifacts |
| devloop.search | Full-text search across all devloop entries |
| devloop.get_artifact | Retrieve a specific artifact by ID |

Decision Capture (3 tools)

| Tool | Description |
| --- | --- |
| decision_capture.log | Log a reasoning decision with epistemic metadata |
| decision_capture.list | List captured decisions with filtering |
| decision_capture.metrics | Compute TAR/TMR rates and epistemic health metrics |

SSE Bridge (3 tools)

| Tool | Description |
| --- | --- |
| memory | Recall/confirm via SSE (action: recall, latest, artifact, confirm) |
| devloop_write | Log or dispatch tasks to specific models |
| devloop_propose | Propose entries (always-available variant of devloop_write) |

Key Concepts

Memory Recall / Confirm Protocol

The core retrieval pattern is two-step:

  1. Recall: memory.recall(query="...") -- searches the knowledge base and returns ranked results
  2. Confirm: memory.confirm(recall_id="...") -- marks which results were actually useful

This closes a reinforcement loop: confirmed documents get quality score boosts and tier promotion, while unconfirmed results decay over time. The system learns which documents are genuinely useful.
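
The two-step loop can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions, not the memory_tools.py implementation: `MemoryStore` is hypothetical, substring matching replaces FTS5, the score deltas are invented, and the `useful_ids` argument is an assumed way of telling confirm which results helped.

```python
import uuid


class MemoryStore:
    """Toy recall/confirm loop; substring matching stands in for FTS5."""

    def __init__(self, docs):
        self.docs = dict(docs)                      # doc_id -> text
        self.scores = {doc_id: 50.0 for doc_id in docs}  # assumed neutral start
        self.pending = {}                           # recall_id -> returned doc ids

    def recall(self, query, limit=3):
        """Return (recall_id, doc_ids), best-scoring matches first."""
        terms = query.lower().split()
        hits = [
            doc_id for doc_id, text in self.docs.items()
            if any(term in text.lower() for term in terms)
        ]
        hits.sort(key=lambda doc_id: self.scores[doc_id], reverse=True)
        hits = hits[:limit]
        recall_id = uuid.uuid4().hex
        self.pending[recall_id] = hits
        return recall_id, hits

    def confirm(self, recall_id, useful_ids):
        """Boost confirmed results and decay the rest of that recall."""
        returned = self.pending.pop(recall_id)
        for doc_id in returned:
            if doc_id in useful_ids:
                self.scores[doc_id] = min(100.0, self.scores[doc_id] + 10.0)
            else:
                self.scores[doc_id] = max(0.0, self.scores[doc_id] - 1.0)
```

After one confirmed recall, the confirmed document ranks ahead of its unconfirmed neighbor on the next query, which is the feedback effect the protocol relies on.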

Epistemic Instrumentation

Decision capture tracks how AI models reason, not just what they produce:

  • Tool-Assisted Rate (TAR): percentage of decisions where MCP tools provided the answer
  • Tool-Missing Rate (TMR): percentage of decisions where tools were unavailable and the model had to reconstruct from memory
  • These metrics identify gaps in the knowledge base and tool coverage
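
As a rough illustration, the two rates can be computed from a list of logged decision types. The mapping is an assumption: "tool-assisted" decisions count toward TAR, "reconstruction-only" decisions count toward TMR, and "hybrid" decisions are reported separately. How decision_capture.metrics actually treats hybrid decisions is not stated here, and `epistemic_rates` is a hypothetical name.

```python
from collections import Counter

# Decision types listed in the Decision Capture section.
DECISION_TYPES = ("tool-assisted", "reconstruction-only", "hybrid")


def epistemic_rates(decision_types):
    """Return TAR, TMR, and hybrid share as percentages of all decisions."""
    counts = Counter(decision_types)
    unknown = set(counts) - set(DECISION_TYPES)
    if unknown:
        raise ValueError(f"unknown decision types: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        # No decisions logged yet: report zeros instead of dividing by zero.
        return {"tar": 0.0, "tmr": 0.0, "hybrid": 0.0}
    return {
        "tar": 100.0 * counts["tool-assisted"] / total,
        "tmr": 100.0 * counts["reconstruction-only"] / total,
        "hybrid": 100.0 * counts["hybrid"] / total,
    }
```

A rising TMR under this definition points at queries the knowledge base or tool set could not serve.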

Sigil Corrections

Epistemic corrections are stored as structured documents ("sigils") that capture:

  • What was wrong (the incorrect assumption or reconstruction)
  • What is correct (the verified ground truth)
  • Why it matters (impact on downstream reasoning)

Sigils are automatically promoted to hot tier for instant recall.
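
A sigil's three parts map naturally onto a small record type. The class, field names, and text rendering below are hypothetical; the sketch only shows the structure described above with the tier defaulting to hot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sigil:
    """Hypothetical record for one epistemic correction."""
    what_was_wrong: str    # the incorrect assumption or reconstruction
    what_is_correct: str   # the verified ground truth
    why_it_matters: str    # impact on downstream reasoning
    tier: str = "hot"      # sigils start in the hot tier, per the text above


def to_document(sigil):
    """Render a sigil as plain text so it can be indexed like any document."""
    return (
        f"Wrong: {sigil.what_was_wrong}\n"
        f"Correct: {sigil.what_is_correct}\n"
        f"Impact: {sigil.why_it_matters}"
    )
```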

Tech Stack

  • Python 3.11+
  • SQLite FTS5 -- full-text search with Porter stemming, Unicode tokenization
  • FastMCP -- SSE transport for cross-tool MCP access
  • Flask -- web UI for decision capture and sigil visualization
  • Tesseract OCR -- document scanning pipeline

Stats

  • 24 MB knowledge index
  • 17 MCP tools across 4 modules + 3 SSE bridge tools
  • Multi-model coordination: Claude, GPT, Codex, Qwen
  • Hot/warm/cold document tiering with continuous decay
  • FTS5 with stochastic recall (serendipitous document surfacing)

Setup

# Install dependencies
pip install fastmcp flask

# Index documents
python scripts/index_knowledge.py --dry-run    # preview
python scripts/index_knowledge.py --apply       # commit

# Run SSE bridge (for multi-model access)
python server/devloop_sse_server.py

# Tier normalization (run periodically)
python scripts/normalize_tiers.py

Configuration

All database paths and host configuration are resolved through a config object. Set environment variables or modify the config module for your deployment:

  • KNOWLEDGE_DB -- path to the FTS5 knowledge database
  • DEVLOOP_MCP_BEARER_TOKEN -- optional bearer token for SSE bridge authentication
  • Document source directories are configured in scripts/index_knowledge.py
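
Resolving these settings from the environment might look like the sketch below. The two variable names come from the list above. The `Config` class, the `knowledge.db` default, and the header check are assumptions for illustration, not the project's config module.

```python
import hmac
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    knowledge_db: str
    bearer_token: Optional[str]


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Read settings from `env`, defaulting to the process environment."""
    env = os.environ if env is None else env
    return Config(
        # The fallback path is an assumption; the real default is not documented.
        knowledge_db=env.get("KNOWLEDGE_DB", "knowledge.db"),
        # An unset or empty token means authentication is disabled.
        bearer_token=env.get("DEVLOOP_MCP_BEARER_TOKEN") or None,
    )


def is_authorized(config: Config, authorization_header: Optional[str]) -> bool:
    """Check an Authorization header against the optional bearer token."""
    if config.bearer_token is None:
        return True  # the token is optional, so no token means open access
    expected = f"Bearer {config.bearer_token}".encode()
    supplied = (authorization_header or "").encode()
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(supplied, expected)
```

Passing a plain mapping to `load_config` keeps the function testable without touching the real environment.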

License

MIT
