MDEMG is a persistent memory system for AI coding agents, built on Neo4j with native vector indexes. It implements semantic retrieval with hidden-layer concept abstraction and Hebbian learning.
Key insight: The critical metric is not the average retrieval score but state survival under context compaction. Baseline agents lose architectural decisions after compactions, while MDEMG persists them in the graph. In the compaction torture test, MDEMG retained 95% of decisions after 5 compactions, against 0% for the baseline.
Everything needed to independently verify our results is included.
- Docker (Neo4j)
- Go 1.24+
- Python 3.10+ (grading + utilities)
Embeddings (choose one):
- OpenAI API key, or
- Ollama (local embeddings)
Agent under test (choose one):
- Claude Code (recommended baseline runner), or
- Any tool-using LLM agent that can call the MDEMG API
# 1. Clone and setup
git clone https://github.com/reh3376/mdemg.git && cd mdemg
cp .env.example .env # Add your embedding provider credentials
# 2. Start services
docker compose up -d
go build -o bin/mdemg ./cmd/server && ./bin/mdemg &
# Server writes .mdemg.port with actual port (dynamic allocation if preferred port is busy)
# 3. Ingest test codebase (or use your own)
go build -o bin/ingest-codebase ./cmd/ingest-codebase
./bin/ingest-codebase --space-id=benchmark --path=/path/to/target-repo
# 4. Run consolidation (reads port from .mdemg.port automatically)
PORT=$(cat .mdemg.port 2>/dev/null || echo 9999)
curl -X POST http://localhost:$PORT/v1/memory/consolidate \
-H "Content-Type: application/json" -d '{"space_id": "benchmark"}'
# 5. Run benchmark (see docs/benchmarks/whk-wms/)
# Questions: test_questions_120_agent.json
# Grader: docs/benchmarks/grader_v4.py
Report output: grades_*.json contains per-question scores with evidence breakdown.
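As a rough illustration of how per-question component scores might combine under the documented scoring weights (Evidence 0.70 / Concept 0.15 / Semantic 0.15), here is a minimal sketch. It assumes a plain linear weighted sum and uses hypothetical component names; the actual grader may compute scores differently.

```python
# Weights as listed in the reproducibility details. The linear combination
# and the component names are assumptions for illustration only.
WEIGHTS = {"evidence": 0.70, "concept": 0.15, "semantic": 0.15}


def combined_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each expected to lie in [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} score out of range: {components[name]}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Hypothetical per-question breakdown.
example = {"evidence": 1.0, "concept": 0.8, "semantic": 0.6}
print(round(combined_score(example), 3))
```

With these weights, evidence dominates: a perfect concept and semantic match with no evidence could contribute at most 0.30 under this assumed formula.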
# Question bank hash (SHA-256)
shasum -a 256 docs/benchmarks/whk-wms/test_questions_120_agent.json
# Expected: 24aa17a215e4e58b8b44c7faef9f14228edb0e6d3f8f657d867b1bfa850f7e9e
Full reproducibility details for skeptics:
| Item | Value |
|---|---|
| MDEMG Commit | 779d753 |
| Question Set | test_questions_120_agent.json (120 questions) |
| Question Hash | sha256:24aa17a2... |
| Answer Key | test_questions_120.json |
| Grader | grade_answers.py |
| Scoring Weights | Evidence: 0.70 / Concept: 0.15 / Semantic: 0.15 |
| Target Codebase | whk-wms (507K LOC TypeScript) |
| Include Patterns | **/*.ts, **/*.tsx, **/*.json |
| Exclude Patterns | node_modules/, dist/, .git/, docs-website/ |
| Agent Model | Claude Haiku (via Claude Code) |
| Runs | 2 per condition (run 3 excluded for consistency) |
| Embedding Provider | OpenAI text-embedding-3-small |
Baseline definition: Same agent runner and tool permissions, no MDEMG retrieval, relying on long-context + auto-compaction only (memory off).
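The question-bank hash check can also be done from Python, for example inside the grading environment. The sketch below uses only the standard library; the helper names `sha256_of` and `verify_question_bank` are illustrative and not part of the repository.

```python
import hashlib
from pathlib import Path

# Expected digest as published for test_questions_120_agent.json.
EXPECTED_SHA256 = "24aa17a215e4e58b8b44c7faef9f14228edb0e6d3f8f657d867b1bfa850f7e9e"


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_question_bank(path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    bank = Path("docs/benchmarks/whk-wms/test_questions_120_agent.json")
    if bank.exists():
        print("match" if verify_question_bank(bank) else "MISMATCH")
    else:
        print(f"{bank} not found; run from the repository root")
```

A mismatch means the question set differs from the one used for the reported runs, so scores would not be directly comparable.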
| Metric | Baseline | MDEMG + Edge Attention | Delta |
|---|---|---|---|
| Mean Score | 0.854 | 0.898 | +5.2% |
| Standard Deviation | 0.088 | 0.059 | -51% variance |
| High Score Rate (≥0.7) | 97.9% | 100% | +2.1pp |
| Strong Evidence Rate | 97.9% | 100% | +2.1pp |
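The deltas can be sanity-checked from the rounded values in the table. The sketch below recomputes them. Because it starts from rounded figures, it gives roughly a 33% drop in standard deviation and a 55% drop in variance, so the 51% variance figure in the table may have been derived from unrounded per-run data.

```python
def relative_change(baseline: float, treatment: float) -> float:
    """Percentage change of treatment relative to baseline."""
    return (treatment - baseline) / baseline * 100.0


# Rounded values copied from the results table.
mean_delta = relative_change(0.854, 0.898)
std_delta = relative_change(0.088, 0.059)
# Variance is the square of the standard deviation.
var_delta = relative_change(0.088 ** 2, 0.059 ** 2)
# Rates are compared in percentage points, not relative percent.
rate_delta_pp = 100.0 - 97.9

print(f"mean:            {mean_delta:+.1f}%")
print(f"std dev:         {std_delta:+.1f}%")
print(f"variance:        {var_delta:+.1f}%")
print(f"high-score rate: {rate_delta_pp:+.1f}pp")
```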
Category Performance (Edge Attention):
| Category | Mean Score |
|---|---|
| Disambiguation | 0.958 |
| Service Relationships | 0.916 |
| Architecture Structure | 0.889 |
| Data Flow Integration | 0.882 |
The Q&A battery measures single-turn retrieval accuracy. The critical differentiator is state survival under compaction:
| Metric | Baseline | MDEMG | Source |
|---|---|---|---|
| Decision Persistence @5 compactions | 0% | 95% | Compaction torture test |
When context windows fill and auto-compaction kicks in, baseline agents lose architectural decisions. MDEMG persists them in the graph.
The baseline forgets under pressure. MDEMG remembers.
MDEMG provides long-term memory for AI agents. Its capabilities and features include:
- Store observations: Persist code patterns, decisions, and architectural knowledge
- Semantic recall: Retrieve relevant memories via vector similarity search
- Concept abstraction: Automatically form higher-level concepts from related memories (hidden layers)
- Associative learning: Build connections between memories through Hebbian reinforcement
- LLM re-ranking: Apply GPT-powered relevance scoring for improved retrieval quality
- Multi-layer graph architecture: Base observations (L0) → Hidden concepts (L1) → Abstract concepts (L2+)
- Hybrid search: Combines vector similarity with graph traversal
- Conversation Memory System (CMS): Persistent memory across agent sessions with surprise-weighted learning
- Symbol extraction (UPTS): Unified Parser Test Schema supporting 27 languages with file:line evidence and SHA256 fixture verification
- Plugin system: Extensible via ingestion, reasoning, and APE (Autonomous Pattern Extraction) modules
- Evidence-based retrieval: Returns symbol-level citations (file:line references) with results
- Capability gap detection: Identifies missing knowledge areas for targeted improvement
- Codebase ingestion API: Background job processing for large codebase ingestion with consolidation
- Git commit hooks: Automatic incremental ingestion on every commit via post-commit hook
- Freshness tracking: TapRoot-level staleness detection with configurable thresholds
- Scheduled sync: Periodic background sync to keep memory graphs up-to-date
- Webhook integration: Linear webhook endpoint with HMAC signature verification and debouncing
- File watcher: Standalone mdemg-watch binary for automatic re-ingestion on file changes
- Orphan detection: Timestamp-based detection of nodes missing from re-ingestion with archive/delete/list actions
- Edge consistency: Automatic staleness tracking and edge weight refresh during consolidation
- Backup & restore: Automated full database dumps and partial space exports with retention policies and scheduler
- Neo4j state monitoring: Single endpoint for consolidated database health, per-space statistics (nodes, edges, layers, health score, staleness), and backup overview
- Meta-cognition enforcement: Server-side anomaly detection (empty-resume, empty-recall), hook circuit breakers with CRITICAL warnings, multi-dimensional watchdog monitoring, Hebbian signal learning for adaptive enforcement
- Space Transfer & DevSpace: Export/import space graphs as .mdemg files or via gRPC; optional DevSpace hub for agent registration, publish/pull exports, and inter-agent messaging (see cmd/space-transfer/README.md and docs/specs/development-space-collaboration.md)
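To make the "associative learning" feature more concrete, here is a minimal sketch of a textbook Hebbian rule with decay and pruning. It is a generic illustration, not MDEMG's implementation; the learning rate, decay, threshold, and function names are all assumptions.

```python
from itertools import combinations


def hebbian_step(weights, activations, learning_rate=0.1, decay=0.01):
    """Return updated edge weights after one co-activation event.

    weights:     dict mapping a sorted (node_a, node_b) tuple to a weight in [0, 1]
    activations: dict mapping node id to its activation in [0, 1]
    """
    # Every existing edge decays slightly on each step.
    updated = {edge: w * (1.0 - decay) for edge, w in weights.items()}
    # Nodes that are active together get their shared edge strengthened.
    for a, b in combinations(sorted(activations), 2):
        w = updated.get((a, b), 0.0)
        # The (1 - w) factor saturates the update so weights stay in [0, 1].
        updated[(a, b)] = w + learning_rate * activations[a] * activations[b] * (1.0 - w)
    return updated


def prune(weights, threshold=0.05):
    """Drop edges whose weight has decayed below the threshold."""
    return {edge: w for edge, w in weights.items() if w >= threshold}


# Two memories retrieved together three times: the edge strengthens each time.
edges = {}
for _ in range(3):
    edges = hebbian_step(edges, {"auth-decision": 1.0, "session-module": 1.0})
print(edges)
```

Edges that are never reinforced shrink on every step and are eventually removed by `prune`, which is the general idea behind removing decayed edges.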
┌─────────────────────────────────────────────────────────────┐
│ AI Coding Agent │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ IDE/CLI │◄──►│ MCP Server │◄──►│ MDEMG Client │ │
│ │ (Cursor) │ │ (tools) │ │ (HTTP) │ │
│ └──────────────┘ └──────────────┘ └──────┬───────┘ │
└─────────────────────────────────────────────────┼──────────┘
│
┌─────────────────────────────▼─────────┐
│ MDEMG Service (dynamic port) │
│ ┌─────────┐ ┌───────────────────┐ │
│ │Embedding│ │ Neo4j Graph │ │
│ │ Provider│ │ (Vector + Graph) │ │
│ └─────────┘ └───────────────────┘ │
└───────────────────────────────────────┘
- Go 1.24+
- Docker (for Neo4j)
- Embedding provider: Ollama (local) or OpenAI API key
# Clone the repo
git clone https://github.com/reh3376/mdemg.git
cd mdemg
# Copy environment config
cp .env.example .env
# Edit .env with your settings (embedding provider, Neo4j credentials)
# Start Neo4j
docker compose up -d
# Build the server
go build -o bin/mdemg ./cmd/server
# Run the server
./bin/mdemg
# Build the ingestion tool
go build -o bin/ingest-codebase ./cmd/ingest-codebase
# Ingest a codebase
./bin/ingest-codebase --space-id=my-project --path=/path/to/repo
# Incremental ingest (only changed files since last commit)
./bin/ingest-codebase --space-id=my-project --path=/path/to/repo --incremental
# Quiet mode (suppress non-error output, useful for hooks/CI)
./bin/ingest-codebase --space-id=my-project --path=/path/to/repo --quiet
# Log to file instead of stderr
./bin/ingest-codebase --space-id=my-project --path=/path/to/repo --log-file /tmp/ingest.log
# Run consolidation to create concept layers
curl -X POST http://localhost:9999/v1/memory/consolidate \
-H "Content-Type: application/json" \
-d '{"space_id": "my-project"}'
Install the post-commit hook to automatically ingest changes on every commit:
# Install the hook
./scripts/install-hook.sh /path/to/your/repo
# The hook runs quietly by default. Configure via environment:
# MDEMG_SPACE_ID - space to ingest into (default: repo directory name)
# MDEMG_ENDPOINT - server URL (default: http://localhost:9999)
# MDEMG_VERBOSE - set to "true" for verbose output
# MDEMG_LOG_FILE - redirect logs to a file
| Endpoint | Method | Description |
|---|---|---|
| /v1/memory/retrieve | POST | Semantic search with optional LLM re-ranking |
| /v1/memory/consult | POST | SME-style Q&A with evidence citations |
| /v1/memory/ingest | POST | Store a single observation |
| /v1/memory/ingest/batch | POST | Store multiple observations |
| /v1/memory/consolidate | POST | Trigger hidden layer creation |
| /v1/memory/stats | GET | Per-space memory statistics |
| /v1/memory/ingest-codebase | POST | Background codebase ingestion job |
| /v1/memory/symbols | GET | Query extracted code symbols |
| /v1/memory/ingest/files | POST | Ingest files with background job processing |
| /v1/memory/spaces/{id}/freshness | GET | Space freshness and staleness status |
| /v1/webhooks/linear | POST | Linear webhook receiver with HMAC verification |
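For callers that prefer Python over curl, the consolidation call can be sketched with the standard library. Only the `space_id` body field and the `/v1/memory/consolidate` path are taken from this README; the helper name and the default base URL are assumptions, and the request is built but not sent.

```python
import json
import urllib.request


def build_consolidate_request(
    space_id: str, base_url: str = "http://localhost:9999"
) -> urllib.request.Request:
    """Build (but do not send) a POST to the consolidation endpoint."""
    body = json.dumps({"space_id": space_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/memory/consolidate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_consolidate_request("my-project")
print(req.get_method(), req.full_url)

# To send it against a running server:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

If the server picked a dynamic port, pass the value from .mdemg.port as part of `base_url` instead of relying on the default.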
| Endpoint | Method | Description |
|---|---|---|
| /v1/scraper/jobs | POST | Create a new scrape job |
| /v1/scraper/jobs | GET | List all scrape jobs |
| /v1/scraper/jobs/{id} | GET | Get job status and scraped content |
| /v1/scraper/jobs/{id} | DELETE | Cancel a running job |
| /v1/scraper/jobs/{id}/review | POST | Approve/reject/edit scraped content |
| /v1/scraper/spaces | GET | List available target spaces |
| Endpoint | Method | Description |
|---|---|---|
| /v1/backup/trigger | POST | Trigger backup (full database dump or partial space export) |
| /v1/backup/status/{id} | GET | Backup job status and progress |
| /v1/backup/list | GET | List all backups (optional ?type= filter) |
| /v1/backup/manifest/{id} | GET | Get full backup manifest (checksum, sizes, spaces) |
| /v1/backup/{id} | DELETE | Delete a backup artifact |
| /v1/backup/restore | POST | Trigger restore from full backup |
| /v1/backup/restore/status/{id} | GET | Restore job status |
| Endpoint | Method | Description |
|---|---|---|
| /v1/conversation/resume | POST | Restore session context with themes and concepts |
| /v1/conversation/observe | POST | Record observations (decisions, corrections, learnings) |
| /v1/conversation/correct | POST | Record user corrections for learning |
| /v1/conversation/recall | POST | Query conversation history |
| /v1/conversation/consolidate | POST | Create themes from observations |
| /v1/conversation/session/anomalies | GET | Aggregated session anomalies and health |
| /v1/conversation/templates | GET/POST | List or create observation templates |
| /v1/conversation/templates/{id} | GET/PUT/DELETE | CRUD for specific template |
| /v1/conversation/snapshot | POST | Create session snapshot |
| /v1/conversation/snapshot/{id} | GET/DELETE | Retrieve or delete snapshot |
| /v1/conversation/snapshot/{id}/restore | POST | Restore session from snapshot |
| /v1/conversation/relevance | POST | Compute observation relevance scores |
| /v1/conversation/truncate | POST | Truncate old observations |
| /v1/conversation/org-review | GET | Get observations pending org review |
| /v1/conversation/org-review/{id}/approve | POST | Approve observation for org sharing |
| /v1/conversation/org-review/{id}/reject | POST | Reject observation from org sharing |
| Endpoint | Method | Description |
|---|---|---|
| /v1/learning/stats | GET | Hebbian learning edge statistics |
| /v1/learning/freeze | POST | Freeze learning for stable scoring |
| /v1/learning/unfreeze | POST | Resume learning edge creation |
| /v1/learning/prune | POST | Remove decayed edges |
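Freezing learning for a benchmark run and reliably unfreezing it afterwards fits a context-manager pattern. The sketch below assumes the freeze and unfreeze endpoints accept a POST with an empty body, which this README does not confirm; they may require a JSON payload such as a space ID. The `post` and `learning_frozen` names are illustrative.

```python
from contextlib import contextmanager
import urllib.request


def post(url: str) -> int:
    """Send an empty-body POST and return the HTTP status code."""
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


@contextmanager
def learning_frozen(base_url: str = "http://localhost:9999", send=post):
    """Freeze learning on entry and unfreeze on exit, even on errors."""
    base = base_url.rstrip("/")
    send(f"{base}/v1/learning/freeze")
    try:
        yield
    finally:
        # Always resume learning, even if the benchmark run raises.
        send(f"{base}/v1/learning/unfreeze")


# Usage against a running server (run_benchmark is a placeholder):
# with learning_frozen():
#     run_benchmark()
```

The `send` parameter exists so the HTTP call can be swapped out, for example with a recorder in tests or a client that adds authentication.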
MDEMG extracts code symbols during ingestion using the Unified Parser Test Schema (UPTS):
Supported Languages (27 UPTS-validated, 100% pass rate):
- Systems: Go, Rust, C, C++, CUDA
- JVM: Java, Kotlin
- .NET: C#
- Scripting: Python, TypeScript/JavaScript, Lua, Shell
- API Schemas: Protocol Buffers, GraphQL, OpenAPI
- Configuration: YAML, TOML, JSON, INI
- Infrastructure: Terraform/HCL, Dockerfile, Makefile
- Database: SQL, Cypher (Neo4j)
- Documentation: Markdown, XML, Scraper Markdown (web-scraped content with section chunking)
Extracted Symbol Types:
- Functions, methods, classes, interfaces, types
- Constants, variables, enums
- Imports, exports, module declarations
Symbols include file:line references for evidence-based retrieval.
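A consumer of these citations will typically need to split a file:line reference into its parts. The sketch below shows one way to do that. The `Evidence` type and `parse_evidence` helper are hypothetical, and the exact reference format returned by the API should be checked against real responses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    file: str
    line: int


def parse_evidence(ref: str) -> Evidence:
    """Parse a 'file:line' reference such as 'src/auth/service.ts:42'."""
    # Split on the last colon so paths containing colons (e.g. 'C:\\repo\\a.ts')
    # keep their full file part.
    file, sep, line = ref.rpartition(":")
    if not sep or not file or not line.isdigit():
        raise ValueError(f"not a file:line reference: {ref!r}")
    return Evidence(file=file, line=int(line))


print(parse_evidence("src/auth/service.ts:42"))
```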
CMS provides persistent memory for AI agents across sessions:
# Resume session context (call at session start)
curl -X POST http://localhost:9999/v1/conversation/resume \
-H "Content-Type: application/json" \
-d '{"space_id": "my-agent", "session_id": "session-1", "max_observations": 10}'
# Record an observation (decision, learning, correction)
curl -X POST http://localhost:9999/v1/conversation/observe \
-H "Content-Type: application/json" \
-d '{
"space_id": "my-agent",
"session_id": "session-1",
"content": "User prefers TypeScript for new files",
"obs_type": "preference"
}'
Observation Types: decision, correction, learning, preference, error, progress
Observations are surprise-weighted (novel information persists longer) and form themes via consolidation.
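A client can validate the observation type before sending and pick up the server's dynamic port at the same time. This is a minimal sketch: the payload fields mirror the observe example above, the .mdemg.port fallback to 9999 mirrors the quick-start steps, and the helper names are assumptions.

```python
import json
from pathlib import Path

# Observation types as listed in this README.
OBSERVATION_TYPES = {"decision", "correction", "learning", "preference", "error", "progress"}


def resolve_base_url(port_file: str = ".mdemg.port", default_port: int = 9999) -> str:
    """Prefer the port written by the server; fall back to the default."""
    try:
        port = int(Path(port_file).read_text().strip())
    except (OSError, ValueError):
        port = default_port
    return f"http://localhost:{port}"


def observe_payload(space_id: str, session_id: str, content: str, obs_type: str) -> bytes:
    """Build the JSON body for /v1/conversation/observe, rejecting unknown types."""
    if obs_type not in OBSERVATION_TYPES:
        raise ValueError(f"unknown obs_type {obs_type!r}; expected one of {sorted(OBSERVATION_TYPES)}")
    return json.dumps(
        {
            "space_id": space_id,
            "session_id": session_id,
            "content": content,
            "obs_type": obs_type,
        }
    ).encode("utf-8")


url = f"{resolve_base_url()}/v1/conversation/observe"
body = observe_payload("my-agent", "session-1", "User prefers TypeScript for new files", "preference")
print(url, len(body), "bytes")
```

Rejecting unknown types on the client side surfaces typos early instead of relying on however the server handles them.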
MDEMG provides an MCP server for IDE integration. Add to ~/.cursor/mcp.json:
{
"mcpServers": {
"mdemg": {
"command": "/path/to/mdemg/bin/mdemg-mcp",
"args": [],
"env": {
"MDEMG_ENDPOINT": "http://localhost:9999"
}
}
}
}
mdemg/
├── cmd/ # CLI tools (server, ingest, MCP, etc.)
├── internal/ # Core logic
│ ├── api/ # HTTP handlers
│ ├── retrieval/ # Search algorithms
│ ├── hidden/ # Concept abstraction
│ ├── learning/ # Hebbian edges
│ ├── conversation/ # Conversation Memory System (CMS)
│ ├── backup/ # Backup & restore (full, partial, scheduler, retention)
│ ├── symbols/ # Code symbol extraction
│ └── plugins/ # Plugin system
├── docs/ # Documentation
│ └── benchmarks/ # Benchmark questions, graders, results
├── migrations/ # Neo4j schema (Cypher)
├── tests/ # Integration tests
└── docker-compose.yml # Neo4j container
MDEMG includes a complete observability stack for monitoring and debugging.
cd deploy/docker
docker compose -f docker-compose.observability.yml up -d
# Access dashboards
open http://localhost:3000 # Grafana (admin/admin)
open http://localhost:9090 # Prometheus
| Component | Port | Description |
|---|---|---|
| Prometheus | 9090 | Metrics collection and alerting |
| Grafana | 3000 | Dashboard visualization |
| Blackbox Exporter | 9115 | HTTP/TCP health probes |
Pre-configured dashboard with 10 panels:
- Request Rate, P95 Latency, Error Rate, Circuit Breakers
- Request Latency Distribution (p50/p95/p99)
- Requests by Status, Cache Hit Ratios
- Retrieval Latency, Rate Limit Rejections, Embedding Latency
curl http://localhost:9999/v1/prometheus
Exposes all MDEMG metrics in Prometheus format.
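For quick checks without a full Prometheus setup, the text output can be parsed directly. The sketch below handles the common shape of the Prometheus text exposition format (comment lines, optional labels, optional timestamp) and is not a complete parser. The metric names in the sample are hypothetical and not taken from MDEMG.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Map each sample's 'name{labels}' string to its float value."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comments.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:]
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        # The first field is the value; an optional timestamp may follow.
        samples[series] = float(fields[0])
    return samples


# Hypothetical sample output; real metric names will differ.
sample = """\
# HELP example_requests_total Total HTTP requests.
# TYPE example_requests_total counter
example_requests_total{status="200"} 1027
example_requests_total{status="500"} 3
example_cache_hit_ratio 0.82 1700000000000
"""
metrics = parse_metrics(sample)
print(metrics)
```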
| Phase | Name | Status |
|---|---|---|
| 80 | CMS ANN Meta-Cognition & Self-Improvement Enforcement | ✅ Complete |
| 70 | Neo4j Backup & Restore (Full & Partial) with Scheduler | ✅ Complete |
| 60 | CMS Advanced Functionality II (Templates, Snapshots, Relevance, Truncation, Org-Review) | ✅ Complete |
| 49 | LLM Plugin SDK (Scaffolding, Validation, Gap Detection) | ✅ Complete |
| 48.3-48.4 | Data Transmission & Connection Pooling | ✅ Complete |
| 47 | Optimistic Lock Retry + Edge Consistency | ✅ Complete |
| 37 | Agent Health / Heartbeat / Presence | ✅ Complete |
| 38 | UNTS Hash Verification (Nash Verification) | ✅ Complete |
| 35 | CRDT for Learned Edges + Space Lineage | ✅ Complete |
| 34 | Incremental Sync (Delta Export) | ✅ Complete |
| 46 | Symbol Indexing | ✅ Complete |
| 48.5 | Observability Stack (Prometheus/Grafana) | ✅ Complete |
| Priority | Phase | Task | Description |
|---|---|---|---|
| 1 | 45.5 | APE Context Cooler | Volatile → long-term memory graduation |
| 2 | 45.5 | APE Constraint Module | Priority/deadline enforcement |
| 3 | 51 | Web Scraper Ingestion | Web content ingestion with section chunking |
| 4 | 50 | Public Readiness | Open source hardening |
| Phase | Name | Description |
|---|---|---|
| 36 | Observation Forwarding | Team-visible observations via DevSpace |
| 50 | Public Readiness | Governance, security, CI/CD, onboarding |
See AGENT_HANDOFF.md for detailed phase specifications.
- Architecture - System design and components
- Graph Schema - Neo4j labels and relationships
- Retrieval & Scoring - Scoring algorithm details
- Benchmarking Guide - Running and validating benchmarks
- CI/CD Integration - Git hooks, GitHub Actions, and scheduled sync
- API Reference - Full API endpoint documentation
- Backup & Restore Guide - Backup configuration, manual triggers, retention policies
- Agent Handoff - Complete development context and phase registry
See CONTRIBUTING.md for development setup and guidelines.
See SECURITY.md for the vulnerability reporting policy.
MIT License - see LICENSE for details.