Add shared long-term memory server (experimental) by reyortiz3 · Pull Request #5015 · stacklok/toolhive

reyortiz3 · 2026-04-22T16:49:48Z

Summary

ToolHive manages MCPs (tools) and Skills (procedural knowledge as OCI artifacts). The missing primitive is shared long-term memory — a knowledge store that agents can query and contribute to across sessions. Without it every agent session starts cold, and facts learned in one session are invisible to others.

This PR introduces the memory server core (Plan 1 of 3):

- `pkg/memory/`: domain types (`Entry`, `Revision`, `ListFilter`), three pluggable interfaces (`Store`, `VectorStore`, `Embedder`), a `Service` orchestration layer with conflict detection and score-weighted search ranking, trust/staleness scoring formulas, and gomock mocks
- `pkg/memory/sqlite/`: SQLite-backed `Store` and `VectorStore` (Go-native cosine similarity, no CGo dependency); goose migrations including a `TypeEpisodic` type for time-indexed event records
- `pkg/memory/embedder/ollama/`: Ollama HTTP embedder that probes vector dimensions on startup
- `cmd/thv-memory/`: standalone MCP server binary serving 9 tools over streamable HTTP (`/mcp`), with a `/health` liveness probe, YAML config with sensible defaults, and a background lifecycle job (TTL expiry, score recomputation every 24h)
- `docs/proposals/2026-04-22-shared-memory-server.md`: full design doc covering architecture, tool surface, scoring, conflict detection, Skills relationship, comparison with LinkedIn's Cognitive Memory Agent, and the recommended three-tier memory activation strategy

Key design decisions:

- Conflict detection is write-time: cosine similarity > 0.85 blocks the write and returns candidates for the agent to resolve, so no LLM inference is needed
- Search results are ranked by `similarity × trust_score × (1 − 0.3 × staleness_score)` so flagged/stale entries don't rank above fresh, trusted ones
- The agent IS the retrieval orchestrator: tools are explicit MCP calls, not auto-triggered pipelines
- Three memory types: semantic (aggregated facts), procedural (how-to), episodic (time-indexed events with `CreatedAfter`/`CreatedBefore` list filters)
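
To make the ranking formula concrete, here is a small sketch of the composite score and the resulting sort order. The `Candidate` struct and function names are assumptions made for this example rather than the types in `pkg/memory/service.go`, and all three inputs are assumed to lie in [0, 1].

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a hypothetical search hit; field names are illustrative
// and not taken from pkg/memory.
type Candidate struct {
	ID         string
	Similarity float64 // cosine similarity to the query, assumed in [0, 1]
	Trust      float64 // trust_score, assumed in [0, 1]
	Staleness  float64 // staleness_score, assumed in [0, 1]
}

// stalenessWeight is the 0.3 penalty weight quoted in the PR description.
const stalenessWeight = 0.3

// CompositeScore computes similarity × trust × (1 − 0.3 × staleness).
func CompositeScore(c Candidate) float64 {
	return c.Similarity * c.Trust * (1 - stalenessWeight*c.Staleness)
}

// Rank sorts candidates in place, highest composite score first.
func Rank(cs []Candidate) {
	sort.SliceStable(cs, func(i, j int) bool {
		return CompositeScore(cs[i]) > CompositeScore(cs[j])
	})
}

func main() {
	hits := []Candidate{
		{ID: "stale-flagged", Similarity: 0.95, Trust: 0.5, Staleness: 1.0},
		{ID: "fresh-trusted", Similarity: 0.80, Trust: 1.0, Staleness: 0.0},
	}
	Rank(hits)
	for _, h := range hits {
		fmt.Printf("%s %.4f\n", h.ID, CompositeScore(h))
	}
}
```

With these made-up inputs the fresh, trusted entry outranks the stale one despite its lower raw similarity, which is the behaviour the formula is meant to produce. Because the staleness term is capped at a 30% reduction, a fully stale entry is demoted but never zeroed out.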

Plans 2 (CLI thv memory subcommand + system workload integration) and 3 (Kubernetes MCPMemoryServer CRD) are follow-up work.

Type of change

New feature

Test plan

- Unit tests (`task test`)
- Linting (`task lint-fix`)
- Manual testing (integration test in `cmd/thv-memory/integration_test.go` wires real SQLite store + vector store + fake embedder end-to-end: remember → search → access count increment → delete → `ErrNotFound`; conflict detection test verifies force-write path)

Changes

| File | Change |
| --- | --- |
| `pkg/memory/types.go` | Domain types: `Entry`, `Revision`, `ListFilter` (with time-range fields), `VectorFilter`, `Type` (`semantic`/`procedural`/`episodic`), scoring types |
| `pkg/memory/interfaces.go` | `Store`, `VectorStore`, `Embedder` interfaces + mockgen directives |
| `pkg/memory/service.go` | `Service`: conflict detection, `Remember`, `Search` with composite ranking |
| `pkg/memory/scoring.go` | `ComputeTrustScore`, `ComputeStalenessScore` |
| `pkg/memory/sqlite/` | SQLite Store, VectorStore, goose migrations (001 initial + 002 adds episodic type) |
| `pkg/memory/embedder/ollama/` | Ollama HTTP embedder |
| `pkg/memory/mocks/` | Generated gomock mocks for all three interfaces |
| `cmd/thv-memory/main.go` | Entry point: HTTP server lifecycle, graceful shutdown |
| `cmd/thv-memory/server.go` | MCP server construction, tool registration, streamable HTTP handler + `/health` |
| `cmd/thv-memory/config.go` | YAML config with defaults (SQLite, Ollama, `0.0.0.0:8080`) |
| `cmd/thv-memory/lifecycle/job.go` | Background job: TTL expiry + score recomputation |
| `cmd/thv-memory/tools/` | 9 MCP tool handlers |
| `cmd/thv-memory/integration_test.go` | End-to-end integration test |
| `docs/proposals/2026-04-22-shared-memory-server.md` | Design doc |

Does this introduce a user-facing change?

No — this adds a new standalone binary (cmd/thv-memory) and supporting packages. Nothing in the existing CLI or operator is modified. The binary is not yet wired into thv commands (that is Plan 2).

Implementation plan

Approved implementation plan

This PR was planned and implemented with Claude Code. The design spec is at docs/proposals/2026-04-22-shared-memory-server.md. The implementation follows the spec with the following notable adaptations:

- Type names use Go stutter-avoidance convention (`Entry` not `MemoryEntry`, `Store` not `MemoryStore`) to satisfy the revive linter
- `goose.NewProvider` (scoped) used instead of global `goose.SetBaseFS`/`SetDialect` to avoid concurrent-open races
- `server.NewStreamableHTTPServer` + `server.WithStdioContextFunc` used to match actual mcp-go v0.48.0 API
- SQLite `VectorStore` uses Go-native cosine similarity (no CGo/sqlite-vec dependency) with a load-and-score approach; external `VectorStore` providers are pluggable via the `VectorStore` interface for datasets > 100K entries

Special notes for reviewers

This is experimental — do not merge until Plans 2 and 3 are ready. Specific areas to scrutinise:

- `pkg/memory/sqlite/vector.go`: the load-all-and-score approach works for small datasets but will not scale past ~100K entries. The `VectorStore` interface is designed to be swapped for Qdrant/pgvector when needed.
- `pkg/memory/service.go`: the conflict threshold (0.85) and staleness penalty weight (0.3) are initial values that will need tuning against real usage data.
- `cmd/thv-memory/server.go`: no auth middleware on the MCP endpoint yet. Auth will be enforced at the ToolHive proxy layer when the system workload integration lands in Plan 2.
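
As context for the first two notes, the load-all-and-score pattern combined with the 0.85 conflict threshold can be sketched roughly as below. This is an illustration under assumptions, not the code in `pkg/memory/sqlite/vector.go`: the in-memory map stands in for rows loaded from SQLite, and `Cosine` and `Conflicts` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// conflictThreshold is the initial 0.85 value from the PR description.
const conflictThreshold = 0.85

// Cosine returns the cosine similarity of a and b. It returns 0 when the
// lengths differ, the vectors are empty, or either vector has zero norm.
func Cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Conflicts scores the candidate against every stored vector and returns
// the IDs whose similarity exceeds the threshold, sorted for stable output.
// The map is a stand-in for embeddings loaded from the database.
func Conflicts(candidate []float32, stored map[string][]float32) []string {
	var ids []string
	for id, v := range stored {
		if Cosine(candidate, v) > conflictThreshold {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	stored := map[string][]float32{
		"same":       {1, 0},
		"close":      {0.9, 0.1},
		"orthogonal": {0, 1},
	}
	fmt.Println(Conflicts([]float32{1, 0}, stored))
}
```

Each write or search visits every stored vector, so the cost grows linearly with the number of entries; that is the reason for the ~100K ceiling mentioned above.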

Generated with Claude Code

Introduces the core domain types for ToolHive's shared long-term memory system: MemoryEntry, MemoryRevision, typed constants for MemoryType, AuthorType, SourceType, EntryStatus, and ArchiveReason, plus filter and result types used by the store interface.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Introduces pkg/memory with three pluggable interfaces (Store, VectorStore, Embedder), a Service orchestration layer with conflict detection and score-weighted search ranking, SQLite-backed implementations, an Ollama embedder, and gomock mocks for all interfaces.

Key behaviours:

- Conflict detection on write: cosine similarity > 0.85 blocks the write and returns conflicting entries for the agent to resolve
- Trust scoring: author weight × age decay × correction penalty × flag multiplier
- Staleness scoring: access age + flag bonus + correction bonus
- Search ranking: composite score (similarity × trust × staleness penalty) so flagged/stale entries do not rank above fresh, trusted ones
- TypeEpisodic memory type for time-indexed event records
- ListFilter time-range fields (CreatedAfter/CreatedBefore) for timeline queries
- SQLite migration 002 widens the type CHECK constraint to include episodic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Standalone MCP server exposing 9 memory tools over streamable HTTP (/mcp endpoint, /health liveness probe). Wires SQLite store and vector store, Ollama embedder, and a background lifecycle job that runs every 24h to expire TTL'd entries and recompute trust/staleness scores.

Tools: memory_remember, memory_search, memory_recall, memory_forget, memory_update, memory_flag, memory_list, memory_consolidate, memory_crystallize.

Config via memory-server.yaml with defaults (SQLite + sqlite-vec + Ollama on localhost:11434, listening on 0.0.0.0:8080).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Covers architecture, MCP tool surface, trust/staleness scoring, conflict detection, Skills relationship, a comparison with LinkedIn's Cognitive Memory Agent, and the recommended three-tier memory activation strategy (session-boundary injection, signal-based mid-session reads, write-on-observation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.

This review will be automatically dismissed once you add the justification section.

codecov · 2026-04-22T16:58:02Z

Codecov Report

❌ Patch coverage is 35.13770% with 683 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.52%. Comparing base (cffe934) to head (40c641d).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/memory/sqlite/store.go | 56.29% | 101 Missing and 17 partials ⚠️ |
| cmd/thv-memory/main.go | 0.00% | 78 Missing ⚠️ |
| cmd/thv-memory/config.go | 0.00% | 50 Missing ⚠️ |
| cmd/thv-memory/tools/crystallize.go | 0.00% | 49 Missing ⚠️ |
| pkg/memory/service.go | 58.11% | 34 Missing and 15 partials ⚠️ |
| cmd/thv-memory/tools/consolidate.go | 0.00% | 44 Missing ⚠️ |
| cmd/thv-memory/tools/remember.go | 0.00% | 39 Missing ⚠️ |
| pkg/memory/sqlite/db.go | 46.55% | 21 Missing and 10 partials ⚠️ |
| cmd/thv-memory/lifecycle/job.go | 32.55% | 23 Missing and 6 partials ⚠️ |
| cmd/thv-memory/tools/list.go | 0.00% | 29 Missing ⚠️ |

... and 8 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5015      +/-   ##
==========================================
- Coverage   69.02%   68.52%   -0.51%     
==========================================
  Files         554      573      +19     
  Lines       73075    74128    +1053     
==========================================
+ Hits        50443    50797     +354     
- Misses      19620    20254     +634     
- Partials     3012     3077      +65


reyortiz3 and others added 5 commits April 21, 2026 18:25

Add .worktrees to gitignore

0b37a87

github-actions Bot requested changes Apr 22, 2026

github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label Apr 22, 2026

Merge branch 'main' into feature/memory-server-core

40c641d

github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 22, 2026

reyortiz3 wants to merge 6 commits into main from feature/memory-server-core
