diff --git a/designs/0011-knowledge-bases.md b/designs/0011-knowledge-bases.md new file mode 100644 index 000000000..83b368fb2 --- /dev/null +++ b/designs/0011-knowledge-bases.md @@ -0,0 +1,406 @@ +# Long-Term Memory + +**Status**: Implemented + +**Date**: 2026-05-14 + +**Issue**: TBD + +**Scope**: TypeScript SDK + +## Context + +Strands agents today are stateless across sessions. Every conversation starts from zero: the agent can't recall user preferences, past decisions, or accumulated knowledge. When information leaves the context window, it's gone unless the developer builds custom persistence. The SDK provides session management (persisting conversation state) and context management (handling context window size within a session), but neither addresses cross-session knowledge. An agent that assists a user daily should be able to remember what it learned yesterday without replaying the full history. Prototyping a memory-enabled agent today requires wiring up a vector store, writing extraction logic, managing tool registration, and handling multi-tenancy. This should be supported natively. + +This design proposes a `MemoryManager` primitive that owns long-term knowledge: persisting facts to configurable backends, recalling them via tools or system prompt injection, and optionally extracting them from conversations. The primitive solves three distinct problems: + +1. **Knowledge Retrieval**: how the agent searches and surfaces stored knowledge at the right time +2. **Knowledge Ingestion**: how knowledge enters the system (triggers, writes, deduplication) +3. **Fact Extraction**: how conversation messages become structured knowledge entries (for backends that don't handle extraction server-side) + +## Decision + +### Architecture + +`MemoryManager` is the component that gives agents persistent knowledge across sessions. It handles storing facts, recalling them when relevant, and optionally extracting them from conversations. 
+ +It is exposed as a top-level `memoryManager` parameter on `AgentConfig`, following the pattern of `contextManager` and `sessionManager`. It accepts either a `MemoryManager` instance or a plain `MemoryManagerConfig` object (auto-wrapped): + +```typescript +new Agent({ + model, + memoryManager: new MemoryManager({ ... }), +}) +``` + +Under the hood, MemoryManager integrates with the agent lifecycle via hooks: registering tools at initialization, injecting knowledge before model calls, and ingesting new facts after each turn. + +**Stores.** A store is a backend that holds and retrieves knowledge (a vector database, a managed service like Amazon Bedrock Knowledge Bases or AgentCore Memory, or any implementation of the store interface). MemoryManager orchestrates one or more stores: + +```typescript +memoryManager: new MemoryManager({ + stores: [ + { store: userStore, ingestion: { trigger: 'tool' } }, + { store: teamStore }, // search-only + { store: orgStore }, // search-only + ], +}) +``` + +Multi-store support avoids pushing multi-tenancy complexity onto the developer. A single agent can query personal, team, and organization knowledge simultaneously. Scoping (namespace, tenant isolation) is handled by each store's own constructor config, e.g., `BedrockKnowledgeBaseStore({ scope: 'user-123' })` or `AgentCoreMemoryStore({ namespace: 'facts/user-123' })`. + +There is a single `KnowledgeStore` interface, with `search()` required and `add()` / `delete()` optional. Runtime helpers narrow the type when writes are needed. This makes multi-tenant patterns natural: team or org stores that are pre-populated externally simply don't implement `add()`, while a user's personal store does. Writability at the MemoryManager level is determined by whether the store has an ingestion configuration (see Knowledge Ingestion below). 
+ +```typescript +interface KnowledgeStore { + search(query: string, options?: Record): Promise + add?(content: string, metadata?: Record): Promise + delete?(id: string): Promise +} +``` + +**Shipped backends.** + +| Backend | Package | Use case | +|---------|---------|----------| +| `InMemoryKnowledgeStore` | `@strands-agents/sdk` | Testing, prototyping | +| `FileKnowledgeStore` | `@strands-agents/sdk` | Local development | +| `BedrockKnowledgeBaseStore` | `@strands-agents/sdk` | Production (managed, zero-infra) | +| `AgentCoreKnowledgeStore` | `@strands-agents/memory-agentcore` | AgentCore managed memory | + +The three in-SDK backends cover the prototyping → local dev → production progression without adding dependencies. Bedrock KB is the managed zero-infra option. Third-party managed memory services like AgentCore carry client dependencies and are opt-in via separate packages so the SDK stays lean. + +--- + +### Knowledge Retrieval + +The agent needs stored knowledge at the right moment, but retrieving it has a cost (latency, tokens, relevance noise). MemoryManager provides two retrieval mechanisms that offer different trade-offs between precision and reliability. Both can be used together. + +#### Active Recall + +Active recall lets the agent decide when memory is relevant. Instead of retrieving knowledge every turn, the agent searches on demand, only when it judges that stored knowledge would help. + +This works by registering a `search_memory` tool that the agent can call like any other tool. The trade-off: active recall depends on the model recognizing when to search. If the model doesn't think to look, relevant memories stay hidden. + +```typescript +const agent = new Agent({ + model, + memoryManager: new MemoryManager({ + stores: [{ store, ingestion: { trigger: 'tool' } }], + includeTools: true, // Agent gets search_memory tool. 
+ }), +}) +``` + +When multiple stores are configured, results are interleaved by rank using store config order as the priority signal. Scores are not compared across stores because different backends produce incomparable scales. If a store fails, partial results from other stores are still returned. + +#### Context Injection + +Context injection guarantees that relevant knowledge is always present, at the cost of paying for retrieval every turn. This is useful when baseline context is more important than token efficiency, or when the model can't reliably judge when to search. + +Injection is enabled via `injection: true`. Each turn, MemoryManager searches stores using the last substantive user message as the query, formats the results into an XML block, and appends it to the system prompt. The block is stripped and re-injected every turn, so memories never accumulate in the prompt and each turn gets a fresh set based on the current query. The token budget, formatting, and query selection are configurable via `maxTokens`, `format`, and `query`. + +```typescript +const agent = new Agent({ + model, + memoryManager: new MemoryManager({ + stores: [{ store, ingestion: { trigger: 'tool' } }], + injection: true, // searches every turn, injects into system prompt + }), +}) +``` + +--- + +### Knowledge Ingestion + +Knowledge can enter the system in two ways: the agent explicitly writes it (via a `store_memory` tool registered by MemoryManager), or MemoryManager automatically extracts it from conversation messages using an extractor (a component that distills messages into discrete facts via a model call). + +MemoryManager uses triggers to control when these writes happen. A trigger is a named event that causes MemoryManager to process recent messages and write to the store. 
Four built-in triggers cover most use cases: + +| Trigger | When | Cost | +|---------|---------------|---------------| +| `tool` | Agent calls `store_memory` | Nothing extra (agent provides content directly) | +| `perTurn` | After every agent invocation | High (model call per turn if an extractor is configured) | +| `onEviction` | Messages are evicted from the context window | Medium (only fires on eviction events) | +| `scheduled` | Every N turns (configurable via `interval`) | Controllable | + +All writes are async and non-blocking. This means a fact stored in one turn may not be searchable immediately in the next (eventual consistency). + +**Deduplication.** MemoryManager tracks a per-store high-water mark: a pointer to the last message that was already processed. Each trigger only processes messages beyond that mark, preventing duplicate writes. Tool-related content blocks (`toolUse`, `toolResult`) are filtered out by default before processing, since they rarely contain user-relevant knowledge. + +**Custom triggers.** For cases the built-in triggers don't cover, the store interface is public and `add()` can be called directly from any lifecycle hook. + +**Deletion and corrections.** `KnowledgeStore.delete()` is an optional method available for programmatic use (compliance, cleanup), but no `delete_memory` tool is registered for the agent. Stores that don't support deletion simply don't implement it. Exposing deletion to the agent risks accidental data loss with no undo path. Instead, corrections are handled by storing updated facts. Newer entries take precedence via recency weighting in search results. + +--- + +### Fact Extraction + +When messages are ingested, they need to become searchable knowledge entries. Some managed backends (e.g. AgentCore Memory) handle this transformation server-side, accepting raw messages and producing structured entries internally. 
For self-managed backends that store only what they're given, MemoryManager can extract discrete facts from conversation messages before writing them. + +Extraction is optional. It is only needed when two conditions are true: the backend doesn't handle extraction server-side, and you want MemoryManager to distill conversations into facts automatically rather than relying on the agent to provide facts via the `tool` trigger. + +Each store can have its own extractor, because different stores benefit from different extraction styles. A preferences store might want pure facts ("user prefers dark mode") while a decisions store wants richer context with reasoning. `ModelExtractor` is the built-in implementation: it calls a language model to extract facts, defaulting to the agent's own model but configurable with an explicit cheaper model to reduce cost. + +When no extractor is configured, messages are serialized as plain text and passed directly to the store's `add()` method. This is the correct setup for managed backends that handle extraction internally. + +## Developer Experience + +### Minimal: prototyping + +```typescript +import { Agent, MemoryManager, InMemoryKnowledgeStore } from '@strands-agents/sdk' + +const agent = new Agent({ + model, + memoryManager: new MemoryManager({ + stores: [{ store: new InMemoryKnowledgeStore(), ingestion: { trigger: 'tool' } }], + }), +}) +// Agent now has search_memory and store_memory tools. Zero infrastructure. 
+``` + +### Production: Bedrock Knowledge Bases + +```typescript +import { Agent, MemoryManager, BedrockKnowledgeBaseStore, ModelExtractor } from '@strands-agents/sdk' + +const agent = new Agent({ + model, + memoryManager: new MemoryManager({ + stores: [{ + store: new BedrockKnowledgeBaseStore({ knowledgeBaseId: 'KB123', dataSourceId: 'DS456', scope: 'user-123' }), + ingestion: { trigger: ['tool', 'perTurn'], extractor: new ModelExtractor({ model }) }, + }], + }), +}) +``` + +### Multi-tenant: personal + team + org + +```typescript +const agent = new Agent({ + model, + memoryManager: new MemoryManager({ + stores: [ + // Personal: learns from conversation (scope configured on store) + { store: userKB, ingestion: { trigger: ['tool', 'perTurn'], extractor } }, + // Team: search-only, pre-populated (no ingestion = no writes) + { store: teamKB }, + // Org: search-only, shared + { store: orgKB }, + ], + }), +}) +// search_memory queries all three, merges by rank position +// store_memory writes only to stores with 'tool' trigger +``` + +### With context injection + +```typescript +const agent = new Agent({ + model, + memoryManager: new MemoryManager({ + stores: [{ store, ingestion: { trigger: 'tool' } }], + injection: true, // default XML format, 2000 token budget + }), +}) +``` + +## Alternatives Considered + +### 1. Memory as a top-level Agent parameter (no MemoryManager class) + +```typescript +new Agent({ memory: { stores: [...], injection: {...} } }) +``` + +**Why rejected:** A config object doesn't provide methods (`search`, `store`, `flush`) that power users need for programmatic access. The class also owns the ingestion queue lifecycle. + +### 2. Single store (no multi-store orchestration) + +**Why rejected:** Forces multi-tenant patterns onto the developer. Multi-store is a customer ask for production agents. + +### 3. 
Two-interface split (`KnowledgeStore` + `MutableKnowledgeStore`) + +**Why rejected:** The split provides compile-time guarantees that are leaky in practice for custom extensions. AgentCore Memory is event-sourced: `delete()` operates on a different entity than the one `add()` creates. HindSight doesn't support deletion at all, yet it would be forced to implement a `delete()` that throws. The two-interface approach creates false confidence while real integrations still hit runtime failures. A single interface with optional methods and runtime helpers is simpler and more honest. + + +## Consequences + +### What Becomes Easier + +- Cross-session knowledge becomes a single parameter. No custom persistence, no manual tool registration, no vector store wiring. +- Multi-tenancy is built in via multi-store (scoping handled per-store). +- Progressive complexity: moving from `InMemoryKnowledgeStore` (prototyping) to `BedrockKnowledgeBaseStore` (production) means changing the store import and its constructor call. + +### What Becomes Harder or Requires Attention + +- **Eventual consistency**: writes are async; a fact may not be searchable in the next turn. +- **Extraction cost**: `perTurn` triggers a model call every turn. We need sensible defaults and good documentation for users to navigate this. +- **Active recall depends on model judgment**: the model must know when to search. Context injection guarantees baseline context, but at the cost of always paying for retrieval. We need to evaluate and baseline tool descriptions. + +### Migration + +No breaking changes. `memoryManager` is a new optional parameter on `AgentConfig`. + +## Willingness to Implement + +Yes. + +--- + +
+Appendix A: Core Interfaces + +### Knowledge Entry + +```typescript +interface KnowledgeEntry { + id: string + content: string + metadata?: Record // score, provenance, etc. live here +} +``` + +### Store Interface + +```typescript +interface KnowledgeStore { + search(query: string, options?: Record): Promise + add?(content: string, metadata?: Record): Promise + delete?(id: string): Promise +} + +// Runtime type guards for narrowing +function hasAdd(store: KnowledgeStore): store is KnowledgeStore & { add(...): Promise } +function hasDelete(store: KnowledgeStore): store is KnowledgeStore & { delete(...): Promise } +``` + +**`search()` options bag:** Each backend pulls what it understands from the options record: +- InMemory/File/BedrockKB read `options.limit` +- AgentCore reads `options.memoryStrategyId` +- HindSight reads `options.budget`, `options.tags` + +MemoryManager passes `{ limit: config.limit ?? 10 }` — stores use what's relevant, ignore the rest. + +**`score` is metadata:** All backends return results in relevance order. Score is informational metadata some backends provide (`metadata.score`), not a first-class field. Stores that don't produce scores just don't set it. MemoryManager trusts position for round-robin interleaving. 
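The rank-based interleaving and the runtime guards described above can be sketched roughly as follows. This is an illustrative sketch, not the SDK's implementation: the generic type arguments (`Record<string, unknown>`, `Promise<KnowledgeEntry[]>`, `Promise<void>`), the helper names `interleaveByRank` and `searchAll`, and the use of one overall `limit` for the merged list are all assumptions made for the example.

```typescript
// Assumed shapes: the type arguments below are guesses for illustration only.
interface KnowledgeEntry {
  id: string
  content: string
  metadata?: Record<string, unknown>
}

interface KnowledgeStore {
  search(query: string, options?: Record<string, unknown>): Promise<KnowledgeEntry[]>
  add?(content: string, metadata?: Record<string, unknown>): Promise<void>
  delete?(id: string): Promise<void>
}

// Runtime guard: narrows to a store that actually implements add().
function hasAdd(store: KnowledgeStore): store is KnowledgeStore & Required<Pick<KnowledgeStore, 'add'>> {
  return typeof store.add === 'function'
}

// Round-robin merge (hypothetical helper): take rank 0 from every list in
// store config order, then rank 1, and so on, until `limit` entries are collected.
// Scores are never compared; only position within each list matters.
function interleaveByRank(lists: KnowledgeEntry[][], limit: number): KnowledgeEntry[] {
  const merged: KnowledgeEntry[] = []
  const depth = Math.max(0, ...lists.map((list) => list.length))
  for (let rank = 0; rank < depth; rank++) {
    for (const list of lists) {
      if (merged.length >= limit) return merged
      const entry = list[rank]
      if (entry !== undefined) merged.push(entry)
    }
  }
  return merged
}

// Query every store; a store whose search() rejects contributes an empty list,
// so partial results from the other stores are still returned.
async function searchAll(stores: KnowledgeStore[], query: string, limit = 10): Promise<KnowledgeEntry[]> {
  const lists = await Promise.all(
    stores.map((store) => store.search(query, { limit }).catch((): KnowledgeEntry[] => [])),
  )
  return interleaveByRank(lists, limit)
}
```

With two stores returning `[a1, a2]` and `[b1]`, the merged order is `a1, b1, a2`: the first store in config order wins ties at each rank, which is how config order acts as the priority signal.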
+ +### Ingestion Pipeline + +```typescript +type IngestionTrigger = 'tool' | 'perTurn' | 'onEviction' | 'scheduled' + +type ContentBlockType = 'text' | 'toolUse' | 'toolResult' | 'reasoning' | 'cachePoint' | 'guardContent' | 'image' | 'video' | 'document' | 'citations' + +interface MessageFilter { + exclude: ContentBlockType[] +} + +interface IngestionConfig { + trigger: IngestionTrigger | IngestionTrigger[] + extractor?: Extractor // optional: if omitted, messages serialized as text + interval?: number // for 'scheduled': every N turns + filter?: MessageFilter // default: { exclude: ['toolUse', 'toolResult'] } +} + +interface Extractor { + extract(messages: MessageData[]): Promise<{ content: string; metadata?: Record }[]> +} +``` + +### MemoryManager Config + +```typescript +interface MemoryManagerConfig { + stores: StoreConfig[] + includeTools?: boolean | ToolsConfig // default: true (registers search_memory, store_memory) + injection?: boolean | InjectionConfig // default: false (opt-in passive recall) +} + +interface ToolConfig { + name?: string + description?: string +} + +interface ToolsConfig { + search?: boolean | ToolConfig + store?: boolean | ToolConfig +} + +interface StoreConfig { + store: KnowledgeStore + limit?: number // max results from this store (default: 10) + ingestion?: IngestionConfig // if present, store is a write target +} + +interface InjectionConfig { + format?: (entries: KnowledgeEntry[]) => string // default: XML block format + maxTokens?: number // budget for injected content (default: 2000) + query?: (messages: MessageData[]) => string // default: last substantive user message +} +``` + +**`includeTools` resolution:** +- `true` (default): registers both `search_memory` and `store_memory` +- `false`: no tools registered (use with injection-only) +- `{ search: true, store: false }`: search only, no agent-driven writes +- `{ search: { name: 'recall' } }`: rename tool to avoid conflicts +- `{ store: { description: '...' 
} }`: customize tool description + +**`injection` resolution:** +- `false` (default): no injection +- `true`: injection enabled with default XML format and 2000 token budget +- `{ maxTokens: 4000 }`: injection with custom budget +- `{ format: customFn }`: injection with custom format (escape hatch) + +
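The `includeTools` and `injection` resolution rules listed above could be normalized along these lines. This is a hedged sketch rather than the SDK's code: `resolveTools`, `resolveInjection`, and the `Resolved*` shapes are hypothetical names, and treating an omitted key in `ToolsConfig` as "enabled" is an assumption the text does not spell out. The `format` and `query` options are left out for brevity.

```typescript
interface ToolConfig { name?: string; description?: string }
interface ToolsConfig { search?: boolean | ToolConfig; store?: boolean | ToolConfig }

// Hypothetical normalized shapes used only in this sketch.
interface ResolvedTool { enabled: boolean; name: string; description: string | undefined }
interface ResolvedInjection { enabled: boolean; maxTokens: number }

// Resolve one tool entry: false disables it, true or undefined keeps the
// defaults, and an object enables it with an optional rename and description.
function resolveTool(value: boolean | ToolConfig | undefined, defaultName: string): ResolvedTool {
  if (value === false) return { enabled: false, name: defaultName, description: undefined }
  if (value === undefined || value === true) return { enabled: true, name: defaultName, description: undefined }
  return { enabled: true, name: value.name ?? defaultName, description: value.description }
}

// `includeTools` defaults to true, which registers both tools.
function resolveTools(includeTools: boolean | ToolsConfig = true): { search: ResolvedTool; store: ResolvedTool } {
  if (typeof includeTools === 'boolean') {
    return {
      search: resolveTool(includeTools, 'search_memory'),
      store: resolveTool(includeTools, 'store_memory'),
    }
  }
  return {
    search: resolveTool(includeTools.search, 'search_memory'),
    store: resolveTool(includeTools.store, 'store_memory'),
  }
}

// `injection` defaults to false; true or an object enables it with a 2000-token default budget.
function resolveInjection(injection: boolean | { maxTokens?: number } = false): ResolvedInjection {
  if (typeof injection === 'boolean') return { enabled: injection, maxTokens: 2000 }
  return { enabled: true, maxTokens: injection.maxTokens ?? 2000 }
}
```

Normalizing once at construction time keeps the hook code free of `boolean | object` checks; for example, `resolveTools({ search: { name: 'recall' }, store: false })` yields a renamed search tool and no store tool.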
+ +--- + +
+Appendix B: Implementation Details + +### Context injection lifecycle + +When injection is enabled, MemoryManager hooks into `BeforeInvocationEvent` (once per turn). The lifecycle: + +1. **Strip**: Remove the previously injected XML block from the system prompt +2. **Retrieve**: Use the last substantive user message (longer than 10 characters) as the search query +3. **Format**: Render results as XML, respecting the `maxTokens` budget +4. **Inject**: Append the block to the end of the system prompt + +If no substantive message exists or retrieval returns zero results, injection is skipped for that turn. The block is never persisted in conversation history. + +### `store_memory` tool behavior + +Accepts `{ entries: string[] }` (batch). Writes fan out to all stores whose `trigger` includes `'tool'`. Writes are async and non-blocking. Returns `{ stored: true }`; no IDs are surfaced to the agent. + +### Message filter details + +`filter: { exclude: ContentBlockType[] }` strips content block types before they reach the extractor or serializer. The filter is applied first, and messages that become empty after filtering are dropped entirely. Default: `exclude: ['toolUse', 'toolResult']`. + +Extractors receive only unprocessed messages (tracked via per-store high-water mark). + +### `strands_source` metadata + +Every write is tagged with `strands_source` to indicate how the content was produced: +- `'tool'`: agent explicitly called `store_memory` +- `'extraction'`: extractor processed messages into facts +- `'raw'`: messages serialized directly (no extractor) + +### Custom triggers + +```typescript +const myStore = new BedrockKnowledgeBaseStore({ ... }) + +// summarize() is assumed to be an application-defined helper +agent.addHook(AfterToolCallEvent, async (event) => { + if (event.tool.name === 'important_api') { + await myStore.add(`API result: ${summarize(event.result)}`) + } +}) +``` + +
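Returning to the context injection lifecycle at the start of this appendix, the four steps (strip, retrieve, format, inject) might look roughly like this. Everything here is a sketch under stated assumptions: the `<memory>` tag name, the `Message` shape, the characters-divided-by-four token estimate, and the synchronous `search` callback are all hypothetical stand-ins (real stores are asynchronous, and the actual block name is not given in this document).

```typescript
interface KnowledgeEntry { id: string; content: string }
interface Message { role: 'user' | 'assistant'; text: string }

// Hypothetical tag name chosen for this sketch.
const OPEN = '<memory>'
const CLOSE = '</memory>'

// Escape entry text so stored content cannot close the block early.
function escapeXml(text: string): string {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
}

// Step 1 (Strip): remove a previously injected block, if present.
function stripBlock(systemPrompt: string): string {
  const start = systemPrompt.indexOf(OPEN)
  const end = systemPrompt.indexOf(CLOSE)
  if (start === -1 || end === -1 || end < start) return systemPrompt
  return (systemPrompt.slice(0, start) + systemPrompt.slice(end + CLOSE.length)).replace(/\s+$/, '')
}

// Step 2 (Retrieve): the query is the last user message longer than 10 characters.
function pickQuery(messages: Message[]): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i]
    if (message !== undefined && message.role === 'user' && message.text.trim().length > 10) {
      return message.text.trim()
    }
  }
  return undefined
}

// Step 3 (Format): render entries as XML until the budget is exhausted.
// Tokens are approximated as characters / 4, a rough stand-in for a tokenizer.
function formatBlock(entries: KnowledgeEntry[], maxTokens: number): string {
  const lines: string[] = []
  let used = 0
  for (const entry of entries) {
    const cost = Math.ceil(entry.content.length / 4)
    if (used + cost > maxTokens) break
    lines.push(`  <entry>${escapeXml(entry.content)}</entry>`)
    used += cost
  }
  return lines.length === 0 ? '' : [OPEN].concat(lines, [CLOSE]).join('\n')
}

// Steps 1-4 combined. `search` is synchronous here only to keep the sketch small.
function injectMemories(
  systemPrompt: string,
  messages: Message[],
  search: (query: string) => KnowledgeEntry[],
  maxTokens = 2000,
): string {
  const base = stripBlock(systemPrompt)
  const query = pickQuery(messages)
  if (query === undefined) return base // no substantive message: skip injection
  const block = formatBlock(search(query), maxTokens)
  return block === '' ? base : `${base}\n\n${block}` // Step 4 (Inject): append to the end
}
```

Because the block is stripped before every injection, calling `injectMemories` on its own output with the same inputs returns the same prompt, which is the property that keeps memories from accumulating across turns.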