From 86bf57dabd897967915512585d829453782b76ab Mon Sep 17 00:00:00 2001 From: opieter-aws Date: Fri, 15 May 2026 12:26:50 -0400 Subject: [PATCH 1/3] design: knowledge bases --- designs/0011-knowledge-bases.md | 616 ++++++++++++++++++++++++++++++++ 1 file changed, 616 insertions(+) create mode 100644 designs/0011-knowledge-bases.md diff --git a/designs/0011-knowledge-bases.md b/designs/0011-knowledge-bases.md new file mode 100644 index 000000000..02b5277c9 --- /dev/null +++ b/designs/0011-knowledge-bases.md @@ -0,0 +1,616 @@ +# Design: Long-Term Memory (L2 Knowledge Primitive) + +**Status**: Proposed + +**Date**: 2026-05-14 + +**Issue**: TBD + +- [1. Problem Statement](#1-problem-statement) +- [2. The Three-Tier Model](#2-the-three-tier-model) +- [3. Key Decisions](#3-key-decisions) +- [4. Developer Experience](#4-developer-experience) +- [5. Relationship to Context Management](#5-relationship-to-context-management) +- [6. Anticipated Questions](#6-anticipated-questions) +- Appendix A: Core Interfaces +- Appendix B: Storage Backends +- Appendix C: Alternatives Considered + +--- + +## 1. Problem Statement + +Agents today are stateless across sessions. Every conversation starts from zero — the agent can't recall user preferences, past decisions, or accumulated knowledge. When messages are evicted from the context window, that information is gone unless the developer builds custom persistence. + +The SDK provides session management (persisting the conversation state) and context management (handling pressure within a session), but neither addresses the cross-session knowledge problem. An agent that assists a user daily should remember what it learned yesterday without replaying the full history. + +This is the [simple at any scale](https://github.com/strands-agents/docs/blob/main/team/TENETS.md) gap: prototyping a memory-enabled agent today requires wiring up a vector store, writing extraction logic, managing tool registration, and handling multi-tenancy. 
It should be one parameter. + +--- + +## 2. The Three-Tier Model + +Context management maps to a three-tier cache hierarchy. This design covers **L2 only**. + +| Tier | What it is | Lifecycle | Owner | +|------|-----------|-----------|-------| +| **L0** | Context window — what the model sees | Per-request | `agent.messages` | +| **L1** | Session history — evicted messages | Per-session | `contextManager` ([Context Management Presets](https://github.com/strands-agents/docs/pull/831)) | +| **L2** | Long-term knowledge — facts, preferences, learned behavior | Cross-session | **`memoryManager` (this design)** | + +L0 ↔ L1 is owned by `contextManager`. L2 is a separate primitive that reads from conversations (current or evicted) and writes to persistent knowledge stores that outlive any single session. + +--- + +## 3. Key Decisions + +### 3.1 MemoryManager is a Plugin, not a standalone class + +**Decision:** `MemoryManager` implements `Plugin` and integrates with the agent lifecycle via hooks. + +**Why:** Memory needs to react to agent lifecycle events — ingesting after each turn, injecting before model calls, registering tools at initialization. The plugin system already provides these hooks. A standalone class would need its own parallel lifecycle management or require the developer to manually wire callbacks. + +**Trade-off:** Coupling to the plugin interface means MemoryManager can't be used outside of an Agent. This is acceptable — L2 memory is inherently agent-scoped (it needs conversations to learn from and a model to serve). + +### 3.2 Multi-store architecture with namespacing + +**Decision:** MemoryManager orchestrates multiple stores, each with its own namespace. A single agent can query personal, team, and organization knowledge simultaneously. 
+ +```typescript +memoryManager: new MemoryManager({ + stores: [ + { store: userStore, namespace: 'user-123' }, + { store: teamStore, namespace: 'team-marketing' }, + { store: orgStore, namespace: 'org-acme' }, + ], +}) +``` + +**Why:** Real-world agents serve users who exist within organizational hierarchies. A support agent should know the user's preferences (personal store), the team's procedures (team store), and the company's policies (org store). Forcing a single store pushes multi-tenancy complexity onto the developer. + +**Trade-off:** Merged search across stores adds latency (parallel queries). For single-store use cases, this is unnecessary overhead — but the overhead is minimal (one store = one query, no merge). + +**Merge strategy:** Results are interleaved by rank, with store config order as priority. The first result from each store (in config order), then the second from each, and so on. Scores are not compared across stores — different backends produce incomparable score scales. Store position in the config array is the explicit priority signal. + +### 3.3 Ingestion config determines writability — no explicit flag + +**Decision:** A store is writable if and only if it has an `ingestion` config. Stores without `ingestion` are search-only. This is enforced at the type level — `ingestion` is only available on `WritableStoreConfig` which requires `MutableKnowledgeStore`. + +**Why:** An explicit `readOnly: true` flag would be redundant — if there's no ingestion config, there's nothing to write and no trigger to write it. The absence of config *is* the signal. This keeps configuration minimal and makes multi-tenant patterns natural: team/org stores that are pre-populated externally simply omit `ingestion`. + +**Trade-off:** Less explicit than a flag. A developer reading the config must understand the convention. 
We mitigate this with documentation and the tool behavior: `store_memory` only targets stores with `'tool'` in their trigger — if none exist, the tool isn't registered. + +**`store_memory` behavior:** The tool accepts a batch of strings (`{ entries: string[] }`), allowing the agent to store multiple facts in a single call. Writes fan out to **all** stores with `'tool'` in their trigger array (not just the first). Writes are async and non-blocking. + +### 3.4 Read-only vs mutable store interfaces (interface split) + +**Decision:** Two interfaces — `KnowledgeStore` (search-only) and `MutableKnowledgeStore` (search + write + delete). Stores implement whichever matches their capability. + +**Why:** Avoiding optional methods at the interface level. A managed knowledge base that's externally populated (e.g., a pre-built Bedrock KB with company documentation) genuinely cannot accept writes — it shouldn't implement a `store()` method that throws. The split makes the type system reflect reality. + +**Trade-off:** Two interfaces instead of one. We considered a single interface with optional `store`/`delete`, but that pushes runtime errors ("this store doesn't support writes") into what should be compile-time guarantees. + +### 3.5 Active recall by default, passive injection opt-in + +**Decision:** By default, MemoryManager registers `search_memory` and `store_memory` tools (active recall). Injection into the system prompt (passive recall) is disabled by default and opt-in via `injection: true`. + +**Why:** Active recall lets the agent decide when memory is relevant — it searches when it needs context, not every turn. This is cheaper (no retrieval cost on irrelevant turns) and gives the agent control. Passive injection is useful for always-personalized agents (support bots, assistants) but adds retrieval latency and token cost to every model call. + +**Trade-off:** Tool-based recall depends on the model knowing when to search. 
If the model doesn't call `search_memory` when it should, relevant context is missed. Injection guarantees baseline context at the cost of always paying for retrieval. + +Only tools that have backing stores are registered. If no store has `'tool'` in its trigger, `store_memory` is not registered — `tools: true` means "register whatever tools are applicable," not "register all tools unconditionally." + +### 3.5.1 Injection lifecycle + +When injection is enabled, MemoryManager hooks into `BeforeInvocationEvent` (once per turn, not per model call) to manage injected content. The lifecycle is **strip → retrieve → format → inject**, executed fresh every turn. + +**How it works:** + +1. **Strip** — Remove the previous injection block from the system prompt (if present). Injected content is wrapped in a `` sentinel so it can be identified and removed cleanly without colliding with user content. +2. **Retrieve** — Determine the search query (default: last substantive user message, walking backward, skipping messages under 10 characters). If no substantive message is found, skip injection for this turn entirely. +3. **Format** — Render results into an XML block, respecting the `maxTokens` budget. If retrieval returns zero results, skip injection (no empty block). +4. **Inject** — Append the formatted block to the end of the system prompt. + +``` +Turn N: + BeforeInvocationEvent fires (once per invocation) + → strip ... from system prompt (from turn N-1) + → find last substantive user message as query (skip if none found) + → search stores, interleaving results by store priority + rank + → if results empty, done (no block injected) + → format results as XML, respecting maxTokens budget + → append ... to end of system prompt + → model sees fresh, relevant memories +``` + +**Why strip and re-inject every turn:** +- Memories are query-dependent. The relevant memories for turn 5 may differ from turn 4. 
+- Prevents accumulation — injected content never grows unbounded across turns. +- Keeps the injected block ephemeral — it's never persisted in conversation history, only present in the model's view for that single call. + +**Default format (XML):** + +```xml + +- user prefers dark mode +- last project was a React app for inventory management +- user is senior engineer, 10 years experience + +``` + +The `` sentinel is namespaced to avoid collisions with user-authored system prompt content. It provides clear boundary markers for the model and enables reliable stripping. + +**Query selection** is configurable via `injection.query`: + +```typescript +injection: { query: (messages) => customQueryLogic(messages) } +``` + +The default walks backward through messages to find the last `role: 'user'` message with > 10 characters. If nothing qualifies, injection is skipped for that turn (no retrieval, no block). + +### 3.6 All writes are async and non-blocking + +**Decision:** Store writes (ingestion) never block the agent loop. Writes are queued internally and flushed in the background. + +**Why:** Memory ingestion is not on the critical path of a conversation. The user is waiting for the agent's response, not for a fact to be persisted. Bedrock Knowledge Bases reinforce this — `IngestKnowledgeBaseDocuments` returns HTTP 202 (accepted, not completed). Blocking on writes would add latency to every turn for no user-visible benefit. + +**Trade-off:** A fact stored in one turn may not be searchable in the immediately following turn (eventual consistency). For most use cases this is acceptable — the information is still in L0 during the current session. Cross-session, it will have been indexed by the next conversation. + +### 3.7 Extraction is per-store and optional + +**Decision:** Each store has its own extractor (or none). The extractor transforms conversation messages into knowledge entries before ingestion. Extractor is optional on **all** triggers, not just `'tool'`. 
+ +**Why:** Different stores may need different extraction strategies. A personal preferences store wants terse facts ("user prefers dark mode"). A decisions store wants richer context ("chose React over Vue because of team expertise, 2026-05-14"). A store with `'tool'` trigger needs no extractor — the agent provides content directly. + +**When no extractor is configured:** Messages are serialized to text (role-prefixed lines) and passed to `store()` as the content string. This supports managed service backends (e.g., AgentCore) that handle extraction server-side. + +**`strands_source` metadata tag:** Every write is tagged to indicate content type: +- `'tool'` — agent explicitly called `store_memory` +- `'extraction'` — extractor processed messages into facts +- `'raw'` — messages serialized directly (no extractor) + +**Trade-off:** Multiple extractors mean multiple model calls if several stores have `perTurn` triggers with extractors. We mitigate by making `'tool'` the cheapest trigger (no extraction cost) and `'scheduled'` a cost-controlled alternative to `'perTurn'`. + +**Extractor input:** Extractors always receive only **unprocessed messages** — messages since the store's last extraction, not the full history. This is tracked via a per-store high-water mark (message index or turn ID). + +### 3.8 Message filter on ingestion + +**Decision:** `IngestionConfig` has a `filter: { exclude: ContentBlockType[] }` that strips specific content block types from messages before they reach the extractor or serializer. + +**Why:** Tool machinery (toolUse, toolResult blocks) is typically noise for memory extraction — it's the conversation substance that matters, not the function calls. Filtering at the block level keeps the ingestion pipeline focused on semantically meaningful content. + +**Default:** `exclude: ['toolUse', 'toolResult']` — strip tool blocks, keep text and other content. Override with `filter: { exclude: [] }` to forward everything. 
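As an illustration of the block-level filter and the no-extractor path from section 3.7, here is a minimal sketch. The `ContentBlock` and `MessageData` shapes are simplified stand-ins rather than the SDK's real types, `applyFilter` and `serializeMessages` are hypothetical helper names, and the `role: text` line format is an assumption (the design only says "role-prefixed lines").

```typescript
// Simplified stand-ins for the SDK types (assumptions for illustration only).
type ContentBlockType = 'text' | 'toolUse' | 'toolResult' | 'reasoning'

interface ContentBlock {
  type: ContentBlockType
  text?: string
}

interface MessageData {
  role: 'user' | 'assistant'
  content: ContentBlock[]
}

interface MessageFilter {
  exclude: ContentBlockType[]
}

// Default from this design: strip tool machinery, keep everything else.
const DEFAULT_FILTER: MessageFilter = { exclude: ['toolUse', 'toolResult'] }

// Hypothetical helper: removes excluded block types, then drops any message
// that has no blocks left after filtering.
function applyFilter(messages: MessageData[], filter: MessageFilter = DEFAULT_FILTER): MessageData[] {
  const excluded = new Set<ContentBlockType>(filter.exclude)
  return messages
    .map((m) => ({ ...m, content: m.content.filter((b) => !excluded.has(b.type)) }))
    .filter((m) => m.content.length > 0)
}

// Hypothetical helper: serializes filtered messages as role-prefixed lines.
// The exact "role: text" format is an assumption.
function serializeMessages(messages: MessageData[]): string {
  return messages
    .map((m) => {
      const text = m.content
        .map((b) => b.text ?? '')
        .filter((t) => t.length > 0)
        .join(' ')
      return `${m.role}: ${text}`
    })
    .join('\n')
}
```

In this sketch the filter runs first and the serializer only ever sees what survives it, which mirrors the ordering the design requires for both the extractor and the raw path.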
+ +**Lifecycle:** Filter always applies first — before extractor (if present) OR before serialization (if no extractor). Messages that become empty after filtering (all blocks excluded) are dropped entirely. + +### 3.9 Deduplication via high-water mark + +**Decision:** MemoryManager tracks a per-store high-water mark of processed messages. When any trigger fires, only messages beyond the mark are passed to the extractor. This prevents duplicate extraction when triggers overlap (e.g., `['perTurn', 'onEviction']` on the same store). + +**Why:** Without deduplication, a store with both `perTurn` and `onEviction` triggers would extract the same messages twice — once per turn, and again when those messages are evicted. The high-water mark ensures each message is extracted at most once per store. + +**Fail-safe direction:** If tracking state is lost (process restart, corruption), the worst case is a duplicate extraction — not data loss. Duplicates are acceptable; missed facts are not. + +### 3.10 ModelExtractor defaults to cheapest model in same provider family + +**Decision:** `ModelExtractor` detects the agent's model provider at init time and defaults to the cheapest model in that family. User can override with an explicit model. + +**Why:** Extraction is a structured task — "given these messages, output a list of facts." It doesn't need the same capability as the agent's primary model. Defaulting to the agent's model would burn expensive tokens (e.g., Opus) on a task Haiku handles fine. Matching the provider avoids credential mismatches. 
+ +**Provider defaults:** +- `BedrockModel` → Claude Haiku via Bedrock +- `AnthropicModel` → Claude Haiku via Anthropic API +- `OpenAIModel` → gpt-4o-mini via OpenAI +- `GoogleModel` → Gemini Flash via Google + +**Override:** +```typescript +extractor: new ModelExtractor({ model: new BedrockModel({ modelId: 'us.anthropic.claude-sonnet-4-6-20250514-v1:0' }) }) +``` + +### 3.11 Delete is programmatic only — no agent tool + +**Decision:** `MutableKnowledgeStore.delete()` is exposed for developers (compliance, cleanup, admin scripts) but no `delete_memory` tool is registered for the agent. + +**Why:** Deletion is dangerous for an agent to perform autonomously. An agent that aggressively deletes old facts could lose valuable context. Corrections are better handled by storing updated facts — the newer entry naturally takes precedence via recency in search results. + +### 3.12 Partial results on store failure + +**Decision:** If a store's search fails, return partial results from stores that succeeded. Log the failure for observability. The model doesn't see the error. + +**Why:** The model can't fix infrastructure failures — surfacing them adds noise. Partial results are better than no results. If all stores fail, the tool returns an empty array (same as "no results found"). + +### 3.13 `memoryManager` is a top-level Agent parameter only + +**Decision:** `memoryManager` is a named parameter on `AgentConfig`. Passing MemoryManager via `plugins: [...]` throws with a helpful error. + +**Why:** One clear path. `memoryManager` follows the pattern of `conversationManager` and `contextManager` — named constructor parameters for first-class agent capabilities. It's implemented as a plugin internally, but that's an implementation detail. Two paths to register the same thing creates ambiguity. + +### 3.14 Custom triggers via direct store access + +**Decision:** No custom trigger mechanism on MemoryManager. 
Users who need custom triggers hook directly into the store's `store()` method from their own hooks. + +**Why:** The named triggers (`tool`, `perTurn`, `onEviction`, `scheduled`) cover 80% of use cases. For anything custom, the store interface is already public — the user holds a reference and can call `store.store()` from any hook. No new API surface needed. This is the low-level escape hatch alongside the high-level named triggers. + +```typescript +const myStore = new BedrockKnowledgeBaseStore({ ... }) + +agent.addHook(AfterToolCallEvent, async (event) => { + if (event.tool.name === 'important_api') { + await myStore.store('user-123', `API result: ${summarize(event.result)}`) + } +}) +``` + +### 3.15 Namespace is a plain string + +**Decision:** Namespace is an opaque string, not a structured type (no `{ userId, teamId, orgId }` hierarchy). + +**Why:** Multi-tenancy patterns vary wildly. Some apps have users, some have workspaces, some have hierarchies. A structured namespace type would either be too narrow (missing someone's pattern) or too broad (everything optional, no guidance). A plain string with documented conventions (`"user-{id}"`, `"team-{id}"`) is maximally flexible. + +**Trade-off:** No type safety on namespace values. Typos won't be caught at compile time. This is the same trade-off as route strings in web frameworks — acceptable given the flexibility benefit. + +### 3.16 `search_memory` tool accepts optional `limit` parameter + +**Decision:** The `search_memory` tool input schema includes an optional `limit` parameter so the model can control how many results it receives. + +```typescript +// search_memory tool input +{ query: string, limit?: number } // default: 10 +``` + +**Why:** Per-store limits cap each store's contribution. The tool-level limit lets the model request fewer results when it only needs a quick check. No global limit on MemoryManager — limits belong on the query, not the orchestrator. + +--- + +## 4. 
Developer Experience + +### Minimal — prototyping + +```typescript +import { Agent, MemoryManager, InMemoryKnowledgeStore } from '@strands-agents/sdk' + +const agent = new Agent({ + model, + memoryManager: new MemoryManager({ + stores: [{ store: new InMemoryKnowledgeStore(), namespace: 'user-123', ingestion: { trigger: 'tool' } }], + }), +}) +// Agent now has search_memory and store_memory tools. Zero infrastructure. +``` + +### Production — Bedrock Knowledge Bases + +```typescript +import { Agent, MemoryManager, BedrockKnowledgeBaseStore, ModelExtractor } from '@strands-agents/sdk' + +const agent = new Agent({ + model, + memoryManager: new MemoryManager({ + stores: [{ + store: new BedrockKnowledgeBaseStore({ knowledgeBaseId: 'KB123', dataSourceId: 'DS456' }), + namespace: 'user-123', + ingestion: { trigger: ['tool', 'perTurn'], extractor: new ModelExtractor({ model }) }, + }], + }), +}) +``` + +### Multi-tenant — personal + team + org + +```typescript +const agent = new Agent({ + model, + memoryManager: new MemoryManager({ + stores: [ + // Personal — learns from conversation + { store: userKB, namespace: 'user-123', ingestion: { trigger: ['tool', 'perTurn'], extractor } }, + // Team — read-only, pre-populated + { store: teamKB, namespace: 'team-marketing' }, + // Org — read-only, shared + { store: orgKB, namespace: 'org-acme' }, + ], + }), +}) +// search_memory queries all three and interleaves results by rank, in store config order (see 3.2) +// store_memory writes only to stores with 'tool' trigger +``` + +### With passive injection + +```typescript +const agent = new Agent({ + model, + memoryManager: new MemoryManager({ + stores: [{ store, namespace: 'user-123', ingestion: { trigger: 'tool' } }], + injection: true, // enables injection with default XML format and 2000 token budget + }), +}) +// Memories auto-injected into system prompt each turn + tools for on-demand search + +// With custom budget: +injection: { maxTokens: 4000 } + +// Power user — custom format function (escape hatch): 
+injection: { format: (entries) => myCustomFormat(entries), maxTokens: 2000 } +``` + +--- + +## 5. Relationship to Context Management + +```typescript +new Agent({ + contextManager: "auto", // Owns L0 ↔ L1 (within-session) + memoryManager: new MemoryManager({ ... }), // Owns L2 (cross-session) +}) +``` + +The two primitives are independent — each is configured separately, each can exist without the other. The integration point is **eviction**: when `contextManager` evicts messages from L0 to L1, that's an opportunity for `memoryManager` to extract knowledge before it becomes harder to access. + +### Integration via `OnEvictionEvent` + +`contextManager` introduces an `OnEvictionEvent` hook that fires when messages are evicted from L0 to L1. MemoryManager subscribes to this event and routes the evicted messages to stores with `'onEviction'` in their trigger array. + +```typescript +// contextManager fires: +hookRegistry.emit(new OnEvictionEvent({ messages: evictedMessages })) + +// MemoryManager subscribes in initAgent(): +hookRegistry.on(OnEvictionEvent, (event) => { + this.onEviction(event.messages) +}) +``` + +This keeps the two primitives decoupled — `contextManager` doesn't know about memory, it just announces eviction. MemoryManager listens if configured. If no `memoryManager` is present, the event fires with no subscribers and no cost. + +--- + +## 6. Anticipated Questions + +**Q: Should `search_memory` also search L1 (transcript)?** + +No. L1 is within-session history owned by `contextManager`. If the agent needs to browse evicted messages from the current session, that's `contextManager`'s `searchHistory` tool (agentic mode). `search_memory` is cross-session knowledge — facts extracted and persisted beyond any single conversation. Mixing the two would blur the L1/L2 boundary. + +**Q: Is a model call every turn (`perTurn` extraction) acceptable?** + +It's the most complete strategy but the most expensive. 
The default for shipped examples should be `'tool'` (zero extraction cost — agent decides what to remember). `'perTurn'` is for high-value use cases where missing a fact is worse than the cost. `'scheduled'` (every N turns) is the middle ground. + +**Q: What about Bedrock KB ingestion latency?** + +Bedrock KB ingestion is async — content may not be immediately searchable after `store_memory`. This is acceptable because: (1) the stored fact is still in L0 for the current session, and (2) cross-session recall (where the latency matters) has a natural gap between conversations. We should document this clearly. + +**Q: Why ship BedrockKnowledgeBaseStore in the SDK rather than as a separate package?** + +It's the production-grade managed option that requires zero infrastructure setup from the user (no self-hosted vector DB). Aligns with "simple at any scale" — the path from `InMemoryKnowledgeStore` (prototyping) to `BedrockKnowledgeBaseStore` (production) should be changing one import, not adding a package. + +**Q: How does AgentCore Memory fit?** + +AgentCore ships as a separate package (`@strands-agents/memory-agentcore`) that implements `MutableKnowledgeStore`. It plugs into MemoryManager like any other store. No special integration needed — MemoryManager's injection, tools, and ingestion pipeline work with any store that implements the interface. + +--- + +
+Appendix A: Core Interfaces + +### Knowledge Entry + +```typescript +interface KnowledgeEntry { + id: string + content: string + score?: number + metadata?: Record +} +``` + +### Store Interfaces + +```typescript +// Read-only — for managed or pre-populated backends +interface KnowledgeStore { + search(namespace: string, query: string, limit?: number): Promise +} + +// Read+write — for self-managed backends +interface MutableKnowledgeStore extends KnowledgeStore { + store(namespace: string, content: string, metadata?: Record): Promise + delete(namespace: string, id: string): Promise +} +``` + +### Ingestion Pipeline + +```typescript +type IngestionTrigger = 'tool' | 'perTurn' | 'onEviction' | 'scheduled' + +type ContentBlockType = 'text' | 'toolUse' | 'toolResult' | 'reasoning' | 'cachePoint' | 'guardContent' | 'image' | 'video' | 'document' | 'citations' + +interface MessageFilter { + exclude: ContentBlockType[] +} + +interface IngestionConfig { + trigger: IngestionTrigger | IngestionTrigger[] + extractor?: Extractor // optional — if omitted, messages serialized as text + interval?: number // for 'scheduled': every N turns + filter?: MessageFilter // default: { exclude: ['toolUse', 'toolResult'] } +} + +interface Extractor { + extract(messages: MessageData[]): Promise<{ content: string; metadata?: Record }[]> +} +``` + +### MemoryManager Config + +```typescript +interface MemoryManagerConfig { + stores: StoreConfig[] + tools?: boolean | ToolsConfig // default: true — registers search_memory, store_memory + injection?: boolean | InjectionConfig // default: false — opt-in passive recall +} + +interface ToolConfig { + name?: string + description?: string +} + +interface ToolsConfig { + search?: boolean | ToolConfig + store?: boolean | ToolConfig +} + +type StoreConfig = ReadOnlyStoreConfig | WritableStoreConfig + +interface ReadOnlyStoreConfig { + store: KnowledgeStore + namespace: string + limit?: number // max results from this store (default: 10) +} + +interface 
WritableStoreConfig { + store: MutableKnowledgeStore + namespace: string + limit?: number // max results from this store (default: 10) + ingestion: IngestionConfig // required — determines when writes happen +} + +interface InjectionConfig { + format?: (entries: KnowledgeEntry[]) => string // default: XML block format + maxTokens?: number // budget for injected content (default: 2000) + query?: (messages: MessageData[]) => string // default: last substantive user message +} +``` + +**`tools` resolution:** +- `true` (default) — registers both `search_memory` and `store_memory` +- `false` — no tools registered (use with injection-only) +- `{ search: true, store: false }` — search only, no agent-driven writes +- `{ search: { name: 'recall' } }` — rename tool to avoid conflicts +- `{ store: { description: '...' } }` — customize tool description + +**Default tool descriptions:** +- `search_memory`: "Search long-term memory for facts, preferences, or context from previous conversations. Use when you need background about the user or topic that may have been discussed before." +- `store_memory`: "Store facts, preferences, or decisions that should be remembered across conversations. Use when the user shares something worth recalling later." + +**`injection` resolution:** +- `false` (default) — no injection +- `true` — injection enabled with default XML format and 2000 token budget +- `{ maxTokens: 4000 }` — injection with custom budget +- `{ format: customFn }` — injection with custom format (escape hatch) + +### Ingestion Triggers + +| Trigger | When | Input | Cost | +|---------|------|-------|------| +| `tool` | Agent calls `store_memory` | Agent-provided content | None (no extraction) | +| `perTurn` | After every invocation | New messages from this turn | High (model call per turn) | +| `onEviction` | Messages evicted from L0 | Evicted messages | Medium (only on eviction) | +| `scheduled` | Every N turns | Unprocessed messages | Controllable | + +
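The rank-interleaved merge described in section 3.2 can be sketched as follows. This is illustrative only: `interleaveByRank` is a hypothetical helper, and it assumes each store's results arrive already ordered best-first, with the outer array in store config order.

```typescript
interface KnowledgeEntry {
  id: string
  content: string
  score?: number
}

// Hypothetical helper: takes one result list per store (outer array in config
// order, each inner list sorted best-first) and interleaves them round-robin
// by rank. Scores are never compared across stores.
function interleaveByRank(perStore: KnowledgeEntry[][]): KnowledgeEntry[] {
  const merged: KnowledgeEntry[] = []
  const maxLen = Math.max(0, ...perStore.map((results) => results.length))
  for (let rank = 0; rank < maxLen; rank++) {
    for (const results of perStore) {
      const entry = results[rank]
      if (entry !== undefined) merged.push(entry)
    }
  }
  return merged
}
```

For example, given user results `[u1, u2]`, team results `[t1]`, and org results `[o1, o2, o3]`, this sketch returns `u1, t1, o1, u2, o2, o3`: the first result from each store in config order, then the second, and so on, skipping stores that have run out.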
+ +--- + +
+Appendix B: Storage Backends + +### Shipped in SDK + +| Backend | Use case | Dependencies | +|---------|----------|-------------| +| `InMemoryKnowledgeStore` | Testing, prototyping | None | +| `FileKnowledgeStore` | Local development | None (node:fs) | +| `BedrockKnowledgeBaseStore` | Production | `@aws-sdk/client-bedrock-agent-runtime`, `@aws-sdk/client-bedrock-agent` | + +### Why Bedrock Knowledge Bases for the built-in production path + +- **Managed embeddings** — zero config, no model selection needed +- **Async writes** — `IngestKnowledgeBaseDocuments` returns HTTP 202, aligns with non-blocking design +- **Search** — Retrieve API with metadata filtering, reranking, guardrails +- **Delete** — `DeleteKnowledgeBaseDocuments` API for compliance/cleanup + +### External packages (community / future) + +| Package | Backend | +|---------|---------| +| `@strands-agents/memory-agentcore` | AgentCore Memory | +| `@strands-agents/memory-pinecone` | Pinecone | +| `@strands-agents/memory-postgres` | pgvector | +| `@strands-agents/memory-opensearch` | OpenSearch (BM25 + vector) | + +
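To make the store contract concrete, here is a minimal sketch of an in-memory backend. It is not the SDK's `InMemoryKnowledgeStore`: the class name `NaiveInMemoryStore`, the term-match scoring, the `Promise` return types, and the `Record<string, unknown>` metadata type are all assumptions chosen for illustration.

```typescript
interface KnowledgeEntry {
  id: string
  content: string
  score?: number
  metadata?: Record<string, unknown>
}

// Return types and the metadata value type are assumed for this sketch.
interface KnowledgeStore {
  search(namespace: string, query: string, limit?: number): Promise<KnowledgeEntry[]>
}

interface MutableKnowledgeStore extends KnowledgeStore {
  store(namespace: string, content: string, metadata?: Record<string, unknown>): Promise<string>
  delete(namespace: string, id: string): Promise<void>
}

// Hypothetical store: keeps entries in a Map keyed by namespace and scores
// them by how many query terms appear in the content.
class NaiveInMemoryStore implements MutableKnowledgeStore {
  private readonly entries = new Map<string, KnowledgeEntry[]>()
  private nextId = 1

  async store(namespace: string, content: string, metadata?: Record<string, unknown>): Promise<string> {
    const entry: KnowledgeEntry = { id: `mem-${this.nextId++}`, content }
    if (metadata !== undefined) entry.metadata = metadata
    const bucket = this.entries.get(namespace) ?? []
    bucket.push(entry)
    this.entries.set(namespace, bucket)
    return entry.id
  }

  async search(namespace: string, query: string, limit = 10): Promise<KnowledgeEntry[]> {
    const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0)
    return (this.entries.get(namespace) ?? [])
      .map((e) => {
        const text = e.content.toLowerCase()
        return { ...e, score: terms.filter((t) => text.includes(t)).length }
      })
      .filter((e) => e.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
  }

  async delete(namespace: string, id: string): Promise<void> {
    const bucket = this.entries.get(namespace) ?? []
    this.entries.set(namespace, bucket.filter((e) => e.id !== id))
  }
}
```

Because the namespace is the first argument of every method, one store instance can serve several tenants without leaking entries between them. A real backend would replace the term matching with embedding-based retrieval.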
+ +--- + +
+Appendix C: Alternatives Considered + +### Memory as a top-level Agent parameter (no MemoryManager class) + +```typescript +new Agent({ + memory: { stores: [...], injection: {...} }, +}) +``` + +**Why rejected:** A config object doesn't provide methods (`search`, `store`, `flush`) that power users need for programmatic access. The class also owns the ingestion queue lifecycle. + +### Single store (no multi-store orchestration) + +```typescript +new Agent({ + memoryManager: new MemoryManager({ store: myStore, namespace: 'user-123' }), +}) +``` + +**Why rejected:** Forces multi-tenant patterns onto the developer. A support agent that needs personal + team + org knowledge would need three separate MemoryManagers or a custom wrapper. Multi-store is the 80% case for production agents. + +### Explicit `readOnly` flag instead of ingestion-determines-writability + +```typescript +{ store: teamKB, namespace: 'team-marketing', readOnly: true } +``` + +**Why rejected:** Redundant signal. If there's no ingestion config, there's nothing to trigger writes. Two signals for one decision creates confusion about which wins when they conflict. + +### Single `KnowledgeStore` interface with optional write methods + +```typescript +interface KnowledgeStore { + search(...): Promise + store?(...): Promise + delete?(...): Promise +} +``` + +**Why rejected:** Optional methods push type safety to runtime. A developer implementing a read-only store for a managed KB would need to either leave methods unimplemented (confusing) or implement them as no-ops/throws (surprising). The interface split makes the contract explicit. + +### Injection enabled by default + +**Why rejected:** Adds retrieval latency and token cost to every model call. Most conversations don't need memory context on every turn. Active recall (tools) is cheaper and gives the agent control. Users who want always-on personalization opt in. 
+ +### Injection `format` as a required function (no default format) + +```typescript +injection: { + format: (entries) => `\n${entries.map(e => `- ${e.content}`).join('\n')}\n`, + maxTokens: 2000, +} +``` + +**Why rejected:** Forces every user to write a formatting function even though 90% want the same XML-tagged block. The simple path (`injection: true`) should work without ceremony. Power users who need custom rendering can provide a `format` function as an escape hatch. + +### Global extractor instead of per-store + +```typescript +new MemoryManager({ + extractor: new ModelExtractor(), // applies to all stores + stores: [...], +}) +``` + +**Why rejected:** Different stores often want different extraction granularity. A preferences store wants terse facts; a decisions store wants richer context. Per-store extractors allow this without a routing layer. + +
From c0bfdb560d5958b0cd0cb878a09e60c4728c55b6 Mon Sep 17 00:00:00 2001 From: opieter-aws Date: Fri, 15 May 2026 16:28:01 -0400 Subject: [PATCH 2/3] design: add long term memory design --- designs/0011-knowledge-bases.md | 505 +++++++++----------------------- 1 file changed, 145 insertions(+), 360 deletions(-) diff --git a/designs/0011-knowledge-bases.md b/designs/0011-knowledge-bases.md index 02b5277c9..ccde7b1cf 100644 --- a/designs/0011-knowledge-bases.md +++ b/designs/0011-knowledge-bases.md @@ -1,4 +1,4 @@ -# Design: Long-Term Memory (L2 Knowledge Primitive) +# Long-Term Memory **Status**: Proposed @@ -6,55 +6,36 @@ **Issue**: TBD -- [1. Problem Statement](#1-problem-statement) -- [2. The Three-Tier Model](#2-the-three-tier-model) -- [3. Key Decisions](#3-key-decisions) -- [4. Developer Experience](#4-developer-experience) -- [5. Relationship to Context Management](#5-relationship-to-context-management) -- [6. Anticipated Questions](#6-anticipated-questions) -- Appendix A: Core Interfaces -- Appendix B: Storage Backends -- Appendix C: Alternatives Considered +**Scope**: TypeScript SDK ---- - -## 1. Problem Statement - -Agents today are stateless across sessions. Every conversation starts from zero — the agent can't recall user preferences, past decisions, or accumulated knowledge. When messages are evicted from the context window, that information is gone unless the developer builds custom persistence. - -The SDK provides session management (persisting the conversation state) and context management (handling pressure within a session), but neither addresses the cross-session knowledge problem. An agent that assists a user daily should remember what it learned yesterday without replaying the full history. 
- -This is the [simple at any scale](https://github.com/strands-agents/docs/blob/main/team/TENETS.md) gap: prototyping a memory-enabled agent today requires wiring up a vector store, writing extraction logic, managing tool registration, and handling multi-tenancy. It should be one parameter. - ---- - -## 2. The Three-Tier Model - -Context management maps to a three-tier cache hierarchy. This design covers **L2 only**. +## Context -| Tier | What it is | Lifecycle | Owner | -|------|-----------|-----------|-------| -| **L0** | Context window — what the model sees | Per-request | `agent.messages` | -| **L1** | Session history — evicted messages | Per-session | `contextManager` ([Context Management Presets](https://github.com/strands-agents/docs/pull/831)) | -| **L2** | Long-term knowledge — facts, preferences, learned behavior | Cross-session | **`memoryManager` (this design)** | +Strands agents today are stateless across sessions. Every conversation starts from zero: the agent can't recall user preferences, past decisions, or accumulated knowledge. When information leaves the context window, it's gone unless the developer builds custom persistence. The SDK provides session management (persisting conversation state) and context management (handling context window size within a session), but neither addresses cross-session knowledge. An agent that assists a user daily should be able to remember what it learned yesterday without replaying the full history. Prototyping a memory-enabled agent today requires wiring up a vector store, writing extraction logic, managing tool registration, and handling multi-tenancy. This should be supported natively. -L0 ↔ L1 is owned by `contextManager`. L2 is a separate primitive that reads from conversations (current or evicted) and writes to persistent knowledge stores that outlive any single session. 
+This design proposes a `MemoryManager` primitive that owns long-term knowledge: persisting facts to configurable backends, recalling them via tools or system prompt injection, and optionally extracting them from conversations. The primitive solves three distinct problems: ---- +1. **Knowledge Retrieval**: how the agent searches and surfaces stored knowledge at the right time +2. **Knowledge Ingestion**: how knowledge enters the system (triggers, writes, deduplication) +3. **Fact Extraction**: how conversation messages become structured knowledge entries (for backends that don't handle extraction server-side) -## 3. Key Decisions +## Decision -### 3.1 MemoryManager is a Plugin, not a standalone class +### Architecture -**Decision:** `MemoryManager` implements `Plugin` and integrates with the agent lifecycle via hooks. +`MemoryManager` is the component that gives agents persistent knowledge across sessions. It handles storing facts, recalling them when relevant, and optionally extracting them from conversations. -**Why:** Memory needs to react to agent lifecycle events — ingesting after each turn, injecting before model calls, registering tools at initialization. The plugin system already provides these hooks. A standalone class would need its own parallel lifecycle management or require the developer to manually wire callbacks. +It is exposed as a top-level `memoryManager` parameter on `AgentConfig`, following the pattern of `contextManager` and `sessionManager`: -**Trade-off:** Coupling to the plugin interface means MemoryManager can't be used outside of an Agent. This is acceptable — L2 memory is inherently agent-scoped (it needs conversations to learn from and a model to serve). +```typescript +new Agent({ + model, + memoryManager: new MemoryManager({ ... 
}), +}) +``` -### 3.2 Multi-store architecture with namespacing +Under the hood, MemoryManager integrates with the agent lifecycle via hooks: registering tools at initialization, injecting knowledge before model calls, and ingesting new facts after each turn. -**Decision:** MemoryManager orchestrates multiple stores, each with its own namespace. A single agent can query personal, team, and organization knowledge simultaneously. +**Stores.** A store is a backend that holds and retrieves knowledge (a vector database, a managed service like Amazon Bedrock Knowledge Bases or AgentCore Memory, or any implementation of the store interface). MemoryManager orchestrates one or more stores, each scoped by a namespace: ```typescript memoryManager: new MemoryManager({ @@ -66,206 +47,99 @@ memoryManager: new MemoryManager({ }) ``` -**Why:** Real-world agents serve users who exist within organizational hierarchies. A support agent should know the user's preferences (personal store), the team's procedures (team store), and the company's policies (org store). Forcing a single store pushes multi-tenancy complexity onto the developer. - -**Trade-off:** Merged search across stores adds latency (parallel queries). For single-store use cases, this is unnecessary overhead — but the overhead is minimal (one store = one query, no merge). - -**Merge strategy:** Results are interleaved by rank, with store config order as priority. The first result from each store (in config order), then the second from each, and so on. Scores are not compared across stores — different backends produce incomparable score scales. Store position in the config array is the explicit priority signal. - -### 3.3 Ingestion config determines writability — no explicit flag +Multi-store support avoids pushing multi-tenancy complexity onto the developer. A single agent can query personal, team, and organization knowledge simultaneously, with namespace isolation keeping them separate. 
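The multi-store search described in this design interleaves results by rank in store config order and never compares scores across stores. A minimal sketch of that merge follows, assuming a hypothetical `Hit` shape and a hypothetical `interleaveByRank` helper (neither name comes from the design itself):

```typescript
// Hypothetical shape of a search hit; the design's KnowledgeEntry may differ.
interface Hit {
  content: string
  store: string // namespace of the store that returned the hit (illustrative)
}

// Interleave per-store result lists by rank: the first hit of each store in
// config order, then the second hit of each, and so on. Scores are never
// compared across stores; array position is the only priority signal.
function interleaveByRank(perStore: Hit[][], limit: number): Hit[] {
  const merged: Hit[] = []
  const maxLen = Math.max(0, ...perStore.map((hits) => hits.length))
  for (let rank = 0; rank < maxLen && merged.length < limit; rank++) {
    for (const hits of perStore) {
      const hit = hits[rank]
      if (hit !== undefined && merged.length < limit) merged.push(hit)
    }
  }
  return merged
}

const personal: Hit[] = [
  { content: 'prefers dark mode', store: 'user-123' },
  { content: 'works in TypeScript', store: 'user-123' },
]
const team: Hit[] = [{ content: 'release freeze on Fridays', store: 'team-marketing' }]

// Personal store is listed first, so its hits win ties at each rank.
const merged = interleaveByRank([personal, team], 10)
console.log(merged.map((h) => h.content))
```

Because the merge depends only on list position, a store that fails can simply contribute an empty list and the remaining stores still produce a usable result.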
-**Decision:** A store is writable if and only if it has an `ingestion` config. Stores without `ingestion` are search-only. This is enforced at the type level — `ingestion` is only available on `WritableStoreConfig` which requires `MutableKnowledgeStore`. +**Read-only vs. mutable stores.** Two interfaces: `KnowledgeStore` (search-only) and `MutableKnowledgeStore` (search + write + delete). This distinction makes multi-tenant patterns natural: team or org stores that are pre-populated externally are read-only, while a user's personal store is mutable and accepts new facts during conversation. Mutability is determined by whether the store has an ingestion configuration (see Knowledge Ingestion below). -**Why:** An explicit `readOnly: true` flag would be redundant — if there's no ingestion config, there's nothing to write and no trigger to write it. The absence of config *is* the signal. This keeps configuration minimal and makes multi-tenant patterns natural: team/org stores that are pre-populated externally simply omit `ingestion`. +**Shipped backends.** -**Trade-off:** Less explicit than a flag. A developer reading the config must understand the convention. We mitigate this with documentation and the tool behavior: `store_memory` only targets stores with `'tool'` in their trigger — if none exist, the tool isn't registered. +| Backend | Package | Use case | +|---------|---------|----------| +| `InMemoryKnowledgeStore` | `@strands-agents/sdk` | Testing, prototyping | +| `FileKnowledgeStore` | `@strands-agents/sdk` | Local development | +| `BedrockKnowledgeBaseStore` | `@strands-agents/sdk` | Production (managed, zero-infra) | +| `AgentCoreKnowledgeStore` | `@strands-agents/memory-agentcore` | AgentCore managed memory | -**`store_memory` behavior:** The tool accepts a batch of strings (`{ entries: string[] }`), allowing the agent to store multiple facts in a single call. Writes fan out to **all** stores with `'tool'` in their trigger array (not just the first). 
Writes are async and non-blocking. +The three in-SDK backends cover the prototyping → local dev → production progression without adding dependencies. Bedrock KB is the managed zero-infra option. Third-party managed memory services like AgentCore carry client dependencies and are opt-in via separate packages so the SDK stays lean. -### 3.4 Read-only vs mutable store interfaces (interface split) - -**Decision:** Two interfaces — `KnowledgeStore` (search-only) and `MutableKnowledgeStore` (search + write + delete). Stores implement whichever matches their capability. - -**Why:** Avoiding optional methods at the interface level. A managed knowledge base that's externally populated (e.g., a pre-built Bedrock KB with company documentation) genuinely cannot accept writes — it shouldn't implement a `store()` method that throws. The split makes the type system reflect reality. - -**Trade-off:** Two interfaces instead of one. We considered a single interface with optional `store`/`delete`, but that pushes runtime errors ("this store doesn't support writes") into what should be compile-time guarantees. - -### 3.5 Active recall by default, passive injection opt-in - -**Decision:** By default, MemoryManager registers `search_memory` and `store_memory` tools (active recall). Injection into the system prompt (passive recall) is disabled by default and opt-in via `injection: true`. - -**Why:** Active recall lets the agent decide when memory is relevant — it searches when it needs context, not every turn. This is cheaper (no retrieval cost on irrelevant turns) and gives the agent control. Passive injection is useful for always-personalized agents (support bots, assistants) but adds retrieval latency and token cost to every model call. - -**Trade-off:** Tool-based recall depends on the model knowing when to search. If the model doesn't call `search_memory` when it should, relevant context is missed. Injection guarantees baseline context at the cost of always paying for retrieval. 
- -Only tools that have backing stores are registered. If no store has `'tool'` in its trigger, `store_memory` is not registered — `tools: true` means "register whatever tools are applicable," not "register all tools unconditionally." - -### 3.5.1 Injection lifecycle - -When injection is enabled, MemoryManager hooks into `BeforeInvocationEvent` (once per turn, not per model call) to manage injected content. The lifecycle is **strip → retrieve → format → inject**, executed fresh every turn. - -**How it works:** - -1. **Strip** — Remove the previous injection block from the system prompt (if present). Injected content is wrapped in a `` sentinel so it can be identified and removed cleanly without colliding with user content. -2. **Retrieve** — Determine the search query (default: last substantive user message, walking backward, skipping messages under 10 characters). If no substantive message is found, skip injection for this turn entirely. -3. **Format** — Render results into an XML block, respecting the `maxTokens` budget. If retrieval returns zero results, skip injection (no empty block). -4. **Inject** — Append the formatted block to the end of the system prompt. - -``` -Turn N: - BeforeInvocationEvent fires (once per invocation) - → strip ... from system prompt (from turn N-1) - → find last substantive user message as query (skip if none found) - → search stores, interleaving results by store priority + rank - → if results empty, done (no block injected) - → format results as XML, respecting maxTokens budget - → append ... to end of system prompt - → model sees fresh, relevant memories -``` +--- -**Why strip and re-inject every turn:** -- Memories are query-dependent. The relevant memories for turn 5 may differ from turn 4. -- Prevents accumulation — injected content never grows unbounded across turns. -- Keeps the injected block ephemeral — it's never persisted in conversation history, only present in the model's view for that single call. 
+### Knowledge Retrieval -**Default format (XML):** +The agent needs stored knowledge at the right moment, but retrieving it has a cost (latency, tokens, relevance noise). MemoryManager provides two retrieval mechanisms that offer different trade-offs between precision and reliability. Both can be used together. -```xml - -- user prefers dark mode -- last project was a React app for inventory management -- user is senior engineer, 10 years experience - -``` +#### Active Recall -The `` sentinel is namespaced to avoid collisions with user-authored system prompt content. It provides clear boundary markers for the model and enables reliable stripping. +Active recall lets the agent decide when memory is relevant. Instead of retrieving knowledge every turn, the agent searches on demand, only when it judges that stored knowledge would help. -**Query selection** is configurable via `injection.query`: +This works by registering a `search_memory` tool that the agent can call like any other tool. The trade-off: active recall depends on the model recognizing when to search. If the model doesn't think to look, relevant memories stay hidden. ```typescript -injection: { query: (messages) => customQueryLogic(messages) } +const agent = new Agent({ + model, + memoryManager: new MemoryManager({ + stores: [{ store, namespace: 'user-123' }], + tools: true, // Agent gets search_memory tool. + }), +}) ``` -The default walks backward through messages to find the last `role: 'user'` message with > 10 characters. If nothing qualifies, injection is skipped for that turn (no retrieval, no block). - -### 3.6 All writes are async and non-blocking - -**Decision:** Store writes (ingestion) never block the agent loop. Writes are queued internally and flushed in the background. - -**Why:** Memory ingestion is not on the critical path of a conversation. The user is waiting for the agent's response, not for a fact to be persisted. 
Bedrock Knowledge Bases reinforce this — `IngestKnowledgeBaseDocuments` returns HTTP 202 (accepted, not completed). Blocking on writes would add latency to every turn for no user-visible benefit. - -**Trade-off:** A fact stored in one turn may not be searchable in the immediately following turn (eventual consistency). For most use cases this is acceptable — the information is still in L0 during the current session. Cross-session, it will have been indexed by the next conversation. - -### 3.7 Extraction is per-store and optional - -**Decision:** Each store has its own extractor (or none). The extractor transforms conversation messages into knowledge entries before ingestion. Extractor is optional on **all** triggers, not just `'tool'`. - -**Why:** Different stores may need different extraction strategies. A personal preferences store wants terse facts ("user prefers dark mode"). A decisions store wants richer context ("chose React over Vue because of team expertise, 2026-05-14"). A store with `'tool'` trigger needs no extractor — the agent provides content directly. - -**When no extractor is configured:** Messages are serialized to text (role-prefixed lines) and passed to `store()` as the content string. This supports managed service backends (e.g., AgentCore) that handle extraction server-side. - -**`strands_source` metadata tag:** Every write is tagged to indicate content type: -- `'tool'` — agent explicitly called `store_memory` -- `'extraction'` — extractor processed messages into facts -- `'raw'` — messages serialized directly (no extractor) - -**Trade-off:** Multiple extractors mean multiple model calls if several stores have `perTurn` triggers with extractors. We mitigate by making `'tool'` the cheapest trigger (no extraction cost) and `'scheduled'` a cost-controlled alternative to `'perTurn'`. - -**Extractor input:** Extractors always receive only **unprocessed messages** — messages since the store's last extraction, not the full history. 
This is tracked via a per-store high-water mark (message index or turn ID). - -### 3.8 Message filter on ingestion - -**Decision:** `IngestionConfig` has a `filter: { exclude: ContentBlockType[] }` that strips specific content block types from messages before they reach the extractor or serializer. +When multiple stores are configured, results are interleaved by rank using store config order as the priority signal. Scores are not compared across stores because different backends produce incomparable scales. If a store fails, partial results from other stores are still returned. -**Why:** Tool machinery (toolUse, toolResult blocks) is typically noise for memory extraction — it's the conversation substance that matters, not the function calls. Filtering at the block level keeps the ingestion pipeline focused on semantically meaningful content. +#### Context Injection -**Default:** `exclude: ['toolUse', 'toolResult']` — strip tool blocks, keep text and other content. Override with `filter: { exclude: [] }` to forward everything. +Context injection guarantees that retrieved knowledge is present on every turn, at the cost of paying for retrieval every turn. This is useful when baseline context is more important than token efficiency, or when the model can't reliably judge when to search. -**Lifecycle:** Filter always applies first — before extractor (if present) OR before serialization (if no extractor). Messages that become empty after filtering (all blocks excluded) are dropped entirely. +It is enabled via `injection: true`. Each turn, MemoryManager searches stores using the last substantive user message as the query, formats results into a sentinel-tagged XML block, and appends it to the system prompt. The block is stripped and re-injected every turn, so memories never accumulate in the prompt and each turn gets a fresh set based on the current query. The token budget, formatting, and query selection are configurable via the `maxTokens`, `format`, and `query` options.
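The strip-and-re-inject cycle can be sketched as two small string functions. This is an illustration under stated assumptions, not the SDK's implementation: the `<memory_context>` tag and the `stripInjected` / `reinject` names are hypothetical, and retrieval is replaced by a plain array of facts.

```typescript
// Hypothetical sentinel tag; the design's actual tag name is not assumed here.
const OPEN = '<memory_context>'
const CLOSE = '</memory_context>'

// Remove a previously injected block, if any, from the system prompt.
function stripInjected(prompt: string): string {
  const start = prompt.indexOf(OPEN)
  if (start === -1) return prompt
  const end = prompt.indexOf(CLOSE, start)
  if (end === -1) return prompt // malformed block: leave the prompt untouched
  return (prompt.slice(0, start) + prompt.slice(end + CLOSE.length)).trimEnd()
}

// Strip the old block, then append a fresh one built from this turn's facts.
// When there are no facts, no block is injected at all.
function reinject(prompt: string, facts: string[]): string {
  const base = stripInjected(prompt)
  if (facts.length === 0) return base
  const block = [OPEN, ...facts.map((f) => `- ${f}`), CLOSE].join('\n')
  return `${base}\n\n${block}`
}

// Turn 1 injects one fact; turn 2 replaces it instead of accumulating.
const turn1 = reinject('You are a helpful assistant.', ['user prefers dark mode'])
const turn2 = reinject(turn1, ['last project was a React app'])
console.log(turn2)
```

Note that this sketch trims trailing whitespace from the base prompt when it strips a block; a real implementation would likely want to preserve the author's prompt byte for byte.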
-### 3.9 Deduplication via high-water mark - -**Decision:** MemoryManager tracks a per-store high-water mark of processed messages. When any trigger fires, only messages beyond the mark are passed to the extractor. This prevents duplicate extraction when triggers overlap (e.g., `['perTurn', 'onEviction']` on the same store). - -**Why:** Without deduplication, a store with both `perTurn` and `onEviction` triggers would extract the same messages twice — once per turn, and again when those messages are evicted. The high-water mark ensures each message is extracted at most once per store. - -**Fail-safe direction:** If tracking state is lost (process restart, corruption), the worst case is a duplicate extraction — not data loss. Duplicates are acceptable; missed facts are not. - -### 3.10 ModelExtractor defaults to cheapest model in same provider family - -**Decision:** `ModelExtractor` detects the agent's model provider at init time and defaults to the cheapest model in that family. User can override with an explicit model. - -**Why:** Extraction is a structured task — "given these messages, output a list of facts." It doesn't need the same capability as the agent's primary model. Defaulting to the agent's model would burn expensive tokens (e.g., Opus) on a task Haiku handles fine. Matching the provider avoids credential mismatches. 
- -**Provider defaults:** -- `BedrockModel` → Claude Haiku via Bedrock -- `AnthropicModel` → Claude Haiku via Anthropic API -- `OpenAIModel` → gpt-4o-mini via OpenAI -- `GoogleModel` → Gemini Flash via Google - -**Override:** ```typescript -extractor: new ModelExtractor({ model: new BedrockModel({ modelId: 'us.anthropic.claude-sonnet-4-6-20250514-v1:0' }) }) +const agent = new Agent({ + model, + memoryManager: new MemoryManager({ + stores: [{ store, namespace: 'user-123' }], + injection: true, // searches every turn, injects into system prompt + }), +}) ``` -### 3.11 Delete is programmatic only — no agent tool - -**Decision:** `MutableKnowledgeStore.delete()` is exposed for developers (compliance, cleanup, admin scripts) but no `delete_memory` tool is registered for the agent. - -**Why:** Deletion is dangerous for an agent to perform autonomously. An agent that aggressively deletes old facts could lose valuable context. Corrections are better handled by storing updated facts — the newer entry naturally takes precedence via recency in search results. - -### 3.12 Partial results on store failure - -**Decision:** If a store's search fails, return partial results from stores that succeeded. Log the failure for observability. The model doesn't see the error. - -**Why:** The model can't fix infrastructure failures — surfacing them adds noise. Partial results are better than no results. If all stores fail, the tool returns an empty array (same as "no results found"). - -### 3.13 `memoryManager` is a top-level Agent parameter only - -**Decision:** `memoryManager` is a named parameter on `AgentConfig`. Passing MemoryManager via `plugins: [...]` throws with a helpful error. +--- -**Why:** One clear path. `memoryManager` follows the pattern of `conversationManager` and `contextManager` — named constructor parameters for first-class agent capabilities. It's implemented as a plugin internally, but that's an implementation detail. 
Two paths to register the same thing creates ambiguity. +### Knowledge Ingestion -### 3.14 Custom triggers via direct store access +Knowledge can enter the system in two ways: the agent explicitly writes it (via a `store_memory` tool registered by MemoryManager), or MemoryManager automatically extracts it from conversation messages using an extractor (a component that distills messages into discrete facts via a model call). -**Decision:** No custom trigger mechanism on MemoryManager. Users who need custom triggers hook directly into the store's `store()` method from their own hooks. +MemoryManager uses triggers to control when these writes happen. A trigger is a named event that causes MemoryManager to process recent messages and write to the store. Four built-in triggers cover most use cases: -**Why:** The named triggers (`tool`, `perTurn`, `onEviction`, `scheduled`) cover 80% of use cases. For anything custom, the store interface is already public — the user holds a reference and can call `store.store()` from any hook. No new API surface needed. This is the low-level escape hatch alongside the high-level named triggers. +| Trigger | When | Cost | +|---------|---------------|---------------| +| `tool` | Agent calls `store_memory` | Nothing extra (agent provides content directly) | +| `perTurn` | After every agent invocation | High (model call per turn if an extractor is configured) | +| `onEviction` | Messages are evicted from the context window | Medium (only fires on eviction events) | +| `scheduled` | Every N turns (configurable via `interval`) | Controllable | -```typescript -const myStore = new BedrockKnowledgeBaseStore({ ... }) - -agent.addHook(AfterToolCallEvent, async (event) => { - if (event.tool.name === 'important_api') { - await myStore.store('user-123', `API result: ${summarize(event.result)}`) - } -}) -``` +All writes are async and non-blocking. This means a fact stored in one turn may not be searchable immediately in the next (eventual consistency). 
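To make the non-blocking write behavior concrete, here is a minimal sketch of a background write queue. `WriteQueue`, `WriteFn`, and `slowWrite` are hypothetical names used only for illustration; the design states that writes are queued and flushed in the background but does not specify this shape.

```typescript
// Hypothetical signature for a store write; illustrative only.
type WriteFn = (namespace: string, content: string) => Promise<void>

// enqueue() returns immediately; the write completes in the background.
// flush() awaits everything still in flight (for example, before shutdown).
class WriteQueue {
  private readonly pending = new Set<Promise<void>>()
  private readonly write: WriteFn

  constructor(write: WriteFn) {
    this.write = write
  }

  get size(): number {
    return this.pending.size
  }

  enqueue(namespace: string, content: string): void {
    const task: Promise<void> = this.write(namespace, content)
      .catch((err: unknown) => {
        // Failures are logged and never surface in the agent loop.
        console.error('memory write failed:', err)
      })
      .finally(() => {
        this.pending.delete(task)
      })
    this.pending.add(task)
  }

  async flush(): Promise<void> {
    await Promise.all(Array.from(this.pending))
  }
}

// A stand-in backend whose write completes asynchronously.
const stored: string[] = []
const slowWrite: WriteFn = async (_namespace, content) => {
  await Promise.resolve()
  stored.push(content)
}

const queue = new WriteQueue(slowWrite)
queue.enqueue('user-123', 'user prefers dark mode')
queue.enqueue('user-123', 'user is a senior engineer')

// Nothing has been persisted yet: this is the eventual-consistency window.
const storedImmediately = stored.length
void queue.flush().then(() => console.log('persisted:', stored))
```

The gap between `enqueue` returning and the write landing is the same window in which a fact stored in one turn may not yet be searchable in the next.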
-### 3.15 Namespace is a plain string +**Deduplication.** MemoryManager tracks a per-store high-water mark: a pointer to the last message that was already processed. Each trigger only processes messages beyond that mark, preventing duplicate writes. Tool-related content blocks (`toolUse`, `toolResult`) are filtered out by default before processing, since they rarely contain user-relevant knowledge. -**Decision:** Namespace is an opaque string, not a structured type (no `{ userId, teamId, orgId }` hierarchy). +**Custom triggers.** For cases the built-in triggers don't cover, the store interface is public and can be called directly from any lifecycle hook. -**Why:** Multi-tenancy patterns vary wildly. Some apps have users, some have workspaces, some have hierarchies. A structured namespace type would either be too narrow (missing someone's pattern) or too broad (everything optional, no guidance). A plain string with documented conventions (`"user-{id}"`, `"team-{id}"`) is maximally flexible. +**Deletion and corrections.** `MutableKnowledgeStore.delete()` is exposed for programmatic use (compliance, cleanup), but no `delete_memory` tool is registered for the agent. Exposing deletion to the agent risks accidental data loss with no undo path. Instead, corrections are handled by storing updated facts. Newer entries take precedence via recency weighting in search results. -**Trade-off:** No type safety on namespace values. Typos won't be caught at compile time. This is the same trade-off as route strings in web frameworks — acceptable given the flexibility benefit. +--- -### 3.16 `search_memory` tool accepts optional `limit` parameter +### Fact Extraction -**Decision:** The `search_memory` tool input schema includes an optional `limit` parameter so the model can control how many results it receives. +When messages are ingested, they need to become searchable knowledge entries. Some managed backends (e.g. 
AgentCore Memory) handle this transformation server-side, accepting raw messages and producing structured entries internally. For self-managed backends that store only what they're given, MemoryManager can extract discrete facts from conversation messages before writing them. -```typescript -// search_memory tool input -{ query: string, limit?: number } // default: 10 -``` +Extraction is optional. It is only needed when two conditions are true: the backend doesn't handle extraction server-side, and you want MemoryManager to distill conversations into facts automatically rather than relying on the agent to provide facts via the `tool` trigger. -**Why:** Per-store limits cap each store's contribution. The tool-level limit lets the model request fewer results when it only needs a quick check. No global limit on MemoryManager — limits belong on the query, not the orchestrator. +Each store can have its own extractor, because different stores benefit from different extraction styles. A preferences store might want pure facts ("user prefers dark mode") while a decisions store wants richer context with reasoning. `ModelExtractor` is the built-in implementation: it calls a language model to extract facts, defaulting to the agent's own model but configurable with an explicit cheaper model to reduce cost. ---- +When no extractor is configured, messages are serialized as plain text and passed directly to the store's `add()` method. This is the correct setup for managed backends that handle extraction internally. -## 4. Developer Experience +## Developer Experience -### Minimal — prototyping +### Minimal: prototyping ```typescript import { Agent, MemoryManager, InMemoryKnowledgeStore } from '@strands-agents/sdk' @@ -279,7 +153,7 @@ const agent = new Agent({ // Agent now has search_memory and store_memory tools. Zero infrastructure. 
``` -### Production — Bedrock Knowledge Bases +### Production: Bedrock Knowledge Bases ```typescript import { Agent, MemoryManager, BedrockKnowledgeBaseStore, ModelExtractor } from '@strands-agents/sdk' @@ -296,97 +170,78 @@ const agent = new Agent({ }) ``` -### Multi-tenant — personal + team + org +### Multi-tenant: personal + team + org ```typescript const agent = new Agent({ model, memoryManager: new MemoryManager({ stores: [ - // Personal — learns from conversation + // Personal: learns from conversation { store: userKB, namespace: 'user-123', ingestion: { trigger: ['tool', 'perTurn'], extractor } }, - // Team — read-only, pre-populated + // Team: read-only, pre-populated { store: teamKB, namespace: 'team-marketing' }, - // Org — read-only, shared + // Org: read-only, shared { store: orgKB, namespace: 'org-acme' }, ], }), }) -// search_memory queries all three, merges by relevance score +// search_memory queries all three, merges by store priority // store_memory writes only to stores with 'tool' trigger ``` -### With passive injection +### With context injection ```typescript const agent = new Agent({ model, memoryManager: new MemoryManager({ stores: [{ store, namespace: 'user-123', ingestion: { trigger: 'tool' } }], - injection: true, // enables injection with default XML format and 2000 token budget + injection: true, // default XML format, 2000 token budget }), }) -// Memories auto-injected into system prompt each turn + tools for on-demand search - -// With custom budget: -injection: { maxTokens: 4000 } - -// Power user — custom format function (escape hatch): -injection: { format: (entries) => myCustomFormat(entries), maxTokens: 2000 } ``` ---- +## Alternatives Considered -## 5. Relationship to Context Management +### 1. Memory as a top-level Agent parameter (no MemoryManager class) ```typescript -new Agent({ - contextManager: "auto", // Owns L0 ↔ L1 (within-session) - memoryManager: new MemoryManager({ ... 
}), // Owns L2 (cross-session) -}) +new Agent({ memory: { stores: [...], injection: {...} } }) ``` -The two primitives are independent — each is configured separately, each can exist without the other. The integration point is **eviction**: when `contextManager` evicts messages from L0 to L1, that's an opportunity for `memoryManager` to extract knowledge before it becomes harder to access. +**Why rejected:** A config object doesn't provide methods (`search`, `add`, `flush`) that power users need for programmatic access. The class also owns the ingestion queue lifecycle. -### Integration via `OnEvictionEvent` +### 2. Single store (no multi-store orchestration) -`contextManager` introduces an `OnEvictionEvent` hook that fires when messages are evicted from L0 to L1. MemoryManager subscribes to this event and routes the evicted messages to stores with `'onEviction'` in their trigger array. +**Why rejected:** Forces multi-tenant patterns onto the developer. Multi-store is a customer ask for production agents. -```typescript -// contextManager fires: -hookRegistry.emit(new OnEvictionEvent({ messages: evictedMessages })) +### 3. Single `KnowledgeStore` interface with optional write methods -// MemoryManager subscribes in initAgent(): -hookRegistry.on(OnEvictionEvent, (event) => { - this.onEviction(event.messages) -}) -``` - -This keeps the two primitives decoupled — `contextManager` doesn't know about memory, it just announces eviction. MemoryManager listens if configured. If no `memoryManager` is present, the event fires with no subscribers and no cost. - ---- +**Why rejected:** Optional methods push type safety to runtime. The interface split makes read-only vs mutable an explicit compile-time guarantee. -## 6. Anticipated Questions -**Q: Should `search_memory` also search L1 (transcript)?** +## Consequences -No. L1 is within-session history owned by `contextManager`. 
If the agent needs to browse evicted messages from the current session, that's `contextManager`'s `searchHistory` tool (agentic mode). `search_memory` is cross-session knowledge — facts extracted and persisted beyond any single conversation. Mixing the two would blur the L1/L2 boundary. +### What Becomes Easier -**Q: Is a model call every turn (`perTurn` extraction) acceptable?** +- Cross-session knowledge becomes a single parameter. No custom persistence, no manual tool registration, no vector store wiring. +- Multi-tenancy is built in via multi-store + namespacing. +- Progressive complexity: `InMemoryKnowledgeStore` (prototyping) to `BedrockKnowledgeBaseStore` (production) is changing one import. -It's the most complete strategy but the most expensive. The default for shipped examples should be `'tool'` (zero extraction cost — agent decides what to remember). `'perTurn'` is for high-value use cases where missing a fact is worse than the cost. `'scheduled'` (every N turns) is the middle ground. +### What Becomes Harder or Requires Attention -**Q: What about Bedrock KB ingestion latency?** +- **Eventual consistency**: writes are async; a fact may not be searchable in the next turn. +- **Extraction cost**: `perTurn` triggers a model call every turn. We need sensible defaults and good documentation for users to navigate this. +- **Active recall and context injection depend on model judgment**: the model must know when to search. Context injection guarantees baseline context at the cost of always paying for retrieval. We need to evaluate and baseline tool descriptions. -Bedrock KB ingestion is async — content may not be immediately searchable after `store_memory`. This is acceptable because: (1) the stored fact is still in L0 for the current session, and (2) cross-session recall (where the latency matters) has a natural gap between conversations. We should document this clearly.
+### Migration -**Q: Why ship BedrockKnowledgeBaseStore in the SDK rather than as a separate package?** +No breaking changes. `memoryManager` is a new optional parameter on `AgentConfig`. -It's the production-grade managed option that requires zero infrastructure setup from the user (no self-hosted vector DB). Aligns with "simple at any scale" — the path from `InMemoryKnowledgeStore` (prototyping) to `BedrockKnowledgeBaseStore` (production) should be changing one import, not adding a package. +## Willingness to Implement -**Q: How does AgentCore Memory fit?** - -AgentCore ships as a separate package (`@strands-agents/memory-agentcore`) that implements `MutableKnowledgeStore`. It plugs into MemoryManager like any other store. No special integration needed — MemoryManager's injection, tools, and ingestion pipeline work with any store that implements the interface. +Yes. --- @@ -407,14 +262,14 @@ interface KnowledgeEntry { ### Store Interfaces ```typescript -// Read-only — for managed or pre-populated backends +// Read-only: for managed or pre-populated backends interface KnowledgeStore { search(namespace: string, query: string, limit?: number): Promise } -// Read+write — for self-managed backends +// Read+write: for self-managed backends interface MutableKnowledgeStore extends KnowledgeStore { - store(namespace: string, content: string, metadata?: Record): Promise + add(namespace: string, content: string, metadata?: Record): Promise delete(namespace: string, id: string): Promise } ``` @@ -432,7 +287,7 @@ interface MessageFilter { interface IngestionConfig { trigger: IngestionTrigger | IngestionTrigger[] - extractor?: Extractor // optional — if omitted, messages serialized as text + extractor?: Extractor // optional: if omitted, messages serialized as text interval?: number // for 'scheduled': every N turns filter?: MessageFilter // default: { exclude: ['toolUse', 'toolResult'] } } @@ -447,8 +302,8 @@ interface Extractor { ```typescript interface MemoryManagerConfig { 
stores: StoreConfig[] - tools?: boolean | ToolsConfig // default: true — registers search_memory, store_memory - injection?: boolean | InjectionConfig // default: false — opt-in passive recall + tools?: boolean | ToolsConfig // default: true (registers search_memory, store_memory) + injection?: boolean | InjectionConfig // default: false (opt-in passive recall) } interface ToolConfig { @@ -473,7 +328,7 @@ interface WritableStoreConfig { store: MutableKnowledgeStore namespace: string limit?: number // max results from this store (default: 10) - ingestion: IngestionConfig // required — determines when writes happen + ingestion: IngestionConfig // required: determines when writes happen } interface InjectionConfig { @@ -484,133 +339,63 @@ interface InjectionConfig { ``` **`tools` resolution:** -- `true` (default) — registers both `search_memory` and `store_memory` -- `false` — no tools registered (use with injection-only) -- `{ search: true, store: false }` — search only, no agent-driven writes -- `{ search: { name: 'recall' } }` — rename tool to avoid conflicts -- `{ store: { description: '...' } }` — customize tool description - -**Default tool descriptions:** -- `search_memory`: "Search long-term memory for facts, preferences, or context from previous conversations. Use when you need background about the user or topic that may have been discussed before." -- `store_memory`: "Store facts, preferences, or decisions that should be remembered across conversations. Use when the user shares something worth recalling later." +- `true` (default): registers both `search_memory` and `store_memory` +- `false`: no tools registered (use with injection-only) +- `{ search: true, store: false }`: search only, no agent-driven writes +- `{ search: { name: 'recall' } }`: rename tool to avoid conflicts +- `{ store: { description: '...' 
} }`: customize tool description **`injection` resolution:** -- `false` (default) — no injection -- `true` — injection enabled with default XML format and 2000 token budget -- `{ maxTokens: 4000 }` — injection with custom budget -- `{ format: customFn }` — injection with custom format (escape hatch) - -### Ingestion Triggers - -| Trigger | When | Input | Cost | -|---------|------|-------|------| -| `tool` | Agent calls `store_memory` | Agent-provided content | None (no extraction) | -| `perTurn` | After every invocation | New messages from this turn | High (model call per turn) | -| `onEviction` | Messages evicted from L0 | Evicted messages | Medium (only on eviction) | -| `scheduled` | Every N turns | Unprocessed messages | Controllable | - - - ---- - -
-Appendix B: Storage Backends - -### Shipped in SDK - -| Backend | Use case | Dependencies | -|---------|----------|-------------| -| `InMemoryKnowledgeStore` | Testing, prototyping | None | -| `FileKnowledgeStore` | Local development | None (node:fs) | -| `BedrockKnowledgeBaseStore` | Production | `@aws-sdk/client-bedrock-agent-runtime`, `@aws-sdk/client-bedrock-agent` | - -### Why Bedrock Knowledge Bases for the built-in production path - -- **Managed embeddings** — zero config, no model selection needed -- **Async writes** — `IngestKnowledgeBaseDocuments` returns HTTP 202, aligns with non-blocking design -- **Search** — Retrieve API with metadata filtering, reranking, guardrails -- **Delete** — `DeleteKnowledgeBaseDocuments` API for compliance/cleanup - -### External packages (community / future) - -| Package | Backend | -|---------|---------| -| `@strands-agents/memory-agentcore` | AgentCore Memory | -| `@strands-agents/memory-pinecone` | Pinecone | -| `@strands-agents/memory-postgres` | pgvector | -| `@strands-agents/memory-opensearch` | OpenSearch (BM25 + vector) | +- `false` (default): no injection +- `true`: injection enabled with default XML format and 2000 token budget +- `{ maxTokens: 4000 }`: injection with custom budget +- `{ format: customFn }`: injection with custom format (escape hatch)
---
-Appendix C: Alternatives Considered +Appendix B: Implementation Details -### Memory as a top-level Agent parameter (no MemoryManager class) +### Context injection lifecycle -```typescript -new Agent({ - memory: { stores: [...], injection: {...} }, -}) -``` +When injection is enabled, MemoryManager hooks into `BeforeInvocationEvent` (once per turn). The lifecycle: -**Why rejected:** A config object doesn't provide methods (`search`, `store`, `flush`) that power users need for programmatic access. The class also owns the ingestion queue lifecycle. +1. **Strip**: Remove previous `` block from system prompt +2. **Retrieve**: Use last substantive user message (>10 chars) as search query +3. **Format**: Render results as XML, respecting `maxTokens` budget +4. **Inject**: Append block to end of system prompt -### Single store (no multi-store orchestration) +If no substantive message exists or retrieval returns zero results, injection is skipped for that turn. The block is never persisted in conversation history. -```typescript -new Agent({ - memoryManager: new MemoryManager({ store: myStore, namespace: 'user-123' }), -}) -``` +### `store_memory` tool behavior -**Why rejected:** Forces multi-tenant patterns onto the developer. A support agent that needs personal + team + org knowledge would need three separate MemoryManagers or a custom wrapper. Multi-store is the 80% case for production agents. +Accepts `{ entries: string[] }` (batch). Writes fan out to all stores with `'tool'` in their trigger array. Writes are async and non-blocking. -### Explicit `readOnly` flag instead of ingestion-determines-writability +### Message filter details -```typescript -{ store: teamKB, namespace: 'team-marketing', readOnly: true } -``` +`filter: { exclude: ContentBlockType[] }` strips content block types before they reach the extractor or serializer. Filter applies first. Messages that become empty after filtering are dropped entirely. Default: `exclude: ['toolUse', 'toolResult']`. 
-**Why rejected:** Redundant signal. If there's no ingestion config, there's nothing to trigger writes. Two signals for one decision creates confusion about which wins when they conflict. +Extractors receive only unprocessed messages (tracked via per-store high-water mark). -### Single `KnowledgeStore` interface with optional write methods +### `strands_source` metadata -```typescript -interface KnowledgeStore { - search(...): Promise - store?(...): Promise - delete?(...): Promise -} -``` - -**Why rejected:** Optional methods push type safety to runtime. A developer implementing a read-only store for a managed KB would need to either leave methods unimplemented (confusing) or implement them as no-ops/throws (surprising). The interface split makes the contract explicit. +Every write is tagged to indicate content type: +- `'tool'`: agent explicitly called `store_memory` +- `'extraction'`: extractor processed messages into facts +- `'raw'`: messages serialized directly (no extractor) -### Injection enabled by default - -**Why rejected:** Adds retrieval latency and token cost to every model call. Most conversations don't need memory context on every turn. Active recall (tools) is cheaper and gives the agent control. Users who want always-on personalization opt in. - -### Injection `format` as a required function (no default format) +### Custom triggers ```typescript -injection: { - format: (entries) => `\n${entries.map(e => `- ${e.content}`).join('\n')}\n`, - maxTokens: 2000, -} -``` - -**Why rejected:** Forces every user to write a formatting function even though 90% want the same XML-tagged block. The simple path (`injection: true`) should work without ceremony. Power users who need custom rendering can provide a `format` function as an escape hatch. - -### Global extractor instead of per-store +const myStore = new BedrockKnowledgeBaseStore({ ... 
}) -```typescript -new MemoryManager({ - extractor: new ModelExtractor(), // applies to all stores - stores: [...], +agent.addHook(AfterToolCallEvent, async (event) => { + if (event.tool.name === 'important_api') { + await myStore.add('user-123', `API result: ${summarize(event.result)}`) + } }) ``` -**Why rejected:** Different stores often want different extraction granularity. A preferences store wants terse facts; a decisions store wants richer context. Per-store extractors allow this without a routing layer. -
From 21b937ae162c19e9c25e6e406baaf926eb4a93b4 Mon Sep 17 00:00:00 2001 From: opieter-aws Date: Thu, 21 May 2026 15:43:22 -0400 Subject: [PATCH 3/3] Design updates --- designs/0011-knowledge-bases.md | 113 +++++++++++++++++--------------- 1 file changed, 59 insertions(+), 54 deletions(-) diff --git a/designs/0011-knowledge-bases.md b/designs/0011-knowledge-bases.md index ccde7b1cf..83b368fb2 100644 --- a/designs/0011-knowledge-bases.md +++ b/designs/0011-knowledge-bases.md @@ -1,6 +1,6 @@ # Long-Term Memory -**Status**: Proposed +**Status**: Implemented **Date**: 2026-05-14 @@ -24,7 +24,7 @@ This design proposes a `MemoryManager` primitive that owns long-term knowledge: `MemoryManager` is the component that gives agents persistent knowledge across sessions. It handles storing facts, recalling them when relevant, and optionally extracting them from conversations. -It is exposed as a top-level `memoryManager` parameter on `AgentConfig`, following the pattern of `contextManager` and `sessionManager`: +It is exposed as a top-level `memoryManager` parameter on `AgentConfig`, following the pattern of `contextManager` and `sessionManager`. It accepts either a `MemoryManager` instance or a plain `MemoryManagerConfig` object (auto-wrapped): ```typescript new Agent({ @@ -35,21 +35,29 @@ new Agent({ Under the hood, MemoryManager integrates with the agent lifecycle via hooks: registering tools at initialization, injecting knowledge before model calls, and ingesting new facts after each turn. -**Stores.** A store is a backend that holds and retrieves knowledge (a vector database, a managed service like Amazon Bedrock Knowledge Bases or AgentCore Memory, or any implementation of the store interface). 
MemoryManager orchestrates one or more stores, each scoped by a namespace: +**Stores.** A store is a backend that holds and retrieves knowledge (a vector database, a managed service like Amazon Bedrock Knowledge Bases or AgentCore Memory, or any implementation of the store interface). MemoryManager orchestrates one or more stores: ```typescript memoryManager: new MemoryManager({ stores: [ - { store: userStore, namespace: 'user-123' }, - { store: teamStore, namespace: 'team-marketing' }, - { store: orgStore, namespace: 'org-acme' }, + { store: userStore, ingestion: { trigger: 'tool' } }, + { store: teamStore }, // search-only + { store: orgStore }, // search-only ], }) ``` -Multi-store support avoids pushing multi-tenancy complexity onto the developer. A single agent can query personal, team, and organization knowledge simultaneously, with namespace isolation keeping them separate. +Multi-store support avoids pushing multi-tenancy complexity onto the developer. A single agent can query personal, team, and organization knowledge simultaneously. Scoping (namespace, tenant isolation) is handled by each store's own constructor config — e.g., `BedrockKnowledgeBaseStore({ scope: 'user-123' })` or `AgentCoreMemoryStore({ namespace: 'facts/user-123' })`. -**Read-only vs. mutable stores.** Two interfaces: `KnowledgeStore` (search-only) and `MutableKnowledgeStore` (search + write + delete). This distinction makes multi-tenant patterns natural: team or org stores that are pre-populated externally are read-only, while a user's personal store is mutable and accepts new facts during conversation. Mutability is determined by whether the store has an ingestion configuration (see Knowledge Ingestion below). +One `KnowledgeStore` interface with `search()` required and `add()` / `delete()` optional. Runtime helpers narrow the type when writes are needed. 
This makes multi-tenant patterns natural: team or org stores that are pre-populated externally simply don't implement `add()`, while a user's personal store does. Writability at the MemoryManager level is determined by whether the store has an ingestion configuration (see Knowledge Ingestion below). + +```typescript +interface KnowledgeStore { + search(query: string, options?: Record): Promise + add?(content: string, metadata?: Record): Promise + delete?(id: string): Promise +} +``` **Shipped backends.** @@ -78,8 +86,8 @@ This works by registering a `search_memory` tool that the agent can call like an const agent = new Agent({ model, memoryManager: new MemoryManager({ - stores: [{ store, namespace: 'user-123' }], - tools: true, // Agent gets search_memory tool. + stores: [{ store, ingestion: { trigger: 'tool' } }], + includeTools: true, // Agent gets search_memory tool. }), }) ``` @@ -96,7 +104,7 @@ Enabled via `injection: true`. Each turn, MemoryManager searches stores using th const agent = new Agent({ model, memoryManager: new MemoryManager({ - stores: [{ store, namespace: 'user-123' }], + stores: [{ store, ingestion: { trigger: 'tool' } }], injection: true, // searches every turn, injects into system prompt }), }) @@ -121,9 +129,9 @@ All writes are async and non-blocking. This means a fact stored in one turn may **Deduplication.** MemoryManager tracks a per-store high-water mark: a pointer to the last message that was already processed. Each trigger only processes messages beyond that mark, preventing duplicate writes. Tool-related content blocks (`toolUse`, `toolResult`) are filtered out by default before processing, since they rarely contain user-relevant knowledge. -**Custom triggers.** For cases the built-in triggers don't cover, the store interface is public and can be called directly from any lifecycle hook. 
+**Custom triggers.** For cases the built-in triggers don't cover, the store interface is public and `add()` can be called directly from any lifecycle hook. -**Deletion and corrections.** `MutableKnowledgeStore.delete()` is exposed for programmatic use (compliance, cleanup), but no `delete_memory` tool is registered for the agent. Exposing deletion to the agent risks accidental data loss with no undo path. Instead, corrections are handled by storing updated facts. Newer entries take precedence via recency weighting in search results. +**Deletion and corrections.** `KnowledgeStore.delete()` is an optional method available for programmatic use (compliance, cleanup), but no `delete_memory` tool is registered for the agent. Stores that don't support deletion simply don't implement it. Exposing deletion to the agent risks accidental data loss with no undo path. Instead, corrections are handled by storing updated facts. Newer entries take precedence via recency weighting in search results. --- @@ -147,7 +155,7 @@ import { Agent, MemoryManager, InMemoryKnowledgeStore } from '@strands-agents/sd const agent = new Agent({ model, memoryManager: new MemoryManager({ - stores: [{ store: new InMemoryKnowledgeStore(), namespace: 'user-123', ingestion: { trigger: 'tool' } }], + stores: [{ store: new InMemoryKnowledgeStore(), ingestion: { trigger: 'tool' } }], }), }) // Agent now has search_memory and store_memory tools. Zero infrastructure. 
@@ -162,8 +170,7 @@ const agent = new Agent({ model, memoryManager: new MemoryManager({ stores: [{ - store: new BedrockKnowledgeBaseStore({ knowledgeBaseId: 'KB123', dataSourceId: 'DS456' }), - namespace: 'user-123', + store: new BedrockKnowledgeBaseStore({ knowledgeBaseId: 'KB123', dataSourceId: 'DS456', scope: 'user-123' }), ingestion: { trigger: ['tool', 'perTurn'], extractor: new ModelExtractor({ model }) }, }], }), @@ -177,16 +184,16 @@ const agent = new Agent({ model, memoryManager: new MemoryManager({ stores: [ - // Personal: learns from conversation - { store: userKB, namespace: 'user-123', ingestion: { trigger: ['tool', 'perTurn'], extractor } }, - // Team: read-only, pre-populated - { store: teamKB, namespace: 'team-marketing' }, - // Org: read-only, shared - { store: orgKB, namespace: 'org-acme' }, + // Personal: learns from conversation (scope configured on store) + { store: userKB, ingestion: { trigger: ['tool', 'perTurn'], extractor } }, + // Team: search-only, pre-populated (no ingestion = no writes) + { store: teamKB }, + // Org: search-only, shared + { store: orgKB }, ], }), }) -// search_memory queries all three, merges by store priority +// search_memory queries all three, merges by rank position // store_memory writes only to stores with 'tool' trigger ``` @@ -196,7 +203,7 @@ const agent = new Agent({ const agent = new Agent({ model, memoryManager: new MemoryManager({ - stores: [{ store, namespace: 'user-123', ingestion: { trigger: 'tool' } }], + stores: [{ store, ingestion: { trigger: 'tool' } }], injection: true, // default XML format, 2000 token budget }), }) @@ -210,15 +217,15 @@ const agent = new Agent({ new Agent({ memory: { stores: [...], injection: {...} } }) ``` -**Why rejected:** A config object doesn't provide methods (`search`, `add`, `flush`) that power users need for programmatic access. The class also owns the ingestion queue lifecycle. 
+**Why rejected:** A config object doesn't provide methods (`search`, `store`, `flush`) that power users need for programmatic access. The class also owns the ingestion queue lifecycle. ### 2. Single store (no multi-store orchestration) **Why rejected:** Forces multi-tenant patterns onto the developer. Multi-store is a customer ask for production agents. -### 3. Single `KnowledgeStore` interface with optional write methods +### 3. Two-interface split (`KnowledgeStore` + `MutableKnowledgeStore`) -**Why rejected:** Optional methods push type safety to runtime. The interface split makes read-only vs mutable an explicit compile-time guarantee. +The split provides compile-time guarantees that are leaky in practice for custom extensions. AgentCore Memory is event-sourced. `delete()` operates on a different entity than what `add()` creates. HindSight doesn't support deletion at all, but it would be forced to implement `delete()` that throws. The two-interface approach creates false confidence while real integrations still hit runtime failures. A single interface with optional methods and runtime helpers is simpler and more honest. ## Consequences @@ -226,7 +233,7 @@ new Agent({ memory: { stores: [...], injection: {...} } }) ### What Becomes Easier - Cross-session knowledge becomes a single parameter. No custom persistence, no manual tool registration, no vector store wiring. -- Multi-tenancy is built in via multi-store + namespacing. +- Multi-tenancy is built in via multi-store (scoping handled per-store). - Progressive complexity: `InMemoryKnowledgeStore` (prototyping) to `BedrockKnowledgeBaseStore` (production) is changing one import. ### What Becomes Harder or Requires Attention @@ -254,26 +261,33 @@ Yes. interface KnowledgeEntry { id: string content: string - score?: number - metadata?: Record + metadata?: Record // score, provenance, etc.
live here } ``` -### Store Interfaces +### Store Interface ```typescript -// Read-only: for managed or pre-populated backends interface KnowledgeStore { - search(namespace: string, query: string, limit?: number): Promise + search(query: string, options?: Record): Promise + add?(content: string, metadata?: Record): Promise + delete?(id: string): Promise } -// Read+write: for self-managed backends -interface MutableKnowledgeStore extends KnowledgeStore { - add(namespace: string, content: string, metadata?: Record): Promise - delete(namespace: string, id: string): Promise -} +// Runtime type guards for narrowing +function hasAdd(store: KnowledgeStore): store is KnowledgeStore & { add(...): Promise } +function hasDelete(store: KnowledgeStore): store is KnowledgeStore & { delete(...): Promise } ``` +**`search()` options bag:** Each backend pulls what it understands from the options record: +- InMemory/File/BedrockKB read `options.limit` +- AgentCore reads `options.memoryStrategyId` +- HindSight reads `options.budget`, `options.tags` + +MemoryManager passes `{ limit: config.limit ?? 10 }` — stores use what's relevant, ignore the rest. + +**`score` is metadata:** All backends return results in relevance order. Score is informational metadata some backends provide (`metadata.score`), not a first-class field. Stores that don't produce scores just don't set it. MemoryManager trusts position for round-robin interleaving. 
+ ### Ingestion Pipeline ```typescript @@ -293,7 +307,7 @@ interface IngestionConfig { } interface Extractor { - extract(messages: MessageData[]): Promise<{ content: string; metadata?: Record }[]> + extract(messages: MessageData[]): Promise<{ content: string; metadata?: Record }[]> } ``` @@ -302,7 +316,7 @@ interface Extractor { ```typescript interface MemoryManagerConfig { stores: StoreConfig[] - tools?: boolean | ToolsConfig // default: true (registers search_memory, store_memory) + includeTools?: boolean | ToolsConfig // default: true (registers search_memory, store_memory) injection?: boolean | InjectionConfig // default: false (opt-in passive recall) } @@ -316,19 +330,10 @@ interface ToolsConfig { store?: boolean | ToolConfig } -type StoreConfig = ReadOnlyStoreConfig | WritableStoreConfig - -interface ReadOnlyStoreConfig { +interface StoreConfig { store: KnowledgeStore - namespace: string - limit?: number // max results from this store (default: 10) -} - -interface WritableStoreConfig { - store: MutableKnowledgeStore - namespace: string limit?: number // max results from this store (default: 10) - ingestion: IngestionConfig // required: determines when writes happen + ingestion?: IngestionConfig // if present, store is a write target } interface InjectionConfig { @@ -338,7 +343,7 @@ interface InjectionConfig { } ``` -**`tools` resolution:** +**`includeTools` resolution:** - `true` (default): registers both `search_memory` and `store_memory` - `false`: no tools registered (use with injection-only) - `{ search: true, store: false }`: search only, no agent-driven writes @@ -371,7 +376,7 @@ If no substantive message exists or retrieval returns zero results, injection is ### `store_memory` tool behavior -Accepts `{ entries: string[] }` (batch). Writes fan out to all stores with `'tool'` in their trigger array. Writes are async and non-blocking. +Accepts `{ entries: string[] }` (batch). Writes fan out to all stores with `'tool'` in their trigger array. 
Writes are async and non-blocking. Returns `{ stored: true }` — no IDs are surfaced to the agent. ### Message filter details @@ -393,7 +398,7 @@ const myStore = new BedrockKnowledgeBaseStore({ ... }) agent.addHook(AfterToolCallEvent, async (event) => { if (event.tool.name === 'important_api') { - await myStore.add('user-123', `API result: ${summarize(event.result)}`) + await myStore.add(`API result: ${summarize(event.result)}`) } }) ```
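Reviewer sketch (not part of the patch): the final revision makes `add()` and `delete()` optional on `KnowledgeStore` and declares `hasAdd` / `hasDelete` guards without bodies. One way those guards might look is shown below. The generic parameters (`Record<string, unknown>`, `Promise<KnowledgeEntry[]>`, `Promise<void>`) are assumptions made for illustration, since the design text only names the methods, and `addIfSupported` plus the two throwaway stores are hypothetical names introduced here.

```typescript
// Assumed shapes: the type arguments below are illustrative guesses,
// not taken from the design document.
interface KnowledgeEntry {
  id: string
  content: string
  metadata?: Record<string, unknown>
}

interface KnowledgeStore {
  search(query: string, options?: Record<string, unknown>): Promise<KnowledgeEntry[]>
  add?(content: string, metadata?: Record<string, unknown>): Promise<void>
  delete?(id: string): Promise<void>
}

type AddFn = NonNullable<KnowledgeStore['add']>
type DeleteFn = NonNullable<KnowledgeStore['delete']>

// Runtime guards: true only when the optional method is actually implemented.
function hasAdd(store: KnowledgeStore): store is KnowledgeStore & { add: AddFn } {
  return typeof store.add === 'function'
}

function hasDelete(store: KnowledgeStore): store is KnowledgeStore & { delete: DeleteFn } {
  return typeof store.delete === 'function'
}

// Hypothetical helper: write when the store supports it, otherwise report false.
async function addIfSupported(store: KnowledgeStore, content: string): Promise<boolean> {
  if (!hasAdd(store)) return false
  await store.add(content) // narrowed by the guard, so add is callable here
  return true
}

// Throwaway stores for illustration only.
const readOnlyStore: KnowledgeStore = {
  search: async () => [],
}

const entries: KnowledgeEntry[] = []
const writableStore: KnowledgeStore = {
  search: async (query) => entries.filter((e) => e.content.includes(query)),
  add: async (content) => {
    entries.push({ id: String(entries.length + 1), content })
  },
}
```

Under these assumptions, a `store_memory` fan-out could call a helper like `addIfSupported` for each store with `'tool'` in its trigger array and skip search-only stores instead of throwing. Whether the shipped implementation does this is not stated in the patch.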