fix(memory): index memories into BM25 + backfill on startup (closes #257)#258
fix(memory): index memories into BM25 + backfill on startup (closes #257)#258
Conversation
) memory_save returned 200 with a valid memory ID, but every subsequent memory_smart_search and memory_recall returned empty results. Direct REST /agentmemory/smart-search showed the same — confirming the bug was not in the MCP layer. Root cause: mem::remember in src/functions/remember.ts wrote the new Memory to KV.memories but never called getSearchIndex().add(). The same pattern that #228 fixed for vectors-on-observe was missing here for BM25-on-remember. Compounded by rebuildIndex() walking only sessions/ observations and skipping KV.memories, so even a restart-rebuild couldn't recover indexed memories. Three changes: 1. src/functions/remember.ts — synthesize a CompressedObservation-shaped record from the saved Memory (title + content + concepts + files) and add it to BM25 right after kv.set(). Wrapped in try/catch so an indexing hiccup never blocks the durable save. 2. src/functions/search.ts — rebuildIndex() now walks KV.memories before sessions/observations, so a fresh rebuild covers the full corpus. Skips memory.isLatest === false so superseded entries don't pollute results. 3. src/index.ts — startup backfill for users upgrading from <0.9.5: when the persisted BM25 is non-empty (no rebuild triggered) but legacy memories were never indexed, walk KV.memories and add anything missing. SearchIndex.has(id) is the new idempotency gate. indexPersistence.scheduleSave() persists the augmented index. Tests in test/remember-bm25-index.test.ts cover: - SearchIndex.has() before/after add - A saved memory is findable by keyword (the case the bug broke) - The exact repro from the issue (mem_moy3u6ua_..., query "BM25 test") - Concept-only matches (no title/content overlap) Out of scope (file as follow-up): - Removing superseded memories from BM25 when mem::cascade-update fires. Currently relies on isLatest filter at result time; not perfect but doesn't regress recall. - Memory restore via IndexPersistence (covered transparently by the startup backfill, but a dedicated round-trip test would be tighter). Reported by @Nizar-BenHamida with full repro + log capture.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughMemory objects are now indexed into the BM25 search index. Indexing occurs immediately after save, during batch rebuild, and via startup backfill for pre-existing memories. A new ChangesBM25 Memory Indexing Integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/functions/remember.ts (1)
16-29: ⚡ Quick winConsider extracting
memoryAsIndexableto eliminate duplication.The same conversion logic appears in four locations:
src/functions/remember.ts(lines 16-29)src/functions/search.ts(lines 15-28)src/index.ts(inlined at lines 422-433)test/remember-bm25-index.test.ts(test fixture at lines 8-21)While the PR summary notes a circular import concern, extracting this to a neutral location like
src/state/memory-indexing.tswould eliminate the maintenance burden. Any future field mapping change currently requires updating four locations.♻️ Suggested approach
Create
src/state/memory-indexing.ts:import type { CompressedObservation, Memory } from "../types.js"; export function memoryAsIndexable(memory: Memory): CompressedObservation { return { id: memory.id, sessionId: memory.sessionIds[0] ?? "memory", timestamp: memory.createdAt, type: "decision", title: memory.title, facts: [memory.content], narrative: memory.content, concepts: memory.concepts, files: memory.files, importance: memory.strength, }; }Then import from
src/functions/remember.ts,src/functions/search.ts, andsrc/index.ts. Sincesrc/state/modules typically don't import fromsrc/functions/, this should avoid circular dependencies.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/remember.ts` around lines 16 - 29, Extract the duplicate mapping into a single exported utility called memoryAsIndexable in a new neutral module (e.g., memory-indexing) and replace the four inlined copies with imports of that function; specifically, move the current memoryAsIndexable implementation (the function that maps Memory -> CompressedObservation using id, sessionIds[0] ?? "memory", createdAt, type "decision", title, facts [content], narrative content, concepts, files, importance strength) into the new module and update the callers (remember.ts, search.ts, index.ts and the test fixture) to import and use that exported memoryAsIndexable to avoid duplication and prevent circular imports by keeping the new module dependency-free of higher-level function modules.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/remember-bm25-index.test.ts`:
- Around line 23-39: The helper makeMemory creates two timestamps by calling new
Date().toISOString() twice; change it to capture the ISO timestamp once in a
local variable at the start of makeMemory and reuse that single value for both
createdAt and updatedAt so both fields are identical and Date is only invoked
once (update references to createdAt and updatedAt in the returned Memory object
accordingly).
---
Nitpick comments:
In `@src/functions/remember.ts`:
- Around line 16-29: Extract the duplicate mapping into a single exported
utility called memoryAsIndexable in a new neutral module (e.g., memory-indexing)
and replace the four inlined copies with imports of that function; specifically,
move the current memoryAsIndexable implementation (the function that maps Memory
-> CompressedObservation using id, sessionIds[0] ?? "memory", createdAt, type
"decision", title, facts [content], narrative content, concepts, files,
importance strength) into the new module and update the callers (remember.ts,
search.ts, index.ts and the test fixture) to import and use that exported
memoryAsIndexable to avoid duplication and prevent circular imports by keeping
the new module dependency-free of higher-level function modules.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 03f2d765-759f-41e6-9146-97b52e3eed54
📒 Files selected for processing (5)
src/functions/remember.tssrc/functions/search.tssrc/index.tssrc/state/search-index.tstest/remember-bm25-index.test.ts
| function makeMemory(overrides: Partial<Memory> = {}): Memory { | ||
| return { | ||
| id: "mem_test_001", | ||
| createdAt: new Date().toISOString(), | ||
| updatedAt: new Date().toISOString(), | ||
| type: "fact", | ||
| title: "BM25 test memory", | ||
| content: "BM25 search returns this memory by keyword match", | ||
| concepts: ["bm25", "search", "test"], | ||
| files: [], | ||
| sessionIds: [], | ||
| strength: 7, | ||
| version: 1, | ||
| isLatest: true, | ||
| ...overrides, | ||
| }; | ||
| } |
There was a problem hiding this comment.
Capture timestamp once and reuse.
The makeMemory helper calls new Date().toISOString() twice (lines 26-27). As per coding guidelines, capture the timestamp once and reuse it.
📅 Proposed fix
function makeMemory(overrides: Partial<Memory> = {}): Memory {
+ const now = new Date().toISOString();
return {
id: "mem_test_001",
- createdAt: new Date().toISOString(),
- updatedAt: new Date().toISOString(),
+ createdAt: now,
+ updatedAt: now,
type: "fact",As per coding guidelines: "Capture timestamps once with new Date().toISOString() and reuse instead of calling Date multiple times".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function makeMemory(overrides: Partial<Memory> = {}): Memory { | |
| return { | |
| id: "mem_test_001", | |
| createdAt: new Date().toISOString(), | |
| updatedAt: new Date().toISOString(), | |
| type: "fact", | |
| title: "BM25 test memory", | |
| content: "BM25 search returns this memory by keyword match", | |
| concepts: ["bm25", "search", "test"], | |
| files: [], | |
| sessionIds: [], | |
| strength: 7, | |
| version: 1, | |
| isLatest: true, | |
| ...overrides, | |
| }; | |
| } | |
| function makeMemory(overrides: Partial<Memory> = {}): Memory { | |
| const now = new Date().toISOString(); | |
| return { | |
| id: "mem_test_001", | |
| createdAt: now, | |
| updatedAt: now, | |
| type: "fact", | |
| title: "BM25 test memory", | |
| content: "BM25 search returns this memory by keyword match", | |
| concepts: ["bm25", "search", "test"], | |
| files: [], | |
| sessionIds: [], | |
| strength: 7, | |
| version: 1, | |
| isLatest: true, | |
| ...overrides, | |
| }; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/remember-bm25-index.test.ts` around lines 23 - 39, The helper makeMemory
creates two timestamps by calling new Date().toISOString() twice; change it to
capture the ISO timestamp once in a local variable at the start of makeMemory
and reuse that single value for both createdAt and updatedAt so both fields are
identical and Date is only invoked once (update references to createdAt and
updatedAt in the returned Memory object accordingly).
Bug-fix patch focused on search recall correctness and plugin compatibility. Pins iii-engine to v0.11.2 because v0.11.6 introduces a new sandbox-everything-via-`iii worker add` model that agentmemory hasn't been refactored for yet — pin lifts once that refactor lands. Adds a hard guard against silent vector-index corruption, fixes BM25 indexing for memories saved via memory_save, and lands four Hermes plugin fixes. Per AGENTS.md release checklist: - package.json version 0.9.4 -> 0.9.5 - src/version.ts VERSION constant - src/types.ts ExportData version union - src/functions/export-import.ts supportedVersions Set - test/export-import.test.ts assertion - plugin/.claude-plugin/plugin.json version - CHANGELOG.md detailed entries with contributor shoutouts Headlines (full detail in CHANGELOG): Fixed: - BM25 search now indexes memories saved via memory_save (#258, #257) Thanks @Nizar-BenHamida for the precise repro. - Embedding providers no longer silently corrupt the vector index when an API returns wrong-dimension vectors (#248, #247, #256) Thanks @AmmarSaleh50 for issue + fix + tests. - Hermes handle_tool_call returns JSON strings, not raw dicts (#255, #254) Thanks @KyoMio for the Anthropic-protocol repro. - Hermes status reflects real service state on systemd installs (#253, #250) Thanks @OptionalCoin for tracing it to env-source divergence. - Hermes hooks accept passthrough kwargs (#252, #249) Thanks @OptionalCoin again for the log analysis. - agentmemory demo now seeds observations correctly (#251, #229) Thanks @seishonagon for root-cause analysis. - LLM compression / summarization timeouts increased (#213) Thanks @xuli500177. - Pi / OpenClaw / Hermes integration plugin fixes (#230) Thanks @deepmroot. Changed: - iii-engine pinned to v0.11.2 across every install path (#260). v0.11.6 introduces a new `iii worker add` sandbox model that agentmemory still pre-dates; pin lifts when we refactor agentmemory to register as a sandboxed worker. Override with AGENTMEMORY_III_VERSION=<version> for users who've migrated manually. - README documents iii worker add extension surface (#242). - README iii Console install/launch commands corrected (#243). Validated: 852/852 tests pass, npm run build clean.
Bug-fix patch focused on search recall correctness and plugin compatibility. Pins iii-engine to v0.11.2 because v0.11.6 introduces a new sandbox-everything-via-`iii worker add` model that agentmemory hasn't been refactored for yet — pin lifts once that refactor lands. Adds a hard guard against silent vector-index corruption, fixes BM25 indexing for memories saved via memory_save, and lands four Hermes plugin fixes. Per AGENTS.md release checklist: - package.json version 0.9.4 -> 0.9.5 - src/version.ts VERSION constant - src/types.ts ExportData version union - src/functions/export-import.ts supportedVersions Set - test/export-import.test.ts assertion - plugin/.claude-plugin/plugin.json version - CHANGELOG.md detailed entries with contributor shoutouts Headlines (full detail in CHANGELOG): Fixed: - BM25 search now indexes memories saved via memory_save (#258, #257) Thanks @Nizar-BenHamida for the precise repro. - Embedding providers no longer silently corrupt the vector index when an API returns wrong-dimension vectors (#248, #247, #256) Thanks @AmmarSaleh50 for issue + fix + tests. - Hermes handle_tool_call returns JSON strings, not raw dicts (#255, #254) Thanks @KyoMio for the Anthropic-protocol repro. - Hermes status reflects real service state on systemd installs (#253, #250) Thanks @OptionalCoin for tracing it to env-source divergence. - Hermes hooks accept passthrough kwargs (#252, #249) Thanks @OptionalCoin again for the log analysis. - agentmemory demo now seeds observations correctly (#251, #229) Thanks @seishonagon for root-cause analysis. - LLM compression / summarization timeouts increased (#213) Thanks @xuli500177. - Pi / OpenClaw / Hermes integration plugin fixes (#230) Thanks @deepmroot. Changed: - iii-engine pinned to v0.11.2 across every install path (#260). v0.11.6 introduces a new `iii worker add` sandbox model that agentmemory still pre-dates; pin lifts when we refactor agentmemory to register as a sandboxed worker. Override with AGENTMEMORY_III_VERSION=<version> for users who've migrated manually. - README documents iii worker add extension surface (#242). - README iii Console install/launch commands corrected (#243). Validated: 852/852 tests pass, npm run build clean.
Bug-fix patch focused on search recall correctness and plugin compatibility. Pins iii-engine to v0.11.2 because v0.11.6 introduces a new sandbox-everything-via-`iii worker add` model that agentmemory hasn't been refactored for yet — pin lifts once that refactor lands. Adds a hard guard against silent vector-index corruption, fixes BM25 indexing for memories saved via memory_save, and lands four Hermes plugin fixes. Per AGENTS.md release checklist: - package.json version 0.9.4 -> 0.9.5 - src/version.ts VERSION constant - src/types.ts ExportData version union - src/functions/export-import.ts supportedVersions Set - test/export-import.test.ts assertion - plugin/.claude-plugin/plugin.json version - CHANGELOG.md detailed entries with contributor shoutouts Headlines (full detail in CHANGELOG): Fixed: - BM25 search now indexes memories saved via memory_save (#258, #257) Thanks @Nizar-BenHamida for the precise repro. - Embedding providers no longer silently corrupt the vector index when an API returns wrong-dimension vectors (#248, #247, #256) Thanks @AmmarSaleh50 for issue + fix + tests. - Hermes handle_tool_call returns JSON strings, not raw dicts (#255, #254) Thanks @KyoMio for the Anthropic-protocol repro. - Hermes status reflects real service state on systemd installs (#253, #250) Thanks @OptionalCoin for tracing it to env-source divergence. - Hermes hooks accept passthrough kwargs (#252, #249) Thanks @OptionalCoin again for the log analysis. - agentmemory demo now seeds observations correctly (#251, #229) Thanks @seishonagon for root-cause analysis. - LLM compression / summarization timeouts increased (#213) Thanks @xuli500177. - Pi / OpenClaw / Hermes integration plugin fixes (#230) Thanks @deepmroot. Changed: - iii-engine pinned to v0.11.2 across every install path (#260). v0.11.6 introduces a new `iii worker add` sandbox model that agentmemory still pre-dates; pin lifts when we refactor agentmemory to register as a sandboxed worker. Override with AGENTMEMORY_III_VERSION=<version> for users who've migrated manually. - README documents iii worker add extension surface (#242). - README iii Console install/launch commands corrected (#243). Validated: 852/852 tests pass, npm run build clean.
Closes #257.
What
memory_savereturned 200 with a valid memory ID, but every subsequentmemory_smart_searchandmemory_recallreturned empty results — and direct REST/agentmemory/smart-searchshowed the same, ruling out the MCP layer. Per the issue:Recall was completely broken in BM25-only mode (no embedding provider configured). High priority — this makes the entire memory feature unusable for the self-hosted-without-embeddings slice of users.
Root cause
src/functions/remember.ts::mem::rememberwrote the newMemorytoKV.memoriesbut never calledgetSearchIndex().add(...). Same shape of bug as #228 fixed for vectors-on-observe — the row hits durable storage, the search index never sees it. Compounded byrebuildIndex()(src/functions/search.ts) walking only sessions/observations and ignoringKV.memories, so even a restart-rebuild couldn't recover the corpus.Fix
Three minimal changes:
src/functions/remember.tskv.set(KV.memories, ...), synthesize aCompressedObservationfrom the memory and callgetSearchIndex().add(). Wrapped intry/catchso an indexing hiccup never blocks the durable save.src/functions/search.tsrebuildIndex()now walksKV.memoriesbefore sessions/observations. Skipsmemory.isLatest === falseso superseded rows don't pollute results.src/index.tsKV.memoriesand add anything not already indexed.SearchIndex.has(id)is the idempotency gate.indexPersistence.scheduleSave()persists the augmented index.src/state/search-index.tshas(id: string): booleanmethod.Test plan
test/remember-bm25-index.test.tscovers:SearchIndex.has()before / afteradd()mem_moy3u6ua_8c6962b668e7with query"BM25 test"→ returns the hittest/search-index.test.ts(9 tests) still passesnpx tsc --noEmit— no errors from changed files5 new tests + 9 existing → 14/14.
Out of scope (filing as follow-ups)
mem::cascade-updatefires. CurrentlyisLatest === falsefilter at result time prevents stale hits from surfacing, but the inverted index keeps growing. Not a regression vs current behavior.IndexPersistence.save()→ reload →memory.title-search test would tighten coverage.Note on the user's "Issue exists since v0.9.0" finding
That tracks. Pre-v0.9.0 the path likely went through
mem::observe→ which DOES callgetSearchIndex().add(). Themem::rememberpath that bypasses observation flow is newer, and the indexing call was never wired. The startup backfill in this PR retroactively fixes existing memories on next start.Credit to @Nizar-BenHamida for the precise repro with logs and the elimination of MCP / engine layers from the diagnosis.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes