fix(memory): index memories into BM25 + backfill on startup (closes #257)#258

Merged
rohitg00 merged 1 commit into main from
fix/257-bm25-memories-not-indexed
May 9, 2026

Conversation

@rohitg00
Owner

@rohitg00 rohitg00 commented May 9, 2026

Closes #257.

What

memory_save returned 200 with a valid memory ID, but every subsequent memory_smart_search and memory_recall returned empty results — and direct REST /agentmemory/smart-search showed the same, ruling out the MCP layer. Per the issue:

[agentmemory] info Memory saved {"memId":"mem_moy3u6ua_8c6962b668e7","type":"fact"}
[agentmemory] info Smart search compact {"query":"BM25 test","results":0}

Recall was completely broken in BM25-only mode (no embedding provider configured). High priority — this makes the entire memory feature unusable for the self-hosted-without-embeddings slice of users.

Root cause

src/functions/remember.ts::mem::remember wrote the new Memory to KV.memories but never called getSearchIndex().add(...). Same shape of bug as #228 fixed for vectors-on-observe — the row hits durable storage, the search index never sees it. Compounded by rebuildIndex() (src/functions/search.ts) walking only sessions/observations and ignoring KV.memories, so even a restart-rebuild couldn't recover the corpus.

Fix

Three minimal changes, plus one supporting method on the index:

| File | Change |
| --- | --- |
| src/functions/remember.ts | After kv.set(KV.memories, ...), synthesize a CompressedObservation from the memory and call getSearchIndex().add(). Wrapped in try/catch so an indexing hiccup never blocks the durable save. |
| src/functions/search.ts | rebuildIndex() now walks KV.memories before sessions/observations. Skips memory.isLatest === false so superseded rows don't pollute results. |
| src/index.ts | Startup backfill for users upgrading from <0.9.5: when persisted BM25 is non-empty (no rebuild triggered), walk KV.memories and add anything not already indexed. SearchIndex.has(id) is the idempotency gate. indexPersistence.scheduleSave() persists the augmented index. |
| src/state/search-index.ts | New has(id: string): boolean method. |
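The save-path change can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the `Memory` and `Indexable` shapes are trimmed stand-ins, `TinyIndex` is a hypothetical index exposing only `add()` and `has()`, and a plain `Map` stands in for KV.memories. What it shows is the ordering described above: the durable write happens first, and the index write is best-effort.

```typescript
// Simplified stand-ins for the project's types (assumptions, not the real definitions).
interface Memory {
  id: string;
  title: string;
  content: string;
  concepts: string[];
  isLatest: boolean;
}

interface Indexable {
  id: string;
  title: string;
  narrative: string;
  concepts: string[];
}

// Hypothetical index with the add()/has() surface the PR describes.
class TinyIndex {
  private entries = new Map<string, Indexable>();
  add(doc: Indexable): void {
    this.entries.set(doc.id, doc);
  }
  has(id: string): boolean {
    return this.entries.has(id);
  }
}

const store = new Map<string, Memory>(); // stands in for KV.memories
const index = new TinyIndex(); // stands in for getSearchIndex()

// Map a memory onto the shape the indexer accepts.
function toIndexable(memory: Memory): Indexable {
  return {
    id: memory.id,
    title: memory.title,
    narrative: memory.content,
    concepts: memory.concepts,
  };
}

// Durable write first; indexing is best-effort and must never fail the save.
function remember(
  memory: Memory,
  idx: { add(doc: Indexable): void } = index,
): string {
  store.set(memory.id, memory);
  try {
    idx.add(toIndexable(memory));
  } catch (err) {
    console.warn("index add failed; memory is still saved", err);
  }
  return memory.id;
}
```

If the index throws, the caller still gets the memory ID back and the row is in the store; the startup backfill is then what would pick the entry up on the next start.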

Test plan

test/remember-bm25-index.test.ts covers:

  • SearchIndex.has() before / after add()
  • Saved memory is findable by keyword search (the case the bug broke)
  • Exact repro from the issue: mem_moy3u6ua_8c6962b668e7 with query "BM25 test" → returns the hit
  • Concept-only matches (no title/content keyword overlap)
  • Pre-existing test/search-index.test.ts (9 tests) still passes
  • npx tsc --noEmit — no errors from changed files

5 new tests + 9 existing → 14/14.
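To make the "concept-only match" case concrete, here is a small sketch of textbook Okapi BM25 scoring over memories flattened to text. It is not the project's SearchIndex: the tokenizer, the `k1` and `b` defaults, the IDF variant, and the sample memories are all assumptions chosen for illustration. The point is only that a term present in the concepts, and nowhere in the title or content, is enough to produce a hit once the concepts are part of the indexed text.

```typescript
// Textbook Okapi BM25 over a tiny corpus; NOT the project's SearchIndex.
type Doc = { id: string; text: string };
type Hit = { id: string; score: number };

// Lowercase and split on anything that is not a letter or digit.
const tokenize = (s: string): string[] =>
  s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

function bm25Search(docs: Doc[], query: string, k1 = 1.2, b = 0.75): Hit[] {
  const indexed = docs.map((d) => ({ id: d.id, terms: tokenize(d.text) }));
  const total = indexed.reduce((n, d) => n + d.terms.length, 0);
  const avgLen = total / Math.max(indexed.length, 1);
  const queryTerms = new Set(tokenize(query));
  const hits: Hit[] = [];

  for (const doc of indexed) {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.terms.filter((t) => t === term).length;
      if (tf === 0) continue;
      // Document frequency: how many documents contain the term at all.
      const df = indexed.filter((d) => d.terms.includes(term)).length;
      const idf = Math.log(1 + (indexed.length - df + 0.5) / (df + 0.5));
      const norm = tf + k1 * (1 - b + (b * doc.terms.length) / avgLen);
      score += (idf * tf * (k1 + 1)) / norm;
    }
    if (score > 0) hits.push({ id: doc.id, score });
  }
  return hits.sort((x, y) => y.score - x.score);
}

// Each memory is flattened as title + content + concepts (hypothetical data).
const corpus: Doc[] = [
  { id: "mem_a", text: "Deploy notes Use blue-green rollout rollback release" },
  { id: "mem_b", text: "Cache policy Evict after ten minutes redis" },
];

// "redis" appears only among mem_b's concepts, yet it is enough to match.
const hits = bm25Search(corpus, "redis");
console.log(hits.map((h) => h.id));
```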

Out of scope (filing as follow-ups)

  • Removing superseded memories from BM25 when mem::cascade-update fires. Currently the isLatest === false filter at result time prevents stale hits from surfacing, but the inverted index keeps growing. Not a regression vs. current behavior.
  • Persistence round-trip test for memories specifically — startup backfill covers the user-visible contract, but a dedicated IndexPersistence.save() → reload → memory.title-search test would tighten coverage.

Note on the user's "Issue exists since v0.9.0" finding

That tracks. Before v0.9.0, the path likely went through mem::observe, which does call getSearchIndex().add(). The mem::remember path, which bypasses the observation flow, is newer, and the indexing call was never wired in. The startup backfill in this PR retroactively indexes existing memories on the next start.

Credit to @Nizar-BenHamida for the precise repro with logs and the elimination of MCP / engine layers from the diagnosis.

Summary by CodeRabbit

Release Notes

  • New Features

    • Saved memories are now automatically indexed for searchability upon creation
    • Search results expanded to include all saved memories and decisions
    • System automatically backfills previously saved memories into the search index at startup
  • Bug Fixes

    • Improved search reliability by ensuring all memories are consistently discoverable


memory_save returned 200 with a valid memory ID, but every subsequent
memory_smart_search and memory_recall returned empty results. Direct
REST /agentmemory/smart-search showed the same — confirming the bug was
not in the MCP layer.

Root cause: mem::remember in src/functions/remember.ts wrote the new
Memory to KV.memories but never called getSearchIndex().add(). The same
pattern that #228 fixed for vectors-on-observe was missing here for
BM25-on-remember. Compounded by rebuildIndex() walking only sessions/
observations and skipping KV.memories, so even a restart-rebuild
couldn't recover indexed memories.

Three changes:

1. src/functions/remember.ts — synthesize a CompressedObservation-shaped
   record from the saved Memory (title + content + concepts + files) and
   add it to BM25 right after kv.set(). Wrapped in try/catch so an
   indexing hiccup never blocks the durable save.

2. src/functions/search.ts — rebuildIndex() now walks KV.memories before
   sessions/observations, so a fresh rebuild covers the full corpus.
   Skips memory.isLatest === false so superseded entries don't pollute
   results.

3. src/index.ts — startup backfill for users upgrading from <0.9.5: when
   the persisted BM25 is non-empty (no rebuild triggered) but legacy
   memories were never indexed, walk KV.memories and add anything
   missing. SearchIndex.has(id) is the new idempotency gate.
   indexPersistence.scheduleSave() persists the augmented index.
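The startup backfill in item 3 can be sketched roughly like this. It is an illustration under assumptions rather than the code in src/index.ts: `StoredMemory` is a trimmed type, `BackfillIndex` is a hypothetical index that tracks only IDs, and the memories arrive as a plain array instead of being read from KV.memories. The `has()` check is what makes a second run a no-op.

```typescript
// Trimmed memory shape (assumption; the real Memory type has more fields).
interface StoredMemory {
  id: string;
  title: string;
  content: string;
  isLatest: boolean;
}

// Hypothetical minimal index exposing only what the backfill needs.
class BackfillIndex {
  private ids = new Set<string>();
  has(id: string): boolean {
    return this.ids.has(id);
  }
  add(doc: { id: string }): void {
    this.ids.add(doc.id);
  }
}

// Adds every latest memory that the index does not already contain.
// Returns how many entries were added so the caller can decide whether
// the augmented index needs to be persisted.
function backfillMemories(
  memories: StoredMemory[],
  idx: BackfillIndex,
): number {
  let added = 0;
  for (const memory of memories) {
    if (memory.isLatest === false) continue; // superseded rows stay out
    if (idx.has(memory.id)) continue; // idempotency gate
    idx.add({ id: memory.id });
    added++;
  }
  return added;
}
```

A caller would schedule a save only when the returned count is greater than zero, which keeps a normal restart from rewriting an unchanged index.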

Tests in test/remember-bm25-index.test.ts cover:
- SearchIndex.has() before/after add
- A saved memory is findable by keyword (the case the bug broke)
- The exact repro from the issue (mem_moy3u6ua_..., query "BM25 test")
- Concept-only matches (no title/content overlap)

Out of scope (file as follow-up):
- Removing superseded memories from BM25 when mem::cascade-update fires.
  Currently relies on isLatest filter at result time; not perfect but
  doesn't regress recall.
- Memory restore via IndexPersistence (covered transparently by the
  startup backfill, but a dedicated round-trip test would be tighter).

Reported by @Nizar-BenHamida with full repro + log capture.
@vercel

vercel Bot commented May 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agentmemory | Ready | Preview, Comment | May 9, 2026 1:19pm |


@coderabbitai

coderabbitai Bot commented May 9, 2026

Walkthrough

Memory objects are now indexed into the BM25 search index. Indexing occurs immediately after save, during batch rebuild, and via startup backfill for pre-existing memories. A new SearchIndex.has() method enables duplicate-detection checks. Helper functions normalize Memory objects into the CompressedObservation shape expected by the indexer.

Changes

BM25 Memory Indexing Integration

| Layer | File(s) | Summary |
| --- | --- | --- |
| Search Index API Enhancement | src/state/search-index.ts | SearchIndex exposes a new has(id: string): boolean method to check observation presence in the entries map. |
| Memory-to-Index Conversion Contract | src/functions/remember.ts, src/functions/search.ts, test/remember-bm25-index.test.ts | Multiple files define memoryAsIndexable helpers that convert Memory into CompressedObservation shape, normalizing type to "decision", deriving sessionId, and mapping fields to facts/narrative. |
| Immediate Indexing on Memory Save | src/functions/remember.ts | After persisting a new Memory, the remember function adds it to the BM25 index via getSearchIndex().add(); wrapped in try/catch to prevent save failures from indexing errors. |
| Batch Memory Indexing During Rebuild | src/functions/search.ts | The rebuildIndex function now loads and indexes Memory entries from KV.memories in addition to per-session observations, filtering to latest entries with required title/content and accumulating a single count. |
| Startup Index Backfill | src/index.ts | During startup, when BM25 already has data, the system iterates KV.memories, checks presence via bm25Index.has(), and adds missing latest memories; schedules persistence only if backfilled items are found. |
| BM25 Indexing Tests | test/remember-bm25-index.test.ts | Adds test helpers (memoryAsIndexable, makeMemory) and validates SearchIndex.has() behavior, BM25 keyword search recall, and concept matching including a regression case for issue #257. |
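The rebuild behavior summarized in the table (memories first, then per-session observations, one running count) might look roughly like the sketch below. Every name here is a hypothetical stand-in: `Mem`, `Obs`, and `Session` are trimmed shapes, and the `add` callback takes the place of the real index. The actual rebuildIndex reads from KV and is not reproduced here.

```typescript
// Trimmed shapes for illustration only (assumptions, not the project's types).
interface Mem {
  id: string;
  title: string;
  content: string;
  isLatest?: boolean;
}
interface Obs {
  id: string;
  title: string;
}
interface Session {
  observations: Obs[];
}

// Walks memories before observations and returns a single combined count.
function rebuild(
  memories: Mem[],
  sessions: Session[],
  add: (id: string) => void,
): number {
  let count = 0;

  for (const memory of memories) {
    // Skip superseded rows and rows missing the text fields the index needs.
    if (memory.isLatest === false || !memory.title || !memory.content) continue;
    add(memory.id);
    count++;
  }

  for (const session of sessions) {
    for (const observation of session.observations) {
      add(observation.id);
      count++;
    }
  }
  return count;
}
```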

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • rohitg00/agentmemory#78: Modifies mem::remember in src/functions/remember.ts to add ttlDays/forgetAfter handling, overlapping the same function where this PR adds immediate BM25 indexing.

Poem

🐰 A memory finds its nest,
Indexed swift, put to the test,
BM25's searching light,
Makes recall shine so bright!
Hop-hop, memories take flight! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(memory): index memories into BM25 + backfill on startup (closes #257)' accurately describes the main changes: fixing memory indexing into BM25 and implementing backfill on startup. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/functions/remember.ts (1)

16-29: ⚡ Quick win

Consider extracting memoryAsIndexable to eliminate duplication.

The same conversion logic appears in four locations:

  • src/functions/remember.ts (lines 16-29)
  • src/functions/search.ts (lines 15-28)
  • src/index.ts (inlined at lines 422-433)
  • test/remember-bm25-index.test.ts (test fixture at lines 8-21)

While the PR summary notes a circular import concern, extracting this to a neutral location like src/state/memory-indexing.ts would eliminate the maintenance burden. Any future field mapping change currently requires updating four locations.

♻️ Suggested approach

Create src/state/memory-indexing.ts:

import type { CompressedObservation, Memory } from "../types.js";

export function memoryAsIndexable(memory: Memory): CompressedObservation {
  return {
    id: memory.id,
    sessionId: memory.sessionIds[0] ?? "memory",
    timestamp: memory.createdAt,
    type: "decision",
    title: memory.title,
    facts: [memory.content],
    narrative: memory.content,
    concepts: memory.concepts,
    files: memory.files,
    importance: memory.strength,
  };
}

Then import from src/functions/remember.ts, src/functions/search.ts, and src/index.ts. Since src/state/ modules typically don't import from src/functions/, this should avoid circular dependencies.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/remember.ts` around lines 16 - 29, Extract the duplicate
mapping into a single exported utility called memoryAsIndexable in a new neutral
module (e.g., memory-indexing) and replace the four inlined copies with imports
of that function; specifically, move the current memoryAsIndexable
implementation (the function that maps Memory -> CompressedObservation using id,
sessionIds[0] ?? "memory", createdAt, type "decision", title, facts [content],
narrative content, concepts, files, importance strength) into the new module and
update the callers (remember.ts, search.ts, index.ts and the test fixture) to
import and use that exported memoryAsIndexable to avoid duplication and prevent
circular imports by keeping the new module dependency-free of higher-level
function modules.

📥 Commits

Reviewing files that changed from the base of the PR and between ab4a166 and 0c21c56.

📒 Files selected for processing (5)
  • src/functions/remember.ts
  • src/functions/search.ts
  • src/index.ts
  • src/state/search-index.ts
  • test/remember-bm25-index.test.ts

Comment on lines +23 to +39
function makeMemory(overrides: Partial<Memory> = {}): Memory {
  return {
    id: "mem_test_001",
    createdAt: new Date().toISOString(),
    updatedAt: new Date().toISOString(),
    type: "fact",
    title: "BM25 test memory",
    content: "BM25 search returns this memory by keyword match",
    concepts: ["bm25", "search", "test"],
    files: [],
    sessionIds: [],
    strength: 7,
    version: 1,
    isLatest: true,
    ...overrides,
  };
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Capture timestamp once and reuse.

The makeMemory helper calls new Date().toISOString() twice (lines 26-27). As per coding guidelines, capture the timestamp once and reuse it.

📅 Proposed fix
 function makeMemory(overrides: Partial<Memory> = {}): Memory {
+  const now = new Date().toISOString();
   return {
     id: "mem_test_001",
-    createdAt: new Date().toISOString(),
-    updatedAt: new Date().toISOString(),
+    createdAt: now,
+    updatedAt: now,
     type: "fact",

As per coding guidelines: "Capture timestamps once with new Date().toISOString() and reuse instead of calling Date multiple times".

📝 Committable suggestion

Suggested change

function makeMemory(overrides: Partial<Memory> = {}): Memory {
  const now = new Date().toISOString();
  return {
    id: "mem_test_001",
    createdAt: now,
    updatedAt: now,
    type: "fact",
    title: "BM25 test memory",
    content: "BM25 search returns this memory by keyword match",
    concepts: ["bm25", "search", "test"],
    files: [],
    sessionIds: [],
    strength: 7,
    version: 1,
    isLatest: true,
    ...overrides,
  };
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/remember-bm25-index.test.ts` around lines 23 - 39, The helper makeMemory
creates two timestamps by calling new Date().toISOString() twice; change it to
capture the ISO timestamp once in a local variable at the start of makeMemory
and reuse that single value for both createdAt and updatedAt so both fields are
identical and Date is only invoked once (update references to createdAt and
updatedAt in the returned Memory object accordingly).

@rohitg00 rohitg00 merged commit 1a03538 into main May 9, 2026
5 checks passed
@rohitg00 rohitg00 deleted the fix/257-bm25-memories-not-indexed branch May 9, 2026 13:23
rohitg00 added a commit that referenced this pull request May 9, 2026
Bug-fix patch focused on search recall correctness and plugin
compatibility. Pins iii-engine to v0.11.2 because v0.11.6 introduces
a new sandbox-everything-via-`iii worker add` model that agentmemory
hasn't been refactored for yet — pin lifts once that refactor lands.
Adds a hard guard against silent vector-index corruption, fixes BM25
indexing for memories saved via memory_save, and lands four Hermes
plugin fixes.

Per AGENTS.md release checklist:
- package.json version 0.9.4 -> 0.9.5
- src/version.ts VERSION constant
- src/types.ts ExportData version union
- src/functions/export-import.ts supportedVersions Set
- test/export-import.test.ts assertion
- plugin/.claude-plugin/plugin.json version
- CHANGELOG.md detailed entries with contributor shoutouts

Headlines (full detail in CHANGELOG):

Fixed:
- BM25 search now indexes memories saved via memory_save (#258, #257)
  Thanks @Nizar-BenHamida for the precise repro.
- Embedding providers no longer silently corrupt the vector index when
  an API returns wrong-dimension vectors (#248, #247, #256)
  Thanks @AmmarSaleh50 for issue + fix + tests.
- Hermes handle_tool_call returns JSON strings, not raw dicts (#255, #254)
  Thanks @KyoMio for the Anthropic-protocol repro.
- Hermes status reflects real service state on systemd installs (#253, #250)
  Thanks @OptionalCoin for tracing it to env-source divergence.
- Hermes hooks accept passthrough kwargs (#252, #249)
  Thanks @OptionalCoin again for the log analysis.
- agentmemory demo now seeds observations correctly (#251, #229)
  Thanks @seishonagon for root-cause analysis.
- LLM compression / summarization timeouts increased (#213)
  Thanks @xuli500177.
- Pi / OpenClaw / Hermes integration plugin fixes (#230)
  Thanks @deepmroot.

Changed:
- iii-engine pinned to v0.11.2 across every install path (#260).
  v0.11.6 introduces a new `iii worker add` sandbox model that
  agentmemory still pre-dates; pin lifts when we refactor agentmemory
  to register as a sandboxed worker. Override with
  AGENTMEMORY_III_VERSION=<version> for users who've migrated manually.
- README documents iii worker add extension surface (#242).
- README iii Console install/launch commands corrected (#243).

Validated: 852/852 tests pass, npm run build clean.
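The pin-with-override behavior described under "Changed" can be illustrated with a short sketch. Only the variable name AGENTMEMORY_III_VERSION and the pinned value v0.11.2 come from the commit message; the function name `resolveEngineVersion`, the constant name, and the whitespace handling are assumptions, and the real install paths may read the variable differently.

```typescript
// Pinned default taken from the commit message above.
const PINNED_III_VERSION = "v0.11.2";

// Hypothetical resolver: prefer a non-empty override, else fall back to the pin.
function resolveEngineVersion(
  env: Record<string, string | undefined>,
): string {
  const override = env["AGENTMEMORY_III_VERSION"]?.trim();
  return override ? override : PINNED_III_VERSION;
}

// With no override set, the pinned version is used.
console.log(resolveEngineVersion({}));
// An explicit override wins for users who have migrated manually.
console.log(resolveEngineVersion({ AGENTMEMORY_III_VERSION: "v0.11.6" }));
```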
Development

Successfully merging this pull request may close these issues.

Bug Report: BM25 Search Returns Empty Results in v0.9.4

1 participant