fix(search): BM25 unicode + vector index live-write — improved (closes #295, supersedes #296) by rohitg00 · Pull Request #327 · rohitg00/agentmemory

rohitg00 · 2026-05-13T11:27:11Z

What

Picks up @nik1t7n's PR #296 (BM25 Unicode tokenizer + vector-index live-write) — rebased onto current main, hardens three sharp edges before merge, keeps the original test suite intact.

Closes #295. Supersedes #296. Full credit to @nik1t7n for the bug report + initial fix; this PR's commits build on top.

Bugs fixed (from #295)

Bug 1 — BM25 tokenizer strips non-ASCII (Cyrillic, Greek, CJK, Arabic, accented Latin)

src/state/search-index.ts:226 used \w in the replace regex. JS \w is ASCII-only — every Unicode letter outside [a-zA-Z0-9_] collapsed to a space during tokenization. Non-English searches returned empty.

Fix (from #296, kept verbatim):

// before
.replace(/[^\w\s/.\\-_]/g, " ")
// after
.replace(/[^\p{L}\p{N}\s/.\\-_]/gu, " ")

Preserves Unicode letters / numbers, keeps underscore (matters for code identifiers like memory_recall), keeps /, ., \, -. Symmetric for index AND query path — same tokenizer, no recall asymmetry.

CJK note: this fix recovers Cyrillic, Greek, accented Latin, Hebrew, Arabic — anything that uses spaces between words. CJK still tokenizes as a single sentence-long token because there are no spaces between characters. CJK needs a segmenter (bigram / dictionary) — that's a separate concern (see PR #224 attempt).

Bug 2 — `VectorIndex.add()` never called at runtime

VectorIndex shipped with a working add() method, was instantiated in src/index.ts, and was persisted via IndexPersistence — but zero callers anywhere in src/. The vector index was populated ONLY from disk at startup; any new memory or observation written at runtime never made it into the vector index. memory_search with vector/hybrid mode returned stale results forever; cold start with no persisted vectors meant the vector index stayed empty.

Fix: wire vectorIndex.add() (via the new guarded helper, see below) into remember.ts, observe.ts synthetic-compression branch, compress.ts post-LLM, and search.ts:rebuildIndex (both memories + observations loops). Global setVectorIndex / setEmbeddingProvider singletons in search.ts so the helper can read the current state without threading through every call site.

Improvements layered on top of #296

#296 inlined vi.add(...) at four sites with the same try/catch shape repeated each time. Three sharp edges open-coded at every site:

No dimension guard at the write boundary. Per fix(embedding): guard provider responses against dimension mismatches #248, persistence-load refuses mismatched dimensions — but the live-write path had no symmetric check. A mis-configured embedder returning the wrong-length Float32Array would silently corrupt the in-memory index until next restart.
No input clipping. Memory.content can be 20KB+ (MemorySlot defaults). Most embedding APIs cap input around 8k tokens. Without clipping: 400 errors at the provider, dropped vector entries, wasted local-Ollama inference time.
Inconsistent error logging. rebuildIndex used catch {} — swallowed every embed failure silently during cold rebuilds. Other three sites had different logger.warn shapes.

Consolidates into vectorIndexAddGuarded(id, sessionId, text, { kind, logId }) in src/functions/search.ts:

Clips input to 16k chars (EMBED_MAX_CHARS) — well under every provider's documented limit, leaves chars-per-token headroom for non-Latin scripts where one Unicode char is multiple tokens.
Validates embedding.length === provider.dimensions before vi.add(); logs structured warning + skips on mismatch.
Single logger.warn on embed failure (provider + error + kind + id).
Returns boolean for future-callers that want to branch; current call sites fire-and-forget — soft-fail contract: index entry skipped, upstream save still succeeds.

All four call sites rewritten to one-line invocations. rebuildIndex catch {} removed — embed failures now surface consistently.

Out-of-scope (deferred follow-ups)

Blocking await on every save. Embed call is on hot path; adds 200ms-2s per remember/observe on network providers. Should be async / fire-and-forget with write-behind queue. Separate perf PR.
Periodic persistence flush of the vector index. Today: serialized at process exit. Runtime adds between flushes survive in memory, die on crash. Future debounced-flush PR.
Batching in rebuildIndex. Today: one embed() per doc, sequential. embedBatch() exists on every provider, ~5-10x faster on cold restart. Defer.

Migration

Existing indexes built with the ASCII tokenizer have zero tokens for non-ASCII docs. Fix only helps NEW writes unless mem::rebuild-index is invoked. README troubleshooting note may follow up.

Test plan

npm test — 886 / 886 (3 new tests from fix: BM25 tokenizer strips non-ASCII text; vector index never populated on write #296 + 1 mock fix in test/multimodal.test.ts to expose vectorIndexAddGuarded on the search.js mock).
npm run build — tsdown clean.
Local rebase against main (5 commits replayed cleanly).
Smoke test in a Russian / Korean / Arabic project after merge.

Summary by CodeRabbit

New Features
- Optional semantic/vector indexing and search added; memories, observations, and synthetic compressions are also indexed. Startup now initializes the embedding/vector layer so it's available to search.
Reliability
- Vector-indexing and embedding operations are guarded and soft-fail (log warnings) so failures don't break workflows or rebuilds.
Quality
- Improved Unicode-aware tokenization for better multilingual search.
Tests
- New tests for vector-index population, migration, embedding dimensions, and multilingual tokenization.

…regex Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

…ldIndex Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

…dex in index.ts Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

…, rebuildIndex vector test Adds migrateVectorIndex utility for re-embedding when embedding provider dimensions change. Adds comprehensive test coverage for Cyrillic tokenization, vector index population on remember/rebuildIndex, and migration edge cases. Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

@nik1t7n

…arded Pre-merge hardening on top of @nik1t7n's PR #296. Original PR had vectorIndex.add() inlined at four sites (remember, observe, compress, rebuildIndex) with the same try/catch shape repeated each time. That open-codes three sharp edges per site: 1. Missing dimension guard at the write boundary. Per #248, persistence load already refuses to start when persisted vectors mismatch the active provider's dimensions — but the live-write path had no symmetric check. A mis-configured embedding provider returning a different-length Float32Array would silently corrupt the in-memory index until next restart cleared it. 2. No input clipping. Memory.content can be arbitrarily large (the MemorySlot path even allows 20KB content per slot). Most embedding providers cap input around 8k tokens and 400 on overflow. A single oversize memory could throw at the provider and lose the vector entry; worse, on local-Ollama-style providers it can cost real inference time. 3. Inconsistent error logging (rebuildIndex had `catch {}` swallowing the error entirely, the other three sites logged via logger.warn with slightly different shape). Consolidates into a single exported `vectorIndexAddGuarded()` in src/functions/search.ts: - Reads vectorIndex + currentEmbeddingProvider from module-level state (same singletons the PR added). - Clips embed input to 16k chars (EMBED_MAX_CHARS) — well under every provider's documented limit, leaves headroom for chars-per-token variance across languages. - Validates embedding.length === provider.dimensions before calling vi.add(); logs structured warning and skips on mismatch. - Single logger.warn on embed failure with provider name + error message + indexed-item kind ("memory" | "observation" | "synthetic") + id. - Returns boolean (true on success) so callers can branch if they want, but every existing call site fire-and-forgets — the contract is "soft-fail: index entry skipped, upstream save still succeeds". Rewrites the four call sites (remember.ts, observe.ts synthetic branch, compress.ts post-LLM, search.ts rebuildIndex memories + obs loops) to call vectorIndexAddGuarded() instead of the inline try/catch. Side effect: cleans up the lingering `catch {}` in rebuildIndex that was swallowing rebuild-time embed failures silently — every failure now surfaces in logs at a consistent shape. Tests: - test/multimodal.test.ts mocked `../src/functions/search.js` and exposed only getSearchIndex; needed vectorIndexAddGuarded added to the mock surface or test fails with "No vectorIndexAddGuarded export is defined on the search.js mock". Added the export to the mock. - All other tests (search-index Cyrillic, search rebuildIndex vector, vector-index-populate, migrate-vector-index) green on the helper refactor — no behavioural change beyond the three guards above. 886 / 886 tests pass. Build clean. Out-of-scope (deferred to follow-ups): - Blocking await on every save. The embed call is on the request hot path. For local Ollama / on-device providers this is fine; for network providers it adds 200ms-2s latency per remember/observe. Should be async / fire-and-forget with a write-behind queue — separate perf PR after this lands. - Periodic persistence flush of the vector index. Today the index is serialized at process exit (per IndexPersistence). Runtime adds between flushes survive in memory but die on crash. Acceptable for this PR; future PR can add a debounced flush. - Batching in rebuildIndex. Today the rebuild path embeds one doc at a time. embedBatch() exists on every provider and would be ~5-10x faster for cold restarts on large corpora. Defer.

vercel · 2026-05-13T11:27:16Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agentmemory	Ready	Preview, Comment	May 13, 2026 2:23pm

coderabbitai · 2026-05-13T11:27:24Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1705141d-b8fb-4a96-a576-8a14a9dee4d1

📥 Commits

Reviewing files that changed from the base of the PR and between 07e1ea5 and b44fd99.

📒 Files selected for processing (1)

test/search-index.test.ts

📝 Walkthrough

Walkthrough

Adds optional vector-search embedding/indexing alongside BM25: module-level vector index and embedder state, guarded vector index writes, startup wiring, rebuild/migration utilities, Unicode-aware tokenization, and tests for population/migration/tokenization.

Changes

Vector Search Indexing Integration

Layer / File(s)	Summary
Vector index state & guarded add helper `src/functions/search.ts`	Adds module state for `VectorIndex` and `EmbeddingProvider`, getters/setters, `clipEmbedInput`, and `vectorIndexAddGuarded` that embeds clipped text, validates dimensions, writes to the vector index, and soft-fails on errors. `rebuildIndex` clearing is included here.
Startup wiring of vector index and embedder `src/index.ts`	Wires `setVectorIndex` and `setEmbeddingProvider` during startup to initialize search module state.
Remember / Observe / Compress integration `src/functions/remember.ts`, `src/functions/observe.ts`, `src/functions/compress.ts`	After BM25 `getSearchIndex().add(...)`, these flows call `vectorIndexAddGuarded` to add combined `title+content` or `title+narrative` embeddings with context metadata.
Rebuild index & migrate vector index `src/functions/search.ts`, `src/functions/migrate-vector-index.ts`	`rebuildIndex` clears the vector index and attempts guarded writes; `migrateVectorIndex` re-embeds memories and observations using a provided `EmbeddingProvider` and returns processed/failed counts, vector size, and failed sessions.
Unicode tokenization & tests `src/state/search-index.ts`, `test/*.test.ts`	Search tokenization uses a Unicode-aware regex; tests added/updated for Greek indexing, vector index population on remember, rebuildIndex vector population, migrateVectorIndex behavior, and test mocks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

BM25 tokenizer strips non-ASCII text; vector index never populated on write #295: Implements guarded vector-index writes, embedding wiring, Unicode tokenizer, and migrateVectorIndex — matches this PR's scope.

Possibly related PRs

rohitg00/agentmemory#140: Related changes to synthetic compression/indexing in observe.ts; this PR also indexes synthetic compressions into the vector index.
fix(memory): index memories into BM25 + backfill on startup (closes #257) #258: Overlap in memory indexing/backfill and rebuild flows; both touch indexing paths.
fix: resolve 8 performance and architecture issues #77: Related edits to src/functions/search.ts and rebuild logic; code-level overlap with vector-index rebuild changes.

Poem

"I'm a rabbit with a tiny quill,
I clip the text and feed the hill,
From Greek and glyph to memory streams,
I stitch embeddings into dreams,
Hop, index, done — a searcher's thrill!" 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main changes: BM25 Unicode support and vector index live-write functionality, with explicit references to the issues being resolved.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/bm25-cyrillic-vector-index

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (3)

src/functions/remember.ts (1)
115-129: 💤 Low value

Consider audit tracking and parallelization opportunities.

Two optional improvements:

Audit trail: The coding guidelines state that state-changing operations should call recordAudit() for audit trail tracking. The mem::forget function (line 208) records audits, but mem::remember does not track the creation of new memories. If audit trails are desired for memory creation, consider adding a recordAudit() call after line 100.

Parallelization: The vector indexing (lines 115-120) and cascade trigger (lines 123-129) are independent operations that could run in parallel using Promise.allSettled() to reduce latency.

As per coding guidelines: "Use parallel operations where possible with Promise.all for independent kv writes/reads" and "Record state-changing operations with recordAudit() for audit trail tracking".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/remember.ts` around lines 115 - 129, Add an audit record for
the new memory in mem::remember by calling recordAudit(...) right after the
memory creation/DB write (use the memory.id and context info) to track the
creation, and change the sequential calls to vectorIndexAddGuarded(...) and the
sdk.trigger({ function_id: "mem::cascade-update", ... }) invocation so they run
in parallel using Promise.allSettled([...]) (keep both operations as independent
promises to avoid failing one when the other succeeds); reference the existing
vectorIndexAddGuarded and sdk.trigger calls and ensure any errors from
Promise.allSettled are logged/handled.
test/vector-index-dimensions.test.ts (1)
78-102: ⚡ Quick win

Strengthen test assertions for migration behavior.

The test has two opportunities for improvement:

Lines 99-100 use toBeGreaterThan(0) when exactly 1 observation is created, so toBe(1) would be more precise.

The test doesn't verify that the migrated vectors actually have the new 4D dimensions from newProvider. Consider adding expect(result.vectorSize).toBe(1) and validating that the resulting index contains 4D vectors via oldIndex.validateDimensions(4).
♻️ More specific assertions
     const result = await migrateVectorIndex(kv as never, oldIndex, newProvider);
     expect(result.success).toBe(true);
-    expect(result.totalProcessed).toBeGreaterThan(0);
-    expect(result.vectorSize).toBeGreaterThan(0);
+    expect(result.totalProcessed).toBe(1);
+    expect(result.vectorSize).toBe(1);
     expect(result.failed).toBe(0);
+    // Verify new dimensions were applied
+    const validation = oldIndex.validateDimensions(4);
+    expect(validation.mismatches).toEqual([]);
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/vector-index-dimensions.test.ts` around lines 78 - 102, Replace the
loose "greater than 0" assertions with precise checks that reflect the single
observation and 4D embedding: assert totalProcessed is 1
(expect(result.totalProcessed).toBe(1)), assert vectorSize equals 4
(expect(result.vectorSize).toBe(4)) and after migration verify the index
dimensions by calling the VectorIndex validation helper on the returned/modified
index (e.g., oldIndex.validateDimensions(4) or the appropriate returned index
instance) to confirm embeddings are 4-dimensional; update references to
migrateVectorIndex, oldIndex, newProvider, result.vectorSize and
oldIndex.validateDimensions accordingly.
test/vector-index-populate.test.ts (1)
60-80: 💤 Low value

Consider using beforeEach/afterEach for test isolation.

The test manually cleans up global state (lines 78-79) after each test. While this works, using beforeEach/afterEach hooks would provide more robust test isolation and prevent state leakage if a test fails before reaching cleanup.
♻️ Example with lifecycle hooks
 describe("vector index population on remember", () => {
   const mockEmbedder: EmbeddingProvider = {
     name: "test",
     dimensions: 3,
     embed: async (_text: string) => new Float32Array([0.1, 0.2, 0.3]),
     embedBatch: async (_texts: string[]) =>
       _texts.map(() => new Float32Array([0.1, 0.2, 0.3])),
   };
+
+  afterEach(() => {
+    setVectorIndex(null);
+    setEmbeddingProvider(null);
+  });

   it("calls vectorIndex.add() when remember saves a memory", async () => {
     // ... test code ...
     expect(vi.size).toBe(1);
-
-    // Cleanup
-    setVectorIndex(null);
-    setEmbeddingProvider(null);
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/vector-index-populate.test.ts` around lines 60 - 80, The test mutates
global state (setVectorIndex/setEmbeddingProvider) and manually cleans up in the
test body; change to use beforeEach to create and set a fresh VectorIndex and
embedding provider (e.g., call setVectorIndex(new VectorIndex()) and
setEmbeddingProvider(mockEmbedder) in beforeEach) and use afterEach to reset
them to null (setVectorIndex(null); setEmbeddingProvider(null)), and remove the
manual cleanup lines from the test; keep the rest of the test (mockSdk, mockKV,
registerRememberFunction, and expectations) unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/functions/compress.ts`:
- Around line 176-183: The BM25 and vector indexing calls after persisting the
compressed observation should be wrapped in defensive try/catch blocks so
indexing failures don't abort the compression flow; specifically, surround
getSearchIndex().add(compressed) in a try/catch (log the error and continue) and
likewise wrap the vectorIndexAddGuarded(...) call in its own try/catch (log the
error and continue), using the same pattern as the BM25 handling in remember.ts;
include identifying info like compressed.id, compressed.sessionId and
compressed.title in the log messages to aid debugging.

In `@src/functions/migrate-vector-index.ts`:
- Around line 26-29: The filter and mapping for embeddings should ensure memory
content exists: update the filter that defines textMems to require m.content
(e.g., (m) => m.isLatest !== false && m.title && m.content && m.content.trim()
!== "") or alternatively keep the broader filter but change the texts mapping to
use a safe fallback like (m.title + " " + (m.content ?? "")). Refer to the
textMems variable and the texts mapping to implement the check/fallback so you
never concatenate "undefined" into the embedding input.
- Line 74: The return object always sets success: true regardless of failures;
update the return in migrate-vector-index (the line returning { success: true,
totalProcessed: processed, failed, vectorSize: newIndex.size }) to compute
success based on the failed counter (e.g., success: failed === 0) so callers can
rely on the boolean instead of inspecting failed separately; ensure you only
change the success expression and keep totalProcessed: processed, failed, and
vectorSize: newIndex.size unchanged.

In `@src/functions/observe.ts`:
- Around line 241-246: The call to vectorIndexAddGuarded(synthetic.id,
synthetic.sessionId, synthetic.title + " " + (synthetic.narrative || ""), {
kind: "synthetic", logId: synthetic.id }) should be wrapped in a try/catch so
embedding/indexing errors don't abort the synthetic compression flow; catch any
error and emit a warning log (similar to patterns in remember.ts and
compress.ts) indicating the synthetic id and error, but do not rethrow—ensure
the function continues since the observation was already persisted and
BM25-indexed.

In `@src/functions/remember.ts`:
- Around line 115-120: The vector indexing call using vectorIndexAddGuarded with
arguments (memory.id, memory.sessionIds[0] ?? "memory", memory.title + " " +
memory.content, { kind: "memory", logId: memory.id }) should be wrapped in a
try/catch so indexing failures don't abort the overall mem::remember flow; catch
any error and call processLogger.warn (or similar existing logger) with a
concise message including memory.id and the error, mirroring the BM25 try/catch
pattern used earlier for soft-failing index operations.

In `@src/functions/search.ts`:
- Around line 105-110: rebuildIndex currently clears BM25 but does not reset the
vector store, so stale embeddings remain; before you start adding with
vectorIndexAddGuarded in rebuildIndex, call the vector-index reset/clear API
(e.g., vectorIndexReset or vectorIndexClear — whichever exists in our codebase)
for the target index (use the same identifier you pass to vectorIndexAddGuarded,
e.g., memory.id or session id) to wipe existing vectors, then repopulate; apply
the same reset-before-add change to the other spot that calls
vectorIndexAddGuarded (the second block that appends vectors) so the vector
store and BM25 stay in sync.

In `@test/vector-index-dimensions.test.ts`:
- Line 70: Replace the no-op mockKV.delete with a real deletion that removes the
key from the same in-memory store used by the mock (so it mirrors the
implementation in vector-index-populate.test.ts); implement delete as an async
function matching the signature (delete: async (_scope: string, _key: string):
Promise<void>) that looks up the underlying store and removes the entry for the
provided key (ensuring awaits/Promise resolution if the store API is async) so
tests relying on migrateVectorIndex or other mutation behavior observe actual
deletions.

---

Nitpick comments:
In `@src/functions/remember.ts`:
- Around line 115-129: Add an audit record for the new memory in mem::remember
by calling recordAudit(...) right after the memory creation/DB write (use the
memory.id and context info) to track the creation, and change the sequential
calls to vectorIndexAddGuarded(...) and the sdk.trigger({ function_id:
"mem::cascade-update", ... }) invocation so they run in parallel using
Promise.allSettled([...]) (keep both operations as independent promises to avoid
failing one when the other succeeds); reference the existing
vectorIndexAddGuarded and sdk.trigger calls and ensure any errors from
Promise.allSettled are logged/handled.

In `@test/vector-index-dimensions.test.ts`:
- Around line 78-102: Replace the loose "greater than 0" assertions with precise
checks that reflect the single observation and 4D embedding: assert
totalProcessed is 1 (expect(result.totalProcessed).toBe(1)), assert vectorSize
equals 4 (expect(result.vectorSize).toBe(4)) and after migration verify the
index dimensions by calling the VectorIndex validation helper on the
returned/modified index (e.g., oldIndex.validateDimensions(4) or the appropriate
returned index instance) to confirm embeddings are 4-dimensional; update
references to migrateVectorIndex, oldIndex, newProvider, result.vectorSize and
oldIndex.validateDimensions accordingly.

In `@test/vector-index-populate.test.ts`:
- Around line 60-80: The test mutates global state
(setVectorIndex/setEmbeddingProvider) and manually cleans up in the test body;
change to use beforeEach to create and set a fresh VectorIndex and embedding
provider (e.g., call setVectorIndex(new VectorIndex()) and
setEmbeddingProvider(mockEmbedder) in beforeEach) and use afterEach to reset
them to null (setVectorIndex(null); setEmbeddingProvider(null)), and remove the
manual cleanup lines from the test; keep the rest of the test (mockSdk, mockKV,
registerRememberFunction, and expectations) unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6f12aec1-1f4b-4288-b708-39146f804f4e

📥 Commits

Reviewing files that changed from the base of the PR and between 25dddc4 and 6fd7906.

📒 Files selected for processing (12)

src/functions/compress.ts
src/functions/migrate-vector-index.ts
src/functions/observe.ts
src/functions/remember.ts
src/functions/search.ts
src/index.ts
src/state/search-index.ts
test/multimodal.test.ts
test/search-index.test.ts
test/search.test.ts
test/vector-index-dimensions.test.ts
test/vector-index-populate.test.ts

coderabbitai · 2026-05-13T11:31:21Z

        getSearchIndex().add(compressed);

+        await vectorIndexAddGuarded(
+          compressed.id,
+          compressed.sessionId,
+          compressed.title + " " + (compressed.narrative || ""),
+          { kind: "observation", logId: compressed.id },
+        );


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add error handling for both BM25 and vector indexing.

Both the BM25 indexing (line 176) and vector indexing (lines 178-183) lack try/catch blocks. If either throws, the compression operation will fail even though the compressed observation was already persisted to KV at lines 170-174.

In remember.ts, the BM25 indexing is wrapped in try/catch (lines 107-114). Apply the same defensive pattern here to ensure indexing failures don't break the compression flow.

🛡️ Proposed fix

compressed, ); - getSearchIndex().add(compressed); - - await vectorIndexAddGuarded( - compressed.id, - compressed.sessionId, - compressed.title + " " + (compressed.narrative || ""), - { kind: "observation", logId: compressed.id }, - ); + try { + getSearchIndex().add(compressed); + } catch (err) { + logger.warn("Failed to index compressed observation into BM25", { + obsId: compressed.id, + error: err instanceof Error ? err.message : String(err), + }); + } + + try { + await vectorIndexAddGuarded( + compressed.id, + compressed.sessionId, + compressed.title + " " + (compressed.narrative || ""), + { kind: "observation", logId: compressed.id }, + ); + } catch (err) { + logger.warn("Failed to index compressed observation into vector index", { + obsId: compressed.id, + error: err instanceof Error ? err.message : String(err), + }); + } const streamResults = await Promise.allSettled([

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

getSearchIndex().add(compressed);

await vectorIndexAddGuarded(

compressed.id,

compressed.sessionId,

compressed.title + " " + (compressed.narrative || ""),

{ kind: "observation", logId: compressed.id },

);

try {

getSearchIndex().add(compressed);

} catch (err) {

logger.warn("Failed to index compressed observation into BM25", {

obsId: compressed.id,

error: err instanceof Error ? err.message : String(err),

});

}

try {

await vectorIndexAddGuarded(

compressed.id,

compressed.sessionId,

compressed.title + " " + (compressed.narrative || ""),

{ kind: "observation", logId: compressed.id },

);

} catch (err) {

logger.warn("Failed to index compressed observation into vector index", {

obsId: compressed.id,

error: err instanceof Error ? err.message : String(err),

});

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/compress.ts` around lines 176 - 183, The BM25 and vector indexing calls after persisting the compressed observation should be wrapped in defensive try/catch blocks so indexing failures don't abort the compression flow; specifically, surround getSearchIndex().add(compressed) in a try/catch (log the error and continue) and likewise wrap the vectorIndexAddGuarded(...) call in its own try/catch (log the error and continue), using the same pattern as the BM25 handling in remember.ts; include identifying info like compressed.id, compressed.sessionId and compressed.title in the log messages to aid debugging.

coderabbitai · 2026-05-13T11:31:21Z

+          await vectorIndexAddGuarded(
+            synthetic.id,
+            synthetic.sessionId,
+            synthetic.title + " " + (synthetic.narrative || ""),
+            { kind: "synthetic", logId: synthetic.id },
+          );


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap vectorIndexAddGuarded in try/catch to prevent synthetic compression failures.

The vector indexing call lacks error handling. If vectorIndexAddGuarded throws, the entire synthetic compression flow will fail, even though the observation was already persisted to KV at line 235 and indexed in BM25 at line 240.

This is the same pattern as in remember.ts and compress.ts. Wrap the call in try/catch with a warning to ensure embedding failures don't break observation capture.

🛡️ Proposed fix

getSearchIndex().add(synthetic); - await vectorIndexAddGuarded( - synthetic.id, - synthetic.sessionId, - synthetic.title + " " + (synthetic.narrative || ""), - { kind: "synthetic", logId: synthetic.id }, - ); + try { + await vectorIndexAddGuarded( + synthetic.id, + synthetic.sessionId, + synthetic.title + " " + (synthetic.narrative || ""), + { kind: "synthetic", logId: synthetic.id }, + ); + } catch (err) { + logger.warn("Failed to index synthetic observation into vector index", { + obsId: synthetic.id, + error: err instanceof Error ? err.message : String(err), + }); + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

await vectorIndexAddGuarded(

synthetic.id,

synthetic.sessionId,

synthetic.title + " " + (synthetic.narrative || ""),

{ kind: "synthetic", logId: synthetic.id },

);

getSearchIndex().add(synthetic);

try {

await vectorIndexAddGuarded(

synthetic.id,

synthetic.sessionId,

synthetic.title + " " + (synthetic.narrative || ""),

{ kind: "synthetic", logId: synthetic.id },

);

} catch (err) {

logger.warn("Failed to index synthetic observation into vector index", {

obsId: synthetic.id,

error: err instanceof Error ? err.message : String(err),

});

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/observe.ts` around lines 241 - 246, The call to vectorIndexAddGuarded(synthetic.id, synthetic.sessionId, synthetic.title + " " + (synthetic.narrative || ""), { kind: "synthetic", logId: synthetic.id }) should be wrapped in a try/catch so embedding/indexing errors don't abort the synthetic compression flow; catch any error and emit a warning log (similar to patterns in remember.ts and compress.ts) indicating the synthetic id and error, but do not rethrow—ensure the function continues since the observation was already persisted and BM25-indexed.

coderabbitai · 2026-05-13T11:31:21Z

+        await vectorIndexAddGuarded(
+          memory.id,
+          memory.sessionIds[0] ?? "memory",
+          memory.title + " " + memory.content,
+          { kind: "memory", logId: memory.id },
+        );


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap vectorIndexAddGuarded in try/catch for consistency with BM25 indexing.

The BM25 indexing at lines 107-114 uses try/catch to ensure indexing failures don't block the memory save. However, vectorIndexAddGuarded is awaited directly without error handling. If it throws, the entire mem::remember operation will fail even though the memory was already persisted to KV at line 100.

Per the PR summary: "embed failures soft-fail (index skip, upstream save still succeeds)". Wrap the vector indexing in try/catch with a warning, matching the BM25 pattern.

🛡️ Proposed fix

- await vectorIndexAddGuarded( - memory.id, - memory.sessionIds[0] ?? "memory", - memory.title + " " + memory.content, - { kind: "memory", logId: memory.id }, - ); + try { + await vectorIndexAddGuarded( + memory.id, + memory.sessionIds[0] ?? "memory", + memory.title + " " + memory.content, + { kind: "memory", logId: memory.id }, + ); + } catch (err) { + logger.warn("Failed to index saved memory into vector index", { + memId: memory.id, + error: err instanceof Error ? err.message : String(err), + }); + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

await vectorIndexAddGuarded(

memory.id,

memory.sessionIds[0] ?? "memory",

memory.title + " " + memory.content,

{ kind: "memory", logId: memory.id },

);

try {

await vectorIndexAddGuarded(

memory.id,

memory.sessionIds[0] ?? "memory",

memory.title + " " + memory.content,

{ kind: "memory", logId: memory.id },

);

} catch (err) {

logger.warn("Failed to index saved memory into vector index", {

memId: memory.id,

error: err instanceof Error ? err.message : String(err),

});

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/remember.ts` around lines 115 - 120, The vector indexing call using vectorIndexAddGuarded with arguments (memory.id, memory.sessionIds[0] ?? "memory", memory.title + " " + memory.content, { kind: "memory", logId: memory.id }) should be wrapped in a try/catch so indexing failures don't abort the overall mem::remember flow; catch any error and call processLogger.warn (or similar existing logger) with a concise message including memory.id and the error, mirroring the BM25 try/catch pattern used earlier for soft-failing index operations.

coderabbitai · 2026-05-13T11:31:21Z

+        store.get(scope)!.set(key, data);
+        return data;
+      },
+      delete: async (_scope: string, _key: string): Promise<void> => {},


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Align mockKV.delete with vector-index-populate.test.ts implementation.

The delete operation is a no-op here, while test/vector-index-populate.test.ts lines 26-27 implements actual deletion. For consistency and to avoid masking bugs if migrateVectorIndex later relies on deletion, use the same implementation across test files.

♻️ Align delete implementation

- delete: async (_scope: string, _key: string): Promise<void> => {}, + delete: async (scope: string, key: string): Promise<void> => { + store.get(scope)?.delete(key); + },

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

delete: async (_scope: string, _key: string): Promise<void> => {},

delete: async (scope: string, key: string): Promise<void> => {

store.get(scope)?.delete(key);

},

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/vector-index-dimensions.test.ts` at line 70, Replace the no-op mockKV.delete with a real deletion that removes the key from the same in-memory store used by the mock (so it mirrors the implementation in vector-index-populate.test.ts); implement delete as an async function matching the signature (delete: async (_scope: string, _key: string): Promise<void>) that looks up the underlying store and removes the entry for the provided key (ensuring awaits/Promise resolution if the store API is async) so tests relying on migrateVectorIndex or other mutation behavior observe actual deletions.

Five reviewer findings applied. Five skipped with reason: APPLIED: 1. migrate-vector-index.ts filter now requires m.content (and rejects empty/whitespace). Without this, undefined m.content would concat to "title undefined" and poison the embed input. Memory.content is declared string but #277-era data shapes had cases where it could be missing — defensive filter. 2. migrate-vector-index.ts success expression now derives from the failed counter (success: failed === 0). Previously hardcoded true so callers couldn't rely on the boolean alone. 3. search.ts rebuildIndex now calls vectorIndex?.clear() before the repopulation loops. Symmetric to the BM25 idx.clear() that was already there. Without this, memories deleted between runs leave orphan embeddings in the vector store after every rebuild. 4. test/vector-index-dimensions.test.ts assertion tightened from toBeGreaterThan(0) to toBe(1) for both totalProcessed and vectorSize. The seeded fixture has exactly one observation; loose assertion would hide regressions where the migration produced extra/duplicate vectors. 5. test/vector-index-populate.test.ts switched to beforeEach/afterEach for the setVectorIndex / setEmbeddingProvider lifecycle. Manual cleanup in each test body was easy to drift on additions. PLUS: 6. compress.ts BM25 add now wrapped in try/catch matching the remember.ts pattern for parity. Helps cross-file symmetry even though SearchIndex.add doesn't throw on valid input. SKIPPED (with reason): - Wrapping vectorIndexAddGuarded calls in additional try/catch at remember.ts / observe.ts / compress.ts: the helper has internal try/catch (src/functions/search.ts:67-89). It NEVER throws. Adding outer try/catch is dead code. The reviewer missed the helper's contract. - vector-index-dimensions.test.ts mockKV.delete -> real deletion: the tests in that file don't exercise deletion semantics, and migrateVectorIndex itself never calls kv.delete. Touching it would add noise without effect on any assertion. - recordAudit + Promise.allSettled in remember.ts: out of scope. recordAudit is already called earlier in the function. Parallelizing cascade-update with vector-add changes failure semantics (cascade- update is currently fire-and-forget) — separate perf PR if wanted. - validateDimensions(N) assertion in vector-index-dimensions.test.ts: vectorSize is the COUNT of stored vectors, not their dimensions. Adding a validateDimensions check on the result would require exposing the new index. Out of scope for assertion tightening. 886 / 886 tests pass. Build clean.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

src/functions/search.ts (1)
111-116: 🏗️ Heavy lift

Consider batching embedding calls during rebuild for better performance.

Currently, rebuildIndex calls vectorIndexAddGuarded once per memory and observation, resulting in sequential embed requests. Since EmbeddingProvider supports embedBatch (as seen in migrate-vector-index.ts), you could collect items, batch embed them, validate dimensions per-item, and then add successful embeddings to the index. This would significantly speed up rebuilds for larger datasets while maintaining the same soft-fail guarantees.

As per coding guidelines, "Use parallel operations where possible with Promise.all for independent kv writes/reads" — the same principle applies to independent embedding operations.

Also applies to: 151-156
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/search.ts` around lines 111 - 116, rebuildIndex currently calls
vectorIndexAddGuarded for each memory/observation sequentially which is slow;
refactor rebuildIndex to collect items (id, sessionId, text, metadata) into
batches, call EmbeddingProvider.embedBatch (see migrate-vector-index.ts for
pattern), validate each returned embedding's dimensionality individually, and
then call vectorIndexAddGuarded for only the successful embeddings (you can add
those to the index in parallel with Promise.all); ensure you preserve the
existing soft-fail behavior by skipping/ logging failed embeddings and keep
per-item metadata (e.g., { kind: "memory", logId: memory.id }) when adding to
the index.
src/functions/migrate-vector-index.ts (1)
32-39: ⚡ Quick win

Validate embedding dimensions before adding to index.

Unlike vectorIndexAddGuarded (which validates embedding.length === provider.dimensions before writing), migrateVectorIndex calls newIndex.add() directly without dimension checks. If newProvider.embedBatch returns embeddings with unexpected dimensions (due to provider misconfiguration or bugs), the index will be silently corrupted.

Add per-item dimension validation consistent with the defensive pattern in vectorIndexAddGuarded.
🛡️ Proposed validation for both phases

Memory phase:
       const embeddings = await newProvider.embedBatch(texts);
       for (let i = 0; i < textMems.length; i++) {
+        if (embeddings[i].length !== newProvider.dimensions) {
+          logger.warn("migrateVectorIndex: dimension mismatch for memory", {
+            memoryId: textMems[i].id,
+            expected: newProvider.dimensions,
+            received: embeddings[i].length,
+          });
+          continue;
+        }
         newIndex.add(
           textMems[i].id,
           textMems[i].sessionIds[0] ?? "memory",
           embeddings[i],
         );
         processed++;
       }
Observation phase (apply same pattern):
         const embeddings = await newProvider.embedBatch(texts);
         for (let i = 0; i < textObs.length; i++) {
+          if (embeddings[i].length !== newProvider.dimensions) {
+            logger.warn("migrateVectorIndex: dimension mismatch for observation", {
+              obsId: textObs[i].id,
+              expected: newProvider.dimensions,
+              received: embeddings[i].length,
+            });
+            continue;
+          }
           newIndex.add(textObs[i].id, textObs[i].sessionId, embeddings[i]);
           processed++;
         }
Also applies to: 60-64
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/migrate-vector-index.ts` around lines 32 - 39, The
migrateVectorIndex function is adding embeddings directly via newIndex.add after
calling newProvider.embedBatch, which risks corrupting the index if embedding
lengths are wrong; mimic the defensive check used in vectorIndexAddGuarded by
validating each embedding's length against newProvider.dimensions before calling
newIndex.add (for both the memory phase and the observation phase around the
other add calls), and on mismatch log/error and skip or handle that item
consistently with vectorIndexAddGuarded semantics.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/functions/migrate-vector-index.ts`:
- Around line 49-72: The current migrateVectorIndex re-embedding loop wraps all
sessions in one try/catch so a single kv.list or embed failure aborts remaining
sessions; change it to process sessions per-session (e.g., map sessions to async
tasks and use Promise.allSettled or iterate with a try/catch inside the for
loop) so each session's observation listing (KV.observations(session.id)),
embedding (newProvider.embedBatch), and index adds (newIndex.add) are handled
and failures are recorded per session; on each settled task increment processed
for successful observation adds and increment failed and collect the session id
for failed tasks, and log per-session errors rather than aborting the whole
migration.
- Around line 14-18: The parameter oldIndex on migrateVectorIndex (signature:
migrateVectorIndex(kv: StateKV, oldIndex: VectorIndex, newProvider:
EmbeddingProvider): Promise<MigrateVectorIndexResult>) is unused and confuses
the API; either remove oldIndex from the function signature and all callers (and
update any type annotations) or, if intended, document and actually use it
inside migrateVectorIndex to transform the provided VectorIndex. Ensure you
update all call sites to match the new signature and adjust imports/types
(StateKV, VectorIndex, EmbeddingProvider, MigrateVectorIndexResult) accordingly
to keep type-checking consistent.

In `@test/vector-index-populate.test.ts`:
- Line 60: The test declares a variable named vi which shadows the vitest import
(vi) and prevents use of vi.fn()/vi.spyOn(); rename the VectorIndex variable
(e.g., change the declaration "let vi: VectorIndex" to "let vectorIndex:
VectorIndex") and update every reference in this test file (setup, beforeEach,
assertions, calls to methods like populate or close) from vi -> vectorIndex so
the vitest vi import remains available for mocking and spying.

---

Nitpick comments:
In `@src/functions/migrate-vector-index.ts`:
- Around line 32-39: The migrateVectorIndex function is adding embeddings
directly via newIndex.add after calling newProvider.embedBatch, which risks
corrupting the index if embedding lengths are wrong; mimic the defensive check
used in vectorIndexAddGuarded by validating each embedding's length against
newProvider.dimensions before calling newIndex.add (for both the memory phase
and the observation phase around the other add calls), and on mismatch log/error
and skip or handle that item consistently with vectorIndexAddGuarded semantics.

In `@src/functions/search.ts`:
- Around line 111-116: rebuildIndex currently calls vectorIndexAddGuarded for
each memory/observation sequentially which is slow; refactor rebuildIndex to
collect items (id, sessionId, text, metadata) into batches, call
EmbeddingProvider.embedBatch (see migrate-vector-index.ts for pattern), validate
each returned embedding's dimensionality individually, and then call
vectorIndexAddGuarded for only the successful embeddings (you can add those to
the index in parallel with Promise.all); ensure you preserve the existing
soft-fail behavior by skipping/ logging failed embeddings and keep per-item
metadata (e.g., { kind: "memory", logId: memory.id }) when adding to the index.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8508e1bd-e6c1-4d86-b22a-76f95bfe68b5

📥 Commits

Reviewing files that changed from the base of the PR and between 6fd7906 and d2098a5.

📒 Files selected for processing (5)

src/functions/compress.ts
src/functions/migrate-vector-index.ts
src/functions/search.ts
test/vector-index-dimensions.test.ts
test/vector-index-populate.test.ts

…x param Four reviewer findings applied. One skipped with reason: APPLIED: 1. Per-session isolation in migrateVectorIndex observations phase. The previous shape wrapped the entire sessions for-loop in one try/catch — a single bad session (kv.list throws, embedBatch rejects, etc.) aborted every subsequent session and silently truncated the migration. The loop now has per-session try/catch; failures increment `failed`, append the session id to a new `failedSessions` array on the result, and the loop continues. The kv.list<sessions> call itself is now guarded separately — if THAT fails the whole migration aborts (no sessions to iterate), but with `failed` recorded and `failedSessions` empty so the caller can distinguish. 2. Drop `oldIndex` parameter from migrateVectorIndex signature. It was dead — no read path inside the function. Three test call sites updated to match the new signature migrateVectorIndex(kv, newProvider). Confused the API contract; if we ever DO need a previous-index argument it can come back with a real use case. 3. Dim guard on every embedding result inside migrateVectorIndex, in both the memories phase and the per-session observations phase. Extracted into a local isValidEmbedding() helper that mirrors the shape of search.ts::vectorIndexAddGuarded's guard: log structured warn + skip on mismatch. Without this, a misconfigured provider returning the wrong-length Float32Array would corrupt the rebuilt index, defeating the #248 persistence-load guard. 4. Rename `vi` -> `vectorIndex` in test/vector-index-populate.test.ts so the let binding doesn't shadow the vitest `vi` import. Practical impact: future tests in that file can use `vi.fn()` / `vi.spyOn()` without renaming this binding first. Cosmetic but cheap. SKIPPED (with reason): - Use embedBatch in rebuildIndex (search.ts): perf optimization, not correctness. Already called out as deferred in the previous review's commit message. Cold-restart speed matters but the fix here is bigger refactor (collect-into-batches with metadata, validate each, add only successful) and worth its own PR. Will pick up when we also do the write-behind queue for live-write batching. Result envelope: Added `failedSessions: string[]` to MigrateVectorIndexResult so the caller can decide whether to retry per-session or just abort. Existing fields (success, totalProcessed, failed, vectorSize) unchanged in shape. 886 / 886 tests pass. Build clean.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/functions/migrate-vector-index.ts (1)
89-98: 💤 Low value

Consider distinguishing list-failure from per-session failure in the result.

When kv.list(KV.sessions) rejects, the early return sets failed = 1 and leaves failedSessions empty, which is technically consistent (no specific session was known to fail) but makes the partial result indistinguishable from a single mismatched embedding earlier in the memories phase. A dedicated flag or non-empty diagnostic in failedSessions (e.g., "<list-error>") would make recovery decisions in the caller more deterministic. Optional polish — current behavior is safe.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/migrate-vector-index.ts` around lines 89 - 98, The
kv.list(KV.sessions) failure currently returns failed = 1 with an empty
failedSessions array, making it indistinguishable from a per-session failure;
update the migrateVectorIndex error path (around sessions = await kv.list<{ id:
string }>(KV.sessions)) to record a distinct diagnostic before returning — for
example, push a marker string like "<list-error>" into failedSessions (and keep
failed++), or alternatively add a boolean flag (e.g., listFailed: true) to the
returned object; ensure you update the return to include this diagnostic so
callers can tell apart a listing failure from individual session failures.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/functions/migrate-vector-index.ts`:
- Around line 76-81: The catch block inside migrateVectorIndex currently
increments failed by 1 even when operations like kv.list or
newProvider.embedBatch abort an entire batch; change the error handling in that
catch to add the actual number of affected memories (e.g., use textMems.length
when textMems is in scope or compute the batch size from the variable holding
the listed/processed items) to failed instead of 1, and ensure any per-session
bookkeeping that relies on success/failed counts (the same counters used
elsewhere in migrateVectorIndex) is updated consistently so callers see the
correct number of missed embeddings.

---

Nitpick comments:
In `@src/functions/migrate-vector-index.ts`:
- Around line 89-98: The kv.list(KV.sessions) failure currently returns failed =
1 with an empty failedSessions array, making it indistinguishable from a
per-session failure; update the migrateVectorIndex error path (around sessions =
await kv.list<{ id: string }>(KV.sessions)) to record a distinct diagnostic
before returning — for example, push a marker string like "<list-error>" into
failedSessions (and keep failed++), or alternatively add a boolean flag (e.g.,
listFailed: true) to the returned object; ensure you update the return to
include this diagnostic so callers can tell apart a listing failure from
individual session failures.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 74110c0e-b61b-4d05-88f6-e94d97570ae3

📥 Commits

Reviewing files that changed from the base of the PR and between d2098a5 and 7985cfa.

📒 Files selected for processing (3)

src/functions/migrate-vector-index.ts
test/vector-index-dimensions.test.ts
test/vector-index-populate.test.ts

🚧 Files skipped from review as they are similar to previous changes (2)

test/vector-index-dimensions.test.ts
test/vector-index-populate.test.ts

…nguish list-fail Two new reviewer findings on migrate-vector-index. Six earlier findings re-flagged in the same review block had already been addressed in prior commits on this branch (verified inline): - compress.ts BM25 try/catch (commit d2098a5) - migrate filter requires m.content (commit d2098a5) - migrate success: failed === 0 (commit d2098a5) - rebuildIndex vector clear (commit d2098a5) - observe.ts / remember.ts outer try/catch — SKIPPED with reason (vectorIndexAddGuarded has internal try/catch, NEVER throws) - mockKV.delete no-op — SKIPPED with reason (no test exercises it) NEW FINDINGS APPLIED: 1. Catch block in memories phase counted +1 for the entire batch regardless of batch size. If kv.list returned 250 memories and embedBatch threw, `failed` reflected 1 missed embedding when reality is 250. textMems is now declared outside the try so the catch can read its length. If kv.list itself threw before textMems was populated (length 0), we fall back to +1 (something failed but the batch size is unknown). If embedBatch threw, `failed += textMems.length` so the counter is honest. 2. kv.list<sessions> failure path returned `failed: 1, failedSessions: []`. Indistinguishable from "0 sessions, all OK" from the caller's perspective — they can't tell whether the enumeration itself blew up or simply found no sessions. The path now pushes a marker string "<sessions-list-failed>" into failedSessions before returning. Marker keeps the existing schema (failedSessions: string[]) — no boolean flag added — and gives callers a sentinel they can grep for. Per-session entries continue to use real session ids, so the marker is unambiguous. 886 / 886 tests pass. Build clean.

The two regression tests added in this branch for the BM25 non-ASCII tokenizer fix used Russian-language fixtures ("Проверка памяти", "Тестируем поиск по кириллице", etc). Switching to Greek-language equivalents ("Προβολή μνήμης", "Δοκιμάζουμε αναζήτηση σε ελληνικά", etc). Reasoning: the test fixtures need to exercise the same code path — non-ASCII Unicode letters that the old ASCII-only \w regex stripped. Greek code points (U+0370..U+03FF) hit \p{L} identically to Cyrillic (U+0400..U+04FF), so the regression coverage is preserved. Greek is also less likely to be misread as politically loaded content in a public repo. Test ids renamed from obs_cyrillic to obs_greek; test names from "indexes and finds Cyrillic text" to "indexes and finds non-ASCII (Greek) text" (and the mixed-script variant likewise). Branch name stays as-is to keep PR continuity but the shipped test fixtures are language-neutral going forward. 11/11 search-index tests pass with the swap. Full suite 886/886.

…ardening Three landed PRs since v0.9.11: - #327 (#295) — BM25 tokenizer now accepts non-ASCII (Greek, accented Latin, Hebrew, Arabic, Cyrillic), VectorIndex.add now actually called at runtime via vectorIndexAddGuarded helper with dim guard + input clip, migrateVectorIndex for dim migrations. - #326 (#277) — RetentionScore type no longer declares source twice; JSDoc back-compat note no longer shadowed. - #335 (#323) — viewer drops Google Fonts <link> (CSP-blocked), loadDashboard now surfaces load errors inline instead of sticking on "Loading dashboard…". Bumping 0.9.11 -> 0.9.12 across the 9 standard files: - package.json - packages/mcp/package.json - plugin/.claude-plugin/plugin.json - plugin/.codex-plugin/plugin.json - src/version.ts - src/types.ts (ExportData.version literal) - src/functions/export-import.ts (supportedVersions) - test/export-import.test.ts (round-trip expectation) - CHANGELOG.md (new 0.9.12 entry) 886 / 886 tests pass. Build clean.

…ardening (#337) Three landed PRs since v0.9.11: - #327 (#295) — BM25 tokenizer now accepts non-ASCII (Greek, accented Latin, Hebrew, Arabic, Cyrillic), VectorIndex.add now actually called at runtime via vectorIndexAddGuarded helper with dim guard + input clip, migrateVectorIndex for dim migrations. - #326 (#277) — RetentionScore type no longer declares source twice; JSDoc back-compat note no longer shadowed. - #335 (#323) — viewer drops Google Fonts <link> (CSP-blocked), loadDashboard now surfaces load errors inline instead of sticking on "Loading dashboard…". Bumping 0.9.11 -> 0.9.12 across the 9 standard files: - package.json - packages/mcp/package.json - plugin/.claude-plugin/plugin.json - plugin/.codex-plugin/plugin.json - src/version.ts - src/types.ts (ExportData.version literal) - src/functions/export-import.ts (supportedVersions) - test/export-import.test.ts (round-trip expectation) - CHANGELOG.md (new 0.9.12 entry) 886 / 886 tests pass. Build clean.

nik1t7n and others added 6 commits May 13, 2026 12:02

fix #2: BM25 tokenizer now preserves Cyrillic/Unicode via \p{L}\p{N} …

c66ad22

…regex Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

fix #1: add getVectorIndex/setVectorIndex + vector embedding in rebui…

52d8279

…ldIndex Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

fix #1+#3: vectorIndex.add in remember/observe/compress + setVectorIn…

acb0710

…dex in index.ts Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

chore: remove internal research spec from PR branch

b6b3546

Signed-off-by: Nikita Nosov <20nik.nosov21@gmail.com>

rohitg00 mentioned this pull request May 13, 2026

fix: BM25 tokenizer strips non-ASCII text; vector index never populated on write #296

Closed

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

vercel Bot deployed to Preview May 13, 2026 11:51 View deployment

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

Comment thread src/functions/migrate-vector-index.ts

Comment thread src/functions/migrate-vector-index.ts Outdated

Comment thread test/vector-index-populate.test.ts Outdated

vercel Bot deployed to Preview May 13, 2026 13:10 View deployment

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

Comment thread src/functions/migrate-vector-index.ts

vercel Bot deployed to Preview May 13, 2026 13:41 View deployment

vercel Bot deployed to Preview May 13, 2026 14:23 View deployment

rohitg00 merged commit 24f6c5f into main May 13, 2026
5 checks passed

rohitg00 deleted the fix/bm25-cyrillic-vector-index branch May 13, 2026 14:27

rohitg00 mentioned this pull request May 13, 2026

chore(release): v0.9.12 — BM25 unicode + vector live-write + viewer hardening #337

Merged

5 tasks

This was referenced May 13, 2026

CJK tokenizer for BM25 (Chinese / Japanese / Korean segmentation) #344

Open

feat(search): CJK tokenizer for BM25 #362

Open

-      delete: async (_scope: string, _key: string): Promise<void> => {},
+      delete: async (scope: string, key: string): Promise<void> => {
+        store.get(scope)?.delete(key);
+      },

Conversation

rohitg00 commented May 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Bugs fixed (from #295)

Bug 1 — BM25 tokenizer strips non-ASCII (Cyrillic, Greek, CJK, Arabic, accented Latin)

Bug 2 — VectorIndex.add() never called at runtime

Improvements layered on top of #296

Out-of-scope (deferred follow-ups)

Migration

Test plan

Summary by CodeRabbit

Uh oh!

vercel Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rohitg00 commented May 13, 2026 •

edited by coderabbitai Bot

Loading

Bug 2 — `VectorIndex.add()` never called at runtime

vercel Bot commented May 13, 2026 •

edited

Loading

coderabbitai Bot commented May 13, 2026 •

edited

Loading