fix: clear deleted memories from bm25 and vector indices#636
Conversation
|
@abhinav-m22 is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughThis PR ensures deleted memories and observations are removed from both BM25 and vector search indexes across all deletion paths, with persistence synchronization to prevent deleted entries from resurfacing on restart. ChangesSearch Index Deletion and Persistence
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/functions/governance.ts (1)
22-31: ⚡ Quick winParallelize per-id delete work in
mem::governance-delete.This loop does independent KV/index operations serially, which adds avoidable latency for larger
memoryIdspayloads. ConsiderPromise.allSettled(or batched chunks) per request, then computedeletedfrom fulfilled results.As per coding guidelines, "Use parallel operations with
Promise.all()for independent kv writes/reads".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/governance.ts` around lines 22 - 31, The serial per-id delete loop over data.memoryIds (using kv.get, kv.delete, deleteAccessLog, getSearchIndex().remove, vectorIndexRemove) causes unnecessary latency; refactor the delete logic in the mem::governance-delete handler to run independent per-id operations in parallel (e.g., map memoryIds to async tasks and use Promise.allSettled or chunked Promise.all for large lists), ensure each task performs the existing sequence (get, conditional delete, deleteAccessLog, remove from search and vector index), and compute the deleted count from the settled results by counting fulfilled tasks that actually deleted an item rather than incrementing a shared counter inside the serial loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/auto-forget.test.ts`:
- Around line 1-13: The test is missing a mock for the iii-sdk; add
vi.mock("iii-sdk") at the top of test/auto-forget.test.ts and return an object
exposing the mocked SDK shape used by the code under test (e.g., sdk: { trigger:
vi.fn() } and kv: { get: vi.fn(), set: vi.fn(), list: vi.fn() }) so the calls in
registerAutoForgetFunction and any code that imports iii-sdk (e.g., code paths
exercised by getSearchIndex, setIndexPersistence, memoryToObservation) have the
expected mocked methods available.
In `@test/retention.test.ts`:
- Around line 1-6: Add a module mock for "iii-sdk" in the test suite by calling
vi.mock("iii-sdk") at the top of test/retention.test.ts and provide mocked
implementations for sdk.trigger and the kv API (kv.get, kv.set, kv.list) so
tests use the module mock instead of the local SDK shim; specifically, export
from the mock an object with trigger (a vi.fn()) and kv (an object with get,
set, list as vi.fn()s or simple in-memory implementations) so tests that call
getSearchIndex, setIndexPersistence, or memoryToObservation interact with the
mocked iii-sdk functions.
---
Nitpick comments:
In `@src/functions/governance.ts`:
- Around line 22-31: The serial per-id delete loop over data.memoryIds (using
kv.get, kv.delete, deleteAccessLog, getSearchIndex().remove, vectorIndexRemove)
causes unnecessary latency; refactor the delete logic in the
mem::governance-delete handler to run independent per-id operations in parallel
(e.g., map memoryIds to async tasks and use Promise.allSettled or chunked
Promise.all for large lists), ensure each task performs the existing sequence
(get, conditional delete, deleteAccessLog, remove from search and vector index),
and compute the deleted count from the settled results by counting fulfilled
tasks that actually deleted an item rather than incrementing a shared counter
inside the serial loop.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e2b89055-a816-4e11-8655-770358a09535
📒 Files selected for processing (12)
src/functions/auto-forget.tssrc/functions/governance.tssrc/functions/remember.tssrc/functions/retention.tssrc/functions/search.tssrc/index.tssrc/state/search-index.tstest/auto-forget.test.tstest/governance.test.tstest/remember-forget-audit.test.tstest/retention.test.tstest/search-index.test.ts
|
Regarding the review points: Verified each finding against current code: both |
|
Hi @rohitg00, could you please take a look and review the changes? |
There was a problem hiding this comment.
LGTM. Real bug, clean fix, complete test coverage.
Verified
Issue is legit. Reproed #632 against current main — deleted memories still occupy BM25 result slots. SearchIndex.remove() did not exist; VectorIndex.remove() existed but was orphaned (zero callers).
Fix is minimal and complete. Five delete paths fixed: governance.delete-memory, governance.bulk-delete, forget, auto-forget (TTL + low-value), retention.evict. All now call getSearchIndex().remove(id) + vectorIndexRemove(id).
SearchIndex.remove() tears down every actual data structure:
entriesmap — deleteddocTermCountsmap — deletedinvertedIndexposting lists — id pulled from every term, empty terms removedtotalDocLength—entry.termCountsubtracted (clamped at 0)sortedTermscache — invalidated
(PR description mentions a "prefix cache" — there isn't one in the codebase, just inline prefix-match scoring. Doesn't matter, no field was missed.)
Persistence flush is well-designed.
flushIndexSave()is a sync variant of the debouncedscheduleSave— appropriate for delete paths since adds are chatty but deletes are infrequentsetIndexPersistence(p)wiring keeps unit tests independent (no-op until wired)IndexPersistence.save()already catches its own errors vialogFailure()— callers can't crash on flush failure, which is the right tradeoff because the KV delete already committed before flush is invoked
Test coverage is thorough. 5 new test blocks across 5 files covering every delete path, dryRun no-op, persistence flush assertion via mock, plus a dedicated SearchIndex.remove unit-level test for idempotency + term cleanup.
Verified locally
git fetch origin pull/636/head:pr-636 && git checkout pr-636npm test— 1114/1114 passnpm test -- test/auto-forget.test.ts test/governance.test.ts test/remember-forget-audit.test.ts test/retention.test.ts test/search-index.test.ts— 59/59 passnpm run build— 21 files, 2432 KB, clean
Minor (non-blocking)
flushIndexSave() is not internally serialized — two concurrent delete-handler invocations could race two save() calls. Both write the same serialized form so the worst case is one redundant KV write of a slightly stale snapshot. Not a correctness issue, and serializing it would add complexity for negligible win on a delete-shaped workload. Leaving as-is is the right call.
Ship it.
Deleted memories used to stay in the BM25 and vector indices, so they kept occupying result slots and pushed live memories out of limit capped searches. This PR closes the leak across every delete path and makes the cleanup survive a hard process exit.
Summary by CodeRabbit
Bug Fixes
Tests