
fix: comprehensive security and reliability audit #6

Merged
rohitg00 merged 3 commits into main from audit-fixes
Feb 27, 2026

Conversation

rohitg00 (Owner) commented Feb 27, 2026

Summary

Fixes all 17 findings from a comprehensive security, code quality, and reliability audit of agentmemory v0.3.0.

Critical (5)

  • Auth on unprotected endpoints — Added checkAuth to api::health, api::sessions, api::observations, and mcp::tools::list
  • Path traversal — Removed ~/.claude from migration ALLOWED_DIRS (exposes plugin credentials)
  • Import DoS — Added payload size limits: 10K sessions, 50K memories, 10K summaries, 5K obs/session
  • Index persistence bug — load() created new instances but never restored into active singletons; added restoreFrom() to SearchIndex and VectorIndex
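The restoreFrom() fix can be illustrated with a minimal sketch — not the repository's actual code. Only the SearchIndex and restoreFrom() names come from the PR; the internal fields (entries, totalDocLength) and the loadPersisted() helper are assumptions made for the example. The point is that deserialization builds a fresh instance, so its state must be copied into the live singleton that other modules already reference.

```typescript
// Sketch of the restoreFrom() pattern: instead of replacing the module-level
// singleton (other modules may already hold a reference to it), copy persisted
// state into the existing instance. Field names are illustrative assumptions.
class SearchIndex {
  private entries = new Map<string, string>();
  private totalDocLength = 0;

  add(id: string, text: string): void {
    this.entries.set(id, text);
    this.totalDocLength += text.length;
  }

  get size(): number {
    return this.entries.size;
  }

  // Copy (not alias) state from a freshly deserialized index.
  restoreFrom(other: SearchIndex): void {
    this.entries = new Map(other.entries);
    this.totalDocLength = other.totalDocLength;
  }
}

// The active singleton that the rest of the app imports.
const searchIndex = new SearchIndex();

// On startup: deserialize into a temporary instance, then restore into the
// live singleton so existing references see the loaded data.
function loadPersisted(serialized: Array<[string, string]>): void {
  const loaded = new SearchIndex();
  for (const [id, text] of serialized) loaded.add(id, text);
  searchIndex.restoreFrom(loaded);
}
```

Without the restore step, the bug described above occurs: load() populates an index nobody reads, and the singleton stays empty.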

High (5)

  • Unbounded expandIds — Capped to 20 in smart-search and MCP
  • BFS explosion — Capped to 500 visited nodes and 5 max hops; fetch relations once before loop (was O(N²) KV reads per BFS iteration)
  • Observe validation — Added input validation for sessionId, hookType, timestamp
  • Error leakage — Sanitized provider errors in compress and migrate (return generic messages)
  • CSP hardening — Replaced 'unsafe-inline' with 'self' for scripts/styles
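The capped BFS from the High findings can be sketched roughly as below. The MAX_VISITED (500) and hop cap (5) values come from the PR description; the Relation shape, the traverse() name, and the adjacency-map construction are assumptions for illustration. The key change is that relations are fetched once before the loop, so each hop is in-memory map lookups instead of O(N²) KV reads.

```typescript
// Sketch of the bounded BFS over memory relations (shapes are assumptions).
interface Relation {
  from: string;
  to: string;
}

const MAX_VISITED = 500; // cap on total visited nodes, per the PR
const MAX_HOPS = 5;      // cap on traversal depth, per the PR

function traverse(start: string, relations: Relation[], maxHops: number): Set<string> {
  const hops = Math.min(maxHops, MAX_HOPS);

  // Build an adjacency map ONCE, before the loop (this replaces the
  // per-iteration relation fetch that caused O(N²) KV reads).
  const adjacency = new Map<string, string[]>();
  for (const r of relations) {
    const list = adjacency.get(r.from) ?? [];
    list.push(r.to);
    adjacency.set(r.from, list);
  }

  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let hop = 0; hop < hops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of adjacency.get(node) ?? []) {
        if (visited.size >= MAX_VISITED) return visited; // hard size cap
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return visited;
}
```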

Medium/Low (7)

  • O(n²) contradiction cap — Limited auto-forget to 1000 latest memories
  • Secret detection — Added patterns for Anthropic (sk-ant-*), GitHub PAT (github_pat_*), Gemini (AIza*)
  • Input validation — Validate files/concepts arrays in remember
  • Corrupt data resilience — Wrapped deserialize() in try/catch for SearchIndex and VectorIndex
  • Config NaN safety — Added fallback for parseInt NaN in config
  • Dead code — Removed unused ObservationQueue class
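The new secret-detection patterns might look like the sketch below. The key prefixes (sk-ant-, github_pat_, AIza) come from the PR description; the exact character classes and length bounds here are approximations, not the repository's actual regexes.

```typescript
// Approximate secret-detection patterns (lengths/char classes are guesses).
const SECRET_PATTERNS: Array<{ name: string; pattern: RegExp }> = [
  { name: "anthropic_api_key", pattern: /sk-ant-[A-Za-z0-9_-]{20,}/ },
  { name: "github_pat", pattern: /github_pat_[A-Za-z0-9_]{20,}/ },
  { name: "google_api_key", pattern: /AIza[A-Za-z0-9_-]{35}/ },
];

// Return the names of all pattern families found in the given text.
function detectSecrets(text: string): string[] {
  return SECRET_PATTERNS.filter((p) => p.pattern.test(text)).map((p) => p.name);
}
```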

Test plan

  • npm run build — clean (130KB bundle)
  • npx tsc --noEmit — zero type errors
  • npx vitest run — 144/144 tests pass (14 integration skipped, requires running server)

Summary by CodeRabbit

  • New Features

    • Added a liveness HTTP endpoint (/agentmemory/livez) for simple health checks.
    • Expanded import validation with strict quotas and shape checks.
  • Security

    • Enforced authentication on health, sessions, observations and related endpoints.
    • Tightened content security policy.
  • Bug Fixes

    • Better input validation and standardized error responses.
    • More robust configuration parsing with sensible defaults.
  • Performance

    • Caps on memory processing and search expansion, with a truncation indicator returned.

Critical:
- Add auth to health, sessions, observations, MCP tools list endpoints
- Remove ~/.claude from migration ALLOWED_DIRS (path traversal risk)
- Add import payload size limits (10K sessions, 50K memories, 5K obs/session)
- Fix index persistence: loaded indices now restore into active singletons

High:
- Cap expandIds to 20, BFS traversal to 500 nodes / 5 hops
- Fetch relations once before BFS loop (was O(N²) KV reads)
- Add observe input validation (sessionId, hookType, timestamp)
- Sanitize provider error messages in compress and migrate
- Tighten CSP: remove unsafe-inline

Medium/Low:
- Cap auto-forget contradiction comparison to 1000 memories
- Add secret patterns for Anthropic, GitHub PAT, Gemini keys
- Validate files/concepts arrays in remember
- Wrap deserialize() in try/catch for corrupt data resilience
- Fix parseInt NaN fallback in config
- Remove dead code (ObservationQueue)
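The import payload limits listed in the commit message above can be sketched as a pre-flight validation step. The quotas (10K sessions, 50K memories, 10K summaries, 5K observations per session) come from the PR description; the validateImport() name and the payload field names are assumptions for this example.

```typescript
// Sketch of the import pre-flight checks: validate shapes first so later
// loops cannot throw, then enforce quotas before any KV writes.
const MAX_SESSIONS = 10_000;
const MAX_MEMORIES = 50_000;
const MAX_SUMMARIES = 10_000;
const MAX_OBS_PER_SESSION = 5_000;

interface ImportResult {
  success: boolean;
  error?: string;
}

function validateImport(importData: any): ImportResult {
  // Shape checks: malformed payloads get a clear error, not a crash.
  for (const key of ["sessions", "memories", "summaries"]) {
    if (!Array.isArray(importData?.[key])) {
      return { success: false, error: `${key} must be an array` };
    }
  }
  if (typeof importData?.observations !== "object" || importData.observations === null) {
    return { success: false, error: "observations must be an object" };
  }

  // Quota checks: reject oversized imports early (import DoS mitigation).
  if (importData.sessions.length > MAX_SESSIONS)
    return { success: false, error: `Too many sessions (max ${MAX_SESSIONS})` };
  if (importData.memories.length > MAX_MEMORIES)
    return { success: false, error: `Too many memories (max ${MAX_MEMORIES})` };
  if (importData.summaries.length > MAX_SUMMARIES)
    return { success: false, error: `Too many summaries (max ${MAX_SUMMARIES})` };
  for (const obs of Object.values(importData.observations)) {
    if (!Array.isArray(obs))
      return { success: false, error: "each observations entry must be an array" };
    if (obs.length > MAX_OBS_PER_SESSION)
      return { success: false, error: `Too many observations per session (max ${MAX_OBS_PER_SESSION})` };
  }
  return { success: true };
}
```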

coderabbitai Bot commented Feb 27, 2026

Warning

Rate limit exceeded

@rohitg00 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 4 minutes and 23 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 7d29590 and 8c66484.

📒 Files selected for processing (4)
  • src/functions/export-import.ts
  • src/functions/smart-search.ts
  • src/state/search-index.ts
  • src/state/vector-index.ts
📝 Walkthrough

This PR adds input validation, authentication checks, and operational caps across multiple functions and API endpoints; introduces tolerant index deserialization and restore helpers; tightens secret detection and CSP; standardizes some error messages; removes an unused queue; and adjusts configuration parsing to use safe fallbacks.

Changes

Cohort / File(s) Summary
Configuration & Startup
src/config.ts, src/index.ts
Added safe integer parsing with fallbacks for env values; index restoration now restores BM25 and vector indices from persisted data and decides rebuilds based on actual index size.
Input Validation & Import
src/functions/observe.ts, src/functions/remember.ts, src/functions/export-import.ts
Added early-return validations: observe requires string sessionId/hookType/timestamp; remember requires non-empty string content and arrays for files/concepts; import validates shapes and enforces quotas (sessions, memories, summaries, per-session and total observations).
Processing Limits & Traversal
src/functions/auto-forget.ts, src/functions/relations.ts, src/functions/smart-search.ts
Capped workloads: latestMemories limited to 1000 items; relation traversal limited to maxHops ≤ 5 and MAX_VISITED = 500 with preloaded relations; smart-search expand limited to 20 IDs and reports truncation.
Error Standardization & Migration
src/functions/compress.ts, src/functions/migrate.ts
Standardized error return strings (e.g., compression_failed, Migration failed) in catch paths, replacing raw error messages.
Privacy & Secrets
src/functions/privacy.ts
Added secret-detection regexes for Anthropic, GitHub PATs, and Google API keys; minor formatting/consistency updates in privacy registration.
State Indices
src/state/search-index.ts, src/state/vector-index.ts
Added restoreFrom(other) methods; made deserialize() tolerant (try/catch, validation) and return empty indices on failure.
API & Auth
src/triggers/api.ts, src/mcp/server.ts
Added authentication checks to multiple endpoints (health, sessions, observations, mcp::tools::list/call), changed handler signatures to accept req: ApiRequest, added new liveness route /agentmemory/livez, and tightened VIEWER_CSP.
Cleanup
src/health/recovery.ts
Removed ObservationQueue export and associated queue implementation.
Minor
src/functions/export-import.ts, src/functions/auto-forget.ts
Pre-flight import validation and deterministic recent-memory sorting (by createdAt) with a slice cap.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant API as "api::health / api::sessions / api::observations"
  participant Auth as "checkAuth"
  participant KV as "KV / Index Store"
  participant Service as "Health Logic / Index Restore"

  Client->>API: HTTP request (includes auth header)
  API->>Auth: checkAuth(req, secret)
  alt auth fails
    Auth-->>API: unauthorized
    API-->>Client: 401 Unauthorized
  else auth succeeds
    Auth-->>API: authorized
    API->>KV: fetch persisted index data
    KV-->>API: persisted BM25/vector payloads
    API->>Service: restore indices if present
    Service-->>API: restored sizes / health payload
    API-->>Client: 200 OK with health/session/observation data
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes


Poem

🐰 I hopped through code with careful paws,
Added checks and capped the cause,
Restored lost indices from their beds,
Sniffed out secrets, kept them shed,
A tidy burrow, safe and small—hop, deploy, applause! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'fix: comprehensive security and reliability audit' accurately summarizes the PR, which addresses 17 audit findings across authentication, input validation, error handling, and performance.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai (Bot) left a comment


Actionable comments posted: 4

🧹 Nitpick comments (6)
src/config.ts (1)

74-79: Good NaN fallback pattern, but note that explicit 0 values are also overridden.

The || default pattern correctly handles NaN from invalid input. However, if someone intentionally sets TOKEN_BUDGET=0 or MAX_OBS_PER_SESSION=0, the fallback would override their intent since 0 is falsy.

For ports (3111/3112), this is fine—port 0 isn't a valid binding target. For tokenBudget and maxObservationsPerSession, consider whether 0 is a valid value. If not, this is correct. If 0 should be allowed, you could use Number.isNaN() instead:

♻️ Optional: Allow explicit zero values
-    tokenBudget: parseInt(env["TOKEN_BUDGET"] || "2000", 10) || 2000,
+    tokenBudget: (() => {
+      const v = parseInt(env["TOKEN_BUDGET"] || "2000", 10);
+      return Number.isNaN(v) ? 2000 : v;
+    })(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config.ts` around lines 74 - 79, The current use of the `...parseInt(...)
|| default` pattern for tokenBudget and maxObservationsPerSession
unintentionally overrides explicit zero values; update the parsing for
`tokenBudget` and `maxObservationsPerSession` to first parse the env var into a
number and then use `Number.isNaN(parsed) ? default : parsed` so only NaN falls
back to the default (leave restPort/streamsPort as-is since port 0 is not
valid). Locate and change the expressions that set `tokenBudget` and
`maxObservationsPerSession` in the config (the `tokenBudget` and
`maxObservationsPerSession` property initializers) to this NaN-check pattern.
src/functions/auto-forget.ts (1)

55-57: Good performance cap, but ordering may not guarantee "latest" memories.

The slice limits O(n²) comparisons to ~500K operations, which is a solid performance safeguard. However, kv.list() doesn't guarantee ordering, so this captures the first 1000 filtered memories—not necessarily the 1000 most recent by createdAt.

If the intent is truly to compare the latest memories, consider sorting by createdAt descending before slicing:

♻️ Optional: Sort by creation date
 const latestMemories = memories
   .filter((m) => m.isLatest !== false && !deletedIds.has(m.id))
+  .sort((a, b) => new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime())
   .slice(0, 1000);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/auto-forget.ts` around lines 55 - 57, The current
latestMemories selection uses memories.filter(...).slice(0, 1000) but kv.list()
is unordered so slice may not return the 1000 most recent; update the pipeline
that builds latestMemories to sort the filtered memories by their createdAt
(descending) before slicing so you truly keep the newest items (use a comparator
on m.createdAt, handling missing/ISO vs numeric timestamps consistently) and
still respect the deletedIds filter; adjust the code that constructs
latestMemories (referencing memories, latestMemories, createdAt, deletedIds, and
kv.list()) to perform this sort-then-slice.
src/functions/smart-search.ts (1)

25-48: Good expansion limit, consider signaling truncation to callers.

Capping to 20 IDs is a reasonable DoS safeguard. However, if a caller provides more than 20 IDs, they receive results for only the first 20 without any indication that truncation occurred. Consider adding a flag or count to the response:

♻️ Optional: Signal truncation
       ctx.logger.info("Smart search expanded", {
         requested: ids.length,
         found: expanded.length,
       });
-      return { mode: "expanded", results: expanded };
+      return {
+        mode: "expanded",
+        results: expanded,
+        truncated: data.expandIds.length > 20,
+        originalCount: data.expandIds.length,
+      };
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/smart-search.ts` around lines 25 - 48, The expansion branch
truncates data.expandIds to 20 (ids) and returns expanded without signaling
truncation; modify the return from the expansion branch in smart-search (the
block using data.expandIds, ids, expanded and findObservation) to include a
truncation indicator such as truncated: true/false and/or
requestedCount/returnedCount (e.g., requested: data.expandIds.length, returned:
expanded.length, truncated: data.expandIds.length > ids.length) so callers can
detect when results were limited.
src/state/vector-index.ts (1)

62-64: Avoid aliasing internal map state in restoreFrom.

This shares the same Map instance across both indexes; future mutations in either instance can leak across unexpectedly.

Proposed refactor
   restoreFrom(other: VectorIndex): void {
-    this.vectors = (other as any).vectors;
+    this.vectors = new Map(other.vectors);
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/state/vector-index.ts` around lines 62 - 64, The restoreFrom method in
VectorIndex currently aliases the other.vectors Map causing shared mutable
state; change restoreFrom(VectorIndex) to copy entries into a new Map for
this.vectors instead of assigning the same reference (e.g., create a new Map and
populate it from (other as any).vectors via Map.from/constructor or by iterating
entries) so subsequent mutations on either VectorIndex do not leak across.
src/triggers/api.ts (1)

42-45: Consider a separate unauthenticated liveness endpoint.

Requiring auth for /agentmemory/health can break default orchestrator probes. A minimal unauthenticated liveness endpoint plus authenticated diagnostics endpoint is usually safer operationally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/triggers/api.ts` around lines 42 - 45, Add a separate unauthenticated
liveness handler before the global auth check: detect req.path (or req.url) ===
'/agentmemory/health' inside the async (req: ApiRequest): Promise<Response> => {
... } and return a simple 200/OK Response without calling checkAuth(secret);
keep the existing authenticated flow (checkAuth(req, secret)) for the
diagnostics endpoint(s) so sensitive info remains protected (reference
checkAuth, ApiRequest, and the '/agentmemory/health' path to locate where to
insert the early-return).
src/state/search-index.ts (1)

120-125: Use copy semantics in restoreFrom to avoid shared mutable state.

Current assignments alias internal maps between instances. Copying avoids accidental cross-instance mutation coupling.

Proposed refactor
   restoreFrom(other: SearchIndex): void {
-    this.entries = other.entries;
-    this.invertedIndex = other.invertedIndex;
-    this.docTermCounts = other.docTermCounts;
+    this.entries = new Map(other.entries);
+    this.invertedIndex = new Map(
+      Array.from(other.invertedIndex.entries(), ([term, ids]) => [
+        term,
+        new Set(ids),
+      ]),
+    );
+    this.docTermCounts = new Map(
+      Array.from(other.docTermCounts.entries(), ([id, counts]) => [
+        id,
+        new Map(counts),
+      ]),
+    );
     this.totalDocLength = other.totalDocLength;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/state/search-index.ts` around lines 120 - 125, restoreFrom currently
aliases internal collections causing shared mutable state; replace direct
assignments with copies: create new Map instances for this.entries and
this.docTermCounts (e.g., new Map(other.entries)), deep-clone invertedIndex by
iterating other.invertedIndex entries and copying each nested collection (e.g.,
new Map(key -> new Set(values)) or cloning arrays), and keep totalDocLength as a
direct primitive assignment; ensure no references to the original
maps/sets/arrays are retained so instances are independent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/functions/export-import.ts`:
- Around line 88-119: Add a global observations cap check to prevent huge
imports: define a new constant (e.g., MAX_TOTAL_OBSERVATIONS) and, inside the
same validation block that uses MAX_SESSIONS / MAX_OBS_PER_SESSION, compute the
total count by summing lengths of each array in importData.observations (iterate
over Object.values(importData.observations || {})) and if total >
MAX_TOTAL_OBSERVATIONS return the same shaped error { success: false, error:
`Too many total observations (max ${MAX_TOTAL_OBSERVATIONS})` }; keep the
existing per-session check using MAX_OBS_PER_SESSION and place the global check
alongside those validations so oversized imports are rejected early.
- Around line 93-118: The quota checks currently assume importData.sessions,
importData.memories, importData.summaries, and importData.observations have the
correct shapes and can throw; before comparing lengths you must validate types:
ensure importData.sessions, importData.memories, and importData.summaries are
arrays (Array.isArray) and return a failure error if not, and ensure
importData.observations is an object/map (typeof === 'object' && not null) and
its values are arrays before iterating; update the checks around the
MAX_SESSIONS, MAX_MEMORIES, MAX_SUMMARIES, and MAX_OBS_PER_SESSION logic to
first validate the shapes and return clear errors if malformed so later loops
(that iterate observations and assume array entries) cannot throw.

In `@src/index.ts`:
- Around line 156-158: The needsRebuild condition incorrectly triggers a rebuild
based on an empty vectorIndex, but the rebuildIndex function only repopulates
the BM25 index and never touches vectorIndex. Since HybridSearch gracefully
handles missing vector embeddings by falling back to BM25-only search, remove
the vector index check from the needsRebuild condition by eliminating the part
that checks vectorIndex && vectorIndex.size === 0, leaving only the BM25 index
size check (bm25Index.size === 0) to determine if a rebuild is needed.

In `@src/triggers/api.ts`:
- Line 19: The Content-Security-Policy string literal in src/triggers/api.ts is
too strict for src/viewer/index.html because that HTML contains inline CSS and
JS; either move the inline styles and scripts from src/viewer/index.html into
external files and update the viewer to load those files (and keep
script-src/style-src 'self'), or modify the CSP string literal to allow inline
content by adding 'unsafe-inline' to both script-src and style-src directives so
the viewer's inline code will run; update the CSP string in the code (the policy
string shown) and/or relocate inline code to new external assets and reference
them from the viewer accordingly.


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d400b05 and 99b9bb8.

📒 Files selected for processing (16)
  • src/config.ts
  • src/functions/auto-forget.ts
  • src/functions/compress.ts
  • src/functions/export-import.ts
  • src/functions/migrate.ts
  • src/functions/observe.ts
  • src/functions/privacy.ts
  • src/functions/relations.ts
  • src/functions/remember.ts
  • src/functions/smart-search.ts
  • src/health/recovery.ts
  • src/index.ts
  • src/mcp/server.ts
  • src/state/search-index.ts
  • src/state/vector-index.ts
  • src/triggers/api.ts
💤 Files with no reviewable changes (1)
  • src/health/recovery.ts

- Deep copy in restoreFrom() to prevent shared mutable state
- Shape validation for import payload (Array.isArray checks)
- Global observation cap (500K) on import
- safeParseInt() using Number.isNaN() to preserve explicit zero
- Unauthenticated /agentmemory/livez liveness endpoint
- CSP reverted to unsafe-inline for viewer inline code
- Sort memories by createdAt before slicing in auto-forget
- Truncation signal in smart-search expanded results
- Removed false vector index trigger from needsRebuild
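The safeParseInt() change mentioned above follows the NaN-check pattern suggested in the review. This is a minimal sketch assuming the helper's signature; the fallback values are illustrative.

```typescript
// Only NaN falls back to the default, so an explicit 0 in the environment is
// preserved (unlike the earlier `parseInt(...) || default` pattern, where 0
// is falsy and would be silently overridden).
function safeParseInt(value: string | undefined, fallback: number): number {
  const parsed = parseInt(value ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}
```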

coderabbitai (Bot) left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
src/state/vector-index.ts (1)

81-95: Avoid dropping the entire vector index for a single bad row.

Current outer try/catch returns an empty index on first malformed entry. Consider per-entry validation/guarding so valid vectors are still restored.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/state/vector-index.ts` around lines 81 - 95, The current deserialization
catches any error and returns a fresh VectorIndex, losing all good rows; change
deserialization to keep the outer try for JSON.parse but validate and protect
each row individually: after parsing into data, ensure Array.isArray(data) then
iterate over entries from data and for each [obsId, entry] perform type checks
(obsId is string, entry.embedding and entry.sessionId exist and are strings) and
wrap the per-entry conversion (base64ToFloat32(entry.embedding) and
idx.vectors.set(obsId, ...)) in a try/catch that logs or ignores that single bad
row and continues, so VectorIndex and idx.vectors retain all valid vectors while
skipping malformed entries (keep VectorIndex, idx.vectors, base64ToFloat32, and
the JSON.parse usage as reference points).
src/state/search-index.ts (1)

120-135: Clone IndexEntry objects as well in restoreFrom.

Line 121 clones only the map shell; IndexEntry objects remain shared references between instances. That can leak mutable state across restored indexes.

Suggested patch
   restoreFrom(other: SearchIndex): void {
-    this.entries = new Map(other.entries);
+    this.entries = new Map(
+      Array.from(other.entries.entries()).map(([k, v]) => [k, { ...v }]),
+    );
     this.invertedIndex = new Map(
       Array.from(other.invertedIndex.entries()).map(([k, v]) => [
         k,
         new Set(v),
       ]),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/state/search-index.ts` around lines 120 - 135, restoreFrom currently
shallow-copies the entries map so IndexEntry instances are shared; update
restoreFrom to deep-clone each IndexEntry when building this.entries (e.g.,
iterate other.entries and create new IndexEntry(...) or call a clone method on
IndexEntry for each [key, entry] pair), and ensure any nested mutable parts of
an IndexEntry are also cloned; keep the existing cloning logic for
invertedIndex, docTermCounts and totalDocLength but replace the line creating
this.entries = new Map(other.entries) with a Map built from cloned IndexEntry
objects to avoid shared mutable state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/functions/export-import.ts`:
- Around line 129-141: The loop over importData.observations currently only
limits array lengths and total items but not the number of observation keys
(buckets); before iterating, add a guard that counts
Object.keys(importData.observations) and rejects if it exceeds a new or existing
cap (e.g., MAX_OBS_BUCKETS), returning an error similar to the other checks;
then proceed with the existing loop (symbols to update: importData.observations,
the for loop that uses obs, totalObservations, and the MAX_OBS_PER_SESSION
constant) so the function fails fast on too many buckets and prevents
import-time DoS.

In `@src/functions/smart-search.ts`:
- Around line 45-48: The log's "returned" field is incorrect: it uses ids.length
but the handler actually returns expanded (which may be smaller); update the
logger call in the smart-search handler (the ctx.logger.info invocation that
logs "Smart search expanded") to set returned to expanded.length (or include
both ids.length and expanded.length as separate fields) so the metric reflects
the actual response size.

In `@src/state/search-index.ts`:
- Around line 158-169: Validate and sanitize data.totalDocLength before
assigning it to idx.totalDocLength: check that data.totalDocLength is a finite
number and >= 0, coerce or clamp non-integer values to a safe integer (e.g.,
floor or Math.round) and fallback to 0 for any invalid inputs; update the
assignment of idx.totalDocLength (in the deserialization code that reads
data.totalDocLength) to use the validated/clamped value so NaN or negative
values cannot propagate into ranking math.

In `@src/state/vector-index.ts`:
- Around line 62-64: The restoreFrom method currently performs a shallow copy of
this.vectors from the other VectorIndex, causing shared entry objects and
Float32Array embeddings; change restoreFrom(VectorIndex) to iterate over (other
as any).vectors and construct a new Map where each value is a deep-copy of the
entry (clone object fields and create a new Float32Array for the embedding,
e.g., new Float32Array(old.embedding) or old.embedding.slice()) so entries and
embedding buffers are independent; ensure the method assigns this.vectors to
that newly built Map rather than copying references.


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99b9bb8 and 7d29590.

📒 Files selected for processing (8)
  • src/config.ts
  • src/functions/auto-forget.ts
  • src/functions/export-import.ts
  • src/functions/smart-search.ts
  • src/index.ts
  • src/state/search-index.ts
  • src/state/vector-index.ts
  • src/triggers/api.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/config.ts
  • src/functions/auto-forget.ts

- Add MAX_OBS_BUCKETS (10K) cap on import observation keys
- Fix smart-search log: returned → expanded.length (actual response size)
- Validate totalDocLength in deserialize (finite, non-negative, floor)
- Deep copy Float32Array in VectorIndex.restoreFrom()
- Clone IndexEntry objects in SearchIndex.restoreFrom()
- Per-row resilient VectorIndex.deserialize() (skip bad rows, keep good)
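The per-row resilient deserialize() from the commit above can be sketched like this. The VectorIndex name and the per-row skip behavior come from the review thread; the serialized entry shape (plain number arrays rather than base64 Float32Array payloads) is simplified for the example.

```typescript
// Sketch of the tolerant deserialize(): a corrupt payload still yields an
// empty index, but a single malformed row no longer discards the valid rows
// around it. Entry shape is a simplifying assumption.
class VectorIndex {
  vectors = new Map<string, { sessionId: string; embedding: number[] }>();

  static deserialize(raw: string): VectorIndex {
    const idx = new VectorIndex();
    try {
      const data = JSON.parse(raw);
      if (!Array.isArray(data)) return idx;
      for (const row of data) {
        try {
          const [obsId, entry] = row;
          if (typeof obsId !== "string") continue;
          if (typeof entry?.sessionId !== "string") continue;
          if (!Array.isArray(entry.embedding)) continue;
          idx.vectors.set(obsId, {
            sessionId: entry.sessionId,
            embedding: entry.embedding.slice(), // copy, don't alias
          });
        } catch {
          // Skip this malformed row; keep the valid ones.
        }
      }
    } catch {
      // Corrupt payload: return an empty index rather than crashing startup.
    }
    return idx;
  }
}
```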