fix: comprehensive security and reliability audit #6
Conversation
Critical:
- Add auth to health, sessions, observations, MCP tools list endpoints
- Remove ~/.claude from migration ALLOWED_DIRS (path traversal risk)
- Add import payload size limits (10K sessions, 50K memories, 5K obs/session)
- Fix index persistence: loaded indices now restore into active singletons

High:
- Cap expandIds to 20, BFS traversal to 500 nodes / 5 hops
- Fetch relations once before BFS loop (was O(N²) KV reads)
- Add observe input validation (sessionId, hookType, timestamp)
- Sanitize provider error messages in compress and migrate
- Tighten CSP: remove unsafe-inline

Medium/Low:
- Cap auto-forget contradiction comparison to 1000 memories
- Add secret patterns for Anthropic, GitHub PAT, Gemini keys
- Validate files/concepts arrays in remember
- Wrap deserialize() in try/catch for corrupt data resilience
- Fix parseInt NaN fallback in config
- Remove dead code (ObservationQueue)
⚠️ Warning: Rate limit exceeded

⌛ How to resolve this issue? After the wait time has elapsed, a further review can be triggered. We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

📒 Files selected for processing (4)
📝 Walkthrough

This PR adds input validation, authentication checks, and operational caps across multiple functions and API endpoints; introduces tolerant index deserialization and restore helpers; tightens secret detection and CSP; standardizes some error messages; removes an unused queue; and adjusts configuration parsing to use safe fallbacks.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API as "api::health / api::sessions / api::observations"
    participant Auth as "checkAuth"
    participant KV as "KV / Index Store"
    participant Service as "Health Logic / Index Restore"
    Client->>API: HTTP request (includes auth header)
    API->>Auth: checkAuth(req, secret)
    alt auth fails
        Auth-->>API: unauthorized
        API-->>Client: 401 Unauthorized
    else auth succeeds
        Auth-->>API: authorized
        API->>KV: fetch persisted index data
        KV-->>API: persisted BM25/vector payloads
        API->>Service: restore indices if present
        Service-->>API: restored sizes / health payload
        API-->>Client: 200 OK with health/session/observation data
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
🧹 Nitpick comments (6)
src/config.ts (1)
74-79: Good NaN fallback pattern, but note that explicit `0` values are also overridden.

The `|| default` pattern correctly handles NaN from invalid input. However, if someone intentionally sets `TOKEN_BUDGET=0` or `MAX_OBS_PER_SESSION=0`, the fallback would override their intent since `0` is falsy.

For ports (3111/3112), this is fine—port 0 isn't a valid binding target. For `tokenBudget` and `maxObservationsPerSession`, consider whether 0 is a valid value. If not, this is correct. If 0 should be allowed, you could use `Number.isNaN()` instead:

♻️ Optional: Allow explicit zero values
```diff
- tokenBudget: parseInt(env["TOKEN_BUDGET"] || "2000", 10) || 2000,
+ tokenBudget: (() => {
+   const v = parseInt(env["TOKEN_BUDGET"] || "2000", 10);
+   return Number.isNaN(v) ? 2000 : v;
+ })(),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config.ts` around lines 74-79: The current use of the `...parseInt(...) || default` pattern for tokenBudget and maxObservationsPerSession unintentionally overrides explicit zero values; update the parsing for `tokenBudget` and `maxObservationsPerSession` to first parse the env var into a number and then use `Number.isNaN(parsed) ? default : parsed` so only NaN falls back to the default (leave restPort/streamsPort as-is since port 0 is not valid). Locate and change the expressions that set `tokenBudget` and `maxObservationsPerSession` in the config (the `tokenBudget` and `maxObservationsPerSession` property initializers) to this NaN-check pattern.

src/functions/auto-forget.ts (1)
55-57: Good performance cap, but ordering may not guarantee "latest" memories.

The slice limits O(n²) comparisons to ~500K operations, which is a solid performance safeguard. However, `kv.list()` doesn't guarantee ordering, so this captures the first 1000 filtered memories—not necessarily the 1000 most recent by `createdAt`.

If the intent is truly to compare the latest memories, consider sorting by `createdAt` descending before slicing:

♻️ Optional: Sort by creation date
```diff
  const latestMemories = memories
    .filter((m) => m.isLatest !== false && !deletedIds.has(m.id))
+   .sort((a, b) => new Date(b.createdAt).getTime() - new Date(a.createdAt).getTime())
    .slice(0, 1000);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/auto-forget.ts` around lines 55-57: The current latestMemories selection uses memories.filter(...).slice(0, 1000), but kv.list() is unordered, so the slice may not return the 1000 most recent; update the pipeline that builds latestMemories to sort the filtered memories by their createdAt (descending) before slicing so you truly keep the newest items (use a comparator on m.createdAt, handling missing/ISO vs numeric timestamps consistently) and still respect the deletedIds filter; adjust the code that constructs latestMemories (referencing memories, latestMemories, createdAt, deletedIds, and kv.list()) to perform this sort-then-slice.

src/functions/smart-search.ts (1)
25-48: Good expansion limit, consider signaling truncation to callers.

Capping to 20 IDs is a reasonable DoS safeguard. However, if a caller provides more than 20 IDs, they receive results for only the first 20 without any indication that truncation occurred. Consider adding a flag or count to the response:
♻️ Optional: Signal truncation
```diff
  ctx.logger.info("Smart search expanded", {
    requested: ids.length,
    found: expanded.length,
  });
- return { mode: "expanded", results: expanded };
+ return {
+   mode: "expanded",
+   results: expanded,
+   truncated: data.expandIds.length > 20,
+   originalCount: data.expandIds.length,
+ };
  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/smart-search.ts` around lines 25-48: The expansion branch truncates data.expandIds to 20 (ids) and returns expanded without signaling truncation; modify the return from the expansion branch in smart-search (the block using data.expandIds, ids, expanded, and findObservation) to include a truncation indicator such as truncated: true/false and/or requestedCount/returnedCount (e.g., requested: data.expandIds.length, returned: expanded.length, truncated: data.expandIds.length > ids.length) so callers can detect when results were limited.

src/state/vector-index.ts (1)
62-64: Avoid aliasing internal map state in `restoreFrom`.

This shares the same `Map` instance across both indexes; future mutations in either instance can leak across unexpectedly.

Proposed refactor
```diff
  restoreFrom(other: VectorIndex): void {
-   this.vectors = (other as any).vectors;
+   this.vectors = new Map(other.vectors);
  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/state/vector-index.ts` around lines 62-64: The restoreFrom method in VectorIndex currently aliases the other.vectors Map, causing shared mutable state; change restoreFrom(VectorIndex) to copy entries into a new Map for this.vectors instead of assigning the same reference (e.g., create a new Map and populate it from (other as any).vectors via the Map constructor or by iterating entries) so subsequent mutations on either VectorIndex do not leak across.

src/triggers/api.ts (1)
42-45: Consider a separate unauthenticated liveness endpoint.

Requiring auth for `/agentmemory/health` can break default orchestrator probes. A minimal unauthenticated liveness endpoint plus an authenticated diagnostics endpoint is usually safer operationally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/triggers/api.ts` around lines 42 - 45, Add a separate unauthenticated liveness handler before the global auth check: detect req.path (or req.url) === '/agentmemory/health' inside the async (req: ApiRequest): Promise<Response> => { ... } and return a simple 200/OK Response without calling checkAuth(secret); keep the existing authenticated flow (checkAuth(req, secret)) for the diagnostics endpoint(s) so sensitive info remains protected (reference checkAuth, ApiRequest, and the '/agentmemory/health' path to locate where to insert the early-return).src/state/search-index.ts (1)
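The split the comment above describes can be sketched as follows. `ApiRequest`, `checkAuth`, and the handler shape are simplified stand-ins for illustration, not the project's actual definitions; only the `/agentmemory/livez` path is taken from the follow-up commit in this PR.

```typescript
// Sketch: unauthenticated liveness probe answered before the auth gate,
// while detailed health/diagnostics stay behind checkAuth.
// ApiRequest and checkAuth are simplified stand-ins, not project code.
type ApiRequest = { path: string; headers: Record<string, string> };

function checkAuth(req: ApiRequest, secret: string): boolean {
  return req.headers["authorization"] === `Bearer ${secret}`;
}

function handle(req: ApiRequest, secret: string): { status: number; body: string } {
  // Liveness: always 200, no sensitive detail, no auth required.
  if (req.path === "/agentmemory/livez") {
    return { status: 200, body: "ok" };
  }
  // Everything else (including detailed /agentmemory/health) requires auth.
  if (!checkAuth(req, secret)) {
    return { status: 401, body: "Unauthorized" };
  }
  return { status: 200, body: JSON.stringify({ healthy: true }) };
}
```

Orchestrator probes can then target `/agentmemory/livez` without credentials, while diagnostics stay protected.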
src/state/search-index.ts (1)

120-125: Use copy semantics in `restoreFrom` to avoid shared mutable state.

Current assignments alias internal maps between instances. Copying avoids accidental cross-instance mutation coupling.

Proposed refactor
```diff
  restoreFrom(other: SearchIndex): void {
-   this.entries = other.entries;
-   this.invertedIndex = other.invertedIndex;
-   this.docTermCounts = other.docTermCounts;
+   this.entries = new Map(other.entries);
+   this.invertedIndex = new Map(
+     Array.from(other.invertedIndex.entries(), ([term, ids]) => [
+       term,
+       new Set(ids),
+     ]),
+   );
+   this.docTermCounts = new Map(
+     Array.from(other.docTermCounts.entries(), ([id, counts]) => [
+       id,
+       new Map(counts),
+     ]),
+   );
    this.totalDocLength = other.totalDocLength;
  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/state/search-index.ts` around lines 120 - 125, restoreFrom currently aliases internal collections causing shared mutable state; replace direct assignments with copies: create new Map instances for this.entries and this.docTermCounts (e.g., new Map(other.entries)), deep-clone invertedIndex by iterating other.invertedIndex entries and copying each nested collection (e.g., new Map(key -> new Set(values)) or cloning arrays), and keep totalDocLength as a direct primitive assignment; ensure no references to the original maps/sets/arrays are retained so instances are independent.
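The aliasing hazard both `restoreFrom` comments describe can be demonstrated in isolation. This is a minimal standalone example of shared references versus per-entry copies, not the project's types:

```typescript
// Minimal demo: mutating through one reference leaks into an aliased
// index, but not into a per-entry copy (the pattern used in the fix).
const source = new Map<string, Set<string>>([["term", new Set(["doc1"])]]);

const aliased = source; // shared reference: mutations leak across
const copied = new Map(
  Array.from(source.entries(), ([term, ids]) => [term, new Set(ids)]),
); // per-entry copy: independent state

source.get("term")!.add("doc2");
// aliased now contains "doc2"; copied still holds only "doc1"
```

A plain `new Map(other.map)` copies only the outer shell, which is why nested `Set`/`Map` values need their own copies.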
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/functions/export-import.ts`:
- Around line 88-119: Add a global observations cap check to prevent huge
imports: define a new constant (e.g., MAX_TOTAL_OBSERVATIONS) and, inside the
same validation block that uses MAX_SESSIONS / MAX_OBS_PER_SESSION, compute the
total count by summing lengths of each array in importData.observations (iterate
over Object.values(importData.observations || {})) and if total >
MAX_TOTAL_OBSERVATIONS return the same shaped error { success: false, error:
`Too many total observations (max ${MAX_TOTAL_OBSERVATIONS})` }; keep the
existing per-session check using MAX_OBS_PER_SESSION and place the global check
alongside those validations so oversized imports are rejected early.
- Around line 93-118: The quota checks currently assume importData.sessions,
importData.memories, importData.summaries, and importData.observations have the
correct shapes and can throw; before comparing lengths you must validate types:
ensure importData.sessions, importData.memories, and importData.summaries are
arrays (Array.isArray) and return a failure error if not, and ensure
importData.observations is an object/map (typeof === 'object' && not null) and
its values are arrays before iterating; update the checks around the
MAX_SESSIONS, MAX_MEMORIES, MAX_SUMMARIES, and MAX_OBS_PER_SESSION logic to
first validate the shapes and return clear errors if malformed so later loops
(that iterate observations and assume array entries) cannot throw.
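The shape-and-quota checks described in the two export-import comments above can be sketched as one validator. The field names (`sessions`, `observations`) and error shape follow the review text; the cap constants are illustrative, not the project's exact values:

```typescript
// Hedged sketch of import payload validation: shape checks first, then
// per-collection and global caps. Constants are illustrative values.
const MAX_SESSIONS = 10_000;
const MAX_TOTAL_OBSERVATIONS = 500_000;

type ImportResult = { success: boolean; error?: string };

function validateImport(importData: any): ImportResult {
  if (!Array.isArray(importData?.sessions)) {
    return { success: false, error: "sessions must be an array" };
  }
  if (importData.sessions.length > MAX_SESSIONS) {
    return { success: false, error: `Too many sessions (max ${MAX_SESSIONS})` };
  }
  const obs = importData.observations ?? {};
  if (typeof obs !== "object" || obs === null || Array.isArray(obs)) {
    return { success: false, error: "observations must be an object map" };
  }
  let total = 0;
  for (const list of Object.values(obs)) {
    if (!Array.isArray(list)) {
      return { success: false, error: "each observations entry must be an array" };
    }
    total += list.length;
  }
  if (total > MAX_TOTAL_OBSERVATIONS) {
    return { success: false, error: `Too many total observations (max ${MAX_TOTAL_OBSERVATIONS})` };
  }
  return { success: true };
}
```

Validating shapes before counting is what keeps the later loops from throwing on malformed payloads.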
In `@src/index.ts`:
- Around line 156-158: The needsRebuild condition incorrectly triggers a rebuild
based on an empty vectorIndex, but the rebuildIndex function only repopulates
the BM25 index and never touches vectorIndex. Since HybridSearch gracefully
handles missing vector embeddings by falling back to BM25-only search, remove
the vector index check from the needsRebuild condition by eliminating the part
that checks vectorIndex && vectorIndex.size === 0, leaving only the BM25 index
size check (bm25Index.size === 0) to determine if a rebuild is needed.
In `@src/triggers/api.ts`:
- Line 19: The Content-Security-Policy string literal in src/triggers/api.ts is
too strict for src/viewer/index.html because that HTML contains inline CSS and
JS; either move the inline styles and scripts from src/viewer/index.html into
external files and update the viewer to load those files (and keep
script-src/style-src 'self'), or modify the CSP string literal to allow inline
content by adding 'unsafe-inline' to both script-src and style-src directives so
the viewer's inline code will run; update the CSP string in the code (the policy
string shown) and/or relocate inline code to new external assets and reference
them from the viewer accordingly.
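The two directions the CSP comment allows can be written out as concrete header values. Both strings are illustrative; the project's full policy may carry additional directives:

```typescript
// The two CSP options described above, as header strings (illustrative).
// strictCsp requires moving the viewer's inline CSS/JS to external files;
// inlineFriendlyCsp lets the existing inline code run as-is.
const strictCsp =
  "default-src 'self'; script-src 'self'; style-src 'self'";
const inlineFriendlyCsp =
  "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'";
```

The follow-up commit in this PR took the second route, reverting to `unsafe-inline` for the viewer's inline code.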
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- src/config.ts
- src/functions/auto-forget.ts
- src/functions/compress.ts
- src/functions/export-import.ts
- src/functions/migrate.ts
- src/functions/observe.ts
- src/functions/privacy.ts
- src/functions/relations.ts
- src/functions/remember.ts
- src/functions/smart-search.ts
- src/health/recovery.ts
- src/index.ts
- src/mcp/server.ts
- src/state/search-index.ts
- src/state/vector-index.ts
- src/triggers/api.ts
💤 Files with no reviewable changes (1)
- src/health/recovery.ts
- Deep copy in restoreFrom() to prevent shared mutable state
- Shape validation for import payload (Array.isArray checks)
- Global observation cap (500K) on import
- safeParseInt() using Number.isNaN() to preserve explicit zero
- Unauthenticated /agentmemory/livez liveness endpoint
- CSP reverted to unsafe-inline for viewer inline code
- Sort memories by createdAt before slicing in auto-forget
- Truncation signal in smart-search expanded results
- Removed false vector index trigger from needsRebuild
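The safeParseInt() item in the list above can be sketched as follows; the function name matches the list entry, while the exact signature is an assumption:

```typescript
// Sketch of a safeParseInt helper: only NaN falls back to the default,
// so an explicit "0" survives. Exact project signature is an assumption.
function safeParseInt(raw: string | undefined, fallback: number): number {
  const parsed = parseInt(raw ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Contrast with the `|| fallback` pattern, which clobbers zero:
const naive = parseInt("0", 10) || 2000; // 2000 — explicit zero lost
const safe = safeParseInt("0", 2000);    // 0 — explicit zero preserved
```

This is why the review suggested keeping `|| fallback` only for the port settings, where 0 is never a valid value.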
Actionable comments posted: 4
🧹 Nitpick comments (2)
src/state/vector-index.ts (1)
81-95: Avoid dropping the entire vector index for a single bad row.

Current outer try/catch returns an empty index on first malformed entry. Consider per-entry validation/guarding so valid vectors are still restored.
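The per-row tolerance suggested here can be sketched as below. The row shape (`[id, { embedding }]`) and the plain-number embedding are simplifying assumptions standing in for the project's base64-encoded format:

```typescript
// Hedged sketch of row-tolerant deserialization: the outer parse can
// still fail wholesale, but a single malformed row no longer discards
// the valid ones. Row shape is an assumption for illustration.
function deserializeRows(raw: string): Map<string, Float32Array> {
  const vectors = new Map<string, Float32Array>();
  let data: unknown;
  try {
    data = JSON.parse(raw); // corrupt blob: start empty
  } catch {
    return vectors;
  }
  if (!Array.isArray(data)) return vectors;
  for (const row of data) {
    try {
      const [id, entry] = row as [string, { embedding: number[] }];
      if (typeof id !== "string" || !Array.isArray(entry?.embedding)) continue;
      vectors.set(id, new Float32Array(entry.embedding)); // keep good rows
    } catch {
      // skip this malformed row, keep the rest
    }
  }
  return vectors;
}
```

With this shape, one bad persisted entry costs one vector instead of the whole index.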
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/state/vector-index.ts` around lines 81-95: The current deserialization catches any error and returns a fresh VectorIndex, losing all good rows; change deserialization to keep the outer try for JSON.parse but validate and protect each row individually: after parsing into data, ensure Array.isArray(data), then iterate over entries from data and for each [obsId, entry] perform type checks (obsId is a string, entry.embedding and entry.sessionId exist and are strings) and wrap the per-entry conversion (base64ToFloat32(entry.embedding) and idx.vectors.set(obsId, ...)) in a try/catch that logs or ignores that single bad row and continues, so VectorIndex and idx.vectors retain all valid vectors while skipping malformed entries (keep VectorIndex, idx.vectors, base64ToFloat32, and the JSON.parse usage as reference points).

src/state/search-index.ts (1)
120-135: Clone `IndexEntry` objects as well in `restoreFrom`.

Line 121 clones only the map shell; `IndexEntry` objects remain shared references between instances. That can leak mutable state across restored indexes.

Suggested patch
```diff
  restoreFrom(other: SearchIndex): void {
-   this.entries = new Map(other.entries);
+   this.entries = new Map(
+     Array.from(other.entries.entries()).map(([k, v]) => [k, { ...v }]),
+   );
    this.invertedIndex = new Map(
      Array.from(other.invertedIndex.entries()).map(([k, v]) => [
        k,
        new Set(v),
      ]),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/state/search-index.ts` around lines 120 - 135, restoreFrom currently shallow-copies the entries map so IndexEntry instances are shared; update restoreFrom to deep-clone each IndexEntry when building this.entries (e.g., iterate other.entries and create new IndexEntry(...) or call a clone method on IndexEntry for each [key, entry] pair), and ensure any nested mutable parts of an IndexEntry are also cloned; keep the existing cloning logic for invertedIndex, docTermCounts and totalDocLength but replace the line creating this.entries = new Map(other.entries) with a Map built from cloned IndexEntry objects to avoid shared mutable state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/functions/export-import.ts`:
- Around line 129-141: The loop over importData.observations currently only
limits array lengths and total items but not the number of observation keys
(buckets); before iterating, add a guard that counts
Object.keys(importData.observations) and rejects if it exceeds a new or existing
cap (e.g., MAX_OBS_BUCKETS), returning an error similar to the other checks;
then proceed with the existing loop (symbols to update: importData.observations,
the for loop that uses obs, totalObservations, and the MAX_OBS_PER_SESSION
constant) so the function fails fast on too many buckets and prevents
import-time DoS.
In `@src/functions/smart-search.ts`:
- Around line 45-48: The log's "returned" field is incorrect: it uses ids.length
but the handler actually returns expanded (which may be smaller); update the
logger call in the smart-search handler (the ctx.logger.info invocation that
logs "Smart search expanded") to set returned to expanded.length (or include
both ids.length and expanded.length as separate fields) so the metric reflects
the actual response size.
In `@src/state/search-index.ts`:
- Around line 158-169: Validate and sanitize data.totalDocLength before
assigning it to idx.totalDocLength: check that data.totalDocLength is a finite
number and >= 0, coerce or clamp non-integer values to a safe integer (e.g.,
floor or Math.round) and fallback to 0 for any invalid inputs; update the
assignment of idx.totalDocLength (in the deserialization code that reads
data.totalDocLength) to use the validated/clamped value so NaN or negative
values cannot propagate into ranking math.
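The sanitization described above is small enough to show directly; the function name is assumed for illustration:

```typescript
// Sketch of totalDocLength sanitization: accept only finite, non-negative
// numbers, floor to an integer, and fall back to 0 for anything invalid
// so NaN/Infinity/negatives cannot reach the BM25 ranking math.
function sanitizeDocLength(value: unknown): number {
  const n = typeof value === "number" ? value : Number(value);
  if (!Number.isFinite(n) || n < 0) return 0;
  return Math.floor(n);
}
```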
In `@src/state/vector-index.ts`:
- Around line 62-64: The restoreFrom method currently performs a shallow copy of
this.vectors from the other VectorIndex, causing shared entry objects and
Float32Array embeddings; change restoreFrom(VectorIndex) to iterate over (other
as any).vectors and construct a new Map where each value is a deep-copy of the
entry (clone object fields and create a new Float32Array for the embedding,
e.g., new Float32Array(old.embedding) or old.embedding.slice()) so entries and
embedding buffers are independent; ensure the method assigns this.vectors to
that newly built Map rather than copying references.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- src/config.ts
- src/functions/auto-forget.ts
- src/functions/export-import.ts
- src/functions/smart-search.ts
- src/index.ts
- src/state/search-index.ts
- src/state/vector-index.ts
- src/triggers/api.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/config.ts
- src/functions/auto-forget.ts
- Add MAX_OBS_BUCKETS (10K) cap on import observation keys
- Fix smart-search log: returned → expanded.length (actual response size)
- Validate totalDocLength in deserialize (finite, non-negative, floor)
- Deep copy Float32Array in VectorIndex.restoreFrom()
- Clone IndexEntry objects in SearchIndex.restoreFrom()
- Per-row resilient VectorIndex.deserialize() (skip bad rows, keep good)
Summary
Fixes all 17 findings from a comprehensive security, code quality, and reliability audit of agentmemory v0.3.0.
Critical (5)
- Added `checkAuth` to `api::health`, `api::sessions`, `api::observations`, and `mcp::tools::list`
- Removed `~/.claude` from migration `ALLOWED_DIRS` (exposes plugin credentials)
- `load()` created new instances but never restored into active singletons; added `restoreFrom()` to SearchIndex and VectorIndex

High (5)
- CSP: replaced `'unsafe-inline'` with `'self'` for scripts/styles

Medium/Low (7)
- Secret patterns: Anthropic (`sk-ant-*`), GitHub PAT (`github_pat_*`), Gemini (`AIza*`)
- Wrapped `deserialize()` in try/catch for SearchIndex and VectorIndex
- Removed dead `ObservationQueue` class

Test plan
- `npm run build` — clean (130KB bundle)
- `npx tsc --noEmit` — zero type errors
- `npx vitest run` — 144/144 tests pass (14 integration skipped, requires running server)

Summary by CodeRabbit
New Features
Security
Bug Fixes
Performance