feat: v0.6.0 advanced retrieval with real-world benchmarks #76
Add actions, frontier, leases, routines, signals, checkpoints, flow-compress, mesh, and branch-aware modules with full MCP tools, REST endpoints, and 170 new tests. Includes SSRF protection, race-condition-safe keyed mutex locking, SHA-256 fingerprinting, and fixes for 29 CodeRabbit review findings.
- 9 new source files (src/functions/*)
- 8 new test files (170 tests, total 386)
- 10 new MCP tools, 23 new REST endpoints
- 8 new KV scopes, new types for orchestration
- Version bump to 0.5.0, README and viewer updated
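The keyed mutex locking named above is not shown in any diff in this PR; the sketch below illustrates one way per-key serialization of async work can be done. The body is an assumption for illustration, not the project's actual `withKeyedLock` implementation:

```typescript
// Per-key async mutex sketch: each key owns a promise chain, and a new
// task only starts after the previous task for the same key has settled.
const tails = new Map<string, Promise<unknown>>();

async function withKeyedLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const prev = tails.get(key) ?? Promise.resolve();
  // Run after the predecessor settles, whether it succeeded or failed.
  const run = prev.then(fn, fn);
  const tail = run.then(() => undefined, () => undefined);
  tails.set(key, tail);
  try {
    return await run;
  } finally {
    // Clean up once this task was the last holder of the key.
    if (tails.get(key) === tail) tails.delete(key);
  }
}
```

Two concurrent writers on the same key (e.g. `mem:action:<id>`) then execute strictly one after the other, which is the race-condition-safety property the PR description claims.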
5 new modules inspired by beads patterns but with original naming and iii-engine real-time streaming (no polling):
- sentinels: event-driven condition watchers (webhook, timer, threshold, pattern, approval) that auto-unblock gated actions via SSE
- sketches: ephemeral action graphs with auto-expiry, promote or discard
- crystallize: LLM-powered compaction of completed action chains into compact crystal digests with key outcomes and lessons
- diagnostics: self-diagnosis across 8 categories (actions, leases, sentinels, sketches, signals, sessions, memories, mesh) with auto-heal
- facets: multi-dimensional tagging (dimension:value) with AND/OR queries
- 5 new source files, 5 new test files (132 tests, total 518)
- 9 new MCP tools (total 37), 21 new REST endpoints (total 93)
- 4 new KV scopes (total 33), new types for all modules
- README stats and function table updated
- actions: validate edges before persisting, set blocked status for requires deps
- leases: use mem:action lock key, reject blocked actions, check expiry on release
- checkpoints: validate linkedActionIds exist, check requires edges in unblock
- mesh: add SSRF validation on peer registration
- routines: remove invalid "failed" action status check
- export-import: add v0.5.0 scope export/import (actions, sentinels, sketches, etc)
- mcp/server: validate CSV inputs are strings before splitting
- schema: replace runtime require with static import
- README: fix stale tool/endpoint counts (28→37, 72→93)
…alidation, README counts
- mesh.ts: add withKeyedLock on action writes in receive path, add IPv6 private ranges to SSRF check
- routines.ts: validate DAG (duplicate orders, unknown deps), set dep actions to blocked, refresh stepStatus in routine-status
- checkpoints.ts: runtime type enum validation, set linked pending actions to blocked
- leases.ts: validate ttlMs is finite positive number
- export-import.ts: add skip strategy checks for all v0.5.0 import blocks
- mcp/server.ts: typeof guards on tags, config JSON.parse, actionIds, linkedActionIds, categories
- README.md: Tools 18→37, Functions 33→50, stats line updated
- test: update checkpoint test for new blocked-on-create behavior
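The DAG validation above covers duplicate orders and unknown deps; a cycle check is the natural next step (and a later review comment asks for exactly that). Here is a sketch using Kahn's algorithm over a hypothetical `Step { order, dependsOn }` shape taken from the review discussion; it assumes duplicate orders were already rejected, and the routine's real field names may differ:

```typescript
interface Step {
  order: number;
  dependsOn?: number[];
}

// Kahn's algorithm: repeatedly remove steps whose dependencies are all
// satisfied. Any step left over participates in a cycle (a self-reference
// is a one-node cycle and is caught the same way).
function findCycleOrders(steps: Step[]): number[] {
  const indegree = new Map<number, number>();
  const dependents = new Map<number, number[]>();
  for (const s of steps) indegree.set(s.order, 0);
  for (const s of steps) {
    for (const dep of s.dependsOn ?? []) {
      indegree.set(s.order, (indegree.get(s.order) ?? 0) + 1);
      dependents.set(dep, [...(dependents.get(dep) ?? []), s.order]);
    }
  }
  const queue = [...indegree].filter(([, d]) => d === 0).map(([o]) => o);
  let removed = 0;
  while (queue.length) {
    const o = queue.shift()!;
    removed++;
    for (const next of dependents.get(o) ?? []) {
      const d = indegree.get(next)! - 1;
      indegree.set(next, d);
      if (d === 0) queue.push(next);
    }
  }
  // Orders still holding indegree > 0 are part of a cycle.
  if (removed === steps.length) return [];
  return [...indegree].filter(([, d]) => d > 0).map(([o]) => o);
}
```

A non-empty return value would become the `"cyclic dependency involving orders: …"` validation error before the routine is persisted.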
…xport replace cleanup, MCP input guards
- leases.ts: renew extends from max(now, existing expiry) instead of now; release verifies action ownership before mutation
- mesh.ts: withKeyedLock on memory writes in mesh-receive; applySyncData validates id/updatedAt and locks both memory and action writes
- routines.ts: stepStatus maps "blocked" to "pending" explicitly; progress includes blocked/cancelled counts; routine-freeze wrapped in withKeyedLock
- export-import.ts: remove unused RoutineRun import; replace strategy clears all orchestration namespaces; skip strategy for graphNodes/graphEdges/semantic/procedural
- mcp/server.ts: sentinel_trigger JSON.parse with typeof+try/catch; facet_query typeof guards on matchAll/matchAny; remove redundant requires cast; .filter(Boolean) on concepts/files/tags/requires CSV splits
- README.md: clarify API table is a representative subset
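The lease renewal rule in the first bullet, in arithmetic form. This is a minimal sketch of the rule only; the actual lease record shape and expiry representation are not shown in this commit:

```typescript
// Renewal extends from whichever is later: now or the current expiry.
// Renewing early therefore adds to the remaining lease time, instead of
// resetting to now + ttl (which could silently shorten an active lease).
function renewedExpiry(currentExpiresAt: number, ttlMs: number, now: number): number {
  return Math.max(now, currentExpiresAt) + ttlMs;
}
```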
…action handling, boolean normalization
- mesh.ts: add redirect:"error" to both outbound fetch calls to prevent SSRF via redirect
- routines.ts: create actions with status "blocked" directly when hasDeps (eliminates two-pass race); handle missing actions in routine-status as cancelled; progress.total uses run.actionIds.length
- mcp/server.ts: sentinel config accepts object values directly; normalize unreadOnly/dryRun for both JSON booleans and string values
- README.md: consistent bundle size (365KB) across both occurrences
…l benchmarks

Search improvements:
- Porter stemmer for word normalization (authentication ↔ authenticating)
- 40+ coding-domain synonym groups (db ↔ database, k8s ↔ kubernetes)
- Binary-search prefix matching replaces O(n) full scan
- Session diversification (max 3 results per session)
- Co-occurrence graph edges between all concept pairs

New retrieval modules:
- Sliding window inference pipeline (context enrichment at ingestion)
- Adaptive query expansion (LLM-generated reformulations)
- Triple-stream search (BM25 + Vector + Graph with RRF fusion)
- Append-only temporal knowledge graph (versioned edges, point-in-time queries)
- Graph-augmented retrieval (entity search + neighborhood expansion)
- Ebbinghaus retention scoring (decay + tiered hot/warm/cold/evictable)

Real-world benchmarks (240 observations, 20 labeled queries):
- Quality eval: 64.1% recall@10 with Xenova embeddings (vs 55.8% grep)
- Scale eval: 92-100% token savings vs built-in memory at 240-50K observations
- Cross-session: 12/12 queries found vs 10/12 for 200-line MEMORY.md cap
- Token measurement uses actual search results (fixed fake constant bug)
- Removed old microbenchmarks (bench.ts, run-bench.ts, COMPARISON.md)
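Triple-stream search names RRF fusion but the merge step is not shown in this PR text. A minimal Reciprocal Rank Fusion sketch with the conventional k=60 constant — an illustration of the technique, not the repo's hybrid-search code:

```typescript
// Reciprocal Rank Fusion: each ranked list contributes 1/(k + rank) per
// document; summing favors items that rank well in several streams even
// when their raw BM25/vector/graph scores are not comparable.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranked of rankings) {
    ranked.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Usage would be `rrfFuse([bm25Ids, vectorIds, graphIds])`; because only ranks matter, no per-stream score normalization is needed before fusing.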
📝 Walkthrough
This pull request introduces version 0.6.0 with major feature expansions across memory retrieval, scoring, and orchestration. Changes include new benchmark documentation and scripts for quality/scale/embedding evaluation; new core modules for graph-based retrieval, query expansion, temporal graph management, retention scoring, and sliding-window enrichment; search-index improvements with stemming and synonyms; hybrid-search integration of graph-based retrieval; schema and type extensions for versioning and new KV entries; and updated initialization to wire all new orchestration components. Documentation updates reflect expanded tooling (37 tools, 93 endpoints) and capability metrics.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Query as Query Expansion
    participant LLM as Language Model
    participant Search as Hybrid Search
    participant BM25 as BM25 Index
    participant Vector as Vector Index
    participant Graph as Graph Retrieval
    participant Result as Results Merger
    User->>Query: mem::expand-query (original query)
    Query->>LLM: Compress & generate reformulations
    LLM-->>Query: XML with reformulations & entities
    Query->>Search: Multiple reformulated queries + entities
    par Parallel Search Modalities
        Search->>BM25: Search each reformulation
        Search->>Vector: Embed & search each reformulation
        Search->>Graph: searchByEntities with extracted entities
    end
    BM25-->>Search: BM25 results & scores
    Vector-->>Search: Vector results & scores
    Graph-->>Search: Graph results & context
    Search->>Result: Merge & normalize 3-way scores
    Result->>Result: Deduplicate by obsId, select best
    Result-->>User: Triple-stream results with graphContext
```

```mermaid
sequenceDiagram
    participant Observation as Observation
    participant Extractor as Temporal Graph Extractor
    participant LLM as Language Model
    participant KV as State KV
    participant Querier as Temporal Query
    Observation->>Extractor: mem::temporal-graph-extract
    Extractor->>LLM: Compress observations into temporal graph XML
    LLM-->>Extractor: XML with entities, relationships, validity windows
    Extractor->>KV: Store graphNodes (with versioning, tcommit)
    Extractor->>KV: Append graphEdges & graphEdgeHistory (never overwrite)
    Extractor-->>Observation: Success with node/edge counts
    alt Later: Temporal Query
        Querier->>KV: mem::temporal-query (entity, asOf timestamp)
        KV->>KV: Resolve entity by name/alias
        KV->>KV: Filter edges by tvalid/tvalidEnd against asOf
        KV-->>Querier: currentState (latest valid), history (validEdges)
    end
```
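The point-in-time step in the second diagram ("Filter edges by tvalid/tvalidEnd against asOf") can be sketched as below. The edge shape is an assumption based only on the field names in the diagram; the repo's actual `GraphEdge` type may carry more fields:

```typescript
interface TemporalEdge {
  tvalid: string;      // ISO timestamp when the edge became valid
  tvalidEnd?: string;  // ISO timestamp when it stopped being valid, if ever
}

// An edge is live at `asOf` when its validity window contains that instant:
// it started at or before asOf and has not ended by then. Open-ended edges
// (no tvalidEnd) are treated as valid forever.
function edgesValidAt<E extends TemporalEdge>(edges: E[], asOf: string): E[] {
  const t = Date.parse(asOf);
  return edges.filter((e) => {
    const start = Date.parse(e.tvalid);
    const end = e.tvalidEnd ? Date.parse(e.tvalidEnd) : Infinity;
    return start <= t && t < end;
  });
}
```

Because the graph is append-only, superseding a fact means closing the old edge's window and appending a new edge, so this filter alone reconstructs the state at any past instant.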
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 5
Note: Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/functions/export-import.ts (1)
219-265: ⚠️ Potential issue | 🔴 Critical
`replace` still leaves stale retrieval state behind.

This block clears the new action/routine/etc. scopes, but it never deletes `KV.graphNodes`, `KV.graphEdges`, `KV.semantic`, or `KV.procedural`. A `"replace"` import can therefore merge old retrieval data into the restored snapshot and change search behavior after restore.
🧹 Suggested fix
```diff
  const existingSummaries = await kv.list<SessionSummary>(KV.summaries);
  for (const s of existingSummaries) {
    await kv.delete(KV.summaries, s.sessionId);
  }
+ for (const edge of await kv.list<GraphEdge>(KV.graphEdges).catch(() => [])) {
+   await kv.delete(KV.graphEdges, edge.id);
+ }
+ for (const node of await kv.list<GraphNode>(KV.graphNodes).catch(() => [])) {
+   await kv.delete(KV.graphNodes, node.id);
+ }
+ for (const sem of await kv.list<SemanticMemory>(KV.semantic).catch(() => [])) {
+   await kv.delete(KV.semantic, sem.id);
+ }
+ for (const proc of await kv.list<ProceduralMemory>(KV.procedural).catch(() => [])) {
+   await kv.delete(KV.procedural, proc.id);
+ }
  for (const a of await kv.list<Action>(KV.actions).catch(() => [])) {
    await kv.delete(KV.actions, a.id);
  }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/export-import.ts` around lines 219 - 265, The "replace" branch currently deletes sessions, actions, routines, etc., but omits clearing retrieval-related scopes (KV.graphNodes, KV.graphEdges, KV.semantic, KV.procedural), leaving stale retrieval state; add loops like the existing ones to list and delete entries for KV.graphNodes, KV.graphEdges, KV.semantic, and KV.procedural (using the appropriate types or catch(() => []) pattern as used elsewhere) so those stores are fully cleared when strategy === "replace".
🟠 Major comments (26)
src/functions/checkpoints.ts-17-24 (1)
17-24: ⚠️ Potential issue | 🟠 Major
Add runtime input validation for checkpoint payloads.
Current checks are incomplete for untrusted input: whitespace-only names pass, non-positive `expiresInMs` is accepted, and `status` is not runtime-validated to `passed|failed` before assignment.
🛡️ Proposed validation hardening
```diff
- if (!data.name) {
+ if (!data?.name || !String(data.name).trim()) {
    return { success: false, error: "name is required" };
  }
+ if (
+   data.expiresInMs !== undefined &&
+   (!Number.isFinite(data.expiresInMs) || data.expiresInMs <= 0)
+ ) {
+   return { success: false, error: "expiresInMs must be a positive number" };
+ }
  ...
- if (!data.checkpointId || !data.status) {
+ if (!data?.checkpointId || !data?.status) {
    return {
      success: false,
      error: "checkpointId and status are required",
    };
  }
+ if (data.status !== "passed" && data.status !== "failed") {
+   return { success: false, error: "status must be 'passed' or 'failed'" };
+ }
```
Also applies to: 35-37, 83-88, 107-111
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/checkpoints.ts` around lines 17 - 24, The payload validation is too permissive: update the checks around data.name, data.type, data.expiresInMs, and data.status in the functions that read request payloads (look for the existing checks referencing data.name and validTypes). Specifically, trim and reject whitespace-only names (e.g., if (!data.name || !data.name.trim()) return error), keep the validTypes array (["ci","approval","deploy","external","timer"]) and validate data.type against it, ensure numeric expiresInMs (if provided) is a positive integer (> 0) before accepting, and validate data.status (if provided) is one of "passed" or "failed" before assignment; apply these same checks wherever payload validation is done in this file (the other occurrences you noted).
src/functions/branch-aware.ts-149-153 (1)
149-153: ⚠️ Potential issue | 🟠 Major
Session-root matching is not path-portable (Windows break).
Line 151 hardcodes `/` as separator; `C:\repo\sub` will not match `C:\repo/`. This can silently drop valid sessions on Windows.
🐛 Proposed cross-platform fix
```diff
-import { resolve } from "node:path";
+import { resolve, relative, isAbsolute } from "node:path";
 ...
-  const matching = sessions.filter((s) => {
-    if (s.project === projectRoot || s.cwd === projectRoot) return true;
-    if (s.cwd.startsWith(projectRoot + "/")) return true;
-    return false;
-  });
+  const root = resolve(projectRoot);
+  const isWithin = (p?: string): boolean => {
+    if (!p) return false;
+    const rel = relative(root, resolve(p));
+    return rel === "" || (!rel.startsWith("..") && !isAbsolute(rel));
+  };
+
+  const matching = sessions.filter(
+    (s) => isWithin(s.project) || isWithin(s.cwd),
+  );
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/branch-aware.ts` around lines 149 - 153, Normalize/resolve projectRoot and each session path (s.cwd and s.project) and perform path-accurate comparisons instead of hardcoding "/" — inside the sessions.filter block update the logic using Node's path utilities: normalize/resolve both projectRoot and s.cwd/s.project then either compare equality or use path.relative(projectRoot, s.cwd) and ensure it does not start with ".." (or check that the relative path is not equal to ""/or doesn't begin with "..") to detect subpaths in a cross-platform way; refer to the sessions.filter callback, projectRoot, and s.cwd/s.project when applying the fix.
src/functions/routines.ts-33-41 (1)
33-41: ⚠️ Potential issue | 🟠 Major
Reject self-referential and cyclic step dependencies.
This only validates that a referenced order exists. A step depending on itself, or a cycle like `1 -> 2 -> 1`, is still accepted and will create permanently blocked runs. Validate the dependency graph before persisting the routine.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/routines.ts` around lines 33 - 41, The current validation only checks that each dependency exists but allows self-dependencies and cycles; update the validation in the routine save/validate logic that iterates data.steps (the block using uniqueOrders, step.dependsOn, and step.order) to (1) reject any dep that equals the step's own order (self-reference) and (2) detect cycles across all steps by building a directional graph from step.order -> each dep and running a cycle-detection/topological-sort (e.g., DFS with visited/recursion stack or Kahn's algorithm); if a cycle is found return { success: false, error: "cyclic dependency involving orders: …" } (or a similar clear message) instead of persisting. Ensure you use the existing uniqueOrders and step identifiers so the check covers all steps before save.
benchmark/real-embeddings-eval.ts-170-177 (1)
170-177: ⚠️ Potential issue | 🟠 Major
`tokens_per_query` is using labels, not what the system actually returns.

This metric is based on `relevant_count`, so it depends on the benchmark annotations rather than the retrieved observations. That can materially overstate the reported token savings. Compute tokens from the top-N results actually returned for each query and then average those totals.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmark/real-embeddings-eval.ts` around lines 170 - 177, The tokens_per_query calculation currently uses the annotated relevant_count; instead, iterate over the actual returned items for each query in the results array and compute tokens by summing estimateTokens(JSON.stringify(item)) for those returned items (the top-N returned per query), then average those per-query totals and apply Math.round as before; update the tokens_per_query assignment to use this computed average (referencing results, estimateTokens, and the per-query returned items collection on each result object instead of relevant_count).
src/functions/routines.ts-99-104 (1)
99-104: ⚠️ Potential issue | 🟠 Major
Don't start runs from unfrozen routines.
`frozen` is part of the public routine lifecycle, but `mem::routine-run` ignores it and will execute mutable templates. That makes it easy to launch a run from a routine that's still being edited.
Suggested fix
```diff
 return withKeyedLock(`mem:routine:${data.routineId}`, async () => {
   const routine = await kv.get<Routine>(KV.routines, data.routineId);
   if (!routine) {
     return { success: false, error: "routine not found" };
   }
+  if (!routine.frozen) {
+    return {
+      success: false,
+      error: "routine must be frozen before running",
+    };
+  }
   const now = new Date().toISOString();
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/routines.ts` around lines 99 - 104, The routine run path is missing a check for the routine's frozen state, so withKeyedLock(`mem:routine:${data.routineId}`) and the call to kv.get<Routine>(KV.routines, data.routineId) should validate routine.frozen before proceeding; update the block in routines.ts (the mem::routine-run logic) to return { success: false, error: "routine not frozen" } (or a suitable error) if routine.frozen is false/undefined, ensuring you perform this check after loading the routine and before executing any mutable template/run logic to prevent runs from unfrozen routines.
src/functions/retention.ts-82-85 (1)
82-85: ⚠️ Potential issue | 🟠 Major
Deep-merge `tierThresholds` instead of replacing the whole object.

The current spread only works when callers send a complete `tierThresholds` object. A partial override like `{ tierThresholds: { cold: 0.2 } }` clears `hot` and `warm`, which breaks tiering and eviction defaults.
Suggested fix
```diff
 const ctx = getContext();
-const config = { ...DEFAULT_DECAY, ...data.config };
+const config: DecayConfig = {
+  ...DEFAULT_DECAY,
+  ...data.config,
+  tierThresholds: {
+    ...DEFAULT_DECAY.tierThresholds,
+    ...data.config?.tierThresholds,
+  },
+};
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/retention.ts` around lines 82 - 85, The config initialization currently overwrites the entire tierThresholds when partial overrides are provided; update the merge so top-level fields still spread from DEFAULT_DECAY and then deep-merge the nested tierThresholds object (i.e., create config from DEFAULT_DECAY and data.config, but for tierThresholds specifically merge DEFAULT_DECAY.tierThresholds with data.config.tierThresholds) so partial updates like { tierThresholds: { cold: 0.2 } } preserve hot/warm defaults; modify the code around the config assignment in the async handler (the variable named config, the DEFAULT_DECAY constant, and DecayConfig/tierThresholds properties) to perform this nested merge and handle missing data.config safely.
src/functions/retention.ts-198-220 (1)
198-220: ⚠️ Potential issue | 🟠 Major
Eviction never removes semantic memories.

`mem::retention-score` writes scores for both `KV.memories` and `KV.semantic`, but eviction only deletes from `KV.memories`. Low-score semantic entries survive and will be recreated on the next scoring pass. Persist the backing scope with each score, or delete from both stores here.
Suggested fix
```diff
 let evicted = 0;
 for (const candidate of candidates) {
   try {
     await kv.delete(KV.memories, candidate.memoryId);
+    await kv.delete(KV.semantic, candidate.memoryId);
     await kv.delete(KV.retentionScores, candidate.memoryId);
     evicted++;
   } catch {
     continue;
   }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/retention.ts` around lines 198 - 220, The eviction loop only removes entries from KV.memories and KV.retentionScores, leaving low-score semantic entries in KV.semantic; update the eviction logic inside the loop that iterates over candidates (variable candidates, symbol candidate.memoryId) to also delete the corresponding semantic entry (KV.semantic) for each candidate.memoryId in addition to kv.delete(KV.memories, ...) and kv.delete(KV.retentionScores, ...); alternatively, persist the backing scope when writing scores so you can delete the correct scope later, but the quickest fix is to call kv.delete(KV.semantic, candidate.memoryId) alongside the existing deletes.
src/functions/flow-compress.ts-79-105 (1)
79-105: ⚠️ Potential issue | 🟠 Major
Validate the parsed summary before storing a memory.

If the model misses the XML format, `parseFlowSummary()` returns empty fields and this still writes a blank workflow memory. Fail fast, or fall back to the raw response, when the formatted summary is empty.
Suggested fix
```diff
 try {
   const response = await provider.summarize(
     FLOW_COMPRESS_SYSTEM,
     prompt,
   );
   const summary = parseFlowSummary(response);
+  const content = formatSummary(summary);
+  if (!content) {
+    return {
+      success: false,
+      error: "compression failed: summarizer returned an invalid summary",
+      compressed: 0,
+    };
+  }
   const memory = {
     id: generateId("mem"),
     createdAt: new Date().toISOString(),
     updatedAt: new Date().toISOString(),
     type: "workflow" as const,
     title: summary.goal || `Workflow: ${doneActions.length} actions`,
-    content: formatSummary(summary),
+    content,
     concepts: extractConcepts(doneActions),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/flow-compress.ts` around lines 79 - 105, The parsed summary from parseFlowSummary(response) must be validated before creating and storing the memory; update the flow in flow-compress.ts to check the parsed summary (e.g., summary.goal and any other expected fields returned by parseFlowSummary) and if they are empty/invalid either (a) fail fast by throwing or returning early with a logged warning, or (b) fall back to using the raw provider response (response) to populate the memory.title and memory.content (e.g., set title to a safe fallback like `Workflow: ${doneActions.length} actions` and content to the raw response text) so you never persist a blank workflow; ensure this validation happens after provider.summarize(FLOW_COMPRESS_SYSTEM, prompt) and parseFlowSummary(response) and before constructing the memory object and calling kv.set(KV.memories, memory.id, memory).
benchmark/quality-eval.ts-155-168 (1)
155-168: ⚠️ Potential issue | 🟠 Major
Align token accounting with the retrieval depth you publish.

These token estimates use `retrieved_count` from `search(..., 20)`, but the report later says agentmemory returns the "top 10 results". That makes the headline savings inconsistent with the benchmark configuration. Either cap token accounting at the same K you report, or update the report to reflect top-20 context.

Also applies to: 222-237, 347-362, 539-542
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmark/quality-eval.ts` around lines 155 - 168, The token accounting uses retrieved_count (computed from perQuery items returned by search with depth 20) but the report claims "top 10 results", causing inconsistency; update the calculation in the stats blocks that compute avgTokens/total_tokens_per_query (and the similar blocks around the other ranges you flagged) to cap the retrieval depth at the reported K (e.g., use Math.min(q.retrieved_count, 10) or derive avgTokens from perQuery.map(q => Math.min(q.retrieved_count, PUBLISHED_K))) or alternatively change the report string ("BM25-only" / headline) to reflect top-20; touch the variables avgTokens, avgObsTokens usage and total_tokens_per_query assignments to ensure the same K is used everywhere (referencing perQuery, retrieved_count, avgObsTokens, total_tokens_per_query).
src/functions/crystallize.ts-34-47 (1)
34-47: ⚠️ Potential issue | 🟠 Major
Guard against duplicate crystallization of the same actions.

This path never checks `action.crystallizedInto`, and it updates the actions only after the new crystal has already been stored. Two callers can crystallize the same completed chain concurrently, create duplicate crystals, and let the last writer win on `action.crystallizedInto`. Reject already-crystallized actions and serialize the action set before writing.

Also applies to: 73-78
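A sketch of the two guards this comment asks for: a canonical lock key over the sorted action set, plus an eligibility check on `crystallizedInto`. Field names come from the comment above; the surrounding KV reads and keyed-mutex plumbing are omitted, so this is an illustration rather than a drop-in fix:

```typescript
interface Action {
  id: string;
  status: string;
  crystallizedInto?: string;
}

// Canonical lock key: deduplicate and sort the ids so two concurrent calls
// over the same action set contend on the same keyed mutex regardless of
// the order the caller passed the ids in.
function crystalLockKey(actionIds: string[]): string {
  return "mem:crystal:" + [...new Set(actionIds)].sort().join(",");
}

// Actions that are not done, or were already folded into a crystal, should
// fail the whole request fast instead of producing a duplicate crystal.
function findIneligible(actions: Action[]): string[] {
  return actions
    .filter((a) => a.status !== "done" || a.crystallizedInto !== undefined)
    .map((a) => a.id);
}
```

Inside the lock, the handler would re-read the actions, bail out if `findIneligible` returns anything, and only then store the crystal and stamp `crystallizedInto` on each action.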
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/crystallize.ts` around lines 34 - 47, In crystallize, when iterating data.actionIds (inside the loop that builds actions) check action.crystallizedInto and return an error if it's already set to prevent duplicate crystallization; then to prevent races serialize the action set (e.g. sort the ids and acquire a short KV lock or use a KV transactional/compare-and-set pattern) before creating the new crystal so two callers cannot run concurrently; finally, set each Action.crystallizedInto and persist those action records as part of the same atomic/serialized step (or write them before or in the same transaction that stores the new crystal) so the system never ends up with two crystals for the same actions (also apply the same guard to the similar block around lines 73-78).
benchmark/dataset.ts-281-289 (1)
281-289: ⚠️ Potential issue | 🟠 Major
Make `generateScaleDataset()` deterministic.

The `Math.random()` timestamp means the scale corpus changes on every run. Anything with temporal weighting or session-order effects will then produce non-reproducible benchmark numbers and doc churn. Derive the timestamp from `i`, or use a seeded PRNG.
One simple deterministic option
```diff
- timestamp: ts(Math.random() * 90),
+ timestamp: ts((i * 37) % 90),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmark/dataset.ts` around lines 281 - 289, The timestamp generation in generateScaleDataset() is non-deterministic because it uses Math.random(); replace that with a deterministic function tied to the loop index (e.g., compute timestamp = ts((i % N) * scale + base) or use a simple LCG seeded by i) so the same input count yields identical datasets across runs; update the timestamp assignment in the loop where id/sessionId/title/narrative are set (inside generateScaleDataset) to derive the value from i instead of Math.random(), ensuring reproducible benchmarks.
package.json-3-3 (1)
3-3: ⚠️ Potential issue | 🟠 Major
Manifest version is still one release behind.

`src/version.ts` and the updated export/import test now report `0.6.0`, but `package.json` is still `0.5.0`. Shipping this as-is publishes a 0.5.0 package that identifies itself as 0.6.0 in runtime surfaces.
📦 Suggested fix
- "version": "0.5.0", + "version": "0.6.0",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@package.json` at line 3, Update the package manifest version to match the runtime constant: change the "version" field in package.json from "0.5.0" to "0.6.0" so it aligns with the exported value in src/version.ts (and the updated export/import tests) to avoid publishing a package whose runtime reports a different version.
src/functions/export-import.ts-325-442 (1)
325-442: ⚠️ Potential issue | 🟠 Major
Validate optional collections before the `for...of` loops.

These branches only check truthiness. Payloads like `{ actions: {} }` or `{ graphNodes: "x" }` will throw at runtime instead of returning a clean validation error, and `{ actions: null }` is silently accepted even though the export shape is array-or-undefined.
🛡️ Suggested fix
```diff
+ const optionalCollections = {
+   graphNodes: importData.graphNodes,
+   graphEdges: importData.graphEdges,
+   semanticMemories: importData.semanticMemories,
+   proceduralMemories: importData.proceduralMemories,
+   actions: importData.actions,
+   actionEdges: importData.actionEdges,
+   routines: importData.routines,
+   signals: importData.signals,
+   checkpoints: importData.checkpoints,
+   sentinels: importData.sentinels,
+   sketches: importData.sketches,
+   crystals: importData.crystals,
+   facets: importData.facets,
+ } as const;
+ for (const [name, value] of Object.entries(optionalCollections)) {
+   if (value !== undefined && !Array.isArray(value)) {
+     return { success: false, error: `${name} must be an array` };
+   }
+ }
+
  if (importData.graphNodes) {
    for (const node of importData.graphNodes) {
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/export-import.ts` around lines 325 - 442, The import code treats optional collections like importData.actions or importData.graphNodes as truthy and then iterates, which will throw for non-array values or silently accept null; update each block (e.g., importData.graphNodes, importData.graphEdges, importData.actions, importData.routines, importData.signals, importData.checkpoints, importData.sentinels, importData.sketches, importData.crystals, importData.facets, importData.semanticMemories, importData.proceduralMemories, importData.actionEdges) to first check for presence with a strict undefined check and validate the type with Array.isArray(), e.g. if (importData.actions !== undefined) { if (!Array.isArray(importData.actions)) throw new Error("Invalid import: actions must be an array"); for (const action of importData.actions) { ... } }, so non-array payloads produce a clear validation error and null is not treated as a valid value.
test/actions.test.ts-92-100 (1)
92-100: ⚠️ Potential issue | 🟠 Major
Priority `0` expectation is inconsistent with clamp semantics.

Given adjacent tests enforce clamping to `[1,10]`, expecting `priority: 0` to become `5` codifies a fallback bug instead of boundary handling.
🧩 Proposed fix
- it("clamps priority 0 to default 5 (falsy fallback)", async () => { + it("clamps priority 0 to 1", async () => { @@ - expect(result.action.priority).toBe(5); + expect(result.action.priority).toBe(1); });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/actions.test.ts` around lines 92 - 100, The test "clamps priority 0 to default 5 (falsy fallback)" incorrectly expects a fallback to 5; update the test case for the sdk.trigger("mem::action-create", { priority: 0 }) scenario to assert clamping to the lower bound (expect result.action.priority toBe 1) and rename the it description to reflect boundary clamping (e.g., "clamps priority 0 to minimum 1") so it aligns with the other clamp-to-[1,10] semantics enforced in the suite.
test/frontier.test.ts-166-170 (1)
166-170: ⚠️ Potential issue | 🟠 Major
This test is confounded by action status.

`act_blocked` is created with `status: "blocked"`, so it may be filtered regardless of edge logic. The test can pass even if requires-edge handling is broken.
✅ Proposed fix
```diff
 const blocked = makeAction({
   id: "act_blocked",
   title: "Blocked",
-  status: "blocked",
+  status: "pending",
 });
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/frontier.test.ts` around lines 166 - 170, The test currently creates act_blocked via makeAction({ id: "act_blocked", title: "Blocked", status: "blocked" }) which causes it to be filtered by status rather than edge logic; change the test so act_blocked is created without the status field (or with status: "open") so that any exclusion in the assertion must come from the edge/requirement handling; locate the makeAction call for "act_blocked" in the frontier.test.ts test and remove or change the status property accordingly and adjust expectations if needed.
src/state/stemmer.ts-78-89 (1)
78-89: ⚠️ Potential issue | 🟠 Major
Step-4 suffix removal is over-aggressive for `tion`/`sion`.

This removes full `tion`/`sion`, which can truncate stems incorrectly (e.g., dropping to `rela` instead of `relat`). Porter step 4 should handle `ion` only when the base ends with `s` or `t`.
🧭 Proposed fix
- if (w.endsWith("al") || w.endsWith("ance") || w.endsWith("ence") || + if (w.endsWith("al") || w.endsWith("ance") || w.endsWith("ence") || w.endsWith("er") || w.endsWith("ic") || w.endsWith("able") || w.endsWith("ible") || w.endsWith("ant") || w.endsWith("ement") || - w.endsWith("ment") || w.endsWith("ent") || w.endsWith("tion") || - w.endsWith("sion") || w.endsWith("ou") || w.endsWith("ism") || + w.endsWith("ment") || w.endsWith("ent") || w.endsWith("ou") || w.endsWith("ism") || w.endsWith("ate") || w.endsWith("iti") || w.endsWith("ous") || w.endsWith("ive") || w.endsWith("ize") || w.endsWith("ise")) { - const suffixLen = w.match(/(ement|ment|tion|sion|ance|ence|able|ible|ism|ate|iti|ous|ive|ize|ise|ant|ent|al|er|ic|ou)$/)?.[0]?.length ?? 0; + const suffixLen = w.match(/(ement|ment|ance|ence|able|ible|ism|ate|iti|ous|ive|ize|ise|ant|ent|al|er|ic|ou)$/)?.[0]?.length ?? 0; if (suffixLen > 0) { const base = w.slice(0, -suffixLen); if (measure(base) > 1) w = base; } } + if (w.endsWith("ion")) { + const base = w.slice(0, -3); + if (measure(base) > 1 && /[st]$/.test(base)) w = base; + }src/functions/actions.ts-265-279 (1)
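For context, the `measure(base) > 1` gate refers to Porter's m value: the number of vowel-to-consonant transitions in a candidate stem. A simplified illustrative sketch (not the repository's implementation; it treats only `aeiou` as vowels and ignores Porter's `y` handling) shows why a base like `relat` passes the gate while `rela` does not:

```typescript
// Approximate Porter measure m: count vowel-to-consonant transitions.
// Simplification: only "aeiou" are vowels; Porter's rule for "y" after a
// consonant is omitted here.
function measure(stem: string): number {
  let m = 0;
  let prevVowel = false;
  for (const ch of stem) {
    const vowel = "aeiou".includes(ch);
    if (prevVowel && !vowel) m++;
    prevVowel = vowel;
  }
  return m;
}
```

Under this approximation, `measure("relat")` is 2 (so the stripped base is accepted), while `measure("rela")` is 1 and would be rejected by the `> 1` gate.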
**src/functions/actions.ts**

`265-279`: ⚠️ Potential issue | 🟠 Major

**Stale `actionMap` used inside the lock in `propagateCompletion`.**

The `actionMap` (line 266) is populated before the lock is acquired, but it's used inside the lock (lines 277-278) to check dependency status. Another concurrent operation could have changed action statuses between the list fetch and the lock acquisition, leading to incorrect unblocking decisions.

🐛 Proposed fix to re-fetch inside the lock:

```diff
 for (const edge of unlockEdges) {
   const candidateId = edge.sourceActionId;
   await withKeyedLock(`mem:action:${candidateId}`, async () => {
     const action = await kv.get<Action>(KV.actions, candidateId);
     if (action && action.status === "blocked") {
+      const freshEdges = await kv.list<ActionEdge>(KV.actionEdges);
+      const freshActions = await kv.list<Action>(KV.actions);
+      const freshActionMap = new Map(freshActions.map((a) => [a.id, a]));
-      const deps = allEdges.filter(
+      const deps = freshEdges.filter(
         (e) => e.sourceActionId === action.id && e.type === "requires",
       );
       const allDone = deps.every((d) => {
-        const target = actionMap.get(d.targetActionId);
+        const target = freshActionMap.get(d.targetActionId);
         return target && target.status === "done";
       });
```
**src/functions/signals.ts**

`70-76`: ⚠️ Potential issue | 🟠 Major

**Logic error in signal filtering.**

The filter condition at line 74 appears inverted. When `!s.to` (broadcast signal) and `s.from !== data.agentId`, returning `true` includes signals the agent didn't send and that weren't addressed to them. This likely should return `false` to exclude irrelevant broadcasts, or the logic needs restructuring.

Current behavior: broadcasts from other agents are included even when the agent has no relation to them.

🐛 Proposed fix:

```diff
 signals = signals.filter((s) => {
   if (s.expiresAt && new Date(s.expiresAt).getTime() <= now) return false;
   if (s.to && s.to !== data.agentId && s.from !== data.agentId) return false;
-  if (!s.to && s.from !== data.agentId) return true;
-  return true;
+  // Include: direct messages to/from agent, or broadcasts from agent, or all broadcasts
+  return s.to === data.agentId || s.from === data.agentId || !s.to;
 });
```
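To make the intended inclusion rule testable in isolation, it can be phrased as a pure predicate. The `Signal` shape below is an assumption inferred from the snippet, not the project's actual type:

```typescript
// Hypothetical minimal Signal shape inferred from the review snippet.
interface Signal {
  from: string;       // sender agent id
  to?: string;        // recipient agent id; absent means broadcast
  expiresAt?: string; // ISO timestamp
}

// True when `agentId` should see the signal: not expired, and either
// addressed to the agent, sent by the agent, or a broadcast.
function shouldInclude(s: Signal, agentId: string, now: number): boolean {
  if (s.expiresAt && new Date(s.expiresAt).getTime() <= now) return false;
  if (s.to && s.to !== agentId && s.from !== agentId) return false;
  return s.to === agentId || s.from === agentId || !s.to;
}
```

Isolating the predicate this way lets the expired, addressed, broadcast, and own-sent cases each be asserted directly, which would have caught the inverted branch.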
**src/functions/graph-retrieval.ts**

`49-50`: ⚠️ Potential issue | 🟠 Major

**Traversal should use current edges, not every historical version.**

`KV.graphEdges` now contains superseded relationships with `isLatest: false`. Feeding the full set into BFS means obsolete edges keep contributing search and expansion hits after they were replaced. `getLatestEdges()` already exists in this file; use it before traversal in both methods.

Suggested fix:

```diff
-    const allEdges = await this.kv.list<GraphEdge>(KV.graphEdges);
+    const allEdges = this.getLatestEdges(
+      await this.kv.list<GraphEdge>(KV.graphEdges),
+    );
```

Also applies to: lines 122-123.
**src/functions/graph-retrieval.ts**

`63-110`: ⚠️ Potential issue | 🟠 Major

**`visitedObs` is locking in the first path instead of the best one.**

The BFS loop emits the start-node path first, so direct entity matches are recorded with `score = 0.5`, and the later `score: 1.0` seed never runs. The same first-seen logic also discards higher-scoring alternate paths for shared observations. Keep the best result per `obsId` instead of short-circuiting on first encounter.
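The keep-the-best-per-`obsId` pattern described above can be sketched as a small reducer over candidate results; the `PathResult` shape here is hypothetical, standing in for the file's actual result type:

```typescript
// Hypothetical candidate shape for a graph-retrieval hit.
interface PathResult {
  obsId: string;
  score: number;
  graphContext: string;
}

// Keep only the highest-scoring result per observation id, instead of
// short-circuiting on the first encounter via a visited Set.
function keepBest(candidates: PathResult[]): PathResult[] {
  const best = new Map<string, PathResult>();
  for (const c of candidates) {
    const prev = best.get(c.obsId);
    if (!prev || c.score > prev.score) best.set(c.obsId, c);
  }
  return [...best.values()];
}
```

With this shape, the order in which BFS paths and direct-match seeds are emitted no longer determines which score wins.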
**src/functions/temporal-graph.ts**

`315-323`: ⚠️ Potential issue | 🟠 Major

**Superseded edges are being counted twice in the temporal views.**

This file keeps old versions in `KV.graphEdges` and also copies them into `KV.graphEdgeHistory`. Concatenating both scopes here duplicates every superseded relationship, which inflates the timeline and the differential `changes` output, and also duplicates `historicalEdges` on the `asOf` path. Filter the current scope down to latest edges before merging with history, or stop retaining historical versions in `KV.graphEdges`.

Also applies to: lines 379-397.
**src/functions/temporal-graph.ts**

`56-90`: ⚠️ Potential issue | 🟠 Major

**This parser only accepts one attribute ordering.**

Both regexes require attributes in a fixed sequence (`type` before `name`, and `type`/`source`/`target`/`weight`/... for relationships). XML attributes are unordered, so a valid model response with reordered attributes will silently produce zero nodes or edges. Parse attributes independently or switch to a real XML parser here.
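If a full XML parser is not desirable, attributes can still be extracted order-independently by matching the whole tag and then pulling each attribute with its own pattern. A rough sketch (tag and attribute names are illustrative, not the project's actual schema):

```typescript
// Extract all key="value" attributes from a single tag, regardless of the
// order in which they appear.
function parseAttrs(tag: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  for (const m of tag.matchAll(/([\w:-]+)="([^"]*)"/g)) {
    attrs[m[1]] = m[2];
  }
  return attrs;
}
```

The caller can then read `attrs.type`, `attrs.name`, `attrs.source`, etc. without caring how the model ordered them.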
**src/functions/temporal-graph.ts**

`324-339`: ⚠️ Potential issue | 🟠 Major

**Reject invalid temporal bounds instead of treating `NaN` as a real timestamp.**

`new Date(...).getTime()` returns `NaN` for bad input. In `mem::temporal-query`, that makes every edge look valid; in `mem::differential-state`, it can filter everything out. Validate `asOf`/`from`/`to` and return an error before building results.

Also applies to: lines 405-415.
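A minimal helper of the kind suggested, returning `null` for bad input instead of letting `NaN` flow into the edge filters (the helper name is an assumption, not code from the PR):

```typescript
// Parse an ISO-ish timestamp string; return null for invalid input rather
// than propagating NaN into comparisons.
function parseTimestamp(value: string): number | null {
  const t = new Date(value).getTime();
  return Number.isNaN(t) ? null : t;
}
```

The handlers can then check for `null` on `asOf`/`from`/`to` and return a clear error response before building results.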
**src/functions/graph-retrieval.ts**

`90-107`: ⚠️ Potential issue | 🟠 Major

**Graph-only hits cannot be enriched because `sessionId` is blank.**

Both result builders hard-code `sessionId: ""`. `HybridSearch.enrichResults()` reads observations from `KV.observations(sessionId)`, so these candidates get dropped entirely, and diversification also collapses all graph-only hits into one pseudo-session. This needs real session provenance before the result leaves the graph layer.

Also applies to: lines 143-149.
**src/index.ts**

`213-221`: ⚠️ Potential issue | 🟠 Major

**Validate `AGENTMEMORY_GRAPH_WEIGHT` before constructing the ranker.**

`parseFloat` returns `NaN` for invalid env values. Once graph results are present, `HybridSearch` normalization/combine math turns into `NaN` and ranking breaks. Default and clamp the value before passing it in.

Suggested fix:

```diff
-  const graphWeight = parseFloat(getEnvVar("AGENTMEMORY_GRAPH_WEIGHT") || "0.3");
+  const parsedGraphWeight = Number.parseFloat(
+    getEnvVar("AGENTMEMORY_GRAPH_WEIGHT") || "0.3",
+  );
+  const graphWeight =
+    Number.isFinite(parsedGraphWeight) && parsedGraphWeight >= 0
+      ? parsedGraphWeight
+      : 0.3;
```
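The default-and-clamp step can be factored into a small helper. This is a sketch only; the name `sanitizeWeight` and the [0, 1] clamp range are chosen for illustration, not taken from the PR:

```typescript
// Parse a weight from an env-style string; fall back for non-finite values
// and clamp into [0, 1] so downstream score math cannot see NaN.
function sanitizeWeight(raw: string | undefined, fallback = 0.3): number {
  const parsed = Number.parseFloat(raw ?? "");
  if (!Number.isFinite(parsed)) return fallback;
  return Math.min(1, Math.max(0, parsed));
}
```

With a helper like this, every env-derived weight goes through one validated path instead of ad hoc `parseFloat` calls.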
**src/state/hybrid-search.ts**

`246-252`: ⚠️ Potential issue | 🟠 Major

**The refill loop bypasses the 3-per-session cap.**

After the first pass enforces `maxPerSession`, this fallback adds any remaining unique result without checking `sessionCounts`, so a low-session corpus can still return 4+ hits from the same session. If the v0.6 diversification rule is meant to be hard, keep the cap here too or accept returning fewer than `limit`.

Suggested fix:

```diff
 if (selected.length < limit) {
   for (const r of results) {
     if (selected.length >= limit) break;
-    if (!selected.some(s => s.obsId === r.obsId)) {
+    const count = sessionCounts.get(r.sessionId) || 0;
+    if (count >= maxPerSession) continue;
+    if (!selected.some((s) => s.obsId === r.obsId)) {
       selected.push(r);
+      sessionCounts.set(r.sessionId, count + 1);
     }
   }
 }
```
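If the cap is meant to be hard, the whole selection can be expressed as a single capped pass instead of a first pass plus refill. A sketch with a hypothetical `Hit` shape, assuming `results` is already sorted by descending score:

```typescript
// Hypothetical hit shape for a ranked search result.
interface Hit {
  obsId: string;
  sessionId: string;
  score: number;
}

// Select up to `limit` hits, never taking more than `maxPerSession` from
// any single session; may return fewer than `limit` by design.
function diversify(results: Hit[], limit: number, maxPerSession: number): Hit[] {
  const selected: Hit[] = [];
  const counts = new Map<string, number>();
  for (const r of results) {
    if (selected.length >= limit) break;
    const n = counts.get(r.sessionId) ?? 0;
    if (n >= maxPerSession) continue;
    selected.push(r);
    counts.set(r.sessionId, n + 1);
  }
  return selected;
}
```

A single pass keeps the cap invariant in one place, so there is no second loop that can silently violate it.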
🟡 Minor comments (12)
**benchmark/real-embeddings-eval.ts**

`111-135`: ⚠️ Potential issue | 🟡 Minor

**Keep embedding time separate from ingest/index time.**

`embedStart` is taken before the BM25/KV ingest loop, so `embed_time_ms` includes unrelated indexing work. That makes the report's "Embedding Time" column misleading. Start timing immediately before `embedBatch()`, or emit separate ingest and embedding metrics.
**src/functions/sliding-window.ts**

`153-159`: ⚠️ Potential issue | 🟡 Minor

**Don't count no-op windows as enriched.**

`mem::enrich-window` returns `success: true` with `enriched: null` when there is no adjacent context, but `mem::enrich-session` increments `enriched` for any truthy `success`. That makes the batch totals report persisted chunks that were never written. Count only non-null `enriched` results, or track them as `skipped`.

Also applies to: lines 232-241.
**test/hybrid-search.test.ts**

`79-89`: ⚠️ Potential issue | 🟡 Minor

**This test no longer proves the behavior named in the title.**

A constant positive `combinedScore` would satisfy these assertions. If BM25-only mode is supposed to derive `combinedScore` from BM25, assert the exact transform, or at least a relation that depends on `bm25Score`.
**test/graph-retrieval.test.ts**

`145-149`: ⚠️ Potential issue | 🟡 Minor

**Remove the unnecessary `as any` cast on a valid edge type.**

`located_in` is already defined in `GraphEdgeType` (src/types.ts:328), so the `as any` cast in the fixture is unnecessary and masks type safety. Either `makeEdge`'s signature is too narrow to accept `GraphEdgeType` directly, or the cast is redundant; clarify which and remove the bypass.
**test/sliding-window.test.ts**

`92-102`: ⚠️ Potential issue | 🟡 Minor

**The registration test is a false positive.**

Line 101 asserts that `sdk.trigger` is defined, but that is always true even if registration fails. This does not validate that both functions were actually registered.

🧪 Proposed fix:

```diff
 function mockSdk() {
   const functions = new Map<string, Function>();
   return {
     registerFunction: (opts: { id: string }, fn: Function) => {
       functions.set(opts.id, fn);
     },
+    hasFunction: (id: string) => functions.has(id),
     trigger: async (id: string, data: unknown) => {
       const fn = functions.get(id);
       if (fn) return fn(data);
       return null;
     },
@@
   it("registers both functions", async () => {
@@
-    expect(sdk.trigger).toBeDefined();
+    expect((sdk as any).hasFunction("mem::get-window")).toBe(true);
+    expect((sdk as any).hasFunction("mem::enrich-window")).toBe(true);
   });
```
**src/state/stemmer.ts**

`35-39`: ⚠️ Potential issue | 🟡 Minor

**Normalize input casing before stemming.**

`stem` currently processes raw casing. If callers pass capitalized terms, suffix logic may be skipped and recall becomes input-format dependent.

🔧 Proposed fix:

```diff
 export function stem(word: string): string {
   if (word.length <= 2) return word;
-  let w = word;
+  let w = word.toLowerCase();
```
**benchmark/SCALE.md**

`59-72`: ⚠️ Potential issue | 🟡 Minor

**Add a language tag to the fenced block for markdownlint compliance.**
Line 59 starts a fenced block without a language identifier, which triggers MD040 and can fail strict docs linting.
🛠️ Proposed fix
```diff
-```
+```text
 Agent context window: ~200K tokens
 System prompt + tools: ~20K tokens
 User conversation: ~30K tokens
 Available for memory: ~150K tokens
@@
 agentmemory top-10 results:
 Any corpus size = ~1,924 tokens (0.3% of budget)
```

**src/functions/actions.ts**

`168-175`: ⚠️ Potential issue | 🟡 Minor

**Edge creation doesn't verify the action still exists at write time.**

The source and target action existence checks (lines 168-175) are done before the edge is created (line 186). A concurrent delete could remove an action between the check and the edge write, creating an orphaned edge. Consider wrapping the validation and creation in a lock, or accepting this as a low-probability race that `mem::heal` can clean up.

**benchmark/QUALITY.md**

`20-20`: ⚠️ Potential issue | 🟡 Minor

**Minor: the recall improvement calculation is slightly misleading.**

The text states a "+4%" improvement (58.0% vs 55.8%), but the absolute difference is +2.2 percentage points. The relative improvement is approximately +4% ((58.0 - 55.8) / 55.8 ≈ 3.9%), so consider clarifying that the figure is relative.

📝 Suggested clarification:

```diff
-**Recall improvement:** agentmemory triple-stream finds 58.0% of relevant memories at K=10 vs 55.8% for keyword grep (+4%)
+**Recall improvement:** agentmemory triple-stream finds 58.0% of relevant memories at K=10 vs 55.8% for keyword grep (+2.2pp / ~4% relative)
```

**test/temporal-graph.test.ts**

`82-84`: ⚠️ Potential issue | 🟡 Minor

**Dead code: unused variable `fns`.**

Line 82 creates an empty array `fns` that is never used. This appears to be leftover from refactoring.

🧹 Proposed fix:

```diff
 registerTemporalGraphFunctions(sdk as never, kv as never, provider);
-  const fns = Array.from((sdk as any).trigger ? [] : []);
 expect(sdk.trigger).toBeDefined();
```

**src/functions/diagnostics.ts**

`595-606`: ⚠️ Potential issue | 🟡 Minor

**Orphaned lease deletion lacks locking.**

The orphaned lease deletion at line 603 does not use `withKeyedLock`, unlike the expired-lease handling at lines 556-583. This creates a potential race condition if another operation concurrently modifies the lease.

🔒 Proposed fix to add locking:

```diff
 if (!actionIds.has(lease.actionId)) {
   if (dryRun) {
     details.push(
       `[dry-run] Would delete orphaned lease ${lease.id}`,
     );
     fixed++;
     continue;
   }
-  await kv.delete(KV.leases, lease.id);
-  details.push(`Deleted orphaned lease ${lease.id}`);
-  fixed++;
+  const didDelete = await withKeyedLock(
+    `mem:lease:${lease.actionId}`,
+    async () => {
+      const fresh = await kv.get<Lease>(KV.leases, lease.id);
+      if (!fresh) return false;
+      const freshActions = await kv.list<Action>(KV.actions);
+      const freshActionIds = new Set(freshActions.map((a) => a.id));
+      if (freshActionIds.has(fresh.actionId)) return false;
+      await kv.delete(KV.leases, lease.id);
+      return true;
+    },
+  );
+  if (didDelete) {
+    details.push(`Deleted orphaned lease ${lease.id}`);
+    fixed++;
+  } else {
+    skipped++;
+  }
 }
```

**src/triggers/api.ts**

`1722-1727`: ⚠️ Potential issue | 🟡 Minor

**Validate `limit` before forwarding it to `mem::crystal-list`.**

`parseInt(params.limit as string)` is passed through directly here, so `?limit=foo` becomes `NaN` and leaks into the handler. Normalize invalid values to `undefined` or return `400` instead.

Suggested fix:

```diff
 sdk.registerFunction({ id: "api::crystal-list" }, async (req: ApiRequest) => {
   const denied = checkAuth(req, secret);
   if (denied) return denied;
   const params = req.query_params || {};
-  const result = await sdk.trigger("mem::crystal-list", { project: params.project, sessionId: params.sessionId, limit: params.limit ? parseInt(params.limit as string) : undefined });
+  const limit =
+    params.limit === undefined
+      ? undefined
+      : Number.parseInt(params.limit as string, 10);
+  if (limit !== undefined && !Number.isFinite(limit)) {
+    return { status_code: 400, body: { error: "limit must be a number" } };
+  }
+  const result = await sdk.trigger("mem::crystal-list", {
+    project: params.project,
+    sessionId: params.sessionId,
+    limit,
+  });
   return { status_code: 200, body: result };
 });
```
```typescript
function isAllowedUrl(urlStr: string): boolean {
  try {
    const parsed = new URL(urlStr);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
    const host = parsed.hostname.toLowerCase();
    if (
      host === "localhost" ||
      host === "127.0.0.1" ||
      host === "::1" ||
      host === "0.0.0.0" ||
      host.startsWith("10.") ||
      host.startsWith("192.168.") ||
      host === "169.254.169.254" ||
      /^172\.(1[6-9]|2\d|3[01])\./.test(host) ||
      host.startsWith("fe80:") ||
      host.startsWith("fc00:") ||
      host.startsWith("fd") ||
      host.startsWith("::ffff:")
    ) {
      return false;
    }
    return true;
  } catch {
    return false;
  }
}
```
SSRF guard is bypassable with DNS resolution tricks.
isAllowedUrl only inspects hostname text; a public hostname resolving to RFC1918/link-local can still pass and be fetched. This leaves outbound mesh sync open to SSRF.
🔐 Harden URL validation (DNS + IP range checks)

```diff
+import { lookup } from "node:dns/promises";
+import { isIP } from "node:net";
 ...
-function isAllowedUrl(urlStr: string): boolean {
+async function isAllowedUrl(urlStr: string): Promise<boolean> {
   try {
     const parsed = new URL(urlStr);
     if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
-    const host = parsed.hostname.toLowerCase();
+    const host = parsed.hostname.toLowerCase();
+    if (parsed.username || parsed.password) return false;
+
+    const privateIp = (ip: string): boolean => {
+      // implement RFC1918, loopback, link-local, ULA, mapped/embedded local checks
+      // for both IPv4/IPv6 textual addresses
+      return /* ... */;
+    };
+
+    if (isIP(host) && privateIp(host)) return false;
+    const resolved = await lookup(host, { all: true, verbatim: true });
+    if (resolved.some((r) => privateIp(r.address))) return false;
 ...
-    return true;
+    return true;
   } catch {
     return false;
   }
 }
```

And update call sites:

```diff
-  if (!isAllowedUrl(data.url)) {
+  if (!(await isAllowedUrl(data.url))) {
 ...
-  if (!isAllowedUrl(peer.url)) {
+  if (!(await isAllowedUrl(peer.url))) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/functions/mesh.ts` around lines 7 - 32, isAllowedUrl currently only
checks the hostname string and can be bypassed via DNS tricks; update
isAllowedUrl to perform DNS resolution (both A and AAAA records using
dns.promises.resolve4/resolve6 or lookup with {all:true}) and validate each
resolved IP against the same private/loopback/link-local ranges (IPv4 and IPv6),
rejecting the URL if any resolved address is in a disallowed range or if
resolution fails/times out; ensure the function handles multiple addresses,
treats hostnames that resolve to both public and private IPs as disallowed, and
update all call sites that rely on isAllowedUrl to handle its now-async behavior
(or provide a new async wrapper like isAllowedUrlAsync) and propagate
errors/timeouts appropriately.
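The async shape the prompt asks for can be sketched concretely. This is a minimal, fail-closed version under stated assumptions: `isPrivateV4` covers IPv4 only (a full fix also needs the IPv6 ranges), raw IPv6 peers and any IPv6 DNS answers are rejected outright for brevity, and connection pinning against rebinding between check and fetch is out of scope here:

```typescript
import { lookup } from "node:dns/promises";
import { isIP } from "node:net";

// IPv4-only private-range check (sketch); production code needs IPv6 too.
function isPrivateV4(ip: string): boolean {
  const [a, b] = ip.split(".").map(Number);
  return (
    a === 10 || a === 127 || a === 0 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

async function isAllowedUrl(urlStr: string): Promise<boolean> {
  try {
    const parsed = new URL(urlStr);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
    if (parsed.username || parsed.password) return false;
    // WHATWG URL keeps brackets around IPv6 hostnames, so strip them first.
    const host = parsed.hostname.replace(/^\[|\]$/g, "").toLowerCase();
    if (isIP(host) === 4) return !isPrivateV4(host);
    if (isIP(host) === 6) return false; // simplification: reject raw IPv6 peers
    try {
      // Fail closed: a lookup error rejects the URL, and every resolved
      // address must be a public IPv4 for the host to pass.
      const resolved = await lookup(host, { all: true, verbatim: true });
      return resolved.every((r) => r.family === 4 && !isPrivateV4(r.address));
    } catch {
      return false;
    }
  } catch {
    return false;
  }
}
```

Note that even this stricter check leaves a TOCTOU window unless the eventual HTTP request is pinned to the vetted addresses, which is why the review also suggests rejecting hostname-based peers entirely.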
```typescript
if (data.type === "pattern") {
  const cfg = data.config as { pattern?: string } | undefined;
  if (!cfg || !cfg.pattern || typeof cfg.pattern !== "string") {
    return {
      success: false,
      error: "pattern config requires a pattern string",
    };
  }
```
Validate sentinel patterns before compiling them as regexes to prevent ReDoS attacks.
External input from mem::sentinel-create passes cfg.pattern directly to new RegExp() at line 239 without safety validation. The regex is then tested against every observation title across all sessions (line 253), making a catastrophic pattern (e.g., (a+)+b) a denial-of-service vector. Current validation only checks that the pattern is a non-empty string. Implement pattern validation using a regex timeout/complexity checker, or switch to safe literal matching mode.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/functions/sentinels.ts` around lines 54 - 61, The sentinel pattern from
mem::sentinel-create (cfg.pattern) is currently passed straight into new RegExp
and then used to test observation titles, which is vulnerable to ReDoS; to fix,
validate and/or sanitize cfg.pattern before compiling: first enforce a max
length (e.g. 100 chars), reject dangerous tokens (backreferences like \1,
lookarounds `(?=`, `(?!)`, and nested quantifiers such as patterns containing
`+`, `*` or `?` immediately inside repeated groups like `(…+)+`), attempt a safe
compile inside a try/catch around new RegExp and return an error if compilation
fails or heuristics flag it, and alternatively provide an option to treat the
input as a literal by escaping it (escapeRegex) and using that in RegExp if any
risk is detected; update the code paths around cfg.pattern, new RegExp(...) and
the RegExp.test(...) usage so only validated/escaped patterns are compiled and
used.
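A sketch of the validation the prompt describes; the function name `compileSentinelPattern`, the length cap, and the heuristic regex are illustrative, not code from the repository:

```typescript
// Escape all regex metacharacters so the input matches literally.
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const MAX_PATTERN_LENGTH = 100;
// Heuristic: a quantifier applied to a group that itself contains a
// quantifier, e.g. (a+)+ — the classic catastrophic-backtracking shape.
const NESTED_QUANTIFIER = /\([^)]*[+*][^)]*\)[+*?{]/;

// Returns a compiled RegExp, falling back to literal matching for risky
// inputs, or null when the pattern is too long or fails to compile.
function compileSentinelPattern(pattern: string): RegExp | null {
  if (pattern.length > MAX_PATTERN_LENGTH) return null;
  const risky =
    NESTED_QUANTIFIER.test(pattern) || // nested quantifiers
    /\\\d/.test(pattern) ||            // backreferences like \1
    /\(\?<?[=!]/.test(pattern);        // lookarounds (?= (?! (?<= (?<!
  if (risky) return new RegExp(escapeRegex(pattern)); // safe literal mode
  try {
    return new RegExp(pattern);
  } catch {
    return null;
  }
}
```

With this shape, `(a+)+b` is still accepted but only ever matched literally, so a hostile sentinel cannot turn `mem::sentinel-check` into a ReDoS vector.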
```typescript
const now = new Date();
const sentinel: Sentinel = {
  id: generateId("snl"),
  name: data.name.trim(),
  type: data.type,
  status: "watching",
  config: data.config || {},
  createdAt: now.toISOString(),
  linkedActionIds: data.linkedActionIds || [],
  expiresAt: data.expiresInMs
    ? new Date(now.getTime() + data.expiresInMs).toISOString()
    : undefined,
```
Persist timer deadlines instead of relying on setTimeout() alone.
For timer sentinels, the only trigger path is the in-memory timeout. After a restart, the persisted sentinel is still watching, but no timer is re-armed and mem::sentinel-check never evaluates timer sentinels, so linked actions can stay blocked forever. Store the due time in KV and make timer evaluation recoverable on startup/check.
Also applies to: 125-137
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/functions/sentinels.ts` around lines 96 - 107, The timer sentinel
creation currently only sets an in-memory timeout; persist the deadline and make
timer evaluation recoverable by storing the computed expiresAt on the Sentinel
record (the object created where id is generateId("snl"), name, type, status,
config, linkedActionIds, createdAt, expiresAt) and ensuring the sentinel-check
path (mem::sentinel-check) reads expiresAt from KV/DB on startup and during
checks: for each timer-type sentinel, if expiresAt <= now immediately trigger
the linked actions, otherwise re-arm a setTimeout to fire at expiresAt; also
update any create/update functions that accept data.expiresInMs to compute and
persist expiresAt instead of relying solely on setTimeouts so timers survive
restarts.
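The recovery pass the prompt describes can be sketched as follows. The `TimerSentinel` shape mirrors the record above, but `recoverTimers`, its callback, and the return value are hypothetical names for illustration:

```typescript
// Assumed shape, mirroring the Sentinel record created above.
interface TimerSentinel {
  id: string;
  type: "timer";
  status: "watching" | "triggered";
  expiresAt?: string; // persisted ISO deadline, the source of truth
}

// On startup (and inside mem::sentinel-check), walk the persisted timer
// sentinels: fire any already past due, re-arm the rest from expiresAt.
// Returns the number of timers re-armed.
function recoverTimers(
  sentinels: TimerSentinel[],
  fire: (s: TimerSentinel) => void,
  now: number = Date.now(),
): number {
  let rearmed = 0;
  for (const s of sentinels) {
    if (s.type !== "timer" || s.status !== "watching" || !s.expiresAt) continue;
    const due = new Date(s.expiresAt).getTime();
    if (due <= now) {
      fire(s); // deadline passed while the process was down
    } else {
      setTimeout(() => fire(s), due - now);
      rearmed++;
    }
  }
  return rearmed;
}
```

Because the deadline lives in KV rather than only in a live `setTimeout`, a restart simply replays this loop and no linked action stays blocked forever.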
Merge conflicts: resolved all 7 files (README, types, schema, index, export-import, version, test), keeping v0.6.0 additions merged with main

CodeRabbit fixes:
- diagnostics.mjs: use action-level lock key (mem:action:) for lease expiry to prevent races with lease handlers
- leases.ts: check for other active leases before clearing action assignment on lease expiry
- mesh.ts: DNS resolution check in isAllowedUrl to prevent SSRF via DNS rebinding (resolves IPs and checks private ranges)
- export-import.ts: clear graphNodes, graphEdges, semantic, procedural scopes on replace strategy
- real-embeddings-eval.ts: measure actual returned result tokens instead of using label counts
- .gitignore: exclude data/ directory
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/types.ts (1)
241-262: ⚠️ Potential issue | 🟠 Major

**`ExportData` for v0.6.0 is missing temporal edge history.** Line 242 introduces version `"0.6.0"`, but the export schema still omits graph edge history. That prevents full round-trip of temporal graph state and can cause inconsistent imports.

Suggested patch

```diff
 export interface ExportData {
   version: "0.3.0" | "0.4.0" | "0.5.0" | "0.6.0";
   exportedAt: string;
   sessions: Session[];
   observations: Record<string, CompressedObservation[]>;
   memories: Memory[];
   summaries: SessionSummary[];
   profiles?: ProjectProfile[];
   graphNodes?: GraphNode[];
   graphEdges?: GraphEdge[];
+  graphEdgeHistory?: GraphEdge[];
   semanticMemories?: SemanticMemory[];
   proceduralMemories?: ProceduralMemory[];
   actions?: Action[];
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/types.ts` around lines 241 - 262, ExportData for version "0.6.0" is missing temporal edge history; add a new optional property to the ExportData interface (e.g., graphEdgeHistory?: GraphEdgeHistory[] or temporalEdgeHistory?: TemporalEdgeHistory[]) so the export can carry the full temporal history of graph edges; update the ExportData interface declaration to include that optional array and ensure the referenced GraphEdgeHistory/TemporalEdgeHistory type (or equivalent) exists and matches the schema used for edge history during exports/imports.
♻️ Duplicate comments (2)
src/functions/mesh.ts (2)
22-38: ⚠️ Potential issue | 🔴 Critical

**This validation still doesn't protect the actual request target.**

At line 37 the guard fails open on DNS errors, and even on success it only validates one `lookup()` result, while hostname-based peers are resolved again during the later HTTP request. That leaves SSRF possible via resolver outages or DNS rebinding. Please fail closed and either pin the outbound connection to the vetted address set or reject hostname-based peers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/mesh.ts` around lines 22 - 38, the URL validation currently fails open on DNS errors and only checks some resolved addresses, leaving SSRF risk; update isAllowedUrl to fail closed: when host is not an IP, call lookup(host, { all: true }) and if lookup throws return false, and ensure you validate every resolved address (use resolved.every(...) to ensure none are private IPs) rather than some(...); to prevent DNS rebinding either (A) change isAllowedUrl to return the vetted IP list (or a structure containing the list) so the later HTTP request can be pinned to those addresses, or (B) simply reject all hostname-based peers by returning false for non-IP hosts after successful validation — implement one of these approaches and reference isAllowedUrl and the lookup/resolved handling when making the change.
9-19: ⚠️ Potential issue | 🔴 Critical

**Block the full private/local CIDR ranges here.**

`isPrivateIP` still misses many loopback, link-local, and ULA forms, so the SSRF guard is bypassable with non-canonical local addresses like `127.0.0.2`, `169.254.10.20`, `fc01::1`, and `fe90::1`. This needs CIDR-aware checks for the full IPv4/IPv6 ranges instead of a few literal/prefix matches.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/mesh.ts` around lines 9 - 19, The isPrivateIP function currently does string-prefix checks and misses many addresses; replace its logic with CIDR-aware checks by parsing the address (handle IPv4, IPv6, and IPv4-mapped IPv6 like ::ffff:) and testing membership against the full private/local CIDR ranges (IPv4: 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, 0.0.0.0/8, and IPv6: ::1/128, fe80::/10 (link-local), fc00::/7 (ULA) plus ::ffff:0:0/96 mapping); implement this in isPrivateIP using a CIDR-aware IP library (e.g., ipaddr.js or similar) to parse and test contains() rather than string checks so non-canonical forms like 127.0.0.2, 169.254.x.x, fc01::1 and fe90::1 are correctly detected.
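The prompt suggests ipaddr.js; as an alternative, the same CIDR membership tests can be written dependency-free. This is a sketch of the technique, not the repository's code, and it covers the ranges the review lists (dotted IPv4, IPv4-mapped IPv6, and the common IPv6 local prefixes) rather than every exotic textual form:

```typescript
// Parse dotted-quad IPv4 into an unsigned 32-bit integer, or null if invalid.
function v4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) return null;
    n = n * 256 + Number(p);
  }
  return n >>> 0;
}

const V4_PRIVATE: Array<[string, number]> = [
  ["127.0.0.0", 8], ["10.0.0.0", 8], ["172.16.0.0", 12],
  ["192.168.0.0", 16], ["169.254.0.0", 16], ["0.0.0.0", 8],
];

function isPrivateIP(addr: string): boolean {
  let ip = addr.toLowerCase();
  if (ip.startsWith("::ffff:")) ip = ip.slice(7); // unwrap IPv4-mapped IPv6
  const n = v4ToInt(ip);
  if (n !== null) {
    // True CIDR membership: mask both sides and compare.
    return V4_PRIVATE.some(([base, bits]) => {
      const mask = (~0 << (32 - bits)) >>> 0;
      return (n & mask) === (v4ToInt(base)! & mask);
    });
  }
  // IPv6: loopback ::1/128, link-local fe80::/10, ULA fc00::/7.
  if (ip === "::1") return true;
  const firstGroup = parseInt(ip.split(":")[0] || "0", 16);
  return (firstGroup & 0xffc0) === 0xfe80 || (firstGroup & 0xfe00) === 0xfc00;
}
```

Masked comparison is what makes `127.0.0.2` or `fe90::1` fall inside their ranges, which plain prefix string checks miss.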
🧹 Nitpick comments (1)
plugin/scripts/diagnostics.mjs (1)
52-61: Use a dedicated category list for `mem::heal`.

`mem::heal` currently filters against `ALL_CATEGORIES`, so `sessions` and `mesh` are accepted even though the handler never processes either one. Calls like `mem::heal({ categories: ["mesh"] })` become silent no-ops instead of surfacing that the category is non-healable.

Minimal cleanup

```diff
 const ALL_CATEGORIES = [
   "actions",
   "leases",
   "sentinels",
   "sketches",
   "signals",
   "sessions",
   "memories",
   "mesh"
 ];
+const HEALABLE_CATEGORIES = [
+  "actions",
+  "leases",
+  "sentinels",
+  "sketches",
+  "signals",
+  "memories"
+];
@@
-  const categories = data.categories && data.categories.length > 0 ? data.categories.filter((c) => ALL_CATEGORIES.includes(c)) : ALL_CATEGORIES;
+  const categories = data.categories && data.categories.length > 0 ? data.categories.filter((c) => HEALABLE_CATEGORIES.includes(c)) : HEALABLE_CATEGORIES;
```

Also applies to: 336-338
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/scripts/diagnostics.mjs` around lines 52 - 61, The mem::heal handler is filtering against ALL_CATEGORIES which includes categories (e.g., "sessions", "mesh") that the heal logic doesn't process, causing silent no-ops; create a dedicated HEALABLE_CATEGORIES array (e.g., ["actions","leases","sentinels","sketches","signals","memories"]) and replace uses of ALL_CATEGORIES in the mem::heal filtering/validation (and the other occurrence that uses the same filter) with HEALABLE_CATEGORIES, and add a validation step in mem::heal to reject or warn when callers pass categories not in HEALABLE_CATEGORIES so non-healable categories surface an error instead of becoming no-ops.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6cb734de-82b1-45f0-a919-dbf000a8bc98
📒 Files selected for processing (7)
- .gitignore
- benchmark/real-embeddings-eval.ts
- plugin/scripts/diagnostics.mjs
- src/functions/export-import.ts
- src/functions/leases.ts
- src/functions/mesh.ts
- src/types.ts
✅ Files skipped from review due to trivial changes (1)
- .gitignore
🚧 Files skipped from review as they are similar to previous changes (2)
- src/functions/leases.ts
- benchmark/real-embeddings-eval.ts
```js
if (lease.status === "active" && new Date(lease.expiresAt).getTime() <= now) {
  if (dryRun) {
    details.push(`[dry-run] Would expire lease ${lease.id} for action ${lease.actionId}`);
    fixed++;
    continue;
  }
  if (await withKeyedLock(`mem:action:${lease.actionId}`, async () => {
    const fresh = await kv.get(KV.leases, lease.id);
    if (!fresh || fresh.status !== "active" || new Date(fresh.expiresAt).getTime() > Date.now()) return false;
    fresh.status = "expired";
    await kv.set(KV.leases, fresh.id, fresh);
    const action = await kv.get(KV.actions, fresh.actionId);
    if (action && action.status === "active" && action.assignedTo === fresh.agentId) {
      action.status = "pending";
      action.assignedTo = void 0;
      action.updatedAt = (/* @__PURE__ */ new Date()).toISOString();
      await kv.set(KV.actions, action.id, action);
    }
    return true;
  })) {
    details.push(`Expired lease ${lease.id} for action ${lease.actionId}`);
    fixed++;
  } else skipped++;
  continue;
}
if (!actionIds.has(lease.actionId)) {
  if (dryRun) {
    details.push(`[dry-run] Would delete orphaned lease ${lease.id}`);
    fixed++;
    continue;
  }
  await kv.delete(KV.leases, lease.id);
  details.push(`Deleted orphaned lease ${lease.id}`);
  fixed++;
}
```
Delete expired orphaned leases in the same pass.
Lines 421-445 continue immediately after expiring an active lease. When that lease also points at a missing action, Lines 446-455 never run, so one mem::heal call can report a fix while mem::diagnose still returns orphaned-lease:* for the same record.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugin/scripts/diagnostics.mjs` around lines 421 - 455, The code returns
early after expiring an active lease which prevents the subsequent
orphaned-lease check from running; update the logic around the lease handling in
the block that references lease, KV.leases, KV.actions, actionIds and
withKeyedLock so that after you expire a lease (or in dry-run simulate expiry)
you still verify whether the lease's action exists in actionIds (or is missing
from KV.actions) and delete the orphaned lease accordingly; remove or avoid the
premature continue (or set a flag) so both the expiry path and the
orphaned-lease deletion path can run for the same lease, ensuring mem::heal and
mem::diagnose see consistent results.
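The control flow the prompt asks for, extracted into a sketch: `healLease` and its two callbacks are hypothetical stand-ins for the real diagnostics helpers, showing only that expiry and the orphan check run for the same lease in one pass:

```typescript
interface Lease { id: string; actionId: string; status: string; expiresAt: string; }

// One pass per lease: expire if past due, THEN still check for orphaning.
async function healLease(
  lease: Lease,
  actionIds: Set<string>,
  expireLease: (l: Lease) => Promise<boolean>, // locked expiry, true if applied
  deleteLease: (l: Lease) => Promise<void>,    // locked deletion
): Promise<string[]> {
  const details: string[] = [];
  if (lease.status === "active" && new Date(lease.expiresAt).getTime() <= Date.now()) {
    if (await expireLease(lease)) details.push(`Expired lease ${lease.id}`);
  }
  // No early `continue`: an expired lease can *also* point at a missing action.
  if (!actionIds.has(lease.actionId)) {
    await deleteLease(lease);
    details.push(`Deleted orphaned lease ${lease.id}`);
  }
  return details;
}
```

With this shape a single `mem::heal` run reports both fixes for the same record, so `mem::diagnose` cannot still see an orphan that heal claimed to handle.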
```js
if (await withKeyedLock(`mem:sketch:${sketch.id}`, async () => {
  const fresh = await kv.get(KV.sketches, sketch.id);
  if (!fresh || fresh.status !== "active" || new Date(fresh.expiresAt).getTime() > Date.now()) return false;
  const allEdges = await kv.list(KV.actionEdges);
  const actionIdSet = new Set(fresh.actionIds);
  for (const edge of allEdges) if (actionIdSet.has(edge.sourceActionId) || actionIdSet.has(edge.targetActionId)) await kv.delete(KV.actionEdges, edge.id);
  for (const actionId of fresh.actionIds) await kv.delete(KV.actions, actionId);
```
Sketch discard is bypassing the action lock and orphaning leases.
Lines 487-493 delete KV.actions while only holding mem:sketch:${sketch.id}. The lease handlers in src/functions/leases.ts:1-50 serialize on mem:action:${actionId} instead, so this path can race normal action/lease writes and, because the leases pass already ran earlier in the same heal, it can create fresh orphaned leases that survive until the next run. Take the affected action locks in a stable order and remove related leases before deleting the action rows.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugin/scripts/diagnostics.mjs` around lines 487 - 493, The sketch-discard
path is deleting KV.actions while only holding the mem:sketch:${sketch.id} lock
which can race the lease handlers that lock on mem:action:${actionId}; update
the block around withKeyedLock(...) (the code calling kv.list(KV.actionEdges),
kv.delete(KV.actionEdges, ...), and the loop that deletes KV.actions) to:
acquire per-action locks for each actionId in a stable order (e.g., sort
fresh.actionIds and use withKeyedLock for mem:action:${actionId}), under those
action locks delete any related leases (the lease rows/keys the leases handler
would manage) before deleting KV.actions, and then release the action locks;
keep deletion of edges (KV.actionEdges) coordinated inside the same locking
sequence so no race can orphan leases. Ensure you reference and use
withKeyedLock, KV.actions, KV.actionEdges, and mem:action:${actionId} when
implementing the fix.
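Taking the per-action locks "in a stable order" can be sketched as a small recursive wrapper. `withActionLocks` is a hypothetical helper, assuming a `withKeyedLock(key, fn)` primitive like the one referenced above:

```typescript
// Acquire mem:action:<id> locks in sorted order, run fn while all are held.
// Sorting gives every caller the same acquisition order, which is the
// standard way to avoid lock-ordering deadlocks between concurrent healers.
async function withActionLocks<T>(
  withKeyedLock: <R>(key: string, fn: () => Promise<R>) => Promise<R>,
  actionIds: string[],
  fn: () => Promise<T>,
): Promise<T> {
  const [head, ...rest] = [...actionIds].sort();
  if (head === undefined) return fn(); // all locks held: safe to delete leases, then actions
  return withKeyedLock(`mem:action:${head}`, () =>
    withActionLocks(withKeyedLock, rest, fn),
  );
}
```

The sketch-discard path would wrap its lease cleanup and `kv.delete(KV.actions, ...)` loop inside `fn`, so the deletions can no longer race lease handlers that serialize on the same `mem:action:` keys.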
```typescript
for (const n of await kv.list<{ id: string }>(KV.graphNodes).catch(() => [])) {
  await kv.delete(KV.graphNodes, n.id);
}
for (const e of await kv.list<{ id: string }>(KV.graphEdges).catch(() => [])) {
  await kv.delete(KV.graphEdges, e.id);
}
for (const s of await kv.list<{ id: string }>(KV.semantic).catch(() => [])) {
  await kv.delete(KV.semantic, s.id);
}
for (const p of await kv.list<{ id: string }>(KV.procedural).catch(() => [])) {
  await kv.delete(KV.procedural, p.id);
```
Replace strategy still leaves temporal edge history behind.
Line 265 adds graph cleanup, but KV.graphEdgeHistory is still not cleared. In replace mode this can retain stale historical edges and skew point-in-time graph queries.
Suggested patch
```diff
   for (const e of await kv.list<{ id: string }>(KV.graphEdges).catch(() => [])) {
     await kv.delete(KV.graphEdges, e.id);
   }
+  for (const h of await kv
+    .list<{ id: string }>(KV.graphEdgeHistory)
+    .catch(() => [])) {
+    await kv.delete(KV.graphEdgeHistory, h.id);
+  }
   for (const s of await kv.list<{ id: string }>(KV.semantic).catch(() => [])) {
     await kv.delete(KV.semantic, s.id);
   }
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
for (const n of await kv.list<{ id: string }>(KV.graphNodes).catch(() => [])) {
  await kv.delete(KV.graphNodes, n.id);
}
for (const e of await kv.list<{ id: string }>(KV.graphEdges).catch(() => [])) {
  await kv.delete(KV.graphEdges, e.id);
}
for (const h of await kv
  .list<{ id: string }>(KV.graphEdgeHistory)
  .catch(() => [])) {
  await kv.delete(KV.graphEdgeHistory, h.id);
}
for (const s of await kv.list<{ id: string }>(KV.semantic).catch(() => [])) {
  await kv.delete(KV.semantic, s.id);
}
for (const p of await kv.list<{ id: string }>(KV.procedural).catch(() => [])) {
  await kv.delete(KV.procedural, p.id);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/functions/export-import.ts` around lines 265 - 275, The replace path's
cleanup loops clear KV.graphNodes, KV.graphEdges, KV.semantic, and KV.procedural
but omits KV.graphEdgeHistory, leaving stale edge history; add a similar cleanup
loop for KV.graphEdgeHistory in the same block (where the other for-await
deletions occur) so that for (const h of await kv.list<{ id: string
}>(KV.graphEdgeHistory).catch(() => [])) { await kv.delete(KV.graphEdgeHistory,
h.id); } runs during replace mode to fully purge historical edges.
Summary
Benchmark Results
Evaluated on 240 real coding observations across 30 sessions with 20 labeled queries:
Key findings:
Test plan
- `npx tsx benchmark/quality-eval.ts` — quality evaluation with labeled queries
- `npx tsx benchmark/scale-eval.ts` — scale test at 240/1K/5K/10K/50K observations
- `npx tsx benchmark/real-embeddings-eval.ts` — real Xenova embedding benchmark

Summary by CodeRabbit
New Features
Documentation
Chores