
feat: v0.6.0 advanced retrieval with real-world benchmarks#76

Merged
rohitg00 merged 8 commits into main from feat/v0.6-advanced-retrieval on Mar 18, 2026

Conversation


@rohitg00 (Owner) commented Mar 18, 2026

Summary

  • Triple-stream search (BM25 + Vector + Graph) with RRF fusion and session diversification
  • Porter stemmer + 40 coding-domain synonym groups for better keyword matching
  • Binary-search prefix matching (O(log n + k) replaces O(n) scan)
  • 5 new retrieval modules: sliding window, query expansion, temporal graph, retention scoring, graph retrieval
  • Real-world benchmarks with labeled ground truth (not microbenchmarks)
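The RRF (Reciprocal Rank Fusion) step that merges the three streams can be sketched as follows. This is a minimal illustration of the standard RRF formula, not the actual agentmemory implementation; the constant `k = 60` and the stream shapes are assumptions:

```typescript
// Minimal RRF sketch: each stream contributes 1 / (k + rank + 1) per item,
// so a result ranked well in several streams beats one that tops a single stream.
type Ranked = { id: string }[];

function rrfFuse(streams: Ranked[], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const stream of streams) {
    stream.forEach((item, rank) => {
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const fused = rrfFuse([
  [{ id: "a" }, { id: "b" }], // BM25 stream
  [{ id: "b" }, { id: "c" }], // vector stream
  [{ id: "b" }, { id: "a" }], // graph stream
]);
console.log(fused[0].id); // "b" — present in all three streams
```

Because RRF only looks at ranks, the three streams need no score normalization before fusing, which is the usual reason to pick it over weighted score sums.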

Benchmark Results

Evaluated on 240 real coding observations across 30 sessions with 20 labeled queries:

| System | Recall@10 | NDCG@10 | MRR | Tokens/query |
| --- | --- | --- | --- | --- |
| Built-in (grep all into context) | 55.8% | 80.3% | 82.5% | 19,462 |
| agentmemory BM25 (stemmed) | 55.9% | 82.7% | 95.5% | 1,571 |
| agentmemory + Xenova embeddings | 64.1% | 94.9% | 100.0% | 1,571 |

Key findings:

  • "database performance optimization" → 0% recall with keywords, 40% with real embeddings (finds "N+1 query fix")
  • 92% token savings vs loading everything into context
  • At 1K observations, 80% of built-in memory is invisible due to the 200-line cap; agentmemory searches 100% of it
  • Cross-session: 12/12 queries found vs 10/12 for built-in

Test plan

  • 551 tests pass (47 test files)
  • npx tsx benchmark/quality-eval.ts — quality evaluation with labeled queries
  • npx tsx benchmark/scale-eval.ts — scale test at 240/1K/5K/10K/50K observations
  • npx tsx benchmark/real-embeddings-eval.ts — real Xenova embedding benchmark
  • Verify stemming: "authentication" matches "authenticating"
  • Verify synonyms: "db" matches "database"
  • Verify session diversification: max 3 results per session
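The stemming check above relies on both word forms reducing to a common stem. A toy suffix-stripping sketch of that idea follows; the real module implements the full Porter algorithm, while this handles only a couple of suffixes for illustration:

```typescript
// Toy stemmer sketch: strip a few "-ation"/"-ating"-style suffixes so that
// related word forms collapse to one index term. Illustrative only.
function toyStem(word: string): string {
  const w = word.toLowerCase();
  const rules = [
    ["ational", "ate"],
    ["ation", "ate"],
    ["ating", "ate"],
    ["ing", ""],
  ] as const;
  for (const [suffix, replacement] of rules) {
    if (w.endsWith(suffix)) return w.slice(0, -suffix.length) + replacement;
  }
  return w;
}

console.log(toyStem("authentication"), toyStem("authenticating"));
// both reduce to "authenticate", so the query and the document match
```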

Summary by CodeRabbit

  • New Features

    • Triple-stream search combining keyword, vector, and graph-based retrieval for improved results
    • Query expansion to reformulate and enhance search understanding
    • Temporal graph extraction to capture relationships and entity history
    • Retention-based memory lifecycle management with tiered scoring
    • Sliding window enrichment for contextual memory analysis
  • Documentation

    • Added search quality evaluation benchmarks
    • Added scale and cross-session retrieval evaluations
    • Added real embeddings performance analysis
    • Expanded capability descriptions and endpoint documentation
  • Chores

    • Version updated to 0.6.0
    • Tool count expanded from 18 to 37
    • REST endpoints increased from 49 to 93

rohitg00 added 7 commits March 9, 2026 10:51
Add actions, frontier, leases, routines, signals, checkpoints,
flow-compress, mesh, and branch-aware modules with full MCP tools,
REST endpoints, and 170 new tests. Includes SSRF protection,
race-condition-safe keyed mutex locking, SHA-256 fingerprinting,
and fixes for 29 CodeRabbit review findings.

- 9 new source files (src/functions/*)
- 8 new test files (170 tests, total 386)
- 10 new MCP tools, 23 new REST endpoints
- 8 new KV scopes, new types for orchestration
- Version bump to 0.5.0, README and viewer updated
5 new modules inspired by beads patterns but with original naming and
iii-engine real-time streaming (no polling):

- sentinels: event-driven condition watchers (webhook, timer, threshold,
  pattern, approval) that auto-unblock gated actions via SSE
- sketches: ephemeral action graphs with auto-expiry, promote or discard
- crystallize: LLM-powered compaction of completed action chains into
  compact crystal digests with key outcomes and lessons
- diagnostics: self-diagnosis across 8 categories (actions, leases,
  sentinels, sketches, signals, sessions, memories, mesh) with auto-heal
- facets: multi-dimensional tagging (dimension:value) with AND/OR queries

- 5 new source files, 5 new test files (132 tests, total 518)
- 9 new MCP tools (total 37), 21 new REST endpoints (total 93)
- 4 new KV scopes (total 33), new types for all modules
- README stats and function table updated
- actions: validate edges before persisting, set blocked status for requires deps
- leases: use mem:action lock key, reject blocked actions, check expiry on release
- checkpoints: validate linkedActionIds exist, check requires edges in unblock
- mesh: add SSRF validation on peer registration
- routines: remove invalid "failed" action status check
- export-import: add v0.5.0 scope export/import (actions, sentinels, sketches, etc)
- mcp/server: validate CSV inputs are strings before splitting
- schema: replace runtime require with static import
- README: fix stale tool/endpoint counts (28→37, 72→93)
…alidation, README counts

- mesh.ts: add withKeyedLock on action writes in receive path, add IPv6 private ranges to SSRF check
- routines.ts: validate DAG (duplicate orders, unknown deps), set dep actions to blocked, refresh stepStatus in routine-status
- checkpoints.ts: runtime type enum validation, set linked pending actions to blocked
- leases.ts: validate ttlMs is finite positive number
- export-import.ts: add skip strategy checks for all v0.5.0 import blocks
- mcp/server.ts: typeof guards on tags, config JSON.parse, actionIds, linkedActionIds, categories
- README.md: Tools 18→37, Functions 33→50, stats line updated
- test: update checkpoint test for new blocked-on-create behavior
…xport replace cleanup, MCP input guards

- leases.ts: renew extends from max(now, existing expiry) instead of now; release verifies action ownership before mutation
- mesh.ts: withKeyedLock on memory writes in mesh-receive; applySyncData validates id/updatedAt and locks both memory and action writes
- routines.ts: stepStatus maps "blocked" to "pending" explicitly; progress includes blocked/cancelled counts; routine-freeze wrapped in withKeyedLock
- export-import.ts: remove unused RoutineRun import; replace strategy clears all orchestration namespaces; skip strategy for graphNodes/graphEdges/semantic/procedural
- mcp/server.ts: sentinel_trigger JSON.parse with typeof+try/catch; facet_query typeof guards on matchAll/matchAny; remove redundant requires cast; .filter(Boolean) on concepts/files/tags/requires CSV splits
- README.md: clarify API table is a representative subset
…action handling, boolean normalization

- mesh.ts: add redirect:"error" to both outbound fetch calls to prevent SSRF via redirect
- routines.ts: create actions with status "blocked" directly when hasDeps (eliminates two-pass race); handle missing actions in routine-status as cancelled; progress.total uses run.actionIds.length
- mcp/server.ts: sentinel config accepts object values directly; normalize unreadOnly/dryRun for both JSON booleans and string values
- README.md: consistent bundle size (365KB) across both occurrences
…l benchmarks

Search improvements:
- Porter stemmer for word normalization (authentication ↔ authenticating)
- 40+ coding-domain synonym groups (db ↔ database, k8s ↔ kubernetes)
- Binary-search prefix matching replaces O(n) full scan
- Session diversification (max 3 results per session)
- Co-occurrence graph edges between all concept pairs
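The binary-search prefix matching above can be sketched like this (illustrative only; the actual index layout may differ): find the first term >= the prefix in O(log n), then collect the k consecutive matching terms.

```typescript
// O(log n + k) prefix lookup over a sorted term list: binary-search the
// lower bound for the prefix, then scan forward while terms still match.
function prefixMatches(sortedTerms: string[], prefix: string): string[] {
  let lo = 0;
  let hi = sortedTerms.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedTerms[mid] < prefix) lo = mid + 1;
    else hi = mid;
  }
  const out: string[] = [];
  for (let i = lo; i < sortedTerms.length && sortedTerms[i].startsWith(prefix); i++) {
    out.push(sortedTerms[i]);
  }
  return out;
}

prefixMatches(["auth", "authz", "database", "db", "deploy"], "auth");
// → ["auth", "authz"], without touching the rest of the term list
```

Because matches for a given prefix are contiguous in sorted order, only the k matching terms are visited after the O(log n) seek, which is what replaces the old O(n) full scan.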

New retrieval modules:
- Sliding window inference pipeline (context enrichment at ingestion)
- Adaptive query expansion (LLM-generated reformulations)
- Triple-stream search (BM25 + Vector + Graph with RRF fusion)
- Append-only temporal knowledge graph (versioned edges, point-in-time queries)
- Graph-augmented retrieval (entity search + neighborhood expansion)
- Ebbinghaus retention scoring (decay + tiered hot/warm/cold/evictable)
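The Ebbinghaus-style retention scoring above can be sketched as exponential decay of memory strength over time, reinforced by access count and then bucketed into tiers. The constants here (half-life, reinforcement scale, tier thresholds) are illustrative assumptions, not the module's actual defaults:

```typescript
// Retention sketch: exp(-ln2 * age / halfLife) halves the score every
// half-life; accesses add a diminishing-returns bonus; the result is
// clamped to [0, 1] and classified into a tier.
function retentionScore(ageDays: number, accessCount: number, halfLifeDays = 30): number {
  const decay = Math.exp((-Math.LN2 * ageDays) / halfLifeDays);
  const reinforcement = Math.log1p(accessCount) / 4;
  return Math.min(1, decay + reinforcement);
}

function tier(score: number): "hot" | "warm" | "cold" | "evictable" {
  if (score >= 0.7) return "hot";
  if (score >= 0.4) return "warm";
  if (score >= 0.15) return "cold";
  return "evictable";
}

tier(retentionScore(0, 5));   // fresh, frequently accessed → "hot"
tier(retentionScore(120, 0)); // four months old, never accessed → "evictable"
```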

Real-world benchmarks (240 observations, 20 labeled queries):
- Quality eval: 64.1% recall@10 with Xenova embeddings (vs 55.8% grep)
- Scale eval: 92-100% token savings vs built-in memory at 240-50K observations
- Cross-session: 12/12 queries found vs 10/12 for 200-line MEMORY.md cap
- Token measurement uses actual search results (fixed fake constant bug)
- Removed old microbenchmarks (bench.ts, run-bench.ts, COMPARISON.md)

coderabbitai Bot commented Mar 18, 2026

📝 Walkthrough

Walkthrough

This pull request introduces version 0.6.0 with major feature expansions across memory retrieval, scoring, and orchestration. Changes include new benchmark documentation and scripts for quality/scale/embedding evaluation; new core modules for graph-based retrieval, query expansion, temporal graph management, retention scoring, and sliding-window enrichment; search-index improvements with stemming and synonyms; hybrid-search integration of graph-based retrieval; schema and type extensions for versioning and new KV entries; and updated initialization to wire all new orchestration components. Documentation updates reflect expanded tooling (37 tools, 93 endpoints) and capability metrics.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation Updates (README.md) | Extensive content updates: feature comparison tables, benchmark metrics, endpoint/tool count increases (18→37 tools, 49→93 endpoints), version bumps to v0.5.0 and v0.6.0 throughout. |
| Benchmark Documentation (benchmark/QUALITY.md, benchmark/REAL-EMBEDDINGS.md, benchmark/SCALE.md) | New benchmark reports documenting search quality evaluation, real-embeddings comparison, and scalability testing across multiple corpus sizes with performance metrics and recommendations. |
| Benchmark Dataset & Scripts (benchmark/dataset.ts, benchmark/quality-eval.ts, benchmark/real-embeddings-eval.ts, benchmark/scale-eval.ts) | Dataset generation module (293 LOC); comprehensive evaluation scripts for search quality (643 LOC), real embeddings (405 LOC), and scalability (398 LOC) with metrics, mock KV stores, and report generation. |
| Graph Retrieval & Temporal Graph (src/functions/graph-retrieval.ts, src/functions/temporal-graph.ts) | New graph-based retrieval with entity search, BFS traversal, and temporal state queries; temporal graph extraction/storage with versioned edges, historical tracking, and differential-state computation. |
| Query & Memory Enrichment (src/functions/query-expansion.ts, src/functions/sliding-window.ts, src/functions/retention.ts) | Query expansion with reformulations and entity extraction; sliding-window context enrichment for observations; retention-based scoring with temporal decay, access reinforcement, and tier classification. |
| Search & Indexing Improvements (src/state/search-index.ts, src/state/stemmer.ts, src/state/synonyms.ts) | BM25 scoring enhancements, stemming algorithm, synonym expansion, prefix-based search caching, and term normalization for improved query matching. |
| Hybrid Search Graph Integration (src/state/hybrid-search.ts) | Added GraphRetrieval integration, three-way scoring (BM25/vector/graph), diversification by session, and a searchWithExpansion method supporting query reformulations and entity hints. |
| Type & Schema Extensions (src/types.ts, src/state/schema.ts, src/version.ts) | Version updates to 0.6.0; new type unions (GraphNodeType, GraphEdgeType); new interfaces (EnrichedChunk, QueryExpansion, TemporalQuery, RetentionScore); KV schema additions for actions, leases, routines, signals, checkpoints, mesh, and retention data; fingerprint utility. |
| Core Initialization & Updates (src/index.ts, src/functions/leases.ts, src/functions/mesh.ts, src/functions/export-import.ts, plugin/scripts/diagnostics.mjs) | Main initialization wiring all new register functions; leases concurrency guard for active lease checks; mesh DNS-based private-IP detection; export-import v0.6.0 support and expanded cleanup; new diagnostics subsystem with health checks and auto-healing. |
| Tests (test/graph-retrieval.test.ts, test/hybrid-search.test.ts, test/query-expansion.test.ts, test/retention.test.ts, test/sliding-window.test.ts, test/temporal-graph.test.ts) | Unit test suites for graph retrieval (186 LOC), query expansion (154 LOC), retention scoring (245 LOC), sliding-window enrichment (199 LOC), and temporal graph (378 LOC); hybrid-search test expectations updated for graph scores. |
| Configuration (package.json, .gitignore) | Version bump 0.4.0→0.5.0; Xenova transformer dependency update; data/ directory added to .gitignore. |
| Test Updates (test/export-import.test.ts) | Export-import test version expectation updated from 0.4.0 to 0.6.0. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Query as Query Expansion
    participant LLM as Language Model
    participant Search as Hybrid Search
    participant BM25 as BM25 Index
    participant Vector as Vector Index
    participant Graph as Graph Retrieval
    participant Result as Results Merger

    User->>Query: mem::expand-query (original query)
    Query->>LLM: Compress & generate reformulations
    LLM-->>Query: XML with reformulations & entities
    Query->>Search: Multiple reformulated queries + entities
    
    par Parallel Search Modalities
        Search->>BM25: Search each reformulation
        Search->>Vector: Embed & search each reformulation
        Search->>Graph: searchByEntities with extracted entities
    end
    
    BM25-->>Search: BM25 results & scores
    Vector-->>Search: Vector results & scores
    Graph-->>Search: Graph results & context
    
    Search->>Result: Merge & normalize 3-way scores
    Result->>Result: Deduplicate by obsId, select best
    Result-->>User: Triple-stream results with graphContext
sequenceDiagram
    participant Observation as Observation
    participant Extractor as Temporal Graph Extractor
    participant LLM as Language Model
    participant KV as State KV
    participant Querier as Temporal Query

    Observation->>Extractor: mem::temporal-graph-extract
    Extractor->>LLM: Compress observations into temporal graph XML
    LLM-->>Extractor: XML with entities, relationships, validity windows
    Extractor->>KV: Store graphNodes (with versioning, tcommit)
    Extractor->>KV: Append graphEdges & graphEdgeHistory (never overwrite)
    Extractor-->>Observation: Success with node/edge counts
    
    alt Later: Temporal Query
        Querier->>KV: mem::temporal-query (entity, asOf timestamp)
        KV->>KV: Resolve entity by name/alias
        KV->>KV: Filter edges by tvalid/tvalidEnd against asOf
        KV-->>Querier: currentState (latest valid), history (validEdges)
    end
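The point-in-time filtering in the temporal-query flow above amounts to checking each edge's validity window against the `asOf` timestamp. A sketch follows; the field names (`tvalid`, `tvalidEnd`) are taken from the walkthrough text, and the exact schema may differ:

```typescript
// An edge is valid at `asOf` when its window covers that instant:
// tvalid <= asOf < tvalidEnd (open-ended edges have no tvalidEnd).
// ISO-8601 UTC strings compare correctly with plain string comparison.
type TemporalEdge = {
  from: string;
  to: string;
  rel: string;
  tvalid: string;
  tvalidEnd?: string;
};

function edgesAsOf(edges: TemporalEdge[], asOf: string): TemporalEdge[] {
  return edges.filter(
    (e) => e.tvalid <= asOf && (e.tvalidEnd === undefined || asOf < e.tvalidEnd),
  );
}

const edges: TemporalEdge[] = [
  { from: "api", to: "postgres", rel: "uses", tvalid: "2026-01-01T00:00:00Z", tvalidEnd: "2026-02-01T00:00:00Z" },
  { from: "api", to: "sqlite", rel: "uses", tvalid: "2026-02-01T00:00:00Z" },
];

edgesAsOf(edges, "2026-01-15T00:00:00Z"); // → only the postgres edge
```

Because edges are appended rather than overwritten, both the superseded and the current relationship survive, and the same filter answers "what did we believe at time T" for any T.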

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • PR #68 — Both modify version handling and export/import/schema code with overlapping type changes and version bumps (0.6.0).
  • PR #74 — Shares extensive modifications to diagnostics, leases, mesh, schema KV entries, index wiring, and version updates; incremental work on the orchestration layer.
  • PR #9 — Introduces knowledge-graph/graph nodes & edges, Claude bridge/MCP tooling, and export-import changes that this PR builds upon and extends.

Poem

🐰 Behold, a graph springs forth with temporal wings,
Query expansions bloom in tripled searches sing,
Retention decays while rich embeddings gleam,
From v0.4 to 0.6, a memory dream!

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately summarizes the main change: the v0.6.0 release with advanced retrieval capabilities and real-world benchmarks. It is concise, specific, and reflects the primary objectives. |


coderabbitai Bot left a comment

Actionable comments posted: 5

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/functions/export-import.ts (1)

219-265: ⚠️ Potential issue | 🔴 Critical

replace still leaves stale retrieval state behind.

This block clears the new action/routine/etc. scopes, but it never deletes KV.graphNodes, KV.graphEdges, KV.semantic, or KV.procedural. A "replace" import can therefore merge old retrieval data into the restored snapshot and change search behavior after restore.

🧹 Suggested fix
       const existingSummaries = await kv.list<SessionSummary>(KV.summaries);
       for (const s of existingSummaries) {
         await kv.delete(KV.summaries, s.sessionId);
       }
+      for (const edge of await kv.list<GraphEdge>(KV.graphEdges).catch(() => [])) {
+        await kv.delete(KV.graphEdges, edge.id);
+      }
+      for (const node of await kv.list<GraphNode>(KV.graphNodes).catch(() => [])) {
+        await kv.delete(KV.graphNodes, node.id);
+      }
+      for (const sem of await kv.list<SemanticMemory>(KV.semantic).catch(() => [])) {
+        await kv.delete(KV.semantic, sem.id);
+      }
+      for (const proc of await kv.list<ProceduralMemory>(KV.procedural).catch(() => [])) {
+        await kv.delete(KV.procedural, proc.id);
+      }
       for (const a of await kv.list<Action>(KV.actions).catch(() => [])) {
         await kv.delete(KV.actions, a.id);
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/export-import.ts` around lines 219 - 265, The "replace" branch
currently deletes sessions, actions, routines, etc., but omits clearing
retrieval-related scopes (KV.graphNodes, KV.graphEdges, KV.semantic,
KV.procedural), leaving stale retrieval state; add loops like the existing ones
to list and delete entries for KV.graphNodes, KV.graphEdges, KV.semantic, and
KV.procedural (using the appropriate types or catch(() => []) pattern as used
elsewhere) so those stores are fully cleared when strategy === "replace".
🟠 Major comments (26)
src/functions/checkpoints.ts-17-24 (1)

17-24: ⚠️ Potential issue | 🟠 Major

Add runtime input validation for checkpoint payloads.

Current checks are incomplete for untrusted input: whitespace-only names pass, non-positive expiresInMs is accepted, and status is not runtime-validated to passed|failed before assignment.

🛡️ Proposed validation hardening
-      if (!data.name) {
+      if (!data?.name || !String(data.name).trim()) {
         return { success: false, error: "name is required" };
       }
+      if (
+        data.expiresInMs !== undefined &&
+        (!Number.isFinite(data.expiresInMs) || data.expiresInMs <= 0)
+      ) {
+        return { success: false, error: "expiresInMs must be a positive number" };
+      }
...
-      if (!data.checkpointId || !data.status) {
+      if (!data?.checkpointId || !data?.status) {
         return {
           success: false,
           error: "checkpointId and status are required",
         };
       }
+      if (data.status !== "passed" && data.status !== "failed") {
+        return { success: false, error: "status must be 'passed' or 'failed'" };
+      }

Also applies to: 35-37, 83-88, 107-111

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/checkpoints.ts` around lines 17 - 24, The payload validation is
too permissive: update the checks around data.name, data.type, data.expiresInMs,
and data.status in the functions that read request payloads (look for the
existing checks referencing data.name and validTypes). Specifically, trim and
reject whitespace-only names (e.g., if (!data.name || !data.name.trim()) return
error), keep the validTypes array
(["ci","approval","deploy","external","timer"]) and validate data.type against
it, ensure numeric expiresInMs (if provided) is a positive integer (> 0) before
accepting, and validate data.status (if provided) is one of "passed" or "failed"
before assignment; apply these same checks wherever payload validation is done
in this file (the other occurrences you noted).
src/functions/branch-aware.ts-149-153 (1)

149-153: ⚠️ Potential issue | 🟠 Major

Session-root matching is not path-portable (Windows break).

Line 151 hardcodes / as separator; C:\repo\sub will not match C:\repo/. This can silently drop valid sessions on Windows.

🐛 Proposed cross-platform fix
-import { resolve } from "node:path";
+import { resolve, relative, isAbsolute } from "node:path";
...
-      const matching = sessions.filter((s) => {
-        if (s.project === projectRoot || s.cwd === projectRoot) return true;
-        if (s.cwd.startsWith(projectRoot + "/")) return true;
-        return false;
-      });
+      const root = resolve(projectRoot);
+      const isWithin = (p?: string): boolean => {
+        if (!p) return false;
+        const rel = relative(root, resolve(p));
+        return rel === "" || (!rel.startsWith("..") && !isAbsolute(rel));
+      };
+
+      const matching = sessions.filter(
+        (s) => isWithin(s.project) || isWithin(s.cwd),
+      );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/branch-aware.ts` around lines 149 - 153, Normalize/resolve
projectRoot and each session path (s.cwd and s.project) and perform
path-accurate comparisons instead of hardcoding "/" — inside the sessions.filter
block update the logic using Node's path utilities: normalize/resolve both
projectRoot and s.cwd/s.project then either compare equality or use
path.relative(projectRoot, s.cwd) and ensure it does not start with ".." (or
check that the relative path is not equal to ""/or doesn't begin with "..") to
detect subpaths in a cross-platform way; refer to the sessions.filter callback,
projectRoot, and s.cwd/s.project when applying the fix.
src/functions/routines.ts-33-41 (1)

33-41: ⚠️ Potential issue | 🟠 Major

Reject self-referential and cyclic step dependencies.

This only validates that a referenced order exists. A step depending on itself, or a cycle like 1 -> 2 -> 1, is still accepted and will create permanently blocked runs. Validate the dependency graph before persisting the routine.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/routines.ts` around lines 33 - 41, The current validation only
checks that each dependency exists but allows self-dependencies and cycles;
update the validation in the routine save/validate logic that iterates
data.steps (the block using uniqueOrders, step.dependsOn, and step.order) to (1)
reject any dep that equals the step's own order (self-reference) and (2) detect
cycles across all steps by building a directional graph from step.order -> each
dep and running a cycle-detection/topological-sort (e.g., DFS with
visited/recursion stack or Kahn's algorithm); if a cycle is found return {
success: false, error: "cyclic dependency involving orders: …" } (or a similar
clear message) instead of persisting. Ensure you use the existing uniqueOrders
and step identifiers so the check covers all steps before save.
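The cycle check this comment asks for can be sketched with Kahn's algorithm over the `step.order -> dependsOn` graph. The `Step` shape and field names below are assumed from the review text, not taken from the actual module:

```typescript
// Kahn's algorithm sketch: repeatedly remove steps whose dependencies are all
// satisfied; any step left unvisited at the end sits on a cycle.
type Step = { order: number; dependsOn?: number[] };

function hasCycle(steps: Step[]): boolean {
  const indegree = new Map<number, number>();
  const dependents = new Map<number, number[]>();
  for (const s of steps) indegree.set(s.order, 0);
  for (const s of steps) {
    for (const dep of s.dependsOn ?? []) {
      if (dep === s.order) return true; // self-reference is a trivial cycle
      indegree.set(s.order, (indegree.get(s.order) ?? 0) + 1);
      const list = dependents.get(dep) ?? [];
      list.push(s.order);
      dependents.set(dep, list);
    }
  }
  const queue = steps.filter((s) => indegree.get(s.order) === 0).map((s) => s.order);
  let visited = 0;
  while (queue.length > 0) {
    const order = queue.shift()!;
    visited++;
    for (const next of dependents.get(order) ?? []) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  return visited < steps.length; // leftover steps imply a cycle
}

hasCycle([{ order: 1, dependsOn: [2] }, { order: 2, dependsOn: [1] }]); // true
```

As a side effect, a step that depends on an unknown order is also reported, since it can never be unblocked; the routine code may prefer to surface that as a separate "unknown dep" error before running this check.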
benchmark/real-embeddings-eval.ts-170-177 (1)

170-177: ⚠️ Potential issue | 🟠 Major

tokens_per_query is using labels, not what the system actually returns.

This metric is based on relevant_count, so it depends on the benchmark annotations rather than the retrieved observations. That can materially overstate the reported token savings. Compute tokens from the top-N results actually returned for each query and then average those totals.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/real-embeddings-eval.ts` around lines 170 - 177, The
tokens_per_query calculation currently uses the annotated relevant_count;
instead, iterate over the actual returned items for each query in the results
array and compute tokens by summing estimateTokens(JSON.stringify(item)) for
those returned items (the top-N returned per query), then average those
per-query totals and apply Math.round as before; update the tokens_per_query
assignment to use this computed average (referencing results, estimateTokens,
and the per-query returned items collection on each result object instead of
relevant_count).
src/functions/routines.ts-99-104 (1)

99-104: ⚠️ Potential issue | 🟠 Major

Don't start runs from unfrozen routines.

frozen is part of the public routine lifecycle, but mem::routine-run ignores it and will execute mutable templates. That makes it easy to launch a run from a routine that's still being edited.

Suggested fix
       return withKeyedLock(`mem:routine:${data.routineId}`, async () => {
         const routine = await kv.get<Routine>(KV.routines, data.routineId);
         if (!routine) {
           return { success: false, error: "routine not found" };
         }
+        if (!routine.frozen) {
+          return {
+            success: false,
+            error: "routine must be frozen before running",
+          };
+        }
 
         const now = new Date().toISOString();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/routines.ts` around lines 99 - 104, The routine run path is
missing a check for the routine's frozen state, so
withKeyedLock(`mem:routine:${data.routineId}`) and the call to
kv.get<Routine>(KV.routines, data.routineId) should validate routine.frozen
before proceeding; update the block in routines.ts (the mem::routine-run logic)
to return { success: false, error: "routine not frozen" } (or a suitable error)
if routine.frozen is false/undefined, ensuring you perform this check after
loading the routine and before executing any mutable template/run logic to
prevent runs from unfrozen routines.
src/functions/retention.ts-82-85 (1)

82-85: ⚠️ Potential issue | 🟠 Major

Deep-merge tierThresholds instead of replacing the whole object.

The current spread only works when callers send a complete tierThresholds object. A partial override like { tierThresholds: { cold: 0.2 } } clears hot and warm, which breaks tiering and eviction defaults.

Suggested fix
       const ctx = getContext();
-      const config = { ...DEFAULT_DECAY, ...data.config };
+      const config: DecayConfig = {
+        ...DEFAULT_DECAY,
+        ...data.config,
+        tierThresholds: {
+          ...DEFAULT_DECAY.tierThresholds,
+          ...data.config?.tierThresholds,
+        },
+      };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/retention.ts` around lines 82 - 85, The config initialization
currently overwrites the entire tierThresholds when partial overrides are
provided; update the merge so top-level fields still spread from DEFAULT_DECAY
and then deep-merge the nested tierThresholds object (i.e., create config from
DEFAULT_DECAY and data.config, but for tierThresholds specifically merge
DEFAULT_DECAY.tierThresholds with data.config.tierThresholds) so partial updates
like { tierThresholds: { cold: 0.2 } } preserve hot/warm defaults; modify the
code around the config assignment in the async handler (the variable named
config, the DEFAULT_DECAY constant, and DecayConfig/tierThresholds properties)
to perform this nested merge and handle missing data.config safely.
src/functions/retention.ts-198-220 (1)

198-220: ⚠️ Potential issue | 🟠 Major

Eviction never removes semantic memories.

mem::retention-score writes scores for both KV.memories and KV.semantic, but eviction only deletes from KV.memories. Low-score semantic entries survive and will be recreated on the next scoring pass. Persist the backing scope with each score, or delete from both stores here.

Suggested fix
       let evicted = 0;
       for (const candidate of candidates) {
         try {
           await kv.delete(KV.memories, candidate.memoryId);
+          await kv.delete(KV.semantic, candidate.memoryId);
           await kv.delete(KV.retentionScores, candidate.memoryId);
           evicted++;
         } catch {
           continue;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/retention.ts` around lines 198 - 220, The eviction loop only
removes entries from KV.memories and KV.retentionScores, leaving low-score
semantic entries in KV.semantic; update the eviction logic inside the loop that
iterates over candidates (variable candidates, symbol candidate.memoryId) to
also delete the corresponding semantic entry (KV.semantic) for each
candidate.memoryId in addition to kv.delete(KV.memories, ...) and
kv.delete(KV.retentionScores, ...); alternatively, persist the backing scope
when writing scores so you can delete the correct scope later, but the quickest
fix is to call kv.delete(KV.semantic, candidate.memoryId) alongside the existing
deletes.
src/functions/flow-compress.ts-79-105 (1)

79-105: ⚠️ Potential issue | 🟠 Major

Validate the parsed summary before storing a memory.

If the model misses the XML format, parseFlowSummary() returns empty fields and this still writes a blank workflow memory. Fail fast, or fall back to the raw response, when the formatted summary is empty.

Suggested fix
       try {
         const response = await provider.summarize(
           FLOW_COMPRESS_SYSTEM,
           prompt,
         );
         const summary = parseFlowSummary(response);
+        const content = formatSummary(summary);
+        if (!content) {
+          return {
+            success: false,
+            error: "compression failed: summarizer returned an invalid summary",
+            compressed: 0,
+          };
+        }
 
         const memory = {
           id: generateId("mem"),
           createdAt: new Date().toISOString(),
           updatedAt: new Date().toISOString(),
           type: "workflow" as const,
           title: summary.goal || `Workflow: ${doneActions.length} actions`,
-          content: formatSummary(summary),
+          content,
           concepts: extractConcepts(doneActions),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/flow-compress.ts` around lines 79 - 105, The parsed summary
from parseFlowSummary(response) must be validated before creating and storing
the memory; update the flow in flow-compress.ts to check the parsed summary
(e.g., summary.goal and any other expected fields returned by parseFlowSummary)
and if they are empty/invalid either (a) fail fast by throwing or returning
early with a logged warning, or (b) fall back to using the raw provider response
(response) to populate the memory.title and memory.content (e.g., set title to a
safe fallback like `Workflow: ${doneActions.length} actions` and content to the
raw response text) so you never persist a blank workflow; ensure this validation
happens after provider.summarize(FLOW_COMPRESS_SYSTEM, prompt) and
parseFlowSummary(response) and before constructing the memory object and calling
kv.set(KV.memories, memory.id, memory).
benchmark/quality-eval.ts-155-168 (1)

155-168: ⚠️ Potential issue | 🟠 Major

Align token accounting with the retrieval depth you publish.

These token estimates use retrieved_count from search(..., 20), but the report later says agentmemory returns the "top 10 results". That makes the headline savings inconsistent with the benchmark configuration. Either cap token accounting at the same K you report, or update the report to reflect top-20 context.

Also applies to: 222-237, 347-362, 539-542

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/quality-eval.ts` around lines 155 - 168, The token accounting uses
retrieved_count (computed from perQuery items returned by search with depth 20)
but the report claims "top 10 results", causing inconsistency; update the
calculation in the stats blocks that compute avgTokens/total_tokens_per_query
(and the similar blocks around the other ranges you flagged) to cap the
retrieval depth at the reported K (e.g., use Math.min(q.retrieved_count, 10) or
derive avgTokens from perQuery.map(q => Math.min(q.retrieved_count,
PUBLISHED_K))) or alternatively change the report string ("BM25-only" /
headline) to reflect top-20; touch the variables avgTokens, avgObsTokens usage
and total_tokens_per_query assignments to ensure the same K is used everywhere
(referencing perQuery, retrieved_count, avgObsTokens, total_tokens_per_query).
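A minimal sketch of capping the accounting at the published depth, assuming a `PUBLISHED_K` constant and a trimmed per-query shape:

```typescript
// PUBLISHED_K stands in for the K the report headline claims (top-10).
const PUBLISHED_K = 10;

interface PerQuery {
  retrieved_count: number;
}

// Bill each query for at most PUBLISHED_K results so the token figure
// matches the retrieval depth the report publishes.
function avgTokensPerQuery(perQuery: PerQuery[], avgObsTokens: number): number {
  const total = perQuery.reduce(
    (sum, q) => sum + Math.min(q.retrieved_count, PUBLISHED_K) * avgObsTokens,
    0,
  );
  return total / perQuery.length;
}
```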
src/functions/crystallize.ts-34-47 (1)

34-47: ⚠️ Potential issue | 🟠 Major

Guard against duplicate crystallization of the same actions.

This path never checks action.crystallizedInto, and it updates the actions only after the new crystal has already been stored. Two callers can crystallize the same completed chain concurrently, create duplicate crystals, and let the last writer win on action.crystallizedInto. Reject already-crystallized actions and serialize the action set before writing.

Also applies to: 73-78

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/crystallize.ts` around lines 34 - 47, In crystallize, when
iterating data.actionIds (inside the loop that builds actions) check
action.crystallizedInto and return an error if it's already set to prevent
duplicate crystallization; then to prevent races serialize the action set (e.g.
sort the ids and acquire a short KV lock or use a KV
transactional/compare-and-set pattern) before creating the new crystal so two
callers cannot run concurrently; finally, set each Action.crystallizedInto and
persist those action records as part of the same atomic/serialized step (or
write them before or in the same transaction that stores the new crystal) so the
system never ends up with two crystals for the same actions (also apply the same
guard to the similar block around lines 73-78).
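A minimal in-memory sketch of the guard-then-claim flow; the real implementation would do this under a KV lock or compare-and-set, which this stand-in Map does not model:

```typescript
// Action and the Map-backed store are stand-ins for the real KV layer.
interface Action {
  id: string;
  status: string;
  crystallizedInto?: string;
}

function guardAndClaim(
  store: Map<string, Action>,
  actionIds: string[],
  crystalId: string,
): { success: boolean; error?: string } {
  // Serialize the set: a stable order makes concurrent callers contend
  // on the same ids instead of interleaving.
  const ids = [...actionIds].sort();
  for (const id of ids) {
    const action = store.get(id);
    if (!action) return { success: false, error: `unknown action: ${id}` };
    if (action.crystallizedInto) {
      return { success: false, error: `already crystallized: ${id}` };
    }
  }
  // Claim every action in the same step that stores the crystal, so a
  // second caller sees crystallizedInto set and bails out above.
  for (const id of ids) {
    const action = store.get(id);
    if (action) action.crystallizedInto = crystalId;
  }
  return { success: true };
}
```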
benchmark/dataset.ts-281-289 (1)

281-289: ⚠️ Potential issue | 🟠 Major

Make generateScaleDataset() deterministic.

The Math.random() timestamp means the scale corpus changes on every run. Anything with temporal weighting or session-order effects will then produce non-reproducible benchmark numbers and doc churn. Derive the timestamp from i, or use a seeded PRNG.

One simple deterministic option
-      timestamp: ts(Math.random() * 90),
+      timestamp: ts((i * 37) % 90),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/dataset.ts` around lines 281 - 289, The timestamp generation in
generateScaleDataset() is non-deterministic because it uses Math.random();
replace that with a deterministic function tied to the loop index (e.g., compute
timestamp = ts((i % N) * scale + base) or use a simple LCG seeded by i) so the
same input count yields identical datasets across runs; update the timestamp
assignment in the loop where id/sessionId/title/narrative are set (inside
generateScaleDataset) to derive the value from i instead of Math.random(),
ensuring reproducible benchmarks.
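Beyond the inline diff, a seeded PRNG is the other option mentioned; this sketch uses a one-step LCG keyed by the loop index (the constants are the common Numerical Recipes pair, an assumption rather than project code):

```typescript
// A small linear congruential generator seeded by the loop index;
// ts() in dataset.ts would consume the result in place of Math.random().
function lcg(seed: number): number {
  const state = (Math.imul(seed >>> 0, 1664525) + 1013904223) >>> 0;
  return state / 2 ** 32; // uniform in [0, 1)
}

// Deterministic replacement for `ts(Math.random() * 90)`.
function daysAgoForIndex(i: number): number {
  return lcg(i) * 90;
}
```

Unlike the `(i * 37) % 90` option, this keeps a random-looking spread while staying fully reproducible across runs.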
package.json-3-3 (1)

3-3: ⚠️ Potential issue | 🟠 Major

Manifest version is still one release behind.

src/version.ts and the updated export/import test now report 0.6.0, but package.json is still 0.5.0. Shipping this as-is publishes a 0.5.0 package that identifies itself as 0.6.0 in runtime surfaces.

📦 Suggested fix
-  "version": "0.5.0",
+  "version": "0.6.0",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@package.json` at line 3, Update the package manifest version to match the
runtime constant: change the "version" field in package.json from "0.5.0" to
"0.6.0" so it aligns with the exported value in src/version.ts (and the updated
export/import tests) to avoid publishing a package whose runtime reports a
different version.
src/functions/export-import.ts-325-442 (1)

325-442: ⚠️ Potential issue | 🟠 Major

Validate optional collections before the for...of loops.

These branches only check truthiness. Payloads like { actions: {} } or { graphNodes: "x" } will throw at runtime instead of returning a clean validation error, and { actions: null } is silently accepted even though the export shape is array-or-undefined.

🛡️ Suggested fix
+      const optionalCollections = {
+        graphNodes: importData.graphNodes,
+        graphEdges: importData.graphEdges,
+        semanticMemories: importData.semanticMemories,
+        proceduralMemories: importData.proceduralMemories,
+        actions: importData.actions,
+        actionEdges: importData.actionEdges,
+        routines: importData.routines,
+        signals: importData.signals,
+        checkpoints: importData.checkpoints,
+        sentinels: importData.sentinels,
+        sketches: importData.sketches,
+        crystals: importData.crystals,
+        facets: importData.facets,
+      } as const;
+      for (const [name, value] of Object.entries(optionalCollections)) {
+        if (value !== undefined && !Array.isArray(value)) {
+          return { success: false, error: `${name} must be an array` };
+        }
+      }
+
       if (importData.graphNodes) {
         for (const node of importData.graphNodes) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/export-import.ts` around lines 325 - 442, The import code
treats optional collections like importData.actions or importData.graphNodes as
truthy and then iterates, which will throw for non-array values or silently
accept null; update each block (e.g., importData.graphNodes,
importData.graphEdges, importData.actions, importData.routines,
importData.signals, importData.checkpoints, importData.sentinels,
importData.sketches, importData.crystals, importData.facets,
importData.semanticMemories, importData.proceduralMemories,
importData.actionEdges) to first check for presence with a strict undefined
check and validate the type with Array.isArray(), e.g. if (importData.actions
!== undefined) { if (!Array.isArray(importData.actions)) throw new
Error("Invalid import: actions must be an array"); for (const action of
importData.actions) { ... } }, so non-array payloads produce a clear validation
error and null is not treated as a valid value.
test/actions.test.ts-92-100 (1)

92-100: ⚠️ Potential issue | 🟠 Major

Priority 0 expectation is inconsistent with clamp semantics.

Given adjacent tests enforce clamping to [1,10], expecting priority: 0 to become 5 codifies a fallback bug instead of boundary handling.

🧩 Proposed fix
-    it("clamps priority 0 to default 5 (falsy fallback)", async () => {
+    it("clamps priority 0 to 1", async () => {
@@
-      expect(result.action.priority).toBe(5);
+      expect(result.action.priority).toBe(1);
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/actions.test.ts` around lines 92 - 100, The test "clamps priority 0 to
default 5 (falsy fallback)" incorrectly expects a fallback to 5; update the test
case for the sdk.trigger("mem::action-create", { priority: 0 }) scenario to
assert clamping to the lower bound (expect result.action.priority toBe 1) and
rename the it description to reflect boundary clamping (e.g., "clamps priority 0
to minimum 1") so it aligns with the other clamp-to-[1,10] semantics enforced in
the suite.
test/frontier.test.ts-166-170 (1)

166-170: ⚠️ Potential issue | 🟠 Major

This test is confounded by action status.

act_blocked is created with status: "blocked", so it may be filtered regardless of edge logic. The test can pass even if requires-edge handling is broken.

✅ Proposed fix
       const blocked = makeAction({
         id: "act_blocked",
         title: "Blocked",
-        status: "blocked",
+        status: "pending",
       });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/frontier.test.ts` around lines 166 - 170, The test currently creates
act_blocked via makeAction({ id: "act_blocked", title: "Blocked", status:
"blocked" }) which causes it to be filtered by status rather than edge logic;
change the test so act_blocked is created without the status field (or with
status: "pending") so that any exclusion in the assertion must come from the
change the test so act_blocked is created without the status field (or with
status: "pending") so that any exclusion in the assertion must come from the
edge/requirement handling; locate the makeAction call for "act_blocked" in the
frontier.test.ts test and remove or change the status property accordingly and
adjust expectations if needed.
src/state/stemmer.ts-78-89 (1)

78-89: ⚠️ Potential issue | 🟠 Major

Step-4 suffix removal is over-aggressive for tion/sion.

This removes the full tion/sion suffix, which can truncate stems incorrectly (e.g., `relation` dropping to `rela` instead of `relat`). Porter step 4 should strip only `ion`, and only when the base ends with `s` or `t`.

🧭 Proposed fix
-  if (w.endsWith("al") || w.endsWith("ance") || w.endsWith("ence") ||
+  if (w.endsWith("al") || w.endsWith("ance") || w.endsWith("ence") ||
       w.endsWith("er") || w.endsWith("ic") || w.endsWith("able") ||
       w.endsWith("ible") || w.endsWith("ant") || w.endsWith("ement") ||
-      w.endsWith("ment") || w.endsWith("ent") || w.endsWith("tion") ||
-      w.endsWith("sion") || w.endsWith("ou") || w.endsWith("ism") ||
+      w.endsWith("ment") || w.endsWith("ent") || w.endsWith("ou") || w.endsWith("ism") ||
       w.endsWith("ate") || w.endsWith("iti") || w.endsWith("ous") ||
       w.endsWith("ive") || w.endsWith("ize") || w.endsWith("ise")) {
-    const suffixLen = w.match(/(ement|ment|tion|sion|ance|ence|able|ible|ism|ate|iti|ous|ive|ize|ise|ant|ent|al|er|ic|ou)$/)?.[0]?.length ?? 0;
+    const suffixLen = w.match(/(ement|ment|ance|ence|able|ible|ism|ate|iti|ous|ive|ize|ise|ant|ent|al|er|ic|ou)$/)?.[0]?.length ?? 0;
     if (suffixLen > 0) {
       const base = w.slice(0, -suffixLen);
       if (measure(base) > 1) w = base;
     }
   }
+  if (w.endsWith("ion")) {
+    const base = w.slice(0, -3);
+    if (measure(base) > 1 && /[st]$/.test(base)) w = base;
+  }
src/functions/actions.ts-265-279 (1)

265-279: ⚠️ Potential issue | 🟠 Major

Stale actionMap used inside lock in propagateCompletion.

The actionMap (line 266) is populated before the lock is acquired, but it's used inside the lock (lines 277-278) to check dependency status. Another concurrent operation could have changed action statuses between the list fetch and the lock acquisition, leading to incorrect unblocking decisions.

🐛 Proposed fix to re-fetch inside the lock
   for (const edge of unlockEdges) {
     const candidateId = edge.sourceActionId;
     await withKeyedLock(`mem:action:${candidateId}`, async () => {
       const action = await kv.get<Action>(KV.actions, candidateId);
       if (action && action.status === "blocked") {
+        const freshEdges = await kv.list<ActionEdge>(KV.actionEdges);
+        const freshActions = await kv.list<Action>(KV.actions);
+        const freshActionMap = new Map(freshActions.map((a) => [a.id, a]));
-        const deps = allEdges.filter(
+        const deps = freshEdges.filter(
           (e) => e.sourceActionId === action.id && e.type === "requires",
         );
         const allDone = deps.every((d) => {
-          const target = actionMap.get(d.targetActionId);
+          const target = freshActionMap.get(d.targetActionId);
           return target && target.status === "done";
         });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/actions.ts` around lines 265 - 279, The code builds actionMap
from allActions before acquiring the withKeyedLock and then uses that stale
actionMap inside the lock to evaluate deps; instead, inside the withKeyedLock
callback (where candidateId is locked) re-fetch fresh action states from the KV
store rather than relying on the precomputed actionMap: either rebuild actionMap
from a fresh kv.list<KV.actions> call or, more efficiently, replace the
actionMap lookup for each dependency with an explicit await
kv.get<Action>(KV.actions, d.targetActionId) when computing deps and allDone;
update references to allActions/actionMap used in the withKeyedLock block
(including deps and allDone calculations) so they use the fresh reads.
src/functions/signals.ts-70-76 (1)

70-76: ⚠️ Potential issue | 🟠 Major

Logic error in signal filtering.

The filter condition at line 74 appears inverted. When `!s.to` (a broadcast signal) and `s.from !== data.agentId`, returning `true` includes signals the agent neither sent nor was addressed by. This likely should return `false` to exclude irrelevant broadcasts, or the logic needs restructuring.

Current behavior: Broadcasts from other agents are included even when the agent has no relation to them.

🐛 Proposed fix
       signals = signals.filter((s) => {
         if (s.expiresAt && new Date(s.expiresAt).getTime() <= now) return false;
         if (s.to && s.to !== data.agentId && s.from !== data.agentId)
           return false;
-        if (!s.to && s.from !== data.agentId) return true;
-        return true;
+        // Include: direct messages to/from agent, or broadcasts from agent, or all broadcasts
+        return s.to === data.agentId || s.from === data.agentId || !s.to;
       });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/signals.ts` around lines 70 - 76, The current filter on the
signals array incorrectly includes broadcast signals from other agents; update
the predicate in the signals.filter (referencing signals, s.expiresAt, s.to,
s.from and data.agentId) so that if a signal has no recipient (!s.to) and its
sender is not the current agent (s.from !== data.agentId) it returns false
(exclude it), while still excluding expired signals and keeping signals that are
addressed to the agent or sent by the agent; restructure the if branches to
explicitly handle expiration, addressed signals, broadcasts, and own-sent
signals so unwanted broadcasts are filtered out.
src/functions/graph-retrieval.ts-49-50 (1)

49-50: ⚠️ Potential issue | 🟠 Major

Traversal should use current edges, not every historical version.

KV.graphEdges now contains superseded relationships with isLatest: false. Feeding the full set into BFS means obsolete edges keep contributing search and expansion hits after they were replaced. getLatestEdges() already exists in this file; use it before traversal in both methods.

Suggested fix
-    const allEdges = await this.kv.list<GraphEdge>(KV.graphEdges);
+    const allEdges = this.getLatestEdges(
+      await this.kv.list<GraphEdge>(KV.graphEdges),
+    );

Also applies to: 122-123

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/graph-retrieval.ts` around lines 49 - 50, The traversal is
using every historical edge from KV.graphEdges (allEdges) instead of only
current edges, causing obsolete edges to affect BFS; replace the raw list call
with getLatestEdges() so traversal and expansion use the filtered current-edge
set—specifically update the places that call
this.kv.list<GraphEdge>(KV.graphEdges) (where allEdges is used for BFS/traverse)
and the second occurrence around the other traversal method (lines near the
other use) to call and use getLatestEdges()’s result instead of allEdges.
src/functions/graph-retrieval.ts-63-110 (1)

63-110: ⚠️ Potential issue | 🟠 Major

visitedObs is locking in the first path instead of the best one.

The BFS loop emits the start-node path first, so direct entity matches are recorded with score = 0.5, and the later score: 1.0 seed never runs. The same first-seen logic also discards higher-scoring alternate paths for shared observations. Keep the best result per obsId instead of short-circuiting on first encounter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/graph-retrieval.ts` around lines 63 - 110, The current
visitedObs Set short-circuits on first encounter and locks in the first path
instead of the best one; replace it with a Map<string, GraphRetrievalResult>
(e.g., bestResults) keyed by obsId, compute each candidate result (using
bfsTraversal, score calculation, buildGraphContext, pathLength, etc.) and
compare its score with any existing entry, updating the map only if the new
score is higher (ensure the direct startNode seed entries with score 1.0 are
also compared/inserted into the map); after processing all paths and seeds,
convert bestResults.values() into the final results array to preserve only the
highest-scoring result per obsId.
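Keeping the best result per obsId could be sketched like this; `GraphRetrievalResult` is trimmed down to the fields the finding mentions:

```typescript
// Trimmed stand-in for the real result type.
interface GraphRetrievalResult {
  obsId: string;
  score: number;
  graphContext: string;
}

// Keep only the highest-scoring candidate per observation, instead of
// locking in whichever path the BFS emitted first.
function keepBest(candidates: GraphRetrievalResult[]): GraphRetrievalResult[] {
  const best = new Map<string, GraphRetrievalResult>();
  for (const c of candidates) {
    const existing = best.get(c.obsId);
    if (!existing || c.score > existing.score) best.set(c.obsId, c);
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```

With this, a direct entity match seeded at 1.0 wins over the same observation reached earlier through a 0.5-scored path.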
src/functions/temporal-graph.ts-315-323 (1)

315-323: ⚠️ Potential issue | 🟠 Major

Superseded edges are being counted twice in the temporal views.

This file keeps old versions in KV.graphEdges and also copies them into KV.graphEdgeHistory. Concatenating both scopes here duplicates every superseded relationship, which inflates the timeline and the differential changes output, and also duplicates historicalEdges on the asOf path. Filter the current scope down to latest edges before merging with history, or stop retaining historical versions in KV.graphEdges.

Also applies to: 379-397

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/temporal-graph.ts` around lines 315 - 323, The code
concatenates current-scope edges (from KV.graphEdges via relatedEdges) with
historical copies from KV.graphEdgeHistory, causing superseded edges to be
duplicated; update the merge logic (where relatedEdges, historicalEdges,
entityHistory, and allEntityEdges are used) to first reduce the current
KV.graphEdges set to only the latest edge version per edge identifier (e.g.,
group by edge id and pick the one with the highest timestamp/version) before
concatenating with KV.graphEdgeHistory, or alternately filter out any historical
IDs present in KV.graphEdgeHistory from relatedEdges; apply the same
deduplication logic in the other occurrence noted (around the section referenced
379-397) so the asOf timeline and changes computation only see one canonical
record per edge.
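A sketch of the latest-version reduction; the `(source, target, type)` key and string `tcommit` ordering are assumptions about the edge shape:

```typescript
// Assumed edge shape; ISO-8601 tcommit strings compare lexicographically.
interface GraphEdge {
  sourceId: string;
  targetId: string;
  type: string;
  tcommit: string;
}

// Reduce the current scope to one canonical record per logical edge
// before concatenating with KV.graphEdgeHistory.
function latestPerEdge(edges: GraphEdge[]): GraphEdge[] {
  const latest = new Map<string, GraphEdge>();
  for (const e of edges) {
    const key = `${e.sourceId}->${e.targetId}:${e.type}`;
    const prev = latest.get(key);
    if (!prev || e.tcommit > prev.tcommit) latest.set(key, e);
  }
  return [...latest.values()];
}
```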
src/functions/temporal-graph.ts-56-90 (1)

56-90: ⚠️ Potential issue | 🟠 Major

This parser only accepts one attribute ordering.

Both regexes require attributes in a fixed sequence (type before name, and type/source/target/weight/... for relationships). XML attributes are unordered, so a valid model response with reordered attributes will silently produce zero nodes or edges. Parse attributes independently or switch to a real XML parser here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/temporal-graph.ts` around lines 56 - 90, The entity and
relationship regexes (entityRegex, relRegex) assume strict attribute ordering so
reordered XML attributes yield no matches; replace this brittle approach by
parsing the XML with a proper XML parser (e.g., fast-xml-parser or DOMParser)
and then build nodes and relationships from the parsed object (extract
type/name/aliases/properties and source/target/weight/valid_from/valid_to)
before pushing into nodes/edges (the same logic that currently uses
generateId("gn"), properties, aliases, sourceObservationIds, createdAt can be
reused), or if you must keep regexes, stop relying on positional capture groups
and instead match the whole tag then use independent attribute regexes (e.g.,
/type="([^"]+)"/, /name="([^"]+)"/, etc.) to extract attributes in any order and
parse child elements (property/alias) from the tag body.
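The regex fallback the prompt describes can be sketched by matching whole tags and then extracting each attribute independently, so ordering no longer matters; the tag and attribute names follow the finding, not a confirmed schema:

```typescript
// Pull a single attribute out of a tag string, wherever it appears.
function attr(tag: string, name: string): string | undefined {
  return tag.match(new RegExp(`${name}="([^"]*)"`))?.[1];
}

// Match whole <entity ...> tags first, then read attributes in any order.
function parseEntities(xml: string): Array<{ type: string; name: string }> {
  const entities: Array<{ type: string; name: string }> = [];
  for (const m of xml.matchAll(/<entity\b[^>]*>/g)) {
    const type = attr(m[0], "type");
    const name = attr(m[0], "name");
    if (type && name) entities.push({ type, name });
  }
  return entities;
}
```

A real XML parser remains the sturdier choice, since regexes still miss escaping and nested-quote edge cases.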
src/functions/temporal-graph.ts-324-339 (1)

324-339: ⚠️ Potential issue | 🟠 Major

Reject invalid temporal bounds instead of treating NaN as a real timestamp.

new Date(...).getTime() returns NaN for bad input. In mem::temporal-query, that makes every edge look valid; in mem::differential-state, it can filter everything out. Validate asOf/from/to and return an error before building results.

Also applies to: 405-415

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/temporal-graph.ts` around lines 324 - 339, Validate temporal
inputs (data.asOf, data.from, data.to) before using Date(...).getTime(): check
that new Date(value).getTime() is not NaN and return or throw an error
immediately if invalid. In this block around the asOf filter (variables
asOfTime, e.tcommit/e.createdAt, e.tvalid, e.tvalidEnd) ensure you parse and
validate data.asOf into asOfTime and bail out with a clear error when NaN; do
the same for the other temporal checks used in the differential-state area (the
code handling data.from/data.to around lines 405-415) so you never treat NaN as
a valid timestamp when filtering edges. Ensure error handling returns a failure
response before building results.
src/functions/graph-retrieval.ts-90-107 (1)

90-107: ⚠️ Potential issue | 🟠 Major

Graph-only hits cannot be enriched because sessionId is blank.

Both result builders hard-code sessionId: "". HybridSearch.enrichResults() reads observations from KV.observations(sessionId), so these candidates get dropped entirely, and diversification also collapses all graph-only hits into one pseudo-session. This needs real session provenance before the result leaves the graph layer.

Also applies to: 143-149

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/graph-retrieval.ts` around lines 90 - 107, The graph-result
objects are being created with sessionId: "" which prevents
HybridSearch.enrichResults() from loading observations via
KV.observations(sessionId) and collapses graph-only hits; update both result
builders (the block that pushes {obsId, sessionId: "", score, graphContext:
buildGraphContext(path), ...} and the block pushing {obsId, sessionId: "",
score: 1.0, graphContext: `[${startNode.type}] ${startNode.name}`, ...} — also
the similar code at the 143-149 region) to set a real session provenance: look
up the session id from the observation metadata (or from startNode if it carries
a session id) before pushing the result, populate sessionId with that value, and
ensure buildGraphContext usage remains unchanged so HybridSearch.enrichResults()
and KV.observations(sessionId) can retrieve and enrich the proper observation.
src/index.ts-213-221 (1)

213-221: ⚠️ Potential issue | 🟠 Major

Validate AGENTMEMORY_GRAPH_WEIGHT before constructing the ranker.

parseFloat returns NaN for invalid env values. Once graph results are present, HybridSearch normalization/combine math turns into NaN and ranking breaks. Default and clamp the value before passing it in.

Suggested fix
-  const graphWeight = parseFloat(getEnvVar("AGENTMEMORY_GRAPH_WEIGHT") || "0.3");
+  const parsedGraphWeight = Number.parseFloat(
+    getEnvVar("AGENTMEMORY_GRAPH_WEIGHT") || "0.3",
+  );
+  const graphWeight =
+    Number.isFinite(parsedGraphWeight) && parsedGraphWeight >= 0
+      ? parsedGraphWeight
+      : 0.3;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/index.ts` around lines 213 - 221, The AGENTMEMORY_GRAPH_WEIGHT can parse
to NaN and must be validated before creating the HybridSearch ranker: parse the
env var into a number, check isFinite and if not use the default (0.3), then
clamp the value into the valid range (e.g., 0.0 to 1.0) and pass that sanitized
graphWeight into the HybridSearch constructor (refer to the graphWeight variable
and the HybridSearch(...) call) so normalization/combination math cannot produce
NaN.
src/state/hybrid-search.ts-246-252 (1)

246-252: ⚠️ Potential issue | 🟠 Major

The refill loop bypasses the 3-per-session cap.

After the first pass enforces maxPerSession, this fallback adds any remaining unique result without checking sessionCounts, so a low-session corpus can still return 4+ hits from the same session. If the v0.6 diversification rule is meant to be hard, keep the cap here too or accept returning fewer than limit.

Suggested fix
     if (selected.length < limit) {
       for (const r of results) {
         if (selected.length >= limit) break;
-        if (!selected.some(s => s.obsId === r.obsId)) {
+        const count = sessionCounts.get(r.sessionId) || 0;
+        if (count >= maxPerSession) continue;
+        if (!selected.some((s) => s.obsId === r.obsId)) {
           selected.push(r);
+          sessionCounts.set(r.sessionId, count + 1);
         }
       }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/state/hybrid-search.ts` around lines 246 - 252, The refill loop that adds
from results into selected (using selected, limit, results) currently ignores
the per-session cap (sessionCounts / maxPerSession) and can exceed
maxPerSession; update that loop to check the sessionCounts for r.sessionId
before pushing (skip if adding would exceed maxPerSession) and increment
sessionCounts[r.sessionId] when you add an item, preserving the dedupe check
(selected.some(s => s.obsId === r.obsId)); if you intend the cap to be hard,
skip items that would violate it and accept fewer than limit, otherwise adjust
logic to allow soft cap consistently across both passes.
🟡 Minor comments (12)
benchmark/real-embeddings-eval.ts-111-135 (1)

111-135: ⚠️ Potential issue | 🟡 Minor

Keep embedding time separate from ingest/index time.

embedStart is taken before the BM25/KV ingest loop, so embed_time_ms includes unrelated indexing work. That makes the report's “Embedding Time” column misleading. Start timing immediately before embedBatch() or emit separate ingest and embedding metrics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/real-embeddings-eval.ts` around lines 111 - 135, The current
embedTime measurement (embedStart/embedTime) starts before the BM25/KV ingest
loop so it incorrectly includes ingest/index time; move the embedding timer to
start immediately before the provider.embedBatch loop (or add a separate ingest
timer) so embedding timing only wraps the batching/ provider.embedBatch and
vector.add work; update references to embedStart/embedTime (or add
ingestStart/ingestTime) and ensure logged/returned metrics use the new
embed-only timing while leaving the bm25.add and kv.set loop outside that timer.
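Splitting the two timers, as the finding suggests, might look like this; `ingest()` and `embedBatch()` are placeholders for the benchmark's real (async) work, simplified to synchronous callbacks here:

```typescript
// Wrap each phase in its own timer so the report can show ingest and
// embedding costs separately.
function timedPhases(
  ingest: () => void,
  embedBatch: () => void,
): { ingestMs: number; embedMs: number } {
  const ingestStart = performance.now();
  ingest();
  const ingestMs = performance.now() - ingestStart;

  // The embedding clock starts only after ingest/indexing has finished.
  const embedStart = performance.now();
  embedBatch();
  const embedMs = performance.now() - embedStart;

  return { ingestMs, embedMs };
}
```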
src/functions/sliding-window.ts-153-159 (1)

153-159: ⚠️ Potential issue | 🟡 Minor

Don't count no-op windows as enriched.

mem::enrich-window returns success: true with enriched: null when there is no adjacent context, but mem::enrich-session increments enriched for any truthy success. That makes the batch totals report persisted chunks that were never written. Count only non-null enriched results, or track them as skipped.

Also applies to: 232-241

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/sliding-window.ts` around lines 153 - 159, The sliding-window
enrichment path currently returns {success: true, enriched: null} from
enrichWindow, but enrichSession increments its enriched counter for any truthy
success; change enrichSession so it only increments the "enriched" tally when
the result.enriched is non-null (i.e., result.enriched !== null or
Boolean(result.enriched)), and add/emit a "skipped" counter for cases where
success is true but enriched is null; alternatively, if you prefer, adjust
enrichWindow to set success:false when there is no adjacent context—ensure the
code paths referencing enrichWindow, enrichSession, and the enriched/success
fields are updated accordingly so batch totals reflect only actually persisted
chunks.
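Counting only persisted windows might look like this; the result shape is an assumption based on the finding:

```typescript
// Assumed shape of mem::enrich-window results.
interface EnrichResult {
  success: boolean;
  enriched: { chunkId: string } | null;
}

// Count a window as enriched only when something was actually written;
// success with a null payload is a skipped no-op.
function tally(results: EnrichResult[]): { enriched: number; skipped: number } {
  let enriched = 0;
  let skipped = 0;
  for (const r of results) {
    if (r.success && r.enriched !== null) enriched++;
    else if (r.success) skipped++;
  }
  return { enriched, skipped };
}
```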
test/hybrid-search.test.ts-79-89 (1)

79-89: ⚠️ Potential issue | 🟡 Minor

This test no longer proves the behavior named in the title.

A constant positive combinedScore would satisfy these assertions. If BM25-only mode is supposed to derive combinedScore from BM25, assert the exact transform or at least a relation that depends on bm25Score.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/hybrid-search.test.ts` around lines 79 - 89, The test "combinedScore is
derived from bm25Score when no vector index" currently only asserts
combinedScore > 0 which doesn't prove it derives from bm25Score; update the
assertions in this test (the it block using makeObs, bm25.add, kv.set and new
HybridSearch(bm25, null, null, kv)) to assert the explicit relation between
combinedScore and bm25Score (or the exact transform the HybridSearch
implementation applies) — e.g.,
expect(results[0].combinedScore).toEqual(results[0].bm25Score) or
expect(results[0].combinedScore).toBeCloseTo(transform(results[0].bm25Score)) or
assert a monotonic relation (combinedScore > 0 and combinedScore === some
function of bm25Score) so the test fails if combinedScore is independent of
bm25Score.
test/graph-retrieval.test.ts-145-149 (1)

145-149: ⚠️ Potential issue | 🟡 Minor

Remove unnecessary as any cast on a valid edge type.

located_in is already defined in GraphEdgeType (src/types.ts:328), so the as any cast in the fixture is unnecessary and masks type safety. Either the makeEdge function's signature is too narrow to accept GraphEdgeType directly, or the cast is redundant—clarify which and remove the bypass.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/graph-retrieval.test.ts` around lines 145 - 149, The test fixture uses
an unnecessary cast "located_in as any" — remove the cast and pass the
GraphEdgeType value directly; if TypeScript complains, widen the parameter type
on the makeEdge function (update makeEdge's signature to accept GraphEdgeType
instead of a narrower union) so the call in graph-retrieval.test.ts compiles
without bypassing type safety. Ensure references to GraphEdgeType and makeEdge
are updated consistently.
test/sliding-window.test.ts-92-102 (1)

92-102: ⚠️ Potential issue | 🟡 Minor

The registration test is a false positive.

Line 101 asserts sdk.trigger is defined, but that is always true even if registration fails. This does not validate that both functions were actually registered.

🧪 Proposed fix
 function mockSdk() {
   const functions = new Map<string, Function>();
   return {
     registerFunction: (opts: { id: string }, fn: Function) => {
       functions.set(opts.id, fn);
     },
+    hasFunction: (id: string) => functions.has(id),
     trigger: async (id: string, data: unknown) => {
       const fn = functions.get(id);
       if (fn) return fn(data);
       return null;
     },
@@
   it("registers both functions", async () => {
@@
-    expect(sdk.trigger).toBeDefined();
+    expect((sdk as any).hasFunction("mem::get-window")).toBe(true);
+    expect((sdk as any).hasFunction("mem::enrich-window")).toBe(true);
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sliding-window.test.ts` around lines 92 - 102, The test is a false
positive because asserting sdk.trigger is defined does not verify registration;
update the test to assert that the SDK registration calls actually happened by
making assertions on the mock SDK's register/trigger registration calls (e.g.,
expect(sdk.register).toHaveBeenCalledTimes(2) and
expect(sdk.register).toHaveBeenCalledWith(<expectedFunctionNameOrMatcher>,
expect.any(Function)) or
expect(sdk.trigger).toHaveBeenCalledWith(<expectedTriggerName>,
expect.any(Function)) after calling registerSlidingWindowFunction; use the
concrete registration method on the mock returned by mockSdk() and the known
function/trigger names used inside registerSlidingWindowFunction to verify both
functions were registered.
src/state/stemmer.ts-35-39 (1)

35-39: ⚠️ Potential issue | 🟡 Minor

Normalize input casing before stemming.

stem currently processes raw casing. If callers pass capitalized terms, suffix logic may be skipped and recall becomes input-format dependent.

🔧 Proposed fix
 export function stem(word: string): string {
   if (word.length <= 2) return word;
 
-  let w = word;
+  let w = word.toLowerCase();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/state/stemmer.ts` around lines 35 - 39, The stem function currently
operates on the original casing and should normalize input to lowercase before
performing suffix logic: in function stem convert the input (e.g., set let w =
word.toLowerCase()) before any length/suffix checks so all comparisons are
case-insensitive; ensure the small-word early return and the final returned
value reflect the normalized form used by the algorithm (use the lowered w for
suffix operations and return).
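A minimal sketch of the suggested fix, assuming a toy suffix stripper (`stemSketch` is a hypothetical helper, not the project's full Porter implementation in `src/state/stemmer.ts`): lowercasing first makes stemming independent of input casing.

```typescript
// Sketch: normalize casing before suffix logic so "Authentication" and
// "authentication" stem identically. Toy suffix list for illustration only.
function stemSketch(word: string): string {
  if (word.length <= 2) return word;
  let w = word.toLowerCase(); // normalize first, as the review suggests
  for (const suffix of ["ization", "ation", "ing", "ed", "s"]) {
    // only strip when a non-trivial stem remains
    if (w.endsWith(suffix) && w.length - suffix.length > 2) {
      w = w.slice(0, w.length - suffix.length);
      break;
    }
  }
  return w;
}
```

Without the `toLowerCase()` call, `stemSketch("Authenticating")` and `stemSketch("authenticating")` would diverge whenever suffix comparisons are case-sensitive.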
benchmark/SCALE.md-59-72 (1)

59-72: ⚠️ Potential issue | 🟡 Minor

Add a language tag to the fenced block for markdownlint compliance.

Line 59 starts a fenced block without a language identifier, which triggers MD040 and can fail strict docs linting.

🛠️ Proposed fix
-```
+```text
 Agent context window: ~200K tokens
 System prompt + tools:  ~20K tokens
 User conversation:      ~30K tokens
 Available for memory:  ~150K tokens
@@
 agentmemory top-10 results:
   Any corpus size     =  ~1,924 tokens  (0.3% of budget)
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In `@benchmark/SCALE.md` around lines 59 - 72, the fenced code block that
contains the "Agent context window: ~200K tokens" summary is missing a language
tag, which triggers MD040; fix it by changing the opening fence to include a
language identifier (e.g., `text`), leaving the closing fence unchanged, to
satisfy markdownlint and preserve the block contents.


</details>

</blockquote></details>
<details>
<summary>src/functions/actions.ts-168-175 (1)</summary><blockquote>

`168-175`: _⚠️ Potential issue_ | _🟡 Minor_

**Edge creation doesn't verify action still exists at write time.**

The source and target action existence checks (lines 168-175) are done before the edge is created (line 186). A concurrent delete could remove an action between the check and the edge write, creating an orphaned edge.



Consider wrapping the validation and creation in a lock, or accepting this as a low-probability race that `mem::heal` can clean up.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@src/functions/actions.ts` around lines 168 - 175, The current existence
checks for source and target actions (kv.get<Action> on KV.actions for
data.sourceActionId and data.targetActionId) occur before the edge write and can
race with concurrent deletes, creating orphaned edges; fix by making
validation+edge creation atomic — e.g., acquire a short-lived lock/mutex keyed
by both action IDs (or use your KV's transactional/compare-and-set API) inside
the function that writes the edge, re-fetch/validate sourceAction and
targetAction after acquiring the lock, then create the edge and release the
lock; alternatively, if you accept the small race, document it and rely on
mem::heal to clean orphaned edges.
```

</details>

</blockquote></details>
<details>
<summary>benchmark/QUALITY.md-20-20 (1)</summary><blockquote>

`20-20`: _⚠️ Potential issue_ | _🟡 Minor_

**Minor: Recall improvement calculation is slightly misleading.**

The text states "+4%" improvement (58.0% vs 55.8%), but the absolute difference is +2.2 percentage points. The relative improvement is approximately +4% ((58.0-55.8)/55.8 ≈ 3.9%), so consider clarifying this is relative improvement.


<details>
<summary>📝 Suggested clarification</summary>

```diff
-**Recall improvement:** agentmemory triple-stream finds 58.0% of relevant memories at K=10 vs 55.8% for keyword grep (+4%)
+**Recall improvement:** agentmemory triple-stream finds 58.0% of relevant memories at K=10 vs 55.8% for keyword grep (+2.2pp / ~4% relative)
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@benchmark/QUALITY.md` at line 20, The "Recall improvement" line is misleading
because it shows "+4%" without specifying whether that's absolute or relative;
update the text (the "Recall improvement:" sentence) to clarify both values —
e.g., "Recall improvement: agentmemory triple-stream finds 58.0% of relevant
memories at K=10 vs 55.8% for keyword grep (absolute +2.2 percentage points,
relative ≈+3.9%)." Ensure the revised sentence replaces the current one exactly
where the "Recall improvement:" line appears.
```

</details>

</blockquote></details>
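The absolute-vs-relative distinction above can be checked with quick arithmetic (plain TypeScript, not project code):

```typescript
// Recall figures from the finding: 55.8% baseline vs 58.0% triple-stream.
const baseline = 55.8;
const improved = 58.0;
const absolutePp = improved - baseline;            // ≈ 2.2 percentage points
const relativePct = (absolutePp / baseline) * 100; // ≈ 3.9% relative
```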
<details>
<summary>test/temporal-graph.test.ts-82-84 (1)</summary><blockquote>

`82-84`: _⚠️ Potential issue_ | _🟡 Minor_

**Dead code: unused variable `fns`.**

Line 82 creates an empty array `fns` that is never used. This appears to be leftover from refactoring.


<details>
<summary>🧹 Proposed fix</summary>

```diff
     registerTemporalGraphFunctions(sdk as never, kv as never, provider);

-    const fns = Array.from((sdk as any).trigger ? [] : []);
     expect(sdk.trigger).toBeDefined();
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@test/temporal-graph.test.ts` around lines 82 - 84, Remove the dead unused
variable fns from the test: the line creating const fns = Array.from((sdk as
any).trigger ? [] : []); is leftover and should be deleted so the test only
checks expect(sdk.trigger).toBeDefined();; update any surrounding test block
referencing fns if present (look for the const fns declaration and the
expect(sdk.trigger) assertion) to keep the test concise and free of unused
variables.
```

</details>

</blockquote></details>
<details>
<summary>src/functions/diagnostics.ts-595-606 (1)</summary><blockquote>

`595-606`: _⚠️ Potential issue_ | _🟡 Minor_

**Orphaned lease deletion lacks locking.**

The orphaned lease deletion at line 603 does not use `withKeyedLock`, unlike the expired lease handling at lines 556-583. This creates a potential race condition if another operation concurrently modifies the lease.


<details>
<summary>🔒 Proposed fix to add locking</summary>

```diff
          if (!actionIds.has(lease.actionId)) {
            if (dryRun) {
              details.push(
                `[dry-run] Would delete orphaned lease ${lease.id}`,
              );
              fixed++;
              continue;
            }
-           await kv.delete(KV.leases, lease.id);
-           details.push(`Deleted orphaned lease ${lease.id}`);
-           fixed++;
+           const didDelete = await withKeyedLock(
+             `mem:lease:${lease.actionId}`,
+             async () => {
+               const fresh = await kv.get<Lease>(KV.leases, lease.id);
+               if (!fresh) return false;
+               const freshActions = await kv.list<Action>(KV.actions);
+               const freshActionIds = new Set(freshActions.map((a) => a.id));
+               if (freshActionIds.has(fresh.actionId)) return false;
+               await kv.delete(KV.leases, lease.id);
+               return true;
+             },
+           );
+           if (didDelete) {
+             details.push(`Deleted orphaned lease ${lease.id}`);
+             fixed++;
+           } else {
+             skipped++;
+           }
          }
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@src/functions/diagnostics.ts` around lines 595 - 606, The orphaned-lease
deletion must be protected with the same keyed lock as expired leases: wrap the
delete logic in withKeyedLock keyed on lease.id (use the same withKeyedLock
helper used in the expired-lease block), and inside the lock re-check
actionIds.has(lease.actionId) before calling kv.delete(KV.leases, lease.id);
keep the dryRun branch outside/not acquiring the lock (still push the dry-run
message and increment fixed), but for real deletions acquire the lock,
re-validate and then delete to avoid the race.
```

</details>

</blockquote></details>
<details>
<summary>src/triggers/api.ts-1722-1727 (1)</summary><blockquote>

`1722-1727`: _⚠️ Potential issue_ | _🟡 Minor_

**Validate `limit` before forwarding it to `mem::crystal-list`.**

`parseInt(params.limit as string)` is passed through directly here, so `?limit=foo` becomes `NaN` and leaks into the handler. Normalize invalid values to `undefined` or return `400` instead.  


<details>
<summary>Suggested fix</summary>

```diff
   sdk.registerFunction({ id: "api::crystal-list" }, async (req: ApiRequest) => {
     const denied = checkAuth(req, secret);
     if (denied) return denied;
     const params = req.query_params || {};
-    const result = await sdk.trigger("mem::crystal-list", { project: params.project, sessionId: params.sessionId, limit: params.limit ? parseInt(params.limit as string) : undefined });
+    const limit =
+      params.limit === undefined
+        ? undefined
+        : Number.parseInt(params.limit as string, 10);
+    if (limit !== undefined && !Number.isFinite(limit)) {
+      return { status_code: 400, body: { error: "limit must be a number" } };
+    }
+    const result = await sdk.trigger("mem::crystal-list", {
+      project: params.project,
+      sessionId: params.sessionId,
+      limit,
+    });
     return { status_code: 200, body: result };
   });
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@src/triggers/api.ts` around lines 1722 - 1727, The api::crystal-list handler
is forwarding parseInt(params.limit) unvalidated into
sdk.trigger("mem::crystal-list"), so invalid ?limit=foo becomes NaN; update the
sdk.registerFunction callback to validate params.limit: if params.limit is
present, parse it with parseInt (or Number), check isNaN, and if invalid return
a 400 error response (e.g., { status_code: 400, body: { error: "invalid limit" }
}); otherwise pass the numeric limit (or undefined when absent) into the
mem::crystal-list trigger. Ensure you adjust the code around
sdk.registerFunction({ id: "api::crystal-list" }, async (req: ApiRequest) => {
... }) and use the validated variable when calling sdk.trigger.
```

</details>

</blockquote></details>
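The normalization pattern above can be factored into a small helper (a sketch; `parseLimit` is a hypothetical name, not part of the codebase). Returning a three-way result lets the handler distinguish "absent" from "invalid":

```typescript
// Hypothetical helper: normalize a query-string limit.
// Returns a positive number, undefined (param absent),
// or null (invalid -> caller should respond 400).
function parseLimit(raw: string | undefined): number | undefined | null {
  if (raw === undefined) return undefined;
  const n = Number.parseInt(raw, 10);
  return Number.isFinite(n) && n > 0 ? n : null;
}
```

A handler would then return `{ status_code: 400, ... }` on `null` and forward the value otherwise, so `NaN` never reaches `mem::crystal-list`.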

</blockquote></details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment thread plugin/scripts/diagnostics.mjs Outdated
Comment thread src/functions/leases.ts
Comment thread src/functions/mesh.ts Outdated
Comment on lines +7 to +32
function isAllowedUrl(urlStr: string): boolean {
  try {
    const parsed = new URL(urlStr);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
    const host = parsed.hostname.toLowerCase();
    if (
      host === "localhost" ||
      host === "127.0.0.1" ||
      host === "::1" ||
      host === "0.0.0.0" ||
      host.startsWith("10.") ||
      host.startsWith("192.168.") ||
      host === "169.254.169.254" ||
      /^172\.(1[6-9]|2\d|3[01])\./.test(host) ||
      host.startsWith("fe80:") ||
      host.startsWith("fc00:") ||
      host.startsWith("fd") ||
      host.startsWith("::ffff:")
    ) {
      return false;
    }
    return true;
  } catch {
    return false;
  }
}

⚠️ Potential issue | 🔴 Critical

SSRF guard is bypassable with DNS resolution tricks.

isAllowedUrl only inspects hostname text; a public hostname resolving to RFC1918/link-local can still pass and be fetched. This leaves outbound mesh sync open to SSRF.

🔐 Harden URL validation (DNS + IP range checks)
+import { lookup } from "node:dns/promises";
+import { isIP } from "node:net";
...
-function isAllowedUrl(urlStr: string): boolean {
+async function isAllowedUrl(urlStr: string): Promise<boolean> {
   try {
     const parsed = new URL(urlStr);
     if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
-    const host = parsed.hostname.toLowerCase();
+    const host = parsed.hostname.toLowerCase();
+    if (parsed.username || parsed.password) return false;
+
+    const privateIp = (ip: string): boolean => {
+      // implement RFC1918, loopback, link-local, ULA, mapped/embedded local checks
+      // for both IPv4/IPv6 textual addresses
+      return /* ... */;
+    };
+
+    if (isIP(host) && privateIp(host)) return false;
+    const resolved = await lookup(host, { all: true, verbatim: true });
+    if (resolved.some((r) => privateIp(r.address))) return false;
...
-    return true;
+    return true;
   } catch {
     return false;
   }
}

And update call sites:

-      if (!isAllowedUrl(data.url)) {
+      if (!(await isAllowedUrl(data.url))) {
...
-          if (!isAllowedUrl(peer.url)) {
+          if (!(await isAllowedUrl(peer.url))) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/mesh.ts` around lines 7 - 32, isAllowedUrl currently only
checks the hostname string and can be bypassed via DNS tricks; update
isAllowedUrl to perform DNS resolution (both A and AAAA records using
dns.promises.resolve4/resolve6 or lookup with {all:true}) and validate each
resolved IP against the same private/loopback/link-local ranges (IPv4 and IPv6),
rejecting the URL if any resolved address is in a disallowed range or if
resolution fails/times out; ensure the function handles multiple addresses,
treats hostnames that resolve to both public and private IPs as disallowed, and
update all call sites that rely on isAllowedUrl to handle its now-async behavior
(or provide a new async wrapper like isAllowedUrlAsync) and propagate
errors/timeouts appropriately.
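A self-contained sketch of the fail-closed shape the review asks for (names like `isAllowedUrlSketch` and the injectable `resolve` parameter are illustrative, and `isPrivateIp` here is a deliberately partial stand-in for a full CIDR-aware check):

```typescript
import { lookup } from "node:dns/promises";
import { isIP } from "node:net";

type Resolver = (host: string) => Promise<{ address: string }[]>;

// Partial stand-in: a production guard needs full CIDR-aware ranges.
function isPrivateIp(ip: string): boolean {
  return (
    ip.startsWith("127.") || ip.startsWith("10.") || ip.startsWith("192.168.") ||
    ip === "::1" || ip.startsWith("fe80:")
  );
}

// Fail closed: DNS errors, empty answers, and ANY private resolved address
// all reject the URL. The resolver is injectable so tests can stub DNS.
async function isAllowedUrlSketch(
  urlStr: string,
  resolve: Resolver = (h) => lookup(h, { all: true, verbatim: true }),
): Promise<boolean> {
  try {
    const parsed = new URL(urlStr);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
    const host = parsed.hostname.replace(/^\[|\]$/g, ""); // strip IPv6 brackets
    if (isIP(host)) return !isPrivateIp(host);
    const addrs = await resolve(host); // a rejection falls into catch -> false
    if (addrs.length === 0) return false;
    return addrs.every((a) => !isPrivateIp(a.address));
  } catch {
    return false;
  }
}
```

Note this still does not pin the later HTTP request to the vetted addresses, so DNS rebinding remains possible; as the comment says, pinning (or rejecting hostname peers outright) is needed for a complete fix.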

Comment on lines +54 to +61
if (data.type === "pattern") {
const cfg = data.config as { pattern?: string } | undefined;
if (!cfg || !cfg.pattern || typeof cfg.pattern !== "string") {
return {
success: false,
error: "pattern config requires a pattern string",
};
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cd /root/repo && head -n 300 src/functions/sentinels.ts | tail -n +50

Repository: rohitg00/agentmemory

Length of output: 124


🏁 Script executed:

cd /root/repo && sed -n '54,65p' src/functions/sentinels.ts

Repository: rohitg00/agentmemory

Length of output: 124


🏁 Script executed:

cd /root/repo && sed -n '237,260p' src/functions/sentinels.ts

Repository: rohitg00/agentmemory

Length of output: 124


🏁 Script executed:

cd /root/repo && wc -l src/functions/sentinels.ts

Repository: rohitg00/agentmemory

Length of output: 124


🏁 Script executed:

find . -name "sentinels.ts" -type f

Repository: rohitg00/agentmemory

Length of output: 92


🏁 Script executed:

cat -n src/functions/sentinels.ts | sed -n '50,70p'

Repository: rohitg00/agentmemory

Length of output: 849


🏁 Script executed:

cat -n src/functions/sentinels.ts | sed -n '230,260p'

Repository: rohitg00/agentmemory

Length of output: 1290


🏁 Script executed:

wc -l src/functions/sentinels.ts

Repository: rohitg00/agentmemory

Length of output: 94


🏁 Script executed:

cat -n src/functions/sentinels.ts | sed -n '1,60p'

Repository: rohitg00/agentmemory

Length of output: 2384


🏁 Script executed:

cat -n src/functions/sentinels.ts | sed -n '180,240p'

Repository: rohitg00/agentmemory

Length of output: 2592


🏁 Script executed:

rg "data\.type.*pattern" -A 10 -B 5 src/functions/sentinels.ts

Repository: rohitg00/agentmemory

Length of output: 541


🏁 Script executed:

rg "RegExp|new Regex|escapeRegex|validat.*pattern" src/functions/sentinels.ts

Repository: rohitg00/agentmemory

Length of output: 117


🏁 Script executed:

cat -n src/functions/sentinels.ts | sed -n '15,30p'

Repository: rohitg00/agentmemory

Length of output: 697


🏁 Script executed:

rg "escape|safe.*pattern|validateRegex" src/

Repository: rohitg00/agentmemory

Length of output: 476


🏁 Script executed:

cat src/functions/sentinels.ts | grep -n "sdk.registerFunction" -A 20

Repository: rohitg00/agentmemory

Length of output: 4572


🏁 Script executed:

cat -n src/functions/sentinels.ts | sed -n '54,62p'

Repository: rohitg00/agentmemory

Length of output: 444


🏁 Script executed:

cat -n src/functions/sentinels.ts | sed -n '237,260p'

Repository: rohitg00/agentmemory

Length of output: 1090


Validate sentinel patterns before compiling them as regexes to prevent ReDoS attacks.

External input from mem::sentinel-create passes cfg.pattern directly to new RegExp() at line 239 without safety validation. The regex is then tested against every observation title across all sessions (line 253), making a catastrophic pattern (e.g., (a+)+b) a denial-of-service vector. Current validation only checks that the pattern is a non-empty string. Implement pattern validation using a regex timeout/complexity checker, or switch to safe literal matching mode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/sentinels.ts` around lines 54 - 61, The sentinel pattern from
mem::sentinel-create (cfg.pattern) is currently passed straight into new RegExp
and then used to test observation titles, which is vulnerable to ReDoS; to fix,
validate and/or sanitize cfg.pattern before compiling: first enforce a max
length (e.g. 100 chars), reject dangerous tokens (backreferences like \1,
lookarounds `(?=`, `(?!)`, and nested quantifiers such as patterns containing
`+`, `*` or `?` immediately inside repeated groups like `(…+)+`), attempt a safe
compile inside a try/catch around new RegExp and return an error if compilation
fails or heuristics flag it, and alternatively provide an option to treat the
input as a literal by escaping it (escapeRegex) and using that in RegExp if any
risk is detected; update the code paths around cfg.pattern, new RegExp(...) and
the RegExp.test(...) usage so only validated/escaped patterns are compiled and
used.
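A sketch of the suggested defensive compilation (all names here, `compileSentinelPattern` and `escapeRegex`, are hypothetical; the heuristic is crude and a real guard might prefer a vetted safe-regex checker or literal-only matching):

```typescript
// Escape regex metacharacters so input can be matched literally.
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Defensive compile: cap length, downgrade suspicious patterns
// (nested quantifiers like (a+)+ or backreferences) to literal
// matching, and return null when the pattern cannot be used at all.
function compileSentinelPattern(pattern: string): RegExp | null {
  if (pattern.length > 100) return null;
  // crude heuristic: a quantified group that is itself quantified
  const nestedQuantifier = /\([^)]*[+*][^)]*\)[+*?{]/;
  if (nestedQuantifier.test(pattern) || /\\\d/.test(pattern)) {
    return new RegExp(escapeRegex(pattern)); // treat as a literal string
  }
  try {
    return new RegExp(pattern);
  } catch {
    return null; // syntactically invalid pattern
  }
}
```

With this shape, `mem::sentinel-create` can reject a `null` result with a validation error instead of compiling attacker-controlled regexes directly.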

Comment on lines +96 to +107
const now = new Date();
const sentinel: Sentinel = {
  id: generateId("snl"),
  name: data.name.trim(),
  type: data.type,
  status: "watching",
  config: data.config || {},
  createdAt: now.toISOString(),
  linkedActionIds: data.linkedActionIds || [],
  expiresAt: data.expiresInMs
    ? new Date(now.getTime() + data.expiresInMs).toISOString()
    : undefined,

⚠️ Potential issue | 🔴 Critical

Persist timer deadlines instead of relying on setTimeout() alone.

For timer sentinels, the only trigger path is the in-memory timeout. After a restart, the persisted sentinel is still watching, but no timer is re-armed and mem::sentinel-check never evaluates timer sentinels, so linked actions can stay blocked forever. Store the due time in KV and make timer evaluation recoverable on startup/check.

Also applies to: 125-137

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/sentinels.ts` around lines 96 - 107, The timer sentinel
creation currently only sets an in-memory timeout; persist the deadline and make
timer evaluation recoverable by storing the computed expiresAt on the Sentinel
record (the object created where id is generateId("snl"), name, type, status,
config, linkedActionIds, createdAt, expiresAt) and ensuring the sentinel-check
path (mem::sentinel-check) reads expiresAt from KV/DB on startup and during
checks: for each timer-type sentinel, if expiresAt <= now immediately trigger
the linked actions, otherwise re-arm a setTimeout to fire at expiresAt; also
update any create/update functions that accept data.expiresInMs to compute and
persist expiresAt instead of relying solely on setTimeouts so timers survive
restarts.
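The recoverable-timer idea can be sketched as a startup pass over persisted deadlines (a simplified model; `TimerSentinel`, `rearmTimers`, and `fireSentinel` are illustrative names, and real code would read sentinels from KV and mark them fired):

```typescript
// Simplified model of a persisted timer sentinel.
interface TimerSentinel {
  id: string;
  type: "timer";
  status: "watching" | "fired";
  expiresAt: string; // persisted ISO deadline -- survives restarts
}

// On startup (or during a periodic check): fire overdue timers
// immediately, re-arm future ones. Returns how many were re-armed.
function rearmTimers(
  sentinels: TimerSentinel[],
  fireSentinel: (s: TimerSentinel) => void,
  now: number = Date.now(),
): number {
  let armed = 0;
  for (const s of sentinels) {
    if (s.type !== "timer" || s.status !== "watching") continue;
    const due = new Date(s.expiresAt).getTime();
    if (due <= now) {
      fireSentinel(s); // deadline passed while the process was down
    } else {
      setTimeout(() => fireSentinel(s), due - now);
      armed++;
    }
  }
  return armed;
}
```

Because the deadline lives in storage rather than only in a `setTimeout`, a restart can replay this pass and no linked action stays blocked forever.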

Merge conflicts: resolved all 7 files (README, types, schema, index,
export-import, version, test) keeping v0.6.0 additions merged with main

CodeRabbit fixes:
- diagnostics.mjs: use action-level lock key (mem:action:) for lease
  expiry to prevent races with lease handlers
- leases.ts: check for other active leases before clearing action
  assignment on lease expiry
- mesh.ts: DNS resolution check in isAllowedUrl to prevent SSRF via
  DNS rebinding (resolves IPs and checks private ranges)
- export-import.ts: clear graphNodes, graphEdges, semantic, procedural
  scopes on replace strategy
- real-embeddings-eval.ts: measure actual returned result tokens
  instead of using label counts
- .gitignore: exclude data/ directory
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/types.ts (1)

241-262: ⚠️ Potential issue | 🟠 Major

ExportData for v0.6.0 is missing temporal edge history.

Line 242 introduces version 0.6.0, but the export schema still omits graph edge history. That prevents full round-trip of temporal graph state and can cause inconsistent imports.

Suggested patch
 export interface ExportData {
   version: "0.3.0" | "0.4.0" | "0.5.0" | "0.6.0";
   exportedAt: string;
   sessions: Session[];
   observations: Record<string, CompressedObservation[]>;
   memories: Memory[];
   summaries: SessionSummary[];
   profiles?: ProjectProfile[];
   graphNodes?: GraphNode[];
   graphEdges?: GraphEdge[];
+  graphEdgeHistory?: GraphEdge[];
   semanticMemories?: SemanticMemory[];
   proceduralMemories?: ProceduralMemory[];
   actions?: Action[];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/types.ts` around lines 241 - 262, ExportData for version "0.6.0" is
missing temporal edge history; add a new optional property to the ExportData
interface (e.g., graphEdgeHistory?: GraphEdgeHistory[] or temporalEdgeHistory?:
TemporalEdgeHistory[]) so the export can carry the full temporal history of
graph edges; update the ExportData interface declaration to include that
optional array and ensure the referenced GraphEdgeHistory/TemporalEdgeHistory
type (or equivalent) exists and matches the schema used for edge history during
exports/imports.
♻️ Duplicate comments (2)
src/functions/mesh.ts (2)

22-38: ⚠️ Potential issue | 🔴 Critical

This validation still doesn’t protect the actual request target.

At Line 37 the guard fails open on DNS errors, and even on success it only validates one lookup() result while hostname-based peers are resolved again during the later HTTP request. That leaves SSRF possible via resolver outages or DNS rebinding. Please fail closed and either pin the outbound connection to the vetted address set or reject hostname-based peers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/mesh.ts` around lines 22 - 38, The URL validation currently
fails open on DNS errors and only checks some resolved addresses, leaving SSRF
risk; update isAllowedUrl to fail closed: when host is not an IP, call
lookup(host, { all: true }) and if lookup throws return false, and ensure you
validate every resolved address (use resolved.every(...) to ensure none
arePrivateIP) rather than some(...); to prevent DNS rebinding either (A) change
isAllowedUrl to return the vetted IP list (or a structure containing the list)
so the later HTTP request can be pinned to those addresses, or (B) simply reject
all hostname-based peers by returning false for non-IP hosts after successful
validation — implement one of these approaches and reference isAllowedUrl and
the lookup/resolved handling when making the change.

9-19: ⚠️ Potential issue | 🔴 Critical

Block the full private/local CIDR ranges here.

isPrivateIP still misses many loopback, link-local, and ULA forms, so the SSRF guard is bypassable with non-canonical local addresses like 127.0.0.2, 169.254.10.20, fc01::1, and fe90::1. This needs CIDR-aware checks for the full IPv4/IPv6 ranges instead of a few literal/prefix matches.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/mesh.ts` around lines 9 - 19, The isPrivateIP function
currently does string-prefix checks and misses many addresses; replace its logic
with CIDR-aware checks by parsing the address (handle IPv4, IPv6, and
IPv4-mapped IPv6 like ::ffff:) and testing membership against the full
private/local CIDR ranges (IPv4: 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12,
192.168.0.0/16, 169.254.0.0/16, 0.0.0.0/8, and IPv6: ::1/128, fe80::/10
(link-local), fc00::/7 (ULA) plus ::ffff:0:0/96 mapping); implement this in
isPrivateIP using a CIDR-aware IP library (e.g., ipaddr.js or similar) to parse
and test contains() rather than string checks so non-canonical forms like
127.0.0.2, 169.254.x.x, fc01::1 and fe90::1 are correctly detected.
🧹 Nitpick comments (1)
plugin/scripts/diagnostics.mjs (1)

52-61: Use a dedicated category list for mem::heal.

mem::heal currently filters against ALL_CATEGORIES, so sessions and mesh are accepted even though the handler never processes either one. Calls like mem::heal({ categories: ["mesh"] }) become silent no-ops instead of surfacing that the category is non-healable.

Minimal cleanup
 const ALL_CATEGORIES = [
 	"actions",
 	"leases",
 	"sentinels",
 	"sketches",
 	"signals",
 	"sessions",
 	"memories",
 	"mesh"
 ];
+const HEALABLE_CATEGORIES = [
+	"actions",
+	"leases",
+	"sentinels",
+	"sketches",
+	"signals",
+	"memories"
+];
@@
-		const categories = data.categories && data.categories.length > 0 ? data.categories.filter((c) => ALL_CATEGORIES.includes(c)) : ALL_CATEGORIES;
+		const categories = data.categories && data.categories.length > 0 ? data.categories.filter((c) => HEALABLE_CATEGORIES.includes(c)) : HEALABLE_CATEGORIES;

Also applies to: 336-338

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/scripts/diagnostics.mjs` around lines 52 - 61, The mem::heal handler
is filtering against ALL_CATEGORIES which includes categories (e.g., "sessions",
"mesh") that the heal logic doesn't process, causing silent no-ops; create a
dedicated HEALABLE_CATEGORIES array (e.g.,
["actions","leases","sentinels","sketches","signals","memories"]) and replace
uses of ALL_CATEGORIES in the mem::heal filtering/validation (and the other
occurrence that uses the same filter) with HEALABLE_CATEGORIES, and add a
validation step in mem::heal to reject or warn when callers pass categories not
in HEALABLE_CATEGORIES so non-healable categories surface an error instead of
becoming no-ops.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugin/scripts/diagnostics.mjs`:
- Around line 421-455: The code returns early after expiring an active lease
which prevents the subsequent orphaned-lease check from running; update the
logic around the lease handling in the block that references lease, KV.leases,
KV.actions, actionIds and withKeyedLock so that after you expire a lease (or in
dry-run simulate expiry) you still verify whether the lease's action exists in
actionIds (or is missing from KV.actions) and delete the orphaned lease
accordingly; remove or avoid the premature continue (or set a flag) so both the
expiry path and the orphaned-lease deletion path can run for the same lease,
ensuring mem::heal and mem::diagnose see consistent results.
- Around line 487-493: The sketch-discard path is deleting KV.actions while only
holding the mem:sketch:${sketch.id} lock which can race the lease handlers that
lock on mem:action:${actionId}; update the block around withKeyedLock(...) (the
code calling kv.list(KV.actionEdges), kv.delete(KV.actionEdges, ...), and the
loop that deletes KV.actions) to: acquire per-action locks for each actionId in
a stable order (e.g., sort fresh.actionIds and use withKeyedLock for
mem:action:${actionId}), under those action locks delete any related leases (the
lease rows/keys the leases handler would manage) before deleting KV.actions, and
then release the action locks; keep deletion of edges (KV.actionEdges)
coordinated inside the same locking sequence so no race can orphan leases.
Ensure you reference and use withKeyedLock, KV.actions, KV.actionEdges, and
mem:action:${actionId} when implementing the fix.

In `@src/functions/export-import.ts`:
- Around line 265-275: The replace path's cleanup loops clear KV.graphNodes,
KV.graphEdges, KV.semantic, and KV.procedural but omits KV.graphEdgeHistory,
leaving stale edge history; add a similar cleanup loop for KV.graphEdgeHistory
in the same block (where the other for-await deletions occur) so that for (const
h of await kv.list<{ id: string }>(KV.graphEdgeHistory).catch(() => [])) { await
kv.delete(KV.graphEdgeHistory, h.id); } runs during replace mode to fully purge
historical edges.

---

Outside diff comments:
In `@src/types.ts`:
- Around line 241-262: ExportData for version "0.6.0" is missing temporal edge
history; add a new optional property to the ExportData interface (e.g.,
graphEdgeHistory?: GraphEdgeHistory[] or temporalEdgeHistory?:
TemporalEdgeHistory[]) so the export can carry the full temporal history of
graph edges; update the ExportData interface declaration to include that
optional array and ensure the referenced GraphEdgeHistory/TemporalEdgeHistory
type (or equivalent) exists and matches the schema used for edge history during
exports/imports.

---

Duplicate comments:
In `@src/functions/mesh.ts`:
- Around line 22-38: The URL validation currently fails open on DNS errors and
only checks some resolved addresses, leaving SSRF risk; update isAllowedUrl to
fail closed: when host is not an IP, call lookup(host, { all: true }) and if
lookup throws return false, and ensure you validate every resolved address (use
resolved.every(...) to ensure none arePrivateIP) rather than some(...); to
prevent DNS rebinding either (A) change isAllowedUrl to return the vetted IP
list (or a structure containing the list) so the later HTTP request can be
pinned to those addresses, or (B) simply reject all hostname-based peers by
returning false for non-IP hosts after successful validation — implement one of
these approaches and reference isAllowedUrl and the lookup/resolved handling
when making the change.
- Around line 9-19: The isPrivateIP function currently does string-prefix checks
and misses many addresses; replace its logic with CIDR-aware checks by parsing
the address (handle IPv4, IPv6, and IPv4-mapped IPv6 like ::ffff:) and testing
membership against the full private/local CIDR ranges (IPv4: 127.0.0.0/8,
10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, 0.0.0.0/8, and IPv6:
::1/128, fe80::/10 (link-local), fc00::/7 (ULA) plus ::ffff:0:0/96 mapping);
implement this in isPrivateIP using a CIDR-aware IP library (e.g., ipaddr.js or
similar) to parse and test contains() rather than string checks so non-canonical
forms like 127.0.0.2, 169.254.x.x, fc01::1 and fe90::1 are correctly detected.

---

Nitpick comments:
In `@plugin/scripts/diagnostics.mjs`:
- Around line 52-61: The mem::heal handler is filtering against ALL_CATEGORIES
which includes categories (e.g., "sessions", "mesh") that the heal logic doesn't
process, causing silent no-ops; create a dedicated HEALABLE_CATEGORIES array
(e.g., ["actions","leases","sentinels","sketches","signals","memories"]) and
replace uses of ALL_CATEGORIES in the mem::heal filtering/validation (and the
other occurrence that uses the same filter) with HEALABLE_CATEGORIES, and add a
validation step in mem::heal to reject or warn when callers pass categories not
in HEALABLE_CATEGORIES so non-healable categories surface an error instead of
becoming no-ops.
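The category split above can be sketched as a small partition helper. The category names follow the review's suggestion; mem::heal's real surrounding code is not reproduced here:

```typescript
// All categories the diagnostics script knows about (values assumed from the review).
const ALL_CATEGORIES = ["actions", "leases", "sentinels", "sketches", "signals", "memories", "sessions", "mesh"];
// Subset the heal logic can actually process.
const HEALABLE_CATEGORIES = ["actions", "leases", "sentinels", "sketches", "signals", "memories"];

// Partition the caller's requested categories so non-healable ones surface as
// explicit rejections instead of silent no-ops.
function validateHealCategories(requested: string[]): { healable: string[]; rejected: string[] } {
  return {
    healable: requested.filter((c) => HEALABLE_CATEGORIES.includes(c)),
    rejected: requested.filter((c) => !HEALABLE_CATEGORIES.includes(c)),
  };
}
```

mem::heal would then warn (or error) on `rejected` before iterating only over `healable`.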

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6cb734de-82b1-45f0-a919-dbf000a8bc98

📥 Commits

Reviewing files that changed from the base of the PR and between b565030 and aae8d5c.

📒 Files selected for processing (7)
  • .gitignore
  • benchmark/real-embeddings-eval.ts
  • plugin/scripts/diagnostics.mjs
  • src/functions/export-import.ts
  • src/functions/leases.ts
  • src/functions/mesh.ts
  • src/types.ts
✅ Files skipped from review due to trivial changes (1)
  • .gitignore
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/functions/leases.ts
  • benchmark/real-embeddings-eval.ts

Comment on lines +421 to +455
if (lease.status === "active" && new Date(lease.expiresAt).getTime() <= now) {
  if (dryRun) {
    details.push(`[dry-run] Would expire lease ${lease.id} for action ${lease.actionId}`);
    fixed++;
    continue;
  }
  if (await withKeyedLock(`mem:action:${lease.actionId}`, async () => {
    const fresh = await kv.get(KV.leases, lease.id);
    if (!fresh || fresh.status !== "active" || new Date(fresh.expiresAt).getTime() > Date.now()) return false;
    fresh.status = "expired";
    await kv.set(KV.leases, fresh.id, fresh);
    const action = await kv.get(KV.actions, fresh.actionId);
    if (action && action.status === "active" && action.assignedTo === fresh.agentId) {
      action.status = "pending";
      action.assignedTo = void 0;
      action.updatedAt = (/* @__PURE__ */ new Date()).toISOString();
      await kv.set(KV.actions, action.id, action);
    }
    return true;
  })) {
    details.push(`Expired lease ${lease.id} for action ${lease.actionId}`);
    fixed++;
  } else skipped++;
  continue;
}
if (!actionIds.has(lease.actionId)) {
  if (dryRun) {
    details.push(`[dry-run] Would delete orphaned lease ${lease.id}`);
    fixed++;
    continue;
  }
  await kv.delete(KV.leases, lease.id);
  details.push(`Deleted orphaned lease ${lease.id}`);
  fixed++;
}

⚠️ Potential issue | 🟠 Major

Delete expired orphaned leases in the same pass.

Lines 421-445 hit `continue` immediately after expiring an active lease. When that lease also points at a missing action, Lines 446-455 never run, so one mem::heal call can report a fix while mem::diagnose still returns orphaned-lease:* for the same record.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/scripts/diagnostics.mjs` around lines 421 - 455, The code returns
early after expiring an active lease which prevents the subsequent
orphaned-lease check from running; update the logic around the lease handling in
the block that references lease, KV.leases, KV.actions, actionIds and
withKeyedLock so that after you expire a lease (or in dry-run simulate expiry)
you still verify whether the lease's action exists in actionIds (or is missing
from KV.actions) and delete the orphaned lease accordingly; remove or avoid the
premature continue (or set a flag) so both the expiry path and the
orphaned-lease deletion path can run for the same lease, ensuring mem::heal and
mem::diagnose see consistent results.
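The control-flow fix can be sketched minimally; the lease shape and the string details are simplified stand-ins for the plugin's types and KV side effects, not its actual code:

```typescript
interface Lease { id: string; actionId: string; status: string; expiresAt: string }

function healLease(lease: Lease, actionIds: Set<string>, now: number): string[] {
  const details: string[] = [];
  if (lease.status === "active" && new Date(lease.expiresAt).getTime() <= now) {
    lease.status = "expired";
    details.push(`Expired lease ${lease.id}`);
    // Intentionally no `continue` here: the orphan check below still runs, so
    // an expired lease pointing at a missing action is cleaned up in the same
    // pass instead of surviving until the next mem::diagnose.
  }
  if (!actionIds.has(lease.actionId)) {
    details.push(`Deleted orphaned lease ${lease.id}`);
  }
  return details;
}
```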

Comment on lines +487 to +493
if (await withKeyedLock(`mem:sketch:${sketch.id}`, async () => {
  const fresh = await kv.get(KV.sketches, sketch.id);
  if (!fresh || fresh.status !== "active" || new Date(fresh.expiresAt).getTime() > Date.now()) return false;
  const allEdges = await kv.list(KV.actionEdges);
  const actionIdSet = new Set(fresh.actionIds);
  for (const edge of allEdges) if (actionIdSet.has(edge.sourceActionId) || actionIdSet.has(edge.targetActionId)) await kv.delete(KV.actionEdges, edge.id);
  for (const actionId of fresh.actionIds) await kv.delete(KV.actions, actionId);

⚠️ Potential issue | 🔴 Critical

Sketch discard is bypassing the action lock and orphaning leases.

Lines 487-493 delete KV.actions while only holding mem:sketch:${sketch.id}. The lease handlers in src/functions/leases.ts:1-50 serialize on mem:action:${actionId} instead, so this path can race normal action/lease writes and, because the leases pass already ran earlier in the same heal, it can create fresh orphaned leases that survive until the next run. Take the affected action locks in a stable order and remove related leases before deleting the action rows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/scripts/diagnostics.mjs` around lines 487 - 493, The sketch-discard
path is deleting KV.actions while only holding the mem:sketch:${sketch.id} lock
which can race the lease handlers that lock on mem:action:${actionId}; update
the block around withKeyedLock(...) (the code calling kv.list(KV.actionEdges),
kv.delete(KV.actionEdges, ...), and the loop that deletes KV.actions) to:
acquire per-action locks for each actionId in a stable order (e.g., sort
fresh.actionIds and use withKeyedLock for mem:action:${actionId}), under those
action locks delete any related leases (the lease rows/keys the leases handler
would manage) before deleting KV.actions, and then release the action locks;
keep deletion of edges (KV.actionEdges) coordinated inside the same locking
sequence so no race can orphan leases. Ensure you reference and use
withKeyedLock, KV.actions, KV.actionEdges, and mem:action:${actionId} when
implementing the fix.
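The stable-order lock acquisition can be sketched as a nesting helper. withKeyedLock's run-the-callback-while-holding-the-named-lock semantics are assumed from the snippet above; this helper itself is hypothetical, not code from the repo:

```typescript
type KeyedLock = <T>(key: string, fn: () => Promise<T>) => Promise<T>;

async function withActionLocks<T>(
  withKeyedLock: KeyedLock,
  actionIds: string[],
  fn: () => Promise<T>,
): Promise<T> {
  // Sorted order means two concurrent heals can never hold locks in opposite
  // orders, which rules out deadlock between them.
  const sorted = [...actionIds].sort();
  const acquire = (i: number): Promise<T> =>
    i >= sorted.length
      ? fn()
      : withKeyedLock(`mem:action:${sorted[i]}`, () => acquire(i + 1));
  return acquire(0);
}
```

Inside `fn`, the discard path would delete each action's leases first, then the edges and action rows, all while every affected mem:action:* lock is held.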

Comment on lines +265 to +275
for (const n of await kv.list<{ id: string }>(KV.graphNodes).catch(() => [])) {
  await kv.delete(KV.graphNodes, n.id);
}
for (const e of await kv.list<{ id: string }>(KV.graphEdges).catch(() => [])) {
  await kv.delete(KV.graphEdges, e.id);
}
for (const s of await kv.list<{ id: string }>(KV.semantic).catch(() => [])) {
  await kv.delete(KV.semantic, s.id);
}
for (const p of await kv.list<{ id: string }>(KV.procedural).catch(() => [])) {
  await kv.delete(KV.procedural, p.id);

⚠️ Potential issue | 🟠 Major

Replace strategy still leaves temporal edge history behind.

Line 265 adds graph cleanup, but KV.graphEdgeHistory is still not cleared. In replace mode this can retain stale historical edges and skew point-in-time graph queries.

Suggested patch
         for (const e of await kv.list<{ id: string }>(KV.graphEdges).catch(() => [])) {
           await kv.delete(KV.graphEdges, e.id);
         }
+        for (const h of await kv
+          .list<{ id: string }>(KV.graphEdgeHistory)
+          .catch(() => [])) {
+          await kv.delete(KV.graphEdgeHistory, h.id);
+        }
         for (const s of await kv.list<{ id: string }>(KV.semantic).catch(() => [])) {
           await kv.delete(KV.semantic, s.id);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/export-import.ts` around lines 265 - 275, The replace path's
cleanup loops clear KV.graphNodes, KV.graphEdges, KV.semantic, and KV.procedural
but omits KV.graphEdgeHistory, leaving stale edge history; add a similar cleanup
loop for KV.graphEdgeHistory in the same block (where the other for-await
deletions occur) so that for (const h of await kv.list<{ id: string
}>(KV.graphEdgeHistory).catch(() => [])) { await kv.delete(KV.graphEdgeHistory,
h.id); } runs during replace mode to fully purge historical edges.
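A design alternative to adding one more copy-pasted loop: drive the replace-mode purge from a single namespace list, so a later addition like graphEdgeHistory cannot be forgotten. The kv interface here is a simplified stand-in for the plugin's store, not its actual API:

```typescript
interface Kv {
  list(ns: string): Promise<{ id: string }[]>;
  delete(ns: string, id: string): Promise<void>;
}

// Delete every row in every listed namespace; returns the number of deletions.
async function purgeNamespaces(kv: Kv, namespaces: string[]): Promise<number> {
  let deleted = 0;
  for (const ns of namespaces) {
    // Mirror the original's .catch(() => []) so a missing namespace is a no-op.
    for (const row of await kv.list(ns).catch(() => [])) {
      await kv.delete(ns, row.id);
      deleted++;
    }
  }
  return deleted;
}
```

The replace path would then call it once with graphNodes, graphEdges, graphEdgeHistory, semantic, and procedural.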
