feat(multimodal): Phase 1 — CLIP visual embeddings + vision-search#179
📝 Walkthrough

Adds optional CLIP-based image embeddings and cross-modal search: new CLIP embedding provider, SDK functions.
Sequence Diagrams

```mermaid
sequenceDiagram
    participant Observe as Observer
    participant VisionEmbed as mem::vision-embed
    participant ImageProvider as CLIP Provider
    participant KV as KV Store
    Observe->>Observe: Save extracted image to disk
    Observe->>Observe: Increment image ref count
    Note right of Observe: if AGENTMEMORY_IMAGE_EMBEDDINGS=true
    Observe->>VisionEmbed: Trigger mem::vision-embed (imageRef, sessionId, obsId)
    VisionEmbed->>KV: kv.get(KV.imageRefs, imageRef) verify ref exists
    VisionEmbed->>ImageProvider: embedImage(filePath)
    ImageProvider->>ImageProvider: Load image & run CLIP pipeline
    ImageProvider-->>VisionEmbed: Float32Array embedding
    VisionEmbed->>KV: kv.put(KV.imageEmbeddings, imageRef, StoredEmbedding)
    VisionEmbed-->>Observe: Return success / dimensions
```

```mermaid
sequenceDiagram
    participant Client as Client/API
    participant MCP as mcp::tools / API
    participant VisionSearch as mem::vision-search
    participant ImageProvider as CLIP Provider
    participant KV as KV Store
    Client->>MCP: Request (queryText or queryImageRef / queryImageBase64, topK, sessionId)
    MCP->>VisionSearch: Trigger mem::vision-search with payload
    alt queryText provided
        VisionSearch->>ImageProvider: embed(queryText)
    else queryImageBase64 provided
        VisionSearch->>ImageProvider: embedImage(base64)
    else queryImageRef provided
        VisionSearch->>KV: kv.get(KV.imageRefs, queryImageRef) verify
        VisionSearch->>ImageProvider: embedImage(queryImageRef)
    end
    ImageProvider-->>VisionSearch: query vector
    VisionSearch->>KV: list KV.imageEmbeddings (optionally filter by sessionId)
    KV-->>VisionSearch: stored embeddings
    VisionSearch->>VisionSearch: compute cosine similarities, sort, slice topK
    VisionSearch-->>MCP: Return ranked results
    MCP-->>Client: Respond with results
```
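The ranking step in the second diagram (compute cosine similarities, sort, slice topK) can be sketched as a brute-force scan. This is an illustrative sketch with hypothetical names, not the repository's implementation:

```typescript
// Hypothetical shape of a stored embedding row; the real record lives in the repo's types.
interface StoredEmbedding {
  imageRef: string;
  sessionId: string;
  vector: number[]; // CLIP vector (normalized in the real pipeline)
}

// Plain cosine similarity; the `|| 1` guards against zero-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Brute-force scan: optional sessionId filter, score everything, sort, keep topK.
function rankTopK(
  query: number[],
  stored: StoredEmbedding[],
  topK: number,
  sessionId?: string,
): Array<{ imageRef: string; score: number }> {
  return stored
    .filter((e) => !sessionId || e.sessionId === sessionId)
    .map((e) => ({ imageRef: e.imageRef, score: cosine(query, e.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

As the PR description notes, a linear scan is fine at image-memory scales; an ANN index only pays off at much larger corpus sizes.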
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Builds on PR #111's image store. Adds a dedicated visual embedding path so agents can search screenshots by natural language or by similar image, without routing through VLM-describe → BM25-text.

## Additions

- `EmbeddingProvider.embedImage?(src)` slot (non-breaking, optional)
- `ClipEmbeddingProvider` — CLIP ViT-B/32 via `@xenova/transformers` (already a dep). 512d, normalized. Supports base64, data URL, and file paths. Lazy model load; same pattern as the existing local text provider.
- `createImageEmbeddingProvider()` gated on `AGENTMEMORY_IMAGE_EMBEDDINGS=true` — opt-in, matches the auto-compress/context-injection opt-in pattern from #138 and #143.
- `KV.imageEmbeddings` scope keyed by imageRef path.
- `mem::vision-embed` + `mem::vision-search` functions. Cosine similarity, brute-force over the stored set (fine at image-memory scales; ANN can come later).
- `POST /agentmemory/vision-embed` + `POST /agentmemory/vision-search` HTTP triggers with auth + 400/503 error shapes.
- MCP tool `memory_vision_search` exposed via the running-server handler. Accepts queryText, queryImageRef, queryImageBase64, topK, sessionId.
- Observe pipeline: when an image is stored and the flag is on, fires `mem::vision-embed` via triggerVoid so embedding happens off the critical path.
- `decrementImageRef` sweeps the embedding too when refcount hits zero, so forget/evict/retention stop orphaning vectors.

## Why

Mem0/Zep/Letta are text-only in 2026. AgenticVision ships CLIP but needs manual capture. agentmemory already auto-captures via hooks — bolting CLIP onto that pipeline makes it the only memory system where the agent sees what it sees and can cross-modal retrieve without manual invocation.

Full plan: local spec at ~/specs-backup/agentmemory-multimodal-plan.md.

## Tests

- test/vision-search.test.ts: 6 new cases covering embed→store, text query ranking, image-to-image query, sessionId filter, provider-absent error, missing-query rejection.
- Full suite: 794/794 pass (+6 new).
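The optional `embedImage?(src)` slot described above keeps text-only providers working unchanged. A minimal sketch of that shape, with everything beyond the names in the PR description being illustrative:

```typescript
// Sketch of an optional image-embedding slot on a provider interface.
// Only `embedImage?(src)` is named in the PR; the rest is hypothetical.
interface EmbeddingProvider {
  embed(text: string): Promise<Float32Array>;
  // Optional: providers without image support simply omit this method.
  embedImage?(src: string): Promise<Float32Array>;
}

// Callers feature-detect rather than assuming image support exists.
function supportsImages(p: EmbeddingProvider): boolean {
  return typeof p.embedImage === "function";
}
```

Because the method is optional, existing text-only providers satisfy the interface without any change, which is what makes the addition non-breaking.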
Force-pushed from 4e87d19 to bf6f33f.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
README.md (2)
813-847: ⚠️ Potential issue | 🟡 Minor — Document the opt-in vision embeddings flag.

The new MCP tool says it requires `AGENTMEMORY_IMAGE_EMBEDDINGS=true`, but the configuration block does not show that env var. Add it here so users can enable the feature without reading source.

Suggested env var entry:

```diff
 # GRAPH_EXTRACTION_ENABLED=false
+# AGENTMEMORY_IMAGE_EMBEDDINGS=false # OFF by default. Set true to enable CLIP image embeddings
+#                                    # and vision-search over stored screenshots.
 # CONSOLIDATION_ENABLED=true
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 813 - 847, Add a new documented env var entry for AGENTMEMORY_IMAGE_EMBEDDINGS in the configuration block under the "Features" section so users can opt into image embedding support for the MCP tool; declare it default false (e.g., AGENTMEMORY_IMAGE_EMBEDDINGS=false) and include a short comment explaining that setting it to true enables image embeddings required by the new MCP tool and may affect storage/token usage, similar to the other AGENTMEMORY_* flags (reference the existing AGENTMEMORY_AUTO_COMPRESS and AGENTMEMORY_INJECT_CONTEXT entries to match style).
337-366: ⚠️ Potential issue | 🟡 Minor — Finish updating the MCP tool counts and list.

The README now says 45 tools in some places, but this section still has `all 43 memory tools`, `### 44 Tools`, and `Extended tools (44 total...)`. The core tool table also omits `memory_vision_search`.

Suggested documentation updates:

```diff
-Install agentmemory for OpenClaw. Run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server on localhost:3111. Then add this to my OpenClaw MCP config so agentmemory is available with all 43 memory tools:
+Install agentmemory for OpenClaw. Run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server on localhost:3111. Then add this to my OpenClaw MCP config so agentmemory is available with all 45 memory tools:

-Install agentmemory for Hermes. Run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server on localhost:3111. Then add this to ~/.hermes/config.yaml so Hermes can use agentmemory as an MCP server with all 43 memory tools:
+Install agentmemory for Hermes. Run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server on localhost:3111. Then add this to ~/.hermes/config.yaml so Hermes can use agentmemory as an MCP server with all 45 memory tools:

-### 44 Tools
+### 45 Tools

+| `memory_vision_search` | Search screenshots by text or similar image |

-<summary>Extended tools (44 total — set AGENTMEMORY_TOOLS=all)</summary>
+<summary>Extended tools (45 total — set AGENTMEMORY_TOOLS=all)</summary>
```

Based on learnings: When adding or removing MCP tools, update ALL of: tools-registry.ts, server.ts, triggers/api.ts, index.ts, test/mcp-standalone.test.ts, README.md, and plugin/.claude-plugin/plugin.json.
Also applies to: 596-620
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 337-366: README and some docs still reference inconsistent counts and omit memory_vision_search; update all mentions of tool counts and the core tool list so they match the current MCP registry. Edit README.md to replace any "43", "44", or "45" occurrences with the authoritative current count, add memory_vision_search to the core tools table, and ensure headings like "### 44 Tools" reflect the same number; then synchronize the code-side registries by confirming tools-registry.ts, server.ts, triggers/api.ts, index.ts, test/mcp-standalone.test.ts, and plugin/.claude-plugin/plugin.json include the same set (including memory_vision_search) so all counts and lists match across README.md and those files.

src/functions/observe.ts (1)
136-155: ⚠️ Potential issue | 🟠 Major — Trigger vision embedding only after the observation is persisted.

Lines 143-149 enqueue `mem::vision-embed` before `kv.set(...)` succeeds. If persistence fails, the async embed can race with cleanup and leave a searchable vector for a deleted/uncommitted observation.

Proposed ordering fix:

```diff
+  let imageRefToEmbed: string | undefined;
   if (pendingImageData && (pendingImageData.startsWith("data:image/") || pendingImageData.startsWith("iVBORw0KGgo") || pendingImageData.startsWith("/9j/"))) {
     const { saveImageToDisk } = await import("../utils/image-store.js");
     const { filePath, bytesWritten } = await saveImageToDisk(pendingImageData);
     raw.imageData = filePath;
     const { incrementImageRef } = await import("./image-refs.js");
     await incrementImageRef(kv, filePath);
     sdk.triggerVoid("mem::disk-size-delta", { deltaBytes: bytesWritten });
-    if (process.env["AGENTMEMORY_IMAGE_EMBEDDINGS"] === "true") {
-      sdk.triggerVoid("mem::vision-embed", {
-        imageRef: filePath,
-        sessionId: payload.sessionId,
-        observationId: obsId,
-      });
-    }
+    imageRefToEmbed = filePath;
   }
   try {
     await kv.set(KV.observations(payload.sessionId), obsId, raw);
+    if (imageRefToEmbed && process.env["AGENTMEMORY_IMAGE_EMBEDDINGS"] === "true") {
+      sdk.triggerVoid("mem::vision-embed", {
+        imageRef: imageRefToEmbed,
+        sessionId: payload.sessionId,
+        observationId: obsId,
+      });
+    }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/observe.ts` around lines 136 - 155, The mem::vision-embed trigger is being fired before the observation is persisted, which can create embeddings for observations that fail to save; change the ordering so that after saving image to disk (saveImageToDisk) and incrementing refs (incrementImageRef) you call kv.set(KV.observations(payload.sessionId), obsId, raw) first and only when that Promise resolves successfully then call sdk.triggerVoid("mem::vision-embed", { imageRef: filePath, sessionId: payload.sessionId, observationId: obsId }); keep the disk-size-delta trigger where it is (it can run before or after persistence) but ensure the vision embedding enqueue happens strictly after kv.set resolves to avoid embeddings for unpersisted observations.
🧹 Nitpick comments (2)
test/mcp-standalone.test.ts (1)
49-50: Assert the new tool explicitly, not only the count.

This keeps the test tied to the `memory_vision_search` registration rather than any arbitrary 12th core tool.

Suggested test tightening:

```diff
-  it("CORE_TOOLS has 12 items", () => {
+  it("CORE_TOOLS includes memory_vision_search", () => {
     expect(CORE_TOOLS.length).toBe(12);
+    expect(CORE_TOOLS.some((tool) => tool.name === "memory_vision_search")).toBe(true);
   });
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/mcp-standalone.test.ts` around lines 49-50: The test only asserts CORE_TOOLS.length; update the spec to also assert that the new tool registration is present by name/ID so it doesn't pass due to any unrelated 12th item. In the it block that reads "CORE_TOOLS has 12 items", add an assertion that CORE_TOOLS contains the memory_vision_search tool (e.g., check CORE_TOOLS.some(t => t.name === 'memory_vision_search' || t.id === 'memory_vision_search'), or expect(find(...)).toBeDefined()), keeping the length assertion but explicitly verifying the memory_vision_search entry exists.

src/state/schema.ts (1)
45-45: Add a typed record for this new KV scope.

`KV.imageEmbeddings` needs a matching exported interface in `src/types.ts` for the stored embedding shape, including the vector plus metadata used for filtering such as `imageRef`, `sessionId`, and `observationId`.

Based on learnings: When adding new KV scopes, add to the KV object in src/state/schema.ts and add corresponding interface in src/types.ts.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/state/schema.ts` at line 45, Add an exported typed record that matches the new KV scope KV.imageEmbeddings by creating an interface (e.g., ImageEmbeddingRecord) in src/types.ts that describes the stored embedding shape: include a numeric vector (number[]), required metadata fields imageRef, sessionId, observationId (strings), and any optional fields you need such as timestamp or extra metadata; export the interface and ensure the KV mapping for KV.imageEmbeddings uses this type so reads/writes are strongly typed (reference KV.imageEmbeddings and the new ImageEmbeddingRecord interface when updating usages).
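Concretely, the suggested record could look like the sketch below. Field names follow the comment's suggestion; the final shape in src/types.ts may differ:

```typescript
// Sketch of the suggested typed record for the KV.imageEmbeddings scope.
// Names mirror the review comment; optional fields are illustrative.
export interface ImageEmbeddingRecord {
  vector: number[];      // the stored CLIP embedding
  imageRef: string;      // managed image-store path, also the KV key
  sessionId: string;     // enables session-scoped filtering at search time
  observationId: string; // links back to the originating observation
  updatedAt?: number;    // optional metadata, e.g. epoch millis of the write
}
```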
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/functions/image-refs.ts`:
- Around line 24-25: Currently the code deletes KV.imageRefs first and
suppresses errors on deleting KV.imageEmbeddings, which can leave orphan
searchable embeddings; change the order to delete KV.imageEmbeddings before
KV.imageRefs and remove the .catch(() => {}) suppression so errors from
kv.delete(KV.imageEmbeddings, filePath) are allowed to surface (i.e., await
kv.delete(KV.imageEmbeddings, filePath) followed by await
kv.delete(KV.imageRefs, filePath)), ensuring failures prevent ref deletion and
avoiding orphan embeddings.
In `@src/functions/vision-search.ts`:
- Around line 35-53: The vision-embed path currently writes a StoredEmbedding to
KV.imageEmbeddings but doesn't record an audit entry; after the successful
kv.set in the try block (the block that constructs StoredEmbedding and calls
kv.set(KV.imageEmbeddings, data.imageRef, stored)), call recordAudit(...) to
emit an audit record for operation "mem::vision-embed" including relevant
metadata (at minimum imageRef, sessionId, observationId, modelName or provider
name, dimensions and updatedAt) so operators can trace the write; ensure you
import/use the existing recordAudit function and keep the audit call inside the
success path before returning the success object.
In `@src/mcp/tools-registry.ts`:
- Around line 129-143: Add the new tool "memory_vision_search" to the remaining
MCP surfaces: add test coverage in mcp-standalone.test.ts to exercise the
memory_vision_search tool (use the tool name "memory_vision_search" and its
inputSchema fields like queryText, queryImageBase64, queryImageRef, topK,
sessionId), update README.md to document AGENTMEMORY_IMAGE_EMBEDDINGS and usage
examples for memory_vision_search, and add the corresponding plugin metadata
entry in plugin/.claude-plugin/plugin.json so the plugin includes
memory_vision_search in its tools list and metadata.
In `@src/providers/embedding/clip.ts`:
- Around line 85-104: The heuristic in loadImage (function loadImage, returning
Promise<RawImageInstance>, using t.RawImage.fromBlob and readFile) incorrectly
tries to detect raw base64 by testing src.slice(0,64) and !src.includes("/"),
which rejects valid base64 payloads and misclassifies paths; remove that
heuristic and only treat inputs starting with "data:" as in-memory base64 blobs
(handled by the existing data: branch) and treat all other src values as
filesystem paths passed to readFile; if raw base64 support is required later,
add an explicit option or a clear "base64:" prefix parameter rather than
guessing the format.
In `@src/triggers/api.ts`:
- Around line 1282-1343: The api::vision-search and api::vision-embed handlers
currently forward req.body verbatim and only truthiness-check fields; instead,
whitelist and validate expected fields (e.g., for api::vision-search accept only
queryText:string, queryImageRef:string, queryImageBase64:string,
sessionId:string, topK:number and for api::vision-embed accept only
imageRef:string, sessionId:string, observationId:string), perform typeof checks,
clamp/normalize topK to an integer within acceptable bounds, build a new payload
object containing only those validated fields, and pass that payload to
sdk.trigger({ function_id: "mem::vision-search" }) / sdk.trigger({ function_id:
"mem::vision-embed" }) so downstream functions receive only well-typed,
whitelisted data.
---
Outside diff comments:
In `@README.md`:
- Around line 813-847: Add a new documented env var entry for
AGENTMEMORY_IMAGE_EMBEDDINGS in the configuration block under the "Features"
section so users can opt into image embedding support for the MCP tool; declare
it default false (e.g., AGENTMEMORY_IMAGE_EMBEDDINGS=false) and include a short
comment explaining that setting it to true enables image embeddings required by
the new MCP tool and may affect storage/token usage, similar to the other
AGENTMEMORY_* flags (reference the existing AGENTMEMORY_AUTO_COMPRESS and
AGENTMEMORY_INJECT_CONTEXT entries to match style).
- Around line 337-366: README and some docs still reference inconsistent counts
and omit memory_vision_search; update all mentions of tool counts and the core
tool list so they match the current MCP registry. Edit README.md to replace any
"43", "44", or "45" occurrences with the authoritative current count, add
memory_vision_search to the core tools table, and ensure headings like "### 44
Tools" reflect the same number; then synchronize the code-side registries by
confirming tools-registry.ts, server.ts, triggers/api.ts, index.ts,
test/mcp-standalone.test.ts, and plugin/.claude-plugin/plugin.json include the
same set (including memory_vision_search) so all counts and lists match across
README.md and those files.
In `@src/functions/observe.ts`:
- Around line 136-155: The mem::vision-embed trigger is being fired before the
observation is persisted, which can create embeddings for observations that fail
to save; change the ordering so that after saving image to disk
(saveImageToDisk) and incrementing refs (incrementImageRef) you call
kv.set(KV.observations(payload.sessionId), obsId, raw) first and only when that
Promise resolves successfully then call sdk.triggerVoid("mem::vision-embed", {
imageRef: filePath, sessionId: payload.sessionId, observationId: obsId }); keep
the disk-size-delta trigger where it is (it can run before or after persistence)
but ensure the vision embedding enqueue happens strictly after kv.set resolves
to avoid embeddings for unpersisted observations.
---
Nitpick comments:
In `@src/state/schema.ts`:
- Line 45: Add an exported typed record that matches the new KV scope
KV.imageEmbeddings by creating an interface (e.g., ImageEmbeddingRecord) in
src/types.ts that describes the stored embedding shape: include a numeric vector
(number[]), required metadata fields imageRef, sessionId, observationId
(strings), and any optional fields you need such as timestamp or extra metadata;
export the interface and ensure the KV mapping for KV.imageEmbeddings uses this
type so reads/writes are strongly typed (reference KV.imageEmbeddings and the
new ImageEmbeddingRecord interface when updating usages).
In `@test/mcp-standalone.test.ts`:
- Around line 49-50: The test only asserts CORE_TOOLS.length; update the spec to
also assert that the new tool registration is present by name/ID so it doesn't
pass due to any unrelated 12th item. In the test around CORE_TOOLS (the it block
that reads CORE_TOOLS has 12 items) add an assertion that CORE_TOOLS contains
the memory_vision_search tool (e.g., check CORE_TOOLS.some(t => t.name ===
'memory_vision_search' or t.id === 'memory_vision_search') or expect(find(...)
toBeDefined), keeping the length assertion but explicitly verifying the
memory_vision_search entry exists.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f40a010a-a1ea-4df6-8cff-61067a4bfd74
📒 Files selected for processing (15)
README.md, src/functions/image-refs.ts, src/functions/observe.ts, src/functions/vision-search.ts, src/index.ts, src/mcp/server.ts, src/mcp/tools-registry.ts, src/providers/embedding/clip.ts, src/providers/embedding/index.ts, src/providers/index.ts, src/state/schema.ts, src/triggers/api.ts, src/types.ts, test/mcp-standalone.test.ts, test/vision-search.test.ts
```ts
{
  name: "memory_vision_search",
  description:
    "Cross-modal image search via CLIP embeddings. Pass queryText to find screenshots matching a description, or queryImageBase64/queryImageRef to find similar images. Requires AGENTMEMORY_IMAGE_EMBEDDINGS=true.",
  inputSchema: {
    type: "object",
    properties: {
      queryText: { type: "string", description: "Text query (e.g. 'login form with error banner')" },
      queryImageRef: { type: "string", description: "Absolute path to a stored image to match against" },
      queryImageBase64: { type: "string", description: "Raw base64 image bytes or data URL" },
      topK: { type: "number", description: "Max results (default 10, max 50)" },
      sessionId: { type: "string", description: "Filter to a single session" },
    },
  },
},
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
targets=(
  "src/mcp/tools-registry.ts"
  "src/mcp/server.ts"
  "src/triggers/api.ts"
  "src/index.ts"
  "test/mcp-standalone.test.ts"
  "README.md"
  "plugin/.claude-plugin/plugin.json"
)
for target in "${targets[@]}"; do
  if [[ ! -e "$target" ]]; then
    printf '\nMISSING: %s\n' "$target"
    continue
  fi
  printf '\n== %s ==\n' "$target"
  rg -n 'memory_vision_search|mem::vision-search|vision-search|AGENTMEMORY_IMAGE_EMBEDDINGS' "$target" || true
done
```

Repository: rohitg00/agentmemory
Length of output: 1128
Update missing MCP surfaces for memory_vision_search.
The tool is registered in tools-registry.ts, server.ts, triggers/api.ts, and index.ts, but is missing from:
- test/mcp-standalone.test.ts — add test coverage
- README.md — add AGENTMEMORY_IMAGE_EMBEDDINGS documentation
- plugin/.claude-plugin/plugin.json — add plugin metadata
Per the MCP integration pattern, all seven surfaces must be updated when adding tools.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/mcp/tools-registry.ts` around lines 129 - 143, Add the new tool
"memory_vision_search" to the remaining MCP surfaces: add test coverage in
mcp-standalone.test.ts to exercise the memory_vision_search tool (use the tool
name "memory_vision_search" and its inputSchema fields like queryText,
queryImageBase64, queryImageRef, topK, sessionId), update README.md to document
AGENTMEMORY_IMAGE_EMBEDDINGS and usage examples for memory_vision_search, and
add the corresponding plugin metadata entry in plugin/.claude-plugin/plugin.json
so the plugin includes memory_vision_search in its tools list and metadata.
- src/functions/image-refs.ts: delete imageEmbeddings BEFORE imageRefs
and drop the silent .catch() so transient KV errors surface. Stops
orphan searchable embeddings when the ref delete fails halfway.
- src/functions/vision-search.ts: record an audit row for every
successful mem::vision-embed write (new "vision_embed" op). State-changing
functions must leave an audit trail per the repo policy.
- src/functions/vision-search.ts: mem::vision-search now wraps raw
queryImageBase64 into a data: URL before handing to embedImage, so
the provider contract stays "data: URL or filesystem path".
- src/providers/embedding/clip.ts: drop the raw-base64 heuristic in
loadImage. The old regex whitelisted "/" in the char class but the
!src.includes("/") guard rejected any realistic base64 blob,
falling through to readFile and throwing ENOENT. Now: data: URL or
filesystem path only; callers with raw base64 pass a data: URL.
Also drop unused RawImage.read type and collapse the redundant
Float32Array copy in embedImage.
- src/triggers/api.ts: both handlers now whitelist fields with
asNonEmptyString and parseOptionalPositiveInt before constructing
the payload. Matches api::observe / api::search style, stops
forwarding the raw req.body, clamps topK at the boundary.
- src/types.ts: add "vision_embed" to AuditEntry.operation union and
rename EmbeddingProvider.embedImage's param to src (matches the
implementation now that raw base64 isn't accepted).
- README.md, plugin/.claude-plugin/plugin.json: document
AGENTMEMORY_IMAGE_EMBEDDINGS and bump tool count string.
Full suite still 794/794.
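The boundary validation described for src/triggers/api.ts can be sketched as below. The helper names `asNonEmptyString` and `parseOptionalPositiveInt` come from the commit notes; their bodies and the payload-builder function are assumptions:

```typescript
// Sketch: whitelist and type-check request fields instead of forwarding
// req.body verbatim. Helper bodies are illustrative, not the repo's code.
function asNonEmptyString(v: unknown): string | undefined {
  return typeof v === "string" && v.length > 0 ? v : undefined;
}

function parseOptionalPositiveInt(v: unknown, max: number): number | undefined {
  const n = Number(v);
  if (!Number.isFinite(n) || n < 1) return undefined;
  return Math.min(max, Math.floor(n));
}

// Build a fresh payload containing only validated fields; unknown keys
// (and malformed values) never reach the downstream function.
function buildVisionSearchPayload(body: Record<string, unknown>) {
  return {
    queryText: asNonEmptyString(body["queryText"]),
    queryImageRef: asNonEmptyString(body["queryImageRef"]),
    queryImageBase64: asNonEmptyString(body["queryImageBase64"]),
    sessionId: asNonEmptyString(body["sessionId"]),
    topK: parseOptionalPositiveInt(body["topK"], 50),
  };
}
```

Constructing a new object, rather than deleting bad keys from the incoming body, guarantees nothing unexpected is forwarded even if new fields appear in requests later.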
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
337-366: ⚠️ Potential issue | 🟡 Minor — Make the MCP tool counts and list consistent.

The README now advertises 45 tools, but nearby sections still say 43/44 tools and the extended list omits `memory_vision_search`. Update the headings/prompts and add the new tool row so users can discover it. Based on learnings: When adding or removing MCP tools, update ALL of: tools-registry.ts, server.ts, triggers/api.ts, index.ts, test/mcp-standalone.test.ts, README.md, and plugin/.claude-plugin/plugin.json.

📝 Proposed README fixes:

```diff
-Install agentmemory for OpenClaw. Run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server on localhost:3111. Then add this to my OpenClaw MCP config so agentmemory is available with all 43 memory tools:
+Install agentmemory for OpenClaw. Run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server on localhost:3111. Then add this to my OpenClaw MCP config so agentmemory is available with all 45 memory tools:

-Install agentmemory for Hermes. Run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server on localhost:3111. Then add this to ~/.hermes/config.yaml so Hermes can use agentmemory as an MCP server with all 43 memory tools:
+Install agentmemory for Hermes. Run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server on localhost:3111. Then add this to ~/.hermes/config.yaml so Hermes can use agentmemory as an MCP server with all 45 memory tools:

-### 44 Tools
+### 45 Tools

-<summary>Extended tools (44 total — set AGENTMEMORY_TOOLS=all)</summary>
+<summary>Extended tools (45 total — set AGENTMEMORY_TOOLS=all)</summary>
```

Also add a row for:

| `memory_vision_search` | Cross-modal screenshot search via CLIP embeddings |

Also applies to: 596-620
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 337 - 366, Update the README and related registries to make the MCP tool count and list consistent: change all advertised tool counts to 45 and add the missing memory_vision_search tool entry to the tools list in README.md; then propagate the addition/removal to the canonical code locations — tools-registry.ts, server.ts, triggers/api.ts, index.ts, test/mcp-standalone.test.ts, and plugin/.claude-plugin/plugin.json — ensuring each file registers or tests the new memory_vision_search symbol (and any updated tool-count constants) so documentation and code remain in sync.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/functions/vision-search.ts`:
- Line 75: The topK calculation can become NaN or fractional (since Math.max(1,
NaN) yields NaN), so normalize data?.topK to a finite integer before clamping:
coerce the incoming value (data?.topK) to a Number, guard with Number.isFinite,
fall back to a default (e.g., 10) if invalid, then use Math.floor to remove
fractions and finally clamp with Math.min(50, Math.max(1, normalizedTopK));
update the expression that sets topK in vision-search.ts to use this
normalizedTopK variable.
- Around line 33-37: The code currently calls imageProvider.embedImage(...) with
raw data.imageRef and data.queryImageRef, which allows arbitrary filesystem
reads; before calling embedImage, verify that the provided ref exists in the
managed store by checking KV.imageRefs.get(imageRef) (and likewise
KV.imageRefs.get(queryImageRef)) and return an error if not found or invalid;
update the validation around the embedImage call in the vision-search handler
(where embedImage is invoked) to perform this KV.imageRefs lookup for both
data.imageRef and data.queryImageRef and only call imageProvider.embedImage when
the KV check succeeds.
---
Outside diff comments:
In `@README.md`:
- Around line 337-366: Update the README and related registries to make the MCP
tool count and list consistent: change all advertised tool counts to 45 and add
the missing memory_vision_search tool entry to the tools list in README.md; then
propagate the addition/removal to the canonical code locations —
tools-registry.ts, server.ts, triggers/api.ts, index.ts,
test/mcp-standalone.test.ts, and plugin/.claude-plugin/plugin.json — ensuring
each file registers or tests the new memory_vision_search symbol (and any
updated tool-count constants) so documentation and code remain in sync.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5b217ef9-9e54-4279-a256-f46ec9b6d3a5
📒 Files selected for processing (7)
README.md, plugin/.claude-plugin/plugin.json, src/functions/image-refs.ts, src/functions/vision-search.ts, src/providers/embedding/clip.ts, src/triggers/api.ts, src/types.ts
✅ Files skipped from review due to trivial changes (1)
- plugin/.claude-plugin/plugin.json
🚧 Files skipped from review as they are similar to previous changes (3)
- src/functions/image-refs.ts
- src/types.ts
- src/triggers/api.ts
CodeRabbit major finding on mem::vision-embed: an authenticated caller
could POST { "imageRef": "/etc/passwd" } and the CLIP provider would
happily readFile that path and store its vector. Same vector on the
queryImageRef path of mem::vision-search. Arbitrary filesystem probing
via a REST endpoint.
- src/functions/vision-search.ts: both handlers now require the
incoming ref to satisfy isManagedImagePath() and to exist in
KV.imageRefs. Either check failing short-circuits with a specific
error. Only paths that went through the managed image store (and
still have at least one live reference) can be embedded or used as
a query target.
- src/functions/vision-search.ts: topK is now normalised before
clamping. `Math.max(1, NaN)` is NaN, so `slice(0, NaN)` returned
nothing for malformed input. Coerce to finite integer, default 10,
then clamp to [1, 50].
- test/vision-search.test.ts: imageRefs use the managed path, seed
KV.imageRefs before each embed, and add coverage for
managed-path rejection, unregistered-ref rejection, queryImageRef
rejection, and NaN/fractional topK clamp. 10/10 pass locally, 798
in the full suite.
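The topK normalisation described in the notes above, coerce to a finite integer with a default of 10 and then clamp to [1, 50], can be sketched as a single helper. This is illustrative, not the repo's code:

```typescript
// Math.max(1, NaN) is NaN, and slice(0, NaN) returns [], so topK must be
// normalised to a finite integer before clamping. Bounds per the notes above.
function normalizeTopK(raw: unknown, fallback = 10): number {
  const n = Number(raw);
  const base = Number.isFinite(n) ? Math.floor(n) : fallback;
  return Math.min(50, Math.max(1, base));
}
```

The failure mode being fixed is easy to reproduce: `[1, 2, 3].slice(0, Math.max(1, NaN))` evaluates to an empty array, which is why malformed input silently returned no results.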
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/functions/vision-search.ts`:
- Around line 93-97: The inline base64 image path (data.queryImageBase64)
bypasses image-store limits and is passed directly to imageProvider.embedImage;
fix by validating and bounding the decoded byte length before constructing the
data URL and calling embedImage: decode the base64 string to bytes, enforce a
configurable maximum (e.g., 1MB or project limit) and throw a validation error
if exceeded, only then build the `data:image/...;base64,` string and call
`imageProvider.embedImage(b64)`; also ensure this input validation runs as part
of the function registration/validation (sdk.registerFunction) for the
vision-search handler and record the check outcome via recordAudit() so
oversized attempts are rejected and audited.
- Around line 40-42: The refCount read via kv.get<number>(KV.imageRefs,
data.imageRef) can be a non-numeric/truthy value; coerce it immediately to a
Number (e.g., const count = Number(refCount)) and validate that
Number.isFinite(count) && count > 0 before accepting the imageRef, returning the
error when that check fails; update the same pattern used later (the other
occurrence around the 102-104 region) to perform the single coercion and the
finite positive check so malformed mem:image-refs rows are rejected consistently
(refer to kv.get, KV.imageRefs and the refCount variable for locating code).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 35e3ae9f-42b3-4cc5-aeef-64315a063ad4
📒 Files selected for processing (2)
- src/functions/vision-search.ts
- test/vision-search.test.ts
✅ Files skipped from review due to trivial changes (1)
- test/vision-search.test.ts
const refCount = await kv.get<number>(KV.imageRefs, data.imageRef);
if (!refCount || Number(refCount) < 1) {
  return { success: false, error: "imageRef not registered in mem:image-refs" };
Reject malformed imageRefs rows instead of treating them as registered.
kv.get<number>() does not enforce the runtime value. A truthy non-numeric value can make Number(refCount) < 1 evaluate false, allowing corrupted mem:image-refs rows to pass validation. Coerce once and require a finite positive count in both paths. As per coding guidelines: "src/{functions,triggers}/**/*.ts: Register functions using sdk.registerFunction() with validation of inputs, work via kv.get/kv.set/kv.list, and record audit operations via recordAudit()."
🛡️ Proposed validation hardening
- const refCount = await kv.get<number>(KV.imageRefs, data.imageRef);
- if (!refCount || Number(refCount) < 1) {
+ const refCount = Number(await kv.get<number>(KV.imageRefs, data.imageRef));
+ if (!Number.isFinite(refCount) || refCount < 1) {
return { success: false, error: "imageRef not registered in mem:image-refs" };
}
...
- const refCount = await kv.get<number>(KV.imageRefs, data.queryImageRef);
- if (!refCount || Number(refCount) < 1) {
+ const refCount = Number(await kv.get<number>(KV.imageRefs, data.queryImageRef));
+ if (!Number.isFinite(refCount) || refCount < 1) {
return { success: false, error: "queryImageRef not registered in mem:image-refs" };
}

Also applies to: 102-104
} else if (data?.queryImageBase64) {
  const b64 = data.queryImageBase64.startsWith("data:")
    ? data.queryImageBase64
    : `data:image/png;base64,${data.queryImageBase64}`;
  queryVec = await imageProvider.embedImage(b64);
Bound inline image queries before decoding them.
queryImageBase64 bypasses the managed image-store limits and is passed straight into embedImage(). Add a decoded-byte cap before constructing the data URL so an authenticated caller cannot force unbounded image decoding/model work. As per coding guidelines: "src/{functions,triggers}/**/*.ts: Register functions using sdk.registerFunction() with validation of inputs, work via kv.get/kv.set/kv.list, and record audit operations via recordAudit()."
🛡️ Proposed guard
} else if (data?.queryImageBase64) {
+ const inlineBase64 = data.queryImageBase64.startsWith("data:")
+ ? data.queryImageBase64.split(",", 2)[1]
+ : data.queryImageBase64;
+ if (!inlineBase64) {
+ return { success: false, error: "queryImageBase64 invalid" };
+ }
+ const estimatedBytes =
+ Math.floor((inlineBase64.length * 3) / 4) -
+ (inlineBase64.endsWith("==") ? 2 : inlineBase64.endsWith("=") ? 1 : 0);
+ if (estimatedBytes > 10 * 1024 * 1024) {
+ return { success: false, error: "queryImageBase64 too large" };
+ }
const b64 = data.queryImageBase64.startsWith("data:")
? data.queryImageBase64
: `data:image/png;base64,${data.queryImageBase64}`;
…bal scope
Introduces memory slots as a new first-class memory primitive, orthogonal
to observations / memories / graph. A slot is a labelled, size-limited,
agent-editable chunk of context. 8 defaults seed on first run
(persona, user_preferences, tool_guidelines, project_context, guidance,
pending_items, session_patterns, self_notes). Agent reads/writes via
MCP tools; pinned slots serialise cleanly into SessionStart context.
Functions (`src/functions/slots.ts`):
- mem::slot-list
- mem::slot-get
- mem::slot-create
- mem::slot-append (refuses writes over sizeLimit)
- mem::slot-replace (refuses content over sizeLimit)
- mem::slot-delete
- mem::slot-reflect (used by idle reflection)
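The sizeLimit refusal on append can be sketched as a pre-write length check. A minimal sketch under assumptions: the Slot shape and function name here are illustrative, not the project's actual types.

```typescript
interface Slot {
  label: string;
  content: string;
  sizeLimit: number; // max characters the slot may hold
}

// Refuse the write when the combined content would exceed the limit;
// mem::slot-replace applies the same check to the replacement content alone.
function appendToSlot(slot: Slot, text: string): { success: boolean; error?: string } {
  const next = slot.content + text;
  if (next.length > slot.sizeLimit) {
    return { success: false, error: `slot "${slot.label}" would exceed ${slot.sizeLimit} chars` };
  }
  slot.content = next;
  return { success: true };
}
```

Checking before mutating means a rejected append leaves the slot untouched rather than truncated.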
REST (`src/triggers/api.ts`):
- GET /agentmemory/slots
- GET /agentmemory/slot?label=...
- POST /agentmemory/slot
- POST /agentmemory/slot/append
- POST /agentmemory/slot/replace
- DELETE /agentmemory/slot?label=...
- POST /agentmemory/slot/reflect
MCP tools (`src/mcp/tools-registry.ts`, `src/mcp/server.ts`):
- memory_slot_list
- memory_slot_get
- memory_slot_create
- memory_slot_append
- memory_slot_replace
- memory_slot_delete
event::session::stopped fires mem::slot-reflect via triggerVoid when
AGENTMEMORY_REFLECT=true. The reflector reads the last N observations,
extracts TODO-flavoured titles into pending_items, counts error /
command types into session_patterns, and records touched files in
project_context. Fire-and-forget: does not block summarise.
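The reflector's three signals can be sketched as a pure function over the observation list; the Observation shape is an assumption for illustration.

```typescript
interface Observation {
  title: string;
  type: string; // e.g. "command", "error"
  file?: string;
}

// Extract TODO-flavoured titles, count observation types, and collect
// the distinct files touched — the three slots the reflector updates.
function reflect(observations: Observation[]) {
  const pendingItems = observations
    .filter((o) => /\btodo\b/i.test(o.title))
    .map((o) => o.title);
  const sessionPatterns: Record<string, number> = {};
  for (const o of observations) {
    sessionPatterns[o.type] = (sessionPatterns[o.type] ?? 0) + 1;
  }
  const projectFiles = [...new Set(observations.flatMap((o) => (o.file ? [o.file] : [])))];
  return { pendingItems, sessionPatterns, projectFiles };
}
```

Because it is a pure read over recent observations, a failed reflection simply writes nothing — consistent with the fire-and-forget contract.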
Slots carry a scope field ("project" | "global"). Global slots live at
KV.globalSlots ("mem:slots:global") — one brain across all projects for
things like persona / user_preferences / tool_guidelines. Project-
scoped slot with the same label shadows the global one on read.
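Shadowing on read reduces to a null-coalescing lookup; in this sketch, Maps stand in for the two KV scopes.

```typescript
type Scope = "project" | "global";
interface ScopedSlot { label: string; scope: Scope; content: string; }

// A project-scoped slot with the same label wins; otherwise fall
// through to the global scope (mem:slots:global).
function readSlot(
  projectSlots: Map<string, ScopedSlot>,
  globalSlots: Map<string, ScopedSlot>,
  label: string,
): ScopedSlot | undefined {
  return projectSlots.get(label) ?? globalSlots.get(label);
}
```

Writes are not shadowed — creating a project slot never mutates the global row, so deleting the project copy restores the global behaviour.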
- AGENTMEMORY_SLOTS=true — enable the whole primitive
- AGENTMEMORY_REFLECT=true — enable Stop-hook reflection (requires SLOTS)
Both default OFF. Matches the opt-in convention of auto-compress (#138),
inject-context (#143), vision embeddings (#179).
44 MCP tools → 50 across README (stats badge, install instructions,
config block) and plugin/.claude-plugin/plugin.json description. New
AGENTMEMORY_SLOTS and AGENTMEMORY_REFLECT entries in the config block.
test/slots.test.ts — 12 new cases: default seeding by scope, label
validation, create/get/append/replace/delete round-trip, duplicate-
create rejection, sizeLimit enforcement on append + replace, project
shadows global, listPinnedSlots + renderPinnedContext, reflect no-op
on empty session, reflect updates pending_items / session_patterns /
project_context from TODO + error + file signals.
Full suite: 800/800 pass (+12 new). Build clean.
Summary
Phase 1 of the multimodal plan. Builds on PR #111 (image store + ref counting + VLM description). Adds the missing piece: a direct visual embedding space so agents can search screenshots by natural language or by similar image, without routing through VLM-describe → BM25-text.
Depends on #111. This PR is stacked on top of fix/multimodal-conflicts. Rebase cleanly once #111 merges.

Motivation

…vision_capture invocation. Full design doc lives locally at ~/specs-backup/agentmemory-multimodal-plan.md (per our no-SPEC-in-PR convention).

What ships
New
- ClipEmbeddingProvider — CLIP ViT-B/32 (Xenova/clip-vit-base-patch32) via @xenova/transformers (already a dep). 512-dim normalized vectors. Handles base64, data URL, and file paths. Lazy model download on first call.
- createImageEmbeddingProvider() — gated on AGENTMEMORY_IMAGE_EMBEDDINGS=true. Default OFF, same opt-in pattern as AGENTMEMORY_AUTO_COMPRESS (#138) and AGENTMEMORY_INJECT_CONTEXT (#143).
- KV.imageEmbeddings scope keyed by imageRef path.
- mem::vision-embed — stores a CLIP vector for a given imageRef.
- mem::vision-search — cosine similarity over stored vectors. Accepts text query (text→image cross-modal), image path (image→image), or base64. topK clamped to 50. Optional sessionId filter.
- POST /agentmemory/vision-embed, POST /agentmemory/vision-search — auth-gated HTTP triggers, 503 when the flag is off.
- memory_vision_search — exposed via the running-server handler in src/mcp/server.ts (standalone MCP unchanged by design; this is server-backed).
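The scoring in mem::vision-search is plain cosine similarity over the stored vectors; a sketch (CLIP vectors are already L2-normalised, so this reduces to a dot product, but the full form is shown):

```typescript
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // guard the zero-vector edge case
}

// Ranking: score every stored embedding, sort descending, take topK.
function rank(
  query: Float32Array,
  stored: { ref: string; vec: Float32Array }[],
  topK: number,
) {
  return stored
    .map((s) => ({ ref: s.ref, score: cosineSimilarity(query, s.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

This is a linear scan, which is fine at screenshot-library scale; an ANN index would only matter at orders of magnitude more vectors.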
EmbeddingProvider.embedImage?()— optional slot on the interface. Non-breaking.mem::vision-embedviatriggerVoidso embedding runs off the critical path.decrementImageRefnow drops the embedding alongside the disk file + ref row. Forget / evict / retention won't orphan vectors.README / counts
MCP tool count 44 → 45 across README stats, install instructions, config block. CORE_TOOLS test expectation bumped accordingly.

Design choices
- memory_vision_search server-only

Test plan
- npm run build — clean
- npm test — 794/794 pass (+6 new vision-search tests)
- AGENTMEMORY_IMAGE_EMBEDDINGS=true npx @agentmemory/agentmemory, hook-capture a screenshot, curl -sX POST :3111/agentmemory/vision-search -d '{"queryText":"error banner"}'
- decrementImageRef cleans up the vector when refcount hits 0 (via evict dry-run then real run)
/images/thumb/{sha})mem::smart-search(reciprocal rank fusion)vision_diff, OCR fallback, video keyframesShip Phase 1 first, watch real usage, then pick the next axis.