feat(search): add optional project/cwd filter to mem::search and /agentmemory/search #105
Conversation
Closes rohitg00#104

The `/agentmemory/search` HTTP endpoint and the underlying `mem::search` function previously accepted only `{ query, limit }`, returning hits across all projects on the host. Sessions are already created with `{ project, cwd }` via `session::start`, so this just exposes that scope on search.

- `mem::search` now accepts optional `project` and `cwd`. When set, results are filtered by looking up each hit's owning `Session`.
- To keep the post-filter from starving the result set, the index fetch is over-fetched (`max(limit * 10, 100)`) when filtering, then trimmed to `limit` after the filter.
- Session lookups are cached per call.
- `api::search` forwards the new params; fully backward compatible.
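The over-fetch and per-call session-cache strategy described above can be sketched in isolation. All names here (`SearchHit`, `Session`, `getSession`) are illustrative assumptions for the sketch, not the PR's actual identifiers:

```typescript
// Sketch of the over-fetch + post-filter strategy described above.
// SearchHit, Session, and getSession are illustrative stand-ins,
// not the PR's actual identifiers.
type SearchHit = { sessionId: string; obsId: string; score: number };
type Session = { project?: string; cwd?: string };

// Over-fetch when a post-filter may discard hits, so the trimmed
// result set is less likely to come up short of `limit`.
function computeFetchLimit(limit: number, filtering: boolean): number {
  return filtering ? Math.max(limit * 10, 100) : limit;
}

// Filter hits by their owning session, caching lookups per call.
async function filterBySession(
  hits: SearchHit[],
  limit: number,
  getSession: (id: string) => Promise<Session | null>,
  project?: string,
  cwd?: string,
): Promise<SearchHit[]> {
  const cache = new Map<string, Session | null>();
  const kept: SearchHit[] = [];
  for (const h of hits) {
    if (kept.length >= limit) break; // trim back down to `limit`
    if (project !== undefined || cwd !== undefined) {
      if (!cache.has(h.sessionId)) {
        cache.set(h.sessionId, await getSession(h.sessionId));
      }
      const s = cache.get(h.sessionId);
      if (!s) continue;
      if (project !== undefined && s.project !== project) continue;
      if (cwd !== undefined && s.cwd !== cwd) continue;
    }
    kept.push(h);
  }
  return kept;
}
```

The cache matters because hits from the same session repeat: each session is fetched at most once per call, regardless of how many of its observations rank.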
📝 Walkthrough

Search now accepts optional `project`/`cwd` filters; matching hits are post-filtered by their owning `Session`.
Sequence Diagram

```mermaid
sequenceDiagram
    rect rgba(200,200,255,0.5)
    actor Client
    end
    participant API as api::search
    participant Search as mem::search
    participant Index as Search Index
    participant Cache as Session Cache
    participant Store as Session Store
    participant Obs as Observations
    Client->>API: POST /search { query, limit?, project?, cwd? }
    API->>Search: trigger("mem::search", normalizedBody)
    Search->>Index: query with fetchLimit / effectiveLimit
    Index-->>Search: candidate results (with sessionId list)
    alt filters present
        loop per sessionId
            Search->>Cache: loadSession(sessionId)
            alt cache hit
                Cache-->>Search: Session
            else
                Search->>Store: fetch session(sessionId)
                Store-->>Search: Session
                Search->>Cache: cache session
            end
            Search->>Search: apply project/cwd check
        end
    end
    Search->>Obs: load observations for filtered candidates (parallel)
    Obs-->>Search: observation details
    Search-->>API: filtered results
    API-->>Client: 200 { results }
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/triggers/api.ts (1)
123-128: ⚠️ Potential issue | 🟠 Major

Validate and whitelist the `/agentmemory/search` body before forwarding it.

This still passes `req.body` straight through, so malformed filters and arbitrary extra fields reach `mem::search`. Please validate `query`/`limit`/`project`/`cwd` here and forward only the allowed shape.

Suggested hardening:

```diff
 async (
   req: ApiRequest<{ query: string; limit?: number; project?: string; cwd?: string }>,
 ): Promise<Response> => {
   const authErr = checkAuth(req, secret);
   if (authErr) return authErr;
-  const result = await sdk.trigger("mem::search", req.body);
+  const body = (req.body as Record<string, unknown>) || {};
+  if (
+    typeof body.query !== "string" ||
+    !body.query.trim()
+  ) {
+    return { status_code: 400, body: { error: "query is required" } };
+  }
+  if (
+    body.limit !== undefined &&
+    (!Number.isInteger(body.limit) || (body.limit as number) < 0)
+  ) {
+    return {
+      status_code: 400,
+      body: { error: "limit must be a non-negative integer" },
+    };
+  }
+  if (body.project !== undefined && typeof body.project !== "string") {
+    return { status_code: 400, body: { error: "project must be a string" } };
+  }
+  if (body.cwd !== undefined && typeof body.cwd !== "string") {
+    return { status_code: 400, body: { error: "cwd must be a string" } };
+  }
+  const result = await sdk.trigger("mem::search", {
+    query: body.query.trim(),
+    limit: body.limit as number | undefined,
+    project: body.project as string | undefined,
+    cwd: body.cwd as string | undefined,
+  });
   return { status_code: 200, body: result };
 },
```

As per coding guidelines, REST endpoint handlers must validate inputs, whitelist request body fields, and never pass the raw request body directly to `sdk.trigger()`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/triggers/api.ts` around lines 123 - 128: the handler currently forwards req.body directly to sdk.trigger("mem::search") without validation; replace that by validating and whitelisting the allowed fields (query: non-empty string, limit: optional integer within acceptable bounds, project: optional string, cwd: optional string) before calling sdk.trigger. Use the ApiRequest<{ query: string; limit?: number; project?: string; cwd?: string }> shape to parse and coerce/validate types, reject or return a 4xx Response on invalid input (e.g., missing or non-string query, non-integer/out-of-range limit), and construct a new payload object containing only { query, limit, project, cwd } to pass to sdk.trigger("mem::search", payload) instead of forwarding req.body. Ensure you still call checkAuth(req, secret) first and preserve the existing success response shape.

src/functions/search.ts (1)
57-71: ⚠️ Potential issue | 🟠 Major

Normalize and validate `limit` before over-fetching.

`limit: 0` currently expands to 20 results here, and other malformed values can still distort `fetchLimit` or leak through to `idx.search()`. Since `project`/`cwd` are now part of the callable surface, this path should reject or normalize invalid runtime inputs instead of assuming callers always send a sane shape.

As per coding guidelines, TypeScript functions must use the `sdk.registerFunction()` pattern with input validation.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/functions/search.ts` around lines 57 - 71: validate and normalize the incoming "limit" before computing fetchLimit and calling idx.search: switch this handler to the sdk.registerFunction() input-validation pattern (define a schema for { query: string; limit?: number; project?: string; cwd?: string }) that enforces limit is an integer ≥ 1 (and optionally a sensible max like 100) and provides a default of 20; use the validated value (avoid shadowing the variable name) when computing filtering and fetchLimit, and reject/return an error for malformed inputs instead of allowing 0 or non-integer values to influence fetchLimit or be passed to idx.search() (refer to the async handler signature, sdk.registerFunction usage, getSearchIndex(), rebuildIndex(), and idx.search()).
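The normalization the review asks for can be sketched as a small standalone helper. The `DEFAULT_LIMIT` and `MAX_LIMIT` values are assumptions taken from the suggestion above, not the repo's actual constants:

```typescript
// Illustrative `limit` normalization along the lines the review suggests.
// DEFAULT_LIMIT and MAX_LIMIT are assumptions, not the repo's constants.
const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 100;

function normalizeLimit(limit: unknown): number {
  if (limit === undefined) return DEFAULT_LIMIT;
  if (typeof limit !== "number" || !Number.isInteger(limit) || limit < 1) {
    // Reject instead of silently coercing, so `limit: 0` can't
    // expand into a default-sized result set.
    throw new Error("limit must be a positive integer");
  }
  return Math.min(limit, MAX_LIMIT);
}
```

With this shape, only a validated integer ever reaches the `fetchLimit` computation and `idx.search()`.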
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/functions/search.ts`:
- Around line 97-102: The info log in ctx.logger.info currently emits the raw
data.cwd which may leak host paths; change the log to avoid emitting the
absolute cwd by replacing data.cwd with a redacted indicator or a boolean flag
(e.g., hasCwdFilter = Boolean(data.cwd)) and, if desired, include a
sanitized/half-masked cwd only when safe. Update the call sites around
ctx.logger.info in search.ts so the logged payload uses hasCwdFilter (and/or a
sanitizedCwd variable) instead of raw data.cwd, keeping other fields (query,
results, project) unchanged.
---
Outside diff comments:
In `@src/functions/search.ts`:
- Around line 57-71: Validate and normalize the incoming "limit" before
computing fetchLimit and calling idx.search: switch this handler to the
sdk.registerFunction() input-validation pattern (define a schema for { query:
string; limit?: number; project?: string; cwd?: string }) that enforces limit is
an integer ≥ 1 (and optionally a sensible max like 100) and provides a default
of 20; use the validated value (avoid shadowing the variable name) when
computing filtering and fetchLimit, and reject/return an error for malformed
inputs instead of allowing 0 or non-integer values to influence fetchLimit or be
passed to idx.search() (refer to the async handler signature,
sdk.registerFunction usage, getSearchIndex(), rebuildIndex(), and idx.search()).
In `@src/triggers/api.ts`:
- Around line 123-128: The handler currently forwards req.body directly to
sdk.trigger("mem::search") without validation; replace that by validating and
whitelisting the allowed fields (query: non-empty string, limit: optional
integer within acceptable bounds, project: optional string, cwd: optional
string) before calling sdk.trigger. Use the ApiRequest<{ query: string; limit?:
number; project?: string; cwd?: string }> shape to parse and coerce/validate
types, reject or return a 4xx Response on invalid input (e.g., missing or
non-string query, non-integer/out-of-range limit), and construct a new payload
object containing only { query, limit, project, cwd } to pass to
sdk.trigger("mem::search", payload) instead of forwarding req.body. Ensure you
still call checkAuth(req, secret) first and preserve the existing success
response shape.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: de0cd3d8-dac5-402a-a4bc-1e108ffc0092
📒 Files selected for processing (2)
src/functions/search.ts, src/triggers/api.ts
Address CodeRabbit review on rohitg00#105:

- mem::search now validates query (non-empty string), limit (positive integer, capped at MAX_LIMIT=100), project and cwd (must be strings if provided). Throws on invalid input instead of silently coercing.
- Renamed local 'limit' to 'effectiveLimit' to avoid shadowing.
- ctx.logger no longer emits raw cwd/project (host paths) — logs only hasProjectFilter/hasCwdFilter booleans.
- api::search whitelists request body fields, returns 400 on malformed input, and constructs an explicit payload for sdk.trigger instead of forwarding req.body.
🧹 Nitpick comments (1)
src/functions/search.ts (1)
97-110: Consider parallel observation fetching for improved throughput.

The current sequential `await` in the loop works correctly but could be slower when many results pass the filter. An optional optimization would be to batch-fetch observations in parallel after filtering sessions.

♻️ Optional: Parallel observation loading

```diff
-  const enriched: SearchResult[] = []
-  for (const r of results) {
-    if (enriched.length >= effectiveLimit) break
-    if (filtering) {
-      const s = await loadSession(r.sessionId)
-      if (!s) continue
-      if (projectFilter && s.project !== projectFilter) continue
-      if (cwdFilter && s.cwd !== cwdFilter) continue
-    }
-    const obs = await kv.get<CompressedObservation>(KV.observations(r.sessionId), r.obsId)
-    if (obs) {
-      enriched.push({ observation: obs, score: r.score, sessionId: r.sessionId })
-    }
-  }
+  // First pass: filter by session (sequential due to caching benefits)
+  const candidates: typeof results = []
+  for (const r of results) {
+    if (candidates.length >= effectiveLimit) break
+    if (filtering) {
+      const s = await loadSession(r.sessionId)
+      if (!s) continue
+      if (projectFilter && s.project !== projectFilter) continue
+      if (cwdFilter && s.cwd !== cwdFilter) continue
+    }
+    candidates.push(r)
+  }
+
+  // Second pass: load observations in parallel
+  const obsResults = await Promise.all(
+    candidates.map((r) => kv.get<CompressedObservation>(KV.observations(r.sessionId), r.obsId))
+  )
+  const enriched: SearchResult[] = []
+  for (let i = 0; i < candidates.length; i++) {
+    const obs = obsResults[i]
+    if (obs) {
+      enriched.push({ observation: obs, score: candidates[i].score, sessionId: candidates[i].sessionId })
+    }
+  }
```

As per coding guidelines: "Perform parallel operations where possible using Promise.all for independent kv writes/reads in TypeScript"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/search.ts` around lines 97 - 110, The loop that builds enriched (variable enriched) fetches sessions via loadSession and observations via kv.get sequentially which hurts throughput; after applying the session-level filters (filtering, projectFilter, cwdFilter) over results[], collect up to effectiveLimit candidate pairs (sessionId and obsId), then perform a batched parallel fetch of observations using Promise.all on kv.get(KV.observations(sessionId), obsId) for each candidate, filter out missing observations, and push the resulting { observation, score, sessionId } entries into enriched in the same order (or trim to effectiveLimit). Update the code paths around results, filtering, loadSession, KV.observations, kv.get, and enriched.push to use this two-phase (filter then parallel fetch) approach.
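The two-phase pattern the prompt above describes (filter first, then batch the independent reads) can be demonstrated with a minimal standalone sketch; `kvGet` and the shapes below are illustrative stand-ins for the real kv store, not the project's actual API:

```typescript
// Standalone sketch of the filter-then-parallel-fetch pattern suggested above.
// `kvGet` and these types are illustrative stand-ins, not the project's API.
type Candidate = { sessionId: string; obsId: string; score: number };
type Enriched = { observation: string; score: number; sessionId: string };

async function enrich(
  candidates: Candidate[],
  kvGet: (sessionId: string, obsId: string) => Promise<string | null>,
): Promise<Enriched[]> {
  // The observation reads are independent, so issue them all at once;
  // Promise.all preserves input order in its result array.
  const observations = await Promise.all(
    candidates.map((c) => kvGet(c.sessionId, c.obsId)),
  );
  const enriched: Enriched[] = [];
  for (let i = 0; i < candidates.length; i++) {
    const obs = observations[i];
    if (obs !== null) {
      enriched.push({
        observation: obs,
        score: candidates[i].score,
        sessionId: candidates[i].sessionId,
      });
    }
  }
  return enriched;
}
```

Because `Promise.all` keeps result order aligned with the candidate list, the ranking produced by the first (sequential, cache-friendly) pass is preserved.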
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/functions/search.ts`:
- Around line 97-110: The loop that builds enriched (variable enriched) fetches
sessions via loadSession and observations via kv.get sequentially which hurts
throughput; after applying the session-level filters (filtering, projectFilter,
cwdFilter) over results[], collect up to effectiveLimit candidate pairs
(sessionId and obsId), then perform a batched parallel fetch of observations
using Promise.all on kv.get(KV.observations(sessionId), obsId) for each
candidate, filter out missing observations, and push the resulting {
observation, score, sessionId } entries into enriched in the same order (or trim
to effectiveLimit). Update the code paths around results, filtering,
loadSession, KV.observations, kv.get, and enriched.push to use this two-phase
(filter then parallel fetch) approach.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 11b0ae5f-c2ca-4b6b-bc35-e507d1cb2f19
📒 Files selected for processing (2)
src/functions/search.ts, src/triggers/api.ts
Address CodeRabbit nitpick on rohitg00#105: split the result-building loop into two phases. First filter sequentially (benefits from the per-call session cache), then batch the kv.get(observations) calls through Promise.all instead of awaiting them one at a time.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/functions/search.ts (1)
88-108: Preload unique sessions in parallel instead of serializing `kv.get` calls.

When the hit list spans many sessions, awaiting `loadSession()` inside the ranking loop turns this path into a chain of round-trips. The reads are independent, so prefetching distinct `sessionId`s first will keep the ordering logic intact without paying that latency tax.

♻️ Suggested refactor

```diff
   const sessionCache = new Map<string, Session | null>()
-  const loadSession = async (sessionId: string): Promise<Session | null> => {
-    if (sessionCache.has(sessionId)) return sessionCache.get(sessionId)!
-    const s = await kv.get<Session>(KV.sessions, sessionId)
-    sessionCache.set(sessionId, s ?? null)
-    return s ?? null
-  }
+  if (filtering) {
+    await Promise.all(
+      [...new Set(results.map((r) => r.sessionId))].map(async (sessionId) => {
+        const s = await kv.get<Session>(KV.sessions, sessionId)
+        sessionCache.set(sessionId, s ?? null)
+      })
+    )
+  }

   const candidates: typeof results = []
   for (const r of results) {
     if (candidates.length >= effectiveLimit) break
     if (filtering) {
-      const s = await loadSession(r.sessionId)
+      const s = sessionCache.get(r.sessionId) ?? null
       if (!s) continue
       if (projectFilter && s.project !== projectFilter) continue
       if (cwdFilter && s.cwd !== cwdFilter) continue
     }
     candidates.push(r)
```

As per coding guidelines, "Perform parallel operations where possible using Promise.all for independent kv writes/reads in TypeScript".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/functions/search.ts` around lines 88 - 108, The current loop awaits loadSession(sessionId) for each result, serializing kv.get calls; instead, prefetch unique sessionIds in parallel before the first-pass filtering: collect distinct sessionIds from results (up to effectiveLimit worth of candidates if filtering applies), call Promise.all on kv.get for those ids to populate sessionCache, then run the existing filtering loop using sessionCache without awaiting inside it. Update the helpers referenced (sessionCache, loadSession, kv.get, candidates, results, effectiveLimit) so loadSession reads from sessionCache synchronously after the parallel preload.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/functions/search.ts`:
- Around line 74-75: The projectFilter and cwdFilter are set without trimming so
values like " repo " or strings of only whitespace become ineffective filters;
update the logic where projectFilter and cwdFilter are computed (using
data.project and data.cwd) to trim the input first and then check length—i.e.,
compute const projectVal = typeof data.project === 'string' ?
data.project.trim() : '' and similarly for cwdVal, then set projectFilter =
projectVal.length ? projectVal : undefined and cwdFilter = cwdVal.length ?
cwdVal : undefined so only non-empty, trimmed strings enable filtering.
---
Nitpick comments:
In `@src/functions/search.ts`:
- Around line 88-108: The current loop awaits loadSession(sessionId) for each
result, serializing kv.get calls; instead, prefetch unique sessionIds in
parallel before the first-pass filtering: collect distinct sessionIds from
results (up to effectiveLimit worth of candidates if filtering applies), call
Promise.all on kv.get for those ids to populate sessionCache, then run the
existing filtering loop using sessionCache without awaiting inside it. Update
the helpers referenced (sessionCache, loadSession, kv.get, candidates, results,
effectiveLimit) so loadSession reads from sessionCache synchronously after the
parallel preload.
```typescript
const projectFilter = typeof data.project === 'string' && data.project.length > 0 ? data.project : undefined
const cwdFilter = typeof data.cwd === 'string' && data.cwd.length > 0 ? data.cwd : undefined
```
Trim project/cwd before enabling the filter.
These values are matched verbatim, so " repo " will miss an otherwise matching session, and " " becomes an active filter that can only return zero results. src/triggers/api.ts forwards both fields unchanged, so this is user-visible on the HTTP path too.
💡 Proposed fix

```diff
- const projectFilter = typeof data.project === 'string' && data.project.length > 0 ? data.project : undefined
- const cwdFilter = typeof data.cwd === 'string' && data.cwd.length > 0 ? data.cwd : undefined
+ const projectFilter =
+   typeof data.project === 'string' && data.project.trim().length > 0
+     ? data.project.trim()
+     : undefined
+ const cwdFilter =
+   typeof data.cwd === 'string' && data.cwd.trim().length > 0
+     ? data.cwd.trim()
+     : undefined
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/functions/search.ts` around lines 74 - 75, The projectFilter and
cwdFilter are set without trimming so values like " repo " or strings of only
whitespace become ineffective filters; update the logic where projectFilter and
cwdFilter are computed (using data.project and data.cwd) to trim the input first
and then check length—i.e., compute const projectVal = typeof data.project ===
'string' ? data.project.trim() : '' and similarly for cwdVal, then set
projectFilter = projectVal.length ? projectVal : undefined and cwdFilter =
cwdVal.length ? cwdVal : undefined so only non-empty, trimmed strings enable
filtering.
Closes #104
Summary
- `mem::search` and `/agentmemory/search` now accept optional `project` and `cwd` params
- Results are filtered by matching each hit's owning `Session`

Why

Sessions are tagged with `{ project, cwd }` at `session::start`, but `/agentmemory/search` ignored that and returned hits across all projects on the host. For multi-project setups (e.g. agentmemory as a `UserPromptSubmit` context-injection hook in Claude Code), this contaminates the injected context with unrelated projects. `/agentmemory/context` already has a `project` param — this just brings `/search` in line.

Implementation notes

- Index fetch is over-fetched (`max(limit * 10, 100)`) when filtering is active, then trimmed to `limit`.

Test plan

- `tsc --noEmit` clean for the changed files (pre-existing errors elsewhere on `main` are unrelated)
- `/agentmemory/search` with `{ query, project }` returns only the matching project's observations
- Searches without `project`/`cwd` return identical results to before

Happy to add a unit test for the filter path if you'd like — let me know your preferred shape (the existing tests in `test/enrich.test.ts` mock `mem::search` rather than exercising it directly).

Summary by CodeRabbit