
feat(search): add optional project/cwd filter to mem::search and /agentmemory/search #105

Merged
rohitg00 merged 3 commits into rohitg00:main from Getty:feat/search-project-filter
Apr 9, 2026

Conversation

Getty (Contributor) commented Apr 9, 2026

Closes #104

Summary

  • mem::search and /agentmemory/search now accept optional project and cwd params
  • When set, results are filtered to observations whose owning Session matches
  • Fully backward compatible — both params are optional, existing callers unaffected

Why

Sessions are tagged with { project, cwd } at session::start, but /agentmemory/search ignored that and returned hits across all projects on the host. For multi-project setups (e.g. agentmemory as a UserPromptSubmit context-injection hook in Claude Code), this contaminates the injected context with unrelated projects. /agentmemory/context already has a project param — this just brings /search in line.

Implementation notes

  • Filtering happens after the index hit list is built. To keep the post-filter from starving the result set, the index fetch is over-fetched (max(limit * 10, 100)) when filtering is active, then trimmed to limit.
  • Session lookups are cached per call to avoid repeat KV reads when many hits share a session.
  • No new index — sessions are already a small set, lookup cost is bounded.
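The over-fetch and post-filter strategy described above can be sketched roughly as follows. Names and shapes here are illustrative, not the actual src/functions/search.ts code, which reads sessions from KV and caches them per call:

```typescript
// Hypothetical sketch of the over-fetch + post-filter approach; the real
// implementation fetches sessions from KV with a per-call cache.
type Session = { project?: string; cwd?: string };
type Hit = { sessionId: string; obsId: string; score: number };

function filterHits(
  hits: Hit[],                     // index results, ranked by score
  sessions: Map<string, Session>,  // stands in for the per-call session cache
  limit: number,
  project?: string,
  cwd?: string,
): Hit[] {
  const filtering = Boolean(project || cwd);
  // Over-fetch when filtering so the post-filter doesn't starve the result set.
  const fetchLimit = filtering ? Math.max(limit * 10, 100) : limit;
  const candidates = hits.slice(0, fetchLimit);

  const out: Hit[] = [];
  for (const h of candidates) {
    if (out.length >= limit) break;
    if (filtering) {
      const s = sessions.get(h.sessionId);
      if (!s) continue;                               // unknown session: drop
      if (project && s.project !== project) continue; // project mismatch
      if (cwd && s.cwd !== cwd) continue;             // cwd mismatch
    }
    out.push(h);
  }
  return out; // already trimmed to limit
}
```

When no filter is supplied, `filtering` is false and the function degrades to a plain top-`limit` slice, which is what makes the change backward compatible.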

Test plan

  • tsc --noEmit clean for the changed files (pre-existing errors elsewhere on main are unrelated)
  • Manual: POST /agentmemory/search with { query, project } returns only matching project's observations
  • Manual: omitting project/cwd returns identical results to before
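The two manual checks can also be driven from a small script. The base URL and auth header below are assumptions about the deployment, not details from this PR; only the endpoint path and body shape come from the change itself:

```typescript
// Manual smoke test for /agentmemory/search. BASE and the Authorization
// header are assumptions — adjust to your deployment.
const BASE = "http://localhost:3000";

async function search(body: {
  query: string;
  limit?: number;
  project?: string;
  cwd?: string;
}) {
  const res = await fetch(`${BASE}/agentmemory/search`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.AGENTMEMORY_SECRET}`,
    },
    body: JSON.stringify(body),
  });
  return res.json();
}

// Filtered: only observations from sessions tagged with this project.
// await search({ query: "database migration", project: "my-app", limit: 10 });

// Backward-compatible call: omit project/cwd to get pre-change behavior.
// await search({ query: "database migration" });
```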

Happy to add a unit test for the filter path if you'd like — let me know your preferred shape (the existing tests in test/enrich.test.ts mock mem::search rather than exercising it directly).

Summary by CodeRabbit

  • New Features
    • Added optional project and working-directory filters to search for more precise results.
  • Improvements
    • Enforced input validation and sensible result limits (default/cap) for reliability.
    • Over-fetched and post-filtered results to maintain effective limits when filters apply.
    • Introduced session-aware enrichment with per-session caching for more accurate context.
    • Improved search logging to use trimmed queries and boolean filter flags.

feat(search): add optional project/cwd filter to mem::search and /agentmemory/search

Closes rohitg00#104

The /agentmemory/search HTTP endpoint and the underlying mem::search
function previously accepted only { query, limit }, returning hits
across all projects on the host. Sessions are already created with
{ project, cwd } via session::start, so this just exposes that scope
on search.

- mem::search now accepts optional project and cwd. When set, results
  are filtered by looking up each hit's owning Session.
- To keep the post-filter from starving the result set, the index
  fetch is over-fetched (max(limit*10, 100)) when filtering, then
  trimmed to limit after filter.
- Session lookups are cached per call.
- api::search forwards the new params; fully backward compatible.
coderabbitai Bot commented Apr 9, 2026

📝 Walkthrough

Search now accepts optional project and cwd filters. Inputs are validated and normalized; the index query over-fetches when filters apply; sessions are cached per request for post-filtering; observations are loaded in parallel; and logs/reporting include the trimmed query and applied-filter flags.

Changes

Search implementation — src/functions/search.ts
Extended the input schema to accept optional project and cwd. Validates/normalizes query, limit, project, and cwd. Computes effectiveLimit (default 20, cap 100) and a larger fetchLimit when filters are present. Applies a per-request sessionId → Session cache while post-filtering by session.

API trigger / request validation — src/triggers/api.ts
Expanded request typing to include project?: string and cwd?: string. Adds runtime validation: trimmed non-empty query, positive integer limit (if given), and string checks for project/cwd. Normalizes the payload to { query: trimmed, limit?, project?, cwd? } before calling sdk.trigger("mem::search", ...).

Sequence Diagram

sequenceDiagram
    rect rgba(200,200,255,0.5)
    actor Client
    end
    participant API as api::search
    participant Search as mem::search
    participant Index as Search Index
    participant Cache as Session Cache
    participant Store as Session Store
    participant Obs as Observations

    Client->>API: POST /search { query, limit?, project?, cwd? }
    API->>Search: trigger("mem::search", normalizedBody)

    Search->>Index: query with fetchLimit / effectiveLimit
    Index-->>Search: candidate results (with sessionId list)

    alt filters present
        loop per sessionId
            Search->>Cache: loadSession(sessionId)
            alt cache hit
                Cache-->>Search: Session
            else
                Search->>Store: fetch session(sessionId)
                Store-->>Search: Session
                Search->>Cache: cache session
            end
            Search->>Search: apply project/cwd check
        end
    end

    Search->>Obs: load observations for filtered candidates (parallel)
    Obs-->>Search: observation details
    Search-->>API: filtered results
    API-->>Client: 200 { results }

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped through indexes, sniffed each thread,
Over-fetched a pile, then trimmed my spread.
Cached every session, matched project and cwd,
Returned tidy results for a neat memory bed.
🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check — ✅ Passed: check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately describes the main change: adding optional project/cwd filter parameters to mem::search and /agentmemory/search.
  • Linked Issues Check — ✅ Passed: the PR fully implements the coding requirements from issue #104: optional project/cwd parameters on both mem::search and api::search, server-side filtering with session lookup caching, and an over-fetching strategy to prevent result starvation.
  • Out of Scope Changes Check — ✅ Passed: all changes align with the linked issue; modifications are limited to search.ts and api.ts, with no extraneous changes.


coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/triggers/api.ts (1)

123-128: ⚠️ Potential issue | 🟠 Major

Validate and whitelist /agentmemory/search before forwarding it.

This still passes req.body straight through, so malformed filters and arbitrary extra fields reach mem::search. Please validate query/limit/project/cwd here and forward only the allowed shape.

Suggested hardening
     async (
       req: ApiRequest<{ query: string; limit?: number; project?: string; cwd?: string }>,
     ): Promise<Response> => {
       const authErr = checkAuth(req, secret);
       if (authErr) return authErr;
-      const result = await sdk.trigger("mem::search", req.body);
+      const body = (req.body as Record<string, unknown>) || {};
+      if (
+        typeof body.query !== "string" ||
+        !body.query.trim()
+      ) {
+        return { status_code: 400, body: { error: "query is required" } };
+      }
+      if (
+        body.limit !== undefined &&
+        (!Number.isInteger(body.limit) || (body.limit as number) < 0)
+      ) {
+        return {
+          status_code: 400,
+          body: { error: "limit must be a non-negative integer" },
+        };
+      }
+      if (body.project !== undefined && typeof body.project !== "string") {
+        return { status_code: 400, body: { error: "project must be a string" } };
+      }
+      if (body.cwd !== undefined && typeof body.cwd !== "string") {
+        return { status_code: 400, body: { error: "cwd must be a string" } };
+      }
+      const result = await sdk.trigger("mem::search", {
+        query: body.query.trim(),
+        limit: body.limit as number | undefined,
+        project: body.project as string | undefined,
+        cwd: body.cwd as string | undefined,
+      });
       return { status_code: 200, body: result };
     },

As per coding guidelines, REST endpoint handlers must validate inputs, whitelist request body fields, and never pass raw request body directly to sdk.trigger().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/triggers/api.ts` around lines 123 - 128, The handler currently forwards
req.body directly to sdk.trigger("mem::search") without validation; replace that
by validating and whitelisting the allowed fields (query: non-empty string,
limit: optional integer within acceptable bounds, project: optional string, cwd:
optional string) before calling sdk.trigger. Use the ApiRequest<{ query: string;
limit?: number; project?: string; cwd?: string }> shape to parse and
coerce/validate types, reject or return a 4xx Response on invalid input (e.g.,
missing or non-string query, non-integer/out-of-range limit), and construct a
new payload object containing only { query, limit, project, cwd } to pass to
sdk.trigger("mem::search", payload) instead of forwarding req.body. Ensure you
still call checkAuth(req, secret) first and preserve the existing success
response shape.
src/functions/search.ts (1)

57-71: ⚠️ Potential issue | 🟠 Major

Normalize and validate limit before over-fetching.

limit: 0 currently expands to 20 results here, and other malformed values can still distort fetchLimit or leak through to idx.search(). Since project/cwd are now part of the callable surface, this path should reject or normalize invalid runtime inputs instead of assuming callers always send a sane shape.

As per coding guidelines, TypeScript functions must use sdk.registerFunction() pattern with input validation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/search.ts` around lines 57 - 71, Validate and normalize the
incoming "limit" before computing fetchLimit and calling idx.search: switch this
handler to the sdk.registerFunction() input-validation pattern (define a schema
for { query: string; limit?: number; project?: string; cwd?: string }) that
enforces limit is an integer ≥ 1 (and optionally a sensible max like 100) and
provides a default of 20; use the validated value (avoid shadowing the variable
name) when computing filtering and fetchLimit, and reject/return an error for
malformed inputs instead of allowing 0 or non-integer values to influence
fetchLimit or be passed to idx.search() (refer to the async handler signature,
sdk.registerFunction usage, getSearchIndex(), rebuildIndex(), and idx.search()).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/functions/search.ts`:
- Around line 97-102: The info log in ctx.logger.info currently emits the raw
data.cwd which may leak host paths; change the log to avoid emitting the
absolute cwd by replacing data.cwd with a redacted indicator or a boolean flag
(e.g., hasCwdFilter = Boolean(data.cwd)) and, if desired, include a
sanitized/half-masked cwd only when safe. Update the call sites around
ctx.logger.info in search.ts so the logged payload uses hasCwdFilter (and/or a
sanitizedCwd variable) instead of raw data.cwd, keeping other fields (query,
results, project) unchanged.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: de0cd3d8-dac5-402a-a4bc-1e108ffc0092

📥 Commits

Reviewing files that changed from the base of the PR and between 8c1890e and a732ad1.

📒 Files selected for processing (2)
  • src/functions/search.ts
  • src/triggers/api.ts

Comment thread src/functions/search.ts
Address CodeRabbit review on rohitg00#105:

- mem::search now validates query (non-empty string), limit
  (positive integer, capped at MAX_LIMIT=100), project and cwd
  (must be strings if provided). Throws on invalid input instead
  of silently coercing.
- Renamed local 'limit' to 'effectiveLimit' to avoid shadowing.
- ctx.logger no longer emits raw cwd/project (host paths) — logs
  only hasProjectFilter/hasCwdFilter booleans.
- api::search whitelists request body fields, returns 400 on
  malformed input, and constructs an explicit payload for
  sdk.trigger instead of forwarding req.body.

coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/functions/search.ts (1)

97-110: Consider parallel observation fetching for improved throughput.

The current sequential await in the loop works correctly but could be slower when many results pass the filter. An optional optimization would be to batch-fetch observations in parallel after filtering sessions.

♻️ Optional: Parallel observation loading
-      const enriched: SearchResult[] = []
-      for (const r of results) {
-        if (enriched.length >= effectiveLimit) break
-        if (filtering) {
-          const s = await loadSession(r.sessionId)
-          if (!s) continue
-          if (projectFilter && s.project !== projectFilter) continue
-          if (cwdFilter && s.cwd !== cwdFilter) continue
-        }
-        const obs = await kv.get<CompressedObservation>(KV.observations(r.sessionId), r.obsId)
-        if (obs) {
-          enriched.push({ observation: obs, score: r.score, sessionId: r.sessionId })
-        }
-      }
+      // First pass: filter by session (sequential due to caching benefits)
+      const candidates: typeof results = []
+      for (const r of results) {
+        if (candidates.length >= effectiveLimit) break
+        if (filtering) {
+          const s = await loadSession(r.sessionId)
+          if (!s) continue
+          if (projectFilter && s.project !== projectFilter) continue
+          if (cwdFilter && s.cwd !== cwdFilter) continue
+        }
+        candidates.push(r)
+      }
+
+      // Second pass: load observations in parallel
+      const obsResults = await Promise.all(
+        candidates.map((r) => kv.get<CompressedObservation>(KV.observations(r.sessionId), r.obsId))
+      )
+      const enriched: SearchResult[] = []
+      for (let i = 0; i < candidates.length; i++) {
+        const obs = obsResults[i]
+        if (obs) {
+          enriched.push({ observation: obs, score: candidates[i].score, sessionId: candidates[i].sessionId })
+        }
+      }

As per coding guidelines: "Perform parallel operations where possible using Promise.all for independent kv writes/reads in TypeScript"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/search.ts` around lines 97 - 110, The loop that builds enriched
(variable enriched) fetches sessions via loadSession and observations via kv.get
sequentially which hurts throughput; after applying the session-level filters
(filtering, projectFilter, cwdFilter) over results[], collect up to
effectiveLimit candidate pairs (sessionId and obsId), then perform a batched
parallel fetch of observations using Promise.all on
kv.get(KV.observations(sessionId), obsId) for each candidate, filter out missing
observations, and push the resulting { observation, score, sessionId } entries
into enriched in the same order (or trim to effectiveLimit). Update the code
paths around results, filtering, loadSession, KV.observations, kv.get, and
enriched.push to use this two-phase (filter then parallel fetch) approach.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 11b0ae5f-c2ca-4b6b-bc35-e507d1cb2f19

📥 Commits

Reviewing files that changed from the base of the PR and between a732ad1 and 26fea3a.

📒 Files selected for processing (2)
  • src/functions/search.ts
  • src/triggers/api.ts

Address CodeRabbit nitpick on rohitg00#105: split the result-building loop into
two phases. First filter sequentially (benefits from the per-call
session cache), then batch the kv.get(observations) calls through
Promise.all instead of awaiting them one at a time.

coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/functions/search.ts (1)

88-108: Preload unique sessions in parallel instead of serializing kv.get calls.

When the hit list spans many sessions, awaiting loadSession() inside the ranking loop turns this path into a chain of round-trips. The reads are independent, so prefetching distinct sessionIds first will keep the ordering logic intact without paying that latency tax.

♻️ Suggested refactor
       const sessionCache = new Map<string, Session | null>()
-      const loadSession = async (sessionId: string): Promise<Session | null> => {
-        if (sessionCache.has(sessionId)) return sessionCache.get(sessionId)!
-        const s = await kv.get<Session>(KV.sessions, sessionId)
-        sessionCache.set(sessionId, s ?? null)
-        return s ?? null
-      }
+      if (filtering) {
+        await Promise.all(
+          [...new Set(results.map((r) => r.sessionId))].map(async (sessionId) => {
+            const s = await kv.get<Session>(KV.sessions, sessionId)
+            sessionCache.set(sessionId, s ?? null)
+          })
+        )
+      }

       const candidates: typeof results = []
       for (const r of results) {
         if (candidates.length >= effectiveLimit) break
         if (filtering) {
-          const s = await loadSession(r.sessionId)
+          const s = sessionCache.get(r.sessionId) ?? null
           if (!s) continue
           if (projectFilter && s.project !== projectFilter) continue
           if (cwdFilter && s.cwd !== cwdFilter) continue
         }
         candidates.push(r)

As per coding guidelines, "Perform parallel operations where possible using Promise.all for independent kv writes/reads in TypeScript".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/search.ts` around lines 88 - 108, The current loop awaits
loadSession(sessionId) for each result, serializing kv.get calls; instead,
prefetch unique sessionIds in parallel before the first-pass filtering: collect
distinct sessionIds from results (up to effectiveLimit worth of candidates if
filtering applies), call Promise.all on kv.get for those ids to populate
sessionCache, then run the existing filtering loop using sessionCache without
awaiting inside it. Update the helpers referenced (sessionCache, loadSession,
kv.get, candidates, results, effectiveLimit) so loadSession reads from
sessionCache synchronously after the parallel preload.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 67d8ef2a-d2d4-470f-8392-7b8dbf771675

📥 Commits

Reviewing files that changed from the base of the PR and between 26fea3a and d145b0e.

📒 Files selected for processing (1)
  • src/functions/search.ts

Comment thread src/functions/search.ts
Comment on lines +74 to +75
const projectFilter = typeof data.project === 'string' && data.project.length > 0 ? data.project : undefined
const cwdFilter = typeof data.cwd === 'string' && data.cwd.length > 0 ? data.cwd : undefined

⚠️ Potential issue | 🟡 Minor

Trim project/cwd before enabling the filter.

These values are matched verbatim, so " repo " will miss an otherwise matching session, and " " becomes an active filter that can only return zero results. src/triggers/api.ts forwards both fields unchanged, so this is user-visible on the HTTP path too.

💡 Proposed fix
-      const projectFilter = typeof data.project === 'string' && data.project.length > 0 ? data.project : undefined
-      const cwdFilter = typeof data.cwd === 'string' && data.cwd.length > 0 ? data.cwd : undefined
+      const projectFilter =
+        typeof data.project === 'string' && data.project.trim().length > 0
+          ? data.project.trim()
+          : undefined
+      const cwdFilter =
+        typeof data.cwd === 'string' && data.cwd.trim().length > 0
+          ? data.cwd.trim()
+          : undefined
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/search.ts` around lines 74 - 75, The projectFilter and
cwdFilter are set without trimming so values like " repo " or strings of only
whitespace become ineffective filters; update the logic where projectFilter and
cwdFilter are computed (using data.project and data.cwd) to trim the input first
and then check length—i.e., compute const projectVal = typeof data.project ===
'string' ? data.project.trim() : '' and similarly for cwdVal, then set
projectFilter = projectVal.length ? projectVal : undefined and cwdFilter =
cwdVal.length ? cwdVal : undefined so only non-empty, trimmed strings enable
filtering.

@rohitg00 rohitg00 merged commit 7d64449 into rohitg00:main Apr 9, 2026
1 check passed


Development

Successfully merging this pull request may close these issues.

feat: add project/cwd filter to /agentmemory/search

2 participants