feat(mcp): forward file_pattern in semantic_search to scope results#1149

Open
carlos-alm wants to merge 7 commits into main from feat/mcp-semantic-search-file-pattern

@carlos-alm
Contributor

Summary

  • MCP semantic_search now accepts file_pattern (string or string[]) and forwards it as filePattern into the search core for hybrid, semantic, and keyword modes
  • Declares file_pattern in the tool input schema so MCP clients can discover it
  • Brings MCP parity with the CLI's codegraph search --file <pattern> (repeatable)

Closes #1143

Why

Previously the args interface in src/mcp/tools/semantic-search.ts listed only {query, mode, limit, offset, min_score}. MCP silently ignores unknown args, so a caller passing {"query": "...", "file_pattern": ["db/"]} got unscoped global hits back with no error. In monorepos this made the tool effectively unusable from MCP — the larger subtree dominated top-K and the caller had no signal that the filter was dropped.
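The difference can be sketched as follows. This is a minimal illustration, not the actual handler: the `SearchOpts` shape and the `buildOptsBefore`/`buildOptsAfter` helper names are hypothetical, and only the field names `file_pattern` and `filePattern` come from the PR.

```typescript
// Hypothetical stand-ins: these types are illustrative, not the project's definitions.
interface SearchOpts {
  limit?: number;
  offset?: number;
  minScore?: number;
  filePattern?: string | string[];
}

interface SemanticSearchArgs {
  query: string;
  limit?: number;
  offset?: number;
  min_score?: number;
  file_pattern?: string | string[]; // the field this PR declares
}

// Before: file_pattern is never read, so the filter is lost without any error.
function buildOptsBefore(args: SemanticSearchArgs): SearchOpts {
  return { limit: args.limit, offset: args.offset, minScore: args.min_score };
}

// After: the snake_case MCP argument maps onto the camelCase core option.
function buildOptsAfter(args: SemanticSearchArgs): SearchOpts {
  return { ...buildOptsBefore(args), filePattern: args.file_pattern };
}

const args: SemanticSearchArgs = { query: "open connection", file_pattern: ["db/"] };
console.log(buildOptsBefore(args).filePattern); // undefined
console.log(buildOptsAfter(args).filePattern); // [ 'db/' ]
```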

Test plan

  • npx vitest run tests/unit/mcp.test.ts — 41/41 pass (includes new dispatch test asserting file_pattern reaches each backend as filePattern)
  • npx vitest run tests/unit/mcp.test.ts tests/search/ — 116/116 pass
  • npx tsc --noEmit — clean
  • npm run lint — clean for changed files

The MCP semantic_search wrapper silently dropped any file scoping argument
because file_pattern was not declared on its args interface, even though
the underlying search core has supported it all along (the CLI exposes it
via --file). In monorepos with multiple large subtrees, this made MCP
semantic_search effectively unusable: top-K was dominated by the larger,
less-relevant tree with no signal to the caller that the filter was
ignored.

Add file_pattern (string | string[]) to the handler args, forward it as
filePattern into searchOpts for hybrid/semantic/keyword modes, and declare
it in the tool input schema. Unit tests assert the arg reaches each
search backend.

Closes #1143
@claude

claude Bot commented May 18, 2026

Claude finished @carlos-alm's task


Root Cause Analysis

Problem Being Solved: MCP semantic search lacked file pattern filtering, causing queries in monorepos to return unscoped results from the entire codebase instead of targeted subdirectories. This made the tool "effectively unusable" for large codebases where users needed focused search results.

Why This Approach: The PR adds file_pattern parameter to the MCP tool schema and forwards it as filePattern to the existing search backend. This approach reuses the established pattern from CLI codegraph search --file <pattern> and leverages existing infrastructure.

Risk Assessment: Very low risk. This is a pure additive change that forwards an existing parameter through the MCP interface without modifying core search logic.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies added
  • Foundation-aligned: ✓ — Improves AI agent accuracy by enabling scoped search, aligns with P1 (better context) and P5 (embeddable-first)
  • Problem-fit: 4 — Directly improves agent accuracy by reducing token waste from irrelevant results in large codebases
  • Breaking: No — Purely additive, backward compatible
  • Tier: 1

Review Findings

Phase 1: Root Cause Analysis ✅

The PR addresses a legitimate usability gap. MCP clients passing file_pattern received no error feedback when the parameter was silently ignored, leading to confusing global search results.

Phase 2: Configuration Changes ✅

No configuration changes detected. All modifications are to source code and tests.

Phase 3: Test Integrity ✅

New test in tests/unit/mcp.test.ts:1238-1307 properly validates that file_pattern reaches each search backend (hybridSearchData, ftsSearchData, searchData) as filePattern. Test coverage includes:

  • Array format: ['db/']
  • String format: 'src/mcp/'
  • All three search modes: hybrid, semantic, keyword

Existing tests continue to pass, confirming no regressions.

Phase 4: Code Quality ✅

Schema Declaration (src/mcp/tool-registry.ts:325-329):

file_pattern: {
  oneOf: [{ type: 'string' }, { type: 'array', items: { type: 'string' } }],
  description: 'Restrict results to files matching one or more substring patterns (e.g. "db/" or ["db/", "src/"])',
},

Well-designed schema using oneOf to accept both string and array formats, matching CLI behavior.
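A consumer of a `oneOf` string-or-array field typically normalizes it to a single shape before use. The helper below is a hedged sketch of that step; `normalizeFilePattern` is a hypothetical name, and dropping empty strings is an assumption rather than documented behavior of the search core.

```typescript
// Hypothetical helper: the name and the empty-string handling are assumptions.
function normalizeFilePattern(value: unknown): string[] {
  if (value === undefined || value === null) return [];
  if (typeof value === "string") return value === "" ? [] : [value];
  if (Array.isArray(value) && value.every((v) => typeof v === "string")) {
    return value.filter((v) => v !== "");
  }
  // Anything else violates the declared oneOf, so fail loudly instead of ignoring it.
  throw new TypeError("file_pattern must be a string or an array of strings");
}

console.log(normalizeFilePattern("db/")); // [ 'db/' ]
console.log(normalizeFilePattern(["db/", "src/"])); // [ 'db/', 'src/' ]
```

Rejecting other types explicitly avoids the silent-drop failure mode that motivated this PR.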

Parameter Forwarding (src/mcp/tools/semantic-search.ts:21):

filePattern: args.file_pattern,

Clean parameter mapping from MCP convention (file_pattern) to internal convention (filePattern).

Backend Implementation: Reviewed existing search implementations:

  • hybridSearchData forwards filePattern to both ftsSearchData and searchData correctly
  • ftsSearchData applies file filtering via SQL LIKE patterns with proper escaping
  • searchData delegates to prepareSearch which handles file filtering during data selection

All backends already supported filePattern, so this PR only exposes existing functionality through MCP.
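For readers unfamiliar with LIKE escaping, the idea behind the keyword backend's filter can be sketched like this. The code is an assumption-laden illustration: the real `escapeLike` may be implemented differently, and `buildFileClause` and the column name `file` are hypothetical.

```typescript
// Assumed implementation: the project's real escapeLike may differ.
// Escapes the escape character itself plus the LIKE wildcards % and _.
function escapeLike(pattern: string): string {
  return pattern.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

// Hypothetical helper: builds a parameterized substring clause.
// Patterns travel as bound parameters, never as interpolated SQL text.
function buildFileClause(patterns: string[]): { sql: string; params: string[] } {
  if (patterns.length === 0) return { sql: "1=1", params: [] };
  const sql = patterns.map(() => "file LIKE ? ESCAPE '\\'").join(" OR ");
  const params = patterns.map((p) => `%${escapeLike(p)}%`);
  return { sql: `(${sql})`, params };
}

console.log(buildFileClause(["db/", "src_"]));
```

Without the escaping step, a pattern such as `src_` would also match `srcX`, because `_` is a single-character wildcard in LIKE.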

Minor Observations

  1. Parameter naming: MCP uses snake_case (file_pattern) while internal APIs use camelCase (filePattern) — this is consistent with existing MCP tool patterns
  2. SQL injection protection: Existing escapeLike function properly escapes file patterns in SQL queries
  3. Glob support: File patterns support both substring matching and glob patterns, handled appropriately in each backend

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Well-architected additive feature that exposes existing functionality through MCP. Solves a real usability problem for AI agents working with large codebases. Clean implementation, comprehensive tests, no breaking changes. Only minor deduction for being an incremental enhancement rather than transformative functionality.

This PR successfully brings MCP-CLI parity for file pattern filtering and will significantly improve the usability of semantic search in monorepo environments.


@greptile-apps
Contributor

greptile-apps Bot commented May 18, 2026

Greptile Summary

This PR adds file_pattern (string or string[]) to the MCP semantic_search tool, forwarding it as filePattern into the search core so callers can scope results to specific paths or glob patterns. Previously the field was silently dropped, causing unscoped global results in monorepos.

  • semantic-search.ts: adds file_pattern to the SemanticSearchArgs interface and includes it in searchOpts as filePattern for all three modes (hybrid, semantic, keyword).
  • tool-registry.ts: declares file_pattern in the JSON Schema with oneOf: [string, array] and a description covering both glob and substring usage.
  • mcp.test.ts: adds schema-presence assertion and a dispatch test covering all three search modes.

Confidence Score: 5/5

Safe to merge — the change is additive and backwards-compatible, touching only the MCP tool interface and its schema declaration.

The file_pattern field is optional, so existing callers that omit it continue to receive unfiltered results exactly as before. All three search backends are exercised by the new dispatch test, the schema oneOf is correct, and the forwarding from args.file_pattern to filePattern in searchOpts is a one-liner with no side-effects on the rest of the handler logic.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| src/mcp/tools/semantic-search.ts | Adds file_pattern to interface and forwards it as filePattern in searchOpts for all three backend modes; clean, minimal, and backwards-compatible. |
| src/mcp/tool-registry.ts | Declares file_pattern in the JSON Schema with correct oneOf for string/array and a description covering glob and substring patterns. |
| tests/unit/mcp.test.ts | Adds schema-presence assertion and a full dispatch test exercising all three modes with both string and array file_pattern values. |

Sequence Diagram

sequenceDiagram
    participant C as MCP Client
    participant R as tool-registry.ts
    participant H as semantic-search.ts handler
    participant S as search/index.js

    C->>R: "semantic_search {query, file_pattern, mode, limit}"
    R->>H: dispatch(args, ctx)
    H->>H: "build searchOpts {limit, offset, minScore, filePattern}"
    alt "mode = keyword"
        H->>S: ftsSearchData(query, dbPath, searchOpts)
    else "mode = semantic"
        H->>S: searchData(query, dbPath, searchOpts)
    else "mode = hybrid (default)"
        H->>S: hybridSearchData(query, dbPath, searchOpts)
        S-->>H: null (no FTS5)
        H->>S: searchData(query, dbPath, searchOpts)
    end
    S-->>H: "{results}"
    H-->>C: "{results} scoped to file_pattern"
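The dispatch shown in the diagram can be approximated in code. This is a sketch under stated assumptions, not the real handler: the backends are injected as synchronous stubs for brevity (the real ones may well be asynchronous), and `dispatchSearch`, the `Backends` type, and the `graph.db` path are hypothetical. Only the backend names, the argument order, and the null fallback come from the diagram.

```typescript
type Mode = "hybrid" | "semantic" | "keyword";

interface SearchOpts {
  limit?: number;
  offset?: number;
  minScore?: number;
  filePattern?: string | string[];
}

interface SearchData { results: string[] }
type Backend = (query: string, dbPath: string, opts: SearchOpts) => SearchData | null;
interface Backends { hybridSearchData: Backend; ftsSearchData: Backend; searchData: Backend }

// Hypothetical dispatcher: every branch receives the same searchOpts object,
// so filePattern reaches whichever backend ends up serving the query.
function dispatchSearch(
  mode: Mode, query: string, dbPath: string, opts: SearchOpts, backends: Backends,
): SearchData | null {
  if (mode === "keyword") return backends.ftsSearchData(query, dbPath, opts);
  if (mode === "semantic") return backends.searchData(query, dbPath, opts);
  // Hybrid is the default; a null result (no FTS5) falls back to semantic search.
  const hybrid = backends.hybridSearchData(query, dbPath, opts);
  return hybrid ?? backends.searchData(query, dbPath, opts);
}

// Stubs that record which backend saw which filePattern.
const seen: Array<[string, SearchOpts["filePattern"]]> = [];
const stubs: Backends = {
  hybridSearchData: (_q, _db, o) => { seen.push(["hybrid", o.filePattern]); return null; },
  ftsSearchData: (_q, _db, o) => { seen.push(["keyword", o.filePattern]); return { results: [] }; },
  searchData: (_q, _db, o) => { seen.push(["semantic", o.filePattern]); return { results: [] }; },
};

dispatchSearch("hybrid", "q", "graph.db", { filePattern: "src/mcp/" }, stubs);
console.log(seen.map(([name]) => name)); // [ 'hybrid', 'semantic' ]
```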

Reviews (8): Last reviewed commit: "Merge branch 'main' into feat/mcp-semant..."

Comment thread src/mcp/tool-registry.ts Outdated
Comment on lines +327 to +328
description:
'Restrict results to files matching one or more substring patterns (e.g. "db/" or ["db/", "src/"])',
Contributor

P2 The schema description says "substring patterns" but the underlying backends (prepare.ts, filters.ts, keyword.ts) also support glob syntax (*, **, ?, […]). A caller reading only the schema would not know to try "src/**/*.ts" — and those glob patterns do work end-to-end through applyFilters/globMatch. Mentioning glob support here keeps the schema accurate and prevents confusion.

Suggested change
- description:
-   'Restrict results to files matching one or more substring patterns (e.g. "db/" or ["db/", "src/"])',
+ description:
+   'Restrict results to files matching one or more glob or substring patterns (e.g. "db/", "src/**/*.ts", or ["db/", "src/"])',

Contributor Author

Good catch — applied the suggestion verbatim in c35a237. Confirmed by reading src/domain/search/search/filters.ts that applyFilters does branch on /[*?[\]]/.test(p) and routes glob patterns through globMatch (which handles *, **, ?, and char classes), so the schema description now accurately reflects what the backends support.
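The substring-versus-glob branching described above can be illustrated with a small sketch. Only the detection regex `/[*?[\]]/` is taken from the comment; `globToRegExp`, `matchesPattern`, and `applyFilePatterns` are hypothetical names, and the glob semantics here (`**` crosses `/`, `*` and `?` do not, globs match the whole path) are assumptions that may differ from the project's `globMatch`.

```typescript
const GLOB_CHARS = /[*?[\]]/;

// Assumed semantics: "**" crosses "/" boundaries, "*" and "?" do not.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") {
        i++;
        if (glob[i + 1] === "/") { i++; re += "(?:.*/)?"; } // "**/": zero or more directories
        else re += ".*";
      } else {
        re += "[^/]*";
      }
    } else if (ch === "?") {
      re += "[^/]";
    } else if (ch === "[") {
      const close = glob.indexOf("]", i + 1);
      if (close === -1) { re += "\\["; continue; } // unclosed bracket is a literal
      const body = glob.slice(i + 1, close);
      const cls = body.startsWith("!") ? "^" + body.slice(1) : body;
      re += "[" + cls.replace(/\\/g, "\\\\") + "]";
      i = close;
    } else {
      re += ch.replace(/[.+^${}()|\\\]]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp("^" + re + "$");
}

// Patterns without glob characters fall back to plain substring matching.
function matchesPattern(file: string, pattern: string): boolean {
  return GLOB_CHARS.test(pattern) ? globToRegExp(pattern).test(file) : file.includes(pattern);
}

// A file is kept if it matches at least one pattern; no patterns means no filtering.
function applyFilePatterns(files: string[], patterns?: string | string[]): string[] {
  if (patterns === undefined) return files;
  const list = Array.isArray(patterns) ? patterns : [patterns];
  if (list.length === 0) return files;
  return files.filter((f) => list.some((p) => matchesPattern(f, p)));
}

const files = ["src/mcp/tools/semantic-search.ts", "src/db/index.ts", "db/schema.sql", "README.md"];
console.log(applyFilePatterns(files, "db/")); // [ 'src/db/index.ts', 'db/schema.sql' ]
console.log(applyFilePatterns(files, "src/**/*.ts")); // [ 'src/mcp/tools/semantic-search.ts', 'src/db/index.ts' ]
```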

@github-actions
Contributor

github-actions Bot commented May 18, 2026

Codegraph Impact Analysis

2 functions changed, 0 callers affected across 0 files

  • SemanticSearchArgs.file_pattern in src/mcp/tools/semantic-search.ts:12 (0 transitive callers)
  • handler in src/mcp/tools/semantic-search.ts:15 (0 transitive callers)

@carlos-alm
Contributor Author

@greptileai

Development

Successfully merging this pull request may close these issues.

MCP semantic_search wrapper doesn't support file_pattern
