
fix(search_code): tighten query description for accurate model guidance #2513

Merged
SamMorrowDrums merged 2 commits into main from sammorrowdrums/search-code-description-polish on May 20, 2026

@SamMorrowDrums
Collaborator

Closes #2442. Addresses #2390.

Re-applies the spirit of #2442 by @jluocsa (originally suggested by @danmoseley in #2390), corrected against what this tool's underlying endpoint actually supports.

Huge thanks to @danmoseley for the original analysis of 5000 sessions and to @jluocsa for kicking off the fix — your version got us ~80% of the way there. 🙏

The catch

This tool calls go-github's client.Search.Code, which hits the legacy REST /search/code endpoint — not the new code search ("Blackbird"). Verified against the live API:

  Query                                                        Result

  WithContext repo:github/github-mcp-server (baseline)      -> 15 ✅
  symbol:WithContext repo:github/github-mcp-server          -> 0
  func language:go repo:github/github-mcp-server (baseline) -> 167 ✅
  path:**/*.go func repo:github/github-mcp-server           -> 0
  path:*.go func repo:github/github-mcp-server              -> 0
  filename:*.md repo:github/github-mcp-server               -> 0
  /Get|Set/ repo:github/github-mcp-server                   -> 0
  (Foo OR Bar) -path:vendor language:go                     -> 422

So symbol:, /regex/, path globs, filename globs, and parenthesized boolean groups — features the proposal in #2442 (and the earlier draft of this PR) listed — silently return zero or fail. Documenting them would teach the model syntax that doesn't work on this endpoint.
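
To make the endpoint concrete, here is a minimal sketch of the request URL a query ends up at. It uses only the standard library and does not reproduce go-github's internals. The helper name `searchCodeURL` is hypothetical, and the `api.github.com` host is assumed (GitHub Enterprise hosts differ).

```go
package main

import (
	"fmt"
	"net/url"
)

// searchCodeURL is a hypothetical helper. It returns the legacy REST
// code search URL for a query, with the query passed in the "q" parameter.
func searchCodeURL(query string) string {
	v := url.Values{}
	v.Set("q", query)
	return "https://api.github.com/search/code?" + v.Encode()
}

func main() {
	// Both queries go to the same endpoint; only the first returned results
	// in the table above.
	for _, q := range []string{
		"WithContext repo:github/github-mcp-server",
		"symbol:WithContext repo:github/github-mcp-server",
	} {
		fmt.Println(searchCodeURL(q))
	}
}
```

The point of the sketch is that every query in the table, supported or not, is sent to the same `/search/code` path, so the syntax has to match what that endpoint parses.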

What this PR does

Rewrites the description to accurately describe legacy /search/code and to displace the specific bug patterns observed in #2390:

  • States path:dir is a prefix, not a glob — displaces path:**/*.ts guesses.
  • States filename:exact.ext is exact — displaces filename:*.md guesses.
  • Explicitly calls out that symbol:, /regex/, path/filename globs, and parenthesized groups are not supported by this endpoint, so the model stops generating them. (Also displaces the \|-in-quotes-for-regex bug.)
  • Adds qualifiers omitted by the previous text: user:, extension:, in:file, in:path, size:, filename:.
  • Documents implicit AND, OR, NOT, and "quoted phrase" for exact match positively.
  • Adds the 256-char query limit.
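
The rules in the list above could also be checked mechanically before a query is sent. The following is only a sketch of that idea: `lintQuery` is a hypothetical helper that is not part of this PR or of github-mcp-server, it splits naively on whitespace (so quoted phrases containing parentheses would be misread), and it assumes the 256-character limit applies to the whole query string.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// maxQueryLen is the 256-char limit quoted in the description.
const maxQueryLen = 256

// lintQuery is a hypothetical pre-flight check. It returns one message per
// kind of construct that the PR reports as unsupported on REST /search/code.
func lintQuery(q string) []string {
	var problems []string
	seen := map[string]bool{}
	report := func(msg string) {
		if !seen[msg] { // report each kind of problem once
			seen[msg] = true
			problems = append(problems, msg)
		}
	}
	if utf8.RuneCountInString(q) > maxQueryLen {
		report("longer than 256 chars")
	}
	for _, tok := range strings.Fields(q) {
		switch {
		case strings.HasPrefix(tok, "symbol:"):
			report("symbol: qualifier")
		case len(tok) > 2 && strings.HasPrefix(tok, "/") && strings.HasSuffix(tok, "/"):
			report("/regex/ literal")
		case (strings.HasPrefix(tok, "path:") || strings.HasPrefix(tok, "filename:")) && strings.Contains(tok, "*"):
			report("glob in path: or filename:")
		case strings.HasPrefix(tok, "(") || strings.HasSuffix(tok, ")"):
			report("parenthesized group")
		}
	}
	return problems
}

func main() {
	for _, q := range []string{
		"func extension:go path:cmd repo:o/r",
		"path:**/*.go func repo:o/r",
		"(Foo OR Bar) -path:vendor language:go",
	} {
		fmt.Printf("%q -> %v\n", q, lintQuery(q))
	}
}
```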

Before / after

Before:

Search query using GitHub's powerful code search syntax. Examples: 'content:Skill language:Java org:github', 'NOT is:archived language:Python OR language:go', 'repo:github/github-mcp-server'. Supports exact matching, language filters, path filters, and more.

After:

Search query (GitHub code search REST). Implicit AND between terms; supports OR, NOT, and "quoted phrase" for exact match. Qualifiers: repo:owner/repo, org:, user:, language:, path:dir (prefix, NOT a glob), filename:exact.ext, extension:, in:file, in:path, size:, is:archived, is:fork. NOT supported by this endpoint (silently 0 or 422): symbol:, /regex/, path globs (path:**/*.ts, filename:*.md), parenthesized boolean groups. Max 256 chars. Examples: WithContext language:go org:github; "package main" repo:o/r; func extension:go path:cmd repo:o/r; NOT TODO language:go repo:o/r.
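
As an illustration of the syntax the new description teaches, here is a small sketch that composes queries from terms and qualifiers. `buildQuery` is a hypothetical helper, not something in the repository. It relies on implicit AND (space-separated parts) and wraps multi-word terms in double quotes. It does not escape quotes embedded in a term.

```go
package main

import (
	"fmt"
	"strings"
)

// buildQuery is a hypothetical helper. It joins terms and qualifiers with
// spaces, which the endpoint treats as an implicit AND. A term containing
// whitespace is quoted so that it is searched as an exact phrase.
func buildQuery(terms []string, qualifiers ...string) string {
	parts := make([]string, 0, len(terms)+len(qualifiers))
	for _, t := range terms {
		if strings.ContainsAny(t, " \t") {
			t = `"` + t + `"`
		}
		parts = append(parts, t)
	}
	parts = append(parts, qualifiers...)
	return strings.Join(parts, " ")
}

func main() {
	// These reproduce two of the examples from the new description.
	fmt.Println(buildQuery([]string{"package main"}, "repo:o/r"))
	fmt.Println(buildQuery([]string{"func"}, "extension:go", "path:cmd", "repo:o/r"))
}
```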

Verification

All four examples in the new description return non-zero results when run against the live GitHub API:

WithContext language:go org:github                            -> 2592
"package main" repo:github/github-mcp-server                  -> 6
func extension:go path:cmd repo:github/github-mcp-server      -> 5
NOT TODO language:go repo:github/github-mcp-server            -> 163
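
For anyone repeating this check by hand, the sketch below shows one way to read a hit count out of a search response body. It assumes the counts listed above correspond to the `total_count` field of the REST search response. The sample payload is illustrative, not a captured response, and `totalCount` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchResult holds the top-level fields of a REST search response that
// matter for a non-zero check. The items array is ignored here.
type searchResult struct {
	TotalCount        int  `json:"total_count"`
	IncompleteResults bool `json:"incomplete_results"`
}

// totalCount is a hypothetical helper that extracts total_count from a
// response body.
func totalCount(body string) (int, error) {
	var r searchResult
	if err := json.Unmarshal([]byte(body), &r); err != nil {
		return 0, err
	}
	return r.TotalCount, nil
}

func main() {
	// Illustrative payload only.
	sample := `{"total_count": 6, "incomplete_results": false, "items": []}`
	n, err := totalCount(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("hits:", n)
}
```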

Locally:

  • script/lint
  • script/test
  • Toolsnap regenerated ✅
  • README regenerated ✅

Follow-up worth considering (out of scope)

If we want symbol:, /regex/, and path globs to actually work, the underlying call needs to move off REST /search/code to the new code search API. Filing as a separate concern.

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the search_code tool’s query parameter documentation to accurately describe the legacy REST /search/code endpoint syntax, aiming to reduce model-generated invalid queries and resulting 422s.

Changes:

  • Rewrites search_code.query description to document supported qualifiers/operators and explicitly call out unsupported syntax for REST /search/code.
  • Regenerates the search_code toolsnap to reflect the updated schema description.
  • Regenerates README tool docs to mirror the updated query description.

| File | Description |
| --- | --- |
| README.md | Updates generated tool documentation for search_code.query description. |
| pkg/github/search.go | Updates the JSON schema description for the search_code tool query parameter. |
| pkg/github/toolsnaps/search_code.snap | Regenerates snapshot to match the updated schema description. |

Copilot's findings

Comments suppressed due to low confidence (1)

pkg/github/search.go:203

  • The query description emphasizes that symbol: is not supported by the REST /search/code endpoint, but the tool’s Description still says it’s best for finding exact symbols/functions/classes. That mixed messaging may steer models back toward unsupported symbol-oriented searches. Consider updating the tool Description to align with the legacy endpoint capabilities (or explicitly call out the limitation there too).
				Description: "Search query (GitHub code search REST). Implicit AND between terms; supports `OR`, `NOT`, and `\"quoted phrase\"` for exact match. Qualifiers: `repo:owner/repo`, `org:`, `user:`, `language:`, `path:dir` (prefix, NOT a glob), `filename:exact.ext`, `extension:`, `in:file`, `in:path`, `size:`, `is:archived`, `is:fork`. NOT supported by this endpoint (silently 0 or 422): `symbol:`, `/regex/`, path globs (`path:**/*.ts`, `filename:*.md`), parenthesized boolean groups. Max 256 chars. Examples: `WithContext language:go org:github`; `\"package main\" repo:o/r`; `func extension:go path:cmd repo:o/r`; `NOT TODO language:go repo:o/r`.",
  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment thread pkg/github/search.go Outdated
  "query": {
  Type: "string",
- Description: "Search query using GitHub's powerful code search syntax. Examples: 'content:Skill language:Java org:github', 'NOT is:archived language:Python OR language:go', 'repo:github/github-mcp-server'. Supports exact matching, language filters, path filters, and more.",
+ Description: "Search query (GitHub code search REST). Implicit AND between terms; supports `OR`, `NOT`, and `\"quoted phrase\"` for exact match. Qualifiers: `repo:owner/repo`, `org:`, `user:`, `language:`, `path:dir` (prefix, NOT a glob), `filename:exact.ext`, `extension:`, `in:file`, `in:path`, `size:`, `is:archived`, `is:fork`. NOT supported by this endpoint (silently 0 or 422): `symbol:`, `/regex/`, path globs (`path:**/*.ts`, `filename:*.md`), parenthesized boolean groups. Max 256 chars. Examples: `WithContext language:go org:github`; `\"package main\" repo:o/r`; `func extension:go path:cmd repo:o/r`; `NOT TODO language:go repo:o/r`.",
Comment thread README.md Outdated
  - `page`: Page number for pagination (min 1) (number, optional)
  - `perPage`: Results per page for pagination (min 1, max 100) (number, optional)
- - `query`: Search query using GitHub's powerful code search syntax. Examples: 'content:Skill language:Java org:github', 'NOT is:archived language:Python OR language:go', 'repo:github/github-mcp-server'. Supports exact matching, language filters, path filters, and more. (string, required)
+ - `query`: Search query (GitHub code search REST). Implicit AND between terms; supports `OR`, `NOT`, and `"quoted phrase"` for exact match. Qualifiers: `repo:owner/repo`, `org:`, `user:`, `language:`, `path:dir` (prefix, NOT a glob), `filename:exact.ext`, `extension:`, `in:file`, `in:path`, `size:`, `is:archived`, `is:fork`. NOT supported by this endpoint (silently 0 or 422): `symbol:`, `/regex/`, path globs (`path:**/*.ts`, `filename:*.md`), parenthesized boolean groups. Max 256 chars. Examples: `WithContext language:go org:github`; `"package main" repo:o/r`; `func extension:go path:cmd repo:o/r`; `NOT TODO language:go repo:o/r`. (string, required)
The current `search_code` query description is hand-wavy and gives the
model little usable guidance on GitHub code search syntax, which (per
analysis in #2390 across thousands of agent sessions) leads to repeated
422 ERROR_TYPE_QUERY_PARSING_FATAL responses from agents that guess at
plausible-but-invalid syntax.

Re-applies the spirit of #2442 by @jluocsa, originally suggested by
@danmoseley in #2390, but corrected against the actual endpoint this
tool calls.

Critically, this tool uses go-github's `client.Search.Code`, which hits
the legacy REST `/search/code` endpoint — NOT the new code search
("Blackbird"). Verified against the live API:

  symbol:WithContext repo:github/github-mcp-server  -> 0
  /Get|Set/ repo:github/github-mcp-server           -> 0
  path:**/*.go func repo:github/github-mcp-server   -> 0
  filename:*.md repo:github/github-mcp-server       -> 0
  (Foo OR Bar) -path:vendor language:go             -> 422

So `symbol:`, `/regex/`, path globs, filename globs, and parenthesized
boolean groups — features the proposal in #2442 listed — silently
return zero or fail. Documenting them would teach the model syntax
that doesn't work on this endpoint.

The new description focuses on what's actually supported by legacy
code search and the real bugs observed in #2390:

- `path:dir` is a prefix, NOT a glob (displaces `path:**/*.ts` guesses).
- `filename:exact.ext` is exact, NOT a glob (displaces `filename:*.md`).
- `/regex/` and `\|` inside quotes don't work — call this out so the
  model stops generating them.
- `symbol:` doesn't work on this endpoint — call this out.
- Parenthesized boolean groups 422 — call this out so the model
  stops wrapping `OR` chains in parens.
- Adds `extension:`, `in:file`, `in:path`, `size:`, `filename:`, `user:`
  qualifiers that the previous text omitted.
- Implicit AND, `OR`, `NOT`, and `"quoted phrase"` for exact match are
  documented positively.
- 256-char query limit.

All four examples in the new description are verified against the live
GitHub API and return non-zero results.

Co-authored-by: jluocsa <103165870+jluocsa@users.noreply.github.com>
Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@SamMorrowDrums SamMorrowDrums force-pushed the sammorrowdrums/search-code-description-polish branch from 4d519bb to db4c1a0 on May 20, 2026 09:25
@SamMorrowDrums
Collaborator Author

A/B eval: does the new description actually help?

Ran a headless A/B against the live GitHub API. Same model, same prompts, only the query field description differs.

Setup

  • Built two github-mcp-server binaries; only the query description differs.
  • Each binary wrapped in a stdio MCP proxy that logs every JSON-RPC call.
  • Ran copilot -p headless with --disable-builtin-mcps and --disable-mcp-server github-insiders so the agent has only this MCP for code search — no fallback.
  • 10 abstract prompts × 2 variants = 20 trials, run 3-at-a-time, shuffled order so transient API blips affect both arms equally.
  • Prompts deliberately don't leak syntax — e.g. "find changelog or release notes files in github/github-mcp-server" rather than "use filename:CHANGELOG". If the prompt told the model how to search, the description wouldn't be load-bearing.
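
The logging proxy in the setup above is not included in this PR. The following is only a sketch of the core idea, assuming the stdio transport carries one JSON-RPC message per line. `relay` and `methodOf` are hypothetical names, the log line format is made up for illustration, and a real proxy would run one relay per direction between the agent's stdin/stdout and the server process's pipes.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// methodOf returns the "method" field of one JSON-RPC message. It returns ""
// for responses (which carry no method) and for input that is not valid JSON.
func methodOf(line string) string {
	var msg struct {
		Method string `json:"method"`
	}
	if err := json.Unmarshal([]byte(line), &msg); err != nil {
		return ""
	}
	return msg.Method
}

// relay forwards newline-delimited messages from src to dst unchanged and
// writes one log line per message to logw.
func relay(dir string, src io.Reader, dst, logw io.Writer) error {
	sc := bufio.NewScanner(src)
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // tool results can be large
	for sc.Scan() {
		line := sc.Text()
		fmt.Fprintf(logw, "%s method=%q %s\n", dir, methodOf(line), line)
		if _, err := fmt.Fprintln(dst, line); err != nil {
			return err
		}
	}
	return sc.Err()
}

func main() {
	// Stand-in for the agent's output stream, so the sketch runs on its own.
	in := strings.NewReader(`{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search_code"}}` + "\n")
	if err := relay("client->server", in, io.Discard, os.Stderr); err != nil {
		fmt.Fprintln(os.Stderr, "relay:", err)
	}
}
```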

Aggregate result

| Arm | Calls | OK | Zero | Error | Waste rate | Prompts answered |
| --- | --- | --- | --- | --- | --- | --- |
| OLD | 36 | 14 | 14 | 7 | 61% | 10/10 |
| NEW | 30 | 18 | 6 | 5 | 37% | 10/10 |

NEW issued 17% fewer queries, more than halved zero-result responses (14 → 6), produced more useful results overall (14 → 18), and matched OLD on prompts answered. No 422 increase.
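
The relative changes quoted above follow directly from the call counts. A small sketch of the arithmetic, using a hypothetical `pctChange` helper (the waste rate is not recomputed here because its exact definition is not stated):

```go
package main

import (
	"fmt"
	"math"
)

// pctChange returns the relative change from before to after, as a
// percentage rounded to the nearest whole number.
func pctChange(before, after int) int {
	return int(math.Round(float64(after-before) / float64(before) * 100))
}

func main() {
	fmt.Println("total calls:      ", pctChange(36, 30), "%") // OLD 36 -> NEW 30
	fmt.Println("zero-result calls:", pctChange(14, 6), "%")  // OLD 14 -> NEW 6
	fmt.Println("OK calls:         ", pctChange(14, 18), "%") // OLD 14 -> NEW 18
}
```

With the counts from the aggregate result this gives -17% for total calls, -57% for zero-result calls, and +29% for OK calls.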

Two unambiguous wins for NEW

p04 — "find multi-stage Dockerfiles in github/github-mcp-server"

  • OLD: 6 calls, 3 wasted. Kept guessing path:Dockerfile FROM and path:Dockerfile content:"FROM" content:"AS", but path: is a prefix and doesn't match a filename anywhere in the tree.
  • NEW: 1 call: FROM AS repo:... filename:Dockerfile → OK(1). Done.

p07 — "find changelog or release notes files in github/github-mcp-server"

  • OLD: 3 calls, all zero/error. Tried path:HISTORY OR path:NEWS OR path:RELEASES (parens-less path-OR — 422) and path:CHANGELOG. Gave up empty.
  • NEW: 4 calls — first: filename:RELEASE repo:... → OK(7). Done.

Both wins trace to the same gap: OLD doesn't list filename: at all, so the agent reaches for path: (prefix match, wrong tool for the job). NEW listing filename:exact.ext fixes this.

Other prompts

  • p01 inventory pattern, p09 custom error types — broadly tied; both arms found the right answer with similar effort.
  • p03 mark3labs/mcp-go usage — NEW 3 calls vs OLD 7 calls for similar quality. Efficiency win for NEW.
  • p06 rate limiter middleware — OLD 4 calls / 0 useful; NEW 3 calls / 1 useful.
  • p02 fizzbuzz, p08 cobra, p10 K8s Deployment — single-call success on both arms; trivial enough that description doesn't matter.
  • p05 golangci configs — both succeeded; NEW issued more calls because the model looped on the same query 5 times (model loopiness, not a description issue). Useful-call count was still higher for NEW (6 vs 2).

Negative-side check

Neither arm hallucinated symbol: or /regex/ in any of the 20 trials — confirming that NEW's omission of those (rather than calling them out as unsupported) doesn't tempt the model toward them. The earlier "NOT supported" callouts I'd added were genuinely overkill.

What's driving the wins

The single biggest factor is the NEW description listing filename:exact.ext — every clear win came from the agent successfully reaching for filename: instead of path:. Secondary contributors are the explicit prefix-not-glob clarification on path: and the addition of extension:.

Caveats

  • 10 prompts × 1 sample. Indicative, not statistically conclusive.
  • Many runs hit copilot's 300s prompt-session idle timeout after the agent had already produced its final answer — comparable signal lives in the per-call traces, not the exit code.
  • One NEW trial (p05) had model-side loop behaviour repeating the same query — inflates NEW's call count but doesn't change the useful-vs-wasted ratio.

Repro

Tooling and per-call traces available on request — wrapper proxy script + 20 JSONL logs of every JSON-RPC call.

Bottom line

Across 10 varied, syntax-free prompts the NEW description cut wasted queries by ~40% and increased useful queries by ~30% — while issuing fewer calls overall. The wins concentrate on prompts where the agent needs to find files by name, which is where OLD's missing filename: qualifier hurts most.

@SamMorrowDrums SamMorrowDrums merged commit e953333 into main May 20, 2026
18 checks passed
@SamMorrowDrums SamMorrowDrums deleted the sammorrowdrums/search-code-description-polish branch May 20, 2026 10:30
@danmoseley
Contributor

Great this is fixed thanks @SamMorrowDrums !

@SamMorrowDrums
Collaborator Author

@danmoseley thanks for all your work! @tclem has actually been working on the Blackbird side, and I hope we'll eventually ship the powerful search via MCP too!
