fix(search_code): tighten query description for accurate model guidance #2513
Pull request overview
This PR updates the search_code tool’s query parameter documentation to accurately describe the legacy REST /search/code endpoint syntax, aiming to reduce model-generated invalid queries and resulting 422s.
Changes:
- Rewrites `search_code.query` description to document supported qualifiers/operators and explicitly call out unsupported syntax for REST `/search/code`.
- Regenerates the `search_code` toolsnap to reflect the updated schema description.
- Regenerates README tool docs to mirror the updated `query` description.
Show a summary per file
| File | Description |
|---|---|
| README.md | Updates generated tool documentation for search_code.query description. |
| pkg/github/search.go | Updates the JSON schema description for the search_code tool query parameter. |
| pkg/github/toolsnaps/search_code.snap | Regenerates snapshot to match the updated schema description. |
Copilot's findings
Comments suppressed due to low confidence (1)
pkg/github/search.go:203
- The query description emphasizes that `symbol:` is not supported by the REST `/search/code` endpoint, but the tool’s Description still says it’s best for finding exact symbols/functions/classes. That mixed messaging may steer models back toward unsupported symbol-oriented searches. Consider updating the tool Description to align with the legacy endpoint capabilities (or explicitly call out the limitation there too).
Description: "Search query (GitHub code search REST). Implicit AND between terms; supports `OR`, `NOT`, and `\"quoted phrase\"` for exact match. Qualifiers: `repo:owner/repo`, `org:`, `user:`, `language:`, `path:dir` (prefix, NOT a glob), `filename:exact.ext`, `extension:`, `in:file`, `in:path`, `size:`, `is:archived`, `is:fork`. NOT supported by this endpoint (silently 0 or 422): `symbol:`, `/regex/`, path globs (`path:**/*.ts`, `filename:*.md`), parenthesized boolean groups. Max 256 chars. Examples: `WithContext language:go org:github`; `\"package main\" repo:o/r`; `func extension:go path:cmd repo:o/r`; `NOT TODO language:go repo:o/r`.",
- Files reviewed: 3/3 changed files
- Comments generated: 2
  "query": {
  Type: "string",
- Description: "Search query using GitHub's powerful code search syntax. Examples: 'content:Skill language:Java org:github', 'NOT is:archived language:Python OR language:go', 'repo:github/github-mcp-server'. Supports exact matching, language filters, path filters, and more.",
+ Description: "Search query (GitHub code search REST). Implicit AND between terms; supports `OR`, `NOT`, and `\"quoted phrase\"` for exact match. Qualifiers: `repo:owner/repo`, `org:`, `user:`, `language:`, `path:dir` (prefix, NOT a glob), `filename:exact.ext`, `extension:`, `in:file`, `in:path`, `size:`, `is:archived`, `is:fork`. NOT supported by this endpoint (silently 0 or 422): `symbol:`, `/regex/`, path globs (`path:**/*.ts`, `filename:*.md`), parenthesized boolean groups. Max 256 chars. Examples: `WithContext language:go org:github`; `\"package main\" repo:o/r`; `func extension:go path:cmd repo:o/r`; `NOT TODO language:go repo:o/r`.",
  - `page`: Page number for pagination (min 1) (number, optional)
  - `perPage`: Results per page for pagination (min 1, max 100) (number, optional)
- - `query`: Search query using GitHub's powerful code search syntax. Examples: 'content:Skill language:Java org:github', 'NOT is:archived language:Python OR language:go', 'repo:github/github-mcp-server'. Supports exact matching, language filters, path filters, and more. (string, required)
+ - `query`: Search query (GitHub code search REST). Implicit AND between terms; supports `OR`, `NOT`, and `"quoted phrase"` for exact match. Qualifiers: `repo:owner/repo`, `org:`, `user:`, `language:`, `path:dir` (prefix, NOT a glob), `filename:exact.ext`, `extension:`, `in:file`, `in:path`, `size:`, `is:archived`, `is:fork`. NOT supported by this endpoint (silently 0 or 422): `symbol:`, `/regex/`, path globs (`path:**/*.ts`, `filename:*.md`), parenthesized boolean groups. Max 256 chars. Examples: `WithContext language:go org:github`; `"package main" repo:o/r`; `func extension:go path:cmd repo:o/r`; `NOT TODO language:go repo:o/r`. (string, required)
The current `search_code` query description is hand-wavy and gives the model little usable guidance on GitHub code search syntax, which (per analysis in #2390 across thousands of agent sessions) leads to repeated 422 ERROR_TYPE_QUERY_PARSING_FATAL responses from agents that guess at plausible-but-invalid syntax.

Re-applies the spirit of #2442 by @jluocsa, originally suggested by @danmoseley in #2390, but corrected against the actual endpoint this tool calls.

Critically, this tool uses go-github's `client.Search.Code`, which hits the legacy REST `/search/code` endpoint, NOT the new code search ("Blackbird"). Verified against the live API:

    symbol:WithContext repo:github/github-mcp-server -> 0
    /Get|Set/ repo:github/github-mcp-server -> 0
    path:**/*.go func repo:github/github-mcp-server -> 0
    filename:*.md repo:github/github-mcp-server -> 0
    (Foo OR Bar) -path:vendor language:go -> 422

So `symbol:`, `/regex/`, path globs, filename globs, and parenthesized boolean groups (features the proposal in #2442 listed) silently return zero or fail. Documenting them would teach the model syntax that doesn't work on this endpoint.

The new description focuses on what's actually supported by legacy code search and the real bugs observed in #2390:
- `path:dir` is a prefix, NOT a glob (displaces `path:**/*.ts` guesses).
- `filename:exact.ext` is exact, NOT a glob (displaces `filename:*.md`).
- `/regex/` and `\|` inside quotes don't work; call this out so the model stops generating them.
- `symbol:` doesn't work on this endpoint; call this out.
- Parenthesized boolean groups 422; call this out so the model stops wrapping `OR` chains in parens.
- Adds `extension:`, `in:file`, `in:path`, `size:`, `filename:`, `user:` qualifiers that the previous text omitted.
- Implicit AND, `OR`, `NOT`, and `"quoted phrase"` for exact match are documented positively.
- 256-char query limit.

All four examples in the new description are verified against the live GitHub API and return non-zero results.

Co-authored-by: jluocsa <103165870+jluocsa@users.noreply.github.com>
Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
4d519bb to db4c1a0
A/B eval: does the new description actually help?
Ran a headless A/B against the live GitHub API. Same model, same prompts, only the
Setup
Aggregate result
NEW issued 17% fewer queries, more than halved zero-result responses (14 → 6), produced more useful results overall (14 → 18), and matched OLD on prompts answered. No 422 increase.
Two unambiguous wins for NEW
p04: "find multi-stage Dockerfiles in github/github-mcp-server"
p07: "find changelog or release notes files in github/github-mcp-server"
Both wins trace to the same gap: OLD doesn't list
Other prompts
Negative-side check
Neither arm hallucinated
What's driving the wins
The single biggest factor is the NEW description listing
Caveats
Repro
Tooling and per-call traces available on request: wrapper proxy script plus 20 JSONL logs of every JSON-RPC call.
Bottom line
Across 10 varied, syntax-free prompts the NEW description cut wasted queries by ~40% and increased useful queries by ~30%, while issuing fewer calls overall. The wins concentrate on prompts where the agent needs to find files by name, which is where OLD's missing
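As a quick check on the aggregate figures quoted above, the relative changes for the two before/after pairs that survive in this comment (14 → 6 zero-result responses, 14 → 18 useful results) can be recomputed. This is only a sketch; `pctChange` is a helper name introduced here, and the other percentages in the comment cannot be rederived from the numbers shown.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Counts taken from the "Aggregate result" paragraph of the A/B comment.
	fmt.Printf("zero-result responses: %+.1f%%\n", pctChange(14, 6))  // -57.1%
	fmt.Printf("useful results:        %+.1f%%\n", pctChange(14, 18)) // +28.6%
}
```

The first value is consistent with "more than halved", and the second with the "~30%" increase in useful queries.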
Great, this is fixed. Thanks @SamMorrowDrums!
@danmoseley thanks for all your work. @tclem has actually been working on the Blackbird side, and I hope we'll eventually ship the powerful search via MCP too!
Closes #2442. Addresses #2390.
Re-applies the spirit of #2442 by @jluocsa (originally suggested by @danmoseley in #2390), corrected against what this tool's underlying endpoint actually supports.
Huge thanks to @danmoseley for the original analysis of 5000 sessions and to @jluocsa for kicking off the fix — your version got us ~80% of the way there. 🙏
The catch
This tool calls go-github's `client.Search.Code`, which hits the legacy REST `/search/code` endpoint, not the new code search ("Blackbird"). Verified against the live API:
- `WithContext repo:github/github-mcp-server` (baseline)
- `symbol:WithContext repo:github/github-mcp-server`
- `func language:go repo:github/github-mcp-server` (baseline)
- `path:**/*.go func repo:github/github-mcp-server`
- `path:*.go func repo:github/github-mcp-server`
- `filename:*.md repo:github/github-mcp-server`
- `/Get|Set/ repo:github/github-mcp-server`
- `(Foo OR Bar) -path:vendor language:go`
So `symbol:`, `/regex/`, path globs, filename globs, and parenthesized boolean groups (features that the proposal in #2442 and the earlier draft of this PR listed) silently return zero or fail. Documenting them would teach the model syntax that doesn't work on this endpoint.
What this PR does
Rewrites the description to accurately describe legacy `/search/code` and to displace the specific bug patterns observed in #2390:
- `path:dir` is a prefix, not a glob; displaces `path:**/*.ts` guesses.
- `filename:exact.ext` is exact; displaces `filename:*.md` guesses.
- Calls out that `symbol:`, `/regex/`, path/filename globs, and parenthesized groups are not supported by this endpoint, so the model stops generating them. (Also displaces the `\|`-in-quotes-for-regex bug.)
- Adds `user:`, `extension:`, `in:file`, `in:path`, `size:`, `filename:`.
- Documents `OR`, `NOT`, and `"quoted phrase"` for exact match positively.
Before / after
Before:
After:
Verification
All four examples in the new description return non-zero results when run against the live GitHub API:
Locally:
- `script/lint` ✅
- `script/test` ✅
Follow-up worth considering (out of scope)
If we want `symbol:`, `/regex/`, and path globs to actually work, the underlying call needs to move off REST `/search/code` to the new code search API. Filing as a separate concern.