fix: URL-based fallback for search result number and repo extraction#2379
Search results from the MCP gateway were getting filtered with 'none' integrity because items lacked 'number' and 'base.repo.full_name' fields.

Changes:
- extract_resource_number: falls back to parsing html_url/url for a trailing number segment when the number field is missing
- pr_integrity/issue_integrity: enrichment now uses the URL-based number fallback, enabling REST enrichment for search result items
- response_items.rs: PR items fall back to extract_repo_from_item() (parses repository_url, html_url) when base/head repo info is missing
- tool_rules.rs: search_issues and search_pull_requests now check owner/repo in tool_args as a fallback when query extraction fails
- Added 8 new tests (154 total, all passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
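The URL-based number fallback described in this commit can be sketched as follows. This is a minimal illustration, not the code in helpers.rs: the signatures are hypothetical (the real functions presumably take a JSON item), and the check that the preceding path segment is issues, pull, or pulls is an added assumption to avoid matching unrelated numeric segments.

```rust
/// Hypothetical sketch: parse a trailing numeric segment from an issue/PR URL.
/// Assumption: the number follows an `issues`, `pull`, or `pulls` segment.
fn extract_number_from_url(url: &str) -> Option<u64> {
    // Ignore any query string or fragment.
    let path = url.split(|c: char| c == '?' || c == '#').next()?;
    let mut segments = path.trim_end_matches('/').rsplit('/');
    let last = segments.next()?;
    let kind = segments.next()?;
    if !matches!(kind, "issues" | "pull" | "pulls") {
        return None;
    }
    if last.is_empty() || !last.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    last.parse().ok()
}

/// Hypothetical sketch: prefer the explicit number, then html_url, then url.
fn extract_resource_number(
    number: Option<u64>,
    html_url: Option<&str>,
    url: Option<&str>,
) -> Option<u64> {
    number
        .or_else(|| html_url.and_then(extract_number_from_url))
        .or_else(|| url.and_then(extract_number_from_url))
}
```

With these definitions, an item that has no number field but carries an html_url ending in /issues/2093 resolves to 2093, while an item with neither a number nor a usable URL stays unresolved.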
Temporarily switch repo-assist to build :local from this branch to validate the search result filtering fixes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR addresses missing per-item integrity labeling for GitHub search results returned by the MCP gateway by adding fallbacks to derive issue/PR numbers and repository identity when key fields are absent.
Changes:
- Add URL-based number extraction fallback (html_url/url) for issue/PR labeling and backend enrichment.
- Add repository extraction fallback for PR search results when base/head repo info is missing.
- Add tool_args (owner/repo) fallback for determining repo scope in search_issues/search_pull_requests tool labeling.
- Update the repo-assist lock workflow to build and run a local gateway container image for validation.
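The repository extraction fallback for PR search results might look roughly like the sketch below. It assumes repository_url has the REST form https://api.github.com/repos/OWNER/REPO and html_url has the form https://github.com/OWNER/REPO/...; the helper names and the plain Option<&str> parameters are hypothetical, and only the name extract_repo_from_item comes from the PR.

```rust
/// Hypothetical helper: take the first two non-empty path segments as owner/repo.
fn owner_repo(rest: &str) -> Option<String> {
    let mut parts = rest.split('/');
    let owner = parts.next().filter(|s| !s.is_empty())?;
    let repo = parts.next().filter(|s| !s.is_empty())?;
    Some(format!("{}/{}", owner, repo))
}

/// Assumes a REST URL shaped like `.../repos/OWNER/REPO`.
fn repo_from_repository_url(url: &str) -> Option<String> {
    owner_repo(url.split("/repos/").nth(1)?)
}

/// Assumes a web URL on github.com shaped like `https://github.com/OWNER/REPO/...`.
fn repo_from_html_url(url: &str) -> Option<String> {
    owner_repo(url.strip_prefix("https://github.com/")?)
}

/// Sketch of the fallback order: repository_url first, then html_url.
fn extract_repo_from_item(repository_url: Option<&str>, html_url: Option<&str>) -> Option<String> {
    repository_url
        .and_then(repo_from_repository_url)
        .or_else(|| html_url.and_then(repo_from_html_url))
}
```

In this sketch the fallback would only run when base.repo.full_name and head.repo.full_name are both absent, so items that already carry repo info are unaffected.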
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| guards/github-guard/rust-guard/src/labels/helpers.rs | Adds URL-based fallback to extract issue/PR numbers and use it for enrichment. |
| guards/github-guard/rust-guard/src/labels/response_items.rs | Adds PR repo fallback extraction for search results missing base/head repo fields. |
| guards/github-guard/rust-guard/src/labels/tool_rules.rs | Adds repo-scope fallback for search tools using owner/repo from tool_args. |
| guards/github-guard/rust-guard/src/labels/mod.rs | Adds unit tests covering new URL/repo extraction behavior. |
| .github/workflows/repo-assist.lock.yml | Builds and uses a :local container image for end-to-end workflow validation. |
Comments suppressed due to low confidence (1)
guards/github-guard/rust-guard/src/labels/tool_rules.rs:252
- Same issue as search_issues: this branch scopes labels to s_repo_id but leaves baseline_scope and repo_private tied to the initial tool_args-derived repo context. If repo scope is extracted from the query (or tool_args omit owner/repo), ensure_integrity_baseline may strip the scoped integrity and private repos may not receive writer integrity. Recommend setting baseline_scope = s_repo_id.clone() when available and deriving repo_private from s_owner/s_repo for the private-writer check.
if !s_repo_id.is_empty() {
desc = format!("search_pull_requests:{}", s_repo_id);
secrecy =
apply_repo_visibility_secrecy(&s_owner, &s_repo, &s_repo_id, secrecy, ctx);
integrity = private_writer_integrity(&s_repo_id, repo_private, ctx);
  if !s_repo_id.is_empty() {
      desc = format!("search_issues:{}", s_repo_id);
      secrecy =
-         apply_repo_visibility_secrecy(&q_owner, &q_repo, &q_repo_id, secrecy, ctx);
-     integrity = private_writer_integrity(&q_repo_id, repo_private, ctx);
+         apply_repo_visibility_secrecy(&s_owner, &s_repo, &s_repo_id, secrecy, ctx);
+     integrity = private_writer_integrity(&s_repo_id, repo_private, ctx);
In this search_issues branch, integrity/secrecy are scoped to s_repo_id, but baseline_scope and repo_private are still derived from the top-level repo_id / tool_args owner+repo. If the repo scope comes from the query (or differs from tool_args), ensure_integrity_baseline can downgrade scoped integrity to none:*, and private repos may incorrectly get empty integrity. Consider setting baseline_scope = s_repo_id.clone() when present and computing repo_private using s_owner/s_repo for the private-writer decision.
This issue also appears on line 248 of the same file.
See below for a potential fix:
// Only reuse the top-level repo_private flag when the scoped repo_id matches.
let scoped_repo_private = if !s_repo_id.is_empty() && s_repo_id == repo_id {
repo_private
} else {
false
};
if !s_repo_id.is_empty() {
desc = format!("search_issues:{}", s_repo_id);
secrecy =
apply_repo_visibility_secrecy(&s_owner, &s_repo, &s_repo_id, secrecy, ctx);
integrity = private_writer_integrity(&s_repo_id, scoped_repo_private, ctx);
- name: Build local container image
  run: |
    # Install Rust toolchain for WASM guard build
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable
    source "$HOME/.cargo/env"
    rustup target add wasm32-wasip1
This workflow step installs Rust via curl https://sh.rustup.rs | sh and uses an unpinned stable toolchain. That makes the build less reproducible and increases supply-chain risk for the CI environment. Prefer a pinned toolchain version (e.g., via dtolnay/rust-toolchain or actions-rs/toolchain), and avoid piping curl directly into a shell (or verify the installer/signature).
Suggested change:
- - name: Build local container image
-   run: |
-     # Install Rust toolchain for WASM guard build
-     curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable
-     source "$HOME/.cargo/env"
-     rustup target add wasm32-wasip1
+ - name: Install Rust toolchain
+   uses: actions-rs/toolchain@v1
+   with:
+     toolchain: 1.76.0
+     target: wasm32-wasip1
+     default: true
+     profile: minimal
+ - name: Build local container image
+   run: |
When search_pull_requests tool_args lack owner/repo (only the query parameter is present), the guard couldn't determine repo privacy. This caused private repo items to be treated as public, skipping the automatic approved-level integrity boost and failing enrichment.

Changes:
- tool_rules.rs: check is_repo_private using the search query's repo when tool_args-based repo_private is None
- response_items.rs: extract repo from the query string for default_repo_private
- response_paths.rs: same query-based repo fallback for both PR and issue search response labeling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
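The query-based privacy fallback above could be sketched like this. The names repo_from_search_query and resolve_repo_private are hypothetical, and the privacy lookup is passed in as a closure standing in for the guard's is_repo_private, whose real signature is not shown in this PR.

```rust
/// Hypothetical sketch: find the first `repo:OWNER/NAME` qualifier in a search query.
fn repo_from_search_query(query: &str) -> Option<(String, String)> {
    query.split_whitespace().find_map(|token| {
        let spec = token.strip_prefix("repo:")?;
        let (owner, name) = spec.split_once('/')?;
        if owner.is_empty() || name.is_empty() || name.contains('/') {
            return None;
        }
        Some((owner.to_string(), name.to_string()))
    })
}

/// Use the tool_args-derived value when present; otherwise look up the
/// repo named in the query. `is_repo_private` is a stand-in for the real lookup.
fn resolve_repo_private<F>(from_tool_args: Option<bool>, query: &str, is_repo_private: F) -> Option<bool>
where
    F: Fn(&str, &str) -> Option<bool>,
{
    from_tool_args.or_else(|| {
        let (owner, name) = repo_from_search_query(query)?;
        is_repo_private(&owner, &name)
    })
}
```

Keeping the result as Option<bool> preserves the distinction between "known public" and "unknown", which matters here because the original bug was an unknown repo being treated as public.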
When search APIs return zero results ({"total_count":0} with no items
array), the response was incorrectly treated as a single data item via
the is_object() fallback, producing #unknown number and none integrity.
Similarly, when the MCP server returns a plain-text error message (not
JSON) in content[0].text, extract_mcp_response leaves the MCP wrapper
unchanged, and the wrapper was treated as a single data item.
Root cause: the single-item fallback in response_items.rs only checked
is_graphql_wrapper() but not these two additional wrapper types.
Fix:
- Add is_search_result_wrapper() to detect {total_count:N} objects
- Add is_mcp_text_wrapper() to detect MCP content wrappers with
non-JSON text
- Both single-item fallback locations now exclude these wrapper types
- Empty search results and text errors produce zero labeled items,
falling back to resource-level labels from tool_rules
Validated against CI run 23451003849 JSONL logs which showed:
- search_pull_requests: {total_count:0,incomplete_results:false}
- search_issues: {total_count:0,incomplete_results:false}
- list_issues page=2: cursor-based pagination text error
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
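The wrapper checks described in this commit can be illustrated with the sketch below. Everything except the function names is_search_result_wrapper and is_mcp_text_wrapper is an assumption: the Json enum is a minimal stand-in for whatever JSON value type the guard uses, and "non-JSON text" is approximated by a heuristic (the text does not start with `{` or `[`) rather than a real parse.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value; not the guard's actual type.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Bool(bool),
    Number(f64),
    Text(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

impl Json {
    fn get(&self, key: &str) -> Option<&Json> {
        match self {
            Json::Object(map) => map.get(key),
            _ => None,
        }
    }
}

/// Convenience constructor used only for building example values.
fn object(pairs: Vec<(&str, Json)>) -> Json {
    Json::Object(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// An object carrying a numeric `total_count` is a search envelope, not an item.
fn is_search_result_wrapper(value: &Json) -> bool {
    matches!(value.get("total_count"), Some(Json::Number(_)))
}

/// An MCP content wrapper whose first text entry does not look like JSON.
fn is_mcp_text_wrapper(value: &Json) -> bool {
    let first = match value.get("content") {
        Some(Json::Array(entries)) => entries.first(),
        _ => None,
    };
    match first.and_then(|entry| entry.get("text")) {
        // Heuristic assumption: JSON payloads start with `{` or `[`.
        Some(Json::Text(text)) => !text.trim_start().starts_with(|c: char| c == '{' || c == '['),
        _ => false,
    }
}

/// The single-item fallback should apply only to plain data objects.
fn is_single_data_item(value: &Json) -> bool {
    matches!(value, Json::Object(_))
        && !is_search_result_wrapper(value)
        && !is_mcp_text_wrapper(value)
}
```

Under this sketch, a {total_count: 0} response and a plain-text MCP error both yield zero labeled items, while an ordinary issue object still goes through the single-item path.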
Apply the same is_search_result_wrapper and is_mcp_text_wrapper guards to all remaining single-item fallback locations in response_items.rs:
- get_file_contents
- list_commits / get_commit
- list_gists / get_gist
- list_releases / get_latest_release / get_release_by_tag

These had unprotected is_object() fallbacks that would treat MCP text errors or empty search wrappers as data items, producing incorrect per-item labels.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

MCP text error messages (e.g., pagination guidance) and empty search results (total_count:0) contain no repository data; they are server-generated metadata. Previously these fell through to resource-level labels, which gave them 'none' integrity for public repos, causing DIFC filtering to block the agent from seeing helpful instructional messages.

Now the fallback in label_response detects server metadata via is_mcp_text_wrapper and is_search_result_wrapper(total_count==0) and labels it with approved:<scope> integrity so it passes through to the agent.

Also extends infer_scope_for_baseline to handle search_issues and search_pull_requests (previously only search_code), ensuring proper scope inference from repo:owner/name in search queries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The previous commit used raw repo-scoped tags (approved:github/gh-aw-mcpg), but the DIFC system uses exact tag matching. When the policy scope is owner-level (github/*), the agent's integrity uses 'approved:github', so repo-scoped tags don't match. Use writer_integrity(), which calls normalize_scope() to produce tags that match the policy scope token.

Also extend infer_scope_for_baseline to handle search_issues and search_pull_requests queries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
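The tag mismatch described above can be illustrated with a small sketch. The names normalize_scope and writer_integrity appear in the commit message, but the signatures below are hypothetical: here the policy scope is passed as a plain string such as github/*, whereas the real functions presumably read it from the guard context.

```rust
/// Hypothetical sketch: collapse a repo id to the owner when the policy
/// scope is owner-level (`OWNER/*`) and the repo belongs to that owner.
fn normalize_scope(repo_id: &str, policy_scope: &str) -> String {
    match policy_scope.strip_suffix("/*") {
        Some(owner) if repo_id.split('/').next() == Some(owner) => owner.to_string(),
        _ => repo_id.to_string(),
    }
}

/// Build the approved-level tag against the normalized scope so that
/// exact tag matching succeeds.
fn writer_integrity(repo_id: &str, policy_scope: &str) -> Vec<String> {
    vec![format!("approved:{}", normalize_scope(repo_id, policy_scope))]
}
```

With an owner-level policy of github/*, the repo github/gh-aw-mcpg produces approved:github, which is the tag the agent holds; with a repo-level policy the tag stays repo-scoped.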
Problem
Search results from the MCP gateway were getting filtered with none integrity because items lacked the number and base.repo.full_name fields that the guard needs for per-item integrity labeling and enrichment.
Symptoms:
- search_issues: items labeled issue:github/gh-aw-mcpg#unknown with integrity: ["none:github"]
- search_pull_requests: items labeled pr:#unknown with integrity: ["none"]
Root Cause
- extract_resource_number() only checked item.number; search result items from the MCP server may not include this field directly
- The response_items.rs PR handler only checked base.repo.full_name / head.repo.full_name; search results don't have this structure
- The search tool rules did not check owner/repo in tool_args when query string extraction failed
Changes
helpers.rs: URL-based fallback extraction
- extract_resource_number(): falls back to parsing html_url/url for a trailing number (e.g. .../issues/2093 → 2093)
- New extract_number_from_url() helper
- pr_integrity() / issue_integrity(): enrichment now uses the URL-based number fallback
response_items.rs: PR repo fallback
- PR items fall back to extract_repo_from_item() (parses repository_url, html_url) when base/head repo info is missing
tool_rules.rs: Search scope from tool_args
- search_issues and search_pull_requests now check owner/repo in tool_args as a fallback when query extraction fails
Tests
Validation
The repo-assist workflow is configured to build a local container image for end-to-end testing.