fix: URL-based fallback for search result number and repo extraction by lpcox · Pull Request #2379 · github/gh-aw-mcpg

lpcox · 2026-03-23T16:16:06Z

Problem

Search results from the MCP gateway were being filtered with `none` integrity because items lacked the `number` and `base.repo.full_name` fields that the guard needs for per-item integrity labeling and enrichment.

Symptoms:

search_issues: items labeled issue:github/gh-aw-mcpg#unknown with integrity: ["none:github"]
search_pull_requests: items labeled pr:#unknown with integrity: ["none"]

Root Cause

Missing number field: extract_resource_number() only checked item.number — search result items from the MCP server may not include this field directly
Missing repo for PRs: response_items.rs PR handler only checked base.repo.full_name / head.repo.full_name — search results don't have this structure
Blocked enrichment: Without a number, REST enrichment calls couldn't proceed, leaving items with empty integrity
No tool_args fallback: search tools didn't check owner/repo in tool_args when query string extraction failed

Changes

`helpers.rs` — URL-based fallback extraction

extract_resource_number(): Falls back to parsing html_url/url for trailing number (e.g. .../issues/2093 → 2093)
New extract_number_from_url() helper
pr_integrity() / issue_integrity(): Enrichment now uses URL-based number fallback
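
The fallback order described above can be sketched roughly as follows. This is a minimal, std-only illustration and not the merged code: the function names follow the PR text, but the signatures (plain `Option` arguments instead of a JSON item) and the parsing details are assumptions.

```rust
/// Hypothetical sketch: return the trailing numeric path segment of a URL,
/// e.g. ".../issues/2093" -> 2093. The real helper may parse differently.
fn extract_number_from_url(url: &str) -> Option<u64> {
    // Ignore any query string or fragment.
    let path = url.split(|c: char| c == '?' || c == '#').next()?;
    // Look at the last path segment, tolerating a trailing slash.
    let last = path.trim_end_matches('/').rsplit('/').next()?;
    if last.is_empty() || !last.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    last.parse::<u64>().ok()
}

/// Assumed signature: prefer the explicit `number` field, then `html_url`,
/// then `url`, mirroring the preference order named in the test list.
fn extract_resource_number(
    number: Option<u64>,
    html_url: Option<&str>,
    url: Option<&str>,
) -> Option<u64> {
    number
        .or_else(|| html_url.and_then(extract_number_from_url))
        .or_else(|| url.and_then(extract_number_from_url))
}
```

Returning `None` rather than a sentinel leaves the decision to emit `#unknown` with the caller, which is one plausible way to keep the label formatting in a single place.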

`response_items.rs` — PR repo fallback

PR items now fall back to extract_repo_from_item() (parses repository_url, html_url) when base/head repo info is missing
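
A rough sketch of what such a repo fallback might look like is shown here. The `SearchItem` struct and the `owner_repo_after` helper are hypothetical stand-ins, and the sketch assumes `repository_url` has the shape `https://api.github.com/repos/<owner>/<repo>` and `html_url` is hosted on `github.com`; the actual `extract_repo_from_item()` may differ.

```rust
/// Hypothetical stand-in for the two fields of a search result item
/// that the fallback reads.
struct SearchItem<'a> {
    repository_url: Option<&'a str>,
    html_url: Option<&'a str>,
}

/// Hypothetical helper: return "owner/repo" built from the two path
/// segments that follow `marker` in `url`.
fn owner_repo_after(url: &str, marker: &str) -> Option<String> {
    let rest = url.split_once(marker)?.1;
    let mut segments = rest.split('/').filter(|s| !s.is_empty());
    let owner = segments.next()?;
    let repo = segments.next()?;
    Some(format!("{}/{}", owner, repo))
}

/// Try `repository_url` first, then `html_url` (order taken from the PR text).
fn extract_repo_from_item(item: &SearchItem<'_>) -> Option<String> {
    item.repository_url
        .and_then(|u| owner_repo_after(u, "/repos/"))
        .or_else(|| item.html_url.and_then(|u| owner_repo_after(u, "github.com/")))
}
```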

`tool_rules.rs` — Search scope from tool_args

Both search_issues and search_pull_requests now check owner/repo in tool_args as fallback when query extraction fails
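
As an illustration of this resolution order, a simplified sketch might look like the following. The function names and the plain `Option<&str>` arguments are assumptions made for the example; the real code reads `owner`/`repo` from the tool_args JSON.

```rust
/// Hypothetical: find the first `repo:owner/name` qualifier in a search query.
fn repo_from_query(query: &str) -> Option<(String, String)> {
    query.split_whitespace().find_map(|token| {
        let spec = token.strip_prefix("repo:")?;
        let (owner, repo) = spec.split_once('/')?;
        if owner.is_empty() || repo.is_empty() {
            return None;
        }
        Some((owner.to_string(), repo.to_string()))
    })
}

/// Hypothetical: resolve the search scope from the query first, then fall
/// back to `owner`/`repo` supplied in tool_args.
fn search_scope(query: &str, arg_owner: Option<&str>, arg_repo: Option<&str>) -> Option<String> {
    if let Some((owner, repo)) = repo_from_query(query) {
        return Some(format!("{}/{}", owner, repo));
    }
    match (arg_owner, arg_repo) {
        (Some(o), Some(r)) if !o.is_empty() && !r.is_empty() => Some(format!("{}/{}", o, r)),
        _ => None,
    }
}
```

When neither source yields a repository, the sketch returns `None`, leaving the caller to keep its unscoped default labels.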

Tests

8 new tests (154 total, all passing)
URL-based number extraction (direct, html_url, api url, preference, unknown)
PR search with repository_url fallback (items + paths)
Issue search URL number fallback

Validation

repo-assist workflow configured to build local container image for end-to-end testing.

Search results from the MCP gateway were getting filtered with 'none' integrity because items lacked 'number' and 'base.repo.full_name' fields.

Changes:
- extract_resource_number: falls back to parsing html_url/url for trailing number segment when the number field is missing
- pr_integrity/issue_integrity: enrichment now uses URL-based number fallback, enabling REST enrichment for search result items
- response_items.rs: PR items fall back to extract_repo_from_item() (parses repository_url, html_url) when base/head repo info is missing
- tool_rules.rs: search_issues and search_pull_requests now check owner/repo in tool_args as fallback when query extraction fails
- Added 8 new tests (154 total, all passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Temporarily switch repo-assist to build :local from this branch to validate the search result filtering fixes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR addresses missing per-item integrity labeling for GitHub search results returned by the MCP gateway by adding fallbacks to derive issue/PR numbers and repository identity when key fields are absent.

Changes:

Add URL-based number extraction fallback (html_url / url) for issue/PR labeling and backend enrichment.
Add repository extraction fallback for PR search results when base/head repo info is missing.
Add tool_args (owner/repo) fallback for determining repo scope in search_issues / search_pull_requests tool labeling.
Update repo-assist lock workflow to build and run a local gateway container image for validation.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| guards/github-guard/rust-guard/src/labels/helpers.rs | Adds URL-based fallback to extract issue/PR numbers and use it for enrichment. |
| guards/github-guard/rust-guard/src/labels/response_items.rs | Adds PR repo fallback extraction for search results missing `base/head` repo fields. |
| guards/github-guard/rust-guard/src/labels/tool_rules.rs | Adds repo-scope fallback for search tools using `owner`/`repo` from tool_args. |
| guards/github-guard/rust-guard/src/labels/mod.rs | Adds unit tests covering new URL/repo extraction behavior. |
| .github/workflows/repo-assist.lock.yml | Builds and uses a `:local` container image for end-to-end workflow validation. |

Comments suppressed due to low confidence (1)

guards/github-guard/rust-guard/src/labels/tool_rules.rs:252

Same issue as search_issues: this branch scopes labels to s_repo_id but leaves baseline_scope and repo_private tied to the initial tool_args-derived repo context. If repo scope is extracted from the query (or tool_args omit owner/repo), ensure_integrity_baseline may strip the scoped integrity and private repos may not receive writer integrity. Recommend setting baseline_scope = s_repo_id.clone() when available and deriving repo_private from s_owner/s_repo for the private-writer check.

            if !s_repo_id.is_empty() {
                desc = format!("search_pull_requests:{}", s_repo_id);
                secrecy =
                    apply_repo_visibility_secrecy(&s_owner, &s_repo, &s_repo_id, secrecy, ctx);
                integrity = private_writer_integrity(&s_repo_id, repo_private, ctx);


Copilot · 2026-03-23T16:20:05Z

+            if !s_repo_id.is_empty() {
+                desc = format!("search_issues:{}", s_repo_id);
                secrecy =
-                    apply_repo_visibility_secrecy(&q_owner, &q_repo, &q_repo_id, secrecy, ctx);
-                integrity = private_writer_integrity(&q_repo_id, repo_private, ctx);
+                    apply_repo_visibility_secrecy(&s_owner, &s_repo, &s_repo_id, secrecy, ctx);
+                integrity = private_writer_integrity(&s_repo_id, repo_private, ctx);


In this search_issues branch, integrity/secrecy are scoped to s_repo_id, but baseline_scope and repo_private are still derived from the top-level repo_id / tool_args owner+repo. If the repo scope comes from the query (or differs from tool_args), ensure_integrity_baseline can downgrade scoped integrity to none:*, and private repos may incorrectly get empty integrity. Consider setting baseline_scope = s_repo_id.clone() when present and computing repo_private using s_owner/s_repo for the private-writer decision.

This issue also appears on line 248 of the same file.

See below for a potential fix:

            // Only reuse the top-level repo_private flag when the scoped repo_id matches.
            let scoped_repo_private = if !s_repo_id.is_empty() && s_repo_id == repo_id {
                repo_private
            } else {
                false
            };
            if !s_repo_id.is_empty() {
                desc = format!("search_issues:{}", s_repo_id);
                secrecy =
                    apply_repo_visibility_secrecy(&s_owner, &s_repo, &s_repo_id, secrecy, ctx);
                integrity = private_writer_integrity(&s_repo_id, scoped_repo_private, ctx);

Copilot · 2026-03-23T16:20:07Z

+      - name: Build local container image
+        run: |
+          # Install Rust toolchain for WASM guard build
+          curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable
+          source "$HOME/.cargo/env"
+          rustup target add wasm32-wasip1
+


This workflow step installs Rust via curl https://sh.rustup.rs | sh and uses an unpinned stable toolchain. That makes the build less reproducible and increases supply-chain risk for the CI environment. Prefer using a pinned toolchain version (e.g., via dtolnay/rust-toolchain or actions-rs/toolchain) and avoid piping curl directly into a shell (or verify the installer/signature).

Suggested change

-      - name: Build local container image
-        run: |
-          # Install Rust toolchain for WASM guard build
-          curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable
-          source "$HOME/.cargo/env"
-          rustup target add wasm32-wasip1
+      - name: Install Rust toolchain
+        uses: actions-rs/toolchain@v1
+        with:
+          toolchain: 1.76.0
+          target: wasm32-wasip1
+          default: true
+          profile: minimal
+      - name: Build local container image
+        run: |

When search_pull_requests tool_args lack owner/repo (only query parameter), the guard couldn't determine repo privacy. This caused private repo items to be treated as public, skipping the automatic approved-level integrity boost and failing enrichment.

Changes:
- tool_rules.rs: check is_repo_private using search query's repo when tool_args-based repo_private is None
- response_items.rs: extract repo from query string for default_repo_private
- response_paths.rs: same query-based repo fallback for both PR and issue search response labeling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When search APIs return zero results ({"total_count":0} with no items array), the response was incorrectly treated as a single data item via the is_object() fallback, producing #unknown number and none integrity.

Similarly, when the MCP server returns a plain-text error message (not JSON) in content[0].text, extract_mcp_response leaves the MCP wrapper unchanged, and the wrapper was treated as a single data item.

Root cause: the single-item fallback in response_items.rs only checked is_graphql_wrapper() but not these two additional wrapper types.

Fix:
- Add is_search_result_wrapper() to detect {total_count:N} objects
- Add is_mcp_text_wrapper() to detect MCP content wrappers with non-JSON text
- Both single-item fallback locations now exclude these wrapper types
- Empty search results and text errors produce zero labeled items, falling back to resource-level labels from tool_rules

Validated against CI run 23451003849 JSONL logs which showed:
- search_pull_requests: {total_count:0,incomplete_results:false}
- search_issues: {total_count:0,incomplete_results:false}
- list_issues page=2: cursor-based pagination text error

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
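
To make the wrapper checks in this commit message concrete, here is a small std-only sketch. The `Json` enum is a hypothetical stand-in for whatever JSON value type the guard uses, the "non-JSON text" test is a crude heuristic chosen for the example, and `is_single_item` is an invented name for the guarded fallback condition; the real predicates may be stricter or looser.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal JSON value, used only to keep this sketch std-only.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Convenience constructor for object values in examples.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Obj(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Assumed rule: an object carrying a numeric `total_count` but no `items` array.
fn is_search_result_wrapper(v: &Json) -> bool {
    match v {
        Json::Obj(m) => {
            matches!(m.get("total_count"), Some(Json::Num(_)))
                && !matches!(m.get("items"), Some(Json::Arr(_)))
        }
        _ => false,
    }
}

/// Assumed rule: `content[0].text` exists and does not look like JSON
/// (heuristic: it does not start with `{` or `[`).
fn is_mcp_text_wrapper(v: &Json) -> bool {
    let Json::Obj(m) = v else { return false };
    let Some(Json::Arr(content)) = m.get("content") else { return false };
    let Some(Json::Obj(first)) = content.first() else { return false };
    match first.get("text") {
        Some(Json::Str(text)) => {
            let text = text.trim_start();
            !(text.starts_with('{') || text.starts_with('['))
        }
        _ => false,
    }
}

/// The single-item fallback only fires for objects that are not wrappers.
/// (The real code also excludes GraphQL wrappers, omitted here.)
fn is_single_item(v: &Json) -> bool {
    matches!(v, Json::Obj(_)) && !is_search_result_wrapper(v) && !is_mcp_text_wrapper(v)
}
```

With these checks in place, an empty search response and a plain-text MCP error both fail `is_single_item`, so neither would be labeled as a data item.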

Apply the same is_search_result_wrapper and is_mcp_text_wrapper guards to all remaining single-item fallback locations in response_items.rs:
- get_file_contents
- list_commits / get_commit
- list_gists / get_gist
- list_releases / get_latest_release / get_release_by_tag

These had unprotected is_object() fallbacks that would treat MCP text errors or empty search wrappers as data items, producing incorrect per-item labels.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

MCP text error messages (e.g., pagination guidance) and empty search results (total_count:0) contain no repository data — they are server-generated metadata. Previously these fell through to resource-level labels which gave them 'none' integrity for public repos, causing DIFC filtering to block the agent from seeing helpful instructional messages.

Now the fallback in label_response detects server metadata via is_mcp_text_wrapper and is_search_result_wrapper(total_count==0) and labels them with approved:<scope> integrity so they pass through to the agent.

Also extends infer_scope_for_baseline to handle search_issues and search_pull_requests (previously only search_code), ensuring proper scope inference from repo:owner/name in search queries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The previous commit used raw repo-scoped tags (approved:github/gh-aw-mcpg) but the DIFC system uses exact tag matching. When the policy scope is owner-level (github/*), the agent's integrity uses 'approved:github' so repo-scoped tags don't match.

Use writer_integrity() which calls normalize_scope() to produce tags that match the policy scope token. Also extend infer_scope_for_baseline to handle search_issues and search_pull_requests queries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
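
The mismatch described in this commit message can be illustrated with a toy model of exact tag matching. Everything below is hypothetical: the signatures of `normalize_scope`, `approved_tag`, and `has_tag` are invented for the example and only mimic the behavior the message describes (a repo-scoped id collapsing to the owner token under an owner-level policy such as `github/*`).

```rust
/// Hypothetical: reduce a repo id to the token used by the policy scope.
/// Under an owner-level policy ("owner/*") a matching repo collapses to "owner".
fn normalize_scope(repo_id: &str, policy_scope: &str) -> String {
    match policy_scope.strip_suffix("/*") {
        Some(owner) if repo_id.split('/').next() == Some(owner) => owner.to_string(),
        _ => repo_id.to_string(),
    }
}

/// Hypothetical: build an approved-level tag from the normalized scope.
fn approved_tag(repo_id: &str, policy_scope: &str) -> String {
    format!("approved:{}", normalize_scope(repo_id, policy_scope))
}

/// Exact tag matching, as the commit message says the DIFC system uses.
fn has_tag(item_tags: &[String], required: &str) -> bool {
    item_tags.iter().any(|tag| tag == required)
}
```

Under this model a raw `approved:github/gh-aw-mcpg` tag does not satisfy a requirement of `approved:github`, while the normalized tag does, which is the failure and the fix the message describes.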

lpcox and others added 2 commits March 23, 2026 09:14

ci: build local container image for search filtering test

74ebdb5

Temporarily switch repo-assist to build :local from this branch to validate the search result filtering fixes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 23, 2026 16:16

Copilot started reviewing on behalf of lpcox March 23, 2026 16:16

Copilot AI reviewed Mar 23, 2026

lpcox and others added 5 commits March 23, 2026 10:27

lpcox merged commit 6859dff into main Mar 23, 2026
9 checks passed

lpcox deleted the fix/search-result-filtering branch March 23, 2026 21:01

This was referenced Mar 23, 2026

Smoke Test: Copilot - 23459973971 #2385

Closed

Smoke Test: Copilot - 23461372472 #2389

Closed

Guards and Integrity: tracking issue #1711

Open

lpcox merged 7 commits into main from fix/search-result-filtering
