guard: GraphQL response labeling, backend enrichment, and sub-method skip#2335
guard: GraphQL response labeling, backend enrichment, and sub-method skip#2335
Conversation
The guard's response labeling code only recognized REST-like formats
(arrays, {items: [...]}) but not GraphQL nested responses like
data.repository.pullRequests.nodes. This caused the proxy to apply
coarse-grained filtering (blocking entire responses) instead of
per-item integrity filtering for gh CLI calls.
Changes:
- extract_items_array() now traverses GraphQL nested paths:
data.repository.{issues,pullRequests,discussions,...}.nodes
and data.search.nodes
- extract_graphql_nodes() extracts collection arrays from GraphQL
- extract_graphql_single_object() extracts singular objects
(data.repository.issue, data.repository.pullRequest)
- is_graphql_wrapper() prevents treating {data:...} as a single item
- response_items.rs PR and issue handlers check GraphQL formats
before the single-object fallback
- 9 new tests covering GraphQL collection, single object, search,
path labeling, and wrapper detection
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When the GitHub MCP Server omits author_association from list responses (e.g. list_issues via GraphQL, list_pull_requests via REST), the guard now fetches the individual item via a backend call to obtain the correct author_association value. This prevents incorrectly assigning 'none' integrity to repo members/collaborators. Changes: - Add has_author_association() helper to detect absent field - issue_integrity(): enrich via get_issue_author_association when field is missing on public repos (private repos already get writer integrity) - pr_integrity(): enrich via get_pull_request_facts when field is missing, also backfills is_forked and is_merged status - Add 6 tests for has_author_association and enrichment scenarios Upstream fix filed: github/github-mcp-server#2250 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- response_items.rs/response_paths.rs: Skip per-item response labeling for pull_request_read and issue_read sub-methods (get_check_runs, get_files, get_comments, etc.) that return non-PR/issue objects. Resource-level labels from tool_rules provide correct integrity. - tool_rules.rs: Extract repo scope from search_issues and search_pull_requests query strings using extract_repo_info_from_search_query. When repo:owner/repo is present, apply scoped secrecy and integrity instead of deferring entirely to response labeling. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When the MCP backend returns isError=true (e.g. 404 Not Found), skip per-item response labeling in both response_items.rs and response_paths.rs. Error responses don't contain valid domain objects and would be mislabeled as pr:#unknown or issue:#unknown with none integrity, causing incorrect DIFC filtering. Resource-level labels from tool_rules still apply to error responses. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds GraphQL-aware response labeling to the GitHub guard so PR/issue results can be labeled correctly even when returned in GraphQL wrapper shapes, and adjusts workflow wiring to run against a locally built MCPG container.
Changes:
- Improve repo scoping for
search_issues/search_pull_requestsby parsingrepo:owner/repofrom the search query. - Extend response labeling to handle GraphQL collection/single-object shapes and skip per-item labeling for
*_readsub-methods and error responses. - Add backend “enrichment” in
pr_integrity/issue_integritywhenauthor_associationis missing, and update the repo-assist workflow to build/run a local MCPG image.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
guards/github-guard/rust-guard/src/labels/tool_rules.rs |
Adds repo-qualified search scoping for issues/PR search tools. |
guards/github-guard/rust-guard/src/labels/response_paths.rs |
Adds error-response skip + GraphQL items-path support + sub-method skip logic. |
guards/github-guard/rust-guard/src/labels/response_items.rs |
Adds GraphQL node/single-object extraction + sub-method skip logic. |
guards/github-guard/rust-guard/src/labels/mod.rs |
Re-exports new helpers and adds tests for GraphQL extraction helpers. |
guards/github-guard/rust-guard/src/labels/helpers.rs |
Implements GraphQL extraction helpers and backend enrichment for missing author_association. |
.github/workflows/repo-assist.lock.yml |
Switches repo-assist to build and run gh-aw-mcpg:local instead of a pinned image. |
Comments suppressed due to low confidence (4)
.github/workflows/repo-assist.lock.yml:421
- Switching the proxy container from a pinned release image to
ghcr.io/github/gh-aw-mcpg:localmakes this workflow dependent on the local docker build step and removes the reproducibility benefits of the lock file. If the intent is to run published code, keep the pinned tag; if the intent is to test PR builds, consider limiting this to PR-triggered runs (not scheduled) or make it configurable via workflow inputs.
docker run -d --name awmg-proxy --network host \
-e GH_TOKEN \
-e DEBUG='*' \
-v "$PROXY_LOG_DIR:$PROXY_LOG_DIR" \
ghcr.io/github/gh-aw-mcpg:local proxy \
--policy "$POLICY" \
.github/workflows/repo-assist.lock.yml:517
- The image pre-pull step no longer includes the MCPG image; combined with the shift to a locally built MCPG image, this reduces cache usefulness and can increase runtime variability. If the workflow should rely on released images, keep MCPG in the pre-pull list; if it should rely on local builds, consider adjusting the step name/documentation and ensuring scheduled runs aren’t impacted.
- name: Download container images
run: bash ${RUNNER_TEMP}/gh-aw/actions/download_docker_images.sh ghcr.io/github/gh-aw-firewall/agent:0.24.3 ghcr.io/github/gh-aw-firewall/api-proxy:0.24.3 ghcr.io/github/gh-aw-firewall/squid:0.24.3 ghcr.io/github/github-mcp-server:v0.32.0 node:lts-alpine
guards/github-guard/rust-guard/src/labels/helpers.rs:985
pr_integritynow does a backendget_pull_requestcall per item whenauthor_associationis missing. WithMAX_ITEMS_PER_RESPONSE = 100, this can turn a single list/search response into up to 100 extra GitHub API calls, increasing latency and risk of rate limiting. Consider adding caching for PR facts (keyed by repo+number), lowering the max for enrichment, or only enriching whenis_forked/merge state is unknown and the integrity outcome would otherwise remain empty.
// Backend enrichment: when author_association is absent from the response
// (e.g. GitHub MCP Server omits it from MinimalPullRequest), fetch the
// individual PR via REST to obtain the correct association, fork status,
// and merge status.
if integrity.is_empty() && !has_author_association(item) && !repo_private {
if let Some(number) = item.get("number").and_then(|v| v.as_u64()) {
let (owner, repo) = repo_full_name.split_once('/').unwrap_or(("", ""));
if !owner.is_empty() && !repo.is_empty() {
let number_str = number.to_string();
if let Some(facts) =
super::backend::get_pull_request_facts(owner, repo, &number_str)
{
crate::log_debug(&format!(
"[integrity] pr:{}#{} enriched: author_association={:?}, is_forked={:?}, is_merged={}",
repo_full_name, number, facts.author_association, facts.is_forked, facts.is_merged
));
.github/workflows/repo-assist.lock.yml:865
MCP_GATEWAY_DOCKER_COMMANDis updated to runghcr.io/github/gh-aw-mcpg:local, coupling gateway startup to the earlier local docker build. This can be fragile on reruns/cached steps and makes the lock workflow non-reproducible. Prefer the pinned release image for scheduled/default runs, or gate the local image path behind an explicit debug/PR-only switch.
export GH_AW_ENGINE="copilot"
export MCP_GATEWAY_DOCKER_COMMAND='docker run -i --rm --network host -v /var/run/docker.sock:/var/run/docker.sock -e MCP_GATEWAY_PORT -e MCP_GATEWAY_DOMAIN -e MCP_GATEWAY_API_KEY -e MCP_GATEWAY_PAYLOAD_DIR -e MCP_GATEWAY_PAYLOAD_SIZE_THRESHOLD -e DEBUG -e MCP_GATEWAY_LOG_DIR -e GH_AW_MCP_LOG_DIR -e GH_AW_SAFE_OUTPUTS -e GH_AW_SAFE_OUTPUTS_CONFIG_PATH -e GH_AW_SAFE_OUTPUTS_TOOLS_PATH -e GH_AW_ASSETS_BRANCH -e GH_AW_ASSETS_MAX_SIZE_KB -e GH_AW_ASSETS_ALLOWED_EXTS -e DEFAULT_BRANCH -e GITHUB_MCP_SERVER_TOKEN -e GITHUB_MCP_GUARD_MIN_INTEGRITY -e GITHUB_MCP_GUARD_REPOS -e GITHUB_REPOSITORY -e GITHUB_SERVER_URL -e GITHUB_SHA -e GITHUB_WORKSPACE -e GITHUB_TOKEN -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e GITHUB_JOB -e GITHUB_ACTION -e GITHUB_EVENT_NAME -e GITHUB_EVENT_PATH -e GITHUB_ACTOR -e GITHUB_ACTOR_ID -e GITHUB_TRIGGERING_ACTOR -e GITHUB_WORKFLOW -e GITHUB_WORKFLOW_REF -e GITHUB_WORKFLOW_SHA -e GITHUB_REF -e GITHUB_REF_NAME -e GITHUB_REF_TYPE -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GH_AW_SAFE_OUTPUTS_PORT -e GH_AW_SAFE_OUTPUTS_API_KEY -v /tmp/gh-aw/mcp-payloads:/tmp/gh-aw/mcp-payloads:rw -v /opt:/opt:ro -v /tmp:/tmp:rw -v '"${GITHUB_WORKSPACE}"':'"${GITHUB_WORKSPACE}"':rw ghcr.io/github/gh-aw-mcpg:local'
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Backend enrichment: when author_association is absent from the response | ||
| // (e.g. GitHub MCP Server's list_issues GraphQL path omits it), fetch the | ||
| // individual issue via REST to obtain the correct value. This avoids | ||
| // incorrectly assigning "none" integrity to members/collaborators. | ||
| if integrity.is_empty() && !has_author_association(item) && !repo_private { | ||
| if let Some(number) = item.get("number").and_then(|v| v.as_u64()) { | ||
| let (owner, repo) = repo_full_name.split_once('/').unwrap_or(("", "")); | ||
| if !owner.is_empty() && !repo.is_empty() { | ||
| let number_str = number.to_string(); | ||
| if let Some(association) = | ||
| super::backend::get_issue_author_association(owner, repo, &number_str) | ||
| { | ||
| crate::log_debug(&format!( | ||
| "[integrity] issue:{}#{} enriched author_association='{}'", | ||
| repo_full_name, number, association | ||
| )); | ||
| // Re-check trusted bot status with enriched login | ||
| let enriched_floor = | ||
| author_association_floor_from_str(repo_full_name, Some(&association), ctx); | ||
| integrity = | ||
| max_integrity(repo_full_name, integrity, enriched_floor, ctx); | ||
| } else { |
There was a problem hiding this comment.
issue_integrity adds per-item backend enrichment via get_issue_author_association when author_association is missing. This can create an N+1 pattern (up to MAX_ITEMS_PER_RESPONSE extra backend calls) on list/search issue responses. Consider caching associations (repo+issue number) or otherwise bounding/avoiding enrichment to reduce latency and rate-limit exposure.
This issue also appears on line 970 of the same file.
| # NOTE: Local container build kept for debugging. Uncomment to test unpublished changes. | ||
| # - name: Build local MCPG container (debugging only) | ||
| # run: | | ||
| # rustup target add wasm32-wasip1 | ||
| # cd guards/github-guard/rust-guard && ./build.sh && cd ../../.. | ||
| # docker build . -t ghcr.io/github/gh-aw-mcpg:local | ||
| - name: Build local MCPG container (debugging only) | ||
| run: | | ||
| rustup target add wasm32-wasip1 | ||
| cd guards/github-guard/rust-guard && ./build.sh && cd ../../.. | ||
| docker build . -t ghcr.io/github/gh-aw-mcpg:local |
There was a problem hiding this comment.
This step is labeled “debugging only” and the preceding note says it should be uncommented for debugging, but it is now always enabled. Building the MCPG container in this scheduled workflow will add significant runtime and introduces additional failure modes (toolchain install, docker build). If this is meant for occasional debugging, gate it behind an input/flag and keep the default path using the pinned published image.
This issue also appears in the following locations of the same file:
- line 416
- line 516
- line 864
| "[integrity] issue:{}#{} enriched author_association='{}'", | ||
| repo_full_name, number, association | ||
| )); | ||
| // Re-check trusted bot status with enriched login |
There was a problem hiding this comment.
The comment // Re-check trusted bot status with enriched login is misleading here: the enrichment path only fetches author_association (no author login), so there is no bot re-check happening. Please update/remove the comment to reflect what the code actually does.
| // Re-check trusted bot status with enriched login | |
| // Recompute integrity floor using enriched author_association |
| desc = format!("search_issues:{}", q_repo_id); | ||
| secrecy = | ||
| apply_repo_visibility_secrecy(&q_owner, &q_repo, &q_repo_id, secrecy, ctx); | ||
| integrity = private_writer_integrity(&q_repo_id, repo_private, ctx); |
There was a problem hiding this comment.
In the repo-qualified search path, repo_private is computed from extract_repo_info(tool_args) (which is typically empty for search tools), so private_writer_integrity(&q_repo_id, repo_private, ctx) will never grant writer integrity even when the repo:owner/repo target is private. Compute repo visibility for q_owner/q_repo and pass that result into private_writer_integrity (or return the privacy flag from apply_repo_visibility_secrecy). Also update baseline_scope to q_repo_id when you emit repo-scoped integrity so ensure_integrity_baseline doesn’t downgrade labels.
| integrity = private_writer_integrity(&q_repo_id, repo_private, ctx); | |
| // Derive repo privacy from the applied secrecy labels for this repo scope | |
| let repo_private_for_search = | |
| secrecy.contains(&policy_private_scope_label(&q_repo_id, ctx)); | |
| integrity = private_writer_integrity(&q_repo_id, repo_private_for_search, ctx); | |
| // Ensure the integrity baseline is applied at the repo scope for this query | |
| baseline_scope = q_repo_id.clone(); |
| desc = format!("search_pull_requests:{}", q_repo_id); | ||
| secrecy = | ||
| apply_repo_visibility_secrecy(&q_owner, &q_repo, &q_repo_id, secrecy, ctx); | ||
| integrity = private_writer_integrity(&q_repo_id, repo_private, ctx); |
There was a problem hiding this comment.
Same issue as search_issues: repo_private here is derived from tool_args owner/repo (usually absent for search), so private_writer_integrity(&q_repo_id, repo_private, ctx) won’t grant writer integrity for private repos even when the query includes repo:owner/repo. Determine privacy for q_owner/q_repo and set baseline_scope = q_repo_id when applying repo-scoped integrity.
| desc = format!("search_pull_requests:{}", q_repo_id); | |
| secrecy = | |
| apply_repo_visibility_secrecy(&q_owner, &q_repo, &q_repo_id, secrecy, ctx); | |
| integrity = private_writer_integrity(&q_repo_id, repo_private, ctx); | |
| // Determine privacy for the repository referenced in the query and | |
| // scope integrity baselines to that repository. | |
| let (_, _, _, q_repo_private) = extract_repo_info(&q_owner, &q_repo, ctx); | |
| desc = format!("search_pull_requests:{}", q_repo_id); | |
| secrecy = | |
| apply_repo_visibility_secrecy(&q_owner, &q_repo, &q_repo_id, secrecy, ctx); | |
| baseline_scope = q_repo_id.clone(); | |
| integrity = private_writer_integrity(&q_repo_id, q_repo_private, ctx); |
Switch repo-assist.lock.yml from local container build to published `ghcr.io/github/gh-aw-mcpg:v0.1.25` image (includes GraphQL response labeling, backend enrichment, sub-method skip, and error response handling fixes from #2335). ### Changes - Comment out local container build step (kept for debugging) - Update proxy image: `:local` → `:v0.1.25` - Update gateway image: `:local` → `:v0.1.25`
## Summary Switch repo-assist from local container build to published image `v0.1.26`, which includes all DIFC integrity filtering fixes: - **Backend enrichment tool name fix** (PR #2340): corrected `get_pull_request` → `pull_request_read` and added missing `method: "get"` parameters - **Sub-method response labeling skip** (PR #2335): prevents mislabeling `get_check_runs`/`get_comments` sub-method responses - **isError response skip** (PR #2335): avoids labeling 404/error responses as domain objects - **Search scope extraction** (PR #2335): enables filtering for `search_issues`/`search_pull_requests` ## Changes - Comment out local container build step (kept as comment for future debugging) - Update proxy image tag: `:local` → `:v0.1.26` - Update MCP gateway image tag: `:local` → `:v0.1.26` ## Validation v0.1.26 was validated with **0 DIFC-FILTERED events** in run [23412918230](https://github.com/github/gh-aw-mcpg/actions/runs/23412918230).
Summary
Fixes integrity filtering bugs in the GitHub guard that caused false positives when processing responses from the GitHub MCP Server.
Problem
The GitHub MCP Server returns responses in different formats depending on the API used (GraphQL vs REST), and several fields critical for integrity labeling are missing:
list_issuesandlist_pull_requestsreturn GraphQL-shaped responses ({issues: [...]},{pullRequests: [...]}) that the guard's response labeling didn't recognizeauthor_association— GraphQLIssueFragmentdoesn't requestauthorAssociation;MinimalPullRequestlacks the field entirely (github/github-mcp-server#2250)pull_request_read(method: get_check_runs)returns check run objects, but the guard tried to parse them as PRs →pr:#unknownwithnoneintegrity → incorrectly filteredpr:#unknown→ filteredFixes
GraphQL response format support (
response_items.rs,response_paths.rs) — Handle{issues: [...]},{pullRequests: [...]}, and GraphQL single-node formats alongside existing REST array formatBackend enrichment (
helpers.rs) — Whenauthor_associationis missing on public repos, the guard now makes backend calls to fetch it:issue_read(REST endpoint includes the field)get_pull_request_factsfor association + fork/merge statusSub-method skip (
response_items.rs,response_paths.rs) — Forpull_request_readandissue_readsub-methods (get_check_runs,get_files,get_comments, etc.), skip per-item response labeling. These return non-PR/issue objects. Resource-level labels fromtool_rulesprovide correct integrity.Error response skip (
response_items.rs,response_paths.rs) — WhenisError=true, skip response labeling entirely. Resource-level labels handle error cases.Search scope extraction (
tool_rules.rs) —search_issuesandsearch_pull_requestsnow extractrepo:owner/repofrom query strings to apply scoped secrecy/integrity labels.Validation
Tested via 5 sequential repo-assist CI runs:
Related
author_associationin GraphQL and REST responsesNote
repo-assist.lock.ymlis temporarily set to build a local container for testing. Must be reverted before merge.