fix(triage-panel): use search_issues with -label:status/triaged for sweep #1194

Merged
danielmeppiel merged 1 commit into main from fix/triage-search-issues-untriaged on May 7, 2026

Conversation

@danielmeppiel
Collaborator

TL;DR

#1193 (paginate list_issues oldest-first) did not fix the throughput problem. The first dispatch on the fixed main (run 25514877383) noop'd the sweep — 0 issues triaged — while 11 untriaged issues sat in the queue. This PR replaces the prose-driven pagination loop with a server-side search_issues query that returns only untriaged candidates.

Why #1193 failed (RCA from MCP logs)

Two compounding effects sabotaged the pagination approach:

1. MCP gateway DIFC filter silently drops non-collaborator issues mid-page.
/tmp/fixed-agent/mcp-logs/rpc-messages.jsonl shows 17 of 30 returned issues were filtered with messages like:

Resource 'issue:microsoft/apm#20' has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

The agent saw 13 of 30 items even though the underlying API returned 30. This breaks any "did I get a full page?" heuristic.

2. The LLM hallucinated hasNextPage: false.
The actual MCP response (75 KB into the payload) clearly contained:

pageInfo: {"hasNextPage":true, "hasPreviousPage":false, "startCursor":"...", "endCursor":"..."}

But the agent's final output said:

"Zero untriaged candidates found across the full open-issue queue (single page, hasNextPage implicitly false with 13 of 30 requested)."

It ignored the explicit field and invented an inference from the short count. Prose pagination instructions cannot recover from this — the LLM is unreliable at multi-page loops driven by deeply-nested JSON fields.
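
The two effects can be reproduced in miniature. The sketch below is illustrative only: `gateway_filter`, `looks_like_last_page`, `has_next_page`, and the sample payload are hypothetical names and data, not the MCP gateway's actual API or response shape. It shows why an item-count heuristic gives the wrong answer once 17 of 30 items are dropped after the API responds, while reading `pageInfo.hasNextPage` directly does not.

```python
import json

PER_PAGE = 30

# Hypothetical raw response: 30 issues, 17 of them below the required
# integrity level, plus a pageInfo block (cursors omitted).
raw_response = json.dumps({
    "issues": [
        {"number": n, "integrity": "approved" if n < 13 else "unapproved"}
        for n in range(30)
    ],
    "pageInfo": {"hasNextPage": True, "hasPreviousPage": False},
})


def gateway_filter(payload: dict) -> dict:
    """Mimic a filter that drops low-integrity items but keeps pageInfo."""
    kept = [i for i in payload["issues"] if i["integrity"] == "approved"]
    return {**payload, "issues": kept}


def looks_like_last_page(payload: dict) -> bool:
    """The flawed inference: a short page means there is no next page."""
    return len(payload["issues"]) < PER_PAGE


def has_next_page(payload: dict) -> bool:
    """The explicit signal: read the field instead of inferring it."""
    return bool(payload["pageInfo"]["hasNextPage"])


visible = gateway_filter(json.loads(raw_response))
print(len(visible["issues"]))         # 13 items survive the filter
print(looks_like_last_page(visible))  # True: the heuristic wrongly stops
print(has_next_page(visible))         # True: the field says keep going
```

Deterministic code reads the field correctly every time. The problem described above is that the loop was driven by prose instructions to an LLM rather than by code like this.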

Fix

Replace list_issues-with-prose-pagination with search_issues using a server-side filter:

search_issues(
  query: "repo:microsoft/apm is:open is:issue -label:status/triaged sort:created-asc",
  perPage: 30,
)

GitHub Search supports -label: negation, so the response contains only the candidates we want. One page = one tool call = no LLM reasoning about pagination. sort:created-asc returns oldest-first; the first 30 results are the correct batch.
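
To try the same filter outside the MCP tool, the query string can be assembled and URL-encoded as sketched below. `build_untriaged_query` and `search_url` are hypothetical helpers, not part of this PR. The sketch assumes GitHub's REST endpoint `GET /search/issues` with its `q` and `per_page` parameters, and it only builds the URL without sending a request. Whether that endpoint honours the in-query `sort:` qualifier exactly as the MCP tool does is an assumption; the endpoint also accepts separate `sort` and `order` parameters.

```python
from urllib.parse import urlencode


def build_untriaged_query(repo: str, triaged_label: str = "status/triaged") -> str:
    """Assemble the search qualifiers used by the sweep."""
    qualifiers = [
        f"repo:{repo}",
        "is:open",
        "is:issue",
        f"-label:{triaged_label}",  # negation: exclude already-triaged issues
        "sort:created-asc",         # oldest first
    ]
    return " ".join(qualifiers)


def search_url(query: str, per_page: int = 30) -> str:
    """Build (but do not send) a REST search request URL."""
    return "https://api.github.com/search/issues?" + urlencode(
        {"q": query, "per_page": per_page}
    )


query = build_untriaged_query("microsoft/apm")
print(query)
print(search_url(query))
```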

What changed

  • .github/workflows/triage-panel.md: the SCHEDULED_SWEEP gather step is rewritten to use search_issues and to explain why (~33 lines deleted, ~28 added).
  • The status/triaged-label-exclusion bullet in the in-reasoning filter list is removed (now redundant: the search query already excludes those issues).
  • The "After dropping ... sort by createdAt ascending" line is removed (redundant: the query already sorts).
  • Spam-shape, bot, locked, template, and per-author-quota filters all unchanged.
  • CHANGELOG.md updated under [Unreleased] / Fixed.

Lockfile

triage-panel.lock.yml uses {{#runtime-import .github/workflows/triage-panel.md}} (line 280), so .md edits take effect on the next workflow run with no recompile needed.

Validation

  • uv run --extra dev ruff check src/ tests/ → All checks passed
  • uv run --extra dev ruff format --check src/ tests/ → 705 files already formatted

Manual validation will happen on the next scheduled cron tick (or a manual dispatch). Expected behavior: search_issues returns the 11 untriaged candidates oldest-first; the agent triages up to 10 of them and rolls the remaining 1 over to tomorrow's sweep.
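
The expected split is simple enough to sketch. The cap of 10 per sweep is taken from the expected behavior stated above; `split_batch` and the issue numbers are hypothetical, and the unchanged filters (spam-shape, bot, locked, template, per-author quota) are ignored here.

```python
def split_batch(candidates: list[int], cap: int = 10) -> tuple[list[int], list[int]]:
    """Split oldest-first candidates into today's batch and the rollover."""
    return candidates[:cap], candidates[cap:]


# Eleven placeholder issue numbers, already sorted oldest-first.
candidates = list(range(101, 112))
today, tomorrow = split_batch(candidates)
print(len(today), len(tomorrow))  # 10 1
```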

Why this is robust

  • Server-side filter = no client reasoning about which issues to skip.
  • No pagination loop = no hasNextPage inference for the LLM to get wrong.
  • DIFC-resilient = even if some search results are integrity-filtered client-side, the agent's only decision is "process whichever survive" rather than "should I fetch more?" — short page = small sweep, not a noop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…weep

The previous SCHEDULED_SWEEP gather step instructed the agent to call
list_issues paginated oldest-first and filter status/triaged in its
reasoning step. Empirical observation of run 25514877383 showed this
fails for two compounding reasons:

1. The MCP gateway DIFC integrity filter silently drops issues from
   non-collaborator authors *after* the API response returns. Result:
   the agent sees fewer items than perPage requested even though
   pageInfo.hasNextPage is true.

2. The LLM ignores the pageInfo block (75 KB into the response) and
   infers 'hasNextPage implicitly false' from the short item count,
   then noops the entire sweep. Prose mandating pagination loops is
   not enforceable in this failure mode.

Fix: replace list_issues with search_issues using the query
'repo:microsoft/apm is:open is:issue -label:status/triaged
sort:created-asc'. GitHub Search supports negative label filters, so
the response contains *only* untriaged candidates oldest-first. One
page is enough; no pagination loop, no LLM-driven label exclusion.

Lockfile is unchanged because triage-panel.lock.yml uses
{{#runtime-import .github/workflows/triage-panel.md}}, so .md edits
take effect at next workflow run.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 7, 2026 19:55
@danielmeppiel danielmeppiel merged commit c71c600 into main May 7, 2026
17 checks passed
@danielmeppiel danielmeppiel deleted the fix/triage-search-issues-untriaged branch May 7, 2026 19:58
Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the triage-panel scheduled sweep to avoid LLM-driven pagination failures by switching candidate collection to a server-side search_issues query that returns only untriaged issues.

Changes:

  • Replace list_issues + prose pagination instructions with a single search_issues query using -label:status/triaged sort:created-asc.
  • Remove redundant client-side “exclude triaged label” and “sort oldest-first” steps from the sweep instructions.
  • Add changelog entries describing the sweep behavior change and the underlying RCA.
| File | Description |
| --- | --- |
| CHANGELOG.md | Documents the sweep behavior changes and rationale under Unreleased/Fixed. |
| .github/workflows/triage-panel.md | Rewrites the scheduled sweep gather instructions to use search_issues with server-side filtering and no pagination. |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 4

Comment thread CHANGELOG.md
Comment on lines +19 to +20
- `triage-panel` scheduled sweep now paginates the candidate query oldest-first via the GitHub MCP `list_issues` tool instead of a single 200-issue page, so daily runs actually drain the untriaged backlog rather than processing one issue per cron tick. (#1193)
- `triage-panel` scheduled sweep switches the candidate query from `list_issues`+prose-driven pagination to `search_issues` with `-label:status/triaged sort:created-asc`, so untriaged candidates are filtered server-side; the previous approach silently noop'd because the MCP gateway DIFC filter dropped non-collaborator issues mid-page and the agent inferred a false `hasNextPage:false`.
  direction: "ASC",
  perPage: 30,
search_issues(
  query: "repo:microsoft/apm is:open is:issue -label:status/triaged sort:created-asc",
Comment on lines +296 to +297
want to drain. **One page is enough.** If `total_count` is greater
than 30 the extras roll to tomorrow's sweep; do not paginate.
Comment on lines +279 to 282
search_issues(
  query: "repo:microsoft/apm is:open is:issue -label:status/triaged sort:created-asc",
  perPage: 30,
)
