Repo Pulse v2: deterministic pre-fetch + disable integrity filter #16421
FYI @davidfowl: this is the v2 fix for the Repo Pulse missing-PRs issue surfaced in the Teams thread.
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 16421
Or:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 16421"
Pull request overview
Reworks the Repo Pulse agentic workflow to make daily reporting deterministic by pre-fetching/sanitizing GitHub data before the agent runs, and disables the GitHub MCP integrity filter since the agent should no longer ingest untrusted GitHub content directly.
Changes:
- Adds `pre-agent-steps` that fetch Repo Pulse datasets via `gh api --paginate`, sanitize fields (notably titles), and write a local `.repo-pulse/` JSON bundle for the agent to render.
- Disables the GitHub MCP integrity filter (`min-integrity: none`) and restricts MCP toolsets to `repos` (the agent is instructed not to make any GitHub search/list calls).
- Updates the locked workflow to include the new prefetch step and updated MCP guard configuration.
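The title sanitization mentioned above could look roughly like the sketch below. The PR text does not spell out the actual rules, so everything here is an assumption: the helper name `normalize_title`, the rule set (drop control characters, collapse whitespace, trim, cap the length), and the default cap of 120.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the PR's script.
# Assumed rules: control characters removed, whitespace collapsed,
# leading/trailing spaces trimmed, length capped at $2 (default 120).
# The function body runs in a subshell so its variables do not leak.
normalize_title() (
  max=${2:-120}
  printf '%s' "$1" \
    | tr '\t\r\n' '   ' \
    | tr -d '\000-\037\177' \
    | sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//' \
    | cut -c "1-$max"   # note: GNU cut counts bytes here, not characters
)
```

Running titles through a fixed filter like this before the agent sees them keeps the agent's input predictable, which fits the PR's goal of not letting the agent ingest raw GitHub content.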
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| .github/workflows/repo-pulse.md | Adds deterministic prefetch + sanitization and updates the agent instructions/guardrails to consume only .repo-pulse/*.json. |
| .github/workflows/repo-pulse.lock.yml | Regenerates the locked workflow to run the prefetch script and apply the updated MCP configuration. |
Rework the Repo Pulse workflow so every PR and issue in the 3-day window reliably shows up in the pinned report (#16404). The first production run was missing items that the team expected to see; enumeration depended on the model's pagination behaviour.
Changes:
1. Deterministic data collection in a `pre-agent-steps` block using `gh api --paginate`. All five sections' data is collected into JSON files under `.repo-pulse/` before the agent starts, on the same runner/workspace. The agent reads those files and renders the report; it no longer enumerates GitHub itself.
2. Smaller, schema-tight input to the agent: only the fields the report actually uses (number, title, author, labels, timestamps, URL). Titles are normalized before being handed to the agent.
3. Filter out `quarantined-test` and `failing-test` labels from the Activity Highlights section per team feedback. Filed Issues is unchanged (new issues with those labels are still legit new work).
4. Precomputed "See all" URLs are passed through `meta.json` so the agent doesn't have to build GitHub search queries itself.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
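As an illustration of item 4, a "See all" link can be computed in the pre-fetch step by percent-encoding a search query. This is only a sketch: `urlencode` and `see_all_url` are hypothetical names, the query string in the usage note is made up, and the PR's real queries and `meta.json` layout are not shown here. It assumes ASCII input.

```shell
#!/bin/sh
# Percent-encode a string for use as a URL query value (ASCII input assumed).
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    s=$rest
    case $c in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;                     # unreserved: keep as-is
      *) out="$out$(printf '%%%02X' "'$c")" ;;             # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
)

# Hypothetical helper: build an issues-search link for a repository.
see_all_url() {
  printf 'https://github.com/%s/issues?q=%s\n' "$1" "$(urlencode "$2")"
}
```

For example, `see_all_url microsoft/aspire 'is:pr is:open'` would produce a link ending in `?q=is%3Apr%20is%3Aopen`, which the pre-fetch step could store so the agent only has to copy it into the report.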
Addresses review feedback on PR #16421 around daily-job reliability:
- Each section's fetch now runs independently: a single `gh api` failure no longer aborts the entire report. Failing sections write an empty array and surface a warning; remaining sections still render.
- The search/issues API can set `incomplete_results: true` when the backend times out. We now check this signal across all paginated responses and record a warning rather than silently publishing a partial slice as if it were complete.
- Warnings are accumulated into `meta.data_quality_warnings` (array of strings). The agent prompt now instructs the renderer to emit a banner at the top of the pinned issue when that array is non-empty, so readers can see at a glance that the dashboard may be partial.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
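The failure-isolation pattern described in this commit might be sketched as follows. The function names, the `warnings.txt` scratch file, and the warning wording are assumptions for illustration, not the PR's actual script; in particular, the `incomplete_results` check here is a crude text match, whereas a real implementation would more likely parse the JSON.

```shell
#!/bin/sh
# Hypothetical sketch; names and file layout are assumptions.
OUT_DIR=${OUT_DIR:-.repo-pulse}

# Append one warning line; these would later feed meta.data_quality_warnings.
warn() {
  mkdir -p "$OUT_DIR"
  printf '%s\n' "$1" >> "$OUT_DIR/warnings.txt"
}

# Run the fetch command for one section. On failure, write an empty
# array and record a warning instead of aborting the whole run.
fetch_section() {
  section=$1
  shift
  mkdir -p "$OUT_DIR"
  if ! "$@" > "$OUT_DIR/$section.json"; then
    printf '[]\n' > "$OUT_DIR/$section.json"
    warn "$section: fetch failed; section rendered empty"
  fi
}

# Text-level check for the search API's incomplete_results flag
# in a raw response file ($2), attributed to section $1.
note_incomplete() {
  if grep -Eq '"incomplete_results"[[:space:]]*:[[:space:]]*true' "$2"; then
    warn "$1: search returned incomplete_results; data may be partial"
  fi
}
```

A call such as `fetch_section open_prs gh api --paginate "repos/microsoft/aspire/pulls?state=open"` shows the intended shape; that endpoint is illustrative only, since the PR's actual queries are not reproduced in this conversation. Because each section is wrapped separately, one failing call leaves the other sections' JSON files intact.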
🎬 CLI E2E Test Recordings: 75 recordings uploaded.
📹 Recordings uploaded automatically from CI run #24911440426
Description
Rework the Repo Pulse daily workflow so every PR and issue in the window reliably shows up in the pinned report (#16404). The first production run was missing items that the team expected to see.
What's changing
- Deterministic data collection in a `pre-agent-steps` block using `gh api --paginate` before the agent starts, on the same runner/workspace. The agent just reads the resulting JSON files and renders the report; it no longer enumerates GitHub itself, so coverage is no longer model-dependent.
- Filter out `quarantined-test` / `failing-test` from the Activity Highlights section per team feedback: label churn on those surfaces isn't a meaningful "attention going somewhere new" signal. Filed Issues is unchanged (new issues with those labels are still legit new work).
- Precomputed "See all" URLs are passed through `meta.json` so the agent doesn't have to build GitHub search queries itself.
Validation
- `gh aw compile` clean (0 errors, 1 unrelated "fuzzy schedule" advisory we kept intentionally for 08:00 PST).
- `actions-lock.json` unchanged vs main.
- `bash -n` on the extracted pre-agent-steps script passes.
- `workflow_dispatch` run after merge to confirm the report reflects the expected items.
FYI @davidfowl · reviewer @radical