feat: audit command accepts multiple run IDs for automatic diff mode #28483
- Update NewAuditCommand to accept MinimumNArgs(1); when 2+ args are provided, the first is used as the base run and the rest as comparison runs, delegating to RunAuditDiff (same as the 'audit diff' subcommand)
- Add runAuditMulti helper that parses each arg via ParseRunURLExtended, validates for self-comparison and duplicates, then calls RunAuditDiff
- Update registerAuditTool MCP wrapper to accept run_ids_or_urls []string (new preferred field) while keeping run_id_or_url string for backward compat; builds a multi-arg audit command when multiple items are supplied
- Update TestAuditToolErrorEnvelopeSetsIsErrorFalse to check run_ids_or_urls
- Add TestAuditTool_MultiRunDiffMode and TestAuditTool_ReturnsErrorWhenNoRunIDProvided

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d520a589-9e48-4171-926c-40d7175d7392
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Simplify RunIDsOrURLs jsonschema annotation (avoid duplicating tool desc)
- Improve error message when no run ID is provided to be more descriptive
- Rename TestAuditTool_ReturnsErrorWhenNoRunIDProvided to FailsWhen...
- Add TestRunAuditMulti_Validation covering self-comparison, duplicates, and invalid run ID error cases

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d520a589-9e48-4171-926c-40d7175d7392
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@copilot deprecate audit diff in favor of audit only (hide the command) and update the docs
- Mark NewAuditDiffSubcommand as Hidden:true (deprecated in favor of passing multiple run IDs directly to 'audit')
- Remove "(same as 'audit diff')" from audit command long description
- Update docs/reference/audit.md: collapse single-run and diff sections into one unified 'audit' section, drop standalone 'audit diff' section
- Update docs/guides/audit-with-agents.md: replace 'audit diff <id1> <id2>' references with 'audit <id1> <id2>'

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/2a22e031-c8d1-45e7-a844-87e4695937d5
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 161440b:
@copilot update debug.md and other instruction files to teach the agent about multi-run-ID audit
…un audit

Update all agent instruction files and documentation to replace `audit diff` references with the new `audit <base> <compare...>` syntax:
- debug.md: add multi-run diff examples to Quick Reference and Key Commands
- .github/aw/debug-agentic-workflow.md: update Key Commands, add multi-run note in "Audit the Run" section, clarify agentic-workflows tool usage
- scratchpad/dev.md: update Command Categories and CLI quick-reference table
- docs/troubleshooting/debugging.md: replace audit diff with audit multi-run
- docs/setup/cli.md: rename 'audit diff' section to 'Multi-run diff mode', update examples to use new syntax
- docs/guides/maintaining-repos.md: replace audit diff invocations
- docs/patterns/monitoring.md: replace audit diff, add multi-run example
- docs/reference/network.md: replace audit diff
- docs/reference/mcp-gateway.md: replace audit diff reference
- docs/reference/glossary.md: rename 'Audit Diff (gh aw audit diff)' entry and update Firewall Analysis entry

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/e375d9d1-739e-4744-ab6a-5d9dd4146e28
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in b75db83. Updated the following instruction and documentation files to teach agents about multi-run audit:
Agent instruction files:
Docs:
Pull request overview
Updates the `gh aw audit` UX so users can run multi-run diffs directly via `audit <base> <compare...>`, with corresponding MCP tool/schema updates and documentation/instruction refreshes; keeps `audit diff` hidden for backward compatibility.
Changes:
- CLI: `audit` now accepts 1+ run IDs/URLs and routes 2+ args into diff mode; `audit diff` is hidden/deprecated.
- MCP: `audit` tool adds `run_ids_or_urls` (preferred) while keeping `run_id_or_url` as a deprecated fallback; error envelopes now echo `run_ids_or_urls`.
- Docs/instructions: replace `audit diff` usage with `audit <id1> <id2>` across guides and references.
Show a summary per file
| File | Description |
|---|---|
| pkg/cli/audit.go | Accept multiple args for audit and add runAuditMulti diff-mode entrypoint. |
| pkg/cli/audit_diff_command.go | Hide/deprecate audit diff while retaining implementation for compatibility. |
| pkg/cli/mcp_tools_privileged.go | Extend MCP audit tool args to support multi-run via run_ids_or_urls and update error envelope. |
| pkg/cli/mcp_tools_privileged_test.go | Update envelope assertions and add MCP multi-run argument passthrough test. |
| pkg/cli/audit_test.go | Add validation tests for runAuditMulti argument handling. |
| docs/src/content/docs/reference/audit.md | Unify docs under audit syntax and document multi-run diff mode. |
| docs/src/content/docs/setup/cli.md | Update CLI setup docs to teach multi-run diffs via audit. |
| docs/src/content/docs/troubleshooting/debugging.md | Update troubleshooting guidance to use audit <id1> <id2>. |
| scratchpad/dev.md | Update scratchpad command tables/examples to new audit diff syntax. |
| debug.md / .github/aw/debug-agentic-workflow.md | Update quick references and agent debugging instructions to new syntax. |
| docs/src/content/docs/reference/network.md | Update network docs to use multi-run audit diff mode. |
| docs/src/content/docs/reference/mcp-gateway.md | Update MCP gateway guidance to new diff invocation. |
| docs/src/content/docs/reference/glossary.md | Reframe “Audit Diff” as multi-run mode under audit. |
| docs/src/content/docs/patterns/monitoring.md | Update monitoring examples to audit <base> <compare...>. |
| docs/src/content/docs/guides/maintaining-repos.md | Replace audit diff examples with multi-arg audit. |
| docs/src/content/docs/guides/audit-with-agents.md | Update agent guidance to use MCP audit tool for diffs. |
Copilot's findings
Comments suppressed due to low confidence (2)
pkg/cli/audit.go:174
- Multi-run diff mode hard-codes the output format to "pretty" and the `audit` command does not define/plumb a `--format` flag, so `gh aw audit <base> <compare...> --format markdown` (documented elsewhere in this PR) will fail and users can’t get Markdown diffs via the new syntax. Consider adding a `--format` flag to `audit` (multi-run only) and passing its value through to `runAuditMulti`/`RunAuditDiff` (reusing the same validation/allowed values as the hidden `audit diff` subcommand).
}
return RunAuditDiff(ctx, baseComponents.Number, compareRunIDs, owner, repo, hostname, outputDir, verbose, jsonOutput, "pretty", artifacts)
}
pkg/cli/audit.go:162
- In multi-run diff mode, comparison args parsed from URLs may include a different host/owner/repo than the base (or `--repo`), but `runAuditMulti` ignores those fields and always diffs using the base repo context. This can lead to confusing 404s or (worse) comparing a run ID that exists in a different repo. Consider validating that any comparison URL with an explicit host/owner/repo matches the resolved base/`--repo` values (and similarly reject job/step URLs here if they’re intended to be single-run only).
for _, arg := range args[1:] {
	c, err := parser.ParseRunURLExtended(arg)
	if err != nil {
		return fmt.Errorf("invalid comparison run %q: %w", arg, err)
	}
- Files reviewed: 17/17 changed files
- Comments generated: 2
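The second suppressed comment above suggests checking that a comparison URL does not point at a different host, owner, or repo than the base. Nothing in this thread shows that check being implemented, so the following is only a minimal sketch of what it could look like. The `runContext` type, its field names, and `checkSameRepo` are hypothetical; the real fields returned by `parser.ParseRunURLExtended` are not shown in full here.

```go
package main

import (
	"fmt"
	"strings"
)

// runContext is a hypothetical view of the host/owner/repo parsed from a run URL.
// Empty fields mean the argument was a bare run ID and carries no context.
type runContext struct {
	Host, Owner, Repo string
}

// checkSameRepo returns an error when cmp explicitly names a host, owner, or
// repo that differs from base. Fields left empty on either side are skipped,
// so bare run IDs inherit the base context.
func checkSameRepo(base, cmp runContext) error {
	pairs := []struct{ name, want, got string }{
		{"host", base.Host, cmp.Host},
		{"owner", base.Owner, cmp.Owner},
		{"repo", base.Repo, cmp.Repo},
	}
	for _, p := range pairs {
		if p.got == "" || p.want == "" {
			continue
		}
		// Owner and repo names on GitHub are case-insensitive.
		if !strings.EqualFold(p.got, p.want) {
			return fmt.Errorf("comparison run %s %q does not match base %s %q",
				p.name, p.got, p.name, p.want)
		}
	}
	return nil
}

func main() {
	base := runContext{Host: "github.com", Owner: "github", Repo: "gh-aw"}
	other := runContext{Host: "github.com", Owner: "someone", Repo: "fork"}
	fmt.Println(checkSameRepo(base, other))
}
```

Skipping empty fields keeps the common case (a base URL plus bare comparison IDs) working unchanged, while an explicit mismatch fails early instead of surfacing later as a 404.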
// Parse base run
baseComponents, err := parser.ParseRunURLExtended(args[0])
if err != nil {
	return fmt.Errorf("invalid base run %q: %w", args[0], err)
}
audit help text says job/step URLs are supported in single-run mode only, but runAuditMulti currently accepts a base argument that may include /job/<id> and/or #step: and silently ignores that extra specificity. It would be clearer to reject base job/step URLs in multi-run mode with a descriptive error (e.g., require run IDs or run URLs only).
This issue also appears in the following locations of the same file:
- line 158
- line 171
| `--parse` | off | Run JavaScript parsers on agent and firewall logs, writing `log.md` and `firewall.md` (single-run only) |
| `--repo <owner/repo>` | auto | Specify repository when the run ID is not from a URL |
| `--verbose` | off | Print detailed progress information |
| `--format <fmt>` | `pretty` | Diff output format: `pretty` or `markdown` (multi-run only) |
gh aw audit is documented here with a --format flag and --format markdown examples for multi-run diff mode, but the audit Cobra command in this PR doesn’t currently define/support --format (only the hidden audit diff subcommand does). Either add --format support to audit multi-run mode or adjust these docs/examples to match the actual CLI behavior.
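To make the review comment concrete, here is a rough sketch of a `--format` option that defaults to `pretty`, accepts only `pretty` or `markdown`, and leaves the run IDs as positional arguments. It uses the standard library `flag` package purely for illustration; the real command is built on Cobra, and `validateDiffFormat` and `parseAuditArgs` are hypothetical names, not functions from this PR.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// validateDiffFormat accepts the two values listed in the docs table.
func validateDiffFormat(format string) error {
	switch format {
	case "pretty", "markdown":
		return nil
	default:
		return fmt.Errorf("invalid --format %q: expected \"pretty\" or \"markdown\"", format)
	}
}

// parseAuditArgs is a hypothetical stand-in for the Cobra wiring: it reads
// --format (default "pretty") and returns the remaining run IDs/URLs.
func parseAuditArgs(argv []string) (string, []string, error) {
	var format string
	fs := flag.NewFlagSet("audit", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	fs.StringVar(&format, "format", "pretty", "diff output format: pretty or markdown")
	if err := fs.Parse(argv); err != nil {
		return "", nil, err
	}
	if err := validateDiffFormat(format); err != nil {
		return "", nil, err
	}
	return format, fs.Args(), nil
}

func main() {
	format, runs, err := parseAuditArgs([]string{"--format", "markdown", "1234567890", "1234567891"})
	fmt.Println(format, runs, err)
}
```

One difference worth noting: the standard `flag` package stops parsing at the first positional argument, so this sketch expects `--format` before the run IDs, whereas Cobra by default also accepts flags placed after them.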
🧪 Test Quality Sentinel Report
Test Quality Score: 100/100 ✅ Excellent test quality
📖 Understanding Test Classifications
- Design Tests (High Value) verify what the system does.
- Implementation Tests (Low Value) verify how the system does it.
Goal: Shift toward tests that describe the system's behavioral contract, that is, the promises it makes to its users and collaborators.
References: §24936666712
… main command Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Commit pushed:
🏗️ Design Decision Gate: ADR Required
This PR makes significant changes to core business logic. AI has analyzed the PR diff and generated a draft ADR to help you get started:
📄 Draft ADR:
What to do next
Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.
Why ADRs Matter
ADRs create a searchable, permanent record of why the codebase looks the way it does. Future contributors (and your future self) will thank you.
📋 Michael Nygard ADR Format Reference
An ADR must contain these four sections to be considered complete:
All ADRs are stored in
References:
@copilot review all comments
…o audit

- runAuditMulti now rejects base and comparison arguments that contain job/step URL specificity (JobID != 0 or StepNumber != 0) with a descriptive error; previously these were silently accepted and ignored
- Add --format flag (pretty|markdown, default: pretty) to the audit command for multi-run diff mode; previously runAuditMulti hardcoded "pretty" and the docs referenced a flag that didn't exist
- Update audit_test.go: fix runAuditMulti call signature and add two new test cases for job URL rejection

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/dc34ad55-b9ce-492b-a3ef-52e7c12f274c
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Addressed both review comments in a1e9963:
@copilot in diff mode, do not reject job/step URLs; instead, normalize them to run-level diffs
… of rejecting

Remove the job/step URL rejection added in the previous commit. Job URLs and step-anchored URLs are now silently normalized to their parent run ID when used as base or comparison arguments in diff mode. This is consistent with how ParseRunURLExtended already extracts the run Number from all URL forms.
- Update test cases: replace "rejected" cases with "normalized" cases that prove self-comparison and duplicate detection still fire correctly after normalization
- Update docs/reference/audit.md to document the normalization behavior

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/066807a7-7dfd-4fba-8f75-4af6d1b2e3f1
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 8f1adc4. Removed the rejection — job/step URLs are now silently normalized to their parent run ID in diff mode. Updated tests to confirm self-comparison and duplicate detection still work correctly after normalization (e.g. a job URL for run |
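The normalize-then-validate behavior described in this reply can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from 8f1adc4: `runRef`, `parseRunRef`, and `resolveDiffRuns` are hypothetical stand-ins for `parser.ParseRunURLExtended` and `runAuditMulti`, the URL pattern only covers the `.../actions/runs/<run>/job/<job>` shape, and the error wording is invented.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
)

// runRef is a hypothetical stand-in for the parsed URL components;
// only the run number and job ID are modeled here.
type runRef struct {
	Number int64 // parent run ID
	JobID  int64 // non-zero when the URL pointed at a job
}

// runURLPattern matches ".../actions/runs/<run>" with an optional "/job/<job>".
// A trailing "#step:..." anchor is simply ignored.
var runURLPattern = regexp.MustCompile(`/actions/runs/(\d+)(?:/job/(\d+))?`)

// parseRunRef accepts a bare numeric run ID or a run/job URL.
func parseRunRef(arg string) (runRef, error) {
	if n, err := strconv.ParseInt(arg, 10, 64); err == nil && n > 0 {
		return runRef{Number: n}, nil
	}
	m := runURLPattern.FindStringSubmatch(arg)
	if m == nil {
		return runRef{}, fmt.Errorf("unrecognized run ID or URL %q", arg)
	}
	run, err := strconv.ParseInt(m[1], 10, 64)
	if err != nil {
		return runRef{}, fmt.Errorf("bad run ID in %q: %w", arg, err)
	}
	ref := runRef{Number: run}
	if m[2] != "" {
		job, err := strconv.ParseInt(m[2], 10, 64)
		if err != nil {
			return runRef{}, fmt.Errorf("bad job ID in %q: %w", arg, err)
		}
		ref.JobID = job
	}
	return ref, nil
}

// resolveDiffRuns reduces every argument to its parent run ID, then applies the
// self-comparison and duplicate checks to the normalized IDs.
func resolveDiffRuns(args []string) (int64, []int64, error) {
	if len(args) < 2 {
		return 0, nil, errors.New("diff mode needs a base run and at least one comparison run")
	}
	base, err := parseRunRef(args[0])
	if err != nil {
		return 0, nil, fmt.Errorf("invalid base run %q: %w", args[0], err)
	}
	seen := map[int64]bool{}
	var compare []int64
	for _, arg := range args[1:] {
		ref, err := parseRunRef(arg)
		if err != nil {
			return 0, nil, fmt.Errorf("invalid comparison run %q: %w", arg, err)
		}
		if ref.Number == base.Number {
			return 0, nil, fmt.Errorf("cannot compare run %d with itself", ref.Number)
		}
		if seen[ref.Number] {
			return 0, nil, fmt.Errorf("duplicate comparison run %d", ref.Number)
		}
		seen[ref.Number] = true
		compare = append(compare, ref.Number)
	}
	return base.Number, compare, nil
}

func main() {
	// A job URL for run 100 collides with the bare ID 100 once normalized.
	_, _, err := resolveDiffRuns([]string{"100", "https://github.com/o/r/actions/runs/100/job/7"})
	fmt.Println(err)
}
```

Because the checks run on the normalized run numbers rather than the raw strings, a job URL and a bare ID that refer to the same run are treated as the same run, which is the property the updated tests are said to confirm.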
The `audit` command only accepted a single run ID/URL, requiring users to explicitly invoke `audit diff` for comparisons. This change makes multi-run diff mode accessible directly from `audit`, surfaces the same capability in the MCP tool wrapper, deprecates the `audit diff` subcommand in its favor, and updates all agent instruction files and documentation to teach the new syntax.

CLI
`audit` now accepts one or more run IDs/URLs (`MinimumNArgs(1)`). When 2+ args are given, the first is the base and the rest are comparisons, as in `audit <base> <compare...>`.
Validation (self-comparison, duplicates, invalid IDs) mirrors the `audit diff` subcommand. The `--parse` flag is silently a no-op in multi-run mode (single-run only).
Job URLs and step-anchored URLs (e.g. `.../job/123` or `#step:7:1`) are accepted as base or comparison arguments in diff mode; the job/step specificity is silently normalized to the parent run ID, so the result is always a run-level diff.
The `audit diff` subcommand is now hidden (`Hidden: true`) and kept only for backward compatibility. Users should use `audit <base> <compare...>` going forward.

MCP tool (`audit`)
- `run_ids_or_urls []string` is the new preferred field. Single item → audit report; 2+ items → diff mode.
- `run_id_or_url` (string) is kept as a deprecated backward-compat fallback.
- Error envelopes now echo `run_ids_or_urls` instead of `run_id_or_url`.
Example arguments: `{ "run_ids_or_urls": ["1234567890", "1234567891"] }`

New flags
`--format` (`pretty`|`markdown`, default: `pretty`) added to the `audit` command for multi-run diff mode, matching the behavior already present on the hidden `audit diff` subcommand.

Docs & Instruction Files
- `docs/reference/audit.md`: unified into a single `audit` section covering both single-run and multi-run diff usage; removed the standalone `audit diff` section; documented job/step URL normalization behavior in diff mode.
- `docs/guides/audit-with-agents.md`: updated table and examples to use the new `audit <id1> <id2>` syntax.
- `debug.md`: added multi-run diff examples to Quick Reference and Key Debugging Commands.
- `.github/aw/debug-agentic-workflow.md`: updated Key Commands, added "Comparing two runs" guidance in the Audit step, updated `agentic-workflows` MCP tool hint to mention `run_ids_or_urls`.
- `scratchpad/dev.md`: updated Command Categories and CLI reference table.
- `docs/setup/cli.md`: renamed `audit diff` section to "Multi-run diff mode", updated all examples.
- `docs/troubleshooting/debugging.md`, `docs/guides/maintaining-repos.md`, `docs/patterns/monitoring.md`, `docs/reference/network.md`, `docs/reference/mcp-gateway.md`, `docs/reference/glossary.md`: replaced `audit diff <id1> <id2>` with `audit <id1> <id2>` throughout.
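As an illustration of the MCP argument handling summarized above, the sketch below prefers `run_ids_or_urls`, falls back to the deprecated `run_id_or_url`, and turns the result into an `audit` argument list. Only the two JSON field names come from the PR; the Go struct, the function names `resolveRuns` and `buildAuditArgv`, and the error text are assumptions, and any extra flags the real wrapper passes are omitted.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// auditToolArgs mirrors the two JSON fields named in the PR description.
// The Go field names are assumptions made for this sketch.
type auditToolArgs struct {
	RunIDsOrURLs []string `json:"run_ids_or_urls,omitempty"`
	RunIDOrURL   string   `json:"run_id_or_url,omitempty"` // deprecated fallback
}

// resolveRuns prefers run_ids_or_urls and falls back to run_id_or_url.
func resolveRuns(raw []byte) ([]string, error) {
	var args auditToolArgs
	if err := json.Unmarshal(raw, &args); err != nil {
		return nil, fmt.Errorf("invalid audit arguments: %w", err)
	}
	if len(args.RunIDsOrURLs) > 0 {
		return args.RunIDsOrURLs, nil
	}
	if args.RunIDOrURL != "" {
		return []string{args.RunIDOrURL}, nil
	}
	return nil, errors.New("provide run_ids_or_urls with at least one run ID or URL")
}

// buildAuditArgv builds the CLI argument list. One run yields a single-run
// report; two or more put the command into diff mode with the first as base.
func buildAuditArgv(runs []string) []string {
	return append([]string{"audit"}, runs...)
}

func main() {
	runs, err := resolveRuns([]byte(`{ "run_ids_or_urls": ["1234567890", "1234567891"] }`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(buildAuditArgv(runs))
}
```

Keeping the fallback in one resolver function means older callers that still send `run_id_or_url` take the same path as a one-element `run_ids_or_urls` list.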