Add agent-optimized structured output to audit command #4476
Conversation
- Add KeyFindings, Recommendations, FailureAnalysis, and PerformanceMetrics to AuditData
- Generate structured insights for agent consumption
- Enhance JSON output with actionable sections
- Update console output to display new sections with colored severity indicators
- All existing tests pass

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

- TestKeyFindingsGeneration - validates finding categorization and severity
- TestRecommendationsGeneration - validates recommendation prioritization
- TestFailureAnalysisGeneration - validates root cause identification
- TestPerformanceMetricsGeneration - validates performance calculations
- TestAuditDataJSONStructure - validates complete JSON structure
- All tests pass (42s test suite runtime)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

- TestAgentFriendlyOutputExample - demonstrates successful workflow analysis
- TestAgentFriendlyOutputFailureScenario - demonstrates failure analysis
- Shows complete JSON and console output examples
- Validates quality of findings, recommendations, and performance metrics
- All tests pass with realistic workflow scenarios

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Pull Request Overview
This PR adds agent-optimized structured output to the audit command, introducing four new analytical sections (Key Findings, Recommendations, Failure Analysis, Performance Metrics) that help AI agents quickly understand workflow failures and performance issues. The changes maintain backward compatibility while significantly enhancing both console and JSON output formats.
Key Changes:
- New structured analysis types with severity/priority levels and categorization
- Intelligent finding generation with thresholds for cost, performance, and errors (see the sketch after this list)
- Context-aware recommendations based on workflow state and findings
- Enhanced rendering with visual indicators (emojis) and priority-based ordering
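For illustration, a minimal sketch of what threshold-based finding generation could look like. The KeyFinding field names are inferred from the JSON example later in this description; the WorkflowRun fields, the function signature, and the specific thresholds are assumptions, not the PR's actual values:

```go
package cli

import "fmt"

// KeyFinding mirrors the key_findings entries in the JSON example below.
type KeyFinding struct {
    Category    string `json:"category"` // error, cost, performance, tooling, network, success
    Severity    string `json:"severity"` // critical, high, medium, low, info
    Title       string `json:"title"`
    Description string `json:"description"`
    Impact      string `json:"impact"`
}

// generateFindings sketches the threshold idea: compare run statistics
// against fixed cutoffs and emit a categorized finding for each breach.
func generateFindings(run WorkflowRun) []KeyFinding {
    var findings []KeyFinding
    if run.EstimatedCost > 1.0 { // illustrative cost threshold
        findings = append(findings, KeyFinding{
            Category:    "cost",
            Severity:    "high",
            Title:       "High Cost",
            Description: fmt.Sprintf("Run cost $%.2f", run.EstimatedCost),
            Impact:      "High cost may indicate inefficient tool or token usage",
        })
    }
    if run.Turns > 10 { // illustrative turn-count threshold
        findings = append(findings, KeyFinding{
            Category:    "performance",
            Severity:    "medium",
            Title:       "Many Iterations",
            Description: fmt.Sprintf("Workflow took %d turns to complete", run.Turns),
            Impact:      "Many turns may indicate task complexity",
        })
    }
    return findings
}
```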
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/cli/audit_report.go | Core implementation adding 4 new struct types, 4 generation functions (generateFindings, generateRecommendations, generateFailureAnalysis, generatePerformanceMetrics), 4 rendering functions, and integration into buildAuditData/renderConsole |
| pkg/cli/audit_agent_output_test.go | Comprehensive unit tests covering finding generation, recommendation logic, failure analysis, performance metrics calculation, and JSON serialization |
| pkg/cli/audit_agent_example_test.go | Integration tests with realistic success and failure scenarios demonstrating the new output format for documentation purposes |
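Putting the table together, the integration presumably looks roughly like the following. The generator names come from the table above; the parameter names, field names, and the shape of AuditData are assumptions:

```go
// Sketch of how the four generators plug into buildAuditData.
// AuditData's new fields and the inputs here are assumed shapes.
func buildAuditData(run WorkflowRun, toolUsage []ToolUsage) AuditData {
    data := AuditData{ /* existing fields populated as before */ }
    data.KeyFindings = generateFindings(run)
    data.Recommendations = generateRecommendations(run, data.KeyFindings)
    if run.Conclusion == "failure" { // failure analysis only for failed runs
        data.FailureAnalysis = generateFailureAnalysis(run)
    }
    data.PerformanceMetrics = generatePerformanceMetrics(run, toolUsage)
    return data
}
```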
```go
if len(toolUsage) > 0 {
    mostUsed := toolUsage[0]
    for _, tool := range toolUsage {
        if tool.CallCount > mostUsed.CallCount {
            mostUsed = tool
        }
    }
    pm.MostUsedTool = fmt.Sprintf("%s (%d calls)", mostUsed.Name, mostUsed.CallCount)
}
```
Copilot AI, Nov 21, 2025
The loop for finding the most used tool performs a redundant comparison: it starts with mostUsed = toolUsage[0], then ranges over all tools, including the first, and compares each with mostUsed.CallCount. This means the first tool is compared against itself unnecessarily.
Consider simplifying to:
```go
// Find most used tool
if len(toolUsage) > 0 {
    mostUsed := toolUsage[0]
    for i := 1; i < len(toolUsage); i++ {
        if toolUsage[i].CallCount > mostUsed.CallCount {
            mostUsed = toolUsage[i]
        }
    }
    pm.MostUsedTool = fmt.Sprintf("%s (%d calls)", mostUsed.Name, mostUsed.CallCount)
}
```

This avoids comparing the first element with itself and makes the iteration pattern clearer.
Fixed in c872402. Changed to index-based loop starting at index 1 to avoid the unnecessary self-comparison and improve clarity.
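For what it's worth, on Go 1.21+ the same selection can lean on the standard library. A sketch under the same assumed ToolUsage type, not what the PR actually ships:

```go
package cli

import (
    "cmp"
    "slices"
)

// mostUsed returns the tool with the highest call count.
// slices.MaxFunc panics on an empty slice, so callers must keep
// the len(toolUsage) > 0 guard from the original code.
func mostUsed(toolUsage []ToolUsage) ToolUsage {
    return slices.MaxFunc(toolUsage, func(a, b ToolUsage) int {
        return cmp.Compare(a.CallCount, b.CallCount)
    })
}
```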
@copilot lint
- Change from range loop to index loop to avoid comparing first element with itself
- Start iteration at index 1 instead of 0
- Run make fmt to fix code formatting issues
- All tests pass, lint clean

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Optimize Audit Output for Agentic Usage
This PR optimizes the `gh aw audit` command output to make it more suitable for AI agent analysis of workflow runs by adding structured insights, actionable recommendations, failure analysis, and performance metrics.

Changes Summary
✅ Completed all planned work:
New Output Structures
1. Key Findings
Categorized insights with severity levels to help agents quickly identify important issues:
Categories:
- `error` - Critical failures, multiple errors
- `cost` - High or moderate cost issues
- `performance` - Token usage, turn count, timeouts
- `tooling` - MCP failures, missing tools
- `network` - Firewall blocks, connectivity issues
- `success` - Successful completion

Severity Levels:
- `critical` - Immediate action required
- `high` - Important issues to address
- `medium` - Moderate concerns
- `low` - Minor issues or information
- `info` - Success indicators
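In Go form, the category and severity vocabularies above amount to a small set of string constants. Whether the PR actually names them as constants is an assumption; the values are exactly those in the lists above:

```go
// Category and severity values used in KeyFinding, per the lists above.
const (
    CategoryError       = "error"
    CategoryCost        = "cost"
    CategoryPerformance = "performance"
    CategoryTooling     = "tooling"
    CategoryNetwork     = "network"
    CategorySuccess     = "success"

    SeverityCritical = "critical"
    SeverityHigh     = "high"
    SeverityMedium   = "medium"
    SeverityLow      = "low"
    SeverityInfo     = "info"
)
```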
2. Recommendations
Prioritized actionable suggestions with examples:
Priority Levels:
- `high` - Critical issues requiring immediate attention
- `medium` - Important optimizations and fixes
- `low` - Best practices and monitoring suggestions

Each recommendation includes a priority, an action, a reason, and an example (see the JSON output example below).
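A sketch of the corresponding Go type, with field names inferred from the JSON keys in the example below (the PR's actual declarations may differ):

```go
// Recommendation mirrors the recommendations entries in the JSON example.
type Recommendation struct {
    Priority string `json:"priority"` // high, medium, low
    Action   string `json:"action"`   // what to do
    Reason   string `json:"reason"`   // why it matters
    Example  string `json:"example"`  // concrete starting point
}
```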
3. Failure Analysis (for failed workflows)
Structured analysis, including root cause identification (exercised by TestFailureAnalysisGeneration).
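The PR body does not spell out the analysis fields here; judging from the tests, a minimal sketch might be:

```go
// FailureAnalysis is populated only for failed runs.
// The field name in this sketch is an assumption.
type FailureAnalysis struct {
    RootCause string `json:"root_cause"` // assumed field name
}
```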
4. Performance Metrics
Aggregated performance statistics such as tokens per minute, cost efficiency, and the most used tool (see the JSON output example below).
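A sketch of the metrics type plus the obvious rate computation. The field names come from the performance_metrics JSON keys below and the MostUsedTool field seen in the reviewed code; the calculation itself is an assumption:

```go
import "time"

// PerformanceMetrics mirrors the performance_metrics object in the JSON example.
type PerformanceMetrics struct {
    TokensPerMinute float64 `json:"tokens_per_minute"`
    CostEfficiency  string  `json:"cost_efficiency"` // e.g. "good"
    MostUsedTool    string  `json:"most_used_tool"`  // "name (N calls)"
}

// tokensPerMinute shows the straightforward way such a rate is derived.
func tokensPerMinute(totalTokens int, elapsed time.Duration) float64 {
    if elapsed <= 0 {
        return 0 // avoid division by zero for instant or unknown durations
    }
    return float64(totalTokens) / elapsed.Minutes()
}
```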
Console Output Example
JSON Output Example
{ "overview": { "run_id": 987654, "workflow_name": "weekly-research", "status": "completed", "conclusion": "success" }, "key_findings": [ { "category": "performance", "severity": "medium", "title": "Many Iterations", "description": "Workflow took 12 turns to complete", "impact": "Many turns may indicate task complexity" } ], "recommendations": [ { "priority": "medium", "action": "Clarify workflow instructions", "reason": "Many iterations may indicate unclear objectives", "example": "Split complex workflows into discrete steps" } ], "performance_metrics": { "tokens_per_minute": 3103.4, "cost_efficiency": "good", "most_used_tool": "github_search_repositories (8 calls)" } }Test Coverage
New Test Files:
- `audit_agent_output_test.go` (568 lines) - Unit tests for all new features
- `audit_agent_example_test.go` (444 lines) - Example outputs for documentation

Test Coverage:
All tests pass: 42+ tests in 42 seconds ✅
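As a flavor of what those tests assert, a minimal sketch (not the actual test code; WorkflowRun and generateFindings are the assumed shapes from the earlier sketches):

```go
package cli

import "testing"

func TestGenerateFindingsFlagsHighCost(t *testing.T) {
    run := WorkflowRun{EstimatedCost: 5.0} // well above the illustrative threshold
    for _, f := range generateFindings(run) {
        if f.Category == "cost" && f.Severity == "high" {
            return // found the expected finding
        }
    }
    t.Fatal("expected a high-severity cost finding")
}
```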
Benefits for AI Agents
Backward Compatibility
File Changes
- `pkg/cli/audit_report.go` - Added 630 lines for new structures and generation logic
- `pkg/cli/audit_agent_output_test.go` - 568 lines of unit tests
- `pkg/cli/audit_agent_example_test.go` - 444 lines of example tests

Total: 1,642 lines added, 13 lines modified
Code Review Updates
Ready for Review