Add agent-optimized structured output to audit command #4476
Conversation
- Add KeyFindings, Recommendations, FailureAnalysis, and PerformanceMetrics to AuditData
- Generate structured insights for agent consumption
- Enhance JSON output with actionable sections
- Update console output to display new sections with colored severity indicators
- All existing tests pass

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

- TestKeyFindingsGeneration - validates finding categorization and severity
- TestRecommendationsGeneration - validates recommendation prioritization
- TestFailureAnalysisGeneration - validates root cause identification
- TestPerformanceMetricsGeneration - validates performance calculations
- TestAuditDataJSONStructure - validates complete JSON structure
- All tests pass (42s test suite runtime)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

- TestAgentFriendlyOutputExample - demonstrates successful workflow analysis
- TestAgentFriendlyOutputFailureScenario - demonstrates failure analysis
- Shows complete JSON and console output examples
- Validates quality of findings, recommendations, and performance metrics
- All tests pass with realistic workflow scenarios

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Pull Request Overview
This PR adds agent-optimized structured output to the audit command, introducing four new analytical sections (Key Findings, Recommendations, Failure Analysis, Performance Metrics) that help AI agents quickly understand workflow failures and performance issues. The changes maintain backward compatibility while significantly enhancing both console and JSON output formats.
Key Changes:
- New structured analysis types with severity/priority levels and categorization
- Intelligent finding generation with thresholds for cost, performance, and errors (see the sketch after this list)
- Context-aware recommendations based on workflow state and findings
- Enhanced rendering with visual indicators (emojis) and priority-based ordering
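For illustration, a minimal sketch of what threshold-based finding generation could look like. The KeyFinding field names are inferred from the JSON example later in this description; the WorkflowRun fields, the function signature, and the specific thresholds are assumptions, not the PR's actual values:

```go
package cli

import "fmt"

// KeyFinding mirrors the key_findings entries in the JSON example below.
type KeyFinding struct {
    Category    string `json:"category"` // error, cost, performance, tooling, network, success
    Severity    string `json:"severity"` // critical, high, medium, low, info
    Title       string `json:"title"`
    Description string `json:"description"`
    Impact      string `json:"impact"`
}

// generateFindings sketches the threshold idea: compare run statistics
// against fixed cutoffs and emit a categorized finding for each breach.
func generateFindings(run WorkflowRun) []KeyFinding {
    var findings []KeyFinding
    if run.EstimatedCost > 1.0 { // illustrative cost threshold
        findings = append(findings, KeyFinding{
            Category:    "cost",
            Severity:    "high",
            Title:       "High Cost",
            Description: fmt.Sprintf("Run cost $%.2f", run.EstimatedCost),
            Impact:      "High cost may indicate inefficient tool or token usage",
        })
    }
    if run.Turns > 10 { // illustrative turn-count threshold
        findings = append(findings, KeyFinding{
            Category:    "performance",
            Severity:    "medium",
            Title:       "Many Iterations",
            Description: fmt.Sprintf("Workflow took %d turns to complete", run.Turns),
            Impact:      "Many turns may indicate task complexity",
        })
    }
    return findings
}
```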
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/cli/audit_report.go | Core implementation adding 4 new struct types, 4 generation functions (generateFindings, generateRecommendations, generateFailureAnalysis, generatePerformanceMetrics), 4 rendering functions, and integration into buildAuditData/renderConsole |
| pkg/cli/audit_agent_output_test.go | Comprehensive unit tests covering finding generation, recommendation logic, failure analysis, performance metrics calculation, and JSON serialization |
| pkg/cli/audit_agent_example_test.go | Integration tests with realistic success and failure scenarios demonstrating the new output format for documentation purposes |
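Putting the table together, the integration presumably looks roughly like the following. The generator names come from the table above; the parameter names, field names, and the shape of AuditData are assumptions:

```go
// Sketch of how the four generators plug into buildAuditData.
// AuditData's new fields and the inputs here are assumed shapes.
func buildAuditData(run WorkflowRun, toolUsage []ToolUsage) AuditData {
    data := AuditData{ /* existing fields populated as before */ }
    data.KeyFindings = generateFindings(run)
    data.Recommendations = generateRecommendations(run, data.KeyFindings)
    if run.Conclusion == "failure" { // failure analysis only for failed runs
        data.FailureAnalysis = generateFailureAnalysis(run)
    }
    data.PerformanceMetrics = generatePerformanceMetrics(run, toolUsage)
    return data
}
```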
```go
if len(toolUsage) > 0 {
    mostUsed := toolUsage[0]
    for _, tool := range toolUsage {
        if tool.CallCount > mostUsed.CallCount {
            mostUsed = tool
        }
    }
    pm.MostUsedTool = fmt.Sprintf("%s (%d calls)", mostUsed.Name, mostUsed.CallCount)
}
```
Copilot AI, Nov 21, 2025
The loop for finding the most used tool performs a redundant comparison: it starts with mostUsed = toolUsage[0], then ranges over all tools, including the first, and compares each with mostUsed.CallCount. This means the first tool is compared against itself unnecessarily.
Consider simplifying to:
```go
// Find most used tool
if len(toolUsage) > 0 {
    mostUsed := toolUsage[0]
    for i := 1; i < len(toolUsage); i++ {
        if toolUsage[i].CallCount > mostUsed.CallCount {
            mostUsed = toolUsage[i]
        }
    }
    pm.MostUsedTool = fmt.Sprintf("%s (%d calls)", mostUsed.Name, mostUsed.CallCount)
}
```

This avoids comparing the first element with itself and makes the iteration pattern clearer.
Fixed in c872402. Changed to index-based loop starting at index 1 to avoid the unnecessary self-comparison and improve clarity.
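For what it's worth, on Go 1.21+ the same selection can lean on the standard library. A sketch under the same assumed ToolUsage type, not what the PR actually ships:

```go
package cli

import (
    "cmp"
    "slices"
)

// mostUsed returns the tool with the highest call count.
// slices.MaxFunc panics on an empty slice, so callers must keep
// the len(toolUsage) > 0 guard from the original code.
func mostUsed(toolUsage []ToolUsage) ToolUsage {
    return slices.MaxFunc(toolUsage, func(a, b ToolUsage) int {
        return cmp.Compare(a.CallCount, b.CallCount)
    })
}
```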
@copilot lint
- Change from range loop to index loop to avoid comparing first element with itself
- Start iteration at index 1 instead of 0
- Run make fmt to fix code formatting issues
- All tests pass, lint clean

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Optimize Audit Output for Agentic Usage
This PR optimizes the `gh aw audit` command output to make it more suitable for AI agent analysis of workflow runs by adding structured insights, actionable recommendations, failure analysis, and performance metrics.

Changes Summary
✅ Completed all planned work:
New Output Structures
1. Key Findings
Categorized insights with severity levels to help agents quickly identify important issues:
Categories:
- `error` - Critical failures, multiple errors
- `cost` - High or moderate cost issues
- `performance` - Token usage, turn count, timeouts
- `tooling` - MCP failures, missing tools
- `network` - Firewall blocks, connectivity issues
- `success` - Successful completion

Severity Levels:
- `critical` - Immediate action required
- `high` - Important issues to address
- `medium` - Moderate concerns
- `low` - Minor issues or information
- `info` - Success indicators
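In Go form, the category and severity vocabularies above amount to a small set of string constants. Whether the PR actually names them as constants is an assumption; the values are exactly those in the lists above:

```go
// Category and severity values used in KeyFinding, per the lists above.
const (
    CategoryError       = "error"
    CategoryCost        = "cost"
    CategoryPerformance = "performance"
    CategoryTooling     = "tooling"
    CategoryNetwork     = "network"
    CategorySuccess     = "success"

    SeverityCritical = "critical"
    SeverityHigh     = "high"
    SeverityMedium   = "medium"
    SeverityLow      = "low"
    SeverityInfo     = "info"
)
```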
2. Recommendations
Prioritized actionable suggestions with examples:
Priority Levels:
- `high` - Critical issues requiring immediate attention
- `medium` - Important optimizations and fixes
- `low` - Best practices and monitoring suggestions

Each recommendation includes a priority, an action, a reason, and an example (see the JSON output example below).
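A sketch of the corresponding Go type, with field names inferred from the JSON keys in the example below (the PR's actual declarations may differ):

```go
// Recommendation mirrors the recommendations entries in the JSON example.
type Recommendation struct {
    Priority string `json:"priority"` // high, medium, low
    Action   string `json:"action"`   // what to do
    Reason   string `json:"reason"`   // why it matters
    Example  string `json:"example"`  // concrete starting point
}
```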
3. Failure Analysis (for failed workflows)
Structured analysis, including root cause identification (exercised by TestFailureAnalysisGeneration).
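The PR body does not spell out the analysis fields here; judging from the tests, a minimal sketch might be:

```go
// FailureAnalysis is populated only for failed runs.
// The field name in this sketch is an assumption.
type FailureAnalysis struct {
    RootCause string `json:"root_cause"` // assumed field name
}
```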
4. Performance Metrics
Aggregated performance statistics such as tokens per minute, cost efficiency, and the most used tool (see the JSON output example below).
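A sketch of the metrics type plus the obvious rate computation. The field names come from the performance_metrics JSON keys below and the MostUsedTool field seen in the reviewed code; the calculation itself is an assumption:

```go
import "time"

// PerformanceMetrics mirrors the performance_metrics object in the JSON example.
type PerformanceMetrics struct {
    TokensPerMinute float64 `json:"tokens_per_minute"`
    CostEfficiency  string  `json:"cost_efficiency"` // e.g. "good"
    MostUsedTool    string  `json:"most_used_tool"`  // "name (N calls)"
}

// tokensPerMinute shows the straightforward way such a rate is derived.
func tokensPerMinute(totalTokens int, elapsed time.Duration) float64 {
    if elapsed <= 0 {
        return 0 // avoid division by zero for instant or unknown durations
    }
    return float64(totalTokens) / elapsed.Minutes()
}
```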
Console Output Example
JSON Output Example
{ "overview": { "run_id": 987654, "workflow_name": "weekly-research", "status": "completed", "conclusion": "success" }, "key_findings": [ { "category": "performance", "severity": "medium", "title": "Many Iterations", "description": "Workflow took 12 turns to complete", "impact": "Many turns may indicate task complexity" } ], "recommendations": [ { "priority": "medium", "action": "Clarify workflow instructions", "reason": "Many iterations may indicate unclear objectives", "example": "Split complex workflows into discrete steps" } ], "performance_metrics": { "tokens_per_minute": 3103.4, "cost_efficiency": "good", "most_used_tool": "github_search_repositories (8 calls)" } }Test Coverage
New Test Files:
- `audit_agent_output_test.go` (568 lines) - Unit tests for all new features
- `audit_agent_example_test.go` (444 lines) - Example outputs for documentation

Test Coverage:
All tests pass: 42+ tests in 42 seconds ✅
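As a flavor of what those tests assert, a minimal sketch (not the actual test code; WorkflowRun and generateFindings are the assumed shapes from the earlier sketches):

```go
package cli

import "testing"

func TestGenerateFindingsFlagsHighCost(t *testing.T) {
    run := WorkflowRun{EstimatedCost: 5.0} // well above the illustrative threshold
    for _, f := range generateFindings(run) {
        if f.Category == "cost" && f.Severity == "high" {
            return // found the expected finding
        }
    }
    t.Fatal("expected a high-severity cost finding")
}
```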
Benefits for AI Agents
Backward Compatibility
File Changes
- `pkg/cli/audit_report.go` - Added 630 lines for new structures and generation logic
- `pkg/cli/audit_agent_output_test.go` - 568 lines of unit tests
- `pkg/cli/audit_agent_example_test.go` - 444 lines of example tests

Total: 1,642 lines added, 13 lines modified
Code Review Updates
Ready for Review