-
Notifications
You must be signed in to change notification settings - Fork 29
Description
Executive Summary
Semantic analysis of 240+ non-test Go files in the pkg/ directory reveals significant opportunities for code consolidation and improved organization, particularly in the safe-outputs system. The analysis identified 29 scattered configuration parsing functions, 85-90% duplication in job builders, and a 1,360-line monolithic file that should be split.
Key Findings:
- 500+ lines of duplicated configuration parsing logic scattered across 29 files
- 1,000+ lines of boilerplate in job building functions (85-90% similarity)
- 200+ lines of duplicated log parsing logic across 4 engine implementations
- Monolithic
safe_outputs.go(1,360 lines) mixing 4 distinct concerns
Total Analysis Scope:
- 240 Go source files analyzed (excluding tests)
- 1,800+ functions cataloged
- 6 major refactoring clusters identified
- 3 critical priority issues requiring immediate attention
Detailed Findings and Recommendations
Critical Priority Issues
1. Scattered Configuration Parsing Functions (29 files, 500+ lines of duplication)
Problem: Each safe-output type has its own parseXXXConfig() function scattered across individual files, creating maintenance overhead and inconsistent patterns.
Affected Files (29):
pkg/workflow/create_issue.go:parseIssuesConfig()
pkg/workflow/create_discussion.go:parseDiscussionsConfig()
pkg/workflow/add_comment.go:parseCommentsConfig()
pkg/workflow/close_issue.go:parseCloseIssuesConfig()
pkg/workflow/close_pull_request.go:parseClosePullRequestsConfig()
pkg/workflow/close_discussion.go:parseCloseDiscussionsConfig()
pkg/workflow/create_pull_request.go:parsePullRequestsConfig()
pkg/workflow/create_pr_review_comment.go:parsePullRequestReviewCommentsConfig()
pkg/workflow/create_code_scanning_alert.go:parseCodeScanningAlertsConfig()
pkg/workflow/add_reviewer.go:parseAddReviewerConfig()
pkg/workflow/add_labels.go:parseLabelsConfig()
pkg/workflow/publish_assets.go:parseUploadAssetConfig()
pkg/workflow/update_issue.go:parseUpdateIssuesConfig()
pkg/workflow/update_project.go:parseUpdateProjectConfig()
pkg/workflow/update_release.go:parseUpdateReleaseConfig()
pkg/workflow/assign_milestone.go:[inline parsing]
pkg/workflow/assign_to_agent.go:parseAgentTaskConfig()
pkg/workflow/create_agent_task.go:parseAgentTaskConfig()
... (17 more files)
Pattern Similarity: 70-75% identical structure
Each function follows this pattern:
func (c *Compiler) parseXXXConfig(outputMap map[string]any) *XXXConfig {
if configData, exists := outputMap["xxx"]; exists {
config := &XXXConfig{}
if configMap, ok := configData.(map[string]any); ok {
// Parse specific fields using shared helpers
config.Field1 = parseField1FromConfig(configMap)
config.Field2 = parseField2FromConfig(configMap)
// Parse common base fields
c.parseBaseSafeOutputConfig(configMap, &config.BaseSafeOutputConfig, defaultMax)
}
return config
}
return nil
}Recommendation:
- Create
safe_outputs_config_parsing.goto consolidate allparseXXXConfig()functions - Move all 29+ parsing functions to this single file
- Group related parsers together (creation, closure, updates, etc.)
- Consider creating a generic configuration parser for common patterns
Estimated Impact: Reduce ~500 lines of scattered code, improve discoverability, enable consistent error handling
2. Monolithic safe_outputs.go (1,360 lines, 4 concerns)
Problem: The safe_outputs.go file has grown to 1,360 lines and mixes multiple concerns, exceeding the recommended 1,000-line limit.
**(redacted) pkg/workflow/safe_outputs.go
Current Structure:
Lines 1-64: Package setup and types
Lines 64-350: Configuration extraction orchestration (parseXXXConfig calls)
Lines 350-600: Job building infrastructure
Lines 600-800: YAML generation helpers
Lines 800+: Environment variable construction
Recommendation: Split into 4 focused files
-
safe_outputs.go(keep core, ~400 lines)- Main types and interfaces
- Public API functions
- Orchestration logic
-
safe_outputs_extraction.go(~300 lines)- Configuration extraction from workflow data
extractSafeOutputs()and related functions- Coordinate calls to parsers in
safe_outputs_config_parsing.go
-
safe_outputs_job_builder.go(~350 lines)- Job building infrastructure
- Common job patterns
- Shared job construction utilities
-
safe_outputs_yaml.go(~310 lines)- YAML generation helpers
- String formatting for GitHub Actions YAML
- Template rendering
Estimated Impact: Improve file navigability, clearer separation of concerns, easier testing
3. Duplicated Numeric Type Conversion (5+ locations)
Problem: The parseIntValue() function exists in map_helpers.go but is duplicated inline in multiple locations.
Example from safe_outputs.go (lines 170-188):
// Parse max (optional)
if maxCount, exists := labelsMap["max"]; exists {
var maxCountInt int
var validMaxCount bool
switch v := maxCount.(type) {
case int:
maxCountInt = v
validMaxCount = true
case int64:
maxCountInt = int(v)
validMaxCount = true
case uint64:
maxCountInt = int(v)
validMaxCount = true
case float64:
maxCountInt = int(v)
validMaxCount = true
}
if validMaxCount {
labelConfig.Max = maxCountInt
}
}But map_helpers.go already has:
// parseIntValue safely parses various numeric types to int
func parseIntValue(value any) (int, bool) {
switch v := value.(type) {
case int:
return v, true
case int64:
return int(v), true
case uint64:
return int(v), true
case float64:
intVal := int(v)
// Warn if truncation occurs
if v != float64(intVal) {
mapHelpersLog.Printf("Float value %.2f truncated to integer %d", v, intVal)
}
return intVal, true
default:
return 0, false
}
}Recommendation:
- Export
parseIntValue()frommap_helpers.go(already exported) - Replace all inline numeric parsing with calls to
parseIntValue() - Benefits: Consistent truncation warnings, centralized numeric handling
Locations with duplication:
safe_outputs.go(lines 170-188)- Similar patterns in 4+ other safe-output config files
Estimated Impact: Remove ~50 lines of duplication, consistent numeric handling
Major Priority Issues
4. Job Builder Function Duplication (29 files, 1,000+ lines)
Problem: All safe-output job builders follow nearly identical patterns with 85-90% code similarity.
Pattern Analysis:
Every buildXXXJob() function:
- Checks if config exists → identical
- Builds environment variables (standard + custom) → 85% identical
- Creates pre-steps (debug output) → 95% identical
- Creates main script step → varies by action
- Creates post-steps (if applicable) → varies by action
- Returns Job with steps → identical
Example from create_issue.go:
func (c *Compiler) buildCreateOutputIssueJob(data *WorkflowData, mainJobName string) (*Job, error) {
if data.SafeOutputs == nil || data.SafeOutputs.CreateIssues == nil {
return nil, fmt.Errorf("safe-outputs.create-issue configuration is required")
}
// Build custom environment variables specific to create-issue
var customEnvVars []string
customEnvVars = append(customEnvVars, buildTitlePrefixEnvVar(...))
customEnvVars = append(customEnvVars, buildLabelsEnvVar(...))
customEnvVars = append(customEnvVars, c.buildStandardSafeOutputEnvVars(...))
// Build post-steps if configured
var postSteps []string
if len(data.SafeOutputs.CreateIssues.Assignees) > 0 {
postSteps = buildCopilotParticipantSteps(...)
}
// Create outputs
outputs := map[string]string{
"issue_number": "${{ steps.create_issue.outputs.issue_number }}",
"issue_url": "${{ steps.create_issue.outputs.issue_url }}",
}
// Build job condition
jobCondition := BuildSafeOutputType("create_issue")
// Return job
return c.buildSafeOutputJob(...)
}Recommendation:
Create a SafeOutputJobBuilder abstraction:
type SafeOutputJobBuilder struct {
ActionType string
ConfigExists func(*WorkflowData) bool
CustomEnvVars func(*WorkflowData) []string
PreSteps func(*WorkflowData) []string
MainScript string
PostSteps func(*WorkflowData) []string
Outputs map[string]string
JobCondition func(*WorkflowData) string
}
func (b *SafeOutputJobBuilder) Build(data *WorkflowData, mainJobName string) (*Job, error) {
// Common job building logic
}Alternative approach: Create a job builder registry with templates.
Estimated Impact: Remove 1,000+ lines of boilerplate, enable consistent job patterns
5. Engine Log Parsing Duplication (4 files, 200+ lines)
Problem: Each AI engine implementation has its own ParseLogMetrics() function with 60-75% code duplication.
Affected Files:
pkg/workflow/claude_logs.go:ParseLogMetrics()
pkg/workflow/copilot_engine.go:ParseLogMetrics()
pkg/workflow/codex_engine.go:ParseLogMetrics()
pkg/workflow/custom_engine.go:ParseLogMetrics()
Common Logic (60-75% similar):
- Tool call extraction and counting
- Token counting (input/output)
- Cost calculation
- Sequence tracking
- Error parsing
Recommendation:
-
Create
log_parsing_helpers.gowith shared utilities:ExtractToolCalls(logs []string) []ToolCallCountTokens(logs []string) (input, output int)CalculateCost(inputTokens, outputTokens int, pricing PricingModel) float64ExtractErrorMessages(logs []string) []string
-
Keep engine-specific parsing in each engine file, but use shared utilities
Estimated Impact: Remove ~200 lines of duplication, consistent metric tracking
Minor Priority Issues
6. String Normalization Functions (Multiple Files)
Status: Mostly well-organized, minor opportunity for consolidation
Current Distribution:
strings.go- Primary location forSanitizeName(),SanitizeWorkflowName(),SortStrings()workflow_name.go-SanitizeIdentifier()resolve.go-normalizeWorkflowName()(private)safe_outputs.go-normalizeSafeOutputIdentifier()(private, embedded)
Recommendation (Optional):
Consider consolidating all normalize* and sanitize* functions into strings.go for discoverability. Current organization is acceptable but could be improved for consistency.
Estimated Impact: Documentation improvement, better discoverability
Function Clustering Analysis
Cluster 1: Safe-Output Creation Functions
Files: 6 create_*.go files
create_issue.gocreate_discussion.gocreate_pull_request.gocreate_pr_review_comment.gocreate_code_scanning_alert.gocreate_agent_task.go
Organization Status:
**Functions per (redacted)
parseXXXConfig()- Configuration parsingbuildCreateOutputXXXJob()- Job building- Config struct definition
Recommendation: Consider grouping into safe_outputs_create/ subdirectory if consolidation effort is undertaken.
Cluster 2: Safe-Output Update Functions
Files: 4 update_*.go files
update_issue.goupdate_project.goupdate_release.gopush_to_pull_request_branch.go(special case)
Organization Status: ✅ Well-organized by action type
Cluster 3: Safe-Output Closure Functions
Files: 3 close_*.go files
close_issue.goclose_pull_request.goclose_discussion.go
Organization Status: ✅ Well-organized by action type
Cluster 4: Validation Functions
Files: Multiple validation-focused files
permissions_validator.goaction_sha_checker.gomcp_config_validation.gostep_order_validation.gogithub_toolset_validation_error.go
Organization Status: ✅ Well-organized by validation concern
Recommendation: Consider creating a common ValidationResult interface for consistency across validators.
Implementation Roadmap
Phase 1: High-Impact Consolidation (Priority 1)
Tasks:
-
✅ Create
safe_outputs_config_parsing.go- Move all 29+
parseXXXConfig()functions - Group by action type (creation, closure, update, etc.)
- Estimated effort: 4-6 hours
- Benefits: 500+ lines consolidated, improved discoverability
- Move all 29+
-
✅ Split
safe_outputs.gointo 4 filessafe_outputs.go(core, ~400 lines)safe_outputs_extraction.go(~300 lines)safe_outputs_job_builder.go(~350 lines)safe_outputs_yaml.go(~310 lines)- Estimated effort: 6-8 hours
- Benefits: Clearer organization, easier navigation
-
✅ Consolidate numeric type conversion
- Use
parseIntValue()frommap_helpers.goeverywhere - Remove inline duplication
- Estimated effort: 2-3 hours
- Benefits: Consistent truncation warnings, 50+ lines removed
- Use
Total Phase 1 Effort: 12-17 hours
Total Phase 1 Impact: ~600 lines consolidated, improved maintainability
Phase 2: Job Builder Abstraction (Priority 2)
Tasks:
-
✅ Design SafeOutputJobBuilder abstraction
- Create builder interface or registry pattern
- Implement template for common job structure
- Estimated effort: 8-10 hours
-
✅ Migrate existing builders
- Convert 29 job builders to use new abstraction
- Estimated effort: 12-15 hours
Total Phase 2 Effort: 20-25 hours
Total Phase 2 Impact: ~1,000 lines of boilerplate removed
Phase 3: Engine Log Parsing (Priority 3)
Tasks:
-
✅ Create
log_parsing_helpers.go- Extract shared utilities
- Estimated effort: 4-6 hours
-
✅ Update engine implementations
- Refactor to use shared helpers
- Estimated effort: 4-6 hours
Total Phase 3 Effort: 8-12 hours
Total Phase 3 Impact: ~200 lines of duplication removed
Testing Strategy
For each refactoring phase:
- ✅ Ensure existing tests pass before changes
- ✅ Run
make test-unitafter each file move - ✅ Verify
make lintpasses - ✅ Check
make buildsucceeds - ✅ No changes to public APIs (internal refactoring only)
Risk Mitigation:
- These are internal refactorings with no public API changes
- Existing test coverage should catch regressions
- Changes are primarily organizational (moving code, not rewriting)
Summary of Findings
| Priority | Issue | Files Affected | Lines Impact | Effort |
|---|---|---|---|---|
| 1 | Scattered config parsing | 29 files | 500+ lines | 4-6 hours |
| 1 | Monolithic safe_outputs.go | 1 file | Split to 4 files | 6-8 hours |
| 1 | Numeric conversion duplication | 5+ files | 50+ lines | 2-3 hours |
| 2 | Job builder duplication | 29 files | 1,000+ lines | 20-25 hours |
| 3 | Engine log parsing | 4 files | 200+ lines | 8-12 hours |
| 4 | String normalization (minor) | 4 files | Documentation | 2-3 hours |
Total Impact: ~1,750+ lines of duplicated code identified
Total Estimated Effort: 42-57 hours for complete refactoring
Recommended Approach: Implement Phase 1 first (highest impact, lowest risk)
Analysis Metadata
- Repository: githubnext/gh-aw
- Analysis Date: 2025-11-27
- Files Analyzed: 240 non-test Go files
- Functions Cataloged: 1,800+
- Primary Focus:
pkg/workflow/(207 files) andpkg/cli/(74 files) - Detection Method: Semantic clustering + naming pattern analysis + code similarity
- Tool Used: Claude Code Agent with Explore agent for thorough codebase analysis
Next Steps
- Review and prioritize - Determine which refactorings to pursue
- Start with Phase 1 - High-impact, low-risk consolidation (12-17 hours)
- Create implementation issues - Break down into smaller, trackable tasks
- Maintain backward compatibility - All changes are internal refactorings
Labels
refactoring- Code organization improvementcode-health- Technical debt reductionautomated-analysis- Generated by semantic clustering analysis
AI generated by Semantic Function Refactoring