Skip to content

[refactor] Semantic Function Clustering Analysis - Safe-Output Configuration Consolidation Opportunities #4917

@github-actions

Description

@github-actions

Executive Summary

Semantic analysis of 240+ non-test Go files in the pkg/ directory reveals significant opportunities for code consolidation and improved organization, particularly in the safe-outputs system. The analysis identified 29 scattered configuration parsing functions, 85-90% duplication in job builders, and a 1,360-line monolithic file that should be split.

Key Findings:

  • 500+ lines of duplicated configuration parsing logic scattered across 29 files
  • 1,000+ lines of boilerplate in job building functions (85-90% similarity)
  • 200+ lines of duplicated log parsing logic across 4 engine implementations
  • Monolithic safe_outputs.go (1,360 lines) mixing 4 distinct concerns

Total Analysis Scope:

  • 240 Go source files analyzed (excluding tests)
  • 1,800+ functions cataloged
  • 6 major refactoring clusters identified
  • 3 critical priority issues requiring immediate attention
Detailed Findings and Recommendations

Critical Priority Issues

1. Scattered Configuration Parsing Functions (29 files, 500+ lines of duplication)

Problem: Each safe-output type has its own parseXXXConfig() function scattered across individual files, creating maintenance overhead and inconsistent patterns.

Affected Files (29):

pkg/workflow/create_issue.go:parseIssuesConfig()
pkg/workflow/create_discussion.go:parseDiscussionsConfig()
pkg/workflow/add_comment.go:parseCommentsConfig()
pkg/workflow/close_issue.go:parseCloseIssuesConfig()
pkg/workflow/close_pull_request.go:parseClosePullRequestsConfig()
pkg/workflow/close_discussion.go:parseCloseDiscussionsConfig()
pkg/workflow/create_pull_request.go:parsePullRequestsConfig()
pkg/workflow/create_pr_review_comment.go:parsePullRequestReviewCommentsConfig()
pkg/workflow/create_code_scanning_alert.go:parseCodeScanningAlertsConfig()
pkg/workflow/add_reviewer.go:parseAddReviewerConfig()
pkg/workflow/add_labels.go:parseLabelsConfig()
pkg/workflow/publish_assets.go:parseUploadAssetConfig()
pkg/workflow/update_issue.go:parseUpdateIssuesConfig()
pkg/workflow/update_project.go:parseUpdateProjectConfig()
pkg/workflow/update_release.go:parseUpdateReleaseConfig()
pkg/workflow/assign_milestone.go:[inline parsing]
pkg/workflow/assign_to_agent.go:parseAgentTaskConfig()
pkg/workflow/create_agent_task.go:parseAgentTaskConfig()
... (17 more files)

Pattern Similarity: 70-75% identical structure

Each function follows this pattern:

func (c *Compiler) parseXXXConfig(outputMap map[string]any) *XXXConfig {
    if configData, exists := outputMap["xxx"]; exists {
        config := &XXXConfig{}
        if configMap, ok := configData.(map[string]any); ok {
            // Parse specific fields using shared helpers
            config.Field1 = parseField1FromConfig(configMap)
            config.Field2 = parseField2FromConfig(configMap)
            
            // Parse common base fields
            c.parseBaseSafeOutputConfig(configMap, &config.BaseSafeOutputConfig, defaultMax)
        }
        return config
    }
    return nil
}

Recommendation:

  1. Create safe_outputs_config_parsing.go to consolidate all parseXXXConfig() functions
  2. Move all 29+ parsing functions to this single file
  3. Group related parsers together (creation, closure, updates, etc.)
  4. Consider creating a generic configuration parser for common patterns

Estimated Impact: Reduce ~500 lines of scattered code, improve discoverability, enable consistent error handling


2. Monolithic safe_outputs.go (1,360 lines, 4 concerns)

Problem: The safe_outputs.go file has grown to 1,360 lines and mixes multiple concerns, exceeding the recommended 1,000-line limit.

**(redacted) pkg/workflow/safe_outputs.go

Current Structure:

Lines 1-64:     Package setup and types
Lines 64-350:   Configuration extraction orchestration (parseXXXConfig calls)
Lines 350-600:  Job building infrastructure
Lines 600-800:  YAML generation helpers  
Lines 800+:     Environment variable construction

Recommendation: Split into 4 focused files

  1. safe_outputs.go (keep core, ~400 lines)

    • Main types and interfaces
    • Public API functions
    • Orchestration logic
  2. safe_outputs_extraction.go (~300 lines)

    • Configuration extraction from workflow data
    • extractSafeOutputs() and related functions
    • Coordinate calls to parsers in safe_outputs_config_parsing.go
  3. safe_outputs_job_builder.go (~350 lines)

    • Job building infrastructure
    • Common job patterns
    • Shared job construction utilities
  4. safe_outputs_yaml.go (~310 lines)

    • YAML generation helpers
    • String formatting for GitHub Actions YAML
    • Template rendering

Estimated Impact: Improve file navigability, clearer separation of concerns, easier testing


3. Duplicated Numeric Type Conversion (5+ locations)

Problem: The parseIntValue() function exists in map_helpers.go but is duplicated inline in multiple locations.

Example from safe_outputs.go (lines 170-188):

// Parse max (optional)
if maxCount, exists := labelsMap["max"]; exists {
    var maxCountInt int
    var validMaxCount bool
    switch v := maxCount.(type) {
    case int:
        maxCountInt = v
        validMaxCount = true
    case int64:
        maxCountInt = int(v)
        validMaxCount = true
    case uint64:
        maxCountInt = int(v)
        validMaxCount = true
    case float64:
        maxCountInt = int(v)
        validMaxCount = true
    }
    if validMaxCount {
        labelConfig.Max = maxCountInt
    }
}

But map_helpers.go already has:

// parseIntValue safely parses various numeric types to int
func parseIntValue(value any) (int, bool) {
    switch v := value.(type) {
    case int:
        return v, true
    case int64:
        return int(v), true
    case uint64:
        return int(v), true
    case float64:
        intVal := int(v)
        // Warn if truncation occurs
        if v != float64(intVal) {
            mapHelpersLog.Printf("Float value %.2f truncated to integer %d", v, intVal)
        }
        return intVal, true
    default:
        return 0, false
    }
}

Recommendation:

  1. Export parseIntValue() from map_helpers.go (already exported)
  2. Replace all inline numeric parsing with calls to parseIntValue()
  3. Benefits: Consistent truncation warnings, centralized numeric handling

Locations with duplication:

  • safe_outputs.go (lines 170-188)
  • Similar patterns in 4+ other safe-output config files

Estimated Impact: Remove ~50 lines of duplication, consistent numeric handling


Major Priority Issues

4. Job Builder Function Duplication (29 files, 1,000+ lines)

Problem: All safe-output job builders follow nearly identical patterns with 85-90% code similarity.

Pattern Analysis:

Every buildXXXJob() function:

  1. Checks if config exists → identical
  2. Builds environment variables (standard + custom) → 85% identical
  3. Creates pre-steps (debug output) → 95% identical
  4. Creates main script step → varies by action
  5. Creates post-steps (if applicable) → varies by action
  6. Returns Job with steps → identical

Example from create_issue.go:

func (c *Compiler) buildCreateOutputIssueJob(data *WorkflowData, mainJobName string) (*Job, error) {
    if data.SafeOutputs == nil || data.SafeOutputs.CreateIssues == nil {
        return nil, fmt.Errorf("safe-outputs.create-issue configuration is required")
    }
    
    // Build custom environment variables specific to create-issue
    var customEnvVars []string
    customEnvVars = append(customEnvVars, buildTitlePrefixEnvVar(...))
    customEnvVars = append(customEnvVars, buildLabelsEnvVar(...))
    customEnvVars = append(customEnvVars, c.buildStandardSafeOutputEnvVars(...))
    
    // Build post-steps if configured
    var postSteps []string
    if len(data.SafeOutputs.CreateIssues.Assignees) > 0 {
        postSteps = buildCopilotParticipantSteps(...)
    }
    
    // Create outputs
    outputs := map[string]string{
        "issue_number": "${{ steps.create_issue.outputs.issue_number }}",
        "issue_url":    "${{ steps.create_issue.outputs.issue_url }}",
    }
    
    // Build job condition
    jobCondition := BuildSafeOutputType("create_issue")
    
    // Return job
    return c.buildSafeOutputJob(...)
}

Recommendation:
Create a SafeOutputJobBuilder abstraction:

type SafeOutputJobBuilder struct {
    ActionType       string
    ConfigExists     func(*WorkflowData) bool
    CustomEnvVars    func(*WorkflowData) []string
    PreSteps         func(*WorkflowData) []string
    MainScript       string
    PostSteps        func(*WorkflowData) []string
    Outputs          map[string]string
    JobCondition     func(*WorkflowData) string
}

func (b *SafeOutputJobBuilder) Build(data *WorkflowData, mainJobName string) (*Job, error) {
    // Common job building logic
}

Alternative approach: Create a job builder registry with templates.

Estimated Impact: Remove 1,000+ lines of boilerplate, enable consistent job patterns


5. Engine Log Parsing Duplication (4 files, 200+ lines)

Problem: Each AI engine implementation has its own ParseLogMetrics() function with 60-75% code duplication.

Affected Files:

pkg/workflow/claude_logs.go:ParseLogMetrics()
pkg/workflow/copilot_engine.go:ParseLogMetrics()
pkg/workflow/codex_engine.go:ParseLogMetrics()
pkg/workflow/custom_engine.go:ParseLogMetrics()

Common Logic (60-75% similar):

  • Tool call extraction and counting
  • Token counting (input/output)
  • Cost calculation
  • Sequence tracking
  • Error parsing

Recommendation:

  1. Create log_parsing_helpers.go with shared utilities:

    • ExtractToolCalls(logs []string) []ToolCall
    • CountTokens(logs []string) (input, output int)
    • CalculateCost(inputTokens, outputTokens int, pricing PricingModel) float64
    • ExtractErrorMessages(logs []string) []string
  2. Keep engine-specific parsing in each engine file, but use shared utilities

Estimated Impact: Remove ~200 lines of duplication, consistent metric tracking


Minor Priority Issues

6. String Normalization Functions (Multiple Files)

Status: Mostly well-organized, minor opportunity for consolidation

Current Distribution:

  • strings.go - Primary location for SanitizeName(), SanitizeWorkflowName(), SortStrings()
  • workflow_name.go - SanitizeIdentifier()
  • resolve.go - normalizeWorkflowName() (private)
  • safe_outputs.go - normalizeSafeOutputIdentifier() (private, embedded)

Recommendation (Optional):
Consider consolidating all normalize* and sanitize* functions into strings.go for discoverability. Current organization is acceptable but could be improved for consistency.

Estimated Impact: Documentation improvement, better discoverability


Function Clustering Analysis

Cluster 1: Safe-Output Creation Functions

Files: 6 create_*.go files

  • create_issue.go
  • create_discussion.go
  • create_pull_request.go
  • create_pr_review_comment.go
  • create_code_scanning_alert.go
  • create_agent_task.go

Organization Status: ⚠️ Poorly organized - Each file is 150-300 lines with redundant structure

**Functions per (redacted)

  • parseXXXConfig() - Configuration parsing
  • buildCreateOutputXXXJob() - Job building
  • Config struct definition

Recommendation: Consider grouping into safe_outputs_create/ subdirectory if consolidation effort is undertaken.


Cluster 2: Safe-Output Update Functions

Files: 4 update_*.go files

  • update_issue.go
  • update_project.go
  • update_release.go
  • push_to_pull_request_branch.go (special case)

Organization Status: ✅ Well-organized by action type


Cluster 3: Safe-Output Closure Functions

Files: 3 close_*.go files

  • close_issue.go
  • close_pull_request.go
  • close_discussion.go

Organization Status: ✅ Well-organized by action type


Cluster 4: Validation Functions

Files: Multiple validation-focused files

  • permissions_validator.go
  • action_sha_checker.go
  • mcp_config_validation.go
  • step_order_validation.go
  • github_toolset_validation_error.go

Organization Status: ✅ Well-organized by validation concern

Recommendation: Consider creating a common ValidationResult interface for consistency across validators.


Implementation Roadmap

Phase 1: High-Impact Consolidation (Priority 1)

Tasks:

  1. Create safe_outputs_config_parsing.go

    • Move all 29+ parseXXXConfig() functions
    • Group by action type (creation, closure, update, etc.)
    • Estimated effort: 4-6 hours
    • Benefits: 500+ lines consolidated, improved discoverability
  2. Split safe_outputs.go into 4 files

    • safe_outputs.go (core, ~400 lines)
    • safe_outputs_extraction.go (~300 lines)
    • safe_outputs_job_builder.go (~350 lines)
    • safe_outputs_yaml.go (~310 lines)
    • Estimated effort: 6-8 hours
    • Benefits: Clearer organization, easier navigation
  3. Consolidate numeric type conversion

    • Use parseIntValue() from map_helpers.go everywhere
    • Remove inline duplication
    • Estimated effort: 2-3 hours
    • Benefits: Consistent truncation warnings, 50+ lines removed

Total Phase 1 Effort: 12-17 hours
Total Phase 1 Impact: ~600 lines consolidated, improved maintainability


Phase 2: Job Builder Abstraction (Priority 2)

Tasks:

  1. Design SafeOutputJobBuilder abstraction

    • Create builder interface or registry pattern
    • Implement template for common job structure
    • Estimated effort: 8-10 hours
  2. Migrate existing builders

    • Convert 29 job builders to use new abstraction
    • Estimated effort: 12-15 hours

Total Phase 2 Effort: 20-25 hours
Total Phase 2 Impact: ~1,000 lines of boilerplate removed


Phase 3: Engine Log Parsing (Priority 3)

Tasks:

  1. Create log_parsing_helpers.go

    • Extract shared utilities
    • Estimated effort: 4-6 hours
  2. Update engine implementations

    • Refactor to use shared helpers
    • Estimated effort: 4-6 hours

Total Phase 3 Effort: 8-12 hours
Total Phase 3 Impact: ~200 lines of duplication removed


Testing Strategy

For each refactoring phase:

  1. ✅ Ensure existing tests pass before changes
  2. ✅ Run make test-unit after each file move
  3. ✅ Verify make lint passes
  4. ✅ Check make build succeeds
  5. ✅ No changes to public APIs (internal refactoring only)

Risk Mitigation:

  • These are internal refactorings with no public API changes
  • Existing test coverage should catch regressions
  • Changes are primarily organizational (moving code, not rewriting)

Summary of Findings

Priority Issue Files Affected Lines Impact Effort
1 Scattered config parsing 29 files 500+ lines 4-6 hours
1 Monolithic safe_outputs.go 1 file Split to 4 files 6-8 hours
1 Numeric conversion duplication 5+ files 50+ lines 2-3 hours
2 Job builder duplication 29 files 1,000+ lines 20-25 hours
3 Engine log parsing 4 files 200+ lines 8-12 hours
4 String normalization (minor) 4 files Documentation 2-3 hours

Total Impact: ~1,750+ lines of duplicated code identified
Total Estimated Effort: 42-57 hours for complete refactoring
Recommended Approach: Implement Phase 1 first (highest impact, lowest risk)


Analysis Metadata

  • Repository: githubnext/gh-aw
  • Analysis Date: 2025-11-27
  • Files Analyzed: 240 non-test Go files
  • Functions Cataloged: 1,800+
  • Primary Focus: pkg/workflow/ (207 files) and pkg/cli/ (74 files)
  • Detection Method: Semantic clustering + naming pattern analysis + code similarity
  • Tool Used: Claude Code Agent with Explore agent for thorough codebase analysis

Next Steps

  1. Review and prioritize - Determine which refactorings to pursue
  2. Start with Phase 1 - High-impact, low-risk consolidation (12-17 hours)
  3. Create implementation issues - Break down into smaller, trackable tasks
  4. Maintain backward compatibility - All changes are internal refactorings

Labels

  • refactoring - Code organization improvement
  • code-health - Technical debt reduction
  • automated-analysis - Generated by semantic clustering analysis

AI generated by Semantic Function Refactoring

Sub-issues

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions