[refactor] 🔧 Semantic Function Clustering Analysis - Refactoring Opportunities #2273

@github-actions

🔧 Semantic Function Clustering Analysis

Analysis of repository: githubnext/gh-aw

Executive Summary

Comprehensive semantic analysis of 153 non-test Go files across the pkg/ directory identified significant refactoring opportunities to improve code organization, reduce duplication, and enhance maintainability.

Key Findings:

  • Good: Well-organized create_*.go pattern for GitHub Actions (issues, PRs, discussions, etc.)
  • Good: Separate engine files (*_engine.go) for different AI engines
  • ⚠️ Issue: compiler.go is massive (3,472 lines, 70 functions) and contains mixed responsibilities
  • ⚠️ Issue: Parse/extract/validate functions scattered across 20+ files
  • ⚠️ Issue: Similar helper functions duplicated or near-duplicated
  • ⚠️ Issue: Outlier functions in wrong files (extraction in runtime files, generation in utility files)

Analysis Metadata

  • Total Go Files Analyzed: 153
  • Packages Analyzed: cli (51 files), workflow (90 files), parser (6 files), console (3 files), constants (1), logger (1)
  • Primary Focus: pkg/workflow (largest package with most refactoring opportunities)
  • Function Clusters Identified: 8 major clusters
  • Outliers Found: 10+ functions in wrong files
  • Detection Method: Serena semantic analysis + regex pattern analysis + manual review
  • Analysis Date: 2025-10-24
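The prefix-based part of the inventory can be approximated with the standard library alone. The sketch below is not the tooling that produced the numbers in this report (that was Serena plus regex analysis and manual review); it only shows one way to count functions by verb prefix using `go/parser` and `go/ast`. The `verbPrefixes` list, the `countByPrefix` helper, and the sample source are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"strings"
)

// verbPrefixes lists the naming clusters discussed in this report.
// The exact list is an assumption for this sketch.
var verbPrefixes = []string{"extract", "generate", "build", "parse", "validate"}

// sample is a made-up file, not code from the repository.
const sample = `package workflow

type Compiler struct{}

func extractPermissions() {}
func extractFeatures() {}
func generateYAML() {}
func (c *Compiler) validateEngine() {}
func helper() {}
`

// countByPrefix parses Go source text and counts function and method
// declarations whose names start with one of verbPrefixes.
func countByPrefix(filename, src string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip type, var, const, and import declarations
		}
		for _, prefix := range verbPrefixes {
			if strings.HasPrefix(fn.Name.Name, prefix) {
				counts[prefix]++
				break
			}
		}
	}
	return counts, nil
}

func main() {
	counts, err := countByPrefix("sample.go", sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Print in a stable order so repeated runs are comparable.
	prefixes := make([]string, 0, len(counts))
	for p := range counts {
		prefixes = append(prefixes, p)
	}
	sort.Strings(prefixes)
	for _, p := range prefixes {
		fmt.Printf("%s: %d\n", p, counts[p])
	}
}
```

Running the same function over every non-test file in a package would give a per-file table comparable to the counts quoted for compiler.go, although a name prefix alone cannot tell whether a function is in the wrong file.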

Function Inventory

By Package

pkg/workflow/ (90 files)

Primary Purpose: Core workflow compilation, execution engine, and GitHub Actions generation

File Organization:

  • Well-organized: create_*.go (8 files) - Each GitHub action type has its own file
  • Well-organized: *_engine.go (4 files) - Separate files per AI engine (Claude, Codex, Copilot, Custom)
  • ⚠️ Needs attention: compiler.go (3,472 lines, 70 functions) - Too large and mixed responsibilities
  • ⚠️ Scattered: Helper functions across *_helpers.go, *_utils.go, and main logic files

pkg/cli/ (51 files)

Primary Purpose: CLI command implementations and command handlers

File Organization:

  • Well-organized: *_command.go pattern for CLI commands
  • Well-organized: mcp_*.go files for MCP-related functionality (11 files)
  • ⚠️ Note: Has shared_utils.go (not in workflow package)

Identified Issues

1. Oversized Compiler File

Issue: pkg/workflow/compiler.go is extremely large and contains multiple responsibilities

Details:

  • Size: 3,472 lines
  • Function Count: 70 functions
  • Responsibilities Mixed:
    • 12 extract* functions (data extraction from frontmatter)
    • 20 generate* functions (YAML generation)
    • 7 build* functions (job building)
    • 2 parse* functions (parsing configuration)
    • 4 validate* functions (validation logic)
    • Schema validation, file I/O, orchestration

Functions in compiler.go that belong elsewhere:

Extraction Functions (should be in parse_utils.go or new extraction.go):
- extractTopLevelYAMLSection
- extractPermissions
- extractIfCondition
- extractFeatures
- extractDescription
- extractSource
- extractSafetyPromptSetting
- extractToolsTimeout
- extractToolsStartupTimeout
- extractExpressionFromIfString
- extractCommandConfig
- extractYAMLValue

Generation Functions (should be in code_generation.go or yaml_generation.go):
- generateJobName
- generateYAML
- generateMainJobSteps
- generateUploadAgentLogs
- generateUploadAssets
- generateLogParsing
- generateErrorValidation
- generateUploadAwInfo
- generateUploadPrompt
- generateExtractAccessLogs
- generateUploadAccessLogs
- generateUploadMCPLogs
- generatePrompt
- generateCacheMemoryPromptStep
- generateSafeOutputsPromptStep
- generatePostSteps
- generateEngineExecutionSteps
- generateAgentVersionCapture
- generateCreateAwInfo
- generateOutputCollectionStep
- convertGoPatternToJavaScript
- convertErrorPatternsToJavaScript

Recommendation:

  1. High Priority: Extract generation functions to yaml_generation.go or code_generation.go
  2. High Priority: Extract extraction/parsing functions to enhance parse_utils.go
  3. Medium Priority: Extract validation functions to validation.go
  4. Result: Slim down compiler.go to orchestration logic (~500-800 lines)

Estimated Impact:

  • Improved maintainability and testability
  • Easier code navigation
  • Reduced cognitive load
  • Better separation of concerns

2. Outlier Functions (Functions in Wrong Files)

Issue: Functions whose purpose doesn't match their containing file

Example 1: Extraction in Runtime Files

File: pkg/workflow/network.go
Function: extractNetworkPermissions()
Issue: Extraction/parsing function in a runtime networking file
Expected Location: parse_utils.go or compiler.go
Code:

func (c *Compiler) extractNetworkPermissions(frontmatter map[string]any) *NetworkPermissions

Example 2: Generation in Utility Files

File: pkg/workflow/git.go
Function: generateGitConfigurationSteps()
Issue: Generation function in git utility file
Expected Location: yaml_generation.go or compiler.go
Code:

func (c *Compiler) generateGitConfigurationSteps() []string

Other Outliers Detected

Validation functions scattered across:

  • pkg/workflow/cache.go - has validate functions (should be in validation.go)
  • pkg/workflow/docker.go - has validate functions
  • pkg/workflow/engine.go - has validate functions
  • pkg/workflow/npm.go - has validate functions
  • pkg/workflow/pip.go - has validate functions
  • pkg/workflow/strict_mode.go - has validate functions
  • pkg/workflow/expression_safety.go - has validate functions (acceptable here)

Total: 12 files contain validation functions, but only validation.go is dedicated to validation

Recommendation:

  1. Move extractNetworkPermissions to parse_utils.go
  2. Move generateGitConfigurationSteps to a generation helper file
  3. Consolidate validation functions into validation.go (keep domain-specific validation like expression_safety.go)

Estimated Impact: Clearer file organization, easier to find related functions
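One low-risk way to relocate an outlier such as extractNetworkPermissions is to move the logic into a package-level function and leave a thin method behind, so existing call sites keep compiling while they are migrated. The sketch below assumes the method does not use any Compiler state, which the report does not confirm. The `NetworkPermissions` field and the `network`/`allowed` frontmatter keys are hypothetical stand-ins, not the repository's actual schema.

```go
package main

import "fmt"

// NetworkPermissions is a stand-in type; the Allowed field is hypothetical.
type NetworkPermissions struct {
	Allowed []string
}

// Compiler is a stand-in for the real compiler type.
type Compiler struct{}

// extractNetworkPermissions is the package-level version that could live in
// parse_utils.go. The frontmatter layout it reads is an assumption.
func extractNetworkPermissions(frontmatter map[string]any) *NetworkPermissions {
	raw, ok := frontmatter["network"].(map[string]any)
	if !ok {
		return nil // key missing or not a mapping
	}
	perms := &NetworkPermissions{}
	if list, ok := raw["allowed"].([]any); ok {
		for _, item := range list {
			if s, ok := item.(string); ok {
				perms.Allowed = append(perms.Allowed, s) // ignore non-string entries
			}
		}
	}
	return perms
}

// The method stays as a thin wrapper during migration. Go keeps methods and
// package-level functions in separate namespaces, so the shared name is legal.
func (c *Compiler) extractNetworkPermissions(frontmatter map[string]any) *NetworkPermissions {
	return extractNetworkPermissions(frontmatter)
}

func main() {
	fm := map[string]any{
		"network": map[string]any{"allowed": []any{"example.com", "api.github.com"}},
	}
	c := &Compiler{}
	fmt.Println(c.extractNetworkPermissions(fm).Allowed)
}
```

Once every caller uses the package-level function, the wrapper method can be deleted in a follow-up change.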


3. Duplicate or Near-Duplicate Extraction Functions

Issue: Similar extraction functions with overlapping functionality

Example: extractStringValue vs extractYAMLValue

Occurrence 1: pkg/workflow/parse_utils.go:extractStringValue()

func extractStringValue(frontmatter map[string]any, key string) string {
    value, exists := frontmatter[key]
    if !exists {
        return ""
    }
    if strValue, ok := value.(string); ok {
        return strValue
    }
    return ""
}

Occurrence 2: pkg/workflow/compiler.go:(*Compiler).extractYAMLValue()

func (c *Compiler) extractYAMLValue(frontmatter map[string]any, key string) string {
    if value, exists := frontmatter[key]; exists {
        if str, ok := value.(string); ok {
            return str
        }
        if num, ok := value.(int); ok {
            return fmt.Sprintf("%d", num)
        }
        // ... more type conversions ...
    }
    return ""
}

Similarity: Both return a string for a frontmatter key. extractStringValue accepts only values that are already strings, while extractYAMLValue also converts other types (such as int) to a string.

Recommendation:

  • Consolidate into a single function with an optional type-coercion flag
  • Or rename to make the distinction explicit: extractStringValue (strict) vs extractYAMLValueAsString (with conversion)

Estimated Impact: Reduced confusion, clearer API
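A minimal sketch of the first option, assuming a single helper with a `coerce` flag. The name `extractString` is hypothetical. The report elides which types extractYAMLValue converts beyond int, so the numeric cases below are assumptions rather than the original behavior.

```go
package main

import (
	"fmt"
	"strconv"
)

// extractString returns frontmatter[key] as a string.
// With coerce=false it accepts only string values, matching the strict
// behavior of extractStringValue. With coerce=true it also converts a few
// numeric types; that set is an assumption, since the report elides the
// full list handled by extractYAMLValue.
func extractString(frontmatter map[string]any, key string, coerce bool) string {
	value, exists := frontmatter[key]
	if !exists {
		return ""
	}
	if s, ok := value.(string); ok {
		return s
	}
	if !coerce {
		return ""
	}
	switch v := value.(type) {
	case int:
		return strconv.Itoa(v)
	case int64:
		return strconv.FormatInt(v, 10)
	case uint64:
		return strconv.FormatUint(v, 10)
	case float64:
		return strconv.FormatFloat(v, 'f', -1, 64)
	}
	return "" // unsupported type
}

func main() {
	fm := map[string]any{"name": "demo", "timeout": 30}
	fmt.Println(extractString(fm, "name", false))    // strict lookup of a string value
	fmt.Println(extractString(fm, "timeout", false)) // strict lookup rejects the int
	fmt.Println(extractString(fm, "timeout", true))  // coercing lookup converts it
}
```

A boolean parameter keeps the sketch short, but two named wrappers over this helper would read more clearly at call sites than a bare `true` or `false`, which is effectively the rename option above.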


4. Scattered Parse Config Functions

Issue: Config parsing functions spread across 15+ files

Pattern Detected: parse*Config functions in many files

Files with parseConfig functions:

  1. add_comment.go - parseCommentsConfig
  2. compiler.go - parseBaseSafeOutputConfig
  3. config.go - parseLabelsFromConfig, parseTitlePrefixFromConfig, parseTargetRepoFromConfig
  4. create_agent_task.go - parseAgentTaskConfig
  5. create_code_scanning_alert.go - parseCodeScanningAlertsConfig
  6. create_discussion.go - parseDiscussionsConfig
  7. create_issue.go - parseIssuesConfig
  8. create_pr_review_comment.go - parsePullRequestReviewCommentsConfig
  9. create_pull_request.go - parsePullRequestsConfig
  10. missing_tool.go - parseMissingToolConfig
  11. publish_assets.go - parseUploadAssetConfig
  12. push_to_pull_request_branch.go - parsePushToPullRequestBranchConfig
  13. safe_jobs.go - parseSafeJobsConfig
  14. threat_detection.go - parseThreatDetectionConfig

Analysis:

  • Good: Each config parser is co-located with its domain logic (e.g., parseIssuesConfig in create_issue.go)
  • Good: Clear naming pattern
  • ⚠️ Issue: Common helper logic (parseLabelsFromConfig, parseTitlePrefixFromConfig, etc.) in separate config.go

Recommendation:

  • Keep current organization - domain-specific parsers should stay with their domain
  • Enhance: Document that config.go contains shared parsing helpers
  • Consider: Rename config.go to config_helpers.go for clarity

Estimated Impact: Low - current organization is reasonable


5. Helper File Proliferation

Issue: Multiple helper files with unclear boundaries

Helper Files Identified:

  • parse_utils.go (143 lines) - Parsing utilities
  • engine_helpers.go (36 lines) - Engine-specific helpers
  • engine_shared_helpers.go (372 lines) - Shared engine helpers
  • safe_output_helpers.go (349 lines) - Safe output helpers
  • prompt_step_helper.go - Prompt step helpers

Analysis:

  • Good: Helpers are domain-specific (engine, safe outputs, prompt)
  • ⚠️ Issue: parse_utils.go only has 2 extraction functions but could have more
  • ⚠️ Issue: Unclear boundary between engine_helpers.go (36 lines) and engine_shared_helpers.go (372 lines)

Recommendation:

  1. Merge: Fold engine_helpers.go into engine_shared_helpers.go (36 lines is too small to justify a separate file)
  2. Enhance: Move scattered extraction functions into parse_utils.go
  3. Consider: Rename files for clarity:
    • parse_utils.go → frontmatter_extraction.go
    • engine_shared_helpers.go → engine_helpers.go (after merge)

Estimated Impact: Clearer helper organization


6. Validation Functions Distribution

Issue: Validation logic scattered across 12 files

Files with validation functions:

  1. cache.go - validateNoDuplicateCacheIDs
  2. compiler.go - 4 validation functions
  3. docker.go - validateDockerImage
  4. engine.go - validateEngine, validateSingleEngineSpecification
  5. expression_safety.go - validateExpressionSafety, validateSingleExpression
  6. mcp-config.go - validateStringProperty, validateMCPRequirements
  7. npm.go - validateNpxPackages
  8. pip.go - validatePipPackages, validateUvPackages, validatePythonPackagesWithPip, validateUvPackagesWithPip
  9. redact_secrets.go - validateSecretReferences
  10. strict_mode.go - validateStrictMode, validateStrictPermissions
  11. template.go - validation functions
  12. validation.go - 3 validation functions

Analysis:

  • Total validate functions: ~35+ across 12 files
  • Dedicated validation file: validation.go holds only 3 functions (196 lines)
  • Acceptable: Domain-specific validation (expression_safety, docker, npm, pip) in their respective files
  • ⚠️ Issue: Generic validation scattered (cache validation, secret validation, etc.)

Recommendation:

  1. Keep: Domain-specific validation in domain files (expression_safety, docker, npm, pip)
  2. Move: Generic validation to validation.go:
    • validateNoDuplicateCacheIDs (from cache.go)
    • validateSecretReferences (from redact_secrets.go)
    • Compiler validation functions from compiler.go
  3. Enhance: Make validation.go the central validation logic file

Estimated Impact: Centralized validation logic, easier testing
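If generic checks are gathered in validation.go, one possible shape is a small runner that executes every check and reports all failures together. This is a sketch under assumptions: the `validator` signature, `runValidators`, and `requireString` are hypothetical and do not match the signatures of the existing validate* functions, which the report does not show. It relies on `errors.Join`, available since Go 1.20.

```go
package main

import (
	"errors"
	"fmt"
)

// validator is a hypothetical common signature for generic checks.
type validator func(frontmatter map[string]any) error

// runValidators runs every validator and joins all failures into one error,
// so a caller sees every problem in a single pass. It returns nil when
// all validators succeed.
func runValidators(frontmatter map[string]any, validators ...validator) error {
	var errs []error
	for _, validate := range validators {
		if err := validate(frontmatter); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

// requireString is an illustrative check: the key must hold a string value.
func requireString(key string) validator {
	return func(frontmatter map[string]any) error {
		if _, ok := frontmatter[key].(string); !ok {
			return fmt.Errorf("%q must be a string", key)
		}
		return nil
	}
}

func main() {
	fm := map[string]any{"engine": "claude", "timeout": 5}
	err := runValidators(fm, requireString("engine"), requireString("timeout"))
	fmt.Println(err)
}
```

Domain-specific validators (expression_safety, docker, npm, pip) could stay in their own files and still be passed to the same runner, which keeps the "Keep" and "Move" recommendations compatible.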


Detailed Function Clusters

Cluster 1: Creation Functions (create_*.go files)

Pattern: create* functions for GitHub Actions
Files: 8 dedicated files

Functions:

  • create_issue.go - buildCreateOutputIssueJob, parseIssuesConfig
  • create_pull_request.go - buildCreateOutputPullRequestJob, parsePullRequestsConfig
  • create_discussion.go - buildCreateOutputDiscussionJob, parseDiscussionsConfig
  • create_pr_review_comment.go - buildCreateOutputPullRequestReviewCommentJob, parsePullRequestReviewCommentsConfig
  • create_code_scanning_alert.go - buildCreateOutputCodeScanningAlertJob, parseCodeScanningAlertsConfig
  • create_agent_task.go - buildCreateOutputAgentTaskJob, parseAgentTaskConfig
  • add_comment.go - buildCreateOutputAddCommentJob, parseCommentsConfig
  • add_labels.go - buildAddLabelsJob

Analysis: ✅ Excellent organization - Clear one-file-per-feature pattern, consistent naming

Recommendation: Keep as-is - This is a model for good organization


Cluster 2: Parsing Functions

Pattern: parse* functions (24+ files with parse functions)
Files: Widespread across codebase

Sub-clusters:

  1. Config parsers (15 files) - Domain-specific config parsing
  2. Engine parsers (3 files) - Tool call parsing per engine
  3. Expression parsers (1 file) - Expression tree parsing
  4. Utility parsers (1 file) - Generic parsing helpers

Analysis:

  • ✅ Domain-specific parsers co-located with logic
  • ⚠️ Could benefit from better documentation of parsing patterns

Recommendation:

  • Keep domain-specific parsers with their domains
  • Document parsing patterns in code comments or architecture doc
  • Enhance parse_utils.go with more common parsing utilities

Cluster 3: Extraction Functions

Pattern: extract* functions (22 files with extract functions)

Primary Files:

  • compiler.go - 12 extract functions (should move most out)
  • parse_utils.go - 2 extract functions (should have more)
  • Various domain files - specific extraction logic

Examples:

compiler.go:
- extractTopLevelYAMLSection
- extractPermissions
- extractIfCondition
- extractFeatures
- extractDescription
- extractSource
- extractSafetyPromptSetting
- extractToolsTimeout
- extractToolsStartupTimeout
- extractExpressionFromIfString
- extractCommandConfig
- extractYAMLValue

Other files:
- args.go: extractAddDirPaths
- cache.go: extractCustomArgs
- network.go: extractNetworkPermissions
- npm.go: extractNpxPackages, extractNpxFromCommands
- ... and more

Analysis:

  • ⚠️ Poor: Extraction logic heavily concentrated in compiler.go
  • ⚠️ Poor: parse_utils.go underutilized (only 2 functions)
  • ⚠️ Poor: Extraction functions scattered across domain files

Recommendation:

  1. High Priority: Move generic extraction from compiler.go to parse_utils.go or new frontmatter_extraction.go
  2. Medium Priority: Move domain extraction to appropriate domain files
  3. Result: Clear separation between generic and domain-specific extraction

Estimated Impact: Major improvement in code organization


Cluster 4: Generation Functions

Pattern: generate* functions (Multiple files)

Primary Location: compiler.go has 20 generate functions (should be extracted)

Files with generation:

  • compiler.go - 20 functions (YAML generation, step generation, etc.)
  • cache.go - Cache generation
  • git.go - Git config generation
  • Engine files - Engine-specific generation

Recommendation:

  1. Create yaml_generation.go for YAML generation functions
  2. Create step_generation.go for GitHub Actions step generation
  3. Move generation functions from compiler.go to new files
  4. Keep domain-specific generation in domain files

Estimated Impact: Major improvement, clearer separation


Cluster 5: Validation Functions

Pattern: validate* functions (12 files, 35+ functions)

Analysis: See "6. Validation Functions Distribution" above


Cluster 6: Build Functions

Pattern: build* functions (Job and component building)

Primary Location: compiler.go and safe output files

Sub-patterns:

  • build*Job - Job construction
  • build*Steps - Step construction
  • build*Condition - Condition building

Analysis:

  • ✅ Good organization for job building
  • ⚠️ Could conso
    [Content truncated due to length]

AI generated by Semantic Function Refactoring
