[refactor] 🔧 Semantic Function Clustering Analysis - Refactoring Opportunities #3478

@github-actions

🔧 Semantic Function Clustering Analysis

Analysis performed on githubnext/gh-aw repository to identify refactoring opportunities through semantic function clustering and duplicate detection.

Executive Summary

Analyzed 191 non-test Go source files across 7 packages in the repository. The analysis identified:

  • ✅ Well-organized creation patterns: create_*.go files follow best practices
  • ⚠️ Validation function scatter: Validation logic spread across 8 files
  • ⚠️ Config parsing duplication: Near-identical parsing functions with 85%+ similarity
  • ⚠️ Validation methods in compiler: Several validation functions attached to Compiler that could be standalone

Full Analysis Report

Repository Structure

Packages Overview

Package         Files   Primary Purpose
pkg/workflow    113     Core workflow compilation and execution
pkg/cli         64      CLI commands and interface
pkg/parser      7       Parsing utilities (frontmatter, GitHub, YAML)
pkg/console     4       Console output and formatting
pkg/timeutil    1       Time formatting utilities
pkg/logger      1       Logging configuration
pkg/constants   1       Global constants

Total: 191 non-test Go files analyzed

Function Clustering Results

Cluster 1: Creation Functions ✅

Pattern: create_* functions and files
Organization Status: Excellent - Well-organized

Examples:

  • pkg/workflow/create_issue.go - Issue creation logic
  • pkg/workflow/create_pull_request.go - PR creation logic
  • pkg/workflow/create_discussion.go - Discussion creation logic
  • pkg/workflow/create_agent_task.go - Agent task creation logic
  • pkg/workflow/create_code_scanning_alert.go - Alert creation logic
  • pkg/workflow/create_pr_review_comment.go - Review comment creation

Analysis: ✅ Each creation function has its own dedicated file following the "one file per feature" rule. This is an exemplary organization pattern.

Cluster 2: Validation Functions ⚠️

Pattern: validate* functions
Organization Status: Scattered across multiple files

Files containing validation logic:

  1. pkg/workflow/validation.go (26 validation-related functions)
  2. pkg/workflow/validation_strict_mode.go (3 validation functions)
  3. pkg/workflow/docker_validation.go (1 function: validateDockerImage)
  4. pkg/workflow/npm_validation.go (1 method: validateNpxPackages)
  5. pkg/workflow/pip_validation.go (4 methods: validatePipPackages, validateUvPackages, etc.)
  6. pkg/workflow/bundler_validation.go (1 function: validateNoLocalRequires)
  7. pkg/workflow/expression_safety.go (2 functions: validateExpressionSafety, validateSingleExpression)
  8. pkg/workflow/engine.go (2 validation methods in mixed file)

Key Validation Functions:

  • ✅ Core validation: validation.go (primary validation file)
  • ⚠️ Package-specific: *_validation.go files (good pattern, but inconsistent)
  • ⚠️ Mixed concerns: Validation methods in engine.go, strict_mode.go

Issue: While package-specific validation files (docker_validation.go, npm_validation.go, etc.) make sense, there are validation functions scattered in non-validation files like engine.go.

Cluster 3: Parsing Functions ⚠️

Pattern: parse* functions
Organization Status: Widely distributed

70+ parsing functions found, including:

  • Config parsing: parseLabelsFromConfig, parseTitlePrefixFromConfig, parseTargetRepoFromConfig
  • Time parsing: parseTimeDelta, parseAbsoluteDateTime, parseRelativeDate
  • Package parsing: parseNpmPackage, parsePipPackage, parseGoPackage
  • Tool parsing: parseGitHubTool, parseBashTool, parsePlaywrightTool, etc.
  • Log parsing: parseClaudeJSONLog, parseCodexToolCallsWithSequence

Files with parse functions:

  • pkg/workflow/compiler.go - Main compilation parsing
  • pkg/workflow/config.go - Config parsing helpers
  • pkg/workflow/dependabot.go - Package parsing
  • pkg/workflow/tools_types.go - Tool parsing
  • pkg/workflow/time_delta.go - Time parsing
  • Multiple create_*.go files - Feature-specific parsing

Analysis: Generally well-organized with parsing functions near their usage, but see duplicates below.

Cluster 4: Extraction Functions ✅

Pattern: extract* functions
Organization Status: Well-distributed

50+ extraction functions found, primarily in:

  • pkg/workflow/frontmatter_extraction.go - 15+ extraction methods (well-organized)
  • pkg/workflow/dependabot.go - Package extraction functions
  • pkg/workflow/pip.go - Pip package extraction
  • pkg/workflow/npm.go - NPM package extraction

Analysis: ✅ Good organization with a dedicated frontmatter_extraction.go file for frontmatter-related extractions.

Cluster 5: Script Getters ✅

Pattern: get*Script functions
Organization Status: Excellent

File: pkg/workflow/scripts.go

20+ script getter functions, all centralized:

  • getCreateIssueScript()
  • getAddLabelsScript()
  • getParseFirewallLogsScript()
  • getCreateDiscussionScript()
  • getUpdateIssueScript()
  • getParseClaudeLogScript()
  • And many more...

Analysis: ✅ Exemplary organization - all script getters in a single file with consistent naming and structure.

Identified Issues

Issue 1: Near-Duplicate Config Parsing Functions

Location: pkg/workflow/config.go

Functions with 85%+ similarity:

parseTitlePrefixFromConfig (lines 34-42)

func parseTitlePrefixFromConfig(configMap map[string]any) string {
    if titlePrefix, exists := configMap["title-prefix"]; exists {
        if titlePrefixStr, ok := titlePrefix.(string); ok {
            configLog.Printf("Parsed title-prefix from config: %s", titlePrefixStr)
            return titlePrefixStr
        }
    }
    return ""
}

parseTargetRepoFromConfig (lines 48-56)

func parseTargetRepoFromConfig(configMap map[string]any) string {
    if targetRepoSlug, exists := configMap["target-repo"]; exists {
        if targetRepoStr, ok := targetRepoSlug.(string); ok {
            configLog.Printf("Parsed target-repo from config: %s", targetRepoStr)
            return targetRepoStr
        }
    }
    return ""
}

Similarity: ~90% - These functions differ only in:

  • Key name ("title-prefix" vs "target-repo")
  • Variable names
  • Log message text

Recommendation: Create a generic helper function:

func parseStringFromConfig(configMap map[string]any, key string) string {
    if value, exists := configMap[key]; exists {
        if valueStr, ok := value.(string); ok {
            configLog.Printf("Parsed %s from config: %s", key, valueStr)
            return valueStr
        }
    }
    return ""
}

Then simplify:

func parseTitlePrefixFromConfig(configMap map[string]any) string {
    return parseStringFromConfig(configMap, "title-prefix")
}

func parseTargetRepoFromConfig(configMap map[string]any) string {
    return parseStringFromConfig(configMap, "target-repo")
}

Impact: Reduces duplication, improves maintainability
Files: pkg/workflow/config.go
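
The behavior the helper needs to preserve can be pinned down with a small table of cases before the two wrappers are switched over. The following is a minimal sketch, not code from the repository: `configLog` is replaced here by a standard-library logger (an assumption about its Printf-style interface), and the sample values are made up for illustration.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// configLog stands in for the package-level logger used in config.go.
// Assumption: the real logger exposes a Printf-style method.
var configLog = log.New(os.Stderr, "config: ", 0)

// parseStringFromConfig returns the string stored under key, or "" when the
// key is absent or holds a non-string value.
func parseStringFromConfig(configMap map[string]any, key string) string {
	if value, exists := configMap[key]; exists {
		if valueStr, ok := value.(string); ok {
			configLog.Printf("Parsed %s from config: %s", key, valueStr)
			return valueStr
		}
	}
	return ""
}

func main() {
	// Illustrative cases only; the values are hypothetical.
	cases := []struct {
		name   string
		config map[string]any
		key    string
		want   string
	}{
		{"present string", map[string]any{"title-prefix": "[bot] "}, "title-prefix", "[bot] "},
		{"missing key", map[string]any{}, "title-prefix", ""},
		{"non-string value", map[string]any{"target-repo": 42}, "target-repo", ""},
		{"nil map", nil, "target-repo", ""},
	}
	for _, tc := range cases {
		got := parseStringFromConfig(tc.config, tc.key)
		fmt.Printf("%-16s ok=%t\n", tc.name, got == tc.want)
	}
}
```

The same cases could be moved into a table-driven test in the package's test file. Note that the helper, like the two original functions, cannot distinguish a missing key from a key explicitly set to the empty string.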

Issue 2: Validation Functions in Compiler File

Location: pkg/workflow/compiler.go

Validation methods attached to Compiler that might work better as standalone functions:

  • (*Compiler).parseBaseSafeOutputConfig (lines 1567-1581)
  • (*Compiler).computeAllowedDomainsForSanitization (validation-like logic)
  • (*Compiler).detectTextOutputUsage (validation/detection logic)

Current Issue: The main compiler.go file (1600+ lines) contains a mix of:

  • Core compilation logic ✅
  • Validation logic ⚠️
  • Parsing logic ⚠️
  • Configuration detection ⚠️

Recommendation: Consider extracting pure validation logic to validation-related files:

  • Move detectTextOutputUsage → potentially to a detection or validation file
  • Move domain computation logic → domain-related file or validation

Impact: Reduces compiler.go complexity, improves separation of concerns
Estimated Effort: 3-4 hours

Issue 3: Scattered Validation Functions Across Engine Files

Location: Multiple files

Validation methods in non-validation files:

  • pkg/workflow/engine.go: validateEngine, validateSingleEngineSpecification
  • pkg/workflow/strict_mode.go: validateStrictMode

Issue: Keeping these as Compiler methods makes sense, but placing validation logic in files named after the primary feature (engine, strict mode) rather than in validation files reduces discoverability.

Recommendation:

  • Option A: Keep as-is if these are tightly coupled to their respective features
  • Option B: Move to engine_validation.go and strict_mode_validation.go following the pattern of docker_validation.go

Impact: Improved discoverability and consistency
Estimated Effort: 1-2 hours if pursuing Option B

Issue 4: Repository Feature Checking Duplication

Location: pkg/workflow/validation.go

Duplicate pattern in repository feature checking:

// Lines 555-601
func checkRepositoryHasDiscussions(...) bool { ... }
func checkRepositoryHasDiscussionsUncached(...) (bool, error) { ... }

// Lines 604-631  
func checkRepositoryHasIssues(...) bool { ... }
func checkRepositoryHasIssuesUncached(...) (bool, error) { ... }

Similarity: Both pairs follow identical caching patterns with slight variations in API calls.

Analysis: This is acceptable duplication since:

  • Different GitHub API endpoints
  • Different response structures
  • Caching logic is a standard pattern

Recommendation: ✅ Keep as-is - the duplication serves clarity. Could consider a generic cached feature checker in the future if more features are added.

Well-Organized Patterns (No Changes Needed)

✅ Create Operations

Each creation operation has its own file:

  • create_issue.go
  • create_pull_request.go
  • create_discussion.go
  • create_agent_task.go
  • create_code_scanning_alert.go
  • create_pr_review_comment.go

✅ Package-Specific Validation Files

Good pattern of dedicated validation files:

  • docker_validation.go
  • npm_validation.go
  • pip_validation.go
  • bundler_validation.go

✅ Centralized Scripts

All script getters in scripts.go with consistent patterns.

✅ Dedicated Extraction File

frontmatter_extraction.go centralizes frontmatter parsing logic.

Refactoring Recommendations

Priority 1: High Impact, Low Effort

1. Consolidate Config Parsing Functions

  • File: pkg/workflow/config.go
  • Action: Create generic parseStringFromConfig helper
  • Functions to refactor: parseTitlePrefixFromConfig, parseTargetRepoFromConfig
  • Estimated effort: 30 minutes
  • Benefits:
    • Reduces code duplication
    • Single source of truth for string config parsing
    • Easier to add new config fields
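
Should non-string config fields need the same treatment, a type-parameterized variant is one option. The sketch below is an assumption-laden illustration rather than a proposal from the analysis: `configValue` is a hypothetical name, the keys other than "title-prefix" are invented, and it requires Go 1.18 or later (which the existing use of `any` already implies).

```go
package main

import "fmt"

// configValue is a hypothetical generic counterpart to parseStringFromConfig.
// It returns the value stored under key if it has dynamic type T, and reports
// whether the lookup and type assertion both succeeded.
func configValue[T any](configMap map[string]any, key string) (T, bool) {
	value, ok := configMap[key].(T)
	return value, ok
}

func main() {
	// Illustrative config; "max" and "draft" are made-up keys.
	cfg := map[string]any{"title-prefix": "[bot] ", "max": 3, "draft": true}

	prefix, ok := configValue[string](cfg, "title-prefix")
	fmt.Println(prefix, ok)

	limit, ok := configValue[int](cfg, "max")
	fmt.Println(limit, ok)

	// Wrong type or missing key: zero value and false.
	_, ok = configValue[string](cfg, "draft")
	fmt.Println(ok)
}
```

Unlike the string helper, this variant reports presence explicitly through the second return value. One caveat: decoded YAML or JSON numbers may arrive as `int`, `uint64`, or `float64` depending on the decoder, so an exact type assertion can miss values that a hand-written parser would accept.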

Priority 2: Medium Impact, Medium Effort

2. Review Compiler File Organization

  • File: pkg/workflow/compiler.go (1600+ lines)
  • Action: Consider extracting detection/validation methods
  • Candidates:
    • detectTextOutputUsage → validation or detection file
    • computeAllowedDomainsForSanitization → domain-related file
  • Estimated effort: 3-4 hours
  • Benefits:
    • Smaller, more focused compiler.go
    • Better separation of concerns
    • Improved testability

3. Standardize Validation File Naming

  • Files: engine.go, strict_mode.go
  • Action: Extract validation methods to *_validation.go files
  • Pattern: Follow existing docker_validation.go, npm_validation.go pattern
  • Estimated effort: 1-2 hours
  • Benefits:
    • Consistent file organization
    • Improved discoverability
    • Easier to locate validation logic

Priority 3: Long-Term Considerations

4. Generic Caching Pattern for Feature Checks

  • File: pkg/workflow/validation.go
  • Action: Consider a generic cached feature checker if more features are added
  • Current state: Acceptable duplication
  • Estimated effort: 4-6 hours
  • Benefits:
    • Reduced boilerplate for new feature checks
    • Consistent caching strategy

Implementation Checklist

  • P1: Refactor config parsing functions in config.go
  • P1: Write tests for new parseStringFromConfig helper
  • P2: Review and plan extraction of detection/validation from compiler.go
  • P2: Extract validation methods from engine.go to engine_validation.go (optional)
  • P2: Extract validation methods from strict_mode.go to strict_mode_validation.go (optional)
  • P3: Consider generic caching pattern for future feature checks
  • Verify no functionality broken after changes
  • Run full test suite

Analysis Metadata

  • Total Go Files Analyzed: 191 non-test files
  • Packages Analyzed: 7 (cli, console, constants, logger, parser, timeutil, workflow)
  • Function Clusters Identified: 5 major clusters
  • Outliers Found: 3-4 validation methods in mixed-concern files
  • Near-Duplicates Detected: 2 config parsing functions with 90% similarity
  • Detection Method: Serena semantic code analysis + manual pattern analysis
  • Primary Focus: pkg/workflow (113 files, majority of codebase)
  • Analysis Date: 2025-11-08

Conclusion

The codebase demonstrates good organization practices overall, particularly in:

  • Creation operations (one file per feature)
  • Package-specific validation files
  • Centralized script management

Key opportunities for improvement:

  1. Quick Win: Consolidate near-duplicate config parsing functions (~30 min)
  2. Medium Effort: Consider extracting validation/detection logic from compiler.go (3-4 hours)
  3. Consistency: Standardize validation file naming across all features (1-2 hours)

These refactorings would improve code maintainability, reduce duplication, and enhance the already solid architectural patterns in place.


Note: This analysis focused on high-impact, actionable findings. The codebase is generally well-organized and follows Go best practices. The recommendations aim to enhance consistency and reduce the identified duplication patterns.

AI generated by Semantic Function Refactoring
