
[refactor] 🔧 Semantic Function Clustering Analysis - Refactoring Opportunities #3604

@github-actions


🔧 Semantic Function Clustering Analysis

Analysis of repository: githubnext/gh-aw

Executive Summary

This analysis examined 201 non-test Go files across 7 packages to identify semantic function clustering opportunities and refactoring needs. The analysis discovered:

  • 1 confirmed pair of duplicate functions with ~95% code similarity
  • Multiple scattered helper functions across 4 separate helper files
  • Render functions dispersed across 9 different files
  • Large monolithic files requiring decomposition (largest: 2,812 lines)
  • Well-organized patterns in create_*, mcp_*, and *_engine files that serve as good examples

The analysis focuses on high-impact refactoring opportunities that would improve code maintainability, reduce duplication, and enhance discoverability.

Full Analysis Details

Repository Structure

Package Distribution

| Package | Non-Test Files | Primary Purpose |
|---|---|---|
| pkg/workflow | 122 | Workflow compilation and execution |
| pkg/cli | 65 | CLI commands and utilities |
| pkg/parser | 7 | Parsing utilities |
| pkg/console | 4 | Console output formatting |
| pkg/logger | 1 | Logging utilities |
| pkg/constants | 1 | Application constants |
| pkg/timeutil | 1 | Time formatting utilities |
| TOTAL | 201 | |

Large Files Requiring Attention

| File | Lines | Functions | Issue |
|---|---|---|---|
| pkg/cli/logs.go | 2,812 | 36 | Monolithic file; could be split by responsibility |
| pkg/cli/trial_command.go | 1,611 | - | Large command file |
| pkg/workflow/compiler.go | 1,560 | - | Core compiler; size may be justified |
| pkg/workflow/compiler_yaml.go | 999 | - | YAML generation; could extract helpers |
| pkg/workflow/expressions.go | 948 | - | Expression handling |
| pkg/workflow/copilot_engine.go | 943 | - | Engine implementation |
| pkg/workflow/mcp-config.go | 934 | 12 render functions | Render functions clustered here |

Function Naming Pattern Analysis

Parse Functions (48+ functions)

Parse functions are spread across multiple files but grouped by domain:

pkg/cli files:

  • logs.go: parseAwInfo, parseLogFileWithEngine, parseAgentLog, parseFirewallLogs
  • spec.go: parseRepoSpec, parseGitHubURL, parseWorkflowSpec, parseLocalWorkflowSpec, parseSourceSpec
  • access_log.go: parseSquidAccessLog, parseSquidLogLine
  • firewall_log.go: parseFirewallLogLine, parseFirewallLog
  • pr_command.go: parsePRURL
  • audit.go: parseRunURL
  • And 15+ more...

pkg/workflow files:

  • time_delta.go: parseTimeDelta, parseTimeDeltaForStopAfter, parseAbsoluteDateTime, parseRelativeDate
  • dependabot.go: parseNpmPackage, parsePipPackage, parseGoPackage
  • tools_types.go: parseGitHubTool, parseBashTool, parsePlaywrightTool, parseWebFetchTool, etc. (8+ parse functions)
  • config_helpers.go: parseLabelsFromConfig, parseStringFromConfig, parseTitlePrefixFromConfig, parseTargetRepoFromConfig

Analysis: Parse functions are generally well-organized by domain. The tools_types.go file with 8+ parse functions is appropriately named and organized.

Validate Functions (30+ functions)

Validation functions are well-organized in dedicated files:

Dedicated validation files (13 files):

  • validation.go - General validation
  • strict_mode_validation.go - Strict mode checks
  • npm_validation.go - NPM package validation
  • pip_validation.go - Python package validation
  • engine_validation.go - Engine configuration validation
  • expression_validation.go - Expression safety validation
  • docker_validation.go - Docker image validation
  • bundler_validation.go - Bundler validation
  • template_validation.go - Template validation
  • step_order_validation.go - Step ordering
  • mcp_config_validation.go - MCP configuration
  • permissions_validator.go - Permission checks
  • github_toolset_validation_error.go - Toolset errors

Analysis: ✅ Validation functions are exemplary in their organization. Each validation domain has its own file.

Render Functions (28+ functions)

Render functions are scattered across 9 files:

  1. pkg/workflow/mcp-config.go (12 functions): renderPlaywrightMCPConfig, renderBuiltinMCPServerBlock, renderSafeOutputsMCPConfig, renderAgenticWorkflowsMCPConfig, renderCustomMCPConfigWrapper, etc.
  2. pkg/console/render.go (4 functions): renderValue, renderStruct, renderSlice, renderMap
  3. pkg/cli/logs_report.go (2 functions): renderLogsJSON, renderLogsConsole
  4. pkg/cli/audit_report.go (7 functions): renderJSON, renderConsole, renderOverview, renderMetrics, renderJobsTable, renderToolUsageTable, renderFirewallAnalysis
  5. pkg/workflow/fetch.go: renderMCPFetchServerConfig
  6. pkg/workflow/jobs.go: Render functions for job generation
  7. pkg/workflow/prompt_step.go: Prompt rendering
  8. pkg/workflow/redact_secrets.go: Secret redaction rendering
  9. pkg/workflow/claude_mcp.go, copilot_engine.go, codex_engine.go, custom_engine.go: Engine-specific rendering

Analysis: Some scattering is justified (engine-specific rendering belongs in the engine files), and clustering the 12 MCP render functions in mcp-config.go is a good pattern. The main consolidation opportunity is report rendering in pkg/cli, where audit_report.go and logs_report.go each implement their own JSON and console renderers.

Build Functions (29+ functions)

Build functions show good organization by purpose:

  • pkg/cli/logs_report.go: 7 build functions for aggregating data (buildLogsData, buildToolUsageSummary, buildMissingToolsSummary, etc.)
  • pkg/workflow files: Various build functions for workflow construction (buildEventAwareCommandCondition, buildCopilotParticipantSteps, buildArtifactDownloadSteps, etc.)

Analysis: ✅ Build functions are reasonably well-organized by domain.

Generate Functions (25+ functions)

Generate functions are distributed across files:

  • MCP configuration: Multiple generate functions in appropriate files
  • Setup and installation: generateSetupStep, generateAWFInstallationStep
  • Cache: generateDefaultCacheKey, generateCacheSteps, generateCacheMemorySteps
  • Docker: generateDownloadDockerImagesStep
  • Schema: generateSchemaBasedSuggestions, generateFieldSuggestions, generateExampleJSONForPath

Analysis: ✅ Generally well-organized by functionality.

Extract Functions (20+ functions)

Extract functions are mostly clustered by domain:

pkg/parser/frontmatter.go (9 functions - well organized):

  • extractToolsFromContent
  • extractSafeOutputsFromContent
  • extractMCPServersFromContent
  • extractStepsFromContent
  • extractEngineFromContent
  • extractRuntimesFromContent
  • extractServicesFromContent
  • extractNetworkFromContent
  • extractSecretMaskingFromContent

Other files:

  • pkg/cli/secrets.go: extractSecretName, extractSecretsFromConfig
  • pkg/cli/logs.go: extractZipFile, extractLogMetrics

Analysis: ✅ Good organization - frontmatter extraction functions are properly clustered.

Create Functions (11+ functions)

Create functions are excellently organized with dedicated files:

Well-organized create_* files in pkg/workflow:

  1. create_agent_task.go - Agent task creation
  2. create_code_scanning_alert.go - Security alert creation
  3. create_discussion.go - Discussion creation
  4. create_issue.go - Issue creation
  5. create_pr_review_comment.go - PR review comment creation
  6. create_pull_request.go - Pull request creation

pkg/cli files:

  • git.go: createAndSwitchBranch
  • pr_command.go: createForkIfNeeded, createPatchFromPR, createTransferPR
  • add_command.go: createPR
  • mcp_add.go: createMCPToolConfig
  • mcp_server.go: createMCPServer

Analysis: ✅ Exemplary organization - each creation operation has its own dedicated file. This is a model pattern that could be applied elsewhere.

Sanitize Functions (5 functions)

Sanitize functions are mostly concentrated in pkg/workflow/strings.go:

  • SanitizeName
  • SanitizeWorkflowName
  • SanitizeIdentifier (in workflow_name.go)

Plus domain sanitization:

  • pkg/workflow/domain_sanitization.go: computeAllowedDomainsForSanitization

Analysis: ✅ Good organization for a small set of functions.

Identified Issues

1. Duplicate Functions ⚠️

HIGH PRIORITY: Nearly Identical String Extraction Functions

Location 1: pkg/workflow/config_helpers.go:34-42

```go
func parseStringFromConfig(configMap map[string]any, key string) string {
    if value, exists := configMap[key]; exists {
        if valueStr, ok := value.(string); ok {
            configHelpersLog.Printf("Parsed %s from config: %s", key, valueStr)
            return valueStr
        }
    }
    return ""
}
```

Location 2: pkg/workflow/frontmatter_helpers.go:3-14

```go
func extractStringValue(frontmatter map[string]any, key string) string {
    value, exists := frontmatter[key]
    if !exists {
        return ""
    }

    if strValue, ok := value.(string); ok {
        return strValue
    }

    return ""
}
```

Similarity: ~95%. Both functions extract a string value from a map[string]any with identical logic; the only difference is that parseStringFromConfig logs the parsed value.

Impact: Code duplication, inconsistent naming (parse vs extract).

Recommendation:

  • Consolidate into a single function in a common helpers file
  • Use a consistent naming convention (suggest: extractStringFromMap or parseStringValue)
  • Make logging optional via a parameter or remove logging from the helper
  • Estimated Effort: 1 hour
  • Files affected: 2 files + any files using these functions

2. Scattered Helper Functions ⚠️

Issue: Helper/utility functions are scattered across 4 separate files in pkg/workflow:

  1. config_helpers.go (4 functions):

    • parseLabelsFromConfig
    • parseStringFromConfig ← DUPLICATE
    • parseTitlePrefixFromConfig
    • parseTargetRepoFromConfig
  2. frontmatter_helpers.go (3 functions):

    • extractStringValue ← DUPLICATE
    • parseIntValue
    • filterMapKeys
  3. engine_helpers.go (14 functions):

    • ResolveAgentFilePath
    • BuildStandardNpmEngineInstallSteps
    • InjectCustomEngineSteps
    • RenderCustomMCPToolConfigHandler
    • HandleCustomMCPToolInSwitch
    • FormatStepWithCommandAndEnv
    • RenderGitHubMCPDockerConfig
    • RenderGitHubMCPRemoteConfig
    • RenderJSONMCPConfig
    • And 5 more...
  4. prompt_step_helper.go (1 function):

    • generateStaticPromptStep

Analysis:

  • The separation between config_helpers.go and frontmatter_helpers.go is unclear - both work with map extraction
  • engine_helpers.go is appropriately named and scoped for engine-related helpers
  • prompt_step_helper.go has only one function - consider consolidating

Recommendation:

  • Option A (Consolidation): Merge config_helpers.go and frontmatter_helpers.go into map_helpers.go or config_parsing.go
  • Option B (Clarification): Rename files to clearly indicate their scope and add documentation
  • Keep engine_helpers.go separate (appropriate scope)
  • Move generateStaticPromptStep to a more general file or inline it if only used once
  • Estimated Effort: 2-3 hours
  • Files affected: 4 helper files + dependent files
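If Option A is chosen, a merged helper file could build its typed lookups on a single generic function. The sketch below is an illustration under that assumption; getTyped is a hypothetical name and requires Go 1.18+ generics.

```go
package main

import "fmt"

// getTyped (hypothetical) looks up key in m and asserts the value to type T.
// It returns T's zero value and false if the key is absent or has another type.
func getTyped[T any](m map[string]any, key string) (T, bool) {
	var zero T
	value, exists := m[key]
	if !exists {
		return zero, false
	}
	typed, ok := value.(T)
	if !ok {
		return zero, false
	}
	return typed, true
}

func main() {
	frontmatter := map[string]any{"name": "triage", "timeout": 30}

	name, _ := getTyped[string](frontmatter, "name")
	timeout, ok := getTyped[int](frontmatter, "timeout")
	fmt.Println(name, timeout, ok)

	// Wrong type requested: reports false instead of panicking.
	_, ok = getTyped[string](frontmatter, "timeout")
	fmt.Println(ok)
}
```

The type assertion is deliberately strict. Depending on the YAML library, numbers may decode as uint64 or float64 rather than int, so numeric helpers such as parseIntValue may still need their own conversion logic on top of this.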

3. Large Monolithic Files 📊

Issue: Some files have grown very large and could benefit from decomposition.

pkg/cli/logs.go (2,812 lines, 36 functions)

This file handles multiple responsibilities:

  • Workflow run fetching and pagination
  • Artifact downloading
  • Log parsing
  • Metrics extraction
  • Display formatting
  • File I/O operations

Recommendation:

  • Extract into separate files by responsibility:
    • logs_download.go - Downloading artifacts and logs
    • logs_parsing.go - Parsing log files
    • logs_metrics.go - Metrics extraction and analysis
    • logs_display.go - Display/formatting (or merge with logs_report.go)
    • Keep logs.go as the main command coordinator
  • Estimated Effort: 4-6 hours
  • Benefits: Improved navigability, clearer separation of concerns

pkg/cli/trial_command.go (1,611 lines)

Large command file; review whether subcommands can be extracted into separate files.

Recommendation:

  • Consider extracting subcommand logic into separate files: trial_run.go, trial_status.go, etc.
  • Estimated Effort: 3-4 hours

4. Render Function Scatter 🎨

Issue: Render functions are spread across 9 files, though some scatter is justified.

Appropriate locations:

  • ✅ Engine-specific rendering in engine files (claude_engine.go, copilot_engine.go, etc.)
  • ✅ MCP config rendering clustered in mcp-config.go (12 functions)
  • ✅ Console rendering in console/render.go

Potential improvement:

  • Consider if audit_report.go and logs_report.go render functions could share common utilities
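  • Both implement separate JSON and console renderers (renderJSON/renderConsole in audit_report.go, renderLogsJSON/renderLogsConsole in logs_report.go), a possible abstraction opportunity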

Recommendation:

  • Extract common report rendering patterns into pkg/console/report.go or similar
  • Create shared interfaces for report rendering
  • Estimated Effort: 2-3 hours
  • Benefits: Reduced duplication in report generation

5. Utility File Organization 📁

Issue: Three utility files in pkg/cli have unclear purposes:

  1. shared_utils.go - Only 3 functions, all related to PR auto-merging
  2. frontmatter_utils.go - Only 3 functions for frontmatter field updates
  3. repeat_utils.go - Only 2 functions for retry logic

Recommendation:

  • Option A: Rename files to match their specific purpose:
    • shared_utils.go → pr_merge_utils.go or move to pr_command.go
    • frontmatter_utils.go → inline into files that use it or create frontmatter_editor.go
    • repeat_utils.go → retry.go or move to a general utilities package
  • Option B: Consolidate truly shared utilities into a single pkg/cli/utils.go
  • Estimated Effort: 1-2 hours

Well-Organized Patterns ✅

The following patterns demonstrate excellent code organization and should be maintained:

1. Create Operations (pkg/workflow)

  • ✅ Each creation type has its own file: create_issue.go, create_pull_request.go, create_discussion.go, etc.
  • ✅ Clear naming convention
  • ✅ Easy to locate functionality

2. MCP Files (pkg/cli)

  • ✅ 17 MCP-related files all prefixed with mcp_*
  • ✅ Clear separation by functionality: mcp_add.go, mcp_list.go, mcp_inspect.go, mcp_validation.go, etc.
  • ✅ Easy to navigate and maintain

3. Engine Files (pkg/workflow)

  • ✅ 5 engine implementation files: *_engine.go
  • ✅ Plus dedicated engine.go for base functionality
  • ✅ Clear separation of engine-specific logic

4. Validation Files (pkg/workflow)

  • ✅ 13 validation files, each focused on a specific validation domain
  • ✅ Clear naming: npm_validation.go, docker_validation.go, strict_mode_validation.go, etc.
  • ✅ Excellent example of organizing functions by purpose

Refactoring Recommendations

Priority 1: High Impact, Low Risk

1. Consolidate Duplicate String Extraction Functions

  • Issue: parseStringFromConfig and extractStringValue are duplicates
  • Action: Create single function in appropriate location
  • Effort: 1 hour
  • Benefits: Eliminates duplication, consistent API

2. Reorganize Helper Files

  • Issue: config_helpers.go and frontmatter_helpers.go overlap
  • Action: Merge or clearly separate concerns
  • Effort: 2-3 hours
  • Benefits: Clearer helper organization

3. Rename Small Utility Files

  • Issue: The names of shared_utils.go and frontmatter_utils.go don't match their content
  • Action: Rename to match their specific purpose
  • Effort: 1-2 hours
  • Benefits: Improved discoverability

Priority 2: Medium Impact, Medium Effort

4. Split Large logs.go File

  • Issue: 2,812 lines in single file
  • Action: Extract into logs_download.go, logs_parsing.go, logs_metrics.go
  • Effort: 4-6 hours
  • Benefits: Improved maintainability, easier testing

5. Extract Common Report Rendering

  • Issue: Duplicate rendering patterns in audit_report.go and logs_report.go
  • Action: Create shared report rendering utilities
  • Effort: 2-3 hours
  • Benefits: DRY principle, consistent reporting

Priority 3: Long-term Improvements

6. Review trial_command.go Structure

  • Issue: 1,611 lines - may benefit from subcommand extraction
  • Action: Consider extracting subcommand logic
  • Effort: 3-4 hours
  • Benefits: Improved modularity

7. Consolidate prompt_step_helper.go

  • Issue: Single-function file
  • Action: Move function to more appropriate location
  • Effort: 30 minutes
  • Benefits: Fewer files to navigate

Implementation Checklist

  • Review and prioritize - Discuss findings with team
  • Phase 1: Consolidate duplicate string extraction functions
  • Phase 1: Reorganize helper files in pkg/workflow
  • Phase 1: Rename small utility files to match purpose
  • Phase 2: Split pkg/cli/logs.go into smaller focused files
  • Phase 2: Extract common report rendering patterns
  • Phase 3: Review trial_command.go for subcommand extraction
  • Phase 3: Move single-function helper files
  • Testing: Ensure all refactorings maintain existing functionality
  • Documentation: Update package documentation to reflect new organization

Analysis Methodology

This analysis used:

  • Serena MCP semantic code analysis for symbol discovery and relationship mapping
  • Pattern matching for function naming conventions (parse*, validate*, create*, render*, etc.)
  • File size analysis to identify monolithic files
  • Similarity detection for duplicate code patterns
  • Manual review of function organization and file structure
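Symbol discovery for this report was done with Serena MCP. For anyone who wants to reproduce the naming-pattern counts without it, a rough stdlib-only approximation is sketched below; it is not the tool used for this analysis. It parses one Go file with go/parser and buckets top-level functions and methods by verb prefix, requiring an uppercase letter after the verb so that, for example, parser() is not counted as a parse function.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
	"unicode"
	"unicode/utf8"
)

// verbs are the eight naming patterns grouped in this analysis.
var verbs = []string{"parse", "validate", "render", "build", "generate", "extract", "create", "sanitize"}

// hasVerbPrefix reports whether name starts with verb (case-insensitive)
// followed by an uppercase letter, e.g. parseX or SanitizeName.
func hasVerbPrefix(name, verb string) bool {
	if len(name) <= len(verb) || !strings.EqualFold(name[:len(verb)], verb) {
		return false
	}
	next, _ := utf8.DecodeRuneInString(name[len(verb):])
	return unicode.IsUpper(next)
}

// classifyFuncs parses Go source and groups function names by verb prefix.
func classifyFuncs(filename, src string) (map[string][]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	groups := make(map[string][]string)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip imports, types, vars, consts
		}
		for _, verb := range verbs {
			if hasVerbPrefix(fn.Name.Name, verb) {
				groups[verb] = append(groups[verb], fn.Name.Name)
				break
			}
		}
	}
	return groups, nil
}

func main() {
	src := `package workflow

func parseStringFromConfig(m map[string]any, key string) string { return "" }
func extractStringValue(m map[string]any, key string) string   { return "" }
func SanitizeName(s string) string                              { return s }
func parser() {}
`
	groups, err := classifyFuncs("example.go", src)
	if err != nil {
		panic(err)
	}
	for _, verb := range verbs {
		if names := groups[verb]; len(names) > 0 {
			fmt.Printf("%s: %v\n", verb, names)
		}
	}
}
```

Running it over every non-test file and summing the buckets would give counts comparable to the per-pattern totals above, though prefix matching alone cannot judge whether a function is well placed.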

Statistics Summary

  • Total Files Analyzed: 201 non-test Go files
  • Packages: 7
  • Function Patterns Identified: 8 major patterns (parse, validate, render, build, generate, extract, create, sanitize)
  • Duplicate Functions Found: 1 confirmed
  • Large Files Flagged: 7 files (all over 900 lines; 3 over 1,000 lines)
  • Well-Organized Patterns: 4 excellent examples (create_*, mcp_*, *_engine, *_validation)
  • Helper Files: 4 files (potential consolidation opportunity)
  • Analysis Date: 2025-11-11

Conclusion

The githubnext/gh-aw repository demonstrates generally good code organization with several excellent patterns (create_*, mcp_*, and *_validation files). The main opportunities for improvement are:

  1. Eliminating the duplicate string extraction function (highest priority)
  2. Consolidating scattered helper functions for better discoverability
  3. Decomposing large monolithic files (especially logs.go) for maintainability
  4. Extracting common report rendering patterns to reduce duplication

These refactorings would improve code maintainability, reduce duplication, and make the codebase more navigable for new contributors, while maintaining the excellent organizational patterns already in place.


🤖 Generated with semantic code analysis using Claude Code and Serena MCP

AI generated by Semantic Function Refactoring
