🔧 Semantic Function Clustering Analysis
Analysis of repository: githubnext/gh-aw
Executive Summary
Comprehensive semantic analysis of 153 non-test Go files across the pkg/ directory identified significant refactoring opportunities to improve code organization, reduce duplication, and enhance maintainability.
Key Findings:
- ✅ Good: Well-organized create_*.go pattern for GitHub Actions (issues, PRs, discussions, etc.)
- ✅ Good: Separate engine files (*_engine.go) for different AI engines
- ⚠️ Issue: compiler.go is massive (3,472 lines, 70 functions) and contains mixed responsibilities
- ⚠️ Issue: Parse/extract/validate functions scattered across 20+ files
- ⚠️ Issue: Similar helper functions duplicated or near-duplicated
- ⚠️ Issue: Outlier functions in wrong files (extraction in runtime files, generation in utility files)
Analysis Metadata
- Total Go Files Analyzed: 153
- Packages Analyzed: cli (51 files), workflow (90 files), parser (6 files), console (3 files), constants (1), logger (1)
- Primary Focus: pkg/workflow (largest package with most refactoring opportunities)
- Function Clusters Identified: 8 major clusters
- Outliers Found: 10+ functions in wrong files
- Detection Method: Serena semantic analysis + regex pattern analysis + manual review
- Analysis Date: 2025-10-24
Function Inventory
By Package
pkg/workflow/ (90 files)
Primary Purpose: Core workflow compilation, execution engine, and GitHub Actions generation
File Organization:
- ✅ Well-organized: create_*.go (8 files) - Each GitHub action type has its own file
- ✅ Well-organized: *_engine.go (4 files) - Separate files per AI engine (Claude, Codex, Copilot, Custom)
- ⚠️ Needs attention: compiler.go (3,472 lines, 70 functions) - Too large and mixed responsibilities
- ⚠️ Scattered: Helper functions across *_helpers.go, *_utils.go, and main logic files
pkg/cli/ (51 files)
Primary Purpose: CLI command implementations and command handlers
File Organization:
- ✅ Well-organized: *_command.go pattern for CLI commands
- ✅ Well-organized: mcp_*.go files for MCP-related functionality (11 files)
- ⚠️ Note: Has shared_utils.go (not in workflow package)
Identified Issues
1. Oversized Compiler File
Issue: pkg/workflow/compiler.go is extremely large and contains multiple responsibilities
Details:
- Size: 3,472 lines
- Function Count: 70 functions
- Responsibilities Mixed:
  - 12 extract* functions (data extraction from frontmatter)
  - 20 generate* functions (YAML generation)
  - 7 build* functions (job building)
  - 2 parse* functions (parsing configuration)
  - 4 validate* functions (validation logic)
  - Schema validation, file I/O, orchestration
Functions in compiler.go that belong elsewhere:
Extraction Functions (should be in parse_utils.go or new extraction.go):
- extractTopLevelYAMLSection
- extractPermissions
- extractIfCondition
- extractFeatures
- extractDescription
- extractSource
- extractSafetyPromptSetting
- extractToolsTimeout
- extractToolsStartupTimeout
- extractExpressionFromIfString
- extractCommandConfig
- extractYAMLValue
Generation Functions (should be in code_generation.go or yaml_generation.go):
- generateJobName
- generateYAML
- generateMainJobSteps
- generateUploadAgentLogs
- generateUploadAssets
- generateLogParsing
- generateErrorValidation
- generateUploadAwInfo
- generateUploadPrompt
- generateExtractAccessLogs
- generateUploadAccessLogs
- generateUploadMCPLogs
- generatePrompt
- generateCacheMemoryPromptStep
- generateSafeOutputsPromptStep
- generatePostSteps
- generateEngineExecutionSteps
- generateAgentVersionCapture
- generateCreateAwInfo
- generateOutputCollectionStep
- convertGoPatternToJavaScript
- convertErrorPatternsToJavaScript
Recommendation:
- High Priority: Extract generation functions to yaml_generation.go or code_generation.go
- High Priority: Extract extraction/parsing functions to enhance parse_utils.go
- Medium Priority: Extract validation functions to validation.go
- Result: Slim down compiler.go to orchestration logic (~500-800 lines)
Estimated Impact:
- Improved maintainability and testability
- Easier code navigation
- Reduced cognitive load
- Better separation of concerns
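The per-prefix counts reported for compiler.go (12 extract*, 20 generate*, and so on) can be re-checked mechanically before and after a split. The sketch below is one way to do that with the standard go/parser and go/ast packages; it is not the method used for this report (which lists Serena semantic analysis, regex patterns, and manual review). The helper names verbPrefix and countByPrefix and the inline sample source are illustrative assumptions, not code from the repository.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"unicode"
)

// sample stands in for a real file such as compiler.go (hypothetical content).
const sample = `package workflow

type Compiler struct{}

func (c *Compiler) extractPermissions() {}
func (c *Compiler) extractSource() {}
func (c *Compiler) generateYAML() {}
func validateEngine() {}
`

// verbPrefix returns the leading run of lowercase letters in a function name,
// e.g. "extract" for "extractPermissions". Names that start with an uppercase
// letter yield the empty string.
func verbPrefix(name string) string {
	for i, r := range name {
		if !unicode.IsLower(r) {
			return name[:i]
		}
	}
	return name
}

// countByPrefix parses Go source and counts top-level functions and methods
// grouped by verb prefix. If src is nil, the file named by filename is read.
func countByPrefix(filename string, src any) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			counts[verbPrefix(fn.Name.Name)]++
		}
	}
	return counts, nil
}

func main() {
	counts, err := countByPrefix("sample.go", sample)
	if err != nil {
		panic(err)
	}
	// Sort prefixes so the report is stable across runs.
	prefixes := make([]string, 0, len(counts))
	for p := range counts {
		prefixes = append(prefixes, p)
	}
	sort.Strings(prefixes)
	for _, p := range prefixes {
		fmt.Printf("%s: %d\n", p, counts[p])
	}
}
```

To run it against a real file, pass the file path and a nil source, for example countByPrefix("pkg/workflow/compiler.go", nil). Comparing the output before and after moving functions gives a quick check that compiler.go is shrinking toward orchestration logic.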
2. Outlier Functions (Functions in Wrong Files)
Issue: Functions whose purpose doesn't match their containing file
Example 1: Extraction in Runtime Files
File: pkg/workflow/network.go
Function: extractNetworkPermissions()
Issue: Extraction/parsing function in a runtime networking file
Expected Location: parse_utils.go or compiler.go
Code:
func (c *Compiler) extractNetworkPermissions(frontmatter map[string]any) *NetworkPermissions
Example 2: Generation in Utility Files
File: pkg/workflow/git.go
Function: generateGitConfigurationSteps()
Issue: Generation function in git utility file
Expected Location: yaml_generation.go or compiler.go
Code:
func (c *Compiler) generateGitConfigurationSteps() []string
Other Outliers Detected
Validation functions scattered across:
- pkg/workflow/cache.go - has validate functions (should be in validation.go)
- pkg/workflow/docker.go - has validate functions
- pkg/workflow/engine.go - has validate functions
- pkg/workflow/npm.go - has validate functions
- pkg/workflow/pip.go - has validate functions
- pkg/workflow/strict_mode.go - has validate functions
- pkg/workflow/expression_safety.go - has validate functions (acceptable here)
Total validation function files: 12 files contain validation functions, but only validation.go is dedicated to it
Recommendation:
- Move extractNetworkPermissions to parse_utils.go
- Move generateGitConfigurationSteps to a generation helper file
- Consolidate validation functions into validation.go (keep domain-specific validation like expression_safety.go)
Estimated Impact: Clearer file organization, easier to find related functions
3. Duplicate or Near-Duplicate Extraction Functions
Issue: Similar extraction functions with overlapping functionality
Example: extractStringValue vs extractYAMLValue
Occurrence 1: pkg/workflow/parse_utils.go:extractStringValue()
func extractStringValue(frontmatter map[string]any, key string) string {
    value, exists := frontmatter[key]
    if !exists {
        return ""
    }
    if strValue, ok := value.(string); ok {
        return strValue
    }
    return ""
}
Occurrence 2: pkg/workflow/parse_utils.go:(*Compiler).extractYAMLValue()
func (c *Compiler) extractYAMLValue(frontmatter map[string]any, key string) string {
    if value, exists := frontmatter[key]; exists {
        if str, ok := value.(string); ok {
            return str
        }
        if num, ok := value.(int); ok {
            return fmt.Sprintf("%d", num)
        }
        // ... more type conversions ...
    }
    return ""
}
Similarity: Both extract string values from the frontmatter map, but extractYAMLValue handles more types
Recommendation:
- Consolidate into single function with optional type coercion flag
- Or rename to clarify distinction: extractStringValue (strict) vs extractYAMLValueAsString (with conversion)
Estimated Impact: Reduced confusion, clearer API
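As a rough illustration of the first option, the two functions could collapse into one helper with a coercion flag. This is a minimal sketch, not the repository's code: the name extractString and the coerce parameter are hypothetical, and because the original extractYAMLValue elides its remaining type conversions, the set handled here beyond string and int (int64, uint64, float64, bool) is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// extractString is a hypothetical consolidated helper.
// With coerce=false it mirrors extractStringValue: only string values are returned.
// With coerce=true it also converts a few scalar types to strings; which types
// the real extractYAMLValue converts is not shown in the report, so this list
// is an assumption.
func extractString(frontmatter map[string]any, key string, coerce bool) string {
	value, exists := frontmatter[key]
	if !exists {
		return ""
	}
	if s, ok := value.(string); ok {
		return s
	}
	if !coerce {
		return ""
	}
	switch v := value.(type) {
	case int:
		return strconv.Itoa(v)
	case int64:
		return strconv.FormatInt(v, 10)
	case uint64:
		return strconv.FormatUint(v, 10)
	case float64:
		return strconv.FormatFloat(v, 'f', -1, 64)
	case bool:
		return strconv.FormatBool(v)
	}
	// Unsupported types (maps, slices, nil) fall through to the empty string.
	return ""
}

func main() {
	fm := map[string]any{"name": "demo", "timeout": 30}
	fmt.Println(extractString(fm, "name", false))    // strict lookup of a string value
	fmt.Println(extractString(fm, "timeout", false)) // strict lookup of a non-string value
	fmt.Println(extractString(fm, "timeout", true))  // coercing lookup of the same value
}
```

A boolean flag keeps the call sites short, but two thin wrappers with distinct names (the second option above) may read better at call sites where the flag's meaning is not obvious.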
4. Scattered Parse Config Functions
Issue: Config parsing functions spread across 15+ files
Pattern Detected: parse*Config functions in many files
Files with parseConfig functions:
- add_comment.go - parseCommentsConfig
- compiler.go - parseBaseSafeOutputConfig
- config.go - parseLabelsFromConfig, parseTitlePrefixFromConfig, parseTargetRepoFromConfig
- create_agent_task.go - parseAgentTaskConfig
- create_code_scanning_alert.go - parseCodeScanningAlertsConfig
- create_discussion.go - parseDiscussionsConfig
- create_issue.go - parseIssuesConfig
- create_pr_review_comment.go - parsePullRequestReviewCommentsConfig
- create_pull_request.go - parsePullRequestsConfig
- missing_tool.go - parseMissingToolConfig
- publish_assets.go - parseUploadAssetConfig
- push_to_pull_request_branch.go - parsePushToPullRequestBranchConfig
- safe_jobs.go - parseSafeJobsConfig
- threat_detection.go - parseThreatDetectionConfig
Analysis:
- ✅ Good: Each config parser is co-located with its domain logic (e.g., parseIssuesConfig in create_issue.go)
- ✅ Good: Clear naming pattern
- ⚠️ Issue: Common helper logic (parseLabelsFromConfig, parseTitlePrefixFromConfig, etc.) in separate config.go
Recommendation:
- Keep current organization - domain-specific parsers should stay with their domain
- Enhance: Document that config.go contains shared parsing helpers
- Consider: Rename config.go to config_helpers.go for clarity
Estimated Impact: Low - current organization is reasonable
5. Helper File Proliferation
Issue: Multiple helper files with unclear boundaries
Helper Files Identified:
- parse_utils.go (143 lines) - Parsing utilities
- engine_helpers.go (36 lines) - Engine-specific helpers
- engine_shared_helpers.go (372 lines) - Shared engine helpers
- safe_output_helpers.go (349 lines) - Safe output helpers
- prompt_step_helper.go - Prompt step helpers
Analysis:
- ✅ Good: Helpers are domain-specific (engine, safe outputs, prompt)
- ⚠️ Issue: parse_utils.go only has 2 extraction functions but could have more
- ⚠️ Issue: Unclear boundary between engine_helpers.go (36 lines) and engine_shared_helpers.go (372 lines)
Recommendation:
- Merge: Combine engine_helpers.go into engine_shared_helpers.go (36 lines is too small for a separate file)
- Enhance: Move scattered extraction functions into parse_utils.go
- Consider: Rename files for clarity:
  - parse_utils.go → frontmatter_extraction.go
  - engine_shared_helpers.go → engine_helpers.go (after merge)
Estimated Impact: Clearer helper organization
6. Validation Functions Distribution
Issue: Validation logic scattered across 12 files
Files with validation functions:
- cache.go - validateNoDuplicateCacheIDs
- compiler.go - 4 validation functions
- docker.go - validateDockerImage
- engine.go - validateEngine, validateSingleEngineSpecification
- expression_safety.go - validateExpressionSafety, validateSingleExpression
- mcp-config.go - validateStringProperty, validateMCPRequirements
- npm.go - validateNpxPackages
- pip.go - validatePipPackages, validateUvPackages, validatePythonPackagesWithPip, validateUvPackagesWithPip
- redact_secrets.go - validateSecretReferences
- strict_mode.go - validateStrictMode, validateStrictPermissions
- template.go - validation functions
- validation.go - 3 validation functions
Analysis:
- Total validate functions: 35+ across 12 files
- Dedicated validation file: Only 3 functions (196 lines)
- ✅ Acceptable: Domain-specific validation (expression_safety, docker, npm, pip) in their respective files
- ⚠️ Issue: Generic validation scattered (cache validation, secret validation, etc.)
Recommendation:
- Keep: Domain-specific validation in domain files (expression_safety, docker, npm, pip)
- Move: Generic validation to validation.go:
  - validateNoDuplicateCacheIDs (from cache.go)
  - validateSecretReferences (from redact_secrets.go)
  - Compiler validation functions from compiler.go
- Enhance: Make validation.go the central validation logic file
Estimated Impact: Centralized validation logic, easier testing
Detailed Function Clusters
Cluster 1: Creation Functions (create_*.go files)
Pattern: create* functions for GitHub Actions
Files: 8 dedicated files
Functions:
- create_issue.go - buildCreateOutputIssueJob, parseIssuesConfig
- create_pull_request.go - buildCreateOutputPullRequestJob, parsePullRequestsConfig
- create_discussion.go - buildCreateOutputDiscussionJob, parseDiscussionsConfig
- create_pr_review_comment.go - buildCreateOutputPullRequestReviewCommentJob, parsePullRequestReviewCommentsConfig
- create_code_scanning_alert.go - buildCreateOutputCodeScanningAlertJob, parseCodeScanningAlertsConfig
- create_agent_task.go - buildCreateOutputAgentTaskJob, parseAgentTaskConfig
- add_comment.go - buildCreateOutputAddCommentJob, parseCommentsConfig
- add_labels.go - buildAddLabelsJob
Analysis: ✅ Excellent organization - Clear one-file-per-feature pattern, consistent naming
Recommendation: Keep as-is - This is a model for good organization
Cluster 2: Parsing Functions
Pattern: parse* functions (24+ files with parse functions)
Files: Widespread across codebase
Sub-clusters:
- Config parsers (15 files) - Domain-specific config parsing
- Engine parsers (3 files) - Tool call parsing per engine
- Expression parsers (1 file) - Expression tree parsing
- Utility parsers (1 file) - Generic parsing helpers
Analysis:
- ✅ Domain-specific parsers co-located with logic
- ⚠️ Could benefit from better documentation of parsing patterns
Recommendation:
- Keep domain-specific parsers with their domains
- Document parsing patterns in code comments or architecture doc
- Enhance parse_utils.go with more common parsing utilities
Cluster 3: Extraction Functions
Pattern: extract* functions (22 files with extract functions)
Primary Files:
- compiler.go - 12 extract functions (should move most out)
- parse_utils.go - 2 extract functions (should have more)
- Various domain files - specific extraction logic
Examples:
compiler.go:
- extractTopLevelYAMLSection
- extractPermissions
- extractIfCondition
- extractFeatures
- extractDescription
- extractSource
- extractSafetyPromptSetting
- extractToolsTimeout
- extractToolsStartupTimeout
- extractExpressionFromIfString
- extractCommandConfig
- extractYAMLValue
Other files:
- args.go: extractAddDirPaths
- cache.go: extractCustomArgs
- network.go: extractNetworkPermissions
- npm.go: extractNpxPackages, extractNpxFromCommands
- ... and more
Analysis:
- ⚠️ Poor: Extraction logic heavily concentrated in compiler.go
- ⚠️ Poor: parse_utils.go underutilized (only 2 functions)
- ⚠️ Poor: Extraction functions scattered across domain files
Recommendation:
- High Priority: Move generic extraction from compiler.go to parse_utils.go or new frontmatter_extraction.go
- Medium Priority: Move domain extraction to appropriate domain files
- Result: Clear separation between generic and domain-specific extraction
Estimated Impact: Major improvement in code organization
Cluster 4: Generation Functions
Pattern: generate* functions (Multiple files)
Primary Location: compiler.go has 20 generate functions (should be extracted)
Files with generation:
- compiler.go - 20 functions (YAML generation, step generation, etc.)
- cache.go - Cache generation
- git.go - Git config generation
- Engine files - Engine-specific generation
Recommendation:
- Create yaml_generation.go for YAML generation functions
- Create step_generation.go for GitHub Actions step generation
- Move generation functions from compiler.go to new files
- Keep domain-specific generation in domain files
Estimated Impact: Major improvement, clearer separation
Cluster 5: Validation Functions
Pattern: validate* functions (12 files, 35+ functions)
Analysis: See "Issue #6: Validation Functions Distribution" above
Cluster 6: Build Functions
Pattern: build* functions (Job and component building)
Primary Location: compiler.go and safe output files
Sub-patterns:
- build*Job - Job construction
- build*Steps - Step construction
- build*Condition - Condition building
Analysis:
- ✅ Good organization for job building
⚠️ Could conso
[Content truncated due to length]
AI generated by Semantic Function Refactoring