🔧 Semantic Function Clustering Analysis
Analysis performed on the githubnext/gh-aw repository to identify refactoring opportunities through semantic function clustering and duplicate detection.
Executive Summary
Analyzed 191 non-test Go source files across 7 packages in the repository. The analysis identified:
- ✅ Well-organized creation patterns: create_*.go files follow best practices
- ⚠️ Validation function scatter: validation logic is spread across 6+ files
- ⚠️ Config parsing duplication: near-identical parsing functions with 85%+ similarity
- ⚠️ Validation methods in compiler: several validation functions are attached to Compiler and could be standalone
Full Analysis Report
Repository Structure
Packages Overview
| Package | Files | Primary Purpose |
|---|---|---|
| pkg/workflow | 113 | Core workflow compilation and execution |
| pkg/cli | 64 | CLI commands and interface |
| pkg/parser | 7 | Parsing utilities (frontmatter, GitHub, YAML) |
| pkg/console | 4 | Console output and formatting |
| pkg/timeutil | 1 | Time formatting utilities |
| pkg/logger | 1 | Logging configuration |
| pkg/constants | 1 | Global constants |
Total: 191 non-test Go files analyzed
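The counts in this report come from Serena semantic analysis plus manual review. As a rough local cross-check of the prefix-based clusters, a file's top-level functions and methods can be bucketed by name prefix with the standard library's go/parser and go/ast packages. This is a simplified sketch, not the tool used for this report: clusterFuncs and clusterPrefixes are hypothetical names, and matching is case-sensitive, so exported names such as ParseX are not counted.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"strings"
)

// clusterPrefixes mirrors the naming patterns used for this report's clusters.
var clusterPrefixes = []string{"create", "validate", "parse", "extract", "get"}

// clusterFuncs parses one Go source file and groups its top-level function
// and method names by the first matching prefix. Names without a known
// prefix are ignored. Each cluster's names are returned sorted.
func clusterFuncs(filename, src string) (map[string][]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	clusters := make(map[string][]string)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl) // skips imports, types, vars
		if !ok {
			continue
		}
		name := fn.Name.Name
		for _, prefix := range clusterPrefixes {
			if strings.HasPrefix(name, prefix) {
				clusters[prefix] = append(clusters[prefix], name)
				break
			}
		}
	}
	for _, names := range clusters {
		sort.Strings(names) // sorts in place; slices share the map's backing arrays
	}
	return clusters, nil
}
```

Running it over each non-test file under pkg/ (for example, walking the tree with filepath.WalkDir and skipping *_test.go) and summing slice lengths would give approximate per-cluster totals to compare against the figures below.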
Function Clustering Results
Cluster 1: Creation Functions ✅
Pattern: create_* functions and files
Organization Status: Excellent - Well-organized
Examples:
- pkg/workflow/create_issue.go - Issue creation logic
- pkg/workflow/create_pull_request.go - PR creation logic
- pkg/workflow/create_discussion.go - Discussion creation logic
- pkg/workflow/create_agent_task.go - Agent task creation logic
- pkg/workflow/create_code_scanning_alert.go - Alert creation logic
- pkg/workflow/create_pr_review_comment.go - Review comment creation logic
Analysis: ✅ Each creation function has its own dedicated file following the "one file per feature" rule. This is an exemplary organization pattern.
Cluster 2: Validation Functions ⚠️
Pattern: validate* functions
Organization Status: Scattered across multiple files
Files containing validation logic:
- pkg/workflow/validation.go (26 validation-related functions)
- pkg/workflow/validation_strict_mode.go (3 validation functions)
- pkg/workflow/docker_validation.go (1 function: validateDockerImage)
- pkg/workflow/npm_validation.go (1 method: validateNpxPackages)
- pkg/workflow/pip_validation.go (4 methods: validatePipPackages, validateUvPackages, etc.)
- pkg/workflow/bundler_validation.go (1 function: validateNoLocalRequires)
- pkg/workflow/expression_safety.go (2 functions: validateExpressionSafety, validateSingleExpression)
- pkg/workflow/engine.go (2 validation methods in a mixed-concern file)
Key Validation Functions:
- ✅ Core validation: validation.go (primary validation file)
- ⚠️ Package-specific: *_validation.go files (good pattern, but inconsistent)
- ⚠️ Mixed concerns: validation methods in engine.go, strict_mode.go
Issue: While package-specific validation files (docker_validation.go, npm_validation.go, etc.) make sense, there are validation functions scattered in non-validation files like engine.go.
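To make the scatter measurable, a check along these lines could list validate* functions declared outside validation-named files. It is a sketch under the assumption that a "validation file" is validation.go, validation_*.go, or *_validation.go, the three naming forms that appear in the list above; isValidationFile and validationOutliers are hypothetical helpers. Note that it would also flag expression_safety.go, which the list above counts as validation logic.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"path/filepath"
	"strings"
)

// isValidationFile reports whether a file name uses one of the validation
// naming forms seen in pkg/workflow: validation.go, validation_*.go, *_validation.go.
func isValidationFile(filename string) bool {
	base := strings.TrimSuffix(filepath.Base(filename), ".go")
	return base == "validation" ||
		strings.HasPrefix(base, "validation_") ||
		strings.HasSuffix(base, "_validation")
}

// validationOutliers returns the names of validate* functions and methods
// declared in filename, or nil if filename is already a validation file.
func validationOutliers(filename, src string) ([]string, error) {
	if isValidationFile(filename) {
		return nil, nil
	}
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var outliers []string
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok && strings.HasPrefix(fn.Name.Name, "validate") {
			outliers = append(outliers, fn.Name.Name)
		}
	}
	return outliers, nil
}
```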
Cluster 3: Parsing Functions ⚠️
Pattern: parse* functions
Organization Status: Widely distributed
70+ parsing functions found, including:
- Config parsing: parseLabelsFromConfig, parseTitlePrefixFromConfig, parseTargetRepoFromConfig
- Time parsing: parseTimeDelta, parseAbsoluteDateTime, parseRelativeDate
- Package parsing: parseNpmPackage, parsePipPackage, parseGoPackage
- Tool parsing: parseGitHubTool, parseBashTool, parsePlaywrightTool, etc.
- Log parsing: parseClaudeJSONLog, parseCodexToolCallsWithSequence
Files with parse functions:
- pkg/workflow/compiler.go - Main compilation parsing
- pkg/workflow/config.go - Config parsing helpers
- pkg/workflow/dependabot.go - Package parsing
- pkg/workflow/tools_types.go - Tool parsing
- pkg/workflow/time_delta.go - Time parsing
- Multiple create_*.go files - Feature-specific parsing
Analysis: Generally well-organized with parsing functions near their usage, but see duplicates below.
Cluster 4: Extraction Functions ✅
Pattern: extract* functions
Organization Status: Well-distributed
50+ extraction functions found, primarily in:
- pkg/workflow/frontmatter_extraction.go - 15+ extraction methods (well-organized)
- pkg/workflow/dependabot.go - Package extraction functions
- pkg/workflow/pip.go - Pip package extraction
- pkg/workflow/npm.go - NPM package extraction
Analysis: ✅ Good organization with a dedicated frontmatter_extraction.go file for frontmatter-related extractions.
Cluster 5: Script Getters ✅
Pattern: get*Script functions
Organization Status: Excellent
File: pkg/workflow/scripts.go
20+ script getter functions, all centralized:
- getCreateIssueScript()
- getAddLabelsScript()
- getParseFirewallLogsScript()
- getCreateDiscussionScript()
- getUpdateIssueScript()
- getParseClaudeLogScript()
- And many more
Analysis: ✅ Exemplary organization - all script getters in a single file with consistent naming and structure.
Identified Issues
Issue 1: Near-Duplicate Config Parsing Functions
Location: pkg/workflow/config.go
Functions with 85%+ similarity:
parseTitlePrefixFromConfig (lines 34-42):
func parseTitlePrefixFromConfig(configMap map[string]any) string {
	if titlePrefix, exists := configMap["title-prefix"]; exists {
		if titlePrefixStr, ok := titlePrefix.(string); ok {
			configLog.Printf("Parsed title-prefix from config: %s", titlePrefixStr)
			return titlePrefixStr
		}
	}
	return ""
}
parseTargetRepoFromConfig (lines 48-56):
func parseTargetRepoFromConfig(configMap map[string]any) string {
	if targetRepoSlug, exists := configMap["target-repo"]; exists {
		if targetRepoStr, ok := targetRepoSlug.(string); ok {
			configLog.Printf("Parsed target-repo from config: %s", targetRepoStr)
			return targetRepoStr
		}
	}
	return ""
}
Similarity: ~90%. These functions differ only in:
- Key name ("title-prefix" vs "target-repo")
- Variable names
- Log message text
Recommendation: Create a generic helper function:
func parseStringFromConfig(configMap map[string]any, key string) string {
	if value, exists := configMap[key]; exists {
		if valueStr, ok := value.(string); ok {
			configLog.Printf("Parsed %s from config: %s", key, valueStr)
			return valueStr
		}
	}
	return ""
}
Then simplify:
func parseTitlePrefixFromConfig(configMap map[string]any) string {
	return parseStringFromConfig(configMap, "title-prefix")
}
func parseTargetRepoFromConfig(configMap map[string]any) string {
	return parseStringFromConfig(configMap, "target-repo")
}
Impact: Reduces duplication, improves maintainability
Files: pkg/workflow/config.go
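A self-contained version of the proposed helper and both wrappers can be checked in isolation. The sketch assumes only what the config.go snippets imply about configLog, namely a Printf(format string, args ...any) method; a standard library *log.Logger writing to io.Discard stands in for the real package logger.

```go
package main

import (
	"io"
	"log"
)

// configLog stands in for the package logger used in config.go (assumption:
// it exposes Printf). Output is discarded so the sketch stays quiet.
var configLog = log.New(io.Discard, "config: ", 0)

// parseStringFromConfig returns configMap[key] when it is present and a
// string, and "" otherwise (missing key or non-string value).
func parseStringFromConfig(configMap map[string]any, key string) string {
	if value, exists := configMap[key]; exists {
		if valueStr, ok := value.(string); ok {
			configLog.Printf("Parsed %s from config: %s", key, valueStr)
			return valueStr
		}
	}
	return ""
}

// The existing entry points become one-line wrappers with unchanged signatures.
func parseTitlePrefixFromConfig(configMap map[string]any) string {
	return parseStringFromConfig(configMap, "title-prefix")
}

func parseTargetRepoFromConfig(configMap map[string]any) string {
	return parseStringFromConfig(configMap, "target-repo")
}
```

Because the key is interpolated into the same format string, the helper emits the same log lines the hand-written versions produce today, for example "Parsed title-prefix from config: ...", so the refactor should not change observable behavior.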
Issue 2: Validation Functions in Compiler File
Location: pkg/workflow/compiler.go
Methods attached to Compiler that might work better as standalone functions:
- (*Compiler).parseBaseSafeOutputConfig (lines 1567-1581)
- (*Compiler).computeAllowedDomainsForSanitization (validation-like logic)
- (*Compiler).detectTextOutputUsage (validation/detection logic)
Current Issue: The main compiler.go file (1600+ lines) contains a mix of:
- Core compilation logic ✅
- Validation logic ⚠️
- Parsing logic ⚠️
- Configuration detection ⚠️
Recommendation: Consider extracting pure validation logic to validation-related files:
- Move detectTextOutputUsage → potentially to a detection or validation file
- Move domain computation logic → a domain-related file or a validation file
Impact: Reduces compiler.go complexity, improves separation of concerns
Estimated Effort: 3-4 hours
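One low-risk way to carry out such a move is to lift the logic into a package-level function that takes only its inputs, and keep the Compiler method as a thin wrapper until call sites are updated. The sketch uses hypothetical names (detectMarkerUsage and a stripped-down Compiler) because the actual signature of detectTextOutputUsage is not shown in this report; the strings.Contains body is a placeholder, not its real logic.

```go
package main

import "strings"

// Compiler is a stripped-down stand-in for the real workflow Compiler (hypothetical).
type Compiler struct{}

// detectMarkerUsage is a hypothetical standalone detection helper. It depends
// only on its arguments, so it can live in its own file and be unit-tested
// without constructing a Compiler.
func detectMarkerUsage(content, marker string) bool {
	return strings.Contains(content, marker)
}

// detectMarkerUsage on *Compiler is a thin wrapper that keeps existing call
// sites compiling while the logic moves out of compiler.go. Go allows a method
// and a package-level function to share a name.
func (c *Compiler) detectMarkerUsage(content, marker string) bool {
	return detectMarkerUsage(content, marker)
}
```

Once every caller uses the package-level function, the wrapper can be deleted in a follow-up change, which keeps each step small and easy to review.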
Issue 3: Scattered Validation Functions Across Engine Files
Location: Multiple files
Validation methods in non-validation files:
- pkg/workflow/engine.go: validateEngine, validateSingleEngineSpecification
- pkg/workflow/strict_mode.go: validateStrictMode
Issue: While the Compiler-method pattern makes sense, placing validation logic in files named after their primary feature (engine, strict mode) rather than in validation files reduces discoverability.
Recommendation:
- Option A: Keep as-is if these are tightly coupled to their respective features
- Option B: Move to engine_validation.go and strict_mode_validation.go, following the pattern of docker_validation.go
Impact: Improved discoverability and consistency
Estimated Effort: 1-2 hours if pursuing Option B
Issue 4: Repository Feature Checking Duplication
Location: pkg/workflow/validation.go
Duplicate pattern in repository feature checking:
// Lines 555-601
func checkRepositoryHasDiscussions(...) bool { ... }
func checkRepositoryHasDiscussionsUncached(...) (bool, error) { ... }
// Lines 604-631
func checkRepositoryHasIssues(...) bool { ... }
func checkRepositoryHasIssuesUncached(...) (bool, error) { ... }
Similarity: Both pairs follow identical caching patterns with slight variations in API calls.
Analysis: This is acceptable duplication since:
- Different GitHub API endpoints
- Different response structures
- The caching logic follows a standard pattern
Recommendation: ✅ Keep as-is - the duplication serves clarity. Could consider a generic cached feature checker in the future if more features are added.
Well-Organized Patterns (No Changes Needed)
✅ Create Operations
Each creation operation has its own file:
create_issue.gocreate_pull_request.gocreate_discussion.gocreate_agent_task.gocreate_code_scanning_alert.gocreate_pr_review_comment.go
✅ Package-Specific Validation Files
Good pattern of dedicated validation files:
docker_validation.gonpm_validation.gopip_validation.gobundler_validation.go
✅ Centralized Scripts
All script getters in scripts.go with consistent patterns.
✅ Dedicated Extraction File
frontmatter_extraction.go centralizes frontmatter parsing logic.
Refactoring Recommendations
Priority 1: High Impact, Low Effort
1. Consolidate Config Parsing Functions
- File: pkg/workflow/config.go
- Action: Create a generic parseStringFromConfig helper
- Functions to refactor: parseTitlePrefixFromConfig, parseTargetRepoFromConfig
- Estimated effort: 30 minutes
- Benefits:
  - Reduces code duplication
  - Single source of truth for string config parsing
  - Easier to add new config fields
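If the helper later needs to cover non-string fields, a type-parameterized variant is one option. This goes slightly beyond the recommendation above and is hypothetical: parseFromConfig is not an existing function in the repository, and logging is omitted for brevity.

```go
package main

// parseFromConfig is a hypothetical generic extension of parseStringFromConfig:
// it returns configMap[key] when the key is present and its value has exactly
// type T, and T's zero value otherwise.
func parseFromConfig[T any](configMap map[string]any, key string) T {
	var zero T
	value, exists := configMap[key]
	if !exists {
		return zero
	}
	typed, ok := value.(T) // exact dynamic-type match only; no conversions
	if !ok {
		return zero
	}
	return typed
}
```

The exact-type assertion means numeric fields can silently miss, because depending on the decoder, numbers may arrive as int, uint64, or float64. Restricting the generic form to string and bool fields, or normalizing numbers first, avoids that.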
Priority 2: Medium Impact, Medium Effort
2. Review Compiler File Organization
- File: pkg/workflow/compiler.go (1600+ lines)
- Action: Consider extracting detection/validation methods
- Candidates:
  - detectTextOutputUsage → validation or detection file
  - computeAllowedDomainsForSanitization → domain-related file
- Estimated effort: 3-4 hours
- Benefits:
  - Smaller, more focused compiler.go
  - Better separation of concerns
  - Improved testability
3. Standardize Validation File Naming
- Files: engine.go, strict_mode.go
- Action: Extract validation methods to *_validation.go files
- Pattern: Follow the existing docker_validation.go, npm_validation.go pattern
- Estimated effort: 1-2 hours
- Benefits:
  - Consistent file organization
  - Improved discoverability
  - Easier to locate validation logic
Priority 3: Long-Term Considerations
4. Generic Caching Pattern for Feature Checks
- File: pkg/workflow/validation.go
- Action: Consider a generic cached feature checker if more features are added
- Current state: Acceptable duplication
- Estimated effort: 4-6 hours
- Benefits:
  - Reduced boilerplate for new feature checks
  - Consistent caching strategy
Implementation Checklist
- P1: Refactor config parsing functions in config.go
- P1: Write tests for the new parseStringFromConfig helper
- P2: Review and plan extraction of detection/validation logic from compiler.go
- P2: Extract validation methods from engine.go to engine_validation.go (optional)
- P2: Extract validation methods from strict_mode.go to strict_mode_validation.go (optional)
- P3: Consider a generic caching pattern for future feature checks
- Verify no functionality broken after changes
- Run full test suite
Analysis Metadata
- Total Go Files Analyzed: 191 non-test files
- Packages Analyzed: 7 (cli, console, constants, logger, parser, timeutil, workflow)
- Function Clusters Identified: 5 major clusters
- Outliers Found: 3-4 validation methods in mixed-concern files
- Near-Duplicates Detected: 2 config parsing functions with 90% similarity
- Detection Method: Serena semantic code analysis + manual pattern analysis
- Primary Focus: pkg/workflow (113 files, majority of codebase)
- Analysis Date: 2025-11-08
Conclusion
The codebase demonstrates good organization practices overall, particularly in:
- Creation operations (one file per feature)
- Package-specific validation files
- Centralized script management
Key opportunities for improvement:
- Quick Win: Consolidate near-duplicate config parsing functions (~30 min)
- Medium Effort: Consider extracting validation/detection logic from compiler.go (3-4 hours)
- Consistency: Standardize validation file naming across all features (1-2 hours)
These refactorings would improve code maintainability, reduce duplication, and enhance the already solid architectural patterns in place.
Note: This analysis focused on high-impact, actionable findings. The codebase is generally well-organized and follows Go best practices. The recommendations aim to enhance consistency and reduce the identified duplication patterns.
AI generated by Semantic Function Refactoring