github · pelikhan · May 3, 2026 · May 3, 2026 · May 3, 2026 · May 3, 2026
diff --git a/docs/adr/29952-semantic-function-clustering-pkg-cli.md b/docs/adr/29952-semantic-function-clustering-pkg-cli.md
@@ -0,0 +1,73 @@
+# ADR-29952: Semantic Function Clustering in pkg/cli
+
+**Date**: 2026-05-03
+**Status**: Draft
+**Deciders**: Unknown (automated refactor by copilot-swe-agent)
+
+---
+
+## Part 1 — Narrative (Human-Friendly)
+
+### Context
+
+The `pkg/cli` package in this repository had grown organically, with utility functions placed near their first point of use rather than alongside semantically related functions. This produced four concrete cohesion findings: format helpers living in `audit_diff.go` instead of `audit_math_helpers.go`, external tool runners (actionlint, zizmor, poutine, runner-guard) mixed into a general-purpose `compile_batch_operations.go`, file-discovery functions in a parsing-focused `logs_parsing_core.go`, and path utilities in the compilation-specific `compile_file_operations.go`. The result was files with weak cohesion, long import lists, and poor discoverability for new contributors.
+
+### Decision
+
+We will reorganize `pkg/cli` by moving each function to the file whose existing responsibility most closely matches that function's domain, a practice known as semantic function clustering. Concretely: format helpers move to `audit_math_helpers.go`; external tool runners move to a new `compile_external_tools.go`; post-processing cleanup and warning display move to `compile_post_processing.go`; and path utilities move to `helpers.go`. The old `compile_batch_operations.go` becomes a stub with redirect comments pointing to the new locations.
+
+### Alternatives Considered
+
+#### Alternative 1: Leave Functions at Their Point of First Use
+
+The status quo — functions remain in the file where they were first written, close to the call site that introduced them. This is simple and requires no migration work, but it produces the cohesion violations observed: unrelated functions accumulate in files, making each file harder to reason about and extending import graphs unnecessarily (e.g., `logs_parsing_core.go` importing `fileutil` solely to support a discovery function).
+
+#### Alternative 2: Consolidate All Utilities into a Single `utils.go`
+
+A common Go pattern is to create a single `utils.go` (or `helpers.go`) catch-all. This avoids cohesion violations by brute force but introduces a different anti-pattern: a file with no coherent theme that grows without bound. It also does not express domain intent; `compile_external_tools.go` signals "these are external tool integrations" in a way that `utils.go` never could.
+
+### Consequences
+
+#### Positive
+- Each file in `pkg/cli` now has a narrower, better-defined responsibility, making it easier to locate the right file when adding or debugging a function.
+- Import graphs are cleaner: `logs_parsing_core.go` drops `strings`, `constants`, and `fileutil` imports that were only needed for the moved discovery functions.
+- New contributors can infer where to put a function from file names alone (e.g., a new external tool runner clearly belongs in `compile_external_tools.go`).
+
+#### Negative
+- The number of files in `pkg/cli` increases, which can feel heavyweight for a single package.
+- `compile_batch_operations.go` becomes a comment-only stub rather than being deleted, preserving a redirect that may confuse readers until a follow-up PR removes it.
+- Function moves break `git blame` lineage; the original authorship and rationale are only recoverable by looking at the commit that introduced the function in its old location.
+
+#### Neutral
+- No public API surface changes: all exported functions (`RunActionlintOnFiles`, `RunZizmorOnFiles`, `RunPoutineOnDirectory`, `RunRunnerGuardOnDirectory`) are re-exported from the new file, preserving callers.
+- Tests are unaffected because Go's same-package compilation treats all `.go` files in `pkg/cli` as a single compilation unit.
+
+---
+
+## Part 2 — Normative Specification (RFC 2119)
+
+> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).
+
+### File Organization
+
+1. Functions in `pkg/cli` **MUST** be placed in the file whose stated responsibility most closely matches the function's domain, not in the file where the function was first needed.
+2. A function **MUST NOT** be placed in a file solely because it is called from that file; co-location with semantically related functions takes precedence.
+3. Format and math helpers for audit reporting **MUST** reside in `audit_math_helpers.go`.
+4. External tool runner functions (actionlint, zizmor, poutine, runner-guard) **MUST** reside in `compile_external_tools.go`.
+5. Post-compilation cleanup, warning display, and action cache pruning **MUST** reside in `compile_post_processing.go`.
+6. General path and repository utilities **MUST** reside in `helpers.go`.
+7. Log file discovery functions **MUST** reside in `logs_utils.go`, not in parsing-specific files.
+
+### Cohesion Maintenance
+
+1. When a new utility function is added to `pkg/cli`, the author **MUST** identify the file whose existing responsibilities most closely match before creating a new file.
+2. A new file in `pkg/cli` **SHOULD** be created only when no existing file's responsibility covers the new function's domain and the function is unlikely to be a one-off.
+3. Import lists in a file **SHOULD NOT** contain packages that are only needed by a function whose domain belongs elsewhere; such imports are a signal that the function **SHOULD** be moved.
+
+### Conformance
+
+An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance.
+
+---
+
+*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/25283930938) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*
diff --git a/pkg/cli/audit_diff.go b/pkg/cli/audit_diff.go
@@ -7,7 +7,6 @@ import (
 	"math"
 	"path/filepath"
 	"sort"
-	"strconv"
 	"strings"
 	"time"
 
@@ -215,18 +214,6 @@ func classifyFirewallDomainStatus(stats DomainRequestStats) string {
 	return "unknown"
 }
 
-// formatVolumeChange formats the volume change as a human-readable string
-func formatVolumeChange(total1, total2 int) string {
-	if total1 == 0 {
-		return "+∞"
-	}
-	pctChange := (float64(total2-total1) / float64(total1)) * 100
-	if pctChange >= 0 {
-		return "+" + formatPercent(pctChange)
-	}
-	return formatPercent(pctChange)
-}
-
 // MCPToolDiffEntry represents the diff for a single MCP tool between two runs
 type MCPToolDiffEntry struct {
 	ServerName      string `json:"server_name"`
@@ -939,25 +926,6 @@ func computeTokenUsageDiff(tu1, tu2 *TokenUsageSummary) *TokenUsageDiff {
 	return diff
 }
 
-// formatPercentagePointChange formats the change between two ratio values (0.0-1.0) as a
-// percentage-point delta (e.g. "+1.5pp", "-2.3pp")
-func formatPercentagePointChange(ratio1, ratio2 float64) string {
-	delta := (ratio2 - ratio1) * 100
-	if delta >= 0 {
-		return fmt.Sprintf("+%.1fpp", delta)
-	}
-	return fmt.Sprintf("%.1fpp", delta)
-}
-
-// formatCountChange formats the absolute change in a count value (e.g. "+3", "-1")
-func formatCountChange(count1, count2 int) string {
-	delta := count2 - count1
-	if delta >= 0 {
-		return fmt.Sprintf("+%d", delta)
-	}
-	return strconv.Itoa(delta)
-}
-
 // loadRunSummaryForDiff loads or builds a RunSummary for a given run for use in diffing.
 // It first tries to load from a cached RunSummary (which includes MCP tool usage and run
 // metrics); otherwise it downloads artifacts and analyzes firewall logs, returning a partial

diff --git a/pkg/cli/audit_math_helpers.go b/pkg/cli/audit_math_helpers.go
@@ -1,6 +1,9 @@
 package cli
 
-import "fmt"
+import (
+	"fmt"
+	"strconv"
+)
 
 // safePercent returns percentage of part/total, returning 0 when total is 0.
 func safePercent(part, total int) float64 {
@@ -14,3 +17,34 @@ func safePercent(part, total int) float64 {
 func formatPercent(pct float64) string {
 	return fmt.Sprintf("%.0f%%", pct)
 }
+
+// formatVolumeChange formats the volume change as a human-readable string
+func formatVolumeChange(total1, total2 int) string {
+	if total1 == 0 {
+		return "+∞"
+	}
+	pctChange := (float64(total2-total1) / float64(total1)) * 100
+	if pctChange >= 0 {
+		return "+" + formatPercent(pctChange)
+	}
+	return formatPercent(pctChange)
+}
+
+// formatPercentagePointChange formats the change between two ratio values (0.0-1.0) as a
+// percentage-point delta (e.g. "+1.5pp", "-2.3pp")
+func formatPercentagePointChange(ratio1, ratio2 float64) string {
+	delta := (ratio2 - ratio1) * 100
+	if delta >= 0 {
+		return fmt.Sprintf("+%.1fpp", delta)
+	}
+	return fmt.Sprintf("%.1fpp", delta)
+}
+
+// formatCountChange formats the absolute change in a count value (e.g. "+3", "-1")
+func formatCountChange(count1, count2 int) string {
+	delta := count2 - count1
+	if delta >= 0 {
+		return fmt.Sprintf("+%d", delta)
+	}
+	return strconv.Itoa(delta)
+}
diff --git a/pkg/cli/compile_batch_operations.go b/pkg/cli/compile_batch_operations.go
@@ -1,194 +1,7 @@
-// This file provides batch operations for workflow compilation.
+// This file previously contained batch operations for workflow compilation.
 //
-// This file contains functions that perform batch operations on compiled workflows,
-// such as running linters, security scanners, and cleaning up orphaned files.
-//
-// # Organization Rationale
-//
-// These batch operation functions are grouped here because they:
-//   - Operate on multiple files at once
-//   - Run external tools (actionlint, zizmor, poutine)
-//   - Have a clear domain focus (batch operations)
-//   - Keep the main orchestrator focused on coordination
-//
-// # Key Functions
-//
-// Batch Linting:
-//   - RunActionlintOnFiles() - Run actionlint on multiple lock files
-//
-// File Cleanup:
-//   - purgeOrphanedLockFiles() - Remove orphaned .lock.yml files
-//   - purgeInvalidFiles() - Remove .invalid.yml files
-//
-// These functions abstract batch operations, allowing the main compile
-// orchestrator to focus on coordination while these handle batch processing.
+// Functions have been reorganized into more focused files:
+//   - compile_external_tools.go: external tool runners (actionlint, zizmor, poutine, runner-guard)
+//   - compile_post_processing.go: file cleanup/purge operations
 
 package cli
-
-import (
-	"fmt"
-	"os"
-	"path/filepath"
-	"strings"
-
-	"github.com/github/gh-aw/pkg/console"
-	"github.com/github/gh-aw/pkg/logger"
-)
-
-var compileBatchOperationsLog = logger.New("cli:compile_batch_operations")
-
-// RunActionlintOnFiles runs actionlint on multiple lock files in a single batch.
-// This is more efficient than running actionlint once per file.
-func RunActionlintOnFiles(lockFiles []string, verbose bool, strict bool) error {
-	return runBatchLockFileTool("actionlint", lockFiles, verbose, strict, runActionlintOnFiles)
-}
-
-// RunZizmorOnFiles runs zizmor on multiple lock files in a single batch.
-// This is more efficient than running zizmor once per file.
-func RunZizmorOnFiles(lockFiles []string, verbose bool, strict bool) error {
-	return runBatchLockFileTool("zizmor", lockFiles, verbose, strict, runZizmorOnFiles)
-}
-
-// RunPoutineOnDirectory runs poutine security scanner once on a directory.
-// Poutine scans all workflows in a directory, so it only needs to run once.
-func RunPoutineOnDirectory(workflowDir string, verbose bool, strict bool) error {
-	return runPoutineOnDirectory(workflowDir, verbose, strict)
-}
-
-// RunRunnerGuardOnDirectory runs runner-guard taint analysis scanner once on a directory.
-// Runner-guard scans all workflows in a directory, so it only needs to run once.
-func RunRunnerGuardOnDirectory(workflowDir string, verbose bool, strict bool) error {
-	return runRunnerGuardOnDirectory(workflowDir, verbose, strict)
-}
-
-// runBatchLockFileTool runs a batch tool on lock files with uniform error handling
-func runBatchLockFileTool(toolName string, lockFiles []string, verbose bool, strict bool, runner func([]string, bool, bool) error) error {
-	if len(lockFiles) == 0 {
-		compileBatchOperationsLog.Printf("No lock files to process with %s", toolName)
-		return nil
-	}
-
-	compileBatchOperationsLog.Printf("Running batch %s on %d lock files", toolName, len(lockFiles))
-
-	if err := runner(lockFiles, verbose, strict); err != nil {
-		if strict {
-			return fmt.Errorf("%s failed: %w", toolName, err)
-		}
-		// In non-strict mode, errors are warnings
-		if verbose {
-			fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("%s warnings: %v", toolName, err)))
-		}
-	}
-
-	return nil
-}
-
-// runBatchDirectoryTool runs a directory-based batch tool with uniform error handling
-func runBatchDirectoryTool(toolName string, workflowDir string, verbose bool, strict bool, runner func(string, bool, bool) error) error {
-	compileBatchOperationsLog.Printf("Running batch %s on directory: %s", toolName, workflowDir)
-
-	if err := runner(workflowDir, verbose, strict); err != nil {
-		if strict {
-			return fmt.Errorf("%s failed: %w", toolName, err)
-		}
-		// In non-strict mode, errors are warnings
-		if verbose {
-			fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("%s warnings: %v", toolName, err)))
-		}
-	}
-
-	return nil
-}
-
-// purgeOrphanedLockFiles removes orphaned .lock.yml files
-// These are lock files that exist but don't have a corresponding .md file
-func purgeOrphanedLockFiles(workflowsDir string, expectedLockFiles []string, verbose bool) error {
-	compileBatchOperationsLog.Printf("Purging orphaned lock files in %s", workflowsDir)
-
-	// Find all existing .lock.yml files
-	existingLockFiles, err := filepath.Glob(filepath.Join(workflowsDir, "*.lock.yml"))
-	if err != nil {
-		return fmt.Errorf("failed to find existing lock files: %w", err)
-	}
-
-	if len(existingLockFiles) == 0 {
-		compileBatchOperationsLog.Print("No lock files found")
-		return nil
-	}
-
-	if verbose {
-		fmt.Fprintln(os.Stderr, console.FormatInfoMessage(fmt.Sprintf("Found %d existing .lock.yml files", len(existingLockFiles))))
-	}
-
-	// Build a set of expected lock files
-	expectedLockFileSet := make(map[string]bool)
-	for _, expected := range expectedLockFiles {
-		expectedLockFileSet[expected] = true
-	}
-
-	// Find lock files that should be deleted (exist but aren't expected)
-	var orphanedFiles []string
-	for _, existing := range existingLockFiles {
-		// Skip .campaign.lock.yml files - they're handled by purgeOrphanedCampaignOrchestratorLockFiles
-		if strings.HasSuffix(existing, ".campaign.lock.yml") {
-			continue
-		}
-		if !expectedLockFileSet[existing] {
-			orphanedFiles = append(orphanedFiles, existing)
-		}
-	}
-
-	// Delete orphaned lock files
-	if len(orphanedFiles) > 0 {
-		for _, orphanedFile := range orphanedFiles {
-			if err := os.Remove(orphanedFile); err != nil {
-				fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("Failed to remove orphaned lock file %s: %v", filepath.Base(orphanedFile), err)))
-			} else {
-				fmt.Fprintln(os.Stderr, console.FormatSuccessMessage("Removed orphaned lock file: "+filepath.Base(orphanedFile)))
-			}
-		}
-		if verbose {
-			fmt.Fprintln(os.Stderr, console.FormatSuccessMessage(fmt.Sprintf("Purged %d orphaned .lock.yml files", len(orphanedFiles))))
-		}
-	}
-
-	compileBatchOperationsLog.Printf("Purged %d orphaned lock files", len(orphanedFiles))
-	return nil
-}
-
-// purgeInvalidFiles removes all .invalid.yml files
-// These are temporary debugging artifacts that should not persist
-func purgeInvalidFiles(workflowsDir string, verbose bool) error {
-	compileBatchOperationsLog.Printf("Purging invalid files in %s", workflowsDir)
-
-	// Find all existing .invalid.yml files
-	existingInvalidFiles, err := filepath.Glob(filepath.Join(workflowsDir, "*.invalid.yml"))
-	if err != nil {
-		return fmt.Errorf("failed to find existing invalid files: %w", err)
-	}
-
-	if len(existingInvalidFiles) == 0 {
-		compileBatchOperationsLog.Print("No invalid files found")
-		return nil
-	}
-
-	if verbose {
-		fmt.Fprintln(os.Stderr, console.FormatInfoMessage(fmt.Sprintf("Found %d existing .invalid.yml files", len(existingInvalidFiles))))
-	}
-
-	// Delete all .invalid.yml files
-	for _, invalidFile := range existingInvalidFiles {
-		if err := os.Remove(invalidFile); err != nil {
-			fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("Failed to remove invalid file %s: %v", filepath.Base(invalidFile), err)))
-		} else {
-			fmt.Fprintln(os.Stderr, console.FormatSuccessMessage("Removed invalid file: "+filepath.Base(invalidFile)))
-		}
-	}
-
-	if verbose {
-		fmt.Fprintln(os.Stderr, console.FormatSuccessMessage(fmt.Sprintf("Purged %d .invalid.yml files", len(existingInvalidFiles))))
-	}
-
-	compileBatchOperationsLog.Printf("Purged %d invalid files", len(existingInvalidFiles))
-	return nil
-}