Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
73 changes: 73 additions & 0 deletions docs/adr/29952-semantic-function-clustering-pkg-cli.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
# ADR-29952: Semantic Function Clustering in pkg/cli

**Date**: 2026-05-03
**Status**: Draft
**Deciders**: Unknown (automated refactor by copilot-swe-agent)

---

## Part 1 — Narrative (Human-Friendly)

### Context

The `pkg/cli` package in this repository had grown organically, with utility functions placed near their first point of use rather than alongside semantically related functions. This produced four concrete cohesion findings: format helpers living in `audit_diff.go` instead of `audit_math_helpers.go`, external tool runners (actionlint, zizmor, poutine, runner-guard) mixed into a general-purpose `compile_batch_operations.go`, file-discovery functions in a parsing-focused `logs_parsing_core.go`, and path utilities in the compilation-specific `compile_file_operations.go`. The result was files with weak cohesion, long import lists, and poor discoverability for new contributors.

### Decision

We will reorganize `pkg/cli` by moving each function to the file whose existing responsibility most closely matches that function's domain, a practice known as semantic function clustering. Concretely: format helpers move to `audit_math_helpers.go`; external tool runners move to a new `compile_external_tools.go`; post-processing cleanup and warning display move to `compile_post_processing.go`; and path utilities move to `helpers.go`. The old `compile_batch_operations.go` becomes a stub with redirect comments pointing to the new locations.

### Alternatives Considered

#### Alternative 1: Leave Functions at Their Point of First Use

The status quo — functions remain in the file where they were first written, close to the call site that introduced them. This is simple and requires no migration work, but it produces the cohesion violations observed: unrelated functions accumulate in files, making each file harder to reason about and extending import graphs unnecessarily (e.g., `logs_parsing_core.go` importing `fileutil` solely to support a discovery function).

#### Alternative 2: Consolidate All Utilities into a Single `utils.go`

A common Go pattern is to create a single `utils.go` (or `helpers.go`) catch-all. This avoids cohesion violations by brute force but introduces a different anti-pattern: a file with no coherent theme that grows without bound. It also does not express domain intent; `compile_external_tools.go` signals "these are external tool integrations" in a way that `utils.go` never could.

### Consequences

#### Positive
- Each file in `pkg/cli` now has a narrower, better-defined responsibility, making it easier to locate the right file when adding or debugging a function.
- Import graphs are cleaner: `logs_parsing_core.go` drops `strings`, `constants`, and `fileutil` imports that were only needed for the moved discovery functions.
- New contributors can infer where to put a function from file names alone (e.g., a new external tool runner clearly belongs in `compile_external_tools.go`).

#### Negative
- The number of files in `pkg/cli` increases, which can feel heavyweight for a single package.
- `compile_batch_operations.go` becomes a comment-only stub rather than being deleted, preserving a redirect that may confuse readers until a follow-up PR removes it.
- Function moves break `git blame` lineage; the original authorship and rationale are only recoverable by looking at the commit that introduced the function in its old location.

#### Neutral
- No public API surface changes: all exported functions (`RunActionlintOnFiles`, `RunZizmorOnFiles`, `RunPoutineOnDirectory`, `RunRunnerGuardOnDirectory`) are re-exported from the new file, preserving callers.
- Tests are unaffected because Go's same-package compilation treats all `.go` files in `pkg/cli` as a single compilation unit.

---

## Part 2 — Normative Specification (RFC 2119)

> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).

### File Organization

1. Functions in `pkg/cli` **MUST** be placed in the file whose stated responsibility most closely matches the function's domain, not in the file where the function was first needed.
2. A function **MUST NOT** be placed in a file solely because it is called from that file; co-location with semantically related functions takes precedence.
3. Format and math helpers for audit reporting **MUST** reside in `audit_math_helpers.go`.
4. External tool runner functions (actionlint, zizmor, poutine, runner-guard) **MUST** reside in `compile_external_tools.go`.
5. Post-compilation cleanup, warning display, and action cache pruning **MUST** reside in `compile_post_processing.go`.
6. General path and repository utilities **MUST** reside in `helpers.go`.
7. Log file discovery functions **MUST** reside in `logs_utils.go`, not in parsing-specific files.

### Cohesion Maintenance

1. When a new utility function is added to `pkg/cli`, the author **MUST** identify the file whose existing responsibilities most closely match before creating a new file.
2. A new file in `pkg/cli` **SHOULD** be created only when no existing file's responsibility covers the new function's domain and the function is unlikely to be a one-off.
3. Import lists in a file **SHOULD NOT** contain packages that are only needed by a function whose domain belongs elsewhere; such imports are a signal that the function **SHOULD** be moved.

### Conformance

An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance.

---

*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/25283930938) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*
32 changes: 0 additions & 32 deletions pkg/cli/audit_diff.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@ import (
"math"
"path/filepath"
"sort"
"strconv"
"strings"
"time"

Expand Down Expand Up @@ -215,18 +214,6 @@ func classifyFirewallDomainStatus(stats DomainRequestStats) string {
return "unknown"
}

// formatVolumeChange formats the volume change as a human-readable string
func formatVolumeChange(total1, total2 int) string {
if total1 == 0 {
return "+∞"
}
pctChange := (float64(total2-total1) / float64(total1)) * 100
if pctChange >= 0 {
return "+" + formatPercent(pctChange)
}
return formatPercent(pctChange)
}

// MCPToolDiffEntry represents the diff for a single MCP tool between two runs
type MCPToolDiffEntry struct {
ServerName string `json:"server_name"`
Expand Down Expand Up @@ -939,25 +926,6 @@ func computeTokenUsageDiff(tu1, tu2 *TokenUsageSummary) *TokenUsageDiff {
return diff
}

// formatPercentagePointChange formats the change between two ratio values (0.0-1.0) as a
// percentage-point delta (e.g. "+1.5pp", "-2.3pp")
func formatPercentagePointChange(ratio1, ratio2 float64) string {
delta := (ratio2 - ratio1) * 100
if delta >= 0 {
return fmt.Sprintf("+%.1fpp", delta)
}
return fmt.Sprintf("%.1fpp", delta)
}

// formatCountChange formats the absolute change in a count value (e.g. "+3", "-1")
func formatCountChange(count1, count2 int) string {
delta := count2 - count1
if delta >= 0 {
return fmt.Sprintf("+%d", delta)
}
return strconv.Itoa(delta)
}

// loadRunSummaryForDiff loads or builds a RunSummary for a given run for use in diffing.
// It first tries to load from a cached RunSummary (which includes MCP tool usage and run
// metrics); otherwise it downloads artifacts and analyzes firewall logs, returning a partial
Expand Down
36 changes: 35 additions & 1 deletion pkg/cli/audit_math_helpers.go
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
package cli

import "fmt"
import (
"fmt"
"strconv"
)

// safePercent returns percentage of part/total, returning 0 when total is 0.
func safePercent(part, total int) float64 {
Expand All @@ -14,3 +17,34 @@ func safePercent(part, total int) float64 {
func formatPercent(pct float64) string {
return fmt.Sprintf("%.0f%%", pct)
}

// formatVolumeChange formats the volume change as a human-readable string
func formatVolumeChange(total1, total2 int) string {
if total1 == 0 {
return "+∞"
}
pctChange := (float64(total2-total1) / float64(total1)) * 100
if pctChange >= 0 {
return "+" + formatPercent(pctChange)
}
return formatPercent(pctChange)
}

// formatPercentagePointChange formats the change between two ratio values (0.0-1.0) as a
// percentage-point delta (e.g. "+1.5pp", "-2.3pp")
func formatPercentagePointChange(ratio1, ratio2 float64) string {
delta := (ratio2 - ratio1) * 100
if delta >= 0 {
return fmt.Sprintf("+%.1fpp", delta)
}
return fmt.Sprintf("%.1fpp", delta)
}

// formatCountChange formats the absolute change in a count value (e.g. "+3", "-1")
func formatCountChange(count1, count2 int) string {
delta := count2 - count1
if delta >= 0 {
return fmt.Sprintf("+%d", delta)
}
return strconv.Itoa(delta)
}
195 changes: 4 additions & 191 deletions pkg/cli/compile_batch_operations.go
Original file line number Diff line number Diff line change
@@ -1,194 +1,7 @@
// This file provides batch operations for workflow compilation.
// This file previously contained batch operations for workflow compilation.
//
// This file contains functions that perform batch operations on compiled workflows,
// such as running linters, security scanners, and cleaning up orphaned files.
//
// # Organization Rationale
//
// These batch operation functions are grouped here because they:
// - Operate on multiple files at once
// - Run external tools (actionlint, zizmor, poutine)
// - Have a clear domain focus (batch operations)
// - Keep the main orchestrator focused on coordination
//
// # Key Functions
//
// Batch Linting:
// - RunActionlintOnFiles() - Run actionlint on multiple lock files
//
// File Cleanup:
// - purgeOrphanedLockFiles() - Remove orphaned .lock.yml files
// - purgeInvalidFiles() - Remove .invalid.yml files
//
// These functions abstract batch operations, allowing the main compile
// orchestrator to focus on coordination while these handle batch processing.
// Functions have been reorganized into more focused files:
// - compile_external_tools.go: external tool runners (actionlint, zizmor, poutine, runner-guard)
// - compile_post_processing.go: file cleanup/purge operations

package cli

import (
"fmt"
"os"
"path/filepath"
"strings"

"github.com/github/gh-aw/pkg/console"
"github.com/github/gh-aw/pkg/logger"
)

var compileBatchOperationsLog = logger.New("cli:compile_batch_operations")

// RunActionlintOnFiles runs actionlint on multiple lock files in a single batch.
// This is more efficient than running actionlint once per file.
func RunActionlintOnFiles(lockFiles []string, verbose bool, strict bool) error {
return runBatchLockFileTool("actionlint", lockFiles, verbose, strict, runActionlintOnFiles)
}

// RunZizmorOnFiles runs zizmor on multiple lock files in a single batch.
// This is more efficient than running zizmor once per file.
func RunZizmorOnFiles(lockFiles []string, verbose bool, strict bool) error {
return runBatchLockFileTool("zizmor", lockFiles, verbose, strict, runZizmorOnFiles)
}

// RunPoutineOnDirectory runs poutine security scanner once on a directory.
// Poutine scans all workflows in a directory, so it only needs to run once.
func RunPoutineOnDirectory(workflowDir string, verbose bool, strict bool) error {
return runPoutineOnDirectory(workflowDir, verbose, strict)
}

// RunRunnerGuardOnDirectory runs runner-guard taint analysis scanner once on a directory.
// Runner-guard scans all workflows in a directory, so it only needs to run once.
func RunRunnerGuardOnDirectory(workflowDir string, verbose bool, strict bool) error {
return runRunnerGuardOnDirectory(workflowDir, verbose, strict)
}

// runBatchLockFileTool runs a batch tool on lock files with uniform error handling
func runBatchLockFileTool(toolName string, lockFiles []string, verbose bool, strict bool, runner func([]string, bool, bool) error) error {
if len(lockFiles) == 0 {
compileBatchOperationsLog.Printf("No lock files to process with %s", toolName)
return nil
}

compileBatchOperationsLog.Printf("Running batch %s on %d lock files", toolName, len(lockFiles))

if err := runner(lockFiles, verbose, strict); err != nil {
if strict {
return fmt.Errorf("%s failed: %w", toolName, err)
}
// In non-strict mode, errors are warnings
if verbose {
fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("%s warnings: %v", toolName, err)))
}
}

return nil
}

// runBatchDirectoryTool runs a directory-based batch tool with uniform error handling
func runBatchDirectoryTool(toolName string, workflowDir string, verbose bool, strict bool, runner func(string, bool, bool) error) error {
compileBatchOperationsLog.Printf("Running batch %s on directory: %s", toolName, workflowDir)

if err := runner(workflowDir, verbose, strict); err != nil {
if strict {
return fmt.Errorf("%s failed: %w", toolName, err)
}
// In non-strict mode, errors are warnings
if verbose {
fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("%s warnings: %v", toolName, err)))
}
}

return nil
}

// purgeOrphanedLockFiles removes orphaned .lock.yml files
// These are lock files that exist but don't have a corresponding .md file
func purgeOrphanedLockFiles(workflowsDir string, expectedLockFiles []string, verbose bool) error {
compileBatchOperationsLog.Printf("Purging orphaned lock files in %s", workflowsDir)

// Find all existing .lock.yml files
existingLockFiles, err := filepath.Glob(filepath.Join(workflowsDir, "*.lock.yml"))
if err != nil {
return fmt.Errorf("failed to find existing lock files: %w", err)
}

if len(existingLockFiles) == 0 {
compileBatchOperationsLog.Print("No lock files found")
return nil
}

if verbose {
fmt.Fprintln(os.Stderr, console.FormatInfoMessage(fmt.Sprintf("Found %d existing .lock.yml files", len(existingLockFiles))))
}

// Build a set of expected lock files
expectedLockFileSet := make(map[string]bool)
for _, expected := range expectedLockFiles {
expectedLockFileSet[expected] = true
}

// Find lock files that should be deleted (exist but aren't expected)
var orphanedFiles []string
for _, existing := range existingLockFiles {
// Skip .campaign.lock.yml files - they're handled by purgeOrphanedCampaignOrchestratorLockFiles
if strings.HasSuffix(existing, ".campaign.lock.yml") {
continue
}
if !expectedLockFileSet[existing] {
orphanedFiles = append(orphanedFiles, existing)
}
}

// Delete orphaned lock files
if len(orphanedFiles) > 0 {
for _, orphanedFile := range orphanedFiles {
if err := os.Remove(orphanedFile); err != nil {
fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("Failed to remove orphaned lock file %s: %v", filepath.Base(orphanedFile), err)))
} else {
fmt.Fprintln(os.Stderr, console.FormatSuccessMessage("Removed orphaned lock file: "+filepath.Base(orphanedFile)))
}
}
if verbose {
fmt.Fprintln(os.Stderr, console.FormatSuccessMessage(fmt.Sprintf("Purged %d orphaned .lock.yml files", len(orphanedFiles))))
}
}

compileBatchOperationsLog.Printf("Purged %d orphaned lock files", len(orphanedFiles))
return nil
}

// purgeInvalidFiles removes all .invalid.yml files
// These are temporary debugging artifacts that should not persist
func purgeInvalidFiles(workflowsDir string, verbose bool) error {
compileBatchOperationsLog.Printf("Purging invalid files in %s", workflowsDir)

// Find all existing .invalid.yml files
existingInvalidFiles, err := filepath.Glob(filepath.Join(workflowsDir, "*.invalid.yml"))
if err != nil {
return fmt.Errorf("failed to find existing invalid files: %w", err)
}

if len(existingInvalidFiles) == 0 {
compileBatchOperationsLog.Print("No invalid files found")
return nil
}

if verbose {
fmt.Fprintln(os.Stderr, console.FormatInfoMessage(fmt.Sprintf("Found %d existing .invalid.yml files", len(existingInvalidFiles))))
}

// Delete all .invalid.yml files
for _, invalidFile := range existingInvalidFiles {
if err := os.Remove(invalidFile); err != nil {
fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("Failed to remove invalid file %s: %v", filepath.Base(invalidFile), err)))
} else {
fmt.Fprintln(os.Stderr, console.FormatSuccessMessage("Removed invalid file: "+filepath.Base(invalidFile)))
}
}

if verbose {
fmt.Fprintln(os.Stderr, console.FormatSuccessMessage(fmt.Sprintf("Purged %d .invalid.yml files", len(existingInvalidFiles))))
}

compileBatchOperationsLog.Printf("Purged %d invalid files", len(existingInvalidFiles))
return nil
}
Loading