Skip to content

[refactor] Semantic function clustering: dedupe linter helpers, drop stub files, rename outliers #33415

@github-actions

Description

@github-actions

Overview

Semantic clustering analysis of 818 non-test .go files under pkg/ (≈4,193 top-level functions/methods) surfaced four actionable categories: (1) near-identical duplicated helpers across two linter packages, (2) stub files that are now pure comment-only legacy bridges, (3) misnamed / outlier files that don't match their cluster, and (4) trivial wrapper duplicates.

The codebase is generally well-organized — most clusters (create_*, update_*, codemod_*, audit_*, compile_*, mcp_*, safe_outputs_*, expression_*, frontmatter_*, engine_*) follow a clean one-feature-per-file pattern. The findings below are concentrated, high-confidence wins that don't require structural change.

Summary

Category Files affected Estimated effort
Duplicate linter helpers 2 1–2 h
Comment-only stub files 6 30 m
Misnamed outlier validation file 1 15 m
Stale package-doc references 2 15 m
Trivial wrapper duplicates (isValidFullSHA) 2 30 m
Cross-package name collision (resolveContainerDigest) 2 30 m
Single-function util.go 1 10 m

Priority 1 — Duplicate functions in pkg/linters/

pkg/linters/errstringmatch/errstringmatch.go and pkg/linters/errormessage/errormessage.go both define near-identical helpers:

  • hasNoLintDirective(position token.Position, noLintLinesByFile map[string]map[int]struct{}) boolbyte-for-byte identical bodies at pkg/linters/errstringmatch/errstringmatch.go:146 and pkg/linters/errormessage/errormessage.go:267.
  • buildNoLintLineIndex(pass *analysis.Pass) map[string]map[int]struct{} — identical except for the per-linter nolint: prefix string at pkg/linters/errstringmatch/errstringmatch.go:161 ("nolint:errstringmatch") and pkg/linters/errormessage/errormessage.go:282 ("nolint:errormessage").
  • implementsError(...) — same conceptual check (does t implement error?), slightly different signatures: pkg/linters/errstringmatch/errstringmatch.go:110 takes (pass *analysis.Pass, t types.Type); pkg/linters/errormessage/errormessage.go:248 takes only (t types.Type).

Recommendation: Extract a shared helper, e.g. pkg/linters/internal/nolint/nolint.go, exposing:

func HasDirective(pos token.Position, idx map[string]map[int]struct{}) bool
func BuildLineIndex(pass *analysis.Pass, linterName string) map[string]map[int]struct{}
func ImplementsError(t types.Type) bool

The linterName parameter replaces the hard-coded "nolint:errstringmatch" / "nolint:errormessage" literal so the comparison becomes strings.HasPrefix(text, "nolint:"+linterName) || strings.HasPrefix(text, "nolint:all").

Impact: removes ~40 lines of duplication, single source of truth for the nolint directive contract, and makes adding a third linter (already implied by pkg/linters/ctxbackground, excessivefuncparams, largefunc, osexitinlibrary, rawloginlib, ssljson) much cheaper.

Priority 1 — Delete comment-only stub files

These files contain zero Go declarations beyond package — only a stale leading comment explaining that functions "have moved". The comments are not load-bearing (no //go:build, no logger, no const, no import side-effects):

File Lines Content
pkg/cli/list_command.go 1 package cli only (no comment, no code)
pkg/cli/verify_command.go 2 One-line comment: "superseded by validate_command.go"
pkg/cli/compile_batch_operations.go 7 Comment pointing to compile_external_tools.go / compile_post_processing.go
pkg/cli/gateway_logs.go 9 Comment listing the 5 files this was split into
pkg/cli/run_command.go 12 Comment listing where IsRunnable / RunWorkflowOnGitHub live now
pkg/workflow/scripts.go 10 Comment pointing readers to script_registry.go

Recommendation: Delete all six. Grep verifies none are referenced by name and they hold no symbols; the per-conversation reader can use git history or grep to find functions if needed.

Priority 2 — Misnamed outlier: update_check_validation.go

pkg/workflow/update_check_validation.go defines (c *Compiler) validateUpdateCheck (called from pkg/workflow/compiler_orchestrator_engine.go:107). Despite the update_* filename prefix, it has nothing to do with the update_*.go cluster (update_issue.go, update_discussion.go, update_pull_request.go, update_project.go, update_release.go, update_entity_helpers.go) which is the safe-output "update entity" job builder family. It's actually a strict-mode validator for the check-for-updates: false frontmatter flag.

Recommendation: Rename to pkg/workflow/strict_mode_update_check_validation.go (consistent with strict_mode_permissions_validation.go, strict_mode_network_validation.go, strict_mode_env_validation.go, strict_mode_steps_validation.go, strict_mode_sandbox_validation.go) or fold the 30-line function into strict_mode_validation.go.

Priority 2 — Stale validation package-doc references

The doc-only file pkg/workflow/validation.go enumerates sibling files but the list has drifted:

  • References nonexistent files: bundler_safety_validation.go, bundler_script_validation.go, bundler_runtime_validation.go, gateway_validation.go.
  • Wrong name: expression_safety.go (actual file is expression_safety_validation.go).

Similarly, pkg/workflow/strict_mode_validation.go lists only permissions, network, env variants but the actual cluster also includes strict_mode_steps_validation.go and strict_mode_sandbox_validation.go.

Recommendation: Update the file lists to reflect reality (or drop the lists; the directory listing is the source of truth).

Priority 3 — Trivial wrapper duplicates: isValidFullSHA

Two packages each define an unexported 2-line wrapper around gitutil.IsValidFullSHA:

// pkg/workflow/features_validation.go:100
func isValidFullSHA(s string) bool { return gitutil.IsValidFullSHA(s) }

// pkg/actionpins/actionpins.go:258
func isValidFullSHA(s string) bool { return gitutil.IsValidFullSHA(s) }

Recommendation: Delete both wrappers, inline gitutil.IsValidFullSHA(...) at the call sites. Two unexported re-exports add no abstraction value.

Priority 3 — Name collision: resolveContainerDigest

Same name, different responsibilities in two packages — easy to confuse during search/IDE navigation:

  • pkg/cli/update_container_pins.go:231resolveContainerDigest(ctx context.Context, image string, verbose bool) (string, error) — actively queries Docker buildx / crane / docker pull to compute a digest.
  • pkg/workflow/awf_helpers.go:581resolveContainerDigest(image string, workflowData *WorkflowData) string — pure pin-cache lookup, returns "" on miss.

Recommendation: Rename the CLI one to fetchContainerDigest (or computeContainerDigest) and the workflow one to lookupContainerDigest to reflect what each actually does.

Priority 3 — Single-function pkg/cli/util.go

pkg/cli/util.go is 4 lines and exposes a single boolPtr(b bool) *bool helper used only in pkg/cli/mcp_tools_readonly.go, mcp_tools_management.go, and mcp_tools_privileged.go. A standalone util.go invites future drift (anything-goes dumping ground).

Recommendation: Move boolPtr into pkg/cli/mcp_helpers.go (already the natural home for shared MCP helpers) and delete util.go.

Method & metadata
  • Scope: all .go files in pkg/ excluding *_test.go (818 files; 381 in pkg/workflow, 303 in pkg/cli, 41 in pkg/parser, rest in 25+ smaller packages).
  • Function inventory: ~4,193 top-level func declarations grouped by base name.
  • Duplicate detection: name collisions across packages were enumerated with a single grep+sort+uniq pass; each collision was then read in source to distinguish (a) legitimate //go:build wasm/non-wasm pairs (e.g. RunGH, RenderTable, IsStdoutTerminal, setupGHCommand, findGitRoot, RunGitCombined, validateDockerImage, isErrNpmNotAvailable — all correctly tagged), (b) thin re-export wrappers (e.g. SanitizeName in pkg/workflow/strings.go simply delegates to pkg/stringutil/sanitize.go), and (c) the genuine duplicates reported above.
  • Cluster outlier detection: filename prefix grouping in each package, then inspection of files whose primary function semantics didn't match the prefix (update_check_validation.go was the only clear outlier).
  • What was deliberately NOT flagged: the compiler_safe_outputs_* vs safe_outputs_* split — the boundary is unclear from filenames alone but the cluster is large (~33 files), so any reorganization would be high-risk and out of scope for this report.
  • Out of scope: test files, generated code, the pkg/parser/schemas/ and pkg/workflow/{assets,prompts,schemas,sh,js} data directories.

Implementation checklist

  • Extract pkg/linters/internal/nolint/nolint.go with HasDirective, BuildLineIndex, ImplementsError; update both linters to call it
  • Delete the 6 comment-only stub files
  • Rename update_check_validation.gostrict_mode_update_check_validation.go (or inline into strict_mode_validation.go)
  • Refresh file-list comments in validation.go and strict_mode_validation.go
  • Inline gitutil.IsValidFullSHA at the two wrapper sites; delete the wrappers
  • Rename one of the two resolveContainerDigest functions to reflect its actual role
  • Move boolPtr into pkg/cli/mcp_helpers.go; delete pkg/cli/util.go
  • Run go test ./... and go vet ./... after each step

Generated by 🔧 Semantic Function Refactoring · ● 29.7M ·

  • expires on May 22, 2026, 12:14 AM UTC

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions