
[refactor] Semantic function clustering: 3 cross-file consolidation opportunities + 10 large files to split #32715

@github-actions

Overview

Semantic clustering of functions across pkg/workflow/ (377 non-test files) and pkg/cli/ (302 non-test files) surfaces three high-confidence consolidation patterns plus ten files that have grown past 940 lines and should be split along feature seams. None of these overlap with existing open sergo refactor issues #31633 (parsePermissionsConfig switch) or #31298 (extractSafeOutputsConfig table). Issue #31300 tracks long functions; this issue tracks long files and cross-file patterns.

All findings below have been verified by reading the cited functions, not by name-matching alone.

Summary

  • 3 consolidation opportunities across files (markdown/pretty renderers, workflow-asset extractors, codemod factory).
  • 10 large files (≥944 lines each, 5 in each package) ripe for feature-axis splitting.
  • Skipped (false positives or already tracked): the *_wasm.go companion files use idiomatic //go:build tags, so they are not duplication; the 61 parseXConfig methods are already partially consolidated via parseConfigScaffold (pkg/workflow/config_helpers.go:267), and the larger refactor is tracked in #31298 ("Refactor: 637-line extractSafeOutputsConfig has 45 near-identical parse-and-assign blocks (table-driven candidate)").

Finding 1: Parallel Markdown / Pretty renderer pairs (10+ pairs)

pkg/cli/audit_diff_render.go defines two parallel families of rendering functions: one writes GitHub-flavored markdown to stdout, the other writes styled console output to stderr. Each pair takes the same input and emits the same logical sections, differing only in the output writer and the formatting helpers. Eight pairs are confirmed in this file (listed below), and the same pattern recurs in pkg/cli/audit_cross_run_render.go.

Confirmed pairs (all in pkg/cli/audit_diff_render.go):

| Markdown function | Pretty function |
|---|---|
| renderSingleAuditDiffMarkdown (audit_diff_render.go:60) | renderSingleAuditDiffPretty (audit_diff_render.go:75) |
| renderFirewallDiffMarkdownSection (audit_diff_render.go:140) | renderFirewallDiffPrettySection (audit_diff_render.go:314) |
| renderMCPToolsDiffMarkdownSection (audit_diff_render.go:197) | renderMCPToolsDiffPrettySection (audit_diff_render.go:403) |
| renderRunMetricsDiffMarkdownSection (audit_diff_render.go:246) | renderRunMetricsDiffPrettySection (audit_diff_render.go:475) |
| renderTokenUsageDiffMarkdownSection (audit_diff_render.go:283) | renderTokenUsageDiffPrettySection (audit_diff_render.go:540) |
| renderGitHubRateLimitDiffMarkdownSection (audit_diff_render.go:612) | renderGitHubRateLimitDiffPrettySection (audit_diff_render.go:634) |
| renderToolCallsDiffMarkdownSection (audit_diff_render.go:758) | renderToolCallsDiffPrettySection (audit_diff_render.go:683) |
| renderBashCommandsDiffMarkdownSection (audit_diff_render.go:785) | renderBashCommandsDiffPrettySection (audit_diff_render.go:720) |

Plus 2 more in audit_cross_run_render.go:27,208 (same pair).

The pretty variants generally do extra summary work (for example, renderSingleAuditDiffPretty aggregates anomaly counts before delegating), so a naïve merge would lose features. Realistic refactor: introduce a small DiffRenderer interface with Heading, Subheading, KeyValueRow, Bullet, and Warn methods, implemented twice (markdown and pretty). Each Markdown/Pretty pair then collapses into a single section function that takes the renderer as a parameter. Estimated reduction: ~857 lines of rendering code → ~400 lines plus the interface, with no behavior change.

Finding 2: extract*From{ParsedWorkflow,YAMLFile,MDFile} triplets in call_workflow_*.go

Three separate files in pkg/workflow/ each implement the same three-function shape: extract X from a YAML file, extract X from a markdown file, and extract X from an already-parsed workflow map:

| Concern | Parsed-map fn | YAML-file fn | MD-file fn |
|---|---|---|---|
| Permissions | extractJobPermissionsFromParsedWorkflow (call_workflow_permissions.go:15) | extractPermissionsFromYAMLFile (call_workflow_permissions.go:85) | extractPermissionsFromMDFile (call_workflow_permissions.go:99) |
| Secrets | extractWorkflowCallSecretsFromParsed (call_workflow_secrets.go:57) | extractSecretsFromWorkflowFile (call_workflow_secrets.go:44) | extractCallWorkflowSecrets (call_workflow_secrets.go:22), dispatches by extension |
| Inputs | extractWorkflowCallInputsFromParsed (call_workflow_validation.go:233) | extractWorkflowCallInputs (call_workflow_validation.go:184) | extractMDWorkflowCallInputs (call_workflow_validation.go:195) |

All three families share the load + extension dispatch + parse + drill-down sequence; only the final drill-down is concern-specific. Realistic refactor: introduce a single shared loadParsedWorkflow(path string) (map[string]any, error) in a new pkg/workflow/workflow_loader.go that handles .md vs .yml dispatch once, and let each concern keep only its FromParsedWorkflow function. Eliminates 6 small file-handling functions (~120 lines) and removes the YAML/MD branching that is currently re-implemented per concern.
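A sketch of the shared loader, assuming markdown workflows carry their YAML in a leading frontmatter block delimited by `---` lines. To keep it runnable with only the standard library, the YAML parser is injected as a function. loadParsedWorkflowBytes, yamlParser, extractFrontmatter, and toyParse are hypothetical names, and toyParse handles only flat `key: value` lines; the proposed loadParsedWorkflow(path) would read the file itself and pass the YAML library pkg/workflow already uses.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// yamlParser is injected so this sketch stays stdlib-only (assumption).
type yamlParser func([]byte) (map[string]any, error)

// loadParsedWorkflowBytes is a hypothetical, testable core of the proposed
// loadParsedWorkflow(path): the .md vs .yml dispatch happens exactly once here.
func loadParsedWorkflowBytes(path string, data []byte, parse yamlParser) (map[string]any, error) {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".yml", ".yaml":
		return parse(data)
	case ".md":
		fm, err := extractFrontmatter(data)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", path, err)
		}
		return parse(fm)
	default:
		return nil, fmt.Errorf("%s: unsupported workflow extension %q", path, filepath.Ext(path))
	}
}

// extractFrontmatter returns the text between the leading "---" delimiters.
func extractFrontmatter(data []byte) ([]byte, error) {
	lines := strings.Split(strings.ReplaceAll(string(data), "\r\n", "\n"), "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return nil, errors.New("missing frontmatter")
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			return []byte(strings.Join(lines[1:i], "\n")), nil
		}
	}
	return nil, errors.New("unterminated frontmatter")
}

// toyParse handles only flat "key: value" lines; a stand-in for real YAML.
func toyParse(b []byte) (map[string]any, error) {
	out := map[string]any{}
	for _, line := range strings.Split(string(b), "\n") {
		if k, v, ok := strings.Cut(line, ":"); ok {
			out[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	return out, nil
}

func main() {
	md := []byte("---\npermissions: read-all\n---\n# Body text\n")
	parsed, err := loadParsedWorkflowBytes("reusable.md", md, toyParse)
	if err != nil {
		panic(err)
	}
	// Each concern would now keep only its drill-down into the parsed map.
	fmt.Println(parsed["permissions"])
}
```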

Finding 3: pkg/cli/codemod_*.go — 90+ get<Name>Codemod() factories, no central registry

pkg/cli/ contains ~50 codemod_*.go files, each defining its own get<Name>Codemod() constructor. The codemod registry / dispatch lives in codemod_factory.go, but the registry itself is not declarative — adding a new codemod means (a) creating the file, (b) writing the constructor, (c) wiring it into codemod_factory.go by hand. There is also no shared validation helper for the small repeated checks (hasTopLevelSection, hasAgentJobSection, hasToolsMountAsCLIs) that recur across multiple codemod files.

Realistic refactor: convert the dispatch in codemod_factory.go into a slice of Codemod structs registered via init() in each codemod_*.go file (or a single declarative table). Extract the repeated frontmatter inspection helpers into a codemod_helpers.go (or fold them into the existing codemod_factory.go). This reduces the number of places to edit when adding a codemod from three to one.
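A sketch of init()-based registration, assuming a Codemod struct with ID, Description, and Apply fields (the real shape in pkg/cli may differ). registerCodemod, allCodemods, both example codemods, and this simplified hasTopLevelSection are hypothetical; in the proposed layout each init() would live in its own codemod_*.go file rather than in one file as shown here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Codemod is a hypothetical shape for a registered codemod.
type Codemod struct {
	ID          string
	Description string
	// Apply returns the rewritten frontmatter and whether anything changed.
	Apply func(frontmatter string) (string, bool)
}

var codemodRegistry []Codemod

// registerCodemod is called from init() in each codemod_*.go file.
func registerCodemod(c Codemod) {
	for _, existing := range codemodRegistry {
		if existing.ID == c.ID {
			panic(fmt.Sprintf("duplicate codemod ID %q", c.ID))
		}
	}
	codemodRegistry = append(codemodRegistry, c)
}

// allCodemods returns a copy of the registry sorted by ID.
func allCodemods() []Codemod {
	out := append([]Codemod(nil), codemodRegistry...)
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

// hasTopLevelSection is a simplified, hypothetical shared helper: it reports
// whether frontmatter has an unindented "name:" key.
func hasTopLevelSection(fm, name string) bool {
	for _, line := range strings.Split(fm, "\n") {
		if strings.HasPrefix(line, name+":") {
			return true
		}
	}
	return false
}

// Hypothetical example codemod (would live in its own codemod_*.go file).
func init() {
	registerCodemod(Codemod{
		ID:          "timeout-minutes",
		Description: "rename timeout_minutes to timeout-minutes",
		Apply: func(fm string) (string, bool) {
			out := strings.ReplaceAll(fm, "timeout_minutes:", "timeout-minutes:")
			return out, out != fm
		},
	})
}

// Hypothetical example codemod using the shared helper.
func init() {
	registerCodemod(Codemod{
		ID:          "add-default-permissions",
		Description: "add an explicit permissions block when none exists",
		Apply: func(fm string) (string, bool) {
			if hasTopLevelSection(fm, "permissions") {
				return fm, false
			}
			return "permissions: read-all\n" + fm, true
		},
	})
}

func main() {
	fm := "on: push\ntimeout_minutes: 10\n"
	for _, c := range allCodemods() {
		if out, changed := c.Apply(fm); changed {
			fmt.Printf("applied %s\n", c.ID)
			fm = out
		}
	}
	fmt.Print(fm)
}
```

Sorting by ID in allCodemods keeps application order deterministic regardless of which file's init() runs first, and the duplicate check turns a copy-paste ID collision into a startup failure instead of a silent override.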

Finding 4: Top 5 largest files in pkg/workflow/ (≥1007 lines)

| File | Lines | Primary contents | Suggested split |
|---|---:|---|---|
| pkg/workflow/frontmatter_extraction_yaml.go | 1153 | YAML field extraction for permissions, conditions, commands, labels | Split per concern: frontmatter_extraction_permissions.go, _conditions.go, _commands.go |
| pkg/workflow/compiler_yaml_main_job.go | 1089 | Main job step generation: runtime setup, engine install, post-collection | Split into _main_job_setup.go, _main_job_engine.go, _main_job_post.go |
| pkg/workflow/domains.go | 1041 | Domain allowlisting: ecosystem lookup, provider defaults, network merge | Split into domain_defaults.go, domain_ecosystem.go, domain_allowlist.go |
| pkg/workflow/compiler_jobs.go | 1028 | Job orchestration: safe-outputs job, pre-activation, maintenance | Split into compiler_job_safe_outputs.go, compiler_job_preactivation.go, compiler_job_maintenance.go |
| pkg/workflow/cache.go | 1007 | Cache config parsing & validation | Split into cache_parsing.go and cache_validation.go |

Finding 5: Top 5 largest files in pkg/cli/ (≥944 lines)

| File | Lines | Primary contents | Suggested split |
|---|---:|---|---|
| pkg/cli/forecast.go | 1142 | Forecast command + Monte Carlo simulation + cost/token calculation | Extract forecast_montecarlo.go, forecast_costs.go |
| pkg/cli/audit.go | 1120 | Audit command, multi-run orchestration, result caching | Extract audit_multirun.go, audit_cache.go |
| pkg/cli/logs_orchestrator.go | 1046 | Orchestrator step parsing + task/job correlation | Split parsing from correlation |
| pkg/cli/audit_diff.go | 977 | Diff computation for firewall, metrics, tool calls | Already coherent; consider only if Finding 1 lands (drives shrinkage in companion render file) |
| pkg/cli/logs_download.go | 944 | Artifact download, caching, progress tracking, parallel ops | Extract logs_download_parallel.go, logs_download_cache.go |

Recommendations

Priority 1 (high signal, low risk)

  1. Findings 1 & 2 — both have clear refactor shapes (renderer interface; shared loadParsedWorkflow). Each is a self-contained ~3–5 hour change with strong test coverage already present (audit_diff_*_test.go, call_workflow_*_test.go).

Priority 2 (medium signal, medium risk)

  1. Finding 3 (codemod registry) — design the declarative table first; the rewrite is mostly mechanical once the shape is decided.
  2. Findings 4 & 5 (file splits) — start with frontmatter_extraction_yaml.go and forecast.go since their internal seams are most obvious; the others are judgment calls and may be left as-is if reviewers prefer the current layout.

Out of scope (intentionally excluded)

Analysis Metadata

  • Total Go non-test files analyzed: 800 (focus: pkg/workflow/ 377, pkg/cli/ 302)
  • High-confidence consolidation findings: 3
  • Large files flagged for splitting: 10
  • Detection method: function-name pattern grep + targeted body reads for verification; pair-wise signature comparison for duplicate claims
  • Analysis date: 2026-05-17

Generated by 🔧 Semantic Function Refactoring · ● 10.1M ·

  • expires on May 19, 2026, 12:09 AM UTC
