[refactor] Semantic function clustering: 3 cross-file consolidation opportunities + 10 large files to split #32715

### Overview

Semantic clustering of functions across `pkg/workflow/` (377 non-test files) and `pkg/cli/` (302 non-test files) surfaces three high-confidence consolidation patterns plus ten files that have grown past 940 lines and should be split along feature seams. None of these overlap with existing open `sergo` refactor issues #31633 (parsePermissionsConfig switch) or #31298 (extractSafeOutputsConfig table). Issue #31300 tracks long *functions*; this issue tracks long *files* and cross-file *patterns*.

All findings below have been verified by reading the cited functions, not by name-matching alone.

### Summary

- **3 consolidation opportunities** across files (markdown/pretty renderers, workflow-asset extractors, codemod factory).
- **10 large files** (≥944 lines each, 5 in each package) ripe for feature-axis splitting.
- **Skipped (false positives or already tracked)**: the `*_wasm.go` companion files use idiomatic `//go:build` tags — not duplication; the 61 `parseXConfig` methods are already partially consolidated via `parseConfigScaffold` (`pkg/workflow/config_helpers.go:267`) and the larger refactor is tracked in #31298.

### Finding 1: Parallel `Markdown` / `Pretty` renderer pairs (10+ pairs)

`pkg/cli/audit_diff_render.go` defines two parallel families of rendering functions: one writes GitHub-flavored markdown to stdout, the other writes styled console output to stderr. Each pair takes the same input and emits the same logical sections, differing only in the writer and formatter functions. The table below lists **8 confirmed pairs in this file**, plus 2 more occurrences of the same pattern in `pkg/cli/audit_cross_run_render.go`, for 10 in total.

Confirmed pairs (in `pkg/cli/audit_diff_render.go` unless the row says otherwise):

| Markdown function | Pretty function |
| --- | --- |
| `renderSingleAuditDiffMarkdown` (`audit_diff_render.go:60`) | `renderSingleAuditDiffPretty` (`audit_diff_render.go:75`) |
| `renderFirewallDiffMarkdownSection` (`audit_diff_render.go:140`) | `renderFirewallDiffPrettySection` (`audit_diff_render.go:314`) |
| `renderMCPToolsDiffMarkdownSection` (`audit_diff_render.go:197`) | `renderMCPToolsDiffPrettySection` (`audit_diff_render.go:403`) |
| `renderRunMetricsDiffMarkdownSection` (`audit_diff_render.go:246`) | `renderRunMetricsDiffPrettySection` (`audit_diff_render.go:475`) |
| `renderTokenUsageDiffMarkdownSection` (`audit_diff_render.go:283`) | `renderTokenUsageDiffPrettySection` (`audit_diff_render.go:540`) |
| `renderGitHubRateLimitDiffMarkdownSection` (`audit_diff_render.go:612`) | `renderGitHubRateLimitDiffPrettySection` (`audit_diff_render.go:634`) |
| `renderToolCallsDiffMarkdownSection` (`audit_diff_render.go:758`) | `renderToolCallsDiffPrettySection` (`audit_diff_render.go:683`) |
| `renderBashCommandsDiffMarkdownSection` (`audit_diff_render.go:785`) | `renderBashCommandsDiffPrettySection` (`audit_diff_render.go:720`) |
| Plus 2 more in `audit_cross_run_render.go:27,208` | (same pair) |

The pretty variants generally do extra summary work (e.g. `renderSingleAuditDiffPretty` aggregates anomaly counts before delegating), so a naïve merge would lose features. **Realistic refactor**: introduce a small `DiffRenderer` interface with `Heading`, `Subheading`, `KeyValueRow`, `Bullet`, `Warn` methods, implemented twice (markdown and pretty). Each section function becomes one implementation that takes the renderer. Estimated reduction: ~857 lines of rendering code → ~400 lines + interface, no behavior change.
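A minimal sketch of the renderer interface described above. The five method names come from this issue; every signature, the `metricsDiff` struct, the `renderMetricsSection` function, and the concrete output formats are assumptions made for illustration and do not mirror the existing code in `audit_diff_render.go`.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// DiffRenderer abstracts the output primitives that each Markdown/Pretty
// pair currently calls directly. Signatures are assumed for this sketch.
type DiffRenderer interface {
	Heading(text string)
	Subheading(text string)
	KeyValueRow(key, value string)
	Bullet(text string)
	Warn(text string)
}

// markdownRenderer emits GitHub-flavored markdown.
type markdownRenderer struct{ w io.Writer }

func (r markdownRenderer) Heading(t string)        { fmt.Fprintf(r.w, "### %s\n\n", t) }
func (r markdownRenderer) Subheading(t string)     { fmt.Fprintf(r.w, "#### %s\n\n", t) }
func (r markdownRenderer) KeyValueRow(k, v string) { fmt.Fprintf(r.w, "- **%s**: %s\n", k, v) }
func (r markdownRenderer) Bullet(t string)         { fmt.Fprintf(r.w, "- %s\n", t) }
func (r markdownRenderer) Warn(t string)           { fmt.Fprintf(r.w, "> **Warning:** %s\n", t) }

// prettyRenderer emits plain console text; real code would add styling.
type prettyRenderer struct{ w io.Writer }

func (r prettyRenderer) Heading(t string)        { fmt.Fprintf(r.w, "== %s ==\n", t) }
func (r prettyRenderer) Subheading(t string)     { fmt.Fprintf(r.w, "-- %s --\n", t) }
func (r prettyRenderer) KeyValueRow(k, v string) { fmt.Fprintf(r.w, "  %s: %s\n", k, v) }
func (r prettyRenderer) Bullet(t string)         { fmt.Fprintf(r.w, "  * %s\n", t) }
func (r prettyRenderer) Warn(t string)           { fmt.Fprintf(r.w, "! %s\n", t) }

// metricsDiff is a hypothetical stand-in for one diff section's input.
type metricsDiff struct {
	Name          string
	Before, After int
}

// renderMetricsSection is written once and works with either renderer.
func renderMetricsSection(r DiffRenderer, d metricsDiff) {
	r.Heading("Run metrics")
	r.KeyValueRow(d.Name, fmt.Sprintf("%d -> %d", d.Before, d.After))
	if d.After > d.Before {
		r.Warn(d.Name + " increased")
	}
}

// renderMetricsToString picks a renderer and captures the output.
func renderMetricsToString(markdown bool, d metricsDiff) string {
	var sb strings.Builder
	var r DiffRenderer = prettyRenderer{w: &sb}
	if markdown {
		r = markdownRenderer{w: &sb}
	}
	renderMetricsSection(r, d)
	return sb.String()
}
```

Mode-specific extras, such as the anomaly-count aggregation in `renderSingleAuditDiffPretty`, could stay in a thin per-mode wrapper that runs before delegating to the shared section functions, so the merge would not need to drop them.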

### Finding 2: `extract*From{ParsedWorkflow,YAMLFile,MDFile}` triplets in `call_workflow_*.go`

Three separate files in `pkg/workflow/` each implement the same three-variant shape: "extract X from a YAML file", "extract X from a markdown file", and "extract X from an already-parsed workflow map":

| Concern | Parsed-map fn | YAML-file fn | MD-file fn |
| --- | --- | --- | --- |
| Permissions | `extractJobPermissionsFromParsedWorkflow` (`call_workflow_permissions.go:15`) | `extractPermissionsFromYAMLFile` (`call_workflow_permissions.go:85`) | `extractPermissionsFromMDFile` (`call_workflow_permissions.go:99`) |
| Secrets | `extractWorkflowCallSecretsFromParsed` (`call_workflow_secrets.go:57`) | `extractSecretsFromWorkflowFile` (`call_workflow_secrets.go:44`) | `extractCallWorkflowSecrets` (`call_workflow_secrets.go:22`) — dispatches by extension |
| Inputs | `extractWorkflowCallInputsFromParsed` (`call_workflow_validation.go:233`) | `extractWorkflowCallInputs` (`call_workflow_validation.go:184`) | `extractMDWorkflowCallInputs` (`call_workflow_validation.go:195`) |

All three families share the *load + extension dispatch + parse + drill-down* sequence; only the final drill-down is concern-specific. **Realistic refactor**: introduce a single shared `loadParsedWorkflow(path string) (map[string]any, error)` in a new `pkg/workflow/workflow_loader.go` that handles `.md` vs `.yml` dispatch once, and let each concern keep only its `FromParsedWorkflow` function. Eliminates 6 small file-handling functions (~120 lines) and removes the YAML/MD branching that is currently re-implemented per concern.
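The shared loader could look roughly like the sketch below. It departs from the proposed `loadParsedWorkflow(path string)` signature by taking the unmarshal function as a parameter, purely so the example stays standard-library only; real code would presumably call the YAML library the package already uses. The assumption that a `.md` workflow carries its YAML in a `---` fenced frontmatter block, the `parseWorkflowBytes` and `frontmatterOf` helpers, and the toy `flatKeyValues` parser are all hypothetical.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// unmarshalFunc matches the shape of typical Unmarshal functions.
type unmarshalFunc func(data []byte, v any) error

// frontmatterOf returns the bytes between a leading "---" line and the
// next "---" line. Assumes "\n" line endings.
func frontmatterOf(data []byte) ([]byte, error) {
	fence := []byte("---\n")
	if !bytes.HasPrefix(data, fence) {
		return nil, errors.New("no frontmatter fence")
	}
	fm, _, ok := bytes.Cut(data[len(fence):], []byte("\n---"))
	if !ok {
		return nil, errors.New("unterminated frontmatter")
	}
	return fm, nil
}

// parseWorkflowBytes performs the extension dispatch and the parse once,
// so each concern only keeps its drill-down over the returned map.
func parseWorkflowBytes(ext string, data []byte, unmarshal unmarshalFunc) (map[string]any, error) {
	switch strings.ToLower(ext) {
	case ".md":
		fm, err := frontmatterOf(data)
		if err != nil {
			return nil, err
		}
		data = fm
	case ".yml", ".yaml":
		// The whole file is the YAML document.
	default:
		return nil, fmt.Errorf("unsupported workflow extension %q", ext)
	}
	var parsed map[string]any
	if err := unmarshal(data, &parsed); err != nil {
		return nil, fmt.Errorf("parse workflow: %w", err)
	}
	return parsed, nil
}

// loadParsedWorkflow reads the file and delegates to parseWorkflowBytes.
func loadParsedWorkflow(path string, unmarshal unmarshalFunc) (map[string]any, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("read %s: %w", path, err)
	}
	return parseWorkflowBytes(filepath.Ext(path), data, unmarshal)
}

// flatKeyValues is a toy stand-in for a YAML parser, for demonstration
// only: it reads flat "key: value" lines into the target map.
func flatKeyValues(data []byte, v any) error {
	out, ok := v.(*map[string]any)
	if !ok {
		return errors.New("flatKeyValues: want *map[string]any")
	}
	*out = map[string]any{}
	for _, line := range strings.Split(string(data), "\n") {
		if k, val, found := strings.Cut(line, ":"); found {
			(*out)[strings.TrimSpace(k)] = strings.TrimSpace(val)
		}
	}
	return nil
}
```

With a loader of this shape, a caller such as the permissions extractor would reduce to "load, then call the parsed-map function", and the per-concern YAML-file and MD-file wrappers would no longer be needed.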

### Finding 3: `pkg/cli/codemod_*.go` — 90+ `get<Name>Codemod()` factories, no central registry

`pkg/cli/` contains ~50 `codemod_*.go` files, each defining its own `get<Name>Codemod()` constructor. The codemod registry / dispatch lives in `codemod_factory.go`, but the registry itself is not declarative — adding a new codemod means (a) creating the file, (b) writing the constructor, (c) wiring it into `codemod_factory.go` by hand. There is also no shared validation helper for the small repeated checks (`hasTopLevelSection`, `hasAgentJobSection`, `hasToolsMountAsCLIs`) that recur across multiple codemod files.

**Realistic refactor**: convert the dispatch in `codemod_factory.go` into a slice of `Codemod` structs registered via `init()` in each `codemod_*.go` file (or a single declarative table). Extract the repeated frontmatter inspection helpers into a `codemod_helpers.go` (or fold them into the existing `codemod_factory.go`). Adding a codemod then touches only its own file instead of also requiring a hand edit to `codemod_factory.go`.
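The `init()`-based variant could be sketched as follows. The `Codemod` field set, the `registerCodemod` and `allCodemods` helpers, the `hasTopLevelSection` signature, and both example codemods are hypothetical; only the helper's name and the general idea come from this issue.

```go
package main

import (
	"fmt"
	"sort"
)

// Codemod is an assumed shape; the real struct in pkg/cli may differ.
type Codemod struct {
	ID          string
	Description string
	// Apply mutates the frontmatter and reports whether it changed anything.
	Apply func(frontmatter map[string]any) bool
}

// codemodRegistry is filled by init() functions, one per codemod file.
var codemodRegistry = map[string]Codemod{}

// registerCodemod adds a codemod and fails fast on duplicate IDs.
func registerCodemod(c Codemod) {
	if _, dup := codemodRegistry[c.ID]; dup {
		panic(fmt.Sprintf("codemod %q registered twice", c.ID))
	}
	codemodRegistry[c.ID] = c
}

// allCodemods returns the registered codemods sorted by ID, so the result
// does not depend on the order in which init() functions ran.
func allCodemods() []Codemod {
	out := make([]Codemod, 0, len(codemodRegistry))
	for _, c := range codemodRegistry {
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

// hasTopLevelSection is one of the repeated checks named in this issue;
// the signature here is an assumption.
func hasTopLevelSection(frontmatter map[string]any, key string) bool {
	_, ok := frontmatter[key]
	return ok
}

// This init would live in a hypothetical codemod_example_rename.go.
func init() {
	registerCodemod(Codemod{
		ID:          "example-rename",
		Description: "rename old-key to new-key",
		Apply: func(fm map[string]any) bool {
			if !hasTopLevelSection(fm, "old-key") {
				return false
			}
			fm["new-key"] = fm["old-key"]
			delete(fm, "old-key")
			return true
		},
	})
}

// This init would live in a hypothetical codemod_example_drop.go.
func init() {
	registerCodemod(Codemod{
		ID:          "example-drop",
		Description: "remove obsolete-key",
		Apply: func(fm map[string]any) bool {
			if !hasTopLevelSection(fm, "obsolete-key") {
				return false
			}
			delete(fm, "obsolete-key")
			return true
		},
	})
}
```

One trade-off to weigh: `init()` registration hides the full list of codemods across many files, whereas a single declarative table keeps it visible in one place at the cost of one extra edit per codemod. If codemods must run in a specific order, an explicit priority field or the table form is likely the safer choice.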

<details>
<summary>Finding 4: Top 5 largest files in <code>pkg/workflow/</code> (≥1007 lines)</summary>

| File | Lines | Primary contents | Suggested split |
| --- | --- | --- | --- |
| `pkg/workflow/frontmatter_extraction_yaml.go` | 1153 | YAML field extraction for permissions, conditions, commands, labels | Split per-concern: `frontmatter_extraction_permissions.go`, `_conditions.go`, `_commands.go` |
| `pkg/workflow/compiler_yaml_main_job.go` | 1089 | Main job step generation: runtime setup, engine install, post-collection | Split into `_main_job_setup.go`, `_main_job_engine.go`, `_main_job_post.go` |
| `pkg/workflow/domains.go` | 1041 | Domain allowlisting: ecosystem lookup, provider defaults, network merge | Split into `domain_defaults.go`, `domain_ecosystem.go`, `domain_allowlist.go` |
| `pkg/workflow/compiler_jobs.go` | 1028 | Job orchestration: safe-outputs job, pre-activation, maintenance | Split into `compiler_job_safe_outputs.go`, `compiler_job_preactivation.go`, `compiler_job_maintenance.go` |
| `pkg/workflow/cache.go` | 1007 | Cache config parsing & validation | Split into `cache_parsing.go` and `cache_validation.go` |

</details>

<details>
<summary>Finding 5: Top 5 largest files in <code>pkg/cli/</code> (≥944 lines)</summary>

| File | Lines | Primary contents | Suggested split |
| --- | --- | --- | --- |
| `pkg/cli/forecast.go` | 1142 | Forecast command + Monte Carlo simulation + cost/token calculation | Extract `forecast_montecarlo.go`, `forecast_costs.go` |
| `pkg/cli/audit.go` | 1120 | Audit command, multi-run orchestration, result caching | Extract `audit_multirun.go`, `audit_cache.go` |
| `pkg/cli/logs_orchestrator.go` | 1046 | Orchestrator step parsing + task/job correlation | Split parsing from correlation |
| `pkg/cli/audit_diff.go` | 977 | Diff computation for firewall, metrics, tool calls | Already coherent — consider only if Finding 1 lands (drives shrinkage in companion render file) |
| `pkg/cli/logs_download.go` | 944 | Artifact download, caching, progress tracking, parallel ops | Extract `logs_download_parallel.go`, `logs_download_cache.go` |

</details>

### Recommendations

**Priority 1 (high signal, low risk)**

1. **Findings 1 & 2** — both have clear refactor shapes (renderer interface; shared `loadParsedWorkflow`). Each is a self-contained ~3–5 hour change with strong test coverage already present (`audit_diff_*_test.go`, `call_workflow_*_test.go`).

**Priority 2 (medium signal, medium risk)**

2. **Finding 3 (codemod registry)** — design the declarative table first; the rewrite is mostly mechanical once the shape is decided.
3. **Findings 4 & 5 (file splits)** — start with `frontmatter_extraction_yaml.go` and `forecast.go` since their internal seams are most obvious; the others are judgment calls and may be left as-is if reviewers prefer the current layout.

**Out of scope (intentionally excluded)**

- The `pkg/workflow/*_wasm.go` files mirror native files but use `//go:build js || wasm` vs `//go:build !js && !wasm`. This is idiomatic conditional compilation, not duplication — only one variant ever links into a binary.
- The 61 `parseXConfig` methods already share `parseConfigScaffold` (`pkg/workflow/config_helpers.go:267`); the larger refactor of the 637-line `extractSafeOutputsConfig` dispatch site is tracked in #31298.
- The 40-case scope→field switch in `parsePermissionsConfig` is tracked in #31633.

### Analysis Metadata

- **Total Go non-test files analyzed**: 800 (focus: `pkg/workflow/` 377, `pkg/cli/` 302)
- **High-confidence consolidation findings**: 3
- **Large files flagged for splitting**: 10
- **Detection method**: function-name pattern grep + targeted body reads for verification; pair-wise signature comparison for duplicate claims
- **Analysis date**: 2026-05-17

**References:**
- [§25976367710](https://github.com/github/gh-aw/actions/runs/25976367710)
- Related: #31298, #31300, #31633




> Generated by [🔧 Semantic Function Refactoring](https://github.com/github/gh-aw/actions/runs/25976367710) · ● 10.1M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fsemantic-function-refactor%22&type=issues)
> - [x] expires on May 19, 2026, 12:09 AM UTC
