[repository-quality] Repository Quality Improvement Report - Large File Decomposition #31693
Closed
This discussion has been marked as outdated by Repository Quality Improvement Agent. A newer discussion is available at Discussion #31945.
🎯 Repository Quality Improvement Report - Large File Decomposition
Analysis Date: 2026-05-12
Focus Area: Large File Decomposition
Strategy Type: Custom
Custom Area: Yes. The repository's own `AGENTS.md` documents a hard limit of 300 lines per file and a recommended 100–200 lines per validator, yet 211 source files exceed 300 lines and 97 exceed 500 lines, making this the most actionable gap.
Executive Summary
gh-aw has grown a class of source files that significantly exceeds its own documented complexity guidelines. Of 372 non-test Go files in `pkg/`, 211 (57%) are above the 300-line hard limit, and 97 (26%) exceed 500 lines. The most critical offender is a single 508-line method, `extractAllImportFields` in `pkg/parser/import_field_extractor.go`, that handles YAML parsing, validation, schema defaults, input substitution, and observability extraction all in one function. Similarly, `pkg/workflow/domains.go` bundles per-engine domain data as inline `var` blocks alongside the logic that assembles them, making adding or auditing a new engine error-prone.
These oversized files create concrete costs: slower code review, test coverage gaps (a 508-line function with a single test exercises very little branching), and higher cognitive load for contributors. Decomposition is low-risk here because the repository has an exceptionally strong test suite (2.1× test-to-source LOC ratio) and thorough `make agent-finish` validation, meaning refactored units can be validated confidently.
The four tasks below are independently actionable, sized from a 30-minute extraction to a 2-hour careful split. All targets were chosen for clear internal seams rather than arbitrary line counts.
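As a rough illustration of the kind of seam the summary describes for `extractAllImportFields`, the sketch below reduces the method to an orchestrator over the helper names this report proposes (`parseImportFrontmatter`, `mergeImportTools`, `accumulateImportMetadata`, `extractImportObservability`). Everything else is a hypothetical stand-in rather than the actual gh-aw code: the `importFrontmatter` type, the fields on `importAccumulator`, every signature, and the toy `key: a, b` input format used in place of real YAML.

```go
package main

import (
	"fmt"
	"strings"
)

// importFrontmatter is a hypothetical, minimal stand-in for one parsed import.
type importFrontmatter struct {
	Tools         []string
	Labels        []string
	Observability string
}

// importAccumulator is named in the report; these fields are assumptions.
type importAccumulator struct {
	tools         []string
	labels        []string
	observability []string
}

// parseImportFrontmatter parses a toy "key: a, b" format (not real YAML).
func parseImportFrontmatter(raw string) (importFrontmatter, error) {
	var fm importFrontmatter
	for _, line := range strings.Split(strings.TrimSpace(raw), "\n") {
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			return fm, fmt.Errorf("malformed line %q", line)
		}
		items := strings.Fields(strings.ReplaceAll(value, ",", " "))
		switch strings.TrimSpace(key) {
		case "tools":
			fm.Tools = items
		case "labels":
			fm.Labels = items
		case "observability":
			fm.Observability = strings.Join(items, " ")
		}
	}
	return fm, nil
}

// Each phase becomes a small method that can be unit tested on its own.
func (a *importAccumulator) mergeImportTools(fm importFrontmatter) {
	a.tools = append(a.tools, fm.Tools...)
}

func (a *importAccumulator) accumulateImportMetadata(fm importFrontmatter) {
	a.labels = append(a.labels, fm.Labels...)
}

func (a *importAccumulator) extractImportObservability(fm importFrontmatter) {
	if fm.Observability != "" {
		a.observability = append(a.observability, fm.Observability)
	}
}

// extractAllImportFields is reduced to orchestration: parse, then run phases.
func (a *importAccumulator) extractAllImportFields(raw string) error {
	fm, err := parseImportFrontmatter(raw)
	if err != nil {
		return fmt.Errorf("parse import frontmatter: %w", err)
	}
	a.mergeImportTools(fm)
	a.accumulateImportMetadata(fm)
	a.extractImportObservability(fm)
	return nil
}

func main() {
	acc := &importAccumulator{}
	input := "tools: bash, github\nlabels: triage\nobservability: otlp"
	if err := acc.extractAllImportFields(input); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(acc.tools, acc.labels, acc.observability)
}
```

The point of the shape is that each phase can gain its own table-driven test while the orchestrator stays short enough to read in one pass; the real helpers would likely need richer parameters and error returns than shown here.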
Full Analysis Report
Focus Area: Large File Decomposition
Current State Assessment
Metrics Collected:
- `pkg/parser/import_field_extractor.go`: 1127 lines
- `extractAllImportFields`: 508 lines
Top 10 oversized files:
- `pkg/parser/import_field_extractor.go`
- `pkg/workflow/frontmatter_extraction_yaml.go` (`commentOutProcessedFieldsInOnSection` spans lines 107–688, 581 lines)
- `pkg/workflow/compiler_yaml_main_job.go`
- `pkg/cli/forecast.go`
- `pkg/cli/audit.go`
- `pkg/workflow/domains.go` (`var` blocks + logic)
- `pkg/workflow/compiler_jobs.go`
- `pkg/workflow/cache.go`
- `pkg/cli/logs_orchestrator.go`
- `pkg/workflow/threat_detection.go`
Findings
Strengths
- `make agent-finish` provides a fast, reliable validation gate
Areas for Improvement
- `extractAllImportFields` (508 lines) is a god-method with at least 6 distinct responsibilities: frontmatter parsing, tool/MCP merging, engine detection, input schema validation, input substitution, and observability extraction
- `commentOutProcessedFieldsInOnSection` in `frontmatter_extraction_yaml.go` (581 lines) contains deeply nested YAML string manipulation that should be broken into named stages
- `domains.go` inlines engine domain data as `var` blocks mixed with domain-assembly logic; the data and logic are distinct concerns
- `forecast.go` mixes struct definitions, business logic (`forecastWorkflow`), metadata loading, variant computation, and three rendering formats in one file
Detailed Analysis
`extractAllImportFields` (508 lines): The function has identifiable phases, each suitable for extraction:
- `parseImportFrontmatter()`
- `mergeImportTools()`
- `accumulateImportMetadata()`
- `validateWithImportSchema()` but inlined here too
- `extractImportObservability()`
`domains.go`: 18 named `var` blocks (`CopilotDefaultDomains`, `ClaudeDefaultDomains`, etc.) are data, not logic. Moving them to a `domains_data.go` file and keeping only the assembly functions in `domains.go` would halve both files and make engine-specific domain lists easy to find and audit.
`forecast.go`: Natural split into `forecast_types.go` (struct definitions), `forecast_run.go` (core logic), and `forecast_render.go` (table/JSON rendering). This mirrors patterns already used elsewhere in `pkg/cli/` (e.g., `audit.go` / `audit_diff.go` / `audit_report.go`).
🤖 Tasks for Copilot Agent
NOTE TO PLANNER AGENT: Split the following tasks into individual work items.
Improvement Tasks
Task 1: Extract phases from `extractAllImportFields` into helper methods
Priority: High
Estimated Effort: Medium
Focus Area: Large File Decomposition
Description: The `extractAllImportFields` method in `pkg/parser/import_field_extractor.go` is 508 lines long. Extract each logical phase into a private helper method on `importAccumulator`: tool/MCP merging, engine/bot/label accumulation, and observability extraction. Each helper should have its own unit test assertions in the existing test file.
Acceptance Criteria:
- `extractAllImportFields` is reduced to ≤150 lines (orchestration only)
- Helper methods extracted (`mergeImportTools`, `accumulateImportMetadata`, `extractImportObservability`)
- `pkg/parser` tests pass (`go test ./pkg/parser/`)
- `make agent-finish` passes
Code Region: `pkg/parser/import_field_extractor.go`
Task 2: Split `domains.go` into data file and logic file
Priority: High
Estimated Effort: Small
Focus Area: Large File Decomposition
Description: `pkg/workflow/domains.go` (1040 lines) contains 18 engine-specific domain `var` blocks alongside the assembly and validation logic. Separate the data into `domains_data.go` and keep only functions in `domains.go`. This is a mechanical move with zero logic changes.
Acceptance Criteria:
- `pkg/workflow/domains_data.go` created with all `var` domain declarations
- `pkg/workflow/domains.go` contains only functions (≤400 lines)
- `pkg/workflow` tests pass: `go test ./pkg/workflow/`
- `make agent-finish` passes
Code Region: `pkg/workflow/domains.go`
Task 3: Split `pkg/cli/forecast.go` into types, run, and render files
Priority: Medium
Estimated Effort: Medium
Focus Area: Large File Decomposition
Description: `pkg/cli/forecast.go` (1058 lines, 25 functions) mixes struct type definitions, business logic, metadata helpers, and two rendering functions. Split following the pattern already established by `audit.go` / `audit_report.go` / `audit_diff.go`.
Acceptance Criteria:
- `forecast_types.go` created with all struct and type definitions
- `forecast_render.go` created with `renderForecastJSON`, `renderForecastTable`, `printEpisodeBreakdown`, `printEvalBreakdown`, `printVariantBreakdown`
- `forecast.go` contains only command setup and core logic functions (≤400 lines)
- `go test ./pkg/cli/ -run ".*[Ff]orecast"` passes
- `make agent-finish` passes
Code Region: `pkg/cli/forecast.go`
Task 4: Add `AGENTS.md` guideline enforcement note to `commentOutProcessedFieldsInOnSection`
Priority: Low
Estimated Effort: Small
Focus Area: Large File Decomposition
Description: `commentOutProcessedFieldsInOnSection` in `pkg/workflow/frontmatter_extraction_yaml.go` is 581 lines (lines 107–688), the largest single function in the codebase. It processes YAML string manipulation through deeply nested branching. Add a `// TODO(decomp):` comment at the top of the function outlining the 4 identifiable stages to guide a future decomposition, and open a follow-up issue in the repository to track it. This serves as a low-cost breadcrumb so the function doesn't keep getting extended without visibility.
Acceptance Criteria:
- `// TODO(decomp):` block comment added at line 107 listing the 4 stages: (1) field classification, (2) on-section rewriting, (3) inline comment injection, (4) whitespace normalisation
- `make build && make fmt` passes
Code Region: `pkg/workflow/frontmatter_extraction_yaml.go:107`
No logic changes. Run `make build && make fmt` to verify.
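One possible shape for the Task 4 breadcrumb is sketched below. The four stage names come from the acceptance criteria; the function signature and the pass-through body are hypothetical placeholders so the snippet compiles on its own, and the follow-up issue reference is deliberately left as prose because no issue number exists yet.

```go
package main

import "fmt"

// TODO(decomp): this function is well over the limits documented in AGENTS.md.
// A future decomposition (tracked in a follow-up issue) could split it into
// four named stages:
//
//  1. field classification
//  2. on-section rewriting
//  3. inline comment injection
//  4. whitespace normalisation
//
// NOTE: the signature and body below are assumptions for illustration only;
// the real function keeps its existing signature and logic unchanged.
func commentOutProcessedFieldsInOnSection(yamlStr string) string {
	return yamlStr
}

func main() {
	fmt.Print(commentOutProcessedFieldsInOnSection("on:\n  push:\n"))
}
```

Because only a comment is added in the real change, `make build && make fmt` should be enough to confirm nothing else moved.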