Refactor pkg/parser long production functions into focused helper units#34297
Conversation
|
Hey Process Issue: The CONTRIBUTING.md explicitly states that "Traditional Pull Requests Are Not Enabled for non-Core team members" and that contributors should "create detailed agentic plans in issues, discuss with the team, and a core team member will create and implement the PR for you using agents." This PR appears to have been created by an automated agent rather than following the issue-first workflow. Missing Implementation: The PR is marked [WIP] with 0 lines changed. Before a PR is opened, it should contain the actual implementation. Next Steps:
If you'd like to complete the implementation, here's a prompt for your coding agent: Warning Firewall blocked 1 domainThe following domain was blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "patchdiff.githubusercontent.com"See Network Configuration for more information.
|
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
pkg/parser long production functions into focused helper units
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
|
|
🧪 Test Quality Sentinel completed test quality analysis. No test files were added or modified in this PR. This is a pure refactoring effort in pkg/parser with production code changes only. Test Quality Sentinel analysis skipped. |
|
❌ Design Decision Gate 🏗️ failed during design decision gate check. |
There was a problem hiding this comment.
Pull request overview
Refactors pkg/parser to break up several long production functions (imports/include expansion, schema validation/suggestions, MCP extraction, schedule scattering, URL parsing, YAML workflow imports, and frontmatter parsing/hashing) into smaller helper units, aiming to preserve existing behavior while improving readability and maintainability.
Changes:
- Decomposed import/include processing pipelines (BFS, topo sort, include expansion/processing) into focused helper functions.
- Split schema validation/suggestion/error-hint logic into composable helpers while keeping existing error formatting behavior.
- Refactored MCP extraction/parsing, remote include fetching (incl. temp-file cleanup), schedule scattering, and GitHub URL parsing into smaller units.
Show a summary per file
| File | Description |
|---|---|
| pkg/parser/yaml_import.go | Splits YAML workflow import into read/parse/validate + jobs/services extraction helpers. |
| pkg/parser/tools_merger.go | Refactors tool JSON merge flow into single-object, line-parse, and merge helpers. |
| pkg/parser/sub_agent_extractor.go | Extracts inline sub-agent parsing into validation/heading scan/extraction helpers. |
| pkg/parser/schema_suggestions.go | Breaks suggestion generation into enum/additionalProperties/range/type helpers + example builders. |
| pkg/parser/schema_errors.go | Decomposes known-field hint selection and “did you mean”/docs URL appending helpers. |
| pkg/parser/schema_compiler.go | Refactors schema validation-with-location into context-reading and formatting helpers. |
| pkg/parser/schedule_time_utils.go | Splits time parsing into UTC-offset split + base-time parsing helpers. |
| pkg/parser/schedule_fuzzy_scatter.go | Rewrites fuzzy schedule scattering into handler pipeline + shared parsing/scatter utilities. |
| pkg/parser/remote_fetch.go | Splits include path resolution/workflowspec parsing/SHA caching + hardens temp-file cleanup; refactors symlink retry flow. |
| pkg/parser/mcp.go | Decomposes MCP config extraction/parsing and built-in tool config assembly into helper functions. |
| pkg/parser/json_path_locator.go | Refactors nested additional-property location logic into extract/map + nested-section helpers. |
| pkg/parser/include_processor.go | Decomposes include directive handling (fast path, warnings, resolution, visited) into helpers. |
| pkg/parser/include_expander.go | Refactors iterative include expansion + manifest building + fast path into helpers. |
| pkg/parser/import_topological.go | Splits topological sort into set/dependency building/in-degree/root collection/Kahn steps helpers. |
| pkg/parser/import_field_extractor.go | Refactors import frontmatter preparation, activation/feature extraction, and import-input substitution into helper functions. |
| pkg/parser/import_bfs.go | Decomposes BFS import traversal into state + seeding/queue processing + nested import enqueue helpers. |
| pkg/parser/github_urls.go | Splits GitHub URL parsing into host routing + type-specific helper parsers. |
| pkg/parser/frontmatter_hash.go | Extracts sorted marshaling and import-text scanning into helper units/state struct. |
| pkg/parser/frontmatter_content.go | Refactors frontmatter delimiter scanning/YAML parsing/markdown extraction into helpers and preserves fast-path behavior. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
pkg/parser/mcp.go:660
- Same malformed Authorization header example is repeated in this error path. It should use a valid expression (e.g., "Bearer ${{ secrets.API_KEY }}") so users can copy/paste it safely.
return fmt.Errorf(
"url field must be a string, got %T. Example:\n"+
"mcp-servers:\n"+
" %s:\n"+
" type: http\n"+
" url: \"https://api.example.com/mcp\"\n"+
" headers:\n"+
" Authorization: \"****** secrets.API_KEY }}\"",
url, toolName)
}
- Files reviewed: 19/19 changed files
- Comments generated: 3
| } | ||
|
|
||
| func serverFilterMatches(serverName, serverFilter string) bool { | ||
| return serverFilter == "" || strings.Contains(serverName, strings.ToLower(serverFilter)) |
| return fmt.Errorf( | ||
| "http MCP tool '%s' missing required 'url' field. HTTP MCP servers must specify a URL endpoint. "+ | ||
| "Example:\n"+ | ||
| "mcp-servers:\n"+ | ||
| " %s:\n"+ | ||
| " type: http\n"+ | ||
| " url: \"https://api.example.com/mcp\"\n"+ | ||
| " headers:\n"+ | ||
| " Authorization: \"****** secrets.API_KEY }}\"", | ||
| toolName, toolName, | ||
| ) |
| specs, err := parseImportSpecsFromArray(v) | ||
| if err != nil { | ||
| return nil |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /zoom-out, /improve-codebase-architecture, and /tdd — commenting only; no blocking issues. The structural goal is solid and the decomposition is generally clean.
📋 Key Themes & Highlights
Key Themes
- Pre-existing correctness gap:
marshalSortedMapwrites a comma separator before checking whether key marshaling succeeded, which can produce invalid JSON output on marshal errors. The refactor is an ideal moment to close this. - Misleading predicate conditions:
exitsImportsBlockguards ontrimmed != ""/ not-a-comment, but these are always true due to upstream filtering — slightly misleading for future readers. - Diagnostic quality regression:
parseGitHubURLByTypenow silently returns"unrecognized GitHub URL format"for malformed-but-recognized URL types (e.g.pullURL missing the number segment), whereas a type-specific error message would be far more actionable. - Positional return values:
frontmatterLineBoundsreturning three bareints is harder to follow than a small named struct. - No new tests: The extracted helpers are non-trivial; adding table-driven unit tests for
importExtractionStateand the new URL parsing helpers would lock in the refactored semantics.
Positive Highlights
- ✅
importExtractionStatestruct is a great abstraction — state machine logic is now easy to read and modify independently. - ✅
marshalSortedMap/marshalSortedSlice/marshalSortedValuedecomposition is clean and symmetric. - ✅ The temp-file cleanup hardening in remote include download is a welcome correctness improvement.
- ✅
toImportSet/buildImportDependencies/calculateInDegree/collectRootImports/runKahnTopologicalSortpipeline reads almost like pseudocode now — very nice.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● son46 2.4M
|
|
||
| var result strings.Builder | ||
| result.WriteString("{") | ||
| for i, key := range keys { |
There was a problem hiding this comment.
[/zoom-out] Dangling comma when a key fails to marshal — the comma separator is written before the error check, so a failure at i > 0 produces {...prevEntry,}, which is invalid JSON.
This is a pre-existing issue, but the refactor is a clean opportunity to fix it.
💡 Suggested fix — track first valid entry instead of using the index
first := true
for _, key := range keys {
keyJSON, err := marshalJSONWithoutHTMLEscape(key)
if err != nil {
frontmatterHashLog.Printf("Warning: failed to marshal key %s: %v", key, err)
continue
}
if !first {
result.WriteString(",")
}
first = false
result.WriteString(keyJSON)
result.WriteString(":")
result.WriteString(marshalSorted(v[key]))
}This ensures the comma only appears between successfully written entries.
| return true | ||
| } | ||
|
|
||
| func (s *importExtractionState) exitsImportsBlock(trimmed string, lineIndent int) bool { |
There was a problem hiding this comment.
[/zoom-out] Misleading dead conditions in exitsImportsBlock — trimmed != "" and !strings.HasPrefix(trimmed, "#") are always true here because the caller's loop already filters out empty lines and comments before calling this method.
This makes the predicate look like it handles those cases, but it never will. Either remove the dead conditions, or add a comment noting the caller's pre-filter precondition.
💡 Options
Option A — remove the redundant conditions (simpler):
func (s *importExtractionState) exitsImportsBlock(_ string, lineIndent int) bool {
// Precondition: trimmed is non-empty and not a comment (filtered upstream)
return lineIndent <= s.baseIndent
}Option B — keep conditions and add a comment explaining they are asserted by the caller (defensive).
| Ref: ref, | ||
| }, nil | ||
| } | ||
| func parseGitHubURLByType(parsedURL *url.URL, host, owner, repo string, pathParts []string, urlType string) (*GitHubURLComponents, error) { |
There was a problem hiding this comment.
[/zoom-out] Silent fallthrough gives a misleading error for malformed-but-recognized URL types — if urlType is "pull" but len(pathParts) < 4, the switch matches case "pull" but the length guard is not satisfied, so control falls out of the switch and the function returns "unrecognized GitHub URL format". A caller seeing that error for a github.com/owner/repo/pull/abc URL would have no idea the type was recognized.
💡 Suggested fix — return a type-specific error from each length guard
case "pull":
if len(pathParts) < 4 {
return nil, fmt.Errorf("pull request URL requires at least 4 path segments, got %d", len(pathParts))
}
return parseNumberedURL(host, owner, repo, URLTypePullRequest, "PR", pathParts[3])Apply the same pattern to actions, runs, issues, and blob/tree/raw.
| // the closing delimiter does not create an additional empty frontmatter line. | ||
| if strings.HasSuffix(frontmatterYAML, "\n") { | ||
| frontmatterLines = frontmatterLines[:len(frontmatterLines)-1] | ||
| func frontmatterLineBounds(content string, cursor int) (int, int, int) { |
There was a problem hiding this comment.
[/improve-codebase-architecture] frontmatterLineBounds returns three positional int values — callers must remember the order (lineStart, lineEnd, nextCursor) without any compiler assistance. A small struct would make the site-of-call self-documenting at zero runtime cost.
💡 Suggested refactor
type lineBounds struct{ lineStart, lineEnd, nextCursor int }
func frontmatterLineBounds(content string, cursor int) lineBounds {
b := lineBounds{lineStart: cursor, lineEnd: len(content), nextCursor: len(content) + 1}
if cursor < len(content) {
if rel := strings.IndexByte(content[cursor:], '\n'); rel >= 0 {
b.lineEnd = cursor + rel
b.nextCursor = b.lineEnd + 1
}
}
return b
}Callsites become b := frontmatterLineBounds(content, cursor); b.lineStart; b.lineEnd; b.nextCursor — readable without scrolling back to the function signature.
| isObjectForm bool | ||
| } | ||
|
|
||
| func (s *importExtractionState) beginImportsBlock(line, trimmed string) bool { |
There was a problem hiding this comment.
[/tdd] The newly extracted helper methods on importExtractionState have no unit tests. The logic for detecting object-form imports (aw: subfield), multi-indent transitions, and the extractImportItem path gating is subtle enough that targeted table-driven tests would catch regressions much faster than relying on end-to-end workflow compilation tests.
💡 Suggested test sketch
func TestImportExtractionState(t *testing.T) {
tests := []struct{
name string
yaml string
expected []string
}{
{"array form", "imports:\n - a.md\n - b.md\n", []string{"a.md", "b.md"}},
{"object form aw field", "imports:\n aw:\n - a.md\n other:\n - b.md\n", []string{"a.md"}},
{"exits on dedent", "imports:\n - a.md\nother: val\n", []string{"a.md"}},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
got := extractImportsFromText(tc.yaml)
assert.Equal(t, tc.expected, got)
})
}
}|
@copilot lint go, update wasm golden, fix test https://github.com/github/gh-aw/actions/runs/26345865785/job/77555632202?pr=34297, review all comments |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in
|
|
@copilot fix tests https://github.com/github/gh-aw/actions/runs/26346574528/job/77557351286?pr=34297, update wasm golden tests |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in
Validated locally with:
|
|
@copilot make recompile |
Ran |
This issue targeted widespread
largefuncviolations inpkg/parser(notably import/frontmatter/URL/include/MCP/schema paths) and requested structural refactoring without behavior changes. This PR decomposes long production functions into smaller domain helpers to keep parser logic readable and testable.Frontmatter + hash decomposition
frontmatter_content.goandfrontmatter_hash.gowhile keeping existing parse semantics.GitHub URL parsing decomposition
ParseGitHubURLinto smaller route-specific parsing helpers (run/PR/file/repo patterns), isolating validation and extraction concerns.Import pipeline decomposition
MCP / remote fetch / schema / schedule decomposition
Targeted hardening in touched code