
Refactor pkg/parser long production functions into focused helper units #34297

Merged: pelikhan merged 10 commits into main from copilot/lint-monster-refactor-long-functions on May 24, 2026

Contributor

Copilot AI commented May 23, 2026

The linked issue targeted widespread largefunc violations in pkg/parser (notably in the import, frontmatter, URL, include, MCP, and schema paths) and requested structural refactoring without behavior changes. This PR decomposes the long production functions into smaller domain helpers to keep the parser logic readable and testable.

  • Frontmatter + hash decomposition

    • Split frontmatter extraction and hash/import normalization flows into dedicated helper stages.
    • Reduced monolithic control paths in frontmatter_content.go and frontmatter_hash.go while keeping existing parse semantics.
  • GitHub URL parsing decomposition

    • Refactored ParseGitHubURL into smaller route-specific parsing helpers (run/PR/file/repo patterns), isolating validation and extraction concerns.
  • Import pipeline decomposition

    • Broke large import-processing functions into focused units across:
      • BFS import resolution
      • topological import ordering
      • import-field preparation/activation/feature extraction
      • include expansion/processing
    • Preserved existing import behavior and error shaping while reducing function size.
  • MCP / remote fetch / schema / schedule decomposition

    • Split large MCP config extraction/parsing code paths.
    • Decomposed remote include resolution/download/symlink handling into helper functions.
    • Split schedule parsing/scattering and schema validation/error hint logic into smaller units.
    • Kept behavior intact; changes are structural and locality-focused.
  • Targeted hardening in touched code

    • Improved temp-file cleanup flow in remote include download handling to ensure cleanup across error paths.
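```go
import (
	"fmt"
	"os"
)

func writeDownloadedIncludeToTempFile(content []byte) (string, error) {
	tempFile, err := os.CreateTemp("", "gh-aw-include-*.md")
	if err != nil {
		return "", fmt.Errorf("failed to create temp file: %w", err)
	}
	// Until the success path clears cleanupOnError, every return closes (if needed) and removes the file.
	cleanupOnError := true
	fileClosed := false
	defer func() {
		if cleanupOnError {
			if !fileClosed {
				_ = tempFile.Close()
			}
			_ = os.Remove(tempFile.Name())
		}
	}()
	// Success path (elided in the PR description; this is a sketch).
	if _, err := tempFile.Write(content); err != nil {
		return "", fmt.Errorf("failed to write temp file: %w", err)
	}
	fileClosed = true // Close is attempted exactly once, even if it fails
	if err := tempFile.Close(); err != nil {
		return "", fmt.Errorf("failed to close temp file: %w", err)
	}
	cleanupOnError = false // keep the file for the caller
	return tempFile.Name(), nil
}
```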

@github-actions
Contributor

Hey @copilot-swe-agent 👋 — thanks for working on the pkg/parser refactoring! However, this PR doesn't align with the repository's contribution process:

Process Issue: The CONTRIBUTING.md explicitly states that "Traditional Pull Requests Are Not Enabled for non-Core team members" and that contributors should "create detailed agentic plans in issues, discuss with the team, and a core team member will create and implement the PR for you using agents." This PR appears to have been created by an automated agent rather than following the issue-first workflow.

Missing Implementation: The PR is marked [WIP] with 0 lines changed. Before a PR is opened, it should contain the actual implementation.

Next Steps:

  1. If you're a core team member or authorized bot, please clarify your status and proceed with the implementation.
  2. If you're contributing from outside the core team, please close this PR and work through the issue-based agentic planning process described in CONTRIBUTING.md instead.
  3. Once the code is ready, ensure you've added tests for the refactored functions and run make golint-custom to verify all violations are resolved.

If you'd like to complete the implementation, here's a prompt for your coding agent:

Implement the pkg/parser refactoring described in issue #[issue-number]:
1. Identify all functions in pkg/parser exceeding 60 lines
2. Break each into smaller helper functions (40-60 lines each)
3. Use descriptive names that clarify intent
4. Add unit tests for newly extracted helper functions
5. Run `make golint-custom` to verify all violations are resolved
6. Ensure zero functional changes — only structural improvements
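Step 1 of the prompt can be roughly approximated with the standard library. The sketch below is hypothetical: `longFunctions` is not a gh-aw helper, it uses only `go/parser` and `go/ast`, and it assumes a function's length runs from its `func` line to its closing-brace line, which may differ from how the `largefunc` rule behind `make golint-custom` counts.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
)

// longFunctions parses one Go source file and returns the functions and
// methods whose span (func keyword line through closing brace line) exceeds
// maxLines, keyed by name with their line counts.
func longFunctions(filename, src string, maxLines int) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	long := make(map[string]int)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue // skip type/var/const declarations and bodyless (e.g. assembly-backed) funcs
		}
		start := fset.Position(fn.Pos()).Line
		end := fset.Position(fn.End()).Line
		if n := end - start + 1; n > maxLines {
			long[fn.Name.Name] = n
		}
	}
	return long, nil
}
```

Running it over each file under pkg/parser with `maxLines` set to 60 would give a candidate list; `make golint-custom` stays the authoritative check.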

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • patchdiff.githubusercontent.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

```yaml
network:
  allowed:
    - defaults
    - "patchdiff.githubusercontent.com"
```

See Network Configuration for more information.

Generated by ✅ Contribution Check

Copilot AI and others added 4 commits May 23, 2026 21:22
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Refactor long functions in pkg/parser package" to "Refactor pkg/parser long production functions into focused helper units" May 23, 2026
Copilot AI requested a review from pelikhan May 23, 2026 21:53
pelikhan marked this pull request as ready for review May 23, 2026 23:01
Copilot AI review requested due to automatic review settings May 23, 2026 23:01
@github-actions
Contributor

github-actions Bot commented May 23, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions
Contributor

github-actions Bot commented May 23, 2026

⚠️ PR Code Quality Reviewer failed during code quality review.

@github-actions
Contributor

github-actions Bot commented May 23, 2026

🧪 Test Quality Sentinel completed test quality analysis.

No test files were added or modified in this PR. This is a pure refactoring effort in pkg/parser with production code changes only. Test Quality Sentinel analysis skipped.

@github-actions
Contributor

github-actions Bot commented May 23, 2026

Design Decision Gate 🏗️ failed during design decision gate check.

Contributor

Copilot AI left a comment


Pull request overview

Refactors pkg/parser to break up several long production functions (imports/include expansion, schema validation/suggestions, MCP extraction, schedule scattering, URL parsing, YAML workflow imports, and frontmatter parsing/hashing) into smaller helper units, aiming to preserve existing behavior while improving readability and maintainability.

Changes:

  • Decomposed import/include processing pipelines (BFS, topo sort, include expansion/processing) into focused helper functions.
  • Split schema validation/suggestion/error-hint logic into composable helpers while keeping existing error formatting behavior.
  • Refactored MCP extraction/parsing, remote include fetching (incl. temp-file cleanup), schedule scattering, and GitHub URL parsing into smaller units.
| File | Description |
| --- | --- |
| pkg/parser/yaml_import.go | Splits YAML workflow import into read/parse/validate + jobs/services extraction helpers. |
| pkg/parser/tools_merger.go | Refactors tool JSON merge flow into single-object, line-parse, and merge helpers. |
| pkg/parser/sub_agent_extractor.go | Extracts inline sub-agent parsing into validation/heading scan/extraction helpers. |
| pkg/parser/schema_suggestions.go | Breaks suggestion generation into enum/additionalProperties/range/type helpers + example builders. |
| pkg/parser/schema_errors.go | Decomposes known-field hint selection and “did you mean”/docs URL appending helpers. |
| pkg/parser/schema_compiler.go | Refactors schema validation-with-location into context-reading and formatting helpers. |
| pkg/parser/schedule_time_utils.go | Splits time parsing into UTC-offset split + base-time parsing helpers. |
| pkg/parser/schedule_fuzzy_scatter.go | Rewrites fuzzy schedule scattering into handler pipeline + shared parsing/scatter utilities. |
| pkg/parser/remote_fetch.go | Splits include path resolution/workflowspec parsing/SHA caching + hardens temp-file cleanup; refactors symlink retry flow. |
| pkg/parser/mcp.go | Decomposes MCP config extraction/parsing and built-in tool config assembly into helper functions. |
| pkg/parser/json_path_locator.go | Refactors nested additional-property location logic into extract/map + nested-section helpers. |
| pkg/parser/include_processor.go | Decomposes include directive handling (fast path, warnings, resolution, visited) into helpers. |
| pkg/parser/include_expander.go | Refactors iterative include expansion + manifest building + fast path into helpers. |
| pkg/parser/import_topological.go | Splits topological sort into set/dependency building/in-degree/root collection/Kahn steps helpers. |
| pkg/parser/import_field_extractor.go | Refactors import frontmatter preparation, activation/feature extraction, and import-input substitution into helper functions. |
| pkg/parser/import_bfs.go | Decomposes BFS import traversal into state + seeding/queue processing + nested import enqueue helpers. |
| pkg/parser/github_urls.go | Splits GitHub URL parsing into host routing + type-specific helper parsers. |
| pkg/parser/frontmatter_hash.go | Extracts sorted marshaling and import-text scanning into helper units/state struct. |
| pkg/parser/frontmatter_content.go | Refactors frontmatter delimiter scanning/YAML parsing/markdown extraction into helpers and preserves fast-path behavior. |
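
The BFS decomposition in import_bfs.go (state, seeding, queue processing, nested-import enqueueing) follows the standard breadth-first pattern. A minimal sketch of that shape is below; `importBFSState`, `seed`, `enqueue`, `process`, and `resolveImportsBFS` are hypothetical names, and discovering a file's nested imports is abstracted as a callback instead of reading and parsing files.

```go
package main

// importBFSState is a hypothetical stand-in for the parser's BFS state:
// a FIFO queue of pending files, a visited set, and the discovery order.
type importBFSState struct {
	queue   []string
	visited map[string]bool
	order   []string
}

// seed enqueues the top-level imports.
func (s *importBFSState) seed(roots []string) {
	for _, r := range roots {
		s.enqueue(r)
	}
}

// enqueue adds a file at most once; already-visited files are ignored.
func (s *importBFSState) enqueue(path string) {
	if s.visited[path] {
		return
	}
	s.visited[path] = true
	s.queue = append(s.queue, path)
}

// process drains the queue in FIFO order, enqueueing each file's nested imports.
func (s *importBFSState) process(nested func(string) []string) {
	for len(s.queue) > 0 {
		current := s.queue[0]
		s.queue = s.queue[1:]
		s.order = append(s.order, current)
		for _, child := range nested(current) {
			s.enqueue(child)
		}
	}
}

// resolveImportsBFS wires the stages together: new state, seed, drain.
func resolveImportsBFS(roots []string, nested func(string) []string) []string {
	s := &importBFSState{visited: make(map[string]bool)}
	s.seed(roots)
	s.process(nested)
	return s.order
}
```

Marking a file as visited when it is enqueued, rather than when it is dequeued, keeps each file in the queue at most once and also makes import cycles terminate.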

Copilot's findings


Comments suppressed due to low confidence (1)

pkg/parser/mcp.go:660

  • Same malformed Authorization header example is repeated in this error path. It should use a valid expression (e.g., "Bearer ${{ secrets.API_KEY }}") so users can copy/paste it safely.
```go
	return fmt.Errorf(
		"url field must be a string, got %T. Example:\n"+
			"mcp-servers:\n"+
			"  %s:\n"+
			"    type: http\n"+
			"    url: \"https://api.example.com/mcp\"\n"+
			"    headers:\n"+
			"      Authorization: \"****** secrets.API_KEY }}\"",
		url, toolName)
}
```
  • Files reviewed: 19/19 changed files
  • Comments generated: 3

Comment thread pkg/parser/mcp.go Outdated
```go
}

func serverFilterMatches(serverName, serverFilter string) bool {
	return serverFilter == "" || strings.Contains(serverName, strings.ToLower(serverFilter))
```
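
As quoted, `serverFilterMatches` lowercases only the filter, so a filter such as `github` never matches a server named `GitHub` unless the caller has already lowercased `serverName`. A later commit mentions "MCP filter case handling"; the sketch below shows one case-insensitive version and is not necessarily what was merged.

```go
package main

import "strings"

// serverFilterMatches reports whether serverName contains serverFilter,
// ignoring case on both sides. An empty filter matches every server.
func serverFilterMatches(serverName, serverFilter string) bool {
	return serverFilter == "" ||
		strings.Contains(strings.ToLower(serverName), strings.ToLower(serverFilter))
}
```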
Comment thread pkg/parser/mcp.go
Comment on lines +637 to +647
```go
	return fmt.Errorf(
		"http MCP tool '%s' missing required 'url' field. HTTP MCP servers must specify a URL endpoint. "+
			"Example:\n"+
			"mcp-servers:\n"+
			"  %s:\n"+
			"    type: http\n"+
			"    url: \"https://api.example.com/mcp\"\n"+
			"    headers:\n"+
			"      Authorization: \"****** secrets.API_KEY }}\"",
		toolName, toolName,
	)
```
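
Both error paths flagged in this review embed an `Authorization` value that is not a valid GitHub Actions expression. One way to fix them once, following the reviewer's suggested `"Bearer ${{ secrets.API_KEY }}"` form, is to build the example in a single helper. `httpMCPServerExample` and `missingURLError` are hypothetical names, and only the missing-url path is shown.

```go
package main

import "fmt"

// httpMCPServerExample returns a copy-pasteable mcp-servers snippet for toolName.
// "${{" contains no fmt verbs, so Sprintf passes it through unchanged.
func httpMCPServerExample(toolName string) string {
	return fmt.Sprintf(
		"mcp-servers:\n"+
			"  %s:\n"+
			"    type: http\n"+
			"    url: \"https://api.example.com/mcp\"\n"+
			"    headers:\n"+
			"      Authorization: \"Bearer ${{ secrets.API_KEY }}\"",
		toolName)
}

// missingURLError is one error path reusing the shared example.
func missingURLError(toolName string) error {
	return fmt.Errorf(
		"http MCP tool '%s' missing required 'url' field. HTTP MCP servers must specify a URL endpoint. Example:\n%s",
		toolName, httpMCPServerExample(toolName))
}
```

The url-must-be-a-string path could append the same helper output after its own message, so the two examples cannot drift apart again.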
Comment thread pkg/parser/import_bfs.go Outdated
Comment on lines +380 to +382
```go
specs, err := parseImportSpecsFromArray(v)
if err != nil {
	return nil
```
Contributor

@github-actions github-actions Bot left a comment


Skills-Based Review 🧠

Applied /zoom-out, /improve-codebase-architecture, and /tdd — commenting only; no blocking issues. The structural goal is solid and the decomposition is generally clean.

📋 Key Themes & Highlights

Key Themes

  • Pre-existing correctness gap: marshalSortedMap writes a comma separator before checking whether key marshaling succeeded, which can produce invalid JSON output on marshal errors. The refactor is an ideal moment to close this.
  • Misleading predicate conditions: exitsImportsBlock guards on trimmed != "" / not-a-comment, but these are always true due to upstream filtering — slightly misleading for future readers.
  • Diagnostic quality regression: parseGitHubURLByType now silently returns "unrecognized GitHub URL format" for malformed-but-recognized URL types (e.g. pull URL missing the number segment), whereas a type-specific error message would be far more actionable.
  • Positional return values: frontmatterLineBounds returning three bare ints is harder to follow than a small named struct.
  • No new tests: The extracted helpers are non-trivial; adding table-driven unit tests for importExtractionState and the new URL parsing helpers would lock in the refactored semantics.

Positive Highlights

  • importExtractionState struct is a great abstraction — state machine logic is now easy to read and modify independently.
  • marshalSortedMap / marshalSortedSlice / marshalSortedValue decomposition is clean and symmetric.
  • ✅ The temp-file cleanup hardening in remote include download is a welcome correctness improvement.
  • toImportSet / buildImportDependencies / calculateInDegree / collectRootImports / runKahnTopologicalSort pipeline reads almost like pseudocode now — very nice.
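
The five helpers in the last highlight are the stages of Kahn's topological sort. A minimal sketch of that pipeline follows. Only the helper names come from the review; the signatures, the `deps` shape (each import mapped to the imports it depends on), the `topoSortImports` wrapper, and the sorting of ready imports for deterministic output are assumptions.

```go
package main

import "sort"

// toImportSet records which imports take part in the sort (duplicates collapse).
func toImportSet(imports []string) map[string]bool {
	set := make(map[string]bool, len(imports))
	for _, imp := range imports {
		set[imp] = true
	}
	return set
}

// buildImportDependencies inverts deps into dependency -> dependents,
// ignoring dependencies outside the set.
func buildImportDependencies(set map[string]bool, deps map[string][]string) map[string][]string {
	dependents := make(map[string][]string)
	for imp := range set {
		for _, dep := range deps[imp] {
			if set[dep] {
				dependents[dep] = append(dependents[dep], imp)
			}
		}
	}
	return dependents
}

// calculateInDegree counts, per import, how many in-set imports it waits on.
func calculateInDegree(set map[string]bool, deps map[string][]string) map[string]int {
	inDegree := make(map[string]int, len(set))
	for imp := range set {
		for _, dep := range deps[imp] {
			if set[dep] {
				inDegree[imp]++
			}
		}
	}
	return inDegree
}

// collectRootImports returns imports with no pending dependencies, sorted.
func collectRootImports(set map[string]bool, inDegree map[string]int) []string {
	var roots []string
	for imp := range set {
		if inDegree[imp] == 0 {
			roots = append(roots, imp)
		}
	}
	sort.Strings(roots)
	return roots
}

// runKahnTopologicalSort emits each import only after all of its dependencies.
func runKahnTopologicalSort(queue []string, dependents map[string][]string, inDegree map[string]int) []string {
	var order []string
	for len(queue) > 0 {
		current := queue[0]
		queue = queue[1:]
		order = append(order, current)
		next := dependents[current]
		sort.Strings(next) // deterministic order among newly ready imports
		for _, d := range next {
			inDegree[d]--
			if inDegree[d] == 0 {
				queue = append(queue, d)
			}
		}
	}
	return order
}

// topoSortImports chains the stages.
func topoSortImports(imports []string, deps map[string][]string) []string {
	set := toImportSet(imports)
	inDegree := calculateInDegree(set, deps)
	roots := collectRootImports(set, inDegree)
	return runKahnTopologicalSort(roots, buildImportDependencies(set, deps), inDegree)
}
```

If the returned order is shorter than the input set, some imports never reached in-degree zero, which signals a cycle the caller can report.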

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer


```go
var result strings.Builder
result.WriteString("{")
for i, key := range keys {
```
Contributor


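[/zoom-out] Stray comma when a key fails to marshal: the separator is written based on the loop index before the error check, so any skipped key leaves an extra comma in the output (a trailing comma such as {...prevEntry,} when the last key fails, a doubled comma when a middle key fails, or a leading comma when the first key fails), which is invalid JSON.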

This is a pre-existing issue, but the refactor is a clean opportunity to fix it.

💡 Suggested fix — track first valid entry instead of using the index
```go
first := true
for _, key := range keys {
	keyJSON, err := marshalJSONWithoutHTMLEscape(key)
	if err != nil {
		frontmatterHashLog.Printf("Warning: failed to marshal key %s: %v", key, err)
		continue
	}
	if !first {
		result.WriteString(",")
	}
	first = false
	result.WriteString(keyJSON)
	result.WriteString(":")
	result.WriteString(marshalSorted(v[key]))
}
```

This ensures the comma only appears between successfully written entries.

```go
	return true
}

func (s *importExtractionState) exitsImportsBlock(trimmed string, lineIndent int) bool {
```
Contributor


[/zoom-out] Misleading dead conditions in exitsImportsBlock: trimmed != "" and !strings.HasPrefix(trimmed, "#") are always true here because the caller's loop already filters out empty lines and comments before calling this method.

This makes the predicate look like it handles those cases, but it never will. Either remove the dead conditions, or add a comment noting the caller's pre-filter precondition.

💡 Options

Option A — remove the redundant conditions (simpler):

```go
func (s *importExtractionState) exitsImportsBlock(_ string, lineIndent int) bool {
	// Precondition: trimmed is non-empty and not a comment (filtered upstream)
	return lineIndent <= s.baseIndent
}
```

Option B — keep conditions and add a comment explaining they are asserted by the caller (defensive).

Comment thread pkg/parser/github_urls.go
```go
		Ref: ref,
	}, nil
}
func parseGitHubURLByType(parsedURL *url.URL, host, owner, repo string, pathParts []string, urlType string) (*GitHubURLComponents, error) {
```
Contributor


[/zoom-out] Silent fallthrough gives a misleading error for malformed-but-recognized URL types. If urlType is "pull" but len(pathParts) < 4, the switch matches case "pull" but the length guard is not satisfied, so control falls out of the switch and the function returns "unrecognized GitHub URL format". A caller seeing that error for a github.com/owner/repo/pull URL (no number segment) would have no idea the type was recognized.

💡 Suggested fix — return a type-specific error from each length guard
```go
case "pull":
	if len(pathParts) < 4 {
		return nil, fmt.Errorf("pull request URL requires at least 4 path segments, got %d", len(pathParts))
	}
	return parseNumberedURL(host, owner, repo, URLTypePullRequest, "PR", pathParts[3])
```

Apply the same pattern to actions, runs, issues, and blob/tree/raw.
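
A compilable sketch of the suggested pattern, covering only the pull and issues cases, is below. `URLType`, its constants, `GitHubURLComponents`, and `parseNumberedURL` are simplified hypothetical stand-ins for the parser's types, and the parsedURL parameter from the real signature is dropped because these cases do not need it.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical minimal types; the parser's real ones carry more fields.
type URLType string

const (
	URLTypePullRequest URLType = "pull"
	URLTypeIssue       URLType = "issue"
)

type GitHubURLComponents struct {
	Host, Owner, Repo string
	Type              URLType
	Number            int64
}

// parseNumberedURL validates the numeric segment of a PR or issue URL.
func parseNumberedURL(host, owner, repo string, t URLType, label, raw string) (*GitHubURLComponents, error) {
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil || n <= 0 {
		return nil, fmt.Errorf("invalid %s number %q", label, raw)
	}
	return &GitHubURLComponents{Host: host, Owner: owner, Repo: repo, Type: t, Number: n}, nil
}

// parseGitHubURLByType routes on the type segment; every recognized type
// reports its own error when the path is too short instead of falling through.
func parseGitHubURLByType(host, owner, repo string, pathParts []string, urlType string) (*GitHubURLComponents, error) {
	switch urlType {
	case "pull":
		if len(pathParts) < 4 {
			return nil, fmt.Errorf("pull request URL requires at least 4 path segments, got %d", len(pathParts))
		}
		return parseNumberedURL(host, owner, repo, URLTypePullRequest, "PR", pathParts[3])
	case "issues":
		if len(pathParts) < 4 {
			return nil, fmt.Errorf("issue URL requires at least 4 path segments, got %d", len(pathParts))
		}
		return parseNumberedURL(host, owner, repo, URLTypeIssue, "issue", pathParts[3])
	}
	return nil, fmt.Errorf("unrecognized GitHub URL format: type %q", urlType)
}
```

With this shape, the generic "unrecognized" error is reserved for path types the parser genuinely does not know.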

```go
// the closing delimiter does not create an additional empty frontmatter line.
if strings.HasSuffix(frontmatterYAML, "\n") {
	frontmatterLines = frontmatterLines[:len(frontmatterLines)-1]
func frontmatterLineBounds(content string, cursor int) (int, int, int) {
```
Contributor


[/improve-codebase-architecture] frontmatterLineBounds returns three positional int values, so callers must remember the order (lineStart, lineEnd, nextCursor) without any compiler assistance. A small struct would make each call site self-documenting at zero runtime cost.

💡 Suggested refactor
```go
type lineBounds struct{ lineStart, lineEnd, nextCursor int }

func frontmatterLineBounds(content string, cursor int) lineBounds {
	b := lineBounds{lineStart: cursor, lineEnd: len(content), nextCursor: len(content) + 1}
	if cursor < len(content) {
		if rel := strings.IndexByte(content[cursor:], '\n'); rel >= 0 {
			b.lineEnd = cursor + rel
			b.nextCursor = b.lineEnd + 1
		}
	}
	return b
}
```

Call sites then read b := frontmatterLineBounds(content, cursor) followed by b.lineStart, b.lineEnd, and b.nextCursor, which is clear without scrolling back to the function signature.
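
To show how the struct reads at a call site, the sketch below pairs the suggested `frontmatterLineBounds` with a hypothetical caller, `findClosingDelimiter`, that walks line by line from the opening `---` line and returns the byte offset where the closing `---` line starts, or -1 if the frontmatter is not closed. The caller is illustrative and is not the parser's actual scanning loop.

```go
package main

import "strings"

type lineBounds struct{ lineStart, lineEnd, nextCursor int }

// frontmatterLineBounds is the reviewer's suggested version, unchanged.
func frontmatterLineBounds(content string, cursor int) lineBounds {
	b := lineBounds{lineStart: cursor, lineEnd: len(content), nextCursor: len(content) + 1}
	if cursor < len(content) {
		if rel := strings.IndexByte(content[cursor:], '\n'); rel >= 0 {
			b.lineEnd = cursor + rel
			b.nextCursor = b.lineEnd + 1
		}
	}
	return b
}

// findClosingDelimiter (hypothetical) returns the start offset of the closing
// "---" line, or -1 when content has no opening or no closing delimiter.
func findClosingDelimiter(content string) int {
	first := frontmatterLineBounds(content, 0)
	if strings.TrimSpace(content[first.lineStart:first.lineEnd]) != "---" {
		return -1
	}
	// nextCursor can be len(content)+1 past the last line, which ends the loop.
	for cursor := first.nextCursor; cursor <= len(content); {
		b := frontmatterLineBounds(content, cursor)
		if strings.TrimSpace(content[b.lineStart:b.lineEnd]) == "---" {
			return b.lineStart
		}
		cursor = b.nextCursor
	}
	return -1
}
```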

```go
	isObjectForm bool
}

func (s *importExtractionState) beginImportsBlock(line, trimmed string) bool {
```
Contributor


[/tdd] The newly extracted helper methods on importExtractionState have no unit tests. The logic for detecting object-form imports (aw: subfield), multi-indent transitions, and the extractImportItem path gating is subtle enough that targeted table-driven tests would catch regressions much faster than relying on end-to-end workflow compilation tests.

💡 Suggested test sketch
```go
package parser

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestImportExtractionState(t *testing.T) {
	tests := []struct {
		name     string
		yaml     string
		expected []string
	}{
		{"array form", "imports:\n  - a.md\n  - b.md\n", []string{"a.md", "b.md"}},
		{"object form aw field", "imports:\n  aw:\n    - a.md\n  other:\n    - b.md\n", []string{"a.md"}},
		{"exits on dedent", "imports:\n  - a.md\nother: val\n", []string{"a.md"}},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			got := extractImportsFromText(tc.yaml)
			assert.Equal(t, tc.expected, got)
		})
	}
}
```
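
The test sketch above calls `extractImportsFromText` without showing an implementation. To make the three cases concrete, here is a hypothetical, simplified state machine that satisfies them. Only the type, method, and field names come from the review; the logic is an assumption that handles plain block-style YAML (space indentation, no flow sequences or quoted items) and is not the parser's code.

```go
package main

import "strings"

// importExtractionState: hypothetical, simplified reconstruction.
type importExtractionState struct {
	inImports    bool
	baseIndent   int  // indent of the "imports:" key
	isObjectForm bool // true once a "key:" line appears inside the block
	inAwField    bool // currently under the "aw:" sub-field
	awIndent     int
	imports      []string
}

func indentOf(line string) int {
	return len(line) - len(strings.TrimLeft(line, " "))
}

// beginImportsBlock opens the block when the line is a bare "imports:" key.
func (s *importExtractionState) beginImportsBlock(line, trimmed string) bool {
	if trimmed != "imports:" {
		return false
	}
	s.inImports, s.baseIndent = true, indentOf(line)
	return true
}

// exitsImportsBlock assumes the caller already skipped blank and comment lines.
func (s *importExtractionState) exitsImportsBlock(_ string, lineIndent int) bool {
	return lineIndent <= s.baseIndent
}

// extractImportItem tracks object-form keys and records "- item" entries:
// every entry in array form, only entries nested under "aw:" in object form.
func (s *importExtractionState) extractImportItem(trimmed string, lineIndent int) {
	if !strings.HasPrefix(trimmed, "- ") {
		if strings.HasSuffix(trimmed, ":") {
			s.isObjectForm = true
			s.inAwField = trimmed == "aw:"
			s.awIndent = lineIndent
		}
		return
	}
	if s.isObjectForm && (!s.inAwField || lineIndent <= s.awIndent) {
		return
	}
	s.imports = append(s.imports, strings.TrimSpace(strings.TrimPrefix(trimmed, "- ")))
}

// extractImportsFromText scans YAML text line by line for the first imports block.
func extractImportsFromText(yaml string) []string {
	s := &importExtractionState{}
	for _, line := range strings.Split(yaml, "\n") {
		trimmed := strings.TrimSpace(line)
		if trimmed == "" || strings.HasPrefix(trimmed, "#") {
			continue // the upstream pre-filter noted in the exitsImportsBlock review
		}
		if !s.inImports {
			s.beginImportsBlock(line, trimmed)
			continue
		}
		lineIndent := indentOf(line)
		if s.exitsImportsBlock(trimmed, lineIndent) {
			break
		}
		s.extractImportItem(trimmed, lineIndent)
	}
	return s.imports
}
```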

@pelikhan
Collaborator

@copilot lint go, update wasm golden, fix test https://github.com/github/gh-aw/actions/runs/26345865785/job/77555632202?pr=34297, review all comments

pelikhan and others added 3 commits May 23, 2026 16:27
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 23, 2026

@copilot lint go, update wasm golden, fix test https://github.com/github/gh-aw/actions/runs/26345865785/job/77555632202?pr=34297, review all comments

Done in 3d5e908 and 844d3b9.

  • Ran CI failure triage from the linked run/job logs.
  • Addressed parser review feedback (MCP filter case handling, MCP header example fix, nested-import lenient parsing + regression test).
  • Ran Go lint for pkg/parser and parser tests.
  • Updated wasm golden fixtures and verified with make test-wasm-golden.

@pelikhan
Collaborator

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 23, 2026

@copilot fix tests https://github.com/github/gh-aw/actions/runs/26346574528/job/77557351286?pr=34297, update wasm golden tests

Fixed in 72ef2f5.

  • Updated the failing TestApplyContainerPins expected node:lts-alpine digest in pkg/workflow/docker_pin_test.go.
  • Updated Node wasm golden normalization in scripts/test-wasm-golden.mjs to match Go wasm golden normalization for Copilot default model fallback.

Validated locally with:

  • go test ./pkg/workflow -run TestApplyContainerPins -v
  • node scripts/test-wasm-golden.mjs
  • make test-wasm-golden

@pelikhan
Collaborator

@copilot make recompile

Contributor Author

Copilot AI commented May 24, 2026

@copilot make recompile

Ran make recompile on 72ef2f5; it completed successfully and produced no file changes (working tree remains clean).

pelikhan merged commit f2738c9 into main May 24, 2026
27 checks passed
pelikhan deleted the copilot/lint-monster-refactor-long-functions branch May 24, 2026 00:04

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[lint-monster] Refactor long functions in pkg/parser package (80+ violations)

3 participants