Reduce duplicate frontmatter scanning in ParseWorkflow by Copilot · Pull Request #34819 · github/gh-aw

Copilot · 2026-05-26T01:14:03Z

ParseWorkflow showed a benchmark regression in the workflow parsing phase. This change trims repeated work in frontmatter extraction so the parser does less per-file scanning on the hot path.

What changed
- Consolidate frontmatter metadata extraction into a single pass.
- Build FrontmatterLines and top-level FieldLines together instead of scanning the same YAML twice.
- Keep existing parsing behavior intact, including blank-line preservation and top-level key line attribution.
Why this matters
- ParseWorkflowFile always goes through frontmatter extraction.
- Removing a redundant pass reduces steady-state parsing overhead without changing schema validation or workflow semantics.
Behavior preserved
- Blank lines before the closing delimiter remain represented in FrontmatterLines.
- Comment lines are still ignored for top-level field mapping.
- First top-level key occurrence still wins for FieldLines.

Example

frontmatterYAML := content[searchStart:frontmatterEndStart]
frontmatterLines, fieldLines := extractFrontmatterMetadata(frontmatterYAML, frontmatterStartLine)

Targeted coverage
- Add a focused regression test covering mixed frontmatter content:
  - top-level keys
  - blank lines
  - comments
  - absolute line-number mapping

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>

Copilot

Pull request overview

This PR optimizes workflow frontmatter parsing by consolidating frontmatter line extraction and top-level field line attribution into a single metadata pass, reducing redundant scanning work during ParseWorkflow/ParseWorkflowFile parsing.

Changes:

Replaced separate frontmatter line splitting and top-level key scanning with a combined extractFrontmatterMetadata pass.
Updated ExtractFrontmatterFromContent to use the combined metadata output directly.
Added a regression test to ensure blank-line preservation and correct absolute line-number mapping for top-level fields.

Show a summary per file

File	Description
pkg/parser/frontmatter_content.go	Consolidates frontmatter line capture + top-level key line attribution into `extractFrontmatterMetadata` and wires it into `ExtractFrontmatterFromContent`.
pkg/parser/frontmatter_field_lines_test.go	Adds coverage for preserving blank lines and mapping top-level field lines correctly in mixed frontmatter content.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (1)

pkg/parser/frontmatter_content.go:150

strings.Count(frontmatterYAML, "\n") adds an extra full pass over the frontmatter text. Since this code is on the hot path, consider avoiding the pre-count (or using a small fixed initial capacity) so metadata extraction can stay truly single-pass over frontmatterYAML.

	lines := make([]string, 0, strings.Count(frontmatterYAML, "\n")+1)
	fieldLines := make(map[string]int)

Files reviewed: 2/2 changed files
Comments generated: 2

-			}
-		}
-	}
+	_, fieldLines := extractFrontmatterMetadata(yamlContent, frontmatterStart)


+func extractFrontmatterMetadata(frontmatterYAML string, frontmatterStart int) ([]string, map[string]int) {
+	if frontmatterYAML == "" {
+		return []string{}, map[string]int{}
+	}


github-actions · 2026-05-26T01:40:03Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #34819 does not have the 'implementation' label and has only 52 new lines in business logic directories (below the 100-line threshold).

github-actions · 2026-05-26T01:40:04Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions · 2026-05-26T01:40:12Z

🧪 Test Quality Sentinel completed test quality analysis.

github-actions · 2026-05-26T01:40:14Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-05-26T01:44:59Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 95/100 — Excellent

Analyzed 1 test: 1 design (behavioral contract), 0 implementation, 0 guideline violations — 2 minor assertion messages missing.

📊 Metrics & Test Classification (1 test analyzed)

Metric	Value
New/modified tests analyzed	1
✅ Design tests (behavioral contracts)	1 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	1 (100%)
Duplicate test clusters	0
Test inflation detected	No (9 test lines added vs. 43 production lines)
🚨 Coding-guideline violations	0 (no mock libraries; build tag present)

Test Classification Details

Test	File	Classification	Issues Detected
`preserves blank frontmatter lines while tracking top level fields`	`pkg/parser/frontmatter_field_lines_test.go:106`	✅ Design	2 `assert.Equal` calls missing descriptive message arguments (minor)

Language Support

Tests analyzed:

🐹 Go (*_test.go): 1 test — unit (//go:build !integration)
�� JavaScript (*.test.cjs, *.test.js): 0 tests

⚠️ Minor Style Notes — Not a Failure (1 issue)

⚠️ `preserves blank frontmatter lines while tracking top level fields` (`pkg/parser/frontmatter_field_lines_test.go:110–111`)

Issue: Two assert.Equal calls are missing descriptive message arguments. The project guideline asks for a message on every assertion so a failing test output is self-explanatory without reading the source.

Suggested fix:

assert.Equal(t, []string{"engine: copilot", "", "# comment", "on: push"}, result.FrontmatterLines, "FrontmatterLines should preserve blank lines and comments")
assert.Equal(t, map[string]int{"engine": 2, "on": 5}, result.FieldLines, "FieldLines should record 1-based line numbers of top-level keys")

This is a minor style note, not a blocking issue.

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). The new sub-test verifies that ExtractFrontmatterFromContent correctly preserves blank lines and YAML comment lines in FrontmatterLines and maps top-level keys to their correct 1-based line numbers in FieldLines — a clear behavioral contract covering an edge case (blank/comment lines in frontmatter).

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1M · ◷

github-actions

Test Quality Sentinel: 95/100. Test quality is excellent - 0% of new tests are implementation tests (threshold: 30%).

github-actions

Review: Reduce duplicate frontmatter scanning

The consolidation is correct — single-pass extractFrontmatterMetadata produces identical results to the two separate passes it replaces, and the new test confirms blank-line/comment/field-line parity. No correctness or concurrency issues found in the changed lines.

Findings summary

Both unresolved existing comments identify the two real issues with this PR:

extractTopLevelFieldLines now allocates a full []string it immediately discards — the wrapper exists only for the test in frontmatter_field_lines_test.go, so this wasted allocation is on the test path only. Still, a PR whose stated goal is reducing parsing overhead should not leave a caller that is strictly more expensive than before. Consider inlining or eliminating the wrapper.
splitFrontmatterLines is now dead code — nothing in production or tests calls it after this change. It should be deleted to avoid drift.

All other candidate issues (relLine arithmetic, trailing-newline trim vs. fieldLines correlation, scope of frontmatterStartLine) are either pre-existing or not actionable in the changed lines.

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 1.8M

github-actions

Skills-Based Review 🧠

Applied /zoom-out and /tdd — no blocking issues, but two cleanup opportunities and a test coverage gap worth addressing.

📋 Key Themes & Highlights

Key Themes

Dead code left behind: splitFrontmatterLines (lines 131–142) is no longer called anywhere — it should be deleted.
Thin wrapper: extractTopLevelFieldLines is now a one-liner that only exists to satisfy its own test; consider collapsing it.
Test coverage: The added regression test is well-targeted, but the trailing-no-newline branch and the duplicate-key dedup path in extractFrontmatterMetadata are uncovered.

Positive Highlights

✅ Clean single-pass consolidation — extractFrontmatterMetadata is easy to reason about and neatly replaces two separate scans.
✅ strings.IndexByte micro-optimisation is a sensible improvement over strings.Index for a single-byte delimiter.
✅ Capacity hint (strings.Count(..., "\n")+1) avoids slice reallocations on the hot path — good attention to detail.
✅ frontmatterStartLine const promotion is clean and the value is now easier to find.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 2.6M

Comments that could not be inline-anchored

pkg/parser/frontmatter_content.go:131

[/zoom-out] splitFrontmatterLines is now unreachable dead code — ExtractFrontmatterFromContent was its only call site, and that now delegates to extractFrontmatterMetadata. The function can be deleted.

<details>
<summary>💡 Suggested cleanup</summary>

Remove lines 131–142 (splitFrontmatterLines). Keeping it silently diverges from the new single-pass contract and will confuse future readers about which path is canonical.

</details>

pkg/parser/frontmatter_content.go:34

[/zoom-out] extractTopLevelFieldLines is now a one-liner wrapper around extractFrontmatterMetadata that discards the []string return value. Its only remaining caller is the direct unit test (frontmatter_field_lines_test.go:76). Consider either removing the wrapper and updating the test to call extractFrontmatterMetadata directly, or documenting why the indirection is intentional.

<details>
<summary>💡 Options</summary>

Option A – remove the wrapper (preferred): update `front…

pkg/parser/frontmatter_field_lines_test.go:111

[/tdd] The regression test covers the happy path well, but two boundary conditions for the new extractFrontmatterMetadata logic are unexercised: (1) input with no trailing newline and (2) a duplicate top-level key (second occurrence should be ignored).

<details>
<summary>💡 Suggested additions</summary>

t.Run(&quot;no trailing newline preserves all lines&quot;, func(t *testing.T) {
    // frontmatterYAML without a trailing \n — the trim branch at line 175
    // must NOT remove the …

</details>

pelikhan · 2026-05-26T01:49:11Z

@copilot debug https://github.com/github/gh-aw/actions/runs/26427072897/job/77793663918

Copilot · 2026-05-26T02:01:03Z

@copilot debug https://github.com/github/gh-aw/actions/runs/26427072897/job/77793663918

The failing job is test in the CGO workflow, and the failure is unrelated to this parser change. It failed in pkg/cli on TestRenderCrossRunReportMarkdown_IncludesNewSections because the captured output did not contain Token Trend.

I checked the scope of the last two commits: they only touch pkg/parser/frontmatter_content.go and pkg/parser/frontmatter_field_lines_test.go.

I also reproduced locally:

go test ./pkg/cli -run '^TestRenderCrossRunReportMarkdown_IncludesNewSections$' -v ✅
go test ./pkg/cli -run '^Test' -count=10 ✅
go test ./pkg/cli -run '^TestRenderCrossRunReportMarkdown_IncludesNewSections$' -parallel=32 -shuffle=on -count=100 ✅

So this looks like an unrelated/flaky pkg/cli stdout-capture test failure rather than a regression from Reduce duplicate frontmatter scanning in ParseWorkflow.

Initial plan

9b384e5

Copilot AI assigned Copilot and gh-aw-bot May 26, 2026

Copilot started work on behalf of gh-aw-bot May 26, 2026 01:14 View session

Copilot AI linked an issue May 26, 2026 that may be closed by this pull request

[performance] Regression in ParseWorkflow: +34.3% slower #34694

Closed

github-actions Bot mentioned this pull request May 26, 2026

[aw] No-Op Runs #34821

Open

perf: streamline frontmatter metadata extraction

891126a

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix performance regression in ParseWorkflow~~ Reduce duplicate frontmatter scanning in ParseWorkflow May 26, 2026

Copilot AI requested a review from gh-aw-bot May 26, 2026 01:25

Copilot finished work on behalf of gh-aw-bot May 26, 2026 01:25

pelikhan marked this pull request as ready for review May 26, 2026 01:27

Copilot AI review requested due to automatic review settings May 26, 2026 01:27

Copilot started reviewing on behalf of pelikhan May 26, 2026 01:28 View session

Copilot AI reviewed May 26, 2026

View reviewed changes

github-actions Bot mentioned this pull request May 26, 2026

[Contribution Check Report] Contribution Check — 2026-05-26 #34830

Open

github-actions Bot approved these changes May 26, 2026

View reviewed changes

github-actions Bot reviewed May 26, 2026

View reviewed changes

Copilot started work on behalf of pelikhan May 26, 2026 01:49 View session

Copilot AI requested a review from pelikhan May 26, 2026 02:01

Copilot finished work on behalf of pelikhan May 26, 2026 02:01

pelikhan merged commit b20b030 into main May 26, 2026
83 of 94 checks passed

pelikhan deleted the copilot/performance-regression-fix-parseworkflow branch May 26, 2026 02:15

Conversation

Copilot AI commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

github-actions Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 26, 2026

🧪 Test Quality Sentinel Report

Test Classification Details

Language Support

⚠️ preserves blank frontmatter lines while tracking top level fields (pkg/parser/frontmatter_field_lines_test.go:110–111)

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Review: Reduce duplicate frontmatter scanning

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

Comments that could not be inline-anchored

Uh oh!

pelikhan commented May 26, 2026

Uh oh!

Copilot AI commented May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Copilot AI commented May 26, 2026 •

edited

Loading

github-actions Bot commented May 26, 2026 •

edited

Loading

github-actions Bot commented May 26, 2026 •

edited

Loading

github-actions Bot commented May 26, 2026 •

edited

Loading

github-actions Bot commented May 26, 2026 •

edited

Loading

⚠️ `preserves blank frontmatter lines while tracking top level fields` (`pkg/parser/frontmatter_field_lines_test.go:110–111`)