Skip to content

Reduce duplicate frontmatter scanning in ParseWorkflow#34819

Merged
pelikhan merged 2 commits into
mainfrom
copilot/performance-regression-fix-parseworkflow
May 26, 2026
Merged

Reduce duplicate frontmatter scanning in ParseWorkflow#34819
pelikhan merged 2 commits into
mainfrom
copilot/performance-regression-fix-parseworkflow

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 26, 2026

ParseWorkflow showed a benchmark regression in the workflow parsing phase. This change trims repeated work in frontmatter extraction so the parser does less per-file scanning on the hot path.

  • What changed

    • Consolidate frontmatter metadata extraction into a single pass.
    • Build FrontmatterLines and top-level FieldLines together instead of scanning the same YAML twice.
    • Keep existing parsing behavior intact, including blank-line preservation and top-level key line attribution.
  • Why this matters

    • ParseWorkflowFile always goes through frontmatter extraction.
    • Removing a redundant pass reduces steady-state parsing overhead without changing schema validation or workflow semantics.
  • Behavior preserved

    • Blank lines before the closing delimiter remain represented in FrontmatterLines.
    • Comment lines are still ignored for top-level field mapping.
    • First top-level key occurrence still wins for FieldLines.
  • Example

    frontmatterYAML := content[searchStart:frontmatterEndStart]
    frontmatterLines, fieldLines := extractFrontmatterMetadata(frontmatterYAML, frontmatterStartLine)
  • Targeted coverage

    • Add a focused regression test covering mixed frontmatter content:
      • top-level keys
      • blank lines
      • comments
      • absolute line-number mapping

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix performance regression in ParseWorkflow Reduce duplicate frontmatter scanning in ParseWorkflow May 26, 2026
Copilot AI requested a review from gh-aw-bot May 26, 2026 01:25
@pelikhan pelikhan marked this pull request as ready for review May 26, 2026 01:27
Copilot AI review requested due to automatic review settings May 26, 2026 01:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes workflow frontmatter parsing by consolidating frontmatter line extraction and top-level field line attribution into a single metadata pass, reducing redundant scanning work during ParseWorkflow/ParseWorkflowFile parsing.

Changes:

  • Replaced separate frontmatter line splitting and top-level key scanning with a combined extractFrontmatterMetadata pass.
  • Updated ExtractFrontmatterFromContent to use the combined metadata output directly.
  • Added a regression test to ensure blank-line preservation and correct absolute line-number mapping for top-level fields.
Show a summary per file
File Description
pkg/parser/frontmatter_content.go Consolidates frontmatter line capture + top-level key line attribution into extractFrontmatterMetadata and wires it into ExtractFrontmatterFromContent.
pkg/parser/frontmatter_field_lines_test.go Adds coverage for preserving blank lines and mapping top-level field lines correctly in mixed frontmatter content.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (1)

pkg/parser/frontmatter_content.go:150

  • strings.Count(frontmatterYAML, "\n") adds an extra full pass over the frontmatter text. Since this code is on the hot path, consider avoiding the pre-count (or using a small fixed initial capacity) so metadata extraction can stay truly single-pass over frontmatterYAML.
	lines := make([]string, 0, strings.Count(frontmatterYAML, "\n")+1)
	fieldLines := make(map[string]int)
  • Files reviewed: 2/2 changed files
  • Comments generated: 2

}
}
}
_, fieldLines := extractFrontmatterMetadata(yamlContent, frontmatterStart)
Comment on lines +144 to +147
func extractFrontmatterMetadata(frontmatterYAML string, frontmatterStart int) ([]string, map[string]int) {
if frontmatterYAML == "" {
return []string{}, map[string]int{}
}
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #34819 does not have the 'implementation' label and has only 52 new lines in business logic directories (below the 100-line threshold).

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

PR Code Quality Reviewer completed the code quality review.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

🧪 Test Quality Sentinel completed test quality analysis.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions
Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 95/100 — Excellent

Analyzed 1 test: 1 design (behavioral contract), 0 implementation, 0 guideline violations — 2 minor assertion messages missing.

📊 Metrics & Test Classification (1 test analyzed)
Metric Value
New/modified tests analyzed 1
✅ Design tests (behavioral contracts) 1 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 1 (100%)
Duplicate test clusters 0
Test inflation detected No (9 test lines added vs. 43 production lines)
🚨 Coding-guideline violations 0 (no mock libraries; build tag present)

Test Classification Details

Test File Classification Issues Detected
preserves blank frontmatter lines while tracking top level fields pkg/parser/frontmatter_field_lines_test.go:106 ✅ Design 2 assert.Equal calls missing descriptive message arguments (minor)

Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 1 test — unit (//go:build !integration)
  • �� JavaScript (*.test.cjs, *.test.js): 0 tests
⚠️ Minor Style Notes — Not a Failure (1 issue)

⚠️ preserves blank frontmatter lines while tracking top level fields (pkg/parser/frontmatter_field_lines_test.go:110–111)

Issue: Two assert.Equal calls are missing descriptive message arguments. The project guideline asks for a message on every assertion so a failing test output is self-explanatory without reading the source.

Suggested fix:

assert.Equal(t, []string{"engine: copilot", "", "# comment", "on: push"}, result.FrontmatterLines, "FrontmatterLines should preserve blank lines and comments")
assert.Equal(t, map[string]int{"engine": 2, "on": 5}, result.FieldLines, "FieldLines should record 1-based line numbers of top-level keys")

This is a minor style note, not a blocking issue.

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). The new sub-test verifies that ExtractFrontmatterFromContent correctly preserves blank lines and YAML comment lines in FrontmatterLines and maps top-level keys to their correct 1-based line numbers in FieldLines — a clear behavioral contract covering an edge case (blank/comment lines in frontmatter).

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1M ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test Quality Sentinel: 95/100. Test quality is excellent - 0% of new tests are implementation tests (threshold: 30%).

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Reduce duplicate frontmatter scanning

The consolidation is correct — single-pass extractFrontmatterMetadata produces identical results to the two separate passes it replaces, and the new test confirms blank-line/comment/field-line parity. No correctness or concurrency issues found in the changed lines.

Findings summary

Both unresolved existing comments identify the two real issues with this PR:

  1. extractTopLevelFieldLines now allocates a full []string it immediately discards — the wrapper exists only for the test in frontmatter_field_lines_test.go, so this wasted allocation is on the test path only. Still, a PR whose stated goal is reducing parsing overhead should not leave a caller that is strictly more expensive than before. Consider inlining or eliminating the wrapper.

  2. splitFrontmatterLines is now dead code — nothing in production or tests calls it after this change. It should be deleted to avoid drift.

All other candidate issues (relLine arithmetic, trailing-newline trim vs. fieldLines correlation, scope of frontmatterStartLine) are either pre-existing or not actionable in the changed lines.

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 1.8M

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /zoom-out and /tdd — no blocking issues, but two cleanup opportunities and a test coverage gap worth addressing.

📋 Key Themes & Highlights

Key Themes

  • Dead code left behind: splitFrontmatterLines (lines 131–142) is no longer called anywhere — it should be deleted.
  • Thin wrapper: extractTopLevelFieldLines is now a one-liner that only exists to satisfy its own test; consider collapsing it.
  • Test coverage: The added regression test is well-targeted, but the trailing-no-newline branch and the duplicate-key dedup path in extractFrontmatterMetadata are uncovered.

Positive Highlights

  • ✅ Clean single-pass consolidation — extractFrontmatterMetadata is easy to reason about and neatly replaces two separate scans.
  • strings.IndexByte micro-optimisation is a sensible improvement over strings.Index for a single-byte delimiter.
  • ✅ Capacity hint (strings.Count(..., "\n")+1) avoids slice reallocations on the hot path — good attention to detail.
  • frontmatterStartLine const promotion is clean and the value is now easier to find.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 2.6M

Comments that could not be inline-anchored

pkg/parser/frontmatter_content.go:131

[/zoom-out] splitFrontmatterLines is now unreachable dead code — ExtractFrontmatterFromContent was its only call site, and that now delegates to extractFrontmatterMetadata. The function can be deleted.

<details>
<summary>💡 Suggested cleanup</summary>

Remove lines 131–142 (splitFrontmatterLines). Keeping it silently diverges from the new single-pass contract and will confuse future readers about which path is canonical.

</details>

pkg/parser/frontmatter_content.go:34

[/zoom-out] extractTopLevelFieldLines is now a one-liner wrapper around extractFrontmatterMetadata that discards the []string return value. Its only remaining caller is the direct unit test (frontmatter_field_lines_test.go:76). Consider either removing the wrapper and updating the test to call extractFrontmatterMetadata directly, or documenting why the indirection is intentional.

<details>
<summary>💡 Options</summary>

Option A – remove the wrapper (preferred): update `front…

pkg/parser/frontmatter_field_lines_test.go:111

[/tdd] The regression test covers the happy path well, but two boundary conditions for the new extractFrontmatterMetadata logic are unexercised: (1) input with no trailing newline and (2) a duplicate top-level key (second occurrence should be ignored).

<details>
<summary>💡 Suggested additions</summary>

t.Run(&quot;no trailing newline preserves all lines&quot;, func(t *testing.T) {
    // frontmatterYAML without a trailing \n — the trim branch at line 175
    // must NOT remove the …

</details>

@pelikhan
Copy link
Copy Markdown
Collaborator

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 26, 2026

@copilot debug https://github.com/github/gh-aw/actions/runs/26427072897/job/77793663918

The failing job is test in the CGO workflow, and the failure is unrelated to this parser change. It failed in pkg/cli on TestRenderCrossRunReportMarkdown_IncludesNewSections because the captured output did not contain Token Trend.

I checked the scope of the last two commits: they only touch pkg/parser/frontmatter_content.go and pkg/parser/frontmatter_field_lines_test.go.

I also reproduced locally:

  • go test ./pkg/cli -run '^TestRenderCrossRunReportMarkdown_IncludesNewSections$' -v
  • go test ./pkg/cli -run '^Test' -count=10
  • go test ./pkg/cli -run '^TestRenderCrossRunReportMarkdown_IncludesNewSections$' -parallel=32 -shuffle=on -count=100

So this looks like an unrelated/flaky pkg/cli stdout-capture test failure rather than a regression from Reduce duplicate frontmatter scanning in ParseWorkflow.

Copilot AI requested a review from pelikhan May 26, 2026 02:01
@pelikhan pelikhan merged commit b20b030 into main May 26, 2026
83 of 94 checks passed
@pelikhan pelikhan deleted the copilot/performance-regression-fix-parseworkflow branch May 26, 2026 02:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[performance] Regression in ParseWorkflow: +34.3% slower

4 participants