[repository-quality] 🎯 Repository Quality Improvement Report - Benchmark Infrastructure & Regression Prevention #38628

2026-06-11T14:31:00Z

github-actions[bot]
Bot Jun 11, 2026

Analysis Date: 2026-06-11
Focus Area: Benchmark Infrastructure Completeness & Regression Prevention
Strategy Type: Custom (random=24, 60% tier)
Custom Area: Yes — 93 benchmark functions exist with a CI job, but no automated regression detection.

Executive Summary

The gh-aw codebase has 93 benchmark functions across 10 files and a CI bench job, yet no regression-prevention loop. benchstat is cited in four places (Makefile, CLI performance docs, daily workflow, developer instructions) but never installed or run. The CI job saves a 14-day artifact without any baseline comparison or failure gate.

Two additional gaps stand out. First, pkg/parser/frontmatter_benchmark_test.go ends with two comment-only stubs (BenchmarkValidateSchema, BenchmarkValidateSchema_Complex): planned benchmarks that were never implemented. Second, 77/93 benchmarks (83%) omit b.ReportAllocs(), so allocation regressions are invisible in CI output unless the run passes -benchmem.

Full Analysis Report

Metrics Collected:

| Metric | Value | Status |
|---|---|---|
| Total benchmark functions | 93 | ✅ |
| Benchmarks with `b.ReportAllocs()` | 16/93 (17%) | ❌ |
| `benchstat` automation in CI | 0 steps | ❌ |
| Baseline comparison in CI | None | ❌ |
| Benchmark artifact retention | 14 days | ⚠️ |
| Unimplemented benchmark stubs | 2 | ❌ |
| CI uses `bench-performance` target | No (uses `make bench`) | ⚠️ |

Strengths

93 benchmarks covering compiler, expressions, regex, processing, MCP, and frontmatter
bench-performance target pre-selects 14 critical benchmarks for daily monitoring
compiler_performance_benchmark_test.go is exemplary: b.ReportAllocs(), b.ResetTimer(), documented targets

Areas for Improvement

No regression gate: CI displays output but never fails on degradation
benchstat not installed: Referenced in Makefile:177, cli-performance.md:315, daily-cli-performance.md:443 but absent from all compiled CI steps
Wrong target in CI: cgo.yml uses make bench (93 benchmarks) not make bench-performance (14 critical)
Two comment-only stubs in pkg/parser/frontmatter_benchmark_test.go: schema validation hot-path unmeasured
83% omit b.ReportAllocs(), including processing (7), expressions (11), regex (6), compiler (7), mcp (1), frontmatter (4), logs (10)
14-day retention: Baselines expire before monthly review cycles

🤖 Tasks for Copilot Agent

NOTE TO PLANNER AGENT: Split the following tasks into individual work items.

Task 1: Implement Missing Schema Validation Benchmarks

Priority: High | Effort: Small | Code Region: pkg/parser/frontmatter_benchmark_test.go

Implement the two comment-only stubs at the end of the file as real benchmark functions. Parse frontmatter once outside the loop, benchmark schema validation inside b.Loop(), add b.ReportAllocs() and b.ResetTimer().

Acceptance Criteria:

BenchmarkValidateSchema implemented with simple fixture and b.Loop() body calling schema validation
BenchmarkValidateSchema_Complex implemented with complex fixture (MCP, imports, tools)
Both include b.ReportAllocs() and b.ResetTimer() before the loop
go test -bench=BenchmarkValidateSchema -benchmem ./pkg/parser/ succeeds with ns/op and allocs/op

In `pkg/parser/frontmatter_benchmark_test.go`, implement the two stubs at the end of the file.

`BenchmarkValidateSchema`: simple fixture (on: push, engine: claude), call `ExtractFrontmatterFromContent` outside the loop, call schema validation inside `b.Loop()`. Add `b.ReportAllocs()` and `b.ResetTimer()`.

`BenchmarkValidateSchema_Complex`: complex fixture from `BenchmarkParseFrontmatter_Complex`. Look at `pkg/parser/schema_test.go` for the correct validation function to call.
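
The structure described above could look roughly like the following sketch. The `validateSchema` function and the map fixture are hypothetical stand-ins, since the real validation entry point has to be taken from `pkg/parser/schema_test.go` and the real fixture would come from `ExtractFrontmatterFromContent`. Note that `b.Loop()` (Go 1.24+) already resets the timer on its first call, so `b.ResetTimer()` is redundant here and is kept only because the acceptance criteria ask for it. The `main` function uses `testing.Benchmark` purely so the sketch runs as a standalone program.

```go
package main

import (
	"errors"
	"fmt"
	"testing"
)

// validateSchema is a hypothetical stand-in for the real validation function
// in pkg/parser; it only checks that two required keys are present.
func validateSchema(frontmatter map[string]any) error {
	for _, key := range []string{"on", "engine"} {
		if _, ok := frontmatter[key]; !ok {
			return errors.New("missing required key: " + key)
		}
	}
	return nil
}

// BenchmarkValidateSchema follows the shape Task 1 asks for:
// setup outside the loop, validation inside b.Loop().
func BenchmarkValidateSchema(b *testing.B) {
	// Build the fixture once, outside the timed loop. The real benchmark
	// would call ExtractFrontmatterFromContent here instead.
	frontmatter := map[string]any{"on": "push", "engine": "claude"}

	b.ReportAllocs()
	b.ResetTimer() // redundant with b.Loop(); kept to match the acceptance criteria

	for b.Loop() {
		if err := validateSchema(frontmatter); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	// testing.Benchmark lets the sketch run as a plain program; in the
	// repository the function would live in a _test.go file instead.
	result := testing.Benchmark(BenchmarkValidateSchema)
	fmt.Printf("%d iterations, %d allocs/op\n", result.N, result.AllocsPerOp())
}
```

The `_Complex` variant would differ only in the fixture built before the loop.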

Task 2: Add `b.ReportAllocs()` to All Benchmark Functions

Priority: High | Effort: Small

Add b.ReportAllocs() to every benchmark function missing it across 7 files: processing_benchmark_test.go (7), expressions_benchmark_test.go (11), regex_benchmark_test.go (6), compiler_benchmark_test.go (7), mcp_benchmark_test.go (1), logs_benchmark_test.go (10), frontmatter_benchmark_test.go (4 existing).

Acceptance Criteria:

Every func Benchmark* in the listed files has b.ReportAllocs() before b.Loop()
go test -bench=. -benchtime=1x -run=^$ ./pkg/workflow/ ./pkg/parser/ ./pkg/cli/ shows allocs/op for all (run without -benchmem, which prints allocation columns whether or not b.ReportAllocs() is called)

Code Region: The 7 files above in pkg/workflow/, pkg/cli/, and pkg/parser/

Add `b.ReportAllocs()` to each benchmark function in the 7 listed files. If the function already calls `b.ResetTimer()`, place `b.ReportAllocs()` on the line immediately before it; otherwise make it the first line of the function body. Skip functions that already have it.

Task 3: Add benchstat Baseline Comparison to CI

Priority: High | Effort: Medium | Code Region: .github/workflows/cgo.yml bench job (~lines 906–981)

After the Run benchmarks step, add steps to: restore bench_baseline.txt from the cache; install benchstat; compare against the baseline and append the delta to $GITHUB_STEP_SUMMARY; fail on a >15% regression in BenchmarkCompile*/BenchmarkParse*/BenchmarkValidation*; and save the new baseline (main only). Handle the no-baseline case gracefully.

Acceptance Criteria:

benchstat installed via go install golang.org/x/perf/cmd/benchstat@latest
Cache restore/save steps using actions/cache with key bench-baseline-${{ runner.os }}
Comparison output appended to step summary
Job fails on >15% ns/op regression in named critical benchmarks
No-baseline case exits 0

Modify the `bench` job in `.github/workflows/cgo.yml`. After `Run benchmarks`:
1. Restore `bench_baseline.txt` from Actions cache (key: `bench-baseline-${{ runner.os }}`)
2. Install benchstat: `go install golang.org/x/perf/cmd/benchstat@latest`
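3. If baseline exists: run `benchstat bench_baseline.txt bench_results.txt`, append the output to `$GITHUB_STEP_SUMMARY`, exit 1 if any BenchmarkCompile*, BenchmarkParse*, or BenchmarkValidation* shows >15% ns/op slowdown
4. Save `bench_results.txt` as new baseline to cache (main branch only). Cache entries are immutable once written, so the save key needs a unique suffix (for example `-${{ github.sha }}`), with the key above used as a `restore-keys` prefix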

Task 4: Switch CI to `bench-performance` Target

Priority: Medium | Effort: Small | Code Region: .github/workflows/cgo.yml bench job Run benchmarks step

Change make bench to make bench-performance; update summary grep; keep full suite available on workflow_dispatch.

Acceptance Criteria:

Run benchmarks step uses make bench-performance
Summary grep updated to match bench-performance output
Full suite available via workflow_dispatch input full_bench: true

Change `make bench` to `make bench-performance` in the bench CI job. Update the summary grep to: CompileSimpleWorkflow|CompileComplexWorkflow|CompileMCPWorkflow|ParseWorkflow|Validation|YAMLGeneration. Add conditional step for full suite when `github.event.inputs.full_bench == 'true'`.

Task 5: Extend Benchmark Artifact Retention to 90 Days

Priority: Low | Effort: Small | Code Region: .github/workflows/cgo.yml ~line 976

Change retention-days: 14 to retention-days: 90 and update the adjacent comment.

Acceptance Criteria:

retention-days changed from 14 to 90
Comment updated to "90 days enables quarterly performance trend analysis"

📊 Historical Context

Previous Focus Areas

| Date | Focus Area | Type |
|---|---|---|
| 2026-06-08 | Compiler Error Hint Rendering Gap | Custom |
| 2026-06-09 | Context Propagation & Process Cancellability | Reuse |
| 2026-06-10 | Security | Standard |

📈 Success Metrics

Regression detection: 0 → benchstat on every main push
b.ReportAllocs() coverage: 17% (16/93) → 100% (93/93)
Schema validation benchmarks: 0 → 2 implemented
Artifact retention: 14-day → 90-day

References: Run 27352874353 · CI bench job: cgo.yml lines 906–981

Generated by ⚡ Repository Quality Improvement Agent · 1K AIC · ⌖ 23.9 AIC · ⊞ 22.4K · ◷

expires on Jun 12, 2026, 6:31 AM UTC-08:00

2026-06-11T15:09:53Z

github-actions[bot]
Bot Jun 11, 2026
Author

Smoke ping from run §27355745225. ✅

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

accounts.google.com
android.clients.google.com
clients2.google.com
contentautofill.googleapis.com
safebrowsingohttpgateway.googleapis.com
www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

📰 BREAKING: Report filed by Smoke Copilot · 203.3 AIC · ⌖ 21.5 AIC · ◷
