fix: use 1s benchtime for CLI helper benchmarks to eliminate false regressions#35763
Merged
pelikhan merged 2 commits intoMay 29, 2026
Merged
Conversation
…gressions The bench-performance Makefile target was using -benchtime=3x (3 iterations) for the CLI helper benchmarks (FindIncludesInContent, ExtractWorkflowNameFromFile). With only 3 iterations, b.Loop() overhead and OS scheduling jitter dominate the measurement for fast functions, inflating results ~22x (3366 ns/op vs true 160 ns/op). This caused the daily-cli-performance CI workflow to report a false regression: Historical: 2413 ns/op (also inflated by 3x timing) Current: 3516 ns/op (inflated; true value is ~160 ns/op) Fix: change the CLI helper benchmark line to -benchtime=1s, which runs for at least 1 second (~7M iterations for FindIncludesInContent), giving stable and accurate ns/op measurements consistent across runs. Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Fix performance regression in FindIncludesInContent
fix: use 1s benchtime for CLI helper benchmarks to eliminate false regressions
May 29, 2026
Closed
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes false-positive regression alerts in the daily CLI performance workflow by switching the CLI helper benchmarks from a fixed 3-iteration count to a 1-second duration, which provides stable measurements for sub-microsecond functions.
Changes:
- Update
bench-performancetarget's CLI helper benchmark invocation from-benchtime=3xto-benchtime=1s.
Show a summary per file
| File | Description |
|---|---|
| Makefile | Increases sample size for ExtractWorkflowNameFromFile and FindIncludesInContent benchmarks to eliminate measurement noise. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 1/1 changed files
- Comments generated: 0
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The daily-cli-performance workflow was reporting a false +45.7% regression in
BenchmarkFindIncludesInContent(3,516 ns/op vs historical 2,413 ns/op). Both values were fabricated by measurement noise—the function's true steady-state cost is ~160 ns/op.Root cause
bench-performanceran CLI helper benchmarks with-benchtime=3x(3 iterations). For a ~160 ns/op function,b.Loop()setup overhead and OS scheduling jitter dominate at that sample count, inflating measurements ~22×:FindIncludesInContentExtractWorkflowNameFromFile3x(before)1s(after)Fix
Changed
-benchtime=3x→-benchtime=1sfor the CLI helper benchmark line inMakefile. The workflow benchmarks inpkg/workflow(slow, O(100ms+) each) retain3xto keep CI runtime bounded.With
1s,FindIncludesInContentaccumulates ~7M iterations yielding measurements stable to <1% variance across runs.