perf: dedicated test project + workflow (ubuntu-only)#89
Merged
Conversation
Split performance work into its own csproj and CI workflow so the
correctness-focused CI stays fast across all three OSes and the perf
gate stops being a silent no-op.
New layout
tests/Terminal.Gui.Editor.PerformanceTests/
PerformanceSmokeTests.cs (moved from Editor.Tests/)
Terminal.Gui.Editor.PerformanceTests.csproj
.github/workflows/perf.yml (ubuntu-latest only)
- Release build
- Run PerformanceTests (stopwatch smoke tests)
- Run benchmarks/compare-baseline.sh (VisualLineBuild gate)
- workflow_dispatch with `full-suite: true` runs the full
BenchmarkDotNet matrix and uploads results as an artifact —
the operator path for refreshing baseline.json (#78).
.github/workflows/ci.yml
- Perf step removed; comment points to perf.yml.
Why a separate workflow
- Windows / macOS GitHub-hosted runners share hosts with neighbour
VMs; wall-time assertions there are too noisy to gate on. Linux
runners are still noisy but consistent enough for a 3× threshold.
- The full BDN suite takes minutes; CI for correctness needs to be
fast. Per-PR perf only runs the focused VisualLineBuild filter.
Fix while we're here: compare-baseline.sh used `--job ShortRun`,
which BenchmarkDotNet rejects ("invalid base job"). BDN exited
without running any benchmarks, the script saw no JSON report,
warned "skipping comparison", and exited 0. So the perf gate has
been a silent no-op since PR #53 — neither the >3× fail nor the
<0.8× celebrate could ever fire (see issue #78, PR #77 didn't
trigger the celebration for exactly this reason). Switched to
`--job short` (the lowercase form BDN accepts) and added a comment
documenting the history.
Tests on this branch (local Release):
Text.Tests: 230 passing
Editor.Tests: 87 passing (was 91; 4 perf tests moved out)
IntegrationTests: 108 passing
PerformanceTests: 4 passing (new project)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0fa8429031
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Reflects the project layout from this PR in the docs:
CLAUDE.md
- Adds tests/Terminal.Gui.Editor.PerformanceTests to the test
runner list, with the -c Release note + perf.yml pointer.
- "Testing tiers": now four projects with a workflow + OS column;
adds the "Performance gates" subsection that explains the two
layers (stopwatch smoke + BenchmarkDotNet baseline) and calls
out the `--job short` lowercase requirement.
specs/constitution.md
- §VI "Testing Tiers" table grows the fourth row + workflow + OS
matrix columns. Adds rationale for ubuntu-only and the manual
workflow_dispatch path for baseline.json refreshes.
specs/plan.md
- Repo-layout block adds tests/Terminal.Gui.Editor.PerformanceTests
and a benchmarks/ subtree. EditorBenchmarks placeholder removed
(it's not in the tree).
- DoD checkbox updated from "three test projects" to "four", with
the BenchmarkDotNet 3× baseline gate called out.
README.md
- Repo-layout adds benchmarks/ + examples/ rows.
- Build block adds the PerformanceTests run line, with the
ubuntu-latest-only / perf.yml pointer.
No behavior change; this is the doc tail of perf-suite-split.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Split performance work into its own csproj and CI workflow so the correctness-focused
ci.ymlstays fast on all three OSes, and so the perf gate stops being a silent no-op.Also fixes the
--job ShortRunbug that caused PR #77's perf wins to never trigger the celebration message (per issue #78 context).Layout
What changes
New
Terminal.Gui.Editor.PerformanceTestsprojectPerformanceSmokeTests.cs(stopwatch-based, 4 tests) moves out ofTerminal.Gui.Editor.Testsinto its own xUnit.v3 exe project. Namespace updated toTerminal.Gui.Editor.PerformanceTests. No source changes to the tests themselves.New
.github/workflows/perf.yml(ubuntu-latest only)Triggers on every push + every PR to
main/develop, plus a manualworkflow_dispatchfor full-suite runs. Steps:dotnet run -c Release --project tests/Terminal.Gui.Editor.PerformanceTests).benchmarks/compare-baseline.sh 3.0 0.8(focused*VisualLineBuild*filter; fails on >3× regression, celebrates on <0.8× improvement).full-suite: true, runs the full BenchmarkDotNet matrix (Scrolling,EndToEndScroll,CaretMovement,DocumentAccess, plus the gated set) and uploadsBenchmarkDotNet.Artifacts/as a 30-day artifact. That's the operator path for refreshingbaseline.json(issue Update baseline.json from CI after scroll-perf merge #78).Why ubuntu-only:
ci.yml: perf step removedThe old
Performance checkstep is gone; a comment in its place points atperf.yml. CI now ends afterTerminal.Gui.Editor.IntegrationTests.compare-baseline.sh:--job ShortRun→--job short(the actual bug)This script has been passing
--job ShortRunsince PR #53. BenchmarkDotNet rejects that — it only accepts lowercase names (default | dry | short | medium | long | verylong) — prints `The provided base job "ShortRun" is invalid` and exits without running anything. The script then sees no JSON report and falls into:…exit 0. Every PR between #53 and now has sailed through the gate without a single benchmark actually executing. Neither the `❌ REGRESSION` failure nor the `🎉 FASTER` celebration could ever fire. PR #77 should have produced the celebration banner per issue #78; this is why it didn't.
The fix is one character:
ShortRun→short. Added a comment in the script documenting the bug history so we don't reintroduce it.Test plan
Local Release run:
Terminal.Gui.Text.TestsTerminal.Gui.Editor.TestsTerminal.Gui.Editor.IntegrationTestsTerminal.Gui.Editor.PerformanceTestsdotnet format Terminal.Gui.Text.slnx --no-restore --exclude third_party/clean.The compare-baseline.sh fix's actual gate behavior is verifiable from the workflow run on this PR — for the first time it should produce a populated comparison table in the step summary rather than the prior "skipping comparison" warning.
Follow-up
After this merges, issue #78 (refresh
baseline.jsonfrom CI hardware) can be closed via oneworkflow_dispatchrun ofperf.ymlwithfull-suite: true, then committing the resulting numbers.🤖 Generated with Claude Code