ci: performance gate — smoke tests + baseline comparison #53
Merged
Two layers that catch regressions without slowing CI:
1. PerformanceSmokeTests (xUnit, runs in normal test suite):
- Stopwatch-based with fat thresholds (50–250x headroom)
- Catches catastrophic regressions only
- 4 tests: viewport build, long-line build, 100K-line tree
lookup, full 1K-line scroll
2. Benchmark baseline comparison (CI step, Ubuntu only):
- Runs VisualLineBuild benchmarks (ShortRun, ~30s)
- Compares to benchmarks/baseline.json
- Fails CI if any benchmark > 3x baseline (regression)
- Celebrates in step summary if any < 0.8x baseline (improvement)
- Results posted to GitHub step summary as markdown table
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
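The comparison rule in layer 2 can be sketched as follows. This is a minimal Python illustration, not the repository's compare-baseline.sh: the flat `{benchmark name: mean nanoseconds}` shape, the function names, and the choice to treat a benchmark missing from the current run as a failure are all assumptions made for the example. The real layout of benchmarks/baseline.json is not shown on this page.

```python
# Thresholds taken from the commit message above.
FAIL_RATIO = 3.0       # current > 3x baseline: regression, fail CI
CELEBRATE_RATIO = 0.8  # current < 0.8x baseline: improvement


def classify(current_ns, baseline_ns):
    """Classify one benchmark by the ratio of its current mean to its baseline mean."""
    ratio = current_ns / baseline_ns
    if ratio > FAIL_RATIO:
        return "regression"
    if ratio < CELEBRATE_RATIO:
        return "improvement"
    return "ok"


def compare(baseline, current):
    """Return (markdown_table, failed) for two {name: mean_ns} mappings (hypothetical schema)."""
    rows = [
        "| Benchmark | Baseline (ns) | Current (ns) | Ratio | Status |",
        "|---|---:|---:|---:|---|",
    ]
    failed = False
    for name in sorted(baseline):
        if name not in current:
            # Assumption: a benchmark that did not run counts as a failure,
            # so a broken run cannot pass silently.
            rows.append(f"| {name} | {baseline[name]:.0f} | n/a | n/a | missing |")
            failed = True
            continue
        ratio = current[name] / baseline[name]
        status = classify(current[name], baseline[name])
        failed = failed or status == "regression"
        rows.append(
            f"| {name} | {baseline[name]:.0f} | {current[name]:.0f} | {ratio:.2f}x | {status} |"
        )
    return "\n".join(rows), failed
```

In a GitHub Actions step, the returned table could be appended to the file named by the `GITHUB_STEP_SUMMARY` environment variable, and the step would exit non-zero when `failed` is true.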
CI runners (shared, no turbo) are 2–4x slower than local M-series. The 10ms threshold was too tight: Ubuntu hit 23ms, macOS 38ms, Windows 20ms. Bump to 100ms to keep fat headroom.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
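To make the threshold reasoning concrete, here is a small Python analogue of a stopwatch-style smoke test with a fat threshold. The actual tests are xUnit tests using Stopwatch; `assert_under` and `min_headroom` are hypothetical helper names introduced only for this sketch. The observed timings are the ones quoted in the commit message above.

```python
import time


def assert_under(threshold_ms, action, label):
    """Run `action` once and fail if its wall time reaches `threshold_ms`."""
    start = time.perf_counter()
    action()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    assert elapsed_ms < threshold_ms, (
        f"{label}: {elapsed_ms:.1f} ms exceeds the {threshold_ms} ms threshold"
    )
    return elapsed_ms


def min_headroom(threshold_ms, observed_ms):
    """Smallest threshold/observed ratio across platforms (below 1.0 means a failure)."""
    return min(threshold_ms / ms for ms in observed_ms.values())


# CI timings reported in the commit message (milliseconds).
observed_ci_ms = {"ubuntu": 23, "macos": 38, "windows": 20}

old_headroom = min_headroom(10, observed_ci_ms)    # below 1.0: every runner failed
new_headroom = min_headroom(100, observed_ci_ms)   # above 1.0: slowest runner still passes
```

With the 10ms threshold the slowest runner (macOS, 38ms) is far over the limit; at 100ms the same run passes with a bit more than 2.5x to spare, which is why the bump removes the noise failures while still catching order-of-magnitude regressions.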
tig added a commit that referenced this pull request on May 12, 2026
Split performance work into its own csproj and CI workflow so the
correctness-focused CI stays fast across all three OSes and the perf
gate stops being a silent no-op.
New layout
  tests/Terminal.Gui.Editor.PerformanceTests/
    PerformanceSmokeTests.cs (moved from Editor.Tests/)
    Terminal.Gui.Editor.PerformanceTests.csproj
  .github/workflows/perf.yml (ubuntu-latest only)
    - Release build
    - Run PerformanceTests (stopwatch smoke tests)
    - Run benchmarks/compare-baseline.sh (VisualLineBuild gate)
    - workflow_dispatch with `full-suite: true` runs the full
      BenchmarkDotNet matrix and uploads results as an artifact;
      this is the operator path for refreshing baseline.json (#78).
  .github/workflows/ci.yml
    - Perf step removed; comment points to perf.yml.
Why a separate workflow
- Windows / macOS GitHub-hosted runners share hosts with neighbour
VMs; wall-time assertions there are too noisy to gate on. Linux
runners are still noisy but consistent enough for a 3× threshold.
- The full BDN suite takes minutes; CI for correctness needs to be
fast. Per-PR perf only runs the focused VisualLineBuild filter.
Fix while we're here: compare-baseline.sh used `--job ShortRun`,
which BenchmarkDotNet rejects ("invalid base job"). BDN exited
without running any benchmarks, the script saw no JSON report,
warned "skipping comparison", and exited 0. So the perf gate has
been a silent no-op since PR #53: neither the >3× fail nor the
<0.8× celebrate could ever fire (see issue #78; PR #77 didn't
trigger the celebration for exactly this reason). Switched to
`--job short` (the lowercase form BDN accepts) and added a comment
documenting the history.
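The failure mode described above (no report found, warning printed, exit 0) is a general hazard for gates that skip when their input is missing. One way to guard against it, sketched in Python rather than in the shell of compare-baseline.sh, is to treat a missing report as an error. The function name, the exception type, and the `*-report*.json` glob pattern are assumptions for illustration, not details taken from the script.

```python
import glob
import os


class MissingReportError(RuntimeError):
    """Raised when the benchmark run produced no JSON report to compare."""


def find_reports(results_dir, pattern="*-report*.json"):
    """Return the sorted report paths in `results_dir`, or raise if there are none.

    Raising (instead of warning and returning an empty list) means a
    benchmark run that silently did nothing fails the gate.
    """
    reports = sorted(glob.glob(os.path.join(results_dir, pattern)))
    if not reports:
        raise MissingReportError(
            f"no benchmark report matching {pattern!r} in {results_dir!r}; "
            "the benchmark step probably did not run"
        )
    return reports
```

A caller that lets this exception propagate exits non-zero, so an argument the benchmark runner rejects shows up as a red CI step rather than a skipped comparison.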
Tests on this branch (local Release):
Text.Tests: 230 passing
Editor.Tests: 87 passing (was 91; 4 perf tests moved out)
IntegrationTests: 108 passing
PerformanceTests: 4 passing (new project)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Two lightweight layers that catch performance regressions and celebrate improvements without slowing CI:
Layer 1: Performance smoke tests (xUnit)
4 Stopwatch-based tests in `Terminal.Gui.Editor.Tests/PerformanceSmokeTests.cs` that run in the normal test suite on every CI run. Thresholds are deliberately fat (50–250x typical) so they only fail on catastrophic regressions, not CI-runner noise.
- `BuildViewport_50Lines`
- `BuildSingleLongLine`
- `DocumentLineLookup_100K`
- `FullDocumentScroll_1K`
Layer 2: Benchmark baseline comparison (CI step)
A new CI step (`Performance check`, Ubuntu only) that:
- Runs `VisualLineBuildBenchmarks` (ShortRun, ~30s)
- Compares to `benchmarks/baseline.json`
Updating the baseline
After a deliberate performance change (optimization or known cost increase):
Test plan
- `dotnet build Terminal.Gui.Text.slnx` succeeds
- `dotnet format --verify-no-changes` clean
🤖 Generated with Claude Code