perf: dedicated test project + workflow (ubuntu-only) #89

Merged
tig merged 2 commits into develop from perf-suite-split
May 12, 2026

@tig commented on May 12, 2026

Summary

Split performance work into its own csproj and CI workflow so the correctness-focused ci.yml stays fast on all three OSes, and so the perf gate stops being a silent no-op.

Also fixes the `--job ShortRun` bug that kept PR #77's perf wins from ever triggering the celebration message (context in issue #78).

Layout

tests/Terminal.Gui.Editor.PerformanceTests/
  PerformanceSmokeTests.cs                   ← moved from Editor.Tests/
  Terminal.Gui.Editor.PerformanceTests.csproj

.github/workflows/perf.yml                   ← new, ubuntu-latest only
.github/workflows/ci.yml                     ← perf step removed

What changes

New Terminal.Gui.Editor.PerformanceTests project

PerformanceSmokeTests.cs (stopwatch-based, 4 tests) moves out of Terminal.Gui.Editor.Tests into its own xUnit.v3 exe project. Apart from the namespace change to Terminal.Gui.Editor.PerformanceTests, the tests themselves are unchanged.

New .github/workflows/perf.yml (ubuntu-latest only)

Triggers on every push + every PR to main/develop, plus a manual workflow_dispatch for full-suite runs. Steps:

  1. Release build of the solution.
  2. Run the perf smoke tests (dotnet run -c Release --project tests/Terminal.Gui.Editor.PerformanceTests).
  3. Run benchmarks/compare-baseline.sh 3.0 0.8 (focused *VisualLineBuild* filter; fails on >3× regression, celebrates on <0.8× improvement).
  4. Manual only: when dispatched with `full-suite: true`, runs the full BenchmarkDotNet matrix (Scrolling, EndToEndScroll, CaretMovement, DocumentAccess, plus the gated set) and uploads BenchmarkDotNet.Artifacts/ as a 30-day artifact. That's the operator path for refreshing baseline.json (issue #78, "Update baseline.json from CI after scroll-perf merge").
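
To make step 3's two thresholds concrete, here is a minimal sketch of the verdict logic. It assumes the gate compares a current/baseline ratio against the two positional arguments (3.0 and 0.8); the function names `ratio_of`, `classify_ratio`, and `gate` are hypothetical, and this is not the code in compare-baseline.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the two-threshold gate; not compare-baseline.sh itself.
#   ratio = current / baseline (e.g. mean time per benchmark)
#   ratio > fail_at       -> REGRESSION (job fails)
#   ratio < celebrate_at  -> FASTER     (celebration message)
#   otherwise             -> OK

# Bash has no floating-point arithmetic, so awk does the math.
ratio_of() {
  awk -v cur="$1" -v base="$2" 'BEGIN { printf "%.2f\n", cur / base }'
}

classify_ratio() {
  local ratio=$1 fail_at=${2:-3.0} celebrate_at=${3:-0.8}
  awk -v r="$ratio" -v f="$fail_at" -v c="$celebrate_at" 'BEGIN {
    if (r > f)      print "REGRESSION"
    else if (r < c) print "FASTER"
    else            print "OK"
  }'
}

# Prints the verdict and returns non-zero only for a regression,
# so a CI step can use it directly as a pass/fail check.
gate() {
  local verdict
  verdict=$(classify_ratio "$@")
  echo "$verdict"
  [ "$verdict" != "REGRESSION" ]
}
```

For example, `classify_ratio "$(ratio_of 3600 1000)" 3.0 0.8` prints `REGRESSION`, because 3.60 exceeds the 3.0 threshold. A ratio of exactly 3.0 stays `OK`, matching the strict ">3×" wording.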

Why ubuntu-only:

  • Windows / macOS GitHub-hosted runners share hosts with neighbour VMs; wall-time assertions there are too noisy for a meaningful regression gate.
  • Linux runners are still noisy but consistent enough that a 3× threshold catches real regressions.

ci.yml: perf step removed

The old Performance check step is gone; a comment in its place points at perf.yml. CI now ends after Terminal.Gui.Editor.IntegrationTests.

compare-baseline.sh: `--job ShortRun` → `--job short` (the actual bug)

This script has been passing --job ShortRun since PR #53. BenchmarkDotNet rejects that — it only accepts lowercase names (default | dry | short | medium | long | verylong) — prints `The provided base job "ShortRun" is invalid` and exits without running anything. The script then sees no JSON report and falls into:

if [ -z "$REPORT" ]; then
  echo "::warning::No benchmark JSON report found — skipping comparison."
  exit 0
fi

…exit 0. Every PR between #53 and now has sailed through the gate without a single benchmark actually executing. Neither the `❌ REGRESSION` failure nor the `🎉 FASTER` celebration could ever fire. PR #77 should have produced the celebration banner per issue #78; this is why it didn't.
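
The trap is easy to reproduce. Below is a minimal sketch of the guard above, run against a results directory that BenchmarkDotNet never wrote to: it succeeds while only emitting a warning. The `check_report` function and the `*-report*.json` glob are assumptions for illustration, not copied from compare-baseline.sh.

```bash
#!/usr/bin/env bash
# Minimal reproduction of the silent no-op. Assumption: the script looks for
# a BenchmarkDotNet JSON report with `find`; the glob and the directory
# argument are illustrative, not taken from compare-baseline.sh.

check_report() {
  local results_dir=$1 report
  # When BDN rejects the job name it runs nothing, so this finds nothing.
  report=$(find "$results_dir" -name '*-report*.json' 2>/dev/null | head -n 1)
  if [ -z "$report" ]; then
    echo "::warning::No benchmark JSON report found, skipping comparison."
    # The real script does `exit 0` here; `return 0` keeps the sketch
    # callable. Either way the CI step reports success.
    return 0
  fi
  echo "comparing against $report"
}
```

Calling `check_report "$(mktemp -d)"` and then checking `$?` gives 0, which is why the step stayed green on every PR since #53.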

The fix is a one-word change: `ShortRun` → `short`. Added a comment in the script documenting the bug history so we don't reintroduce it.
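
A comment guards against reintroduction by documentation only; a script could also fail fast before invoking BenchmarkDotNet. The sketch below is a hypothetical pre-flight check (the `validate_bdn_job` name is an assumption). It accepts only the names listed above and matches them case-sensitively, following this PR's note that only lowercase names are accepted.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight guard: reject an unsupported BenchmarkDotNet
# base-job name up front, instead of letting BDN refuse it and leaving
# the script to find no report and exit 0.

validate_bdn_job() {
  case "$1" in
    default|dry|short|medium|long|verylong)
      return 0 ;;
    *)
      # ::error:: is a GitHub Actions workflow command; it annotates the run.
      echo "::error::Unsupported BenchmarkDotNet --job '$1'; expected one of: default dry short medium long verylong" >&2
      return 1 ;;
  esac
}

# Example call site (variable name is illustrative):
#   BDN_JOB=short
#   validate_bdn_job "$BDN_JOB" || exit 1
```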

Test plan

Local Release run:

| Project | Tests | Notes |
|---|---|---|
| Terminal.Gui.Text.Tests | 230 ✓ | unchanged |
| Terminal.Gui.Editor.Tests | 87 ✓ | was 91; 4 perf tests moved out |
| Terminal.Gui.Editor.IntegrationTests | 108 ✓ | unchanged |
| Terminal.Gui.Editor.PerformanceTests | 4 ✓ | new project |

`dotnet format Terminal.Gui.Text.slnx --no-restore --exclude third_party/`: clean.

The real gate behavior after the compare-baseline.sh fix can be verified from this PR's workflow run: for the first time, it should produce a populated comparison table in the step summary instead of the old "skipping comparison" warning.
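
For reference, here is a minimal sketch of how a script can publish such a table. GitHub Actions exposes the step-summary file path in `$GITHUB_STEP_SUMMARY`, and Markdown appended to that file renders on the run page. The column layout and the `summary_header`/`summary_row` names are assumptions, not what compare-baseline.sh actually emits.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: append a Markdown comparison table to the Actions
# step summary. Falls back to stdout when run outside GitHub Actions.

summary_file() {
  echo "${GITHUB_STEP_SUMMARY:-/dev/stdout}"
}

summary_header() {
  {
    echo "| Benchmark | Baseline (ns) | Current (ns) | Ratio | Verdict |"
    echo "|---|---:|---:|---:|---|"
  } >> "$(summary_file)"
}

# Args: benchmark name, baseline time, current time, verdict label.
# The ratio column is current / baseline, rounded to two decimals.
summary_row() {
  awk -v n="$1" -v b="$2" -v c="$3" -v v="$4" \
    'BEGIN { printf "| %s | %s | %s | %.2f | %s |\n", n, b, c, c / b, v }' \
    >> "$(summary_file)"
}
```

For example, `summary_row VisualLineBuild 1000 700 FASTER` appends `| VisualLineBuild | 1000 | 700 | 0.70 | FASTER |`.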

Follow-up

After this merges, issue #78 (refresh baseline.json from CI hardware) can be closed via one workflow_dispatch run of perf.yml with full-suite: true, then committing the resulting numbers.
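
One way to script that dispatch is with the GitHub CLI. The sketch below assumes an authenticated `gh` and that perf.yml's dispatch input is named `full-suite`, as described above. The `run`/`refresh_baseline` helpers are hypothetical. By default the helper only prints the commands (`DRY_RUN=1`); set `DRY_RUN=0` to execute them.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the issue #78 follow-up.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 runs it.

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

refresh_baseline() {
  local ref=${1:-develop}
  # Dispatch perf.yml with the full BenchmarkDotNet matrix enabled.
  run gh workflow run perf.yml --ref "$ref" -f full-suite=true
  # Show the newest perf.yml run so its ID can be passed to `gh run download`.
  run gh run list --workflow perf.yml --limit 1
}
```

After the run finishes, `gh run download <run-id>` fetches the uploaded BenchmarkDotNet.Artifacts/ artifact. Committing the refreshed baseline.json remains a manual step.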

🤖 Generated with Claude Code

Split performance work into its own csproj and CI workflow so the
correctness-focused CI stays fast across all three OSes and the perf
gate stops being a silent no-op.

New layout

  tests/Terminal.Gui.Editor.PerformanceTests/
    PerformanceSmokeTests.cs            (moved from Editor.Tests/)
    Terminal.Gui.Editor.PerformanceTests.csproj

  .github/workflows/perf.yml             (ubuntu-latest only)
    - Release build
    - Run PerformanceTests (stopwatch smoke tests)
    - Run benchmarks/compare-baseline.sh (VisualLineBuild gate)
    - workflow_dispatch with `full-suite: true` runs the full
      BenchmarkDotNet matrix and uploads results as an artifact —
      the operator path for refreshing baseline.json (#78).

  .github/workflows/ci.yml
    - Perf step removed; comment points to perf.yml.

Why a separate workflow
  - Windows / macOS GitHub-hosted runners share hosts with neighbour
    VMs; wall-time assertions there are too noisy to gate on. Linux
    runners are still noisy but consistent enough for a 3× threshold.
  - The full BDN suite takes minutes; CI for correctness needs to be
    fast. Per-PR perf only runs the focused VisualLineBuild filter.

Fix while we're here: compare-baseline.sh used `--job ShortRun`,
which BenchmarkDotNet rejects ("invalid base job"). BDN exited
without running any benchmarks, the script saw no JSON report,
warned "skipping comparison", and exited 0. So the perf gate has
been a silent no-op since PR #53 — neither the >3× fail nor the
<0.8× celebrate could ever fire (see issue #78: PR #77 didn't
trigger the celebration for exactly this reason). Switched to
`--job short` (the lowercase form BDN accepts) and added a comment
documenting the history.

Tests on this branch (local Release):
  Text.Tests:        230 passing
  Editor.Tests:       87 passing  (was 91; 4 perf tests moved out)
  IntegrationTests:  108 passing
  PerformanceTests:    4 passing  (new project)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tig force-pushed the perf-suite-split branch from 1b8a73f to 0fa8429 on May 12, 2026 at 18:57

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0fa8429031

Review comment on .github/workflows/perf.yml (outdated)
Reflects the project layout from this PR in the docs:

  CLAUDE.md
    - Adds tests/Terminal.Gui.Editor.PerformanceTests to the test
      runner list, with the -c Release note + perf.yml pointer.
    - "Testing tiers": now four projects with a workflow + OS column;
      adds the "Performance gates" subsection that explains the two
      layers (stopwatch smoke + BenchmarkDotNet baseline) and calls
      out the `--job short` lowercase requirement.

  specs/constitution.md
    - §VI "Testing Tiers" table grows the fourth row + workflow + OS
      matrix columns. Adds rationale for ubuntu-only and the manual
      workflow_dispatch path for baseline.json refreshes.

  specs/plan.md
    - Repo-layout block adds tests/Terminal.Gui.Editor.PerformanceTests
      and a benchmarks/ subtree. EditorBenchmarks placeholder removed
      (it's not in the tree).
    - DoD checkbox updated from "three test projects" to "four", with
      the BenchmarkDotNet 3× baseline gate called out.

  README.md
    - Repo-layout adds benchmarks/ + examples/ rows.
    - Build block adds the PerformanceTests run line, with the
      ubuntu-latest-only / perf.yml pointer.

No behavior change; this is the doc tail of perf-suite-split.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tig force-pushed the perf-suite-split branch from 0fa8429 to 3f142a7 on May 12, 2026 at 19:03
@tig merged commit eef57ef into develop on May 12, 2026
8 checks passed
@tig deleted the perf-suite-split branch on May 12, 2026 at 19:44