refactor(smoke): dedupe Phase 2 harness, fail loudly #1049
Merged
Aaronontheweb merged 3 commits on May 18, 2026
Code-quality pass over the Phase 2 native smoke harness:

- Hoist the per-scenario boilerplate (provider/model seeding, daemon start, health wait, the fail-and-exit idiom) into lib/common.sh as seed_and_start_daemon / seed_provider_model / nc / die. The 7 scenario scripts shed ~150 lines of lockstep-fragile copy-paste.
- run_timed and the vhs tape timeout now fail loudly when no timeout tool is available instead of silently running unbounded (constitution: no silent fallbacks).
- context-window.sh parses the status API with jq instead of an undeclared python3 dependency, and fails loudly on an invalid/empty response instead of masking it as a 0-token warn.
- pairing.sh extracts the bearer token with jq instead of a fragile sed regex.
- smoke.yml: drop the redundant post-run collect step (run-smoke.sh already collects on failure), add a NuGet package cache, and give the Ollama model cache a restore-keys fallback.

No behavior change to what the harness tests; pure cleanup.
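The fail-loudly behaviour described for run_timed could look roughly like the sketch below. This is an illustration only, not the actual lib/common.sh code: the function bodies, the `FATAL:` message format, and the `gtimeout` fallback (the name GNU coreutils' timeout gets under Homebrew on macOS) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real helpers in lib/common.sh may differ.

# Print a message to stderr and exit non-zero (the fail-and-exit idiom).
die() {
  printf 'FATAL: %s\n' "$*" >&2
  exit 1
}

# Run a command under a time limit, refusing to run it unbounded.
# Usage: run_timed <seconds> <command> [args...]
run_timed() {
  local secs="$1"
  shift
  local tool
  if command -v timeout >/dev/null 2>&1; then
    tool=timeout
  elif command -v gtimeout >/dev/null 2>&1; then
    tool=gtimeout  # assumed fallback name for GNU timeout on macOS
  else
    # No silent fallback: without a timeout tool, stop instead of
    # running the command with no time limit.
    die "no timeout tool found; refusing to run unbounded: $*"
  fi
  "$tool" "$secs" "$@"
}
```

A scenario script would then call something like `run_timed 30 ./some-step.sh` (a hypothetical command) and get a hard error on a machine with no timeout tool, rather than a step that can hang indefinitely.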
The repo is at its 10GB GitHub Actions cache storage limit, so the new smoke workflow must not consume cache space. Remove both the NuGet package cache and the Ollama model-store cache from smoke.yml. Each run re-pulls the smoke models and does a cold restore — slower, but it keeps the harness off the shared cache budget.
Summary
Code-quality follow-up to #1048 (a /simplify pass over the Phase 2 native smoke harness). No change to what the harness tests; pure cleanup.

- tests/smoke/scenarios/*.sh each copy-pasted the same ~22-line preamble (provider/model seeding, daemon start, health wait, the fail-and-exit idiom). Hoisted into lib/common.sh as seed_and_start_daemon / seed_provider_model / nc / die. Net -138 lines.
- run_timed and the VHS tape timeout previously fell through to running a command unbounded when no timeout tool was found. Now they error out, per the repo constitution's no-silent-fallbacks rule.
- context-window.sh: parses the status API with jq instead of an undeclared python3 dependency; fails loudly on an invalid/empty response instead of masking it as a 0-token warn.
- pairing.sh: extracts the bearer token with jq instead of a fragile sed regex.
- smoke.yml: dropped a redundant post-run collect step (run-smoke.sh already collects on failure before teardown). No CI caches: the repo is at its GitHub Actions cache storage limit, so the workflow deliberately uses no NuGet or Ollama-model cache.

Test plan

- smoke workflow green: exercises the deduped scenarios end-to-end
- pr_validation + smoke_sandbox still green
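As an illustration of the jq-based parsing described in the summary, here is a minimal sketch of extracting a required field and failing loudly when the response is invalid, empty, or missing that field. The helper names (`json_field`, `json_uint`) and the field names in the usage note (`.token`, `.tokens_used`) are assumptions for illustration, not the actual response shapes or the code in pairing.sh and context-window.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; helper names and payload shapes are assumptions.

# Extract a required, non-empty string from JSON on stdin.
# With -e, jq exits non-zero when the last output is null/false or when
# no result is produced, so invalid JSON, an empty body, and a missing
# field all surface as a failing exit status instead of an empty string.
json_field() {
  local filter="$1"
  jq -er "($filter) | select(type == \"string\" and length > 0)"
}

# Extract a required non-negative integer from JSON on stdin.
# A legitimate 0 passes; null, a string, or a missing field does not.
json_uint() {
  local filter="$1"
  jq -er "($filter) | select(type == \"number\" and . >= 0 and . == floor)"
}
```

Under those assumptions, a caller could write `token=$(printf '%s' "$response" | json_field '.token') || exit 1` for the bearer token, and `used=$(printf '%s' "$status" | json_uint '.tokens_used') || exit 1` for the status check. The point of the design is that a real 0 stays distinguishable from an unparseable response, which a default-to-zero fallback would hide.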