refactor(smoke): dedupe Phase 2 harness, fail loudly #1049

Merged
Aaronontheweb merged 3 commits into netclaw-dev:dev from Aaronontheweb:smoke-phase2-cleanup on May 18, 2026

Conversation

@Aaronontheweb (Collaborator) commented May 18, 2026

Summary

Code-quality follow-up to #1048 (a /simplify pass over the Phase 2 native smoke harness). No change to what the harness tests — pure cleanup.

  • Dedup — the 7 tests/smoke/scenarios/*.sh each copy-pasted the same ~22-line preamble (provider/model seeding, daemon start, health wait, the fail-and-exit idiom). Hoisted into lib/common.sh as seed_and_start_daemon / seed_provider_model / nc / die. Net -138 lines.
  • Fail loudly: run_timed and the VHS tape timeout previously fell through to running the command unbounded when no timeout tool was found. Now they error out, per the repo constitution's no-silent-fallbacks rule.
  • context-window.sh — parses the status API with jq instead of an undeclared python3 dependency; fails loudly on an invalid/empty response instead of masking it as a 0-token warn.
  • pairing.sh — extracts the bearer token with jq instead of a fragile sed regex.
  • smoke.yml — dropped a redundant post-run collect step (run-smoke.sh already collects on failure before teardown). No CI caches: the repo is at its GitHub Actions cache storage limit, so the workflow deliberately uses no NuGet or Ollama-model cache.
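The fail-and-exit idiom and the "fail loudly" timeout wrapper described above could look roughly like the sketch below. This is an illustration only, not the code from lib/common.sh: the function bodies, the error message wording, and the `gtimeout` fallback (the name GNU coreutils' `timeout` commonly has on macOS) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real lib/common.sh may differ.

# Print an error to stderr and exit non-zero.
die() {
  printf 'FAIL: %s\n' "$*" >&2
  exit 1
}

# Run a command under a time limit (first argument, in seconds).
# If neither `timeout` nor `gtimeout` is on PATH, abort instead of
# silently running the command with no bound.
run_timed() {
  local seconds=$1
  shift
  local tool
  if command -v timeout >/dev/null 2>&1; then
    tool=timeout
  elif command -v gtimeout >/dev/null 2>&1; then
    tool=gtimeout
  else
    die "no timeout tool found (need timeout or gtimeout); refusing to run unbounded: $*"
  fi
  "$tool" "$seconds" "$@"
}
```

A scenario script would call it as `run_timed 30 some-command --flag`. The point of the design is that a missing tool becomes an immediate, visible failure rather than a CI job that can hang with no bound.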

Test plan

  • New smoke workflow green — exercises the deduped scenarios end-to-end
  • pr_validation + smoke_sandbox still green

Code-quality pass over the Phase 2 native smoke harness:

- Hoist the per-scenario boilerplate (provider/model seeding, daemon
  start, health wait, the fail-and-exit idiom) into lib/common.sh as
  seed_and_start_daemon / seed_provider_model / nc / die. The 7 scenario
  scripts shed ~150 lines of lockstep-fragile copy-paste.
- run_timed and the vhs tape timeout now fail loudly when no timeout
  tool is available instead of silently running unbounded (constitution:
  no silent fallbacks).
- context-window.sh parses the status API with jq instead of an
  undeclared python3 dependency, and fails loudly on an invalid/empty
  response instead of masking it as a 0-token warn.
- pairing.sh extracts the bearer token with jq instead of a fragile sed
  regex.
- smoke.yml: drop the redundant post-run collect step (run-smoke.sh
  already collects on failure), add a NuGet package cache, and give the
  Ollama model cache a restore-keys fallback.

No behavior change to what the harness tests — pure cleanup.
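As a rough illustration of the jq-based extraction mentioned for pairing.sh and context-window.sh, the sketch below reads JSON on stdin and fails on a missing, null, empty, or malformed value. The helper name `json_field` and the field name `.token` are hypothetical; the actual response shapes are not shown in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: helper name and JSON field names are assumed.

die() { printf 'FAIL: %s\n' "$*" >&2; exit 1; }

# Read a JSON document on stdin and print the string at the given jq path.
# `jq -e` exits non-zero when the result is null or false, or when no
# result is produced (empty input); malformed input is a parse error.
# Every one of those cases ends in die instead of a masked default.
json_field() {
  local path=$1 value
  value=$(jq -er "$path") || die "missing or invalid JSON field: $path"
  [ -n "$value" ] || die "empty JSON field: $path"
  printf '%s\n' "$value"
}
```

Usage might be `token=$(printf '%s' "$response" | json_field '.token') || exit 1`. Because the helper runs inside a command substitution there, `die` only exits that subshell, so the caller still has to check the status.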
@Aaronontheweb added labels May 18, 2026: cleanup (Code quality improvements and tech debt reduction), tests (All issues related to testing, quality assurance, and smoke testing.)
The repo is at its 10GB GitHub Actions cache storage limit, so the new
smoke workflow must not consume cache space. Remove both the NuGet
package cache and the Ollama model-store cache from smoke.yml. Each
run re-pulls the smoke models and does a cold restore — slower, but it
keeps the harness off the shared cache budget.
@Aaronontheweb changed the title from "refactor(smoke): dedupe Phase 2 harness, fail loudly, cache CI deps" to "refactor(smoke): dedupe Phase 2 harness, fail loudly" May 18, 2026
@Aaronontheweb enabled auto-merge (squash) May 18, 2026 03:11
@Aaronontheweb merged commit e3c6a37 into netclaw-dev:dev May 18, 2026
13 checks passed
