refactor(smoke): dedupe Phase 2 harness, fail loudly by Aaronontheweb · Pull Request #1049 · netclaw-dev/netclaw

Aaronontheweb · 2026-05-18T03:08:14Z

Summary

Code-quality follow-up to #1048 (a /simplify pass over the Phase 2 native smoke harness). No change to what the harness tests — pure cleanup.

Dedup — the 7 tests/smoke/scenarios/*.sh each copy-pasted the same ~22-line preamble (provider/model seeding, daemon start, health wait, the fail-and-exit idiom). Hoisted into lib/common.sh as seed_and_start_daemon / seed_provider_model / nc / die. Net -138 lines.
Fail loudly — run_timed and the VHS tape timeout previously fell through to running a command unbounded when no timeout tool was found. Now they error out, per the repo constitution's no-silent-fallbacks rule.
context-window.sh — parses the status API with jq instead of an undeclared python3 dependency; fails loudly on an invalid/empty response instead of masking it as a 0-token warn.
pairing.sh — extracts the bearer token with jq instead of a fragile sed regex.
smoke.yml — dropped a redundant post-run collect step (run-smoke.sh already collects on failure before teardown). No CI caches: the repo is at its GitHub Actions cache storage limit, so the workflow deliberately uses no NuGet or Ollama-model cache.
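As an illustration of the fail-loudly behavior described for run_timed, here is a minimal sketch. It is not the code from lib/common.sh: the function bodies, the argument order, and the gtimeout fallback are assumptions made for this example. Only the names run_timed and die come from the PR description.

```shell
# Hypothetical sketch; the real helpers in lib/common.sh may differ.

# Print an error to stderr and exit non-zero (the fail-and-exit idiom).
die() {
  printf 'FATAL: %s\n' "$*" >&2
  exit 1
}

# Run a command under a time limit and refuse to run it unbounded.
# Usage (assumed): run_timed <seconds> <command> [args...]
run_timed() {
  local secs="$1"
  shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$secs" "$@"
  elif command -v gtimeout >/dev/null 2>&1; then
    # Assumption: GNU coreutils installed under a g-prefix (e.g. on macOS).
    gtimeout "$secs" "$@"
  else
    # No silent fallback: erroring out beats running without a bound.
    die "no timeout tool found; refusing to run '$*' unbounded"
  fi
}
```

The point of the design is the final branch: when neither tool is present the helper stops the run instead of quietly executing the command with no limit.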

Test plan

New smoke workflow green — exercises the deduped scenarios end-to-end
pr_validation + smoke_sandbox still green

Code-quality pass over the Phase 2 native smoke harness:

- Hoist the per-scenario boilerplate (provider/model seeding, daemon start, health wait, the fail-and-exit idiom) into lib/common.sh as seed_and_start_daemon / seed_provider_model / nc / die. The 7 scenario scripts shed ~150 lines of lockstep-fragile copy-paste.
- run_timed and the vhs tape timeout now fail loudly when no timeout tool is available instead of silently running unbounded (constitution: no silent fallbacks).
- context-window.sh parses the status API with jq instead of an undeclared python3 dependency, and fails loudly on an invalid/empty response instead of masking it as a 0-token warn.
- pairing.sh extracts the bearer token with jq instead of a fragile sed regex.
- smoke.yml: drop the redundant post-run collect step (run-smoke.sh already collects on failure), add a NuGet package cache, and give the Ollama model cache a restore-keys fallback.

No behavior change to what the harness tests; pure cleanup.
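The jq-based parsing described for context-window.sh could look roughly like the sketch below. The function name parse_token_count and the contextTokens field are hypothetical, since the PR does not show the status API's response shape. Only the approach is taken from the description: use jq, and treat an empty or invalid response as a hard failure.

```shell
# Hypothetical sketch; the field name and function name are assumptions.

# Extract a numeric token count from a JSON status payload, failing loudly.
# Usage (assumed): parse_token_count <json>
parse_token_count() {
  local json="$1" tokens
  if [ -z "$json" ]; then
    echo "FATAL: empty status response" >&2
    return 1
  fi
  # jq -e exits non-zero when no valid result is produced; the `numbers`
  # filter drops anything that is not a JSON number (including null).
  if ! tokens="$(printf '%s' "$json" | jq -er '.contextTokens | numbers')"; then
    echo "FATAL: status response has no numeric contextTokens" >&2
    return 1
  fi
  printf '%s\n' "$tokens"
}
```

A caller would then abort the scenario on a non-zero return instead of reporting a 0-token warning.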

The repo is at its 10GB GitHub Actions cache storage limit, so the new smoke workflow must not consume cache space. Remove both the NuGet package cache and the Ollama model-store cache from smoke.yml. Each run re-pulls the smoke models and does a cold restore — slower, but it keeps the harness off the shared cache budget.

Aaronontheweb added the cleanup and tests labels May 18, 2026

Aaronontheweb changed the title ~~refactor(smoke): dedupe Phase 2 harness, fail loudly, cache CI deps~~ refactor(smoke): dedupe Phase 2 harness, fail loudly May 18, 2026

Aaronontheweb enabled auto-merge (squash) May 18, 2026 03:11

Merge branch 'dev' into smoke-phase2-cleanup

25fbcf8

Aaronontheweb merged commit e3c6a37 into netclaw-dev:dev May 18, 2026
13 checks passed

Aaronontheweb merged 3 commits into netclaw-dev:dev from Aaronontheweb:smoke-phase2-cleanup
