v0.71.4 — Adapter lifecycle + loop wiring
Closes #172, #173, #176, #177, #220, #223. Validated end-to-end on real SmolLM2-135M (RTX 3050 4 GB).
What's New
- Live canary verdict on merge: `soup adapters merge … --canary suite.json` scores the merged adapter and reports OK / MINOR / MAJOR (drop <2% OK, <5% MINOR, else MAJOR). `--strict-verdict` exits 2 on a MAJOR regression. A pre-scored `{"baseline_scores","candidate_scores"}` suite needs no model load; a `{"tasks":[…]}` suite uses an injectable scorer. (Replaces the v0.57 `UNKNOWN` stub, #172)
- Evolutionary merge for real: `soup adapters merge --strategy cmaes --eval suite --budget 1h` now runs the full CMA-ES search. Each candidate is merged, scored against your eval, and the best blend is written to `--output`. (Was plan-only, #220)
- Publish an adapter PR to GitHub: `soup adapters pr <title> --base-sha <hex> --adapter <path> --push owner/repo#42` posts the rendered PR Markdown straight to a GitHub PR comment via `gh api`. Auth from `GITHUB_TOKEN` / `GH_TOKEN`. (#223)
- Continuous fine-tuning loop, wired up: `soup loop watch --pre-wired` runs the real traces → DPO → eval-gate → canary pipeline (instead of the v0.58 no-op stubs); `--pack-cans` snapshots every iteration as a shareable Soup Can with Registry lineage, and `soup loop replay <id> --extract dir` unpacks it. `soup loop status` shows the `pre_wired` flag. (#176, #177)
- Branches ↔ Registry: `soup adapters branch <name> --attach-to-registry <id>` links a training-env snapshot into the Registry lineage DAG (shown as a `branches` node in `soup history`); `--from-registry <id>` derives a fresh snapshot's config + base from a Registry entry. (#173)
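The OK / MINOR / MAJOR thresholds for the canary verdict can be illustrated with a small sketch. This is not soup's implementation. It assumes that `baseline_scores` and `candidate_scores` are mappings from task name to a float score, that "drop" means the relative drop of the mean score in percent, and that a non-strict run exits 0. The helper names `canary_verdict` and `exit_code` are hypothetical.

```python
from statistics import mean


def canary_verdict(baseline_scores, candidate_scores):
    """Return (verdict, drop_pct) for a pre-scored suite.

    Assumption: both arguments map task name -> score, and the drop is
    the relative decrease of the mean score, expressed in percent.
    """
    if set(baseline_scores) != set(candidate_scores):
        raise ValueError("baseline and candidate must cover the same tasks")
    base = mean(baseline_scores.values())
    cand = mean(candidate_scores.values())
    if base <= 0:
        raise ValueError("baseline mean must be positive")
    drop_pct = (base - cand) / base * 100.0
    # Thresholds as stated in the release notes: <2% OK, <5% MINOR, else MAJOR.
    if drop_pct < 2.0:
        return "OK", drop_pct
    if drop_pct < 5.0:
        return "MINOR", drop_pct
    return "MAJOR", drop_pct


def exit_code(verdict, strict):
    """Exit 2 only for a MAJOR verdict under strict mode (0 otherwise is assumed)."""
    return 2 if strict and verdict == "MAJOR" else 0
```

Under these assumptions, a candidate that improves on the baseline has a negative drop and is classified as OK.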
Install / Upgrade
`pip install -U soup-cli`
Security
- The backdoor-scan gate (v0.71.2) and license-conflict gate (v0.60) now run for all merge strategies, including `--strategy cmaes` (previously bypassed because cmaes returned before the gates).
- `soup loop` canary deploy restricts `SOUP_LOOP_SERVE_ENDPOINT` to loopback / RFC1918-private hosts (a serve endpoint is the operator's own box/LAN), beyond the general webhook SSRF policy.
- `soup adapters pr --push` builds the `gh` child environment from an allowlist so unrelated secrets (`HF_TOKEN` / `OPENAI_API_KEY` / …) never reach the subprocess.
- The canary-suite JSON read uses `O_NOFOLLOW` + `os.fstat` (size cap on the same fd) to close the symlink/size-cap TOCTOU window.
- A `pack_entry` failure during loop-iteration packing rolls back the just-pushed Registry entry, so the lineage chain never points at an orphan.
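The `O_NOFOLLOW` + `os.fstat` pattern for the canary-suite read can be sketched as follows. This is a minimal illustration, not soup's code: the function name `read_suite_json` and the 1 MiB cap are assumptions. The key point is that the size check runs on the already-open file descriptor, so the file cannot be swapped between the check and the read.

```python
import json
import os
import stat

MAX_SUITE_BYTES = 1024 * 1024  # hypothetical cap; the real limit is not stated


def read_suite_json(path, max_bytes=MAX_SUITE_BYTES):
    """Read a JSON file without following a symlink at the final path component."""
    # O_NOFOLLOW makes os.open fail with OSError if `path` itself is a symlink.
    # It is unavailable on some platforms (e.g. Windows), hence the getattr fallback.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(path, flags)
    with os.fdopen(fd, "rb") as fh:  # fh now owns fd and closes it on exit
        info = os.fstat(fh.fileno())  # stat the open fd, not the path
        if not stat.S_ISREG(info.st_mode):
            raise ValueError("suite must be a regular file")
        if info.st_size > max_bytes:
            raise ValueError("suite exceeds size cap")
        # Read one byte past the cap so a file that grew after fstat is still caught.
        data = fh.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError("suite exceeds size cap")
    return json.loads(data)
```

Checking the path with `os.path.islink` or `os.path.getsize` before opening would leave a window in which the file could be replaced. Doing both checks on the descriptor avoids that.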
Known Limitations
- Pre-wired loop cost estimate is a placeholder: `estimate_cost` returns `0.0`, so the v0.58 dollar-budget gate (`monthly_budget_usd`) is a no-op for pre-wired loops; use `--max-runs-per-day` until `utils/run_cost` is wired in. (Tracked follow-up.)
- The cmaes default scorer reloads the base model per candidate. This is fine for tiny models; for large bases pass a small `--population` / `--max-generations`, or inject a cached scorer via the Python API.
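The idea behind a cached scorer can be sketched generically. These notes do not show soup's Python API, so everything here is hypothetical: the factory name `make_cached_scorer`, the zero-argument `load_base` callable, and the `score_with_base(base, candidate)` signature are illustrative stand-ins, and the actual injection point may look different.

```python
def make_cached_scorer(load_base, score_with_base):
    """Wrap a scoring function so the base model is loaded at most once.

    load_base:        hypothetical zero-argument callable returning the base model.
    score_with_base:  hypothetical callable (base, candidate) -> float score.
    """
    cache = {}

    def scorer(candidate):
        # Load lazily on the first call, then reuse for every later candidate.
        if "base" not in cache:
            cache["base"] = load_base()
        return score_with_base(cache["base"], candidate)

    return scorer
```

Sharing one base across candidates is only safe if `score_with_base` leaves the base unmodified, for example by detaching each candidate adapter after scoring it.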
Tests: 12342 → 12474 (+132, of which 130 are in `tests/test_v0714.py`). Full suite green across ubuntu/windows/macos × Python 3.10/3.11/3.12.