
v0.71.4 — Adapter lifecycle + loop wiring


@MakazhanAlpamys released this 02 Jun 07:48

Closes #172, #173, #176, #177, #220, #223. Validated end-to-end on a real SmolLM2-135M model (RTX 3050, 4 GB).

What's New

  • Live canary verdict on merge: soup adapters merge … --canary suite.json scores the merged adapter and reports OK / MINOR / MAJOR (score drop under 2%: OK; under 5%: MINOR; otherwise MAJOR). --strict-verdict exits with code 2 on a MAJOR regression. A pre-scored {"baseline_scores","candidate_scores"} suite needs no model load; a {"tasks":[…]} suite uses an injectable scorer. (Replaces the v0.57 UNKNOWN stub; #172.)
  • Evolutionary merge for real: soup adapters merge --strategy cmaes --eval suite --budget 1h now runs the full CMA-ES search. Each candidate is merged and scored against your eval, and the best blend is written to --output. (Previously plan-only; #220.)
  • Publish an adapter PR to GitHub: soup adapters pr <title> --base-sha <hex> --adapter <path> --push owner/repo#42 posts the rendered PR Markdown as a comment on the GitHub PR via gh api. Auth comes from GITHUB_TOKEN / GH_TOKEN. (#223)
  • Continuous fine-tuning loop, wired up: soup loop watch --pre-wired runs the real traces → DPO → eval-gate → canary pipeline instead of the v0.58 no-op stubs. --pack-cans snapshots every iteration as a shareable Soup Can with Registry lineage, and soup loop replay <id> --extract dir unpacks it. soup loop status shows the pre_wired flag. (#176, #177)
  • Branches ↔ Registry: soup adapters branch <name> --attach-to-registry <id> links a training-env snapshot into the Registry lineage DAG (shown as a branches node in soup history); --from-registry <id> derives a fresh snapshot's config + base from a Registry entry. (#173)
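The OK / MINOR / MAJOR thresholds for the canary verdict can be illustrated with a short sketch. It is not soup's implementation: it assumes that "drop" means the relative decline of the mean candidate score against the mean baseline score, that baseline_scores and candidate_scores are plain lists of floats, and that the helper names canary_verdict and exit_code are hypothetical.

```python
import json
from statistics import mean

def canary_verdict(baseline_scores, candidate_scores):
    """Map the relative drop in mean score to OK / MINOR / MAJOR (assumed semantics)."""
    base = mean(baseline_scores)
    cand = mean(candidate_scores)
    # Relative drop as a fraction of the baseline; an improvement yields a negative drop.
    # A zero baseline is treated as "no measurable drop" (assumption).
    drop = (base - cand) / base if base else 0.0
    if drop < 0.02:
        return "OK"
    if drop < 0.05:
        return "MINOR"
    return "MAJOR"

def exit_code(verdict, strict_verdict):
    """Exit 2 only when --strict-verdict is set and the verdict is MAJOR."""
    return 2 if strict_verdict and verdict == "MAJOR" else 0

# A pre-scored suite in the documented shape: no model load is needed.
suite = json.loads('{"baseline_scores": [0.80, 0.70], "candidate_scores": [0.79, 0.70]}')
verdict = canary_verdict(suite["baseline_scores"], suite["candidate_scores"])
print(verdict, exit_code(verdict, strict_verdict=True))
```

Comparing means keeps the sketch simple; a per-task comparison (flagging the worst single-task drop) would be a stricter alternative.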
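The merge-score-keep-best loop behind --strategy cmaes can be sketched as follows. To stay dependency-free, this sketch swaps in a plain (μ, λ) evolution strategy with an isotropic Gaussian and a simple step-size decay. Real CMA-ES additionally adapts a full covariance matrix and the step size. The toy adapters (short vectors), the weighted-average merge, the scoring function and all function names are hypothetical.

```python
import random
import time

def merge(adapters, weights):
    """Hypothetical merge: weighted average of same-length adapter vectors."""
    total = sum(weights)
    return [sum(w * a[i] for w, a in zip(weights, adapters)) / total
            for i in range(len(adapters[0]))]

def evolve_blend(adapters, score, population=8, parents=2, generations=30,
                 sigma=0.3, budget_s=60.0, seed=0):
    """(mu, lambda)-ES over blend weights; returns (best_weights, best_score)."""
    rng = random.Random(seed)
    deadline = time.monotonic() + budget_s
    mean_w = [1.0 / len(adapters)] * len(adapters)
    best_w, best_s = mean_w, score(merge(adapters, mean_w))
    for _ in range(generations):
        if time.monotonic() > deadline:
            break  # wall-clock budget exhausted, analogous to --budget
        scored = []
        for _ in range(population):
            # Sample around the current mean; clamp so every weight stays positive.
            w = [max(1e-6, m + rng.gauss(0.0, sigma)) for m in mean_w]
            s = score(merge(adapters, w))  # each candidate is merged, then scored
            scored.append((s, w))
            if s > best_s:
                best_w, best_s = w, s
        scored.sort(key=lambda t: t[0], reverse=True)
        elite = [w for _, w in scored[:parents]]
        # Recombine the elite into the next mean (comma selection: old mean discarded).
        mean_w = [sum(col) / parents for col in zip(*elite)]
        sigma *= 0.9  # simple decay in place of CMA-ES step-size adaptation
    return best_w, best_s

# Toy usage: two one-hot "adapters" and a target blend of 25% / 75%.
adapters = [[1.0, 0.0], [0.0, 1.0]]
target = [0.25, 0.75]

def toy_score(vec):
    # Higher is better: negative squared distance to the target blend.
    return -sum((x - t) ** 2 for x, t in zip(vec, target))

weights, best = evolve_blend(adapters, toy_score)
print([round(w / sum(weights), 3) for w in weights], round(best, 5))
```

With a real eval, each score call is the expensive step, so population and generation count are what the time budget effectively trades off.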

Install / Upgrade

pip install -U soup-cli

Security

  • The backdoor-scan gate (v0.71.2) and license-conflict gate (v0.60) now run for all merge strategies, including --strategy cmaes, which previously bypassed them because the cmaes path returned before the gates ran.
  • soup loop canary deploy restricts SOUP_LOOP_SERVE_ENDPOINT to loopback / RFC1918-private hosts (a serve endpoint is the operator's own box or LAN), in addition to the general webhook SSRF policy.
  • soup adapters pr --push builds the gh child environment from an allowlist so unrelated secrets (HF_TOKEN / OPENAI_API_KEY / …) never reach the subprocess.
  • Reading the canary-suite JSON now uses O_NOFOLLOW + os.fstat (size cap checked on the same fd), closing the TOCTOU window between the symlink/size-cap checks and the read.
  • If pack_entry fails while packing a loop iteration, the just-pushed Registry entry is rolled back, so the lineage chain never points at an orphan.
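The loopback / RFC1918 restriction on SOUP_LOOP_SERVE_ENDPOINT can be pictured with a standard-library check. This is a sketch under stated assumptions, not soup's code: the function name is hypothetical, only http(s) URLs are accepted, and the only hostname allowed is the literal localhost. Every other hostname is rejected rather than resolved, because resolving it through DNS would reopen the rebinding question.

```python
import ipaddress
from urllib.parse import urlsplit

# Loopback plus the three RFC 1918 ranges (assumed to be the intended allowlist).
_ALLOWED_NETS = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "::1/128")]

def is_allowed_serve_endpoint(url):
    """True only for http(s) URLs whose host is localhost or a loopback/RFC1918 IP literal."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname  # already lower-cased, IPv6 brackets stripped
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # other hostnames would need DNS resolution; rejected in this sketch
    # Membership across IP versions is simply False, so v4/v6 can share one list.
    return any(addr in net for net in _ALLOWED_NETS)

print(is_allowed_serve_endpoint("http://192.168.1.20:8080/v1"))
```

An explicit network list is used instead of ipaddress's is_private because is_private also covers ranges such as 169.254.0.0/16 (link-local, including cloud metadata addresses), which a serve endpoint should not need.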
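The allowlisted child environment for soup adapters pr --push can be sketched as follows, together with a gh api call that posts the comment. The allowlist contents and the function names are hypothetical. The owner/repo#42 target form comes from the release note. The endpoint is GitHub's REST issue-comments route, which is where PR conversation comments live. gh api sends a POST once a -f field is supplied.

```python
import re
import subprocess

# Hypothetical allowlist: the auth variables gh reads plus what a child process needs to start.
GH_ENV_ALLOWLIST = ("GH_TOKEN", "GITHUB_TOKEN", "PATH", "HOME", "SYSTEMROOT", "USERPROFILE")
_TARGET = re.compile(r"^([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)#(\d+)$")

def gh_child_env(parent_env):
    """Copy only allowlisted variables; HF_TOKEN, OPENAI_API_KEY, etc. are dropped."""
    return {k: v for k, v in parent_env.items() if k in GH_ENV_ALLOWLIST}

def gh_comment_argv(target, body):
    """Build a `gh api` invocation that comments `body` on owner/repo#N."""
    m = _TARGET.match(target)
    if not m:
        raise ValueError(f"expected owner/repo#N, got {target!r}")
    owner, repo, number = m.groups()
    # PR conversation comments are created through the issues comments endpoint.
    return ["gh", "api", f"repos/{owner}/{repo}/issues/{number}/comments",
            "-f", f"body={body}"]

def post_comment(target, body, parent_env):
    """Run gh with the filtered environment (not exercised in the example below)."""
    return subprocess.run(gh_comment_argv(target, body), env=gh_child_env(parent_env),
                          check=True, capture_output=True, text=True)

print(gh_comment_argv("owner/repo#42", "## Adapter PR"))
```

Building the environment from an allowlist, rather than deleting known secrets from a copy of os.environ, means a secret added to the operator's shell later is excluded by default.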
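The symlink-safe, size-capped read of the canary suite can be sketched with os.open, os.fstat and a bounded read. The 1 MiB cap and the function name are assumptions. O_NOFOLLOW only exists on POSIX, so on other platforms this sketch falls back to a plain open.

```python
import json
import os

MAX_SUITE_BYTES = 1 << 20  # hypothetical 1 MiB cap

def read_suite_json(path, max_bytes=MAX_SUITE_BYTES):
    """Open without following a symlink, then size-check and read through the same fd."""
    # O_NOFOLLOW is POSIX-only; where it is missing this sketch degrades to a plain open.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0) | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)  # on POSIX, raises OSError if `path` itself is a symlink
    with os.fdopen(fd, "rb") as fh:
        # fstat on the open descriptor: the size checked belongs to the file being read,
        # not to whatever the path points at by the time a separate stat() would run.
        if os.fstat(fh.fileno()).st_size > max_bytes:
            raise ValueError("canary suite exceeds size cap")
        data = fh.read(max_bytes + 1)  # bounded read also catches growth after fstat
    if len(data) > max_bytes:
        raise ValueError("canary suite exceeds size cap")
    return json.loads(data)
```

Note that O_NOFOLLOW only refuses a symlink in the final path component; symlinked parent directories are still followed.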

Known Limitations

  • Pre-wired loop cost estimate is a placeholder: estimate_cost returns 0.0, so the v0.58 dollar-budget gate (monthly_budget_usd) is a no-op for pre-wired loops; use --max-runs-per-day until utils/run_cost is wired in. (Tracked follow-up.)
  • The cmaes default scorer reloads the base model per candidate — fine for tiny models; for large bases pass a small --population / --max-generations, or inject a cached scorer via the Python API.
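The cached-scorer workaround comes down to loading the base model once and reusing it for every candidate. The sketch below shows only that pattern. The loader, the toy "model" dict and the evaluate function are hypothetical stand-ins. How a scorer is actually injected through soup's Python API is not described in these notes, so no soup call is shown.

```python
from typing import Callable, Dict, List

def make_cached_scorer(load_base: Callable[[], Dict[str, float]],
                       evaluate: Callable[[Dict[str, float], List[float]], float]):
    """Return a scorer that calls load_base() at most once, then reuses the result."""
    cache: Dict[str, Dict[str, float]] = {}

    def score(candidate: List[float]) -> float:
        if "base" not in cache:
            cache["base"] = load_base()  # the expensive load happens only on the first call
        return evaluate(cache["base"], candidate)

    return score

loads: List[int] = []

def load_base() -> Dict[str, float]:
    loads.append(1)  # count loads to show the cache is working
    return {"bias": 0.5}

def evaluate(base: Dict[str, float], candidate: List[float]) -> float:
    return base["bias"] + sum(candidate)

score = make_cached_scorer(load_base, evaluate)
results = [score([0.1]), score([0.2]), score([0.3])]
print(results, len(loads))
```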

Tests: 12342 → 12474 (+130 in tests/test_v0714.py). Full suite green across ubuntu/windows/macos × Python 3.10/3.11/3.12.