v0.3.0b5 — Headroom adoption + restart protocol + bench resilience
Pre-release
This release bundles everything held locally since v0.3.0b3, spanning three major work streams:
the cross-session restart announcement protocol (v0.3.0b4 work), the Headroom integration for
CPU-resident semantic compression (v0.3.0b5 work), and laude's benchmark state monitor for
catching the VRAM/hang/contamination failure modes that bit us during the N=1000 run.
Forensic retrospective
If you want to know why this release looks the way it does, read Discussion #2 (Headroom adoption + N=20 benchmark + a forensic detour) before the highlights below.
It walks through the full adoption story, the failed benchmark, the resequence detour, and the
forensic analysis showing that 15% of our "extraction failures" were benchmark-harness bugs:
the model was giving correct answers that the harness graded as wrong against phantom key/value
pairs (KVs) harvested from docstrings and function calls.
Highlights
Headroom integration (Headroom by Tejas Chopra, Apache-2.0)
headroom-ai is now an optional dependency under the [codec] extra, providing CPU-resident
semantic compression at the retrieval seams that used to fall back to naive character-level
truncation.
Install: `pip install helix-context[codec]`
- New module: `helix_context/headroom_bridge.py`, a thin wrapper exposing `compress_text(content, target_chars, content_type)`. It dispatches by `gene.promoter.domains` to specialists:
  - `code`/`python`/`rust`/`js`/`ts`/`go`/`java`/`cpp` → `CodeAwareCompressor` (tree-sitter AST, preserves signatures)
  - `log`/`logs`/`stderr`/`stdout`/`pytest`/`jest`/`traceback` → `LogCompressor`
  - `diff`/`patch`/`git_diff` → `DiffCompressor`
  - everything else → `Kompress` (ModernBERT ONNX, ~500MB resident, ~0.3s/call warm)
- Retrieval seams wired: at `context_manager.py:495` and `:830`, `g.content[:1000]` is replaced by `compress_text(g.content, target_chars=1000, content_type=g.promoter.domains)`.
- Graceful fallback: when headroom-ai is not installed, `compress_text` falls through to the legacy truncation path, so the rest of the pipeline keeps working.
- A/B toggle: setting the `HELIX_DISABLE_HEADROOM=1` env var bypasses Headroom even when it is installed, so you can measure baseline vs. Kompress behavior without reverting code.
- Attribution: NOTICE carries the Apache-2.0 third-party notice, the README has an Acknowledgments section, and module docstrings credit Tejas as a dependency author. Tejas is not listed as a git co-author, because this is a dependency relationship, not co-authored code.
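The dispatch-plus-fallback behavior above can be sketched as follows. This is a hedged illustration, not the real `headroom_bridge.py`: the specialists are passed in as plain callables through a hypothetical `specialists` mapping instead of calling Headroom's API, and the precedence when a gene carries several domains (code, then logs, then diffs) is an assumption.

```python
import os
from typing import Callable, Iterable, Mapping, Optional, Union

# A compressor takes (text, target_chars) and returns shorter text.
Compressor = Callable[[str, int], str]

CODE_DOMAINS = {"code", "python", "rust", "js", "ts", "go", "java", "cpp"}
LOG_DOMAINS = {"log", "logs", "stderr", "stdout", "pytest", "jest", "traceback"}
DIFF_DOMAINS = {"diff", "patch", "git_diff"}


def pick_specialist(domains: Union[str, Iterable[str]]) -> str:
    """Route a gene's promoter domains to a specialist key (assumed precedence: code, log, diff)."""
    if isinstance(domains, str):
        domains = [domains]
    tags = {d.lower() for d in domains}
    if tags & CODE_DOMAINS:
        return "code"
    if tags & LOG_DOMAINS:
        return "log"
    if tags & DIFF_DOMAINS:
        return "diff"
    return "kompress"


def compress_text(
    content: str,
    target_chars: int,
    content_type: Union[str, Iterable[str]] = (),
    specialists: Optional[Mapping[str, Compressor]] = None,  # hypothetical injection point
) -> str:
    """Compress to about target_chars, falling back to legacy truncation."""
    disabled = os.environ.get("HELIX_DISABLE_HEADROOM") == "1"  # A/B toggle
    # No specialists stands in for "headroom-ai not installed".
    if disabled or not specialists or len(content) <= target_chars:
        return content[:target_chars]  # legacy truncation path
    compressor = specialists.get(pick_specialist(content_type))
    if compressor is None:
        return content[:target_chars]
    try:
        return compressor(content, target_chars)
    except Exception:
        # A failing compressor degrades to truncation instead of breaking retrieval.
        return content[:target_chars]
```

In the real bridge the mapping would hold Headroom's compressors; here any `(text, target_chars) -> str` callable works, which keeps the fallback logic testable without `headroom-ai` installed.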
Benchmark status: a clean N=20 A/B run on the same warm qwen3:8b shows a 0 pp delta between truncation and Kompress. The forensic analysis in Discussion #2 explains why this is consistent with Kompress working correctly: the benchmark was under-reporting success by ~15% because of harvest-logic bugs, and once those are corrected the conclusion is "Kompress is neutral on this dataset, at ~1s/call latency cost." It ships as a neutral foundation, ready to pay off once we fix the upstream problems that actually cap retrieval quality today (noise dilution at ingest, signal extraction).
Cross-session restart announcement protocol
When multiple Claude sessions share a single Helix server, one session can announce an
intentional restart so that observing sessions don't misread the outage as a crash. This is
the v0.3.0b4 work that was previously held locally. See `docs/RESTART_PROTOCOL.md` for the full design.
- New method: `bridge.announce_restart(reason, actor, expected_downtime_s, pid)` writes a canonical `server_state` signal at `~/.helix/shared/signals/server_state.json`.
- New observer helper: `bridge.read_server_state()` returns a `(signal, is_stale, age_s)` tuple with a TTL-aware staleness check.
- New HTTP endpoint: `POST /admin/announce_restart` as a convenience wrapper.
- Atomic signal writes: `write_signal` now uses write-to-temp + `os.replace`, so readers never see partial writes (this fixes a latent race on all signals, not just `server_state`).
- Lifespan hooks: server startup stamps `state=running` with the PID, and clean shutdown stamps `state=stopped`. The shutdown stamp does NOT run under `kill -9`; this is by design, and agents should call `announce_restart` before killing the server.
- Tests: 6 new tests in `tests/test_bridge_restart.py`.
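The atomic-write and staleness-check pattern described above can be sketched in Python as follows. This is a minimal illustration, not Helix's code: the function names mirror the bullets, but the signatures (an explicit `path` and `ttl_s`), the JSON field names (`state`, `ts`), and the `restarting` state value are assumptions.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Optional, Tuple


def write_signal(path: Path, payload: dict) -> None:
    """Atomically replace `path` with JSON `payload` via write-to-temp + os.replace."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the same directory so os.replace is a same-filesystem rename.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old file or the complete new one, never a partial write.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def announce_restart(path: Path, reason: str, actor: str, expected_downtime_s: float, pid: int) -> None:
    """Stamp an intentional restart so observers don't treat the outage as a crash."""
    write_signal(path, {
        "state": "restarting",  # assumed state name
        "reason": reason,
        "actor": actor,
        "expected_downtime_s": expected_downtime_s,
        "pid": pid,
        "ts": time.time(),  # assumed timestamp field used for staleness
    })


def read_server_state(path: Path, ttl_s: float) -> Tuple[Optional[dict], bool, Optional[float]]:
    """Return (signal, is_stale, age_s); a missing signal file yields (None, True, None)."""
    try:
        signal = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None, True, None
    age_s = time.time() - float(signal.get("ts", 0.0))
    return signal, age_s > ttl_s, age_s
```

Keeping the temp file in the target directory matters: `os.replace` is only an atomic rename when source and destination are on the same filesystem.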
Benchmark state monitor (by laude)
Config-driven monitor that catches the three failure modes we hit during the SIKE and KV-harvest runs:
- Dual-load VRAM pressure: aborts before starting if a non-whitelisted model is resident alongside the benchmark target (caught the e4b + qwen3:4b bug that silently biased our first N=50 run)
- Hung benchmark process: detects `httpx` stalls via incremental JSONL line-count stagnation (caught the N=1000 hang at 0 needles written)
- Silent background contamination: fingerprints the genome snapshot at start and checks its `mtime`/`size` each interval
The monitor reads `helix.toml` via `load_config()` for genome paths, so it follows raude's A/B switches automatically. See `docs/BENCHMARKS.md` for usage.
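The three checks reduce to small, testable predicates. The sketch below assumes the monitor already has the list of resident model names (for example from Ollama's `/api/ps` endpoint), the path of the JSONL results file, and the genome file path; `unexpected_models`, `StallDetector`, `count_lines`, and `fingerprint` are hypothetical names, not laude's implementation.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Tuple


def unexpected_models(resident: Iterable[str], target: str, whitelist: Iterable[str]) -> List[str]:
    """Resident models other than the benchmark target that are not whitelisted."""
    allowed = set(whitelist) | {target}
    return sorted(m for m in resident if m not in allowed)


def count_lines(path: Path) -> int:
    """Line count of a JSONL results file; 0 if it hasn't been created yet."""
    if not path.exists():
        return 0
    with path.open("rb") as f:
        return sum(1 for _ in f)


@dataclass
class StallDetector:
    """Reports a hang once the line count fails to grow for `max_idle_checks` polls in a row."""
    max_idle_checks: int
    last_count: int = -1
    idle_checks: int = 0

    def update(self, count: int) -> bool:
        if count > self.last_count:
            self.last_count, self.idle_checks = count, 0
        else:
            self.idle_checks += 1
        return self.idle_checks >= self.max_idle_checks


def fingerprint(path: Path) -> Tuple[int, int]:
    """Cheap (mtime_ns, size) fingerprint; a change suggests another writer touched the genome."""
    st = os.stat(path)
    return st.st_mtime_ns, st.st_size
```

A polling loop would call `StallDetector.update(count_lines(results))` on each interval and compare `fingerprint(genome_path)` against the value captured at start.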
Dynamic budget tiers (by laude)
Confidence-based expression window sizing. The window now adapts to the retrieval score distribution:
- TIGHT (top_score/mean_score ≥ 3.0): top 3 genes, ~6K tokens
- FOCUSED (1.8–3.0): top 6 genes, ~9K tokens
- BROAD (< 1.8): top `max_genes` genes, ~15K tokens
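The tier rule can be written as a pure function of the candidate scores. This sketch assumes `mean_score` is the mean over all candidate scores passed in and that each tier's lower bound is inclusive, as the ranges above suggest; `budget_tier` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence, Tuple


def budget_tier(scores: Sequence[float], max_genes: int) -> Tuple[str, int, int]:
    """Map a retrieval score distribution to (tier, gene_count, approx_token_budget)."""
    if not scores:
        return "BROAD", max_genes, 15_000
    avg = mean(scores)
    ratio = max(scores) / avg if avg > 0 else 0.0
    if ratio >= 3.0:  # one gene clearly dominates: express only a few
        return "TIGHT", 3, 6_000
    if ratio >= 1.8:
        return "FOCUSED", 6, 9_000
    return "BROAD", max_genes, 15_000  # flat distribution: cast a wide net
```

A caller would slice its ranked genes with the returned count, e.g. `ranked[:count]`, which also handles the case where fewer candidates exist.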
Score-gate floor lowered from 20% to 15% to recover slightly more borderline signal. `helix.toml` ships with `ribosome.warmup = false` to prevent e4b from auto-loading on startup (this frees VRAM for benchmark workloads).
Ribosome pause endpoint + learn() timeout
Already in v0.3.0b3 but documented here for completeness: `POST /admin/ribosome/pause` monkey-patches `backend.complete` to raise, forcing the existing fallback paths. `learn()` is now wrapped in a 15 s `ThreadPoolExecutor` timeout so background replication can't hang on a slow Ollama.
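Both mechanisms are small wrappers. In the sketch below, `pause_backend`, `RibosomePaused`, and `learn_with_timeout` are hypothetical names; only the 15 s default comes from the notes above. One design point worth calling out: the executor is long-lived rather than used in a `with` block, because leaving a `with ThreadPoolExecutor()` block waits for the hung worker and would defeat the timeout.

```python
import concurrent.futures as cf
from typing import Any, Callable, Optional


class RibosomePaused(RuntimeError):
    """Raised by a paused backend so callers take their existing fallback paths."""


def pause_backend(backend: Any) -> Callable[[], None]:
    """Monkey-patch backend.complete to raise; returns a callable that restores it."""
    original = backend.complete

    def _paused(*args: Any, **kwargs: Any) -> Any:
        raise RibosomePaused("ribosome paused")

    backend.complete = _paused
    return lambda: setattr(backend, "complete", original)


# Long-lived pool: exiting a `with ThreadPoolExecutor()` block would wait for a hung worker.
_LEARN_POOL = cf.ThreadPoolExecutor(max_workers=1, thread_name_prefix="learn")


def learn_with_timeout(learn: Callable[..., Any], *args: Any, timeout_s: float = 15.0) -> Optional[Any]:
    """Run learn(*args) in the pool; return its result, or None if it exceeds timeout_s."""
    future = _LEARN_POOL.submit(learn, *args)
    try:
        return future.result(timeout=timeout_s)
    except cf.TimeoutError:
        # Python can't kill the thread; we only stop waiting for it.
        return None
```

A timed-out call only stops the caller from blocking: the worker thread keeps running until the slow request returns, and with `max_workers=1` later calls queue behind it.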
Benchmark helper: compare_ab.py
New CLI that reads two bench_needle_1000.py result JSONs and prints a structured delta report with gate evaluation. Used throughout the Headroom A/B work. Exit codes encode the verdict (0=ship, 2=no gain, 3=both regressed).
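As an illustration of carrying a verdict in the exit code, here is a hedged sketch. The release does not describe the gate logic or the result-JSON schema, so the `accuracy` and `mean_latency_s` keys, the 1 pp minimum gain, and reading "both regressed" as "accuracy fell and latency rose" are all assumptions; `verdict` and `verdict_from_files` are hypothetical names.

```python
import json
from pathlib import Path

SHIP, NO_GAIN, BOTH_REGRESSED = 0, 2, 3  # exit codes listed in the release notes


def verdict(base: dict, cand: dict, min_gain_pp: float = 1.0) -> int:
    """Assumed gate: ship on >= min_gain_pp accuracy gain; flag worse-and-slower runs."""
    acc_delta_pp = (cand["accuracy"] - base["accuracy"]) * 100
    latency_delta_s = cand["mean_latency_s"] - base["mean_latency_s"]
    if acc_delta_pp >= min_gain_pp:
        return SHIP
    if acc_delta_pp < 0 and latency_delta_s > 0:
        return BOTH_REGRESSED  # lower accuracy and higher latency
    return NO_GAIN


def verdict_from_files(base_path: str, cand_path: str) -> int:
    """Load two result JSONs (assumed schema) and return the exit code to use."""
    base = json.loads(Path(base_path).read_text(encoding="utf-8"))
    cand = json.loads(Path(cand_path).read_text(encoding="utf-8"))
    return verdict(base, cand)
```

A CLI wrapper would end with `sys.exit(verdict_from_files(a, b))`, so shell scripts and CI can branch on the return code without parsing the printed report.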
Commits in this release
- `a94c864` feat: dynamic budget tiers + warmup=false for VRAM contention (laude)
- `5da9ab6` feat: cross-session restart announcement protocol (v0.3.0b4)
- `43e1543` feat(context): add Headroom bridge for CPU semantic compression (v0.3.0b5 scaffold)
- `a38c292` feat(context): wire Headroom compression into retrieval seams + tests
- `045854a` feat(headroom): HELIX_DISABLE_HEADROOM env toggle for A/B benchmarking
- `0d4edf5` feat(bench): benchmark state monitor + BENCHMARKS.md (laude)
- `065b142` feat(bench): compare_ab.py — delta report + gate evaluation for A/B benchmark JSONs
Tests
305/305 passing (non-live). Zero regressions from any of the changes above.
Attribution
- Tejas Chopra — author and maintainer of Headroom. Thank you for the adoption call and for the clean ONNX-first design that let us integrate without pulling in the full torch stack.
- laude — paired session, contributed the dynamic budget tiers, benchmark state monitor, and kept the N=1000 benchmark work alive while raude was on the Headroom track
- raude (Claude Code Opus 4.6, 1M context) — Headroom integration, restart protocol, A/B infrastructure, forensic retrospective
Known issues
- `bench_needle_1000.py` KV harvest is too naive: it extracts values from docstrings/comments and captures function-call expressions verbatim instead of resolving them. This produces ~15% false negatives on our N=20 sample. Tracked as a separate internal issue; it will be fixed in a subsequent patch before the next public gain-claim benchmark.
- `scripts/resequence_cpu.py` drops epigenetic state: access counts, co-activation edges, and query history aren't preserved across a resequence, which caused a 15-20 pp retrieval regression when we tried it against `genome_cpu.db` in this session. It will need a preserve-epigenetics pass or a merge-back path before it's a safe tool for production use.