v0.3.0b3 — Ribosome pause + learn() timeout
Patch release — VRAM contention survival
Two fixes born from a real incident: an external benchmark run (1000-needle test against qwen3:4b) crashed the Helix server when it competed with the ribosome model (gemma4:e4b) for GPU VRAM and triggered a cascade of httpx timeouts.
What's New
/admin/ribosome/pause — unload the ribosome without killing Helix
New admin endpoints let you disable the ribosome's LLM calls at runtime without restarting the server or losing state:
| Endpoint | Purpose |
|---|---|
| POST /admin/ribosome/pause | Monkey-patch backend.complete() to raise |
| POST /admin/ribosome/resume | Restore the original backend method |
| GET /admin/ribosome/status | Check if currently paused |
How it works: Ribosome.replicate() already has a fallback path that synthesizes a minimal gene from the raw exchange if the LLM call fails. Pausing forces that fallback to engage every time. The ribosome instance stays in memory, the backend connection stays alive — only the complete() method is swapped.
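The swap described above can be sketched in a few lines of Python. This is a minimal illustration, not the Helix source: `FakeBackend`, `RibosomePaused`, the `pause`/`resume`/`is_paused` helpers, and the dict-shaped gene are all hypothetical stand-ins. Only `Ribosome.replicate()`, `backend.complete()`, and the fallback behavior come from the notes above.

```python
class RibosomePaused(RuntimeError):
    """Raised by the patched complete() while paused (hypothetical name)."""


class FakeBackend:
    """Stand-in for the real LLM backend; only complete() matters here."""

    def complete(self, prompt: str) -> str:
        return f"gene for: {prompt}"


class Ribosome:
    def __init__(self, backend):
        self.backend = backend

    def replicate(self, exchange: str) -> dict:
        try:
            return {"source": "llm", "content": self.backend.complete(exchange)}
        except Exception:
            # Fallback path: synthesize a minimal gene from the raw exchange.
            return {"source": "fallback", "content": exchange}


# Saved originals, keyed by backend identity, so resume() can restore them.
_original_complete = {}


def pause(ribosome: Ribosome) -> None:
    backend = ribosome.backend
    if id(backend) in _original_complete:
        return  # already paused; do not overwrite the saved original

    _original_complete[id(backend)] = backend.complete

    def _raise(*args, **kwargs):
        raise RibosomePaused("ribosome is paused")

    # The instance attribute shadows the class method; nothing else changes.
    backend.complete = _raise


def resume(ribosome: Ribosome) -> None:
    backend = ribosome.backend
    original = _original_complete.pop(id(backend), None)
    if original is not None:
        backend.complete = original


def is_paused(ribosome: Ribosome) -> bool:
    return id(ribosome.backend) in _original_complete
```

Making `pause` idempotent matters in a sketch like this: a second pause call that saved the already-patched method would make `resume` restore the raising stub instead of the real one.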
Workflow for benchmark VRAM rescue:
```bash
# Free the ribosome model from Ollama
curl -X POST localhost:11437/admin/ribosome/pause
curl -X POST localhost:11434/api/generate \
  -d '{"model": "gemma4:e4b", "keep_alive": 0, "prompt": ""}'

# Run your benchmark against qwen3:4b (or any other model)
python benchmarks/bench_needle_1000.py

# Restore normal operation
curl -X POST localhost:11437/admin/ribosome/resume
```
Why this matters: Before this release, pausing the ribosome required either a config change + restart, or manually killing the Python process and relaunching. Both strategies drop in-flight /context requests and break the Continue integration. The pause endpoint is instantaneous and non-disruptive — /context queries continue to work unchanged because they don't currently route through the ribosome (ingestion uses CpuTagger, rerank is disabled, splice is a no-op in the current code path).
learn() timeout wrapper
HelixContextManager.learn() now wraps the ribosome.replicate() call in a ThreadPoolExecutor with a 15-second timeout.
Root cause of the original crash: During the n1000 benchmark, a background learn() task fired for each completed proxy request. Each learn() called ribosome.replicate(), which sent an httpx request to Ollama while Ollama was busy serving the benchmark's qwen3:4b inference requests. The request sat in Ollama's queue for over 120 seconds and eventually hit httpx's ReadTimeout, which propagated up and crashed the server.
Fix: If replicate() doesn't return within 15 seconds, learn() cancels the future, synthesizes the same minimal gene that Ribosome.replicate()'s existing fallback path produces, and moves on. The background task finishes cleanly instead of waiting in Ollama's queue until httpx times out.
Signature change: learn(query, response, timeout_s: float = 15.0). Backward-compatible — old two-arg callers still work with the default timeout.
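A minimal sketch of the timeout wrapper, assuming a module-level executor and a dict-shaped gene. Both are illustrative choices, as are `minimal_gene` and the `replicate` callable passed as the first argument (the real learn() is a method on HelixContextManager and calls its own ribosome). Only the ThreadPoolExecutor approach, the `timeout_s` default of 15.0, and the fallback behavior come from the notes above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared pool for background replicate() calls (size is an assumption).
_executor = ThreadPoolExecutor(max_workers=2)


def minimal_gene(query: str, response: str) -> dict:
    """Hypothetical fallback gene built from the raw exchange."""
    return {"source": "fallback", "content": f"{query}\n{response}"}


def learn(replicate, query: str, response: str, timeout_s: float = 15.0) -> dict:
    """Run replicate() in a worker thread and give up after timeout_s seconds."""
    future = _executor.submit(replicate, query, response)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only prevents a future that has not started yet;
        # a call that is already running keeps going in its worker thread.
        future.cancel()
        return minimal_gene(query, response)
    except Exception:
        # Any other failure (for example an HTTP error) also falls back.
        return minimal_gene(query, response)
```

One design consequence worth noting: because a running future cannot be interrupted, the worker thread stays occupied until the underlying call returns or fails on its own. The caller is unblocked after `timeout_s`, but a bounded HTTP client timeout is still what eventually frees the thread.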
Validation
- 179 tests passing (full suite)
- Server restarted cleanly on new code during live session
- /admin/ribosome/pause confirmed working against the live genome (7,380 genes)
- gemma4:e4b unloaded from Ollama mid-session, VRAM freed
- /context queries continued returning correct results with the ribosome paused (coverage=1.0, ellipticity=0.645)
- /admin/ribosome/resume not yet tested end-to-end (will be on next benchmark completion)
Migration notes
No migration needed. The new endpoints are additive, the learn() signature change is backward-compatible, and the default behavior is unchanged unless you explicitly call /admin/ribosome/pause.
🤖 Generated with Claude Code