Patch release — VRAM contention survival

Two fixes born from a real incident: an external benchmark run (1000-needle test against qwen3:4b) competed with the ribosome model (gemma4:e4b) for GPU VRAM, triggered a cascade of httpx timeouts, and crashed the Helix server.

What's New

/admin/ribosome/pause — unload the ribosome without killing Helix

New admin endpoints let you disable the ribosome's LLM calls at runtime without restarting the server or losing state:

| Endpoint | Purpose |
| --- | --- |
| `POST /admin/ribosome/pause` | Monkey-patch `backend.complete()` to raise |
| `POST /admin/ribosome/resume` | Restore the original backend method |
| `GET /admin/ribosome/status` | Check if currently paused |

How it works: Ribosome.replicate() already has a fallback path that synthesizes a minimal gene from the raw exchange if the LLM call fails. Pausing forces that fallback to engage every time. The ribosome instance stays in memory, the backend connection stays alive — only the complete() method is swapped.
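The swap described above can be sketched in a few lines. This is a minimal illustration, not the actual Helix code: `FakeBackend`, `PausedError`, `RibosomeSketch`, and the gene dictionary shape are all hypothetical stand-ins, and only the idea (replace `complete()` on the backend instance, restore it later, let the existing fallback absorb the failure) comes from the notes.

```python
class PausedError(RuntimeError):
    """Hypothetical error raised by the patched complete() while paused."""


class FakeBackend:
    """Hypothetical stand-in for the real LLM backend."""

    def complete(self, prompt: str) -> str:
        return f"gene for: {prompt}"


class RibosomeSketch:
    """Illustrative only; the real Ribosome class will differ."""

    def __init__(self, backend):
        self.backend = backend
        self._original_complete = None  # populated only while paused

    @property
    def paused(self) -> bool:
        return self._original_complete is not None

    def pause(self) -> None:
        if self.paused:
            return  # already paused; do not overwrite the saved original
        self._original_complete = self.backend.complete

        def _raise(*args, **kwargs):
            raise PausedError("ribosome paused")

        # Instance-level monkey-patch: the backend object itself stays alive.
        self.backend.complete = _raise

    def resume(self) -> None:
        if not self.paused:
            return
        self.backend.complete = self._original_complete
        self._original_complete = None

    def replicate(self, query: str, response: str) -> dict:
        try:
            text = self.backend.complete(f"{query}\n{response}")
            return {"source": "llm", "content": text}
        except Exception:
            # Fallback path: synthesize a minimal gene from the raw exchange.
            return {"source": "fallback", "content": f"{query}\n{response}"}
```

Because only one attribute on the backend instance changes, any connection or state the backend holds is left alone, and calling `pause()` twice is harmless.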

Workflow for benchmark VRAM rescue:
```bash
# Free the ribosome model from Ollama
curl -X POST localhost:11437/admin/ribosome/pause
curl -X POST localhost:11434/api/generate \
  -d '{"model": "gemma4:e4b", "keep_alive": 0, "prompt": ""}'

# Run your benchmark against qwen3:4b (or any other model)
python benchmarks/bench_needle_1000.py

# Restore normal operation
curl -X POST localhost:11437/admin/ribosome/resume
```
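If the benchmark is driven from Python, the same three steps can be wrapped in a context manager so that resume is attempted even when the benchmark raises. This is a sketch under assumptions: the helper names (`http_post`, `ribosome_paused`, `run_benchmark`) are hypothetical, and the ports and payload are copied from the curl commands above.

```python
import json
import subprocess
import urllib.request
from contextlib import contextmanager
from typing import Optional

HELIX = "http://localhost:11437"   # Helix admin port, from the curl example
OLLAMA = "http://localhost:11434"  # Ollama port, from the curl example


def http_post(url: str, payload: Optional[dict] = None) -> None:
    """POST an optional JSON payload and discard the response body."""
    data = json.dumps(payload).encode() if payload is not None else b""
    req = urllib.request.Request(
        url, data=data, method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        resp.read()


@contextmanager
def ribosome_paused(model: str = "gemma4:e4b", post=http_post):
    """Pause the ribosome, unload its model, and resume on exit."""
    post(f"{HELIX}/admin/ribosome/pause")
    try:
        # keep_alive=0 asks Ollama to unload the model right away.
        post(f"{OLLAMA}/api/generate",
             {"model": model, "keep_alive": 0, "prompt": ""})
        yield
    finally:
        # Runs even if the benchmark (or the unload request) fails.
        post(f"{HELIX}/admin/ribosome/resume")


def run_benchmark() -> None:
    """Example usage; not invoked automatically."""
    with ribosome_paused():
        subprocess.run(["python", "benchmarks/bench_needle_1000.py"], check=True)
```

The `post` parameter exists only so the sequence can be exercised without a live server; in normal use the default is enough.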

Why this matters: Before this release, pausing the ribosome required either a config change + restart, or manually killing the Python process and relaunching. Both strategies drop in-flight /context requests and break the Continue integration. The pause endpoint is instantaneous and non-disruptive — /context queries continue to work unchanged because they don't currently route through the ribosome (ingestion uses CpuTagger, rerank is disabled, splice is a no-op in the current code path).

learn() timeout wrapper

HelixContextManager.learn() now wraps the ribosome.replicate() call in a ThreadPoolExecutor with a 15-second timeout.

Root cause of the original crash: During the n1000 benchmark, a background learn() task fired for each completed proxy request. learn() called ribosome.replicate(), which sent an httpx request to Ollama while Ollama was busy serving the benchmark's qwen3:4b inference requests. The request sat in Ollama's queue for over 120 seconds and eventually raised httpx's ReadTimeout, which propagated up and crashed the server.

Fix: If replicate() doesn't return within 15 seconds, learn() cancels the future, synthesizes the same minimal gene that Ribosome.replicate()'s existing fallback path produces, and moves on. The background task drops cleanly instead of blocking indefinitely.

Signature change: learn(query, response, timeout_s: float = 15.0). Backward-compatible — old two-arg callers still work with the default timeout.
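The timeout pattern can be sketched as follows. This is an illustration under assumptions rather than the shipped implementation: `HelixContextManagerSketch`, `minimal_gene`, the injected `replicate` callable, the pool size, and the gene dictionary shape are hypothetical, while the `learn(query, response, timeout_s=15.0)` signature and the use of `ThreadPoolExecutor` follow the notes.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def minimal_gene(query: str, response: str) -> dict:
    """Hypothetical fallback gene built from the raw exchange."""
    return {"source": "fallback", "content": f"{query}\n{response}"}


class HelixContextManagerSketch:
    """Illustrative only; takes replicate() as a plain callable."""

    def __init__(self, replicate, max_workers: int = 2):
        self._replicate = replicate
        # Assumed pool size; bounds how many slow calls can pile up.
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    def learn(self, query: str, response: str, timeout_s: float = 15.0) -> dict:
        future = self._executor.submit(self._replicate, query, response)
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            # Best effort: cancel() only prevents a call that has not
            # started yet; a call already running continues in its worker.
            future.cancel()
            return minimal_gene(query, response)
```

One design point worth keeping in mind: `future.result(timeout=...)` stops the caller from waiting, but the worker thread stays occupied until the underlying request returns or fails, so a bounded pool (and a request-level timeout on the HTTP client) still matters under sustained contention.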

Validation

179 tests passing (full suite)
Server restarted cleanly on new code during live session
/admin/ribosome/pause confirmed working against the live genome (7,380 genes)
gemma4:e4b unloaded from Ollama mid-session, VRAM freed
/context queries continued returning correct results with the ribosome paused (coverage=1.0, ellipticity=0.645)
/admin/ribosome/resume not yet tested end-to-end (it will be exercised when the next benchmark run completes)

Migration notes

No migration needed. The new endpoints are additive, the learn() signature change is backward-compatible, and the default behavior is unchanged unless you explicitly call /admin/ribosome/pause.

🤖 Generated with Claude Code


v0.3.0b3 — Ribosome pause + learn() timeout
