## Summary
When running the full `docs/examples/` suite, each example runs as a subprocess and loads its Ollama model. Ollama keeps models resident for the default 5-minute keep-alive timeout after each request. Heavyweight models (e.g. `ibm/granite3.3-guardian:8b` at ~11 GB) then sit in memory long after the example finishes, starving later tests of headroom.
This caused the SOFAI graph-colouring example to crash with a 500 from the model runner — not because the machine lacked memory in total, but because a guardian model from an earlier test was still resident.
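For reference, Ollama's documented `keep_alive` field on `/api/generate` controls this: sending an empty-prompt request with `keep_alive: 0` evicts a model immediately. A minimal sketch (the host URL is the assumed default endpoint; nothing here is project code):

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # assumed default Ollama endpoint


def eviction_payload(model: str) -> dict:
    """Build a generate request that unloads `model` right away.

    An empty prompt with keep_alive=0 asks Ollama to evict the model
    as soon as this no-op request completes.
    """
    return {"model": model, "prompt": "", "keep_alive": 0}


def evict(model: str) -> None:
    """POST the eviction request, freeing the model's memory."""
    req = urllib.request.Request(
        f"{OLLAMA_HOST}/api/generate",
        data=json.dumps(eviction_payload(model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).read()
```

Running `evict("ibm/granite3.3-guardian:8b")` between examples would have freed the ~11 GB before the SOFAI example started.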
## Contrast with `test/`
`test/conftest.py` already handles this: it warms up CI models with `keep_alive: -1` before the ollama group and evicts them with `keep_alive: 0` afterwards. `docs/examples/conftest.py` has no equivalent.
## Options
- Set `OLLAMA_KEEP_ALIVE=0` (or a short timeout) as an env var for example runs, so models unload immediately after each example finishes
- Add warm-up/eviction hooks to `docs/examples/conftest.py`, similar to `test/conftest.py`
- Backend-level option: allow `OllamaModelBackend` to accept a `keep_alive` parameter passed through to the Ollama API
Related: #783 (VRAM predicate uses a static heuristic)