Example tests leave Ollama models resident, starving subsequent tests #798

@planetf1

Summary

When running the full docs/examples/ suite, each example runs as a subprocess and loads its Ollama model. Ollama keeps models resident for the default 5-minute keep-alive timeout after each request. Heavyweight models (e.g. ibm/granite3.3-guardian:8b at ~11 GB) then sit in memory long after the example finishes, starving later tests of headroom.

This caused the SOFAI graph-colouring example to crash with a 500 from the model runner — not because the machine lacked memory in total, but because a guardian model from an earlier test was still resident.
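One way to confirm this kind of failure is to ask Ollama which models are loaded at the moment a test starts. The sketch below assumes a default local server at `http://localhost:11434` and uses Ollama's `/api/ps` endpoint; the helper names are illustrative and are not part of this repository.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama address


def summarise_resident(ps_response: dict) -> list[tuple[str, float]]:
    """Return (model name, size in GiB) for each model in an /api/ps response."""
    return [
        (m["name"], round(m.get("size", 0) / 2**30, 1))
        for m in ps_response.get("models", [])
    ]


def fetch_resident(base_url: str = OLLAMA_URL) -> list[tuple[str, float]]:
    """Query a running Ollama server for its currently loaded models."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=10) as resp:
        return summarise_resident(json.load(resp))
```

Calling `fetch_resident()` just before a heavyweight example would show whether a model from an earlier example is still occupying memory.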

Contrast with test/

test/conftest.py already handles this: it warms up CI models with keep_alive: -1 before the ollama group and evicts them with keep_alive: 0 afterwards. docs/examples/conftest.py has no equivalent.
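For reference, the warm-up/evict pattern can be sketched roughly as follows. This is not the code in test/conftest.py, only a minimal illustration assuming a local server at `http://localhost:11434`; it relies on Ollama's documented behaviour that a `/api/generate` request with no prompt loads the model, and that `keep_alive: 0` unloads it. `pinned_models` and the other names are hypothetical.

```python
import json
import urllib.request
from contextlib import contextmanager

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama address


def residency_payload(model: str, keep_alive: int) -> dict:
    """Body for /api/generate with no prompt: -1 pins the model, 0 unloads it."""
    return {"model": model, "keep_alive": keep_alive}


def post_generate(payload: dict, base_url: str = OLLAMA_URL) -> None:
    """Send one request to a running Ollama server and discard the response."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=300).close()


@contextmanager
def pinned_models(models, send=post_generate):
    """Keep `models` resident inside the block, then evict them on exit."""
    for m in models:
        send(residency_payload(m, -1))
    try:
        yield
    finally:
        # Evict even if the wrapped tests fail, so later tests get the memory back.
        for m in models:
            send(residency_payload(m, 0))
```

In docs/examples/conftest.py this could be wrapped in a fixture around each example or group of examples; the `send` parameter exists only so the logic can be exercised without a live server.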

Options

  1. Set OLLAMA_KEEP_ALIVE=0 (or a short timeout) as an env var for example runs, so models unload right after each request instead of lingering for the 5-minute default
  2. Add warm-up/eviction hooks to docs/examples/conftest.py, similar to test/conftest.py
  3. Backend-level option — allow OllamaModelBackend to accept a keep_alive parameter passed through to the Ollama API
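A rough sketch of what option 3 might look like at the request level. The function below is hypothetical and does not reflect how OllamaModelBackend is currently written; it only shows an optional `keep_alive` being passed through to the body of Ollama's `/api/chat` endpoint, which accepts that field.

```python
from typing import Optional, Union


def build_chat_request(
    model: str,
    messages: list[dict],
    keep_alive: Optional[Union[int, str]] = None,
) -> dict:
    """Build an /api/chat body, adding keep_alive only when the caller sets it."""
    body = {"model": model, "messages": messages, "stream": False}
    # Compare against None rather than truthiness: 0 is a valid value
    # meaning "unload immediately" and must not be dropped.
    if keep_alive is not None:
        body["keep_alive"] = keep_alive  # e.g. 0, -1, or a duration such as "30s"
    return body
```

Leaving the field out when unset preserves the server's own default. On option 1, as far as I know OLLAMA_KEEP_ALIVE is read by the Ollama server rather than by clients, so it would need to be set in the environment where `ollama serve` starts, not only in the example subprocess.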

Related: #783 (VRAM predicate uses static heuristic)

Labels

bug, testing
