## Summary
When running the full `docs/examples/` suite, each example runs as a subprocess and loads its Ollama model. Ollama keeps models resident for the default 5-minute keep-alive timeout after each request. Heavyweight models (e.g. `ibm/granite3.3-guardian:8b` at ~11 GB) then sit in memory long after the example finishes, starving later tests of headroom.
This caused the SOFAI graph-colouring example to crash with a 500 from the model runner — not because the machine lacked memory in total, but because a guardian model from an earlier test was still resident.
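For reference, Ollama's documented `keep_alive` field on `/api/generate` controls this: sending an empty-prompt request with `keep_alive: 0` evicts a model immediately. A minimal sketch (the host URL is the assumed default endpoint; nothing here is project code):

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # assumed default Ollama endpoint


def eviction_payload(model: str) -> dict:
    """Build a generate request that unloads `model` right away.

    An empty prompt with keep_alive=0 asks Ollama to evict the model
    as soon as this no-op request completes.
    """
    return {"model": model, "prompt": "", "keep_alive": 0}


def evict(model: str) -> None:
    """POST the eviction request, freeing the model's memory."""
    req = urllib.request.Request(
        f"{OLLAMA_HOST}/api/generate",
        data=json.dumps(eviction_payload(model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).read()
```

Running `evict("ibm/granite3.3-guardian:8b")` between examples would have freed the ~11 GB before the SOFAI example started.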
## Contrast with `test/`
`test/conftest.py` already handles this: it warms up CI models with `keep_alive: -1` before the ollama group and evicts them with `keep_alive: 0` afterwards. `docs/examples/conftest.py` has no equivalent.
## Options
- Set `OLLAMA_KEEP_ALIVE=0` (or a short timeout) as an env var for example runs, so models unload immediately after each example finishes
- Add warm-up/eviction hooks to `docs/examples/conftest.py`, similar to `test/conftest.py`
- Backend-level option: allow `OllamaModelBackend` to accept a `keep_alive` parameter passed through to the Ollama API
Related: #783 (VRAM predicate uses a static heuristic)