Description
Summary
The run_tests_with_ollama.sh script has a fast path for when it detects an existing Ollama server (lines 82–84), but it blindly trusts that server without verifying it is actually functional. If the existing server is in a bad state, the entire test run fails with Ollama connectivity errors rather than a clear setup failure.
What goes wrong
When the script detects Ollama already running, it skips starting its own server and proceeds directly to model pulls and warmups. If a warmup times out, the script logs a warning but carries on anyway:
Warning: warmup for granite4:micro timed out (will load on first test)
The subsequent tests then error with "could not create OllamaModelBackend: ollama server not running at None" rather than failing fast. The run still takes the full ~80 minutes working through connection timeouts on every affected test before reporting the failures.
Suggested improvements
- Treat a warmup timeout as a fatal error rather than a warning: either die() with a clear message or attempt to restart the server
- When reusing an existing server, verify it is responsive with a lightweight check (e.g. ollama ps) before proceeding to warmups
- Consider adding a --force-restart-ollama flag for environments where stale servers are common
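A rough sketch of the first two suggestions, for illustration only. The helper name server_responds is hypothetical, die() here is a stand-in for whatever helper the script already defines, and the 10-second limit is an arbitrary choice. It assumes GNU coreutils timeout is available on the node.

```shell
#!/usr/bin/env bash

# Stand-in for the script's own die() helper (assumed, not copied from the script).
die() {
    echo "Error: $*" >&2
    exit 1
}

# Hypothetical helper: succeeds only if the probe command finishes
# successfully within the given number of seconds.
# With no probe command given it defaults to `ollama ps`.
# A hung server makes timeout(1) kill the probe and return non-zero.
server_responds() {
    local limit="$1"
    shift
    if [ "$#" -eq 0 ]; then
        set -- ollama ps
    fi
    timeout "$limit" "$@" >/dev/null 2>&1
}

# Possible use in the existing-server fast path (illustrative):
#   server_responds 10 || die "existing Ollama server is not responding; stop it and re-run"
```

Accepting the probe command as arguments keeps the check easy to exercise without a live server. The same timeout-then-die pattern could wrap the warmup step so that a warmup timeout stops the run instead of logging a warning.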
Context
Encountered during a manual cluster test run on an IBM LSF p-series GPU node (preemptable queue). The node had a stale Ollama server from a previous session that was running but unresponsive. Re-running after confirming no Ollama process was running produced a clean result.