
test(backends): fix vision Ollama tests failing in CI with 400 model does not support vision (#1185)#1188

Merged
ajbozarth merged 2 commits into
generative-computing:mainfrom
planetf1:worktree-issue-1185
Jun 2, 2026


@planetf1
Contributor

@planetf1 planetf1 commented Jun 2, 2026

Summary

Two vision tests, test_image_block_in_instruction and test_image_block_in_chat, have been silently broken in CI. Their fixture called start_session("ollama") without a model, so it defaulted to granite4.1:3b, a text-only model. Ollama rejects an image sent to a text-only model with 400 "model does not support vision", so post_processing never ran and the image-embedding assertions were never reached. The tests appeared to pass (or skip gracefully) while verifying nothing.

The deeper issue: there was no way to construct an OllamaModelBackend without a live server. This PR fixes that, then fixes the tests on top of it.

What changed

test/backends/conftest.py (new) — a shared mock_ollama_backend fixture that patches all four server-touching points in __init__ and returns a fully constructed backend offline. Other tests can now reuse this instead of rolling their own patch blocks.
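The offline-construction pattern can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual conftest.py: ToyBackend, FakeClient, client_factory and offline_backend are hypothetical stand-ins, and the real fixture patches OllamaModelBackend's own server-touching points instead.

```python
from contextlib import ExitStack, contextmanager
from unittest.mock import MagicMock, patch


class FakeClient:
    """Hypothetical network client: constructing it fails without a server."""

    def __init__(self, host: str) -> None:
        raise ConnectionError(f"no server at {host}")


class ToyBackend:
    """Hypothetical backend whose __init__ touches the server in several places."""

    client_factory = FakeClient  # looked up on the class so tests can swap it

    def __init__(self, model_id: str = "toy-model") -> None:
        self.model_id = model_id
        if not self._check_server():
            raise RuntimeError("server not reachable")
        if not self._pull_model():
            raise RuntimeError("model pull failed")
        self.client = self.client_factory("http://localhost:11434")

    def _check_server(self) -> bool:
        raise ConnectionError("would open a socket")

    def _pull_model(self) -> bool:
        raise ConnectionError("would download a model")


@contextmanager
def offline_backend(model_id: str = "toy-model"):
    """Yield a fully constructed ToyBackend with every server-touching point patched."""
    with ExitStack() as stack:
        stack.enter_context(patch.object(ToyBackend, "_check_server", return_value=True))
        stack.enter_context(patch.object(ToyBackend, "_pull_model", return_value=True))
        stack.enter_context(patch.object(ToyBackend, "client_factory", MagicMock()))
        yield ToyBackend(model_id)
```

In a conftest.py, a pytest fixture could wrap the same context manager and yield the backend, so individual tests no longer need their own patch blocks.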

test/backends/test_vision_ollama.py (rewrite) — three tiers:

  • Construction: pure ImageBlock logic, no backend. Module-level e2e/ollama markers removed; runs as a plain unit test.
  • Structural payload: verifies that images=[...] lands correctly in the Ollama message dict. Uses the offline fixture with _async_client patched via PropertyMock at the class level — required because the property is event-loop-keyed and a simple instance attribute is bypassed when _run_async_in_thread spins up a background thread. Runs in CI with no server, no vision model.
  • Dormant e2e: full round-trip against granite-vision-4.1, skipped until the model appears in the Ollama library. The skip gate clears automatically once ollama pull granite-vision-4.1 works. The activation checklist is in #1187 (test(vision): activate Ollama vision e2e once granite-vision-4.1 lands on Ollama).
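The class-level PropertyMock pattern from the structural payload tier might look like the sketch below. ToyBackend, chat_in_thread and send_with_mock_client are hypothetical stand-ins for the real backend and _run_async_in_thread, and the toy client's chat call is synchronous for brevity. The sketch only shows why the patch has to target the class: the property is resolved again inside the background thread's event loop.

```python
import asyncio
import threading
from unittest.mock import MagicMock, PropertyMock, patch


class ToyBackend:
    """Hypothetical backend that caches one client per event loop."""

    def __init__(self) -> None:
        self._clients: dict[int, object] = {}

    @property
    def _async_client(self) -> object:
        # Keyed on the running loop, so every new loop gets a fresh client.
        key = id(asyncio.get_running_loop())
        if key not in self._clients:
            self._clients[key] = object()  # stands in for a real network client
        return self._clients[key]

    async def _chat(self, messages: list[dict]) -> object:
        return self._async_client.chat(messages=messages)

    def chat_in_thread(self, messages: list[dict]) -> object:
        """Run _chat on a fresh event loop in a background thread."""
        result: list[object] = []
        worker = threading.Thread(
            target=lambda: result.append(asyncio.run(self._chat(messages)))
        )
        worker.start()
        worker.join()
        return result[0]


def send_with_mock_client(messages: list[dict]) -> MagicMock:
    """Patch the property on the class and return the mock that received the call."""
    fake_client = MagicMock()
    with patch.object(
        ToyBackend, "_async_client", new_callable=PropertyMock, return_value=fake_client
    ):
        ToyBackend().chat_in_thread(messages)
    return fake_client
```

A property is a data descriptor, so an entry placed in the instance's __dict__ under the same name is never consulted. Patching the class attribute replaces the descriptor itself, which is why the mock is returned no matter which loop or thread performs the lookup.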

test/backends/test_ollama_unit.py (refactor) — swapped the private _make_backend() helper for the shared fixture. No behaviour change.

Notes

Closes #1185. Dormant e2e tracked in #1187.
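
For the dormant e2e tier, an availability gate could be sketched along these lines. This is an assumption about the shape of the check, not the code in this PR: model_is_available is a hypothetical helper, and the pull callable is injected so the sketch runs without an Ollama server.

```python
from typing import Callable


def model_is_available(model: str, pull: Callable[[str], object]) -> bool:
    """Return True if pulling `model` succeeds, False on any failure.

    `pull` is injected (hypothetical); a real gate would pass a function
    that asks the Ollama server to pull the model.
    """
    try:
        pull(model)
    except Exception:
        # Any failure (model not in the library, server down) keeps the test skipped.
        return False
    return True
```

A session fixture could call pytest.skip when this returns False, so the e2e tests start running on their own once the pull begins to succeed.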

test/backends/test_vision_openai.py has the same structural gap (assertions hidden behind qualitative + xfail). Worth a follow-up.

Testing

# No Ollama server running:
uv run pytest test/backends/test_vision_ollama.py -v
# 4 passed, 2 skipped (dormant e2e)

uv run pytest test/backends/test_ollama_unit.py -v
# 18 passed

CI: quality matrix green on 3.11, 3.12, 3.13.

@planetf1 planetf1 marked this pull request as ready for review June 2, 2026 10:40
@planetf1 planetf1 requested a review from a team as a code owner June 2, 2026 10:40
…does not support vision

Structural payload tests (test_image_block_in_instruction, test_image_block_in_chat)
were failing in CI because the m_session fixture called start_session() with no
model_id, resolving to IBM_GRANITE_4_1_3B (granite4.1:3b) — a text-only model.
Attaching images caused Ollama to reject the request with 400, preventing
post_processing from running and the structural assertions from ever executing.

Fixes are:

1. Add test/backends/conftest.py with a shared mock_ollama_backend fixture that
   constructs an OllamaModelBackend entirely offline (patches _check_ollama_server,
   _pull_ollama_model, ollama.Client, ollama.AsyncClient). No live server required.

2. Rewrite test_vision_ollama.py into three tiers:
   - Tier 1 (construction): pure ImageBlock unit tests, no model or server.
   - Tier 2 (structural payload): mocked offline tests that verify images are
     embedded correctly in the Ollama conversation payload. The _async_client
     property is mocked via PropertyMock at the class level so the mock is
     returned regardless of which event loop _run_async_in_thread creates in
     its background thread. Runs in CI unconditionally.
   - Tier 3 (dormant e2e): skipped until granite-vision-4.1 lands on Ollama;
     tracked in generative-computing#1187.

3. Refactor test_ollama_unit.py to use the shared mock_ollama_backend fixture,
   removing the duplicated _make_backend() helper.

Closes generative-computing#1185.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 planetf1 force-pushed the worktree-issue-1185 branch from 0a5722f to 862182c Compare June 2, 2026 10:43
Remove dead code after unconditional pytest.skip() in vision_session —
the availability check is now the sole gate, which auto-activates once
granite-vision-4.1 lands on Ollama. Update conftest.py docstring to show
the correct PropertyMock class-level patching pattern (instance assignment
does not override the event-loop-keyed _async_client property). Compute
ImageBlock.from_pil_image() once in test_image_block_in_chat instead of
three times.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@ajbozarth ajbozarth added this pull request to the merge queue Jun 2, 2026
Merged via the queue into generative-computing:main with commit 006485c Jun 2, 2026
7 checks passed

Development

Successfully merging this pull request may close these issues.

test(vision): test_vision_ollama CI path sends images to non-vision model, causing 400 Bad Request

3 participants