
OllamaLanguageModel: vision images sent at request top level, ignored by /api/chat #167

Description

@james-333i

Summary

Image input to Ollama vision models (e.g. llava) is silently dropped. OllamaLanguageModel base64-encodes the image correctly but attaches it at the top level of the request body (images) while POSTing to /api/chat, which only reads images from inside each message (messages[].images). Top-level images is an /api/generate field; /api/chat ignores it, so the model never receives the image and responds as if only text had been sent.

Environment

  • AnyLanguageModel 0.8.0
  • Backend: Ollama (/api/chat), model llava

Steps to reproduce

let model = OllamaLanguageModel(baseURL: URL(string: "http://localhost:11434")!, model: "llava")
let session = LanguageModelSession(model: model)
let image = Transcript.ImageSegment(/* JPEG data */)
let reply = try await session.respond(to: "What's in this image?", images: [image])
// reply describes nothing / asks for an image — the image was never delivered

Root cause

In Sources/AnyLanguageModel/Models/OllamaLanguageModel.swift:

The user message is built without images, and images are passed separately into createChatParams, which places them at the top level:

let (ollamaText, ollamaImages) = convertSegmentsToOllama(userSegments)
let messages = [ OllamaMessage(role: .user, content: ollamaText) ]   // no images
...
let params = try createChatParams(
    ..., images: ollamaImages.isEmpty ? nil : ollamaImages, ...      // top-level
)
let url = baseURL.appendingPathComponent("api/chat")                 // chat endpoint

createChatParams writes them to the request root:

if let images, !images.isEmpty {
    params["images"] = .array(images.map { .string($0) })            // wrong level for /api/chat
}

And OllamaMessage has no images field at all:

private struct OllamaMessage: Hashable, Codable, Sendable {
    enum Role: String, ... { case system, user, assistant, tool }
    let role: Role
    let content: String
    // no `images`
}

Ollama's /api/chat expects:

{ "model": "llava", "messages": [ { "role": "user", "content": "...", "images": ["<base64>"] } ] }

This affects both the non-streaming and streaming chat paths (both build messages this way and POST to /api/chat).
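
To make the mismatch concrete, here is a minimal standalone sketch that compares the two request shapes. It does not call Ollama or AnyLanguageModel; `imagesVisibleToChat` is a hypothetical helper that only models the behaviour described above (reading `messages[].images` and ignoring a root-level `images` key), and the image bytes are placeholder data.

```swift
import Foundation

// Hypothetical helper: models the behaviour described in this issue, where the
// chat endpoint only looks at `messages[].images`, never at a root `images` key.
func imagesVisibleToChat(_ body: [String: Any]) -> [String] {
    guard let messages = body["messages"] as? [[String: Any]] else { return [] }
    return messages.flatMap { ($0["images"] as? [String]) ?? [] }
}

// Placeholder payload standing in for real JPEG data.
let encoded = Data("fake image bytes".utf8).base64EncodedString()

// Shape currently produced: user message without images, images at the root.
let plainMessage: [String: Any] = [
    "role": "user",
    "content": "What's in this image?",
]
let topLevelBody: [String: Any] = [
    "model": "llava",
    "messages": [plainMessage],
    "images": [encoded],
]

// Shape /api/chat expects: images carried inside the user message.
let messageWithImages: [String: Any] = [
    "role": "user",
    "content": "What's in this image?",
    "images": [encoded],
]
let nestedBody: [String: Any] = [
    "model": "llava",
    "messages": [messageWithImages],
]

print(imagesVisibleToChat(topLevelBody).count) // 0
print(imagesVisibleToChat(nestedBody).count)   // 1
```

Under this model the top-level body exposes no images to the chat endpoint, which matches the observed "asks for an image" reply.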

Suggested fix

  1. Add an optional images field to OllamaMessage:
private struct OllamaMessage: Hashable, Codable, Sendable {
    let role: Role
    let content: String
    let images: [String]?
}
  2. Attach the images to the user message instead of the request root:
let messages = [
    OllamaMessage(role: .user, content: ollamaText,
                  images: ollamaImages.isEmpty ? nil : ollamaImages)
]
  3. Remove the top-level images handling from createChatParams (and drop the images: parameter), since /api/chat doesn't use it.
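
The steps above can be checked in isolation with a small Codable sketch. `SketchMessage`, `SketchChatRequest`, and `encodeChatBody` are hypothetical stand-ins, not the library's private types, and the sketch assumes the request can be modelled as a plain Codable struct, whereas the real code builds its body through createChatParams. It relies on Swift's synthesized Codable omitting optional properties that are nil, so text-only messages stay unchanged on the wire.

```swift
import Foundation

// Hypothetical stand-in for the private OllamaMessage type, with the new field.
struct SketchMessage: Codable, Hashable, Sendable {
    enum Role: String, Codable, Sendable { case system, user, assistant, tool }
    let role: Role
    let content: String
    let images: [String]?   // base64 strings; omitted from JSON when nil
}

// Hypothetical minimal request body: no top-level `images` key at all.
struct SketchChatRequest: Codable {
    let model: String
    let messages: [SketchMessage]
}

// Builds the JSON body for a single user turn, attaching images to the message.
func encodeChatBody(text: String, images: [String]) throws -> String {
    let message = SketchMessage(
        role: .user,
        content: text,
        images: images.isEmpty ? nil : images
    )
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]   // stable key order for inspection
    let body = SketchChatRequest(model: "llava", messages: [message])
    return String(decoding: try encoder.encode(body), as: UTF8.self)
}

// "QUJD" is the base64 encoding of the ASCII bytes "ABC", used as a dummy image.
let withImage = try encodeChatBody(text: "What's in this image?", images: ["QUJD"])
let textOnly = try encodeChatBody(text: "Hello", images: [])
print(withImage)
print(textOnly)
```

The first printed body carries `images` inside the user message; the second has no `images` key anywhere, so text-only requests are unaffected by the new field.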
