Issues with qwen3:32b, llama4:16x17b, gpt-oss:20b #597

Description

@wolframwi

Until recently, the Ollama Python library worked with every model I downloaded, but I now run into issues with some of the newer ones:

For reference, here is how I call Ollama:

    # llm_engine, query, response_model (a Pydantic model), num_ctx, and
    # images are defined elsewhere in my script.
    response = ollama.generate(
        model=llm_engine,
        prompt=query,
        stream=False,
        format=response_model.model_json_schema(),
        options={"temperature": 0.0, "top_k": 1, "num_ctx": num_ctx},
        images=images,
    )

(1) Minor -- qwen3:32b responds only in "thinking" instead of "response":

Instead of just using response["response"], I now need to use

    response_text = (response["response"] or response["thinking"] or "").strip()

since qwen3:32b consistently puts its answer in response["thinking"] and leaves response["response"] empty.
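A possibly cleaner route is to disable thinking explicitly. The generate() call accepts a think parameter in client 0.6.0; whether qwen3:32b then routes its answer back into response["response"] when a format= schema is also set is something I haven't verified, so treat this as a sketch:

    import ollama

    # Sketch: ask the model not to emit a thinking trace, so the answer
    # should land in response["response"]. Reuses query, response_model,
    # and num_ctx from the call above; untested with qwen3:32b + format=.
    response = ollama.generate(
        model="qwen3:32b",
        prompt=query,
        stream=False,
        think=False,
        format=response_model.model_json_schema(),
        options={"temperature": 0.0, "top_k": 1, "num_ctx": num_ctx},
    )
    response_text = (response["response"] or response["thinking"] or "").strip()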

(2) Major -- llama4:16x17b and gpt-oss:20b return an empty response:

Here is the response I got from invoking gpt-oss:20b:

    model='gpt-oss:20b' created_at='2025-11-03T09:07:14.833064Z' done=True
    done_reason='stop' total_duration=11868059417 load_duration=3789110375
    prompt_eval_count=2640 prompt_eval_duration=7464908083 eval_count=7
    eval_duration=371895000 response='' thinking=None context=[...]
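Note that eval_count=7 suggests the model emitted almost nothing before stopping. One isolation test I can run (a sketch, not a fix) is to repeat the call without the format= schema and without images, to see whether the structured-output constraint is what suppresses the response:

    import ollama

    # Diagnostic sketch: same prompt, but without the JSON-schema constraint
    # and without images, to isolate whether format= causes the empty output.
    # Reuses query and num_ctx from the call above.
    probe = ollama.generate(
        model="gpt-oss:20b",
        prompt=query,
        stream=False,
        options={"temperature": 0.0, "num_ctx": num_ctx},
    )
    print(repr(probe["response"]), repr(probe["thinking"]))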

Apologies in advance if I'm doing something wrong. For reference, I am running version 0.6.0 of the Ollama Python client and Ollama version 0.12.9.

Many thanks
Wolfram
