extra_body in OpenAIChatCompletionClient config is silently ignored when loaded via AutoGen Studio JSON #7418

@danmaxis

Description

What happened?

Describe the bug

When configuring a model client through AutoGen Studio's JSON editor with an extra_body field
(e.g., to pass enable_thinking: false to a Qwen3-compatible endpoint), the field appears to be
silently dropped during deserialization. The extra parameters never reach the underlying HTTP
request, even though the same configuration works correctly when instantiated directly via Python.

I'm not 100% sure if this is a serialization issue in the Studio layer or a limitation in how
OpenAIChatCompletionClient handles extra_body during load_component() — happy to be
corrected if I'm misunderstanding the intended behavior.
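If it helps triage, here is a self-contained sketch of the failure mode I suspect: a deserializer that copies only whitelisted fields will silently discard `extra_body`. The loader below is invented purely for illustration — only the field names come from the config in this report.

```python
# Hypothetical illustration of the suspected failure mode: a config loader
# that copies only known fields silently drops everything else (here,
# "extra_body"). The whitelist and loader are invented for this sketch.
KNOWN_FIELDS = {"model", "api_key", "base_url", "model_info"}

def load_config(raw: dict) -> dict:
    """Keep only whitelisted fields, silently discarding the rest."""
    return {k: v for k, v in raw.items() if k in KNOWN_FIELDS}

raw = {
    "model": "Qwen3-30B-A3B",
    "base_url": "http://localhost:8080/v1",
    "extra_body": {"enable_thinking": False},
}
loaded = load_config(raw)
print("extra_body" in loaded)  # False — the field never reaches the client
```

If Studio's deserialization path does anything equivalent (e.g. a strict Pydantic model that ignores unknown fields), that would explain the symptom exactly.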


To Reproduce

  1. Spin up AutoGen Studio with a local OpenAI-compatible endpoint that serves a Qwen3 model
    with enable_thinking active by default (e.g., LM Studio, llama-server, vLLM).

  2. Configure a model client via the Studio JSON editor with extra_body:

{
  "provider": "autogen_ext.models.openai.OpenAIChatCompletionClient",
  "component_type": "model",
  "version": 1,
  "component_version": 1,
  "label": "OpenAIChatCompletionClient",
  "config": {
    "model": "Qwen3-30B-A3B",
    "api_key": "placeholder",
    "base_url": "http://localhost:8080/v1",
    "extra_body": {
      "enable_thinking": false
    },
    "model_info": {
      "vision": false,
      "function_calling": true,
      "json_output": true,
      "structured_output": true,
      "family": "unknown",
      "context_window": 32768
    }
  }
}
  3. Assign this model client to an AssistantAgent inside a RoundRobinGroupChat team.

  4. Run any task in the Playground.

Observed error (from docker logs):

openai.BadRequestError: Error code: 400 - {
  'error': {
    'code': 400,
    'message': 'Assistant response prefill is incompatible with enable_thinking.',
    'type': 'invalid_request_error'
  }
}

The error confirms the endpoint is still receiving requests with enable_thinking active,
meaning extra_body: { enable_thinking: false } was never forwarded.

Full traceback:

File ".../autogen_agentchat/agents/_assistant_agent.py", line 955, in _call_llm
    model_result = await model_client.create(
File ".../autogen_ext/models/openai/_openai_client.py", line 624, in create
    result = await future
File ".../openai/resources/chat/completions/completions.py", line 2714, in create
    return await self._post(
...
openai.BadRequestError: Error code: 400 - {'error': {'code': 400,
  'message': 'Assistant response prefill is incompatible with enable_thinking.',
  'type': 'invalid_request_error'}}

Expected behavior

The extra_body field should be forwarded as-is to the underlying openai client's
create() / create_stream() calls, exactly as it would be when constructing
OpenAIChatCompletionClient directly in Python:

# This works fine in Python — extra_body is respected
from autogen_core.models import ModelInfo
from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="Qwen3-30B-A3B",
    base_url="http://localhost:8080/v1",
    api_key="placeholder",
    extra_body={"enable_thinking": False},
    model_info=ModelInfo(...),
)

The same behavior should be achievable via the Studio JSON config, since the field is
supported by the underlying client.


Environment

  • autogenstudio version: latest via pip install autogenstudio (as of March 2026)
  • autogen-agentchat / autogen-ext: installed as dependencies
  • Running inside Docker (python:3.11-slim base image)
  • Local model server: LM Studio / llama-server (OpenAI-compatible endpoint)
  • Model: Qwen3 family (any variant with enable_thinking support)

Additional context

This affects any use case requiring vendor-specific parameters that go beyond the standard
OpenAI API spec — enable_thinking for Qwen3 being a common one, but the same issue would
surface with other extra_body fields used by vLLM, TabbyAPI, or similar servers.

A workaround is to disable enable_thinking at the inference server level directly, or to
insert <think>\n\n</think> at the start of the agent's system_message to trick the
model's chat template into skipping the thinking block. Neither is ideal.
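For anyone hitting this in the meantime, the second workaround is plain string handling on the agent's system_message. A minimal sketch (the helper name is mine):

```python
def with_empty_think(system_message: str) -> str:
    """Prepend an empty <think> block so a Qwen3-style chat template
    treats the thinking phase as already complete and skips it."""
    return "<think>\n\n</think>\n" + system_message

msg = with_empty_think("You are a helpful assistant.")
print(msg.startswith("<think>"))  # True
```

This avoids touching the server config, but it pollutes the system prompt and depends on the model's chat template, so it is fragile.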

If extra_body deserialization is intentionally unsupported in the component config, it
would be helpful to document this limitation and suggest the recommended alternative.

Thanks for the great project — really appreciate the work going into AutoGen Studio!

Which packages was the bug in?

AutoGen Studio (autogenstudio)

AutoGen library version.

Python dev (main branch)

Other library version.

No response

Model used

Qwen3.5-35B-A3B-Q4_K_M

Model provider

LlamaCpp

Other model provider

No response

Python version

3.11

.NET version

None

Operating system

Other
