What happened?
extra_body in OpenAIChatCompletionClient config is silently ignored when loaded via AutoGen Studio JSON
Describe the bug
When configuring a model client through AutoGen Studio's JSON editor with an extra_body field
(e.g., to pass enable_thinking: false to a Qwen3-compatible endpoint), the field appears to be
silently dropped during deserialization. The extra parameters never reach the underlying HTTP
request, even though the same configuration works correctly when instantiated directly via Python.
I'm not certain whether this is a serialization issue in the Studio layer or a limitation in how
OpenAIChatCompletionClient handles extra_body during load_component() — happy to be
corrected if I'm misunderstanding the intended behavior.
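To help narrow it down, here is a quick round-trip probe (a sketch on my part, assuming
dump_component() / load_component() mirror what Studio does on save and load; the
model_info values are the ones from the config below):

# Round-trip probe (sketch): if extra_body is dropped by the component
# (de)serialization itself, it will be missing from the re-dumped config.
from autogen_core.models import ModelInfo
from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="Qwen3-30B-A3B",
    base_url="http://localhost:8080/v1",
    api_key="placeholder",
    extra_body={"enable_thinking": False},
    model_info=ModelInfo(
        vision=False, function_calling=True, json_output=True,
        structured_output=True, family="unknown",
    ),
)
reloaded = OpenAIChatCompletionClient.load_component(client.dump_component())
# Expected: {'enable_thinking': False}. If this prints None instead, the
# field is lost in the component (de)serialization path, not in Studio's UI.
print(reloaded.dump_component().config.get("extra_body"))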
To Reproduce
- Spin up AutoGen Studio with a local OpenAI-compatible endpoint that serves a Qwen3 model
  with enable_thinking active by default (e.g., LM Studio, llama-server, vLLM).
- Configure a model client via the Studio JSON editor with extra_body:
{
  "provider": "autogen_ext.models.openai.OpenAIChatCompletionClient",
  "component_type": "model",
  "version": 1,
  "component_version": 1,
  "label": "OpenAIChatCompletionClient",
  "config": {
    "model": "Qwen3-30B-A3B",
    "api_key": "placeholder",
    "base_url": "http://localhost:8080/v1",
    "extra_body": {
      "enable_thinking": false
    },
    "model_info": {
      "vision": false,
      "function_calling": true,
      "json_output": true,
      "structured_output": true,
      "family": "unknown",
      "context_window": 32768
    }
  }
}
- Assign this model client to an AssistantAgent inside a RoundRobinGroupChat team.
- Run any task in the Playground. (A programmatic equivalent of these two steps is sketched
  below.)
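A sketch of that equivalent, assuming the JSON above is saved as model_config.json (a
placeholder filename) and that load_component is the same deserialization entry point
Studio uses:

# Sketch of the last two repro steps outside Studio. The agent name and
# task are placeholders for this illustration.
import asyncio
import json

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    with open("model_config.json") as f:
        client = OpenAIChatCompletionClient.load_component(json.load(f))
    agent = AssistantAgent(name="assistant", model_client=client)
    team = RoundRobinGroupChat([agent], max_turns=1)
    # Should hit the same 400 below if the drop happens in load_component.
    await team.run(task="Say hello.")


asyncio.run(main())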
Observed error (from docker logs):
openai.BadRequestError: Error code: 400 - {
  'error': {
    'code': 400,
    'message': 'Assistant response prefill is incompatible with enable_thinking.',
    'type': 'invalid_request_error'
  }
}
The error confirms the endpoint is still receiving requests with enable_thinking active,
meaning extra_body: { enable_thinking: false } was never forwarded.
Full traceback:
File ".../autogen_agentchat/agents/_assistant_agent.py", line 955, in _call_llm
model_result = await model_client.create(
File ".../autogen_ext/models/openai/_openai_client.py", line 624, in create
result = await future
File ".../openai/resources/chat/completions/completions.py", line 2714, in create
return await self._post(
...
openai.BadRequestError: Error code: 400 - {'error': {'code': 400,
'message': 'Assistant response prefill is incompatible with enable_thinking.',
'type': 'invalid_request_error'}}
Expected behavior
The extra_body field should be forwarded as-is to the underlying openai client's
create() / create_stream() calls, exactly as it would be when constructing
OpenAIChatCompletionClient directly in Python:
# This works fine in Python — extra_body is respected
from autogen_core.models import ModelInfo
from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="Qwen3-30B-A3B",
    base_url="http://localhost:8080/v1",
    api_key="placeholder",
    extra_body={"enable_thinking": False},
    model_info=ModelInfo(  # same values as the JSON config above
        vision=False, function_calling=True, json_output=True,
        structured_output=True, family="unknown",
    ),
)
The same behavior should be achievable via the Studio JSON config, since the field is
supported by the underlying client.
Environment
- autogenstudio version: latest via pip install autogenstudio (as of March 2026)
- autogen-agentchat / autogen-ext: installed as dependencies
- Running inside Docker (python:3.11-slim base image)
- Local model server: LM Studio / llama-server (OpenAI-compatible endpoint)
- Model: Qwen3 family (any variant with enable_thinking support)
Additional context
This affects any use case requiring vendor-specific parameters that go beyond the standard
OpenAI API spec — enable_thinking for Qwen3 being a common one, but the same issue would
surface with other extra_body fields used by vLLM, TabbyAPI, or similar servers.
A workaround is to disable enable_thinking at the inference server level directly, or to
insert <think>\n\n</think> at the start of the agent's system_message to trick the
model's chat template into skipping the thinking block. Neither is ideal.
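For completeness, the system_message workaround looks roughly like this (a sketch; the
agent name and message text are placeholders, and client is a model client constructed as
in the snippet above):

# Workaround sketch: start the system message with an empty think block so
# Qwen3's chat template treats thinking as already done.
from autogen_agentchat.agents import AssistantAgent

agent = AssistantAgent(
    name="assistant",  # placeholder name
    model_client=client,  # model client built as in the snippet above
    system_message="<think>\n\n</think>\nYou are a helpful assistant.",
)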
If extra_body deserialization is intentionally unsupported in the component config, it
would be helpful to document this limitation and suggest the recommended alternative.
Thanks for the great project — really appreciate the work going into AutoGen Studio!
Which packages was the bug in?
AutoGen Studio (autogenstudio)
AutoGen library version.
Python dev (main branch)
Other library version.
No response
Model used
Qwen3-30B-A3B-Q4_K_M
Model provider
LlamaCpp
Other model provider
No response
Python version
3.11
.NET version
None
Operating system
Other