Description
Please read this first
- Have you read the docs? [Agents SDK docs] - YES
- Have you searched for related issues? Others may have faced similar issues. - YES
Describe the bug
The extract_all_content method currently raises a UserError("Unknown content") when given a message item of type input_audio.
However, the SDK's parameter schema defines "input_audio" as a valid content part type within ResponseInputContentParam. The current implementation is therefore incomplete and inconsistent with the declared interface.
As a result, any agent or converter handling audio input messages will fail at runtime.
Debug information
- Agents SDK version: 0.3.3
- Python version: 3.11
Repro steps
The issue can be reproduced with the following code examples:
Example 1:
import asyncio

from agents import Agent, Runner
from openai import AsyncOpenAI
from openai.types.responses import ResponseInputAudioParam
from openai.types.responses.response_input_audio_param import InputAudio

# Client is created but not passed to the agent; it is not needed to trigger the error.
custom_client = AsyncOpenAI()
agent: Agent = Agent(name="Assistant", instructions="You are a helpful assistant")

# A single user message whose only content part is audio.
input_items = [
    dict(
        content=[
            ResponseInputAudioParam(
                type="input_audio",
                input_audio=InputAudio(
                    data="AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==",
                    format="wav",
                ),
            )
        ],
        role="user",
        type="message",
    )
]

result = asyncio.run(Runner.run(agent, input=input_items))
print("\nCALLING AGENT\n")
print(result.final_output)
Example 2:
from agents.models.openai_chatcompletions import Converter
from openai.types.responses import ResponseInputAudioParam
from openai.types.responses.response_input_audio_param import InputAudio

content = [
    ResponseInputAudioParam(
        type="input_audio",
        input_audio=InputAudio(
            data="AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==",
            format="wav",
        ),
    )
]
Converter.extract_all_content(content)
This results in the following error:
Traceback (most recent call last):
  File "e:\bairong\zzy-scripts\test.py", line 54, in <module>
    Converter.extract_all_content(content)
  File "E:\bairong\zzy-scripts\venv\Lib\site-packages\agents\models\chatcmpl_converter.py", line 308, in extract_all_content
    raise UserError(f"Unknown content: {c}")
agents.exceptions.UserError: Unknown content: {'type': 'input_audio', 'input_audio': {'data': 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==', 'format': 'wav'}}
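The failure mode can be modelled without the SDK installed. The sketch below is a toy stand-in, not the SDK's actual code: extract_parts and UnknownContentError are hypothetical names, and the only assumption taken from the traceback is that content parts are dispatched by their "type" key and that anything unrecognised raises an "Unknown content" error.

```python
from typing import Any


class UnknownContentError(Exception):
    """Hypothetical stand-in for agents.exceptions.UserError."""


def extract_parts(content: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Toy dispatcher that only knows about text parts."""
    out: list[dict[str, Any]] = []
    for c in content:
        if c.get("type") == "input_text":
            out.append({"type": "text", "text": c["text"]})
        else:
            # There is no branch for "input_audio", so audio parts end up here.
            raise UnknownContentError(f"Unknown content: {c}")
    return out
```

Passing a part with type "input_audio" to this toy dispatcher raises the same kind of error as in the traceback, because no branch matches it.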
Expected behavior
extract_all_content should support "input_audio" as a valid content type and convert it to a ChatCompletionContentPartInputAudioParam (or equivalent).
Proposed fix
Add handling for "input_audio" inside extract_all_content:
# Requires: from typing import cast
#           from openai.types.chat import ChatCompletionContentPartInputAudioParam
#           from openai.types.responses import ResponseInputAudioParam
elif isinstance(c, dict) and c.get("type") == "input_audio":
    casted_audio_param = cast(ResponseInputAudioParam, c)
    if "input_audio" not in casted_audio_param or not casted_audio_param["input_audio"]:
        raise UserError(
            f"Only audio data is supported for input_audio {casted_audio_param}"
        )
    out.append(
        ChatCompletionContentPartInputAudioParam(
            type="input_audio",
            input_audio={
                # input_audio is a plain dict at runtime, so use key access.
                "data": casted_audio_param["input_audio"]["data"],
                "format": casted_audio_param["input_audio"]["format"],
            },
        )
    )
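As a rough, dependency-free illustration of the conversion itself, the sketch below maps an "input_audio" content part to the shape a Chat Completions audio part takes. It is not the SDK's implementation: convert_input_audio is a hypothetical helper, plain dicts stand in for the typed params, and ValueError stands in for UserError.

```python
from typing import Any, Mapping


def convert_input_audio(part: Mapping[str, Any]) -> dict[str, Any]:
    """Convert an input_audio content part into a chat-style audio part."""
    if part.get("type") != "input_audio":
        raise ValueError(f"Not an input_audio part: {part}")

    audio = part.get("input_audio")
    # Reject parts whose payload is missing or incomplete.
    if not audio or "data" not in audio or "format" not in audio:
        raise ValueError(f"Only audio data is supported for input_audio {part}")

    return {
        "type": "input_audio",
        "input_audio": {"data": audio["data"], "format": audio["format"]},
    }
```

Because the typed params are dicts at runtime, key access is used throughout. The payload check also covers the case where "input_audio" is present but empty.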
Additional context
The inconsistency causes failures when using models that require audio input (e.g., gpt-4o-audio-preview).