openai_chatcompletions.Converter.extract_all_content does not support input_audio type items #1916

@yu-supplication

Please read this first

  • Have you read the docs? [Agents SDK docs] - YES
  • Have you searched for related issues? Others may have faced similar issues. - YES

Describe the bug

The extract_all_content method currently raises UserError("Unknown content: ...") when given a content item of type input_audio.
However, the SDK's parameter schema defines "input_audio" as a valid content type within ResponseInputContentParam, so the current implementation is incomplete and inconsistent with the declared interface.
As a result, any agent or converter that passes audio input messages through this method fails at runtime.

Debug information

  • Agents SDK version: 0.3.3
  • Python version: 3.11

Repro steps

The issue can be reproduced with the following code examples:
Example 1:

from openai.types.responses.response_input_audio_param import InputAudio
from openai.types.responses import ResponseInputAudioParam
from agents import Agent, Runner
from openai import AsyncOpenAI
import asyncio

custom_client = AsyncOpenAI()
agent: Agent = Agent(name="Assistant", instructions="You are a helpful assistant")

input_items = [
    dict(
        content=[
            ResponseInputAudioParam(
                type="input_audio",
                input_audio=InputAudio(
                    data="AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==",
                    format="wav"
                )
            )],
        role="user",
        type="message"
    )
]
result = asyncio.run(Runner.run(agent, input=input_items))

print("\nCALLING AGENT\n")
print(result.final_output)

Example 2:

from agents.models.openai_chatcompletions import Converter
from openai.types.responses.response_input_audio_param import InputAudio
from openai.types.responses import ResponseInputAudioParam

content = [ResponseInputAudioParam(
                type="input_audio",
                input_audio=InputAudio(
                    data="AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==",
                    format="wav"
                )
            )]
Converter.extract_all_content(content)

This results in the following error:

Traceback (most recent call last):
  File "e:\bairong\zzy-scripts\test.py", line 54, in <module>
    Converter.extract_all_content(content)
  File "E:\bairong\zzy-scripts\venv\Lib\site-packages\agents\models\chatcmpl_converter.py", line 308, in extract_all_content
    raise UserError(f"Unknown content: {c}")
agents.exceptions.UserError: Unknown content: {'type': 'input_audio', 'input_audio': {'data': 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==', 'format': 'wav'}}   

Expected behavior

extract_all_content should accept "input_audio" as a valid content type and convert it to a ChatCompletionContentPartInputAudioParam (or equivalent).

Proposed fix

Add handling for "input_audio" inside extract_all_content:

elif isinstance(c, dict) and c.get("type") == "input_audio":
    casted_audio_param = cast(ResponseInputAudioParam, c)
    # Reject items that carry no audio payload.
    if "input_audio" not in casted_audio_param or not casted_audio_param["input_audio"]:
        raise UserError(
            f"Only audio data is supported for input_audio {casted_audio_param}"
        )
    # input_audio is a TypedDict (a plain dict at runtime, as the traceback
    # above shows), so read its fields by key rather than by attribute.
    out.append(
        ChatCompletionContentPartInputAudioParam(
            type="input_audio",
            input_audio={
                "data": casted_audio_param["input_audio"]["data"],
                "format": casted_audio_param["input_audio"]["format"],
            },
        )
    )
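
For anyone who wants to check the intended mapping without installing the SDK, here is a minimal standalone sketch of the same conversion using plain dicts. The function name convert_input_audio and the UnknownContentError class are hypothetical stand-ins (the latter for agents.exceptions.UserError); this is not the SDK's actual code, only an illustration of the shape the branch above is meant to produce.

```python
from typing import Any


class UnknownContentError(ValueError):
    """Hypothetical stand-in for agents.exceptions.UserError."""


def convert_input_audio(item: dict[str, Any]) -> dict[str, Any]:
    """Map a Responses-style input_audio item to a Chat Completions-style part.

    Hypothetical helper for illustration; plain dicts stand in for the
    ResponseInputAudioParam / ChatCompletionContentPartInputAudioParam TypedDicts.
    """
    if item.get("type") != "input_audio":
        raise UnknownContentError(f"Unknown content: {item}")

    audio = item.get("input_audio")
    # Require both fields that the Chat Completions content part carries.
    if not audio or "data" not in audio or "format" not in audio:
        raise UnknownContentError(
            f"Only audio data is supported for input_audio {item}"
        )

    return {
        "type": "input_audio",
        "input_audio": {"data": audio["data"], "format": audio["format"]},
    }


part = convert_input_audio(
    {"type": "input_audio", "input_audio": {"data": "AAAA", "format": "wav"}}
)
print(part)
```

The sketch uses key access throughout because the item arrives as a dict at runtime; an item with a missing or empty input_audio payload raises instead of being silently dropped.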

Additional context

The inconsistency causes failures when using models that require audio input (e.g., gpt-4o-audio-preview).
