Ability to Play/Inject Audio During SIP Tool Calls #710

@LordOfMim

When executing tool calls via the OpenAI SIP interface, there currently doesn’t seem to be a way to play or inject audio to indicate that an action is happening. This kind of feedback would be very helpful in user-facing voice applications, where silence can be mistaken for inactivity.

With the Web SDK, it’s possible to detect pauses and insert audio cues manually, but I haven’t found an equivalent approach for SIP. It would be great if SIP provided a built-in method or event hook to trigger audio output during tool execution.

Proposed Enhancement:

  • Add a mechanism within SIP to output audio (e.g., tones, brief clips, TTS) while a tool call is being processed.
  • Alternatively, expose events that allow developers to inject audio at specific points in the SIP pipeline.
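
To make the idea concrete, here is a rough sketch of the behavior I have in mind. Everything in it is hypothetical: `run_tool_with_audio_cue`, `start_cue`, and `stop_cue` are illustrative names, not existing SDK or SIP APIs, and the callbacks only stand in for whatever would actually put audio on the call. The point is the timing logic: start a cue only if the tool call runs longer than a short delay, and stop it as soon as the result is ready.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_tool_with_audio_cue(
    tool_call: Callable[[], Awaitable[T]],
    start_cue: Callable[[], None],   # hypothetical: begins a tone/clip on the call
    stop_cue: Callable[[], None],    # hypothetical: stops that tone/clip
    delay_s: float = 0.5,
) -> T:
    """Run tool_call; if it is still running after delay_s, play a cue until it ends."""
    task = asyncio.ensure_future(tool_call())
    cue_started = False
    try:
        # Give fast tools a chance to finish silently before any audio starts.
        done, _pending = await asyncio.wait({task}, timeout=delay_s)
        if not done:
            start_cue()
            cue_started = True
        return await task
    finally:
        # Stop the cue whether the tool succeeded or raised.
        if cue_started:
            stop_cue()


async def demo(tool_duration_s: float, delay_s: float) -> list[str]:
    """Record the order of events for a fake tool call of the given duration."""
    events: list[str] = []

    async def fake_tool() -> str:
        await asyncio.sleep(tool_duration_s)
        return "tool result"

    result = await run_tool_with_audio_cue(
        fake_tool,
        start_cue=lambda: events.append("cue started"),
        stop_cue=lambda: events.append("cue stopped"),
        delay_s=delay_s,
    )
    events.append(result)
    return events


if __name__ == "__main__":
    print(asyncio.run(demo(tool_duration_s=0.05, delay_s=0.01)))
```

In a real integration the two callbacks would need some SIP-side way to emit audio, which is exactly the piece that seems to be missing today.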

This feature would significantly improve UX for voice-driven applications relying on SIP.

Labels: enhancement (New feature or request)
