Support half-duplex mode for Openai Realtime API #814

Merged
toubatbrian merged 14 commits into main from brian/realtime-with-tts
Nov 13, 2025

@toubatbrian
Contributor

@toubatbrian toubatbrian commented Nov 7, 2025

Allow the OpenAI realtime model to produce text output that is piped to a custom TTS model.
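
To illustrate the idea of piping text output into a separate TTS, here is a minimal, self-contained sketch. It is not the plugin's implementation: `SentenceBuffer`, `speakResponse`, and the `Synthesize` type are hypothetical names invented for this example, and the fake `synthesize` callback only wraps text in a string instead of producing audio. It assumes the model streams its reply as text deltas and that sentences are forwarded as soon as they are complete.

```typescript
// Hypothetical stand-in for a TTS engine: turns one text segment into "audio".
type Synthesize = (text: string) => string;

// Accumulates streamed text deltas and releases complete sentences.
class SentenceBuffer {
  private buffer = '';

  // Add one text delta; return any sentences that are now complete.
  push(delta: string): string[] {
    this.buffer += delta;
    const complete: string[] = [];
    let match = this.buffer.match(/^([\s\S]*?[.!?])\s+/);
    while (match !== null) {
      complete.push(match[1].trim());
      this.buffer = this.buffer.slice(match[0].length);
      match = this.buffer.match(/^([\s\S]*?[.!?])\s+/);
    }
    return complete;
  }

  // Return whatever text is left once the response has ended.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = '';
    return rest.length > 0 ? [rest] : [];
  }
}

// Feed every completed sentence to the TTS callback, in order.
function speakResponse(deltas: string[], synthesize: Synthesize): string[] {
  const segmenter = new SentenceBuffer();
  const audio: string[] = [];
  for (const delta of deltas) {
    for (const sentence of segmenter.push(delta)) audio.push(synthesize(sentence));
  }
  for (const sentence of segmenter.flush()) audio.push(synthesize(sentence));
  return audio;
}

// Made-up text deltas standing in for the realtime model's text output.
const fakeDeltas = ['Hello', ' there. How', ' can I help', ' you today?'];
const spoken = speakResponse(fakeDeltas, (text) => `<audio:${text}>`);
console.log(spoken);
```

Forwarding sentence by sentence lets the TTS start before the full reply has arrived, which is one way to limit the added latency of a two-stage pipeline.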

@changeset-bot

changeset-bot Bot commented Nov 7, 2025

🦋 Changeset detected

Latest commit: 0cb3f42

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 14 packages
Name Type
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-openai Patch
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugins-test Patch

@toubatbrian toubatbrian changed the title brianyin/ajs-322-openai-half-duplex-mode [draft] brianyin/ajs-322-openai-half-duplex-mode Nov 7, 2025
@toubatbrian toubatbrian changed the title [draft] brianyin/ajs-322-openai-half-duplex-mode brianyin/ajs-322-openai-half-duplex-mode Nov 7, 2025
@samuelcastro

samuelcastro commented Nov 7, 2025

@Shubhrakanti @theomonnom Any idea when we could get this reviewed and deployed? This is blocking us atm. Thank you!

@samuelcastro

@toubatbrian I tested these changes and initially it worked just fine (with some latency), but after follow-up questions the agent remains silent. My logs:

{"level":40,"time":1762644702889,"pid":63178,"hostname":"Sams-MacBook-Pro.local","msg":"SegmentSynchronizerImpl text marked as ended in capture text, rotating segment"}
{"level":50,"time":1762644703046,"pid":63178,"hostname":"Sams-MacBook-Pro.local","msg":"Error in SynthesizeStream"}
{"level":30,"time":1762644703049,"pid":63178,"hostname":"Sams-MacBook-Pro.local","ttftMs":302,"input_tokens":854,"cached_input_tokens":832,"output_tokens":17,"total_tokens":871,"tokens_per_second":36.48,"msg":"RealtimeModel metrics"}
{"level":40,"time":1762644715328,"pid":63164,"hostname":"Sams-MacBook-Pro.local","msg":"job is unresponsive"}
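
For readers following along, a short sketch of how these lines can be put on a timeline. It assumes the logs use pino-style numeric levels (30 = info, 40 = warn, 50 = error), which the format suggests but the thread does not state. The log lines are copied from the comment above and trimmed to the fields used here; `LogEntry` and `formatEntry` are names made up for this example.

```typescript
// Assumption: pino-style numeric levels (30 = info, 40 = warn, 50 = error).
const LEVEL_NAMES: Record<number, string> = { 30: 'info', 40: 'warn', 50: 'error' };

interface LogEntry {
  level: number;
  time: number; // Unix epoch in milliseconds
  pid: number;
  msg: string;
}

// The four lines from the comment above, trimmed to the fields used here.
const rawLines: string[] = [
  '{"level":40,"time":1762644702889,"pid":63178,"msg":"SegmentSynchronizerImpl text marked as ended in capture text, rotating segment"}',
  '{"level":50,"time":1762644703046,"pid":63178,"msg":"Error in SynthesizeStream"}',
  '{"level":30,"time":1762644703049,"pid":63178,"msg":"RealtimeModel metrics"}',
  '{"level":40,"time":1762644715328,"pid":63164,"msg":"job is unresponsive"}',
];

const entries: LogEntry[] = rawLines.map((line) => JSON.parse(line) as LogEntry);

// Render an entry with a readable level and its offset from the first line.
function formatEntry(entry: LogEntry, startTime: number): string {
  const level = LEVEL_NAMES[entry.level] ?? `level-${entry.level}`;
  return `+${entry.time - startTime}ms ${level} pid=${entry.pid} ${entry.msg}`;
}

const timeline = entries.map((entry) => formatEntry(entry, entries[0].time));

// Time between the first error-level line and the "unresponsive" warning.
const firstError = entries.filter((entry) => entry.level >= 50)[0];
const unresponsive = entries.filter((entry) => entry.msg === 'job is unresponsive')[0];
const gapMs = firstError && unresponsive ? unresponsive.time - firstError.time : undefined;

console.log(timeline.join('\n'));
console.log(`error -> unresponsive gap: ${gapMs}ms`);
```

Read this way, the `Error in SynthesizeStream` line is the only error-level entry, the `job is unresponsive` warning follows it by 12282 ms, and that warning is logged under a different pid than the first three lines.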

@toubatbrian
Contributor Author

@samuelcastro, which STT / TTS are you using? I've tested multiple times on my end as well and it worked fine. Could you also post the full logs?

@samuelcastro

@toubatbrian Tested with cartesia sonic 3.

@samuelcastro

@toubatbrian Any idea when we can get this in?

@toubatbrian
Contributor Author

@samuelcastro I've tested with sonic-3, and it also worked fine. Have you tested with this example: https://github.com/livekit/agents-js/blob/06eceabc78c2d8b14071e8eef43c0c0e1fe74c78/examples/src/realtime_with_tts.ts?

> Any idea when we can get this in?

Let me check with my team and I'll get back to you shortly!

@slpn1

slpn1 commented Nov 12, 2025

> @toubatbrian I tested these changes and initially it worked just fine (with some latency), but after follow-up questions the agent remains silent. My logs:

@toubatbrian You're not encountering the 5-response limit when using cartesia without an API key, are you?

@toubatbrian
Contributor Author

@samuelcastro You can also try the LiveKit Inference Gateway: https://docs.livekit.io/agents/models/tts/inference/cartesia/, with something like:

import { AgentSession } from '@livekit/agents';

const session = new AgentSession({
    // Descriptor string in the form "<model>:<voice id>"
    tts: "cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    // ... llm, stt, vad, turn detection, etc.
});

or (if you want more custom control):

import { AgentSession, inference } from '@livekit/agents';

const session = new AgentSession({
    tts: new inference.TTS({
        model: "cartesia/sonic-3",
        voice: "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        language: "en",
        // Provider-specific tuning passed through to the model
        modelOptions: {
            speed: 1.5,
            volume: 1.2,
            emotion: "excited"
        }
    }),
    // ... llm, stt, vad, turn detection, etc.
});

This would be much easier to test and set up.
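
The two snippets above use the same model and voice, once as a single descriptor string and once as separate fields. As a rough illustration of how the two forms relate, the sketch below splits a descriptor at the first colon. This is an assumption drawn from comparing the snippets, not the gateway's documented parsing rules, and `parseTtsDescriptor` and `TtsDescriptor` are hypothetical names.

```typescript
// Hypothetical shape: the separate fields used in the second snippet.
interface TtsDescriptor {
  model: string;
  voice?: string;
}

// Assumption: everything before the first ":" is the model, the rest is the voice id.
function parseTtsDescriptor(descriptor: string): TtsDescriptor {
  const separator = descriptor.indexOf(':');
  if (separator === -1) return { model: descriptor };
  return {
    model: descriptor.slice(0, separator),
    voice: descriptor.slice(separator + 1),
  };
}

const parsed = parseTtsDescriptor('cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc');
console.log(parsed.model, parsed.voice);
```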

@samuelcastro

ok great @toubatbrian I will test it again.

@toubatbrian toubatbrian changed the title brianyin/ajs-322-openai-half-duplex-mode Support half-duplex mode for Openai Realtime API Nov 12, 2025
@toubatbrian toubatbrian merged commit 9a58cd3 into main Nov 13, 2025
8 checks passed
@toubatbrian toubatbrian deleted the brian/realtime-with-tts branch November 13, 2025 09:27
@github-actions github-actions Bot mentioned this pull request Nov 13, 2025
@simllll simllll mentioned this pull request Nov 17, 2025
8 tasks

4 participants