Support half-duplex mode for OpenAI Realtime API #814
🦋 Changeset detected. Latest commit: 0cb3f42. The changes in this PR will be included in the next version bump. This PR includes changesets to release 14 packages. |
|
@Shubhrakanti @theomonnom Any idea when we could get this reviewed and deployed? This is blocking us atm. Thank you! |
|
@toubatbrian I tested these changes and initially it worked just fine (with some latency), but after follow-up questions the agent remains silent. My logs: |
|
@samuelcastro, which STT / TTS are you using? I've tested multiple times on my end as well and it worked fine. Could you also post the full logs? |
|
@toubatbrian Tested with cartesia sonic 3. |
|
@toubatbrian Any idea when we can get this in? |
|
@samuelcastro I've tested with sonic-3, and it also worked fine. Have you tested with this example: https://github.com/livekit/agents-js/blob/06eceabc78c2d8b14071e8eef43c0c0e1fe74c78/examples/src/realtime_with_tts.ts?
Let me check with my team and I'll get back to you shortly! |
@toubatbrian You're not encountering the 5-response limit when using Cartesia without an API key, are you? |
|
@samuelcastro You can also try the LiveKit Inference Gateway: https://docs.livekit.io/agents/models/tts/inference/cartesia/, with something like:
import { AgentSession } from '@livekit/agents';
const session = new AgentSession({
  // Model and voice ID combined as "model:voice"
  tts: "cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
  // ... stt, vad, turn_detection, etc.
});
or (if you want more custom control):
import { AgentSession, inference } from '@livekit/agents';
const session = new AgentSession({
  tts: new inference.TTS({
    model: "cartesia/sonic-3",
    voice: "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    language: "en",
    modelOptions: {
      speed: 1.5,
      volume: 1.2,
      emotion: "excited"
    }
  }),
  // ... stt, vad, turn_detection, etc.
});
This would be much easier to test and set up. |
|
ok great @toubatbrian I will test it again. |
Allow the OpenAI Realtime model to have its text output piped to a custom TTS model
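To illustrate the idea in the description, here is a minimal, library-independent sketch of piping streamed text output into a separate TTS step. It is not the implementation in this PR: `SentenceBuffer`, `TextToSpeech`, and `speakTextStream` are hypothetical names invented for the example, and splitting on sentence punctuation is only one possible way to chunk text before synthesis.

```typescript
// Hypothetical sketch: none of these names come from @livekit/agents.

/** Accumulates streamed text deltas and emits complete sentences. */
class SentenceBuffer {
  private pending = "";

  /** Add a delta and return any sentences that are now complete. */
  push(delta: string): string[] {
    this.pending += delta;
    const sentences: string[] = [];
    // A sentence ends at '.', '!' or '?' followed by whitespace.
    const boundary = /[.!?]\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + 1; // keep the punctuation mark
      sentences.push(this.pending.slice(start, end).trim());
      start = match.index + match[0].length;
    }
    this.pending = this.pending.slice(start);
    return sentences;
  }

  /** Return whatever text is left once the model finishes its turn. */
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [rest] : [];
  }
}

/** Minimal TTS interface assumed for this sketch. */
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

/** Feed text deltas to the TTS sentence by sentence and collect the audio. */
async function speakTextStream(
  deltas: AsyncIterable<string>,
  tts: TextToSpeech,
): Promise<Uint8Array[]> {
  const buffer = new SentenceBuffer();
  const audio: Uint8Array[] = [];
  for await (const delta of deltas) {
    for (const sentence of buffer.push(delta)) {
      audio.push(await tts.synthesize(sentence));
    }
  }
  for (const sentence of buffer.flush()) {
    audio.push(await tts.synthesize(sentence));
  }
  return audio;
}
```

Buffering to sentence boundaries trades a little latency for more natural prosody, since the TTS sees whole sentences instead of fragments. A real pipeline would also need to handle interruptions and cancel pending synthesis, which this sketch leaves out.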