Enable Gemini Realtime Model to Produce Error Log #1016

toubatbrian wants to merge 28 commits into main from
Conversation
Co-authored-by: Brian Yin <57741529+Toubat@users.noreply.github.com>
Remove baseUrl and useProxy from interruptionOptionDefaults so they are resolved dynamically in the constructor. Previously, the defaults pre-populated baseUrl with the cloud inference URL, which prevented the LIVEKIT_REMOTE_EOT_URL environment variable from being used.
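The resolution order described above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the option names, the `CLOUD_INFERENCE_URL` value, and the `useProxy` heuristic are assumptions; only the `LIVEKIT_REMOTE_EOT_URL` variable name comes from the commit message.

```typescript
// Hypothetical sketch: resolve baseUrl at construction time instead of baking
// it into the static defaults, so LIVEKIT_REMOTE_EOT_URL can override it.
const CLOUD_INFERENCE_URL = 'https://agent-gateway.livekit.cloud'; // assumed default

interface InterruptionOptions {
  baseUrl?: string;
  useProxy?: boolean;
}

function resolveInterruptionOptions(
  opts: InterruptionOptions = {},
): Required<InterruptionOptions> {
  // Precedence: explicit caller option > environment variable > cloud default.
  const baseUrl =
    opts.baseUrl ?? process.env.LIVEKIT_REMOTE_EOT_URL ?? CLOUD_INFERENCE_URL;
  // Assumed heuristic: only proxy requests that target the cloud endpoint.
  const useProxy = opts.useProxy ?? baseUrl === CLOUD_INFERENCE_URL;
  return { baseUrl, useProxy };
}
```

With the defaults pre-populated, the `??` chain never reached the environment variable; resolving inside the constructor restores the override.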
Caution: Review failed. The pull request is closed.

📝 Walkthrough

This pull request introduces an adaptive interruption detection system for voice agents with HTTP and WebSocket transport options, implements model usage metrics collection, restructures turn-handling configuration, and adds enhanced telemetry attributes for tracing and monitoring.

Changes
Sequence Diagram(s)

sequenceDiagram
participant AudioInput as Audio Input
participant AudioRecognition as Audio Recognition
participant InterruptionDetector as Interruption Detector
participant Transport as HTTP/WS Transport
participant InferenceAPI as Inference API
participant AgentActivity as Agent Activity
AudioInput->>AudioRecognition: Push audio frame
AudioRecognition->>InterruptionDetector: Forward to interruption stream
InterruptionDetector->>InterruptionDetector: Accumulate audio chunks<br/>Detect overlap speech
alt Overlap Speech Detected
InterruptionDetector->>Transport: Send audio data with metadata
Transport->>InferenceAPI: POST /bargein with audio + token
InferenceAPI-->>Transport: Prediction response
Transport->>InterruptionDetector: Cache prediction result
alt Interruption Confidence High
InterruptionDetector->>AgentActivity: Emit InterruptionEvent
AgentActivity->>AgentActivity: Handle interruption<br/>Update span
end
end
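The "Interruption Confidence High" gate in the diagram above can be sketched as a small predicate. This is an illustration under assumptions: the field names, the cache keyed by `createdAt`, and the 0.8 threshold are hypothetical, not taken from the PR's code.

```typescript
// Assumed shape of a /bargein prediction returned by the inference API.
interface BargeinPrediction {
  createdAt: number;      // timestamp of the audio chunk this prediction covers
  isInterruption: boolean;
  confidence: number;     // 0..1 score from the model
}

interface DetectorState {
  overlapSpeechStarted: boolean;
  cache: Map<number, BargeinPrediction>;
}

// Cache every prediction result, but only emit an InterruptionEvent while
// overlap speech is still active and the model is confident enough.
function shouldEmitInterruption(
  state: DetectorState,
  entry: BargeinPrediction,
  threshold = 0.8, // hypothetical confidence threshold
): boolean {
  state.cache.set(entry.createdAt, entry);
  return (
    state.overlapSpeechStarted &&
    entry.isInterruption &&
    entry.confidence >= threshold
  );
}
```

Caching unconditionally while gating emission is what lets late or low-confidence predictions still inform later decisions without triggering an interruption themselves.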
sequenceDiagram
participant Agent as Agent
participant Generator as Generation
participant LLM as LLM Model
participant TTS as TTS Model
participant Tracer as Tracer
Agent->>Generator: performLLMInference(model, provider)
Generator->>Tracer: Create span with ATTR_GEN_AI_REQUEST_MODEL
Generator->>LLM: Start inference
LLM-->>Generator: First token arrives
Generator->>Tracer: Record ATTR_RESPONSE_TTFT
LLM-->>Generator: Complete inference
Generator->>Generator: performTTSInference(model, provider)
Generator->>Tracer: Create span with model/provider
Generator->>TTS: Start synthesis
TTS-->>Generator: First bytes written
Generator->>Tracer: Record ATTR_RESPONSE_TTFB
TTS-->>Generator: Complete synthesis
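The TTFT step in the second diagram can be sketched as follows: record elapsed time when the first streamed token arrives and attach it as a span attribute. The span class and the `llm.ttft_ms` attribute name here are stand-ins; the PR's actual code uses OpenTelemetry spans and an `ATTR_RESPONSE_TTFT` constant whose exact value is not shown.

```typescript
// Minimal span stand-in for illustration; not the real tracer API.
class SimpleSpan {
  attributes: Record<string, number | string> = {};
  setAttribute(key: string, value: number | string): void {
    this.attributes[key] = value;
  }
}

// Drain a token stream, recording time-to-first-token on the span when the
// first token arrives, as in the LLM half of the diagram. TTFB for TTS
// would follow the same pattern with the first audio bytes written.
async function collectWithTTFT(
  span: SimpleSpan,
  stream: AsyncIterable<string>,
): Promise<string> {
  const start = Date.now();
  let sawFirst = false;
  let text = '';
  for await (const token of stream) {
    if (!sawFirst) {
      span.setAttribute('llm.ttft_ms', Date.now() - start); // assumed attribute name
      sawFirst = true;
    }
    text += token;
  }
  return text;
}
```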
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
Suggested reviewers
📜 Recent review details

Configuration used: Organization UI
Review profile: CHILL
Plan: Pro

⛔ Files ignored due to path filters (1)
📒 Files selected for processing (40)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3ce96e1855
    state.cache.set(createdAt, entry);

    if (state.overlapSpeechStarted && entry.isInterruption) {
      if (updateUserSpeakingSpan) {
        updateUserSpeakingSpan(entry);
Re-check overlap state after HTTP await
Because state is captured before the await predictHTTP(...), an overlap that ends while the request is in flight will still have state.overlapSpeechStarted === true here, which can emit an interruption event after overlap speech has already ended. This shows up when overlap ends quickly or the HTTP call is slow, producing false-positive interruptions. Consider re-reading getState() after the await (or checking a monotonic overlap token) before emitting.
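The re-check the review suggests could look like the sketch below. All names here are illustrative: the overlap-token field, the `getState` accessor, and the prediction shape are assumptions layered on the snippet above, not the PR's actual signatures.

```typescript
// Hypothetical detector state: overlapToken increments each time a new
// overlap window starts, so a stale window can be detected after an await.
interface OverlapState {
  overlapSpeechStarted: boolean;
  overlapToken: number;
}

interface Prediction {
  isInterruption: boolean;
}

async function classifyInterruption(
  audio: Float32Array,
  predictHTTP: (audio: Float32Array) => Promise<Prediction>,
  getState: () => OverlapState,
): Promise<boolean> {
  // Capture the overlap token before the request goes out.
  const tokenAtSend = getState().overlapToken;
  const entry = await predictHTTP(audio);
  // Re-read state: the overlap may have ended while the request was in flight.
  const state = getState();
  return (
    state.overlapSpeechStarted &&
    state.overlapToken === tokenAtSend && // still the same overlap window
    entry.isInterruption
  );
}
```

Comparing the token rather than only the boolean also guards against the edge case where one overlap ends and a new one begins during the same in-flight request.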
Summary by CodeRabbit
Release Notes
New Features
Tests
Chores