
Enable Gemini Realtime Model to Produce Error Log#1016

Closed
toubatbrian wants to merge 28 commits into main from
brian/fix-gemini-error-log

Conversation

@toubatbrian
Contributor

@toubatbrian toubatbrian commented Feb 3, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added adaptive interruption detection system with configurable thresholds and model support
    • Introduced model usage metrics tracking for LLM, TTS, and STT services
    • Enhanced telemetry with latency measurements and interruption attributes
    • New turn-handling configuration system for flexible interruption modes
  • Tests

    • Added comprehensive test coverage for interruption utilities and model usage aggregation
  • Chores

    • Updated dependencies and refined configuration structures
    • Improved error handling in WebSocket closures

@changeset-bot

changeset-bot bot commented Feb 3, 2026

⚠️ No Changeset found

Latest commit: 3ce96e1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types.


@toubatbrian toubatbrian closed this Feb 3, 2026
@coderabbitai

coderabbitai bot commented Feb 3, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

This pull request introduces an adaptive interruption detection system for voice agents with HTTP and WebSocket transport options, implements model usage metrics collection, restructures turn-handling configuration, and adds enhanced telemetry attributes for tracing and monitoring.

Changes

Cohort / File(s) Summary
Adaptive Interruption Detection Core
agents/src/inference/interruption/types.ts, agents/src/inference/interruption/errors.ts, agents/src/inference/interruption/defaults.ts, agents/src/inference/interruption/interruption_cache_entry.ts, agents/src/inference/interruption/utils.ts, agents/src/inference/interruption/utils.test.ts
Introduces complete type system for interruption detection (InterruptionEvent, InterruptionOptions, sentinel types), custom error class with metadata, default configuration constants, cache entry model, and utility functions (BoundedCache, estimateProbability, slidingWindowMinMax) with comprehensive test coverage.
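As an illustration of the utility shapes named above, a bounded insertion-order cache might look like the following sketch. This is an assumption about the intent of BoundedCache, not the actual utils.ts implementation:

```typescript
// Hypothetical sketch of a BoundedCache: a Map that evicts its oldest
// entry once a maximum size is exceeded. The real implementation may differ.
class BoundedCache<K, V> {
  private map = new Map<K, V>();

  constructor(private maxSize: number) {}

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key); // refresh insertion order
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      // A Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }

  get(key: K): V | undefined {
    return this.map.get(key);
  }

  get size(): number {
    return this.map.size;
  }
}
```

A bounded cache like this keeps memory constant when predictions arrive faster than they are consumed.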
Interruption Detection Infrastructure
agents/src/inference/interruption/interruption_detector.ts, agents/src/inference/interruption/interruption_stream.ts
Implements AdaptiveInterruptionDetector for managing multiple interruption streams and InterruptionStreamBase for two-stage audio processing pipeline with transport selection (HTTP/WS), audio resampling, and overlap speech detection.
HTTP & WebSocket Transports
agents/src/inference/interruption/http_transport.ts, agents/src/inference/interruption/ws_transport.ts
Adds HTTP-based interruption inference with retry/backoff and WebSocket transport with token-based auth, session management, and bidirectional message handling. Both implement TransformStream pattern for audio chunk processing.
Model Usage Metrics
agents/src/metrics/base.ts, agents/src/metrics/model_usage.ts, agents/src/metrics/model_usage.test.ts, agents/src/metrics/usage_collector.ts, agents/src/metrics/index.ts
Extends metric types with MetricsMetadata, introduces ModelUsageCollector for aggregating per-provider/model usage (LLM, TTS, STT, Realtime), adds filterZeroValues utility, and deprecates legacy UsageCollector with warning.
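A plausible sketch of the filterZeroValues helper mentioned above, assuming it drops zero-valued numeric fields so usage summaries only report non-zero counters (the actual signature in the PR may differ):

```typescript
// Hypothetical sketch: drop keys whose value is exactly 0 so that aggregated
// usage reports only contain meaningful counters. May differ from the PR.
function filterZeroValues<T extends Record<string, number>>(
  usage: T,
): Partial<T> {
  return Object.fromEntries(
    Object.entries(usage).filter(([, value]) => value !== 0),
  ) as Partial<T>;
}
```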
Turn Configuration Restructuring
agents/src/voice/turn_config/endpointing.ts, agents/src/voice/turn_config/interruption.ts, agents/src/voice/turn_config/turn_handling.ts, agents/src/voice/turn_config/utils.ts, agents/src/voice/turn_config/utils.test.ts
Introduces modular configuration system with EndpointingConfig, InterruptionConfig, TurnHandlingConfig, and migration utilities (migrateLegacyOptions) to bridge legacy voiceOptions with new nested turnHandling structure.
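The bridging idea behind migrateLegacyOptions can be sketched as follows. The field names here are invented for illustration; the real turn_config/utils.ts shapes may differ:

```typescript
// Hypothetical sketch of migrating flat legacy voiceOptions into a nested
// turnHandling structure. Field names are assumptions, not the real API.
interface LegacyVoiceOptions {
  minInterruptionDuration?: number;
  minEndpointingDelay?: number;
}

interface TurnHandlingConfig {
  interruption: { minDuration: number };
  endpointing: { minDelay: number };
}

function migrateLegacyOptions(legacy: LegacyVoiceOptions): TurnHandlingConfig {
  return {
    // Carry legacy values forward, falling back to (assumed) defaults.
    interruption: { minDuration: legacy.minInterruptionDuration ?? 0.5 },
    endpointing: { minDelay: legacy.minEndpointingDelay ?? 0.5 },
  };
}
```

A migration shim like this lets existing callers keep passing flat options while new code reads only the nested config.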
Agent Voice Integration
agents/src/voice/agent.ts, agents/src/voice/agent_session.ts, agents/src/voice/agent_activity.ts, agents/src/voice/audio_recognition.ts
Integrates interruption detector, turn-handling config, and usage collection into voice agents; adds interruptionDetection getters, SessionOptions/InternalSessionOptions with defaults, ModelUsageCollector instance, interruption event handling, and audio recognition hooks for overlap detection.
Telemetry & Report Enhancements
agents/src/telemetry/trace_types.ts, agents/src/telemetry/traces.ts, agents/src/voice/report.ts, agents/src/voice/generation.ts
Adds latency (TTFT/TTFB/E2E), interruption (is_interruption, probability, durations, detection_delay), and provider-name trace attributes; integrates usage metrics into session logging; adds TTFT computation for LLM/TTS generation with model/provider parameters.
Model & Provider Metadata
agents/src/llm/llm.ts, agents/src/llm/realtime.ts, agents/src/stt/stt.ts, agents/src/tts/tts.ts
Adds provider and model getters (defaulting to "unknown") to LLM, STT, and TTS base classes; extends TTS with token usage tracking via setTokenUsage() and updates metrics emission to include metadata and token counts.
Stream Channel Enhancement
agents/src/stream/stream_channel.ts
Adds generic error type parameter to StreamChannel, introduces abort(error) method for controlled error-driven termination, and addStreamInput(stream) for piping external ReadableStream data.
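The abort/addStreamInput semantics described above can be sketched with a minimal channel. This is an assumption-laden illustration, not the actual StreamChannel API (in Node 18+, a ReadableStream satisfies AsyncIterable, which is why the sketch accepts that type):

```typescript
// Hypothetical sketch of an error-aware channel: writers push values,
// abort(error) fails pending and future reads, and addStreamInput pipes
// an external async-iterable (e.g. a Node ReadableStream) into the channel.
class SimpleChannel<T, E extends Error = Error> {
  private queue: T[] = [];
  private error?: E;
  private waiters: Array<{ resolve: (v: T) => void; reject: (e: E) => void }> = [];

  write(value: T): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter.resolve(value);
    else this.queue.push(value);
  }

  abort(error: E): void {
    this.error = error;
    // Fail every read that is currently blocked.
    for (const w of this.waiters.splice(0)) w.reject(error);
  }

  async read(): Promise<T> {
    if (this.queue.length > 0) return this.queue.shift()!;
    if (this.error) throw this.error;
    return new Promise((resolve, reject) => this.waiters.push({ resolve, reject }));
  }

  async addStreamInput(stream: AsyncIterable<T>): Promise<void> {
    for await (const value of stream) this.write(value);
  }
}
```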
Example & Configuration Updates
examples/src/basic_agent.ts, examples/package.json, .changeset/config.json, .github/workflows/test.yml
Updates example agent with interruption config and BackgroundVoiceCancellation, removes ai-coustics dependency, reformats changeset config, and comments out Test examples workflow step.
Gemini Realtime Error Handling
plugins/google/src/beta/realtime/realtime_api.ts
Surfaces non-normal WebSocket closures (non-1000 codes) as error events instead of silently continuing; normal closures remain at debug level.
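The close-handling policy described in this row (the change this PR's title refers to) can be sketched as a classifier over WebSocket close codes, where 1000 is the RFC 6455 normal-closure code. This is an illustrative sketch, not the actual realtime_api.ts wiring:

```typescript
// Hypothetical sketch: code 1000 (normal closure) stays at debug level;
// any other close code is surfaced as an error instead of being swallowed.
type CloseHandlerResult =
  | { level: "debug"; message: string }
  | { level: "error"; message: string };

function classifyWsClose(code: number, reason: string): CloseHandlerResult {
  if (code === 1000) {
    return { level: "debug", message: `WebSocket closed normally: ${reason}` };
  }
  return {
    level: "error",
    message: `WebSocket closed unexpectedly (code ${code}): ${reason}`,
  };
}
```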
Dependency Addition
agents/package.json
Adds "ofetch" ^1.5.1 as runtime dependency for HTTP transport implementation.

Sequence Diagram(s)

sequenceDiagram
    participant AudioInput as Audio Input
    participant AudioRecognition as Audio Recognition
    participant InterruptionDetector as Interruption Detector
    participant Transport as HTTP/WS Transport
    participant InferenceAPI as Inference API
    participant AgentActivity as Agent Activity
    
    AudioInput->>AudioRecognition: Push audio frame
    AudioRecognition->>InterruptionDetector: Forward to interruption stream
    
    InterruptionDetector->>InterruptionDetector: Accumulate audio chunks<br/>Detect overlap speech
    
    alt Overlap Speech Detected
        InterruptionDetector->>Transport: Send audio data with metadata
        Transport->>InferenceAPI: POST /bargein with audio + token
        InferenceAPI-->>Transport: Prediction response
        Transport->>InterruptionDetector: Cache prediction result
        
        alt Interruption Confidence High
            InterruptionDetector->>AgentActivity: Emit InterruptionEvent
            AgentActivity->>AgentActivity: Handle interruption<br/>Update span
        end
    end
sequenceDiagram
    participant Agent as Agent
    participant Generator as Generation
    participant LLM as LLM Model
    participant TTS as TTS Model
    participant Tracer as Tracer
    
    Agent->>Generator: performLLMInference(model, provider)
    Generator->>Tracer: Create span with ATTR_GEN_AI_REQUEST_MODEL
    Generator->>LLM: Start inference
    LLM-->>Generator: First token arrives
    Generator->>Tracer: Record ATTR_RESPONSE_TTFT
    LLM-->>Generator: Complete inference
    
    Generator->>Generator: performTTSInference(model, provider)
    Generator->>Tracer: Create span with model/provider
    Generator->>TTS: Start synthesis
    TTS-->>Generator: First bytes written
    Generator->>Tracer: Record ATTR_RESPONSE_TTFB
    TTS-->>Generator: Complete synthesis

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes


Suggested reviewers

  • chenghao-mou

Poem

🐰 A hopping herald of interruption detection,
Where WebSockets whisper with perfect direction,
Audio flows stream through metrics so keen,
Turn-handling now polished, a config machine!
Hark, the agents shall speak, then pause when we speak—
Adaptive responses, the future looks sleek! 🎙️✨


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between efa8a4b and 3ce96e1.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (40)
  • .changeset/config.json
  • .github/workflows/test.yml
  • agents/package.json
  • agents/src/inference/interruption/defaults.ts
  • agents/src/inference/interruption/errors.ts
  • agents/src/inference/interruption/http_transport.ts
  • agents/src/inference/interruption/interruption_cache_entry.ts
  • agents/src/inference/interruption/interruption_detector.ts
  • agents/src/inference/interruption/interruption_stream.ts
  • agents/src/inference/interruption/types.ts
  • agents/src/inference/interruption/utils.test.ts
  • agents/src/inference/interruption/utils.ts
  • agents/src/inference/interruption/ws_transport.ts
  • agents/src/llm/llm.ts
  • agents/src/llm/realtime.ts
  • agents/src/metrics/base.ts
  • agents/src/metrics/index.ts
  • agents/src/metrics/model_usage.test.ts
  • agents/src/metrics/model_usage.ts
  • agents/src/metrics/usage_collector.ts
  • agents/src/stream/stream_channel.ts
  • agents/src/stt/stt.ts
  • agents/src/telemetry/trace_types.ts
  • agents/src/telemetry/traces.ts
  • agents/src/tts/tts.ts
  • agents/src/voice/agent.ts
  • agents/src/voice/agent_activity.ts
  • agents/src/voice/agent_session.ts
  • agents/src/voice/audio_recognition.ts
  • agents/src/voice/events.ts
  • agents/src/voice/generation.ts
  • agents/src/voice/report.ts
  • agents/src/voice/turn_config/endpointing.ts
  • agents/src/voice/turn_config/interruption.ts
  • agents/src/voice/turn_config/turn_handling.ts
  • agents/src/voice/turn_config/utils.test.ts
  • agents/src/voice/turn_config/utils.ts
  • examples/package.json
  • examples/src/basic_agent.ts
  • plugins/google/src/beta/realtime/realtime_api.ts





@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ce96e1855


Comment on lines +147 to +151

state.cache.set(createdAt, entry);

if (state.overlapSpeechStarted && entry.isInterruption) {
  if (updateUserSpeakingSpan) {
    updateUserSpeakingSpan(entry);

P2: Re-check overlap state after HTTP await

Because state is captured before the await predictHTTP(...), an overlap that ends while the request is in flight will still have state.overlapSpeechStarted === true here, which can emit an interruption event after overlap speech has already ended. This shows up when overlap ends quickly or the HTTP call is slow, producing false-positive interruptions. Consider re-reading getState() after the await (or checking a monotonic overlap token) before emitting.
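The suggested fix can be sketched as follows. The names (getState, predictHTTP, the emit callback) mirror the review comment but are assumptions about the surrounding code, not the actual API:

```typescript
// Hypothetical sketch of the suggested fix: re-read live detector state
// *after* the await, so an overlap that ended while the HTTP request was
// in flight does not produce a stale interruption event.
interface DetectorState {
  overlapSpeechStarted: boolean;
}

async function handlePrediction(
  getState: () => DetectorState,
  predictHTTP: () => Promise<{ isInterruption: boolean }>,
  emitInterruption: (entry: { isInterruption: boolean }) => void,
): Promise<void> {
  const entry = await predictHTTP();
  // Re-read state after the await instead of using a pre-await snapshot.
  const state = getState();
  if (state.overlapSpeechStarted && entry.isInterruption) {
    emitInterruption(entry);
  }
}
```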


