
Enable Gemini Realtime Model to Produce Error Log#1016

Closed
toubatbrian wants to merge 28 commits into main from
brian/fix-gemini-error-log

Conversation

@toubatbrian
Contributor

@toubatbrian toubatbrian commented Feb 3, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added adaptive interruption detection system with configurable thresholds and model support
    • Introduced model usage metrics tracking for LLM, TTS, and STT services
    • Enhanced telemetry with latency measurements and interruption attributes
    • New turn-handling configuration system for flexible interruption modes
  • Tests

    • Added comprehensive test coverage for interruption utilities and model usage aggregation
  • Chores

    • Updated dependencies and refined configuration structures
    • Improved error handling in WebSocket closures

@changeset-bot

changeset-bot bot commented Feb 3, 2026

⚠️ No Changeset found

Latest commit: 3ce96e1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types.


@toubatbrian toubatbrian closed this Feb 3, 2026
@coderabbitai

coderabbitai bot commented Feb 3, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

This pull request introduces an adaptive interruption detection system for voice agents with HTTP and WebSocket transport options, implements model usage metrics collection, restructures turn-handling configuration, and adds enhanced telemetry attributes for tracing and monitoring.

Changes

Cohort / File(s) Summary
Adaptive Interruption Detection Core
agents/src/inference/interruption/types.ts, agents/src/inference/interruption/errors.ts, agents/src/inference/interruption/defaults.ts, agents/src/inference/interruption/interruption_cache_entry.ts, agents/src/inference/interruption/utils.ts, agents/src/inference/interruption/utils.test.ts
Introduces complete type system for interruption detection (InterruptionEvent, InterruptionOptions, sentinel types), custom error class with metadata, default configuration constants, cache entry model, and utility functions (BoundedCache, estimateProbability, slidingWindowMinMax) with comprehensive test coverage.
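As an illustration of the utility shapes named above, a bounded insertion-order cache might look like the following sketch. This is an assumption about the intent of BoundedCache, not the actual utils.ts implementation:

```typescript
// Hypothetical sketch of a BoundedCache: a Map that evicts its oldest
// entry once a maximum size is exceeded. The real implementation may differ.
class BoundedCache<K, V> {
  private map = new Map<K, V>();

  constructor(private maxSize: number) {}

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key); // refresh insertion order
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      // A Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }

  get(key: K): V | undefined {
    return this.map.get(key);
  }

  get size(): number {
    return this.map.size;
  }
}
```

A bounded cache like this keeps memory constant when predictions arrive faster than they are consumed.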
Interruption Detection Infrastructure
agents/src/inference/interruption/interruption_detector.ts, agents/src/inference/interruption/interruption_stream.ts
Implements AdaptiveInterruptionDetector for managing multiple interruption streams and InterruptionStreamBase for two-stage audio processing pipeline with transport selection (HTTP/WS), audio resampling, and overlap speech detection.
HTTP & WebSocket Transports
agents/src/inference/interruption/http_transport.ts, agents/src/inference/interruption/ws_transport.ts
Adds HTTP-based interruption inference with retry/backoff and WebSocket transport with token-based auth, session management, and bidirectional message handling. Both implement TransformStream pattern for audio chunk processing.
Model Usage Metrics
agents/src/metrics/base.ts, agents/src/metrics/model_usage.ts, agents/src/metrics/model_usage.test.ts, agents/src/metrics/usage_collector.ts, agents/src/metrics/index.ts
Extends metric types with MetricsMetadata, introduces ModelUsageCollector for aggregating per-provider/model usage (LLM, TTS, STT, Realtime), adds filterZeroValues utility, and deprecates legacy UsageCollector with warning.
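A plausible sketch of the filterZeroValues helper mentioned above, assuming it drops zero-valued numeric fields so usage summaries only report non-zero counters (the actual signature in the PR may differ):

```typescript
// Hypothetical sketch: drop keys whose value is exactly 0 so that aggregated
// usage reports only contain meaningful counters. May differ from the PR.
function filterZeroValues<T extends Record<string, number>>(
  usage: T,
): Partial<T> {
  return Object.fromEntries(
    Object.entries(usage).filter(([, value]) => value !== 0),
  ) as Partial<T>;
}
```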
Turn Configuration Restructuring
agents/src/voice/turn_config/endpointing.ts, agents/src/voice/turn_config/interruption.ts, agents/src/voice/turn_config/turn_handling.ts, agents/src/voice/turn_config/utils.ts, agents/src/voice/turn_config/utils.test.ts
Introduces modular configuration system with EndpointingConfig, InterruptionConfig, TurnHandlingConfig, and migration utilities (migrateLegacyOptions) to bridge legacy voiceOptions with new nested turnHandling structure.
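The bridging idea behind migrateLegacyOptions can be sketched as follows. The field names here are invented for illustration; the real turn_config/utils.ts shapes may differ:

```typescript
// Hypothetical sketch of migrating flat legacy voiceOptions into a nested
// turnHandling structure. Field names are assumptions, not the real API.
interface LegacyVoiceOptions {
  minInterruptionDuration?: number;
  minEndpointingDelay?: number;
}

interface TurnHandlingConfig {
  interruption: { minDuration: number };
  endpointing: { minDelay: number };
}

function migrateLegacyOptions(legacy: LegacyVoiceOptions): TurnHandlingConfig {
  return {
    // Carry legacy values forward, falling back to (assumed) defaults.
    interruption: { minDuration: legacy.minInterruptionDuration ?? 0.5 },
    endpointing: { minDelay: legacy.minEndpointingDelay ?? 0.5 },
  };
}
```

A migration shim like this lets existing callers keep passing flat options while new code reads only the nested config.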
Agent Voice Integration
agents/src/voice/agent.ts, agents/src/voice/agent_session.ts, agents/src/voice/agent_activity.ts, agents/src/voice/audio_recognition.ts
Integrates interruption detector, turn-handling config, and usage collection into voice agents; adds interruptionDetection getters, SessionOptions/InternalSessionOptions with defaults, ModelUsageCollector instance, interruption event handling, and audio recognition hooks for overlap detection.
Telemetry & Report Enhancements
agents/src/telemetry/trace_types.ts, agents/src/telemetry/traces.ts, agents/src/voice/report.ts, agents/src/voice/generation.ts
Adds latency (TTFT/TTFB/E2E), interruption (is_interruption, probability, durations, detection_delay), and provider-name trace attributes; integrates usage metrics into session logging; adds TTFT computation for LLM/TTS generation with model/provider parameters.
Model & Provider Metadata
agents/src/llm/llm.ts, agents/src/llm/realtime.ts, agents/src/stt/stt.ts, agents/src/tts/tts.ts
Adds provider and model getters (defaulting to "unknown") to LLM, STT, and TTS base classes; extends TTS with token usage tracking via setTokenUsage() and updates metrics emission to include metadata and token counts.
Stream Channel Enhancement
agents/src/stream/stream_channel.ts
Adds generic error type parameter to StreamChannel, introduces abort(error) method for controlled error-driven termination, and addStreamInput(stream) for piping external ReadableStream data.
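The abort/addStreamInput semantics described above can be sketched with a minimal channel. This is an assumption-laden illustration, not the actual StreamChannel API (in Node 18+, a ReadableStream satisfies AsyncIterable, which is why the sketch accepts that type):

```typescript
// Hypothetical sketch of an error-aware channel: writers push values,
// abort(error) fails pending and future reads, and addStreamInput pipes
// an external async-iterable (e.g. a Node ReadableStream) into the channel.
class SimpleChannel<T, E extends Error = Error> {
  private queue: T[] = [];
  private error?: E;
  private waiters: Array<{ resolve: (v: T) => void; reject: (e: E) => void }> = [];

  write(value: T): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter.resolve(value);
    else this.queue.push(value);
  }

  abort(error: E): void {
    this.error = error;
    // Fail every read that is currently blocked.
    for (const w of this.waiters.splice(0)) w.reject(error);
  }

  async read(): Promise<T> {
    if (this.queue.length > 0) return this.queue.shift()!;
    if (this.error) throw this.error;
    return new Promise((resolve, reject) => this.waiters.push({ resolve, reject }));
  }

  async addStreamInput(stream: AsyncIterable<T>): Promise<void> {
    for await (const value of stream) this.write(value);
  }
}
```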
Example & Configuration Updates
examples/src/basic_agent.ts, examples/package.json, .changeset/config.json, .github/workflows/test.yml
Updates example agent with interruption config and BackgroundVoiceCancellation, removes ai-coustics dependency, reformats changeset config, and comments out Test examples workflow step.
Gemini Realtime Error Handling
plugins/google/src/beta/realtime/realtime_api.ts
Surfaces non-normal WebSocket closures (non-1000 codes) as error events instead of silently continuing; normal closures remain at debug level.
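The close-handling policy described in this row (the change this PR's title refers to) can be sketched as a classifier over WebSocket close codes, where 1000 is the RFC 6455 normal-closure code. This is an illustrative sketch, not the actual realtime_api.ts wiring:

```typescript
// Hypothetical sketch: code 1000 (normal closure) stays at debug level;
// any other close code is surfaced as an error instead of being swallowed.
type CloseHandlerResult =
  | { level: "debug"; message: string }
  | { level: "error"; message: string };

function classifyWsClose(code: number, reason: string): CloseHandlerResult {
  if (code === 1000) {
    return { level: "debug", message: `WebSocket closed normally: ${reason}` };
  }
  return {
    level: "error",
    message: `WebSocket closed unexpectedly (code ${code}): ${reason}`,
  };
}
```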
Dependency Addition
agents/package.json
Adds "ofetch" ^1.5.1 as runtime dependency for HTTP transport implementation.

Sequence Diagram(s)

sequenceDiagram
    participant AudioInput as Audio Input
    participant AudioRecognition as Audio Recognition
    participant InterruptionDetector as Interruption Detector
    participant Transport as HTTP/WS Transport
    participant InferenceAPI as Inference API
    participant AgentActivity as Agent Activity
    
    AudioInput->>AudioRecognition: Push audio frame
    AudioRecognition->>InterruptionDetector: Forward to interruption stream
    
    InterruptionDetector->>InterruptionDetector: Accumulate audio chunks<br/>Detect overlap speech
    
    alt Overlap Speech Detected
        InterruptionDetector->>Transport: Send audio data with metadata
        Transport->>InferenceAPI: POST /bargein with audio + token
        InferenceAPI-->>Transport: Prediction response
        Transport->>InterruptionDetector: Cache prediction result
        
        alt Interruption Confidence High
            InterruptionDetector->>AgentActivity: Emit InterruptionEvent
            AgentActivity->>AgentActivity: Handle interruption<br/>Update span
        end
    end
sequenceDiagram
    participant Agent as Agent
    participant Generator as Generation
    participant LLM as LLM Model
    participant TTS as TTS Model
    participant Tracer as Tracer
    
    Agent->>Generator: performLLMInference(model, provider)
    Generator->>Tracer: Create span with ATTR_GEN_AI_REQUEST_MODEL
    Generator->>LLM: Start inference
    LLM-->>Generator: First token arrives
    Generator->>Tracer: Record ATTR_RESPONSE_TTFT
    LLM-->>Generator: Complete inference
    
    Generator->>Generator: performTTSInference(model, provider)
    Generator->>Tracer: Create span with model/provider
    Generator->>TTS: Start synthesis
    TTS-->>Generator: First bytes written
    Generator->>Tracer: Record ATTR_RESPONSE_TTFB
    TTS-->>Generator: Complete synthesis

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes


Suggested reviewers

  • chenghao-mou

Poem

🐰 A hopping herald of interruption detection,
Where WebSockets whisper with perfect direction,
Audio flows stream through metrics so keen,
Turn-handling now polished, a config machine!
Hark, the agents shall speak, then pause when we speak—
Adaptive responses, the future looks sleek! 🎙️✨


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between efa8a4b and 3ce96e1.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (40)
  • .changeset/config.json
  • .github/workflows/test.yml
  • agents/package.json
  • agents/src/inference/interruption/defaults.ts
  • agents/src/inference/interruption/errors.ts
  • agents/src/inference/interruption/http_transport.ts
  • agents/src/inference/interruption/interruption_cache_entry.ts
  • agents/src/inference/interruption/interruption_detector.ts
  • agents/src/inference/interruption/interruption_stream.ts
  • agents/src/inference/interruption/types.ts
  • agents/src/inference/interruption/utils.test.ts
  • agents/src/inference/interruption/utils.ts
  • agents/src/inference/interruption/ws_transport.ts
  • agents/src/llm/llm.ts
  • agents/src/llm/realtime.ts
  • agents/src/metrics/base.ts
  • agents/src/metrics/index.ts
  • agents/src/metrics/model_usage.test.ts
  • agents/src/metrics/model_usage.ts
  • agents/src/metrics/usage_collector.ts
  • agents/src/stream/stream_channel.ts
  • agents/src/stt/stt.ts
  • agents/src/telemetry/trace_types.ts
  • agents/src/telemetry/traces.ts
  • agents/src/tts/tts.ts
  • agents/src/voice/agent.ts
  • agents/src/voice/agent_activity.ts
  • agents/src/voice/agent_session.ts
  • agents/src/voice/audio_recognition.ts
  • agents/src/voice/events.ts
  • agents/src/voice/generation.ts
  • agents/src/voice/report.ts
  • agents/src/voice/turn_config/endpointing.ts
  • agents/src/voice/turn_config/interruption.ts
  • agents/src/voice/turn_config/turn_handling.ts
  • agents/src/voice/turn_config/utils.test.ts
  • agents/src/voice/turn_config/utils.ts
  • examples/package.json
  • examples/src/basic_agent.ts
  • plugins/google/src/beta/realtime/realtime_api.ts





@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ce96e1855


Comment on lines +147 to +151

state.cache.set(createdAt, entry);

if (state.overlapSpeechStarted && entry.isInterruption) {
  if (updateUserSpeakingSpan) {
    updateUserSpeakingSpan(entry);

P2: Re-check overlap state after HTTP await

Because state is captured before the await predictHTTP(...), an overlap that ends while the request is in flight will still have state.overlapSpeechStarted === true here, which can emit an interruption event after overlap speech has already ended. This shows up when overlap ends quickly or the HTTP call is slow, producing false-positive interruptions. Consider re-reading getState() after the await (or checking a monotonic overlap token) before emitting.
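The suggested fix can be sketched as follows. The names (getState, predictHTTP, the emit callback) mirror the review comment but are assumptions about the surrounding code, not the actual API:

```typescript
// Hypothetical sketch of the suggested fix: re-read live detector state
// *after* the await, so an overlap that ended while the HTTP request was
// in flight does not produce a stale interruption event.
interface DetectorState {
  overlapSpeechStarted: boolean;
}

async function handlePrediction(
  getState: () => DetectorState,
  predictHTTP: () => Promise<{ isInterruption: boolean }>,
  emitInterruption: (entry: { isInterruption: boolean }) => void,
): Promise<void> {
  const entry = await predictHTTP();
  // Re-read state after the await instead of using a pre-await snapshot.
  const state = getState();
  if (state.overlapSpeechStarted && entry.isInterruption) {
    emitInterruption(entry);
  }
}
```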


