RealtimeSession.interrupt() is a no-op without response_id under concurrent in-flight responses (OpenAI plugin) #5564

@cphoward

TL;DR

RealtimeSession.interrupt() is a no-op when called without response_id under concurrent in-flight responses (OpenAI plugin)

Summary

RealtimeSession.interrupt() in the OpenAI Realtime plugin sends a ResponseCancelEvent(type="response.cancel") without specifying response_id. In my probes of the OpenAI Realtime API (see Empirical evidence below), a cancel event without response_id is a no-op when more than one response is in flight: it does not cancel "all" responses, it cancels nothing.

Today this is harmless because the plugin's single-slot _current_generation model only allows one in-flight response at a time, so "the only in-flight response" is unambiguous and the substrate accepts the cancel.

This breaks under any future change that enables concurrent in-flight responses on a single session, e.g., a refactor of _current_generation into a multi-instance state container. Such a refactor is needed to correctly represent the substrate's existing support for concurrent out-of-band (OOB) responses (see Empirical evidence below).

Reproduction (current state — no concurrency, so this is latent)

import asyncio

from livekit.plugins.openai.realtime import RealtimeModel


async def main() -> None:
    session = RealtimeModel(model="gpt-4o-realtime-preview-2025-06-03").session()
    # Today: only one response.create can be in flight at a time
    # (single-slot _current_generation), so interrupt() works correctly.
    fut = session.generate_reply(instructions="Speak slowly: aaaaaa")
    await asyncio.sleep(0.5)
    session.interrupt()  # cancels the only in-flight response; works fine


asyncio.run(main())

Reproduction under hypothetical concurrent-response refactor (the latent bug)

# Hypothetical: dict-based _generations replacing single-slot
fut_a = await session._fire_oob("Long response A " + " ".join(["a"] * 100))
fut_b = await session._fire_oob("Long response B " + " ".join(["b"] * 100))
await asyncio.sleep(0.5)
# Both responses now in flight (substrate supports this for OOB)
session.interrupt()  # NO-OP: cancels neither response
# Both responses continue to completion; user hears unwanted audio

Empirical evidence

I ran probes against the OpenAI Realtime API (gpt-4o-realtime-preview-2025-06-03) to characterize cancel semantics:

  • Cancel WITH response_id while concurrent responses are in flight: cancels the targeted response cleanly. Other in-flight responses continue.
  • Cancel WITHOUT response_id while concurrent responses are in flight: NO-OP. Neither response cancelled.
  • Cancel WITHOUT response_id when only one response is in flight: cancels that response (current single-slot behavior).
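The three observations above can be summarized as a small model. This is only a sketch of the behavior seen in the probes, not a documented API contract; `apply_cancel` and the response IDs are hypothetical names used for illustration.

```python
from typing import Optional


def apply_cancel(in_flight: set[str], response_id: Optional[str] = None) -> set[str]:
    """Return the response IDs still in flight after a response.cancel event.

    Models the probe observations only; not an official API contract.
    """
    if response_id is not None:
        # Targeted cancel: only the named response stops.
        return in_flight - {response_id}
    if len(in_flight) == 1:
        # Untargeted cancel with one response in flight: that response stops.
        return set()
    # Untargeted cancel with several responses in flight: observed no-op.
    return set(in_flight)


print(apply_cancel({"resp_a", "resp_b"}, "resp_a"))  # {'resp_b'}
print(apply_cancel({"resp_a", "resp_b"}))            # both remain in flight
print(apply_cancel({"resp_a"}))                      # set()
```

The middle case is the one interrupt() would hit once concurrent responses are possible.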

The relevant source code is at livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py lines 1542-1545:

def interrupt(self) -> None:
    self.send_event(
        ResponseCancelEvent(type="response.cancel")  # no response_id
    )

Proposed fix

interrupt() should iterate the in-flight response IDs and emit one ResponseCancelEvent per in-flight response, with each event carrying its specific response_id:

def interrupt(self) -> None:
    """Cancel all in-flight responses on this session."""
    # Today, _current_generation is a single slot; iterate accordingly.
    # If the slot model is later refactored to a dict, iterate dict keys.
    if self._current_generation is not None:
        response_id = self._current_generation.response_id  # if available
        if response_id:
            self.send_event(
                ResponseCancelEvent(
                    type="response.cancel",
                    response_id=response_id,
                )
            )
        else:
            # Fallback to no-response_id cancel for backward compat
            self.send_event(ResponseCancelEvent(type="response.cancel"))

Under a future dict-based refactor, this becomes:

def interrupt(self) -> None:
    for response_id in list(self._generations.keys()):
        self.send_event(
            ResponseCancelEvent(type="response.cancel", response_id=response_id)
        )
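To make the dict-based variant concrete without touching the plugin, here is a minimal sketch using a stand-in class. `FakeSession` is hypothetical, events are plain dicts rather than ResponseCancelEvent objects, and `_generations` is the assumed name of the future container.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeSession:
    """Hypothetical stand-in for RealtimeSession; records events instead of sending."""

    _generations: dict[str, Any] = field(default_factory=dict)
    sent: list[dict[str, str]] = field(default_factory=list)

    def send_event(self, event: dict[str, str]) -> None:
        self.sent.append(event)

    def interrupt(self) -> None:
        # Snapshot the keys so the dict may change while events are emitted.
        for response_id in list(self._generations):
            self.send_event({"type": "response.cancel", "response_id": response_id})


session = FakeSession(_generations={"resp_a": object(), "resp_b": object()})
session.interrupt()
print(session.sent)  # one targeted cancel per in-flight response
```

Each in-flight response gets its own targeted cancel, so the outcome does not depend on how the substrate treats an untargeted cancel.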

Why file this now even though the bug is latent

This issue is filed proactively because:

  1. The bug becomes active the moment any concurrent-response support lands (e.g., the dict refactor motivated by #1988, "OpenAI Realtime API audio cuts off intermittently before finishing playback").
  2. The fix is small (~10 LOC) and orthogonal to other concerns; it can ship either in the dict refactor PR or as a tiny standalone PR.
  3. Documenting the empirical cancel semantics here helps future contributors who find the source-code race comment at realtime_model.py:1870 (which documents an adjacent race — response.done without prior response.created) and wonder about cancel semantics more broadly.

Acceptance criteria

  • RealtimeSession.interrupt() cancels all in-flight responses correctly under both single-slot AND any future multi-instance state model.
  • A test verifies the cancel-with-response_id path explicitly.
  • Substrate behavior is documented in a source-code comment near the implementation.
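A test for the cancel-with-response_id path could look roughly like the sketch below. It does not import the plugin: `interrupt` is a free-function copy of the single-slot fix proposed above, the session is a `SimpleNamespace` stub, and events are plain dicts, all of which are assumptions made for illustration.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def interrupt(session) -> None:
    """Single-slot variant of the proposed fix, written as a free function."""
    generation = session._current_generation
    if generation is None:
        return
    event = {"type": "response.cancel"}
    if generation.response_id:
        event["response_id"] = generation.response_id
    session.send_event(event)


def test_interrupt_targets_in_flight_response() -> None:
    session = SimpleNamespace(
        _current_generation=SimpleNamespace(response_id="resp_123"),
        send_event=MagicMock(),
    )
    interrupt(session)
    session.send_event.assert_called_once_with(
        {"type": "response.cancel", "response_id": "resp_123"}
    )


def test_interrupt_without_generation_sends_nothing() -> None:
    session = SimpleNamespace(_current_generation=None, send_event=MagicMock())
    interrupt(session)
    session.send_event.assert_not_called()


test_interrupt_targets_in_flight_response()
test_interrupt_without_generation_sends_nothing()
```

A real test would exercise RealtimeSession itself and assert on the ResponseCancelEvent passed to send_event.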

Related

cc @longcw (author of this file)
