feat(realtime/openai): add ephemeral say() via response.create(conversation: "none")#5569
feat(realtime/openai): add ephemeral say() via response.create(conversation: "none")#5569cphoward wants to merge 8 commits intolivekit:mainfrom
Conversation
Add ephemeral_say: bool = False to RealtimeCapabilities. Plugins that honor add_to_chat_ctx=False on say() declare this True; all others default False. OpenAI plugin declares supports_say=True and ephemeral_say=True. Phonic defaults to ephemeral_say=False (substrate is turn-based). Substrate landscape verified empirically via reproducible probes against the OpenAI Realtime API. Part of feat(realtime/openai): add ephemeral say() via response.create(conversation: "none") — Phase 1 of 6.
Phonic's RealtimeSession.say() now accepts add_to_chat_ctx: bool = True as a keyword-only parameter. Phonic accepts but ignores the kwarg: the substrate is strictly turn-based and has no out-of-band response primitive. Phonic declares ephemeral_say=False so that the AgentActivity.say() dispatcher emits a DeprecationWarning before reaching this method when add_to_chat_ctx=False is requested. The kwarg is a forward-compatible signature shim so Phase 3's abstract RealtimeSession.say() signature change does not break the Phonic plugin at type-check time. Part of feat(realtime/openai): add ephemeral say() — Phase 2 of 6.
…phan filter Implements RealtimeSession.say(text, add_to_chat_ctx=False) for the OpenAI Realtime plugin using response.create(conversation: "none"). Adds metadata.client_event_id on outbound events to enable a late-event check in _handle_response_created that discards orphaned response.created when the caller-future has been popped (typically by client-side timeout). Builds on PR livekit#2125's InvalidStateError handling at the future-resolution layer. Also adds defense-in-depth guards in _handle_response_output_item_added and _handle_response_content_part_added so late events for an orphaned response (during the cancel-arrival race window) drop gracefully instead of asserting on a None _current_generation. The abstract RealtimeSession.say() in core gains the add_to_chat_ctx keyword-only parameter so plugins implementing isolation share a stable signature. Part of feat(realtime/openai): add ephemeral say() — Phase 3 of 6.
AgentActivity.say() now emits a DeprecationWarning when a caller
passes add_to_chat_ctx=False against a RealtimeModel that declares
ephemeral_say=False. Existing silent-degrade behavior is preserved
for the deprecation window; a future release will replace the warning
with NotImplementedError.
The add_to_chat_ctx parameter is threaded end-to-end through:
AgentActivity.say() -> _realtime_reply_task -> _rt_session.say()
-> _realtime_generation_task
-> _realtime_generation_task_impl
so downstream gating (e.g. local chat_ctx upsert) can read the
caller's intent.
Part of feat(realtime/openai): add ephemeral say() — Phase 4 of 6.
In _realtime_generation_task_impl, the conditional that upserts the forwarded assistant message into the local chat_ctx now requires add_to_chat_ctx in addition to msg_gen and forwarded_text. Without this gate, callers passing add_to_chat_ctx=False through the realtime path would still see the rendered text written to the local context, defeating the isolation contract that the substrate enforces server-side via response.create(conversation: "none"). Mirrors the chain (TTS) path's existing gate. Part of feat(realtime/openai): add ephemeral say() — Phase 5 of 6.
Verifies end-to-end against the real OpenAI Realtime API (gpt-4o-realtime-preview-2025-06-03): - Audibility: say(text) produces audio frames. - Server-side isolation: say(secret, add_to_chat_ctx=False) followed by generate_reply asking the model to repeat the token cannot recall it — substrate state is empty for conversation: "none". - Substrate chat_ctx isolation: session.chat_ctx after isolated say() contains no item with the secret. - Wire-format metadata: outbound response.create carries metadata.client_event_id with the say-prefixed identifier (round-trip verified). Tests skip when OPENAI_API_KEY is not set (e.g., on fork PRs). Part of feat(realtime/openai): add ephemeral say() — Phase 6 of 6.
Replaces two source-inspection placeholders in test_agent_activity_say.py with real behavioral tests in test_realtime_say_integration.py: 1. test_openai_realtime_say_isolation_no_local_leak — behavioral positive+negative control. Runs _realtime_generation_task_impl with a real async message stream. Positive: add_to_chat_ctx=True must call _chat_ctx._upsert_item. Negative: add_to_chat_ctx=False must NOT call it. Exercises the gate added in Phase 5. 2. test_orphaned_response_after_timeout_filtered — directly invokes _handle_response_created with a synthetic late event whose metadata.client_event_id is missing from _response_created_futures (simulating the timeout-popped state). Asserts _current_generation remains None and a ResponseCancelEvent(response_id=...) was sent. The handoff test (test_agent_handoff_does_not_propagate_isolated_text) is dropped: update_agent() does not propagate the prior agent's chat_ctx, so the assertion would be vacuous. The gate proof at _realtime_generation_task_impl makes it redundant. Part of feat(realtime/openai): add ephemeral say() — test addendum.
…r-VAD) Adds the sister case to test_orphaned_response_after_timeout_filtered: when response.metadata is None (no client_event_id), the orphan filter must NOT fire — the response was server-VAD-initiated, not one we issued that could have timed out. Without the metadata-presence guard at realtime_model.py, every server-VAD-initiated response would be silently discarded: - _current_generation would never be created on the server-VAD path - the agent would stop responding to user speech - no exception would be raised; the regression would be silent The test directly invokes _handle_response_created with metadata=None and asserts: - _current_generation IS NOT None after the call (post-guard happy path ran and constructed a fresh _ResponseGeneration). - No ResponseCancelEvent was emitted (the orphan filter did not fire). Part of feat(realtime/openai): add ephemeral say().
|
|
| ) | ||
| self._create_speech_task( | ||
| self._realtime_reply_task( | ||
| speech_handle=handle, | ||
| text=text, | ||
| model_settings=ModelSettings(), | ||
| add_to_chat_ctx=add_to_chat_ctx, |
There was a problem hiding this comment.
🔴 Deprecation path still passes add_to_chat_ctx=False downstream, breaking "silent-degrade" promise
When ephemeral_say=False and add_to_chat_ctx=False, the deprecation warning at line 1083 tells the user "The text will be added to chat context anyway in this release" (silent-degrade). However, the code never overrides add_to_chat_ctx back to True before passing it to _realtime_reply_task at line 1099. This value flows all the way to the gate at agent_activity.py:3270 (if msg_gen and forwarded_text and add_to_chat_ctx:), which skips the local _upsert_item call.
For plugins like Phonic (which ignore the parameter server-side), this creates a mismatch: the text IS in the server-side conversation state but is NOT in the local agent._chat_ctx. The stated intent of the deprecation path — backward-compatible silent degrade — is violated.
| ) | |
| self._create_speech_task( | |
| self._realtime_reply_task( | |
| speech_handle=handle, | |
| text=text, | |
| model_settings=ModelSettings(), | |
| add_to_chat_ctx=add_to_chat_ctx, | |
| ) | |
| add_to_chat_ctx = True | |
| self._create_speech_task( | |
| self._realtime_reply_task( | |
| speech_handle=handle, | |
| text=text, | |
| model_settings=ModelSettings(), | |
| add_to_chat_ctx=add_to_chat_ctx, | |
| ), |
Was this helpful? React with 👍 or 👎 to provide feedback.
| if self._current_generation is None: | ||
| # Late event for an orphaned (timed-out) response; | ||
| # safe to drop. Mirrors the existing graceful handling | ||
| # in _handle_response_done. | ||
| return |
There was a problem hiding this comment.
🟡 Orphan filter leaves downstream event handlers unguarded, causing assertion failures on in-flight events
The orphan filter added to _handle_response_created (realtime_model.py:1755-1774) returns early and leaves _current_generation as None. The PR correctly added graceful None checks to _handle_response_output_item_added (line 1805) and _handle_response_content_part_added (line 1835), but the remaining handlers — _handle_response_text_delta (line 1917), _handle_response_text_done (line 1930), _handle_response_audio_transcript_delta (line 1933), _handle_response_audio_delta (line 1945), _handle_response_audio_done (line 1964), and _handle_response_output_item_done (line 1967) — still use assert self._current_generation is not None. Events already in the WebSocket pipeline when the cancel is sent will hit these asserts. While the outer dispatch loop (line 1103) catches Exception and logs it, this results in error log spam for every in-flight event of the orphaned response and is inconsistent with the graceful handling applied to the two other handlers.
Prompt for agents
The orphan filter in _handle_response_created (line 1755-1774) returns early when a late response.created arrives for a timed-out future, leaving _current_generation as None. Two handlers (_handle_response_output_item_added and _handle_response_content_part_added) were updated with graceful None checks, but several other handlers still assert _current_generation is not None: _handle_response_text_delta (line 1917), _handle_response_text_done (line 1930), _handle_response_audio_transcript_delta (line 1933), _handle_response_audio_delta (line 1945), _handle_response_audio_done (line 1964), _handle_response_output_item_done (line 1967). These should all be updated with the same pattern: if self._current_generation is None: return. This ensures consistent graceful handling when in-flight WebSocket events arrive after the orphan filter discards a response.
Was this helpful? React with 👍 or 👎 to provide feedback.
| params = RealtimeResponseCreateParams( | ||
| input=[ | ||
| realtime.RealtimeConversationItemAssistantMessage( | ||
| type="message", | ||
| role="assistant", | ||
| content=[ | ||
| realtime.realtime_conversation_item_assistant_message.Content( | ||
| type="output_text", | ||
| text=full_text, | ||
| ) | ||
| ], | ||
| ) | ||
| ], | ||
| metadata={"client_event_id": event_id}, | ||
| ) | ||
| if not add_to_chat_ctx: | ||
| params.conversation = "none" | ||
|
|
||
| self.send_event( | ||
| ResponseCreateEvent(type="response.create", event_id=event_id, response=params) | ||
| ) |
There was a problem hiding this comment.
have you tested that if openai realtime will "say" this message instead of using it as context?
There was a problem hiding this comment.
I just tested and it seems it's not supported by openai realtime session.
There was a problem hiding this comment.
You are right, my sincerest apologies. I wrote an made test validity error when experimenting via OpenAI's SDK. I only confirmed audio was returned, not that the audio was what I expected. Closing this PR.
Summary
Adds an ephemeral path to
RealtimeSession.say()for the OpenAI Realtime pluginvia
response.create(conversation: "none"): text rendered withadd_to_chat_ctx=Falseis heard by the user but does NOT enter the agent'sreasoning context, server-side OR local.
This closes the JS PR #1193 parity gap and addresses the silent-degradation
regression in that review.
How to review
Suggested reading path (≈30 min):
livekit-agents/livekit/agents/llm/realtime.py— capability shape (ephemeral_sayfield) and abstractsay()signature, ~30 lines.livekit-plugins/livekit-plugins-openai/.../realtime/realtime_model.py—say()body and the orphan filter in_handle_response_created, ~140 lines.livekit-agents/livekit/agents/voice/agent_activity.py— dispatcher capability check, plumbing, and the local-context gate at_realtime_generation_task_impl, ~30 lines.tests/test_realtime/test_realtime_say_integration.py— the property contract for the feature; the unit test files are proofs of the same properties at finer granularity.Cross-plugin scope
Only the OpenAI plugin gains the substrate implementation. Phonic accepts but
ignores the kwarg (structurally exempt — substrate is strictly turn-based per
docs.phonic.co; no out-of-band response primitive). Google Gemini Live, AWS
Nova Sonic, Ultravox, NVIDIA all declare
supports_say=Falsetoday; this PRdoesn't change that.
A new
RealtimeCapabilities.ephemeral_say: bool = Falsefield is added.Plugins set it to
Truewhen their substrate provides an out-of-band responseprimitive that does not enter conversation state. The OpenAI plugin declares
ephemeral_say=True. All other plugins default toFalse.Backward compatibility — caller side (deprecation cycle)
In this release, when a caller invokes
say(text, add_to_chat_ctx=False)on aplugin without
ephemeral_say, we emitDeprecationWarningand continue withthe existing silent-degrade behavior. A future release will replace the warning
with
NotImplementedError. This preserves backward compat for any callersdepending on the silent-degrade contract while loudly surfacing the capability
gap. Open to shipping the immediate-raise version if you'd prefer.
Backward compatibility — plugin authors
The abstract
RealtimeSession.say()signature gains a keyword-onlyadd_to_chat_ctx: bool = Trueparameter, and the dispatcher unconditionallypasses the kwarg via
_rt_session.say(text, add_to_chat_ctx=...). This is abreaking change for any third-party plugin that BOTH declares
supports_say=TrueAND has overriddensay()with the original signature(self, text)— those overrides will hitTypeError: say() got an unexpected keyword argument 'add_to_chat_ctx'.In-tree plugins are updated:
say()accepts and honors the kwarg.exempt, dispatcher emits
DeprecationWarningupstream).supports_say=Falsetoday and the dispatcher never reaches them.
If preserving signature-level backward-compat for out-of-tree plugin authors is
preferred, the dispatcher could
try/except TypeErrorand fall back to theold call shape. Happy to add the shim — flagging this so you can choose
deliberately.
Tests prove the isolation contract
Each property below is falsifiable: if the assertion fails, the named test fails. Tests run against
gpt-4o-realtime-preview-2025-06-03for integration paths and network-free for the local-context-gate and orphan-filter behavioral tests.say(text)produces audio framestest_openai_realtime_say_audio_renders(integration)say(secret, add_to_chat_ctx=False)followed bygenerate_replycannot retrievesecrettest_openai_realtime_say_isolation_no_leak(integration)chat_ctxdoes not containsecretafter isolatedsay()test_openai_realtime_say_isolation_no_remote_chat_ctx_leak(integration)response.createcarriesmetadata.client_event_idtest_openai_realtime_say_emits_metadata(integration)add_to_chat_ctx=Falsedoes NOT populateagent._chat_ctx;add_to_chat_ctx=TrueDOES (positive+negative control)test_openai_realtime_say_isolation_no_local_leak(behavioral, network-free)response.createdfor a popped future is discarded;_current_generationstays None;response.cancel(response_id=…)is senttest_orphaned_response_after_timeout_filtered(behavioral, network-free)AgentActivity.say(add_to_chat_ctx=False)actually forwardsFalseto_realtime_reply_tasktest_agent_activity_say_realtime_dispatches_with_add_to_chat_ctx(unit, frame-locals inspection)ephemeral_sayemitsDeprecationWarningtest_agent_activity_say_realtime_capability_warns(unit)Late-event check
Builds on merged PR #2125's
InvalidStateErrorhandling at thefuture-resolution layer. This PR adds an audio-playout layer check: when
response.createdarrives carrying ametadata.client_event_idthat wepreviously issued but is no longer in
_response_created_futures(because theclient-side timeout already popped it), we discard the late arrival and send a
defensive
response.cancel(response_id=...).The check reuses the existing
_response_created_futuresdict — no newlifecycle state is introduced. A metadata-presence guard ensures
server-VAD-initiated responses (no client-issued metadata) flow through
normally.
Two adjacent event handlers (
_handle_response_output_item_added,_handle_response_content_part_added) gain a defensiveif self._current_generation is None: returnguard mirroring the existingpattern in
_handle_response_done. This protects the cancel-arrival racewindow during which late events for an orphaned response could otherwise hit a
None-assertion crash.
Empirical verification of the substrate contract
The
conversation: "none"semantic andmetadata.client_event_idround-tripare both verified end-to-end by the integration tests in
tests/test_realtime/test_realtime_say_integration.py(6 tests total — 4 againstthe real OpenAI Realtime API, requires
OPENAI_API_KEY; 2 network-freebehavioral tests for the local-context gate and orphan filter that run on every
PR even from forks).
The local-context-gate test uses a positive+negative control structure: it
first verifies that
add_to_chat_ctx=Truepopulates the agent's chat context(positive control — would catch a regression where local insertion silently
stops working), then verifies
add_to_chat_ctx=Falsedoes NOT. Without thepositive control, the negative-only assertion could pass even if the entire
local-insertion path was broken.
The dispatcher kwarg-flow test inspects the captured
_realtime_reply_taskcoroutine's frame locals to assert that
AgentActivity.say(add_to_chat_ctx=False)actually forwards
Falsethrough the call chain — not just that the realtimepath is dispatched.
Scope note (deferred work, separately tracked)
This PR is intentionally focused. The following adjacent improvements are
deferred to standalone PRs to keep this review narrow:
_current_generation: see OpenAI Realtime API audio cuts off intermittently before finishing playback #1988. The slot pattern ispreserved in this PR. The dominant call path through
AgentActivity.say()serializes via
_speech_qso concurrent OOBsay()collisions are notexercised. Direct
_rt_session.say()callers firing OOB responses in tightsuccession may collide; documented as known limitation. I plan to file the
dict refactor as a follow-up PR motivated by OpenAI Realtime API audio cuts off intermittently before finishing playback #1988.
update_optionsrace fix: see RealtimeModel.update_options() does not send session.update to OpenAI (race condition with _opts mutation) #5530.will be a tiny standalone
docs:PR.interrupt()cancel-all-under-concurrency (latent today, active underfuture dict refactor): see RealtimeSession.interrupt() is a no-op without response_id under concurrent in-flight responses (OpenAI plugin) #5564.
response.errormid-flight future hang (TODO atrealtime_model.py:2005): see OpenAI Realtime: response.error mid-flight causes future to hang indefinitely (TODO at realtime_model.py:2005) #5566.Scope note (
generate_reply())This contribution is scoped to
say(). A follow-up PR will addadd_to_chat_ctxtogenerate_reply()using the sameephemeral_sayflagpattern (use case A semantic only).
CI note
Integration tests skip on fork PRs (no
OPENAI_API_KEY). I've verified thetest suite locally against
gpt-4o-realtime-preview-2025-06-03and will pastethe test output as a comment on this PR for inspection.
File diff summary (across all 8 commits)
livekit-agents/livekit/agents/llm/realtime.pyephemeral_saycapability field; abstractRealtimeSession.say()signature gainsadd_to_chat_ctxkwarglivekit-agents/livekit/agents/voice/agent_activity.pyDeprecationWarning); plumbing ofadd_to_chat_ctxthrough 3 internal methods; local chat_ctx upsert gated onadd_to_chat_ctxlivekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.pysupports_say=True, ephemeral_say=True;say()body usingresponse.create(conversation: "none"); late-event check; defensive guards in two downstream handlerslivekit-plugins/livekit-plugins-phonic/livekit/plugins/phonic/realtime/realtime_model.pysay()acceptsadd_to_chat_ctxkwarg (signature compat; substrate is structurally exempt)tests/test_realtime_capabilities.pysay()wire formattests/test_agent_activity_say.pytests/test_realtime/test_realtime_say_integration.pyresponse.created; metadata-presence guard bypasses the orphan filter for server-VAD-initiated responses)Commit log
If reviewers prefer a single squashed commit, the maintainer is welcome to
squash on merge — the per-phase commit chain is for development hygiene and
isolated rollback.