feat(realtime/openai): add ephemeral say() via response.create(conversation: "none") by cphoward · Pull Request #5569 · livekit/agents

cphoward · 2026-04-27T21:31:47Z

Summary

Adds an ephemeral path to RealtimeSession.say() for the OpenAI Realtime plugin
via response.create(conversation: "none"): text rendered with
add_to_chat_ctx=False is heard by the user but does NOT enter the agent's
reasoning context, server-side OR local.

This closes the JS PR #1193 parity gap and addresses the silent-degradation
regression in that review.

How to review

Cross-plugin scope

Only the OpenAI plugin gains the substrate implementation. Phonic accepts but
ignores the kwarg (structurally exempt — substrate is strictly turn-based per
docs.phonic.co; no out-of-band response primitive). Google Gemini Live, AWS
Nova Sonic, Ultravox, NVIDIA all declare supports_say=False today; this PR
doesn't change that.

A new RealtimeCapabilities.ephemeral_say: bool = False field is added.
Plugins set it to True when their substrate provides an out-of-band response
primitive that does not enter conversation state. The OpenAI plugin declares
ephemeral_say=True. All other plugins default to False.

Backward compatibility — caller side (deprecation cycle)

In this release, when a caller invokes say(text, add_to_chat_ctx=False) on a
plugin without ephemeral_say, we emit DeprecationWarning and continue with
the existing silent-degrade behavior. A future release will replace the warning
with NotImplementedError. This preserves backward compat for any callers
depending on the silent-degrade contract while loudly surfacing the capability
gap. Open to shipping the immediate-raise version if you'd prefer.

Backward compatibility — plugin authors

The abstract RealtimeSession.say() signature gains a keyword-only
add_to_chat_ctx: bool = True parameter, and the dispatcher unconditionally
passes the kwarg via _rt_session.say(text, add_to_chat_ctx=...). This is a
breaking change for any third-party plugin that BOTH declares
supports_say=True AND has overridden say() with the original signature
(self, text) — those overrides will hit TypeError: say() got an unexpected keyword argument 'add_to_chat_ctx'.

In-tree plugins are updated:

OpenAI plugin's say() accepts and honors the kwarg.
Phonic accepts but ignores (signature-only shim; substrate is structurally
exempt, dispatcher emits DeprecationWarning upstream).
Gemini Live, AWS Nova Sonic, Ultravox, NVIDIA all declare supports_say=False
today and the dispatcher never reaches them.

If preserving signature-level backward-compat for out-of-tree plugin authors is
preferred, the dispatcher could try/except TypeError and fall back to the
old call shape. Happy to add the shim — flagging this so you can choose
deliberately.

Tests prove the isolation contract

Each property below is falsifiable: if the assertion fails, the named test fails. Tests run against gpt-4o-realtime-preview-2025-06-03 for integration paths and network-free for the local-context-gate and orphan-filter behavioral tests.

Property	Test
Audibility — `say(text)` produces audio frames	`test_openai_realtime_say_audio_renders` (integration)
Server-side isolation — `say(secret, add_to_chat_ctx=False)` followed by `generate_reply` cannot retrieve `secret`	`test_openai_realtime_say_isolation_no_leak` (integration)
Substrate state isolation — substrate `chat_ctx` does not contain `secret` after isolated `say()`	`test_openai_realtime_say_isolation_no_remote_chat_ctx_leak` (integration)
Wire-format metadata — outbound `response.create` carries `metadata.client_event_id`	`test_openai_realtime_say_emits_metadata` (integration)
Local-context gate — `add_to_chat_ctx=False` does NOT populate `agent._chat_ctx`; `add_to_chat_ctx=True` DOES (positive+negative control)	`test_openai_realtime_say_isolation_no_local_leak` (behavioral, network-free)
Orphan filter — late `response.created` for a popped future is discarded; `_current_generation` stays None; `response.cancel(response_id=…)` is sent	`test_orphaned_response_after_timeout_filtered` (behavioral, network-free)
Dispatcher kwarg flow — `AgentActivity.say(add_to_chat_ctx=False)` actually forwards `False` to `_realtime_reply_task`	`test_agent_activity_say_realtime_dispatches_with_add_to_chat_ctx` (unit, frame-locals inspection)
Dispatcher capability warning — calling on a plugin without `ephemeral_say` emits `DeprecationWarning`	`test_agent_activity_say_realtime_capability_warns` (unit)

Late-event check

Builds on merged PR #2125's InvalidStateError handling at the
future-resolution layer. This PR adds an audio-playout layer check: when
response.created arrives carrying a metadata.client_event_id that we
previously issued but is no longer in _response_created_futures (because the
client-side timeout already popped it), we discard the late arrival and send a
defensive response.cancel(response_id=...).

The check reuses the existing _response_created_futures dict — no new
lifecycle state is introduced. A metadata-presence guard ensures
server-VAD-initiated responses (no client-issued metadata) flow through
normally.

Two adjacent event handlers (_handle_response_output_item_added,
_handle_response_content_part_added) gain a defensive
if self._current_generation is None: return guard mirroring the existing
pattern in _handle_response_done. This protects the cancel-arrival race
window during which late events for an orphaned response could otherwise hit a
None-assertion crash.

Empirical verification of the substrate contract

The conversation: "none" semantic and metadata.client_event_id round-trip
are both verified end-to-end by the integration tests in
tests/test_realtime/test_realtime_say_integration.py (6 tests total — 4 against
the real OpenAI Realtime API, requires OPENAI_API_KEY; 2 network-free
behavioral tests for the local-context gate and orphan filter that run on every
PR even from forks).

The local-context-gate test uses a positive+negative control structure: it
first verifies that add_to_chat_ctx=True populates the agent's chat context
(positive control — would catch a regression where local insertion silently
stops working), then verifies add_to_chat_ctx=False does NOT. Without the
positive control, the negative-only assertion could pass even if the entire
local-insertion path was broken.

The dispatcher kwarg-flow test inspects the captured _realtime_reply_task
coroutine's frame locals to assert that AgentActivity.say(add_to_chat_ctx=False)
actually forwards False through the call chain — not just that the realtime
path is dispatched.

Scope note (deferred work, separately tracked)

This PR is intentionally focused. The following adjacent improvements are
deferred to standalone PRs to keep this review narrow:

Dict refactor of _current_generation: see OpenAI Realtime API audio cuts off intermittently before finishing playback #1988. The slot pattern is
preserved in this PR. The dominant call path through AgentActivity.say()
serializes via _speech_q so concurrent OOB say() collisions are not
exercised. Direct _rt_session.say() callers firing OOB responses in tight
succession may collide; documented as known limitation. I plan to file the
dict refactor as a follow-up PR motivated by OpenAI Realtime API audio cuts off intermittently before finishing playback #1988.
update_options race fix: see RealtimeModel.update_options() does not send session.update to OpenAI (race condition with _opts mutation) #5530.
Capability docstring backfill (5 other undocumented fields): see RealtimeCapabilities has 6 undocumented fields + 1 misleading docstring #5565;
will be a tiny standalone docs: PR.
interrupt() cancel-all-under-concurrency (latent today, active under
future dict refactor): see RealtimeSession.interrupt() is a no-op without response_id under concurrent in-flight responses (OpenAI plugin) #5564.
response.error mid-flight future hang (TODO at
realtime_model.py:2005): see OpenAI Realtime: response.error mid-flight causes future to hang indefinitely (TODO at realtime_model.py:2005) #5566.

Scope note (`generate_reply()`)

This contribution is scoped to say(). A follow-up PR will add
add_to_chat_ctx to generate_reply() using the same ephemeral_say flag
pattern (use case A semantic only).

CI note

Integration tests skip on fork PRs (no OPENAI_API_KEY). I've verified the
test suite locally against gpt-4o-realtime-preview-2025-06-03 and will paste
the test output as a comment on this PR for inspection.

File diff summary (across all 8 commits)

File	Change
`livekit-agents/livekit/agents/llm/realtime.py`	new `ephemeral_say` capability field; abstract `RealtimeSession.say()` signature gains `add_to_chat_ctx` kwarg
`livekit-agents/livekit/agents/voice/agent_activity.py`	dispatcher capability check (`DeprecationWarning`); plumbing of `add_to_chat_ctx` through 3 internal methods; local chat_ctx upsert gated on `add_to_chat_ctx`
`livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py`	OpenAI plugin declares `supports_say=True, ephemeral_say=True`; `say()` body using `response.create(conversation: "none")`; late-event check; defensive guards in two downstream handlers
`livekit-plugins/livekit-plugins-phonic/livekit/plugins/phonic/realtime/realtime_model.py`	Phonic `say()` accepts `add_to_chat_ctx` kwarg (signature compat; substrate is structurally exempt)
`tests/test_realtime_capabilities.py`	unit tests: capability advertisement (OpenAI + Phonic), Phonic signature shim, OpenAI `say()` wire format
`tests/test_agent_activity_say.py`	unit tests: dispatcher capability warning, kwarg-flow verification (frame-locals inspection of dispatched coroutine)
`tests/test_realtime/test_realtime_say_integration.py`	integration + behavioral tests: 4 against the real OpenAI Realtime API (audibility, server-side isolation, substrate-state isolation, wire-format metadata), plus 3 network-free behavioral tests (local-context gate with positive+negative control; orphan filter discards late `response.created`; metadata-presence guard bypasses the orphan filter for server-VAD-initiated responses)

Commit log

53b64313 test(realtime): assert orphan filter bypasses on metadata=None (server-VAD)
a9a3e2ac test(realtime): behavioral integration tests for ephemeral say()
cd500116 test(realtime): integration tests for ephemeral say()
579df8f7 feat(realtime): gate local chat_ctx upsert on add_to_chat_ctx
1ff15adf feat(realtime): dispatcher capability check + add_to_chat_ctx plumbing
92418578 feat(realtime/openai): implement say() with conversation: "none" + orphan filter
a97d4283 feat(phonic): accept add_to_chat_ctx kwarg on say() for signature compat
e7947d30 feat(realtime): add ephemeral_say capability flag

If reviewers prefer a single squashed commit, the maintainer is welcome to
squash on merge — the per-phase commit chain is for development hygiene and
isolated rollback.

Add ephemeral_say: bool = False to RealtimeCapabilities. Plugins that honor add_to_chat_ctx=False on say() declare this True; all others default False. OpenAI plugin declares supports_say=True and ephemeral_say=True. Phonic defaults to ephemeral_say=False (substrate is turn-based). Substrate landscape verified empirically via reproducible probes against the OpenAI Realtime API. Part of feat(realtime/openai): add ephemeral say() via response.create(conversation: "none") — Phase 1 of 6.

Phonic's RealtimeSession.say() now accepts add_to_chat_ctx: bool = True as a keyword-only parameter. Phonic accepts but ignores the kwarg: the substrate is strictly turn-based and has no out-of-band response primitive. Phonic declares ephemeral_say=False so that the AgentActivity.say() dispatcher emits a DeprecationWarning before reaching this method when add_to_chat_ctx=False is requested. The kwarg is a forward-compatible signature shim so Phase 3's abstract RealtimeSession.say() signature change does not break the Phonic plugin at type-check time. Part of feat(realtime/openai): add ephemeral say() — Phase 2 of 6.

…phan filter Implements RealtimeSession.say(text, add_to_chat_ctx=False) for the OpenAI Realtime plugin using response.create(conversation: "none"). Adds metadata.client_event_id on outbound events to enable a late-event check in _handle_response_created that discards orphaned response.created when the caller-future has been popped (typically by client-side timeout). Builds on PR livekit#2125's InvalidStateError handling at the future-resolution layer. Also adds defense-in-depth guards in _handle_response_output_item_added and _handle_response_content_part_added so late events for an orphaned response (during the cancel-arrival race window) drop gracefully instead of asserting on a None _current_generation. The abstract RealtimeSession.say() in core gains the add_to_chat_ctx keyword-only parameter so plugins implementing isolation share a stable signature. Part of feat(realtime/openai): add ephemeral say() — Phase 3 of 6.

AgentActivity.say() now emits a DeprecationWarning when a caller passes add_to_chat_ctx=False against a RealtimeModel that declares ephemeral_say=False. Existing silent-degrade behavior is preserved for the deprecation window; a future release will replace the warning with NotImplementedError. The add_to_chat_ctx parameter is threaded end-to-end through: AgentActivity.say() -> _realtime_reply_task -> _rt_session.say() -> _realtime_generation_task -> _realtime_generation_task_impl so downstream gating (e.g. local chat_ctx upsert) can read the caller's intent. Part of feat(realtime/openai): add ephemeral say() — Phase 4 of 6.

In _realtime_generation_task_impl, the conditional that upserts the forwarded assistant message into the local chat_ctx now requires add_to_chat_ctx in addition to msg_gen and forwarded_text. Without this gate, callers passing add_to_chat_ctx=False through the realtime path would still see the rendered text written to the local context, defeating the isolation contract that the substrate enforces server-side via response.create(conversation: "none"). Mirrors the chain (TTS) path's existing gate. Part of feat(realtime/openai): add ephemeral say() — Phase 5 of 6.

Verifies end-to-end against the real OpenAI Realtime API (gpt-4o-realtime-preview-2025-06-03): - Audibility: say(text) produces audio frames. - Server-side isolation: say(secret, add_to_chat_ctx=False) followed by generate_reply asking the model to repeat the token cannot recall it — substrate state is empty for conversation: "none". - Substrate chat_ctx isolation: session.chat_ctx after isolated say() contains no item with the secret. - Wire-format metadata: outbound response.create carries metadata.client_event_id with the say-prefixed identifier (round-trip verified). Tests skip when OPENAI_API_KEY is not set (e.g., on fork PRs). Part of feat(realtime/openai): add ephemeral say() — Phase 6 of 6.

Replaces two source-inspection placeholders in test_agent_activity_say.py with real behavioral tests in test_realtime_say_integration.py: 1. test_openai_realtime_say_isolation_no_local_leak — behavioral positive+negative control. Runs _realtime_generation_task_impl with a real async message stream. Positive: add_to_chat_ctx=True must call _chat_ctx._upsert_item. Negative: add_to_chat_ctx=False must NOT call it. Exercises the gate added in Phase 5. 2. test_orphaned_response_after_timeout_filtered — directly invokes _handle_response_created with a synthetic late event whose metadata.client_event_id is missing from _response_created_futures (simulating the timeout-popped state). Asserts _current_generation remains None and a ResponseCancelEvent(response_id=...) was sent. The handoff test (test_agent_handoff_does_not_propagate_isolated_text) is dropped: update_agent() does not propagate the prior agent's chat_ctx, so the assertion would be vacuous. The gate proof at _realtime_generation_task_impl makes it redundant. Part of feat(realtime/openai): add ephemeral say() — test addendum.

…r-VAD) Adds the sister case to test_orphaned_response_after_timeout_filtered: when response.metadata is None (no client_event_id), the orphan filter must NOT fire — the response was server-VAD-initiated, not one we issued that could have timed out. Without the metadata-presence guard at realtime_model.py, every server-VAD-initiated response would be silently discarded: - _current_generation would never be created on the server-VAD path - the agent would stop responding to user speech - no exception would be raised; the regression would be silent The test directly invokes _handle_response_created with metadata=None and asserts: - _current_generation IS NOT None after the call (post-guard happy path ran and constructed a fresh _ResponseGeneration). - No ResponseCancelEvent was emitted (the orphan filter did not fire). Part of feat(realtime/openai): add ephemeral say().

CLAassistant · 2026-04-27T21:31:56Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

devin-ai-integration

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.

devin-ai-integration · 2026-04-27T21:40:08Z

                )
            self._create_speech_task(
                self._realtime_reply_task(
                    speech_handle=handle,
                    text=text,
                    model_settings=ModelSettings(),
+                    add_to_chat_ctx=add_to_chat_ctx,


🔴 Deprecation path still passes add_to_chat_ctx=False downstream, breaking "silent-degrade" promise

When ephemeral_say=False and add_to_chat_ctx=False, the deprecation warning at line 1083 tells the user "The text will be added to chat context anyway in this release" (silent-degrade). However, the code never overrides add_to_chat_ctx back to True before passing it to _realtime_reply_task at line 1099. This value flows all the way to the gate at agent_activity.py:3270 (if msg_gen and forwarded_text and add_to_chat_ctx:), which skips the local _upsert_item call.

For plugins like Phonic (which ignore the parameter server-side), this creates a mismatch: the text IS in the server-side conversation state but is NOT in the local agent._chat_ctx. The stated intent of the deprecation path — backward-compatible silent degrade — is violated.

Suggested change

)

self._create_speech_task(

self._realtime_reply_task(

speech_handle=handle,

text=text,

model_settings=ModelSettings(),

add_to_chat_ctx=add_to_chat_ctx,

)

add_to_chat_ctx = True

self._create_speech_task(

self._realtime_reply_task(

speech_handle=handle,

text=text,

model_settings=ModelSettings(),

add_to_chat_ctx=add_to_chat_ctx,

),

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-27T21:40:09Z

+        if self._current_generation is None:
+            # Late event for an orphaned (timed-out) response;
+            # safe to drop. Mirrors the existing graceful handling
+            # in _handle_response_done.
+            return


🟡 Orphan filter leaves downstream event handlers unguarded, causing assertion failures on in-flight events

The orphan filter added to _handle_response_created (realtime_model.py:1755-1774) returns early and leaves _current_generation as None. The PR correctly added graceful None checks to _handle_response_output_item_added (line 1805) and _handle_response_content_part_added (line 1835), but the remaining handlers — _handle_response_text_delta (line 1917), _handle_response_text_done (line 1930), _handle_response_audio_transcript_delta (line 1933), _handle_response_audio_delta (line 1945), _handle_response_audio_done (line 1964), and _handle_response_output_item_done (line 1967) — still use assert self._current_generation is not None. Events already in the WebSocket pipeline when the cancel is sent will hit these asserts. While the outer dispatch loop (line 1103) catches Exception and logs it, this results in error log spam for every in-flight event of the orphaned response and is inconsistent with the graceful handling applied to the two other handlers.

Prompt for agents

The orphan filter in _handle_response_created (line 1755-1774) returns early when a late response.created arrives for a timed-out future, leaving _current_generation as None. Two handlers (_handle_response_output_item_added and _handle_response_content_part_added) were updated with graceful None checks, but several other handlers still assert _current_generation is not None: _handle_response_text_delta (line 1917), _handle_response_text_done (line 1930), _handle_response_audio_transcript_delta (line 1933), _handle_response_audio_delta (line 1945), _handle_response_audio_done (line 1964), _handle_response_output_item_done (line 1967). These should all be updated with the same pattern: if self._current_generation is None: return. This ensures consistent graceful handling when in-flight WebSocket events arrive after the orphan filter discards a response.

Was this helpful? React with 👍 or 👎 to provide feedback.

longcw · 2026-04-28T08:48:00Z

+        params = RealtimeResponseCreateParams(
+            input=[
+                realtime.RealtimeConversationItemAssistantMessage(
+                    type="message",
+                    role="assistant",
+                    content=[
+                        realtime.realtime_conversation_item_assistant_message.Content(
+                            type="output_text",
+                            text=full_text,
+                        )
+                    ],
+                )
+            ],
+            metadata={"client_event_id": event_id},
+        )
+        if not add_to_chat_ctx:
+            params.conversation = "none"
+
+        self.send_event(
+            ResponseCreateEvent(type="response.create", event_id=event_id, response=params)
+        )


have you tested that if openai realtime will "say" this message instead of using it as context?

I just tested and it seems it's not supported by openai realtime session.

You are right, my sincerest apologies. I wrote an made test validity error when experimenting via OpenAI's SDK. I only confirmed audio was returned, not that the audio was what I expected. Closing this PR.

cphoward added 8 commits April 27, 2026 16:59

cphoward marked this pull request as ready for review April 27, 2026 21:36

devin-ai-integration Bot reviewed Apr 27, 2026

View reviewed changes

longcw reviewed Apr 28, 2026

View reviewed changes

cphoward closed this Apr 28, 2026

cphoward mentioned this pull request Apr 30, 2026

feat(realtime): add_to_chat_ctx on generate_reply() #5605

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(realtime/openai): add ephemeral say() via response.create(conversation: "none")#5569

feat(realtime/openai): add ephemeral say() via response.create(conversation: "none")#5569
cphoward wants to merge 8 commits intolivekit:mainfrom
cphoward:feat/realtime-ephemeral-say

cphoward commented Apr 27, 2026 •

edited

Loading

Uh oh!

CLAassistant commented Apr 27, 2026 •

edited

Loading

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot Apr 27, 2026

Uh oh!

devin-ai-integration Bot Apr 27, 2026

Uh oh!

longcw Apr 28, 2026

Uh oh!

longcw Apr 28, 2026

Uh oh!

cphoward Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

cphoward commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

How to review

Cross-plugin scope

Backward compatibility — caller side (deprecation cycle)

Backward compatibility — plugin authors

Tests prove the isolation contract

Late-event check

Empirical verification of the substrate contract

Scope note (deferred work, separately tracked)

Scope note (generate_reply())

CI note

File diff summary (across all 8 commits)

Commit log

Uh oh!

CLAassistant commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

longcw Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

longcw Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

cphoward Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cphoward commented Apr 27, 2026 •

edited

Loading

Scope note (`generate_reply()`)

CLAassistant commented Apr 27, 2026 •

edited

Loading