feat(realtime/openai): add ephemeral say() via response.create(conversation: "none")#5569

Closed
cphoward wants to merge 8 commits into livekit:main from cphoward:feat/realtime-ephemeral-say

Conversation


@cphoward cphoward commented Apr 27, 2026

Summary

Adds an ephemeral path to RealtimeSession.say() for the OpenAI Realtime plugin
via response.create(conversation: "none"): text rendered with
add_to_chat_ctx=False is heard by the user but does NOT enter the agent's
reasoning context, server-side OR local.

This closes the JS PR #1193 parity gap and addresses the silent-degradation
regression in that review.
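For context, the isolated path puts the rendered text on the wire as a response.create event with conversation set to "none". A minimal sketch of the payload shape (the helper name is hypothetical; the field layout mirrors the say() body shown later in this PR):

```python
# Hypothetical helper illustrating the outbound event for an isolated say().
# conversation: "none" asks the server to run the response out-of-band, so
# the injected assistant message never enters the session's conversation state.
def build_isolated_say_event(text: str, event_id: str) -> dict:
    return {
        "type": "response.create",
        "event_id": event_id,
        "response": {
            "conversation": "none",  # out-of-band: no conversation state
            "metadata": {"client_event_id": event_id},  # round-trip correlation
            "input": [
                {
                    "type": "message",
                    "role": "assistant",
                    "content": [{"type": "output_text", "text": text}],
                }
            ],
        },
    }

event = build_isolated_say_event("One moment please.", "say_abc123")
```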

How to review

Suggested reading path (≈30 min):

  1. livekit-agents/livekit/agents/llm/realtime.py — capability shape (ephemeral_say field) and abstract say() signature, ~30 lines.
  2. livekit-plugins/livekit-plugins-openai/.../realtime/realtime_model.py — say() body and the orphan filter in _handle_response_created, ~140 lines.
  3. livekit-agents/livekit/agents/voice/agent_activity.py — dispatcher capability check, plumbing, and the local-context gate at _realtime_generation_task_impl, ~30 lines.
  4. tests/test_realtime/test_realtime_say_integration.py — the property contract for the feature; the unit test files are proofs of the same properties at finer granularity.

Cross-plugin scope

Only the OpenAI plugin gains the substrate implementation. Phonic accepts but
ignores the kwarg (structurally exempt — substrate is strictly turn-based per
docs.phonic.co; no out-of-band response primitive). Google Gemini Live, AWS
Nova Sonic, Ultravox, NVIDIA all declare supports_say=False today; this PR
doesn't change that.

A new RealtimeCapabilities.ephemeral_say: bool = False field is added.
Plugins set it to True when their substrate provides an out-of-band response
primitive that does not enter conversation state. The OpenAI plugin declares
ephemeral_say=True. All other plugins default to False.
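A minimal sketch of the capability shape, assuming a plain dataclass (the real RealtimeCapabilities carries more fields than shown here):

```python
from dataclasses import dataclass

# Sketch only: names mirror the PR's capability flags, not the full class.
@dataclass
class RealtimeCapabilities:
    supports_say: bool = False
    # True only when the substrate offers an out-of-band response primitive
    # that does not enter conversation state.
    ephemeral_say: bool = False

openai_caps = RealtimeCapabilities(supports_say=True, ephemeral_say=True)
phonic_caps = RealtimeCapabilities(supports_say=True)  # ephemeral_say stays False
```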

Backward compatibility — caller side (deprecation cycle)

In this release, when a caller invokes say(text, add_to_chat_ctx=False) on a
plugin without ephemeral_say, we emit DeprecationWarning and continue with
the existing silent-degrade behavior. A future release will replace the warning
with NotImplementedError. This preserves backward compat for any callers
depending on the silent-degrade contract while loudly surfacing the capability
gap. Open to shipping the immediate-raise version if you'd prefer.
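The deprecation-cycle check reduces to roughly the following sketch (the function name and the caps stand-in are hypothetical; the real logic lives in AgentActivity.say()):

```python
import warnings
from types import SimpleNamespace

# Hypothetical sketch of the dispatcher's capability check.
def check_ephemeral_capability(caps, add_to_chat_ctx: bool) -> None:
    if not add_to_chat_ctx and not caps.ephemeral_say:
        warnings.warn(
            "add_to_chat_ctx=False is not supported by this realtime model; "
            "the text will enter the chat context anyway. A future release "
            "will raise NotImplementedError instead.",
            DeprecationWarning,
            stacklevel=2,
        )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # No ephemeral_say support: warns, then continues with silent degrade.
    check_ephemeral_capability(SimpleNamespace(ephemeral_say=False), add_to_chat_ctx=False)
    # Capability present: no warning.
    check_ephemeral_capability(SimpleNamespace(ephemeral_say=True), add_to_chat_ctx=False)
```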

Backward compatibility — plugin authors

The abstract RealtimeSession.say() signature gains a keyword-only
add_to_chat_ctx: bool = True parameter, and the dispatcher unconditionally
passes the kwarg via _rt_session.say(text, add_to_chat_ctx=...). This is a
breaking change for any third-party plugin that BOTH declares
supports_say=True AND has overridden say() with the original signature
(self, text)
— those overrides will hit TypeError: say() got an unexpected keyword argument 'add_to_chat_ctx'.

In-tree plugins are updated:

  • OpenAI plugin's say() accepts and honors the kwarg.
  • Phonic accepts but ignores (signature-only shim; substrate is structurally
    exempt, dispatcher emits DeprecationWarning upstream).
  • Gemini Live, AWS Nova Sonic, Ultravox, NVIDIA all declare supports_say=False
    today and the dispatcher never reaches them.

If preserving signature-level backward-compat for out-of-tree plugin authors is
preferred, the dispatcher could try/except TypeError and fall back to the
old call shape. Happy to add the shim — flagging this so you can choose
deliberately.
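The optional shim could look roughly like this (dispatch_say and both plugin classes are hypothetical illustrations, not the real dispatcher). One caveat worth weighing: a bare except TypeError would also mask a TypeError raised inside a plugin's own say() body, which is part of why this should be a deliberate choice.

```python
# Hypothetical fallback shim: try the new keyword first, then retry with the
# legacy (text-only) call shape for out-of-tree plugin overrides.
def dispatch_say(rt_session, text: str, *, add_to_chat_ctx: bool = True):
    try:
        return rt_session.say(text, add_to_chat_ctx=add_to_chat_ctx)
    except TypeError:
        # Old call shape; the kwarg is silently dropped, matching the
        # pre-existing silent-degrade behavior.
        return rt_session.say(text)

class LegacyPlugin:  # overrides say() with the original (self, text) signature
    def say(self, text):
        return ("legacy", text)

class NewPlugin:  # accepts the new keyword-only parameter
    def say(self, text, *, add_to_chat_ctx=True):
        return ("new", text, add_to_chat_ctx)

legacy_result = dispatch_say(LegacyPlugin(), "hi", add_to_chat_ctx=False)
new_result = dispatch_say(NewPlugin(), "hi", add_to_chat_ctx=False)
```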

Tests prove the isolation contract

Each property below is falsifiable: if the assertion fails, the named test fails. Integration paths run against gpt-4o-realtime-preview-2025-06-03; the local-context-gate and orphan-filter behavioral tests are network-free.

  • Audibility — say(text) produces audio frames → test_openai_realtime_say_audio_renders (integration)
  • Server-side isolation — say(secret, add_to_chat_ctx=False) followed by generate_reply cannot retrieve secret → test_openai_realtime_say_isolation_no_leak (integration)
  • Substrate state isolation — substrate chat_ctx does not contain secret after isolated say() → test_openai_realtime_say_isolation_no_remote_chat_ctx_leak (integration)
  • Wire-format metadata — outbound response.create carries metadata.client_event_id → test_openai_realtime_say_emits_metadata (integration)
  • Local-context gate — add_to_chat_ctx=False does NOT populate agent._chat_ctx; add_to_chat_ctx=True DOES (positive+negative control) → test_openai_realtime_say_isolation_no_local_leak (behavioral, network-free)
  • Orphan filter — late response.created for a popped future is discarded; _current_generation stays None; response.cancel(response_id=…) is sent → test_orphaned_response_after_timeout_filtered (behavioral, network-free)
  • Dispatcher kwarg flow — AgentActivity.say(add_to_chat_ctx=False) actually forwards False to _realtime_reply_task → test_agent_activity_say_realtime_dispatches_with_add_to_chat_ctx (unit, frame-locals inspection)
  • Dispatcher capability warning — calling on a plugin without ephemeral_say emits DeprecationWarning → test_agent_activity_say_realtime_capability_warns (unit)

Late-event check

Builds on merged PR #2125's InvalidStateError handling at the
future-resolution layer. This PR adds an audio-playout layer check: when
response.created arrives carrying a metadata.client_event_id that we
previously issued but is no longer in _response_created_futures (because the
client-side timeout already popped it), we discard the late arrival and send a
defensive response.cancel(response_id=...).

The check reuses the existing _response_created_futures dict — no new
lifecycle state is introduced. A metadata-presence guard ensures
server-VAD-initiated responses (no client-issued metadata) flow through
normally.
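The filter logic described above can be sketched as follows (function and argument names are hypothetical stand-ins; the real check lives in _handle_response_created and reuses _response_created_futures):

```python
# Sketch of the orphan filter. A late response.created whose client_event_id
# was already popped by the client-side timeout is discarded and defensively
# cancelled; responses with no metadata (server-VAD-initiated) pass through.
def filter_orphaned_response(event: dict, pending_futures: dict, sent: list) -> bool:
    """Return True if the event was discarded as an orphan."""
    metadata = event.get("response", {}).get("metadata")
    if metadata is None:
        return False  # server-VAD-initiated: no client metadata, let it through
    client_event_id = metadata.get("client_event_id")
    if client_event_id is not None and client_event_id not in pending_futures:
        # Defensive cancel for the orphaned response.
        sent.append({"type": "response.cancel",
                     "response_id": event["response"]["id"]})
        return True
    return False

sent: list = []
late = {"response": {"id": "resp_1", "metadata": {"client_event_id": "say_x"}}}
vad = {"response": {"id": "resp_2", "metadata": None}}
was_orphan = filter_orphaned_response(late, pending_futures={}, sent=sent)
```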

Two adjacent event handlers (_handle_response_output_item_added,
_handle_response_content_part_added) gain a defensive
if self._current_generation is None: return guard mirroring the existing
pattern in _handle_response_done. This protects the cancel-arrival race
window during which late events for an orphaned response could otherwise hit a
None-assertion crash.

Empirical verification of the substrate contract

The conversation: "none" semantic and metadata.client_event_id round-trip
are both verified end-to-end by the integration tests in
tests/test_realtime/test_realtime_say_integration.py (6 tests total — 4 against
the real OpenAI Realtime API, requires OPENAI_API_KEY; 2 network-free
behavioral tests for the local-context gate and orphan filter that run on every
PR even from forks).

The local-context-gate test uses a positive+negative control structure: it
first verifies that add_to_chat_ctx=True populates the agent's chat context
(positive control — would catch a regression where local insertion silently
stops working), then verifies add_to_chat_ctx=False does NOT. Without the
positive control, the negative-only assertion could pass even if the entire
local-insertion path was broken.
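The control structure reduces to this toy model (all names are hypothetical; upsert_item stands in for _chat_ctx._upsert_item, run_generation for the gated path in _realtime_generation_task_impl):

```python
# Toy model of the positive+negative control structure.
recorded = []

def upsert_item(text: str) -> None:  # stand-in for _chat_ctx._upsert_item
    recorded.append(text)

def run_generation(text: str, *, add_to_chat_ctx: bool) -> None:
    if add_to_chat_ctx:  # the local-context gate under test
        upsert_item(text)

run_generation("hello", add_to_chat_ctx=True)    # positive control: must insert
run_generation("secret", add_to_chat_ctx=False)  # isolation case: must not
```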

The dispatcher kwarg-flow test inspects the captured _realtime_reply_task
coroutine's frame locals to assert that AgentActivity.say(add_to_chat_ctx=False)
actually forwards False through the call chain — not just that the realtime
path is dispatched.
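Frame-locals inspection of a not-yet-awaited coroutine works like this minimal sketch (the task name here is hypothetical; the real test captures _realtime_reply_task):

```python
import inspect

# Hypothetical coroutine standing in for the dispatched task.
async def realtime_reply_task(*, text: str, add_to_chat_ctx: bool = True) -> None:
    pass

coro = realtime_reply_task(text="hi", add_to_chat_ctx=False)
# getcoroutinelocals() reads cr_frame.f_locals: arguments are bound at
# coroutine creation, so the kwarg's actual value is observable before any
# await runs.
bound = dict(inspect.getcoroutinelocals(coro))
coro.close()  # suppress the "coroutine was never awaited" warning
```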

Scope note (deferred work, separately tracked)

This PR is intentionally focused. The following adjacent improvement is
deferred to a standalone PR to keep this review narrow:

  • generate_reply() — this contribution is scoped to say(). A follow-up PR
    will add add_to_chat_ctx to generate_reply() using the same ephemeral_say
    flag pattern (use case A semantic only).

CI note

Integration tests skip on fork PRs (no OPENAI_API_KEY). I've verified the
test suite locally against gpt-4o-realtime-preview-2025-06-03 and will paste
the test output as a comment on this PR for inspection.


File diff summary (across all 8 commits)

  • livekit-agents/livekit/agents/llm/realtime.py — new ephemeral_say capability field; abstract RealtimeSession.say() signature gains add_to_chat_ctx kwarg
  • livekit-agents/livekit/agents/voice/agent_activity.py — dispatcher capability check (DeprecationWarning); plumbing of add_to_chat_ctx through 3 internal methods; local chat_ctx upsert gated on add_to_chat_ctx
  • livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py — OpenAI plugin declares supports_say=True, ephemeral_say=True; say() body using response.create(conversation: "none"); late-event check; defensive guards in two downstream handlers
  • livekit-plugins/livekit-plugins-phonic/livekit/plugins/phonic/realtime/realtime_model.py — Phonic say() accepts add_to_chat_ctx kwarg (signature compat; substrate is structurally exempt)
  • tests/test_realtime_capabilities.py — unit tests: capability advertisement (OpenAI + Phonic), Phonic signature shim, OpenAI say() wire format
  • tests/test_agent_activity_say.py — unit tests: dispatcher capability warning, kwarg-flow verification (frame-locals inspection of dispatched coroutine)
  • tests/test_realtime/test_realtime_say_integration.py — integration + behavioral tests: 4 against the real OpenAI Realtime API (audibility, server-side isolation, substrate-state isolation, wire-format metadata), plus 3 network-free behavioral tests (local-context gate with positive+negative control; orphan filter discards late response.created; metadata-presence guard bypasses the orphan filter for server-VAD-initiated responses)

Commit log

53b64313 test(realtime): assert orphan filter bypasses on metadata=None (server-VAD)
a9a3e2ac test(realtime): behavioral integration tests for ephemeral say()
cd500116 test(realtime): integration tests for ephemeral say()
579df8f7 feat(realtime): gate local chat_ctx upsert on add_to_chat_ctx
1ff15adf feat(realtime): dispatcher capability check + add_to_chat_ctx plumbing
92418578 feat(realtime/openai): implement say() with conversation: "none" + orphan filter
a97d4283 feat(phonic): accept add_to_chat_ctx kwarg on say() for signature compat
e7947d30 feat(realtime): add ephemeral_say capability flag

If reviewers prefer a single squashed commit, the maintainer is welcome to
squash on merge — the per-phase commit chain is for development hygiene and
isolated rollback.

Add ephemeral_say: bool = False to RealtimeCapabilities. Plugins that
honor add_to_chat_ctx=False on say() declare this True; all others
default False.

OpenAI plugin declares supports_say=True and ephemeral_say=True.
Phonic defaults to ephemeral_say=False (substrate is turn-based).

Substrate landscape verified empirically via reproducible probes against
the OpenAI Realtime API.

Part of feat(realtime/openai): add ephemeral say() via
response.create(conversation: "none") — Phase 1 of 6.

Phonic's RealtimeSession.say() now accepts add_to_chat_ctx: bool = True
as a keyword-only parameter. Phonic accepts but ignores the kwarg:
the substrate is strictly turn-based and has no out-of-band response
primitive. Phonic declares ephemeral_say=False so that the
AgentActivity.say() dispatcher emits a DeprecationWarning before
reaching this method when add_to_chat_ctx=False is requested.

The kwarg is a forward-compatible signature shim so Phase 3's
abstract RealtimeSession.say() signature change does not break
the Phonic plugin at type-check time.

Part of feat(realtime/openai): add ephemeral say() — Phase 2 of 6.

feat(realtime/openai): implement say() with conversation: "none" + orphan filter

Implements RealtimeSession.say(text, add_to_chat_ctx=False) for the
OpenAI Realtime plugin using response.create(conversation: "none").
Adds metadata.client_event_id on outbound events to enable a
late-event check in _handle_response_created that discards orphaned
response.created when the caller-future has been popped (typically
by client-side timeout). Builds on PR livekit#2125's InvalidStateError
handling at the future-resolution layer.

Also adds defense-in-depth guards in _handle_response_output_item_added
and _handle_response_content_part_added so late events for an orphaned
response (during the cancel-arrival race window) drop gracefully
instead of asserting on a None _current_generation.

The abstract RealtimeSession.say() in core gains the
add_to_chat_ctx keyword-only parameter so plugins implementing
isolation share a stable signature.

Part of feat(realtime/openai): add ephemeral say() — Phase 3 of 6.

AgentActivity.say() now emits a DeprecationWarning when a caller
passes add_to_chat_ctx=False against a RealtimeModel that declares
ephemeral_say=False. Existing silent-degrade behavior is preserved
for the deprecation window; a future release will replace the warning
with NotImplementedError.

The add_to_chat_ctx parameter is threaded end-to-end through:
AgentActivity.say() -> _realtime_reply_task -> _rt_session.say()
                                            -> _realtime_generation_task
                                            -> _realtime_generation_task_impl
so downstream gating (e.g. local chat_ctx upsert) can read the
caller's intent.

Part of feat(realtime/openai): add ephemeral say() — Phase 4 of 6.

In _realtime_generation_task_impl, the conditional that upserts the
forwarded assistant message into the local chat_ctx now requires
add_to_chat_ctx in addition to msg_gen and forwarded_text. Without
this gate, callers passing add_to_chat_ctx=False through the realtime
path would still see the rendered text written to the local context,
defeating the isolation contract that the substrate enforces
server-side via response.create(conversation: "none").

Mirrors the chain (TTS) path's existing gate.

Part of feat(realtime/openai): add ephemeral say() — Phase 5 of 6.

Verifies end-to-end against the real OpenAI Realtime API
(gpt-4o-realtime-preview-2025-06-03):

- Audibility: say(text) produces audio frames.
- Server-side isolation: say(secret, add_to_chat_ctx=False)
  followed by generate_reply asking the model to repeat the token
  cannot recall it — substrate state is empty for conversation: "none".
- Substrate chat_ctx isolation: session.chat_ctx after isolated say()
  contains no item with the secret.
- Wire-format metadata: outbound response.create carries
  metadata.client_event_id with the say-prefixed identifier (round-trip
  verified).

Tests skip when OPENAI_API_KEY is not set (e.g., on fork PRs).

Part of feat(realtime/openai): add ephemeral say() — Phase 6 of 6.

Replaces two source-inspection placeholders in test_agent_activity_say.py
with real behavioral tests in test_realtime_say_integration.py:

1. test_openai_realtime_say_isolation_no_local_leak — behavioral positive+negative
   control. Runs _realtime_generation_task_impl with a real async message
   stream. Positive: add_to_chat_ctx=True must call _chat_ctx._upsert_item.
   Negative: add_to_chat_ctx=False must NOT call it. Exercises the gate
   added in Phase 5.

2. test_orphaned_response_after_timeout_filtered — directly invokes
   _handle_response_created with a synthetic late event whose
   metadata.client_event_id is missing from _response_created_futures
   (simulating the timeout-popped state). Asserts _current_generation
   remains None and a ResponseCancelEvent(response_id=...) was sent.

The handoff test (test_agent_handoff_does_not_propagate_isolated_text)
is dropped: update_agent() does not propagate the prior agent's chat_ctx,
so the assertion would be vacuous. The gate proof at
_realtime_generation_task_impl makes it redundant.

Part of feat(realtime/openai): add ephemeral say() — test addendum.

test(realtime): assert orphan filter bypasses on metadata=None (server-VAD)

Adds the sister case to test_orphaned_response_after_timeout_filtered:
when response.metadata is None (no client_event_id), the orphan filter
must NOT fire — the response was server-VAD-initiated, not one we issued
that could have timed out.

Without the metadata-presence guard at realtime_model.py, every
server-VAD-initiated response would be silently discarded:
- _current_generation would never be created on the server-VAD path
- the agent would stop responding to user speech
- no exception would be raised; the regression would be silent

The test directly invokes _handle_response_created with metadata=None and
asserts:
- _current_generation IS NOT None after the call (post-guard happy path
  ran and constructed a fresh _ResponseGeneration).
- No ResponseCancelEvent was emitted (the orphan filter did not fire).

Part of feat(realtime/openai): add ephemeral say().

CLAassistant commented Apr 27, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@cphoward cphoward marked this pull request as ready for review April 27, 2026 21:36

devin-ai-integration (bot) left a comment


Devin Review found 2 potential issues.


Comment on lines 1093 to +1099
    )
    self._create_speech_task(
        self._realtime_reply_task(
            speech_handle=handle,
            text=text,
            model_settings=ModelSettings(),
            add_to_chat_ctx=add_to_chat_ctx,

🔴 Deprecation path still passes add_to_chat_ctx=False downstream, breaking "silent-degrade" promise

When ephemeral_say=False and add_to_chat_ctx=False, the deprecation warning at line 1083 tells the user "The text will be added to chat context anyway in this release" (silent-degrade). However, the code never overrides add_to_chat_ctx back to True before passing it to _realtime_reply_task at line 1099. This value flows all the way to the gate at agent_activity.py:3270 (if msg_gen and forwarded_text and add_to_chat_ctx:), which skips the local _upsert_item call.

For plugins like Phonic (which ignore the parameter server-side), this creates a mismatch: the text IS in the server-side conversation state but is NOT in the local agent._chat_ctx. The stated intent of the deprecation path — backward-compatible silent degrade — is violated.

Suggested change

```diff
     )
+    add_to_chat_ctx = True
     self._create_speech_task(
         self._realtime_reply_task(
             speech_handle=handle,
             text=text,
             model_settings=ModelSettings(),
             add_to_chat_ctx=add_to_chat_ctx,
-        )
+        ),
```

Comment on lines +1805 to +1809
if self._current_generation is None:
# Late event for an orphaned (timed-out) response;
# safe to drop. Mirrors the existing graceful handling
# in _handle_response_done.
return

🟡 Orphan filter leaves downstream event handlers unguarded, causing assertion failures on in-flight events

The orphan filter added to _handle_response_created (realtime_model.py:1755-1774) returns early and leaves _current_generation as None. The PR correctly added graceful None checks to _handle_response_output_item_added (line 1805) and _handle_response_content_part_added (line 1835), but the remaining handlers — _handle_response_text_delta (line 1917), _handle_response_text_done (line 1930), _handle_response_audio_transcript_delta (line 1933), _handle_response_audio_delta (line 1945), _handle_response_audio_done (line 1964), and _handle_response_output_item_done (line 1967) — still use assert self._current_generation is not None. Events already in the WebSocket pipeline when the cancel is sent will hit these asserts. While the outer dispatch loop (line 1103) catches Exception and logs it, this results in error log spam for every in-flight event of the orphaned response and is inconsistent with the graceful handling applied to the two other handlers.

Prompt for agents
The orphan filter in _handle_response_created (line 1755-1774) returns early when a late response.created arrives for a timed-out future, leaving _current_generation as None. Two handlers (_handle_response_output_item_added and _handle_response_content_part_added) were updated with graceful None checks, but several other handlers still assert _current_generation is not None: _handle_response_text_delta (line 1917), _handle_response_text_done (line 1930), _handle_response_audio_transcript_delta (line 1933), _handle_response_audio_delta (line 1945), _handle_response_audio_done (line 1964), _handle_response_output_item_done (line 1967). These should all be updated with the same pattern: if self._current_generation is None: return. This ensures consistent graceful handling when in-flight WebSocket events arrive after the orphan filter discards a response.

Comment on lines +1591 to +1611
params = RealtimeResponseCreateParams(
input=[
realtime.RealtimeConversationItemAssistantMessage(
type="message",
role="assistant",
content=[
realtime.realtime_conversation_item_assistant_message.Content(
type="output_text",
text=full_text,
)
],
)
],
metadata={"client_event_id": event_id},
)
if not add_to_chat_ctx:
params.conversation = "none"

self.send_event(
ResponseCreateEvent(type="response.create", event_id=event_id, response=params)
)

have you tested that if openai realtime will "say" this message instead of using it as context?


I just tested and it seems it's not supported by openai realtime session.


You are right; my sincerest apologies. I made a test-validity error when experimenting via OpenAI's SDK: I only confirmed that audio was returned, not that the audio said what I expected. Closing this PR.
