TL;DR
RealtimeCapabilities has 6 undocumented fields + 1 misleading docstring
Summary
The RealtimeCapabilities dataclass in livekit-agents/livekit/agents/llm/realtime.py has 11 fields. Six of them have no docstrings, and the docstring on supports_say has become misleading since the add_to_chat_ctx parameter was introduced.
This is a small documentation-hygiene issue, but it matters: plugin authors implementing new Realtime backends use this dataclass to declare their capabilities, and ambiguous semantics lead to wrong True/False declarations.
Undocumented fields
The following 6 fields lack docstrings (all in livekit-agents/livekit/agents/llm/realtime.py, around lines 60-75):
message_truncation: bool
turn_detection: bool
user_transcription: bool
auto_tool_reply_generation: bool
audio_output: bool
manual_function_calls: bool
Misleading docstring
supports_say currently says: "Whether the model supports session.say()".
This is misleading because RealtimeModel implementations that declare supports_say=True typically only support say() when add_to_chat_ctx=True. Plugin authors reading this docstring may incorrectly infer they should declare supports_say=True even when they cannot honor add_to_chat_ctx=False.
(Recommendation: amend the supports_say docstring AND add a sibling ephemeral_say field for the isolation capability — that work is happening in a separate PR.)
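To make the ambiguity concrete, here is a minimal sketch of how a caller might gate say() if the two capabilities were declared separately. SayCaps and can_say are hypothetical stand-ins, not part of livekit-agents, and ephemeral_say refers to the proposed field, whose final semantics may differ.

```python
from dataclasses import dataclass


@dataclass
class SayCaps:
    """Hypothetical stand-in holding only the two say-related flags."""

    supports_say: bool = False
    # Assumption: the proposed flag covers say() with add_to_chat_ctx=False.
    ephemeral_say: bool = False


def can_say(caps: SayCaps, add_to_chat_ctx: bool) -> bool:
    """Return True if a say() call with this add_to_chat_ctx value can be honored."""
    if add_to_chat_ctx:
        return caps.supports_say
    return caps.ephemeral_say


# A backend that only supports say() when the text enters the chat context.
typical = SayCaps(supports_say=True)
print(can_say(typical, add_to_chat_ctx=True))   # True
print(can_say(typical, add_to_chat_ctx=False))  # False
```

With a single supports_say flag, the second case cannot be expressed, which is the gap the current docstring hides.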
Proposed docstrings
Verified against actual usage in agent_activity.py (line numbers from main):
@dataclass
class RealtimeCapabilities:
    message_truncation: bool
    """Whether the substrate supports per-message truncation via a
    truncate() call (e.g., OpenAI's conversation.item.truncate). Used
    during user interruption to cut off in-flight audio responses by
    message_id."""
    turn_detection: bool
    """Whether the substrate provides server-side voice activity
    detection (VAD) for turn-taking. When True, AgentActivity relies on
    the substrate's turn detection events; when False, AgentActivity
    must orchestrate turns client-side."""
    user_transcription: bool
    """Whether the substrate provides server-side transcription of user
    audio. When True, AgentActivity skips client-side STT processing to
    avoid duplicate transcripts."""
    auto_tool_reply_generation: bool
    """Whether the substrate automatically generates a new response
    after tool execution results are submitted. When False,
    AgentActivity must manually trigger a new generation to process
    tool outputs."""
    audio_output: bool
    """Whether the substrate is capable of producing audio output
    directly. When True, AgentActivity expects audio streams in model
    responses; when False, it may fall back to a separate TTS model."""
    manual_function_calls: bool
    """Whether the substrate supports manual function calls injected
    into the chat context. When False, the model may not correctly
    resume or recognize function calls that were not generated by the
    model itself."""
    # ... existing fields ...
    supports_say: bool = False
    """Whether the model supports session.say(). Note: RealtimeModel
    implementations typically only support say() when
    add_to_chat_ctx=True; for ephemeral say() (where the rendered text
    does NOT enter the agent's reasoning context) see ephemeral_say."""
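As a rough illustration of why these semantics matter, the sketch below maps a set of declared flags to the work a caller would have to do itself, following the wording of the proposed docstrings. Caps and client_side_duties are hypothetical names for this example only; this is not the actual AgentActivity logic, and the "when False" branches for user_transcription and audio_output are inferred from the docstrings rather than confirmed.

```python
from dataclasses import dataclass


@dataclass
class Caps:
    """Stand-in for RealtimeCapabilities with only four of its flags."""

    turn_detection: bool
    user_transcription: bool
    auto_tool_reply_generation: bool
    audio_output: bool


def client_side_duties(caps: Caps) -> list[str]:
    """List the work the caller must handle itself for a given backend."""
    duties = []
    if not caps.turn_detection:
        duties.append("orchestrate turns client-side")
    if not caps.user_transcription:
        duties.append("run client-side STT")
    if not caps.auto_tool_reply_generation:
        duties.append("trigger a generation after tool results")
    if not caps.audio_output:
        duties.append("synthesize speech with a separate TTS model")
    return duties


# A backend with server-side VAD and transcription, but text-only output
# and no automatic reply after tool results.
text_only = Caps(
    turn_detection=True,
    user_transcription=True,
    auto_tool_reply_generation=False,
    audio_output=False,
)
for duty in client_side_duties(text_only):
    print(duty)
```

A plugin that sets one of these flags incorrectly would silently shift or drop one of these duties, which is why each flag needs an unambiguous docstring.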
Why file this as a separate issue
Documentation-only changes are easy to review and ship cleanly. Bundling them with feature PRs creates scope creep. A standalone docs: PR (~80 LOC, no functional change, very high acceptance probability) is the right vehicle.
Acceptance criteria
- All 11 RealtimeCapabilities fields have docstrings.
- supports_say docstring acknowledges the add_to_chat_ctx=True constraint.
- Docstrings verified against actual usage sites in the codebase.
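The first criterion could be checked mechanically. The following is a minimal sketch, assuming attribute docstrings are written as a string literal on the statement right after each annotated field. The function name undocumented_fields and the abbreviated SOURCE sample are illustrative; a real check would read realtime.py instead.

```python
import ast

# Abbreviated sample: turn_detection deliberately has no docstring.
SOURCE = '''
class RealtimeCapabilities:
    message_truncation: bool
    """Whether per-message truncation is supported."""
    turn_detection: bool
    supports_say: bool = False
    """Whether the model supports session.say()."""
'''


def undocumented_fields(source: str, class_name: str) -> list[str]:
    """Return annotated fields of class_name not followed by a string literal."""
    tree = ast.parse(source)
    cls = next(
        node
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef) and node.name == class_name
    )
    missing = []
    # Pair each statement in the class body with the statement after it.
    for stmt, following in zip(cls.body, cls.body[1:] + [None]):
        if not isinstance(stmt, ast.AnnAssign) or not isinstance(stmt.target, ast.Name):
            continue
        documented = (
            isinstance(following, ast.Expr)
            and isinstance(following.value, ast.Constant)
            and isinstance(following.value.value, str)
        )
        if not documented:
            missing.append(stmt.target.id)
    return missing


print(undocumented_fields(SOURCE, "RealtimeCapabilities"))  # ['turn_detection']
```

Run against the real file, an empty list would indicate that every annotated field has a docstring; it would not verify that the docstrings are accurate.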
Related
- The proposed ephemeral_say field (separately introduced in an upcoming feature PR) will replace the partial guidance in supports_say with a clean isolation-capability flag.