RealtimeCapabilities has 6 undocumented fields + 1 misleading docstring #5565

@cphoward

TL;DR

RealtimeCapabilities has 6 undocumented fields + 1 misleading docstring

Summary

The RealtimeCapabilities dataclass in livekit-agents/livekit/agents/llm/realtime.py has 11 fields. 6 of them have no docstring, and the docstring on supports_say has become misleading since the add_to_chat_ctx parameter was introduced.

This is a small documentation-hygiene issue, but it matters because plugin authors implementing new Realtime backends use this dataclass to declare their capabilities; ambiguous semantics lead them to set the wrong True/False values.

Undocumented fields

The following 6 fields lack docstrings (all in livekit-agents/livekit/agents/llm/realtime.py, around lines 60-75):

  1. message_truncation: bool
  2. turn_detection: bool
  3. user_transcription: bool
  4. auto_tool_reply_generation: bool
  5. audio_output: bool
  6. manual_function_calls: bool

Misleading docstring

supports_say currently says: "Whether the model supports session.say()".

This is misleading because RealtimeModel implementations that declare supports_say=True typically only support say() when add_to_chat_ctx=True. Plugin authors reading this docstring may incorrectly infer they should declare supports_say=True even when they cannot honor add_to_chat_ctx=False.

(Recommendation: amend the supports_say docstring AND add a sibling ephemeral_say field for the isolation capability — that work is happening in a separate PR.)
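
To make the ambiguity concrete, here is a minimal sketch of the check a caller would want to perform. It is not the actual livekit-agents code: SayCapabilities and can_say are hypothetical names, and ephemeral_say is the proposed field from the separate PR, assumed here to be a plain bool.

```python
from dataclasses import dataclass


@dataclass
class SayCapabilities:
    """Hypothetical stand-in; NOT the real RealtimeCapabilities."""

    supports_say: bool = False
    # Proposed flag (assumption): say() works without touching the chat context.
    ephemeral_say: bool = False


def can_say(caps: SayCapabilities, *, add_to_chat_ctx: bool) -> bool:
    """Return True if say() can be honored for the requested mode."""
    if not caps.supports_say:
        return False
    # With only supports_say, the add_to_chat_ctx=False case is undecidable;
    # a second flag is what lets the caller answer it.
    return add_to_chat_ctx or caps.ephemeral_say


typical = SayCapabilities(supports_say=True)
print(can_say(typical, add_to_chat_ctx=True))   # True
print(can_say(typical, add_to_chat_ctx=False))  # False
```

With a single supports_say flag, the second call has no well-defined answer, which is the gap the current docstring hides.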

Proposed docstrings

Verified against actual usage in agent_activity.py (line numbers from main):

from dataclasses import dataclass

@dataclass
class RealtimeCapabilities:
    message_truncation: bool
    """Whether the substrate supports per-message truncation via a
    truncate() call (e.g., OpenAI's conversation.item.truncate). Used
    during user interruption to cut off in-flight audio responses by
    message_id."""

    turn_detection: bool
    """Whether the substrate provides server-side voice activity
    detection (VAD) for turn-taking. When True, AgentActivity relies on
    the substrate's turn detection events; when False, AgentActivity
    must orchestrate turns client-side."""

    user_transcription: bool
    """Whether the substrate provides server-side transcription of user
    audio. When True, AgentActivity skips client-side STT processing to
    avoid duplicate transcripts."""

    auto_tool_reply_generation: bool
    """Whether the substrate automatically generates a new response
    after tool execution results are submitted. When False,
    AgentActivity must manually trigger a new generation to process
    tool outputs."""

    audio_output: bool
    """Whether the substrate is capable of producing audio output
    directly. When True, AgentActivity expects audio streams in model
    responses; when False, it may fall back to a separate TTS model."""

    manual_function_calls: bool
    """Whether the substrate supports manual function calls injected
    into the chat context. When False, the model may not correctly
    resume or recognize function calls that were not generated by the
    model itself."""

    # ... existing fields ...

    supports_say: bool = False
    """Whether the model supports session.say(). Note: RealtimeModel
    implementations typically only support say() when
    add_to_chat_ctx=True; for ephemeral say() (where the rendered text
    does NOT enter the agent's reasoning context) see ephemeral_say."""
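
As an illustration of how these semantics would be read by a consumer, the sketch below maps four of the flags to the client-side work implied when each is False. It is an assumption-laden paraphrase of the proposed docstrings, not AgentActivity's real logic: Caps and client_side_duties are hypothetical names, and the "run client-side STT" branch is inferred from the user_transcription wording.

```python
from dataclasses import dataclass


@dataclass
class Caps:
    """Hypothetical stand-in with four of the flags documented above."""

    turn_detection: bool
    user_transcription: bool
    auto_tool_reply_generation: bool
    audio_output: bool


def client_side_duties(caps: Caps) -> list[str]:
    """List the work the client must do itself, per the proposed docstrings."""
    duties = []
    if not caps.turn_detection:
        duties.append("orchestrate turns client-side")
    if not caps.user_transcription:
        duties.append("run client-side STT")  # inferred, see lead-in
    if not caps.auto_tool_reply_generation:
        duties.append("trigger a generation after tool output")
    if not caps.audio_output:
        duties.append("fall back to a separate TTS model")
    return duties


# A hypothetical text-only backend with server-side turn detection.
text_only = Caps(
    turn_detection=True,
    user_transcription=False,
    auto_tool_reply_generation=True,
    audio_output=False,
)
print(client_side_duties(text_only))
```

A plugin author who flips one of these flags incorrectly either duplicates work (for example, double transcripts) or silently drops it, which is why each flag needs an unambiguous docstring.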

Why file this as a separate issue

Documentation-only changes are easy to review and ship cleanly. Bundling them with feature PRs creates scope creep. A standalone docs: PR (~80 LOC, no functional change, very high acceptance probability) is the right vehicle.

Acceptance criteria

  • All 11 RealtimeCapabilities fields have docstrings.
  • supports_say docstring acknowledges the add_to_chat_ctx=True constraint.
  • Docstrings verified against actual usage sites in the codebase.
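
The first criterion could be checked mechanically. Attribute docstrings are not stored on the class at runtime, so the sketch below parses the source with the standard ast module and reports annotated fields that are not immediately followed by a string literal. The helper name undocumented_fields and the SAMPLE source are illustrative assumptions; a real check would read realtime.py instead.

```python
import ast


def undocumented_fields(source: str, class_name: str) -> list[str]:
    """Return annotated class fields not directly followed by a string literal."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            missing = []
            body = node.body
            for i, stmt in enumerate(body):
                # Only consider simple annotated fields such as `name: bool`.
                if not (isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)):
                    continue
                nxt = body[i + 1] if i + 1 < len(body) else None
                documented = (
                    isinstance(nxt, ast.Expr)
                    and isinstance(nxt.value, ast.Constant)
                    and isinstance(nxt.value.value, str)
                )
                if not documented:
                    missing.append(stmt.target.id)
            return missing
    raise ValueError(f"class {class_name!r} not found")


# Illustrative source only; not the contents of realtime.py.
SAMPLE = '''
from dataclasses import dataclass

@dataclass
class RealtimeCapabilities:
    message_truncation: bool
    """Documented."""
    turn_detection: bool
    user_transcription: bool
    """Documented."""
'''

print(undocumented_fields(SAMPLE, "RealtimeCapabilities"))  # ['turn_detection']
```

Run against the real file in a unit test, an empty result would show that every field has a docstring; it does not verify that the docstrings are accurate, which still needs the manual check against usage sites.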

Related

  • The proposed ephemeral_say field (separately introduced in an upcoming feature PR) will replace the partial guidance in supports_say with a clean isolation-capability flag.
