
fix(voice): add PreemptiveGenerationOptions for fine-grained control #5428

Merged
longcw merged 7 commits into main from longc/preemptive-generation-options on Apr 16, 2026

Conversation

Contributor

@longcw commented Apr 13, 2026

Summary

  • Add PreemptiveGenerationOptions TypedDict with configurable options to reduce wasted compute during preemptive generation:
    • preemptive_tts (default False): when False, only LLM runs preemptively and TTS starts after the turn is confirmed
    • max_speech_duration (default 10s): skip preemptive generation when user has been speaking too long
    • max_retries (default 3): cap preemptive LLM requests per user turn, resets on turn completion
  • The preemptive_generation parameter now accepts bool | PreemptiveGenerationOptions, fully backward compatible

Add configurable limits to preemptive generation to reduce wasted LLM
requests during long user utterances:

- max_speech_duration (default 10s): skip preemptive generation when
  user has been speaking longer than this threshold
- max_retries (default 3): cap speculative LLM requests per user turn,
  counter resets when the turn completes

The preemptive_generation parameter now accepts bool | PreemptiveGenerationOptions,
keeping backward compatibility (True/False still works).
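The shape described above can be sketched roughly as follows. Field names and defaults are taken from this PR's summary; the `normalize` helper is hypothetical, purely to illustrate how a `bool | PreemptiveGenerationOptions` parameter stays backward compatible:

```python
from typing import TypedDict, Union

class PreemptiveGenerationOptions(TypedDict, total=False):
    """Rough sketch of the options added in this PR (not the actual definition)."""
    enabled: bool               # turn preemptive generation on or off
    preemptive_tts: bool        # False: only the LLM runs preemptively; TTS waits for turn end
    max_speech_duration: float  # seconds; skip preemptive generation for long utterances
    max_retries: int            # cap on speculative LLM requests per user turn

def normalize(value: Union[bool, PreemptiveGenerationOptions]) -> PreemptiveGenerationOptions:
    """Hypothetical helper: expand the legacy bool into the full options dict."""
    if isinstance(value, bool):
        return {"enabled": value, "preemptive_tts": False,
                "max_speech_duration": 10.0, "max_retries": 3}
    return value
```

Passing `True` or `False` keeps the old behavior, while a dict enables the fine-grained limits.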
@chenghao-mou requested a review from a team on April 13, 2026 06:16
Contributor

@devin-ai-integration bot left a comment


Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.


Comment on lines 1783 to +1794
self._cancel_preemptive_generation()

if (
info.started_speaking_at is not None
and time.time() - info.started_speaking_at > preemptive_opts["max_speech_duration"]
):
return

if self._preemptive_generation_count >= preemptive_opts["max_retries"]:
return

self._preemptive_generation_count += 1
Contributor

@devin-ai-integration bot Apr 13, 2026


🟑 Existing preemptive generation is cancelled before max_retries check, discarding valid work

In on_preemptive_generation, _cancel_preemptive_generation() is called unconditionally on line 1783 before the max_retries check on line 1791. When _preemptive_generation_count >= max_retries, the method returns early without starting a new generation β€” but the previous (most recent) preemptive generation has already been cancelled and set to None. This means the last successful preemptive generation is destroyed without replacement. Later, in _user_turn_completed_task at line 1995, self._preemptive_generation is None, so the preemptive result can never be used and a fresh (non-preemptive) LLM call is always made instead. This defeats the purpose of the max_retries limit, which should keep the last generation alive when retries are exhausted.

The fix is to move _cancel_preemptive_generation() after the early-return checks (or at least after the max_retries check), so the existing generation is only cancelled when it will actually be replaced by a new one.



Contributor Author


This is expected: on_preemptive_generation is called whenever the user transcript changes, so the previous preemptive generation is based on a stale transcript and should be cancelled as soon as possible.

longcw added 3 commits April 15, 2026 20:39
When False (default), only LLM runs preemptively and TTS starts after
the turn is confirmed. When True, both LLM and TTS run immediately on
interim transcripts for the lowest latency, at the cost of TTS compute on
discarded attempts.

Move preemptive generation config into turn_handling, where it
conceptually belongs alongside endpointing and interruption. Add an
"enabled" field matching the InterruptionOptions pattern, and deprecate
the top-level preemptive_generation parameter on AgentSession.
devin-ai-integration[bot]

This comment was marked as resolved.

Comment on lines +155 to +157
@property
def preemptive_generation(self) -> PreemptiveGenerationOptions:
return self.turn_handling["preemptive_generation"]
Member


We don't need to expose it if we expose turn_handling

Suggested change
@property
def preemptive_generation(self) -> PreemptiveGenerationOptions:
return self.turn_handling["preemptive_generation"]

Member

@theomonnom theomonnom left a comment


lgtm!

Contributor

@devin-ai-integration bot left a comment


Devin Review found 2 new potential issues.

πŸ› 2 issues in files not directly in the diff

πŸ› Serialization format break: preemptive_generation changed from bool to dict in report output (livekit-agents/livekit/agents/voice/report.py:66)

self.options.preemptive_generation now returns a PreemptiveGenerationOptions dict (e.g., {"enabled": True, "preemptive_tts": False, "max_speech_duration": 10.0, "max_retries": 3}) instead of a bool. The to_dict() method at livekit-agents/livekit/agents/voice/report.py:66 embeds this value directly into the report dictionary. Any downstream consumer (e.g., LiveKit Cloud) that previously expected a boolean for the "preemptive_generation" key will now receive a nested dict, breaking deserialization or interpretation of the session report.


πŸ› Serialization format break: preemptive_generation changed from bool to dict in remote session serialization (livekit-agents/livekit/agents/voice/remote_session.py:331)

str(opts.preemptive_generation) at livekit-agents/livekit/agents/voice/remote_session.py:331 now produces a Python dict string representation like "{'enabled': True, 'preemptive_tts': False, 'max_speech_duration': 10.0, 'max_retries': 3}" instead of "True" or "False". This serialized value is sent to the remote SessionHost via the transport (used at remote_session.py:662). The receiving end likely expects a boolean-like string, so this change breaks the remote session protocol.
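One illustrative way to keep the serialized value machine-readable on both ends, sketched here as an assumption rather than the fix the PR adopts, is to emit JSON instead of Python's repr, so both the legacy bool and the new options dict round-trip cleanly:

```python
import json
from typing import Union

def serialize_preemptive_generation(value: Union[bool, dict]) -> str:
    """Serialize either the legacy bool or the new options dict to a stable JSON string."""
    if isinstance(value, bool):
        # Legacy form: JSON booleans are "true" / "false".
        return json.dumps(value)
    # New form: sort keys so the wire representation is deterministic.
    return json.dumps(value, sort_keys=True)
```

A consumer can then distinguish the two shapes with an ordinary JSON parse instead of string-matching a Python dict repr.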

View 10 additional findings in Devin Review.


@longcw merged commit f5d1b88 into main on Apr 16, 2026
22 of 23 checks passed
@longcw deleted the longc/preemptive-generation-options branch on April 16, 2026 04:43
Contributor

@devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.

View 11 additional findings in Devin Review.


  "max_tool_steps": str(opts.max_tool_steps),
  "user_away_timeout": str(opts.user_away_timeout),
- "preemptive_generation": str(opts.preemptive_generation),
+ "preemptive_generation": str(dict(opts.preemptive_generation)),
Contributor


🟑 Existing test mock uses bool for preemptive_generation, but _serialize_options now calls dict() on it

The change from str(opts.preemptive_generation) to str(dict(opts.preemptive_generation)) will crash with TypeError: cannot convert 'bool' object to dict items when called with the mock in tests/test_session_host.py:447, which sets options.preemptive_generation = False. The test_get_session_state test exercises this path via _handle_request(get_session_state) β†’ _serialize_options(self._session.options) at livekit-agents/livekit/agents/voice/remote_session.py:662. The mock needs to be updated to use a PreemptiveGenerationOptions dict (e.g., MagicMock(__iter__=lambda s: iter([])) or a real dict like {"enabled": False}).

Prompt for agents
The test file tests/test_session_host.py at line 447 sets options.preemptive_generation = False (a plain bool). After the change on remote_session.py:331 from str(opts.preemptive_generation) to str(dict(opts.preemptive_generation)), calling dict(False) raises TypeError. The fix should update the mock in tests/test_session_host.py _make_mock_session() to use a dict-like object for preemptive_generation, for example: options.preemptive_generation = {"enabled": False, "preemptive_tts": False, "max_speech_duration": 10.0, "max_retries": 3} (matching PreemptiveGenerationOptions structure), or use MagicMock(__iter__=lambda s: iter([])) like the endpointing and interruption mocks do.

