fix(voice): add PreemptiveGenerationOptions for fine-grained control #5428
Conversation
Add configurable limits to preemptive generation to reduce wasted LLM requests during long user utterances:
- max_speech_duration (default 10s): skip preemptive generation when the user has been speaking longer than this threshold
- max_retries (default 3): cap speculative LLM requests per user turn; the counter resets when the turn completes

The preemptive_generation parameter now accepts bool | PreemptiveGenerationOptions, keeping backward compatibility (True/False still works).
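The described parameter shape can be sketched as a small TypedDict. The field names and defaults below come from this PR; the resolve_preemptive_options helper is a hypothetical illustration of how a bool | PreemptiveGenerationOptions value could be normalized, not the actual livekit-agents implementation:

```python
from typing import TypedDict, Union

class PreemptiveGenerationOptions(TypedDict, total=False):
    max_speech_duration: float  # seconds of speech before skipping preemptive generation
    max_retries: int            # speculative LLM requests allowed per user turn

DEFAULTS: PreemptiveGenerationOptions = {"max_speech_duration": 10.0, "max_retries": 3}

def resolve_preemptive_options(
    value: Union[bool, PreemptiveGenerationOptions],
) -> Union[PreemptiveGenerationOptions, None]:
    # Hypothetical helper (not part of the PR): normalizes the
    # bool | PreemptiveGenerationOptions parameter so True/False keep working.
    if value is False:
        return None               # preemptive generation disabled
    if value is True:
        return dict(DEFAULTS)     # enabled with default limits
    return {**DEFAULTS, **value}  # a partial dict overrides the defaults
```
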
self._cancel_preemptive_generation()

if (
    info.started_speaking_at is not None
    and time.time() - info.started_speaking_at > preemptive_opts["max_speech_duration"]
):
    return

if self._preemptive_generation_count >= preemptive_opts["max_retries"]:
    return

self._preemptive_generation_count += 1
💡 Existing preemptive generation is cancelled before the max_retries check, discarding valid work
In on_preemptive_generation, _cancel_preemptive_generation() is called unconditionally on line 1783 before the max_retries check on line 1791. When _preemptive_generation_count >= max_retries, the method returns early without starting a new generation, but the previous (most recent) preemptive generation has already been cancelled and set to None. This means the last successful preemptive generation is destroyed without replacement. Later, in _user_turn_completed_task at line 1995, self._preemptive_generation is None, so the preemptive result can never be used and a fresh (non-preemptive) LLM call is always made instead. This defeats the purpose of the max_retries limit, which should keep the last generation alive when retries are exhausted.
The fix is to move _cancel_preemptive_generation() after the early-return checks (or at least after the max_retries check), so the existing generation is only cancelled when it will actually be replaced by a new one.
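The suggested reordering can be sketched as follows. All names here are illustrative stand-ins (plain dicts instead of the session object), not the actual livekit-agents code; the point is that the early-return checks run before any cancellation:

```python
def on_preemptive_generation_reordered(state, info, opts, now):
    """Sketch of the reviewer's suggested ordering: run the early-return
    checks first, so the existing generation is cancelled only when a new
    one will actually replace it."""
    started = info.get("started_speaking_at")
    if started is not None and now - started > opts["max_speech_duration"]:
        return "kept"  # speech too long: keep the last preemptive generation
    if state["count"] >= opts["max_retries"]:
        return "kept"  # retries exhausted: keep the last preemptive generation
    state["cancelled"] = True  # stands in for self._cancel_preemptive_generation()
    state["count"] += 1
    return "restarted"
```

With retries exhausted, the previous generation survives untouched; otherwise it is cancelled and replaced.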
This is expected: on_preemptive_generation is called when the user transcript changes, so the previous preemptive generation is invalid and we should cancel it as soon as possible.
When False (the default), only the LLM runs preemptively and TTS starts after the turn is confirmed. When True, both LLM and TTS run immediately on interim transcripts for the lowest latency, at the cost of TTS compute on discarded attempts.
Move the preemptive generation config into turn_handling, where it conceptually belongs alongside endpointing and interruption. Add an "enabled" field matching the InterruptionOptions pattern, and deprecate the top-level preemptive_generation parameter on AgentSession.
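Based on the fields named in this PR (enabled, preemptive_tts, max_speech_duration, max_retries), the nested configuration could look like the sketch below. TurnHandlingOptions here is an illustrative container, not the library's actual definition:

```python
from typing import TypedDict

class PreemptiveGenerationOptions(TypedDict, total=False):
    enabled: bool              # matches the InterruptionOptions pattern
    preemptive_tts: bool       # run TTS on interim transcripts too
    max_speech_duration: float
    max_retries: int

class TurnHandlingOptions(TypedDict, total=False):
    # illustrative container: the PR moves preemptive generation config
    # into turn_handling alongside endpointing and interruption
    preemptive_generation: PreemptiveGenerationOptions

turn_handling: TurnHandlingOptions = {
    "preemptive_generation": {
        "enabled": True,
        "preemptive_tts": False,
        "max_speech_duration": 10.0,
        "max_retries": 3,
    }
}
```
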
@property
def preemptive_generation(self) -> PreemptiveGenerationOptions:
    return self.turn_handling["preemptive_generation"]
We don't need to expose it if we expose turn_handling
Suggested change (remove the property):

@property
def preemptive_generation(self) -> PreemptiveGenerationOptions:
    return self.turn_handling["preemptive_generation"]
Devin Review found 2 new potential issues.
2 issues in files not directly in the diff
Serialization format break: preemptive_generation changed from bool to dict in report output (livekit-agents/livekit/agents/voice/report.py:66)
self.options.preemptive_generation now returns a PreemptiveGenerationOptions dict (e.g., {"enabled": True, "preemptive_tts": False, "max_speech_duration": 10.0, "max_retries": 3}) instead of a bool. The to_dict() method at livekit-agents/livekit/agents/voice/report.py:66 embeds this value directly into the report dictionary. Any downstream consumer (e.g., LiveKit Cloud) that previously expected a boolean for the "preemptive_generation" key will now receive a nested dict, breaking deserialization or interpretation of the session report.
Serialization format break: preemptive_generation changed from bool to dict in remote session serialization (livekit-agents/livekit/agents/voice/remote_session.py:331)
str(opts.preemptive_generation) at livekit-agents/livekit/agents/voice/remote_session.py:331 now produces a Python dict string representation like "{'enabled': True, 'preemptive_tts': False, 'max_speech_duration': 10.0, 'max_retries': 3}" instead of "True" or "False". This serialized value is sent to the remote SessionHost via the transport (used at remote_session.py:662). The receiving end likely expects a boolean-like string, so this change breaks the remote session protocol.
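The format change the review describes can be reproduced with plain Python; the option values below mirror the defaults listed in this PR:

```python
# Before the PR: preemptive_generation was a bool, so str() gave "True"/"False".
before = str(False)

# After the PR: it is a TypedDict, so str(dict(...)) gives a Python dict repr,
# which a consumer expecting a boolean-like string will not parse.
opts = {
    "enabled": True,
    "preemptive_tts": False,
    "max_speech_duration": 10.0,
    "max_retries": 3,
}
after = str(dict(opts))
```
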
View 10 additional findings in Devin Review.
    "max_tool_steps": str(opts.max_tool_steps),
    "user_away_timeout": str(opts.user_away_timeout),
-   "preemptive_generation": str(opts.preemptive_generation),
+   "preemptive_generation": str(dict(opts.preemptive_generation)),
💡 Existing test mock uses bool for preemptive_generation, but _serialize_options now calls dict() on it
The change from str(opts.preemptive_generation) to str(dict(opts.preemptive_generation)) will crash with TypeError: 'bool' object is not iterable when called with the mock in tests/test_session_host.py:447, which sets options.preemptive_generation = False. The test_get_session_state test exercises this path via _handle_request(get_session_state) → _serialize_options(self._session.options) at livekit-agents/livekit/agents/voice/remote_session.py:662. The mock needs to be updated to use a PreemptiveGenerationOptions dict (e.g., MagicMock(__iter__=lambda s: iter([])) or a real dict like {"enabled": False}).
Prompt for agents
The test file tests/test_session_host.py at line 447 sets options.preemptive_generation = False (a plain bool). After the change on remote_session.py:331 from str(opts.preemptive_generation) to str(dict(opts.preemptive_generation)), calling dict(False) raises TypeError. The fix should update the mock in tests/test_session_host.py _make_mock_session() to use a dict-like object for preemptive_generation, for example: options.preemptive_generation = {"enabled": False, "preemptive_tts": False, "max_speech_duration": 10.0, "max_retries": 3} (matching PreemptiveGenerationOptions structure), or use MagicMock(__iter__=lambda s: iter([])) like the endpointing and interruption mocks do.
Summary
PreemptiveGenerationOptions: a TypedDict with configurable options to reduce wasted compute during preemptive generation:
- preemptive_tts (default False): when False, only the LLM runs preemptively and TTS starts after the turn is confirmed
- max_speech_duration (default 10s): skip preemptive generation when the user has been speaking too long
- max_retries (default 3): cap preemptive LLM requests per user turn; resets on turn completion

The preemptive_generation parameter now accepts bool | PreemptiveGenerationOptions, fully backward compatible.