Add Temperature and Model Selection Parameters to xAI Realtime Plugin #4470
Description
Feature Type
Would make my life easier
Feature Description
The current xAI Realtime plugin (livekit.plugins.xai.realtime.RealtimeModel) has the model hardcoded to "grok-4-1-fast-non-reasoning" with no way to configure temperature or switch between available xAI models. This limits flexibility for developers who need deterministic responses or want to use other Grok model variants.
Current Limitation
In livekit/plugins/xai/realtime/realtime_model.py:
```python
super().__init__(
    base_url=base_url if is_given(base_url) else XAI_BASE_URL,
    model="grok-4-1-fast-non-reasoning",  # hardcoded - no override option
    voice=voice,
    api_key=api_key,
    # ...
)
```
Requested Features
- Temperature Parameter
Add a temperature parameter to control response randomness:
```python
class RealtimeModel(openai.realtime.RealtimeModel):
    def __init__(
        self,
        *,
        voice: NotGivenOr[GrokVoices | str | None] = "Ara",
        temperature: NotGivenOr[float] = NOT_GIVEN,  # NEW: temperature control
        api_key: str | None = None,
        # ... other parameters
    ) -> None:
        ...
```
- Model Selection Parameter
Allow developers to choose between available xAI realtime models:
```python
class RealtimeModel(openai.realtime.RealtimeModel):
    def __init__(
        self,
        *,
        model: str = "grok-4-1-fast-non-reasoning",  # NEW: configurable model
        voice: NotGivenOr[GrokVoices | str | None] = "Ara",
        temperature: NotGivenOr[float] = NOT_GIVEN,
        api_key: str | None = None,
        # ... other parameters
    ) -> None:
        ...
```
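The `NOT_GIVEN` default in the signatures above lets the plugin distinguish "parameter not supplied" (defer to the API default) from an explicitly passed value, including an explicit `None`. A minimal self-contained sketch of that sentinel pattern (the `NotGiven` class, `is_given` helper, and `session_config` function here are illustrative stand-ins for livekit's own utilities, not the real implementation):

```python
class NotGiven:
    """Sentinel type marking a parameter the caller did not supply."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()


def is_given(value) -> bool:
    # An explicit None or 0.0 is still "given"; only the sentinel is not.
    return not isinstance(value, NotGiven)


def session_config(temperature=NOT_GIVEN) -> dict:
    config = {"model": "grok-4-1-fast-non-reasoning"}
    if is_given(temperature):
        config["temperature"] = temperature  # only sent when supplied
    return config
```

With this pattern, `session_config()` omits `temperature` entirely (letting the API choose its default), while `session_config(temperature=0.0)` sends an explicit `0.0` rather than being silently dropped.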
Use Cases
- Deterministic medical/financial applications: healthcare and financial voice agents need a low temperature (0.1-0.3) for consistent, reliable responses.
- Model experimentation: access to different Grok model variants (e.g., grok-4-1-fast-reasoning vs. grok-4-1-fast-non-reasoning) to trade off speed against quality.
- Creative applications: higher temperature settings (0.8-1.2) for conversational AI that needs more varied, creative responses.
- Production control: the ability to tune response behavior without switching to a separate STT-LLM-TTS pipeline, preserving the low-latency benefits of the realtime API.
Proposed API Usage
```python
from livekit.agents import AgentSession
from livekit.plugins import xai

# Example 1: deterministic responses for medical diagnosis
session = AgentSession(
    llm=xai.realtime.RealtimeModel(
        model="grok-4-1-fast-non-reasoning",
        voice="ara",
        temperature=0.2,  # low randomness
        api_key=api_key,
    )
)

# Example 2: use reasoning model with balanced creativity
session = AgentSession(
    llm=xai.realtime.RealtimeModel(
        model="grok-4-1-fast-reasoning",
        voice="sal",
        temperature=0.7,  # balanced
        api_key=api_key,
    )
)
```
Implementation Notes
The temperature and model parameters should be passed through to the xAI Realtime API session configuration:
```python
def _create_session_update_event(self) -> SessionUpdateEvent:
    event = super()._create_session_update_event()
    # add temperature if provided
    if is_given(self._temperature):
        event["session"]["temperature"] = self._temperature
    return event
```
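Putting the pieces together, one way the proposed constructor could store both parameters and merge them into the session update. This is a hedged sketch: the base class is stubbed out so the snippet runs standalone, whereas the real plugin would subclass openai.realtime.RealtimeModel and use livekit's NOT_GIVEN sentinel:

```python
NOT_GIVEN = object()  # stand-in for livekit's NOT_GIVEN sentinel


class _StubBase:
    """Stub for openai.realtime.RealtimeModel, for illustration only."""

    def _create_session_update_event(self) -> dict:
        return {"session": {"voice": "Ara"}}


class RealtimeModel(_StubBase):
    def __init__(
        self,
        *,
        model: str = "grok-4-1-fast-non-reasoning",
        temperature=NOT_GIVEN,
    ) -> None:
        self._model = model
        self._temperature = temperature

    def _create_session_update_event(self) -> dict:
        event = super()._create_session_update_event()
        event["session"]["model"] = self._model
        # temperature is only included when explicitly supplied
        if self._temperature is not NOT_GIVEN:
            event["session"]["temperature"] = self._temperature
        return event
```

A caller passing `RealtimeModel(model="grok-4-1-fast-reasoning", temperature=0.2)` would get both values in the session payload, while the no-argument form preserves today's behavior.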
Compatibility
This change would maintain backward compatibility since:
- The default model remains "grok-4-1-fast-non-reasoning"
- temperature defaults to NOT_GIVEN, so the API default is used
- Existing code continues to work without modification
References
- xAI API documentation: https://docs.x.ai/docs/guides/voice/agent
- xAI supports a temperature parameter (0-2 range) in its text models
- The OpenAI Realtime plugin already supports temperature configuration
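Since xAI documents a 0-2 temperature range for its text models, the plugin could reject out-of-range values early rather than surfacing an API error. A hypothetical guard, assuming the realtime API uses the same range (validate_temperature is not part of the actual plugin):

```python
def validate_temperature(temperature: float) -> float:
    # xAI's text models document a 0-2 temperature range; assuming the
    # realtime API matches, fail fast on out-of-range values.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(
            f"temperature must be between 0 and 2, got {temperature}"
        )
    return temperature
```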
Workarounds / Alternatives
Currently, developers must use separate STT-LLM-TTS pipeline to control temperature:
```python
# Current workaround - loses realtime API benefits
session = AgentSession(
    stt=deepgram.STT(model="nova-3"),
    llm=xai.LLM(model="grok-4-1-fast-non-reasoning", temperature=0.2),
    tts=cartesia.TTS(voice="sonic-3"),
)
```
This workaround sacrifices the low-latency benefits and integrated speech-to-speech capabilities of the xAI Realtime API.