58 changes: 22 additions & 36 deletions examples/voice_agent/README.md
@@ -1,6 +1,6 @@
# Voice Agent

Example application combining this SDK with [LiveKit Agents](https://docs.livekit.io/agents/), enabling bidirectional voice communication with an AI agent. The agent can interact with hardware in response to user requests. Below is an example of a conversation between a user and the agent:
Example application combining this SDK with [LiveKit Agents](https://docs.livekit.io/agents/), enabling bidirectional voice communication with an AI agent from an ESP32. The agent can interact with hardware in response to user requests. Below is an example of a conversation between a user and the agent:

> **User:** What is the current CPU temperature? \
> **Agent:** The CPU temperature is currently 33°C.
@@ -11,37 +11,48 @@ Example application combining this SDK with [LiveKit Agents](https://docs.liveki
> **User:** Turn on the yellow LED. \
> **Agent:** I'm sorry, the board does not have a yellow LED.

## Structure

The example includes both an ESP32 application for connecting to a LiveKit room, and the definition for the agent the user interacts with:
```txt
.
├── agent/
│ └── agent.py (Agent definition)
└── main/ (ESP32 application)
```

## Requirements

- Software:
- [IDF](https://docs.espressif.com/projects/esp-idf/en/stable/esp32/get-started/index.html) release v5.4 or later
- Python 3.9 or later
- LiveKit Cloud Project
- Sandbox Token Server (created from your cloud project)
- API keys for OpenAI, Deepgram, and Cartesia.
- Hardware
- Dev board: [ESP32-S3-Korvo-2](https://docs.espressif.com/projects/esp-adf/en/latest/design-guide/dev-boards/user-guide-esp32-s3-korvo-2.html)
- Two micro USB cables: one for power, one for flashing
- Mono enclosed speaker (example from [Adafruit](https://www.adafruit.com/product/3351))

## Run example
## Quick start

To run the example on your board, begin in your terminal by navigating to the example's root directory: *[examples/voice_agent](./examples/voice_agent/)*.
> [!NOTE]
> The example comes pre-configured to connect to the agent hosted in
> LiveKit Cloud. If you would like to run the agent yourself, see the agent's [README](./agent/README.md).

Begin in your terminal by navigating to this example's root directory: *[examples/voice_agent](./examples/voice_agent/)*.

### 1. Configuration

The example requires a network connection and Sandbox ID from your [LiveKit Cloud Project](https://cloud.livekit.io/projects/p_/sandbox/templates/token-server). To configure these settings from your terminal, launch *menuconfig*:
At minimum, the example requires a network connection. To configure Wi-Fi and other settings, launch *menuconfig* from your terminal:
```sh
idf.py menuconfig
```



With *menuconfig* open, navigate to the *LiveKit Example* menu and configure the following settings:

- Network → Wi-Fi SSID
- Network → Wi-Fi password
- Room connection → Sandbox ID

For more information about available options, please refer to [this guide](../README.md#configuration).
For more information about available options, including how to connect to your own LiveKit Cloud project, please refer to [this guide](../README.md#configuration).

### 2. Build & flash

@@ -58,29 +69,4 @@ Once running on device, the example will establish a network connection and then
I (19508) livekit_example: Room state: connected
```

If you encounter any issues during this process, please refer to the example [troubleshooting guide](../README.md/#troubleshooting).
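
For reference, a typical IDF build-and-flash sequence for this board looks like the following (the serial port is an assumption; substitute your own):

```sh
idf.py set-target esp32s3
idf.py build
idf.py -p /dev/ttyUSB0 flash monitor
```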

## Run agent

With the example running on your board, the next step is to run the agent so it can join the room.
Begin by navigating to the agent's source directory in your terminal: *[examples/voice_agent/agent](../voice_agent/agent)*.

In this directory, create a *.env* file containing the required API keys:

```sh
DEEPGRAM_API_KEY=<your Deepgram API Key>
OPENAI_API_KEY=<your OpenAI API Key>
CARTESIA_API_KEY=<your Cartesia API Key>
LIVEKIT_API_KEY=<your API Key>
LIVEKIT_API_SECRET=<your API Secret>
LIVEKIT_URL=<your server URL>
```

With the API keys in place, download the required files and run the agent in development mode as follows:

```sh
python agent.py download-files
python agent.py dev
```

With the agent running, it will discover and join the room, and you will now be able to engage in two-way conversation. Try asking some of the questions shown above.
Shortly after the room is connected, the agent participant will join and begin speaking. If you encounter any issues during this process, please refer to the example [troubleshooting guide](../README.md/#troubleshooting).
25 changes: 25 additions & 0 deletions examples/voice_agent/agent/.dockerignore
@@ -0,0 +1,25 @@
# Python artifacts
venv/
__pycache__/

# Local environment & config files
.env
.env.local
.DS_Store

# Logs & temp files
*.log
*.gz
*.tgz
.tmp
.cache

# Docker artifacts
Dockerfile*
.dockerignore

# Git & Editor files
.git
.gitignore
.idea
.vscode
48 changes: 48 additions & 0 deletions examples/voice_agent/agent/Dockerfile
@@ -0,0 +1,48 @@
# This is an example Dockerfile that builds a minimal container for running LiveKit Agents
# syntax=docker/dockerfile:1
ARG PYTHON_VERSION=3.11.6
FROM python:${PYTHON_VERSION}-slim

# Keeps Python from buffering stdout and stderr to avoid situations where
# the application crashes without emitting any logs due to buffering.
ENV PYTHONUNBUFFERED=1

# Create a non-privileged user that the app will run under.
# See https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user
ARG UID=10001
RUN adduser \
    --disabled-password \
    --gecos "" \
    --home "/home/appuser" \
    --shell "/sbin/nologin" \
    --uid "${UID}" \
    appuser


# Install gcc and other build dependencies.
RUN apt-get update && \
    apt-get install -y \
        gcc \
        python3-dev \
    && rm -rf /var/lib/apt/lists/*

USER appuser

RUN mkdir -p /home/appuser/.cache
RUN chown -R appuser /home/appuser/.cache

WORKDIR /home/appuser

COPY requirements.txt .
RUN python -m pip install --user --no-cache-dir -r requirements.txt

COPY . .

# ensure that any dependent models are downloaded at build-time
RUN python agent.py download-files

# expose healthcheck port
EXPOSE 8081

# Run the application.
CMD ["python","agent.py","start"]
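
Assuming Docker is installed, a typical build-and-run invocation for this image might look like this (the image tag is illustrative):

```sh
docker build -t voice-agent .
docker run --rm --env-file .env -p 8081:8081 voice-agent
```

Passing the keys via `--env-file` keeps them out of the image itself, which is why `.env` is listed in the `.dockerignore`.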
30 changes: 30 additions & 0 deletions examples/voice_agent/agent/README.md
@@ -0,0 +1,30 @@
## Example Agent

A voice agent built with [LiveKit Agents](https://docs.livekit.io/agents/) to pair with this example. You can use it as a starting point for building your own agents with hardware interaction capabilities.

## Requirements

- Python 3.9 or later
- [LiveKit Cloud](https://cloud.livekit.io/) Project
- Sandbox Token Server (created from your cloud project)
- API key for OpenAI

## Running locally

1. Create a *.env* file in this directory containing the following keys:

```sh
LIVEKIT_API_KEY=<your API Key>
LIVEKIT_API_SECRET=<your API Secret>
LIVEKIT_URL=<your server URL>
OPENAI_API_KEY=<your OpenAI API Key>
```

2. Download the required files and run the agent in development mode:

```sh
python agent.py download-files
python agent.py dev
```

Once running, the agent will automatically join the room with the ESP32.
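
Before starting the agent, it can be handy to verify the *.env* file actually defines all four keys. A minimal stdlib-only sketch (this helper is hypothetical, not part of the example):

```python
# Hypothetical pre-flight check (not part of the example): verify the
# .env file defines every key the agent needs before launching it.
import os

REQUIRED_KEYS = [
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "OPENAI_API_KEY",
]

def missing_env_keys(path=".env"):
    """Return the required keys not defined in the given .env file."""
    present = set()
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                line = line.strip()
                # Skip blanks and comments; record KEY from KEY=VALUE lines.
                if line and not line.startswith("#") and "=" in line:
                    present.add(line.split("=", 1)[0].strip())
    return [key for key in REQUIRED_KEYS if key not in present]

if __name__ == "__main__":
    missing = missing_env_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All required keys present.")
```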
25 changes: 9 additions & 16 deletions examples/voice_agent/agent/agent.py
@@ -13,11 +13,8 @@
)
from livekit.plugins import (
    openai,
    cartesia,
    deepgram,
    silero,
    noise_cancellation
)
from livekit.plugins.turn_detector.multilingual import MultilingualModel

# If enabled, RPC calls will not be performed.
TEST_MODE = False
@@ -31,18 +28,13 @@ class LEDColor(str, Enum):
class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant running on an ESP-32 dev board.
            instructions="""You are a helpful voice AI assistant running on an ESP32 dev board.
            You answer the user's questions about the hardware state and control the hardware based on their requests.
            The board has discrete LEDs that can be controlled independently. Each LED has a static color
            that cannot be changed. While you are able to set the state of the LEDs, you are not able to read the
            state which could be changed without your knowledge. No markdown is allowed in your responses.
            """
        )
    async def on_enter(self) -> None:
        await self.session.say(
            "Hi, how can I help you today?",
            allow_interruptions=False
        )

    @function_tool()
    async def set_led_state(self, _: RunContext, led: LEDColor, state: bool) -> None:
@@ -92,16 +84,17 @@ async def get_cpu_temp(self, _: RunContext) -> float:

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=deepgram.STT(model="nova-3", language="multi"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(model="sonic-2", voice="c99d36f3-5ffd-4253-803a-535c1bc9c306"),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
        llm=openai.realtime.RealtimeModel(
            voice="echo",
            model="gpt-4o-mini-realtime-preview-2024-12-17"
        )
    )
    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_input_options=RoomInputOptions()
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC()
        )
    )
    await ctx.connect()

5 changes: 3 additions & 2 deletions examples/voice_agent/agent/pyproject.toml
@@ -1,11 +1,12 @@
[project]
name = "esp-example-agent"
version = "0.1.0"
description = "Example agent for interacting with the ESP-32 voice chat example."
description = "Example agent for ESP32 voice agent example."
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.13"
dependencies = [
"livekit-agents[cartesia,deepgram,openai,silero,turn-detector]~=1.0",
"livekit-agents[openai]~=1.0",
"livekit-plugins-noise-cancellation~=0.2",
"python-dotenv>=1.1.1"
]
3 changes: 3 additions & 0 deletions examples/voice_agent/agent/requirements.txt
@@ -0,0 +1,3 @@
livekit-agents[openai]~=1.0
livekit-plugins-noise-cancellation~=0.2
python-dotenv>=1.1.1
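
If you are not using the Dockerfile, these requirements can be installed into a local virtual environment, for example:

```sh
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```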