58 changes: 22 additions & 36 deletions examples/voice_agent/README.md
@@ -1,6 +1,6 @@
# Voice Agent

Example application combining this SDK with [LiveKit Agents](https://docs.livekit.io/agents/), enabling bidirectional voice communication with an AI agent. The agent can interact with hardware in response to user requests. Below is an example of a conversation between a user and the agent:
Example application combining this SDK with [LiveKit Agents](https://docs.livekit.io/agents/), enabling bidirectional voice communication with an AI agent from an ESP32. The agent can interact with hardware in response to user requests. Below is an example of a conversation between a user and the agent:

> **User:** What is the current CPU temperature? \
> **Agent:** The CPU temperature is currently 33°C.
@@ -11,37 +11,48 @@ Example application combining this SDK with [LiveKit Agents](https://docs.liveki
> **User:** Turn on the yellow LED. \
> **Agent:** I'm sorry, the board does not have a yellow LED.

## Structure

The example includes both an ESP32 application for connecting to a LiveKit room, and the definition for the agent the user interacts with:
```txt
.
├── agent/
│ └── agent.py (Agent definition)
└── main/ (ESP32 application)
```

## Requirements

- Software:
- [IDF](https://docs.espressif.com/projects/esp-idf/en/stable/esp32/get-started/index.html) release v5.4 or later
- Python 3.9 or later
- LiveKit Cloud Project
- Sandbox Token Server (created from your cloud project)
- API keys for OpenAI, Deepgram, and Cartesia.
- Hardware
- Dev board: [ESP32-S3-Korvo-2](https://docs.espressif.com/projects/esp-adf/en/latest/design-guide/dev-boards/user-guide-esp32-s3-korvo-2.html)
- Two micro USB cables: one for power, one for flashing
- Mono enclosed speaker (example from [Adafruit](https://www.adafruit.com/product/3351))

## Run example
## Quick start

To run the example on your board, begin in your terminal by navigating to the example's root directory: *[examples/voice_agent](./examples/voice_agent/)*.
> [!NOTE]
> The example comes pre-configured to connect to the agent hosted in
> LiveKit Cloud. If you would like to run the agent yourself, see the agent's [README](./agent/README.md).

Begin in your terminal by navigating to this example's root directory: *[examples/voice_agent](./examples/voice_agent/)*.

### 1. Configuration

The example requires a network connection and Sandbox ID from your [LiveKit Cloud Project](https://cloud.livekit.io/projects/p_/sandbox/templates/token-server). To configure these settings from your terminal, launch *menuconfig*:
At minimum, the example requires a network connection. To configure Wi-Fi and other settings, launch *menuconfig* from your terminal:
```sh
idf.py menuconfig
```



With *menuconfig* open, navigate to the *LiveKit Example* menu and configure the following settings:

- Network → Wi-Fi SSID
- Network → Wi-Fi password
- Room connection → Sandbox ID

For more information about available options, please refer to [this guide](../README.md#configuration).
For more information about available options, including how to connect to your own LiveKit Cloud project, please refer to [this guide](../README.md#configuration).

### 2. Build & flash

@@ -58,29 +69,4 @@ Once running on device, the example will establish a network connection and then
I (19508) livekit_example: Room state: connected
```

If you encounter any issues during this process, please refer to the example [troubleshooting guide](../README.md/#troubleshooting).
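
For reference, a typical IDF build-and-flash sequence for this board looks like the following (the serial port is an assumption; substitute your own):

```sh
idf.py set-target esp32s3
idf.py build
idf.py -p /dev/ttyUSB0 flash monitor
```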

## Run agent

With the example running on your board, the next step is to run the agent so it can join the room.
Begin by navigating to the agent's source directory in your terminal: *[examples/voice_agent/agent](../voice_agent/agent)*.

In this directory, create a *.env* file containing the required API keys:

```sh
DEEPGRAM_API_KEY=<your Deepgram API Key>
OPENAI_API_KEY=<your OpenAI API Key>
CARTESIA_API_KEY=<your Cartesia API Key>
LIVEKIT_API_KEY=<your API Key>
LIVEKIT_API_SECRET=<your API Secret>
LIVEKIT_URL=<your server URL>
```

With the API keys in place, download the required files and run the agent in development mode as follows:

```sh
python agent.py download-files
python agent.py dev
```

With the agent running, it will discover and join the room, and you will now be able to engage in two-way conversation. Try asking some of the questions shown above.
Shortly after the room is connected, the agent participant will join and begin speaking. If you encounter any issues during this process, please refer to the example [troubleshooting guide](../README.md/#troubleshooting).
25 changes: 25 additions & 0 deletions examples/voice_agent/agent/.dockerignore
@@ -0,0 +1,25 @@
# Python artifacts
venv/
__pycache__/

# Local environment & config files
.env
.env.local
.DS_Store

# Logs & temp files
*.log
*.gz
*.tgz
.tmp
.cache

# Docker artifacts
Dockerfile*
.dockerignore

# Git & Editor files
.git
.gitignore
.idea
.vscode
48 changes: 48 additions & 0 deletions examples/voice_agent/agent/Dockerfile
@@ -0,0 +1,48 @@
# This is an example Dockerfile that builds a minimal container for running LiveKit Agents
# syntax=docker/dockerfile:1
ARG PYTHON_VERSION=3.11.6
FROM python:${PYTHON_VERSION}-slim

# Keeps Python from buffering stdout and stderr to avoid situations where
# the application crashes without emitting any logs due to buffering.
ENV PYTHONUNBUFFERED=1

# Create a non-privileged user that the app will run under.
# See https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user
ARG UID=10001
RUN adduser \
    --disabled-password \
    --gecos "" \
    --home "/home/appuser" \
    --shell "/sbin/nologin" \
    --uid "${UID}" \
    appuser


# Install gcc and other build dependencies.
RUN apt-get update && \
    apt-get install -y \
        gcc \
        python3-dev \
    && rm -rf /var/lib/apt/lists/*

USER appuser

RUN mkdir -p /home/appuser/.cache
RUN chown -R appuser /home/appuser/.cache

WORKDIR /home/appuser

COPY requirements.txt .
RUN python -m pip install --user --no-cache-dir -r requirements.txt

COPY . .

# ensure that any dependent models are downloaded at build-time
RUN python agent.py download-files

# expose healthcheck port
EXPOSE 8081

# Run the application.
CMD ["python","agent.py","start"]
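
Assuming Docker is installed, a typical build-and-run invocation for this image might look like this (the image tag is illustrative):

```sh
docker build -t voice-agent .
docker run --rm --env-file .env -p 8081:8081 voice-agent
```

Passing the keys via `--env-file` keeps them out of the image itself, which is why `.env` is listed in the `.dockerignore`.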
30 changes: 30 additions & 0 deletions examples/voice_agent/agent/README.md
@@ -0,0 +1,30 @@
## Example Agent

A voice agent built with [LiveKit Agents](https://docs.livekit.io/agents/) to pair with this example. You can use it as a starting point for building your own agents with hardware interaction capabilities.

## Requirements

- Python 3.9 or later
- [LiveKit Cloud](https://cloud.livekit.io/) Project
- Sandbox Token Server (created from your cloud project)
- API key for OpenAI

## Running locally

1. Create a *.env* file in this directory containing the following keys:

```sh
LIVEKIT_API_KEY=<your API Key>
LIVEKIT_API_SECRET=<your API Secret>
LIVEKIT_URL=<your server URL>
OPENAI_API_KEY=<your OpenAI API Key>
```

2. Download the required files and run the agent in development mode:

```sh
python agent.py download-files
python agent.py dev
```

Once running, the agent will automatically join the room with the ESP32.
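
Before starting the agent, it can be handy to verify the *.env* file actually defines all four keys. A minimal stdlib-only sketch (this helper is hypothetical, not part of the example):

```python
# Hypothetical pre-flight check (not part of the example): verify the
# .env file defines every key the agent needs before launching it.
import os

REQUIRED_KEYS = [
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "OPENAI_API_KEY",
]

def missing_env_keys(path=".env"):
    """Return the required keys not defined in the given .env file."""
    present = set()
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                line = line.strip()
                # Skip blanks and comments; record KEY from KEY=VALUE lines.
                if line and not line.startswith("#") and "=" in line:
                    present.add(line.split("=", 1)[0].strip())
    return [key for key in REQUIRED_KEYS if key not in present]

if __name__ == "__main__":
    missing = missing_env_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All required keys present.")
```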
25 changes: 9 additions & 16 deletions examples/voice_agent/agent/agent.py
@@ -13,11 +13,8 @@
)
from livekit.plugins import (
    openai,
    cartesia,
    deepgram,
    silero,
    noise_cancellation
)
from livekit.plugins.turn_detector.multilingual import MultilingualModel

# If enabled, RPC calls will not be performed.
TEST_MODE = False
@@ -31,18 +28,13 @@ class LEDColor(str, Enum):
class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant running on an ESP-32 dev board.
            instructions="""You are a helpful voice AI assistant running on an ESP32 dev board.
            You answer the user's questions about the hardware state and control the hardware based on their requests.
            The board has discrete LEDs that can be controlled independently. Each LED has a static color
            that cannot be changed. While you are able to set the state of the LEDs, you are not able to read the
            state which could be changed without your knowledge. No markdown is allowed in your responses.
            """
        )
    async def on_enter(self) -> None:
        await self.session.say(
            "Hi, how can I help you today?",
            allow_interruptions=False
        )

    @function_tool()
    async def set_led_state(self, _: RunContext, led: LEDColor, state: bool) -> None:
@@ -92,16 +84,17 @@ async def get_cpu_temp(self, _: RunContext) -> float:

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=deepgram.STT(model="nova-3", language="multi"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(model="sonic-2", voice="c99d36f3-5ffd-4253-803a-535c1bc9c306"),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
        llm=openai.realtime.RealtimeModel(
            voice="echo",
            model="gpt-4o-mini-realtime-preview-2024-12-17"
        )
    )
    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_input_options=RoomInputOptions()
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC()
        )
    )
    await ctx.connect()

5 changes: 3 additions & 2 deletions examples/voice_agent/agent/pyproject.toml
@@ -1,11 +1,12 @@
[project]
name = "esp-example-agent"
version = "0.1.0"
description = "Example agent for interacting with the ESP-32 voice chat example."
description = "Example agent for ESP32 voice agent example."
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.13"
dependencies = [
"livekit-agents[cartesia,deepgram,openai,silero,turn-detector]~=1.0",
"livekit-agents[openai]~=1.0",
"livekit-plugins-noise-cancellation~=0.2",
"python-dotenv>=1.1.1"
]
3 changes: 3 additions & 0 deletions examples/voice_agent/agent/requirements.txt
@@ -0,0 +1,3 @@
livekit-agents[openai]~=1.0
livekit-plugins-noise-cancellation~=0.2
python-dotenv>=1.1.1
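
If you are not using the Dockerfile, these requirements can be installed into a local virtual environment, for example:

```sh
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```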