An interactive AI assistant with real-time voice and text communication, persistent memory, Google Search, and support for camera/screen streaming.
- Real-time Voice — low-latency conversation via the Gemini Live API
- Google Search — AI can search the web during conversation
- Persistent Memory — AI remembers facts about you between sessions
- Modular Tools — add new AI capabilities by dropping a file into tools/
- Single Config — model, voice, speed, and AI persona in one YAML file
- Docker Support — run in an isolated container
```
voise/
├── ai_studio_code.py         # Main application entrypoint
├── config_utils.py           # Config loading, device selection
├── settings/
│   ├── config.yaml           # Main config (model, instructions, devices)
│   └── parameters_guide.md   # Parameter reference
├── tools/                    # Auto-loaded AI tools
│   ├── save_user_memory.py
│   ├── read_user_memory.py
│   └── update_user_memory.py
├── user_store/               # Runtime data (gitignored)
│   ├── memory.md             # Persistent AI memory
│   ├── transcript.md         # Session transcripts
│   └── debug_log.txt         # Debug output (--debug mode)
├── Dockerfile
└── docker-compose.yaml
```
- Python 3.12+
- A Gemini API key — get one free at aistudio.google.com
**Windows (uv)**

1. Install uv:

   ```powershell
   powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
   ```

2. Clone the repo and enter the folder:

   ```
   git clone <repo-url>
   cd voise
   ```

3. Create `.env` from the template and add your key:

   ```
   copy .env.example .env
   notepad .env
   ```

   Set `GEMINI_API_KEY=your_key_here`.

4. Install dependencies and run:

   ```
   uv sync
   uv run python ai_studio_code.py
   ```
**Windows (manual, pip)**

1. Install Python 3.12+ (check "Add to PATH").

2. Install the system dependency for PyAudio:

   ```
   pip install pipwin
   pipwin install pyaudio
   ```

   (Recent PyAudio releases ship Windows wheels, so a plain `pip install pyaudio` may also work.)

3. Install the rest:

   ```
   pip install google-genai opencv-python pillow mss pyyaml python-dotenv
   ```

4. Copy `.env.example` to `.env` and set your API key, then run:

   ```
   python ai_studio_code.py
   ```
**macOS (uv)**

1. Install uv:

   ```
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

2. Clone the repo and enter the folder:

   ```
   git clone <repo-url>
   cd voise
   ```

3. Create `.env` from the template and add your key:

   ```
   cp .env.example .env
   open -e .env   # or: nano .env
   ```

   Set `GEMINI_API_KEY=your_key_here`.

4. Install PortAudio (required by PyAudio):

   ```
   brew install portaudio
   ```

5. Install dependencies and run:

   ```
   uv sync
   uv run python ai_studio_code.py
   ```
**macOS (manual, pip)**

1. Install Python 3.12+, e.g. via Homebrew:

   ```
   brew install python@3.12
   ```

2. Install PortAudio and the Python dependencies:

   ```
   brew install portaudio
   pip install google-genai opencv-python pyaudio pillow mss pyyaml python-dotenv
   ```

3. Copy `.env.example` to `.env` and set your API key, then run:

   ```
   python ai_studio_code.py
   ```
```
# Default — voice only
uv run python ai_studio_code.py

# Stream from camera
uv run python ai_studio_code.py --mode camera

# Stream from screen
uv run python ai_studio_code.py --mode screen

# Enable debug logging to user_store/debug_log.txt
uv run python ai_studio_code.py --debug
```

On first run, the app will ask you to select your microphone and speaker. The choices are saved to `.env` automatically.
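After that first run, `.env` holds both the key and the saved device choices. A rough sketch; the device variable names here are illustrative guesses, not confirmed from the code:

```
GEMINI_API_KEY=your_key_here
# written automatically on first run (variable names are hypothetical)
INPUT_DEVICE=USB Microphone
OUTPUT_DEVICE=Speakers
```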
All settings live in settings/config.yaml. Key options:
| Key | Description |
|---|---|
| `model.voice` | AI voice: Zephyr, Puck, Aoede, Charon, Fenrir, Kore |
| `model.speed` | Speech rate: normal, fast, slow |
| `instructions.personality` | Who the AI is and how it speaks |
| `instructions.greeting` | First message sent to the AI on startup |
| `devices.input` | Microphone name (auto-detected on first run) |
| `devices.output` | Speaker name (auto-detected on first run) |
See settings/parameters_guide.md for the full reference.
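Pieced together from the table above, `config.yaml` plausibly has a shape like the following; the nesting and example values are assumptions, not a verbatim copy of the shipped file:

```yaml
model:
  voice: Zephyr          # one of: Zephyr, Puck, Aoede, Charon, Fenrir, Kore
  speed: normal          # normal, fast, slow
instructions:
  personality: "You are a concise, friendly voice assistant."
  greeting: "Hello! Please introduce yourself briefly."
devices:
  input: ""              # microphone name, auto-detected on first run
  output: ""             # speaker name, auto-detected on first run
```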
Drop a new file into `tools/` — it will be auto-loaded on next start.
Each tool file must export:
- A function named after the file (e.g. `my_tool.py` → `def my_tool(...)`)
- A `declaration` dict with the JSON schema for the Gemini API
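A loader for this convention is typically a short `importlib` scan. A minimal sketch under the assumptions above; the actual discovery logic in `ai_studio_code.py` may differ:

```python
import importlib.util
from pathlib import Path


def load_tools(tools_dir: str = "tools"):
    """Collect (function, declaration) pairs from every .py file in tools_dir.

    Assumes the convention above: each file exports a function named after
    the file plus a `declaration` dict for the Gemini API.
    """
    functions, declarations = {}, []
    for path in sorted(Path(tools_dir).glob("*.py")):
        name = path.stem
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # run the tool file
        func = getattr(module, name, None)
        decl = getattr(module, "declaration", None)
        if callable(func) and isinstance(decl, dict):
            functions[name] = func
            declarations.append(decl)
    return functions, declarations
```

Presumably the `declarations` list is what gets handed to the Live API session config, while `functions` dispatches incoming tool calls by name.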
Example — `tools/get_weather.py`:

```python
def get_weather(city: str):
    # your implementation
    return {"temperature": "22°C", "condition": "Sunny"}


declaration = {
    "name": "get_weather",
    "description": "Returns current weather for a given city.",
    "parameters": {
        "type": "OBJECT",
        "properties": {
            "city": {"type": "STRING", "description": "City name"}
        },
        "required": ["city"]
    }
}
```

Build and run with Docker:

```
# Build
docker build -t voise-app .

# Run with Docker Compose
docker-compose up
```

Note: Audio and video passthrough in Docker requires additional setup (PulseAudio or device permissions on Linux). Native run is recommended for development.
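On Linux, one common pattern for the audio passthrough mentioned above is to map the ALSA devices or the host PulseAudio socket into the container. The service name and paths below are assumptions about this project's compose file, not a tested configuration:

```yaml
# hypothetical docker-compose override for Linux audio passthrough
services:
  voise:                                # service name assumed
    devices:
      - /dev/snd                        # ALSA sound devices
    volumes:
      - ${XDG_RUNTIME_DIR}/pulse/native:/tmp/pulse/native
    environment:
      - PULSE_SERVER=unix:/tmp/pulse/native
```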