Autonomous Responsive Interactive Assistant
ARIA is an AI-powered personal assistant with a 3D avatar that speaks to you. Built on Claude's multi-agent architecture, ARIA combines intelligent conversation routing, real-time voice synthesis, and an expressive avatar to create a natural, interactive AI experience.
```
 █████╗ ██████╗ ██╗ █████╗ 
██╔══██╗██╔══██╗██║██╔══██╗
███████║██████╔╝██║███████║
██╔══██║██╔══██╗██║██╔══██║
██║  ██║██║  ██║██║██║  ██║
╚═╝  ╚═╝╚═╝  ╚═╝╚═╝╚═╝  ╚═╝
```
ARIA is a voice-first AI assistant designed to feel like a conversation with a real person. When you speak to ARIA:
- Your voice is transcribed in real-time
- An intelligent Orchestrator routes your request to the right specialized agent
- A specialized Agent (Code, Teach, Research, etc.) generates a response
- ARIA speaks the response through a 3D animated avatar
Unlike text-only chatbots, ARIA is designed for spoken interaction. Responses are concise (2-4 sentences), conversational, and optimized for speech synthesis.
- Multi-Agent Architecture - Specialized agents for coding, teaching, research, and more
- 3D Animated Avatar - TalkingHead WebGL avatar with mood-based expressions
- Browser-Based TTS - HeadTTS/Kokoro voice synthesis runs locally in your browser via WebGPU
- Conversation Memory - Cache Augmented Generation (CAG) maintains context across turns
- Conversation Mixer - Adjust verbosity, creativity, and response style in real-time
- Cost-Optimized - Smart model tiering (Haiku for routing, Sonnet for tasks, Opus for complex reasoning)
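The CAG conversation memory can be pictured as a rolling cache of recent turns that is replayed as context on every request. A minimal Python sketch (the class and method names are illustrative, not ARIA's actual API):

```python
from collections import deque

class ConversationCache:
    """Keeps the most recent turns and replays them as context.

    A toy stand-in for ARIA's CAG cache, which additionally
    benefits from prompt caching on the API side.
    """

    def __init__(self, max_turns: int = 20):
        self.turns = deque(maxlen=max_turns)  # oldest turns fall off

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

    def as_messages(self) -> list:
        # Messages in API order: oldest first
        return list(self.turns)

cache = ConversationCache(max_turns=4)
cache.add("user", "Hello ARIA")
cache.add("assistant", "Hi! How can I help?")
```

Because the deque is bounded, very old turns drop out automatically, keeping the replayed context (and token cost) capped.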
```
┌─────────────────────────────────────────────────────────────────────────┐
│                                 BROWSER                                 │
│   ┌─────────────┐        ┌─────────────┐        ┌─────────────┐         │
│   │ TalkingHead │        │   HeadTTS   │        │   Whisper   │         │
│   │  3D Avatar  │        │   Kokoro    │        │     STT     │         │
│   │   (WebGL)   │        │  (WebGPU)   │        │  (OpenAI)   │         │
│   └──────┬──────┘        └──────┬──────┘        └──────┬──────┘         │
│          │                      │                      │                │
│          └──────────────────────┼──────────────────────┘                │
│                                 │ WebSocket                             │
└─────────────────────────────────┼───────────────────────────────────────┘
                                  │
┌─────────────────────────────────┼───────────────────────────────────────┐
│                                 │              SERVER                   │
│                                 ▼                                       │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                       ORCHESTRATOR (Haiku)                        │  │
│  │               Routes requests to specialized agents               │  │
│  └─────────────────────────────────┬─────────────────────────────────┘  │
│                                    │                                    │
│      ┌──────────┬──────────┬───────┴──┬──────────┬──────────┐           │
│      ▼          ▼          ▼          ▼          ▼          ▼           │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐      │
│ │CONVERSE│ │ TEACH  │ │  CODE  │ │RESEARCH│ │ ASSIST │ │  DEMO  │      │
│ │ Sonnet │ │ Sonnet │ │Sonnet/O│ │Sonnet/O│ │ Sonnet │ │ Sonnet │      │
│ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘      │
│                                    │                                    │
│                       ┌────────────┴───────────┐                        │
│                       │ CAG Conversation Cache │                        │
│                       └────────────┬───────────┘                        │
│                                    │                                    │
│                       ┌────────────┴───────────┐                        │
│                       │       Claude API       │                        │
│                       │      (Anthropic)       │                        │
│                       └────────────────────────┘                        │
└─────────────────────────────────────────────────────────────────────────┘
```
| Agent | Model | Purpose |
|---|---|---|
| Orchestrator | Haiku | Fast routing, intent classification |
| Converse | Sonnet | General conversation, chitchat |
| Teach | Sonnet | Educational explanations with analogies |
| Code | Sonnet → Opus | Programming help, code review |
| Research | Sonnet → Opus | Deep analysis, fact-finding |
| Assist | Sonnet | Task planning, productivity |
| Demo | Sonnet | Interactive demonstrations |
Agents automatically escalate to Opus for complex tasks.
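That tiering policy might be sketched as follows. The model IDs match those used elsewhere in this README, but the complexity heuristic and the `pick_model` function are invented for illustration:

```python
from enum import Enum

class ModelTier(Enum):
    HAIKU = "claude-3-5-haiku-latest"    # fast, cheap - routing
    SONNET = "claude-sonnet-4-20250514"  # balanced - most tasks
    OPUS = "claude-opus-4-20250514"      # powerful - complex reasoning

def pick_model(agent: str, complexity: float) -> ModelTier:
    """Route to a tier: Haiku for the orchestrator, Sonnet by
    default, Opus when an escalating agent sees a complex task."""
    if agent == "orchestrator":
        return ModelTier.HAIKU
    if agent in ("code", "research") and complexity > 0.7:
        return ModelTier.OPUS  # escalate for complex reasoning
    return ModelTier.SONNET
```

Keeping the cheap model on the hot routing path and reserving Opus for the rare hard cases is what makes the tiering cost-effective.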
- macOS on Apple Silicon (M1/M2/M3/M4) - optimized for native performance
- Python 3.11+
- Anthropic API Key - Get one at console.anthropic.com
- Modern Browser - Chrome, Edge, or Safari with WebGPU support
Note: TTS runs entirely in your browser via WebGPU. No additional audio dependencies required on the server.
```bash
git clone https://github.com/yourusername/aria.git
cd aria
```

```bash
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

```bash
cp .env.example .env
```

Edit .env and add your Anthropic API key:

```
ANTHROPIC_API_KEY=sk-ant-api03-your-key-here
ARIA_HOST=0.0.0.0
ARIA_PORT=8080
LOG_LEVEL=INFO
```

For microphone access, browsers require HTTPS. Generate self-signed certificates:

```bash
mkdir -p certs
openssl req -x509 -newkey rsa:4096 -keyout certs/key.pem -out certs/cert.pem -days 365 -nodes -subj "/CN=localhost"
```

```bash
./run.sh
```

Then open your browser to: http://localhost:8080
```bash
./run.sh            # Standard startup
./run.sh --debug    # Enable debug logging
./run.sh --reload   # Auto-reload on code changes
./run.sh --ssl      # HTTPS mode (required for microphone)
```

Alternatively, run the server directly:

```bash
source venv/bin/activate
python -m src.main --port 8080
```

- TTS Model Download: The first time you load ARIA, the browser downloads the Kokoro TTS model (~80MB). This may take 30-60 seconds.
- Microphone Access: To use voice input, you must run with `--ssl` and accept the self-signed certificate in your browser.
- Click the microphone button or press Space to start speaking
- Speak naturally - ARIA waits for a 1.2-second pause before responding
- ARIA's response appears in the chat and is spoken by the avatar
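The 1.2-second pause detection can be sketched as a silence counter over audio frames. In ARIA this happens in the browser; the thresholds and the `utterance_ended` function below are illustrative only:

```python
def utterance_ended(frame_energies: list,
                    silence_thresh: float = 0.01,
                    frame_ms: int = 30,
                    pause_ms: int = 1200) -> bool:
    """Return True once the trailing frames have stayed below the
    energy threshold for at least pause_ms milliseconds."""
    needed = pause_ms // frame_ms  # frames of silence required
    tail = frame_energies[-needed:]
    return (len(tail) == needed
            and all(e < silence_thresh for e in tail))
```

The detector only fires after a full 1.2 s of continuous low energy, so short mid-sentence pauses don't cut the speaker off.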
The equalizer-style mixer panel lets you adjust ARIA's responses in real-time:
| Control | Range | Effect |
|---|---|---|
| Speed | 0.5x - 2x | TTS playback speed |
| Pause | 0 - 4s | Delay before responding |
| Verbosity | 1 - 5 | Response length (brief ↔ detailed) |
| Creativity | 1 - 5 | Response variation (precise ↔ creative) |
| Formality | 1 - 5 | Tone (casual ↔ professional) |
- Quick - Brief, to-the-point answers
- Explain - Balanced educational responses
- Ideate - Creative brainstorming mode
- Deep - Thorough, detailed analysis
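One plausible mapping from the mixer sliders onto request parameters, shown as a sketch (ARIA's actual formula is not documented here; `mixer_to_params` and its constants are invented):

```python
def mixer_to_params(verbosity: int, creativity: int) -> dict:
    """Map the 1-5 mixer sliders onto request parameters.

    Illustrative linear mapping: verbosity scales the token
    budget, creativity scales the sampling temperature.
    """
    return {
        "max_tokens": 100 * verbosity,              # 100 .. 500
        "temperature": round(0.2 * creativity, 2),  # 0.2 .. 1.0
    }

defaults = mixer_to_params(3, 3)  # {'max_tokens': 300, 'temperature': 0.6}
```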
ARIA automatically routes your requests:
- "Hey, how's it going?" → Converse (chitchat)
- "Explain how neural networks work" → Teach (education)
- "Write a Python function to..." → Code (programming)
- "What are the pros and cons of..." → Research (analysis)
- "Help me plan my project" → Assist (task planning)
- "Show me what you can do" → Demo (demonstration)
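The orchestrator's classification step could be sketched with a prompt builder and a defensive parser. Both helpers and the prompt wording are hypothetical, not ARIA's actual implementation:

```python
AGENTS = ["converse", "teach", "code", "research", "assist", "demo"]

def routing_prompt(user_text: str) -> str:
    """Build a classification prompt listing the allowed agents."""
    return (
        "Classify the request into exactly one agent from: "
        f"{', '.join(AGENTS)}.\nRequest: {user_text}\nAgent:"
    )

def parse_route(reply: str) -> str:
    """Extract the agent name; fall back to 'converse' if unclear."""
    word = reply.strip().lower().split()[0] if reply.strip() else ""
    return word if word in AGENTS else "converse"
```

Falling back to the general conversation agent keeps a garbled classification from ever failing the request outright.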
```
aria/
├── src/
│   ├── main.py              # Entry point
│   ├── agents/              # Agent implementations
│   │   ├── base.py          # Base agent class
│   │   ├── orchestrator.py  # Routing agent
│   │   ├── converse.py      # General conversation
│   │   ├── teach.py         # Educational explanations
│   │   ├── code.py          # Programming assistance
│   │   └── specialized.py   # Research, Assist, Demo
│   ├── core/
│   │   ├── aria.py          # Main orchestrator
│   │   └── cache/           # CAG conversation cache
│   └── api/
│       └── websocket.py     # WebSocket & REST API
│
├── frontend/
│   └── index.html           # Single-page web application
│
├── certs/                   # SSL certificates (optional)
├── docs/                    # Documentation & presentations
├── run.sh                   # Startup script
├── requirements.txt         # Python dependencies
├── .env.example             # Environment template
└── CLAUDE.md                # Project instructions
```
```
ws://localhost:8080/ws/chat
```

```js
// Send a message
{ "type": "message", "content": "Hello ARIA" }

// Reset conversation
{ "type": "reset" }

// Get statistics
{ "type": "stats" }

// Update mixer settings
{
  "type": "mixer_settings",
  "settings": {
    "speed": 1.0,
    "pauseTime": 1,
    "verbosity": 3,
    "creativity": 3,
    "formality": 2,
    "mode": "quick"
  }
}
```

```js
// Thinking indicator
{ "type": "thinking", "agent": "Orchestrator" }

// Response
{
  "type": "response",
  "content": "Hello! How can I help you today?",
  "agent": "Converse",
  "mood": "happy",
  "tokens": 42
}

// Error
{ "type": "error", "message": "..." }
```

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Serve main interface |
| `/health` | GET | Health check |
| `/api/stats` | GET | Global statistics |
| `/api/presentations` | GET | List available presentations |
| `/api/presentation/(unknown)` | GET | Serve presentation PDF |
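For reference, a minimal Python client for the `/ws/chat` endpoint could look like this. It uses the third-party `websockets` package; `encode_message` and `chat_once` are illustrative helpers, though the frame types match the protocol shown above:

```python
# pip install websockets  (third-party dependency, assumed available)
import asyncio
import json

def encode_message(text: str) -> str:
    """Serialize an outbound chat frame per the protocol above."""
    return json.dumps({"type": "message", "content": text})

async def chat_once(text: str,
                    url: str = "ws://localhost:8080/ws/chat") -> dict:
    """Send one message and return the first 'response' frame,
    skipping interim frames such as 'thinking'."""
    import websockets  # imported lazily so the helpers work without it
    async with websockets.connect(url) as ws:
        await ws.send(encode_message(text))
        while True:
            frame = json.loads(await ws.recv())
            if frame.get("type") == "response":
                return frame  # contains content, agent, mood, tokens
            if frame.get("type") == "error":
                raise RuntimeError(frame.get("message"))

# With the server running:
# asyncio.run(chat_once("Hello ARIA"))
```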
| Variable | Default | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | (required) | Your Anthropic API key |
| `ARIA_HOST` | `0.0.0.0` | Server bind address |
| `ARIA_PORT` | `8080` | Server port |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
Models are configured in the agent classes:

```python
from enum import Enum

class ModelTier(Enum):
    HAIKU = "claude-3-5-haiku-latest"    # Fast, cheap - routing
    SONNET = "claude-sonnet-4-20250514"  # Balanced - most tasks
    OPUS = "claude-opus-4-20250514"      # Powerful - complex reasoning
```

Ensure your .env file exists and contains a valid API key:

```bash
cat .env | grep ANTHROPIC
```

- Ensure you're using a WebGPU-compatible browser (Chrome 113+, Edge 113+)
- Check the browser console for WebGPU errors
- The first load downloads a ~80MB model - wait for it to complete
- Run ARIA with the `--ssl` flag: `./run.sh --ssl`
- Accept the self-signed certificate warning in your browser
- Grant microphone permissions when prompted
- Check browser console for WebGL errors
- Ensure hardware acceleration is enabled in your browser
- Try refreshing the page
- Check your internet connection to Anthropic's API
- Use "Quick" mode in the mixer for faster responses
- Reduce verbosity slider to get shorter responses
```bash
pytest tests/ -v
```

```bash
./run.sh --debug
```

This enables:
- Verbose logging
- Detailed API request/response logs
- WebSocket message tracing
```bash
./run.sh --reload
```

Automatically restarts the server when Python files change.
| Component | Technology | Purpose |
|---|---|---|
| Backend | FastAPI + Uvicorn | WebSocket API server |
| AI Brain | Claude API (Anthropic) | Multi-agent conversation & reasoning |
| Image Gen | Gemini API (Google) | AI image generation (optional) |
| Image Gen | Local Stable Diffusion | Local image generation (optional) |
| Avatar | TalkingHead | WebGL 3D avatar |
| TTS | HeadTTS / Kokoro | Browser-based voice synthesis (WebGPU) |
| STT | OpenAI Whisper | High-accuracy transcription (default) |
| STT | Web Speech API | Browser-based speech recognition (fallback) |
| Frontend | Vanilla JS | Single-page application |
| API Key | Purpose | Get it at |
|---|---|---|
| `ANTHROPIC_API_KEY` | Required - Claude AI | console.anthropic.com |
| `OPENAI_API_KEY` | Recommended - Whisper STT (default, more accurate) | platform.openai.com/api-keys |
| `GEMINI_API_KEY` | AI image generation (optional) | aistudio.google.com/apikey |
The most protective approach is "All Rights Reserved" with a custom license that explicitly states:
Copyright (c) 2024-2026 Robert A Vance, Jr - a.k.a. sealmindset. All Rights Reserved.
This software and associated documentation files (the "Software") are the exclusive property of [Your Name].
NO PERMISSION is granted to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software without explicit written permission from the copyright holder.
Commercial use requires a separate licensing agreement and royalty payments. Contact: sealmindset@gmail.com, ravance@gmail.com
- Anthropic - Claude API and multi-agent architecture inspiration
- TalkingHead - WebGL avatar rendering
- Kokoro - Browser-based TTS via WebGPU
- Original PULSE project - Foundation and design patterns
Built with Claude on Apple Silicon