
sealmindset/aria


ARIA

Autonomous Responsive Interactive Assistant

ARIA is an AI-powered personal assistant with a 3D avatar that speaks to you. Built on Claude's multi-agent architecture, ARIA combines intelligent conversation routing, real-time voice synthesis, and an expressive avatar to create a natural, interactive AI experience.

    █████╗ ██████╗ ██╗ █████╗
   ██╔══██╗██╔══██╗██║██╔══██╗
   ███████║██████╔╝██║███████║
   ██╔══██║██╔══██╗██║██╔══██║
   ██║  ██║██║  ██║██║██║  ██║
   ╚═╝  ╚═╝╚═╝  ╚═╝╚═╝╚═╝  ╚═╝

What is ARIA?

ARIA is a voice-first AI assistant designed to feel like a conversation with a real person. When you speak to ARIA:

  1. Your voice is transcribed in real-time
  2. An intelligent Orchestrator routes your request to the right specialized agent
  3. A specialized Agent (Code, Teach, Research, etc.) generates a response
  4. ARIA speaks the response through a 3D animated avatar

Unlike text-only chatbots, ARIA is designed for spoken interaction. Responses are concise (2-4 sentences), conversational, and optimized for speech synthesis.

Key Features

  • Multi-Agent Architecture - Specialized agents for coding, teaching, research, and more
  • 3D Animated Avatar - TalkingHead WebGL avatar with mood-based expressions
  • Browser-Based TTS - HeadTTS/Kokoro voice synthesis runs locally in your browser via WebGPU
  • Conversation Memory - Cache Augmented Generation (CAG) maintains context across turns
  • Conversation Mixer - Adjust verbosity, creativity, and response style in real-time
  • Cost-Optimized - Smart model tiering (Haiku for routing, Sonnet for tasks, Opus for complex reasoning)
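To make the Conversation Memory feature concrete, here is a minimal sketch of a rolling turn cache. It is not ARIA's actual CAG implementation; the class name `ConversationCache`, the `max_turns` window, and the eviction policy are all assumptions chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ConversationCache:
    """Hypothetical rolling cache of recent turns (not ARIA's real CAG code)."""

    max_turns: int = 10  # assumed window size
    _turns: deque = field(default_factory=deque)

    def add(self, role: str, content: str) -> None:
        # Store one turn, then drop the oldest turns once the window is full.
        self._turns.append({"role": role, "content": content})
        while len(self._turns) > self.max_turns:
            self._turns.popleft()

    def as_messages(self) -> list[dict]:
        # Return the cached turns as role/content dictionaries, oldest first.
        return list(self._turns)

    def reset(self) -> None:
        # Forget the whole conversation.
        self._turns.clear()


cache = ConversationCache(max_turns=2)
cache.add("user", "What is WebGPU?")
cache.add("assistant", "A browser API for GPU compute and graphics.")
cache.add("user", "Does Safari support it?")
print([m["content"] for m in cache.as_messages()])
```

With a window of two turns, the first question has already been evicted by the time the third turn is added. A real cache would likely also budget by tokens rather than by turn count.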

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                              BROWSER                                     │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐                    │
│  │  TalkingHead │   │   HeadTTS   │   │   Whisper   │                    │
│  │  3D Avatar   │   │   Kokoro    │   │    STT      │                    │
│  │   (WebGL)    │   │  (WebGPU)   │   │  (OpenAI)   │                    │
│  └──────┬───────┘   └──────┬──────┘   └──────┬──────┘                    │
│         │                  │                  │                          │
│         └──────────────────┼──────────────────┘                          │
│                            │ WebSocket                                   │
└────────────────────────────┼─────────────────────────────────────────────┘
                             │
┌────────────────────────────┼─────────────────────────────────────────────┐
│                         SERVER                                            │
│                            ▼                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐ │
│  │                     ORCHESTRATOR (Haiku)                             │ │
│  │              Routes requests to specialized agents                   │ │
│  └─────────────────────────────┬───────────────────────────────────────┘ │
│                                │                                          │
│    ┌───────────┬───────────┬───┴───┬───────────┬───────────┬───────────┐ │
│    ▼           ▼           ▼       ▼           ▼           ▼           │ │
│ ┌──────┐  ┌────────┐  ┌────────┐ ┌──────┐  ┌────────┐  ┌────────┐      │ │
│ │CONVERSE│  │ TEACH  │  │  CODE  │ │RESEARCH│ │ ASSIST │  │  DEMO  │    │ │
│ │Sonnet │  │ Sonnet │  │Sonnet/O│ │Sonnet/O│ │ Sonnet │  │ Sonnet │    │ │
│ └──────┘  └────────┘  └────────┘ └────────┘ └────────┘  └────────┘      │ │
│                                │                                          │
│                    ┌───────────┴───────────┐                             │
│                    │  CAG Conversation Cache │                            │
│                    └───────────┬───────────┘                             │
│                                │                                          │
│                    ┌───────────┴───────────┐                             │
│                    │      Claude API       │                             │
│                    │      (Anthropic)       │                             │
│                    └───────────────────────┘                             │
└───────────────────────────────────────────────────────────────────────────┘

Agents

| Agent | Model | Purpose |
|---|---|---|
| Orchestrator | Haiku | Fast routing, intent classification |
| Converse | Sonnet | General conversation, chitchat |
| Teach | Sonnet | Educational explanations with analogies |
| Code | Sonnet → Opus | Programming help, code review |
| Research | Sonnet → Opus | Deep analysis, fact-finding |
| Assist | Sonnet | Task planning, productivity |
| Demo | Sonnet | Interactive demonstrations |

The Code and Research agents start on Sonnet and automatically escalate to Opus for complex tasks.
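The tiering in the agent table can be sketched as a small lookup. This is an illustration only: the `pick_tier` function and its `is_complex` flag are hypothetical, and how ARIA actually decides that a task is complex is not described here.

```python
from enum import Enum


class ModelTier(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


# Agents the table lists as "Sonnet → Opus".
ESCALATING_AGENTS = {"code", "research"}


def pick_tier(agent: str, is_complex: bool = False) -> ModelTier:
    """Choose a model tier for an agent; `is_complex` is a hypothetical flag."""
    name = agent.lower()
    if name == "orchestrator":
        return ModelTier.HAIKU  # routing stays on the fast, cheap tier
    if name in ESCALATING_AGENTS and is_complex:
        return ModelTier.OPUS  # only Code and Research escalate
    return ModelTier.SONNET


print(pick_tier("Code", is_complex=True).name)
print(pick_tier("Teach", is_complex=True).name)
```

Keeping the escalation rule in one function makes the cost trade-off easy to audit: every path that can reach Opus is visible in a single place.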


Requirements

  • macOS on Apple Silicon (M1/M2/M3/M4) - optimized for native performance
  • Python 3.11+
  • Anthropic API Key - Get one at console.anthropic.com
  • Modern Browser - Chrome, Edge, or Safari with WebGPU support

Note: TTS runs entirely in your browser via WebGPU. No additional audio dependencies required on the server.


Installation

1. Clone the Repository

git clone https://github.com/sealmindset/aria.git
cd aria

2. Create Virtual Environment

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

3. Configure Environment

cp .env.example .env

Edit .env and add your Anthropic API key:

ANTHROPIC_API_KEY=sk-ant-api03-your-key-here
ARIA_HOST=0.0.0.0
ARIA_PORT=8080
LOG_LEVEL=INFO

4. (Optional) Generate SSL Certificates

For microphone access, browsers require HTTPS. Generate self-signed certificates:

mkdir -p certs
openssl req -x509 -newkey rsa:4096 -keyout certs/key.pem -out certs/cert.pem -days 365 -nodes -subj "/CN=localhost"

Running ARIA

Quick Start

./run.sh

Then open your browser to http://localhost:8080 (with --ssl, use https:// instead).

Startup Options

./run.sh              # Standard startup
./run.sh --debug      # Enable debug logging
./run.sh --reload     # Auto-reload on code changes
./run.sh --ssl        # HTTPS mode (required for microphone)

Manual Startup

source venv/bin/activate
python -m src.main --port 8080

First Run Notes

  • TTS Model Download: The first time you load ARIA, the browser will download the Kokoro TTS model (~80MB). This may take 30-60 seconds.
  • Microphone Access: To use voice input, you must run with --ssl and accept the self-signed certificate in your browser.

Usage

Speaking to ARIA

  1. Click the microphone button or press Space to start speaking
  2. Speak naturally - ARIA waits for a 1.2-second pause before responding
  3. ARIA's response appears in the chat and is spoken by the avatar
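The pause rule in step 2 amounts to end-of-utterance detection. The sketch below shows one way to express it, assuming a stream of `(timestamp, is_speech)` frames from some voice-activity detector; that frame format and the function name `utterance_end` are hypothetical, not ARIA's actual interface.

```python
SILENCE_SECONDS = 1.2  # pause length quoted in step 2


def utterance_end(frames, silence_seconds=SILENCE_SECONDS):
    """Return the timestamp at which the speaker is considered finished.

    `frames` is an iterable of (timestamp_seconds, is_speech) pairs in time
    order, a hypothetical stand-in for voice-activity-detector output.
    Returns None if no sufficiently long pause follows any speech.
    """
    last_speech = None
    for timestamp, is_speech in frames:
        if is_speech:
            last_speech = timestamp
        elif last_speech is not None and timestamp - last_speech >= silence_seconds:
            return timestamp
    return None


frames = [(0.0, False), (0.5, True), (1.0, True), (1.5, False), (2.0, False), (2.5, False)]
print(utterance_end(frames))  # 2.5
```

Silence before the first spoken frame is ignored, so the timer only starts once the user has actually said something.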

Conversation Mixer

The equalizer-style mixer panel lets you adjust ARIA's responses in real-time:

| Control | Range | Effect |
|---|---|---|
| Speed | 0.5x - 2x | TTS playback speed |
| Pause | 0 - 4s | Delay before responding |
| Verbosity | 1 - 5 | Response length (brief ↔ detailed) |
| Creativity | 1 - 5 | Response variation (precise ↔ creative) |
| Formality | 1 - 5 | Tone (casual ↔ professional) |
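A receiver of mixer values would typically clamp them to the ranges listed above. Here is a minimal sketch under that assumption; the key names follow the `mixer_settings` message in the API Reference, while `clamp_settings` itself is a hypothetical helper rather than part of ARIA.

```python
# (min, max) per control, taken from the mixer table.
RANGES = {
    "speed": (0.5, 2.0),
    "pauseTime": (0.0, 4.0),
    "verbosity": (1, 5),
    "creativity": (1, 5),
    "formality": (1, 5),
}


def clamp_settings(settings: dict) -> dict:
    """Clamp known controls into range; pass unknown keys through unchanged."""
    cleaned = dict(settings)
    for key, (low, high) in RANGES.items():
        if key in cleaned:
            cleaned[key] = max(low, min(high, cleaned[key]))
    return cleaned


print(clamp_settings({"speed": 3.0, "verbosity": 0, "mode": "quick"}))
```

Out-of-range values are pulled to the nearest bound instead of being rejected, which keeps a slider dragged past its limit from producing an error.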

Mode Presets

  • Quick - Brief, to-the-point answers
  • Explain - Balanced educational responses
  • Ideate - Creative brainstorming mode
  • Deep - Thorough, detailed analysis

Agent Routing

ARIA automatically routes your requests:

  • "Hey, how's it going?" → Converse (chitchat)
  • "Explain how neural networks work" → Teach (education)
  • "Write a Python function to..." → Code (programming)
  • "What are the pros and cons of..." → Research (analysis)
  • "Help me plan my project" → Assist (task planning)
  • "Show me what you can do" → Demo (demonstration)
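To illustrate the mapping above, here is a keyword heuristic that reproduces these six examples. It is a deliberately simple stand-in: ARIA's Orchestrator classifies intent with a Claude model, not with keyword matching, and the keyword lists below are assumptions.

```python
# Keyword hints per agent; these phrases are illustrative assumptions.
ROUTES = [
    ("Code", ("write a", "function", "bug", "code")),
    ("Teach", ("explain", "how does", "teach me")),
    ("Research", ("pros and cons", "compare", "analyze")),
    ("Assist", ("plan", "schedule", "organize")),
    ("Demo", ("show me", "demo")),
]


def route(message: str) -> str:
    """Return the first agent whose keywords match, else fall back to Converse."""
    text = message.lower()
    for agent, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return agent
    return "Converse"


for example in ("Hey, how's it going?", "Help me plan my project"):
    print(example, "->", route(example))
```

A heuristic like this can also serve as a cheap fallback or a test fixture when the model-based router is unavailable.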

Project Structure

aria/
├── src/
│   ├── main.py              # Entry point
│   ├── agents/              # Agent implementations
│   │   ├── base.py          # Base agent class
│   │   ├── orchestrator.py  # Routing agent
│   │   ├── converse.py      # General conversation
│   │   ├── teach.py         # Educational explanations
│   │   ├── code.py          # Programming assistance
│   │   └── specialized.py   # Research, Assist, Demo
│   ├── core/
│   │   ├── aria.py          # Main orchestrator
│   │   └── cache/           # CAG conversation cache
│   └── api/
│       └── websocket.py     # WebSocket & REST API
│
├── frontend/
│   └── index.html           # Single-page web application
│
├── certs/                   # SSL certificates (optional)
├── docs/                    # Documentation & presentations
├── run.sh                   # Startup script
├── requirements.txt         # Python dependencies
├── .env.example             # Environment template
└── CLAUDE.md                # Project instructions

API Reference

WebSocket Endpoint

ws://localhost:8080/ws/chat

Client → Server Messages

// Send a message
{ "type": "message", "content": "Hello ARIA" }

// Reset conversation
{ "type": "reset" }

// Get statistics
{ "type": "stats" }

// Update mixer settings
{
  "type": "mixer_settings",
  "settings": {
    "speed": 1.0,
    "pauseTime": 1,
    "verbosity": 3,
    "creativity": 3,
    "formality": 2,
    "mode": "quick"
  }
}
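A client can build these frames with small helpers before sending them over the socket. The sketch below only constructs the JSON text; the helper names are hypothetical, and the WebSocket transport is left out.

```python
import json


def chat_message(content: str) -> str:
    """Serialize a user message frame."""
    return json.dumps({"type": "message", "content": content})


def control_message(kind: str) -> str:
    """Serialize a frame with no payload, such as 'reset' or 'stats'."""
    return json.dumps({"type": kind})


def mixer_message(**settings) -> str:
    """Serialize a mixer update; keyword names become the settings keys."""
    return json.dumps({"type": "mixer_settings", "settings": settings})


print(chat_message("Hello ARIA"))
print(mixer_message(speed=1.0, verbosity=3, mode="quick"))
```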

Server → Client Messages

// Thinking indicator
{ "type": "thinking", "agent": "Orchestrator" }

// Response
{
  "type": "response",
  "content": "Hello! How can I help you today?",
  "agent": "Converse",
  "mood": "happy",
  "tokens": 42
}

// Error
{ "type": "error", "message": "..." }
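On the receiving side, a client can dispatch on the `type` field. This is a minimal sketch that formats each frame as one line of text; the function name and the output format are assumptions, and only the three message types documented above are handled.

```python
import json


def handle_server_message(raw: str) -> str:
    """Turn one server frame into a line of text for display."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "thinking":
        return f"[{msg.get('agent', '?')} is thinking...]"
    if kind == "response":
        return f"{msg['agent']} ({msg['mood']}): {msg['content']}"
    if kind == "error":
        return f"Error: {msg.get('message', 'unknown')}"
    # Anything else is reported, not silently dropped.
    return f"Unhandled message type: {kind!r}"


print(handle_server_message('{"type": "thinking", "agent": "Orchestrator"}'))
```

Returning a string for unknown types keeps the client tolerant of frames added in later versions.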

REST Endpoints

| Endpoint | Method | Description |
|---|---|---|
| / | GET | Serve main interface |
| /health | GET | Health check |
| /api/stats | GET | Global statistics |
| /api/presentations | GET | List available presentations |
| /api/presentation/{filename} | GET | Serve presentation PDF |

Configuration

Environment Variables

| Variable | Default | Description |
|---|---|---|
| ANTHROPIC_API_KEY | required | Your Anthropic API key |
| ARIA_HOST | 0.0.0.0 | Server bind address |
| ARIA_PORT | 8080 | Server port |
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
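The defaults above could be applied at startup roughly as follows. This is a sketch, not ARIA's own loader: `Settings` and `load_settings` are hypothetical names, and it assumes the values from .env have already been exported into the process environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    host: str
    port: int
    log_level: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("ANTHROPIC_API_KEY", "")
    if not api_key:
        # The only variable without a default, so fail early and clearly.
        raise RuntimeError("ANTHROPIC_API_KEY not set")
    return Settings(
        api_key=api_key,
        host=env.get("ARIA_HOST", "0.0.0.0"),
        port=int(env.get("ARIA_PORT", "8080")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


settings = load_settings({"ANTHROPIC_API_KEY": "sk-ant-example", "ARIA_PORT": "9000"})
print(settings.host, settings.port, settings.log_level)
```

Accepting a mapping as a parameter makes the loader easy to test without touching the real environment.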

Model Configuration

Models are configured in the agent classes:

from enum import Enum

class ModelTier(Enum):
    HAIKU = "claude-3-5-haiku-latest"   # Fast, cheap - routing
    SONNET = "claude-sonnet-4-20250514" # Balanced - most tasks
    OPUS = "claude-opus-4-20250514"     # Powerful - complex reasoning

Troubleshooting

"ANTHROPIC_API_KEY not set"

Ensure your .env file exists and contains a valid API key:

grep ANTHROPIC_API_KEY .env

No Sound / TTS Not Working

  1. Ensure you're using a WebGPU-compatible browser (Chrome 113+, Edge 113+)
  2. Check the browser console for WebGPU errors
  3. The first load downloads a ~80MB model - wait for it to complete

Microphone Not Working

  1. Run ARIA with --ssl flag: ./run.sh --ssl
  2. Accept the self-signed certificate warning in your browser
  3. Grant microphone permissions when prompted

Avatar Not Displaying

  1. Check browser console for WebGL errors
  2. Ensure hardware acceleration is enabled in your browser
  3. Try refreshing the page

High Latency

  1. Check your internet connection to Anthropic's API
  2. Use "Quick" mode in the mixer for faster responses
  3. Reduce verbosity slider to get shorter responses

Development

Running Tests

pytest tests/ -v

Debug Mode

./run.sh --debug

This enables:

  • Verbose logging
  • Detailed API request/response logs
  • WebSocket message tracing

Auto-Reload

./run.sh --reload

Automatically restarts the server when Python files change.


Technology Stack

| Component | Technology | Purpose |
|---|---|---|
| Backend | FastAPI + Uvicorn | WebSocket API server |
| AI Brain | Claude API (Anthropic) | Multi-agent conversation & reasoning |
| Image Gen | Gemini API (Google) | AI image generation (optional) |
| Image Gen | Local Stable Diffusion | Local image generation (optional) |
| Avatar | TalkingHead | WebGL 3D avatar |
| TTS | HeadTTS / Kokoro | Browser-based voice synthesis (WebGPU) |
| STT | OpenAI Whisper | High-accuracy transcription (default) |
| STT | Web Speech API | Browser-based speech recognition (fallback) |
| Frontend | Vanilla JS | Single-page application |

API Keys

| API Key | Purpose | Get it at |
|---|---|---|
| ANTHROPIC_API_KEY | Required - Claude AI | console.anthropic.com |
| OPENAI_API_KEY | Recommended - Whisper STT (default, more accurate) | platform.openai.com/api-keys |
| GEMINI_API_KEY | AI image generation (optional) | aistudio.google.com/apikey |

License

ARIA is distributed under a custom "All Rights Reserved" license:

Copyright (c) 2024-2026 Robert A Vance, Jr (a.k.a. sealmindset). All Rights Reserved.

This software and associated documentation files (the "Software") are the exclusive property of the copyright holder.

NO PERMISSION is granted to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software without explicit written permission from the copyright holder.

Commercial use requires a separate licensing agreement and royalty payments. Contact: sealmindset@gmail.com, ravance@gmail.com


Acknowledgments

  • Anthropic - Claude API and multi-agent architecture inspiration
  • TalkingHead - WebGL avatar rendering
  • Kokoro - Browser-based TTS via WebGPU
  • Original PULSE project - Foundation and design patterns

Built with Claude on Apple Silicon

About

ARIA, your AI assistant. I can teach you about AI, demonstrate applications, help with research, write code, or just chat.
