A modular, extensible AI voice assistant with personality, inspired by Iron Man's Jarvis.
- Voice Interaction: Wake word detection → Speech-to-Text → LLM → Text-to-Speech
- Configurable Personality: Adjustable sarcasm, wit, formality, and warmth levels
- Pluggable Architecture: Easily swap TTS, STT, and LLM providers
- Smart Home Ready: Extensible workflow system for home automation
- Cross-Platform: Works on macOS, Linux, and Windows
```bash
cd jarvis-assistant
pip install -r requirements.txt
```

Create a `.env` file or export environment variables:

```bash
# Required
export ANTHROPIC_API_KEY="your-anthropic-api-key"

# For ElevenLabs TTS (recommended)
export ELEVENLABS_API_KEY="your-elevenlabs-api-key"

# For wake word detection (free at picovoice.ai)
export PORCUPINE_ACCESS_KEY="your-porcupine-access-key"

# Optional - Home Assistant integration
export HASS_URL="http://homeassistant.local:8123"
export HASS_TOKEN="your-long-lived-access-token"
```
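If you go the `.env` route, `python-dotenv` is the usual choice; for illustration, a stdlib-only loader for the `export KEY="value"` format above might look like this (hypothetical helper, not part of this repo):

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE or 'export KEY=VALUE' lines, '#' comments."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if sep:
            # Don't clobber variables already set in the shell
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```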
```bash
# Normal operation (wake word + voice)
python main.py

# Debug mode (shows processing details)
python main.py --debug

# Keyboard activation (no wake word needed)
python main.py --keyboard

# Test with text (no voice required)
python main.py --test "What's the weather like?"
```

Edit `config/settings.py` or modify at runtime:
```python
from config import PersonalityConfig, SarcasmLevel, FormalityLevel, WarmthLevel

personality = PersonalityConfig(
    name="Jarvis",
    user_title="sir",
    sarcasm_level=SarcasmLevel.MODERATE,    # NONE, LIGHT, MODERATE, HEAVY, MAXIMUM
    formality_level=FormalityLevel.BUTLER,  # CASUAL, FRIENDLY, PROFESSIONAL, FORMAL, BUTLER
    warmth_level=WarmthLevel.WARM,          # COLD, NEUTRAL, WARM, AFFECTIONATE
    wit_enabled=True,
    self_aware_ai_jokes=True,
    use_british_vocabulary=True,
)
```
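Settings like these are ultimately rendered into the LLM system prompt (the real mapping lives in `llm/providers.py`). A simplified, self-contained sketch of the idea, with illustrative stand-ins for the config enums and entirely hypothetical wording:

```python
from enum import Enum


# Illustrative stand-in for the real SarcasmLevel enum in config/settings.py
class SarcasmLevel(Enum):
    NONE = 0
    LIGHT = 1
    MODERATE = 2
    HEAVY = 3
    MAXIMUM = 4


# Hypothetical tone hints per sarcasm level
SARCASM_HINTS = {
    SarcasmLevel.NONE: "Answer plainly, with no sarcasm.",
    SarcasmLevel.MODERATE: "Use dry, understated sarcasm where it fits.",
    SarcasmLevel.MAXIMUM: "Be relentlessly, theatrically sarcastic.",
}


def system_prompt(name: str, user_title: str, sarcasm: SarcasmLevel) -> str:
    """Render a few personality settings into an LLM system prompt."""
    hint = SARCASM_HINTS.get(sarcasm, "Use light sarcasm sparingly.")
    return (f"You are {name}, a refined AI butler. "
            f"Address the user as '{user_title}'. {hint}")
```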
TTS Providers:

```python
from config import TTSConfig

# ElevenLabs (best quality)
tts_config = TTSConfig(provider="elevenlabs")

# OpenAI TTS
tts_config = TTSConfig(provider="openai")

# Piper (local, free)
tts_config = TTSConfig(provider="piper")

# System TTS (no setup required)
tts_config = TTSConfig(provider="system")
```
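The troubleshooting section notes that TTS falls back to the system voice automatically. One way to express that behavior is a chain that tries providers in order; a sketch with hypothetical provider classes (the repo's actual fallback logic lives in `tts/providers.py`):

```python
class TTSError(Exception):
    """Raised when a provider cannot synthesize speech."""


class FallbackTTS:
    """Try each provider in order; return audio from the first that succeeds."""

    def __init__(self, providers):
        self.providers = providers

    def synthesize(self, text: str) -> bytes:
        errors = []
        for provider in self.providers:
            try:
                return provider.synthesize(text)
            except TTSError as exc:
                errors.append(exc)
        raise TTSError(f"all providers failed: {errors}")


# Stub providers for illustration only
class FailingTTS:
    def synthesize(self, text):
        raise TTSError("no API key")


class SystemTTS:
    def synthesize(self, text):
        return b"wav-bytes"


audio = FallbackTTS([FailingTTS(), SystemTTS()]).synthesize("Hello")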
STT Providers:

```python
from config import STTConfig

# Whisper (local)
stt_config = STTConfig(provider="whisper", whisper_model="base")

# Whisper API (cloud)
stt_config = STTConfig(provider="whisper_api")

# Vosk (local, lightweight)
stt_config = STTConfig(provider="vosk")

# Deepgram (cloud, fast)
stt_config = STTConfig(provider="deepgram")
```
LLM Providers:

```python
from config import LLMConfig

# Anthropic Claude (recommended)
llm_config = LLMConfig(provider="anthropic")

# OpenAI GPT
llm_config = LLMConfig(provider="openai")

# Ollama (local)
llm_config = LLMConfig(provider="ollama", ollama_model="llama3.1")
```
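All three provider families follow the same swap-by-name pattern. One common way to implement it is a small registry keyed by the provider string; the following is a sketch of that design, not the repo's actual code (names hypothetical):

```python
from typing import Callable, Dict

_LLM_PROVIDERS: Dict[str, Callable[..., object]] = {}


def register_llm(name: str):
    """Class decorator that registers an LLM provider under a string key."""
    def wrap(cls):
        _LLM_PROVIDERS[name] = cls
        return cls
    return wrap


def create_llm(name: str, **kwargs):
    """Instantiate a provider by its string key, as LLMConfig(provider=...) would."""
    try:
        return _LLM_PROVIDERS[name](**kwargs)
    except KeyError:
        raise ValueError(f"unknown LLM provider: {name!r}") from None


@register_llm("ollama")
class OllamaLLM:
    def __init__(self, ollama_model: str = "llama3.1"):
        self.model = ollama_model


llm = create_llm("ollama", ollama_model="llama3.1")
```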
The workflow system allows you to add new capabilities. Here's how to add a custom doorbell integration:

```python
# workflows/my_doorbell.py
from workflows.base import Workflow, WorkflowResult, WorkflowStatus, WorkflowTrigger


class MyDoorbellWorkflow(Workflow):
    def __init__(self, doorbell_api):
        self.api = doorbell_api

    @property
    def name(self) -> str:
        return "my_doorbell"

    @property
    def description(self) -> str:
        return "Check doorbell camera and control door lock"

    @property
    def trigger(self) -> WorkflowTrigger:
        return WorkflowTrigger(
            keywords=["door", "doorbell", "visitor", "lock"],
            patterns=[r"who.*(at|the) door", r"(lock|unlock)"],
            examples=["Who's at the door?", "Lock the front door"],
        )

    async def execute(self, intent: str, entities: dict) -> WorkflowResult:
        action = entities.get("action", "check")
        if action == "check":
            # Call your doorbell API
            snapshot = await self.api.get_snapshot()
            return WorkflowResult(
                status=WorkflowStatus.SUCCESS,
                message="I'm checking the door camera now, sir.",
                data={"snapshot": snapshot},
            )
        elif action == "unlock":
            await self.api.unlock_door()
            return WorkflowResult(
                status=WorkflowStatus.SUCCESS,
                message="I've unlocked the door, sir. Do exercise caution.",
            )
        return WorkflowResult(
            status=WorkflowStatus.SUCCESS,
            message="Door action completed, sir.",
        )
```

```python
# In main.py or your setup code
from workflows.my_doorbell import MyDoorbellWorkflow

workflow_manager = create_default_workflow_manager()
workflow_manager.register(MyDoorbellWorkflow(my_doorbell_api))
assistant = VoiceAssistant(config, workflow_manager)
```
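You can smoke-test a workflow's logic without the voice stack by driving the `execute` path with a stub API. The sketch below is self-contained, so it uses stand-ins for the classes in `workflows/base.py` and a hypothetical fake doorbell API:

```python
import asyncio


# Stand-ins for the classes in workflows/base.py, so this sketch runs alone
class WorkflowStatus:
    SUCCESS = "success"


class WorkflowResult:
    def __init__(self, status, message, data=None):
        self.status, self.message, self.data = status, message, data


class FakeDoorbellAPI:
    """Stub doorbell API returning canned data (hypothetical)."""
    async def get_snapshot(self):
        return b"jpeg-bytes"


async def check_door(api):
    # Mirrors the "check" branch of MyDoorbellWorkflow.execute
    snapshot = await api.get_snapshot()
    return WorkflowResult(
        status=WorkflowStatus.SUCCESS,
        message="I'm checking the door camera now, sir.",
        data={"snapshot": snapshot},
    )


result = asyncio.run(check_door(FakeDoorbellAPI()))
```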
```
jarvis-assistant/
├── main.py               # Entry point
├── requirements.txt      # Dependencies
├── config/
│   ├── __init__.py
│   └── settings.py       # All configuration dataclasses
├── core/
│   ├── __init__.py
│   └── assistant.py      # Main VoiceAssistant class
├── tts/
│   ├── __init__.py
│   └── providers.py      # TTS provider implementations
├── stt/
│   ├── __init__.py
│   └── providers.py      # STT provider implementations
├── llm/
│   ├── __init__.py
│   └── providers.py      # LLM providers + personality prompts
├── workflows/
│   ├── __init__.py
│   ├── base.py           # Workflow base classes + examples
│   └── home_assistant.py # Home Assistant integration
└── utils/
    ├── __init__.py
    ├── audio.py          # Audio recording/playback
    └── wakeword.py       # Wake word detection
```
Microphone:
- Best: ReSpeaker USB Mic Array v2.0 (~$70) - far-field, LED ring
- Budget: Anker PowerConf S3 (~$80) - mic + speaker combo
- Testing: Any USB microphone

Speakers:
- Best: Audioengine A2+ (~$270) - excellent quality
- Budget: Creative Pebble V3 (~$30) - USB powered

Computer:
- Works on any modern computer
- Mac Mini M4 recommended for always-on use
- Raspberry Pi 5 works for cloud-based LLM setups
Audio device errors:

```bash
pip install sounddevice

# On Linux, you may also need:
sudo apt-get install libportaudio2
```

Wake word not detected:
- Check that `PORCUPINE_ACCESS_KEY` is set
- Use the `--keyboard` flag to test without the wake word
- Verify the microphone with `python main.py --list-devices`

No speech output:
- Check that `ELEVENLABS_API_KEY` is set
- TTS falls back to the system voice automatically
- Test with `python main.py --test "Hello world"`

Choppy or stuttering audio:
- Increase the buffer size in `utils/audio.py`
- Try different sample rates
- Check CPU usage
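Choppy playback usually means each audio callback's deadline is being missed; a larger buffer gives the callback more time, at the cost of latency. The relationship is simple arithmetic (a hypothetical helper, not code from the repo):

```python
def buffer_ms(frames: int, sample_rate: int) -> float:
    """Duration of one audio buffer in milliseconds."""
    return 1000.0 * frames / sample_rate


# A 1024-frame buffer at 16 kHz gives the callback 64 ms per chunk;
# doubling the buffer to 2048 frames doubles that headroom to 128 ms.
```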
MIT License - feel free to use and modify for personal projects.
This is a personal project template. Feel free to fork and customize!
"At your service, sir." β Jarvis