A fully offline, voice-activated AI assistant with a British personality. You talk, it listens, it responds. No cloud, no API keys, no internet required.
It started as a Whisper experiment. It got out of hand.
- 🎤 Hold `SPACE` to talk, release to process
- 🧠 Whisper transcribes your voice locally
- 💬 Mistral 7B (via Ollama) generates a response
- 🔊 Edge TTS reads it back in a British accent
- 🖥️ Arc Reactor desktop UI built in Svelte + Tauri
Everything runs on your machine. No subscriptions. No data leaving your laptop.
I wanted to learn Whisper. That was it. That was the whole plan.
Then I got curious about TTS voices. Then local LLMs. Then I thought it'd be funny to make it sound like Jarvis from Iron Man. Then I figured if I'd already spent this long on it, I may as well give it a proper UI. (UI was all Claude)
It's not trying to be anything groundbreaking. It's a personal project I built to use in places without reliable Wi-Fi — libraries, cafés, airports — when I just want to ask something quickly and get a sensible answer back.
Three TTS libraries. Days of environment conflicts. Zero working voices I actually liked.
- Piper — never got it running
- Coqui TTS — ran fine, sounded rough
- XTTS v2 — version hell, never resolved
Eventually switched to Edge TTS. Took ten minutes. Sounded great. Lesson learned.
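For contrast with the days lost above, the whole Edge TTS call fits in a few lines. A minimal sketch (the `en-GB-RyanNeural` voice name is an assumption; any `en-GB` voice from `edge-tts --list-voices` works):

```python
import asyncio

VOICE = "en-GB-RyanNeural"  # assumed voice; pick whichever British voice you prefer

async def speak(text: str, path: str = "reply.mp3") -> str:
    import edge_tts  # pip install edge-tts
    await edge_tts.Communicate(text, VOICE).save(path)
    return path

# asyncio.run(speak("Good evening, sir."))  # writes reply.mp3
```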
System dependencies

Python

```
pip install -r requirements.txt
```

```
# 1. Pull the model
ollama pull mistral:7b

# 2. Install Python deps
pip install -r requirements.txt

# 3. Install frontend deps
cd jarvis-ui && npm install
```

Two terminals. Both need to be open.
```
# Terminal 1
python jarvis_server.py

# Terminal 2
cd jarvis-ui && npm run tauri dev
```

Hold `SPACE` to speak. Release to process. `ESC` to quit.
First run of `npm run tauri dev` takes 5–10 minutes while Rust compiles. Every run after that is fast.
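Once both terminals are up, the server talks to Ollama over its local HTTP API. A minimal stdlib-only sketch of that call (the endpoint and payload follow Ollama's documented `/api/generate` defaults; the helper names are mine, not the project's):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

def build_payload(prompt: str, model: str = "mistral:7b") -> dict:
    # stream=False returns one JSON object instead of a line-delimited stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask_mistral(prompt: str) -> str:
    body = json.dumps(build_payload(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

No API key, no SDK; as long as `ollama` is running, it's one POST per question.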
```
SPACE held     → mic records via sounddevice
SPACE released → Whisper transcribes (low-confidence audio gets discarded)
               → Mistral generates a short response via Ollama
               → Edge TTS speaks it back
               → WebSocket pushes state + transcript to the UI in real time
```
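The "low-confidence audio gets discarded" step leans on the per-segment fields Whisper already returns (`avg_logprob` and `no_speech_prob`). A hypothetical gate with made-up thresholds, not the project's actual numbers:

```python
def should_discard(segments, min_avg_logprob=-1.0, max_no_speech=0.6):
    """Decide whether a Whisper transcription is too shaky to act on.

    `segments` is the result["segments"] list from whisper's transcribe();
    both thresholds here are illustrative guesses.
    """
    if not segments:
        return True  # nothing transcribed at all
    worst_logprob = min(s["avg_logprob"] for s in segments)
    most_speechless = max(s["no_speech_prob"] for s in segments)
    return worst_logprob < min_avg_logprob or most_speechless > max_no_speech
```

Dropping bad audio here is what keeps the assistant from answering a cough.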
- No internet awareness — Mistral's knowledge cuts off early 2024
- Responses are capped short by design — not ideal for deep questions
- No memory between sessions — it forgets everything the moment you close it
- macOS only for now — uses `afplay` for audio; swap it for `mpg123` on Linux
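Porting off macOS mostly means swapping the playback command. A small sketch of that swap (`mpg123 -q` on Linux is a suggestion; the function names are mine):

```python
import subprocess
import sys

def player_cmd(path: str, platform: str = sys.platform) -> list[str]:
    # afplay ships with macOS; mpg123 must be installed separately on Linux
    if platform == "darwin":
        return ["afplay", path]
    return ["mpg123", "-q", path]  # -q suppresses mpg123's banner output

def play(path: str) -> None:
    subprocess.run(player_cmd(path), check=True)
```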
MIT. Take it, break it, improve it.