A standalone, local-first runtime for autonomous AI agents. Memory, multi-channel messaging, real-time voice, phone calls, and on-device language models — all running on your machine, with cloud as the exception, not the default.
⚠️ Apple Silicon only. GenosOS uses MLX for the voice pipeline (Qwen3-TTS + Qwen3-ASR), which runs exclusively on M-series chips. Intel Macs are not supported and will fail at the pip install step.
💾 Disk space: budget ~35 GB free — ~33 GB for the local model weights (downloaded on first run from HuggingFace), ~1 GB for JS deps + Chromium, ~1 GB headroom.
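A quick pre-flight check before you start (standard macOS tooling, nothing GenosOS-specific):

```bash
df -h /                        # free space on the system volume; you want ~35 GB headroom
du -sh ~/.genos 2>/dev/null    # size of any existing GenosOS data, if you've run it before
```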
⏱️ Total setup time: ~30 min (fast fiber) to ~2 hours (slow connection). The bulk is the model download in the first-run wizard.
- What is GenosOS
- Highlights
- Requirements
- Installation
- Verify the install
- First Run
- Troubleshooting
- Architecture Overview
- License
- Author
GenosOS is a server runtime that gives an LLM-driven agent the operating-system layer it needs to actually live in your workflow:
- Persistent semantic memory — every conversation contributes to a knowledge graph; relevant context is injected automatically on each turn, with zero prompt overhead for memory mechanics.
- Multi-channel presence — the same agent handles WhatsApp, Telegram, Discord, Slack, iMessage and browser chat, with channel-aware tool restrictions.
- Real-time voice — browser microphone and direct SIP phone calls, fully local pipeline (Qwen3-ASR + Qwen 3.6 35B-A3B + Qwen3-TTS), ~1-1.5s round-trip on cached turns.
- Multi-agent coordination — agent-to-agent delegation, multi-participant rooms with autonomous echo conversations, isolated per-agent memory and workspace.
- Capability-based runtime sandbox — every tool call validated against declarative policies; workspace confinement, allowlisted shell commands, secret redaction.
- Local-first by design — 5 local models cover chat, vision, voice and embeddings (~37 GB RAM). Cloud (Anthropic/OpenAI/Gemini) is opt-in via ⌘L.
- Encrypted memory at rest — AES-256-GCM derived from your own mnemonic, keys never leave the machine.
- Per-account isolation — each Ethereum address is an independent account with its own encrypted database, workspace, and channel sessions.
- WebAuthn + mnemonic identity — no passwords, no shared secrets, no cloud auth.
- Direct SIP trunk for phone calls — no Twilio, no Cloudflare tunnel, no vendor lock-in.
- Zero frameworks on the client — vanilla JS + DOM API + CSS. No build step required to develop the UI.
- Full-state backup engine — immutable manifests, per-account iCloud sync, standard tar.gz + SQLite (no proprietary formats).
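Because backups are plain tar.gz + SQLite, you can inspect them with standard tools. A minimal sketch (the archive name and database filename below are placeholders for illustration, not the actual layout GenosOS produces):

```bash
# List what a backup archive contains (filename is a placeholder)
tar -tzf genos-backup.tar.gz

# Peek into an extracted SQLite store with the stock sqlite3 CLI
# (database filename is an assumption for illustration)
sqlite3 memory.db '.tables'
```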
- Hardware: Apple Silicon Mac (M1/M2/M3/M4). 64 GB RAM recommended for the full local stack (32 GB works with one model loaded at a time).
- OS: macOS 14+ (Sonoma or newer).
- Disk: ~35 GB free (see warning above).
- Runtime: Bun >=1.2.0.
- Python: 3.11+ (for the Qwen3 MLX server — TTS + ASR).
- System packages via Homebrew:
  - llama.cpp — chat + embedding models
  - ffmpeg — audio conversion for messaging channels
  - cliclick — Computer Use tool (optional, only needed if you want the agent to control your desktop)
# 1. Clone
git clone https://github.com/estebanrfp/gos.git
cd gos
# 2. Install system dependencies (skip the ones you already have)
brew install bun llama.cpp ffmpeg cliclick python@3.11
# 3. Install JavaScript runtime dependencies
bun install
# 4. Install Python dependencies for the MLX TTS/ASR server
python3 -m pip install -r dist/qwen3-requirements.txt
# 5. Start GenosOS
bun start

While the server is running (after step 5), in another terminal:
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:4400
# Expected: 200

A 200 confirms the WebSocket server is up. If you see Connection refused, the server did not start — check the boot log printed by bun start.
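If you script this check, it helps to poll until the server answers rather than racing the boot sequence. A small sketch using the same curl probe as above:

```bash
# Wait up to ~30 seconds for GenosOS to answer on :4400
for i in $(seq 1 30); do
  code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:4400)
  [ "$code" = "200" ] && echo "GenosOS is up" && break
  sleep 1
done
```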
Open http://localhost:4400 in your browser. The setup wizard will:
- Ask for a mnemonic passphrase. This is a BIP39-style 12- or 24-word seed used to derive your encryption key. The key never leaves your machine. If you don't have one, the wizard can generate one — write it down, because losing it means losing all your encrypted data permanently.
- Pick an LLM provider for the (optional) cloud boost — Anthropic, OpenAI, or Gemini. You can skip this entirely and stay 100% local.
- Auto-download local GGUF models from HuggingFace (~33 GB total). Progress bars per model. Resumable if interrupted.
- Start llama-server for chat + embeddings and launch the Python MLX server for voice.
- Land you in your first agent's chat session. A short onboarding conversation will configure the agent's soul, identity, user context, and rules.
After the first run, every subsequent boot is fast (~5-10 seconds): models load once, channels reconnect automatically.
bun: command not found after brew install bun
→ Open a new terminal window so the shell picks up the new PATH entry.
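If a brand-new terminal still can't find bun, check that Homebrew's bin directory (which is /opt/homebrew/bin on Apple Silicon) is on your PATH:

```bash
echo "$PATH" | tr ':' '\n' | grep -i homebrew   # expect /opt/homebrew/bin
brew --prefix                                    # prints the Homebrew prefix
ls "$(brew --prefix)/bin/bun"                    # confirms the binary was linked
```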
Failed to start server. Is port 4400 in use?
→ Another process is bound to :4400. Find it with lsof -i :4400 and stop it, or pass PORT=4500 bun start to use a different port.
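Put together, one way to free the port or work around it:

```bash
lsof -i :4400        # identify the process holding the port
kill <PID>           # stop it, if it's safe to do so (PID from the lsof output)

# ...or just run GenosOS on another port instead:
PORT=4500 bun start
```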
pip install fails with error: externally-managed-environment on macOS 14+
→ Use a virtual environment: python3 -m venv .venv && source .venv/bin/activate && python3 -m pip install -r dist/qwen3-requirements.txt. After this, always activate the venv before running bun start so the MLX server can find its packages.
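The same fix broken out into separate steps, using the requirements file path from the install section:

```bash
python3 -m venv .venv                                    # create a project-local virtualenv
source .venv/bin/activate                                # activate it for this shell
python3 -m pip install -r dist/qwen3-requirements.txt    # install the MLX TTS/ASR deps

# From then on, activate the venv before each launch:
source .venv/bin/activate && bun start
```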
mlx install fails with architecture errors
→ You are on an Intel Mac. GenosOS does not support Intel — only Apple Silicon (M1/M2/M3/M4).
HuggingFace download stalls or fails
→ Check your network. Downloads are resumable: stop GenosOS and restart it, the wizard picks up where it left off via HTTP Range requests and .part temp files.
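For reference, this is the same resume mechanism curl exposes with -C -. A generic illustration only: the URL below is a placeholder, not the actual model location the wizard uses.

```bash
# Resume a partial download from wherever the local file left off
curl -L -C - -o model.gguf.part https://example.com/model.gguf
```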
Bundle boots but voice / channels don't work
→ Make sure llama-server is on your PATH (which llama-server should print a path under /opt/homebrew/). Check the boot log of bun start for [local] ... and [qwen3] ... lines confirming the model subprocesses started.
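A couple of hedged spot checks, using the ports from the architecture diagram below (the /health route is llama.cpp's built-in health endpoint; the Python MLX server may not expose an equivalent route, so the second probe only checks that something is listening):

```bash
which llama-server                                               # expect a path under /opt/homebrew/
curl -s http://localhost:8081/health                             # llama-server (chat model)
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8890   # qwen3-server.py (TTS/ASR)
```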
I forgot my mnemonic
→ There is no recovery. Your encrypted data is unrecoverable. Delete ~/.genos/ and start fresh, or restore from a backup if you made one (see the Backup feature in-app).
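If you do start over, consider moving the old data aside instead of deleting it outright, in case the mnemonic turns up later:

```bash
mv ~/.genos ~/.genos.bak-$(date +%Y%m%d)   # keep the old encrypted data around, just in case
bun start                                   # the setup wizard should run again and create a fresh ~/.genos/
```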
┌─────────────────────────────────────────────────────────────┐
│ Channels: WhatsApp · Telegram · Discord · Slack · iMessage │
│ Voice: Browser mic (Talk Local) · SIP phone calls │
│ UI: Browser chat at http://localhost:4400 │
└─────────────────────────────┬───────────────────────────────┘
│
┌───────────▼───────────┐
│ GenosOS Server │ Bun, single process,
│ :4400 (WS + HTTP) │ encrypted SQLite store
└───────────┬───────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ llama- │ │ qwen3- │ │ node- │
│ server │ │ server.py │ │ llama-cpp │
│ :8081 │ │ :8890 │ │ (embed) │
│ │ │ │ │ in-proc │
│ Qwen 3.6 │ │ Qwen3-TTS │ │ Qwen3-Emb │
│ 35B-A3B │ │ Qwen3-ASR │ │ │
└───────────┘ └───────────┘ └───────────┘
All intelligence, all encryption, all routing — runs locally. Cloud APIs (Anthropic, OpenAI, Gemini) are available as an opt-in boost (⌘L) but are not required for any feature.
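Since llama-server is an ordinary HTTP server speaking the llama.cpp API on :8081, you can talk to it directly when debugging. A minimal sketch against llama.cpp's completion endpoint; note this is not a GenosOS API and bypasses the agent's memory, channels, and tools entirely:

```bash
curl -s http://localhost:8081/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello", "n_predict": 32}'
```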
The minified production build in dist/ is free for personal and commercial use — integrate, distribute, resell. The source code is proprietary; reverse-engineering, decompilation, or modification of the bundle is not permitted.
See LICENSE for the full terms.
Esteban Fuster Pozzi (@estebanrfp) — Full Stack JavaScript Developer