A multi-agent AI executive assistant with web UI, Telegram integration, voice calls, and autonomous browser automation. Built as a monorepo with a Fastify API server, Expo Web mobile app, and a Claude CLI proxy on Unraid.
- One Assistant, Many Functions -- The user talks to ONE executive assistant: Dwight. Internally, Dwight has 16 domain "functions" (calendar, finance, shopping, communications, relationships, travel, research, news, analyst, local, health, clerk, toolsmith, developer, customer_service, ceo/handler) plus 4 system agents for self-improvement (monitoring, qa_tester, moderator, developer_auto). The user never sees sub-agent names in chat — Dwight speaks in the first-person singular for every action. HONESTY_PROTOCOL rule #13 enforces this at the prompt level for every agent.
- Household Model -- Aide is a household OS, not a single-user tool. The `trusted_principals` table is the canonical list of people Dwight serves (account owner + authorized family). When any household member messages Dwight via voice/SMS/WhatsApp/Telegram, the webhook calls `identifyPrincipal()` to recognize them by phone/email/telegram id and threads the identity into the pipeline. Non-members are rejected.
- Voice Calls -- Twilio + Cartesia Sonic 3 outbound calling via `make_phone_call` tool. Inbound voice webhook personalizes the greeting per household member ("Hey Cristina, what can I help with?").
- SMS -- 6 agents granted `send_sms` (handler, relationships, communications, shopping, travel, local). Executor polls Twilio delivery status 3× and returns an honest error on US A2P 10DLC carrier blocks (error 30034) instead of hallucinating "Text sent".
- Relationship Graph -- 50+ contacts with visualization, proactive push reminders. Separate from the household table: relationships are people the household interacts with, not people the assistant serves.
- Proactive Push -- Automated notifications via chat and Telegram (birthday reminders, follow-ups)
- Multi-Turn Intent Classification -- Classifier uses conversation history for follow-up detection, contact-aware routing, verb-override rule for imperative actions
- Browser Automation -- Playwright/BlackTip with stealth mode, session isolation per agent run
- Self-Improving Agent System (Sprint 9) -- Nightly chain at 04:00 PST: Monitoring agent (Sonnet) grades recent conversations against an honesty rubric; QA tester runs 20 sandboxed synthetic tests against every agent with side-effect tools short-circuited; Moderator grades results; Developer agent (Opus 4.6) clusters findings + failures, reads PRD/CLAUDE.md/default-agents.ts, and proposes prompt fixes with real unified diffs. Proposals surface as home cards for owner approval; nothing auto-merges.
- v3 Tactical Play UI (2026-04-14) -- Expo Web mobile app redesigned to match the "Tactile Tactician" design system from `planning/mockups-v3/`. Cream surface, Inter 800/900, dual-shadow recipe, 28pt radii, per-agent gradient palette, Stitch-designed AgentIcon tiles with real MaterialIcons glyphs. All 5 screens (Home/Code/Talk/Intel/Settings) ported. Floating pill bottom nav.
- A2P 10DLC Landing Page -- https://aide-production-5c54.up.railway.app/aide is the Twilio Campaign Registry registration target. Describes the business, SMS use case, opt-in flow, sample messages, STOP/HELP keywords. Separate `/aide/privacy` and `/aide/terms` routes.
- 200+ E2E Tests -- Adversarial tests across shopping (120), relationships (90), parser (20), cross-agent scenarios, plus the nightly QA harness that grows the test bank over time
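The household lookup described in the feature list (recognize a sender by phone, email, or Telegram id, and reject everyone else) can be sketched as a pure function. This is only an illustration under stated assumptions: the `Principal` row shape, the `Sender` type, and the names `matchPrincipal` and `samePhone` are hypothetical, and the real `identifyPrincipal()` presumably queries the `trusted_principals` table rather than an in-memory array.

```typescript
// Hypothetical row shape; the real trusted_principals columns may differ.
interface Principal {
  name: string;
  isAccountOwner: boolean;
  phone?: string;
  email?: string;
  telegramId?: string;
}

// Identifiers a webhook might extract from an inbound message (assumed).
type Sender = { phone?: string; email?: string; telegramId?: string };

// Keep digits only so "+1 (555) 010-0001" and "15550100001" compare equal.
const normalizePhone = (p: string): string => p.replace(/\D/g, "");

// Two phone strings match only if they share a non-empty digit sequence.
const samePhone = (a: string, b: string): boolean => {
  const digits = normalizePhone(a);
  return digits.length > 0 && digits === normalizePhone(b);
};

// Returns the first household member matching any identifier, else null.
function matchPrincipal(rows: Principal[], sender: Sender): Principal | null {
  for (const row of rows) {
    if (sender.phone && row.phone && samePhone(sender.phone, row.phone)) return row;
    if (
      sender.email &&
      row.email &&
      sender.email.toLowerCase() === row.email.toLowerCase()
    ) {
      return row;
    }
    if (sender.telegramId && row.telegramId && sender.telegramId === row.telegramId) {
      return row;
    }
  }
  return null; // the caller treats null as "not a household member"
}
```

A webhook handler would call this once per inbound message and stop processing when the result is `null`. Note that this sketch does not reconcile numbers written with and without a country code; a real implementation would likely store phone numbers in a single canonical format.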
- Railway (cloud): API server + web UI at https://aide-production-5c54.up.railway.app
- Unraid (self-hosted): Claude CLI proxy at `claude-proxy.dwightassistant.com`
- Database: Neon PostgreSQL with pgvector
- Providers: Anthropic (via CLI proxy), Gemini 2.5 Flash, Groq (fallback)
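One way a provider list with a fallback could be wired is a simple ordered chain: try each provider in turn and move on when one throws or returns nothing. The sketch below is an assumption about how such a chain might look, not a description of this project's routing; the `Provider` interface and `completeWithFallback` are hypothetical names, and the actual order and selection rules may differ.

```typescript
// Hypothetical minimal provider interface; real clients would wrap an SDK or HTTP call.
type Provider = {
  name: string;
  complete: (prompt: string) => Promise<string>;
};

// Try providers in order; skip any that throw or return only whitespace.
async function completeWithFallback(
  providers: Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      const text = await p.complete(prompt);
      if (text.trim().length > 0) return { provider: p.name, text };
      errors.push(`${p.name}: empty response`);
    } catch (err) {
      errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  // Surface every failure so the caller can report what actually went wrong.
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Returning the provider name alongside the text lets the caller log which backend actually answered, which is useful when a fallback silently takes over.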
| Concern | Choice |
|---|---|
| Runtime | Node.js 22 LTS |
| API | Fastify 5 |
| ORM | Drizzle ORM |
| Database | Neon PostgreSQL 16 + pgvector |
| LLM (primary) | Anthropic Claude (Haiku/Sonnet via CLI proxy) |
| LLM (function calling) | Google Gemini 2.5 Flash |
| LLM (fallback) | Groq Llama 3.3 70B |
| STT | Deepgram Nova-3 |
| TTS | Cartesia Sonic 3 (voice calls), ElevenLabs (app) |
| Voice Calls | Twilio |
| Browser | Playwright + BlackTip stealth |
| Web App | Expo Web (React) |

    npx tsc --noEmit --project server/tsconfig.json   # Type check
    cd server && npx vitest run                       # All tests (200+)

Push to main. Railway auto-deploys, then triggers Unraid proxy update via webhook.

    git push origin main

- Phase: Active development, MVP features shipping
- One-Dwight persona principle (shipped 2026-04-14): chat is always with Dwight, sub-agents never named in user-facing prose, first-person-singular always. Enforced via HONESTY_PROTOCOL #13 + buildPersonalityFooter + 4 stripped UI leak sites.
- Household principals (shipped 2026-04-14): `trusted_principals` table migrated with telegram + account-owner columns, `identifyPrincipal()` helper wired into Twilio + Telegram webhooks, Settings → Household CRUD UI. Current rows: Enrico (owner) + Cristina (wife).
- v3 Tactical Play UI pass (shipped 2026-04-14): design tokens, 7 primitives, 17 Stitch-ported AgentIcon tiles, all 5 screens ported. Desktop maxWidth 560 cap.
- A2P 10DLC registration target (shipped 2026-04-14): `/aide`, `/aide/privacy`, `/aide/terms` routes in v3 styling. Ready to submit to Twilio Campaign Registry; the user's Twilio number `+16625478890` is currently carrier-blocked (error 30034) until registration completes.
- Critical fixes shipped 2026-04-14: Handler rerouted through runAgent() (was hallucinating tool execution); send_sms polls Twilio status + detects A2P blocks honestly; voice TTS dual guard against empty LLM output (was causing error 11200 call drops).
- All 16 domain agents + 4 system agents operational
- Grant-based tool system (65+ tool definitions across native, API, and MCP categories)
- Self-improving nightly chain live (Sprint 9): monitoring + QA + developer agent fire at 04:00 PST
- Outstanding gaps: no tests written for the 2026-04-14 arc (household, persona, v3 UI, send_sms A2P detection, voice TTS guard) — being backfilled now. Travel cheapest-flight engine not started. Developer v2 heartbeat wiring not landed. Phase 3 household (per-principal access scopes, memory namespacing, audit log) not started.
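Since tests for the `send_sms` A2P detection are listed as outstanding, a testable shape for the polling logic may help. The sketch below assumes the status lookup is injected as a function so it can be faked in tests; `confirmDelivery`, `StatusFetcher`, and the result types are hypothetical names, and the delay and the handling of a message that never reaches `delivered` are illustrative choices rather than the executor's actual behavior.

```typescript
// Hypothetical result of one delivery-status lookup (a real one would come from Twilio).
type MessageStatus = { status: string; errorCode: number | null };
type StatusFetcher = (sid: string) => Promise<MessageStatus>;

type SmsResult =
  | { ok: true; status: string }
  | { ok: false; reason: string };

const A2P_BLOCK_ERROR = 30034; // carrier-block error code named in this README

// Poll up to `attempts` times and never report success without a confirmed delivery.
async function confirmDelivery(
  sid: string,
  fetchStatus: StatusFetcher,
  attempts = 3,
  delayMs = 2000,
): Promise<SmsResult> {
  let last: MessageStatus = { status: "unknown", errorCode: null };
  for (let i = 0; i < attempts; i++) {
    last = await fetchStatus(sid);
    if (last.errorCode === A2P_BLOCK_ERROR) {
      return { ok: false, reason: "Carrier blocked the message (A2P 10DLC, error 30034)" };
    }
    if (last.status === "failed" || last.status === "undelivered") {
      return {
        ok: false,
        reason: `Delivery ${last.status} (error ${last.errorCode ?? "unknown"})`,
      };
    }
    if (last.status === "delivered") return { ok: true, status: last.status };
    // Wait between polls, but not after the final one.
    if (i < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // Still in flight after all polls: say so instead of claiming the text was sent.
  return {
    ok: false,
    reason: `Delivery unconfirmed after ${attempts} checks (last status: ${last.status})`,
  };
}
```

In a test, `fetchStatus` can be a stub that returns a scripted sequence of statuses, and `delayMs` can be set to `0` so the suite stays fast.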