A voice-first AI assistant for macOS. Talk to Sam like a person: ask questions, set alarms, remember things, search the web, learn new skills, and see through your camera. Built with SwiftUI, powered by OpenAI and local LLMs.
Conversation first. Tools second. Perfection never.
- "Hey Sam" wake word activates the microphone (Porcupine)
- Speech-to-text via bundled Whisper model (local, private) or OpenAI Realtime streaming
- Text-to-speech via ElevenLabs with audio caching and queue-based playback
- Follow-up listening – mic reopens after Sam speaks, no wake word needed
- Barge-in – interrupt Sam mid-sentence by saying "Hey Sam"
- Text input – type in the chat pane as an alternative to voice
- OpenAI (primary) – GPT-4o-mini with structured JSON output
- Ollama (local fallback) – any local model, no internet required
- Adaptive token budget based on query complexity
- Sub-500ms target for basic questions
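A rough sketch of how a primary/fallback split like this can be wired up in Swift; the protocol and type names below are illustrative, not the actual SamOS routers:

```swift
// Illustrative sketch of primary/fallback routing (names are not the real SamOS types).
protocol LLMRouter {
    func route(_ prompt: String) async throws -> String   // returns the model's JSON plan
}

struct RoutingPolicy {
    let primary: LLMRouter    // OpenAI (GPT-4o-mini)
    let fallback: LLMRouter   // Ollama (local model)

    func plan(for prompt: String) async -> String? {
        // One attempt per provider, no repair loops (see the architecture principles below).
        if let plan = try? await primary.route(prompt) { return plan }
        return try? await fallback.route(prompt)
    }
}
```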
- Auto-save – Sam detects and remembers facts, preferences, and notes from conversation automatically
- 4 memory types – Facts (365d), Preferences (365d), Notes (90d), Check-ins (7d)
- Smart deduplication – detects near-duplicates, refinements, and high-value replacements
- Hybrid search – semantic similarity + BM25 lexical matching + recency boost
- 1,000 memory cap with automatic expiry and daily pruning
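As a rough illustration of the hybrid ranking idea described above; the weights and field names are assumptions, not the actual MemoryStore implementation:

```swift
import Foundation

// Combine semantic similarity, BM25 lexical relevance, and a recency boost
// into a single retrieval score. Weights are illustrative only.
struct MemoryCandidate {
    let text: String
    let semanticSimilarity: Double   // e.g. cosine similarity of embeddings, 0...1
    let bm25: Double                 // normalized lexical match score
    let ageInDays: Double
}

func hybridScore(_ memory: MemoryCandidate) -> Double {
    let recencyBoost = exp(-memory.ageInDays / 30.0)   // newer memories rank higher
    return 0.6 * memory.semanticSimilarity + 0.3 * memory.bm25 + 0.1 * recencyBoost
}
```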
- Set alarms – "Wake me at 7am" or "Set an alarm for 3pm"
- Set timers – "Timer for 5 minutes"
- Alarm cards – orange interactive card with Dismiss/Snooze buttons
- Voice wake-up loop – Sam speaks until you acknowledge
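The scheduling behind alarms and timers can be pictured roughly like this; a simplified sketch, not the actual TaskScheduler or AlarmSession logic:

```swift
import Foundation

// Simplified one-shot timer scheduling with cancellation.
final class TimerSchedulerSketch {
    private var timers: [UUID: Timer] = [:]

    @discardableResult
    func schedule(after seconds: TimeInterval, onFire: @escaping () -> Void) -> UUID {
        let id = UUID()
        timers[id] = Timer.scheduledTimer(withTimeInterval: seconds, repeats: false) { [weak self] _ in
            self?.timers[id] = nil
            onFire()   // e.g. show the alarm card and start the voice wake-up loop
        }
        return id
    }

    func cancel(_ id: UUID) {
        timers[id]?.invalidate()
        timers[id] = nil
    }
}
```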
- Scene description – "What do you see?"
- Object search – "Can you see my keys?"
- Face detection – "How many people are there?"
- Face enrollment – "Remember my face" (local recognition, no cloud)
- Face recognition – "Who is that?"
- Visual Q&A – "Is the door open?" or "What does that sign say?"
- Inventory tracking – snapshot what's visible and track changes over time
- Camera memories – save timestamped notes of what Sam sees
All vision runs locally via Apple's Vision framework – nothing leaves your Mac.
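For example, counting faces on-device with the Vision framework looks roughly like this; a sketch of the general approach, not SamOS's CameraVisionService:

```swift
import Vision
import CoreGraphics

// Count faces in a frame entirely on-device with Apple's Vision framework.
func countFaces(in frame: CGImage, completion: @escaping (Int) -> Void) {
    let request = VNDetectFaceRectanglesRequest { request, _ in
        let faces = request.results as? [VNFaceObservation] ?? []
        completion(faces.count)
    }
    let handler = VNImageRequestHandler(cgImage: frame, options: [:])
    try? handler.perform([request])
}
```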
- "Learn how to..." β Sam builds new capabilities on demand
- 5-stage AI pipeline β Draft, Refine, Review, Validate, Install
- Skills are JSON specs with trigger phrases, parameter slots, and executable steps
- Installed skills work like built-in tools β matched by intent and executed automatically
- Learn a URL β "Learn this article" fetches, summarizes, and indexes a webpage
- Autonomous research β "Research AI for 10 minutes" β Sam independently browses, reads, and saves findings
- Searches DuckDuckGo, Wikipedia, and HackerNews
- All learned content is available for future Q&A
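The skill specs described above can be pictured as something like the following; the field names are guesses for illustration, not the actual SkillSpec.swift definition:

```swift
// Hypothetical shape of a learned skill spec: triggers, slots, and executable steps.
struct SkillSpecSketch: Codable {
    let name: String                  // e.g. "control_smart_lights"
    let triggerPhrases: [String]      // utterances that should match this skill
    let slots: [Slot]                 // parameters filled from the utterance or a follow-up question
    let steps: [Step]                 // executed in order by the skill engine

    struct Slot: Codable {
        let name: String              // e.g. "room"
        let required: Bool
    }

    struct Step: Codable {
        let tool: String              // a registered tool to invoke
        let arguments: [String: String]
    }
}
```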
- Image search – "Find me a picture of a golden retriever"
- Video search – "Find a video about sourdough baking" (YouTube)
- Recipe search – "Find a recipe for pasta carbonara" (with ingredients and steps)
- File search – "Find my recent PDFs" (searches Downloads/Documents)
- Time – "What time is it in Tokyo?" (supports 400+ cities and IANA timezones)
- Weather – "What's the weather in Melbourne?" (current + 7-day forecast via Open-Meteo)
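Weather lookups use Open-Meteo's free forecast endpoint; a minimal current-weather fetch might look like the sketch below (the exact query parameters SamOS requests are an assumption):

```swift
import Foundation

// Fetch the current weather from Open-Meteo (no API key required).
struct CurrentWeather: Decodable {
    let temperature: Double
    let windspeed: Double
}

private struct ForecastResponse: Decodable {
    let current_weather: CurrentWeather
}

func fetchCurrentWeather(latitude: Double, longitude: Double) async throws -> CurrentWeather {
    var components = URLComponents(string: "https://api.open-meteo.com/v1/forecast")!
    components.queryItems = [
        URLQueryItem(name: "latitude", value: String(latitude)),
        URLQueryItem(name: "longitude", value: String(longitude)),
        URLQueryItem(name: "current_weather", value: "true"),
    ]
    let (data, _) = try await URLSession.shared.data(from: components.url!)
    return try JSONDecoder().decode(ForecastResponse.self, from: data).current_weather
}
```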
- Rich content display alongside the chat pane
- Renders markdown, images, interactive alarm cards
- Pagination, copy-all, and auto-scroll
- Behavioural self-learning – Sam detects patterns (verbosity, clarity, follow-up habits) and adjusts
- Knowledge attribution – tracks what % of each answer came from local memory vs external AI
- Up to 120 behavioural lessons with confidence scoring
┌───────────────────────────────────────────────────────────┐
│ Input                                                      │
│ "Hey Sam" (Porcupine) → AudioCapture → STT (Whisper)       │
│ Text field → direct input                                  │
├───────────────────────────────────────────────────────────┤
│ Brain                                                      │
│ TurnOrchestrator                                           │
│ ├── Memory injection (MemoryStore + WebsiteLearning)       │
│ ├── Self-learning lessons                                  │
│ ├── OpenAI Router (primary) / Ollama Router (local)        │
│ └── Returns a Plan: [talk, tool, ask, delegate]            │
├───────────────────────────────────────────────────────────┤
│ Execution                                                  │
│ PlanExecutor                                               │
│ ├── Tool calls → ToolsRuntime → ToolRegistry (30 tools)    │
│ ├── Skill calls → SkillEngine → slot filling → execution   │
│ └── Pending slots → follow-up question → resume            │
├───────────────────────────────────────────────────────────┤
│ Output                                                     │
│ TTSService (ElevenLabs) → spoken response                  │
│ ChatPaneView → message bubbles with metadata               │
│ OutputCanvasView → markdown, images, alarm cards           │
├───────────────────────────────────────────────────────────┤
│ Background Services                                        │
│ MemoryAutoSave · SelfLearning · TaskScheduler              │
│ SkillForge · AutonomousLearning · CameraVision             │
└───────────────────────────────────────────────────────────┘
Sam follows a strict architectural philosophy documented in ARCHITECTURE.md:
- Speech is success – if Sam can speak an answer, it should. Tools are for side effects only.
- No tool enforcement – if the LLM answers a weather question conversationally, that's valid.
- Minimal validation – only reject responses that aren't valid JSON or are empty.
- No semantic classifiers – no "if question is X, use tool Y" logic.
- Max 1 retry per provider – no repair loops, no provider ping-pong.
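In code terms, a routed plan is roughly a list of actions like the following; an illustrative sketch, with the real definitions living in Models/Action.swift and Models/Plan.swift:

```swift
// Rough shape of a plan returned by the router: talk, tool, ask, or delegate.
enum PlannedAction {
    case talk(String)                                      // speak the answer directly
    case tool(name: String, arguments: [String: String])   // run a tool for its side effect
    case ask(question: String, slot: String)               // ask a follow-up to fill a pending slot
    case delegate(skill: String)                           // hand off to a learned skill
}

struct PlanSketch {
    let steps: [PlannedAction]   // executed in order by PlanExecutor
}
```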
SamOS/
├── SamOSApp.swift  # App entry point
├── ARCHITECTURE.md  # Design principles (frozen v1.0)
│
├── Models/  # Data structures
│   ├── Action.swift  # LLM response actions (talk, tool, delegate, gap)
│   ├── Plan.swift  # Multi-step execution plans
│   ├── ChatMessage.swift  # Conversation messages
│   ├── OutputItem.swift  # Canvas output (markdown, image, card)
│   ├── MemoryRow.swift  # Persistent memory entries
│   ├── SkillSpec.swift  # Learned skill definitions
│   ├── PendingSlot.swift  # Awaiting user input state
│   ├── SkillForgeJob.swift  # Active skill build tracking
│   └── ForgeQueueJob.swift  # Queued skill build jobs
│
├── Views/  # SwiftUI interface
│   ├── MainView.swift  # Root layout (HSplitView)
│   ├── ChatPaneView.swift  # Conversation bubbles
│   ├── OutputCanvasView.swift  # Rich content display
│   ├── SettingsView.swift  # Configuration panel
│   └── StatusStripView.swift  # Bottom status bar
│
├── Services/  # Core engine (~17K LOC)
│   ├── AppState.swift  # Central state (ObservableObject)
│   ├── TurnOrchestrator.swift  # The brain – routes and executes turns
│   ├── PlanExecutor.swift  # Step-by-step plan execution
│   ├── OpenAIRouter.swift  # OpenAI API integration
│   ├── OllamaRouter.swift  # Local LLM integration
│   ├── VoicePipelineCoordinator.swift  # Voice I/O state machine
│   ├── WakeWordService.swift  # "Hey Sam" detection (Porcupine)
│   ├── AudioCaptureService.swift  # Microphone recording
│   ├── STTService.swift  # Speech-to-text (Whisper/Realtime)
│   ├── TTSService.swift  # Text-to-speech orchestration
│   ├── ElevenLabsClient.swift  # ElevenLabs API client
│   ├── MemoryStore.swift  # SQLite persistent memory
│   ├── MemoryAutoSaveService.swift  # Automatic memory extraction
│   ├── KnowledgeAttributionScorer.swift  # Local vs AI attribution
│   ├── SkillEngine.swift  # Skill matching and execution
│   ├── SkillForge.swift  # AI-powered skill building
│   ├── SkillForgeQueueService.swift  # Forge job queue
│   ├── SkillStore.swift  # Skill persistence
│   ├── TaskScheduler.swift  # Alarm/timer scheduling
│   ├── AlarmSession.swift  # Alarm state machine
│   ├── CameraVisionService.swift  # Apple Vision integration
│   ├── FaceProfileStore.swift  # Encrypted face data
│   ├── KeychainStore.swift  # Secure credential storage
│   └── TimezoneMapping.swift  # City → IANA timezone lookup
│
├── Tools/  # Tool implementations
│   ├── ToolRegistry.swift  # 30 registered tools
│   ├── ToolsRuntime.swift  # Execution dispatcher
│   ├── MemoryTools.swift  # save/list/delete/clear memory
│   ├── SchedulerTools.swift  # schedule/cancel/list tasks
│   └── SkillForgeTools.swift  # start/status/clear forge
│
├── Utils/
│   └── TextUnescaper.swift  # LLM text normalization
│
└── Vendor/  # Vendored dependencies
    ├── Porcupine/  # Wake word engine
    └── Whisper/  # Local STT model
| Category | Tool | Description |
|---|---|---|
| Display | show_text | Render markdown on canvas |
| | show_image | Display remote image with fallbacks |
| Search | find_image | Google image search |
| | find_video | YouTube video search |
| | find_recipe | Recipe search with ingredients & steps |
| | find_files | Search Downloads/Documents by name/type |
| Camera | describe_camera_view | Describe the live scene |
| | find_camera_objects | Find specific objects in frame |
| | get_camera_face_presence | Detect faces |
| | enroll_camera_face | Enroll face for recognition |
| | recognize_camera_faces | Identify enrolled faces |
| | camera_visual_qa | Answer visual questions |
| | camera_inventory_snapshot | Track visible object changes |
| | save_camera_memory_note | Save camera observation to memory |
| Memory | save_memory | Save a fact, preference, note, or check-in |
| | list_memories | List saved memories |
| | delete_memory | Delete a memory by ID |
| | clear_memories | Clear all memories |
| Scheduler | schedule_task | Set alarm or timer |
| | cancel_task | Cancel scheduled task |
| | list_tasks | List pending tasks |
| Learning | learn_website | Learn from a URL |
| | autonomous_learn | Self-directed research session |
| | stop_autonomous_learn | Stop research session |
| Info | get_time | Current time / timezone conversion |
| | get_weather | Weather and forecast |
| SkillForge | start_skillforge | Queue a new skill to build |
| | forge_queue_status | Show forge queue state |
| | forge_queue_clear | Stop and clear forge queue |
| Utility | capability_gap_to_claude_prompt | Generate build prompt for missing capability |
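Conceptually, the registry maps tool names to executable closures and dispatches calls by name; a minimal sketch of that idea, not the actual ToolRegistry/ToolsRuntime API:

```swift
import Foundation

// Map tool names to executable closures and dispatch calls by name.
struct ToolSketch {
    let name: String
    let description: String
    let run: ([String: String]) async throws -> String
}

final class ToolRegistrySketch {
    private var tools: [String: ToolSketch] = [:]

    func register(_ tool: ToolSketch) {
        tools[tool.name] = tool
    }

    func execute(name: String, arguments: [String: String]) async throws -> String {
        guard let tool = tools[name] else {
            throw NSError(domain: "Tools", code: 404,
                          userInfo: [NSLocalizedDescriptionKey: "Unknown tool: \(name)"])
        }
        return try await tool.run(arguments)
    }
}
```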
- macOS 14.0+ (Sonoma or later)
- Xcode 15+
- OpenAI API key (for primary LLM routing and SkillForge)
- ElevenLabs API key (for text-to-speech)
- Porcupine access key (for wake word detection)
- Ollama installed locally (for offline/local LLM fallback)
- YouTube API key (for video search)
1. Clone the repository: `git clone https://github.com/rjamesy/samos.git`, then `cd samos`
2. Open in Xcode: `open SamOS.xcodeproj`
3. Build and run (Cmd+R)
4. Configure API keys in Settings:
   - OpenAI – required for LLM routing
   - ElevenLabs – required for voice output
   - Porcupine – required for "Hey Sam" wake word
5. Enable microphone and camera when prompted
All API keys are stored securely in the macOS Keychain.
Say "Hey Sam" followed by your request:
- "Hey Sam, what time is it in London?"
- "Hey Sam, set an alarm for 7am"
- "Hey Sam, what do you see?"
- "Hey Sam, remember that my dog's name is Bailey"
- "Hey Sam, find me a recipe for banana bread"
- "Hey Sam, learn how to control my smart lights"
Type directly into the chat field at the bottom of the window.
Click the gear icon or press Cmd+, to configure:
- LLM providers and models
- Voice and TTS settings
- Wake word sensitivity
- Camera toggle
- Auto-start preferences
| Component | Technology |
|---|---|
| UI | SwiftUI |
| Language | Swift 5 |
| LLM (Primary) | OpenAI API (GPT-4o-mini) |
| LLM (Local) | Ollama (any model) |
| Wake Word | Porcupine (Picovoice) |
| Speech-to-Text | Whisper (local) / OpenAI Realtime |
| Text-to-Speech | ElevenLabs |
| Computer Vision | Apple Vision framework |
| Database | SQLite3 (WAL mode) |
| Secrets | macOS Keychain |
| Face Data | AES-GCM encrypted local storage |
| Weather | Open-Meteo API (free, no key) |
No CocoaPods, no SPM – only Apple frameworks + vendored binaries.
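Using the system SQLite3 C API directly, opening a database in WAL mode looks roughly like this; the path handling and pragma usage here are an assumption, not MemoryStore.swift:

```swift
import SQLite3

// Open a SQLite database and switch it to write-ahead logging.
func openDatabase(at path: String) -> OpaquePointer? {
    var db: OpaquePointer?
    guard sqlite3_open(path, &db) == SQLITE_OK else { return nil }
    // WAL lets readers proceed while a write is in progress.
    sqlite3_exec(db, "PRAGMA journal_mode=WAL;", nil, nil, nil)
    return db
}
```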
25 test files (~9,900 lines) covering:
- LLM routing and response parsing
- Turn orchestration and plan execution
- Memory storage, deduplication, and retrieval
- Skill matching, slot extraction, and forge queue
- Scheduling, timezone handling, and alarm state
- Text unescaping and image parsing
- Knowledge attribution calibration
- Keychain and face profile storage
- Wake word detection runs locally (Porcupine)
- Speech-to-text runs locally by default (Whisper)
- Computer vision runs locally (Apple Vision) – no images leave your Mac
- Face data is AES-GCM encrypted on disk
- Memories are stored locally in SQLite
- API keys are stored in the macOS Keychain
- Only LLM routing, TTS, and web learning make network calls
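For reference, AES-GCM sealing with CryptoKit looks like this; a sketch of the general approach, where the function names and key handling are illustrative rather than FaceProfileStore's actual API:

```swift
import CryptoKit
import Foundation

// Seal and open data with AES-GCM; the combined form (nonce + ciphertext + tag) is what gets written to disk.
func sealFaceData(_ plaintext: Data, with key: SymmetricKey) throws -> Data {
    let sealedBox = try AES.GCM.seal(plaintext, using: key)
    return sealedBox.combined!   // non-nil when using the default 12-byte nonce
}

func openFaceData(_ stored: Data, with key: SymmetricKey) throws -> Data {
    let sealedBox = try AES.GCM.SealedBox(combined: stored)
    return try AES.GCM.open(sealedBox, using: key)
}
```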
MIT
Built by @rjamesy