SamOS

A voice-first AI assistant for macOS. Talk to Sam like a person: ask questions, set alarms, remember things, search the web, learn new skills, and see through your camera. Built with SwiftUI, powered by OpenAI and local LLMs.

Conversation first. Tools second. Perfection never.



What Sam Can Do

Voice Conversation

  • "Hey Sam" wake word activates the microphone (Porcupine)
  • Speech-to-text via bundled Whisper model (local, private) or OpenAI Realtime streaming
  • Text-to-speech via ElevenLabs with audio caching and queue-based playback
  • Follow-up listening β€” mic reopens after Sam speaks, no wake word needed
  • Barge-in β€” interrupt Sam mid-sentence by saying "Hey Sam"
  • Text input β€” type in the chat pane as an alternative to voice

Dual LLM Routing

  • OpenAI (primary) β€” GPT-4o-mini with structured JSON output
  • Ollama (local fallback) β€” any local model, no internet required
  • Adaptive token budget based on query complexity
  • Sub-500ms target for basic questions

Persistent Memory

  • Auto-save β€” Sam detects and remembers facts, preferences, and notes from conversation automatically
  • 4 memory types β€” Facts (365d), Preferences (365d), Notes (90d), Check-ins (7d)
  • Smart deduplication β€” detects near-duplicates, refinements, and high-value replacements
  • Hybrid search β€” semantic similarity + BM25 lexical matching + recency boost
  • 1,000 memory cap with automatic expiry and daily pruning

Alarms & Scheduling

  • Set alarms β€” "Wake me at 7am" or "Set an alarm for 3pm"
  • Set timers β€” "Timer for 5 minutes"
  • Alarm cards β€” orange interactive card with Dismiss/Snooze buttons
  • Voice wake-up loop β€” Sam speaks until you acknowledge

Camera & Computer Vision

  • Scene description β€” "What do you see?"
  • Object search β€” "Can you see my keys?"
  • Face detection β€” "How many people are there?"
  • Face enrollment β€” "Remember my face" (local recognition, no cloud)
  • Face recognition β€” "Who is that?"
  • Visual Q&A β€” "Is the door open?" or "What does that sign say?"
  • Inventory tracking β€” snapshot what's visible and track changes over time
  • Camera memories β€” save timestamped notes of what Sam sees

All vision runs locally via Apple's Vision framework; nothing leaves your Mac.
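
For a sense of what "locally" means in practice, this is roughly what an on-device face count looks like with Apple's Vision framework. It is a minimal sketch, not the actual CameraVisionService code:

    import Vision
    import CoreGraphics

    // Count faces in a single frame entirely on-device, in the spirit of
    // "How many people are there?". Error handling is trimmed for brevity.
    func countFaces(in frame: CGImage) throws -> Int {
        let request = VNDetectFaceRectanglesRequest()
        let handler = VNImageRequestHandler(cgImage: frame, options: [:])
        try handler.perform([request])
        return request.results?.count ?? 0
    }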

Self-Learning Skills (SkillForge)

  • "Learn how to..." β€” Sam builds new capabilities on demand
  • 5-stage AI pipeline β€” Draft, Refine, Review, Validate, Install
  • Skills are JSON specs with trigger phrases, parameter slots, and executable steps
  • Installed skills work like built-in tools β€” matched by intent and executed automatically

Web Learning & Research

  • Learn a URL β€” "Learn this article" fetches, summarizes, and indexes a webpage
  • Autonomous research β€” "Research AI for 10 minutes" β€” Sam independently browses, reads, and saves findings
  • Searches DuckDuckGo, Wikipedia, and HackerNews
  • All learned content is available for future Q&A

Search & Discovery

  • Image search β€” "Find me a picture of a golden retriever"
  • Video search β€” "Find a video about sourdough baking" (YouTube)
  • Recipe search β€” "Find a recipe for pasta carbonara" (with ingredients and steps)
  • File search β€” "Find my recent PDFs" (searches Downloads/Documents)

Time & Weather

  • Time β€” "What time is it in Tokyo?" (supports 400+ cities and IANA timezones)
  • Weather β€” "What's the weather in Melbourne?" (current + 7-day forecast via Open-Meteo)

Output Canvas

  • Rich content display alongside the chat pane
  • Renders markdown, images, interactive alarm cards
  • Pagination, copy-all, and auto-scroll

Self-Improvement

  • Behavioural self-learning β€” Sam detects patterns (verbosity, clarity, follow-up habits) and adjusts
  • Knowledge attribution β€” tracks what % of each answer came from local memory vs external AI
  • Up to 120 behavioural lessons with confidence scoring

Architecture

┌─────────────────────────────────────────────────────────────┐
│  Input                                                      │
│   "Hey Sam" (Porcupine) → AudioCapture → STT (Whisper)      │
│   Text field → direct input                                 │
├─────────────────────────────────────────────────────────────┤
│  Brain                                                      │
│   TurnOrchestrator                                          │
│   ├── Memory injection (MemoryStore + WebsiteLearning)      │
│   ├── Self-learning lessons                                 │
│   ├── OpenAI Router (primary) / Ollama Router (local)       │
│   └── Returns a Plan: [talk, tool, ask, delegate]           │
├─────────────────────────────────────────────────────────────┤
│  Execution                                                  │
│   PlanExecutor                                              │
│   ├── Tool calls → ToolsRuntime → ToolRegistry (30 tools)   │
│   ├── Skill calls → SkillEngine → slot filling → execution  │
│   └── Pending slots → follow-up question → resume           │
├─────────────────────────────────────────────────────────────┤
│  Output                                                     │
│   TTSService (ElevenLabs) → spoken response                 │
│   ChatPaneView → message bubbles with metadata              │
│   OutputCanvasView → markdown, images, alarm cards          │
├─────────────────────────────────────────────────────────────┤
│  Background Services                                        │
│   MemoryAutoSave · SelfLearning · TaskScheduler             │
│   SkillForge · AutonomousLearning · CameraVision            │
└─────────────────────────────────────────────────────────────┘
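
The handoff between Brain and Execution is easiest to picture as a typed plan. The sketch below follows the action names in the diagram (talk, tool, ask, delegate); the real Action.swift and Plan.swift will differ in detail:

    // Conceptual shape of what TurnOrchestrator hands to PlanExecutor.
    // Case and field names are illustrative, not the actual model types.
    enum ActionSketch {
        case talk(String)                                  // speak/display a response
        case tool(name: String, args: [String: String])    // invoke a registered tool
        case ask(question: String, slot: String)           // follow-up to fill a missing slot
        case delegate(skill: String)                       // hand off to a learned skill
    }

    struct PlanSketch {
        let steps: [ActionSketch]   // executed in order by PlanExecutor
    }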

Design Principles

Sam follows a strict architectural philosophy documented in ARCHITECTURE.md:

  • Speech is success β€” if Sam can speak an answer, it should. Tools are for side effects only.
  • No tool enforcement β€” if the LLM answers a weather question conversationally, that's valid.
  • Minimal validation β€” only reject responses that aren't valid JSON or are empty.
  • No semantic classifiers β€” no "if question is X, use tool Y" logic.
  • Max 1 retry per provider β€” no repair loops, no provider ping-pong.

Project Structure

SamOS/
├── SamOSApp.swift                  # App entry point
├── ARCHITECTURE.md                 # Design principles (frozen v1.0)
│
├── Models/                         # Data structures
│   ├── Action.swift                # LLM response actions (talk, tool, delegate, gap)
│   ├── Plan.swift                  # Multi-step execution plans
│   ├── ChatMessage.swift           # Conversation messages
│   ├── OutputItem.swift            # Canvas output (markdown, image, card)
│   ├── MemoryRow.swift             # Persistent memory entries
│   ├── SkillSpec.swift             # Learned skill definitions
│   ├── PendingSlot.swift           # Awaiting user input state
│   ├── SkillForgeJob.swift         # Active skill build tracking
│   └── ForgeQueueJob.swift         # Queued skill build jobs
│
├── Views/                          # SwiftUI interface
│   ├── MainView.swift              # Root layout (HSplitView)
│   ├── ChatPaneView.swift          # Conversation bubbles
│   ├── OutputCanvasView.swift      # Rich content display
│   ├── SettingsView.swift          # Configuration panel
│   └── StatusStripView.swift       # Bottom status bar
│
├── Services/                       # Core engine (~17K LOC)
│   ├── AppState.swift              # Central state (ObservableObject)
│   ├── TurnOrchestrator.swift      # The brain: routes and executes turns
│   ├── PlanExecutor.swift          # Step-by-step plan execution
│   ├── OpenAIRouter.swift          # OpenAI API integration
│   ├── OllamaRouter.swift          # Local LLM integration
│   ├── VoicePipelineCoordinator.swift  # Voice I/O state machine
│   ├── WakeWordService.swift       # "Hey Sam" detection (Porcupine)
│   ├── AudioCaptureService.swift   # Microphone recording
│   ├── STTService.swift            # Speech-to-text (Whisper/Realtime)
│   ├── TTSService.swift            # Text-to-speech orchestration
│   ├── ElevenLabsClient.swift      # ElevenLabs API client
│   ├── MemoryStore.swift           # SQLite persistent memory
│   ├── MemoryAutoSaveService.swift # Automatic memory extraction
│   ├── KnowledgeAttributionScorer.swift  # Local vs AI attribution
│   ├── SkillEngine.swift           # Skill matching and execution
│   ├── SkillForge.swift            # AI-powered skill building
│   ├── SkillForgeQueueService.swift    # Forge job queue
│   ├── SkillStore.swift            # Skill persistence
│   ├── TaskScheduler.swift         # Alarm/timer scheduling
│   ├── AlarmSession.swift          # Alarm state machine
│   ├── CameraVisionService.swift   # Apple Vision integration
│   ├── FaceProfileStore.swift      # Encrypted face data
│   ├── KeychainStore.swift         # Secure credential storage
│   └── TimezoneMapping.swift       # City → IANA timezone lookup
│
├── Tools/                          # Tool implementations
│   ├── ToolRegistry.swift          # 30 registered tools
│   ├── ToolsRuntime.swift          # Execution dispatcher
│   ├── MemoryTools.swift           # save/list/delete/clear memory
│   ├── SchedulerTools.swift        # schedule/cancel/list tasks
│   └── SkillForgeTools.swift       # start/status/clear forge
│
├── Utils/
│   └── TextUnescaper.swift         # LLM text normalization
│
└── Vendor/                         # Vendored dependencies
    ├── Porcupine/                  # Wake word engine
    └── Whisper/                    # Local STT model

Tools Reference (30 Built-in)

Category      Tool                               Description
Display       show_text                          Render markdown on canvas
              show_image                         Display remote image with fallbacks
Search        find_image                         Google image search
              find_video                         YouTube video search
              find_recipe                        Recipe search with ingredients & steps
              find_files                         Search Downloads/Documents by name/type
Camera        describe_camera_view               Describe the live scene
              find_camera_objects                Find specific objects in frame
              get_camera_face_presence           Detect faces
              enroll_camera_face                 Enroll face for recognition
              recognize_camera_faces             Identify enrolled faces
              camera_visual_qa                   Answer visual questions
              camera_inventory_snapshot          Track visible object changes
              save_camera_memory_note            Save camera observation to memory
Memory        save_memory                        Save a fact, preference, note, or check-in
              list_memories                      List saved memories
              delete_memory                      Delete a memory by ID
              clear_memories                     Clear all memories
Scheduler     schedule_task                      Set alarm or timer
              cancel_task                        Cancel scheduled task
              list_tasks                         List pending tasks
Learning      learn_website                      Learn from a URL
              autonomous_learn                   Self-directed research session
              stop_autonomous_learn              Stop research session
Info          get_time                           Current time / timezone conversion
              get_weather                        Weather and forecast
SkillForge    start_skillforge                   Queue a new skill to build
              forge_queue_status                 Show forge queue state
              forge_queue_clear                  Stop and clear forge queue
Utility       capability_gap_to_claude_prompt    Generate build prompt for missing capability
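
All of these flow through ToolRegistry and ToolsRuntime. As a purely hypothetical sketch of why a 30-tool registry can stay small, a tool can be little more than a name, a description for the router, and an async closure:

    // Hypothetical registry entry; names and signatures are illustrative and do
    // not reflect the actual ToolRegistry/ToolsRuntime API.
    struct ToolSketch {
        let name: String                                    // e.g. "get_weather"
        let description: String                             // shown to the LLM router
        let run: ([String: String]) async throws -> String  // text for TTS or the canvas
    }

    var toolTable: [String: ToolSketch] = [:]

    func register(_ tool: ToolSketch) {
        toolTable[tool.name] = tool
    }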

Requirements

  • macOS 14.0+ (Sonoma or later)
  • Xcode 15+
  • OpenAI API key (for primary LLM routing and SkillForge)
  • ElevenLabs API key (for text-to-speech)
  • Porcupine access key (for wake word detection)

Optional

  • Ollama installed locally (for offline/local LLM fallback)
  • YouTube API key (for video search)

Setup

  1. Clone the repository

    git clone https://github.com/rjamesy/samos.git
    cd samos
  2. Open in Xcode

    open SamOS.xcodeproj
  3. Build and run (Cmd+R)

  4. Configure API keys in Settings:

    • OpenAI β€” required for LLM routing
    • ElevenLabs β€” required for voice output
    • Porcupine β€” required for "Hey Sam" wake word
  5. Enable microphone and camera when prompted

All API keys are stored securely in the macOS Keychain.
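
Storing keys in the Keychain is a few lines with the Security framework. A minimal sketch of the kind of call KeychainStore.swift wraps (the service/account naming is illustrative):

    import Foundation
    import Security

    // Save an API key as a generic-password Keychain item, replacing any old value.
    func saveAPIKey(_ key: String, service: String, account: String) -> Bool {
        var query: [String: Any] = [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrService as String: service,
            kSecAttrAccount as String: account,
        ]
        SecItemDelete(query as CFDictionary)        // drop any existing item first
        query[kSecValueData as String] = Data(key.utf8)
        return SecItemAdd(query as CFDictionary, nil) == errSecSuccess
    }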


Usage

Voice

Say "Hey Sam" followed by your request:

  • "Hey Sam, what time is it in London?"
  • "Hey Sam, set an alarm for 7am"
  • "Hey Sam, what do you see?"
  • "Hey Sam, remember that my dog's name is Bailey"
  • "Hey Sam, find me a recipe for banana bread"
  • "Hey Sam, learn how to control my smart lights"

Text

Type directly into the chat field at the bottom of the window.

Settings

Click the gear icon or press Cmd+, to configure:

  • LLM providers and models
  • Voice and TTS settings
  • Wake word sensitivity
  • Camera toggle
  • Auto-start preferences

Tech Stack

Component          Technology
UI                 SwiftUI
Language           Swift 5
LLM (Primary)      OpenAI API (GPT-4o-mini)
LLM (Local)        Ollama (any model)
Wake Word          Porcupine (Picovoice)
Speech-to-Text     Whisper (local) / OpenAI Realtime
Text-to-Speech     ElevenLabs
Computer Vision    Apple Vision framework
Database           SQLite3 (WAL mode)
Secrets            macOS Keychain
Face Data          AES-GCM encrypted local storage
Weather            Open-Meteo API (free, no key)

No CocoaPods, no SPM: only Apple frameworks + vendored binaries.


Test Suite

25 test files (~9,900 lines) covering:

  • LLM routing and response parsing
  • Turn orchestration and plan execution
  • Memory storage, deduplication, and retrieval
  • Skill matching, slot extraction, and forge queue
  • Scheduling, timezone handling, and alarm state
  • Text unescaping and image parsing
  • Knowledge attribution calibration
  • Keychain and face profile storage

Privacy

  • Wake word detection runs locally (Porcupine)
  • Speech-to-text runs locally by default (Whisper)
  • Computer vision runs locally (Apple Vision) β€” no images leave your Mac
  • Face data is AES-GCM encrypted on disk
  • Memories are stored locally in SQLite
  • API keys are stored in the macOS Keychain
  • Only LLM routing, TTS, and web learning make network calls

License

MIT


Built by @rjamesy
