A real-time visual assistant for blind and visually impaired users. Point your camera at the world and Insight narrates your surroundings, answers your questions, and navigates you to your destination — all by voice.
- Passive narration — continuously describes what's around you: obstacles, people, landmarks, hazards
- Voice Q&A — tap once, ask anything ("am I holding salt or pepper?", "is there anyone near me?")
- Voice navigation — say "take me to the nearest Tim Hortons" and get turn-by-turn walking directions
- Haptic feedback — physical cues for hazards, turns, arrival, and every interaction
- Memory — short-term context prevents repetition; long-term memory persists across sessions via Supabase
- Anthropic Claude Haiku — vision + language in one API call. Analyzes camera frames and generates navigation-relevant descriptions, answers user questions, handles all natural language output. Chosen over traditional CV models (e.g. YOLO) because blind users need spatial reasoning and context, not bounding boxes.
- Google Places API (Text Search) — finds destinations from natural language queries with fuzzy matching
- Google Directions API — walking turn-by-turn route generation with step-level waypoints
- Supabase (Postgres) — stores every narration and Q&A entry per user. Fetched on launch to seed long-term context.
- iOS Keychain — auto-generates a UUID on first launch as the user identity. No signup, no login, zero friction. Persists across reinstalls.
- Short-term memory — rolling in-memory buffer of last 6 narrations injected into every Claude prompt to prevent repetition
- Long-term memory — last 20 Supabase entries fetched on launch and seeded into Claude's context
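The rolling short-term buffer can be sketched roughly like this; the type and property names here are illustrative, not the app's actual API:

```swift
import Foundation

/// Sketch of the rolling short-term memory buffer: keeps the last
/// `capacity` narrations and renders them as prompt context.
struct NarrationMemory {
    private(set) var entries: [String] = []
    let capacity: Int

    init(capacity: Int = 6) { self.capacity = capacity }

    mutating func add(_ narration: String) {
        entries.append(narration)
        if entries.count > capacity {
            entries.removeFirst(entries.count - capacity)
        }
    }

    /// Injected into every Claude prompt so the model avoids repeating itself.
    var promptContext: String {
        entries.isEmpty
            ? "No recent narrations."
            : "Recent narrations (do not repeat):\n"
              + entries.map { "- \($0)" }.joined(separator: "\n")
    }
}
```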
- AVFoundation — camera session, frame capture, `AVSpeechSynthesizer` for TTS
- Speech (`SFSpeechRecognizer`) — on-device voice recognition, no API call, no network latency
- CoreLocation — GPS tracking for navigation step progression
- UIKit — haptic feedback (`UIImpactFeedbackGenerator`, `UINotificationFeedbackGenerator`)
- SwiftUI — entire UI layer
- Swift — 100% native iOS, no third-party dependencies
Camera frame (every 1s)
→ On-device scene change detection (16x16 pixel diff, ~5ms)
→ If scene changed + cooldown elapsed:
→ Short-term + long-term memory context assembled
→ JPEG encoded at 50% quality
→ Claude Haiku vision API (~1-2s round trip)
→ Response logged to Supabase
→ Added to short-term memory buffer
→ AVSpeechSynthesizer speaks it
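The scene-change gate can be sketched as a mean-absolute-difference check over two 16x16 grayscale thumbnails (256 bytes each). The threshold value here is an illustrative assumption, not the app's exact constant:

```swift
import Foundation

/// Sketch of the on-device scene-change gate: fire only when the mean
/// absolute pixel difference between consecutive thumbnails crosses
/// a threshold. Runs in well under a millisecond on 256 bytes.
func sceneChanged(previous: [UInt8], current: [UInt8], threshold: Double = 12.0) -> Bool {
    precondition(previous.count == 256 && current.count == 256, "expects 16x16 thumbnails")
    var total = 0
    for i in 0..<256 {
        total += abs(Int(previous[i]) - Int(current[i]))
    }
    return Double(total) / 256.0 > threshold
}
```

In the real pipeline the thumbnails would come from downscaling the captured frame (e.g. via CoreGraphics) before the comparison.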
User taps screen
→ SFSpeechRecognizer activates (on-device, instant)
→ User speaks
→ User taps again to stop
→ On-device keyword detection:
→ Navigation trigger? → Google Places → Google Directions → step tracker
→ Question? → Claude vision API with memory context → spoken answer
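The on-device keyword routing step can be sketched like this; the trigger phrase list is an assumption for illustration:

```swift
import Foundation

/// Sketch of the intent split after transcription: navigation phrases
/// route to Places/Directions, everything else goes to Claude as a question.
enum VoiceIntent: Equatable {
    case navigate(query: String)
    case question(text: String)
}

func classify(_ transcript: String) -> VoiceIntent {
    let lowered = transcript.lowercased()
    let triggers = ["take me to", "navigate to", "directions to", "walk me to"]
    for trigger in triggers {
        if let range = lowered.range(of: trigger) {
            let query = String(lowered[range.upperBound...])
                .trimmingCharacters(in: .whitespaces)
            if !query.isEmpty { return .navigate(query: query) }
        }
    }
    return .question(text: transcript)
}
```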
Navigation active
→ CoreLocation tracks position every 5m
→ Step tracker checks distance to next waypoint every 2s
→ 50m from turn: light haptic + verbal warning
→ 15m from turn: double haptic + "turn now"
→ Arrived: double heavy haptic + verbal confirmation
→ Passive narration continues between navigation instructions
→ Instructions never cut off narration — pending queue speaks after current sentence
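The distance-gated alerts above can be sketched as a small state machine so each threshold fires at most once per step. The type names and the 15 m arrival radius are assumptions:

```swift
/// Sketch of the turn-cue logic run every ~2s against the distance
/// to the next waypoint.
enum TurnCue { case approach, turnNow, arrived }

struct StepTracker {
    var firedApproach = false
    var firedTurnNow = false

    mutating func cue(distanceToWaypoint: Double, isFinalStep: Bool) -> TurnCue? {
        if isFinalStep && distanceToWaypoint <= 15 { return .arrived }
        if distanceToWaypoint <= 15 && !firedTurnNow {
            firedTurnNow = true
            return .turnNow          // double haptic + "turn now"
        }
        if distanceToWaypoint <= 50 && !firedApproach {
            firedApproach = true
            return .approach         // light haptic + verbal warning
        }
        return nil
    }
}
```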
Latency budget:
- Frame capture: ~0ms (continuous)
- Scene change detection: ~5ms (on-device)
- JPEG encode: ~20ms
- Claude API round trip: ~1–2s
- TTS first word: ~100ms after response
- Total: ~1.5–2.5 seconds from scene change to first spoken word
| Gesture | Action |
|---|---|
| Single tap | Start listening / stop listening |
| Swipe (any direction) | Silence immediately |

| Haptic | Meaning |
|---|---|
| Double light tap | Mic is ready, listening started |
| Single medium tap | Mic stopped, processing |
| Warning buzz | Hazard detected in scene |
| Light tap | 50m turn warning |
| Double medium tap | Turn now |
| Double heavy pulse | Arrived at destination |
| Light tap | Silenced by swipe |
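The haptic vocabulary above maps naturally onto UIKit's feedback generators. This is a hedged sketch; the cue names and delay values are illustrative:

```swift
import UIKit

/// Sketch of mapping app-level cues onto UIImpactFeedbackGenerator
/// and UINotificationFeedbackGenerator.
func playHaptic(_ cue: String) {
    switch cue {
    case "listening":   // double light tap: mic is ready
        let gen = UIImpactFeedbackGenerator(style: .light)
        gen.impactOccurred()
        DispatchQueue.main.asyncAfter(deadline: .now() + 0.12) { gen.impactOccurred() }
    case "processing":  // single medium tap: mic stopped
        UIImpactFeedbackGenerator(style: .medium).impactOccurred()
    case "hazard":      // warning buzz
        UINotificationFeedbackGenerator().notificationOccurred(.warning)
    case "arrived":     // double heavy pulse
        let gen = UIImpactFeedbackGenerator(style: .heavy)
        gen.impactOccurred()
        DispatchQueue.main.asyncAfter(deadline: .now() + 0.15) { gen.impactOccurred() }
    default:
        UIImpactFeedbackGenerator(style: .light).impactOccurred()
    }
}
```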
Add to Secrets.xcconfig (never commit this file):
ANTHROPIC_API_KEY = sk-ant-...
GOOGLE_API_KEY = your-google-key
SUPABASE_URL = https://xxxx.supabase.co
SUPABASE_ANON_KEY = your-anon-key
Add all four to your target's Info.plist as $(KEY_NAME) variable references.
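With the `$(KEY_NAME)` references in place, the values surface at runtime through the bundle's info dictionary. A minimal accessor might look like this (the `Secrets` type is hypothetical):

```swift
import Foundation

/// Sketch of reading xcconfig-injected keys from Info.plist at runtime.
enum Secrets {
    static func value(for key: String) -> String {
        guard let v = Bundle.main.object(forInfoDictionaryKey: key) as? String,
              !v.isEmpty else {
            fatalError("Missing \(key): check Secrets.xcconfig and Info.plist")
        }
        return v
    }

    static var anthropicKey: String { value(for: "ANTHROPIC_API_KEY") }
    static var googleKey: String { value(for: "GOOGLE_API_KEY") }
}
```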
Run in Supabase SQL Editor:
create table memories (
id uuid default gen_random_uuid() primary key,
user_id text not null,
content text not null,
type text not null,
created_at timestamptz default now()
);

Add these Info.plist privacy keys:
- Privacy - Camera Usage Description
- Privacy - Microphone Usage Description
- Privacy - Speech Recognition Usage Description
- Privacy - Location When In Use Usage Description
- Background Modes: Audio
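Rows in the `memories` table can be written through Supabase's PostgREST endpoint with plain `URLSession`, no SDK required. A hedged sketch, assuming the URL and anon key come from the config above:

```swift
import Foundation

/// Sketch of logging a narration or Q&A entry to Supabase over REST.
func logMemory(userID: String, content: String, type: String,
               supabaseURL: String, anonKey: String) async throws {
    var request = URLRequest(url: URL(string: "\(supabaseURL)/rest/v1/memories")!)
    request.httpMethod = "POST"
    request.setValue(anonKey, forHTTPHeaderField: "apikey")
    request.setValue("Bearer \(anonKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: [
        "user_id": userID, "content": content, "type": type
    ])
    let (_, response) = try await URLSession.shared.data(for: request)
    guard let http = response as? HTTPURLResponse,
          (200..<300).contains(http.statusCode) else {
        throw URLError(.badServerResponse)
    }
}
```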
The simulator has no camera. Plug in your iPhone and hit Run.
| File | Purpose |
|---|---|
| `AwarelyApp.swift` | App entry point |
| `SplashView.swift` | Launch splash screen |
| `ContentView.swift` | Fullscreen camera UI, gesture handling, direction banner |
| `CameraManager.swift` | AVFoundation session, frame capture, SwiftUI preview |
| `NarratorEngine.swift` | Scene change detection, Claude API, TTS, memory, voice input |
| `NavigationEngine.swift` | Google Places + Directions, step tracking, navigation haptics |
| `LocationManager.swift` | CoreLocation GPS wrapper |
| `SupabaseManager.swift` | Postgres logging and memory fetching |
| `KeychainManager.swift` | Auto-generated user identity, zero-friction auth |