Insight

A real-time visual assistant for blind and visually impaired users. Point your camera at the world and Insight narrates your surroundings, answers your questions, and navigates you to your destination — all by voice.


What it does

  • Passive narration — continuously describes what's around you: obstacles, people, landmarks, hazards
  • Voice Q&A — tap once, ask anything ("am I holding salt or pepper?", "is there anyone near me?")
  • Voice navigation — say "take me to the nearest Tim Hortons" and get turn-by-turn walking directions
  • Haptic feedback — physical cues for hazards, turns, arrival, and every interaction
  • Memory — short-term context prevents repetition; long-term memory persists across sessions via Supabase

Tech Stack

AI / Intelligence

  • Anthropic Claude Haiku — vision + language in one API call. Analyzes camera frames and generates navigation-relevant descriptions, answers user questions, handles all natural language output. Chosen over traditional CV models (e.g. YOLO) because blind users need spatial reasoning and context, not bounding boxes.
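As a rough illustration of the single vision + language call, here is a minimal sketch of a frame-analysis request against the Anthropic Messages API. The model identifier, prompt text, and function name are assumptions, not the app's actual code:

```swift
import Foundation

// Sketch: send one JPEG frame to the Messages API and return the narration text.
// Model ID and prompt are placeholders; the app's real values may differ.
func describeFrame(jpegData: Data, apiKey: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.anthropic.com/v1/messages")!)
    request.httpMethod = "POST"
    request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
    request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let body: [String: Any] = [
        "model": "claude-3-5-haiku-latest",   // assumption: exact model ID may differ
        "max_tokens": 200,
        "messages": [[
            "role": "user",
            "content": [
                ["type": "image",
                 "source": ["type": "base64",
                            "media_type": "image/jpeg",
                            "data": jpegData.base64EncodedString()]],
                ["type": "text",
                 "text": "Describe obstacles, people, and hazards ahead in one short sentence."]
            ]
        ]]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    let first = (json?["content"] as? [[String: Any]])?.first
    return first?["text"] as? String ?? ""
}
```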

Navigation

  • Google Places API (Text Search) — finds destinations from natural language queries with fuzzy matching
  • Google Directions API — walking turn-by-turn route generation with step-level waypoints

Backend / Memory

  • Supabase (Postgres) — stores every narration and Q&A entry per user. Fetched on launch to seed long-term context.
  • iOS Keychain — auto-generates a UUID on first launch as the user identity. No signup, no login, zero friction. Persists across reinstalls.
  • Short-term memory — rolling in-memory buffer of last 6 narrations injected into every Claude prompt to prevent repetition
  • Long-term memory — last 20 Supabase entries fetched on launch and seeded into Claude's context
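The two memory tiers above can be sketched as a small value type: a fixed-size FIFO for the short-term buffer, plus the Supabase-seeded long-term list, joined into the context string injected into each prompt. Names and wording are illustrative:

```swift
// Sketch of the memory tiers: last 6 narrations in a rolling buffer,
// last 20 Supabase entries seeded at launch.
struct NarrationMemory {
    private(set) var recent: [String] = []   // short-term: last 6 narrations
    var longTerm: [String] = []              // seeded from Supabase on launch

    mutating func remember(_ narration: String) {
        recent.append(narration)
        if recent.count > 6 { recent.removeFirst() }
    }

    /// Context string injected into every Claude prompt to prevent repetition.
    func promptContext() -> String {
        let past = longTerm.suffix(20).joined(separator: "\n")
        let now = recent.joined(separator: "\n")
        return "Previously seen:\n\(past)\nJust narrated (do not repeat):\n\(now)"
    }
}
```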

iOS Frameworks (all native, no external dependencies)

  • AVFoundation — camera session, frame capture, AVSpeechSynthesizer for TTS
  • Speech (SFSpeechRecognizer) — on-device voice recognition, no network round trip
  • CoreLocation — GPS tracking for navigation step progression
  • UIKit — haptic feedback (UIImpactFeedbackGenerator, UINotificationFeedbackGenerator)
  • SwiftUI — entire UI layer

Language

  • Swift — 100% native iOS, no third-party dependencies

Architecture

```
Camera frame (every 1s)
    → On-device scene change detection (16x16 pixel diff, ~5ms)
        → If scene changed + cooldown elapsed:
            → Short-term + long-term memory context assembled
            → JPEG encoded at 50% quality
            → Claude Haiku vision API (~1-2s round trip)
                → Response logged to Supabase
                → Added to short-term memory buffer
                → AVSpeechSynthesizer speaks it
```
```
User taps screen
    → SFSpeechRecognizer activates (on-device, instant)
    → User speaks
    → User taps again to stop
    → On-device keyword detection:
        → Navigation trigger? → Google Places → Google Directions → step tracker
        → Question? → Claude vision API with memory context → spoken answer
```
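The on-device keyword routing could be as simple as a prefix check: transcripts that open with a navigation phrase go to the Places/Directions pipeline, everything else is treated as a question for Claude. The trigger phrases and type names here are illustrative:

```swift
import Foundation

// Sketch of routing a voice transcript to navigation vs. Q&A.
enum VoiceIntent: Equatable {
    case navigate(destination: String)
    case question(String)
}

func classify(_ transcript: String) -> VoiceIntent {
    let text = transcript.lowercased().trimmingCharacters(in: .whitespaces)
    let triggers = ["take me to ", "navigate to ", "directions to "]
    for trigger in triggers where text.hasPrefix(trigger) {
        return .navigate(destination: String(text.dropFirst(trigger.count)))
    }
    return .question(transcript)
}
```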

```
Navigation active
    → CoreLocation tracks position every 5m
    → Step tracker checks distance to next waypoint every 2s
    → 50m from turn: light haptic + verbal warning
    → 15m from turn: double haptic + "turn now"
    → Arrived: double heavy haptic + verbal confirmation
    → Passive narration continues between navigation instructions
    → Instructions never cut off narration — pending queue speaks after current sentence
```
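The step tracker's distance thresholds reduce to a small pure function; distances are metres, and treating "within 15 m of the final waypoint" as arrival is a simplifying assumption:

```swift
// Sketch of the 50 m / 15 m threshold logic checked every 2 s.
enum TurnCue: Equatable { case none, approach, turnNow, arrived }

func cue(distanceToWaypoint d: Double, isFinalWaypoint: Bool) -> TurnCue {
    if d <= 15 { return isFinalWaypoint ? .arrived : .turnNow }  // "turn now" / arrival
    if d <= 50 { return .approach }  // light haptic + verbal warning
    return .none
}
```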

Latency budget:

  • Frame capture: ~0ms (continuous)
  • Scene change detection: ~5ms (on-device)
  • JPEG encode: ~20ms
  • Claude API round trip: ~1–2s
  • TTS first word: ~100ms after response
  • Total: ~1.5–2.5 seconds from scene change to first spoken word

Interactions

| Gesture | Action |
| --- | --- |
| Single tap | Start listening / stop listening |
| Swipe (any direction) | Silence immediately |

| Haptic | Meaning |
| --- | --- |
| Double light tap | Mic is ready, listening started |
| Single medium tap | Mic stopped, processing |
| Warning buzz | Hazard detected in scene |
| Light tap | 50m turn warning |
| Double medium tap | Turn now |
| Double heavy pulse | Arrived at destination |
| Light tap | Silenced by swipe |
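A plausible mapping of this haptic vocabulary onto UIKit's feedback generators, assuming the enum, cases, and inter-tap delays (which are illustrative, not the app's exact code):

```swift
import UIKit

// Sketch of the haptic vocabulary via UIImpactFeedbackGenerator /
// UINotificationFeedbackGenerator. Timings are illustrative.
enum Haptic {
    case micReady, micStopped, hazard, turnWarning, turnNow, arrived

    func play() {
        switch self {
        case .micReady:             // double light tap
            let g = UIImpactFeedbackGenerator(style: .light)
            g.impactOccurred()
            DispatchQueue.main.asyncAfter(deadline: .now() + 0.12) { g.impactOccurred() }
        case .micStopped:           // single medium tap
            UIImpactFeedbackGenerator(style: .medium).impactOccurred()
        case .hazard:               // warning buzz
            UINotificationFeedbackGenerator().notificationOccurred(.warning)
        case .turnWarning:          // light tap at 50m
            UIImpactFeedbackGenerator(style: .light).impactOccurred()
        case .turnNow:              // double medium tap
            let g = UIImpactFeedbackGenerator(style: .medium)
            g.impactOccurred()
            DispatchQueue.main.asyncAfter(deadline: .now() + 0.12) { g.impactOccurred() }
        case .arrived:              // double heavy pulse
            let g = UIImpactFeedbackGenerator(style: .heavy)
            g.impactOccurred()
            DispatchQueue.main.asyncAfter(deadline: .now() + 0.15) { g.impactOccurred() }
        }
    }
}
```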

Setup

1. API Keys

Add to Secrets.xcconfig (never commit this file):

```
ANTHROPIC_API_KEY = sk-ant-...
GOOGLE_API_KEY = your-google-key
SUPABASE_URL = https://xxxx.supabase.co
SUPABASE_ANON_KEY = your-anon-key
```

Add all four to your target's Info.plist as $(KEY_NAME) variable references.
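With the keys exposed through Info.plist as above, a small helper (name and error message are illustrative) can read them at runtime:

```swift
import Foundation

// Sketch: read a build-setting-backed key from Info.plist at runtime.
func secret(_ key: String) -> String {
    guard let value = Bundle.main.object(forInfoDictionaryKey: key) as? String,
          !value.isEmpty else {
        fatalError("Missing \(key) — check Secrets.xcconfig and Info.plist")
    }
    return value
}

// e.g. let anthropicKey = secret("ANTHROPIC_API_KEY")
```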

2. Supabase table

Run in the Supabase SQL Editor:

```sql
create table memories (
  id uuid default gen_random_uuid() primary key,
  user_id text not null,
  content text not null,
  type text not null,
  created_at timestamptz default now()
);
```
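Rows in this table can be written through Supabase's PostgREST endpoint. A minimal sketch, assuming the URL and anon key from step 1 and a hypothetical `logMemory` helper:

```swift
import Foundation

// Sketch: insert one memory row via Supabase's REST (PostgREST) API.
func logMemory(userID: String, content: String, type: String,
               supabaseURL: URL, anonKey: String) async throws {
    var request = URLRequest(url: supabaseURL.appendingPathComponent("rest/v1/memories"))
    request.httpMethod = "POST"
    request.setValue(anonKey, forHTTPHeaderField: "apikey")
    request.setValue("Bearer \(anonKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.setValue("return=minimal", forHTTPHeaderField: "Prefer")
    request.httpBody = try JSONSerialization.data(withJSONObject: [
        "user_id": userID, "content": content, "type": type
    ])
    _ = try await URLSession.shared.data(for: request)
}
```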

3. Info.plist permissions

  • Privacy - Camera Usage Description
  • Privacy - Microphone Usage Description
  • Privacy - Speech Recognition Usage Description
  • Privacy - Location When In Use Usage Description
  • Background Modes: Audio

4. Run on a real device

The simulator has no camera. Plug in your iPhone and hit Run.


Files

| File | Purpose |
| --- | --- |
| AwarelyApp.swift | App entry point |
| SplashView.swift | Launch splash screen |
| ContentView.swift | Fullscreen camera UI, gesture handling, direction banner |
| CameraManager.swift | AVFoundation session, frame capture, SwiftUI preview |
| NarratorEngine.swift | Scene change detection, Claude API, TTS, memory, voice input |
| NavigationEngine.swift | Google Places + Directions, step tracking, navigation haptics |
| LocationManager.swift | CoreLocation GPS wrapper |
| SupabaseManager.swift | Postgres logging and memory fetching |
| KeychainManager.swift | Auto-generated user identity, zero-friction auth |

About

MacHacks repo
