rah757/NarrAIt

Narrait

AI-powered assistive companion for disabled users navigating computers. Hold a key — Narrait reads the screen, explains what's in front of you, and points to what you need to click. It also plans multi-step tasks agentically when needed, assisting you until you reach your end goal.

Narrait app highlight


3rd place winner (3× NVIDIA Jetson Orin Nano)

Link to Devpost and Video


Quick Start

Requirements: macOS 14.2+, Xcode 15+

# 1. Clone
git clone <your-repo-url> && cd narrAIt

# 2. API keys
cp .env.example Narrait/.env
# edit Narrait/.env → fill in ANTHROPIC_API_KEY, GEMINI_API_KEY, GROQ_API_KEY

# 3. Your Team ID (find at developer.apple.com/account → Membership → Team ID)
cp Local.xcconfig.example Local.xcconfig
# edit Local.xcconfig → set DEVELOPMENT_TEAM = YOUR_TEAM_ID

# 4. Open in Xcode
open Narrait.xcodeproj

In Xcode:

  1. Right-click the Narrait folder → Add Files to "Narrait" → select Narrait/.env → ✅ "Add to target: Narrait" → Add
  2. Make sure your Apple ID is in Xcode → Settings → Accounts (free account is fine)
  3. Select the Narrait scheme and the My Mac destination, then press ⌘R to run

First launch: approve Screen Recording, Accessibility, and Microphone prompts when they appear.


Usage

The app lives in the menu bar — no Dock icon.

| Hotkey | Action |
| --- | --- |
| Press/hold ⌥ Option | Captures your screen → Gemini explains what's under your cursor |
| Hold ⌘⌥ Cmd+Option | Records your voice → Gemini Flash answers general questions or routes pointing to Claude Sonnet |
| Release ⌘⌥ Cmd+Option | Stops voice recording |

Click the menu bar icon to switch access profiles or update API keys.


API Keys

| Key | Where to get it / example value |
| --- | --- |
| ANTHROPIC_API_KEY | console.anthropic.com |
| ANTHROPIC_MODEL | Pointing model, e.g. claude-sonnet-4-6 |
| GEMINI_API_KEY | aistudio.google.com |
| GEMINI_ROUTER_MODEL | Router model, e.g. gemini-3-flash-preview |
| GROQ_API_KEY | console.groq.com/keys |

Note: Computer Use is beta; Narrait chooses the matching Anthropic beta header for the configured Claude model.


Tech Stack

| Layer | Technology |
| --- | --- |
| Language / UI | Swift 5.9, SwiftUI + AppKit |
| Vision + Reasoning | Gemini Flash (GEMINI_ROUTER_MODEL) for hover/general answers + Claude Sonnet Computer Use (ANTHROPIC_MODEL) for precise pointing |
| Speech → Text | Groq Whisper Large v3 (batch REST, ~180 ms) |
| Text → Speech | Gemini 3.1 Flash TTS Preview (gemini-3.1-flash-tts-preview) |
| Screen capture | ScreenCaptureKit (SCScreenshotManager) |
| Global hotkeys | CGEventTap (listen-only, no interception) |
| Key storage | Bundled .env loaded into UserDefaults at launch |
| API call logging | JSON files in ~/Library/Logs/Narrait/ |

Architecture

Single orchestrator (ActivationCoordinator) owns the state machine. All API clients and UI components are injected as dependencies — they never call each other.

idle ──hold Option──▶ capturing ──▶ streaming ──▶ playing ──▶ idle
idle ──hold Cmd+Opt─▶ recording ──▶ transcribing ──▶ streaming ──▶ playing ──▶ idle
any  ──release key──▶ cancel in-flight ──▶ idle
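The transitions above can be sketched as a pure function over a hypothetical state enum. The `ActivationState`/`ActivationEvent` names and the event granularity are illustrative; the real ActivationCoordinator may model them differently:

```swift
import Foundation

// Hypothetical sketch of the coordinator's state machine; the actual
// ActivationCoordinator implementation may differ.
enum ActivationState: Equatable {
    case idle, capturing, recording, transcribing, streaming, playing
}

enum ActivationEvent {
    case holdOption, holdCmdOption, keyReleased
    case captureDone, transcriptReady, streamFinished, playbackFinished
}

func nextState(_ state: ActivationState, _ event: ActivationEvent) -> ActivationState {
    switch (state, event) {
    case (.idle, .holdOption):              return .capturing
    case (.idle, .holdCmdOption):           return .recording
    case (.capturing, .captureDone):        return .streaming
    case (.recording, .keyReleased):        return .transcribing   // release ends recording
    case (.transcribing, .transcriptReady): return .streaming
    case (.streaming, .streamFinished):     return .playing
    case (.playing, .playbackFinished):     return .idle
    case (_, .keyReleased):                 return .idle           // cancel in-flight work
    default:                                return state
    }
}
```

Because Swift matches the first applicable case, the specific `recording → transcribing` rule wins over the generic "release cancels everything" rule.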

Hover-explain flow

  1. User holds Option. GlobalHotkeyMonitor waits 150ms to confirm it's not a Cmd+Option chord.
  2. ScreenCapture grabs all displays via SCScreenshotManager, sorted cursor-screen first, downscaled to max 1280px.
  3. GeminiClient sends the screenshot(s) + cursor position + system prompt + conversation history to Gemini Flash.
  4. ResponseOverlay (cursor-following NSPanel) renders the text in real time.
  5. Full response is parsed for a [POINT:y,x:label] tag if Gemini includes one.
  6. GeminiTTSClient speaks the response via AVAudioPlayer.
  7. On key release: in-flight requests cancel, overlay fades after 3s.
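Step 5's tag extraction might look like the following sketch. The `PointTag` and `parsePointTag` names are hypothetical; only the `[POINT:y,x:label]` format comes from the flow above:

```swift
import Foundation

// Hypothetical parser for a [POINT:y,x:label] tag embedded in the
// model's response text; the real GeminiClient parsing may differ.
struct PointTag: Equatable {
    let x: Int, y: Int
    let label: String
}

func parsePointTag(_ text: String) -> PointTag? {
    // Note the y-first order in the tag, matching the flow description.
    let pattern = #"\[POINT:(\d+),(\d+):([^\]]+)\]"#
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text)),
          let yRange = Range(match.range(at: 1), in: text),
          let xRange = Range(match.range(at: 2), in: text),
          let labelRange = Range(match.range(at: 3), in: text),
          let y = Int(text[yRange]), let x = Int(text[xRange])
    else { return nil }
    return PointTag(x: x, y: y, label: String(text[labelRange]))
}
```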

Voice flow

  1. User holds Cmd+Option → mic recording starts via MicRecorder (AVAudioEngine, 16kHz PCM16).
  2. Key release → MicRecorder.stop() returns WAV data → GroqWhisperClient transcribes.
  3. Transcript + fresh screen capture → Gemini Flash router answers general screen questions directly.
  4. If the router says the user needs a visible location or action target, Sonnet Computer Use receives the same screen and returns the point for the green marker.
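The WAV payload produced in step 2 (mono PCM16 at 16 kHz) can be assembled by hand with a minimal 44-byte RIFF header. A sketch; the real MicRecorder may build its file differently:

```swift
import Foundation

// Sketch: wrap raw 16 kHz mono PCM16 samples in a minimal RIFF/WAVE
// header, as a Whisper upload would need.
func wavData(pcm16 samples: [Int16], sampleRate: UInt32 = 16_000) -> Data {
    let byteRate = sampleRate * 2          // mono, 2 bytes per sample
    let dataSize = UInt32(samples.count * 2)
    var d = Data()
    func append<T>(_ value: T) { withUnsafeBytes(of: value) { d.append(contentsOf: $0) } }
    d.append(contentsOf: Array("RIFF".utf8)); append(UInt32(36 + dataSize).littleEndian)
    d.append(contentsOf: Array("WAVE".utf8))
    d.append(contentsOf: Array("fmt ".utf8)); append(UInt32(16).littleEndian)
    append(UInt16(1).littleEndian)          // audio format: PCM
    append(UInt16(1).littleEndian)          // channels: mono
    append(sampleRate.littleEndian)
    append(byteRate.littleEndian)
    append(UInt16(2).littleEndian)          // block align
    append(UInt16(16).littleEndian)         // bits per sample
    d.append(contentsOf: Array("data".utf8)); append(dataSize.littleEndian)
    for s in samples { append(s.littleEndian) }
    return d
}
```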

Cursor pointing

Sonnet Computer Use returns a mouse_move coordinate only for action/location requests. Coordinates are in submitted screenshot-pixel space; handlePointTo converts them to AppKit global screen coords accounting for multi-display layout and Retina scaling.

  • All profiles: green marker appears at the target
  • Narrait does not move the physical mouse
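A minimal sketch of the pixel-to-global conversion described above, assuming the screenshot covers exactly one display whose AppKit frame is known. The scale ratio absorbs both the downscale and the Retina factor; the real handlePointTo with full multi-display handling may differ:

```swift
import Foundation

// Hypothetical sketch of handlePointTo's coordinate math: a point in
// submitted screenshot-pixel space is mapped to AppKit global coords.
func globalPoint(fromScreenshot p: CGPoint,
                 screenshotSize: CGSize,   // pixels, after downscaling
                 screenFrame: CGRect) -> CGPoint {  // AppKit frame of the display, in points
    // Undo downscale + Retina scaling: screenshot pixels → screen points.
    let xPts = p.x * (screenFrame.width / screenshotSize.width)
    let yPts = p.y * (screenFrame.height / screenshotSize.height)
    // Screenshot origin is top-left; AppKit's global origin is bottom-left,
    // so flip the y axis within the display, then offset by its frame.
    return CGPoint(x: screenFrame.origin.x + xPts,
                   y: screenFrame.origin.y + (screenFrame.height - yPts))
}
```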

Access profiles

Four profiles (Default, Blind / Low Vision, Dyslexia, Language Support). Each appends a clause to the system prompt and sets a TTS speed when needed. Persisted in UserDefaults. Selectable from the menu bar.
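The profile behavior described above could be modeled roughly like this. The clause wording and the speed value are placeholders, not the actual strings in AccessProfile.swift:

```swift
import Foundation

// Hypothetical shape of the profile enum; actual prompt clauses and
// TTS speeds in AccessProfile.swift may differ.
enum AccessProfile: String, CaseIterable {
    case standard = "Default"
    case blindLowVision = "Blind / Low Vision"
    case dyslexia = "Dyslexia"
    case languageSupport = "Language Support"

    /// Clause appended to the system prompt (placeholder wording).
    var promptClause: String {
        switch self {
        case .standard:        return ""
        case .blindLowVision:  return "Describe layout and spatial position explicitly."
        case .dyslexia:        return "Use short sentences and simple words."
        case .languageSupport: return "Avoid idioms; define jargon inline."
        }
    }

    /// TTS speed multiplier when the profile overrides the default (placeholder value).
    var ttsSpeed: Double? {
        self == .blindLowVision ? 1.25 : nil
    }
}
```

Persisting the selection is then a one-liner via the `rawValue` in UserDefaults, which matches the "Persisted in UserDefaults" note above.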

Conversation memory

ConversationStore keeps the last 6 turns (user text + assistant response). Sent with every request for follow-up context. Auto-clears after 30 seconds of inactivity.
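A sketch of such a rolling store (names and details are illustrative; the real ConversationStore may differ):

```swift
import Foundation

// Sketch of a 6-turn rolling history with 30 s idle expiry.
struct Turn { let user: String; let assistant: String }

final class RollingHistory {
    private(set) var turns: [Turn] = []
    private var lastActivity = Date()
    private let maxTurns = 6
    private let idleExpiry: TimeInterval = 30

    func append(user: String, assistant: String, now: Date = Date()) {
        expireIfIdle(now: now)
        turns.append(Turn(user: user, assistant: assistant))
        if turns.count > maxTurns { turns.removeFirst(turns.count - maxTurns) }
        lastActivity = now
    }

    /// History to send with the next request; empty after 30 s of inactivity.
    func context(now: Date = Date()) -> [Turn] {
        expireIfIdle(now: now)
        return turns
    }

    private func expireIfIdle(now: Date) {
        if now.timeIntervalSince(lastActivity) > idleExpiry { turns.removeAll() }
    }
}
```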

API logging

Every API call is appended to a JSON file in ~/Library/Logs/Narrait/:

  • gemini.json — model, system prompt, history, user prompt, image count, output, input/output tokens, duration
  • groq.json — model, audio size, transcript, audio duration, duration
  • gemini_tts.json — model, voice ID, input text, character count, speed, audio size, duration
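Append-only logging of this kind can be done with a FileHandle seek-to-end write. A sketch, assuming newline-delimited JSON entries and a reduced field set (the real APILogger records the fields listed above):

```swift
import Foundation

// Sketch of per-API append-only JSON logging; field set reduced for brevity.
struct LogEntry: Codable {
    let timestamp: Date
    let model: String
    let durationMs: Int
}

func appendLog(_ entry: LogEntry, to url: URL) throws {
    let encoder = JSONEncoder()
    encoder.dateEncodingStrategy = .iso8601
    var line = try encoder.encode(entry)
    line.append(0x0A)   // newline-delimited entries keep appends cheap
    if FileManager.default.fileExists(atPath: url.path) {
        let handle = try FileHandle(forWritingTo: url)
        defer { try? handle.close() }
        _ = try handle.seekToEnd()
        try handle.write(contentsOf: line)
    } else {
        try FileManager.default.createDirectory(at: url.deletingLastPathComponent(),
                                                withIntermediateDirectories: true)
        try line.write(to: url)
    }
}
```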

Project Structure

Narrait.xcodeproj
Narrait/
  NarraitApp.swift               # @main entry, builds dependency graph, requests permissions
  Info.plist                     # LSUIElement=true, usage description strings
  Narrait.entitlements           # sandbox=false, network, audio-input

  Core/
    ActivationCoordinator.swift  # State machine, orchestrates all components
    ScreenCapture.swift          # SCScreenshotManager, multi-display, Retina coords
    GlobalHotkeyMonitor.swift    # CGEventTap, Option + Cmd+Option with debounce
    AccessProfile.swift          # Enum, UserDefaults persistence, profile-specific TTS speed
    ConversationStore.swift      # 6-turn rolling history, 30s idle expiry
    SystemPrompts.swift          # Locked rubric + per-profile clauses
    KeychainStore.swift          # .env loader → UserDefaults
    APILogger.swift              # Append-only JSON logs per API

  API/
    GeminiClient.swift           # Gemini 2.0 Flash SSE streaming, [POINT:] tag parsing
    GroqWhisperClient.swift      # Whisper Large v3 multipart POST
    CartesiaTTSClient.swift      # Gemini TTS REST → WAV → AVAudioPlayer

  UI/
    MenuBarController.swift      # NSStatusItem, profile picker menu, key entry
    ResponseOverlay.swift        # Cursor-following NSPanel, streaming text, auto-resize
    CursorPointer.swift          # Highlight ring NSPanel, cursor warp

  Audio/
    MicRecorder.swift            # AVAudioEngine 16kHz PCM16, WAV export
    AudioPlayer.swift            # AVAudioPlayer wrapper with stop/stopHard

Why Narrait

Guidance, not automation. Competitors like Cluely do things for you. Narrait teaches you to do them yourself. Screen readers don't read books for you — they give you access to read. Narrait doesn't navigate VS Code for you — it shows you how, so next time you don't need it.

Cost. ~$0.003/call with Gemini 2.0 Flash + billing enabled. Competitors charge $20–30/month subscriptions — out of reach for most students on financial aid. Narrait is pay-as-you-go.

Not an AI assistant. An assistive AI. The system prompt has a hard rubric: describe the screen, translate jargon, walk through software, complete non-graded forms. It refuses to interpret academic content (math problems, essay prompts, code assignments). The refusal isn't keyword-based — it's content-type-based. Hover a math problem and ask "what's the answer" → refusal. Hover a confusing FAFSA field → full plain-English explanation.

About

Agentic screen navigation — voice-driven, multi-step AI that explains, points, and walks you through anything on your Mac.
