SpeakEasy

An Augmentative and Alternative Communication (AAC) app with on-device AI, built for people who need a voice.

SpeakEasy runs entirely in the browser or as a native iOS/Android app — no server, no cloud, no account required. Tap symbols to build sentences, and the app speaks them aloud using high-quality text-to-speech. An on-device language model provides contextual reply suggestions when someone speaks to you.

Features

Symbol Board

100+ built-in symbols organised across 8 categories: Social, People, Feelings, Actions, Food, Places, Things, and Descriptors
Each symbol shows an emoji and a localised label
Tap symbols to compose messages; freely mix tapping and typing
Category grid with drill-down navigation and quick-phrase / emergency tabs

Intelligent Prediction Pipeline

SpeakEasy uses a tiered prediction pipeline with four stages, all running on-device:

| Tier | Trigger | Latency | What it does |
| --- | --- | --- | --- |
| Heuristic templates | Every symbol tap | 0 ms | POS-aware sentence candidates from per-language templates |
| Gender agreement fixer | Post-heuristic | 0 ms | Rule-based suffix correction (IT/FR/ES); no LLM needed |
| N-gram engine | Background | < 1 ms | Bigram/trigram predictions learned from your speech history |
| On-device LLM | Listen Mode only | 1-3 s | Contextual reply generation when someone speaks to you |

The LLM (Qwen3 0.6B–1.7B via WebGPU) is reserved for Mode B (Listen Mode), where it generates contextual replies to overheard speech. Symbol-tap predictions rely on instant heuristics, which keeps the app responsive even on low-end devices.
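
The n-gram tier can be illustrated with a small bigram counter. This is a minimal sketch of the general technique, not the project's `predictionEngine`; the names `createBigramModel`, `learn`, and `predict` are hypothetical, and the sample phrases are made up.

```javascript
// Hypothetical bigram model: counts which word tends to follow which.
function createBigramModel() {
  const counts = new Map(); // previous word -> Map(next word -> count)

  // Update the counts from one spoken phrase.
  function learn(phrase) {
    const words = phrase.toLowerCase().split(/\s+/).filter(Boolean);
    for (let i = 0; i < words.length - 1; i++) {
      const prev = words[i];
      const next = words[i + 1];
      if (!counts.has(prev)) counts.set(prev, new Map());
      const followers = counts.get(prev);
      followers.set(next, (followers.get(next) || 0) + 1);
    }
  }

  // Return up to `limit` most frequent followers of `prevWord`.
  function predict(prevWord, limit = 3) {
    const followers = counts.get(prevWord.toLowerCase());
    if (!followers) return [];
    return [...followers.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, limit)
      .map(([word]) => word);
  }

  return { learn, predict };
}

const model = createBigramModel();
model.learn("I want water");
model.learn("I want food");
model.learn("I want water please");
const suggestions = model.predict("want"); // "water" ranks first (seen twice)
```

A trigram variant would key the map on the previous two words instead of one. Persisting the counts (for example as JSON in `localStorage`) is left out of the sketch.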

RAG Memory

Powered by MiniLM-L6-v2 (~23 MB) — embeds your past utterances into 384-d vectors stored in localStorage
Retrieves the most relevant context to improve LLM reply suggestions
All inference runs 100% offline with zero data leaving the device
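
Retrieval over stored embeddings is commonly done with cosine similarity. The sketch below assumes that approach; the README does not state which similarity measure SpeakEasy uses. The function names and the toy 3-d vectors (standing in for the 384-d MiniLM vectors) are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored utterances by similarity to the query and keep the top k.
function retrieveTopK(queryVector, memory, k = 3) {
  return memory
    .map((entry) => ({
      text: entry.text,
      score: cosineSimilarity(queryVector, entry.vector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data: real entries would hold 384-d embedding vectors.
const memory = [
  { text: "I would like some coffee", vector: [0.9, 0.1, 0.0] },
  { text: "I am feeling tired", vector: [0.0, 0.2, 0.9] },
  { text: "Can I have tea", vector: [0.8, 0.3, 0.1] },
];
const results = retrieveTopK([1, 0, 0], memory, 2);
```

Because the entries are plain objects with numeric arrays, they serialise with `JSON.stringify`, which is one way such vectors could be kept in `localStorage`.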

Listen Mode (Two-Stage)

Stage 1 – Wake word detection: Mic open with energy VAD → short Whisper transcription → keyword match
Stage 2 – Full transcription: Records full utterance → Whisper STT → LLM generates 5 contextual replies
User can tap a reply (speaks it via TTS), type their own reply using the board, or listen again
Runs entirely on-device using @xenova/transformers (Whisper ONNX WASM)
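
Stage 1 can be pictured as an energy gate followed by a keyword check. The sketch below is an assumption about how such a gate might look, not the code in `audioCapture` or `wakeWordDetector`; the threshold value, function names, and sample keyword are hypothetical.

```javascript
// Root-mean-square energy of one audio frame (samples in [-1, 1]).
function rmsEnergy(frame) {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    sum += frame[i] * frame[i];
  }
  return Math.sqrt(sum / frame.length);
}

// Energy VAD: treat a frame as speech when its RMS exceeds a threshold.
// The default of 0.02 is an illustrative guess, not a tuned value.
function isSpeech(frame, threshold = 0.02) {
  return rmsEnergy(frame) > threshold;
}

// Case-insensitive keyword match against a short transcript.
function containsWakeWord(transcript, keywords) {
  const text = transcript.toLowerCase();
  return keywords.some((keyword) => text.includes(keyword.toLowerCase()));
}

const silentFrame = new Float32Array(512);          // all zeros
const loudFrame = new Float32Array(512).fill(0.5);  // constant amplitude

const silentIsSpeech = isSpeech(silentFrame);
const loudIsSpeech = isSpeech(loudFrame);
const woke = containsWakeWord("Hey Alex, are you hungry?", ["alex"]);
```

Only frames that pass the gate would need to be sent to Whisper, which keeps the expensive transcription step off the silent portions of the audio.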

Multilingual Support

10 languages with full UI translation, localised symbol labels, and per-language TTS voices:

| Language | Code | TTS |
| --- | --- | --- |
| 🇬🇧 English | `en` | `en-US` |
| 🇪🇸 Español | `es` | `es-ES` |
| 🇫🇷 Français | `fr` | `fr-FR` |
| 🇩🇪 Deutsch | `de` | `de-DE` |
| 🇮🇹 Italiano | `it` | `it-IT` |
| 🇧🇷 Português | `pt` | `pt-BR` |
| 🇸🇦 العربية | `ar` | `ar-SA` |
| 🇨🇳 中文 | `zh` | `zh-CN` |
| 🇯🇵 日本語 | `ja` | `ja-JP` |
| 🇰🇷 한국어 | `ko` | `ko-KR` |

Separate UI language, typing language, TTS language, and listening language settings for bilingual users.

Text-to-Speech

Native TTS on iOS/Android via @capacitor-community/text-to-speech
Web Speech API on desktop/mobile browsers with smart voice selection — automatically prefers premium, enhanced, natural, and neural voices
Adjustable speed and pitch, selectable voice name, and a Try Voice button to preview settings
Haptic feedback on native platforms when speaking
Gender-aware voice preferences
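
The voice preference described above could be implemented as a simple scoring pass over the available voices. This is a hedged sketch: the scoring weights and the names `scoreVoice` and `pickVoice` are hypothetical, and the project's `useTTS` hook may work differently. The only properties it relies on are `name` and `lang`, which `SpeechSynthesisVoice` objects expose.

```javascript
// Name fragments treated as signs of a higher-quality voice.
const QUALITY_HINTS = ["premium", "enhanced", "natural", "neural"];

// Score one voice for a target locale such as "en-US".
// Returns -1 when the voice is for a different language.
function scoreVoice(voice, targetLang) {
  let score = 0;
  if (voice.lang === targetLang) {
    score += 10; // exact locale match
  } else if (voice.lang.slice(0, 2) === targetLang.slice(0, 2)) {
    score += 5; // same language, different region
  } else {
    return -1;
  }
  const name = voice.name.toLowerCase();
  for (const hint of QUALITY_HINTS) {
    if (name.includes(hint)) score += 3;
  }
  return score;
}

// Pick the highest-scoring voice, or null if none matches the language.
function pickVoice(voices, targetLang) {
  let best = null;
  let bestScore = -1;
  for (const voice of voices) {
    const score = scoreVoice(voice, targetLang);
    if (score > bestScore) {
      best = voice;
      bestScore = score;
    }
  }
  return best;
}

// Made-up voice list for illustration.
const sampleVoices = [
  { name: "Samantha", lang: "en-US" },
  { name: "Samantha (Enhanced)", lang: "en-US" },
  { name: "Amelie (Premium)", lang: "fr-FR" },
];
const chosen = pickVoice(sampleVoices, "en-US");
```

In a browser, the list would come from `speechSynthesis.getVoices()` rather than a hard-coded array.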

Accessibility & UX

Left/right-handed mode — flips action button layout for comfortable one-handed use
iOS-inspired native design: translucent blur headers, system font stack, smooth animations
Mobile-first responsive layout — full-width, optimised for one-thumb use
60+ avatar emoji options including skin-tone variants, roles, animals, and accessibility symbols
History panel with phrase frequency tracking, export, and one-tap replay

Tech Stack

| Layer | Technology |
| --- | --- |
| Framework | React 19 + Vite 7 |
| Native | Capacitor 8 (iOS + Android) |
| AI / LLM | @mlc-ai/web-llm (Qwen3, 4-bit, WebGPU) |
| STT | @xenova/transformers (Whisper ONNX WASM) |
| Embeddings | @xenova/transformers (MiniLM-L6-v2, 384-d) |
| Icons | Lucide React |
| TTS | Web Speech API / Capacitor TTS plugin |
| Storage | `localStorage` (settings, history, RAG vectors, n-gram model) |

Project Structure

speakeasy/
├── public/                          # Static assets (logo, icons)
├── src/
│   ├── main.jsx                     # React root mount + ErrorBoundary
│   ├── index.css                    # Design system (CSS variables, light/dark)
│   ├── app/
│   │   ├── App.jsx                  # Root orchestrator (~690 lines)
│   │   ├── App.css                  # App-specific styles
│   │   └── native.js                # Capacitor bootstrap + haptic()
│   ├── features/
│   │   ├── board/                   # CategoryGrid, SymbolPicker, PhraseGrid,
│   │   │                            #   SmartKeyboard, IntentBar, SymbolButton, etc.
│   │   ├── composer/                # MessageBar (sentence builder + speak)
│   │   ├── prediction/              # useAIPrediction, usePrediction,
│   │   │                            #   predictionEngine, ragMemory
│   │   ├── listen/                  # useListenMode, useWhisper, ListenOverlay,
│   │   │                            #   audioCapture, wakeWordDetector
│   │   ├── history/                 # HistoryPanel, useStorage
│   │   ├── settings/                # SettingsPanel, AIModelModal, useSettings
│   │   ├── profile/                 # ProfilePanel
│   │   └── symbols/                 # SymbolsPage, useCustomSymbols
│   ├── i18n/
│   │   ├── languages.js             # LANGUAGES, LANG_MAP, ACTIVATION_KEYWORDS
│   │   ├── translations.js          # Symbol / category / hierarchy translations
│   │   ├── ui-strings.js            # UI_STRINGS + getUI() for 6 languages
│   │   └── useLanguage.js           # 4-dimension language state hook
│   ├── data/
│   │   ├── symbols.js               # AAC symbol definitions (9 categories)
│   │   ├── hierarchy.js             # Category → subcategory → symbol tree
│   │   ├── boardTabs.js             # Board tabs + default phrases
│   │   ├── wordFrequency.js         # Word frequency data
│   │   └── posLookup.js             # POS lookup table
│   ├── shared/
│   │   ├── platform.js              # Capacitor platform detection
│   │   ├── ui/                      # ConfirmSheet, HelpModal, settingsUI,
│   │   │                            #   ErrorBoundary
│   │   └── hooks/                   # useFavorites, useQuickPhrases, useTTS
│   └── prompts/
│       └── intentPrompt.js          # Heuristic templates, gender fixer,
│                                    #   LLM prompt builders, parsers
├── docs/
│   └── ARCHITECTURE.md              # Detailed architecture document
├── claude.md                        # AI assistant rules for this project
├── capacitor.config.json            # Capacitor configuration
├── vite.config.js                   # Vite build config (COOP/COEP, WASM)
└── package.json
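
The `vite.config.js` entry in the tree mentions COOP/COEP. These two response headers enable cross-origin isolation, which browsers require before exposing `SharedArrayBuffer` (used by multithreaded WASM). The fragment below is a sketch of how such headers are typically set through Vite's `server.headers` and `preview.headers` options; the project's actual config is not shown in this README, so the values and the variable names are assumptions.

```javascript
// Headers that opt the page into cross-origin isolation.
const crossOriginIsolationHeaders = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Hypothetical config object; a real vite.config.js would export this
// as its default export.
const viteConfigSketch = {
  server: { headers: crossOriginIsolationHeaders },  // dev server
  preview: { headers: crossOriginIsolationHeaders }, // `npm run preview`
};
```

A production host has to send the same headers itself, since these options only affect Vite's dev and preview servers.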

Getting Started

Prerequisites

Node.js ≥ 18
npm ≥ 9

Install & Run (Web)

git clone https://github.com/your-username/speakeasy.git
cd speakeasy
npm install
npm run dev

Open http://localhost:5173 in a browser with WebGPU support (Chrome 113+, Edge 113+) for AI predictions.

Build for Production

npm run build
npm run preview

Native (iOS / Android)

Requires Xcode (iOS) or Android Studio (Android).

npm run cap:sync          # Build + sync to native platforms
npm run cap:ios           # Build, sync, open in Xcode
npm run cap:android       # Build, sync, open in Android Studio
npm run cap:run:ios       # Build, sync, run on iOS device
npm run cap:run:android   # Build, sync, run on Android device

AI Models

SpeakEasy offers four downloadable model options, used only for Mode B (Listen Mode replies):

| Model | Size | VRAM | Speed | Use case |
| --- | --- | --- | --- | --- |
| Qwen3 0.6B (fast) | ~400 MB | ~500 MB | ~25 tok/s | Most WebGPU devices |
| Qwen3 1.7B (quality) | ~900 MB | ~1 GB | ~15 tok/s | Desktop + high-end mobile |
| Gemma 3 1B | ~600 MB | ~700 MB | ~20 tok/s | Balanced speed & quality |
| Qwen2.5 0.5B | ~300 MB | ~400 MB | ~30 tok/s | Smallest + fastest download |

The model is downloaded once and cached by the browser. Switch between models in Settings → AI Engine.
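
As a rough guide to choosing between the options, the table can be turned into a lookup that returns the largest model fitting a given memory budget. This helper is purely illustrative and is not part of SpeakEasy, where the choice is made manually in Settings. The VRAM figures are the approximate values from the table (with ~1 GB taken as 1000 MB), and `pickModel` is a hypothetical name.

```javascript
// Approximate figures copied from the model table.
const MODELS = [
  { name: "Qwen3 0.6B", vramMB: 500, tokPerSec: 25 },
  { name: "Qwen3 1.7B", vramMB: 1000, tokPerSec: 15 },
  { name: "Gemma 3 1B", vramMB: 700, tokPerSec: 20 },
  { name: "Qwen2.5 0.5B", vramMB: 400, tokPerSec: 30 },
];

// Return the most memory-hungry model that still fits the budget,
// or null when even the smallest model does not fit.
function pickModel(availableVramMB) {
  const fitting = MODELS.filter((m) => m.vramMB <= availableVramMB);
  if (fitting.length === 0) return null;
  return fitting.reduce((best, m) => (m.vramMB > best.vramMB ? m : best));
}

const midRange = pickModel(800);
const tooSmall = pickModel(300);
```

The budget is an input the caller must supply; WebGPU does not report available VRAM directly.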

How Prediction Works

Symbol tap ──► Heuristic templates (0ms) ──► Gender fixer (0ms) ──► Display
                                                                       │
Listen Mode ──► Whisper STT ──► LLM Mode B (1-3s) ──► Reply pills ────┘
                                                                       │
Background ────► N-gram engine learns from spoken phrases ─────────────┘

Heuristic templates generate 5 POS-aware sentences instantly per symbol tap
Gender agreement fixer applies rule-based suffix correction (IT: -ato → -ata, FR: -é → -ée, ES: -ado → -ada)
N-gram engine provides bigram/trigram suggestions from your history (< 1 ms)
LLM generates contextual replies only in Listen Mode when someone speaks to you
RAG memory retrieves your most similar past utterances to improve LLM context
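
The suffix rules listed for the gender agreement fixer map directly onto a small rule table. The sketch below encodes only the three rules named above; the real fixer in `intentPrompt.js` presumably handles more cases and decides which words to rewrite, so treat `FEMININE_SUFFIX_RULES` and `toFeminine` as hypothetical names.

```javascript
// One masculine -> feminine suffix rule per language, as listed above.
const FEMININE_SUFFIX_RULES = {
  it: [[/ato$/, "ata"]],
  fr: [[/é$/, "ée"]],
  es: [[/ado$/, "ada"]],
};

// Apply the first matching suffix rule for the language, if any.
// Words in languages without rules are returned unchanged.
function toFeminine(word, lang) {
  const rules = FEMININE_SUFFIX_RULES[lang] || [];
  for (const [pattern, replacement] of rules) {
    if (pattern.test(word)) {
      return word.replace(pattern, replacement);
    }
  }
  return word;
}

const italian = toFeminine("affamato", "it");
const french = toFeminine("fatigué", "fr");
const spanish = toFeminine("cansado", "es");
const english = toFeminine("tired", "en");
```

Because the rules are plain regular expressions, the correction costs a single pass over the word and needs no model call, which is consistent with the 0 ms latency quoted for this tier.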

Privacy

Zero cloud dependency — all data stays on your device
No accounts, no analytics, no telemetry
AI models run locally via WebGPU/WASM
History and RAG vectors stored in localStorage — clear them anytime from Settings → Data & Privacy
Export your phrase history as a file at any time

Scripts

| Command | Description |
| --- | --- |
| `npm run dev` | Start Vite dev server with HMR |
| `npm run build` | Production build to `dist/` |
| `npm run preview` | Preview production build locally |
| `npm run lint` | Run ESLint |
| `npm run cap:sync` | Build + sync to Capacitor platforms |
| `npm run cap:ios` | Build, sync, and open in Xcode |
| `npm run cap:android` | Build, sync, and open in Android Studio |
| `npm run cap:run:ios` | Build, sync, and run on iOS device |
| `npm run cap:run:android` | Build, sync, and run on Android device |

Browser Compatibility

| Feature | Chrome 113+ | Safari 17+ | Firefox | Mobile Chrome |
| --- | --- | --- | --- | --- |
| Core app | ✅ | ✅ | ✅ | ✅ |
| Web Speech TTS | ✅ | ✅ | ✅ | ✅ |
| WebGPU (AI) | ✅ | ✅ | ❌ | ✅ (Android) |
| WASM fallback | ✅ | ✅ | ✅ | ✅ |

Note: On browsers without WebGPU, the AI prediction gracefully degrades to heuristic + n-gram suggestions only. The core AAC functionality works everywhere.
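
The degradation described in the note can be sketched as a feature check on `navigator.gpu`, the entry point of the WebGPU API. The tier names and the functions `hasWebGPU` and `selectPredictionTiers` are hypothetical; the app's own detection logic is not shown in this README.

```javascript
// True when the given navigator-like object exposes the WebGPU entry point.
function hasWebGPU(nav) {
  return Boolean(nav && "gpu" in nav && nav.gpu);
}

// Heuristic and n-gram tiers are always available; the LLM tier is
// added only when WebGPU appears to be present.
function selectPredictionTiers(nav) {
  const tiers = ["heuristic", "ngram"];
  if (hasWebGPU(nav)) tiers.push("llm");
  return tiers;
}

// Stand-in objects so the sketch runs outside a browser.
const withGpu = selectPredictionTiers({ gpu: {} });
const withoutGpu = selectPredictionTiers({});
```

In a browser the call would be `selectPredictionTiers(navigator)`. The presence of `navigator.gpu` does not guarantee a usable device, since `navigator.gpu.requestAdapter()` can still resolve to `null`, so a fuller check would await that call as well.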

License

This project is open source. See LICENSE for details.

SpeakEasy — Everyone deserves a voice.
