Your browser becomes a private AI computer.
The Local AI Mission Console is a minimal, elegant, and powerful demonstration of local Edge AI. It bridges the gap between browser-side execution (using the RunAnywhere Web SDK) and local backend inference (using Ollama), creating a fully private, multimodal AI pipeline.
Traditional AI applications send your voice, text, and images to centralized servers. The Local AI Mission Console flips this model: it brings the AI directly to your hardware.
By running Vision-Language Models (VLMs) and Text-to-Speech (TTS) locally, it guarantees:
- Low Latency: No network round-trips to a remote server.
- Absolute Privacy: Your camera feed and data never leave your machine.
- Offline Capability: Works completely decoupled from the cloud.
This project pioneers a hybrid "Local-Edge" architecture, optimally distributing heavy AI workloads across your machine's resources:
```mermaid
graph TD
    A[Camera Feed] -->|Frames| B[Web Browser App]
    B -->|Base64 Image| C{{Ollama Local API}}
    C -->|Qwen2.5-VL / Llama3| D[Scene Description]
    D -->|Text| B
    B -->|Sherpa-ONNX WASM| E[Piper TTS Engine]
    E -->|PCM Audio| F[Browser AudioContext]
    style B fill:#f9f,stroke:#333,stroke-width:2px
    style C fill:#bbf,stroke:#333,stroke-width:2px
    style E fill:#bfb,stroke:#333,stroke-width:2px
```
- Perception: The browser safely accesses your webcam and extracts a high-quality frame.
- Reasoning (Ollama): Because Vision-Language Models (like `qwen2.5-vl`) are GPU-intensive, the heavy lifting is offloaded to Ollama running locally on your hardware. This keeps the browser lightweight.
- Synthesis (RunAnywhere / WebAssembly): The text response is streamed back to the browser, where the RunAnywhere Web SDK initializes a WebAssembly-compiled Sherpa-ONNX engine. It uses a Piper `en_US-amy-low` voice model to synthesize studio-quality speech entirely within the browser's memory.
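The Reasoning stage maps onto a single POST to Ollama's local `/api/generate` endpoint, passing the captured frame as a base64 image. A minimal TypeScript sketch — the function names are illustrative, not the app's actual pipeline code:

```typescript
// Sketch of the Reasoning stage: send a captured frame to Ollama's
// local generate endpoint and get back a scene description.
interface OllamaRequest {
  model: string;
  prompt: string;
  images: string[]; // base64-encoded frames, without the "data:" prefix
  stream: boolean;
}

// Pure request builder; model name and prompt are illustrative defaults.
function buildSceneRequest(frameBase64: string, model = "qwen2.5-vl"): OllamaRequest {
  return {
    model,
    prompt: "Describe what you see in this camera frame in two sentences.",
    images: [frameBase64],
    stream: false,
  };
}

async function describeFrame(frameBase64: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildSceneRequest(frameBase64)),
  });
  const data = await res.json();
  return data.response; // Ollama returns the full text in `response` when stream=false
}
```

With `stream: false` the whole description arrives in one JSON payload; a streaming variant would read the response body line by line instead.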
To achieve flawless in-browser TTS, the app uses an advanced WASM filesystem hydration technique:
- The app fetches a single `.tgz.bin` archive containing the Piper `.onnx` model, `tokens.txt`, and over 350 files of `espeak-ng-data`.
- This archive is manually extracted directly into the Sherpa-ONNX virtual filesystem.
- The TTS provider is accessed via the SDK's `ExtensionPoint` registry to guarantee a true physical singleton, preventing Vite module-duplication errors.
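The hydration step can be sketched as a minimal tar walk over the (already gunzipped) archive, writing each entry into the engine's virtual filesystem. The `writeFile` callback below stands in for whatever write API the WASM module actually exposes (e.g. Emscripten's `FS.writeFile`) — that detail is an assumption:

```typescript
// Minimal tar-entry iterator: tar files are a sequence of 512-byte headers,
// each followed by the file body padded to a 512-byte boundary.
function extractTar(
  buf: Uint8Array,
  writeFile: (name: string, data: Uint8Array) => void,
): void {
  const dec = new TextDecoder();
  let off = 0;
  while (off + 512 <= buf.length) {
    const header = buf.subarray(off, off + 512);
    // Entry name: first 100 bytes, NUL-padded.
    const name = dec.decode(header.subarray(0, 100)).replace(/\0.*$/, "");
    if (!name) break; // two all-zero blocks mark end of archive
    // Size field: 12 bytes at offset 124, octal ASCII.
    const size =
      parseInt(dec.decode(header.subarray(124, 136)).replace(/\0.*$/, "").trim(), 8) || 0;
    const body = buf.subarray(off + 512, off + 512 + size);
    // Typeflag at offset 156: '0' or NUL means a regular file.
    if (header[156] === 48 || header[156] === 0) writeFile(name, body);
    off += 512 + Math.ceil(size / 512) * 512; // skip body plus padding
  }
}
```

In the real app the gunzip step would come first (e.g. via `DecompressionStream("gzip")`), and `writeFile` would also create any intermediate directories the `espeak-ng-data` tree needs.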
- Ollama: Download and install Ollama.
- Vision Model: Pull a robust VLM (we recommend Qwen for vision):

  ```
  ollama run qwen2.5-vl
  ```

  (Note: you can use any model Ollama supports; the UI will autodetect them.)
- Camera Access: A webcam is required.
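The autodetection mentioned above can be done via Ollama's `/api/tags` endpoint, which lists every locally pulled model. A small sketch (the app's actual implementation may differ):

```typescript
// Ollama's /api/tags response contains a `models` array; each entry's
// `name` is the tag you pass to /api/generate.
interface OllamaTag {
  name: string;
}

// Pure helper: pull the model names out of the /api/tags payload.
function extractModelNames(payload: { models: OllamaTag[] }): string[] {
  return payload.models.map((m) => m.name);
}

async function listLocalModels(base = "http://localhost:11434"): Promise<string[]> {
  const res = await fetch(`${base}/api/tags`);
  return extractModelNames(await res.json());
}
```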
1. Clone the repository and install dependencies:

   ```
   npm install
   ```

2. Run the development server:

   ```
   npm run dev
   ```

3. Open the app at `http://localhost:5173`, select your model, and click "Analyze Scene"!
The Local AI Mission Console is designed as a foundation. Here are ways you can extend it:
- Continuous Monitoring (Dashcam Mode): Modify `pipeline.ts` to run a `setInterval` loop, creating an AI that narrates what it sees every 10 seconds.
- Security Guard: Add an instruction to the Ollama prompt: "Only respond if you see a person; describe their clothing." Tie this to an alert system.
- Accessibility Aide: Deploy this on a mobile browser. Users can point their phone at signs or documents, and the local AI reads it aloud to them.
- Local RAG Integration: Instead of just describing the scene, pass the description to a local ChromaDB instance to retrieve context before generating the speech.
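The dashcam idea can be sketched as a guarded `setInterval` loop. Here `captureFrame`, `describeFrame`, and `speak` are hypothetical stand-ins for the existing stages in `pipeline.ts`:

```typescript
// Dashcam-mode sketch: run the capture → describe → speak pipeline on a
// fixed interval, skipping a tick if the previous run is still in flight.
function startDashcam(
  captureFrame: () => Promise<string>,            // returns a base64 frame
  describeFrame: (b64: string) => Promise<string>, // VLM scene description
  speak: (text: string) => Promise<void>,          // local TTS playback
  intervalMs = 10_000,
): () => void {
  let busy = false;
  const id = setInterval(async () => {
    if (busy) return; // don't queue overlapping inference runs
    busy = true;
    try {
      const text = await describeFrame(await captureFrame());
      await speak(text);
    } finally {
      busy = false;
    }
  }, intervalMs);
  return () => clearInterval(id); // call the returned function to stop
}
```

The `busy` flag matters on slower hardware: if a VLM call takes longer than the interval, ticks are dropped rather than piled up behind each other.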
- [RunAnywhere Web SDK](https://github.com/RunanywhereAI/runanywhere-sdks/tree/main/sdk/runanywhere-web): Standardized APIs for bridging WASM AI, Audio/Video capture, and device capabilities.
- Ollama: Local LLM/VLM inference engine.
- Vite & TypeScript: Fast, typed frontend tooling.
- Piper/Sherpa-ONNX: Blazing fast offline Text-to-Speech.