A fully local AI desktop application for analyzing and querying short video files (~1 minute) using natural language. The application extracts content, generates summaries, and creates reports (PDF/PPT) entirely offline using local AI models and custom MCP servers.
- Video Upload & Processing: Select and upload local .mp4 files
- Natural Language Interface: Chat-based interaction for video queries
- Multi-Modal Analysis: Transcription, object detection, OCR, document generation
- Human-in-the-Loop: Clarification prompts for ambiguous queries
- Persistent Chat History: Conversation history maintained across sessions
- 100% Offline: All AI inference runs locally using OpenVINO-optimized models
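
How a chat query reaches one of the analysis capabilities, and when a clarification prompt is shown, is not described in this README. The following is a minimal sketch of that idea, assuming simple keyword matching as a stand-in for whatever the local language model actually does. The tool names, keyword table, `Route` class, and `route_query` function are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword table; tool names and keywords are illustrative only.
TOOL_KEYWORDS = {
    "transcription": {"transcribe", "transcript", "speech", "said"},
    "object_detection": {"object", "objects", "detect"},
    "ocr": {"text", "read", "sign"},
    "report": {"pdf", "ppt", "report", "slides"},
}


@dataclass
class Route:
    tool: Optional[str]           # chosen tool, or None if unclear
    clarification: Optional[str]  # question to show the user, or None


def route_query(query: str) -> Route:
    """Pick a tool when exactly one matches; otherwise ask the user."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    matches = [tool for tool, kws in TOOL_KEYWORDS.items() if words & kws]
    if len(matches) == 1:
        return Route(tool=matches[0], clarification=None)
    if not matches:
        options = ", ".join(TOOL_KEYWORDS)
        question = f"What would you like me to do? Options: {options}."
    else:
        question = "Your request could mean: " + ", ".join(matches) + ". Which one?"
    return Route(tool=None, clarification=question)
```

A query such as "Transcribe the video" resolves to a single tool, while "Tell me about it" matches nothing and yields a clarification question instead of a guess.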
See ARCHITECTURE.md for detailed architecture diagrams and design decisions.
- Python 3.10+
- Node.js 18+
- Rust (for Tauri)
- 8GB+ RAM recommended
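
Before installing, it can help to confirm that the prerequisites above are available. The sketch below is not part of the project; it assumes the executables are named `node` (Node.js) and `cargo` (Rust toolchain) and only checks that they are on `PATH`. It does not verify the Node.js version or the amount of RAM.

```python
import shutil
import sys

# Assumed executable names: "node" for Node.js, "cargo" for the Rust toolchain.
REQUIRED_TOOLS = ("node", "cargo")


def missing_prerequisites(min_python=(3, 10), tools=REQUIRED_TOOLS):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        found = f"{sys.version_info[0]}.{sys.version_info[1]}"
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```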
cd backend
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
python main.py

cd frontend
npm install
npm run tauri dev

- Architecture - System design and architecture
- Setup Guide - Detailed installation instructions
- User Guide - How to use the application
- Development Guide - Developer documentation
- Frontend: React, TypeScript, Tauri
- Backend: Python, gRPC, OpenVINO
- AI Models: Whisper, YOLO, CLIP, Phi-3
- Document Gen: ReportLab, python-pptx
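
The stack above does not name a storage layer for the persistent chat history. One option that stays fully offline is Python's built-in `sqlite3` module. The sketch below assumes a single `messages` table; the schema and the `open_history`, `add_message`, and `load_history` helpers are hypothetical and not taken from the project.

```python
import sqlite3
from datetime import datetime, timezone


def open_history(db_path: str) -> sqlite3.Connection:
    """Open (or create) a chat-history database with a hypothetical schema."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL,
            role TEXT NOT NULL,
            content TEXT NOT NULL,
            created_at TEXT NOT NULL
        )
        """
    )
    conn.commit()
    return conn


def add_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    """Append one message to a session, timestamped in UTC."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content, created_at) "
        "VALUES (?, ?, ?, ?)",
        (session_id, role, content, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def load_history(conn: sqlite3.Connection, session_id: str) -> list:
    """Return (role, content) pairs for a session in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return list(rows)
```

Passing a file path to `open_history` keeps messages across application restarts; passing `":memory:"` gives a throwaway database for tests.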
MIT