Jarvis — AI-Driven Desktop Virtual Assistant

MCA (Data Science) Final Year Capstone Project

An intelligent desktop assistant with an animated avatar, voice interaction, system automation, computer vision, long-term memory, and autonomous task planning. It is built on free and open-source AI models and runs 100% locally with zero cloud cost.

Architecture at a Glance

User speaks → Wake Word → STT (Faster-Whisper)
                              ↓
                    Supervisor (LangGraph)
                    ┌──────────────────────┐
                    │  Intent classifier   │
                    │  (Phi-3-mini)        │
                    └──────┬───────────────┘
                           │
          ┌────────────────┼────────────────┐
          ▼                ▼                ▼
   Conversation     Planning Agent    Vision Agent
   Agent (Qwen)     (Qwen + plan)     (LLaVA + OCR)
          │                │                │
          └────────────────┼────────────────┘
                           ▼
                   Unified Response
                           │
          ┌────────────────┼────────────────┐
          ▼                ▼                ▼
    TTS (Piper)     Avatar (Godot)    Memory (ChromaDB)
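The supervisor step can be sketched in plain Python as "classify once, dispatch to exactly one agent, return one unified result." This is not the project's `core/supervisor.py`. The keyword sets are a hypothetical stand-in for the Phi-3-mini classifier, and the three agent functions are placeholders for the Qwen, planning, and LLaVA + OCR agents. In LangGraph, the same dispatch would typically be a conditional edge leaving the classifier node.

```python
import re
from typing import Callable, Dict

# Hypothetical trigger words standing in for the Phi-3-mini intent classifier.
VISION_WORDS = {"screen", "see", "look", "showing"}
AUTOMATION_WORDS = {"open", "create", "search", "click", "type", "launch", "close", "folder"}


def classify_intent(text: str) -> str:
    """Label an utterance as 'vision', 'automation' or 'conversation'."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & VISION_WORDS:
        return "vision"
    if words & AUTOMATION_WORDS:
        return "automation"
    return "conversation"


# Placeholder agents (hypothetical); real ones would call Qwen, LLaVA, OCR, etc.
def conversation_agent(text: str) -> str:
    return f"[conversation] {text}"


def planning_agent(text: str) -> str:
    return f"[plan] {text}"


def vision_agent(text: str) -> str:
    return f"[vision] {text}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "conversation": conversation_agent,
    "automation": planning_agent,
    "vision": vision_agent,
}


def supervise(text: str) -> dict:
    """Classify the request, dispatch it to one agent, and return a unified result."""
    intent = classify_intent(text)
    return {"intent": intent, "response": AGENTS[intent](text)}
```

For example, `supervise("Create a folder on my desktop called MCA Project")` is routed to the planning agent with intent `automation`, which matches the intent label used in the `/chat` response example.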

Quick Start

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh   # Linux; on macOS/Windows use the installer from ollama.com
ollama serve &

# 2. Clone and set up
git clone <your-repo>
cd jarvis
python setup.py          # downloads all models (~15GB)

# 3. Run
python -m uvicorn api.main:app --port 8000

# 4. Test
curl -X POST http://localhost:8000/chat \
     -H "Content-Type: application/json" \
     -d '{"text": "Create a folder on my desktop called MCA Project"}'

Project Structure

jarvis/
├── core/
│   ├── supervisor.py        LangGraph orchestrator
│   └── llm/
│       └── ollama_client.py Ollama wrapper
├── speech/
│   ├── wake_word.py         OpenWakeWord listener
│   ├── stt.py               Faster-Whisper STT
│   └── tts.py               Piper / Coqui TTS
├── memory/
│   └── long_term.py         ChromaDB + SQLite memory
├── vision/
│   └── screen_capture.py    OCR + UI detection + LLaVA
├── automation/
│   └── executor.py          PyAutoGUI + Playwright
├── avatar/
│   ├── avatar_controller.py WebSocket bridge to Godot
│   └── godot_project/       Godot 4 avatar scene
├── api/
│   └── main.py              FastAPI backend (entry point)
├── config/
│   └── settings.yaml
├── requirements.txt
└── setup.py

Hardware Requirements

Tier	GPU (VRAM)	System RAM	Performance (model size, latency)
Minimum	GTX 1060 6GB	16GB	Good — 7B models, ~1.5s latency
Recommended	RTX 3060 12GB	32GB	Excellent — 13B models, <1s
CPU-only	None	16GB	Degraded — 3B models, ~5s latency
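Read as a selection rule, the hardware table can be expressed as a small helper. `pick_tier` is hypothetical and not part of the project. Two details are assumptions the table does not state: any GPU with less than 6 GB of VRAM is treated as CPU-only, and the Recommended tier requires both 12 GB of VRAM and 32 GB of RAM.

```python
from typing import Optional, Tuple


def pick_tier(vram_gb: Optional[float], ram_gb: float) -> Tuple[str, str]:
    """Map detected hardware to a (tier, model size) pair following the table.

    vram_gb is None when no usable GPU is present.
    """
    if ram_gb < 16:
        raise ValueError("every tier in the table assumes at least 16 GB of RAM")
    # Assumption: GPUs below the 6 GB minimum fall back to CPU-only inference.
    if vram_gb is None or vram_gb < 6:
        return ("CPU-only", "3B")
    # Assumption: the Recommended tier needs both the GPU and the RAM upgrade.
    if vram_gb >= 12 and ram_gb >= 32:
        return ("Recommended", "13B")
    return ("Minimum", "7B")
```

A machine with a 12 GB GPU but only 16 GB of RAM would therefore stay on 7B models under this rule.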

Technology Stack

Component	Technology	Reason
LLM	Qwen2.5-7B via Ollama	Best reasoning per GB
STT	Faster-Whisper	Up to 4× faster than OpenAI Whisper at the same accuracy
TTS	Piper TTS	50ms latency, 35+ languages
Wake word	OpenWakeWord	Apache 2.0, fully offline, trainable
Agents	LangGraph	Fine-grained control over agent flow
Memory	ChromaDB + SQLite	Vector + structured storage
Vision	LLaVA + EasyOCR	Vision-language screen understanding plus on-screen text extraction
Automation	PyAutoGUI + Playwright	GUI + browser control
Avatar	Godot 4	MIT license, WebSocket API
Backend	FastAPI	Async, WebSocket support
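A minimal call to the local LLM can go through Ollama's REST endpoint `/api/generate` on its default port 11434, with `stream` set to false so the reply arrives as a single JSON object whose `response` field holds the text. This is a sketch, not the project's `core/llm/ollama_client.py`. The model tag `qwen2.5:7b` is assumed to correspond to "Qwen2.5-7B via Ollama", and the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default host and port


def build_generate_request(prompt: str, model: str = "qwen2.5:7b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen2.5:7b", timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the 'response' field."""
    with urllib.request.urlopen(build_generate_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Keeping request construction separate from sending makes the payload easy to inspect without a running Ollama server.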

Research Contributions

Emotion-aware memory retrieval — weights past memories by emotional context
Proactive task anticipation — learns user patterns and suggests actions
Visual workflow recording — records human actions → replayable plan
Cross-app context transfer — shares context between applications
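One simple way to weight memories by emotional context is to add a fixed bonus to the semantic similarity of any memory whose stored emotion matches the user's current emotion. The README does not say how the project combines the two signals, so the additive bonus, the `emotion_weight` default, and the `Memory` and `rank_memories` names are all hypothetical. In the real system the embeddings would presumably come from the ChromaDB store.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Memory:
    text: str
    embedding: Sequence[float]
    emotion: str  # emotion label recorded when the memory was stored, e.g. "happy"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_memories(
    query_embedding: Sequence[float],
    current_emotion: str,
    memories: List[Memory],
    emotion_weight: float = 0.3,  # hypothetical bonus for a matching emotion
    top_k: int = 3,
) -> List[Memory]:
    """Rank memories by similarity plus a bonus when the emotion matches."""

    def score(memory: Memory) -> float:
        bonus = emotion_weight if memory.emotion == current_emotion else 0.0
        return cosine(query_embedding, memory.embedding) + bonus

    return sorted(memories, key=score, reverse=True)[:top_k]
```

Setting `emotion_weight=0.0` reduces this to plain similarity search, which gives a natural baseline for evaluating whether the emotional signal helps.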

API Reference

POST /chat

{"text": "Open Chrome and search for deep learning", "session_id": "user1"}

Response:

{
  "response": "[HAPPY] Opening Chrome and searching...",
  "emotion": "happy",
  "intent": "automation",
  "action_plan": [...],
  "latency_ms": 1240
}
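The same request can be sent from Python with only the standard library. This is a client-side sketch. The helper names are hypothetical, and the base URL assumes the server was started on port 8000 as in Quick Start.

```python
import json
import urllib.request


def build_chat_request(
    text: str, session_id: str = "user1", base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST /chat request with the documented JSON body."""
    body = json.dumps({"text": text, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(text: str, session_id: str = "user1", base_url: str = "http://localhost:8000",
         timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_chat_request(text, session_id, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `reply = chat("Open Chrome and search for deep learning")` returns a dict whose `emotion`, `intent`, and `latency_ms` keys can be read directly.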

WebSocket /ws

Connect to ws://localhost:8000/ws and send:

{"text": "What's on my screen?", "session_id": "user1"}
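A minimal WebSocket client can use the third-party `websockets` package (`pip install websockets`). The README does not document the reply format on `/ws`, so this sketch assumes the server answers each message with a single text frame. The function names are hypothetical.

```python
import json


def make_ws_message(text: str, session_id: str = "user1") -> str:
    """Serialize a request in the JSON shape shown above."""
    return json.dumps({"text": text, "session_id": session_id})


async def ask_over_ws(text: str, session_id: str = "user1",
                      uri: str = "ws://localhost:8000/ws") -> str:
    """Send one message and wait for one reply (assumes a single-frame answer)."""
    import websockets  # third-party; imported lazily so the helper above works without it

    async with websockets.connect(uri) as ws:
        await ws.send(make_ws_message(text, session_id))
        return await ws.recv()
```

From a script, `asyncio.run(ask_over_ws("What's on my screen?"))` performs one round trip. A persistent connection suits the avatar bridge better than repeated HTTP calls because the server can also push updates.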

GET /health

{"status": "ok", "ollama": true, "avatar": true}
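A startup script can poll this endpoint before sending requests. In this sketch, a service counts as healthy only when `status` is `"ok"` and every boolean component flag is true. That rule is an assumption: a deployment that can run without the avatar might choose to ignore the `avatar` flag. The helper names are hypothetical.

```python
import json
import urllib.request


def is_healthy(payload: dict) -> bool:
    """True when status is 'ok' and every boolean component flag is true."""
    if payload.get("status") != "ok":
        return False
    return all(value for value in payload.values() if isinstance(value, bool))


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Query GET /health; treat connection failures and bad JSON as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):
        return False
```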

License

MIT — Free for academic and personal use.
