A desktop application that visually observes your screen and autonomously generates practice questions to test your understanding — powered by Gemini 2.0 Flash on Vertex AI.
Built for the Gemini Live Agent Challenge — UI Navigator Category
DeskMate is a proactive AI study coach that silently watches your screen using native OS screenshot capture. It identifies study material (slides, PDFs, textbooks, code tutorials) and automatically generates contextual practice questions — without DOM access, using only Gemini's visual understanding.
- 📸 Native screen monitoring — works with ANY application (PowerPoint, Adobe, browsers, etc.)
- 🧠 Gemini visual understanding — analyzes screenshots to identify study content
- ❓ Real-time question generation — MCQ, short answer, and explanation questions
- ✅ AI answer evaluation — encouraging, honest feedback with hints
- 📊 Session summary — topics covered, score %, weak areas
- 🎯 Screen annotations — highlights relevant screen regions for each question
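The session summary feature (topics covered, score %, weak areas) can be illustrated with a small sketch. The `AnswerRecord` shape, the `summarize_session` name, and the 60% weak-area threshold are assumptions made for illustration; the project's `session_manager.py` may track sessions differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AnswerRecord:
    """One evaluated answer (hypothetical shape, not the project's schema)."""
    topic: str
    correct: bool


def summarize_session(records, weak_threshold=0.6):
    """Return topics covered, overall score %, and topics scoring below the threshold."""
    if not records:
        return {"topics": [], "score_pct": 0.0, "weak_areas": []}

    per_topic = defaultdict(lambda: [0, 0])  # topic -> [correct, total]
    for rec in records:
        per_topic[rec.topic][1] += 1
        if rec.correct:
            per_topic[rec.topic][0] += 1

    total_correct = sum(correct for correct, _ in per_topic.values())
    score_pct = round(100.0 * total_correct / len(records), 1)
    # A topic is "weak" when its share of correct answers is under the threshold.
    weak = sorted(t for t, (c, n) in per_topic.items() if c / n < weak_threshold)
    return {"topics": sorted(per_topic), "score_pct": score_pct, "weak_areas": weak}


if __name__ == "__main__":
    demo = [
        AnswerRecord("recursion", True),
        AnswerRecord("recursion", True),
        AnswerRecord("pointers", False),
        AnswerRecord("pointers", True),
    ]
    print(summarize_session(demo))
```

With the demo records, three of four answers are correct, so the score is 75.0 and `pointers` (one of two correct) is reported as the only weak area.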
| Layer | Technology | Purpose |
|---|---|---|
| Desktop wrapper | Tauri v2 (Rust) | Native app + sidecar management |
| Frontend | HTML/CSS/Vanilla JS | Question UI inside Tauri window |
| Screen capture | Python + mss | Native OS-level screenshot (any app) |
| Transport | WebSocket | Real-time screenshot streaming |
| Backend | FastAPI (Python) | API server |
| AI Framework | Google GenAI SDK | Gemini integration |
| AI Model | Gemini 2.0 Flash | Visual understanding + question gen |
| Cloud Hosting | Google Cloud Run | Serverless backend |
| AI Platform | Vertex AI | Managed Gemini API (GCP compliant) |
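As a rough illustration of how the screen capture and transport layers could fit together, the sketch below grabs the primary monitor with `mss` and wraps the PNG bytes in a JSON text message suitable for a WebSocket. The envelope fields (`type`, `session_id`, `captured_at`, `image_b64`) and the function names are assumptions; the actual message format used by `capture.py` and `app.js` is not documented here.

```python
import base64
import json
import time


def capture_primary_monitor_png():
    """Grab the primary monitor and return PNG bytes (requires `pip install mss`)."""
    import mss
    import mss.tools

    with mss.mss() as sct:
        # monitors[0] is the union of all displays; monitors[1] is the first display.
        shot = sct.grab(sct.monitors[1])
        return mss.tools.to_png(shot.rgb, shot.size)


def encode_frame(png_bytes, session_id):
    """Wrap PNG bytes in a JSON text message (hypothetical envelope)."""
    return json.dumps({
        "type": "screenshot",
        "session_id": session_id,
        "captured_at": time.time(),
        "image_b64": base64.b64encode(png_bytes).decode("ascii"),
    })


def decode_frame(message):
    """Inverse of encode_frame: return (session_id, png_bytes)."""
    payload = json.loads(message)
    if payload.get("type") != "screenshot":
        raise ValueError("unexpected message type")
    return payload["session_id"], base64.b64decode(payload["image_b64"])
```

Base64 in JSON keeps the message a plain text frame at the cost of roughly one third more bytes; sending the PNG as a binary WebSocket frame would be a reasonable alternative.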
deskmate/
├── backend/                          # Python FastAPI, deployed to Cloud Run
│   ├── main.py
│   ├── agent/
│   │   ├── __init__.py
│   │   ├── deskmate_agent.py         # Core agent logic
│   │   ├── question_generator.py     # Gemini API caller
│   │   └── session_manager.py        # In-memory session tracking
│   ├── requirements.txt
│   └── Dockerfile
├── desktop/                          # Tauri desktop app
│   ├── src-tauri/
│   │   ├── Cargo.toml
│   │   ├── tauri.conf.json
│   │   └── src/
│   │       ├── main.rs               # Tauri main + screenshot command
│   │       └── annotation_window.rs  # Screen annotation overlay
│   ├── src/
│   │   ├── index.html
│   │   ├── annotation.html
│   │   ├── style.css
│   │   └── app.js                    # WebSocket client + UI logic
│   └── package.json
├── screencapture/                    # Python sidecar for native screenshots
│   ├── capture.py
│   └── requirements.txt
├── .env.example
├── .github/workflows/build.yml       # CI/CD pipeline
├── cloudbuild.yaml
├── gcp_proof/
│   └── vertex_ai_call.py
├── ARCHITECTURE.md
├── SUBMISSION_DESCRIPTION.md
├── DEMO_SCRIPT.md
└── README.md
- Rust — rustup.rs
- Node.js v18+ — nodejs.org
- Python 3.9+
- Google Cloud CLI — cloud.google.com/sdk
- Tauri CLI: npm install -g @tauri-apps/cli
gcloud auth login
gcloud auth application-default login
gcloud config set project deskmate-488522
gcloud services enable aiplatform.googleapis.com

1. Backend
cd backend
pip install -r requirements.txt
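# Set your GCP project ID (see .env.example). Run only the line that matches your shell.
# Linux/macOS:
export GCP_PROJECT_ID=deskmate-488522
# Windows CMD (no trailing comment: CMD would store it as part of the value):
set GCP_PROJECT_ID=deskmate-488522
# PowerShell:
$env:GCP_PROJECT_ID="deskmate-488522"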
uvicorn main:app --reload --port 8080

2. Screen Capture Sidecar
cd screencapture
pip install -r requirements.txt

3. Desktop App
cd desktop
npm install
npm run tauri dev

gcloud auth login
gcloud config set project deskmate-488522
gcloud builds submit --config cloudbuild.yaml

- Health endpoint: https://deskmate-backend-<hash>.run.app/health
- Vertex AI proof script: python gcp_proof/vertex_ai_call.py
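The contents of `gcp_proof/vertex_ai_call.py` are not reproduced in this README. For orientation only, a minimal call to Gemini on Vertex AI through the Google GenAI SDK might look like the sketch below. The prompt wording, the JSON reply shape, the `us-central1` location, the `gemini-2.0-flash` model ID, and all function names are assumptions, not the project's actual code.

```python
import json
import os


def build_prompt(question_type="mcq"):
    """Prompt asking for one question as JSON (wording is illustrative)."""
    return (
        "You are a study coach. Look at the screenshot and, if it shows study "
        f"material, write one {question_type} practice question about it. "
        'Reply with JSON only: {"question": str, "options": [str], "answer": str}.'
    )


def parse_question(reply_text):
    """Extract the first JSON object from a model reply, ignoring surrounding text."""
    start, end = reply_text.find("{"), reply_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in model reply")
    return json.loads(reply_text[start:end + 1])


def ask_gemini(png_bytes, location="us-central1", model="gemini-2.0-flash"):
    """Send one screenshot to Gemini on Vertex AI (requires `pip install google-genai`)."""
    from google import genai
    from google.genai import types

    # vertexai=True routes the request through Vertex AI using
    # application-default credentials (gcloud auth application-default login).
    client = genai.Client(
        vertexai=True,
        project=os.environ["GCP_PROJECT_ID"],
        location=location,
    )
    response = client.models.generate_content(
        model=model,
        contents=[
            types.Part.from_bytes(data=png_bytes, mime_type="image/png"),
            build_prompt(),
        ],
    )
    return parse_question(response.text)
```

Calling `ask_gemini` needs the `GCP_PROJECT_ID` environment variable and the Vertex AI API enabled, as in the setup steps; `build_prompt` and `parse_question` can be exercised offline.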
This project was built for the Gemini Live Agent Challenge.