GitHub - leochame/lavis: A macOS desktop AI agent that perceives your screen, controls mouse & keyboard, and supports voice interaction. Built with Spring Boot, React, and Electron.

Lavis - macOS System-level Multimodal AI Agent

Lavis is a macOS desktop AI agent that perceives your screen, controls mouse & keyboard, and supports voice interaction.

Chinese version: see README-zh.md

Key Features

Visual Perception: Real-time screenshot analysis with Retina support
Autonomous Actions: Mouse, keyboard, and system shortcut control
Reflection Loop: Closed loop for self-correction
System Integration: AppleScript, app control, shell commands
Voice Interaction: Wake word, ASR, TTS
Transparent UI: HUD-style UI showing internal reasoning
Memory Safety: Automatic cleanup for long-running sessions
Context Engineering: Intelligent compression and perceptual deduplication, reducing historical visual tokens by 95%+
Web Search: Deep search agent with up to 5 iterations

Tech Stack

Layer	Technology	Version
Backend	Spring Boot	3.5.9
Language	Java	21
AI Framework	LangChain4j	0.35.0
Frontend	React	19.x
Desktop	Electron	40.x
Build	Vite	7.x
State	Zustand	5.x

Quick Start

Prerequisites

macOS (Intel / Apple Silicon)
JDK 21+
Node.js 18+
At least one LLM API key
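
The version floors above can be checked before starting anything. The sketch below is not part of the repository: the helper names (java_major, node_major, check_prereqs) are illustrative, and it assumes java -version and node --version print their usual first-line formats.

```shell
# Hypothetical pre-flight check; helper names are illustrative, not from the Lavis repo.

# Extract the major version from the first line of `java -version`,
# e.g. 'openjdk version "21.0.2" 2024-01-16' -> 21, 'java version "1.8.0_392"' -> 8.
java_major() {
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$v" in
    1.*) v=${v#1.} ;;   # legacy scheme: 1.8.0 means Java 8
  esac
  printf '%s\n' "${v%%[._-]*}"
}

# Extract the major version from `node --version`, e.g. 'v18.19.0' -> 18.
node_major() {
  v=${1#v}
  printf '%s\n' "${v%%.*}"
}

# Report whether the installed tools meet the floors listed above (JDK 21+, Node.js 18+).
check_prereqs() {
  status=0
  if command -v java >/dev/null 2>&1; then
    jm=$(java_major "$(java -version 2>&1 | head -n 1)")
    [ "${jm:-0}" -ge 21 ] 2>/dev/null || { echo "JDK 21+ required, found: ${jm:-unknown}"; status=1; }
  else
    echo "java not found on PATH"; status=1
  fi
  if command -v node >/dev/null 2>&1; then
    nm=$(node_major "$(node --version)")
    [ "${nm:-0}" -ge 18 ] 2>/dev/null || { echo "Node.js 18+ required, found: ${nm:-unknown}"; status=1; }
  else
    echo "node not found on PATH"; status=1
  fi
  return "$status"
}
```

After sourcing these functions, running check_prereqs prints one line per unmet requirement and returns non-zero if any is missing.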

1. Configure API Key

Lavis uses the Google Gemini API for all AI services (chat, speech-to-text, and text-to-speech), so a single API key is enough to get started.

Option 1 (recommended): Environment variable

export GEMINI_API_KEY=your_gemini_api_key_here
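
An export like the one above only lasts for the current shell session. A small guard can confirm the variable is actually set before the backend is launched from that shell. This is a sketch, not something the repository ships; mask_key and require_gemini_key are hypothetical names. It only applies to Option 1, since with the other options the variable may legitimately be unset.

```shell
# Hypothetical guard for Option 1; helper names are illustrative, not from the Lavis repo.

# Print a key with everything after the first four characters masked,
# so it can be confirmed in a terminal without being shown in full.
mask_key() {
  prefix=$(printf '%s' "$1" | cut -c1-4)
  printf '%s****\n' "$prefix"
}

# Fail fast when GEMINI_API_KEY is unset or empty in the current shell.
require_gemini_key() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set; export it or use one of the other options" >&2
    return 1
  fi
  echo "Using GEMINI_API_KEY=$(mask_key "$GEMINI_API_KEY")"
}
```

With the functions sourced, require_gemini_key && ./mvnw spring-boot:run starts the backend only when the key is present.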

Option 2: Frontend Settings Panel (Easiest)

Launch the app after starting the frontend
Open the Settings panel (via the menu bar icon or Cmd + K)
Enter your Gemini API key in the settings form
Click "Save"

Option 3: Configuration file

cp src/main/resources/application.properties.example src/main/resources/application.properties
# Edit application.properties and set GEMINI_API_KEY or fill API keys directly

Getting Your Gemini API Key

Visit Google AI Studio
Sign in with your Google account
Click "Create API Key"
Copy the generated key

Security Note: API keys are stored locally only and never exposed to third parties.

2. Start Backend

./mvnw spring-boot:run

For AOT compilation and stronger reverse-engineering resistance with GraalVM Native Image, see docs/Build-and-Packaging-en.md (English) or docs/Build-and-Packaging-zh.md (中文).
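
The backend takes a moment to start, so it can help to wait until it answers before launching the frontend. The sketch below polls the status endpoint; the port 18765 and the path /api/agent/status come from the curl examples in this README, while the function names and the retry parameters are assumptions. The response body format is not documented here, so only the HTTP success status is checked.

```shell
# Hypothetical readiness probe; not part of the Lavis repo.
# Port 18765 and /api/agent/status are taken from the README's curl examples.
LAVIS_STATUS_URL=${LAVIS_STATUS_URL:-http://localhost:18765/api/agent/status}

# Default probe: succeeds once the status endpoint answers with a successful HTTP response.
probe_backend() {
  curl -fsS -o /dev/null --max-time 2 "$LAVIS_STATUS_URL" 2>/dev/null
}

# wait_for_backend ATTEMPTS DELAY [PROBE_CMD]
# Runs PROBE_CMD up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Returns 0 as soon as the probe succeeds, 1 if every attempt fails.
wait_for_backend() {
  attempts=$1
  delay=$2
  probe=${3:-probe_backend}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$probe"; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, wait_for_backend 30 2 && echo "backend is up" retries for roughly a minute. The probe is a parameter so the loop can be exercised without a running backend.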

3. Start Frontend

cd frontend
npm install
npm run electron:dev

4. Grant macOS Permissions

Screen Recording: System Settings → Privacy & Security → Screen Recording → enable Lavis
Accessibility: System Settings → Privacy & Security → Accessibility → enable Lavis
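
Both panes can usually be opened straight from a terminal. The sketch below relies on the x-apple.systempreferences URL scheme, which is an undocumented macOS convention and may change between macOS releases; treat the URLs as an assumption and fall back to the manual path above if they do not open the right pane. The function name privacy_pane_url is hypothetical.

```shell
# Hypothetical shortcut to the two privacy panes; not part of the Lavis repo.
# The x-apple.systempreferences URLs are an undocumented macOS convention (assumption).
privacy_pane_url() {
  case "$1" in
    screen)        echo "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture" ;;
    accessibility) echo "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility" ;;
    *)             echo "usage: privacy_pane_url screen|accessibility" >&2; return 1 ;;
  esac
}
```

On macOS, open "$(privacy_pane_url screen)" and open "$(privacy_pane_url accessibility)" should bring up the respective lists, where Lavis can then be enabled.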

Packaging & Distribution

One-Click Build

Lavis supports fully automated one-click packaging, including an embedded Java runtime, so the app can be packaged without extra configuration.

Prerequisites

macOS (Intel / Apple Silicon)
JDK 21+ (for compiling backend)
Node.js 18+ (for building frontend)
Maven (project includes mvnw, no separate installation needed)

Quick Build

cd frontend
npm install  # Install dependencies for first run
npm run package

This command will automatically:

✅ Check prerequisites (Java, Maven, Node.js)
✅ Build Java backend JAR
✅ Auto-download JRE for current architecture (arm64 or x64)
✅ Build frontend code
✅ Compile Electron main process
✅ Package app with electron-builder
✅ Generate DMG installer

Features

✅ Embedded Java - JRE 21 embedded, no Java installation required for end users
✅ Auto-start - Backend service starts automatically on launch
✅ Cross-architecture - Automatically detects and packages current architecture (arm64/x64)
✅ User-friendly - DMG includes auto-install script and instructions

Output

After packaging, frontend/dist-electron/ contains:

Lavis-1.0.0-arm64.dmg (Apple Silicon) or Lavis-1.0.0-x64.dmg (Intel)
Lavis-1.0.0-arm64.zip (alternative format)
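
To locate the artifact for the current machine, the architecture suffix can be derived from uname -m. This is a sketch under assumptions: the function names are hypothetical, and the version 1.0.0 is simply the one shown above (the real value presumably follows the frontend package version).

```shell
# Hypothetical helpers; not part of the Lavis repo.

# Map `uname -m` output on macOS to the architecture suffix used in the artifact names above.
lavis_arch() {
  case "$1" in
    arm64)  echo arm64 ;;
    x86_64) echo x64 ;;
    *)      echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# lavis_dmg_name MACHINE [VERSION]
# Expected DMG file name; VERSION defaults to 1.0.0, the version shown in this README.
lavis_dmg_name() {
  arch=$(lavis_arch "$1") || return 1
  echo "Lavis-${2:-1.0.0}-${arch}.dmg"
}
```

For example, ls "frontend/dist-electron/$(lavis_dmg_name "$(uname -m)")" checks that the expected DMG exists. Note that a terminal running under Rosetta on Apple Silicon reports x86_64.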

Installation Instructions

The DMG package includes:

Lavis.app - Main application
自动安装.command - One-click install script (recommended)
安装说明.rtf - Detailed installation instructions

First-time Installation:

Double-click the DMG file to open it
Double-click the 自动安装.command script (recommended)
- If a security prompt appears, click "Open"
- The script handles permissions and installs the app automatically
Or drag Lavis.app to the Applications folder manually
On first launch, if you see an "app is damaged" message:
- Method 1: Right-click the app → select "Open" → click "Open" in the dialog
- Method 2: Run in a terminal: xattr -dr com.apple.quarantine /Applications/Lavis.app
- This is a macOS security mechanism and is only needed once

Common Issues

Q: Build fails, can't find Java?
A: Make sure JDK 21+ is installed; verify with java -version in a terminal.

Q: Build fails, can't find Maven?
A: Project includes mvnw (Maven Wrapper), no separate Maven installation needed.

Q: Build takes a long time?
A: The first build downloads a JRE (~150 MB); subsequent builds are faster.

Q: How to package for other architectures?
A: Run the build command on a machine with that architecture; electron-builder detects it automatically.

Detailed Documentation:

Complete build guide, debugging, troubleshooting: docs/Build-and-Packaging-en.md
GraalVM Native Image advanced option: docs/Build-and-Packaging-en.md

Project Structure

lavis/
├── src/main/java/com/lavis/        # Java backend
│   ├── cognitive/                  # Cognitive logic
│   ├── perception/                 # Perception (screen)
│   ├── action/                     # Actions
│   ├── controller/                 # REST API
│   ├── websocket/                  # WebSocket
│   ├── service/                    # Services (TTS/ASR)
│   ├── scheduler/                  # Scheduler (Cron tasks + history)
│   ├── skills/                     # Skills plugin system (SKILL.md, dynamic load)
│   ├── memory/                     # Memory & context engineering (sessions, images)
│   ├── entity/                     # JPA entities (tasks, logs, sessions, skills, etc.)
│   └── repository/                 # JPA repositories (SQLite)
├── frontend/                       # Electron + React frontend
│   ├── electron/                  # Electron main process (tray, windows, shortcuts)
│   │   ├── main.ts                 # Main process entry
│   │   ├── backend-manager.ts     # Backend process manager
│   │   └── preload.ts             # Preload script
│   ├── src/                        # React UI & hooks (including Skills/Scheduler management)
│   │   ├── components/            # UI components
│   │   │   ├── Capsule.tsx        # Floating capsule UI
│   │   │   ├── ChatPanel.tsx      # Chat interface (virtual scroll)
│   │   │   ├── TaskPanel.tsx      # Task progress panel
│   │   │   ├── SkillsPanel.tsx    # Skills management
│   │   │   ├── SchedulerPanel.tsx # Scheduler management
│   │   │   └── VoicePanel.tsx     # Voice interaction panel
│   │   ├── hooks/                 # React Hooks
│   │   │   ├── useWebSocket.ts    # WebSocket connection
│   │   │   ├── useVoskWakeWord.ts # Vosk wake word detection
│   │   │   └── useVoiceRecorder.ts # Voice recording
│   │   └── store/                 # Zustand state management
│   └── scripts/                    # Packaging and dev tools
│       ├── package.js             # One-click packaging script
│       └── test-packaged-app.sh   # Test packaged app
├── docs/                           # Documentation
│   ├── User-Guide-en.md           # User guide
│   ├── Build-and-Packaging-en.md  # Build & packaging guide (English)
│   ├── Build-and-Packaging-zh.md  # Build & packaging guide (Chinese)
│   └── ARCHITECTURE.md            # Architecture documentation

REST API Overview

Method	Endpoint	Description
GET	`/api/agent/status`	Get system status
POST	`/api/agent/chat`	Chat with screenshot context
POST	`/api/agent/task`	Execute automation task
POST	`/api/agent/stop`	Emergency stop
POST	`/api/agent/reset`	Reset state
GET	`/api/agent/screenshot`	Get screenshot
POST	`/api/agent/tts`	Text-to-speech
GET	`/api/agent/history`	Get task history

Examples

# Check status
curl http://localhost:18765/api/agent/status

# Send a chat message
curl -X POST http://localhost:18765/api/agent/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is displayed on the screen?"}'

# Execute a task
curl -X POST http://localhost:18765/api/agent/task \
  -H "Content-Type: application/json" \
  -d '{"goal": "Open Safari and search for weather"}'
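
The remaining endpoints in the table can be called the same way. The wrapper below is a convenience sketch rather than part of the project: lavis_api, LAVIS_BASE_URL, and LAVIS_DRY_RUN are hypothetical names. This README only documents request bodies for chat and task, so calling stop, reset, screenshot, and history without a body is an assumption, and response formats are not shown here.

```shell
# Hypothetical helper around the endpoints in the table above; not part of the Lavis repo.
LAVIS_BASE_URL=${LAVIS_BASE_URL:-http://localhost:18765}

# lavis_api METHOD PATH [JSON_BODY]
# Builds the curl command for one endpoint. With LAVIS_DRY_RUN=1 the command is
# printed instead of executed (the printed form drops shell quoting).
lavis_api() {
  method=$1
  path=$2
  body=${3:-}
  set -- curl -sS -X "$method" "${LAVIS_BASE_URL}${path}"
  if [ -n "$body" ]; then
    set -- "$@" -H "Content-Type: application/json" -d "$body"
  fi
  if [ "${LAVIS_DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, lavis_api POST /api/agent/stop requests an emergency stop, lavis_api GET /api/agent/history fetches the task history, and lavis_api POST /api/agent/task '{"goal": "Open Safari and search for weather"}' repeats the task example above.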

Frontend Development

Tech Stack

Technology	Version	Purpose
Electron	40.x	Desktop app shell
React	19.x	UI framework
TypeScript	5.9.x	Type safety
Vite	7.x	Build tool
Zustand	5.x	State management
react-window	-	Virtual scrolling

Window States

State	Description	Window Size	Trigger
Idle	Dormant/standby	Hidden or tray only	Default
Listening	Voice wake/listening	Mini (200x60px)	Wake word detected
Expanded	Full interaction	Full (800x600px)	Double-click capsule

Global Shortcuts

Shortcut	Action
`Alt+Space`	Toggle capsule/chat window
`Cmd+K`	Quick chat
`Escape`	Hide window

Development Commands

# Install dependencies
cd frontend
npm install

# Start Vite dev server only
npm run dev

# Start Electron with hot reload (recommended)
npm run electron:dev

# Build for production
npm run build

Voice Interaction

Wake Word Detection: Uses Vosk for offline wake word detection
Voice Recording: Browser MediaRecorder API
TTS Playback: Backend TTS proxy with audio streaming

Documentation

docs/User-Guide-en.md - User guide: installation, running, permissions, basic usage.
docs/Build-and-Packaging-en.md / docs/Build-and-Packaging-zh.md - Complete build & packaging guide: development mode, one-click packaging (JAR method), GraalVM Native Image (advanced option), debugging, troubleshooting.
docs/ARCHITECTURE.md - System architecture, data flow details, and development history.

Security & Privacy

All automation runs locally; screenshots are transient and used only for visual reasoning.
API keys live in local env/config only and are never exposed to the frontend or third parties.
GraalVM Native Image packaging removes .class files, making reverse engineering significantly harder.

License

MIT License
