MCP UI Screenshot Analyzer

An MCP (Model Context Protocol) server that integrates with GitHub Copilot to provide AI-powered UI analysis using local models only (zero API costs, privacy-first design).

Features

  • UI Screenshot Analysis: Semantic understanding of UI layouts and structure
  • Color Palette Extraction: Extract dominant colors using OpenCV k-means
  • Text Extraction: OCR capabilities via Gemma 3 vision
  • Bug Detection: Identify layout issues and accessibility problems
  • Depth Levels: Configurable analysis depth (quick/standard/deep)
  • Smart Caching: Performance optimization for repeated analyses

Current Status

Week 1 MVP - COMPLETE

  • ✅ Gemma 3 12B vision integration via Ollama
  • ✅ OpenCV color extraction
  • ✅ Result caching (1-hour TTL)
  • ✅ All 6 MCP tools implemented (4 fully functional)
  • ✅ Error handling and validation
  • ⏳ YOLOv8 component detection (Week 2)
  • ⏳ Code generation (Week 3)

Quick Start

Prerequisites

  • macOS or Linux
  • Python 3.10+
  • 8GB+ RAM (16GB recommended)
  • Ollama installed

Installation

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull Gemma 3 12B model
ollama pull gemma3:12b

# 3. Create and activate a virtual environment, then install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# 4. Run the MCP server
python server.py

Verify Installation

# Check Ollama is running
pgrep -x "ollama"

# Check models are available
ollama list

# Test server initialization
python -c "from server import mcp, gemma_analyzer; print('Server ready!')"

MCP Tools

1. analyze_ui_screenshot

Main analysis tool with configurable depth levels.

Parameters:

  • image_path (str): Absolute path to screenshot
  • depth (str): "quick" | "standard" | "deep"

Depth Levels:

  • quick: Gemma 3 only (2-4s) - Basic description
  • standard: Gemma 3 + colors (5-8s) - Detailed analysis
  • deep: Full pipeline (12-18s) - Comprehensive analysis

Example:

# Via GitHub Copilot Chat:
Analyze the UI screenshot at /Users/me/screenshot.png

# Programmatic:
result = analyze_ui_screenshot("/path/to/screenshot.png", depth="standard")
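
Conceptually, the depth parameter gates which pipeline stages run. The sketch below is illustrative only (the stage functions are stand-in stubs so it runs on its own), not the server's actual dispatch code:

# Stand-in stubs; the real server calls its analyzers and cache instead.
def run_gemma_vision(path):      return "layout description"
def extract_color_palette(path): return ["#ffffff", "#3366ff"]
def extract_ui_text(path):       return ["Sign in", "Cancel"]
def detect_ui_bugs(path):        return []

def analyze(image_path: str, depth: str = "standard") -> dict:
    result = {"description": run_gemma_vision(image_path)}    # quick: vision model only
    if depth in ("standard", "deep"):
        result["colors"] = extract_color_palette(image_path)  # standard: add color palette
    if depth == "deep":
        result["text"] = extract_ui_text(image_path)          # deep: full pipeline
        result["bugs"] = detect_ui_bugs(image_path)
    return result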

2. extract_color_palette

Extract dominant colors using OpenCV k-means clustering.

Parameters:

  • image_path (str): Absolute path to image
  • n_colors (int): Number of colors (2-10, default: 5)

Returns: Color palette with hex codes, RGB values, and percentages
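
The extraction approach can be illustrated directly with OpenCV. The snippet below is a minimal, self-contained sketch of k-means palette extraction (assuming opencv-python and numpy are installed), not the project's color_extractor.py implementation:

# Minimal sketch of k-means color extraction with OpenCV (illustrative only).
import cv2
import numpy as np

def dominant_colors(image_path: str, n_colors: int = 5):
    img = cv2.imread(image_path)                       # BGR image
    pixels = img.reshape(-1, 3).astype(np.float32)     # flatten to an Nx3 pixel array
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(pixels, n_colors, None, criteria, 3,
                                    cv2.KMEANS_RANDOM_CENTERS)
    counts = np.bincount(labels.flatten(), minlength=n_colors)
    palette = []
    for center, count in sorted(zip(centers, counts), key=lambda c: -c[1]):
        b, g, r = (int(x) for x in center)             # cluster center = dominant color
        palette.append({
            "hex": f"#{r:02x}{g:02x}{b:02x}",
            "rgb": (r, g, b),
            "percentage": round(100 * count / len(pixels), 1),
        })
    return palette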

3. extract_ui_text

Extract text from UI using Gemma 3 OCR capabilities.

Parameters:

  • image_path (str): Absolute path to screenshot

Returns: List of extracted text elements
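
As a rough illustration of OCR via a local vision model, here is a minimal sketch using the ollama Python client (assuming pip install ollama and a running Ollama daemon); the prompt and helper are illustrative, not the project's gemma_analyzer.py code:

# Minimal sketch of vision-based text extraction via the Ollama Python client.
import ollama

def extract_text(image_path: str) -> str:
    # Send the screenshot to the local Gemma 3 model and ask for visible text.
    response = ollama.chat(
        model="gemma3:12b",
        messages=[{
            "role": "user",
            "content": "List all visible text in this UI screenshot, one item per line.",
            "images": [image_path],
        }],
    )
    return response["message"]["content"]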

4. detect_ui_bugs

Detect layout issues and accessibility problems.

Parameters:

  • image_path (str): Absolute path to screenshot

Returns: List of issues with severity and suggestions
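
A direct call mirrors the other tools; the issue fields shown below (severity, suggestion) follow the description above and are illustrative:

# Illustrative usage; the exact shape of each returned issue may differ.
issues = detect_ui_bugs("/Users/me/app-screenshot.png")
for issue in issues:
    print(f"[{issue['severity']}] {issue['suggestion']}")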

5. detect_ui_components

Coming in Week 2 - YOLOv8 integration

6. generate_component_code

Coming in Week 3 - Code generation

GitHub Copilot Integration

Configuration

Create .vscode/mcp-settings.json:

{
  "mcpServers": {
    "ui-analyzer": {
      "command": "python",
      "args": ["/Users/manhhaycode/Developer/image-analysis/server.py"],
      "env": {}
    }
  }
}

Usage in Copilot Chat

Analyze the UI screenshot at /path/to/screenshot.png

Extract colors from /path/to/design.png

Detect bugs in ~/Desktop/app-screenshot.png

Performance

Operation            Time (CPU)   Time (GPU)   Cached
Quick analysis       2-4s         1-2s         <1s
Standard analysis    5-8s         2-3s         <1s
Deep analysis        12-18s       4-6s         <1s
Color extraction     <1s          <0.5s        <0.1s

Cache: Results cached for 1 hour, automatic invalidation

Project Structure

image-analysis/
├── server.py                  # MCP server entry point
├── config.yaml               # Configuration
├── analyzers/
│   ├── gemma_analyzer.py     # Ollama Gemma 3 integration
│   ├── color_extractor.py    # OpenCV color extraction
│   └── __init__.py
├── orchestrator/
│   ├── cache.py              # Result caching
│   └── __init__.py
├── tests/
│   ├── fixtures/             # Sample screenshots
│   └── __init__.py
├── utils/
│   └── __init__.py
└── venv/                     # Virtual environment

Configuration

Edit config.yaml to customize:

vision:
  model: "gemma3:12b"           # Primary model
  fallback: "gemma3:2b"          # Low RAM fallback

performance:
  enable_caching: true
  cache_ttl_seconds: 3600       # 1 hour

color_extraction:
  default_n_colors: 5

Troubleshooting

Server fails to start:

# Verify Ollama is running
pgrep -x "ollama" || ollama serve &

# Check Gemma 3 model
ollama list | grep gemma3:12b

# Reinstall dependencies
pip install -r requirements.txt

Out of memory:

# Use quantized model (6.6GB vs 9GB)
ollama pull gemma3:12b-q4

# Edit config.yaml:
vision:
  model: "gemma3:12b-q4"

Image not found errors:

  • Always use absolute paths
  • Verify file exists: ls -la /path/to/image.png
  • Check file permissions

Development

Running Tests

# (Tests to be implemented)
python -m pytest tests/

Adding New Tools

  1. Implement analyzer in analyzers/
  2. Add the MCP tool decorator in server.py (see the sketch after this list)
  3. Integrate caching
  4. Update documentation
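
For step 2, a new tool is registered on the server object. The sketch below assumes a FastMCP-style @mcp.tool() decorator (consistent with the mcp object imported in the verification step above) and a hypothetical gemma_analyzer.describe() method; adapt it to the actual analyzer API:

# Added inside server.py, where `mcp` and `gemma_analyzer` already exist.
# Assumes FastMCP-style registration; gemma_analyzer.describe() is a hypothetical method.
@mcp.tool()
def summarize_ui_layout(image_path: str) -> str:
    """Return a short, high-level layout summary for a screenshot."""
    return gemma_analyzer.describe(image_path)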

Roadmap

  • Week 1 (COMPLETE): MVP with Gemma 3 + color extraction + caching ✓
  • Week 2: YOLOv8 component detection
  • Week 3-4: Code generation, comprehensive testing, documentation

Performance Optimization

The system includes several optimizations:

  1. Smart Caching: MD5-based image hashing with a 1-hour TTL (see the sketch after this list)
  2. Depth Levels: User-controlled trade-off between speed and detail
  3. Lazy Loading: Components loaded only when needed
  4. Error Recovery: Graceful degradation if optional features fail
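
The caching design from point 1 can be sketched in a few lines; this is a minimal illustration assuming an in-memory store, not the project's orchestrator/cache.py:

# Minimal sketch of MD5-keyed result caching with a TTL (illustrative only).
import hashlib
import time

class ResultCache:
    """In-memory cache keyed by an MD5 hash of the image bytes plus the analysis depth."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (timestamp, result)

    @staticmethod
    def key_for(image_path: str, depth: str) -> str:
        with open(image_path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()  # hash the image content, not the path
        return f"{digest}:{depth}"

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        timestamp, result = entry
        if time.time() - timestamp > self.ttl:  # expired entry: evict and report a miss
            del self._store[key]
            return None
        return result

    def set(self, key: str, result) -> None:
        self._store[key] = (time.time(), result)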

Hardware Requirements

  • Minimum: 8GB RAM, CPU only (using quantized model)
  • Recommended: 16GB RAM, any GPU
  • Optimal: 32GB RAM, GPU with 8GB+ VRAM

License

MIT License

Contributing

Contributions welcome! Please open issues or PRs on GitHub.

Support

For issues or questions:

  • Check CLAUDE.md for detailed documentation
  • Review troubleshooting section above
  • Open a GitHub issue
