Image Analyzer

An intelligent image analysis and cropping tool that uses vision language models to detect subjects and generate optimal crops for various aspect ratios.

Features

Intelligent Subject Detection: Automatically detects the primary subject in images using vision models
Smart Cropping: Generates optimally cropped versions in multiple aspect ratios while preserving the main subject
Multiple Backend Support: Works with both Ollama and llama.cpp servers (OpenAI-compatible API)
Flexible Output Formats: Supports JPEG, PNG, and WebP output formats
Debug Overlays: Optional visualization of detected subjects and crop boundaries
Batch Processing: Process multiple target sizes in a single run
URL Support: Load images directly from HTTP/HTTPS URLs

Installation

Prerequisites

Go 1.24.6 or later
Either:
- Ollama installed and running with a vision model
- llama.cpp server with a compatible vision model (e.g., MiniCPM-V)

Build from Source

git clone https://github.com/menta2k/image-analyzer.git
cd image-analyzer
go build -o image-analyzer ./cmd/image-analyzer

Quick Start

Using llama.cpp Server (Default)

Start llama.cpp server with a vision model:

# Using Docker Compose (recommended)
docker-compose -f docker-compose.minicpm.yml up

# Or manually with llama.cpp
./llama-server \
  -m models/ggml-model-Q4_K_M.gguf \
  --mmproj models/mmproj-model-f16.gguf \
  -c 8192 \
  --host 0.0.0.0 \
  --port 8080

Run the analyzer:

./image-analyzer -in input.jpg

Using Ollama

Install Ollama and pull a vision model:

ollama pull minicpm-v:latest
# or
ollama pull llava

Run the analyzer with Ollama backend:

./image-analyzer -in input.jpg -backend ollama

Usage Examples

Basic Usage

# Analyze local image with llama.cpp (default)
./image-analyzer -in photo.jpg

# Analyze image from URL
./image-analyzer -in "https://example.com/image.jpg"

# Use Ollama backend with specific model
./image-analyzer -in photo.jpg -backend ollama -model llava

# Custom output directory and format
./image-analyzer -in photo.jpg -out results/ -ext webp -quality 95

Advanced Options

# Full control over processing
./image-analyzer \
  -in input.jpg \
  -backend llamacpp \
  -url http://localhost:8080 \
  -model openbmb/minicpm-v4.5 \
  -out crops/ \
  -ext webp \
  -quality 95 \
  -lossless=false \
  -zoom 0.9 \
  -debug \
  -sendfmt png \
  -sendsize 2048 \
  -sendq 90

Command Line Options

Core Options

Flag	Default	Description
`-in`	(required)	Input image path or URL (jpg/png/webp)
`-backend`	`llamacpp`	Backend to use: `ollama` or `llamacpp`
`-url`	Auto	Server URL (defaults: ollama=http://localhost:11435/api/chat, llamacpp=http://localhost:8080)
`-model`	`openbmb/minicpm-v4.5`	Model name to use
`-out`	`out`	Output directory for processed images
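
The `-url` default depends on the selected backend. A minimal sketch of that lookup, using the defaults listed in the table above; the function name `defaultServerURL` is hypothetical and not taken from the tool's source:

```go
package main

import (
	"fmt"
	"os"
)

// defaultServerURL returns the documented default server URL for a backend.
// Hypothetical helper: the name and error handling are assumptions.
func defaultServerURL(backend string) (string, error) {
	switch backend {
	case "ollama":
		return "http://localhost:11435/api/chat", nil
	case "llamacpp":
		return "http://localhost:8080", nil
	default:
		return "", fmt.Errorf("unknown backend %q (want ollama or llamacpp)", backend)
	}
}

func main() {
	for _, backend := range []string{"llamacpp", "ollama"} {
		u, err := defaultServerURL(backend)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%-8s -> %s\n", backend, u)
	}
}
```

Passing `-url` explicitly would bypass a lookup like this entirely.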

Output Options

Flag	Default	Description
`-ext`	`jpg`	Output format: `jpg`, `png`, or `webp`
`-quality`	`90`	JPEG/WebP quality (1-100)
`-lossless`	`false`	Enable lossless WebP mode
`-zoom`	`1.0`	Zoom factor for crops (0.01-1.0)
`-debug`	`false`	Create debug overlay images
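
To illustrate how a zoom factor can interact with a target aspect ratio, here is a minimal sketch of one way to compute a crop box. It is not the tool's actual implementation: the `Rect` type, the `cropBox` function, the normalized (0-1) center coordinates, and the reading of `-zoom` as "values below 1.0 give a tighter crop" are all assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// Rect is a pixel-space crop rectangle (hypothetical type for this sketch).
type Rect struct{ X, Y, W, H int }

// clamp limits v to the range [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// cropBox returns the largest box with the target aspect ratio that fits in
// the image, shrunk by zoom and centered on (cx, cy) as far as the image
// bounds allow. cx and cy are assumed to be normalized (0-1).
func cropBox(cx, cy float64, targetW, targetH, imgW, imgH int, zoom float64) Rect {
	// Fit the target aspect ratio inside the image using integer math.
	fitW, fitH := imgW, imgW*targetH/targetW
	if fitH > imgH {
		fitW, fitH = imgH*targetW/targetH, imgH
	}

	// Clamp zoom to the documented 0.01-1.0 range, then shrink the box.
	zoom = math.Max(0.01, math.Min(1.0, zoom))
	w := int(math.Round(float64(fitW) * zoom))
	h := int(math.Round(float64(fitH) * zoom))
	if w < 1 {
		w = 1
	}
	if h < 1 {
		h = 1
	}

	// Center on the subject, then push the box back inside the image.
	x := int(math.Round(cx*float64(imgW) - float64(w)/2))
	y := int(math.Round(cy*float64(imgH) - float64(h)/2))
	return Rect{clamp(x, 0, imgW-w), clamp(y, 0, imgH-h), w, h}
}

func main() {
	// A subject near the right edge of a 1600x900 image, cropped to 16:9.
	fmt.Printf("%+v\n", cropBox(0.9, 0.5, 1200, 675, 1600, 900, 1.0))
	fmt.Printf("%+v\n", cropBox(0.9, 0.5, 1200, 675, 1600, 900, 0.5))
}
```

Clamping after centering keeps the crop inside the image even when the subject sits near an edge, at the cost of the subject no longer being exactly centered.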

Model Input Options

Flag	Default	Description
`-sendfmt`	`jpg`	Format sent to model: `jpg` or `png`
`-sendsize`	`1536`	Max dimension for model input (0=original)
`-sendq`	`85`	JPEG quality for model input
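
The `-sendsize` limit can be pictured as a downscale that caps the longer side while keeping the aspect ratio. A minimal sketch under two assumptions: images smaller than the limit are not upscaled, and the helper name `fitWithin` is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// fitWithin scales (w, h) so the longer side is at most maxDim while keeping
// the aspect ratio. maxDim <= 0 means "keep the original size", matching the
// documented 0=original behavior. Smaller images are returned unchanged
// (an assumption of this sketch).
func fitWithin(w, h, maxDim int) (int, int) {
	longest := w
	if h > longest {
		longest = h
	}
	if maxDim <= 0 || longest <= maxDim {
		return w, h
	}
	scale := float64(maxDim) / float64(longest)
	nw := int(math.Round(float64(w) * scale))
	nh := int(math.Round(float64(h) * scale))
	if nw < 1 {
		nw = 1
	}
	if nh < 1 {
		nh = 1
	}
	return nw, nh
}

func main() {
	w, h := fitWithin(4000, 3000, 1536)
	fmt.Printf("4000x3000 at sendsize 1536 -> %dx%d\n", w, h)
	w, h = fitWithin(4000, 3000, 0)
	fmt.Printf("4000x3000 at sendsize 0    -> %dx%d\n", w, h)
}
```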

Debug Overlay Options

Flag	Default	Description
`-dbgext`	`png`	Debug overlay format
`-dbgquality`	`92`	Debug overlay quality
`-dbglossless`	`false`	Debug overlay WebP lossless

Output Files

The tool generates:

Cropped Images

Multiple crops in different aspect ratios:

001_1200x675_A.jpg - 16:9 landscape
002_1200x800_A.jpg - 3:2 landscape
003_400x250_A.jpg - 8:5 small
004_600x400_A.jpg - 3:2 medium
005_1200x630_A.jpg - Social media optimized
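
The aspect ratios above follow from dividing each size by the greatest common divisor of its width and height. A short sketch that reproduces them (the helper names are illustrative, not part of the tool):

```go
package main

import "fmt"

// gcd returns the greatest common divisor using Euclid's algorithm.
func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// aspectRatio reduces w:h to lowest terms, e.g. 1200x675 becomes "16:9".
func aspectRatio(w, h int) string {
	g := gcd(w, h)
	return fmt.Sprintf("%d:%d", w/g, h/g)
}

func main() {
	sizes := [][2]int{{1200, 675}, {1200, 800}, {400, 250}, {600, 400}, {1200, 630}}
	for _, s := range sizes {
		fmt.Printf("%dx%d -> %s\n", s[0], s[1], aspectRatio(s[0], s[1]))
	}
}
```

The social media size 1200x630 reduces to 40:21, which is close to the 1.91:1 ratio commonly used for link preview images.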

Analysis Results

model_output.json - Detection results with:
- Primary subject label and confidence
- Bounding box coordinates (normalized 0-1)
- Description and tags

Debug Overlays (with `-debug` flag)

000_original_with_box.png - Original with detected subject (green box)
001_debug_1200x675_A.png - Crop overlays showing:
- Green: Detected subject box
- Red: Crop boundary
- Blue/Cyan: Center points

API Usage

Basic Integration

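package main

import (
    "context"
    "log"

    "github.com/menta2k/image-analyzer/pkg/detection"
    "github.com/menta2k/image-analyzer/pkg/llamacpp"
    "github.com/menta2k/image-analyzer/pkg/processing"
)

func main() {
    // Create components
    processor := processing.NewProcessor()
    client, err := llamacpp.NewClient("http://localhost:8080")
    if err != nil {
        log.Fatalf("create llama.cpp client: %v", err)
    }
    detector := detection.NewDetector(client)

    // Load the image (file path or URL) and encode it for the model:
    // format "jpg", max dimension 1536, quality 85
    img, err := processor.LoadImageSmart("photo.jpg")
    if err != nil {
        log.Fatalf("load image: %v", err)
    }
    imgB64, err := processor.PrepareImageForModel(img, "jpg", 1536, 85)
    if err != nil {
        log.Fatalf("prepare image: %v", err)
    }

    // Detect subject ("model" is a placeholder for the model name)
    result, err := detector.DetectSubject(context.Background(), "model", imgB64)
    if err != nil {
        log.Fatalf("detect subject: %v", err)
    }

    // Generate a 1200x675 crop around the subject at zoom 1.0
    cx, cy := processor.FindNearestPointToCenter(result.Primary.Box)
    cropBox := processor.CalculateOptimalCropBox(cx, cy, 1200, 675,
        img.Bounds().Dx(), img.Bounds().Dy(), 1.0)
    cropped, err := processor.CropImageToBox(img, cropBox, 1200, 675)
    if err != nil {
        log.Fatalf("crop image: %v", err)
    }

    // Save result as JPEG, quality 90, lossless disabled
    processor.SaveImage(cropped, "output.jpg", "jpg", 90, false)
}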

Custom Detection Prompts

// Use a custom prompt for specific detection needs
ctx := context.Background()
customPrompt := `Detect the main person's face in this image...`
result, err := detector.DetectSubjectWithPrompt(
    ctx, "model", imgB64, customPrompt,
)
if err != nil {
    log.Fatalf("detect subject: %v", err)
}

Architecture

image-analyzer/
├── cmd/
│   └── image-analyzer/      # CLI application
├── pkg/
│   ├── client/              # Backend interface
│   ├── detection/           # Subject detection logic
│   ├── llamacpp/            # llama.cpp client (OpenAI-compatible)
│   ├── ollama/              # Ollama client
│   ├── processing/          # Image processing and cropping
│   └── types/               # Shared data types
├── contrib/
│   └── models/              # Model storage (for Docker)
├── example/                 # Example usage
└── docker-compose.minicpm.yml

Package APIs

Core Types (`pkg/types`)

Box: Normalized bounding box (0-1 coordinates)
Primary: Detected subject with confidence
AnalysisResult: Complete detection result

Detection (`pkg/detection`)

NewDetector(client): Create detector with backend client
DetectSubject(): Detect with default prompt
DetectSubjectWithPrompt(): Custom detection prompt

Processing (`pkg/processing`)

LoadImageSmart(): Load from file or URL
PrepareImageForModel(): Optimize for model input
CalculateOptimalCropBox(): Smart crop calculation
CropImageToBox(): Execute crop
CreateDebugOverlay(): Visualization
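
Loading "from file or URL" implies deciding which kind of input was given. One plausible way to make that decision is sketched below; the `isHTTPURL` helper is hypothetical, and how `LoadImageSmart` actually distinguishes the two cases is not documented here.

```go
package main

import (
	"fmt"
	"net/url"
)

// isHTTPURL reports whether s parses as an absolute http or https URL with
// a host. Anything else would be treated as a local file path.
func isHTTPURL(s string) bool {
	u, err := url.Parse(s)
	if err != nil {
		return false
	}
	return (u.Scheme == "http" || u.Scheme == "https") && u.Host != ""
}

func main() {
	inputs := []string{
		"photo.jpg",
		"/tmp/photo.jpg",
		"https://example.com/image.jpg",
	}
	for _, in := range inputs {
		fmt.Printf("%-32s url=%v\n", in, isHTTPURL(in))
	}
}
```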

Backends

pkg/llamacpp: OpenAI-compatible API client
pkg/ollama: Ollama-specific client

Supported Models

Via llama.cpp

MiniCPM-V 4.5 (recommended)
Any GGUF vision model with multimodal projector
Models compatible with OpenAI vision API

Via Ollama

minicpm-v (all versions)
llava (all variants)
Any Ollama-compatible vision model

Docker Deployment

Use the provided Docker Compose file (docker-compose.minicpm.yml) to run the llama.cpp server:

version: '3.8'

services:
  minicpmv:
    image: ghcr.io/ggml-org/llama.cpp:full-cuda
    command: >
      --server
      -m /models/ggml-model-Q4_K_M.gguf
      --mmproj /models/mmproj-model-f16.gguf
      -c 8192
      -np 2
      -ngl 999
      --host 0.0.0.0
      --port 8080
    ports:
      - "8080:8080"
    volumes:
      - ./contrib/models:/models

Performance Tips

Model Input Size: Reduce -sendsize for faster processing (default 1536px)
Model Selection: Q4_K_M quantization offers good speed/quality balance
GPU Acceleration: Use CUDA-enabled builds for 10x+ speedup
Batch Processing: Tool processes multiple crops efficiently in one run
Image Formats: JPEG with 85-90 quality is optimal for model input

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

License

MIT

Acknowledgments

Ollama - Model serving infrastructure
llama.cpp - Efficient inference engine
MiniCPM-V - Vision language model
Imaging - Go image processing
