An intelligent image analysis and cropping tool that uses vision language models to detect subjects and generate optimal crops for various aspect ratios.
Features:

- Intelligent Subject Detection: Automatically detects the primary subject in images using vision models
- Smart Cropping: Generates optimally cropped versions in multiple aspect ratios while preserving the main subject
- Multiple Backend Support: Works with both Ollama and llama.cpp servers (OpenAI-compatible API)
- Flexible Output Formats: Supports JPEG, PNG, and WebP output formats
- Debug Overlays: Optional visualization of detected subjects and crop boundaries
- Batch Processing: Process multiple target sizes in a single run
- URL Support: Load images directly from HTTP/HTTPS URLs
Requirements:

- Go 1.24.6 or later
- Either:
  - Ollama installed and running with a vision model, or
  - a llama.cpp server with a compatible vision model (e.g., MiniCPM-V)
Build from source:

```bash
git clone https://github.com/menta2k/image-analyzer.git
cd image-analyzer
go build -o image-analyzer cmd/image-analyzer/main.go
```

Start a llama.cpp server with a vision model:
```bash
# Using Docker Compose (recommended)
docker-compose -f docker-compose.minicpm.yml up

# Or manually with llama.cpp
./llama-server \
  -m models/ggml-model-Q4_K_M.gguf \
  --mmproj models/mmproj-model-f16.gguf \
  -c 8192 \
  --host 0.0.0.0 \
  --port 8080
```

Then run the analyzer:
```bash
./image-analyzer -in input.jpg
```

To use Ollama instead, install it and pull a vision model:
```bash
ollama pull minicpm-v:latest
# or
ollama pull llava
```

Then run the analyzer with the Ollama backend:

```bash
./image-analyzer -in input.jpg -backend ollama
```

More usage examples:

```bash
# Analyze local image with llama.cpp (default)
./image-analyzer -in photo.jpg
# Analyze image from URL
./image-analyzer -in "https://example.com/image.jpg"
# Use Ollama backend with specific model
./image-analyzer -in photo.jpg -backend ollama -model llava
# Custom output directory and format
./image-analyzer -in photo.jpg -out results/ -ext webp -quality 95
```

Full control over processing:

```bash
./image-analyzer \
  -in input.jpg \
  -backend llamacpp \
  -url http://localhost:8080 \
  -model openbmb/minicpm-v4.5 \
  -out crops/ \
  -ext webp \
  -quality 95 \
  -lossless=false \
  -zoom 0.9 \
  -debug \
  -sendfmt png \
  -sendsize 2048 \
  -sendq 90
```

Input and backend flags:

| Flag | Default | Description |
|---|---|---|
| `-in` | (required) | Input image path or URL (jpg/png/webp) |
| `-backend` | `llamacpp` | Backend to use: `ollama` or `llamacpp` |
| `-url` | Auto | Server URL (defaults: ollama `http://localhost:11435/api/chat`, llamacpp `http://localhost:8080`); see the example below |
| `-model` | `openbmb/minicpm-v4.5` | Model name to use |
| `-out` | `out` | Output directory for processed images |
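For example, to target a llama.cpp server on another machine (the address below is illustrative):

```bash
./image-analyzer -in photo.jpg -backend llamacpp -url http://192.168.1.50:8080
```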
Output flags:

| Flag | Default | Description |
|---|---|---|
| `-ext` | `jpg` | Output format: `jpg`, `png`, or `webp` |
| `-quality` | `90` | JPEG/WebP quality (1-100) |
| `-lossless` | `false` | Enable lossless WebP mode |
| `-zoom` | `1.0` | Zoom factor for crops (0.01-1.0); see the example below |
| `-debug` | `false` | Create debug overlay images |
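For example, a tighter crop at 80% zoom saved as lossless WebP (an illustrative combination of the flags above):

```bash
./image-analyzer -in photo.jpg -ext webp -lossless=true -zoom 0.8
```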
Model input flags:

| Flag | Default | Description |
|---|---|---|
| `-sendfmt` | `jpg` | Format sent to the model: `jpg` or `png` |
| `-sendsize` | `1536` | Max dimension for model input (0 = original) |
| `-sendq` | `85` | JPEG quality for model input |
Debug output flags:

| Flag | Default | Description |
|---|---|---|
| `-dbgext` | `png` | Debug overlay format; see the example below |
| `-dbgquality` | `92` | Debug overlay quality |
| `-dbglossless` | `false` | Debug overlay WebP lossless mode |
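For example, to also write WebP debug overlays at quality 90 (illustrative values):

```bash
./image-analyzer -in photo.jpg -debug -dbgext webp -dbgquality 90
```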
The tool generates:

Multiple crops in different aspect ratios:

- `001_1200x675_A.jpg` - 16:9 landscape
- `002_1200x800_A.jpg` - 3:2 landscape
- `003_400x250_A.jpg` - 8:5 small
- `004_600x400_A.jpg` - 3:2 medium
- `005_1200x630_A.jpg` - Social media optimized

`model_output.json` - detection results with:

- Primary subject label and confidence
- Bounding box coordinates (normalized 0-1)
- Description and tags

Debug overlays (when `-debug` is set):

- `000_original_with_box.png` - original image with the detected subject (green box)
- `001_debug_1200x675_A.png`, etc. - per-crop overlays showing:
  - Green: detected subject box
  - Red: crop boundary
  - Blue/Cyan: center points
Library usage:

```go
package main

import (
	"context"

	"github.com/menta2k/image-analyzer/pkg/detection"
	"github.com/menta2k/image-analyzer/pkg/llamacpp"
	"github.com/menta2k/image-analyzer/pkg/processing"
)

func main() {
	// Create components
	processor := processing.NewProcessor()
	client, _ := llamacpp.NewClient("http://localhost:8080")
	detector := detection.NewDetector(client)

	// Load and prepare the image (error handling omitted for brevity)
	img, _ := processor.LoadImageSmart("photo.jpg")
	imgB64, _ := processor.PrepareImageForModel(img, "jpg", 1536, 85)

	// Detect the subject
	result, _ := detector.DetectSubject(context.Background(), "model", imgB64)

	// Generate a 1200x675 crop centered near the subject
	cx, cy := processor.FindNearestPointToCenter(result.Primary.Box)
	cropBox := processor.CalculateOptimalCropBox(cx, cy, 1200, 675,
		img.Bounds().Dx(), img.Bounds().Dy(), 1.0)
	cropped, _ := processor.CropImageToBox(img, cropBox, 1200, 675)

	// Save the result
	processor.SaveImage(cropped, "output.jpg", "jpg", 90, false)
}
```

Use a custom prompt for specific detection needs:

```go
// ctx, detector, and imgB64 are as in the previous example.
customPrompt := `Detect the main person's face in this image...`
result, _ := detector.DetectSubjectWithPrompt(
	ctx, "model", imgB64, customPrompt,
)
```

Project structure:

```
image-analyzer/
├── cmd/
│   └── image-analyzer/        # CLI application
├── pkg/
│   ├── client/                # Backend interface
│   ├── detection/             # Subject detection logic
│   ├── llamacpp/              # llama.cpp client (OpenAI-compatible)
│   ├── ollama/                # Ollama client
│   ├── processing/            # Image processing and cropping
│   └── types/                 # Shared data types
├── contrib/
│   └── models/                # Model storage (for Docker)
├── example/                   # Example usage
└── docker-compose.minicpm.yml
```
Core types in `pkg/types` (rough sketch below):

- `Box`: normalized bounding box (0-1 coordinates)
- `Primary`: detected subject with confidence
- `AnalysisResult`: complete detection result
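For orientation, here is a rough sketch of what these types could look like, based only on the fields described above; field names are illustrative, not the package's exact definitions:

```go
// Illustrative sketch only; see pkg/types for the real definitions.
package types

// Box is a bounding box in normalized (0-1) image coordinates.
type Box struct {
	X0, Y0 float64 // top-left corner
	X1, Y1 float64 // bottom-right corner
}

// Primary describes the detected main subject.
type Primary struct {
	Label      string  // e.g. "person"
	Confidence float64 // 0-1
	Box        Box
}

// AnalysisResult is the complete detection result
// (also serialized to model_output.json).
type AnalysisResult struct {
	Primary     Primary
	Description string
	Tags        []string
}
```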
Detection (`pkg/detection`):

- `NewDetector(client)`: create a detector with a backend client
- `DetectSubject()`: detect with the default prompt
- `DetectSubjectWithPrompt()`: detect with a custom prompt
Processing (`pkg/processing`):

- `LoadImageSmart()`: load from a file or URL
- `PrepareImageForModel()`: optimize for model input
- `CalculateOptimalCropBox()`: smart crop calculation (see the sketch below)
- `CropImageToBox()`: execute the crop
- `CreateDebugOverlay()`: visualization
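For intuition, subject-centered cropping conceptually works like the simplified sketch below: fit the largest window of the target aspect ratio inside the image, shrink it by the zoom factor, and center it on the subject while clamping to the image bounds. This is a hypothetical illustration (it assumes `cx`, `cy` are normalized 0-1 coordinates), not the actual implementation of `CalculateOptimalCropBox`:

```go
package main

import "fmt"

// sketchCropBox is a simplified sketch of subject-centered crop geometry.
func sketchCropBox(cx, cy float64, targetW, targetH, imgW, imgH int, zoom float64) (x0, y0, x1, y1 int) {
	aspect := float64(targetW) / float64(targetH)

	// Largest window of the target aspect ratio that fits the image.
	cropW := float64(imgW)
	cropH := cropW / aspect
	if cropH > float64(imgH) {
		cropH = float64(imgH)
		cropW = cropH * aspect
	}

	// zoom < 1.0 shrinks the window, cropping tighter around the subject.
	cropW *= zoom
	cropH *= zoom

	// Center on the subject, clamped so the window stays inside the image.
	x0 = clamp(int(cx*float64(imgW)-cropW/2), 0, imgW-int(cropW))
	y0 = clamp(int(cy*float64(imgH)-cropH/2), 0, imgH-int(cropH))
	return x0, y0, x0 + int(cropW), y0 + int(cropH)
}

func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

func main() {
	// A 16:9 crop of a 4000x3000 image, subject slightly above center, 90% zoom.
	fmt.Println(sketchCropBox(0.5, 0.4, 1200, 675, 4000, 3000, 0.9))
}
```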
Backend clients:

- `pkg/llamacpp`: OpenAI-compatible API client
- `pkg/ollama`: Ollama-specific client
Supported models with llama.cpp:

- MiniCPM-V 4.5 (recommended)
- Any GGUF vision model with a multimodal projector
- Models compatible with the OpenAI vision API

Supported models with Ollama:

- `minicpm-v` (all versions)
- `llava` (all variants)
- Any Ollama-compatible vision model
Use the provided Docker Compose file for easy deployment:

```yaml
version: '3.8'
services:
  minicpmv:
    image: ghcr.io/ggml-org/llama.cpp:full-cuda
    command: >
      --server
      -m /models/ggml-model-Q4_K_M.gguf
      --mmproj /models/mmproj-model-f16.gguf
      -c 8192
      -np 2
      -ngl 999
      --host 0.0.0.0
      --port 8080
    ports:
      - "8080:8080"
    volumes:
      - ./contrib/models:/models
```

Performance tips:

- Model Input Size: Reduce `-sendsize` for faster processing (default 1536px); see the example after this list
- Model Selection: Q4_K_M quantization offers a good speed/quality balance
- GPU Acceleration: Use CUDA-enabled builds for 10x+ speedup
- Batch Processing: Tool processes multiple crops efficiently in one run
- Image Formats: JPEG with 85-90 quality is optimal for model input
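For example, a speed-oriented run combining these tips (illustrative values):

```bash
./image-analyzer -in photo.jpg -sendsize 1024 -sendfmt jpg -sendq 85
```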
Contributions are welcome! Please feel free to submit pull requests or open issues.
License: MIT