Turn any image into agent-friendly JSON — local OCR and image understanding on macOS.
macvision wraps Apple's Vision framework in a tiny Swift binary. Point it at a screenshot, photo, or scan and get back text, scene labels, and detected faces, barcodes, and documents — all as compact JSON, all processed on your Mac. There's no big model to download and nothing is uploaded. Run it wherever you'd otherwise pay for an LLM vision call: OCR the image locally for free, then send only the text to your model.
中文说明见 README.cn.md。
- No big model to download — runs on Apple's built-in
Visionframework; nothing to fetch, cache, or load. - Cuts your LLM vision bill — OCR and detection run free, on-device; send the text to your model instead of paying per image.
- Protects your privacy — nothing is uploaded — every image is processed locally on your Mac.
- Agent-friendly JSON — compact single-line output and a FIFO daemon, so it drops straight into
jqpipelines and agent loops. - Full Vision surface — OCR, classification, face/barcode/document detection, document segmentation, saliency heatmaps, and image feature-prints.
Docs: ljh-sh.github.io/macvision
Paste this one-line prompt into Claude Code, Cursor, or any agent's system prompt:
Use `macvision` to read images on macOS (OCR, classify, detect). Install if missing: `brew install ljh-sh/cli/macvision`. JSON output, check `ok`. Run `macvision --help` for subcommands.The classic agent loop — screenshot → read it → reason about it — becomes one pipe:
screencapture -i /tmp/s.png
macvision ocr /tmp/s.png --lang zh-Hans,en-US | jq -r '.texts[].text'brew install ljh-sh/cli/macvisionOr tap once, then use the short name:
brew tap ljh-sh/cli
brew install macvisioncurl -L https://github.com/ljh-sh/macvision/releases/latest/download/macvision-darwin-universal.tar.xz | tar xJ -
sudo mv bin/macvision /usr/local/bin/The universal tarball is a fat Mach-O (arm64 + x86_64) — works on Apple Silicon and Intel Macs.
Requires Swift 5.10+ / macOS 13+.
git clone https://github.com/ljh-sh/macvision
cd macvision
swift build -c releasemacvision ocr ./screenshot.png # extract text
macvision ocr ./screenshot.png --lang zh-Hans,en-US # Chinese + English
macvision ocr - # read base64 image from stdin
macvision classify ./photo.jpg --top 5 # scene/object labels
macvision classify ./photo.jpg --animals # animal species
macvision detect ./photo.jpg # faces, barcodes, text regions, horizon
macvision detect ./shot.png --ocr --lang zh-Hans,en-US # broad + read the text
macvision detect ./card.jpg --rects # document/card rectangles
macvision detect ./qr.png --barcodes --symbologies qr # barcodes / QR only
macvision document ./scan.jpg # document outline (for crop/deskew)
macvision salient ./photo.jpg # saliency heatmap PNG
macvision feature ./a.jpg # image fingerprint vector
macvision feature ./a.jpg --compare ./b.jpg # distance between two images
macvision doctor # environment + capability checkImage input accepts a file path, - for base64 on stdin, or --clipboard / --screen to read the clipboard or take a fresh screenshot.
Output is JSON by default:
{"ok":true,"image":"./screenshot.png","width":1920,"height":1080,"languages":["zh-Hans","en-US"],"count":3,"texts":[{"text":"你好世界","confidence":0.97,"bbox":[60,495,515,30],"norm":[0.05,0.77,0.43,0.05]}]}Bounding boxes are pixel coordinates [x, y, w, h] with the origin at the top-left of the image (the convention agents need for screen coordinates). norm is the same box normalized to [0,1].
For agents that make many vision calls, macvision daemon keeps the framework warm and serves requests over named pipes (no HTTP, no port):
macvision daemon --req /tmp/macvision.req --res /tmp/macvision.res &
echo '{"action":"ocr","image":"/tmp/s.png","lang":["zh-Hans","en-US"]}' > /tmp/macvision.req
cat /tmp/macvision.res # one NDJSON response line per requestSee docs/subcommands.md for the request schema.
See docs/faq.md or the published FAQ for permissions, screencapture, coordinate conventions, and how macvision compares to Tesseract and cloud OCR.
- Small surface:
ocr,classify,detect,salient,document,feature,daemon,doctor. - JSON output: compact single-line JSON, easy to pipe to
jq. - No run loop:
Vision'sperform(_:)is synchronous, so — unlike audio tools — macvision never spins up anNSApplicationfor image work. - FIFO IPC: the daemon speaks NDJSON over named pipes, matching the shell-native style of the surrounding toolchain.
See CONTRIBUTING.md and ROADMAP.md.
See SECURITY.md.
Apache-2.0