Image Vision version 0.2.0

@kipcole9 released this 03 May 01:25
· 2 commits to main since this release

[0.2.0] 2026-05-02

Added

  • Image.FaceDetection: fast face detection that returns bounding boxes, confidence scores, and the five canonical facial landmarks (right eye, left eye, nose tip, right mouth corner, left mouth corner). The default model is YuNet 2023-March, hosted at opencv/face_detection_yunet (MIT licensed, ~340 KB on disk, real-time on CPU). Functions: detect/2, boxes/2, crop_largest/2, draw_boxes/3. The crop_largest/2 helper is the wire-in point for the face-aware crop bias used by the sibling library image_plug (gravity: :face, ImageKit z-, Cloudflare face-zoom).

  • Image.Background: class-agnostic foreground/background separation. remove/2 returns the input image with the background made transparent (the foreground mask is applied as the alpha channel); mask/2 returns the foreground mask alone for custom compositing. The default model is BiRefNet lite (MIT, ~210 MB), powered by Ortex.

  • Image.Captioning: natural-language description of an image. caption/2 returns a string such as "a man riding a horse with a bird of prey". The default model is BLIP base (BSD-3-Clause, ~990 MB), powered by Bumblebee. The model is heavy enough that it is not autostarted by default; to use it, configure autostart: true or add the child spec to your supervisor.

  • Image.ZeroShot: classify an image against arbitrary labels that you supply at call time, with no retraining. classify/3 returns [%{label, score}] sorted by descending score; label/3 returns only the best label; similarity/3 computes the CLIP-space cosine similarity between two images. The default model is OpenAI CLIP ViT-B/32 (MIT, ~600 MB), powered by Bumblebee. The default prompt template "a photo of {label}" boosts accuracy on bare-noun labels; override or disable it as needed.

  • New flags --background, --caption, and --zero-shot for mix image_vision.download_models to pre-fetch the new defaults.
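
To make the Image.ZeroShot entry above more concrete, here is a minimal sketch of the ideas it names: expanding the "a photo of {label}" template, computing cosine similarity, and returning labels sorted by descending score. This is not the library's implementation. The module ZeroShotSketch, its function names, and the toy two-dimensional vectors are all hypothetical, and the real module obtains its embeddings from CLIP and may score them differently (for example, by normalizing scores across labels).

```elixir
defmodule ZeroShotSketch do
  # Hypothetical illustration only; not part of image_vision.

  # Expand a prompt template such as "a photo of {label}" for one label.
  def prompt(template, label), do: String.replace(template, "{label}", label)

  # Cosine similarity between two equal-length lists of numbers.
  def cosine(a, b) do
    dot =
      a
      |> Enum.zip(b)
      |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)

    dot / (norm(a) * norm(b))
  end

  # Rank {label, embedding} pairs against an image embedding, best first.
  # Scores here are raw cosine similarities (an assumption of this sketch).
  def rank(image_vec, label_vecs) do
    label_vecs
    |> Enum.map(fn {label, vec} -> %{label: label, score: cosine(image_vec, vec)} end)
    |> Enum.sort_by(& &1.score, :desc)
  end

  # Euclidean length of a vector.
  defp norm(v) do
    v
    |> Enum.reduce(0.0, fn x, acc -> acc + x * x end)
    |> :math.sqrt()
  end
end

# Toy stand-ins for embeddings; a real model would produce these vectors.
template = "a photo of {label}"

label_vecs = [
  {ZeroShotSketch.prompt(template, "dog"), [0.0, 1.0]},
  {ZeroShotSketch.prompt(template, "cat"), [1.0, 0.0]}
]

# The image vector points mostly along the "cat" direction.
IO.inspect(ZeroShotSketch.rank([0.9, 0.1], label_vecs))
```

With these toy vectors the "a photo of cat" entry ranks first, because the image vector lies closest to it. The same cosine function is the operation that similarity/3 is described as computing between two image embeddings.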

Changed

  • The :files list in mix.exs now ships logo.jpg so the docs render the project logo on hexdocs.pm.

See the README for the full feature list, and the background, captioning, and zero-shot guides for details on the new modules.