ImageVision is a simple, opinionated image vision library for Elixir. It sits alongside the image library and answers common questions about an image — what's in it, where are the objects, which pixels belong to which object, what's the foreground, describe it in words, does it match these labels — with strong defaults and no ML expertise required.
# Classification — what is in this image?
iex> puppy = Image.open!("puppy.jpg")
iex> Image.Classification.labels(puppy)
["Blenheim spaniel"]
# Detection — where are the objects and what are they?
iex> street = Image.open!("street.jpg")
iex> detections = Image.Detection.detect(street)
iex> hd(detections)
%{label: "person", score: 0.94, box: {120, 45, 60, 180}}
# Draw bounding boxes on the image
iex> Image.Detection.draw_bbox_with_labels(detections, street)
# Segmentation — which pixels belong to which object?
iex> segments = Image.Segmentation.segment_panoptic(street)
iex> Enum.map(segments, & &1.label)
["person", "car", "road", "sky"]
# Colour-coded overlay of all segments
iex> Image.Segmentation.compose_overlay(street, segments)
# Promptable segmentation — isolate the object at a specific point
iex> %{mask: mask} = Image.Segmentation.segment(puppy, prompt: {:point, 320, 240})
iex> {:ok, cutout} = Image.Segmentation.apply_mask(puppy, mask)
# Embedding — 768-dim feature vector for similarity search
iex> Image.Classification.embed(puppy)
#Nx.Tensor<f32[768]>
# Background removal — class-agnostic foreground cutout
iex> {:ok, cutout} = Image.Background.remove(puppy)
# Image captioning — natural-language description
iex> Image.Captioning.caption(puppy)
"a small brown and white puppy sitting on a wooden floor"
# Zero-shot classification — your labels, no retraining required
iex> Image.ZeroShot.classify(puppy, ["a dog", "a cat", "a horse"])
[%{label: "a dog", score: 0.998}, %{label: "a cat", score: 0.002}, ...]

Add :image_vision to mix.exs along with whichever optional ML backends you need:
def deps do
  [
    {:image_vision, "~> 0.2"},

    # Required for Image.Classification and Image.Classification.embed/2
    {:bumblebee, "~> 0.6"},
    {:nx, "~> 0.10"},
    {:exla, "~> 0.10"}, # or {:torchx, "~> 0.10"} for Torch backend

    # Required for Image.Detection and Image.Segmentation
    {:ortex, "~> 0.1"}
  ]
end

All ML deps are optional; omit any you do not use. The library compiles cleanly without them.
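Because the backends are optional, application code may want to check which ones are present before calling into a task. A minimal sketch, assuming the packages define the top-level modules Bumblebee and Ortex; MyApp.VisionDeps is a hypothetical helper, not part of ImageVision:

```elixir
defmodule MyApp.VisionDeps do
  @moduledoc """
  Hypothetical helper (not part of ImageVision) that reports which
  optional ML backends are available at runtime.
  """

  # Backend name => a module that exists only when the dep is installed.
  # Assumes :bumblebee defines Bumblebee and :ortex defines Ortex.
  @backends [bumblebee: Bumblebee, ortex: Ortex]

  @doc "Returns the names of the optional backends that can be loaded."
  def available do
    for {name, module} <- @backends, Code.ensure_loaded?(module), do: name
  end

  @doc "Returns true when the named backend can be loaded."
  def available?(name), do: name in available()
end
```

Going by the comments in the deps list, MyApp.VisionDeps.available?(:ortex) could gate a detection or segmentation feature, and MyApp.VisionDeps.available?(:bumblebee) a classification one.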
For the vast majority of users on Linux x86_64, macOS (Intel and Apple Silicon), and Windows x86_64, no native toolchain is required. The libraries used here ship precompiled native binaries for those platforms and mix deps.get is all you need.
If your platform isn't covered by precompiled binaries — uncommon Linux distros, ARM Linux, glibc mismatches — you'll need:
- A Rust toolchain for :ortex (the ONNX runtime wrapper used by detection and segmentation) and for :tokenizers (pulled in transitively by :bumblebee). Install via rustup:

      curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

- A C compiler for :vix (the libvips wrapper used by :image). On Linux install build-essential (Debian/Ubuntu) or gcc (Fedora/RHEL); on macOS install Xcode Command Line Tools (xcode-select --install).
- libvips if you need advanced libvips features beyond what the precompiled NIF includes. On macOS: brew install vips. On Linux: your distro's libvips-dev / vips-devel package. Most users don't need this.
If you see Cargo or cc errors during mix deps.compile, you've likely landed on a platform without precompiled coverage — install the toolchain above and re-run.
Model weights are downloaded on first call and cached on disk. Approximate download sizes for the default models (about 2.75 GB in total if you use every task):
| Task | Default model | Size |
|---|---|---|
| Classification | facebook/convnext-tiny-224 | ~110 MB |
| Embedding | facebook/dinov2-base | ~340 MB |
| Detection | onnx-community/rtdetr_r50vd | ~175 MB |
| Segmentation (SAM 2) | SharpAI/sam2-hiera-tiny-onnx | ~150 MB |
| Segmentation (panoptic) | Xenova/detr-resnet-50-panoptic | ~175 MB |
| Background removal | onnx-community/BiRefNet_lite-ONNX | ~210 MB |
| Captioning | Salesforce/blip-image-captioning-base | ~990 MB |
| Zero-shot classification | openai/clip-vit-base-patch32 | ~605 MB |
The first call to each task therefore appears to "hang" while weights download — that's expected, not a bug.
To pre-download all default models before first use (recommended for production deployments and CI):
mix image_vision.download_models

Pass --classify, --detect, --segment, --background, --caption, or --zero-shot to limit scope.
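One way to make pre-downloading part of a routine setup step is a Mix alias. A sketch, assuming a project named :my_app; the alias name models.fetch is made up for illustration, while the task and flags are the ones listed above:

```elixir
# mix.exs
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [
      app: :my_app,
      version: "0.1.0",
      deps: deps(),
      aliases: aliases()
    ]
  end

  defp deps do
    # Add the optional ML backends you use, as in the deps list earlier.
    [{:image_vision, "~> 0.2"}]
  end

  defp aliases do
    [
      # `mix models.fetch` limits the download to classification and detection.
      "models.fetch": ["image_vision.download_models --classify --detect"]
    ]
  end
end
```

A CI job or container build could then run mix models.fetch after mix deps.get so the weights are already cached before the first request.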
Livebook Desktop launches as a GUI application and does not inherit your shell's PATH. Tools installed via rustup, mise, asdf, or Homebrew aren't visible to it by default — even if cargo works fine in your terminal.
If you hit "cargo: command not found" or similar during Mix.install inside Livebook Desktop, create ~/.livebookdesktop.sh and add the relevant directories to PATH. A reasonable starting point:
# ~/.livebookdesktop.sh
# Rust (rustup)
export PATH="$HOME/.cargo/bin:$PATH"
# Homebrew (Apple Silicon)
export PATH="/opt/homebrew/bin:$PATH"
# mise — uncomment if you use it
# eval "$(mise activate bash)"
# asdf — uncomment if you use it
# . "$HOME/.asdf/asdf.sh"

Restart Livebook Desktop after creating this file. See the Livebook Desktop documentation for details.
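To confirm what the Livebook runtime can actually see, both before and after creating the file, a cell like the following may help. It uses only the Elixir standard library; the list of tool names is just an example:

```elixir
# Tools the native builds may need; adjust the list to your setup.
tools = ["cargo", "cc", "brew"]

# System.find_executable/1 returns the full path, or nil when the tool
# is not on the PATH visible to this runtime.
report = Map.new(tools, fn tool -> {tool, System.find_executable(tool)} end)

IO.inspect(report, label: "visible tools")
IO.puts("PATH=" <> System.get_env("PATH", ""))
```

A nil entry for cargo here, while cargo works in your terminal, points to the PATH issue described above.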
All models are permissively licensed. Weights are downloaded automatically on first call and cached on disk — no manual setup required.
| Task | Model | License | Size |
|---|---|---|---|
| Classification | facebook/convnext-tiny-224 | Apache 2.0 | ~110 MB |
| Embedding | facebook/dinov2-base | Apache 2.0 | ~340 MB |
| Object detection | onnx-community/rtdetr_r50vd | Apache 2.0 | ~175 MB |
| Promptable segmentation | SharpAI/sam2-hiera-tiny-onnx | Apache 2.0 | ~150 MB |
| Panoptic segmentation | Xenova/detr-resnet-50-panoptic | Apache 2.0 | ~175 MB |
ONNX model weights are cached in a per-user directory by default (~/.cache/image_vision on Linux, ~/Library/Caches/image_vision on macOS). Override in config/runtime.exs:
config :image_vision, :cache_dir, "/var/lib/my_app/models"

Image.Classification runs a supervised Bumblebee serving. It does not autostart by default. Start it in your application's supervision tree:
# application.ex
def start(_type, _args) do
  children = [
    Image.Classification.classifier(),
    Image.Classification.embedder() # omit if you do not need embeddings
  ]

  Supervisor.start_link(children, strategy: :one_for_one)
end

Or enable autostart via config so ImageVision.Supervisor handles it:
# config/runtime.exs
config :image_vision, :classifier, autostart: true

To use a different model:
config :image_vision, :classifier,
  model: {:hf, "facebook/convnext-large-224-22k-1k"},
  featurizer: {:hf, "facebook/convnext-large-224-22k-1k"},
  autostart: true

- Classification — classifying images and computing embeddings
- Detection — bounding-box object detection
- Segmentation — promptable and panoptic segmentation
- Background removal — class-agnostic foreground cutout
- Captioning — natural-language image descriptions
- Zero-shot classification — classify against arbitrary labels via CLIP