ImageVision

ImageVision is a simple, opinionated image vision library for Elixir. It sits alongside the image library and answers common questions about an image — what's in it, where are the objects, which pixels belong to which object, what's the foreground, describe it in words, does it match these labels — with strong defaults and no ML expertise required.

Quick start

# Classification — what is in this image?
iex> puppy = Image.open!("puppy.jpg")
iex> Image.Classification.labels(puppy)
["Blenheim spaniel"]

# Detection — where are the objects and what are they?
iex> street = Image.open!("street.jpg")
iex> detections = Image.Detection.detect(street)
iex> hd(detections)
%{label: "person", score: 0.94, box: {120, 45, 60, 180}}

# Draw bounding boxes on the image
iex> Image.Detection.draw_bbox_with_labels(detections, street)

# Segmentation — which pixels belong to which object?
iex> segments = Image.Segmentation.segment_panoptic(street)
iex> Enum.map(segments, & &1.label)
["person", "car", "road", "sky"]

# Colour-coded overlay of all segments
iex> Image.Segmentation.compose_overlay(street, segments)

# Promptable segmentation — isolate the object at a specific point
iex> %{mask: mask} = Image.Segmentation.segment(puppy, prompt: {:point, 320, 240})
iex> {:ok, cutout} = Image.Segmentation.apply_mask(puppy, mask)

# Embedding — 768-dim feature vector for similarity search
iex> Image.Classification.embed(puppy)
#Nx.Tensor<f32[768]>

# Background removal — class-agnostic foreground cutout
iex> {:ok, cutout} = Image.Background.remove(puppy)

# Image captioning — natural-language description
iex> Image.Captioning.caption(puppy)
"a small brown and white puppy sitting on a wooden floor"

# Zero-shot classification — your labels, no retraining required
iex> Image.ZeroShot.classify(puppy, ["a dog", "a cat", "a horse"])
[%{label: "a dog", score: 0.998}, %{label: "a cat", score: 0.002}, ...]

Installation

Add :image_vision to mix.exs along with whichever optional ML backends you need:

def deps do
  [
    {:image_vision, "~> 0.2"},

    # Required for Image.Classification and Image.Classification.embed/2
    {:bumblebee, "~> 0.6"},
    {:nx, "~> 0.10"},
    {:exla, "~> 0.10"},     # or {:torchx, "~> 0.10"} for Torch backend

    # Required for Image.Detection and Image.Segmentation
    {:ortex, "~> 0.1"}
  ]
end

All ML deps are optional — omit any you do not use. The library compiles cleanly without them.

Prerequisites

For the vast majority of users on Linux x86_64, macOS (Intel and Apple Silicon), and Windows x86_64, no native toolchain is required. The libraries used here ship precompiled native binaries for those platforms and mix deps.get is all you need.

If your platform isn't covered by precompiled binaries — uncommon Linux distros, ARM Linux, glibc mismatches — you'll need:

A Rust toolchain for :ortex (the ONNX runtime wrapper used by detection and segmentation) and for :tokenizers (pulled in transitively by :bumblebee). Install via rustup:
```
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
A C compiler for :vix (the libvips wrapper used by :image). On Linux install build-essential (Debian/Ubuntu) or gcc (Fedora/RHEL); on macOS install Xcode Command Line Tools (xcode-select --install).
libvips if you need advanced libvips features beyond what the precompiled NIF includes. On macOS: brew install vips. On Linux: your distro's libvips-dev / vips-devel package. Most users don't need this.

If you see Cargo or cc errors during mix deps.compile, you've likely landed on a platform without precompiled coverage — install the toolchain above and re-run.

Disk space and first-call latency

Model weights are downloaded on first call and cached on disk. Across all three default models the total is approximately:

Task	Default model	Size
Classification	`facebook/convnext-tiny-224`	~110 MB
Embedding	`facebook/dinov2-base`	~340 MB
Detection	`onnx-community/rtdetr_r50vd`	~175 MB
Segmentation (SAM 2)	`SharpAI/sam2-hiera-tiny-onnx`	~150 MB
Segmentation (panoptic)	`Xenova/detr-resnet-50-panoptic`	~175 MB
Background removal	`onnx-community/BiRefNet_lite-ONNX`	~210 MB
Captioning	`Salesforce/blip-image-captioning-base`	~990 MB
Zero-shot classification	`openai/clip-vit-base-patch32`	~605 MB

The first call to each task therefore appears to "hang" while weights download — that's expected, not a bug.

To pre-download all default models before first use (recommended for production deployments and CI):

mix image_vision.download_models

Pass --classify, --detect, --segment, --background, --caption, or --zero-shot to limit scope.

Livebook Desktop

Livebook Desktop launches as a GUI application and does not inherit your shell's PATH. Tools installed via rustup, mise, asdf, or Homebrew aren't visible to it by default — even if cargo works fine in your terminal.

If you hit "cargo: command not found" or similar during Mix.install inside Livebook Desktop, create ~/.livebookdesktop.sh and add the relevant directories to PATH. A reasonable starting point:

# ~/.livebookdesktop.sh

# Rust (rustup)
export PATH="$HOME/.cargo/bin:$PATH"

# Homebrew (Apple Silicon)
export PATH="/opt/homebrew/bin:$PATH"

# mise — uncomment if you use it
# eval "$(mise activate bash)"

# asdf — uncomment if you use it
# . "$HOME/.asdf/asdf.sh"

Restart Livebook Desktop after creating this file. See the Livebook Desktop documentation for details.

Default models

All models are permissively licensed. Weights are downloaded automatically on first call and cached on disk — no manual setup required.

Task	Model	License	Size
Classification	`facebook/convnext-tiny-224`	Apache 2.0	~110 MB
Embedding	`facebook/dinov2-base`	Apache 2.0	~340 MB
Object detection	`onnx-community/rtdetr_r50vd`	Apache 2.0	~175 MB
Promptable segmentation	`SharpAI/sam2-hiera-tiny-onnx`	Apache 2.0	~150 MB
Panoptic segmentation	`Xenova/detr-resnet-50-panoptic`	Apache 2.0	~175 MB

Configuration

Model cache

ONNX model weights are cached in a per-user directory by default (~/.cache/image_vision on Linux, ~/Library/Caches/image_vision on macOS). Override in config/runtime.exs:

config :image_vision, :cache_dir, "/var/lib/my_app/models"

Classification serving

Image.Classification runs a supervised Bumblebee serving. It does not autostart by default. Start it in your application's supervision tree:

# application.ex
def start(_type, _args) do
  children = [
    Image.Classification.classifier(),
    Image.Classification.embedder()   # omit if you do not need embeddings
  ]

  Supervisor.start_link(children, strategy: :one_for_one)
end

Or enable autostart via config so ImageVision.Supervisor handles it:

# config/runtime.exs
config :image_vision, :classifier, autostart: true

To use a different model:

config :image_vision, :classifier,
  model: {:hf, "facebook/convnext-large-224-22k-1k"},
  featurizer: {:hf, "facebook/convnext-large-224-22k-1k"},
  autostart: true

Guides

Classification — classifying images and computing embeddings
Detection — bounding-box object detection
Segmentation — promptable and panoptic segmentation
Background removal — class-agnostic foreground cutout
Captioning — natural-language image descriptions
Zero-shot classification — classify against arbitrary labels via CLIP

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
.github/workflows		.github/workflows
config		config
guides		guides
lib		lib
livebooks		livebooks
scripts		scripts
test		test
.dialyzer_ignore_warnings		.dialyzer_ignore_warnings
.formatter.exs		.formatter.exs
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE.md		LICENSE.md
README.md		README.md
logo.jpg		logo.jpg
mix.exs		mix.exs
mix.lock		mix.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ImageVision

Quick start

Installation

Prerequisites

Disk space and first-call latency

Livebook Desktop

Default models

Configuration

Model cache

Classification serving

Guides

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ImageVision

Quick start

Installation

Prerequisites

Disk space and first-call latency

Livebook Desktop

Default models

Configuration

Model cache

Classification serving

Guides

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages