siglip2-unity

On-device Multi-modal Retrieval: SigLIP2 in Unity Sentis

Overview

This repository provides a lightweight Unity Sentis inference implementation of SigLIP2 (specifically siglip2-base-patch16-224). It runs multimodal retrieval directly inside Unity, so asset search and semantic retrieval across text and images work without an internet connection.

Features

✅ Text-to-Image Search
✅ Image-to-Image Search
✅ Text-to-Text Search
✅ Image-to-Text Search

Requirements

Unity: 6000.2.10f1
Unity Sentis: 2.4.1 (com.unity.ai.inference)

Architecture

1. SigLIP2 (ONNX)

The project utilizes the ONNX version of the siglip2-base-patch16-224 model. It requires both the text encoder and the vision encoder to compute embeddings for multi-modal tasks.
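To illustrate what the vision encoder expects, here is a minimal, language-agnostic sketch of pixel normalization written in Python. It assumes pixels are scaled to [0, 1] and then normalized with a mean and standard deviation of 0.5 per channel, in channel-first (CxHxW) layout. Those values are assumptions that should be checked against the model's preprocessor configuration, the function name `normalize_pixels` is hypothetical, and resizing to 224x224 is left out.

```python
def normalize_pixels(rgb_rows, mean=0.5, std=0.5):
    """Convert an HxW grid of (r, g, b) byte tuples into CxHxW floats.

    mean/std of 0.5 are assumed values, not taken from this repository.
    """
    height = len(rgb_rows)
    width = len(rgb_rows[0])
    # One HxW plane per color channel (channel-first layout).
    channels = [[[0.0] * width for _ in range(height)] for _ in range(3)]
    for y, row in enumerate(rgb_rows):
        for x, pixel in enumerate(row):
            for c in range(3):
                # Scale the byte to [0, 1], then shift and scale to [-1, 1].
                channels[c][y][x] = (pixel[c] / 255.0 - mean) / std
    return channels


# A 1x2 "image" used only to show the value range.
tiny = [[(0, 128, 255), (255, 255, 255)]]
chw = normalize_pixels(tiny)
print(chw[0][0][0], chw[2][0][0])
```

With these assumed constants, a byte value of 0 maps to -1.0 and 255 maps to 1.0.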

2. Tokenizer

Text input processing is handled by the Google SentencePieceTokenizer, implemented using the Microsoft.ML.Tokenizers library. This ensures that text queries are correctly tokenized and encoded to match the SigLIP2 model's expected input format.
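After tokenization, text encoders exported to ONNX typically expect a fixed-length sequence of token IDs. The sketch below (in Python, for illustration only) pads or truncates a list of IDs. The sequence length of 64 and the pad ID of 0 are assumptions to verify against the tokenizer configuration, and `pad_or_truncate` and the sample IDs are hypothetical.

```python
def pad_or_truncate(token_ids, max_length=64, pad_id=0):
    """Return exactly max_length IDs: clip long inputs, right-pad short ones.

    max_length=64 and pad_id=0 are assumed defaults, not confirmed values.
    """
    clipped = list(token_ids[:max_length])
    return clipped + [pad_id] * (max_length - len(clipped))


# Made-up token IDs standing in for a tokenized query.
query_ids = [1012, 57, 8831]
model_input = pad_or_truncate(query_ids)
print(len(model_input), model_input[:5])
```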

3. Embedding Database

The system builds a local database (image_embeddings.bin) ahead of time by processing the images located in the StreamingAssets folder. At runtime, a search only has to compare the query embedding against these stored embeddings, which keeps similarity search fast.
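The actual layout of image_embeddings.bin is not documented here, so the following Python sketch uses a hypothetical layout purely to show how such an index can be stored and reloaded: a header with the entry count and embedding dimension, followed by a length-prefixed UTF-8 file name and little-endian float32 values per entry. The function names and sample vectors are illustrative.

```python
import io
import struct


def write_index(stream, entries):
    """Write (name, vector) pairs in a hypothetical binary layout."""
    dim = len(entries[0][1]) if entries else 0
    # Header: entry count and embedding dimension as little-endian int32.
    stream.write(struct.pack("<ii", len(entries), dim))
    for name, vector in entries:
        encoded = name.encode("utf-8")
        stream.write(struct.pack("<H", len(encoded)))  # name length prefix
        stream.write(encoded)
        stream.write(struct.pack(f"<{dim}f", *vector))  # float32 values


def read_index(stream):
    """Read back the pairs written by write_index."""
    count, dim = struct.unpack("<ii", stream.read(8))
    entries = []
    for _ in range(count):
        (name_len,) = struct.unpack("<H", stream.read(2))
        name = stream.read(name_len).decode("utf-8")
        vector = list(struct.unpack(f"<{dim}f", stream.read(4 * dim)))
        entries.append((name, vector))
    return entries


# Round trip through an in-memory buffer instead of a real file.
buffer = io.BytesIO()
write_index(buffer, [("shoe.jpg", [0.5, -0.25, 1.0]), ("bag.png", [0.0, 0.75, -1.0])])
buffer.seek(0)
restored = read_index(buffer)
print(restored)
```

Storing the dimension in the header lets a reader validate the file against the model before loading any vectors.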

Getting Started

1. Model Setup

Download text_model.onnx and vision_model.onnx from onnx-community/siglip2-base-patch16-224-ONNX
Place both files into the /Assets/SigLIP2 directory in your project

2. Assets Setup

Clone or download this repository
Unzip the provided StreamingAssets.zip file
- Note: The demo images included in this zip are sourced from the Fashion Product Images (Small) dataset.
Place the unzipped contents into the /Assets/StreamingAssets directory
Ensure that your .jpg or .png files are located inside /Assets/StreamingAssets/Images
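To double-check the asset layout before generating the database, a small script can list the image files it finds. This Python sketch is not part of the project. The helper `find_images` is hypothetical, and searching subfolders recursively is an assumption about how the images are organized.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png"}


def find_images(folder):
    """Return sorted .jpg/.png paths under folder, or [] if it is missing."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


# Run from the project root; prints 0 if the folder has not been created yet.
images = find_images("Assets/StreamingAssets/Images")
print(f"Found {len(images)} image(s)")
```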

3. Generate Database

Open the /Assets/Scenes/SigLip2Scene.unity scene in the Unity Editor
Select the ImageSearchManager object in the hierarchy
Click the "Generate And Save Embeddings (Create Index)" button in the Inspector
This process will read images from StreamingAssets and generate the image_embeddings.bin file

4. Run the Demo Scene

Play the scene to see the retrieval in action
Input keywords to perform image retrieval, or explore other tasks like Image-to-Image or Text-to-Text search
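Conceptually, each of these searches ranks stored embeddings by their similarity to a query embedding. The Python sketch below illustrates the idea with cosine similarity and a top-k cut-off. Whether the project uses exactly this scoring is an assumption, and the function names, file names, and three-dimensional vectors are made up for the example.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def top_k(query, index, k=3):
    """Return the k (name, score) pairs with the highest cosine similarity."""
    q = l2_normalize(query)
    scored = []
    for name, vector in index:
        v = l2_normalize(vector)
        # The dot product of unit vectors is the cosine similarity.
        scored.append((sum(a * b for a, b in zip(q, v)), name))
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Toy index standing in for the stored image embeddings.
index = [
    ("red_dress.jpg", [0.9, 0.1, 0.0]),
    ("blue_jeans.jpg", [0.0, 1.0, 0.2]),
    ("sneakers.jpg", [0.1, 0.0, 1.0]),
]
results = top_k([1.0, 0.2, 0.0], index, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because text and image embeddings live in the same vector space, the same ranking function can serve a text query or an image query.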

Demo

The demo shows SigLIP2 retrieval running inside Unity.

Links

License

This project uses the SigLIP2 model, which is licensed under the Apache 2.0 License.

skykim/siglip2-unity
