siglip2-unity

On-device Multi-modal Retrieval: SigLIP2 in Unity Sentis

Overview

This repository provides a lightweight inference implementation of SigLIP2 (specifically siglip2-base-patch16-224) built on Unity Sentis. It runs multi-modal retrieval directly within Unity, enabling asset search and semantic retrieval across text and images without an internet connection.

Features

  • ✅ Text-to-Image Search
  • ✅ Image-to-Image Search
  • ✅ Text-to-Text Search
  • ✅ Image-to-Text Search

Requirements

  • Unity: 6000.2.10f1
  • Unity Sentis: 2.4.1 (com.unity.ai.inference)

Architecture

1. SigLIP2 (ONNX)

The project utilizes the ONNX version of the siglip2-base-patch16-224 model. It requires both the text encoder and the vision encoder to compute embeddings for multi-modal tasks.
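
As a rough illustration of what the vision encoder expects, the sketch below normalizes 8-bit pixels and derives the patch grid implied by the model name (patch16-224). It is written in Python for brevity and is not the project's code. The mean/std of 0.5 and the channel-first (CHW) layout are assumptions based on how SigLIP checkpoints are commonly preprocessed, and the helper names are hypothetical.

```python
# Values taken from the model name "siglip2-base-patch16-224".
IMAGE_SIZE = 224
PATCH_SIZE = 16

# ASSUMPTION: per-channel mean/std of 0.5, which maps pixels to [-1, 1].
MEAN = 0.5
STD = 0.5


def normalize_pixel(value: int) -> float:
    """Map an 8-bit channel value (0..255) to the range [-1, 1]."""
    return (value / 255.0 - MEAN) / STD


def to_chw(pixels):
    """Convert rows of (r, g, b) tuples (HWC) into three normalized planes (CHW).

    ASSUMPTION: the ONNX vision encoder takes channel-first input.
    """
    return [
        [[normalize_pixel(px[c]) for px in row] for row in pixels]
        for c in range(3)
    ]


def patch_count(image_size: int = IMAGE_SIZE, patch_size: int = PATCH_SIZE) -> int:
    """Number of non-overlapping square patches in one image."""
    per_side = image_size // patch_size
    return per_side * per_side
```

With these values, a 224x224 image is split into a 14x14 grid of 16x16 patches.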

2. Tokenizer

Text input processing is handled by the Google SentencePieceTokenizer, implemented using the Microsoft.ML.Tokenizers library. This ensures that text queries are correctly tokenized and encoded to match the SigLIP2 model's expected input format.
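
After tokenization, text encoders of this kind usually take a fixed-length sequence of token IDs. The sketch below shows one way to pad or truncate such a sequence. It is a Python illustration, not the project's code: the function name is hypothetical, and the sequence length of 64 and pad ID of 0 are assumptions that should be checked against the exported ONNX text encoder and the tokenizer model.

```python
def pad_or_truncate(token_ids, max_length=64, pad_id=0):
    """Return exactly `max_length` token IDs.

    ASSUMPTIONS: max_length=64 and pad_id=0 are illustrative defaults,
    not values confirmed by this repository.
    """
    ids = list(token_ids)[:max_length]                # drop anything past the limit
    return ids + [pad_id] * (max_length - len(ids))   # right-pad short inputs


# Example with made-up token IDs.
padded = pad_or_truncate([101, 2054, 2003])
```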

3. Embedding Database

The system builds a local database (image_embeddings.bin) by encoding the images located in the StreamingAssets folder. Precomputing these embeddings allows real-time similarity search at runtime.
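
Similarity search over stored embeddings typically reduces to ranking by cosine similarity. The following is a minimal Python sketch of that idea, assuming cosine similarity is the metric; the repository's actual scoring code may differ, and the function names and toy 2-dimensional vectors are purely illustrative.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def top_k(query, database, k=3):
    """Return the k (name, score) pairs with the highest cosine similarity.

    `database` maps an item name to its embedding vector.
    """
    q = l2_normalize(query)
    scored = []
    for name, emb in database.items():
        e = l2_normalize(emb)
        # The dot product of unit vectors is the cosine similarity.
        scored.append((sum(a * b for a, b in zip(q, e)), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Toy data: real embeddings have many more dimensions.
toy_db = {"cat.jpg": [1.0, 0.0], "dog.jpg": [0.0, 1.0], "kitten.jpg": [0.9, 0.1]}
results = top_k([1.0, 0.0], toy_db, k=2)
```

Because text and image embeddings share one space, the same ranking function can serve all four search modes; only the encoder used for the query and for the database changes.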

Getting Started

1. Model Setup

2. Assets Setup

  • Clone or download this repository
  • Unzip the provided StreamingAssets.zip file
  • Place the unzipped contents into the /Assets/StreamingAssets directory
  • Ensure that your .jpg or .png files are located inside /Assets/StreamingAssets/Images

3. Generate Database

  • Open the /Assets/Scenes/SigLip2Scene.unity scene in the Unity Editor
  • Select the ImageSearchManager object in the hierarchy
  • Click the "Generate And Save Embeddings (Create Index)" button in the Inspector
  • This process will read images from StreamingAssets and generate the image_embeddings.bin file
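
To make the idea of a binary embedding index concrete, here is a Python sketch that writes and reads a flat file of named float vectors. The layout (a count and dimension header, then length-prefixed UTF-8 names followed by float32 values) is a hypothetical format chosen for illustration. It is not the documented layout of image_embeddings.bin, and the function names are likewise made up.

```python
import io
import struct


def write_index(stream, entries, dim):
    """Write (name, vector) pairs using a HYPOTHETICAL little-endian layout."""
    stream.write(struct.pack("<ii", len(entries), dim))   # header: count, dim
    for name, vec in entries:
        if len(vec) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(vec)}")
        raw = name.encode("utf-8")
        stream.write(struct.pack("<H", len(raw)))          # name length in bytes
        stream.write(raw)                                  # name
        stream.write(struct.pack(f"<{dim}f", *vec))        # float32 embedding


def read_index(stream):
    """Read back the entries written by write_index."""
    count, dim = struct.unpack("<ii", stream.read(8))
    entries = []
    for _ in range(count):
        (name_len,) = struct.unpack("<H", stream.read(2))
        name = stream.read(name_len).decode("utf-8")
        vec = list(struct.unpack(f"<{dim}f", stream.read(4 * dim)))
        entries.append((name, vec))
    return entries


# Round trip through an in-memory buffer instead of a real file.
buffer = io.BytesIO()
write_index(buffer, [("a.png", [0.5, -0.25, 1.0]), ("b.jpg", [0.0, 2.0, -4.0])], dim=3)
buffer.seek(0)
loaded = read_index(buffer)
```

Storing the vectors in a flat binary file keeps loading cheap at startup, since no images need to be re-encoded before the first query.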

4. Run the Demo Scene

  • Play the scene to see the retrieval in action
  • Input keywords to perform image retrieval, or explore other tasks like Image-to-Image or Text-to-Text search

Demo

See SigLIP2 running in Unity. The demo below showcases its retrieval capabilities:

SigLIP2 Unity Demo

Links

License

This project uses the SigLIP2 model, which is licensed under the Apache 2.0 License.
