This is a zero-shot binary image classifier. Type in the name of an object and the model predicts whether your uploaded photo matches it.
This project uses OpenCLIP, an open-source implementation of OpenAI's CLIP.
This project is available as a web demo here, but the demo is slower than running the project locally on a GPU.
Expand "Additional Inputs" to adjust the cosine similarity threshold. A photo whose similarity to your object name falls below this threshold is deemed Not <object>.
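The threshold rule can be sketched in a few lines of plain Python. This is an illustration only, not the code in app.py: the function names, the toy two-dimensional embeddings, and the default threshold of 0.25 are all assumptions made for the example. In the real app the embeddings would come from the OpenCLIP image and text encoders.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def classify(
    image_emb: Sequence[float],
    text_emb: Sequence[float],
    object_name: str,
    threshold: float = 0.25,  # hypothetical default, not taken from app.py
) -> str:
    """Return the object name, or "Not <object>" if similarity is below the threshold."""
    score = cosine_similarity(image_emb, text_emb)
    if score < threshold:
        return f"Not {object_name}"
    return object_name


if __name__ == "__main__":
    # Toy embeddings standing in for encoder outputs.
    print(classify([1.0, 0.0], [1.0, 0.0], "cat"))  # identical direction
    print(classify([1.0, 0.0], [0.0, 1.0], "cat"))  # orthogonal direction
```

Raising the threshold makes the classifier stricter (more photos are rejected as Not <object>); lowering it makes it more permissive.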
Tested on Debian.
- NVIDIA Container Toolkit
- Docker (and Docker Compose)
- An NVIDIA GPU with sufficient VRAM for your chosen ViT model (model size can be changed in app.py)
Create a .env file that points to the path where you downloaded the OpenCLIP model. Then run:
docker compose build
docker compose run torch
Finally, go to http://localhost:7860 in your browser.