# How to use CLIP Zero-Shot on your own classification dataset

This notebook provides an example of how to benchmark CLIP's zero-shot classification performance on your own classification dataset.

[CLIP](https://openai.com/blog/clip/) is a zero-shot image classifier released by OpenAI that was trained on 400 million text/image pairs collected from the web. CLIP uses what it learned from those pairs to make predictions over a flexible set of candidate classification categories, which you describe in text.

CLIP is zero-shot, which means **no training is required**.

Try it out on your own task here!

Be sure to experiment with various text prompts: CLIP gets a lot better with the right prompting, so the wording of each prompt matters.
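To make the role of the prompts concrete, here is a minimal sketch of the scoring step behind zero-shot classification: each candidate caption and the image are represented as vectors, the image is compared to every caption by cosine similarity, and a softmax turns the scaled similarities into probabilities. The three-dimensional vectors and the `scale=100.0` value are made up for illustration, and the helper names (`cosine`, `softmax`, `zero_shot_scores`) are hypothetical; real CLIP embeddings come from the model's image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_scores(image_vec, caption_vecs, scale=100.0):
    """Probability of each caption given the image (scale is an assumed value)."""
    return softmax([scale * cosine(image_vec, c) for c in caption_vecs])

# Toy embeddings, invented for this example only.
caption_vecs = {
    "rock": [1.0, 0.0, 0.0],
    "paper": [0.0, 1.0, 0.0],
    "scissors": [0.0, 0.0, 1.0],
}
image_vec = [0.9, 0.1, 0.0]

labels = list(caption_vecs)
probs = zero_shot_scores(image_vec, [caption_vecs[label] for label in labels])
best = labels[max(range(len(probs)), key=probs.__getitem__)]
print(best, [round(p, 3) for p in probs])
```

Because the classes are just captions, changing a prompt changes its vector and therefore the scores, with no retraining involved.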


![Roboflow Wordmark](https://i.imgur.com/dcLNMhV.png)


# Download and Install CLIP Dependencies

In [None]:
# Install the dependencies. CLIP was released in PyTorch.
# Pinned install, kept for reference:
#!pip install torch==1.7.1{torch_version_suffix} torchvision==0.8.2{torch_version_suffix} -f https://download.pytorch.org/whl/torch_stable.html ftfy regex
!pip install torch torchvision -f https://download.pytorch.org/whl/torch_stable.html ftfy regex

import os
import torch

print("Torch version:", torch.__version__)
# Kill the notebook process so the runtime restarts and picks up the new installs.
os.kill(os.getpid(), 9)

CUDA version: 12.5
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting ftfy
  Downloading ftfy-6.3.1-py3-none-any.whl.metadata (7.3 kB)
Downloading ftfy-6.3.1-py3-none-any.whl (44 kB)
Installing collected packages: ftfy
Successfully installed ftfy-6.3.1


In [None]:
#clone the CLIP repository
!git clone https://github.com/openai/CLIP.git
%cd CLIP

Cloning into 'CLIP'...
remote: Enumerating objects: 164, done.
remote: Counting objects: 100% (73/73), done.
remote: Compressing objects: 100% (49/49), done.
remote: Total 164 (delta 31), reused 49 (delta 19), pack-reused 91
Receiving objects: 100% (164/164), 8.87 MiB | 27.51 MiB/s, done.
Resolving deltas: 100% (71/71), done.
/content/CLIP


# Download Classification Data or Object Detection Data

We will download the [public flowers classification dataset](https://public.roboflow.com/classification/flowers_classification) from Roboflow. The data comes out as folders split into train/valid/test, with a separate folder for each class label.

You can easily download your own dataset from Roboflow in this format, too.

Roboflow can also convert an object detection dataset into CLIP text prompts, if you want to try that out.


To get your data into Roboflow, follow the [Getting Started Guide](https://blog.roboflow.ai/getting-started-with-roboflow/).

In [None]:
#follow the link below to get your download code from Roboflow
!pip install -q roboflow
from roboflow import Roboflow
rf = Roboflow(model_format="clip", notebook="roboflow-clip")

  Building wheel for roboflow (setup.py) ... done
  Building wheel for wget (setup.py) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.10.0 requires torch==1.9.0, but you have torch 1.7.1+cu110 which is incompatible.
google-colab 1.0.0 requires requests~=2.23.0, but you have requests 2.26.0 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
albumentations 0.1.12 requires imgaug<0.2.7,>=0.2.5, but you have imgaug 0.2.9 which is incompatible.

In [None]:
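#download classification data
#uncomment the lines below and fill in your own API key, project, and version from Roboflow; later cells rely on `dataset`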
# from roboflow import Roboflow
# rf = Roboflow(api_key="YOUR_API_KEY")
# project = rf.workspace().project("YOUR_PROJECT")
# dataset = project.version("YOUR_VERSION").download("clip")

loading Roboflow workspace...
loading Roboflow project...
Downloading Dataset Version Zip in Rock-Paper-Scissors-1 to clip: 100% [12121758 / 12121758] bytes


Extracting Dataset Version Zip to Rock-Paper-Scissors-1 in clip:: 100%|██████████| 2941/2941 [00:02<00:00, 1034.21it/s]


In [None]:
dataset.location

'/content/CLIP/Rock-Paper-Scissors-1'

In [None]:
import os
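#the classes and images we want to test are stored in folders in the test set
#note: os.listdir returns names in arbitrary order, so check the printed list against the order of your captions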
class_names = os.listdir(dataset.location + '/test/')
class_names.remove('_tokenization.txt')
class_names

['scissors', 'rock', 'paper']

In [None]:
#Roboflow auto-generates some example tokenizations (prompts), but you should edit this file to try out your own
#CLIP gets a lot better with the right prompting!
#be sure the tokenizations are in the same order as your class_names above!
%cat {dataset.location}/test/_tokenization.txt

An example picture from the Rock Paper Scissors dataset depicting a paper
An example picture from the Rock Paper Scissors dataset depicting a rock
An example picture from the Rock Paper Scissors dataset depicting a scissors

In [None]:
#edit your prompts as you see fit here, be sure the classes are in the same order as above
%%writefile {dataset.location}/test/_tokenization.txt
The paper sign in rock paper scissors
The rock sign in rock paper scissors
The scissors sign in rock paper scissors

Overwriting /content/CLIP/Rock-Paper-Scissors-1/test/_tokenization.txt


In [None]:
candidate_captions = []
with open(dataset.location + '/test/_tokenization.txt') as f:
    candidate_captions = f.read().splitlines()
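Since predictions are mapped back to labels by position, caption `i` has to describe class `i`. The sketch below is one way to make that pairing explicit. It assumes the captions file lists classes alphabetically, as the generated file shown above appears to, and it runs on a throwaway temporary folder rather than the real dataset. The helper names `list_class_folders` and `pair_captions` are hypothetical, not part of Roboflow or CLIP.

```python
import os
import tempfile

def list_class_folders(test_dir):
    """Return class folder names in sorted order, skipping plain files
    such as _tokenization.txt."""
    return sorted(
        name for name in os.listdir(test_dir)
        if os.path.isdir(os.path.join(test_dir, name))
    )

def pair_captions(class_names, captions):
    """Pair each class with the caption at the same position."""
    if len(class_names) != len(captions):
        raise ValueError(
            f"{len(class_names)} classes but {len(captions)} captions"
        )
    return list(zip(class_names, captions))

# Demo on a temporary folder that mimics the test split layout.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["scissors", "rock", "paper"]:
        os.mkdir(os.path.join(demo_dir, name))
    caption_path = os.path.join(demo_dir, "_tokenization.txt")
    with open(caption_path, "w") as f:
        f.write(
            "The paper sign in rock paper scissors\n"
            "The rock sign in rock paper scissors\n"
            "The scissors sign in rock paper scissors\n"
        )

    demo_classes = list_class_folders(demo_dir)
    with open(caption_path) as f:
        demo_captions = f.read().splitlines()
    demo_pairs = pair_captions(demo_classes, demo_captions)

for cls, caption in demo_pairs:
    print(cls, "->", caption)
```

Printing the pairs before running inference is a cheap way to catch a class list and a caption list that are in different orders.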

# Run CLIP inference on your classification dataset

In [None]:
import torch
import clip
from PIL import Image
import glob

def argmax(iterable):
    return max(enumerate(iterable), key=lambda x: x[1])[0]

device = "cuda" if torch.cuda.is_available() else "cpu"
model, transform = clip.load("ViT-B/32", device=device)

correct = []

#define our target classifications; experiment with these strings of text as you see fit, but make sure they are in the same order as your class names above
text = clip.tokenize(candidate_captions).to(device)

for cls in class_names:
    class_correct = []
    test_imgs = glob.glob(dataset.location + '/test/' + cls + '/*.jpg')
    for img in test_imgs:
        #print(img)
        image = transform(Image.open(img)).unsqueeze(0).to(device)
        with torch.no_grad():
            #the forward pass encodes the image and the captions and returns similarity logits
            logits_per_image, logits_per_text = model(image, text)
            probs = logits_per_image.softmax(dim=-1).cpu().numpy()

            #probs has one row for the image and one column per caption; pick the most likely caption
            pred = class_names[argmax(list(probs)[0])]
            #print(pred)
            if pred == cls:
                correct.append(1)
                class_correct.append(1)
            else:
                correct.append(0)
                class_correct.append(0)

    print('accuracy on class ' + cls + ' is :' + str(sum(class_correct)/len(class_correct)))
print('accuracy on all is : ' + str(sum(correct)/len(correct)))


accuracy on class scissors is :0.5454545454545454
accuracy on class rock is :0.18181818181818182
accuracy on class paper is :0.0
accuracy on all is : 0.24242424242424243
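Per-class accuracy shows how often a class is missed but not what it is mistaken for. A small confusion count can help with that, for example when deciding which prompt to reword. This is a sketch on made-up labels, not results from the run above; `confusion_counts` and `per_class_accuracy` are hypothetical helpers. In the loop above you would collect `cls` and `pred` into two lists and pass those in.

```python
from collections import Counter

def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))

def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correct predictions for each true class."""
    totals = Counter(true_labels)
    hits = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {cls: hits[cls] / totals[cls] for cls in totals}

# Made-up labels for illustration only.
y_true = ["rock", "rock", "paper", "scissors"]
y_pred = ["rock", "paper", "paper", "rock"]

counts = confusion_counts(y_true, y_pred)
acc = per_class_accuracy(y_true, y_pred)

for (true_cls, pred_cls), n in sorted(counts.items()):
    print(f"true={true_cls} predicted={pred_cls}: {n}")
print(acc)
```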


In [None]:
#Hope you enjoyed!
#As always, happy inferencing
#Roboflow