## Evaluating the performance of the CLIP model

The CLIP model is claimed to achieve very high classification scores regardless of method. In this notebook I evaluate that claim on this Kaggle competition, using CLIP as a zero-shot classifier with one text prompt per class.

| Model | Public LB | Private LB |
| --- | --- | --- |
| ViT-B-32-quickgelu | 0.16666 | 0.18397 |
| ViT-H-14 | 0.28591 | 0.27955 |

Even with one of the best trained CLIP models available (ViT-H-14), the score is only about 0.28 on the private LB, in a Kaggle competition where ResNet or ConvNeXt models can easily score 0.75+.

In [41]:
from pathlib import Path
import pandas as pd
import torch
import open_clip

from PIL import Image
from kaggle import api
from fastkaggle import *

In [2]:
# Show where the notebook is running without overwriting fastkaggle's boolean `iskaggle` flag
'Kaggle' if iskaggle else 'Not Kaggle'

In [9]:
path = setup_comp('kaggle-pog-series-s01e03')

In [13]:
(path/"corn").ls()

(#5) [Path('kaggle-pog-series-s01e03/corn/train'),Path('kaggle-pog-series-s01e03/corn/sample_submission.csv'),Path('kaggle-pog-series-s01e03/corn/test.csv'),Path('kaggle-pog-series-s01e03/corn/train.csv'),Path('kaggle-pog-series-s01e03/corn/test')]

In [5]:
image_categories = [
    "a photo of pure corn seed",
    "a photo of broken corn seed",
    "a photo of silkcut corn seed",
    "a photo of discolored corn seed"
]
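The prompts above all follow one pattern, and later cells recover the label by taking the fourth word of the winning prompt. A minimal sketch of an alternative, assuming the class names are exactly the four labels used in this notebook: build the prompts from a template and map the winning index straight back to a class name. `PROMPT_TEMPLATE`, `class_names`, and `label_from_index` are illustrative names, not part of the original notebook.

```python
# Class names as they appear in the sample submission's `label` column.
class_names = ["pure", "broken", "silkcut", "discolored"]

# Illustrative template; it reproduces the wording of the prompts used above.
PROMPT_TEMPLATE = "a photo of {} corn seed"

prompts = [PROMPT_TEMPLATE.format(name) for name in class_names]


def label_from_index(index: int) -> str:
    """Map the argmax position over the prompts back to a class name."""
    return class_names[index]


for prompt in prompts:
    print(prompt)
```

Looking the label up by index keeps working if the prompt wording changes, whereas splitting the prompt on whitespace depends on the label staying the fourth word.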

In [14]:
test_df = pd.read_csv(path/"corn/test.csv")
test_df.head()

|  | seed_id | view | image |
| --- | --- | --- | --- |
| 0 | 2 | top | test/00002.png |
| 1 | 11 | bottom | test/00011.png |
| 2 | 13 | top | test/00013.png |
| 3 | 19 | bottom | test/00019.png |
| 4 | 27 | bottom | test/00027.png |


In [15]:
sub_df = pd.read_csv(path/"corn/sample_submission.csv")
sub_df.head()

|  | seed_id | label |
| --- | --- | --- |
| 0 | 8632 | broken |
| 1 | 11394 | broken |
| 2 | 17362 | pure |
| 3 | 9987 | discolored |
| 4 | 17226 | silkcut |


In [16]:
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

In [17]:
model, _, preprocess = open_clip.create_model_and_transforms('ViT-H-14',
                                                             pretrained='laion2b_s32b_b79k',
                                                             device=device
                                                            )

Downloading:   0%|          | 0.00/3.94G [00:00<?, ?B/s]

In [23]:
image = preprocess(Image.open(path/"corn/test/00002.png")).unsqueeze(0).to(device)
text = open_clip.tokenize(image_categories).to(device)

In [24]:
with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

In [25]:
text_probs

tensor([[0.2243, 0.0107, 0.1360, 0.6290]], device='cuda:0')
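These probabilities look decisive, but the similarities were multiplied by 100 before the softmax, so they correspond to very small differences in cosine similarity. A minimal sketch that inverts the softmax on the printed values, assuming the label order follows `image_categories`. Only the gaps to the top class can be recovered this way, not the absolute cosine values.

```python
import math

# Probabilities copied from the output above, in the order of image_categories.
labels = ["pure", "broken", "silkcut", "discolored"]
probs = [0.2243, 0.0107, 0.1360, 0.6290]
SCALE = 100.0  # the factor applied to the similarities before the softmax

top = max(probs)
# For softmax(SCALE * cos): cos_top - cos_i = ln(p_top / p_i) / SCALE
cosine_gaps = [math.log(top / p) / SCALE for p in probs]

for label, gap in zip(labels, cosine_gaps):
    print(f"{label:>10}: cosine gap to the top class = {gap:.4f}")
```

Every gap comes out below 0.05, so the model separates the four prompts only marginally for this image even though the top probability is 0.63.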

In [28]:
image_categories[text_probs.argmax()].split()[3]

'discolored'

In [35]:
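def get_categories_clip(img_path):
    """Return the predicted class name for an image path relative to path/"corn"."""
    image = preprocess(Image.open((path/"corn")/img_path)).unsqueeze(0).to(device)
    with torch.no_grad(), torch.cuda.amp.autocast():
        image_features = model.encode_image(image)
        # The prompts never change, so this re-encodes the same text on every call.
        text_features = model.encode_text(text)
        # L2-normalise both sides so the dot product below is a cosine similarity.
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)

        # Scale by 100 before the softmax to sharpen the distribution over prompts.
        text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    # Each prompt reads "a photo of <label> corn seed", so the label is the fourth word.
    return image_categories[text_probs.argmax()].split()[3]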

In [36]:
get_categories_clip("test/00027.png")

'silkcut'

In [38]:
%%time
result = test_df["image"].map(get_categories_clip)
result

CPU times: user 4min 20s, sys: 14.3 s, total: 4min 35s
Wall time: 2min 27s


0       discolored
1          silkcut
2       discolored
3       discolored
4          silkcut
           ...    
3474    discolored
3475    discolored
3476    discolored
3477       silkcut
3478          pure
Name: image, Length: 3479, dtype: object
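The run above sends one image at a time through the model. A common way to cut that wall time is to group the paths into batches and classify each batch in one call. The sketch below shows only the batching logic. `classify_batch` is a hypothetical callable (not part of this notebook or of `open_clip`) that takes a list of image paths and returns one label per path.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def classify_in_batches(
    image_paths: Iterable[str],
    classify_batch: Callable[[List[str]], List[str]],  # hypothetical batch classifier
    batch_size: int = 64,
) -> List[str]:
    """Classify all paths batch by batch, preserving the input order."""
    labels: List[str] = []
    for batch in batched(image_paths, batch_size):
        labels.extend(classify_batch(batch))
    return labels
```

In this notebook, `classify_batch` would presumably preprocess each image, stack the tensors into one batch, and call `model.encode_image` once per batch against text features computed a single time up front. The batch size of 64 is an arbitrary starting point to tune against GPU memory.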

In [39]:
test_df["label"] = result

In [40]:
# Keep the seed_id order of the sample submission and attach the predicted labels
submission = sub_df[["seed_id"]]
submission = pd.merge(submission, test_df, on="seed_id")
sub = submission[["seed_id", "label"]]
sub.to_csv("submission.csv", index=False)
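Before uploading, it can help to check that the merge left no seed without a prediction and that every label is one of the four classes. A minimal sketch, assuming the rows are passed as plain dictionaries with `seed_id` and `label` keys. `check_submission` and `ALLOWED_LABELS` are illustrative names, not part of the original notebook.

```python
from typing import Dict, Iterable, List

# The four classes used in the prompts of this notebook.
ALLOWED_LABELS = {"pure", "broken", "silkcut", "discolored"}


def check_submission(
    rows: Iterable[Dict[str, object]], expected_ids: Iterable[int]
) -> List[str]:
    """Return human-readable problems; an empty list means the rows look fine."""
    rows = list(rows)
    problems: List[str] = []

    seen_ids = [row["seed_id"] for row in rows]
    if len(seen_ids) != len(set(seen_ids)):
        problems.append("duplicate seed_id values")

    missing = set(expected_ids) - set(seen_ids)
    if missing:
        problems.append(f"{len(missing)} seed_id values have no prediction")

    # A missing label (for example NaN after a merge) is not in the allowed set either.
    bad = [row["seed_id"] for row in rows if row["label"] not in ALLOWED_LABELS]
    if bad:
        problems.append(f"{len(bad)} rows have a label outside the allowed set")

    return problems
```

With the data frames from this notebook, a call such as `check_submission(sub.to_dict("records"), sub_df["seed_id"])` should return an empty list if the submission is complete.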

In [47]:
api.competition_submit("submission.csv", "CLIP H4 model",'kaggle-pog-series-s01e03')

100%|██████████| 50.7k/50.7k [00:00<00:00, 114kB/s]


Successfully submitted to It's Corn (PogChamps #3)