# Md Jakaria Mashud Shahria (2431751)

**Task 1**

The first task requires implementing the following:

*   adapting an existing pre-trained image classification model to a new task using zero-shot, one-shot and few-shot learning;
*   calculating accuracy, precision, recall (recovery) and F1 statistics for the selected new class on 1000 unseen images from OpenImages;
*   implementing a variable threshold T ∈ [0, 1] that decides, for each assigned class, whether an image is classified as belonging to it. The statistics must be recalculated after every change of the threshold value.
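
For reference, write s_i for the score a method assigns to image i and y_i ∈ {0, 1} for its ground-truth label (1 = target class). The thresholded prediction and the four statistics then follow from the confusion-matrix counts TP, FP, TN and FN, with "recovery" corresponding to recall:

$$
\hat{y}_i(T) = \begin{cases} 1, & s_i \ge T \\ 0, & s_i < T \end{cases}, \qquad T \in [0, 1]
$$

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

Because every count depends on T, all four statistics change whenever the threshold changes.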







First I tried running without a GPU, loading the dataset through TensorFlow Datasets with this call:

```
import tensorflow_datasets as tfds

dataset = tfds.load('open_images/v7', split='train')
```

Neither attempt worked, so I enabled the GPU runtime in Colab and used the FiftyOne package to load the OpenImages v7 dataset instead.

In [None]:
!pip install "sse-starlette<1"
!pip install -q fiftyone transformers datasets scikit-learn tqdm torch



Use CUDA for GPU acceleration, and load OpenAI's CLIP model (`CLIPModel`).

In [None]:
import torch
from transformers import CLIPProcessor, CLIPModel
from datasets import load_dataset
from PIL import Image
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from tqdm import tqdm
import random
from huggingface_hub import login
from google.colab import userdata

# Use a GPU if available (which we enabled in Colab's runtime settings)
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {DEVICE}")

# Retrieve the Hugging Face token stored in Colab Secrets under the name HF_TOKEN
hf_token = userdata.get('HF_TOKEN')
login(token=hf_token)

print("Successfully logged in to Hugging Face!")

# Load the pre-trained CLIP model and its processor
MODEL_NAME = "openai/clip-vit-base-patch32"
print(f"Loading model: {MODEL_NAME}...")
model = CLIPModel.from_pretrained(MODEL_NAME).to(DEVICE)
processor = CLIPProcessor.from_pretrained(MODEL_NAME)
print("Model loaded successfully!")

Using device: cuda
Successfully logged in to Hugging Face!
Loading model: openai/clip-vit-base-patch32...


Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


Model loaded successfully!


In [None]:
# DATA PREPARATION (Using FiftyOne)
import fiftyone as fo
import fiftyone.zoo as foz
from PIL import Image
from tqdm import tqdm
import random

TARGET_CLASS_NAME = "Horse"
NUM_EVAL_IMAGES = 1000
NUM_POSITIVE_SAMPLES = 500
NUM_FEW_SHOT_EXAMPLES = 5  # Number of examples for few-shot learning

def prepare_dataset():
    """
    Loads and filters the OpenImages v7 dataset using the FiftyOne Zoo.
    """
    print("Preparing dataset from the FiftyOne Zoo...")

    # We'll load a larger number of random samples and then filter them.
    # This is an easy way to get both positive and negative examples.
    num_samples_to_load = NUM_EVAL_IMAGES * 2 # Load extra samples to raise the chance of filling each quota (not guaranteed for rare classes)

    # Load a random subset of the dataset from the zoo
    # This downloads only the images and metadata we need.
    dataset = foz.load_zoo_dataset(
        "open-images-v7",
        split="test",
        label_types=["detections"],
        max_samples=num_samples_to_load,
        shuffle=True,
    )

    # Launch the app to visualize the loaded dataset (optional, but very useful!)
    # print("You can view the loaded dataset in the FiftyOne App:")
    # session = fo.launch_app(dataset, auto=False)
    # print(session)

    positive_samples, negative_samples, support_samples = [], [], []

    print("Filtering for positive and negative samples...")
    # Use a view to make processing faster
    view = dataset.select_fields("ground_truth")

    pbar = tqdm(total=NUM_EVAL_IMAGES + NUM_FEW_SHOT_EXAMPLES)
    for sample in view.iter_samples(autosave=True, progress=False):
        # Get all labels for the current sample
        if not sample.ground_truth:
            continue

        labels = [d.label for d in sample.ground_truth.detections]

        # Load the image from its filepath
        pil_image = Image.open(sample.filepath).convert("RGB")

        have_enough_support = len(support_samples) >= NUM_FEW_SHOT_EXAMPLES
        have_enough_positives = len(positive_samples) >= NUM_POSITIVE_SAMPLES
        have_enough_negatives = len(negative_samples) >= (NUM_EVAL_IMAGES - NUM_POSITIVE_SAMPLES)

        if TARGET_CLASS_NAME in labels:
            if not have_enough_support:
                support_samples.append(pil_image)
                pbar.update(1)
            elif not have_enough_positives:
                positive_samples.append(pil_image)
                pbar.update(1)
        elif not have_enough_negatives:
            negative_samples.append(pil_image)
            pbar.update(1)

        if have_enough_positives and have_enough_negatives and have_enough_support:
            break
    pbar.close()

    # Delete the FiftyOne dataset from its database (the downloaded image files stay on disk)
    dataset.delete()

    eval_images = positive_samples + negative_samples
    true_labels = [1] * len(positive_samples) + [0] * len(negative_samples)

    combined = list(zip(eval_images, true_labels))
    random.shuffle(combined)
    eval_images, true_labels = zip(*combined)

    print(f"\nDataset prepared: {len(eval_images)} evaluation images and {len(support_samples)} support images.")
    return list(eval_images), list(true_labels), support_samples
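
`prepare_dataset` fills three quotas (support, positive, negative) from a single random subset of `NUM_EVAL_IMAGES * 2` test images, so for a rare class such as "Horse" the subset can run out of target images before the positive quota is filled; the run below stops at 504 evaluation images instead of 1000. A small check such as the one below (a hypothetical helper, standard library only) surfaces that shortfall before any metrics are computed. FiftyOne's Open Images loader also accepts a `classes` argument that restricts the download to images containing the listed classes, which is one way to gather positives separately.

```python
from collections import Counter


def check_class_balance(true_labels, expected_positives, expected_total):
    """Hypothetical helper: count labels and report any unmet quota.

    true_labels uses the notebook's convention: 1 = target class, 0 = other.
    Returns a dict with the positive/negative counts and a list of warnings.
    """
    counts = Counter(true_labels)
    positives = counts.get(1, 0)
    negatives = counts.get(0, 0)
    warnings = []
    if positives < expected_positives:
        warnings.append(f"only {positives} positives (expected {expected_positives})")
    if positives + negatives < expected_total:
        warnings.append(f"only {positives + negatives} images (expected {expected_total})")
    return {"positives": positives, "negatives": negatives, "warnings": warnings}


# Example: a 4-positive / 500-negative split, whose total matches the 504
# evaluation images reported in the run below (quotas: 500 positives, 1000 images).
example_labels = [1] * 4 + [0] * 500
report = check_class_balance(example_labels, expected_positives=500, expected_total=1000)
for message in report["warnings"]:
    print("WARNING:", message)
```

In the notebook it would be called as `check_class_balance(true_labels, NUM_POSITIVE_SAMPLES, NUM_EVAL_IMAGES)` right after `prepare_dataset()`.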

In [None]:
# CLASSIFICATION METHODS

def predict_zero_shot(image, text_labels):
    """
    Classifies an image using zero-shot learning with text prompts.
    Returns the probability score for the first (target) label.
    """
    with torch.no_grad():
        inputs = processor(text=text_labels, images=image, return_tensors="pt", padding=True).to(DEVICE)
        outputs = model(**inputs)
        logits_per_image = outputs.logits_per_image
        probs = logits_per_image.softmax(dim=1)
        return probs[0][0].item() # Return probability of the first text label

def get_image_embedding(image):
    """Helper function to get the embedding for a single image."""
    with torch.no_grad():
        inputs = processor(images=image, return_tensors="pt").to(DEVICE)
        embedding = model.get_image_features(**inputs)
        return torch.nn.functional.normalize(embedding, p=2, dim=-1)

def predict_few_shot(query_image, support_embeddings):
    """
    Classifies an image by comparing it to the average embedding of support images.
    Returns the cosine similarity score.
    """
    with torch.no_grad():
        query_embedding = get_image_embedding(query_image)
        avg_support_embedding = torch.mean(support_embeddings, dim=0, keepdim=True)
        similarity = torch.nn.functional.cosine_similarity(query_embedding, avg_support_embedding)
        return similarity.item()
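
Both scoring functions come down to cosine similarities between L2-normalised embeddings. The sketch below mirrors them in plain Python, with toy vectors standing in for CLIP features (an assumption for illustration). In `predict_zero_shot`, CLIP's `logits_per_image` are image-text cosine similarities multiplied by a learned temperature; here that temperature is hard-coded as `logit_scale=100.0`, an assumed value, and a softmax over the two prompts gives the target-class probability. In `predict_few_shot`, the score is the cosine similarity between the query and the mean of the support embeddings (the class prototype); with a single support image, the prototype is simply that image's embedding.

```python
import math


def normalize(vec):
    """L2-normalise a vector, as get_image_embedding does with torch."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def zero_shot_prob(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled image-text similarities; returns P(first prompt).

    logit_scale=100.0 is an assumed stand-in for CLIP's learned temperature.
    """
    logits = [logit_scale * cosine(image_emb, t) for t in text_embs]
    top = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - top) for z in logits]
    return exps[0] / sum(exps)


def prototype_score(query_emb, support_embs):
    """Cosine similarity between the query and the mean normalised support."""
    supports = [normalize(s) for s in support_embs]
    dim = len(supports[0])
    prototype = [sum(s[i] for s in supports) / len(supports) for i in range(dim)]
    return cosine(query_emb, prototype)


# Toy 3-d vectors (hypothetical, not real CLIP features).
horse_text, other_text = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
query = [0.9, 0.1, 0.2]
print("zero-shot P(horse):", zero_shot_prob(query, [horse_text, other_text]))
print("few-shot score:", prototype_score(query, [[1.0, 0.0, 0.1], [0.8, 0.2, 0.0]]))
```

Because zero-shot probabilities come from a sharply scaled softmax while one-shot and few-shot scores are raw cosine similarities in [-1, 1], the two score types are not on the same scale, which is why the evaluation uses separate threshold grids for them.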

In [None]:
# EVALUATION

def calculate_and_print_metrics(scores, true_labels, threshold):
    """
    Calculates and prints classification metrics based on a given threshold.
    """
    predictions = [1 if score >= threshold else 0 for score in scores]

    accuracy = accuracy_score(true_labels, predictions)
    precision = precision_score(true_labels, predictions, zero_division=0)
    recall = recall_score(true_labels, predictions, zero_division=0)
    f1 = f1_score(true_labels, predictions, zero_division=0)

    print(f"Threshold: {threshold:.2f}")
    print(f"  Accuracy:  {accuracy:.4f}")
    print(f"  Precision: {precision:.4f}")
    print(f"  Recall (Recovery): {recall:.4f}")
    print(f"  F1-Score:  {f1:.4f}")
    print("-" * 30)
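
`calculate_and_print_metrics` is only evaluated at five hand-picked thresholds, and the one-shot and few-shot scores are cosine similarities in [-1, 1] rather than in the [0, 1] range the task specifies for T. The sketch below (hypothetical helpers, standard library only) sweeps an evenly spaced grid of thresholds over [0, 1] with the same `score >= threshold` rule, returns 0 for an undefined ratio as `zero_division=0` does, and picks the threshold with the highest F1. Mapping cosine scores through (s + 1) / 2 first is one assumed way to put them on [0, 1]; the map is monotonic, so it shifts which T values produce a given set of predictions without changing which predictions are achievable.

```python
def cosine_to_unit(score):
    """Affinely map a cosine similarity from [-1, 1] onto [0, 1]."""
    return (score + 1.0) / 2.0


def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN for the rule: predict 1 if score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    """num / den, or 0.0 when den is 0 (like scikit-learn's zero_division=0)."""
    return num / den if den else 0.0


def sweep_thresholds(scores, labels, steps=101):
    """Evaluate `steps` evenly spaced thresholds in [0, 1]; return (rows, best by F1)."""
    rows = []
    for k in range(steps):
        t = k / (steps - 1)
        tp, fp, tn, fn = confusion_counts(scores, labels, t)
        precision = safe_div(tp, tp + fp)
        recall = safe_div(tp, tp + fn)
        rows.append({
            "T": t,
            "accuracy": safe_div(tp + tn, len(labels)),
            "precision": precision,
            "recall": recall,
            "f1": safe_div(2 * precision * recall, precision + recall),
        })
    best = max(rows, key=lambda row: row["f1"])
    return rows, best


# Toy cosine scores (hypothetical): positives tend to score higher.
toy_cosine = [0.90, 0.60, -0.20, -0.40, -0.80, 0.20]
toy_labels = [1, 1, 0, 0, 0, 1]
rows, best = sweep_thresholds([cosine_to_unit(s) for s in toy_cosine], toy_labels)
print(f"best T = {best['T']:.2f}, F1 = {best['f1']:.4f}")
```

In the notebook, `sweep_thresholds(zero_shot_scores, true_labels)` would cover the zero-shot case directly; for `one_shot_scores` and `few_shot_scores` the cosine values would be mapped with `cosine_to_unit` first.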

In [None]:
# Main Execution

eval_images, true_labels, support_images = prepare_dataset()

#ZERO-SHOT LEARNING
print("\n" + "="*50)
print("Starting Zero-Shot Classification...")
print("="*50)
zero_shot_labels = [f"a photo of a {TARGET_CLASS_NAME}", "a photo of something else"]
zero_shot_scores = [predict_zero_shot(img, zero_shot_labels) for img in tqdm(eval_images, desc="Zero-Shot")]

print("\nZero-Shot Evaluation Results:")
for T in [0.1, 0.3, 0.5, 0.7, 0.9]:
    calculate_and_print_metrics(zero_shot_scores, true_labels, threshold=T)


#ONE-SHOT LEARNING
print("\n" + "="*50)
print("Starting One-Shot Classification...")
print("="*50)
one_shot_support_embedding = get_image_embedding(support_images[0])
one_shot_scores = [predict_few_shot(img, one_shot_support_embedding) for img in tqdm(eval_images, desc="One-Shot")]

print("\nOne-Shot Evaluation Results:")
for T in [0.20, 0.25, 0.30, 0.35, 0.40]:
    calculate_and_print_metrics(one_shot_scores, true_labels, threshold=T)


#FEW-SHOT LEARNING
print("\n" + "="*50)
print(f"Starting Few-Shot ({NUM_FEW_SHOT_EXAMPLES} examples) Classification...")
print("="*50)
few_shot_support_embeddings = torch.cat([get_image_embedding(img) for img in support_images], dim=0)
few_shot_scores = [predict_few_shot(img, few_shot_support_embeddings) for img in tqdm(eval_images, desc="Few-Shot")]

print(f"\nFew-Shot ({NUM_FEW_SHOT_EXAMPLES} examples) Evaluation Results:")
for T in [0.20, 0.25, 0.30, 0.35, 0.40]:
    calculate_and_print_metrics(few_shot_scores, true_labels, threshold=T)

Preparing dataset from the FiftyOne Zoo...
Downloading split 'test' to '/root/fiftyone/open-images-v7/test' if necessary
Downloading 'https://storage.googleapis.com/openimages/2018_04/test/test-images-with-rotation.csv' to '/root/fiftyone/open-images-v7/test/metadata/image_ids.csv'
Downloading 'https://storage.googleapis.com/openimages/v5/class-descriptions-boxable.csv' to '/root/fiftyone/open-images-v7/test/metadata/classes.csv'
Downloading 'https://storage.googleapis.com/openimages/2018_04/bbox_labels_600_hierarchy.json' to '/tmp/tmpesmpju7f/metadata/hierarchy.json'
Downloading 'https://storage.googleapis.com/openimages/v5/test-annotations-bbox.csv' to '/root/fiftyone/open-images-v7/test/labels/detections.csv'
Downloading 2000 images
 100% |█████████████████| 2000/2000 [4.1m elapsed, 0s remaining, 6.7 files/s]
Dataset info written to '/root/fiftyone/open-images-v7/info.json'
Loading 'open-images-v7' split 'test'
 100% |███████████████| 2000/2000 [15.6s elapsed, 0s remaining, 236.7 samples/s]
Dataset 'open-images-v7-test-2000' created


Filtering for positive and negative samples...


 51%|█████     | 509/1005 [00:22<00:21, 22.74it/s]



Dataset prepared: 504 evaluation images and 5 support images.

Starting Zero-Shot Classification...


Zero-Shot: 100%|██████████| 504/504 [00:15<00:00, 33.18it/s]



Zero-Shot Evaluation Results:
Threshold: 0.10
  Accuracy:  0.6052
  Precision: 0.0197
  Recall (Recovery): 1.0000
  F1-Score:  0.0386
------------------------------
Threshold: 0.30
  Accuracy:  0.8571
  Precision: 0.0526
  Recall (Recovery): 1.0000
  F1-Score:  0.1000
------------------------------
Threshold: 0.50
  Accuracy:  0.9246
  Precision: 0.0952
  Recall (Recovery): 1.0000
  F1-Score:  0.1739
------------------------------
Threshold: 0.70
  Accuracy:  0.9683
  Precision: 0.2000
  Recall (Recovery): 1.0000
  F1-Score:  0.3333
------------------------------
Threshold: 0.90
  Accuracy:  0.9881
  Precision: 0.4000
  Recall (Recovery): 1.0000
  F1-Score:  0.5714
------------------------------

Starting One-Shot Classification...


One-Shot: 100%|██████████| 504/504 [00:11<00:00, 44.21it/s]



One-Shot Evaluation Results:
Threshold: 0.20
  Accuracy:  0.0099
  Precision: 0.0080
  Recall (Recovery): 1.0000
  F1-Score:  0.0158
------------------------------
Threshold: 0.25
  Accuracy:  0.0099
  Precision: 0.0080
  Recall (Recovery): 1.0000
  F1-Score:  0.0158
------------------------------
Threshold: 0.30
  Accuracy:  0.0159
  Precision: 0.0080
  Recall (Recovery): 1.0000
  F1-Score:  0.0159
------------------------------
Threshold: 0.35
  Accuracy:  0.0397
  Precision: 0.0082
  Recall (Recovery): 1.0000
  F1-Score:  0.0163
------------------------------
Threshold: 0.40
  Accuracy:  0.1071
  Precision: 0.0088
  Recall (Recovery): 1.0000
  F1-Score:  0.0175
------------------------------

Starting Few-Shot (5 examples) Classification...


Few-Shot: 100%|██████████| 504/504 [00:11<00:00, 44.38it/s]



Few-Shot (5 examples) Evaluation Results:
Threshold: 0.20
  Accuracy:  0.0099
  Precision: 0.0080
  Recall (Recovery): 1.0000
  F1-Score:  0.0158
------------------------------
Threshold: 0.25
  Accuracy:  0.0099
  Precision: 0.0080
  Recall (Recovery): 1.0000
  F1-Score:  0.0158
------------------------------
Threshold: 0.30
  Accuracy:  0.0119
  Precision: 0.0080
  Recall (Recovery): 1.0000
  F1-Score:  0.0158
------------------------------
Threshold: 0.35
  Accuracy:  0.0198
  Precision: 0.0080
  Recall (Recovery): 1.0000
  F1-Score:  0.0159
------------------------------
Threshold: 0.40
  Accuracy:  0.0377
  Precision: 0.0082
  Recall (Recovery): 1.0000
  F1-Score:  0.0162
------------------------------
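
The printed statistics also reveal how imbalanced the evaluation set is. With N images, the number of errors is N(1 - accuracy) = FP + FN, and since FP = TP(1 - P)/P and FN = TP(1 - R)/R for precision P and recall R, the true-positive count is TP = N(1 - accuracy) / ((1 - P)/P + (1 - R)/R). The hypothetical helper below applies this to the rounded values from the results above; for zero-shot at T = 0.90 (N = 504, accuracy 0.9881, precision 0.4000, recall 1.0000) it gives 4 true positives, 6 false positives, 0 false negatives and 494 true negatives.

```python
def counts_from_metrics(n, accuracy, precision, recall):
    """Reconstruct (TP, FP, FN, TN) from rounded summary statistics.

    Assumes precision > 0, recall > 0 and at least one error. Counts are
    rounded to integers because the printed metrics are rounded to 4 d.p.
    """
    if precision <= 0 or recall <= 0:
        raise ValueError("precision and recall must be positive")
    errors = round(n * (1 - accuracy))  # FP + FN
    if errors == 0:
        raise ValueError("perfect accuracy: TP cannot be recovered from these metrics")
    ratio = (1 - precision) / precision + (1 - recall) / recall
    tp = round(errors / ratio)
    fn = round(tp * (1 - recall) / recall)
    fp = errors - fn
    tn = n - tp - fp - fn
    return tp, fp, fn, tn


# Zero-shot, T = 0.90, as printed in the results above.
tp, fp, fn, tn = counts_from_metrics(504, 0.9881, 0.4000, 1.0000)
print(tp, fp, fn, tn)  # 4 6 0 494
print("positives:", tp + fn, "negatives:", fp + tn)  # positives: 4 negatives: 500
```

The zero-shot row at T = 0.10 and the one-shot row at T = 0.20 give the same 4 positives, so the evaluation set holds only 4 Horse images against 500 negatives. With that imbalance, accuracy mostly reflects how the negatives are handled, and every precision value rests on at most 4 true positives.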
