# Lab 1 — Tiny VLM Adversarial Cost Challenge

## Overview

Welcome to the ML Security Lab! In this lab, you'll implement adversarial attacks against a Vision-Language Model (VLM) using the custom dataset and a frozen TinyCLIP scorer.

### Objective
Your task is to build an attack function that can manipulate either:
- **Caption tokens** (text modifications)
- **Image pixels** (visual modifications)
- **Both** (multimodal attack)

The goal is to flip the model's decision (match → no-match or vice versa) while minimizing the attack cost.

### Constraints
- **T_MAX = 10**: Maximum token edits per sample
- **P_MAX = 100**: Maximum pixel edits per sample  
- **Q_MAX = 100**: Maximum queries per sample
- **Evaluation**: Public leaderboard (1,000 val pairs) + Private leaderboard (1,000 test pairs)

### Scoring
Your attack will be evaluated based on:
1. **Success Rate**: Percentage of samples where you successfully flip the decision
2. **Cost Efficiency**: Lower total cost (token edits + pixel edits + queries) is better
3. **Attack Budget**: Must stay within the specified limits

In [1]:
import os
import zipfile

# Check if 'images' directory or 'val_pairs.json' file is missing
if not os.path.exists('images') or not os.path.exists('val_pairs.json'):
    print("Required data not found. Extracting 'data.zip'...")
    
    # Check if 'data.zip' exists
    if os.path.exists('data.zip'):
        with zipfile.ZipFile('data.zip', 'r') as zip_ref:
            zip_ref.extractall()  # Extract all files in the current directory
        print("Extraction complete!")
    else:
        print("Error: 'data.zip' not found. Please ensure the file is in the current directory.")
else:
    print("All required data is already present.")

All required data is already present.


In [2]:
# Import required libraries
import numpy as np
import pandas as pd
from PIL import Image
import PIL
import torch
import torch.nn.functional as F
from tqdm.auto import tqdm
import open_clip
from datasets import load_dataset
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
import json
import random
import os
from typing import List, Tuple, Dict, Optional
import warnings
import requests
from urllib.parse import urlparse
import hashlib
from collections import defaultdict
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)
random.seed(42)

print("All packages imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

  from .autonotebook import tqdm as notebook_tqdm


All packages imported successfully!
PyTorch version: 2.8.0+cu128
CUDA available: True
Using device: cuda


In [3]:
# Data Loading
print("Loading validation data...")

# Load validation pairs from JSON
with open('val_pairs.json', 'r') as f:
    val_pairs = json.load(f)

print(f"Loaded {len(val_pairs)} validation pairs")

# Helper function to load images
def load_image_from_pair(pair: dict) -> Image.Image:
    """Load image from the pair dictionary using image_path"""
    return Image.open(pair['image_path']).convert('RGB')

# Sample a few pairs to verify data loading
print("\nSample validation pairs:")
for i in range(3):
    pair = val_pairs[i]
    print(f"  Image ID: {pair['image_id']}, Path: {pair['image_path']}")
    print(f"  Caption: {pair['caption'][:50]}..., Match: {pair['is_match']}")
    print()

print(f"Data distribution:")
labels = [pair['is_match'] for pair in val_pairs]
print(f"  Match (True): {sum(labels)}")
print(f"  No-match (False): {len(labels) - sum(labels)}")
print(f"  Balance: {sum(labels)/len(labels):.2%} positive")

Loading validation data...
Loaded 1000 validation pairs

Sample validation pairs:
  Image ID: 38137, Path: images/val/000000119402.jpg
  Caption: A gold bus traveling on a single lane road..., Match: True

  Image ID: 15194, Path: images/val/000000404780.jpg
  Caption: Two women in the snow on skis in front of a large ..., Match: False

  Image ID: 19082, Path: images/val/000000148898.jpg
  Caption: A man jumping a brown horse over an obstacle...., Match: False

Data distribution:
  Match (True): 500
  No-match (False): 500
  Balance: 50.00% positive


Bad pipe message: %s [b'0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (']
Bad pipe message: %s [b'TML, like Gecko) Chrome/141.0.0.0 Safari/537.36\r\nAccept-Encoding: gzip, de']
Bad pipe message: %s [b'ate, br, zstd\r\nAccept-Language: en-US,en;q=0.9,da;q=0.8,da-DK;q=0.7,de;q=0.6\r\nX-Request-ID: a2acf30f9']
Bad pipe message: %s [b'c18a1ff381a4e46b476a3\r\nX-Real-IP: 130.225.91.23']
Bad pipe message: %s [b'\nX-Forwarded-Port: 443\r\nX-Forwarded-Scheme: https\r', b'-Original']
Bad pipe message: %s [b'RI: /\r\nX-Scheme: https\r\nsec-fetch-site: none', b'sec-fetch-mo']
Bad pipe message: %s [b': cors\r\nsec-fetch-dest: empty\r\nsec-fetch-storage-access: active\r\npriority: u=1, i\r\nX-Forwarded-Prot', b' https\r\nX-Forwarded-Host: 63vxb49s-40725.euw.devtunnels.ms\r\nX-Forwarded-For: 130.225.91.233\r\nProxy-Connection:']
Bad pipe message: %s [b'svg+xml,image/*,*/*;q=0.8\r\nHost: localhost:40725\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x', b') AppleWebKit/537.36 (KHTML, like G

In [16]:
# TinyCLIP Scorer Implementation
print("Loading CLIP model...")

# Try to load TinyCLIP, fallback to OpenCLIP ViT-B/32 if failed
try:
    # Attempt to load TinyCLIP from HuggingFace hub
    model, preprocess, tokenizer = open_clip.create_model_and_transforms(
        "hf-hub:microsoft/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M"
    )
    print("Successfully loaded TinyCLIP model")
except Exception as e:
    print(f"Failed to load TinyCLIP: {e}")
    print("Falling back to OpenCLIP ViT-B/32...")

    model, preprocess, tokenizer = open_clip.create_model_and_transforms(
        "ViT-B-32", 
        pretrained="laion2b_s34b_b79k"
    )
    print("Successfully loaded OpenCLIP ViT-B/32")

# Move model to device
model = model.to(device)
model.eval()

print(f"Model loaded on: {device}")


Loading CLIP model...
Failed to load TinyCLIP: Failed initial config/weights load from HF Hub microsoft/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M: Failed to download file (open_clip_config.json) for microsoft/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M. Last error: 404 Client Error. (Request ID: Root=1-68e83bb3-161e0a867af9421049bac461;06432bb6-83af-48c4-bd5b-c1bd96b9066e)

Repository Not Found for url: https://huggingface.co/microsoft/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M/resolve/main/open_clip_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated. For more details, see https://huggingface.co/docs/huggingface_hub/authentication
Falling back to OpenCLIP ViT-B/32...
Successfully loaded OpenCLIP ViT-B/32
Model loaded on: cuda


In [17]:
def clip_embed(image: Image.Image, caption: str) -> float:
    """
    Compute cosine similarity between image and text embeddings.
    
    Args:
        image: PIL Image
        caption: Text string
    
    Returns:
        Cosine similarity score between normalized embeddings
    """
    with torch.no_grad():
        # Preprocess image
        image_tensor = preprocess(image).unsqueeze(0).to(device)
        
        # Tokenize text properly using open_clip tokenizer
        text_tokens = open_clip.tokenize([caption]).to(device)
        
        # Get embeddings
        image_features = model.encode_image(image_tensor)
        text_features = model.encode_text(text_tokens)
        
        # Normalize embeddings
        image_features = F.normalize(image_features, dim=-1)
        text_features = F.normalize(text_features, dim=-1)
        
        # Compute cosine similarity
        similarity = (image_features @ text_features.T).item()
        
    return similarity

# Test the embedding function
print("\nTesting CLIP embedding function...")
test_pair = val_pairs[0]
test_image = load_image_from_pair(test_pair)
test_similarity = clip_embed(test_image, test_pair['caption'])
print(f"Sample similarity score: {test_similarity:.4f} (Expected match: {test_pair['is_match']})")

# Test on a few more samples
print("\nTesting on more samples:")
for i in range(3):
    pair = val_pairs[i]
    image = load_image_from_pair(pair)
    similarity = clip_embed(image, pair['caption'])
    print(f"Sample {i+1}: similarity={similarity:.4f}, match={pair['is_match']}")


Testing CLIP embedding function...
Sample similarity score: 0.3695 (Expected match: True)

Testing on more samples:
Sample 1: similarity=0.3526, match=True
Sample 2: similarity=0.0343, match=False
Sample 3: similarity=0.0774, match=False


In [18]:
# Calibration: Fit logistic regression to get alpha, beta parameters
print("Calibrating scorer with logistic regression...")

# Use first 200 samples for calibration
tune_slice = val_pairs[:200]
print(f"Using {len(tune_slice)} samples for calibration")

# Compute similarities for calibration
similarities = []
ground_truths = []

print("Computing similarities for calibration...")
for pair in tqdm(tune_slice, desc="Calibration"):
    image = load_image_from_pair(pair)
    similarity = clip_embed(image, pair['caption'])
    similarities.append(similarity)
    ground_truths.append(int(pair['is_match']))

similarities = np.array(similarities).reshape(-1, 1)
ground_truths = np.array(ground_truths)

# Fit logistic regression: sigmoid(alpha * cosine + beta)
lr = LogisticRegression()
lr.fit(similarities, ground_truths)

# Extract alpha and beta
alpha = lr.coef_[0][0]  # Coefficient for similarity
beta = lr.intercept_[0]  # Intercept

print(f"Calibration complete!")
print(f"   Alpha (slope): {alpha:.4f}")
print(f"   Beta (intercept): {beta:.4f}")

# Test calibration
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

test_similarities = similarities[:5].flatten()
test_labels = ground_truths[:5]
calibrated_probs = sigmoid(alpha * test_similarities + beta)

print(f"\nCalibration test:")
for i in range(5):
    print(f"  Sim: {test_similarities[i]:.4f} → Prob: {calibrated_probs[i]:.4f}, True: {test_labels[i]}")

Calibrating scorer with logistic regression...
Using 200 samples for calibration
Computing similarities for calibration...


Calibration: 100%|██████████| 200/200 [00:05<00:00, 37.82it/s]

Calibration complete!
   Alpha (slope): 6.6231
   Beta (intercept): -1.1130

Calibration test:
  Sim: 0.3561 → Prob: 0.7765, True: 1
  Sim: 0.0343 → Prob: 0.2920, True: 0
  Sim: 0.0802 → Prob: 0.3585, True: 0
  Sim: 0.1899 → Prob: 0.5362, True: 0
  Sim: 0.3586 → Prob: 0.7794, True: 1





In [19]:
#investigate the preprocessing:

print(type(preprocess))
print(preprocess)

# we can see the images are already in rgb mode, so no need to no that
for pair in val_pairs[:5]:
    image = load_image_from_pair(pair)
    print(image.mode)

<class 'torchvision.transforms.transforms.Compose'>
Compose(
    RandomResizedCrop(size=(224, 224), scale=(0.9, 1.0), ratio=(0.75, 1.3333), interpolation=bicubic, antialias=True)
    <function _convert_to_rgb at 0x7f3a9a191bc0>
    ToTensor()
    Normalize(mean=(0.48145466, 0.4578275, 0.40821073), std=(0.26862954, 0.26130258, 0.27577711))
)
RGB
RGB
RGB
RGB
RGB


In [20]:
# Looking up the code for clip _convert_to_rgb it is:
def _convert_to_rgb(image):
    return image.convert("RGB")
# aka the built-in RGB conversion from PIL

In [21]:
from scipy.ndimage import label

# BlackBox API Implementation
import editdistance  # For Levenshtein distance

class BlackBoxAPI:
    """
    Black-box API for the VLM scorer with query budget tracking and cost calculation.
    """
    
    def __init__(self, alpha: float, beta: float, q_max: int = 200):
        """
        Initialize the black-box API.
        
        Args:
            alpha: Logistic regression slope parameter
            beta: Logistic regression intercept parameter  
            q_max: Maximum queries allowed per sample
        """
        self.alpha = alpha
        self.beta = beta
        self.q_max = q_max
        self.query_count = 0
        
    def score(self, image_uint8: np.ndarray, caption_str: str) -> float:
        """
        Score image-caption pair and return probability.
        
        Args:
            image_uint8: Image as uint8 numpy array (H, W, C)
            caption_str: Caption string
            
        Returns:
            Probability in [0, 1] using sigmoid(alpha * cosine + beta)
        """
        if self.query_count >= self.q_max:
            raise RuntimeError(f"Query budget exceeded! Used {self.query_count}/{self.q_max}")
            
        # Convert numpy array to PIL Image
        image_pil = Image.fromarray(image_uint8)
        
        # Get cosine similarity
        cosine_sim = clip_embed(image_pil, caption_str)
        
        # Apply calibrated sigmoid
        logit = self.alpha * cosine_sim + self.beta
        probability = 1 / (1 + np.exp(-logit))
        
        self.query_count += 1
        
        return probability
    
    def reset_query_count(self):
        """Reset query counter for new sample."""
        self.query_count = 0
        
    def get_remaining_queries(self) -> int:
        """Get remaining query budget."""
        return self.q_max - self.query_count

# Cost Functions
def token_edit_cost(original: str, modified: str) -> int:
    """
    Compute token-level Levenshtein distance using CLIP tokenizer.
    
    Args:
        original: Original caption
        modified: Modified caption
        
    Returns:
        Number of token edits (insertions, deletions, substitutions)
    """
    # Use CLIP tokenizer for more accurate tokenization
    orig_tokens = open_clip.tokenize([original], context_length=77)[0].numpy()
    mod_tokens = open_clip.tokenize([modified], context_length=77)[0].numpy()
    
    # Remove padding tokens (0s) and special tokens for fair comparison
    # Keep only actual content tokens
    orig_tokens = orig_tokens[orig_tokens != 0]
    mod_tokens = mod_tokens[mod_tokens != 0]
    
    return editdistance.eval(orig_tokens.tolist(), mod_tokens.tolist())

def pixel_edit_cost(original: np.ndarray, modified: np.ndarray) -> int:
    """
    Compute number of changed pixels with reduced cost for continuous regions.
    
    Args:
        original: Original image as uint8 numpy array
        modified: Modified image as uint8 numpy array
        
    Returns:
        Adjusted cost based on number of changed pixels, with reduced cost for continuous regions.
    """
    # Find the difference mask
    diff_mask = np.any(original != modified, axis=-1)
    
    # Label connected components in the difference mask
    labeled_regions, num_features = label(diff_mask)
    
    # Count pixels in each connected region
    total_cost = 0
    for region_id in range(1, num_features + 1):
        region_size = np.sum(labeled_regions == region_id)
        if region_size > 0:
            # Full cost for the first pixel, half cost for the rest
            total_cost += 1 + (region_size - 1) * 0.5
    
    return int(total_cost)

# Test the BlackBox API
print("Testing BlackBox API...")

# Initialize API with calibrated parameters
api = BlackBoxAPI(alpha, beta, q_max=200)

# Test on a sample
test_pair = val_pairs[0]
test_image = load_image_from_pair(test_pair)
test_image_uint8 = np.array(test_image)

# Get score
score = api.score(test_image_uint8, test_pair['caption'])
print(f"API Score: {score:.4f} (Expected match: {test_pair['is_match']})")
print(f"Queries used: {api.query_count}/{api.q_max}")

# Test cost functions
original_caption = "A cat sitting on a mat"
modified_caption = "A dog standing on a rug" 
token_cost = token_edit_cost(original_caption, modified_caption)
print(f"\nToken edit cost example:")
print(f"  Original: '{original_caption}'")  
print(f"  Modified: '{modified_caption}'")
print(f"  Cost: {token_cost} token edits")

# Test pixel cost (create a simple modification)
original_img = np.zeros((100, 100, 3), dtype=np.uint8)
modified_img = original_img.copy()
modified_img[10:20, 10:20] = 255  # Change a 10x10 region
pixel_cost = pixel_edit_cost(original_img, modified_img)
print(f"\nPixel edit cost example:")
print(f"  Modified {pixel_cost} pixels in 100x100 image")

Testing BlackBox API...
API Score: 0.7881 (Expected match: True)
Queries used: 1/200

Token edit cost example:
  Original: 'A cat sitting on a mat'
  Modified: 'A dog standing on a rug'
  Cost: 3 token edits

Pixel edit cost example:
  Modified 50 pixels in 100x100 image


## Your Task: Implement Adversarial Attacks

### Attack Function Template

Replace the trivial baseline in the `attack()` function with sophisticated adversarial attacks:

```python
def attack(image_np_uint8, caption_str, api, budgets):
    # Your attack implementation here!
    # You can modify:
    # - Caption tokens (text modifications)  
    # - Image pixels (visual modifications)
    # - Both (multimodal attack)
    
    # Stay within budgets:
    # - budgets['T_MAX'] = 10  token edits
    # - budgets['P_MAX'] = 100 pixel edits
    # - budgets['Q_MAX'] = 100 queries
    
    return {
        'success': success,      # bool: did you flip the decision?
        'image': final_image,    # np.array: attacked image
        'caption': final_caption, # str: attacked caption  
        'token_cost': token_cost, # int: tokens changed
        'pixel_cost': pixel_cost, # int: pixels changed
        'query_cost': query_cost  # int: API calls made
    }
```

### Attack Strategies to Consider

- **Text Attacks**: Synonym replacement, word insertion/deletion, semantic paraphrasing
- **Image Attacks**: Adversarial noise, targeted pixel modifications, patch attacks
- **Query Optimization**: Gradient-free optimization, genetic algorithms, hill climbing
- **Multimodal**: Combined text+image attacks for maximum effectiveness

### Evaluation Metrics

Your attack will be scored as: **ASR - 0.5×ANC - 0.1×(AQ/Q_MAX)**

- **ASR**: Attack Success Rate (higher is better)
- **ANC**: Average Number of Changes (lower is better) 
- **AQ**: Average Queries (lower is better)

### Next Steps

1. **Implement your attack** in the `attack()` function above
2. **Test locally** using the evaluation framework  
3. **Run on full dataset** by changing `max_samples=None`

Good luck! 

### <span style="color:red">**FILL THIS CODE BLOCK**</span>

In [22]:
from torchvision.transforms import RandomResizedCrop, ToTensor, Normalize, Compose
from torchvision.transforms import InterpolationMode


def clip_loss(image_tensor, caption: str, label) -> torch.Tensor:
    """
    Compute cosine similarity between image and text embeddings.
    Returns a torch scalar that supports autograd.
    """
    # convert to float
    image_tensor = image_tensor.float().permute(2, 0, 1).unsqueeze(0).to(device)
    # Preprocess image (make sure to keep gradient tracking)
    rrc = RandomResizedCrop(size=(224, 224), scale=(0.9, 1.0), ratio=(0.75, 1.3333), interpolation=InterpolationMode.BICUBIC, antialias=True)


    norm = Normalize(mean=(0.48145466, 0.4578275, 0.40821073), std=(0.26862954, 0.26130258, 0.27577711))
    # Tokenize text (token embeddings are usually non-differentiable wrt raw string)   

    image_tensor = rrc(image_tensor)
    image_tensor = norm(image_tensor)


    text_tokens = open_clip.tokenize([caption]).to(device)

    # Get embeddings (these will remain differentiable wrt image_tensor)
    image_features = model.encode_image(image_tensor)
    text_features = model.encode_text(text_tokens)

    # Normalize embeddings
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    similarity = (image_features @ text_features.T).squeeze()  # shape: scalar

    output = similarity*alpha + beta
    output = torch.sigmoid(output)
    # Cosine similarity
    if label: # if label is true
        loss = 1 - output  # want to maximize similarity
        
    else: # if label is false
        loss = output  # want to minimize similarity

      
    return loss

def calc_image_grad(image_tensor, caption: str, label) -> torch.Tensor:
    image_tensor = image_tensor.float().to(device)
    image_tensor.requires_grad_(True)  # allow gradient wrt input
    if image_tensor.grad is not None:
        image_tensor.grad.zero_()
    losses = []
    for _ in range(20):
        loss =  clip_loss(image_tensor, caption, label)
        losses.append(loss)
    total_loss = torch.stack(losses).mean()
    image_grad = torch.autograd.grad(total_loss, [image_tensor])[0]
    image_grad = image_grad.detach().squeeze().cpu()
    return image_grad




In [23]:

def top_x(img: torch.Tensor, x: int) -> torch.Tensor:
    """
    Keep only the top x pixels (by L2 norm across channels) in a PyTorch image tensor.

    This function is robust to both channel-first (C, H, W) and channel-last (H, W, C) tensors and
    returns a masked tensor with the same layout as the input.

    Args:
        img (torch.Tensor): Tensor of shape (C, H, W) or (H, W, C).
        x (int): Number of pixels to retain based on highest L2 norms.

    Returns:
        torch.Tensor: Masked image tensor with only top-x pixels preserved (same layout as input).
    """

    if img.ndim != 3:
        raise ValueError("top_x expects a 3-D tensor (C,H,W) or (H,W,C)")

    # Detect layout: assume channel-first if first dim is typical channel count
    if img.shape[0] in (1, 3, 4):
        # (C, H, W)
        C, H, W = img.shape
        # norms per pixel across channels -> shape (H, W)
        norms = torch.norm(img, p=2, dim=0)
        flat = norms.view(-1)
        k = min(int(x), flat.numel())
        if k <= 0:
            return torch.zeros_like(img)
        topk_idx = torch.topk(flat, k, largest=True).indices
        mask = torch.zeros_like(flat)
        mask[topk_idx] = 1.0
        mask = mask.view(H, W)
        masked = img * mask.unsqueeze(0)
        return masked
    else:
        # Assume (H, W, C)
        H, W, C = img.shape
        norms = torch.norm(img, p=2, dim=2)  # (H, W)
        flat = norms.view(-1)
        k = min(int(x), flat.numel())
        if k <= 0:
            return torch.zeros_like(img)
        topk_idx = torch.topk(flat, k, largest=True).indices
        mask = torch.zeros_like(flat)
        mask[topk_idx] = 1.0
        mask = mask.view(H, W).unsqueeze(2)
        masked = img * mask
        return masked, mask



In [25]:
def image_attack(image: np.array, caption: str, label: bool, pixels_changed = 100, epsilon = 10, steps: int = 10) -> np.array:

    image = image.astype(np.uint8)
    # Work on float copies
    original = torch.tensor(image.astype(np.float32))
    adversarial_image = torch.tensor(image.copy().astype(np.float32))
    step_eps = float(epsilon) / max(1, steps)

    # Iterative gradient steps (unconstrained)
    for _ in range(steps):
        image_grad = calc_image_grad(adversarial_image, caption, label).detach()
        adversarial_image.requires_grad_(False)
        # apply small step everywhere (not sparsified here) 
        step = step_eps * image_grad.sign()
        adversarial_image += step
        adversarial_image = torch.clip(adversarial_image, 0, 255)

    # Compute full perturbation
    perturb = adversarial_image - original  # float32 HWC

    perturb, mask = top_x(torch.tensor(perturb), pixels_changed)


    # Apply masked perturbation to original 
    perturb = perturb.numpy()
    final = np.clip(image + perturb.astype(np.uint8), 0, 255)

    return final.astype(np.uint8)



for nr, pair in enumerate(val_pairs[:5]):
    image = load_image_from_pair(pair)
    image_tensor = torch.tensor(np.array(image))  # Change to (C, H, W)
    # this is how it should be done according to the preprocess function
    # but its kinda wrong since the preprocess function is maybe for training
    # idk maybe we can find it out later, so the gradient can be passed through it
    #image_tensor = preprocess(image).unsqueeze(0).to(device)
    #print(image_tensor)
    torch.manual_seed(42)
    loss = clip_loss(image_tensor, pair['caption'], pair['is_match'])
    torch.manual_seed(42)
    similarity = clip_embed(image, pair['caption'])
    print(f"\nSample {nr+1}:")
    print(f"loss: {loss.item():.4f}, Similarity: {similarity:.4f}, Match: {pair['is_match']}")
    
    epsilon = 2# step size for adversarial perturbation
    adversarial_image_np = image_attack(np.array(image), pair['caption'], pair['is_match'], pixels_changed=100, epsilon=epsilon, steps=15)

    #reconvert tensor to image
    adversarial_image = PIL.Image.fromarray(adversarial_image_np)
    torch.manual_seed(42)
    adversarial_loss = clip_loss(torch.Tensor(adversarial_image_np), pair['caption'], pair['is_match'])
    torch.manual_seed(42)
    adversarial_similarity = clip_embed(adversarial_image, pair['caption'])

    print(f"Adversarial loss: {adversarial_loss.item():.4f}, Adversarial Similarity: {adversarial_similarity:.4f}")

test_losses = []
adversarial_losses = []

for pair in tqdm(val_pairs[:10]):
    image = load_image_from_pair(pair)
    image_tensor = torch.tensor(np.array(image))  # Change to (C, H, W)
    # this is how it should be done according to the preprocess function
    # but its kinda wrong since the preprocess function is maybe for training
    # idk maybe we can find it out later, so the gradient can be passed through it
    #image_tensor = preprocess(image).unsqueeze(0).to(device)
    #print(image_tensor)
    torch.manual_seed(42)
    loss = clip_loss(image_tensor, pair['caption'], pair['is_match'])

    similarity = clip_embed(image, pair['caption'])
    #print(f"\nSample {nr+1}:")
    #print(f"loss: {loss.item():.4f}, Similarity: {similarity:.4f}, Match: {pair['is_match']}")
    
    epsilon = 2# step size for adversarial perturbation
    adversarial_image_np = image_attack(np.array(image), pair['caption'], pair['is_match'], pixels_changed=100, epsilon=epsilon, steps=15)

    #reconvert tensor to image
    adversarial_image = PIL.Image.fromarray(adversarial_image_np)
    torch.manual_seed(42)
    adversarial_loss = clip_loss(torch.Tensor(adversarial_image_np), pair['caption'], pair['is_match'])
    torch.manual_seed(42)
    adversarial_similarity = clip_embed(adversarial_image, pair['caption'])

    test_losses.append(loss.item())
    adversarial_losses.append(adversarial_loss.item())

print(f"\nAverage test loss: {np.mean(test_losses):.4f}")
print(f"Average adversarial loss: {np.mean(adversarial_losses):.4f}")


# no limit
#Average test loss: 0.4674
#Average adversarial loss: 0.6009

# 100 pixels


Sample 1:
loss: 0.5400, Similarity: 0.3342, Match: True
Adversarial loss: 0.5401, Adversarial Similarity: 0.3342

Sample 2:
loss: 0.4119, Similarity: 0.0343, Match: False
Adversarial loss: 0.4122, Adversarial Similarity: 0.0343

Sample 3:
loss: 0.3687, Similarity: 0.0719, Match: False
Adversarial loss: 0.3688, Adversarial Similarity: 0.0720

Sample 4:
loss: 0.5568, Similarity: 0.1899, Match: False
Adversarial loss: 0.5573, Adversarial Similarity: 0.1899

Sample 5:
loss: 0.4793, Similarity: 0.3785, Match: True
Adversarial loss: 0.4794, Adversarial Similarity: 0.3785


100%|██████████| 10/10 [01:49<00:00, 10.93s/it]


Average test loss: 0.4605
Average adversarial loss: 0.4607





In [26]:

def image_attack_standalone(image: np.ndarray,
                            caption: str,
                            label,
                            pixels_changed: int = 100,
                            epsilon: float = 10.0,
                            dense_steps: int = 10,
                            reopt_steps: int = 300,
                            reopt_lr: float = 0.03,
                            use_iterations: bool = False,
                            device: torch.device = None):
    """
    Two-stage sparse L0 attack with optimized dense-stage:
      - Default (use_iterations=False): one-shot gradient importance to pick top-k pixels (fast)
      - If use_iterations=True: a few gradient-following raw steps without sign (slower, stronger)
    Then re-optimize values on the selected support using clip_loss + Adam.

    Requires:
      - calc_image_grad(image_tensor, caption, label): returns gradient dL/dx shaped H x W x C
      - top_x(perturb_tensor, k) [optional]: returns (sparse_perturb, mask) OR you can rely on internal selection
      - clip_loss(image_tensor, caption, label): returns a scalar torch.Tensor (autograd-compatible)
    """

    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # --------- prepare tensors ----------
    img_np = image.astype(np.uint8)
    H, W, C = img_np.shape

    orig = torch.tensor(img_np.astype(np.float32), device=device)   # H x W x C
    orig_bchw = orig.permute(2, 0, 1).unsqueeze(0).contiguous()      # 1 x C x H x W

    # --------- Dense stage: compute importance (two options) ----------
    if not use_iterations:
        # Option A: one-shot gradient at original (fast)
        grad_hwc = calc_image_grad(orig, caption, label).detach().to(device)  # H x W x C
        # per-pixel importance: L2 across channels
        importance = torch.sqrt(torch.sum(grad_hwc * grad_hwc, dim=2))     # H x W
        # We'll use the gradient values later to initialize delta on the selected support.
        dense_grad_hwc = grad_hwc
    else:
        # Option B: a few raw gradient-following steps (no sign). This may give a better dense perturbation.
        adv = orig.clone().detach()
        step_eps = float(epsilon) / max(1, dense_steps)
        for _ in range(dense_steps):
            g = calc_image_grad(adv, caption, label).detach().to(device)    # H x W x C
            gnorm = torch.sqrt(torch.sum(g**2)) + 1e-12
            adv = torch.clamp(adv + (step_eps * g / gnorm), 0.0, 255.0)
        dense_perturb = (adv - orig).detach()    # H x W x C
        importance = torch.sqrt(torch.sum(dense_perturb**2, dim=2))  # H x W
        # convert to "grad-like" values for delta init:
        dense_grad_hwc = dense_perturb

    # --------- select top-k pixels by importance ----------
    flat = importance.view(-1)
    k = min(pixels_changed, flat.numel())
    topk = torch.topk(flat, k)
    topk_indices = topk.indices  # indices into flattened H*W

    # build mask H x W (float 0/1)
    mask_hw = torch.zeros_like(flat, device=device)
    mask_hw[topk_indices] = 1.0
    mask_hw = mask_hw.view(H, W)   # H x W

    # convert mask to 1 x C x H x W (same shape as image tensor)
    mask_bchw = mask_hw.unsqueeze(0).unsqueeze(0).repeat(1, C, 1, 1).float()


    # --------- build initial sparse perturbation (delta) using gradient info ----------
    # Convert dense_grad_hwc (H x W x C) to B x C x H x W
    dense_grad_bchw = dense_grad_hwc.permute(2, 0, 1).unsqueeze(0).contiguous()  # 1 x C x H x W

    # Initialize delta on the support as a small scaled step in gradient direction.
    # Using epsilon as total allowed amplitude scale (you can tune factor)
    init_scale = float(epsilon)  # scale factor for initial delta; you can reduce (e.g., epsilon*0.5)
    delta_init = (init_scale * dense_grad_bchw) * mask_bchw  # only on mask
    # Optionally clip initial delta to reasonable bounds so that orig + delta stays in [0,255]
    with torch.no_grad():
        delta_init = torch.clamp(orig_bchw + delta_init, 0.0, 255.0) - orig_bchw

    delta = delta_init.clone().detach().to(device)
    delta.requires_grad_(True)

    # --------- re-optimize values on support using Adam and clip_loss ----------
    optimizer = torch.optim.Adam([delta], lr=reopt_lr)

    # Decide whether to maximize clip_loss or minimize (common: maximize model loss for untargeted)
    maximize = True

    for it in range(reopt_steps):
        optimizer.zero_grad()

        adv_bchw = orig_bchw + delta * mask_bchw   # only masked positions change
        adv_hwc = adv_bchw.squeeze(0).permute(1, 2, 0).contiguous()  # H x W x C

        loss_val = clip_loss(adv_hwc, caption, label)
        if not isinstance(loss_val, torch.Tensor):
            # make it a torch scalar (no grad) — preferably clip_loss is autograd-capable
            loss_val = torch.tensor(float(loss_val), device=device, requires_grad=False)

        # maximize clip_loss if requested (backprop on -loss)
        loss_to_backprop = -loss_val if maximize else loss_val
        # If clip_loss is not autograd-compatible, you'd need to set delta.grad manually via calc_image_grad (see note below)
        loss_to_backprop.backward()

        # Mask the gradients so only selected pixels are updated.
        if delta.grad is not None:
            delta.grad.data *= mask_bchw

        optimizer.step()

        # Keep adv in valid range and update delta = adv - orig
        with torch.no_grad():
            adv_proj = torch.clamp(orig_bchw + delta * mask_bchw, 0.0, 255.0)
            delta.data = (adv_proj - orig_bchw)

        # (optional) early stopping: you can break if clip_loss indicates success
        # e.g., if targeted: break when model predicts target; if untargeted: break when misclassified.

    # --------- finalize adversarial image ----------
    final_adv_bchw = torch.clamp(orig_bchw + delta * mask_bchw, 0.0, 255.0).squeeze(0)
    final_adv_hwc = final_adv_bchw.permute(1, 2, 0).contiguous()
    final_adv_np = final_adv_hwc.cpu().detach().numpy().astype(np.uint8)

    return final_adv_np


test_losses = []
adversarial_losses = []

for pair in tqdm(val_pairs[:10]):
    image = load_image_from_pair(pair)
    image_tensor = torch.tensor(np.array(image))  # Change to (C, H, W)
    # this is how it should be done according to the preprocess function
    # but its kinda wrong since the preprocess function is maybe for training
    # idk maybe we can find it out later, so the gradient can be passed through it
    #image_tensor = preprocess(image).unsqueeze(0).to(device)
    #print(image_tensor)
    torch.manual_seed(42)
    loss = clip_loss(image_tensor, pair['caption'], pair['is_match'])

    similarity = clip_embed(image, pair['caption'])
    #print(f"\nSample {nr+1}:")
    #print(f"loss: {loss.item():.4f}, Similarity: {similarity:.4f}, Match: {pair['is_match']}")
    
    epsilon = 2# step size for adversarial perturbation
    adversarial_image_np = image_attack_standalone(np.array(image), pair['caption'], pair['is_match'], pixels_changed=100, epsilon=epsilon)

    #reconvert tensor to image
    adversarial_image = PIL.Image.fromarray(adversarial_image_np)
    torch.manual_seed(42)
    adversarial_loss = clip_loss(torch.Tensor(adversarial_image_np), pair['caption'], pair['is_match'])
    torch.manual_seed(42)
    adversarial_similarity = clip_embed(adversarial_image, pair['caption'])

    test_losses.append(loss.item())
    adversarial_losses.append(adversarial_loss.item())

print(f"\nAverage test loss: {np.mean(test_losses):.4f}")
print(f"Average adversarial loss: {np.mean(adversarial_losses):.4f}")


100%|██████████| 10/10 [02:03<00:00, 12.36s/it]


Average test loss: 0.4605
Average adversarial loss: 0.4654





In [34]:
def attack(image_np_uint8: np.ndarray, caption_str: str, api: BlackBoxAPI, budgets: dict) -> dict:
    """
    Student attack function to implement adversarial attacks.
    
    Args:
        image_np_uint8: Original image as uint8 numpy array (H, W, C)
        caption_str: Original caption string
        api: BlackBoxAPI instance for querying the model
        budgets: Dictionary with 'T_MAX', 'P_MAX', 'Q_MAX' limits
        
    Returns:
        Dictionary with:
        - 'success': bool, whether attack succeeded (flipped decision)
        - 'image': np.ndarray, final attacked image  
        - 'caption': str, final attacked caption
        - 'token_cost': int, number of token edits used
        - 'pixel_cost': int, number of pixel edits used
        - 'query_cost': int, number of queries used
    """
    
    # TRIVIAL BASELINE - Students should replace this!
    # This baseline just returns the original inputs without any attack
    
    original_image = image_np_uint8.copy()
    original_caption = caption_str
    
    
    # Get original score to determine target (flip the decision)
    original_score = api.score(original_image, original_caption)
    

    # For this baseline, we don't actually perform any attack
    # Students should implement sophisticated attacks here!
    
    # TODO: IMPLEMENT YOUR ATTACK HERE!
    # You can modify:
    # - Caption tokens (text modifications)  
    # - Image pixels (visual modifications)
    # - Both (multimodal attack)
    
    # Stay within budgets:
    # - budgets['T_MAX'] = maximum token edits
    # - budgets['P_MAX'] = maximum pixel edits
    # - budgets['Q_MAX'] = maximum queries
    
    # Attack Strategies to Consider:
    # - Text Attacks: Synonym replacement, word insertion/deletion, semantic paraphrasing
    # - Image Attacks: Adversarial noise, targeted pixel modifications, patch attacks
    # - Query Optimization: Gradient-free optimization, genetic algorithms, hill climbing
    # - Multimodal: Combined text+image attacks for maximum effectiveness

    original_label = original_score > 0.5

    #final_image = image_attack(original_image, original_caption, original_label, pixels_changed=100, epsilon=2, steps=15)        # Comment out this line when implementing image attacks
    final_image = image_attack_standalone(original_image, original_caption, original_label, pixels_changed=100, epsilon=2)  # Comment out this line when implementing image attacks
    final_caption = original_caption    # Comment out this line when implementing text attacks
    
    # Get final score  
    final_score = api.score(final_image, final_caption)
    
    # Check if attack succeeded (decision flipped)
    original_decision = original_score > 0.5
    final_decision = final_score > 0.5
    success = (original_decision != final_decision)
    
    # Calculate costs
    token_cost = token_edit_cost(original_caption, final_caption)
    pixel_cost = pixel_edit_cost(original_image, final_image)
    query_cost = api.query_count
    
    return {
        'success': success,
        'image': final_image,
        'caption': final_caption,
        'token_cost': token_cost,
        'pixel_cost': pixel_cost,  
        'query_cost': query_cost,
        'original_score': original_score,
        'final_score': final_score
    }

print("Attack function defined (TRIVIAL BASELINE)")
print("   Students should replace the trivial implementation with sophisticated attacks!")
print("   Current baseline: Returns original inputs unchanged (0% success rate expected)")

# Test the attack function
print("\nTesting attack function...")
test_pair = val_pairs[0]
test_image = np.array(load_image_from_pair(test_pair))

# Create fresh API instance  
test_api = BlackBoxAPI(alpha, beta, q_max=100)

attack_budgets = {
    'T_MAX': 10,     # Maximum token edits per sample
    'P_MAX': 100,   # Maximum pixel edits per sample  
    'Q_MAX': 100    # Maximum queries per sample
}

result = attack(test_image, test_pair['caption'], test_api, attack_budgets)

Attack function defined (TRIVIAL BASELINE)
   Students should replace the trivial implementation with sophisticated attacks!
   Current baseline: Returns original inputs unchanged (0% success rate expected)

Testing attack function...


In [35]:
# Evaluation Framework
def evaluate_attack(val_pairs: list, attack_function, alpha: float, beta: float, budgets: dict, max_samples: int = None):
    """
    Evaluate attack function on validation pairs.
    
    Args:
        val_pairs: List of validation pairs
        attack_function: Attack function to evaluate
        alpha, beta: Calibrated parameters
        budgets: Attack budgets dictionary
        max_samples: Limit number of samples (None = all)
        
    Returns:
        Dictionary with evaluation metrics
    """
    
    print(f"Starting evaluation...")
    
    # Limit samples if specified
    eval_pairs = val_pairs[:max_samples] if max_samples else val_pairs
    print(f"Evaluating on {len(eval_pairs)} samples")
    
    results = []
    total_success = 0
    total_token_cost = 0
    total_pixel_cost = 0
    total_query_cost = 0
    
    for i, pair in enumerate(tqdm(eval_pairs, desc="Attacking")):
        # Create fresh API instance for each sample
        api = BlackBoxAPI(alpha, beta, q_max=budgets['Q_MAX'])
        
        # Load image
        image = np.array(load_image_from_pair(pair))
        caption = pair['caption']
        
        try:
            # Run attack
            result = attack_function(image, caption, api, budgets)
            
            # Validate budget constraints
            budget_valid = (
                result['token_cost'] <= budgets['T_MAX'] and
                result['pixel_cost'] <= budgets['P_MAX'] and  
                result['query_cost'] <= budgets['Q_MAX']
            )
            
            if not budget_valid:
                print(f"Sample {i}: Budget violation!")
                print(f"   Tokens: {result['token_cost']}/{budgets['T_MAX']}")
                print(f"   Pixels: {result['pixel_cost']}/{budgets['P_MAX']}")  
                print(f"   Queries: {result['query_cost']}/{budgets['Q_MAX']}")
                result['success'] = False  # Invalid attacks count as failures
            
            results.append(result)
            
            if result['success']:
                total_success += 1
            total_token_cost += result['token_cost']
            total_pixel_cost += result['pixel_cost']
            total_query_cost += result['query_cost']
                
        except Exception as e:
            print(f"Sample {i}: Attack failed with error: {e}")
            # Add failed result
            results.append({
                'success': False,
                'token_cost': budgets['T_MAX'],  # Penalize failures
                'pixel_cost': budgets['P_MAX'], 
                'query_cost': budgets['Q_MAX'],
                'error': str(e)
            })
    
    # Calculate metrics
    n_samples = len(results)
    asr = total_success / n_samples  # Attack Success Rate
    anc = (10*total_token_cost + total_pixel_cost) / n_samples  # Average Number of Changes  
    aq = total_query_cost / n_samples  # Average Queries
    
    # Final score: ASR - 0.5*ANC - 0.1*(AQ/Q_MAX)
    score = asr - 0.5 * (anc / (10*budgets['T_MAX'] + budgets['P_MAX'])) - 0.1 * (aq / budgets['Q_MAX'])
    
    evaluation_result = {
        'ASR': asr,
        'ANC': anc, 
        'AQ': aq,
        'Score': score,
        'n_samples': n_samples,
        'total_success': total_success,
        'avg_token_cost': total_token_cost / n_samples,
        'avg_pixel_cost': total_pixel_cost / n_samples,
        'budgets': budgets,
        'results': results
    }
    
    return evaluation_result

# Run Evaluation
print("Running evaluation on validation set...")

# Define attack budgets
attack_budgets = {
    'T_MAX': 10,     # Maximum token edits per sample
    'P_MAX': 100,   # Maximum pixel edits per sample  
    'Q_MAX': 100    # Maximum queries per sample
}

# Evaluate on subset first (faster for testing)
print("Running on first 50 samples for quick testing...")
eval_result = evaluate_attack(
    val_pairs=val_pairs, 
    attack_function=attack,
    alpha=alpha,
    beta=beta, 
    budgets=attack_budgets,
    max_samples=50  # Quick test on 50 samples
)

# Print results
print(f"\nEVALUATION RESULTS (50 samples):")
print(f"{'='*50}")
print(f"Attack Success Rate (ASR): {eval_result['ASR']:.1%}")
print(f"Average Number of Changes (ANC): {eval_result['ANC']:.2f}")  
print(f"Average Queries (AQ): {eval_result['AQ']:.1f}")
print(f"Final Score: {eval_result['Score']:.4f}")
print(f"{'='*50}")
print(f"Budget Usage:")
print(f"  Avg Token Cost: {eval_result['avg_token_cost']:.2f}/{attack_budgets['T_MAX']}")
print(f"  Avg Pixel Cost: {eval_result['avg_pixel_cost']:.2f}/{attack_budgets['P_MAX']}")  
print(f"  Avg Query Cost: {eval_result['AQ']:.1f}/{attack_budgets['Q_MAX']}")


Running evaluation on validation set...
Running on first 50 samples for quick testing...
Starting evaluation...
Evaluating on 50 samples


Attacking: 100%|██████████| 50/50 [09:46<00:00, 11.74s/it]


EVALUATION RESULTS (50 samples):
Attack Success Rate (ASR): 0.0%
Average Number of Changes (ANC): 52.90
Average Queries (AQ): 2.0
Final Score: -0.1343
Budget Usage:
  Avg Token Cost: 0.00/10
  Avg Pixel Cost: 52.90/100
  Avg Query Cost: 2.0/100





In [None]:
# Full Evaluation (Uncomment when ready to test your attack)

def run_full_evaluation():
    """Run evaluation on all 1000 validation samples."""
    print("Running FULL evaluation on all 1000 validation samples...")
    print("This may take several minutes depending on your attack implementation.")
    
    full_result = evaluate_attack(
        val_pairs=val_pairs,
        attack_function=attack, 
        alpha=alpha,
        beta=beta,
        budgets=attack_budgets,
        max_samples=None  # All samples
    )
    
    print(f"\nFINAL EVALUATION RESULTS:")
    print(f"{'='*60}")
    print(f"Attack Success Rate (ASR): {full_result['ASR']:.1%}")
    print(f"Average Number of Changes (ANC): {full_result['ANC']:.2f}")
    print(f"Average Queries (AQ): {full_result['AQ']:.1f}")
    print(f"Final Score: {full_result['Score']:.4f}")
    print(f"{'='*60}")
    
    return full_result

# Uncomment the line below when ready to run full evaluation:
full_results = run_full_evaluation()


Running FULL evaluation on all 1000 validation samples...
This may take several minutes depending on your attack implementation.
Starting evaluation...
Evaluating on 1000 samples


Attacking: 100%|██████████| 1000/1000 [05:18<00:00,  3.14it/s]


FINAL EVALUATION RESULTS:
Attack Success Rate (ASR): 0.2%
Average Number of Changes (ANC): 76.19
Average Queries (AQ): 2.0
Final Score: -0.1905





## Baseline scoring

In [None]:
def baseline_attack(image_np_uint8: np.ndarray, caption_str: str, api: BlackBoxAPI, budgets: dict) -> dict:

    # TRIVIAL BASELINE - Students should replace this!
    # This baseline just returns the original inputs without any attack
    
    original_image = image_np_uint8.copy()
    original_caption = caption_str
    
    # Get original score to determine target (flip the decision)
    original_score = api.score(original_image, original_caption)

    final_image = original_image        # Comment out this line when implementing image attacks
    final_caption = original_caption    # Comment out this line when implementing text attacks
    
    # Get final score  
    final_score = api.score(final_image, final_caption)
    
    # Check if attack succeeded (decision flipped)
    original_decision = original_score > 0.5
    final_decision = final_score > 0.5
    success = (original_decision != final_decision)
    
    # Calculate costs
    token_cost = token_edit_cost(original_caption, final_caption)
    pixel_cost = pixel_edit_cost(original_image, final_image)
    query_cost = api.query_count
    
    return {
        'success': success,
        'image': final_image,
        'caption': final_caption,
        'token_cost': token_cost,
        'pixel_cost': pixel_cost,  
        'query_cost': query_cost,
        'original_score': original_score,
        'final_score': final_score
    }

In [None]:


# Run Evaluation
print("Running evaluation on validation set...")

# Define attack budgets
attack_budgets = {
    'T_MAX': 10,     # Maximum token edits per sample
    'P_MAX': 100,   # Maximum pixel edits per sample  
    'Q_MAX': 100    # Maximum queries per sample
}

# Evaluate on subset first (faster for testing)
print("Running on first 50 samples for quick testing...")
eval_result = evaluate_attack(
    val_pairs=val_pairs, 
    attack_function=baseline_attack,
    alpha=alpha,
    beta=beta, 
    budgets=attack_budgets,
    max_samples=50  # Quick test on 50 samples
)

# Print results
print(f"\nEVALUATION RESULTS (50 samples):")
print(f"{'='*50}")
print(f"Attack Success Rate (ASR): {eval_result['ASR']:.1%}")
print(f"Average Number of Changes (ANC): {eval_result['ANC']:.2f}")  
print(f"Average Queries (AQ): {eval_result['AQ']:.1f}")
print(f"Final Score: {eval_result['Score']:.4f}")
print(f"{'='*50}")
print(f"Budget Usage:")
print(f"  Avg Token Cost: {eval_result['avg_token_cost']:.2f}/{attack_budgets['T_MAX']}")
print(f"  Avg Pixel Cost: {eval_result['avg_pixel_cost']:.2f}/{attack_budgets['P_MAX']}")  
print(f"  Avg Query Cost: {eval_result['AQ']:.1f}/{attack_budgets['Q_MAX']}")
print(f"\nNOTE: This is a trivial baseline (0% ASR expected)")
print(f"Students should implement sophisticated attacks to improve ASR!")

Running evaluation on validation set...
Running on first 50 samples for quick testing...
Starting evaluation...
Evaluating on 50 samples


Attacking: 100%|██████████| 50/50 [00:01<00:00, 28.71it/s]


EVALUATION RESULTS (50 samples):
Attack Success Rate (ASR): 0.0%
Average Number of Changes (ANC): 0.00
Average Queries (AQ): 2.0
Final Score: -0.0020
Budget Usage:
  Avg Token Cost: 0.00/10
  Avg Pixel Cost: 0.00/100
  Avg Query Cost: 2.0/100

NOTE: This is a trivial baseline (0% ASR expected)
Students should implement sophisticated attacks to improve ASR!





In [None]:
# Full Evaluation (Uncomment when ready to test your attack)

def run_full_evaluation():
    """Run evaluation on all 1000 validation samples."""
    print("Running FULL evaluation on all 1000 validation samples...")
    print("This may take several minutes depending on your attack implementation.")
    
    full_result = evaluate_attack(
        val_pairs=val_pairs,
        attack_function=baseline_attack, 
        alpha=alpha,
        beta=beta,
        budgets=attack_budgets,
        max_samples=None  # All samples
    )
    
    print(f"\nFINAL EVALUATION RESULTS:")
    print(f"{'='*60}")
    print(f"Attack Success Rate (ASR): {full_result['ASR']:.1%}")
    print(f"Average Number of Changes (ANC): {full_result['ANC']:.2f}")
    print(f"Average Queries (AQ): {full_result['AQ']:.1f}")
    print(f"Final Score: {full_result['Score']:.4f}")
    print(f"{'='*60}")
    
    return full_result

# Uncomment the line below when ready to run full evaluation:
full_results = run_full_evaluation()

print("To run full evaluation on all 1000 samples:")
print("Uncomment: full_results = run_full_evaluation()")
print("\nCurrent status: Framework ready for student implementations!")

Running FULL evaluation on all 1000 validation samples...
This may take several minutes depending on your attack implementation.
Starting evaluation...
Evaluating on 1000 samples


Attacking: 100%|██████████| 1000/1000 [00:34<00:00, 28.58it/s]


FINAL EVALUATION RESULTS:
Attack Success Rate (ASR): 0.6%
Average Number of Changes (ANC): 0.00
Average Queries (AQ): 2.0
Final Score: 0.0040
To run full evaluation on all 1000 samples:
Uncomment: full_results = run_full_evaluation()

Current status: Framework ready for student implementations!



