# Lab 1 — Tiny VLM Adversarial Cost Challenge

## Overview

Welcome to the ML Security Lab! In this lab, you'll implement adversarial attacks against a Vision-Language Model (VLM) using a custom dataset of image-caption pairs and a frozen TinyCLIP scorer.

### Objective
Your task is to build an attack function that can manipulate either:
- **Caption tokens** (text modifications)
- **Image pixels** (visual modifications)
- **Both** (multimodal attack)

The goal is to flip the model's decision (match → no-match or vice versa) while minimizing the attack cost.

### Constraints
- **T_MAX = 10**: Maximum token edits per sample
- **P_MAX = 100**: Maximum pixel edits per sample  
- **Q_MAX = 100**: Maximum queries per sample
- **Evaluation**: Public leaderboard (1,000 val pairs) + Private leaderboard (1,000 test pairs)

### Scoring
Your attack will be evaluated based on:
1. **Success Rate**: Percentage of samples where you successfully flip the decision
2. **Cost Efficiency**: Lower total cost (token edits + pixel edits + queries) is better
3. **Attack Budget**: Must stay within the specified limits
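
The budget rule can be sketched as a small check that mirrors the validation applied by the evaluation loop later in this notebook, where an attack that exceeds any limit is counted as a failure. The helper name `within_budget` is an assumption made for illustration; the dictionary keys follow the return format of `attack()`.

```python
def within_budget(result: dict, budgets: dict) -> bool:
    """Return True if an attack result respects all three per-sample limits."""
    return (
        result["token_cost"] <= budgets["T_MAX"]
        and result["pixel_cost"] <= budgets["P_MAX"]
        and result["query_cost"] <= budgets["Q_MAX"]
    )


budgets = {"T_MAX": 10, "P_MAX": 100, "Q_MAX": 100}

# A cheap text-only attack: 3 token edits, no pixel edits, 2 queries
ok = within_budget({"token_cost": 3, "pixel_cost": 0, "query_cost": 2}, budgets)

# One token edit too many
too_many_tokens = within_budget({"token_cost": 11, "pixel_cost": 0, "query_cost": 2}, budgets)

print(ok, too_many_tokens)
```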

In [1]:
import os
import zipfile

# Check if 'images' directory or 'val_pairs.json' file is missing
if not os.path.exists('images') or not os.path.exists('val_pairs.json'):
    print("Required data not found. Extracting 'data.zip'...")
    
    # Check if 'data.zip' exists
    if os.path.exists('data.zip'):
        with zipfile.ZipFile('data.zip', 'r') as zip_ref:
            zip_ref.extractall()  # Extract all files in the current directory
        print("Extraction complete!")
    else:
        print("Error: 'data.zip' not found. Please ensure the file is in the current directory.")
else:
    print("All required data is already present.")

All required data is already present.


In [2]:
# Import required libraries
import numpy as np
import pandas as pd
from PIL import Image
import PIL
import torch
import torch.nn.functional as F
from tqdm.auto import tqdm
import open_clip
from datasets import load_dataset
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
import json
import random
import os
from typing import List, Tuple, Dict, Optional
import warnings
import requests
from urllib.parse import urlparse
import hashlib
from collections import defaultdict
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)
random.seed(42)

print("All packages imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")



All packages imported successfully!
PyTorch version: 2.8.0+cu128
CUDA available: True
Using device: cuda


In [3]:
# Data Loading
print("Loading validation data...")

# Load validation pairs from JSON
with open('val_pairs.json', 'r') as f:
    val_pairs = json.load(f)

print(f"Loaded {len(val_pairs)} validation pairs")

# Helper function to load images
def load_image_from_pair(pair: dict) -> Image.Image:
    """Load image from the pair dictionary using image_path"""
    return Image.open(pair['image_path']).convert('RGB')

# Sample a few pairs to verify data loading
print("\nSample validation pairs:")
for i in range(3):
    pair = val_pairs[i]
    print(f"  Image ID: {pair['image_id']}, Path: {pair['image_path']}")
    print(f"  Caption: {pair['caption'][:50]}..., Match: {pair['is_match']}")
    print()

print(f"Data distribution:")
labels = [pair['is_match'] for pair in val_pairs]
print(f"  Match (True): {sum(labels)}")
print(f"  No-match (False): {len(labels) - sum(labels)}")
print(f"  Balance: {sum(labels)/len(labels):.2%} positive")

Loading validation data...
Loaded 1000 validation pairs

Sample validation pairs:
  Image ID: 38137, Path: images/val/000000119402.jpg
  Caption: A gold bus traveling on a single lane road..., Match: True

  Image ID: 15194, Path: images/val/000000404780.jpg
  Caption: Two women in the snow on skis in front of a large ..., Match: False

  Image ID: 19082, Path: images/val/000000148898.jpg
  Caption: A man jumping a brown horse over an obstacle...., Match: False

Data distribution:
  Match (True): 500
  No-match (False): 500
  Balance: 50.00% positive


In [4]:
# TinyCLIP Scorer Implementation
print("Loading CLIP model...")

# Try to load TinyCLIP, fallback to OpenCLIP ViT-B/32 if failed
try:
    # Attempt to load TinyCLIP from HuggingFace hub
    model, preprocess, tokenizer = open_clip.create_model_and_transforms(
        "hf-hub:microsoft/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M"
    )
    print("Successfully loaded TinyCLIP model")
except Exception as e:
    print(f"Failed to load TinyCLIP: {e}")
    print("Falling back to OpenCLIP ViT-B/32...")

    model, preprocess, tokenizer = open_clip.create_model_and_transforms(
        "ViT-B-32", 
        pretrained="laion2b_s34b_b79k"
    )
    print("Successfully loaded OpenCLIP ViT-B/32")

# Move model to device
model = model.to(device)
model.eval()

print(f"Model loaded on: {device}")


Loading CLIP model...
Failed to load TinyCLIP: Failed initial config/weights load from HF Hub microsoft/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M: Failed to download file (open_clip_config.json) for microsoft/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M. Last error: 404 Client Error. (Request ID: Root=1-68f3fa54-506b74995df7f32878ae3dd2;20d2445e-7a7f-427f-8fd3-16ecbef3a695)

Repository Not Found for url: https://huggingface.co/microsoft/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M/resolve/main/open_clip_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated. For more details, see https://huggingface.co/docs/huggingface_hub/authentication
Falling back to OpenCLIP ViT-B/32...
Successfully loaded OpenCLIP ViT-B/32
Model loaded on: cuda


In [5]:
def clip_embed(image: Image.Image, caption: str) -> float:
    """
    Compute cosine similarity between image and text embeddings.
    
    Args:
        image: PIL Image
        caption: Text string
    
    Returns:
        Cosine similarity score between normalized embeddings
    """
    with torch.no_grad():
        # Preprocess image
        image_tensor = preprocess(image).unsqueeze(0).to(device)
        
        # Tokenize text properly using open_clip tokenizer
        text_tokens = open_clip.tokenize([caption]).to(device)
        
        # Get embeddings
        image_features = model.encode_image(image_tensor)
        text_features = model.encode_text(text_tokens)
        
        # Normalize embeddings
        image_features = F.normalize(image_features, dim=-1)
        text_features = F.normalize(text_features, dim=-1)
        
        # Compute cosine similarity
        similarity = (image_features @ text_features.T).item()
        
    return similarity

# Test the embedding function
print("\nTesting CLIP embedding function...")
test_pair = val_pairs[0]
test_image = load_image_from_pair(test_pair)
test_similarity = clip_embed(test_image, test_pair['caption'])
print(f"Sample similarity score: {test_similarity:.4f} (Expected match: {test_pair['is_match']})")

# Test on a few more samples
print("\nTesting on more samples:")
for i in range(3):
    pair = val_pairs[i]
    image = load_image_from_pair(pair)
    similarity = clip_embed(image, pair['caption'])
    print(f"Sample {i+1}: similarity={similarity:.4f}, match={pair['is_match']}")


Testing CLIP embedding function...
Sample similarity score: 0.3695 (Expected match: True)

Testing on more samples:
Sample 1: similarity=0.3526, match=True
Sample 2: similarity=0.0343, match=False
Sample 3: similarity=0.0774, match=False


In [6]:
print(open_clip.tokenize)

<function tokenize at 0x7f810b5fe020>


In [7]:
# Calibration: Fit logistic regression to get alpha, beta parameters
print("Calibrating scorer with logistic regression...")

# Use first 200 samples for calibration
tune_slice = val_pairs[:200]
print(f"Using {len(tune_slice)} samples for calibration")

# Compute similarities for calibration
similarities = []
ground_truths = []

print("Computing similarities for calibration...")
for pair in tqdm(tune_slice, desc="Calibration"):
    image = load_image_from_pair(pair)
    similarity = clip_embed(image, pair['caption'])
    similarities.append(similarity)
    ground_truths.append(int(pair['is_match']))

similarities = np.array(similarities).reshape(-1, 1)
ground_truths = np.array(ground_truths)

# Fit logistic regression: sigmoid(alpha * cosine + beta)
lr = LogisticRegression()
lr.fit(similarities, ground_truths)

# Extract alpha and beta
alpha = lr.coef_[0][0]  # Coefficient for similarity
beta = lr.intercept_[0]  # Intercept

print(f"Calibration complete!")
print(f"   Alpha (slope): {alpha:.4f}")
print(f"   Beta (intercept): {beta:.4f}")

# Test calibration
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

test_similarities = similarities[:5].flatten()
test_labels = ground_truths[:5]
calibrated_probs = sigmoid(alpha * test_similarities + beta)

print(f"\nCalibration test:")
for i in range(5):
    print(f"  Sim: {test_similarities[i]:.4f} → Prob: {calibrated_probs[i]:.4f}, True: {test_labels[i]}")

Calibrating scorer with logistic regression...
Using 200 samples for calibration
Computing similarities for calibration...


Calibration: 100%|██████████| 200/200 [00:03<00:00, 61.83it/s]

Calibration complete!
   Alpha (slope): 6.6231
   Beta (intercept): -1.1130

Calibration test:
  Sim: 0.3561 → Prob: 0.7765, True: 1
  Sim: 0.0343 → Prob: 0.2920, True: 0
  Sim: 0.0802 → Prob: 0.3585, True: 0
  Sim: 0.1899 → Prob: 0.5362, True: 0
  Sim: 0.3586 → Prob: 0.7794, True: 1
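
Because the decision flips where the calibrated probability crosses 0.5, the fitted parameters imply a cosine-similarity threshold of -beta/alpha. The sketch below checks this with the values printed above; the 0.5 cutoff is the one the `attack()` cell uses later in this notebook, and `calibrated_prob` is an illustrative name.

```python
import math

alpha = 6.6231   # slope printed by the calibration cell
beta = -1.1130   # intercept printed by the calibration cell


def calibrated_prob(cosine: float) -> float:
    """sigmoid(alpha * cosine + beta), the same form the scorer uses."""
    return 1.0 / (1.0 + math.exp(-(alpha * cosine + beta)))


# sigmoid(z) equals 0.5 exactly when z = 0, i.e. when cosine = -beta / alpha
threshold = -beta / alpha
print(f"Decision threshold on cosine similarity: {threshold:.4f}")
print(f"Probability at the threshold: {calibrated_prob(threshold):.4f}")
```

With these calibration values, an attack on a matching pair has to push the cosine similarity below roughly 0.168, and an attack on a non-matching pair has to push it above that value.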





In [8]:
from scipy.ndimage import label

# BlackBox API Implementation
import editdistance  # For Levenshtein distance

class BlackBoxAPI:
    """
    Black-box API for the VLM scorer with query budget tracking and cost calculation.
    """
    
    def __init__(self, alpha: float, beta: float, q_max: int = 200):
        """
        Initialize the black-box API.
        
        Args:
            alpha: Logistic regression slope parameter
            beta: Logistic regression intercept parameter  
            q_max: Maximum queries allowed per sample
        """
        self.alpha = alpha
        self.beta = beta
        self.q_max = q_max
        self.query_count = 0
        
    def score(self, image_uint8: np.ndarray, caption_str: str) -> float:
        """
        Score image-caption pair and return probability.
        
        Args:
            image_uint8: Image as uint8 numpy array (H, W, C)
            caption_str: Caption string
            
        Returns:
            Probability in [0, 1] using sigmoid(alpha * cosine + beta)
        """
        if self.query_count >= self.q_max:
            raise RuntimeError(f"Query budget exceeded! Used {self.query_count}/{self.q_max}")
            
        # Convert numpy array to PIL Image
        image_pil = Image.fromarray(image_uint8)
        
        # Get cosine similarity
        cosine_sim = clip_embed(image_pil, caption_str)
        
        # Apply calibrated sigmoid
        logit = self.alpha * cosine_sim + self.beta
        probability = 1 / (1 + np.exp(-logit))
        
        self.query_count += 1
        
        return probability
    
    def reset_query_count(self):
        """Reset query counter for new sample."""
        self.query_count = 0
        
    def get_remaining_queries(self) -> int:
        """Get remaining query budget."""
        return self.q_max - self.query_count

# Cost Functions
def token_edit_cost(original: str, modified: str) -> int:
    """
    Compute token-level Levenshtein distance using CLIP tokenizer.
    
    Args:
        original: Original caption
        modified: Modified caption
        
    Returns:
        Number of token edits (insertions, deletions, substitutions)
    """
    # Use CLIP tokenizer for more accurate tokenization
    orig_tokens = open_clip.tokenize([original], context_length=77)[0].numpy()
    mod_tokens = open_clip.tokenize([modified], context_length=77)[0].numpy()
    
    # Remove padding tokens (0s). The start/end-of-text tokens are kept, but they
    # appear in both sequences, so they do not add to the edit distance
    orig_tokens = orig_tokens[orig_tokens != 0]
    mod_tokens = mod_tokens[mod_tokens != 0]
    
    return editdistance.eval(orig_tokens.tolist(), mod_tokens.tolist())

def pixel_edit_cost(original: np.ndarray, modified: np.ndarray) -> int:
    """
    Compute number of changed pixels with reduced cost for continuous regions.
    
    Args:
        original: Original image as uint8 numpy array
        modified: Modified image as uint8 numpy array
        
    Returns:
        Adjusted cost based on number of changed pixels, with reduced cost for continuous regions.
    """
    # Find the difference mask
    diff_mask = np.any(original != modified, axis=-1)
    
    # Label connected components in the difference mask
    labeled_regions, num_features = label(diff_mask)
    
    # Count pixels in each connected region
    total_cost = 0
    for region_id in range(1, num_features + 1):
        region_size = np.sum(labeled_regions == region_id)
        if region_size > 0:
            # Full cost for the first pixel, half cost for the rest
            total_cost += 1 + (region_size - 1) * 0.5
    
    return int(total_cost)

# Test the BlackBox API
print("Testing BlackBox API...")

# Initialize API with calibrated parameters
api = BlackBoxAPI(alpha, beta, q_max=200)

# Test on a sample
test_pair = val_pairs[0]
test_image = load_image_from_pair(test_pair)
test_image_uint8 = np.array(test_image)

# Get score
score = api.score(test_image_uint8, test_pair['caption'])
print(f"API Score: {score:.4f} (Expected match: {test_pair['is_match']})")
print(f"Queries used: {api.query_count}/{api.q_max}")

# Test cost functions
original_caption = "A cat sitting on a mat"
modified_caption = "A dog standing on a rug" 
token_cost = token_edit_cost(original_caption, modified_caption)
print(f"\nToken edit cost example:")
print(f"  Original: '{original_caption}'")  
print(f"  Modified: '{modified_caption}'")
print(f"  Cost: {token_cost} token edits")

# Test pixel cost (create a simple modification)
original_img = np.zeros((100, 100, 3), dtype=np.uint8)
modified_img = original_img.copy()
modified_img[10:20, 10:20] = 255  # Change a 10x10 region
pixel_cost = pixel_edit_cost(original_img, modified_img)
print(f"\nPixel edit cost example:")
print(f"  Modified {pixel_cost} pixels in 100x100 image")

Testing BlackBox API...
API Score: 0.7881 (Expected match: True)
Queries used: 1/200

Token edit cost example:
  Original: 'A cat sitting on a mat'
  Modified: 'A dog standing on a rug'
  Cost: 3 token edits

Pixel edit cost example:
  Modified 50 pixels in 100x100 image
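
To make the two cost functions concrete, here is a dependency-free sketch of the same arithmetic. It assumes the captions are already split into token lists (the notebook uses the CLIP tokenizer for that) and that the sizes of the connected changed regions are already known (the notebook uses `scipy.ndimage.label` for that). `levenshtein` and `region_cost` are illustrative names, not part of the lab code.

```python
def levenshtein(a: list, b: list) -> int:
    """Edit distance between two token sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                     # delete tok_a
                curr[j - 1] + 1,                 # insert tok_b
                prev[j - 1] + (tok_a != tok_b),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def region_cost(region_sizes: list) -> int:
    """First pixel of each connected region costs 1, every further pixel 0.5."""
    total = sum(1 + (size - 1) * 0.5 for size in region_sizes if size > 0)
    return int(total)


# Word-level stand-in for the tokenized captions from the example above
tokens_cost = levenshtein("a cat sitting on a mat".split(),
                          "a dog standing on a rug".split())
one_block = region_cost([100])       # one contiguous 10x10 patch
scattered = region_cost([1] * 100)   # 100 isolated pixels
print(tokens_cost, one_block, scattered)
```

Under this cost model a contiguous patch is about half the price of the same number of isolated pixels, which is why the 10x10 example above reports a cost of 50 rather than 100.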


## Your Task: Implement Adversarial Attacks

### Attack Function Template

Replace the trivial baseline in the `attack()` function with sophisticated adversarial attacks:

```python
def attack(image_np_uint8, caption_str, api, budgets):
    # Your attack implementation here!
    # You can modify:
    # - Caption tokens (text modifications)  
    # - Image pixels (visual modifications)
    # - Both (multimodal attack)
    
    # Stay within budgets:
    # - budgets['T_MAX'] = 10  token edits
    # - budgets['P_MAX'] = 100 pixel edits
    # - budgets['Q_MAX'] = 100 queries
    
    return {
        'success': success,      # bool: did you flip the decision?
        'image': final_image,    # np.array: attacked image
        'caption': final_caption, # str: attacked caption  
        'token_cost': token_cost, # int: tokens changed
        'pixel_cost': pixel_cost, # int: pixels changed
        'query_cost': query_cost  # int: API calls made
    }
```

### Attack Strategies to Consider

- **Text Attacks**: Synonym replacement, word insertion/deletion, semantic paraphrasing
- **Image Attacks**: Adversarial noise, targeted pixel modifications, patch attacks
- **Query Optimization**: Gradient-free optimization, genetic algorithms, hill climbing
- **Multimodal**: Combined text+image attacks for maximum effectiveness
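
As one illustration of the query-optimization idea, the sketch below runs a greedy word-deletion hill climb against a black-box scoring function while counting queries. Everything in it is hypothetical: `toy_score` is a stand-in for `api.score`, `greedy_delete` is not part of the lab code, and only the match → no-match direction is handled.

```python
from typing import Callable, Tuple


def greedy_delete(caption: str, score: Callable[[str], float],
                  q_max: int, t_max: int) -> Tuple[str, int]:
    """Greedily delete the word whose removal lowers the score most,
    stopping once the score drops below 0.5 or a budget runs out."""
    words = caption.split()
    best = score(caption)
    queries = 1
    edits = 0
    while best >= 0.5 and edits < t_max and len(words) > 1:
        candidates = []
        for i in range(len(words)):
            if queries >= q_max:
                break  # query budget exhausted
            trial = words[:i] + words[i + 1:]
            candidates.append((score(" ".join(trial)), trial))
            queries += 1
        if not candidates:
            break
        trial_score, trial_words = min(candidates, key=lambda c: c[0])
        if trial_score >= best:
            break  # no single deletion helps any more
        best, words, edits = trial_score, trial_words, edits + 1
    return " ".join(words), queries


def toy_score(caption: str) -> float:
    """Hypothetical scorer: 'bus' and 'road' each add 0.25 to a 0.3 base."""
    words = caption.split()
    return 0.3 + 0.25 * ("bus" in words) + 0.25 * ("road" in words)


adv_caption, used = greedy_delete("a gold bus on a road", toy_score,
                                  q_max=100, t_max=10)
print(adv_caption, used, toy_score(adv_caption))
```

Each round scores every single-word deletion, so the query count grows with caption length. With `Q_MAX = 100`, a real attack would likely need a cheaper candidate ranking than this exhaustive scan.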

### Evaluation Metrics

Your attack will be scored as: **ASR - 0.5×(ANC/(10×T_MAX + P_MAX)) - 0.1×(AQ/Q_MAX)**

- **ASR**: Attack Success Rate (higher is better)
- **ANC**: Average Number of Changes per sample, computed as 10 × token edits + pixel edits (lower is better)
- **AQ**: Average Queries (lower is better)
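
As a worked example of the scoring arithmetic, the sketch below follows the evaluation code later in this notebook. There, ANC is the weighted sum 10 × token edits + pixel edits averaged over samples, and it is divided by 10 × T_MAX + P_MAX before the 0.5 weight is applied. The function name `lab_score` and the sample numbers are made up for illustration.

```python
def lab_score(asr: float, avg_token_cost: float, avg_pixel_cost: float,
              avg_queries: float, t_max: int = 10, p_max: int = 100,
              q_max: int = 100) -> float:
    """Score = ASR - 0.5 * normalized ANC - 0.1 * normalized AQ."""
    anc = 10 * avg_token_cost + avg_pixel_cost   # weighted changes per sample
    anc_norm = anc / (10 * t_max + p_max)        # divisor is 200 at the default budgets
    aq_norm = avg_queries / q_max
    return asr - 0.5 * anc_norm - 0.1 * aq_norm


# Hypothetical attack: 60% success, 2 token edits, no pixel edits, 2 queries
example = lab_score(asr=0.60, avg_token_cost=2.0, avg_pixel_cost=0.0, avg_queries=2.0)
print(f"{example:.4f}")
```

Under this weighting one token edit costs as much as ten pixel edits, so a text attack that spends many tokens can score lower than a small pixel attack with the same success rate.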

### Next Steps

1. **Implement your attack** in the `attack()` function above
2. **Test locally** using the evaluation framework  
3. **Run on the full dataset** by setting `max_samples=None`

Good luck! 

### <span style="color:red">**FILL THIS CODE BLOCK**</span>

In [10]:
def calc_similarity(image_features, text_features) -> float:
    with torch.no_grad():
        image_features = F.normalize(image_features, dim=-1)
        text_features = F.normalize(text_features, dim=-1)
        
        # Compute cosine similarity
        similarity = (image_features @ text_features.T).item()
        
    return similarity

In [11]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = "cpu"
model.to(device)

def encode_text(text: str):
    """Encode one string with the CLIP text encoder and return the features on the CPU."""
    # no_grad avoids building an autograd graph for every encoded string
    with torch.no_grad():
        text_tokens = open_clip.tokenize([text]).to(device)
        text_features = model.encode_text(text_tokens).to("cpu")
    return text_features

In [12]:
# Load the list of candidate nouns (one noun per row, no header)
nounlist = pd.read_csv('nounlist.csv', header=None)
nounlist.columns = ['nouns']
# Optionally keep only the first 300 rows for faster experiments
# nounlist = nounlist.head(300)

# Encode every noun with encode_text and store the features in a new column
with torch.no_grad():
    nounlist['encoded'] = nounlist['nouns'].apply(encode_text)
nounlist

| | nouns | encoded |
|---|---|---|
| 0 | ATM | [[tensor(0.1433), tensor(-0.2801), tensor(-0.1... |
| 1 | CD | [[tensor(0.2364), tensor(-0.1194), tensor(-0.4... |
| 2 | SUV | [[tensor(-0.3482), tensor(-0.1286), tensor(-0.... |
| 3 | TV | [[tensor(-0.1657), tensor(-0.3337), tensor(-0.... |
| 4 | aardvark | [[tensor(0.4404), tensor(-0.2014), tensor(-0.0... |
| ... | ... | ... |
| 6796 | zoo | [[tensor(-0.1076), tensor(-0.0910), tensor(-0.... |
| 6797 | zoologist | [[tensor(-0.0608), tensor(0.0313), tensor(0.00... |
| 6798 | zoology | [[tensor(-0.0363), tensor(0.0437), tensor(-0.2... |
| 6799 | zoot-suit | [[tensor(0.1474), tensor(-0.1697), tensor(-0.4... |


In [30]:
# pip install spacy inflect
# python -m spacy download en_core_web_sm

import spacy
import inflect

nlp = spacy.load("en_core_web_sm")
infl = inflect.engine()

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def _match_casing(src: str, repl: str) -> str:
    if src.isupper():
        return repl.upper()
    if src.istitle():
        return repl.title()
    return repl

def _maybe_pluralize(base: str, tag: str) -> str:
    if tag in {"NNS", "NNPS"}:
        return infl.plural(base) or base + "s"
    return base

def attack_replace_nouns(caption: str, replace_word = "") -> str:
    doc = nlp(caption)
    out = []
    for tok in doc:
        if tok.tag_ in NOUN_TAGS:
            word = replace_word
            if word != "":
                word = _maybe_pluralize(replace_word, tok.tag_)
                word = _match_casing(tok.text, word)
            out.append(word + tok.whitespace_)
        else:
            out.append(tok.text_with_ws)
    return "".join(out)

# Examples. With the default replace_word="", every noun is simply deleted.
print(attack_replace_nouns("A small dog runs across the field."))
# -> "A small  runs across the ."
print(attack_replace_nouns("John's red cars were parked in New York"))
# -> "'s red  were parked in  "
print(attack_replace_nouns("A man standing in front of a refrigerator freezer."))
# -> "A  standing in  of a  ."


A small  runs across the .
's red  were parked in  
A  standing in  of a  .


In [31]:
ADJ_TAGS = {"JJ", "JJR", "JJS"}

def _degree_wrap(base_adj: str, tag: str) -> str:
    # Safer than trying to inflect irregulars: use "more/most" wrappers
    if tag == "JJR":
        return f"more {base_adj}"
    if tag == "JJS":
        return f"most {base_adj}"
    return base_adj

def attack_replace_adjectives(caption: str, replacement_adj: str = "") -> str:
    """Replace every adjective with `replacement_adj`; an empty string deletes it."""
    doc = nlp(caption)
    out = []
    for tok in doc:
        if tok.tag_ in ADJ_TAGS and replacement_adj == "":
            # Deletion: keep only the whitespace so no stray "more"/"most" is left behind
            out.append(tok.whitespace_)
        elif tok.tag_ in ADJ_TAGS:
            repl = _degree_wrap(replacement_adj, tok.tag_)
            # If the replacement is multiword (e.g., "more fluffy"), match casing of the first word only
            parts = repl.split(" ")
            parts[0] = _match_casing(tok.text, parts[0])
            repl = " ".join(parts)
            # Preserve original whitespace
            out.append(repl + tok.whitespace_)
        else:
            out.append(tok.text_with_ws)
    return "".join(out)

In [32]:
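def remove_caption(caption: str) -> str:
    """Drop the last 10 words of the caption (empty string if it has 10 words or fewer)."""
    # Join with spaces; joining with "" would glue the remaining words together
    return " ".join(caption.split(" ")[:-10])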

In [33]:
def caption_remove_attack(original_caption: str, image: np.ndarray, original_label, api) -> str:
    """
    Greedy caption attack that removes words from the original caption to flip the model's prediction.
    
    Args:
        original_caption: The original caption string.
        image: The original image as a uint8 numpy array.
        original_label: The original decision (True = match, False = no-match).
        api: BlackBoxAPI instance; each api.score() call uses one query.
        
    Returns:
        The modified caption after removing words.
    """
    def flipped(score) -> bool:
        return score < 0.5 if original_label else score > 0.5

    words = original_caption.split()
    best_caption = original_caption
    best_score = api.score(image, best_caption)
    
    improved = True
    while improved and len(words) > 1 and not flipped(best_score):
        improved = False
        for i in range(len(words)):
            # Keep one query in reserve so the caller can still score the final caption
            if api.get_remaining_queries() <= 1:
                return best_caption
            # Create a new caption by removing the i-th word
            new_caption = ' '.join(words[:i] + words[i+1:])
            new_score = api.score(image, new_caption)
            
            # Accept the removal if it moves the score towards flipping the label
            if (original_label and new_score < best_score) or (not original_label and new_score > best_score):
                best_caption = new_caption
                best_score = new_score
                words = new_caption.split()
                improved = True
                break  # Restart the scan after a successful removal
                
    return best_caption

In [None]:
def caption_attack(original_caption: str, image: np.ndarray, original_label) -> str:
    """
    Replace every noun in the caption with the noun from `nounlist` whose text
    embedding is least similar to the image (if the pair currently matches) or
    most similar to it (if it does not).

    Note: this reads embeddings from `model` directly, so it does not consume
    queries from the BlackBoxAPI budget.
    """
    with torch.no_grad():
        image_tensor = preprocess(Image.fromarray(image)).unsqueeze(0).to(device)
        image_features = model.encode_image(image_tensor).to("cpu")

    # Start from the worst possible value so the first noun always replaces it
    best_sim = 1 if original_label else -1
    best_noun = ""

    for _, row in nounlist.iterrows():
        sim = calc_similarity(image_features, row['encoded'])
        if (original_label and sim < best_sim) or (not original_label and sim > best_sim):
            best_sim = sim
            best_noun = row['nouns']

    final_caption = attack_replace_nouns(original_caption, best_noun)

    return final_caption

In [None]:
def attack(image_np_uint8: np.ndarray, caption_str: str, api: BlackBoxAPI, budgets: dict, verbose=False) -> dict:
    """
    Student attack function to implement adversarial attacks.
    
    Args:
        image_np_uint8: Original image as uint8 numpy array (H, W, C)
        caption_str: Original caption string
        api: BlackBoxAPI instance for querying the model
        budgets: Dictionary with 'T_MAX', 'P_MAX', 'Q_MAX' limits
        
    Returns:
        Dictionary with:
        - 'success': bool, whether attack succeeded (flipped decision)
        - 'image': np.ndarray, final attacked image  
        - 'caption': str, final attacked caption
        - 'token_cost': int, number of token edits used
        - 'pixel_cost': int, number of pixel edits used
        - 'query_cost': int, number of queries used
    """
    
    # Current attack: text-only noun replacement (see caption_attack).
    # The image is passed through unchanged.
    
    original_image = image_np_uint8.copy()
    original_caption = caption_str
    
    # Get original score to determine target (flip the decision)
    original_score = api.score(original_image, original_caption)
    original_label = original_score > 0.5
    
    # You can modify:
    # - Caption tokens (text modifications)  
    # - Image pixels (visual modifications)
    # - Both (multimodal attack)
    
    # Stay within budgets:
    # - budgets['T_MAX'] = maximum token edits
    # - budgets['P_MAX'] = maximum pixel edits
    # - budgets['Q_MAX'] = maximum queries
    
    # Attack Strategies to Consider:
    # - Text Attacks: Synonym replacement, word insertion/deletion, semantic paraphrasing
    # - Image Attacks: Adversarial noise, targeted pixel modifications, patch attacks
    # - Query Optimization: Gradient-free optimization, genetic algorithms, hill climbing
    # - Multimodal: Combined text+image attacks for maximum effectiveness

    # Image attack: none implemented yet, so the original image is used as-is
    #final_image = image_attack(original_image, original_caption, original_label, pixels_changed=100, epsilon=2, steps=15)
    final_image = original_image

    # Text attack: exactly one of the alternatives below should be active
    final_caption = caption_attack(original_caption, original_image, original_label)
    #final_caption = caption_remove_attack(original_caption, original_image, original_label, api)
    #final_caption = attack_replace_nouns(original_caption)
    #final_caption = attack_replace_adjectives(original_caption)
    #final_caption = remove_caption(original_caption)
    # Get final score  
    final_score = api.score(final_image, final_caption)
    if verbose:
        print(f"Original caption: {original_caption}")
        print(f"Original score: {original_score:.4f}, Original label: {original_label}")
        print(f"Final score: {final_score:.4f}")
    
    # Check if attack succeeded (decision flipped)
    original_decision = original_score > 0.5
    final_decision = final_score > 0.5
    success = (original_decision != final_decision)
    
    # Calculate costs
    token_cost = token_edit_cost(original_caption, final_caption)
    pixel_cost = pixel_edit_cost(original_image, final_image)
    query_cost = api.query_count
    
    return {
        'success': success,
        'image': final_image,
        'caption': final_caption,
        'token_cost': token_cost,
        'pixel_cost': pixel_cost,  
        'query_cost': query_cost,
        'original_score': original_score,
        'final_score': final_score
    }

print("Attack function defined (text-only noun replacement)")
print("   Image pixels are left unchanged; only the caption is modified")

# Test the attack function
print("\nTesting attack function...")
test_pair = val_pairs[0]
test_image = np.array(load_image_from_pair(test_pair))

# Create fresh API instance  
test_api = BlackBoxAPI(alpha, beta, q_max=100)

attack_budgets = {
    'T_MAX': 10,     # Maximum token edits per sample
    'P_MAX': 100,   # Maximum pixel edits per sample  
    'Q_MAX': 100    # Maximum queries per sample
}

result = attack(test_image, test_pair['caption'], test_api, attack_budgets)
print(f"Attack success: {result['success']}")

Attack function defined (text-only noun replacement)
   Image pixels are left unchanged; only the caption is modified

Testing attack function...
Original caption: A gold bus traveling on a single lane road
Original score: 0.7615, Original label: True

In [29]:
# Evaluation Framework
def evaluate_attack(val_pairs: list, attack_function, alpha: float, beta: float, budgets: dict, max_samples: int = None):
    """
    Evaluate attack function on validation pairs.
    
    Args:
        val_pairs: List of validation pairs
        attack_function: Attack function to evaluate
        alpha, beta: Calibrated parameters
        budgets: Attack budgets dictionary
        max_samples: Limit number of samples (None = all)
        
    Returns:
        Dictionary with evaluation metrics
    """
    
    print(f"Starting evaluation...")
    
    # Limit samples if specified
    eval_pairs = val_pairs[:max_samples] if max_samples else val_pairs
    print(f"Evaluating on {len(eval_pairs)} samples")
    
    results = []
    total_success = 0
    total_token_cost = 0
    total_pixel_cost = 0
    total_query_cost = 0
    
    for i, pair in enumerate(tqdm(eval_pairs, desc="Attacking")):
        # Create fresh API instance for each sample
        api = BlackBoxAPI(alpha, beta, q_max=budgets['Q_MAX'])
        
        # Load image
        image = np.array(load_image_from_pair(pair))
        caption = pair['caption']
        
        try:
            # Run attack
            result = attack_function(image, caption, api, budgets)
            
            # Validate budget constraints
            budget_valid = (
                result['token_cost'] <= budgets['T_MAX'] and
                result['pixel_cost'] <= budgets['P_MAX'] and  
                result['query_cost'] <= budgets['Q_MAX']
            )
            
            if not budget_valid:
                print(f"Sample {i}: Budget violation!")
                print(f"   Tokens: {result['token_cost']}/{budgets['T_MAX']}")
                print(f"   Pixels: {result['pixel_cost']}/{budgets['P_MAX']}")  
                print(f"   Queries: {result['query_cost']}/{budgets['Q_MAX']}")
                result['success'] = False  # Invalid attacks count as failures
            
            results.append(result)
            
            if result['success']:
                total_success += 1
            total_token_cost += result['token_cost']
            total_pixel_cost += result['pixel_cost']
            total_query_cost += result['query_cost']
                
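        except Exception as e:
            print(f"Sample {i}: Attack failed with error: {e}")
            # Record the failure and charge the full budget as a penalty
            results.append({
                'success': False,
                'token_cost': budgets['T_MAX'],  # Penalize failures
                'pixel_cost': budgets['P_MAX'], 
                'query_cost': budgets['Q_MAX'],
                'error': str(e)
            })
            # Add the penalty to the running totals so it actually affects ANC and AQ
            total_token_cost += budgets['T_MAX']
            total_pixel_cost += budgets['P_MAX']
            total_query_cost += budgets['Q_MAX']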
    
    # Calculate metrics
    n_samples = len(results)
    asr = total_success / n_samples  # Attack Success Rate
    anc = (10*total_token_cost + total_pixel_cost) / n_samples  # Average Number of Changes (one token edit is weighted as 10 pixel edits)
    aq = total_query_cost / n_samples  # Average Queries
    
    # Final score: ASR - 0.5*(ANC / (10*T_MAX + P_MAX)) - 0.1*(AQ / Q_MAX)
    score = asr - 0.5 * (anc / (10*budgets['T_MAX'] + budgets['P_MAX'])) - 0.1 * (aq / budgets['Q_MAX'])
    
    evaluation_result = {
        'ASR': asr,
        'ANC': anc, 
        'AQ': aq,
        'Score': score,
        'n_samples': n_samples,
        'total_success': total_success,
        'avg_token_cost': total_token_cost / n_samples,
        'avg_pixel_cost': total_pixel_cost / n_samples,
        'budgets': budgets,
        'results': results
    }
    
    return evaluation_result

# Run Evaluation
print("Running evaluation on validation set...")

# Define attack budgets
attack_budgets = {
    'T_MAX': 10,     # Maximum token edits per sample
    'P_MAX': 100,   # Maximum pixel edits per sample  
    'Q_MAX': 100    # Maximum queries per sample
}

# Evaluate on subset first (faster for testing)
print("Running on first 50 samples for quick testing...")
eval_result = evaluate_attack(
    val_pairs=val_pairs, 
    attack_function=attack,
    alpha=alpha,
    beta=beta, 
    budgets=attack_budgets,
    max_samples=50  # Quick test on 50 samples
)

# Print results
print(f"\nEVALUATION RESULTS ({eval_result['n_samples']} samples):")
print(f"{'='*50}")
print(f"Attack Success Rate (ASR): {eval_result['ASR']:.1%}")
print(f"Average Number of Changes (ANC): {eval_result['ANC']:.2f}")  
print(f"Average Queries (AQ): {eval_result['AQ']:.1f}")
print(f"Final Score: {eval_result['Score']:.4f}")
print(f"{'='*50}")
print(f"Budget Usage:")
print(f"  Avg Token Cost: {eval_result['avg_token_cost']:.2f}/{attack_budgets['T_MAX']}")
print(f"  Avg Pixel Cost: {eval_result['avg_pixel_cost']:.2f}/{attack_budgets['P_MAX']}")  
print(f"  Avg Query Cost: {eval_result['AQ']:.1f}/{attack_budgets['Q_MAX']}")


Running evaluation on validation set...
Running on first 50 samples for quick testing...
Starting evaluation...
Evaluating on 50 samples


Attacking:   0%|          | 0/50 [00:00<?, ?it/s]

Original caption: A gold bus traveling on a single lane road
Original score: 0.7744, Original label: True
1
0.13234369456768036
0.11493301391601562
0.06923467665910721

Attacking:   2%|▏         | 1/50 [00:01<00:53,  1.09s/it]

0.002429043175652623

Attacking:   4%|▍         | 2/50 [00:02<00:52,  1.09s/it]

0.2790728807449341

Attacking:   6%|▌         | 3/50 [00:03<00:50,  1.09s/it]

Final score: 0.6848
Original caption: A lady flying a kite in a neighborhood while there is daylight.
Original score: 0.5362, Original label: True
1
0.13555191457271576
0.1286998987197876
0.1058352142572403
0.08470931649208069
0.07761826366186142
0.07189084589481354

Attacking:   8%|▊         | 4/50 [00:04<00:49,  1.09s/it]

-0.01642901636660099

Attacking:  10%|█         | 5/50 [00:05<00:49,  1.10s/it]

Final score: 0.5691
Sample 4: Budget violation!
   Tokens: 12/10
   Pixels: 0/100
   Queries: 2/100
Original caption: A man riding a water ski kicking up waves.
Original score: 0.7138, Original label: True
1
0.1378413587808609
0.10733005404472351
0.1063356101512909
0.10400473326444626
0.08635207265615463

Attacking:  12%|█▏        | 6/50 [00:06<00:48,  1.09s/it]

0.009164676070213318

Attacking:  14%|█▍        | 7/50 [00:07<00:47,  1.09s/it]

0.048111967742443085

Attacking:  16%|█▌        | 8/50 [00:08<00:45,  1.09s/it]

0.30900126695632935


Attacking:  18%|█▊        | 9/50 [00:09<00:44,  1.09s/it]

0.2928909361362457

Attacking:  20%|██        | 10/50 [00:10<00:43,  1.09s/it]

Final score: 0.6222
Original caption: Two young boys riding skis down the side of a snow covered slope.
Original score: 0.1955, Original label: False
-1
0.10619892925024033
0.16108998656272888
0.1710442453622818
0.17263182997703552
0.18832269310951233

Attacking:  22%|██▏       | 11/50 [00:12<00:42,  1.09s/it]

0.28327542543411255


Attacking:  24%|██▍       | 12/50 [00:13<00:41,  1.09s/it]

0.27387353777885437


Attacking:  26%|██▌       | 13/50 [00:14<00:40,  1.09s/it]

-0.008844197727739811
Best noun to add: sweatsuit with similarity -0.008844197727739811
Final score: 0.2955
Sample 12: Budget violation!
   Tokens: 14/10
   Pixels: 0/100
   Queries: 2/1

Attacking:  28%|██▊       | 14/50 [00:15<00:39,  1.09s/it]

0.031162602826952934

Attacking:  30%|███       | 15/50 [00:16<00:38,  1.09s/it]

Original caption: A man is on top of an elephant in a river.
Original score: 0.7538, Original label: True
1
0.1267031580209732
0.12062136083841324
0.12010754644870758
0.10448397696018219
0.10243137925863266
0.09161607176065445
0.08427827060222626

Attacking:  32%|███▏      | 16/50 [00:17<00:37,  1.09s/it]

-0.019755953922867775

Attacking:  34%|███▍      | 17/50 [00:18<00:35,  1.09s/it]

0.25940442085266113


Attacking:  36%|███▌      | 18/50 [00:19<00:34,  1.09s/it]

0.28299400210380554
Best noun to add: downtown with similarity 0.28299400210380554
Final score: 0.6287
Original caption: A cat laying on top of a white bed on a pillow.
Original score: 0.6650, Original label: True
1
0.12831717729568481
0.11269897222518921

Attacking:  38%|███▊      | 19/50 [00:20<00:33,  1.09s/it]

0.03252388909459114


Attacking:  38%|███▊      | 19/50 [00:20<00:34,  1.10s/it]

0.23134812712669373





KeyboardInterrupt: 
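
The score computed in `evaluate_attack` trades success rate against edit cost and query cost, so a cheaper attack can outrank a more successful one. The sketch below restates that formula as a standalone helper to make the trade-off concrete. The helper name `lab_score` and the two attack profiles are made up for illustration; they are not part of the lab code and not measured results.

```python
def lab_score(asr, anc, aq, budgets):
    """Restate the score formula from evaluate_attack (hypothetical helper)."""
    # ANC is normalized by the largest possible weighted change count
    max_changes = 10 * budgets['T_MAX'] + budgets['P_MAX']
    return asr - 0.5 * (anc / max_changes) - 0.1 * (aq / budgets['Q_MAX'])


budgets = {'T_MAX': 10, 'P_MAX': 100, 'Q_MAX': 100}

# Hypothetical attack A: 60% success, 2 token edits, 0 pixel edits, 20 queries on average
score_a = lab_score(asr=0.60, anc=10 * 2 + 0, aq=20, budgets=budgets)

# Hypothetical attack B: 70% success, 6 token edits, 0 pixel edits, 50 queries on average
score_b = lab_score(asr=0.70, anc=10 * 6 + 0, aq=50, budgets=budgets)

print(f"Attack A score: {score_a:.2f}")
print(f"Attack B score: {score_b:.2f}")
```

With these made-up numbers, attack A scores about 0.53 and attack B about 0.50: the extra 10 points of success rate in B do not pay for its additional token edits and queries.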

In [None]:
# Full Evaluation (run this cell once your attack is ready to test)

def run_full_evaluation():
    """Run evaluation on all 1000 validation samples."""
    print("Running FULL evaluation on all 1000 validation samples...")
    print("This may take several minutes depending on your attack implementation.")
    
    full_result = evaluate_attack(
        val_pairs=val_pairs,
        attack_function=attack, 
        alpha=alpha,
        beta=beta,
        budgets=attack_budgets,
        max_samples=None  # All samples
    )
    
    print(f"\nFINAL EVALUATION RESULTS:")
    print(f"{'='*60}")
    print(f"Attack Success Rate (ASR): {full_result['ASR']:.1%}")
    print(f"Average Number of Changes (ANC): {full_result['ANC']:.2f}")
    print(f"Average Queries (AQ): {full_result['AQ']:.1f}")
    print(f"Final Score: {full_result['Score']:.4f}")
    print(f"{'='*60}")
    
    return full_result

# Run the full evaluation (comment out the line below to skip it):
full_results = run_full_evaluation()


Running FULL evaluation on all 1000 validation samples...
This may take several minutes depending on your attack implementation.
Starting evaluation...
Evaluating on 1000 samples


Attacking:   0%|          | 3/1000 [00:00<00:50, 19.92it/s]

Original score: 0.7744, Original label: True
Final score: 0.5207
Original score: 0.2920, Original label: False
Final score: 0.3240
Original score: 0.3550, Original label: False
Final score: 0.3636
Original score: 0.5362, Original label: True
Final score: 0.3820
Original score: 0.7966, Original label: True
Final score: 0.6406


Attacking:   1%|          | 9/1000 [00:00<00:42, 23.33it/s]

Original score: 0.7098, Original label: True
Final score: 0.3760
Original score: 0.7137, Original label: True
Final score: 0.5042
Original score: 0.4313, Original label: False
Final score: 0.3687
Original score: 0.3835, Original label: False
Final score: 0.4848
Original score: 0.2675, Original label: False
Final score: 0.3527


Attacking:   1%|          | 12/1000 [00:00<00:42, 23.39it/s]

Original score: 0.1955, Original label: False
Final score: 0.3064
Original score: 0.3311, Original label: False
Final score: 0.4090
Original score: 0.6945, Original label: True
Final score: 0.2899
Original score: 0.6928, Original label: True
Final score: 0.3973
Original score: 0.3510, Original label: False


Attacking:   2%|▏         | 15/1000 [00:00<00:40, 24.29it/s]

Final score: 0.3563


Attacking:   2%|▏         | 18/1000 [00:00<00:40, 24.38it/s]

Original score: 0.7426, Original label: True
Final score: 0.4353
Original score: 0.2073, Original label: False
Final score: 0.2960
Original score: 0.3979, Original label: False
Final score: 0.3223
Original score: 0.6460, Original label: True
Final score: 0.6010
Original score: 0.2507, Original label: False
Final score: 0.3360


Attacking:   2%|▏         | 24/1000 [00:01<00:40, 24.14it/s]

Original score: 0.2731, Original label: False
Final score: 0.3912
Original score: 0.6404, Original label: True
Final score: 0.6557
Original score: 0.7021, Original label: True
Final score: 0.4359
Original score: 0.8079, Original label: True
Final score: 0.6550
Original score: 0.4467, Original label: False
Final score: 0.3415


Attacking:   3%|▎         | 27/1000 [00:01<00:39, 24.56it/s]

Original score: 0.3255, Original label: False
Final score: 0.3304
Original score: 0.7063, Original label: True
Final score: 0.4435
Original score: 0.6966, Original label: True
Final score: 0.3953
Original score: 0.7314, Original label: True
Final score: 0.3342
Original score: 0.6772, Original label: True


Attacking:   3%|▎         | 30/1000 [00:01<00:39, 24.36it/s]

Final score: 0.4034


Attacking:   3%|▎         | 33/1000 [00:01<00:39, 24.78it/s]

Original score: 0.7685, Original label: True
Final score: 0.3888
Original score: 0.7412, Original label: True
Final score: 0.5465
Original score: 0.2143, Original label: False
Final score: 0.3869
Original score: 0.2381, Original label: False
Final score: 0.3206
Original score: 0.3791, Original label: False
Final score: 0.3274


Attacking:   4%|▍         | 39/1000 [00:01<00:39, 24.11it/s]

Original score: 0.7541, Original label: True
Final score: 0.4654
Original score: 0.6031, Original label: True
Final score: 0.3855
Original score: 0.6642, Original label: True
Final score: 0.5028
Original score: 0.4231, Original label: False
Final score: 0.3547
Original score: 0.7293, Original label: True
Final score: 0.4672


Attacking:   4%|▍         | 42/1000 [00:01<00:39, 24.10it/s]

Original score: 0.3293, Original label: False
Final score: 0.4326
Original score: 0.7426, Original label: True
Final score: 0.5397
Original score: 0.8187, Original label: True
Final score: 0.7079
Original score: 0.3145, Original label: False
Final score: 0.2806
Original score: 0.3656, Original label: False


Attacking:   4%|▍         | 45/1000 [00:01<00:40, 23.56it/s]

Final score: 0.4220


Attacking:   5%|▍         | 48/1000 [00:02<00:40, 23.69it/s]

Original score: 0.7427, Original label: True
Final score: 0.5804
Original score: 0.8042, Original label: True
Final score: 0.3539
Original score: 0.4412, Original label: False
Final score: 0.2755
Original score: 0.2414, Original label: False
Final score: 0.3796
Original score: 0.3424, Original label: False
Final score: 0.3214


Attacking:   5%|▌         | 54/1000 [00:02<00:40, 23.49it/s]

Original score: 0.2952, Original label: False
Final score: 0.3160
Original score: 0.6612, Original label: True
Final score: 0.3847
Original score: 0.7074, Original label: True
Final score: 0.4124
Original score: 0.3547, Original label: False
Final score: 0.3378
Original score: 0.4366, Original label: False
Final score: 0.6501


Attacking:   6%|▌         | 57/1000 [00:02<00:39, 23.85it/s]

Original score: 0.7576, Original label: True
Final score: 0.4420
Original score: 0.6502, Original label: True
Final score: 0.4597
Original score: 0.7143, Original label: True
Final score: 0.3044
Original score: 0.7095, Original label: True
Final score: 0.4970
Original score: 0.2333, Original label: False


Attacking:   6%|▌         | 60/1000 [00:02<00:39, 23.54it/s]

Final score: 0.2848


Attacking:   6%|▋         | 63/1000 [00:02<00:40, 23.36it/s]

Original score: 0.2648, Original label: False
Final score: 0.2392
Original score: 0.7650, Original label: True
Final score: 0.4384
Original score: 0.7195, Original label: True
Final score: 0.5709
Original score: 0.2295, Original label: False
Final score: 0.3108
Original score: 0.2676, Original label: False
Final score: 0.2934


Attacking:   7%|▋         | 69/1000 [00:02<00:38, 24.07it/s]

Original score: 0.2939, Original label: False
Final score: 0.3188
Original score: 0.2232, Original label: False
Final score: 0.2597
Original score: 0.1807, Original label: False
Final score: 0.2816
Original score: 0.2283, Original label: False
Final score: 0.3300
Original score: 0.2869, Original label: False
Final score: 0.3793


Attacking:   7%|▋         | 72/1000 [00:03<00:38, 24.11it/s]

Original score: 0.3424, Original label: False
Final score: 0.3057
Original score: 0.8379, Original label: True
Final score: 0.3723
Original score: 0.2569, Original label: False
Final score: 0.2932
Original score: 0.2645, Original label: False
Final score: 0.3009
Original score: 0.7108, Original label: True


Attacking:   8%|▊         | 75/1000 [00:03<00:38, 24.04it/s]

Final score: 0.3943
Original score: 0.7483, Original label: True


Attacking:   8%|▊         | 78/1000 [00:03<00:37, 24.34it/s]

Final score: 0.4137
Original score: 0.7852, Original label: True
Final score: 0.2822
Original score: 0.5880, Original label: True
Final score: 0.3155
Original score: 0.1811, Original label: False
Final score: 0.2166
Original score: 0.7083, Original label: True
Final score: 0.3991
Original score: 0.2740, Original label: False


Attacking:   8%|▊         | 84/1000 [00:03<00:38, 23.70it/s]

Final score: 0.3410
Original score: 0.7624, Original label: True
Final score: 0.3883
Original score: 0.3330, Original label: False
Final score: 0.4112
Original score: 0.1912, Original label: False
Final score: 0.2397
Original score: 0.3932, Original label: False
Final score: 0.2829
Original score: 0.7369, Original label: True


Attacking:   9%|▊         | 87/1000 [00:03<00:39, 23.39it/s]

Final score: 0.5089
Original score: 0.6205, Original label: True
Final score: 0.4649
Original score: 0.2739, Original label: False
Final score: 0.2526
Original score: 0.4160, Original label: False
Final score: 0.3350
Original score: 0.2260, Original label: False


Attacking:   9%|▉         | 90/1000 [00:03<00:38, 23.68it/s]

Final score: 0.3145
Original score: 0.2836, Original label: False


Attacking:   9%|▉         | 93/1000 [00:03<00:38, 23.70it/s]

Final score: 0.3455
Original score: 0.6905, Original label: True
Final score: 0.5418
Original score: 0.6964, Original label: True
Final score: 0.4292
Original score: 0.7275, Original label: True
Final score: 0.4852
Original score: 0.7276, Original label: True
Final score: 0.3838
Original score: 0.4738, Original label: False


Attacking:  10%|▉         | 99/1000 [00:04<00:37, 23.83it/s]

Final score: 0.4158
Original score: 0.7964, Original label: True
Final score: 0.3610
Original score: 0.5259, Original label: True
Final score: 0.3512
Original score: 0.6895, Original label: True
Final score: 0.3961
Original score: 0.5837, Original label: True
Final score: 0.3788
Original score: 0.2983, Original label: False


Attacking:  10%|█         | 102/1000 [00:04<00:37, 23.95it/s]

Final score: 0.3665
Original score: 0.8255, Original label: True
Final score: 0.4137
Original score: 0.6006, Original label: True
Final score: 0.3699
Original score: 0.2716, Original label: False
Final score: 0.3907
Original score: 0.2694, Original label: False
Final score: 0.4243


Attacking:  10%|█         | 105/1000 [00:04<00:36, 24.29it/s]

Original score: 0.7128, Original label: True
Final score: 0.3678


Attacking:  11%|█         | 108/1000 [00:04<00:36, 24.37it/s]

Original score: 0.7631, Original label: True
Final score: 0.6891
Original score: 0.6388, Original label: True
Final score: 0.3563
Original score: 0.7856, Original label: True
Final score: 0.4062
Original score: 0.6730, Original label: True
Final score: 0.3778


Attacking:  11%|█         | 111/1000 [00:04<00:36, 24.45it/s]

Original score: 0.6600, Original label: True
Final score: 0.3109


Attacking:  11%|█▏        | 114/1000 [00:04<00:36, 24.47it/s]

Original score: 0.1721, Original label: False
Final score: 0.4601
Original score: 0.7640, Original label: True
Final score: 0.5198
Original score: 0.7069, Original label: True
Final score: 0.4591
Original score: 0.6151, Original label: True
Final score: 0.4002
Original score: 0.3220, Original label: False
Final score: 0.3913


Attacking:  12%|█▏        | 120/1000 [00:05<00:36, 24.26it/s]

Original score: 0.3530, Original label: False
Final score: 0.3817
Original score: 0.4717, Original label: False
Final score: 0.3616
Original score: 0.6877, Original label: True
Final score: 0.3550
Original score: 0.7881, Original label: True
Final score: 0.5229
Original score: 0.3199, Original label: False
Final score: 0.2715


Attacking:  12%|█▏        | 123/1000 [00:05<00:36, 23.93it/s]

Original score: 0.4187, Original label: False
Final score: 0.3597
Original score: 0.3420, Original label: False
Final score: 0.3294
Original score: 0.2488, Original label: False
Final score: 0.4129
Original score: 0.7437, Original label: True
Final score: 0.3832


Attacking:  13%|█▎        | 126/1000 [00:05<00:36, 23.76it/s]

Original score: 0.5676, Original label: True
Final score: 0.4691


Attacking:  13%|█▎        | 129/1000 [00:05<00:35, 24.28it/s]

Original score: 0.3597, Original label: False
Final score: 0.4084
Original score: 0.4124, Original label: False
Final score: 0.3584
Original score: 0.7034, Original label: True
Final score: 0.4432
Original score: 0.6701, Original label: True
Final score: 0.3589
Original score: 0.6998, Original label: True
Final score: 0.3638
Original score: 0.2858, Original label: False


Attacking:  14%|█▎        | 135/1000 [00:05<00:35, 24.65it/s]

Final score: 0.3222
Original score: 0.6661, Original label: True
Final score: 0.3922
Original score: 0.7457, Original label: True
Final score: 0.5658
Original score: 0.4470, Original label: False
Final score: 0.3841
Original score: 0.7763, Original label: True
Final score: 0.5912
Original score: 0.7203, Original label: True
Final score: 0.3711


Attacking:  14%|█▍        | 138/1000 [00:05<00:34, 24.74it/s]

Original score: 0.6486, Original label: True
Final score: 0.3962
Original score: 0.8254, Original label: True
Final score: 0.3936
Original score: 0.7312, Original label: True
Final score: 0.4737
Original score: 0.3595, Original label: False


Attacking:  14%|█▍        | 141/1000 [00:05<00:35, 24.49it/s]

Final score: 0.4173
Original score: 0.6256, Original label: True
Final score: 0.3341


Attacking:  14%|█▍        | 144/1000 [00:06<00:35, 24.19it/s]

Original score: 0.5643, Original label: True
Final score: 0.3408
Original score: 0.6797, Original label: True
Final score: 0.4624
Original score: 0.2194, Original label: False
Final score: 0.3544
Original score: 0.7500, Original label: True


Attacking:  15%|█▍        | 147/1000 [00:06<00:34, 24.41it/s]

Final score: 0.6230
Original score: 0.7164, Original label: True
Final score: 0.5280


Attacking:  15%|█▌        | 150/1000 [00:06<00:34, 24.53it/s]

Original score: 0.5806, Original label: True
Final score: 0.3212
Original score: 0.7267, Original label: True
Final score: 0.3469
Original score: 0.3309, Original label: False
Final score: 0.3731
Original score: 0.5275, Original label: True
Final score: 0.4182
Original score: 0.1881, Original label: False
Final score: 0.2405


Attacking:  16%|█▌        | 156/1000 [00:06<00:34, 24.27it/s]

Original score: 0.7277, Original label: True
Final score: 0.6220
Original score: 0.6952, Original label: True
Final score: 0.5205
Original score: 0.2626, Original label: False
Final score: 0.3479
Original score: 0.4910, Original label: False
Final score: 0.4633
Original score: 0.6211, Original label: True
Final score: 0.4501


Attacking:  16%|█▌        | 159/1000 [00:06<00:35, 23.51it/s]

Original score: 0.8070, Original label: True
Final score: 0.6253
Original score: 0.3997, Original label: False
Final score: 0.3701
Original score: 0.6459, Original label: True
Final score: 0.4551
Original score: 0.5096, Original label: True
Final score: 0.4934


Attacking:  16%|█▌        | 162/1000 [00:06<00:34, 24.21it/s]

Original score: 0.7381, Original label: True
Final score: 0.4088


Attacking:  16%|█▋        | 165/1000 [00:06<00:34, 24.21it/s]

Original score: 0.6301, Original label: True
Final score: 0.3266
Original score: 0.2809, Original label: False
Final score: 0.2682
Original score: 0.6909, Original label: True
Final score: 0.4505
Original score: 0.2369, Original label: False
Final score: 0.4589
Original score: 0.6771, Original label: True
Final score: 0.4262


Attacking:  17%|█▋        | 171/1000 [00:07<00:34, 24.18it/s]

Original score: 0.6962, Original label: True
Final score: 0.3683
Original score: 0.6768, Original label: True
Final score: 0.3919
Original score: 0.5963, Original label: True
Final score: 0.3902
Original score: 0.3666, Original label: False
Final score: 0.3587
Original score: 0.5989, Original label: True
Final score: 0.3938


Attacking:  17%|█▋        | 174/1000 [00:07<00:34, 24.20it/s]

Original score: 0.6055, Original label: True
Final score: 0.4086
Original score: 0.7757, Original label: True
Final score: 0.3976
Original score: 0.6798, Original label: True
Final score: 0.4160
Original score: 0.1938, Original label: False
Final score: 0.3974


Attacking:  18%|█▊        | 177/1000 [00:07<00:34, 23.97it/s]

Original score: 0.8181, Original label: True
Final score: 0.6159


Attacking:  18%|█▊        | 180/1000 [00:07<00:34, 24.07it/s]

Original score: 0.6427, Original label: True
Final score: 0.3887
Original score: 0.6638, Original label: True
Final score: 0.3257
Original score: 0.7723, Original label: True
Final score: 0.5906
Original score: 0.2689, Original label: False
Final score: 0.2724
Original score: 0.6871, Original label: True
Final score: 0.5077


Attacking:  19%|█▊        | 186/1000 [00:07<00:33, 24.17it/s]

Original score: 0.5250, Original label: True
Final score: 0.3753
Original score: 0.2470, Original label: False
Final score: 0.3807
Original score: 0.3096, Original label: False
Final score: 0.3693
Original score: 0.4944, Original label: False
Final score: 0.3963
Original score: 0.7117, Original label: True
Final score: 0.4333
Original score: 0.3038, Original label: False


Attacking:  19%|█▉        | 189/1000 [00:07<00:33, 24.55it/s]

Final score: 0.2902
Original score: 0.8004, Original label: True
Final score: 0.6536
Original score: 0.3616, Original label: False
Final score: 0.4934
Original score: 0.7575, Original label: True
Final score: 0.5096
Original score: 0.3487, Original label: False


Attacking:  19%|█▉        | 192/1000 [00:07<00:32, 24.52it/s]

Final score: 0.3459
Original score: 0.2973, Original label: False


Attacking:  20%|█▉        | 195/1000 [00:08<00:32, 24.70it/s]

Final score: 0.2925
Original score: 0.5896, Original label: True
Final score: 0.3978
Original score: 0.5550, Original label: True
Final score: 0.5680
Original score: 0.7471, Original label: True
Final score: 0.5660
Original score: 0.6413, Original label: True
Final score: 0.4794
Original score: 0.4257, Original label: False


Attacking:  20%|██        | 201/1000 [00:08<00:32, 24.34it/s]

Final score: 0.5332
Original score: 0.6759, Original label: True
Final score: 0.5497
Original score: 0.3338, Original label: False
Final score: 0.3642
Original score: 0.6968, Original label: True
Final score: 0.2988
Original score: 0.7236, Original label: True
Final score: 0.4872
Original score: 0.2619, Original label: False


Attacking:  20%|██        | 204/1000 [00:08<00:32, 24.20it/s]

Final score: 0.4126
Original score: 0.7457, Original label: True
Final score: 0.7479
Original score: 0.8026, Original label: True
Final score: 0.4634
Original score: 0.2250, Original label: False
Final score: 0.2672
Original score: 0.7328, Original label: True


Attacking:  21%|██        | 207/1000 [00:08<00:33, 23.80it/s]

Final score: 0.3125
Original score: 0.3957, Original label: False


Attacking:  21%|██        | 210/1000 [00:08<00:32, 24.40it/s]

Final score: 0.2969
Original score: 0.3313, Original label: False
Final score: 0.3578
Original score: 0.2854, Original label: False
Final score: 0.2064
Original score: 0.4658, Original label: False
Final score: 0.4450
Original score: 0.7129, Original label: True
Final score: 0.4216


Attacking:  21%|██▏       | 213/1000 [00:08<00:32, 24.31it/s]

Original score: 0.2557, Original label: False
Final score: 0.3874


Attacking:  22%|██▏       | 216/1000 [00:08<00:32, 24.15it/s]

Original score: 0.8297, Original label: True
Final score: 0.4343
Original score: 0.4422, Original label: False
Final score: 0.4661
Original score: 0.3029, Original label: False
Final score: 0.3779
Original score: 0.6866, Original label: True
Final score: 0.4099
Original score: 0.1907, Original label: False
Final score: 0.3098


Attacking:  22%|██▏       | 222/1000 [00:09<00:32, 24.09it/s]

Original score: 0.3075, Original label: False
Final score: 0.3731
Original score: 0.7768, Original label: True
Final score: 0.7028
Original score: 0.4460, Original label: False
Final score: 0.4647
Original score: 0.2567, Original label: False
Final score: 0.4282
Original score: 0.2821, Original label: False
Final score: 0.5279


Attacking:  22%|██▎       | 225/1000 [00:09<00:32, 23.93it/s]

Original score: 0.6318, Original label: True
Final score: 0.3465
Original score: 0.7820, Original label: True
Final score: 0.4354
Original score: 0.3318, Original label: False
Final score: 0.6659
Original score: 0.2684, Original label: False
Final score: 0.3812


Attacking:  23%|██▎       | 228/1000 [00:09<00:32, 24.08it/s]

Original score: 0.7242, Original label: True
Final score: 0.4332


Attacking:  23%|██▎       | 231/1000 [00:09<00:32, 23.92it/s]

Original score: 0.7360, Original label: True
Final score: 0.4025
Original score: 0.2468, Original label: False
Final score: 0.3225
Original score: 0.7528, Original label: True
Final score: 0.4889
Original score: 0.6943, Original label: True
Final score: 0.5148
Original score: 0.7091, Original label: True
Final score: 0.3930


Attacking:  24%|██▎       | 237/1000 [00:09<00:32, 23.65it/s]

Original score: 0.2370, Original label: False
Final score: 0.4021
Original score: 0.4535, Original label: False
Final score: 0.5379
Original score: 0.6770, Original label: True
Final score: 0.5037
Original score: 0.1368, Original label: False
Final score: 0.3707
Original score: 0.3495, Original label: False
Final score: 0.3469


Attacking:  24%|██▍       | 240/1000 [00:09<00:32, 23.44it/s]

Original score: 0.7060, Original label: True
Final score: 0.4185
Original score: 0.7211, Original label: True
Final score: 0.3489
Original score: 0.7173, Original label: True
Final score: 0.5566
Original score: 0.3252, Original label: False
Final score: 0.3782


Attacking:  24%|██▍       | 243/1000 [00:10<00:32, 23.43it/s]

Original score: 0.7470, Original label: True
Final score: 0.5146


Attacking:  25%|██▍       | 246/1000 [00:10<00:32, 23.31it/s]

Original score: 0.2828, Original label: False
Final score: 0.3425
Original score: 0.2807, Original label: False
Final score: 0.3655
Original score: 0.3111, Original label: False
Final score: 0.4147
Original score: 0.4196, Original label: False
Final score: 0.3973
Original score: 0.3258, Original label: False
Final score: 0.2879


Attacking:  25%|██▌       | 252/1000 [00:10<00:30, 24.16it/s]

Original score: 0.3868, Original label: False
Final score: 0.3275
Original score: 0.4359, Original label: False
Final score: 0.3705
Original score: 0.6321, Original label: True
Final score: 0.3980
Original score: 0.6899, Original label: True
Final score: 0.3440
Original score: 0.2762, Original label: False
Final score: 0.3553
Original score: 0.4070, Original label: False


Attacking:  26%|██▌       | 255/1000 [00:10<00:30, 24.41it/s]

Final score: 0.3578
Original score: 0.3930, Original label: False
Final score: 0.5681
Original score: 0.7029, Original label: True
Final score: 0.4203
Original score: 0.7536, Original label: True
Final score: 0.5363
Original score: 0.6548, Original label: True


Attacking:  26%|██▌       | 258/1000 [00:10<00:30, 24.17it/s]

Final score: 0.4071
Original score: 0.7997, Original label: True


Attacking:  26%|██▌       | 261/1000 [00:10<00:30, 24.48it/s]

Final score: 0.4649
Original score: 0.8001, Original label: True
Final score: 0.4732
Original score: 0.2827, Original label: False
Final score: 0.2759
Original score: 0.6606, Original label: True
Final score: 0.4991
Original score: 0.1849, Original label: False
Final score: 0.3312
Original score: 0.2038, Original label: False
Final score: 0.3568


Attacking:  27%|██▋       | 267/1000 [00:11<00:29, 24.77it/s]

Original score: 0.2595, Original label: False
Final score: 0.3813
Original score: 0.6859, Original label: True
Final score: 0.4355
Original score: 0.6754, Original label: True
Final score: 0.4104
Original score: 0.3265, Original label: False
Final score: 0.4842
Original score: 0.6634, Original label: True
Final score: 0.3555


Attacking:  27%|██▋       | 273/1000 [00:11<00:29, 24.40it/s]

Original score: 0.7342, Original label: True
Final score: 0.4877
Original score: 0.3808, Original label: False
Final score: 0.4475
Original score: 0.5114, Original label: True
Final score: 0.3315
Original score: 0.3424, Original label: False
Final score: 0.3106
Original score: 0.2321, Original label: False
Final score: 0.2951


Attacking:  28%|██▊       | 276/1000 [00:11<00:29, 24.30it/s]

Original score: 0.6662, Original label: True
Final score: 0.3528
Original score: 0.3960, Original label: False
Final score: 0.4090
Original score: 0.3022, Original label: False
Final score: 0.3114
Original score: 0.6910, Original label: True
Final score: 0.5698


Attacking:  28%|██▊       | 279/1000 [00:11<00:29, 24.31it/s]

Original score: 0.2803, Original label: False
Final score: 0.3460
Original score: 0.5044, Original label: True


Attacking:  28%|██▊       | 282/1000 [00:11<00:29, 24.36it/s]

Final score: 0.2987
Original score: 0.7289, Original label: True
Final score: 0.3633
Original score: 0.7169, Original label: True
Final score: 0.3452
Original score: 0.8037, Original label: True
Final score: 0.5165
Original score: 0.6008, Original label: True
Final score: 0.3762
Original score: 0.6760, Original label: True


Attacking:  29%|██▉       | 288/1000 [00:11<00:29, 24.10it/s]

Final score: 0.4819
Original score: 0.6331, Original label: True
Final score: 0.3799
Original score: 0.7686, Original label: True
Final score: 0.4460
Original score: 0.6796, Original label: True
Final score: 0.4640
Original score: 0.7747, Original label: True
Final score: 0.4202
Original score: 0.6732, Original label: True


Attacking:  29%|██▉       | 291/1000 [00:12<00:29, 24.19it/s]

Final score: 0.3755
Original score: 0.3674, Original label: False
Final score: 0.4181
Original score: 0.2326, Original label: False
Final score: 0.3237
Original score: 0.2927, Original label: False
Final score: 0.3412


Attacking:  29%|██▉       | 294/1000 [00:12<00:29, 24.03it/s]

Original score: 0.2504, Original label: False
Final score: 0.4027
Original score: 0.7014, Original label: True


Attacking:  30%|██▉       | 297/1000 [00:12<00:28, 24.27it/s]

Final score: 0.4458
Original score: 0.2988, Original label: False
Final score: 0.3246
Original score: 0.7211, Original label: True
Final score: 0.5035
Original score: 0.7528, Original label: True
Final score: 0.3290
Original score: 0.7222, Original label: True
Final score: 0.6476
Original score: 0.3850, Original label: False


Attacking:  30%|███       | 303/1000 [00:12<00:28, 24.07it/s]

Final score: 0.2521
Original score: 0.6951, Original label: True
Final score: 0.4540
Original score: 0.2745, Original label: False
Final score: 0.3858
Original score: 0.2401, Original label: False
Final score: 0.2744
Original score: 0.6909, Original label: True
Final score: 0.4863
Original score: 0.7701, Original label: True


Attacking:  31%|███       | 306/1000 [00:12<00:28, 24.14it/s]

Final score: 0.5699
Original score: 0.7571, Original label: True
Final score: 0.3931
Original score: 0.7382, Original label: True
Final score: 0.3359
Original score: 0.7781, Original label: True
Final score: 0.3163


Attacking:  31%|███       | 309/1000 [00:12<00:28, 23.91it/s]

Original score: 0.7273, Original label: True
Final score: 0.5438
Original score: 0.7395, Original label: True


Attacking:  31%|███       | 312/1000 [00:12<00:28, 24.40it/s]

Final score: 0.3799
Original score: 0.4819, Original label: False
Final score: 0.4797
Original score: 0.4414, Original label: False
Final score: 0.3029
Original score: 0.7277, Original label: True
Final score: 0.5859
Original score: 0.6676, Original label: True


Attacking:  32%|███▏      | 315/1000 [00:13<00:27, 24.52it/s]

Final score: 0.4158
Original score: 0.7358, Original label: True
Final score: 0.3359


Attacking:  32%|███▏      | 318/1000 [00:13<00:28, 24.14it/s]

Original score: 0.2445, Original label: False
Final score: 0.3688
Original score: 0.2351, Original label: False
Final score: 0.3576
Original score: 0.4963, Original label: False
Final score: 0.4849
Original score: 0.6473, Original label: True
Final score: 0.3981
Original score: 0.8023, Original label: True
Final score: 0.3888


Attacking:  32%|███▏      | 321/1000 [00:13<00:28, 23.91it/s]

Original score: 0.3463, Original label: False
Final score: 0.4230
Original score: 0.1984, Original label: False
Final score: 0.3705
Original score: 0.7741, Original label: True
Final score: 0.3879
Original score: 0.2692, Original label: False


Attacking:  32%|███▏      | 324/1000 [00:13<00:27, 24.18it/s]

Final score: 0.2920
Original score: 0.8098, Original label: True
Final score: 0.6980


Attacking:  33%|███▎      | 327/1000 [00:13<00:28, 23.82it/s]

Original score: 0.3877, Original label: False
Final score: 0.2557
Original score: 0.2370, Original label: False
Final score: 0.3317
Original score: 0.1982, Original label: False
Final score: 0.2333
Original score: 0.3489, Original label: False


Attacking:  33%|███▎      | 330/1000 [00:13<00:28, 23.63it/s]

Final score: 0.4064
Original score: 0.6907, Original label: True
Final score: 0.4003


Attacking:  33%|███▎      | 333/1000 [00:13<00:27, 24.34it/s]

Original score: 0.7624, Original label: True
Final score: 0.5537
Original score: 0.3570, Original label: False
Final score: 0.4151
Original score: 0.7465, Original label: True
Final score: 0.4451
Original score: 0.1434, Original label: False
Final score: 0.2194
Original score: 0.6990, Original label: True
Final score: 0.4257
Original score: 0.3906, Original label: False


Attacking:  34%|███▍      | 339/1000 [00:14<00:27, 24.30it/s]

Final score: 0.3151
Original score: 0.7112, Original label: True
Final score: 0.4052
Original score: 0.3756, Original label: False
Final score: 0.3234
Original score: 0.6489, Original label: True
Final score: 0.3248
Original score: 0.7211, Original label: True
Final score: 0.4874
Original score: 0.2187, Original label: False


Attacking:  34%|███▍      | 342/1000 [00:14<00:27, 24.25it/s]

Final score: 0.3053
Original score: 0.7038, Original label: True
Final score: 0.4725
Original score: 0.3657, Original label: False
Final score: 0.6502
Original score: 0.7705, Original label: True
Final score: 0.4176
Original score: 0.7323, Original label: True


Attacking:  34%|███▍      | 345/1000 [00:14<00:26, 24.33it/s]

Final score: 0.3129
Original score: 0.6469, Original label: True


Attacking:  35%|███▍      | 348/1000 [00:14<00:26, 24.17it/s]

Final score: 0.4301
Original score: 0.3003, Original label: False
Final score: 0.2491
Original score: 0.3041, Original label: False
Final score: 0.3342
Original score: 0.7243, Original label: True
Final score: 0.4335
Original score: 0.7892, Original label: True
Final score: 0.4920
Original score: 0.3383, Original label: False


Attacking:  35%|███▌      | 354/1000 [00:14<00:26, 24.33it/s]

Final score: 0.3817
Original score: 0.7839, Original label: True
Final score: 0.4537
Original score: 0.7097, Original label: True
Final score: 0.3701
Original score: 0.2473, Original label: False
Final score: 0.3457
Original score: 0.3347, Original label: False
Final score: 0.3604
Original score: 0.7309, Original label: True


Attacking:  36%|███▌      | 357/1000 [00:14<00:26, 24.07it/s]

Final score: 0.3621
Original score: 0.2702, Original label: False
Final score: 0.3387
Original score: 0.3172, Original label: False
Final score: 0.3653
Original score: 0.7794, Original label: True
Final score: 0.6792
Original score: 0.3054, Original label: False


Attacking:  36%|███▌      | 360/1000 [00:14<00:26, 23.96it/s]

Final score: 0.3519
Original score: 0.7184, Original label: True


Attacking:  36%|███▋      | 363/1000 [00:15<00:26, 24.09it/s]

Final score: 0.3342
Original score: 0.2035, Original label: False
Final score: 0.3166
Original score: 0.3082, Original label: False
Final score: 0.3479
Original score: 0.7480, Original label: True
Final score: 0.3742
Original score: 0.7558, Original label: True
Final score: 0.3420
Original score: 0.6331, Original label: True


Attacking:  37%|███▋      | 369/1000 [00:15<00:25, 24.36it/s]

Final score: 0.3027
Original score: 0.3250, Original label: False
Final score: 0.3117
Original score: 0.2071, Original label: False
Final score: 0.2751
Original score: 0.3704, Original label: False
Final score: 0.3958
Original score: 0.2628, Original label: False
Final score: 0.4260
Original score: 0.2619, Original label: False


Attacking:  38%|███▊      | 375/1000 [00:15<00:25, 24.47it/s]

Final score: 0.2833
Original score: 0.6680, Original label: True
Final score: 0.3558
Original score: 0.7728, Original label: True
Final score: 0.4801
Original score: 0.3065, Original label: False
Final score: 0.2880
Original score: 0.7328, Original label: True
Final score: 0.3346
Original score: 0.7573, Original label: True


Attacking:  38%|███▊      | 378/1000 [00:15<00:25, 24.37it/s]

Final score: 0.3983
Original score: 0.6414, Original label: True
Final score: 0.3732
Original score: 0.6864, Original label: True
Final score: 0.5154
Original score: 0.7679, Original label: True
Final score: 0.3405
Original score: 0.7425, Original label: True
Final score: 0.3870
Original score: 0.2199, Original label: False
Final score: 0.2518


Attacking:  38%|███▊      | 384/1000 [00:15<00:25, 24.55it/s]

Original score: 0.3059, Original label: False
Final score: 0.2961
Original score: 0.7420, Original label: True
Final score: 0.3697
Original score: 0.3605, Original label: False
Final score: 0.4794
Original score: 0.3331, Original label: False
Final score: 0.2699
Original score: 0.2771, Original label: False
Final score: 0.3775
Original score: 0.4497, Original label: False


Attacking:  39%|███▉      | 390/1000 [00:16<00:24, 24.73it/s]

Final score: 0.3141
Original score: 0.3568, Original label: False
Final score: 0.3469
Original score: 0.4470, Original label: False
Final score: 0.4751
Original score: 0.4367, Original label: False
Final score: 0.4699
Original score: 0.2572, Original label: False
Final score: 0.4186
Original score: 0.2412, Original label: False


Attacking:  39%|███▉      | 393/1000 [00:16<00:24, 24.69it/s]

Final score: 0.3268
Original score: 0.5297, Original label: True
Final score: 0.3705
Original score: 0.7456, Original label: True
Final score: 0.4499
Original score: 0.3610, Original label: False
Final score: 0.2916
Original score: 0.6895, Original label: True


Attacking:  40%|███▉      | 396/1000 [00:16<00:24, 24.62it/s]

Final score: 0.4231
Original score: 0.3591, Original label: False
Final score: 0.3536


Attacking:  40%|███▉      | 399/1000 [00:16<00:24, 24.68it/s]

Original score: 0.5861, Original label: True
Final score: 0.3270
Original score: 0.8070, Original label: True
Final score: 0.4504
Original score: 0.3394, Original label: False
Final score: 0.3238
Original score: 0.2467, Original label: False
Final score: 0.2628


Attacking:  40%|████      | 402/1000 [00:16<00:24, 24.53it/s]

Original score: 0.7154, Original label: True
Final score: 0.4000


Attacking:  40%|████      | 405/1000 [00:16<00:24, 24.54it/s]

Original score: 0.6328, Original label: True
Final score: 0.3845
Original score: 0.7796, Original label: True
Final score: 0.6922
Original score: 0.2798, Original label: False
Final score: 0.4092
Original score: 0.7100, Original label: True
Final score: 0.4432
Original score: 0.3660, Original label: False
Final score: 0.3712


Attacking:  41%|████      | 411/1000 [00:17<00:24, 24.27it/s]

Original score: 0.7431, Original label: True
Final score: 0.4204
Original score: 0.6214, Original label: True
Final score: 0.3667
Original score: 0.3336, Original label: False
Final score: 0.3133
Original score: 0.2153, Original label: False
Final score: 0.6050
Original score: 0.3103, Original label: False
Final score: 0.4353


Attacking:  41%|████▏     | 414/1000 [00:17<00:24, 23.99it/s]

Original score: 0.3170, Original label: False
Final score: 0.3258
Original score: 0.3764, Original label: False
Final score: 0.4438
Original score: 0.6799, Original label: True
Final score: 0.4229
Original score: 0.4334, Original label: False
Final score: 0.4102


Attacking:  42%|████▏     | 417/1000 [00:17<00:24, 24.20it/s]

Original score: 0.7403, Original label: True
Final score: 0.4838


Attacking:  42%|████▏     | 420/1000 [00:17<00:24, 24.07it/s]

Original score: 0.6552, Original label: True
Final score: 0.3083
Original score: 0.3363, Original label: False
Final score: 0.3293
Original score: 0.7832, Original label: True
Final score: 0.4433
Original score: 0.7124, Original label: True
Final score: 0.3963
Original score: 0.7536, Original label: True
Final score: 0.3970


Attacking:  43%|████▎     | 426/1000 [00:17<00:23, 24.47it/s]

Original score: 0.5461, Original label: True
Final score: 0.3291
Original score: 0.2735, Original label: False
Final score: 0.3880
Original score: 0.7375, Original label: True
Final score: 0.4937
Original score: 0.7126, Original label: True
Final score: 0.5318
Original score: 0.2692, Original label: False
Final score: 0.2924


Attacking:  43%|████▎     | 429/1000 [00:17<00:23, 24.38it/s]

Original score: 0.2157, Original label: False
Final score: 0.6325
Original score: 0.2876, Original label: False
Final score: 0.3333
Original score: 0.2535, Original label: False
Final score: 0.3063
Original score: 0.6388, Original label: True
Final score: 0.3723
Original score: 0.7641, Original label: True
Final score: 0.4783


Attacking:  43%|████▎     | 432/1000 [00:17<00:23, 24.62it/s]

Original score: 0.2855, Original label: False


Attacking:  44%|████▎     | 435/1000 [00:18<00:23, 24.51it/s]

Final score: 0.4148
Original score: 0.7965, Original label: True
Final score: 0.6789
Original score: 0.7203, Original label: True
Final score: 0.4535
Original score: 0.7155, Original label: True
Final score: 0.3402
Original score: 0.7847, Original label: True
Final score: 0.3297
Original score: 0.7441, Original label: True


Attacking:  44%|████▍     | 441/1000 [00:18<00:22, 24.45it/s]

Final score: 0.4998
Original score: 0.3469, Original label: False
Final score: 0.3261
Original score: 0.3535, Original label: False
Final score: 0.3427
Original score: 0.3782, Original label: False
Final score: 0.3638
Original score: 0.4729, Original label: False
Final score: 0.5832
Original score: 0.2578, Original label: False


Attacking:  45%|████▍     | 447/1000 [00:18<00:22, 24.27it/s]

Final score: 0.3237
Original score: 0.7558, Original label: True
Final score: 0.3647
Original score: 0.3230, Original label: False
Final score: 0.4177
Original score: 0.6918, Original label: True
Final score: 0.5707
Original score: 0.7727, Original label: True
Final score: 0.3458
Original score: 0.7109, Original label: True


Attacking:  45%|████▌     | 450/1000 [00:18<00:22, 24.62it/s]

Final score: 0.5205
Original score: 0.2415, Original label: False
Final score: 0.4572
Original score: 0.7543, Original label: True
Final score: 0.3269
Original score: 0.6858, Original label: True
Final score: 0.5835
Original score: 0.4028, Original label: False
Final score: 0.3812
Original score: 0.7333, Original label: True


Attacking:  46%|████▌     | 456/1000 [00:18<00:22, 24.19it/s]

Final score: 0.4034
Original score: 0.7434, Original label: True
Final score: 0.4422
Original score: 0.2174, Original label: False
Final score: 0.4628
Original score: 0.5078, Original label: True
Final score: 0.4360
Original score: 0.2894, Original label: False
Final score: 0.2646
Original score: 0.7611, Original label: True


Attacking:  46%|████▌     | 462/1000 [00:19<00:22, 24.39it/s]

Final score: 0.3656
Original score: 0.3762, Original label: False
Final score: 0.3015
Original score: 0.3561, Original label: False
Final score: 0.5920
Original score: 0.6174, Original label: True
Final score: 0.4693
Original score: 0.6766, Original label: True
Final score: 0.3270
Original score: 0.6807, Original label: True


Attacking:  46%|████▋     | 465/1000 [00:19<00:22, 23.96it/s]

Final score: 0.4417
Original score: 0.7358, Original label: True
Final score: 0.4567
Original score: 0.7044, Original label: True
Final score: 0.4490
Original score: 0.7254, Original label: True
Final score: 0.3847
Original score: 0.2835, Original label: False
Final score: 0.3768
Original score: 0.6955, Original label: True


Attacking:  47%|████▋     | 471/1000 [00:19<00:21, 24.24it/s]

Final score: 0.4185
Original score: 0.3566, Original label: False
Final score: 0.4133
Original score: 0.6200, Original label: True
Final score: 0.4560
Original score: 0.7384, Original label: True
Final score: 0.3932
Original score: 0.7237, Original label: True
Final score: 0.3720
Original score: 0.2199, Original label: False


Attacking:  48%|████▊     | 477/1000 [00:19<00:21, 24.01it/s]

Final score: 0.3562
Original score: 0.2870, Original label: False
Final score: 0.3266
Original score: 0.7172, Original label: True
Final score: 0.5743
Original score: 0.6095, Original label: True
Final score: 0.3608
Original score: 0.7504, Original label: True
Final score: 0.4090
Original score: 0.2656, Original label: False


Attacking:  48%|████▊     | 480/1000 [00:19<00:22, 23.60it/s]

Final score: 0.3402
Original score: 0.5381, Original label: True
Final score: 0.3761
Original score: 0.6248, Original label: True
Final score: 0.3626
Original score: 0.3685, Original label: False
Final score: 0.3954
Original score: 0.3857, Original label: False
Final score: 0.3537
Original score: 0.7188, Original label: True


Attacking:  49%|████▊     | 486/1000 [00:20<00:21, 24.34it/s]

Final score: 0.4037
Original score: 0.7650, Original label: True
Final score: 0.3258
Original score: 0.6529, Original label: True
Final score: 0.4229
Original score: 0.2140, Original label: False
Final score: 0.3060
Original score: 0.3669, Original label: False
Final score: 0.3858
Original score: 0.4251, Original label: False


Attacking:  49%|████▉     | 492/1000 [00:20<00:21, 24.16it/s]

Final score: 0.3715
Original score: 0.2667, Original label: False
Final score: 0.4121
Original score: 0.2787, Original label: False
Final score: 0.2796
Original score: 0.7525, Original label: True
Final score: 0.4048
Original score: 0.6309, Original label: True
Final score: 0.3171
Original score: 0.4033, Original label: False


Attacking:  50%|████▉     | 495/1000 [00:20<00:20, 24.33it/s]

Final score: 0.3839
Original score: 0.7598, Original label: True
Final score: 0.6587
Original score: 0.6728, Original label: True
Final score: 0.4268
Original score: 0.7800, Original label: True
Final score: 0.3924
Original score: 0.8260, Original label: True
Final score: 0.4503
Original score: 0.3063, Original label: False
Final score: 0.2637


Attacking:  50%|█████     | 501/1000 [00:20<00:20, 24.53it/s]

Original score: 0.3976, Original label: False
Final score: 0.3566
Original score: 0.5994, Original label: True
Final score: 0.4217
Original score: 0.7929, Original label: True
Final score: 0.6884
Original score: 0.3009, Original label: False
Final score: 0.2849
Original score: 0.6502, Original label: True
Final score: 0.4946


Attacking:  51%|█████     | 507/1000 [00:20<00:20, 24.51it/s]

Original score: 0.7233, Original label: True
Final score: 0.5830
Original score: 0.7571, Original label: True
Final score: 0.3681
Original score: 0.5072, Original label: True
Final score: 0.4275
Original score: 0.7618, Original label: True
Final score: 0.3596
Original score: 0.6910, Original label: True
Final score: 0.4574


Attacking:  51%|█████▏    | 513/1000 [00:21<00:19, 24.48it/s]

Original score: 0.7752, Original label: True
Final score: 0.4251
Original score: 0.7557, Original label: True
Final score: 0.6287
Original score: 0.2610, Original label: False
Final score: 0.5496
Original score: 0.8006, Original label: True
Final score: 0.5364
Original score: 0.7212, Original label: True
Final score: 0.5271


Attacking:  52%|█████▏    | 516/1000 [00:21<00:19, 24.31it/s]

Original score: 0.6658, Original label: True
Final score: 0.4037
Original score: 0.7054, Original label: True
Final score: 0.3207
Original score: 0.4033, Original label: False
Final score: 0.3535
Original score: 0.7292, Original label: True
Final score: 0.4936
Original score: 0.4032, Original label: False
Final score: 0.3354


Attacking:  52%|█████▏    | 522/1000 [00:21<00:19, 24.00it/s]

Original score: 0.8318, Original label: True
Final score: 0.6165
Original score: 0.3111, Original label: False
Final score: 0.3743
Original score: 0.6466, Original label: True
Final score: 0.3585
Original score: 0.2871, Original label: False
Final score: 0.3968
Original score: 0.6613, Original label: True
Final score: 0.3935


Attacking:  53%|█████▎    | 528/1000 [00:21<00:19, 24.12it/s]

Original score: 0.2905, Original label: False
Final score: 0.3120
Original score: 0.3490, Original label: False
Final score: 0.3809
Original score: 0.5366, Original label: True
Final score: 0.4641
Original score: 0.7265, Original label: True
Final score: 0.4608
Original score: 0.7911, Original label: True
Final score: 0.5949


Attacking:  53%|█████▎    | 531/1000 [00:21<00:19, 24.27it/s]

Original score: 0.7436, Original label: True
Final score: 0.5942
Original score: 0.7339, Original label: True
Final score: 0.3430
Original score: 0.2277, Original label: False
Final score: 0.4652
Original score: 0.7428, Original label: True
Final score: 0.3257
Original score: 0.2010, Original label: False
Final score: 0.2716
Original score: 0.7051, Original label: True


Attacking:  54%|█████▎    | 537/1000 [00:22<00:19, 24.31it/s]

Final score: 0.4006
Original score: 0.3931, Original label: False
Final score: 0.3474
Original score: 0.6199, Original label: True
Final score: 0.5022
Original score: 0.4645, Original label: False
Final score: 0.3789
Original score: 0.2575, Original label: False
Final score: 0.4162
Original score: 0.1867, Original label: False


Attacking:  54%|█████▍    | 543/1000 [00:22<00:18, 24.17it/s]

Final score: 0.2935
Original score: 0.6915, Original label: True
Final score: 0.4585
Original score: 0.3747, Original label: False
Final score: 0.3481
Original score: 0.7607, Original label: True
Final score: 0.4095
Original score: 0.3132, Original label: False
Final score: 0.2581
Original score: 0.2283, Original label: False


Attacking:  55%|█████▍    | 546/1000 [00:22<00:18, 24.22it/s]

Final score: 0.2684
Original score: 0.2398, Original label: False
Final score: 0.3801
Original score: 0.6402, Original label: True
Final score: 0.3521
Original score: 0.3012, Original label: False
Final score: 0.3807
Original score: 0.2068, Original label: False
Final score: 0.2659
Original score: 0.6829, Original label: True


Attacking:  55%|█████▌    | 552/1000 [00:22<00:18, 24.16it/s]

Final score: 0.3070
Original score: 0.7013, Original label: True
Final score: 0.6347
Original score: 0.2559, Original label: False
Final score: 0.2569
Original score: 0.3979, Original label: False
Final score: 0.3789
Original score: 0.3182, Original label: False
Final score: 0.3428
Original score: 0.6301, Original label: True


Attacking:  56%|█████▌    | 558/1000 [00:23<00:18, 23.94it/s]

Final score: 0.4181
Original score: 0.1913, Original label: False
Final score: 0.2874
Original score: 0.3371, Original label: False
Final score: 0.2917
Original score: 0.6720, Original label: True
Final score: 0.4102
Original score: 0.2845, Original label: False
Final score: 0.3221
Original score: 0.6977, Original label: True


Attacking:  56%|█████▌    | 561/1000 [00:23<00:18, 24.27it/s]

Final score: 0.3970
Original score: 0.7877, Original label: True
Final score: 0.4613
Original score: 0.3067, Original label: False
Final score: 0.3732
Original score: 0.2698, Original label: False
Final score: 0.2704
Original score: 0.2924, Original label: False
Final score: 0.6534
Original score: 0.3226, Original label: False
Final score: 0.2893


Attacking:  57%|█████▋    | 567/1000 [00:23<00:17, 24.30it/s]

Original score: 0.3154, Original label: False
Final score: 0.3675
Original score: 0.3864, Original label: False
Final score: 0.3730
Original score: 0.4631, Original label: False
Final score: 0.4002
Original score: 0.3594, Original label: False
Final score: 0.2695
Original score: 0.3649, Original label: False
Final score: 0.4439


Attacking:  57%|█████▋    | 573/1000 [00:23<00:17, 24.37it/s]

Original score: 0.4375, Original label: False
Final score: 0.3365
Original score: 0.2459, Original label: False
Final score: 0.4743
Original score: 0.7339, Original label: True
Final score: 0.4101
Original score: 0.2710, Original label: False
Final score: 0.3067
Original score: 0.2715, Original label: False
Final score: 0.4137


Attacking:  58%|█████▊    | 579/1000 [00:23<00:17, 24.52it/s]

Original score: 0.6813, Original label: True
Final score: 0.3352
Original score: 0.4328, Original label: False
Final score: 0.3738
Original score: 0.3328, Original label: False
Final score: 0.3425
Original score: 0.3352, Original label: False
Final score: 0.3229
Original score: 0.4448, Original label: False
Final score: 0.3816


Attacking:  58%|█████▊    | 582/1000 [00:24<00:17, 23.97it/s]

Original score: 0.7292, Original label: True
Final score: 0.4147
Original score: 0.7423, Original label: True
Final score: 0.4466
Original score: 0.7749, Original label: True
Final score: 0.5505
Original score: 0.2620, Original label: False
Final score: 0.3808
Original score: 0.6678, Original label: True
Final score: 0.5050


Attacking:  59%|█████▉    | 588/1000 [00:24<00:16, 24.63it/s]

Original score: 0.6770, Original label: True
Final score: 0.4278
Original score: 0.7477, Original label: True
Final score: 0.5993
Original score: 0.3260, Original label: False
Final score: 0.3457
Original score: 0.7522, Original label: True
Final score: 0.4426
Original score: 0.4070, Original label: False
Final score: 0.3954
Original score: 0.2651, Original label: False


Attacking:  59%|█████▉    | 594/1000 [00:24<00:15, 25.57it/s]

Final score: 0.3038
Original score: 0.3807, Original label: False
Final score: 0.4781
Original score: 0.4194, Original label: False
Final score: 0.5402
Original score: 0.2223, Original label: False
Final score: 0.3144
Original score: 0.3735, Original label: False
Final score: 0.3505
Original score: 0.4165, Original label: False
Final score: 0.3772


Attacking:  60%|██████    | 600/1000 [00:24<00:16, 24.88it/s]

Original score: 0.6864, Original label: True
Final score: 0.3988
Original score: 0.3008, Original label: False
Final score: 0.4108
Original score: 0.3310, Original label: False
Final score: 0.3676
Original score: 0.2084, Original label: False
Final score: 0.3317
Original score: 0.2273, Original label: False
Final score: 0.3363


Attacking:  60%|██████    | 603/1000 [00:24<00:16, 24.80it/s]

Original score: 0.7771, Original label: True
Final score: 0.3814
Original score: 0.6970, Original label: True
Final score: 0.3697
Original score: 0.7056, Original label: True
Final score: 0.4974
Original score: 0.3242, Original label: False
Final score: 0.4238
Original score: 0.7002, Original label: True
Final score: 0.4216


Attacking:  61%|██████    | 609/1000 [00:25<00:16, 24.39it/s]

Original score: 0.1493, Original label: False
Final score: 0.3116
Original score: 0.4781, Original label: False
Final score: 0.4237
Original score: 0.7502, Original label: True
Final score: 0.4470
Original score: 0.2131, Original label: False
Final score: 0.2721
Original score: 0.2672, Original label: False
Final score: 0.3122


Attacking:  62%|██████▏   | 615/1000 [00:25<00:15, 24.17it/s]

Original score: 0.7329, Original label: True
Final score: 0.5662
Original score: 0.5721, Original label: True
Final score: 0.4262
Original score: 0.4209, Original label: False
Final score: 0.3951
Original score: 0.6730, Original label: True
Final score: 0.4125
Original score: 0.2874, Original label: False
Final score: 0.4128


Attacking:  62%|██████▏   | 618/1000 [00:25<00:15, 24.38it/s]

Original score: 0.3461, Original label: False
Final score: 0.3220
Original score: 0.7245, Original label: True
Final score: 0.3705
Original score: 0.7383, Original label: True
Final score: 0.5497
Original score: 0.7196, Original label: True
Final score: 0.4598
Original score: 0.1459, Original label: False
Final score: 0.4580


Attacking:  62%|██████▏   | 624/1000 [00:25<00:15, 24.57it/s]

Original score: 0.5443, Original label: True
Final score: 0.4332
Original score: 0.7729, Original label: True
Final score: 0.3839
Original score: 0.7518, Original label: True
Final score: 0.4269
Original score: 0.3021, Original label: False
Final score: 0.3784
Original score: 0.2501, Original label: False
Final score: 0.4072
Original score: 0.2821, Original label: False


Attacking:  63%|██████▎   | 630/1000 [00:26<00:15, 24.44it/s]

Final score: 0.3047
Original score: 0.4288, Original label: False
Final score: 0.5130
Original score: 0.7311, Original label: True
Final score: 0.4232
Original score: 0.3424, Original label: False
Final score: 0.4024
Original score: 0.6677, Original label: True
Final score: 0.3113
Original score: 0.5022, Original label: True


Attacking:  63%|██████▎   | 633/1000 [00:26<00:14, 24.68it/s]

Final score: 0.4235
Original score: 0.4769, Original label: False
Final score: 0.5997
Original score: 0.7644, Original label: True
Final score: 0.4525
Original score: 0.3075, Original label: False
Final score: 0.3994
Original score: 0.4323, Original label: False
Final score: 0.3578
Original score: 0.7655, Original label: True


Attacking:  64%|██████▍   | 639/1000 [00:26<00:14, 24.56it/s]

Final score: 0.4373
Original score: 0.7105, Original label: True
Final score: 0.4785
Original score: 0.4271, Original label: False
Final score: 0.3701
Original score: 0.6278, Original label: True
Final score: 0.3516
Original score: 0.7592, Original label: True
Final score: 0.3421
Original score: 0.3206, Original label: False


Attacking:  64%|██████▍   | 645/1000 [00:26<00:14, 24.36it/s]

Final score: 0.3780
Original score: 0.6048, Original label: True
Final score: 0.3744
Original score: 0.3573, Original label: False
Final score: 0.3926
Original score: 0.2562, Original label: False
Final score: 0.3697
Original score: 0.6917, Original label: True
Final score: 0.6545
Original score: 0.3047, Original label: False


Attacking:  65%|██████▍   | 648/1000 [00:26<00:14, 24.41it/s]

Final score: 0.4124
Original score: 0.7605, Original label: True
Final score: 0.3785
Original score: 0.5839, Original label: True
Final score: 0.4229
Original score: 0.4115, Original label: False
Final score: 0.4526
Original score: 0.2653, Original label: False
Final score: 0.3470
Original score: 0.7193, Original label: True


Attacking:  65%|██████▌   | 654/1000 [00:27<00:14, 24.01it/s]

Final score: 0.5837
Original score: 0.3267, Original label: False
Final score: 0.4202
Original score: 0.6979, Original label: True
Final score: 0.3364
Original score: 0.2619, Original label: False
Final score: 0.2891
Original score: 0.7443, Original label: True
Final score: 0.5039
Original score: 0.6501, Original label: True


Attacking:  66%|██████▌   | 660/1000 [00:27<00:14, 24.25it/s]

Final score: 0.4257
Original score: 0.3023, Original label: False
Final score: 0.3640
Original score: 0.4921, Original label: False
Final score: 0.4366
Original score: 0.4233, Original label: False
Final score: 0.3218
Original score: 0.4314, Original label: False
Final score: 0.6471
Original score: 0.4799, Original label: False


Attacking:  66%|██████▋   | 663/1000 [00:27<00:13, 24.31it/s]

Final score: 0.4436
Original score: 0.3088, Original label: False
Final score: 0.4106
Original score: 0.7844, Original label: True
Final score: 0.4779
Original score: 0.7521, Original label: True
Final score: 0.3985
Original score: 0.4087, Original label: False
Final score: 0.3948
Original score: 0.7371, Original label: True


Attacking:  67%|██████▋   | 669/1000 [00:27<00:13, 24.23it/s]

Final score: 0.5201
Original score: 0.3056, Original label: False
Final score: 0.3231
Original score: 0.3034, Original label: False
Final score: 0.4491
Original score: 0.3174, Original label: False
Final score: 0.3005
Original score: 0.3520, Original label: False
Final score: 0.3593
Original score: 0.7129, Original label: True


Attacking:  68%|██████▊   | 675/1000 [00:27<00:13, 23.93it/s]

Final score: 0.5802
Original score: 0.1526, Original label: False
Final score: 0.2991
Original score: 0.3301, Original label: False
Final score: 0.4318
Original score: 0.3214, Original label: False
Final score: 0.5673
Original score: 0.2399, Original label: False
Final score: 0.2614
Original score: 0.7356, Original label: True


Attacking:  68%|██████▊   | 678/1000 [00:28<00:13, 24.53it/s]

Final score: 0.4504
Original score: 0.7351, Original label: True
Final score: 0.7098
Original score: 0.2747, Original label: False
Final score: 0.3046
Original score: 0.7624, Original label: True
Final score: 0.4833
Original score: 0.6637, Original label: True
Final score: 0.4491
Original score: 0.7150, Original label: True
Final score: 0.4103


Attacking:  68%|██████▊   | 684/1000 [00:28<00:12, 24.53it/s]

Original score: 0.6807, Original label: True
Final score: 0.4197
Original score: 0.7287, Original label: True
Final score: 0.4134
Original score: 0.3520, Original label: False
Final score: 0.3730
Original score: 0.4811, Original label: False
Final score: 0.3828
Original score: 0.2708, Original label: False
Final score: 0.4511
Original score: 0.8293, Original label: True


Attacking:  69%|██████▉   | 690/1000 [00:28<00:12, 24.81it/s]

Final score: 0.6639
Original score: 0.7553, Original label: True
Final score: 0.4887
Original score: 0.2162, Original label: False
Final score: 0.2711
Original score: 0.2501, Original label: False
Final score: 0.3220
Original score: 0.5662, Original label: True
Final score: 0.4047
Original score: 0.2841, Original label: False


Attacking:  70%|██████▉   | 696/1000 [00:28<00:12, 24.25it/s]

Final score: 0.2734
Original score: 0.6745, Original label: True
Final score: 0.5650
Original score: 0.2689, Original label: False
Final score: 0.3385
Original score: 0.2214, Original label: False
Final score: 0.3060
Original score: 0.6745, Original label: True
Final score: 0.6533
Original score: 0.2956, Original label: False


Attacking:  70%|██████▉   | 699/1000 [00:28<00:12, 24.44it/s]

Final score: 0.3414
Original score: 0.3950, Original label: False
Final score: 0.3526
Original score: 0.6288, Original label: True
Final score: 0.4243
Original score: 0.7367, Original label: True
Final score: 0.3595
Original score: 0.7269, Original label: True
Final score: 0.4028
Original score: 0.6256, Original label: True


Attacking:  70%|███████   | 705/1000 [00:29<00:12, 24.17it/s]

Final score: 0.3869
Original score: 0.7132, Original label: True
Final score: 0.5068
Original score: 0.5994, Original label: True
Final score: 0.3303
Original score: 0.2793, Original label: False
Final score: 0.2015
Original score: 0.7982, Original label: True
Final score: 0.5548
Original score: 0.7174, Original label: True


Attacking:  71%|███████   | 711/1000 [00:29<00:12, 24.04it/s]

Final score: 0.4625
Original score: 0.3504, Original label: False
Final score: 0.6593
Original score: 0.5803, Original label: True
Final score: 0.3878
Original score: 0.6539, Original label: True
Final score: 0.3903
Original score: 0.3279, Original label: False
Final score: 0.3642
Original score: 0.6716, Original label: True


Attacking:  71%|███████▏  | 714/1000 [00:29<00:11, 24.81it/s]

Final score: 0.4531
Original score: 0.7108, Original label: True
Final score: 0.5338
Original score: 0.5549, Original label: True
Final score: 0.4529
Original score: 0.1930, Original label: False
Final score: 0.3649
Original score: 0.2347, Original label: False
Final score: 0.2512
Original score: 0.7424, Original label: True
Final score: 0.4162


Attacking:  72%|███████▏  | 720/1000 [00:29<00:11, 24.81it/s]

Original score: 0.5843, Original label: True
Final score: 0.4294
Original score: 0.6619, Original label: True
Final score: 0.4312
Original score: 0.2975, Original label: False
Final score: 0.3373
Original score: 0.6732, Original label: True
Final score: 0.5903
Original score: 0.3756, Original label: False
Final score: 0.3641


Attacking:  73%|███████▎  | 726/1000 [00:29<00:11, 24.32it/s]

Original score: 0.7630, Original label: True
Final score: 0.4506
Original score: 0.6814, Original label: True
Final score: 0.3940
Original score: 0.2897, Original label: False
Final score: 0.3604
Original score: 0.2558, Original label: False
Final score: 0.3288
Original score: 0.7320, Original label: True
Final score: 0.4408


Attacking:  73%|███████▎  | 732/1000 [00:30<00:11, 24.06it/s]

Original score: 0.2700, Original label: False
Final score: 0.4413
Original score: 0.2597, Original label: False
Final score: 0.3139
Original score: 0.7422, Original label: True
Final score: 0.4950
Original score: 0.2882, Original label: False
Final score: 0.2708
Original score: 0.6251, Original label: True
Final score: 0.3610


Attacking:  74%|███████▎  | 735/1000 [00:30<00:10, 24.43it/s]

Original score: 0.2409, Original label: False
Final score: 0.3568
Original score: 0.2345, Original label: False
Final score: 0.3705
Original score: 0.6748, Original label: True
Final score: 0.3284
Original score: 0.4374, Original label: False
Final score: 0.4188
Original score: 0.4057, Original label: False
Final score: 0.3746
Original score: 0.7562, Original label: True


Attacking:  74%|███████▍  | 741/1000 [00:30<00:10, 24.33it/s]

Final score: 0.3883
Original score: 0.6553, Original label: True
Final score: 0.2868
Original score: 0.6455, Original label: True
Final score: 0.5310
Original score: 0.5999, Original label: True
Final score: 0.3087
Original score: 0.7414, Original label: True
Final score: 0.4004
Original score: 0.8000, Original label: True


Attacking:  75%|███████▍  | 747/1000 [00:30<00:10, 24.51it/s]

Final score: 0.6692
Original score: 0.7222, Original label: True
Final score: 0.5212
Original score: 0.8146, Original label: True
Final score: 0.4646
Original score: 0.3254, Original label: False
Final score: 0.2888
Original score: 0.7055, Original label: True
Final score: 0.4996
Original score: 0.2500, Original label: False
Final score: 0.3201


Attacking:  75%|███████▌  | 753/1000 [00:31<00:10, 24.26it/s]

Original score: 0.2992, Original label: False
Final score: 0.4023
Original score: 0.3883, Original label: False
Final score: 0.4227
Original score: 0.6660, Original label: True
Final score: 0.2939
Original score: 0.2800, Original label: False
Final score: 0.2953
Original score: 0.7673, Original label: True
Final score: 0.3158


Attacking:  76%|███████▌  | 756/1000 [00:31<00:10, 24.37it/s]

Original score: 0.6686, Original label: True
Final score: 0.4005
Original score: 0.2065, Original label: False
Final score: 0.3589
Original score: 0.3051, Original label: False
Final score: 0.3582
Original score: 0.4241, Original label: False
Final score: 0.3217
Original score: 0.7173, Original label: True
Final score: 0.3966


Attacking:  76%|███████▌  | 762/1000 [00:31<00:09, 24.20it/s]

Original score: 0.3430, Original label: False
Final score: 0.2835
Original score: 0.2928, Original label: False
Final score: 0.2544
Original score: 0.6815, Original label: True
Final score: 0.3972
Original score: 0.6690, Original label: True
Final score: 0.4289
Original score: 0.6549, Original label: True
Final score: 0.4813


Attacking:  77%|███████▋  | 768/1000 [00:31<00:09, 24.06it/s]

Original score: 0.6985, Original label: True
Final score: 0.4767
Original score: 0.4768, Original label: False
Final score: 0.3430
Original score: 0.3055, Original label: False
Final score: 0.2742
Original score: 0.6744, Original label: True
Final score: 0.4397
Original score: 0.3429, Original label: False
Final score: 0.3654


Attacking:  77%|███████▋  | 771/1000 [00:31<00:09, 24.23it/s]

Original score: 0.2608, Original label: False
Final score: 0.3063
Original score: 0.7419, Original label: True
Final score: 0.3439
Original score: 0.6548, Original label: True
Final score: 0.4603
Original score: 0.6714, Original label: True
Final score: 0.3656
Original score: 0.3103, Original label: False
Final score: 0.3558


Attacking:  78%|███████▊  | 777/1000 [00:32<00:09, 24.23it/s]

Original score: 0.2250, Original label: False
Final score: 0.2926
Original score: 0.6031, Original label: True
Final score: 0.4654
Original score: 0.7066, Original label: True
Final score: 0.5390
Original score: 0.3718, Original label: False
Final score: 0.5443
Original score: 0.8693, Original label: True
Final score: 0.5362


Attacking:  78%|███████▊  | 783/1000 [00:32<00:09, 23.94it/s]

Original score: 0.2421, Original label: False
Final score: 0.4073
Original score: 0.2074, Original label: False
Final score: 0.4221
Original score: 0.6941, Original label: True
Final score: 0.3563
Original score: 0.2820, Original label: False
Final score: 0.3210
Original score: 0.7599, Original label: True
Final score: 0.3496


Attacking:  79%|███████▊  | 786/1000 [00:32<00:08, 24.15it/s]

Original score: 0.6841, Original label: True
Final score: 0.5741
Original score: 0.6783, Original label: True
Final score: 0.3712
Original score: 0.3624, Original label: False
Final score: 0.3287
Original score: 0.5829, Original label: True
Final score: 0.4653
Original score: 0.2665, Original label: False
Final score: 0.3475


Attacking:  79%|███████▉  | 792/1000 [00:32<00:08, 23.95it/s]

Original score: 0.3547, Original label: False
Final score: 0.4312
Original score: 0.7063, Original label: True
Final score: 0.3719
Original score: 0.6971, Original label: True
Final score: 0.3844
Original score: 0.4344, Original label: False
Final score: 0.3601
Original score: 0.2854, Original label: False
Final score: 0.3574


Attacking:  80%|███████▉  | 798/1000 [00:32<00:08, 23.95it/s]

Original score: 0.3024, Original label: False
Final score: 0.4244
Original score: 0.3401, Original label: False
Final score: 0.3150
Original score: 0.7937, Original label: True
Final score: 0.3721
Original score: 0.2730, Original label: False
Final score: 0.3752
Original score: 0.2663, Original label: False
Final score: 0.2827


Attacking:  80%|████████  | 801/1000 [00:33<00:08, 23.78it/s]

Original score: 0.7467, Original label: True
Final score: 0.4170
Original score: 0.4279, Original label: False
Final score: 0.3520
Original score: 0.2685, Original label: False
Final score: 0.5746
Original score: 0.7765, Original label: True
Final score: 0.3919
Original score: 0.6369, Original label: True
Final score: 0.3764


Attacking:  81%|████████  | 807/1000 [00:33<00:07, 24.26it/s]

Original score: 0.3447, Original label: False
Final score: 0.3609
Original score: 0.1734, Original label: False
Final score: 0.3423
Original score: 0.2264, Original label: False
Final score: 0.3249
Original score: 0.6201, Original label: True
Final score: 0.3665
Original score: 0.8099, Original label: True
Final score: 0.6804


Attacking:  81%|████████▏ | 813/1000 [00:33<00:07, 23.93it/s]

Original score: 0.2548, Original label: False
Final score: 0.4005
Original score: 0.8236, Original label: True
Final score: 0.4736
Original score: 0.7358, Original label: True
Final score: 0.3964
Original score: 0.2761, Original label: False
Final score: 0.3908
Original score: 0.3183, Original label: False
Final score: 0.3702


Attacking:  82%|████████▏ | 816/1000 [00:33<00:07, 23.88it/s]

Original score: 0.7041, Original label: True
Final score: 0.4010
Original score: 0.7815, Original label: True
Final score: 0.4257
Original score: 0.7381, Original label: True
Final score: 0.4085
Original score: 0.7576, Original label: True
Final score: 0.4873
Original score: 0.4091, Original label: False
Final score: 0.3454


Attacking:  82%|████████▏ | 822/1000 [00:33<00:07, 23.73it/s]

Original score: 0.2526, Original label: False
Final score: 0.2725
Original score: 0.8364, Original label: True
Final score: 0.5846
Original score: 0.3685, Original label: False
Final score: 0.3398
Original score: 0.7178, Original label: True
Final score: 0.4714
Original score: 0.1765, Original label: False
Final score: 0.3271


Attacking:  83%|████████▎ | 828/1000 [00:34<00:07, 24.29it/s]

Original score: 0.3587, Original label: False
Final score: 0.4440
Original score: 0.2996, Original label: False
Final score: 0.3766
Original score: 0.7536, Original label: True
Final score: 0.5200
Original score: 0.3328, Original label: False
Final score: 0.3005
Original score: 0.3933, Original label: False
Final score: 0.4909
Original score: 0.3452, Original label: False


Attacking:  83%|████████▎ | 831/1000 [00:34<00:06, 24.21it/s]

Final score: 0.4113
Original score: 0.7298, Original label: True
Final score: 0.3511
Original score: 0.2317, Original label: False
Final score: 0.2215
Original score: 0.7320, Original label: True
Final score: 0.4217
Original score: 0.3219, Original label: False
Final score: 0.3390
Original score: 0.3677, Original label: False


Attacking:  84%|████████▎ | 837/1000 [00:34<00:06, 24.53it/s]

Final score: 0.2834
Original score: 0.3747, Original label: False
Final score: 0.4115
Original score: 0.3588, Original label: False
Final score: 0.4596
Original score: 0.7373, Original label: True
Final score: 0.3640
Original score: 0.7513, Original label: True
Final score: 0.4866
Original score: 0.7913, Original label: True


Attacking:  84%|████████▍ | 843/1000 [00:34<00:06, 24.01it/s]

Final score: 0.5181
Original score: 0.3130, Original label: False
Final score: 0.2768
Original score: 0.3675, Original label: False
Final score: 0.4727
Original score: 0.6202, Original label: True
Final score: 0.3985
Original score: 0.3331, Original label: False
Final score: 0.3262
Original score: 0.6621, Original label: True


Attacking:  85%|████████▍ | 846/1000 [00:34<00:06, 24.48it/s]

Final score: 0.4928
Original score: 0.7501, Original label: True
Final score: 0.5626
Original score: 0.3811, Original label: False
Final score: 0.4702
Original score: 0.5808, Original label: True
Final score: 0.4215
Original score: 0.6597, Original label: True
Final score: 0.5136
Original score: 0.7626, Original label: True
Final score: 0.5400


Attacking:  85%|████████▌ | 852/1000 [00:35<00:06, 24.38it/s]

Original score: 0.3607, Original label: False
Final score: 0.3491
Original score: 0.2957, Original label: False
Final score: 0.4599
Original score: 0.7113, Original label: True
Final score: 0.3956
Original score: 0.6831, Original label: True
Final score: 0.4617
Original score: 0.2544, Original label: False
Final score: 0.3818


Attacking:  86%|████████▌ | 858/1000 [00:35<00:05, 24.27it/s]

Original score: 0.6689, Original label: True
Final score: 0.4487
Original score: 0.3741, Original label: False
Final score: 0.3330
Original score: 0.3070, Original label: False
Final score: 0.4597
Original score: 0.6133, Original label: True
Final score: 0.3816
Original score: 0.3856, Original label: False
Final score: 0.3635


Attacking:  86%|████████▋ | 864/1000 [00:35<00:05, 24.04it/s]

Original score: 0.3433, Original label: False
Final score: 0.4687
Original score: 0.7544, Original label: True
Final score: 0.4118
Original score: 0.3433, Original label: False
Final score: 0.4038
Original score: 0.2280, Original label: False
Final score: 0.2979
Original score: 0.3854, Original label: False
Final score: 0.4080


Attacking:  87%|████████▋ | 867/1000 [00:35<00:05, 24.03it/s]

Original score: 0.3048, Original label: False
Final score: 0.3454
Original score: 0.8119, Original label: True
Final score: 0.4138
Original score: 0.7185, Original label: True
Final score: 0.3923
Original score: 0.3987, Original label: False
Final score: 0.5034
Original score: 0.6215, Original label: True
Final score: 0.3421


Attacking:  87%|████████▋ | 873/1000 [00:36<00:05, 23.86it/s]

Original score: 0.3850, Original label: False
Final score: 0.4158
Original score: 0.7214, Original label: True
Final score: 0.4546
Original score: 0.6509, Original label: True
Final score: 0.4146
Original score: 0.8062, Original label: True
Final score: 0.5021
Original score: 0.5231, Original label: True
Final score: 0.3618


Attacking:  88%|████████▊ | 879/1000 [00:36<00:04, 24.27it/s]

Original score: 0.2605, Original label: False
Final score: 0.2075
Original score: 0.2555, Original label: False
Final score: 0.3048
Original score: 0.7705, Original label: True
Final score: 0.5011
Original score: 0.3586, Original label: False
Final score: 0.3828
Original score: 0.6723, Original label: True
Final score: 0.3887


Attacking:  88%|████████▊ | 882/1000 [00:36<00:04, 24.20it/s]

Original score: 0.7127, Original label: True
Final score: 0.4727
Original score: 0.2451, Original label: False
Final score: 0.3277
Original score: 0.7161, Original label: True
Final score: 0.3377
Original score: 0.6593, Original label: True
Final score: 0.4213
Original score: 0.3523, Original label: False
Final score: 0.3305


Attacking:  89%|████████▉ | 888/1000 [00:36<00:04, 24.14it/s]

Original score: 0.2094, Original label: False
Final score: 0.4025
Original score: 0.3040, Original label: False
Final score: 0.4093
Original score: 0.3353, Original label: False
Final score: 0.3110
Original score: 0.7716, Original label: True
Final score: 0.4168
Original score: 0.7864, Original label: True
Final score: 0.4929


Attacking:  89%|████████▉ | 894/1000 [00:36<00:04, 23.71it/s]

Original score: 0.4991, Original label: False
Final score: 0.4578
Original score: 0.3690, Original label: False
Final score: 0.5333
Original score: 0.2483, Original label: False
Final score: 0.3774
Original score: 0.6768, Original label: True
Final score: 0.4417
Original score: 0.8148, Original label: True
Final score: 0.5324


Attacking:  90%|████████▉ | 897/1000 [00:37<00:04, 23.88it/s]

Original score: 0.7241, Original label: True
Final score: 0.3912
Original score: 0.2034, Original label: False
Final score: 0.3000
Original score: 0.7260, Original label: True
Final score: 0.4389
Original score: 0.2437, Original label: False
Final score: 0.3000
Original score: 0.7150, Original label: True
Final score: 0.3011


Attacking:  90%|█████████ | 903/1000 [00:37<00:04, 24.18it/s]

Original score: 0.3331, Original label: False
Final score: 0.3739
Original score: 0.6893, Original label: True
Final score: 0.4189
Original score: 0.5362, Original label: True
Final score: 0.4402
Original score: 0.3368, Original label: False
Final score: 0.2411
Original score: 0.8306, Original label: True
Final score: 0.4010


Attacking:  91%|█████████ | 909/1000 [00:37<00:03, 23.63it/s]

Original score: 0.5940, Original label: True
Final score: 0.4126
Original score: 0.1963, Original label: False
Final score: 0.3074
Original score: 0.7715, Original label: True
Final score: 0.4313
Original score: 0.5097, Original label: True
Final score: 0.3538
Original score: 0.7109, Original label: True
Final score: 0.3903


Attacking:  91%|█████████ | 912/1000 [00:37<00:03, 23.62it/s]

Original score: 0.7403, Original label: True
Final score: 0.3921
Original score: 0.2701, Original label: False
Final score: 0.6576
Original score: 0.2168, Original label: False
Final score: 0.5267
Original score: 0.2197, Original label: False
Final score: 0.2177
Original score: 0.3766, Original label: False
Final score: 0.3869


Attacking:  92%|█████████▏| 918/1000 [00:37<00:03, 23.29it/s]

Original score: 0.7739, Original label: True
Final score: 0.4245
Original score: 0.4328, Original label: False
Final score: 0.2619
Original score: 0.4871, Original label: False
Final score: 0.2890
Original score: 0.7553, Original label: True
Final score: 0.5311
Original score: 0.2773, Original label: False
Final score: 0.3771


Attacking:  92%|█████████▏| 924/1000 [00:38<00:03, 23.44it/s]

Original score: 0.2696, Original label: False
Final score: 0.3805
Original score: 0.6795, Original label: True
Final score: 0.4874
Original score: 0.7050, Original label: True
Final score: 0.3653
Original score: 0.7581, Original label: True
Final score: 0.7063
Original score: 0.7077, Original label: True
Final score: 0.4614


Attacking:  93%|█████████▎| 927/1000 [00:38<00:03, 23.58it/s]

Original score: 0.7252, Original label: True
Final score: 0.3178
Original score: 0.2473, Original label: False
Final score: 0.3824
Original score: 0.7537, Original label: True
Final score: 0.4688
Original score: 0.7126, Original label: True
Final score: 0.3431
Original score: 0.3432, Original label: False
Final score: 0.3363


Attacking:  93%|█████████▎| 933/1000 [00:38<00:02, 23.70it/s]

Original score: 0.6692, Original label: True
Final score: 0.3881
Original score: 0.7758, Original label: True
Final score: 0.5082
Original score: 0.7217, Original label: True
Final score: 0.3083
Original score: 0.1864, Original label: False
Final score: 0.4211
Original score: 0.6800, Original label: True
Final score: 0.5940


Attacking:  94%|█████████▍| 939/1000 [00:38<00:02, 23.83it/s]

Original score: 0.3438, Original label: False
Final score: 0.3629
Original score: 0.3281, Original label: False
Final score: 0.3641
Original score: 0.6966, Original label: True
Final score: 0.4268
Original score: 0.7249, Original label: True
Final score: 0.3869
Original score: 0.7668, Original label: True
Final score: 0.4124


Attacking:  94%|█████████▍| 942/1000 [00:38<00:02, 23.80it/s]

Original score: 0.3170, Original label: False
Final score: 0.3029
Original score: 0.2894, Original label: False
Final score: 0.4163
Original score: 0.2604, Original label: False
Final score: 0.3411
Original score: 0.5841, Original label: True
Final score: 0.5260
Original score: 0.6150, Original label: True
Final score: 0.3675


Attacking:  95%|█████████▍| 948/1000 [00:39<00:02, 24.15it/s]

Original score: 0.7382, Original label: True
Final score: 0.5550
Original score: 0.6910, Original label: True
Final score: 0.4262
Original score: 0.7275, Original label: True
Final score: 0.4495
Original score: 0.3105, Original label: False
Final score: 0.3802
Original score: 0.3339, Original label: False
Final score: 0.3844
Original score: 0.6940, Original label: True


Attacking:  95%|█████████▌| 954/1000 [00:39<00:01, 24.30it/s]

Final score: 0.4035
Original score: 0.7241, Original label: True
Final score: 0.3893
Original score: 0.8049, Original label: True
Final score: 0.5010
Original score: 0.3047, Original label: False
Final score: 0.3108
Original score: 0.7079, Original label: True
Final score: 0.3837
Original score: 0.2305, Original label: False


Attacking:  96%|█████████▌| 957/1000 [00:39<00:01, 24.11it/s]

Final score: 0.3140
Original score: 0.3258, Original label: False
Final score: 0.3204
Original score: 0.7347, Original label: True
Final score: 0.3724
Original score: 0.2885, Original label: False
Final score: 0.2339
Original score: 0.7263, Original label: True
Final score: 0.4546
Original score: 0.3389, Original label: False


Attacking:  96%|█████████▋| 963/1000 [00:39<00:01, 24.17it/s]

Final score: 0.3112
Original score: 0.7034, Original label: True
Final score: 0.4267
Original score: 0.6983, Original label: True
Final score: 0.5517
Original score: 0.2615, Original label: False
Final score: 0.3836
Original score: 0.4436, Original label: False
Final score: 0.3582
Original score: 0.3575, Original label: False


Attacking:  97%|█████████▋| 969/1000 [00:40<00:01, 24.32it/s]

Final score: 0.4995
Original score: 0.4414, Original label: False
Final score: 0.3387
Original score: 0.2803, Original label: False
Final score: 0.3345
Original score: 0.6779, Original label: True
Final score: 0.4967
Original score: 0.7470, Original label: True
Final score: 0.3308
Original score: 0.7389, Original label: True


Attacking:  97%|█████████▋| 972/1000 [00:40<00:01, 24.33it/s]

Final score: 0.3550
Original score: 0.2505, Original label: False
Final score: 0.3236
Original score: 0.6870, Original label: True
Final score: 0.4884
Original score: 0.7240, Original label: True
Final score: 0.5187
Original score: 0.2436, Original label: False
Final score: 0.2398
Original score: 0.7359, Original label: True
Final score: 0.3796


Attacking:  98%|█████████▊| 978/1000 [00:40<00:00, 24.09it/s]

Original score: 0.7828, Original label: True
Final score: 0.5117
Original score: 0.8035, Original label: True
Final score: 0.4918
Original score: 0.2086, Original label: False
Final score: 0.3885
Original score: 0.2783, Original label: False
Final score: 0.4214
Original score: 0.7735, Original label: True
Final score: 0.4372


Attacking:  98%|█████████▊| 984/1000 [00:40<00:00, 24.35it/s]

Original score: 0.6835, Original label: True
Final score: 0.3650
Original score: 0.2468, Original label: False
Final score: 0.3610
Original score: 0.3112, Original label: False
Final score: 0.3166
Original score: 0.6999, Original label: True
Final score: 0.5343
Original score: 0.7083, Original label: True
Final score: 0.3140


Attacking:  99%|█████████▉| 990/1000 [00:40<00:00, 23.84it/s]

Original score: 0.5777, Original label: True
Final score: 0.4568
Original score: 0.6586, Original label: True
Final score: 0.3702
Original score: 0.4192, Original label: False
Final score: 0.3230
Original score: 0.6870, Original label: True
Final score: 0.3437
Original score: 0.2763, Original label: False
Final score: 0.4081


Attacking:  99%|█████████▉| 993/1000 [00:41<00:00, 24.00it/s]

Original score: 0.1959, Original label: False
Final score: 0.2983
Original score: 0.4744, Original label: False
Final score: 0.2768
Original score: 0.6841, Original label: True
Final score: 0.3827
Original score: 0.3111, Original label: False
Final score: 0.2730
Original score: 0.2622, Original label: False
Final score: 0.3659


Attacking: 100%|██████████| 1000/1000 [00:41<00:00, 24.17it/s]

Original score: 0.2080, Original label: False
Final score: 0.3496
Original score: 0.2476, Original label: False
Final score: 0.3877
Original score: 0.4869, Original label: False
Final score: 0.3172
Original score: 0.5719, Original label: True
Final score: 0.4861
Original score: 0.3726, Original label: False
Final score: 0.4146

FINAL EVALUATION RESULTS:
Attack Success Rate (ASR): 43.3%
Average Number of Changes (ANC): 37.78
Average Queries (AQ): 2.0
Final Score: 0.3366
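
The per-sample lines in the log above can be turned back into a flip count. The following is a minimal sketch, assuming a flip means the original and final scores fall on opposite sides of 0.5 (the threshold used in `baseline_attack`); whether `evaluate_attack` counts success in exactly this way is not shown here. `flip_rate` and `SAMPLE_LOG` are hypothetical names introduced for illustration. The parser tolerates progress-bar lines that land between an `Original score` line and its `Final score` line.

```python
import re

# Patterns for the two kinds of per-sample log lines printed during the attack.
ORIGINAL_RE = re.compile(r"Original score: ([0-9.]+), Original label: (True|False)")
FINAL_RE = re.compile(r"Final score: ([0-9.]+)")


def flip_rate(log_text, threshold=0.5):
    """Pair each 'Original score' line with the next 'Final score' line.

    Returns (number_of_flips, number_of_pairs). Any other line, such as a
    tqdm progress bar or a blank line, is ignored.
    """
    pending_original = None
    pairs = []
    for line in log_text.splitlines():
        match = ORIGINAL_RE.search(line)
        if match:
            pending_original = float(match.group(1))
            continue
        match = FINAL_RE.search(line)
        if match and pending_original is not None:
            pairs.append((pending_original, float(match.group(1))))
            pending_original = None
    flips = sum((orig > threshold) != (final > threshold) for orig, final in pairs)
    return flips, len(pairs)


# A few lines copied from the log, including a pair split by a progress bar
# (the bar itself is abbreviated here).
SAMPLE_LOG = """Original score: 0.3811, Original label: False
Final score: 0.4702
Original score: 0.5808, Original label: True
Final score: 0.4215
Original score: 0.6940, Original label: True

Attacking:  95%| 954/1000 [00:39<00:01, 24.30it/s]

Final score: 0.4035
"""

flips, total = flip_rate(SAMPLE_LOG)
print(f"{flips} of {total} sampled decisions flipped")
```

Running the same function over the whole captured log gives a quick cross-check against the reported ASR.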





## Baseline scoring

In [None]:
def baseline_attack(image_np_uint8: np.ndarray, caption_str: str, api: BlackBoxAPI, budgets: dict) -> dict:
    """Trivial baseline: return the inputs unchanged and report the resulting costs.

    Students should replace the body with a real attack. `budgets` is unused here
    because the baseline makes no edits.
    """
    
    original_image = image_np_uint8.copy()
    original_caption = caption_str
    
    # Get original score to determine target (flip the decision)
    original_score = api.score(original_image, original_caption)

    final_image = original_image        # Replace with the perturbed image when implementing image attacks
    final_caption = original_caption    # Replace with the edited caption when implementing text attacks

    # Get final score (this is the second query, so the baseline spends 2 queries per sample)
    final_score = api.score(final_image, final_caption)
    
    # Check if attack succeeded (decision flipped)
    original_decision = original_score > 0.5
    final_decision = final_score > 0.5
    success = (original_decision != final_decision)
    
    # Calculate costs
    token_cost = token_edit_cost(original_caption, final_caption)
    pixel_cost = pixel_edit_cost(original_image, final_image)
    query_cost = api.query_count
    
    return {
        'success': success,
        'image': final_image,
        'caption': final_caption,
        'token_cost': token_cost,
        'pixel_cost': pixel_cost,  
        'query_cost': query_cost,
        'original_score': original_score,
        'final_score': final_score
    }
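
As a contrast to the no-op baseline, here is a minimal sketch of a query-based text attack that deletes one whitespace-separated token at a time and keeps the deletion that moves the score furthest toward the other side of the threshold. It is an illustration under stated assumptions, not the lab's reference solution: `StubAPI` is a hypothetical stand-in for `BlackBoxAPI` (only a `score` method and a `query_count` attribute are assumed), the toy scorer is invented so the example runs without the model, and counting one deletion as one token edit is an assumption about how `token_edit_cost` works.

```python
class StubAPI:
    """Hypothetical stand-in for BlackBoxAPI: score() plus a query counter."""

    def __init__(self, keywords):
        self.keywords = set(keywords)
        self.query_count = 0

    def score(self, image, caption):
        # Toy scorer: fraction of caption tokens that are "matching" keywords.
        self.query_count += 1
        tokens = caption.split()
        if not tokens:
            return 0.0
        return sum(t in self.keywords for t in tokens) / len(tokens)


def greedy_token_drop(image, caption, api, budgets, threshold=0.5):
    """Greedily delete tokens until the decision flips or a budget runs out."""
    original_score = api.score(image, caption)
    original_decision = original_score > threshold
    tokens = caption.split()
    current_score = original_score
    edits = 0

    while edits < budgets['T_MAX'] and len(tokens) > 1:
        best = None  # (gain, score, remaining_tokens)
        for i in range(len(tokens)):
            if api.query_count >= budgets['Q_MAX']:
                break  # stop querying once the query budget is spent
            candidate = tokens[:i] + tokens[i + 1:]
            score = api.score(image, ' '.join(candidate))
            # Positive gain means the score moved toward the opposite decision.
            gain = (current_score - score) if original_decision else (score - current_score)
            if best is None or gain > best[0]:
                best = (gain, score, candidate)
        if best is None or best[0] <= 0:
            break  # no deletion helps, give up
        _, current_score, tokens = best
        edits += 1
        if (current_score > threshold) != original_decision:
            break  # decision flipped

    return {
        'success': (current_score > threshold) != original_decision,
        'image': image,                 # image is left untouched
        'caption': ' '.join(tokens),
        'token_cost': edits,            # assumed: one deletion = one token edit
        'pixel_cost': 0,
        'query_cost': api.query_count,
        'original_score': original_score,
        'final_score': current_score,
    }


api = StubAPI(keywords={'dog', 'park'})
result = greedy_token_drop(None, 'a dog in the park', api, {'T_MAX': 10, 'P_MAX': 100, 'Q_MAX': 100})
print(result['success'], result['caption'], result['token_cost'], result['query_cost'])
```

The sketch reuses the score of the chosen candidate as the final score instead of querying again, which saves one query per sample compared with the baseline's structure. Greedy search spends roughly one query per remaining token in each round, so on long captions the query budget can run out before the token budget does.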

In [None]:


# Run Evaluation
print("Running evaluation on validation set...")

# Define attack budgets
attack_budgets = {
    'T_MAX': 10,    # Maximum token edits per sample
    'P_MAX': 100,   # Maximum pixel edits per sample
    'Q_MAX': 100    # Maximum queries per sample
}

# Evaluate on subset first (faster for testing)
print("Running on first 50 samples for quick testing...")
eval_result = evaluate_attack(
    val_pairs=val_pairs, 
    attack_function=baseline_attack,
    alpha=alpha,
    beta=beta, 
    budgets=attack_budgets,
    max_samples=50  # Quick test on 50 samples
)

# Print results
print(f"\nEVALUATION RESULTS (50 samples):")
print(f"{'='*50}")
print(f"Attack Success Rate (ASR): {eval_result['ASR']:.1%}")
print(f"Average Number of Changes (ANC): {eval_result['ANC']:.2f}")  
print(f"Average Queries (AQ): {eval_result['AQ']:.1f}")
print(f"Final Score: {eval_result['Score']:.4f}")
print(f"{'='*50}")
print(f"Budget Usage:")
print(f"  Avg Token Cost: {eval_result['avg_token_cost']:.2f}/{attack_budgets['T_MAX']}")
print(f"  Avg Pixel Cost: {eval_result['avg_pixel_cost']:.2f}/{attack_budgets['P_MAX']}")  
print(f"  Avg Query Cost: {eval_result['AQ']:.1f}/{attack_budgets['Q_MAX']}")
print(f"\nNOTE: This is a trivial baseline (0% ASR expected)")
print(f"Students should implement sophisticated attacks to improve ASR!")

Running evaluation on validation set...
Running on first 50 samples for quick testing...
Starting evaluation...
Evaluating on 50 samples


Attacking: 100%|██████████| 50/50 [00:01<00:00, 28.00it/s]


EVALUATION RESULTS (50 samples):
Attack Success Rate (ASR): 0.0%
Average Number of Changes (ANC): 0.00
Average Queries (AQ): 2.0
Final Score: -0.0020
Budget Usage:
  Avg Token Cost: 0.00/10
  Avg Pixel Cost: 0.00/100
  Avg Query Cost: 2.0/100

NOTE: This is a trivial baseline (0% ASR expected)
Students should implement sophisticated attacks to improve ASR!
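
Because an attack must stay within T_MAX, P_MAX and Q_MAX, it can help to check a result dictionary against the budgets before submitting it for evaluation. The helper below is a hypothetical sketch (`budget_violations` is not part of the notebook), and it assumes only the result keys returned by `baseline_attack`. How `evaluate_attack` itself treats over-budget samples is not shown in this section.

```python
def budget_violations(result, budgets):
    """Return a list of human-readable budget violations (empty if none)."""
    checks = [
        ('token_cost', 'T_MAX'),
        ('pixel_cost', 'P_MAX'),
        ('query_cost', 'Q_MAX'),
    ]
    violations = []
    for cost_key, budget_key in checks:
        if result[cost_key] > budgets[budget_key]:
            violations.append(
                f"{cost_key}={result[cost_key]} exceeds {budget_key}={budgets[budget_key]}"
            )
    return violations


budgets = {'T_MAX': 10, 'P_MAX': 100, 'Q_MAX': 100}

# Costs matching the trivial baseline: no edits, two queries.
baseline_like = {'token_cost': 0, 'pixel_cost': 0, 'query_cost': 2}
# A made-up result that overspends the token budget.
over_budget = {'token_cost': 11, 'pixel_cost': 40, 'query_cost': 100}

print(budget_violations(baseline_like, budgets))
print(budget_violations(over_budget, budgets))
```

Costs equal to a limit are treated as allowed here, since the limits are described as maximums.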





In [None]:
# Full evaluation on the whole validation set (swap in your own attack function before running)

def run_full_evaluation():
    """Run evaluation on all 1000 validation samples."""
    print("Running FULL evaluation on all 1000 validation samples...")
    print("This may take several minutes depending on your attack implementation.")
    
    full_result = evaluate_attack(
        val_pairs=val_pairs,
        attack_function=baseline_attack,  # replace with your own attack function
        alpha=alpha,
        beta=beta,
        budgets=attack_budgets,
        max_samples=None  # All samples
    )
    
    print(f"\nFINAL EVALUATION RESULTS:")
    print(f"{'='*60}")
    print(f"Attack Success Rate (ASR): {full_result['ASR']:.1%}")
    print(f"Average Number of Changes (ANC): {full_result['ANC']:.2f}")
    print(f"Average Queries (AQ): {full_result['AQ']:.1f}")
    print(f"Final Score: {full_result['Score']:.4f}")
    print(f"{'='*60}")
    
    return full_result

# Runs the full evaluation immediately; comment the next line out to skip it:
full_results = run_full_evaluation()

print("To run full evaluation on all 1000 samples:")
print("Uncomment: full_results = run_full_evaluation()")
print("\nCurrent status: Framework ready for student implementations!")

Running FULL evaluation on all 1000 validation samples...
This may take several minutes depending on your attack implementation.
Starting evaluation...
Evaluating on 1000 samples


Attacking: 100%|██████████| 1000/1000 [00:36<00:00, 27.70it/s]


FINAL EVALUATION RESULTS:
Attack Success Rate (ASR): 0.3%
Average Number of Changes (ANC): 0.00
Average Queries (AQ): 2.0
Final Score: 0.0010
To run full evaluation on all 1000 samples:
Uncomment: full_results = run_full_evaluation()

Current status: Framework ready for student implementations!
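
The three result blocks in this section are numerically consistent with a linear score of the form Score = ASR - alpha * ANC - beta * AQ. The sketch below checks that. The formula and the values alpha = 0.0025 and beta = 0.001 are inferred from the printed numbers, not read from `evaluate_attack` or from the notebook's `alpha` and `beta` variables, so treat them as assumptions to verify against the actual definitions.

```python
def linear_score(asr, anc, aq, alpha, beta):
    """Assumed scoring rule: reward success, penalize edits and queries."""
    return asr - alpha * anc - beta * aq


# Inferred from the printed results, not taken from the notebook.
ALPHA_GUESS = 0.0025
BETA_GUESS = 0.001

# (ASR, ANC, AQ, reported Final Score), copied from the outputs in this section.
runs = {
    'attack, 1000 samples': (0.433, 37.78, 2.0, 0.3366),
    'baseline, 50 samples': (0.0, 0.0, 2.0, -0.0020),
    'baseline, 1000 samples': (0.003, 0.0, 2.0, 0.0010),
}

estimates = {}
for name, (asr, anc, aq, reported) in runs.items():
    estimates[name] = linear_score(asr, anc, aq, ALPHA_GUESS, BETA_GUESS)
    print(f"{name}: estimated {estimates[name]:.4f}, reported {reported:.4f}")
```

Under this reading, the no-op baseline scores slightly below zero on the 50-sample run because it still pays for its two queries per sample, and each extra edit has to buy more than alpha in success rate to be worth making.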



