# Adversarial Attack on ResNet-34 Image Classifier

This notebook implements an adversarial attack against a pre-trained ResNet-34 model. The goal is to generate imperceptible perturbations to images that cause the model to misclassify them, particularly focusing on removing the true class from the top-5 predictions.

## Setup and Imports

The following cell imports the required libraries, selects the compute device (GPU if available, otherwise CPU), and loads the pre-trained ResNet-34 in evaluation mode.

In [1]:
import torch
import torchvision
import numpy as np
import os
import shutil
import json
from tqdm import tqdm
from PIL import Image
import matplotlib.pyplot as plt
import torch.nn.functional as F
import torch.nn as nn
from torchvision import transforms

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Load the pre-trained ResNet-34 model
model = torchvision.models.resnet34(weights='IMAGENET1K_V1')
model = model.to(device)
model.eval()

Using device: cuda


Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth
100%|██████████| 83.3M/83.3M [00:00<00:00, 160MB/s] 


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

## Data Preprocessing

Setting up necessary normalization parameters and transformations for the image data. The preprocessing pipeline converts images to tensors and normalizes them according to ImageNet statistics. We also define the paths for our original and adversarial datasets.

In [2]:
# Set up normalization parameters
mean_norms = np.array([0.485, 0.456, 0.406])
std_norms = np.array([0.229, 0.224, 0.225])

# Create the transform pipeline
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=mean_norms, std=std_norms)
])

# Define dataset paths
dataset_path = "/kaggle/input/testdata/TestDataSet"
adversarial_path = "./AdversarialTestSet2"

# Clear and recreate adversarial directory to prevent duplicates
if os.path.exists(adversarial_path):
    shutil.rmtree(adversarial_path)
os.makedirs(adversarial_path, exist_ok=True)
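
To make the normalized value range concrete, the following plain-Python sketch applies the same per-channel formula as `transforms.Normalize` to the extreme pixel values 0 and 1. The helper name `normalize_value` is an assumption introduced for illustration; only the mean and standard deviation values come from the cell above.

```python
# ImageNet channel statistics, copied from the preprocessing cell
mean_norms = [0.485, 0.456, 0.406]
std_norms = [0.229, 0.224, 0.225]

def normalize_value(pixel, channel):
    """Map a raw pixel value in [0, 1] to normalized space for one channel (hypothetical helper)."""
    return (pixel - mean_norms[channel]) / std_norms[channel]

# Smallest and largest normalized values each channel can take
min_norm = [normalize_value(0.0, c) for c in range(3)]
max_norm = [normalize_value(1.0, c) for c in range(3)]

for c, name in enumerate(["R", "G", "B"]):
    print(f"{name}: [{min_norm[c]:.3f}, {max_norm[c]:.3f}]")
```

These per-channel bounds are the limits a perturbed image has to be clamped to when an attack operates directly on normalized tensors.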

## Custom Dataset Loader

Creating a custom dataset class that loads images from per-class folders and assigns each folder a label index. It mirrors torchvision's `ImageFolder` rather than using it directly, which gives more control over dataset handling and lets each sample also return its file path.

In [3]:
# Load the dataset using SimpleImageFolder class
class SimpleImageFolder(torch.utils.data.Dataset):
    def __init__(self, root, transform=None):
        self.transform = transform
        self.samples = []
        self.classes = []
        self.class_to_idx = {}
        
        # Get all valid directories
        class_dirs = [d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)) and not d.startswith('.')]
        class_dirs.sort()
        
        # For each directory, find all images
        for i, class_dir in enumerate(class_dirs):
            self.classes.append(class_dir)
            self.class_to_idx[class_dir] = i
            
            dir_path = os.path.join(root, class_dir)
            for img_file in os.listdir(dir_path):
                if img_file.lower().endswith(('.jpg', '.jpeg', '.png', '.bmp', '.tiff')):
                    img_path = os.path.join(dir_path, img_file)
                    self.samples.append((img_path, i))
        
        print(f"Loaded {len(self.samples)} images across {len(self.classes)} classes")
    
    def __len__(self):
        return len(self.samples)
    
    def __getitem__(self, idx):
        img_path, label = self.samples[idx]
        
        # Open image with PIL and convert to RGB
        img = Image.open(img_path).convert('RGB')
        
        if self.transform:
            img = self.transform(img)
        
        return img, label, img_path

dataset = SimpleImageFolder(dataset_path, transform=preprocess)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False)

Loaded 500 images across 100 classes
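
The folder-scanning logic can be tried in isolation on a throwaway directory. This sketch uses only the standard library; the folder and file names are made up, and `scan_image_folders` is a hypothetical stand-in for the indexing done in `SimpleImageFolder.__init__`, not the class itself. Sorting the file names inside each folder is an addition here to make the order deterministic.

```python
import os
import tempfile

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp', '.tiff')

def scan_image_folders(root):
    """Return (classes, samples) the way the dataset class indexes a directory tree."""
    class_dirs = sorted(
        d for d in os.listdir(root)
        if os.path.isdir(os.path.join(root, d)) and not d.startswith('.')
    )
    samples = []
    for idx, class_dir in enumerate(class_dirs):
        dir_path = os.path.join(root, class_dir)
        for fname in sorted(os.listdir(dir_path)):
            if fname.lower().endswith(IMAGE_EXTENSIONS):
                samples.append((os.path.join(dir_path, fname), idx))
    return class_dirs, samples

with tempfile.TemporaryDirectory() as root:
    # Made-up layout: two class folders, one hidden folder, one non-image file
    layout = {"cat": ["a.JPG", "b.png"], "dog": ["c.jpeg", "notes.txt"], ".cache": ["x.jpg"]}
    for folder, files in layout.items():
        os.makedirs(os.path.join(root, folder))
        for fname in files:
            open(os.path.join(root, folder, fname), "w").close()

    classes, samples = scan_image_folders(root)
    labels = [label for _, label in samples]
    print(classes, labels)
```

Hidden folders and non-image files are skipped, and label indices follow the sorted folder order.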


## Class Mapping

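Loading or creating a mapping from folder names to ImageNet class indices. This mapping lets us evaluate the model correctly by associating our dataset's folder structure with the pre-trained model's class indices. If no mapping file exists, the fallback assumes that the sorted folders correspond to consecutive ImageNet indices starting at 401 (401 to 500 for this 100-class dataset).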

In [4]:
# Load class mapping
try:
    with open('folder_to_class_mapping.json', 'r') as f:
        folder_to_class = json.load(f)
        folder_to_class = {k: int(v) for k, v in folder_to_class.items()}
        print(f"Loaded mapping for {len(folder_to_class)} folders from file")
except (FileNotFoundError, json.JSONDecodeError):  # no usable mapping file, so build one
    print("Creating mapping from folder names to ImageNet classes...")
    folder_names = dataset.classes
    
    folder_to_class = {}
    for i, folder in enumerate(folder_names):
        folder_to_class[folder] = 401 + i
    
    # Save the mapping for future use
    with open('folder_to_class_mapping.json', 'w') as f:
        json.dump(folder_to_class, f)
    
    print(f"Created mapping for {len(folder_to_class)} folders")

Creating mapping from folder names to ImageNet classes...
Created mapping for 100 folders
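
The fallback branch and the JSON round trip can be sketched as follows. The folder names are placeholders, and the offset of 401 is taken from the cell above; whether it matches the real ImageNet indices depends on the dataset.

```python
import json

# Placeholder folder names; the real ones come from dataset.classes
folder_names = sorted(["folder_b", "folder_a", "folder_c"])

# Assign consecutive indices starting at 401, in sorted folder order
folder_to_class = {folder: 401 + i for i, folder in enumerate(folder_names)}

# JSON keeps the string keys; int() guards against values stored as strings
restored = {k: int(v) for k, v in json.loads(json.dumps(folder_to_class)).items()}
print(restored)
```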


## Enhanced PGD Attack Implementation

This cell implements a variant of the Projected Gradient Descent (PGD) attack with momentum. Instead of maximizing cross-entropy, it minimizes a margin loss designed to push the true class out of the model's top-5 predictions. Key features include:

- Momentum-based gradient accumulation to stabilize the update direction and help escape poor local optima
- A custom margin loss targeting top-5 evasion
- L1 normalization of each gradient before it enters the momentum buffer
- Perturbation budget (`eps`) and pixel-range clamping both applied in normalized image space

In [5]:
# Enhanced PGD attack with momentum for Task 3 targeting top-5 evasion
def enhanced_pgd_attack(images, true_class, eps=0.0199, steps=20, alpha=None, momentum=0.9):
    """
    Enhanced PGD attack with momentum, specifically targeting top-5 evasion
    """
    # Get the dtype of the original images
    dtype = images.dtype
    
    if alpha is None:
        alpha = eps / 10  # Default step size
    
    # Convert true_class to tensor if needed
    if not isinstance(true_class, torch.Tensor):
        true_class = torch.tensor([true_class], device=device, dtype=torch.long)
    
    # Create tensors for normalization (match input tensor type)
    mean = torch.tensor(mean_norms, device=device, dtype=dtype).view(1, 3, 1, 1)
    std = torch.tensor(std_norms, device=device, dtype=dtype).view(1, 3, 1, 1)
    
    # Calculate valid pixel ranges in normalized space
    min_norm = (0 - mean) / std
    max_norm = (1 - mean) / std
    
    # Clone the original images
    adv = images.clone().detach()
    
    # Random start: add small uniform noise inside the epsilon ball so the attack
    # does not begin exactly at the clean image
    noise = torch.empty_like(adv, dtype=dtype).uniform_(-eps/2, eps/2)
    adv = adv + noise
    adv = torch.clamp(adv, min_norm, max_norm)
    
    # Initialize momentum buffer with matching dtype
    g = torch.zeros_like(adv, dtype=dtype)
    
    # PGD attack loop
    for step in range(steps):
        adv.requires_grad_(True)
        
        # Forward pass
        logits = model(adv)
        
        # Custom loss targeting top-5 evasion
        # Extract logits for the true class
        true_class_logits = logits.gather(1, true_class.unsqueeze(1)).squeeze(1)
        
        # Get all logits except true class
        mask = torch.ones_like(logits).scatter_(1, true_class.unsqueeze(1), 0)
        other_logits = mask * logits - (1 - mask) * 1e10  # Mask out true class
        
        # Get top 5 logits that are not the true class
        top5_other_logits, _ = other_logits.topk(5, dim=1)
        
        # Calculate margin between true class and 5th highest other class
        margin = true_class_logits - top5_other_logits[:, -1]
        
        # Loss is positive when true class is in top 5, so we minimize it
        loss = margin.mean()
        
        # Backward pass
        model.zero_grad()
        loss.backward()
        
        # Update with momentum
        with torch.no_grad():
            # Ensure grad is the right dtype before updating momentum
            grad = adv.grad.to(dtype=dtype)
            
            # Update momentum term (normalized gradient)
            g = momentum * g + grad / (torch.norm(grad, p=1) + 1e-8)  # Add epsilon to avoid division by zero
            
            # Update adversarial example
            adv.data = adv.data - alpha * g.sign()
            
            # Project back to epsilon ball around original image
            delta = torch.clamp(adv.data - images, -eps, eps)
            adv.data = images + delta
            
            # Ensure valid pixel values
            adv.data = torch.clamp(adv.data, min_norm, max_norm)
        
        # Reset gradient for next iteration
        adv.grad = None
    
    return adv.detach()
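
The margin used as the loss is easiest to check on a small list of logits. The sketch below recomputes it in plain Python; `top5_margin` and `in_top5` are hypothetical helpers and the logit values are invented.

```python
def top5_margin(logits, true_class):
    """True-class logit minus the 5th-largest logit among the other classes."""
    others = [v for i, v in enumerate(logits) if i != true_class]
    fifth_best_other = sorted(others, reverse=True)[4]
    return logits[true_class] - fifth_best_other

def in_top5(logits, true_class):
    """Check top-5 membership by ranking class indices by logit."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return true_class in ranked[:5]

logits = [2.0, 5.0, 1.0, 4.0, 3.5, 0.5, 3.0, 2.5]  # invented values, 8 classes

# Class 6 (logit 3.0) ranks 4th, so its margin is positive
print(top5_margin(logits, 6), in_top5(logits, 6))
# Class 2 (logit 1.0) ranks 7th, so its margin is negative
print(top5_margin(logits, 2), in_top5(logits, 2))
```

Ignoring ties, driving the margin below zero is equivalent to pushing the true class out of the top 5, which is why the attack minimizes it.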

## Alternative PGD Attack Implementation

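An alternative PGD attack that works in raw image space (pixel values in [0, 1]) rather than normalized space, so `epsilon` is expressed directly in pixel units. It maximizes the standard cross-entropy loss, which makes it an untargeted top-1 attack rather than a top-5 margin attack, and it is not called in the generation loop later in this notebook. The attack: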

1. Denormalizes images to work in raw pixel space 
2. Performs gradient updates in this space
3. Projects perturbations to the epsilon ball
4. Renormalizes before returning

In [6]:
# Alternative PGD attack working in raw image space
def pgd_attack_raw(model, images, true_class, epsilon=0.02, steps=10):
    # Ensure all tensors have matching dtype
    dtype = images.dtype
    
    mean = torch.tensor(mean_norms, device=images.device, dtype=dtype).view(1, 3, 1, 1)
    std = torch.tensor(std_norms, device=images.device, dtype=dtype).view(1, 3, 1, 1)
    
    # Denormalize to raw pixel space
    raw = images * std + mean
    
    # Create copy for adversarial example
    adv = raw.clone().detach().requires_grad_(True)
    
    # Calculate step size
    alpha = epsilon / 4  # Smaller steps for better exploration
    
    # Convert true_class to a long tensor on the same device as the images
    if not isinstance(true_class, torch.Tensor):
        true_class = torch.tensor([true_class], device=images.device, dtype=torch.long)
    
    # PGD attack loop
    for _ in range(steps):
        # Normalize for model input
        normalized = (adv - mean) / std
        
        # Forward pass
        outputs = model(normalized)
        
        # Calculate loss
        loss = nn.CrossEntropyLoss()(outputs, true_class)
        
        # Backward pass
        model.zero_grad()
        loss.backward()
        
        with torch.no_grad():
            # Gradient ascent step on the loss
            adv.data = adv + alpha * adv.grad.sign()
            
            # Project to epsilon ball around original image
            adv.data = torch.max(torch.min(adv, raw + epsilon), raw - epsilon)
            
            # Clamp to valid pixel range
            adv.data = torch.clamp(adv, 0, 1)
        
        # Clear the gradient so it does not accumulate across iterations
        adv.grad = None
    # Return normalized adversarial image
    adv_norm = (adv - mean) / std
    return adv_norm.detach()
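
One raw-space PGD iteration can be written out by hand for a few pixel values. The sketch uses invented pixel and gradient values and a hypothetical helper `pgd_step`; it illustrates the order of step, projection, and clamp rather than reproducing the function above.

```python
def pgd_step(adv, raw, grad, epsilon, alpha):
    """One signed-gradient ascent step, projected onto the L-infinity ball and clamped to [0, 1]."""
    updated = []
    for a, r, g in zip(adv, raw, grad):
        sign = (g > 0) - (g < 0)                    # sign of the gradient: -1, 0 or 1
        a = a + alpha * sign                        # move in the direction that increases the loss
        a = max(min(a, r + epsilon), r - epsilon)   # stay within epsilon of the original pixel
        a = max(min(a, 1.0), 0.0)                   # stay a valid pixel value
        updated.append(a)
    return updated

raw = [0.50, 0.995, 0.01]    # invented original pixels
grad = [0.3, 1.2, -0.7]      # invented loss gradients
epsilon, alpha = 0.02, 0.005

adv = list(raw)
for _ in range(10):          # gradient held fixed to keep the example small
    adv = pgd_step(adv, raw, grad, epsilon, alpha)
print(adv)
```

The first pixel stops at the edge of the epsilon ball, while the other two are stopped earlier by the valid pixel range.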

## Utility Functions

Helper functions for image conversion and model evaluation:

1. `tensor_to_pil`: Converts a normalized tensor to a PIL image for visualization
2. `evaluate_accuracy`: Calculates top-1 and top-5 accuracy of the model on a given dataset (it calls `labels.item()`, so the dataloader must use a batch size of 1)

In [7]:
# Function to convert tensor to PIL image (for visualization)
def tensor_to_pil(tensor):
    """
    Convert a tensor to a PIL image, properly denormalizing.
    """
    # Create a copy of the tensor and move to CPU
    img = tensor.clone().detach().cpu()
    
    # Denormalize
    for c in range(3):
        img[c] = img[c] * std_norms[c] + mean_norms[c]
    
    # Convert to PIL image
    img = img.permute(1, 2, 0).numpy() * 255.0
    img = Image.fromarray(np.uint8(np.clip(img, 0, 255)))
    return img

# Function to evaluate accuracy
def evaluate_accuracy(model, dataloader, folder_to_class):
    correct_top1 = 0
    correct_top5 = 0
    total = 0
    
    with torch.no_grad():
        for images, labels, _ in tqdm(dataloader, desc="Evaluating model"):
            images = images.to(device)
            folder_idx = labels.item()
            folder_name = dataloader.dataset.classes[folder_idx]
            
            # Skip if we don't have a mapping for this folder
            if folder_name not in folder_to_class:
                continue
                
            # Get the true class
            true_class = folder_to_class[folder_name]
            
            # Forward pass
            outputs = model(images)
            
            # Top-1 accuracy
            _, pred = outputs.max(1)
            correct_top1 += (pred.item() == true_class)
            
            # Top-5 accuracy
            _, top5 = outputs.topk(5, dim=1)
            correct_top5 += (true_class in top5.cpu().numpy()[0])
            
            total += 1
    
    # Calculate accuracy
    top1_acc = 100 * correct_top1 / total
    top5_acc = 100 * correct_top5 / total
    
    return top1_acc, top5_acc, total
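
The top-k bookkeeping can be illustrated without a model. This is a minimal sketch with invented scores; `topk_accuracy` is a hypothetical helper, not part of the notebook.

```python
def topk_accuracy(all_scores, true_classes, k):
    """Percentage of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for scores, true_class in zip(all_scores, true_classes):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        hits += true_class in ranked[:k]
    return 100 * hits / len(true_classes)

# Invented scores for 3 samples over 6 classes
all_scores = [
    [0.1, 0.9, 0.3, 0.2, 0.4, 0.5],   # true class 1: ranked first
    [0.6, 0.1, 0.5, 0.4, 0.3, 0.2],   # true class 2: ranked second
    [0.6, 0.5, 0.4, 0.3, 0.2, 0.1],   # true class 5: ranked last
]
true_classes = [1, 2, 5]

top1 = topk_accuracy(all_scores, true_classes, k=1)
top5 = topk_accuracy(all_scores, true_classes, k=5)
print(f"top-1: {top1:.2f}%, top-5: {top5:.2f}%")
```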

## Baseline Evaluation

Calculating the baseline accuracy of the model on the original test set. This establishes the reference point to measure the effectiveness of our adversarial attacks.

In [8]:
# Calculate baseline accuracy
print("\nCalculating baseline accuracy...")
baseline_top1, baseline_top5, baseline_total = evaluate_accuracy(model, dataloader, folder_to_class)
print(f"Baseline Top-1 Accuracy: {baseline_top1:.2f}%")
print(f"Baseline Top-5 Accuracy: {baseline_top5:.2f}%")


Calculating baseline accuracy...


Evaluating model: 100%|██████████| 500/500 [00:06<00:00, 75.33it/s]

Baseline Top-1 Accuracy: 76.00%
Baseline Top-5 Accuracy: 94.20%





## Adversarial Example Generation

This cell creates the adversarial examples by applying the enhanced PGD attack to each image in the dataset. The loop:

1. Processes each image from the test set
2. Applies the enhanced PGD attack targeting top-5 evasion (images whose true class is already outside the top 5 are kept unperturbed)
3. Verifies if the attack was successful (both for top-1 and top-5 metrics)
4. Saves each adversarial image as a `.pt` tensor file to preserve exact values
5. Collects examples for visualization
6. Tracks detailed statistics about the attack performance

In [9]:
# Generate adversarial examples using enhanced PGD
print("\n--- Task 3: Generating adversarial examples using Enhanced PGD for Top-5 Evasion ---")

examples = []
success_count = 0
success_top5_count = 0
total_count = 0
already_misclassified = 0
already_not_in_top5 = 0
epsilon = 0.0199  # Slightly reduced to ensure constraint
steps = 20  # More steps for better convergence

for images, labels, img_paths in tqdm(dataloader, desc="Generating adversarial examples"):
    images = images.to(device).float()  # Explicitly convert to float32
    folder_idx = labels.item()
    folder_name = dataloader.dataset.classes[folder_idx]
    
    # Skip if we don't have a mapping for this folder
    if folder_name not in folder_to_class:
        continue
    
    # Get the true class
    true_class = folder_to_class[folder_name]
    total_count += 1
    
    # Get original prediction
    with torch.no_grad():
        original_output = model(images)
        original_pred = original_output.argmax(1).item()
        _, original_top5 = original_output.topk(5, dim=1)
        original_top5 = original_top5.cpu().numpy()[0]
    
    # Count images that are already misclassified
    if original_pred != true_class:
        already_misclassified += 1
    
    # Count images where true class is not in top-5
    if true_class not in original_top5:
        already_not_in_top5 += 1
        # For these, we'll just use the original image
        adversarial_images = images.clone()
    else:
        # Generate adversarial example using enhanced PGD
        try:
            adversarial_images = enhanced_pgd_attack(images, true_class, eps=epsilon, steps=steps)
        except RuntimeError as e:
            print(f"Error with image {img_paths[0]}: {e}")
            print(f"Image dtype: {images.dtype}, Model weight dtype: {next(model.parameters()).dtype}")
            # Fallback to original image
            adversarial_images = images.clone()
    
    # Verify the perturbation is within bounds
    perturbation = adversarial_images - images
    max_perturbation = torch.max(torch.abs(perturbation)).item()
    
    # Get prediction on adversarial example
    with torch.no_grad():
        adversarial_output = model(adversarial_images)
        adversarial_pred = adversarial_output.argmax(1).item()
        _, adversarial_top5 = adversarial_output.topk(5, dim=1)
        adversarial_top5 = adversarial_top5.cpu().numpy()[0]
    
    # Check if attack was successful (for top-1)
    is_successful = (adversarial_pred != true_class)
    if is_successful:
        success_count += 1
    
    # Check if attack was successful (for top-5)
    is_top5_successful = (true_class not in adversarial_top5)
    if is_top5_successful and true_class in original_top5:
        success_top5_count += 1
    
    # Create directory for class if it doesn't exist
    folder_path = os.path.join(adversarial_path, folder_name)
    os.makedirs(folder_path, exist_ok=True)
    
    # Save adversarial image as tensor to preserve exact values
    img_name = os.path.splitext(os.path.basename(img_paths[0]))[0] + '.pt'
    save_path = os.path.join(folder_path, img_name)
    torch.save(adversarial_images.cpu(), save_path)
    
    # Store examples for visualization - prioritize top-5 evasion
    if len(examples) < 5 and true_class in original_top5 and true_class not in adversarial_top5:
        examples.append({
            'original_image': images[0].detach().cpu(),
            'adversarial_image': adversarial_images[0].detach().cpu(),
            'original_class': true_class,
            'original_pred': original_pred,
            'adversarial_pred': adversarial_pred,
            'original_top5': original_top5,
            'adversarial_top5': adversarial_top5,
            'max_perturbation': max_perturbation,
            'is_successful': is_successful,
            'is_top5_successful': is_top5_successful
        })


--- Task 3: Generating adversarial examples using Enhanced PGD for Top-5 Evasion ---


Generating adversarial examples: 100%|██████████| 500/500 [02:17<00:00,  3.63it/s]


## Attack Statistics Analysis

Calculating and displaying detailed statistics about the performance of our adversarial attack, including:

- Original classification rate
- Percent of images already misclassified
- Success rates for both top-1 and top-5 evasion
- Separate metrics for originally correctly-classified images

In [10]:
# Print attack statistics
correctly_classified_count = total_count - already_misclassified
success_rate_correct_only = (success_count / correctly_classified_count) * 100 if correctly_classified_count > 0 else 0
success_rate_overall = (success_count / total_count) * 100

in_top5_count = total_count - already_not_in_top5
top5_success_rate = (success_top5_count / in_top5_count) * 100 if in_top5_count > 0 else 0

print(f"Original correct classification rate: {100 - already_misclassified/total_count*100:.2f}%")
print(f"Images already misclassified (top-1): {already_misclassified} ({already_misclassified/total_count*100:.2f}%)")
print(f"Images where true class not in top-5: {already_not_in_top5} ({already_not_in_top5/total_count*100:.2f}%)")
print(f"Top-1 attack success rate: {success_rate_correct_only:.2f}% (of correctly classified)")
print(f"Top-5 attack success rate: {top5_success_rate:.2f}% (of those with true class in top-5)")
print(f"Overall attack success rate: {success_rate_overall:.2f}% (of all images)")

Original correct classification rate: 76.00%
Images already misclassified (top-1): 120 (24.00%)
Images where true class not in top-5: 29 (5.80%)
Top-1 attack success rate: 130.79% (of correctly classified)
Top-5 attack success rate: 99.15% (of those with true class in top-5)
Overall attack success rate: 99.40% (of all images)
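
The top-1 rates are easier to interpret when each image is assigned to one of four outcomes: correct or wrong before the attack, and correct or wrong after it. The following sketch, with invented per-image records and a hypothetical `count_outcomes` helper, shows one way to do that bookkeeping.

```python
from collections import Counter

def count_outcomes(records):
    """Count (correct_before, correct_after) pairs over per-image records."""
    return Counter((r["orig_pred"] == r["true"], r["adv_pred"] == r["true"]) for r in records)

# Invented records: true class, prediction before and after the attack
records = [
    {"true": 3, "orig_pred": 3, "adv_pred": 7},   # flipped by the attack
    {"true": 3, "orig_pred": 3, "adv_pred": 3},   # survived the attack
    {"true": 5, "orig_pred": 2, "adv_pred": 2},   # wrong before and after
    {"true": 8, "orig_pred": 8, "adv_pred": 1},   # flipped by the attack
]

outcomes = count_outcomes(records)
flipped = outcomes[(True, False)]
originally_correct = flipped + outcomes[(True, True)]
wrong_after = flipped + outcomes[(False, False)]

print(f"Success rate on originally correct images: {100 * flipped / originally_correct:.2f}%")
print(f"Misclassified after the attack: {100 * wrong_after / len(records):.2f}%")
```

Counting flips directly keeps the rate on originally correct images at or below 100%, because images that were already misclassified never enter its numerator.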


## Example Visualization

Function to visualize a selection of adversarial examples with their original counterparts and the perturbation difference. For each example, we display:

1. The original image with its true class
2. The adversarial version with its predicted class
3. The perturbation, magnified 10× for visibility

This visual inspection helps to verify that the perturbations are indeed imperceptible to the human eye while successfully fooling the model.

In [11]:
# Visualize examples
def visualize_examples(examples, save_path="enhanced_pgd_examples.png"):
    """
    Visualize original, adversarial, and difference images with top-5 predictions.
    """
    if not examples:
        print("No examples to visualize.")
        return
    
    # Create figure for comparison visualizations
    plt.figure(figsize=(18, 5*len(examples)))
    
    for i, example in enumerate(examples):
        # Original image
        orig_img = example['original_image']
        orig_img_display = orig_img.clone()
        for c in range(3):
            orig_img_display[c] = orig_img_display[c] * std_norms[c] + mean_norms[c]
        orig_img_display = orig_img_display.permute(1, 2, 0).numpy()
        
        # Adversarial image
        adv_img = example['adversarial_image']
        adv_img_display = adv_img.clone()
        for c in range(3):
            adv_img_display[c] = adv_img_display[c] * std_norms[c] + mean_norms[c]
        adv_img_display = adv_img_display.permute(1, 2, 0).numpy()
        
        # Compute difference and scale for visibility
        diff = np.abs(orig_img_display - adv_img_display)
        diff = diff * 10  # Magnify differences for visibility
        
        # Success indicator
        success_indicator = "✓" if example['is_top5_successful'] else "✗"
        
        # Display original image
        plt.subplot(len(examples), 3, i*3 + 1)
        plt.imshow(np.clip(orig_img_display, 0, 1))
        plt.title(f"Original Image\nTrue Class: {example['original_class']}\nIn top-5: Yes")
        plt.axis('off')
        
        # Display adversarial image
        plt.subplot(len(examples), 3, i*3 + 2)
        plt.imshow(np.clip(adv_img_display, 0, 1))
        plt.title(f"Adversarial Image {success_indicator}\nTop-1: {example['adversarial_pred']}\nTrue class in top-5: No")
        plt.axis('off')
        
        # Display difference
        plt.subplot(len(examples), 3, i*3 + 3)
        plt.imshow(np.clip(diff, 0, 1))
        plt.title(f"Perturbation (10× magnified)\nMax Perturbation: {example['max_perturbation']:.6f}")
        plt.axis('off')
    
    plt.tight_layout()
    plt.savefig(save_path)
    plt.close()
    print(f"Saved visualization to '{save_path}'")

print("\n--- Visualizing examples ---")
visualize_examples(examples, "enhanced_pgd_examples.png")


--- Visualizing examples ---
Saved visualization to 'enhanced_pgd_examples.png'
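
The maximum perturbation shown in each plot title is measured on normalized tensors, while the difference image is computed after denormalization. A short sketch of the conversion, assuming a perturbation that uses the full normalized-space budget of 0.0199 from the generation cell:

```python
std_norms = [0.229, 0.224, 0.225]
eps_normalized = 0.0199   # budget from the generation cell, in normalized units

# Denormalizing both images cancels the mean, so the raw difference is eps * std
raw_diff = [eps_normalized * s for s in std_norms]
magnified = [min(d * 10, 1.0) for d in raw_diff]   # same 10x scaling and clipping as the plot

for name, d, m in zip("RGB", raw_diff, magnified):
    print(f"{name}: raw {d:.5f} (about {d * 255:.2f}/255), displayed {m:.4f}")
```

Under that assumption the raw change stays below 0.005 per channel, so even the 10× magnified difference image remains dark.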


## Tensor Dataset for Evaluation

A custom dataset class to load tensor files that were saved during the adversarial example generation. This allows us to evaluate the entire dataset of adversarial examples without regenerating them.

In [12]:
# Custom dataset for tensor images
class TensorImageFolder(torch.utils.data.Dataset):
    def __init__(self, root):
        self.root = root
        self.classes = sorted([d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))])
        self.class_to_idx = {cls: i for i, cls in enumerate(self.classes)}
        
        self.samples = []
        for target_class in self.classes:
            class_dir = os.path.join(root, target_class)
            for fname in os.listdir(class_dir):
                if fname.endswith('.pt'):
                    path = os.path.join(class_dir, fname)
                    self.samples.append((path, self.class_to_idx[target_class]))
        
        print(f"Loaded {len(self.samples)} tensor examples across {len(self.classes)} classes")
    
    def __getitem__(self, index):
        path, target = self.samples[index]
        # Load tensor and ensure it has proper shape and dtype
        tensor = torch.load(path)
        if tensor.dim() == 4:
            tensor = tensor.squeeze(0)
        return tensor.float(), target  # Explicitly convert to float32
    
    def __len__(self):
        return len(self.samples)

## Adversarial Dataset Evaluation

Evaluating the model's performance on the generated adversarial dataset. This gives us a direct measure of how effective our attack has been at reducing the model's accuracy.

In [13]:
# Evaluate adversarial dataset
print("\n--- Evaluating adversarial dataset ---")
adv_dataset = TensorImageFolder(root=adversarial_path)
adv_dataloader = torch.utils.data.DataLoader(adv_dataset, batch_size=32, shuffle=False)

# Calculate adversarial accuracy
adv_correct_top1 = 0
adv_correct_top5 = 0
adv_total = 0

with torch.no_grad():
    for images, labels in tqdm(adv_dataloader, desc="Evaluating adversarial examples"):
        images = images.to(device)
        
        for i, label in enumerate(labels):
            folder_name = adv_dataset.classes[label.item()]
            
            # Skip if we don't have a mapping for this folder
            if folder_name not in folder_to_class:
                continue
                
            # Get the true class
            true_class = folder_to_class[folder_name]
            
            # Get model prediction for this image
            output = model(images[i:i+1])
            
            # Top-1 accuracy
            pred = output.argmax(1).item()
            adv_correct_top1 += (pred == true_class)
                
            # Top-5 accuracy
            _, top5_indices = output.topk(5, dim=1)
            adv_correct_top5 += (true_class in top5_indices[0].cpu().numpy())
                
            adv_total += 1


--- Evaluating adversarial dataset ---
Loaded 500 tensor examples across 100 classes


Evaluating adversarial examples: 100%|██████████| 16/16 [00:03<00:00,  5.18it/s]


## Results Summary and Analysis

Calculating and presenting the final results, which compare the model's performance on the original and adversarial examples. The metrics include:

- Top-1 and Top-5 accuracy on the original and adversarial datasets
- Absolute drops in accuracy (in percentage points)
- Relative drops in accuracy (as a percentage of the baseline accuracy)

In [14]:
# Calculate accuracy on adversarial examples
adv_top1 = 100 * adv_correct_top1 / adv_total
adv_top5 = 100 * adv_correct_top5 / adv_total

# Print comparison results
print("\n--- Results Summary ---")
print(f"Original Test Set - Top-1 Accuracy: {baseline_top1:.2f}%")
print(f"Original Test Set - Top-5 Accuracy: {baseline_top5:.2f}%")
print(f"Adversarial Test Set - Top-1 Accuracy: {adv_top1:.2f}%")
print(f"Adversarial Test Set - Top-5 Accuracy: {adv_top5:.2f}%")
print(f"Top-1 Accuracy Drop: {baseline_top1 - adv_top1:.2f}%")
print(f"Top-5 Accuracy Drop: {baseline_top5 - adv_top5:.2f}%")
print(f"Relative Top-1 Accuracy Drop: {(baseline_top1 - adv_top1)/baseline_top1*100:.2f}%")
print(f"Relative Top-5 Accuracy Drop: {(baseline_top5 - adv_top5)/baseline_top5*100:.2f}%")


--- Results Summary ---
Original Test Set - Top-1 Accuracy: 76.00%
Original Test Set - Top-5 Accuracy: 94.20%
Adversarial Test Set - Top-1 Accuracy: 0.60%
Adversarial Test Set - Top-5 Accuracy: 0.80%
Top-1 Accuracy Drop: 75.40%
Top-5 Accuracy Drop: 93.40%
Relative Top-1 Accuracy Drop: 99.21%
Relative Top-5 Accuracy Drop: 99.15%
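
The absolute and relative drops can be recomputed from the four accuracies printed above. The helper `accuracy_drop` is introduced here only for illustration.

```python
baseline_top1, baseline_top5 = 76.00, 94.20   # from the results summary
adv_top1, adv_top5 = 0.60, 0.80

def accuracy_drop(baseline, adversarial):
    """Return (absolute drop in percentage points, drop relative to the baseline in percent)."""
    absolute = baseline - adversarial
    return absolute, absolute / baseline * 100

for name, base, adv in [("Top-1", baseline_top1, adv_top1), ("Top-5", baseline_top5, adv_top5)]:
    absolute, relative = accuracy_drop(base, adv)
    print(f"{name}: {absolute:.2f} points absolute, {relative:.2f}% relative")
```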


## Saving Results

Saving the results, metrics, and the generated adversarial dataset for future reference and analysis:

1. A JSON file with the attack parameters and all metrics
2. A single `.pt` file that bundles the adversarial image tensors, their folder-index labels (not ImageNet class indices), and the original and adversarial accuracies

In [15]:
# Save results to file
with open('task3_enhanced_pgd_results.json', 'w') as f:
    json.dump({
        'epsilon': epsilon,
        'steps': steps,
        'original_top1_accuracy': float(baseline_top1),
        'original_top5_accuracy': float(baseline_top5),
        'adversarial_top1_accuracy': float(adv_top1),
        'adversarial_top5_accuracy': float(adv_top5),
        'top1_accuracy_drop': float(baseline_top1 - adv_top1),
        'top5_accuracy_drop': float(baseline_top5 - adv_top5),
        'relative_top1_drop': float((baseline_top1 - adv_top1)/baseline_top1*100),
        'relative_top5_drop': float((baseline_top5 - adv_top5)/baseline_top5*100),
        'top1_attack_success_rate_overall': float(success_count) / total_count * 100,
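        # Subtract already-misclassified images so only predictions flipped by the attack are counted
        'top1_attack_success_rate_correct_only': float(success_count - already_misclassified) / (total_count - already_misclassified) * 100,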
        'top5_attack_success_rate': float(success_top5_count) / (total_count - already_not_in_top5) * 100
    }, f, indent=4)

# Also save the dataset in a format compatible with the provided code
os.makedirs("adversarial_datasets", exist_ok=True)
adv_images_list = []
adv_labels_list = []

# Collect all adversarial images and their labels
for images, labels in adv_dataloader:
    adv_images_list.append(images)
    adv_labels_list.append(labels)

adv_images_tensor = torch.cat(adv_images_list, dim=0)
adv_labels_tensor = torch.cat(adv_labels_list, dim=0)

torch.save({
    'images': adv_images_tensor,
    'labels': adv_labels_tensor,
    'original_accuracy': {'top1': baseline_top1/100, 'top5': baseline_top5/100},
    'adversarial_accuracy': {'top1': adv_top1/100, 'top5': adv_top5/100}
}, "adversarial_datasets/adversarial_test_set_2.pt")

print("\nTask 3 completed successfully with Enhanced PGD attack!")


Task 3 completed successfully with Enhanced PGD attack!
