# 🎯 AI Security Lab: Adversarial Evasion Attacks

Welcome to your first AI Security lab! In this hands-on session, you'll learn how to:

1. **Understand adversarial examples** - Inputs with small, often imperceptible perturbations that cause ML models to misclassify
2. **Implement FGSM attacks** - Fast Gradient Sign Method
3. **Test on real models** - Attack pre-trained image classifiers
4. **Analyze results** - Measure attack effectiveness

**🏆 Goal**: Generate adversarial examples that achieve >90% attack success rate

**📍 Flag Location**: Complete all challenges to reveal the flag!

## 📚 Step 1: Import Libraries and Setup

Let's start by importing the necessary libraries for our adversarial attack lab.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
import torchvision.models as models
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import requests
from io import BytesIO
import json

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print("✅ Libraries imported successfully!")

## 🧠 Step 2: Load Pre-trained Model

We'll use a pre-trained ResNet-18 model as our target classifier.

In [None]:
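# Load ResNet-18 with ImageNet weights (torchvision >= 0.13; older versions use pretrained=True)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()  # Evaluation mode: batch norm uses its stored running statistics
model = model.to(device)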

# ImageNet preprocessing
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], 
                       std=[0.229, 0.224, 0.225])
])

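# Load ImageNet class labels (open() cannot read URLs, so fetch the file over HTTP)
labels_url = 'https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt'
labels_response = requests.get(labels_url, timeout=10)
labels_response.raise_for_status()
classes = [line.strip() for line in labels_response.text.splitlines()]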

print(f"✅ Model loaded: {model.__class__.__name__}")
print(f"📊 Model parameters: {sum(p.numel() for p in model.parameters()):,}")
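
Because `transform` ends with `Normalize`, the model never sees raw pixels in [0, 1]. The NumPy sketch below is an illustration outside the lab's pipeline: it maps the pixel bounds 0 and 1 into normalized units for each channel (which is where any clamp on a normalized tensor has to happen) and converts a pixel-space ε into the per-channel step it corresponds to after normalization. The helper names are hypothetical.

```python
import numpy as np

# ImageNet statistics used by the lab's transforms.Normalize
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])


def normalized_bounds(mean=MEAN, std=STD):
    """Per-channel (lower, upper) limits of valid pixels after normalization."""
    lower = (0.0 - mean) / std
    upper = (1.0 - mean) / std
    return lower, upper


def pixel_eps_to_normalized(epsilon, std=STD):
    """A pixel-space change of size epsilon becomes epsilon / std in normalized units."""
    return epsilon / std


lower, upper = normalized_bounds()
print("lower:", np.round(lower, 3))
print("upper:", np.round(upper, 3))
print("eps = 8/255 in normalized units:", np.round(pixel_eps_to_normalized(8 / 255), 4))
```

The valid range is roughly [-2, 2.6] rather than [0, 1], so clamping a normalized tensor to [0, 1] silently discards most of the image.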

## 📸 Step 3: Load and Classify a Test Image

Let's start with a clean image and see how the model classifies it.

In [None]:
# Sample image URL (you can replace with your own)
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/280px-PNG_transparency_demonstration_1.png"

def load_image(url):
    """Download an image from a URL and return it as an RGB PIL image."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    image = Image.open(BytesIO(response.content)).convert('RGB')
    return image

def classify_image(image, model, transform, classes, top_k=5):
    """Classify image and return top-k predictions"""
    # Preprocess image
    img_tensor = transform(image).unsqueeze(0).to(device)
    
    # Get predictions
    with torch.no_grad():
        outputs = model(img_tensor)
        probabilities = F.softmax(outputs, dim=1)
        top_prob, top_indices = torch.topk(probabilities, top_k)
    
    # Format results
    results = []
    for i in range(top_k):
        results.append({
            'class': classes[top_indices[0][i]],
            'confidence': top_prob[0][i].item()
        })
    
    return results, img_tensor

# Load and classify original image
original_image = load_image(image_url)
original_results, original_tensor = classify_image(original_image, model, transform, classes)

# Display results
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.imshow(original_image)
plt.title('Original Image')
plt.axis('off')

plt.subplot(1, 2, 2)
class_names = [r['class'] for r in original_results]
confidences = [r['confidence'] for r in original_results]
plt.barh(class_names, confidences)
plt.title('Model Predictions')
plt.xlabel('Confidence')
plt.tight_layout()
plt.show()

print("🔍 Original Image Classification:")
for i, result in enumerate(original_results):
    print(f"{i+1}. {result['class']}: {result['confidence']:.3f}")

# Store the original prediction for later
original_class = original_results[0]['class']
original_confidence = original_results[0]['confidence']
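
`classify_image` turns the model's logits into probabilities with a softmax and keeps the `top_k` largest. The same two steps in plain NumPy, on made-up logits for a hypothetical 5-class model (not ResNet-18 output), look like this. Subtracting the maximum logit before exponentiating is the usual trick for numerical stability and does not change the result.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def top_k(probs, k):
    """Indices and probabilities of the k largest entries, highest first."""
    idx = np.argsort(probs)[::-1][:k]
    return idx, probs[idx]


# Hypothetical logits for a 5-class model
logits = np.array([2.0, 0.5, 3.1, -1.0, 1.2])
probs = softmax(logits)
idx, p = top_k(probs, 3)
for rank, (i, pi) in enumerate(zip(idx, p), start=1):
    print(f"{rank}. class {i}: {pi:.3f}")
```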

## ⚔️ Step 4: Implement FGSM Attack

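Now for the core of the lab: the Fast Gradient Sign Method (FGSM). FGSM computes the gradient of the loss with respect to the input pixels and moves every pixel by ε in the direction of that gradient's sign. To first order, this is the change that increases the loss the most when no pixel may change by more than ε.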
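
In the standard formulation, with input $x$, label $y$, model parameters $\theta$, loss $J$ and per-pixel budget $\epsilon$, the untargeted FGSM update and its targeted variant (which steps *down* the loss of a chosen class $y_t$) are:

$$
x_{\text{adv}} = \operatorname{clip}_{[0,1]}\Big(x + \epsilon \cdot \operatorname{sign}\big(\nabla_x J(\theta, x, y)\big)\Big),
\qquad
x_{\text{adv}}^{\text{targeted}} = \operatorname{clip}_{[0,1]}\Big(x - \epsilon \cdot \operatorname{sign}\big(\nabla_x J(\theta, x, y_t)\big)\Big)
$$

Because every pixel moves by at most $\epsilon$ and $x$ already lies in $[0,1]$, the result satisfies $\|x_{\text{adv}} - x\|_\infty \le \epsilon$. The clip refers to pixel values; when the input has already been normalized, the bounds must first be converted to normalized units.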

In [None]:
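def fgsm_attack(image, epsilon, data_grad):
    """Perform an FGSM step on an ImageNet-normalized image.

    epsilon is measured in pixel units (images in [0, 1]). Because the input
    has been normalized, the step and the valid range are converted to
    normalized units channel by channel.
    """
    mean = torch.tensor([0.485, 0.456, 0.406], device=image.device).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225], device=image.device).view(1, 3, 1, 1)
    
    # Step in the direction of the sign of the data gradient
    perturbed_image = image + (epsilon / std) * data_grad.sign()
    
    # Clamp to the normalized equivalents of the pixel range [0, 1];
    # clamping the normalized tensor to [0, 1] directly would corrupt the image
    lower, upper = (0 - mean) / std, (1 - mean) / std
    perturbed_image = torch.maximum(torch.minimum(perturbed_image, upper), lower)
    
    return perturbed_image.detach()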

def generate_adversarial_example(image_tensor, label, model, epsilon):
    """Generate an untargeted FGSM example that increases the loss of `label`."""
    # Track gradients with respect to the input pixels
    image_tensor.requires_grad_(True)
    
    # Forward pass and loss for the label we want to move away from
    outputs = model(image_tensor)
    loss = F.cross_entropy(outputs, label)
    
    # Clear stale parameter gradients, then backpropagate to the input
    model.zero_grad()
    loss.backward()
    
    # Gradient of the loss with respect to the input image
    data_grad = image_tensor.grad.detach()
    
    # Generate adversarial example
    adversarial_image = fgsm_attack(image_tensor, epsilon, data_grad)
    
    return adversarial_image, data_grad

print("✅ FGSM attack functions defined!")
print("🎯 Ready to generate adversarial examples!")
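
To see FGSM without downloading ResNet-18, here is the same update on a hypothetical 3-class linear softmax classifier in NumPy. For logits $z = Wx + b$ and cross-entropy loss, the input gradient has the closed form $W^\top(p - \text{onehot}(y))$, so no autograd is needed. The weights, input and helper names are made up for illustration; this is a sketch of the technique, not the lab's attack code.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # hypothetical weights: 3 classes, 4 "pixels"
b = np.zeros(3)


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def cross_entropy(x, y):
    return -np.log(softmax(W @ x + b)[y])


def input_gradient(x, y):
    """Gradient of cross-entropy with respect to the input x."""
    p = softmax(W @ x + b)
    onehot = np.eye(len(b))[y]
    return W.T @ (p - onehot)


def fgsm(x, y, epsilon):
    """Untargeted FGSM: step up the loss of label y, then clip to [0, 1]."""
    x_adv = x + epsilon * np.sign(input_gradient(x, y))
    return np.clip(x_adv, 0.0, 1.0)


x = rng.uniform(0.2, 0.8, size=4)
y = int(np.argmax(W @ x + b))   # attack the model's own prediction
x_adv = fgsm(x, y, epsilon=0.1)
print("clean loss:", cross_entropy(x, y), "adversarial loss:", cross_entropy(x_adv, y))
```

For this model the loss is convex in $x$, so the FGSM step is guaranteed to raise it; for a deep network like ResNet-18 that holds only approximately, which is why larger ε does not always succeed.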

## 💥 Step 5: Launch the Attack!

Let's create adversarial examples with different epsilon values and see how they affect the model's predictions.

In [None]:
# Attack parameters
epsilons = [0.01, 0.03, 0.05, 0.1, 0.3]
# Label whose loss FGSM increases: the model's clean prediction (untargeted attack)
with torch.no_grad():
    target_class = model(original_tensor).argmax(dim=1)

# Store results
attack_results = []

plt.figure(figsize=(20, 4))

for i, epsilon in enumerate(epsilons):
    # Generate adversarial example
    adv_image, grad = generate_adversarial_example(
        original_tensor.clone(), target_class, model, epsilon
    )
    
    # Move the tensor to the CPU for display
    adv_img_display = adv_image.squeeze().cpu().detach()
    # Denormalize for display
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    adv_img_display = adv_img_display * std + mean
    adv_img_display = torch.clamp(adv_img_display, 0, 1)
    
    # Get predictions for adversarial example
    with torch.no_grad():
        adv_outputs = model(adv_image)
        adv_probs = F.softmax(adv_outputs, dim=1)
        top_prob, top_idx = torch.topk(adv_probs, 1)
        
    predicted_class = classes[top_idx[0][0]]
    confidence = top_prob[0][0].item()
    
    # Check if attack was successful (changed prediction)
    attack_success = predicted_class != original_class
    
    attack_results.append({
        'epsilon': epsilon,
        'predicted_class': predicted_class,
        'confidence': confidence,
        'attack_success': attack_success
    })
    
    # Plot results
    plt.subplot(1, len(epsilons), i + 1)
    plt.imshow(adv_img_display.permute(1, 2, 0))
    status = "✅ SUCCESS" if attack_success else "❌ FAILED"
    plt.title(f'ε={epsilon}\n{predicted_class}\n{confidence:.3f}\n{status}')
    plt.axis('off')

plt.tight_layout()
plt.show()

print("\n🎯 Attack Results Summary:")
print(f"Original: {original_class} ({original_confidence:.3f})\n")

successful_attacks = 0
for result in attack_results:
    status = "SUCCESS" if result['attack_success'] else "FAILED"
    print(f"ε={result['epsilon']}: {result['predicted_class']} ({result['confidence']:.3f}) - {status}")
    if result['attack_success']:
        successful_attacks += 1

success_rate = (successful_attacks / len(epsilons)) * 100
print(f"\n📊 Attack Success Rate (share of tested ε values that changed the prediction): {success_rate:.1f}%")
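
The loop above measures success over several ε values for a single image. In robustness evaluations, "attack success rate" usually means something different: the fraction of many inputs whose prediction flips at one fixed ε. A NumPy sketch of that measurement, on a hypothetical linear classifier and random "images" (all names and data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes, n_features, n_inputs = 3, 8, 200
W = rng.normal(size=(n_classes, n_features))             # hypothetical linear classifier
X = rng.uniform(0.0, 1.0, size=(n_inputs, n_features))   # hypothetical "images" in [0, 1]


def softmax_rows(Z):
    """Row-wise numerically stable softmax."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def fgsm_batch(X, labels, epsilon):
    """Untargeted FGSM on every row of X (closed-form input gradient of the linear model)."""
    P = softmax_rows(X @ W.T)
    P[np.arange(len(X)), labels] -= 1.0   # p - onehot(y) for each row
    grad = P @ W                          # d loss / d x, one row per input
    return np.clip(X + epsilon * np.sign(grad), 0.0, 1.0)


clean_pred = np.argmax(X @ W.T, axis=1)
rates = {}
for epsilon in [0.0, 0.01, 0.05, 0.1, 0.3]:
    adv_pred = np.argmax(fgsm_batch(X, clean_pred, epsilon) @ W.T, axis=1)
    rates[epsilon] = np.mean(adv_pred != clean_pred) * 100
    print(f"eps={epsilon}: {rates[epsilon]:.1f}% of inputs flipped")
```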

## 🔍 Step 6: Analyze Attack Effectiveness

Let's plot how attack success and the model's top-1 confidence change as ε grows.

In [None]:
# Plot success rate vs epsilon
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
success_rates = [1 if r['attack_success'] else 0 for r in attack_results]
plt.plot(epsilons, success_rates, 'o-', linewidth=2, markersize=8)
plt.xlabel('Epsilon (ε)')
plt.ylabel('Attack Success')
plt.title('Attack Success vs Perturbation Strength')
plt.grid(True)
plt.ylim(-0.1, 1.1)

plt.subplot(1, 2, 2)
confidences = [r['confidence'] for r in attack_results]
plt.plot(epsilons, confidences, 's-', linewidth=2, markersize=8, color='red')
plt.axhline(y=original_confidence, color='blue', linestyle='--', label='Original Confidence')
plt.xlabel('Epsilon (ε)')
plt.ylabel('Model Confidence')
plt.title('Model Confidence vs Perturbation Strength')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

# Find the smallest epsilon (among those tested) that changed the prediction
successful_epsilons = [r['epsilon'] for r in attack_results if r['attack_success']]
if successful_epsilons:
    optimal_epsilon = min(successful_epsilons)
    print(f"🎯 Smallest successful epsilon among those tested: {optimal_epsilon}")
else:
    print("❌ No successful attacks found. Try larger epsilon values!")
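
`optimal_epsilon` is only the smallest successful value among the five that were tried. One way to get a tighter estimate is a bisection over ε. This assumes success is monotone in ε between the bracketing values, which FGSM does not guarantee, so treat the result as an estimate. A sketch on a hypothetical linear model (the model, input and helper names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(3, 6))          # hypothetical 3-class linear model
x = rng.uniform(0.0, 1.0, size=6)    # hypothetical input in [0, 1]
y = int(np.argmax(W @ x))            # the model's clean prediction


def fgsm(x, y, epsilon):
    """Untargeted FGSM with the closed-form input gradient of a linear softmax model."""
    z = W @ x
    p = np.exp(z - z.max())
    p /= p.sum()
    p[y] -= 1.0                      # p - onehot(y)
    return np.clip(x + epsilon * np.sign(W.T @ p), 0.0, 1.0)


def flips(epsilon):
    """True if FGSM with this epsilon changes the predicted class."""
    return int(np.argmax(W @ fgsm(x, y, epsilon))) != y


def min_epsilon(lo=0.0, hi=1.0, tol=1e-4):
    """Bisection for the smallest flipping epsilon in [lo, hi]; None if hi does not flip."""
    if not flips(hi):
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if flips(mid):
            hi = mid
        else:
            lo = mid
    return hi


eps_star = min_epsilon()
print("estimated minimal epsilon:", eps_star)
```

Each bisection step costs one forward and one backward pass, so about 14 steps reach a tolerance of 1e-4 on [0, 1].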

## 🏆 Step 7: Challenge - Achieve High Attack Success

**Challenge**: Create adversarial examples that achieve >90% attack success rate!

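Try different strategies:
1. Adjust epsilon values (a larger ε allows a bigger change per pixel)
2. Try different target classes (a targeted attack steps to *decrease* the loss of the class you want the model to predict)
3. Experiment with multiple iterations (repeated small FGSM steps instead of one large one)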
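
Strategies 2 and 3 can be combined in one function. Repeating small FGSM steps of size `alpha` and projecting back after each one into the L∞ ball of radius ε around the original input (and onto [0, 1]) is commonly called the Basic Iterative Method; PGD is the same loop started from a random point inside the ball. Passing `target` switches to a targeted attack. The sketch uses a hypothetical linear model in NumPy for brevity; porting it to the lab's PyTorch code would mean replacing the closed-form `loss_grad` with `loss.backward()` on the image tensor.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(4, 10))          # hypothetical 4-class linear model
x0 = rng.uniform(0.0, 1.0, size=10)   # hypothetical clean input
y0 = int(np.argmax(W @ x0))


def loss_grad(x, label):
    """Gradient of cross-entropy(label) with respect to x for the linear softmax model."""
    z = W @ x
    p = np.exp(z - z.max())
    p /= p.sum()
    p[label] -= 1.0
    return W.T @ p


def iterative_fgsm(x0, epsilon, alpha, steps, label, target=None):
    """Repeated FGSM steps, each projected onto the eps-ball around x0 and onto [0, 1].

    Untargeted (target=None): increase the loss of `label`.
    Targeted: decrease the loss of `target`.
    """
    x = x0.copy()
    for _ in range(steps):
        if target is None:
            x = x + alpha * np.sign(loss_grad(x, label))
        else:
            x = x - alpha * np.sign(loss_grad(x, target))
        x = np.clip(x, x0 - epsilon, x0 + epsilon)  # stay within the L-inf ball
        x = np.clip(x, 0.0, 1.0)                    # stay a valid image
    return x


x_untargeted = iterative_fgsm(x0, epsilon=0.2, alpha=0.02, steps=20, label=y0)
target = (y0 + 1) % 4
x_targeted = iterative_fgsm(x0, epsilon=0.2, alpha=0.02, steps=20, label=y0, target=target)
print("clean:", y0, "untargeted:", np.argmax(W @ x_untargeted),
      "targeted toward", target, "->", np.argmax(W @ x_targeted))
```

With `steps=1` and `alpha=epsilon` the loop reduces to single-step FGSM; more, smaller steps let the attack follow the loss surface when one linear step is not enough.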

In [None]:
# Challenge: Implement your attack strategy here!
# Try to achieve >90% success rate

# Test with a wider range of epsilon values
challenge_epsilons = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4]
challenge_results = []

print("🎯 Challenge: Achieving >90% Attack Success Rate")
print("Testing with extended epsilon range...\n")

successful_attacks = 0
for epsilon in challenge_epsilons:
    # Generate adversarial example
    adv_image, _ = generate_adversarial_example(
        original_tensor.clone(), target_class, model, epsilon
    )
    
    # Test the attack
    with torch.no_grad():
        adv_outputs = model(adv_image)
        adv_probs = F.softmax(adv_outputs, dim=1)
        top_prob, top_idx = torch.topk(adv_probs, 1)
        
    predicted_class = classes[top_idx[0][0]]
    confidence = top_prob[0][0].item()
    attack_success = predicted_class != original_class
    
    if attack_success:
        successful_attacks += 1
    
    status = "✅ SUCCESS" if attack_success else "❌ FAILED"
    print(f"ε={epsilon}: {predicted_class} ({confidence:.3f}) - {status}")
    
    challenge_results.append({
        'epsilon': epsilon,
        'attack_success': attack_success
    })

# Calculate final success rate
final_success_rate = (successful_attacks / len(challenge_epsilons)) * 100
print(f"\n📊 Final Attack Success Rate: {final_success_rate:.1f}%")

# Check if challenge is completed
if final_success_rate > 90:
    print("\n🎉 CHALLENGE COMPLETED!")
    print("🏆 You achieved >90% attack success rate!")
    
    # Reveal the flag!
    flag = "PWNTHEPROMPT{92.5_percent_adversarial_success_achieved}"
    print(f"\n🚩 FLAG: {flag}")
    
    print("\n📝 What you learned:")
    print("• How to implement FGSM adversarial attacks")
    print("• The relationship between perturbation strength and attack success")
    print("• How small pixel changes can fool widely used pretrained models")
    print("• The importance of adversarial robustness in AI systems")
else:
    print(f"\n🎯 Keep trying! You are {90 - final_success_rate:.1f} percentage points short of 90%.")
    print("💡 Hint: Try even larger epsilon values or an iterative attack!")

## 🛡️ Step 8: Defense Awareness

Now that you've successfully attacked a model, let's discuss defenses!

In [None]:
print("🛡️ Adversarial Defense Strategies:")
print("\n1. Adversarial Training:")
print("   • Train models on adversarial examples")
print("   • Improves robustness but may reduce clean accuracy")

print("\n2. Input Preprocessing:")
print("   • Gaussian noise addition")
print("   • JPEG compression")
print("   • Feature squeezing")

print("\n3. Detection Methods:")
print("   • Statistical analysis of inputs")
print("   • Adversarial example detectors")
print("   • Ensemble methods")

print("\n4. Certified Defenses:")
print("   • Randomized smoothing")
print("   • Interval bound propagation")
print("   • Formal verification")

print("\n🔍 Key Takeaways:")
print("• Adversarial attacks are a fundamental security concern")
print("• No perfect defense exists yet")
print("• Defense-aware development is crucial")
print("• Regular security testing should include adversarial robustness")

print("\n🎓 Congratulations on completing the AI Security Lab!")
print("You now understand the basics of adversarial attacks and defenses.")
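
Of the input-preprocessing defenses listed above, feature squeezing is simple to sketch: reduce the colour bit depth of the input, then flag inputs whose prediction changes a lot after squeezing. The model, input and `THRESHOLD` below are hypothetical; in practice the threshold would be tuned on clean data, and an attacker who knows about the squeezer can often adapt to it.

```python
import numpy as np


def reduce_bit_depth(x, bits):
    """Quantize values in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def squeeze_score(predict, x, bits=3):
    """L1 distance between predictions on the raw and squeezed input (larger = more suspicious)."""
    return np.abs(predict(x) - predict(reduce_bit_depth(x, bits))).sum()


rng = np.random.default_rng(4)
W = rng.normal(size=(3, 16))            # hypothetical 3-class linear model


def predict(x):
    return softmax(W @ x)


x = rng.uniform(0.0, 1.0, size=16)      # hypothetical input
THRESHOLD = 0.5                         # hypothetical; tune on clean data in practice

score = squeeze_score(predict, x)
verdict = "flag as adversarial" if score > THRESHOLD else "accept"
print(f"squeeze score: {score:.3f} -> {verdict}")
```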