# Module 1 - Video 1: Fast Gradient Sign Method (FGSM)

## 🎯 Learning Objectives

In this notebook, you will learn:
- What the Fast Gradient Sign Method (FGSM) is
- How FGSM generates adversarial examples
- How to implement FGSM from scratch
- How to visualize adversarial perturbations
- How adversarial examples fool neural networks

## 📚 Background

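The Fast Gradient Sign Method (FGSM) is one of the simplest and most fundamental adversarial attacks. It was introduced by Ian Goodfellow et al. in 2014. FGSM creates an adversarial example in a single step: it computes the gradient of the loss with respect to the input, takes the sign of each component, and adds that sign pattern to the input scaled by a small constant $\epsilon$. Every input value therefore moves by at most $\epsilon$, in the direction that locally increases the loss.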

### Mathematical Formula

The adversarial example is generated as:

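$$x_{\text{adv}} = x + \epsilon \cdot \text{sign}(\nabla_x J(\theta, x, y))$$

Where:
- $x$ is the original input
- $y$ is the true label of $x$
- $\theta$ denotes the model parameters, which stay fixed during the attack
- $J(\theta, x, y)$ is the loss used to train the model
- $\epsilon$ is the perturbation magnitude
- $\nabla_x J(\theta, x, y)$ is the gradient of the loss with respect to the input
- $\text{sign}(\cdot)$ returns the element-wise sign of the gradient ($-1$, $0$, or $+1$)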
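
To make the formula concrete, the following is a minimal sketch on a toy problem. The linear classifier `toy_model`, the 4-feature input, and the label are all made up for illustration and are not part of the notebook's MNIST setup.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy setup (illustrative only): a linear classifier on 4 features, 3 classes.
toy_model = torch.nn.Linear(4, 3)
x = torch.rand(1, 4, requires_grad=True)   # original input x
y = torch.tensor([2])                      # true label y
epsilon = 0.1                              # perturbation magnitude

# J(theta, x, y): cross-entropy loss of the toy model on (x, y)
loss = F.cross_entropy(toy_model(x), y)
loss.backward()                            # fills x.grad with the gradient of J w.r.t. x

# x_adv = x + epsilon * sign(gradient)
x_adv = (x + epsilon * x.grad.sign()).detach()

# Each coordinate moves by at most epsilon
max_change = (x_adv - x.detach()).abs().max().item()
print(f"max |x_adv - x| = {max_change:.4f}")
```

Because only the sign of the gradient is used, the size of the step is fixed by `epsilon` regardless of how large or small the gradient itself is.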

## 🔧 Setup and Imports

In [None]:
# Import required libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check if CUDA is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

## 🧠 Define a Simple Neural Network

We'll use a simple Convolutional Neural Network for image classification.

In [None]:
class SimpleCNN(nn.Module):
    """Simple CNN for MNIST classification"""
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)
        self.dropout = nn.Dropout(0.25)
    
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x
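
The `64 * 7 * 7` input size of `fc1` follows from the layer shapes. As a rough check, the sketch below traces a dummy 28x28 input through stand-alone copies of the convolution and pooling layers; the variable names are chosen here for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-alone copies of the layers used in SimpleCNN, to trace tensor shapes.
conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
pool = nn.MaxPool2d(2, 2)

x = torch.zeros(1, 1, 28, 28)          # one MNIST-sized grayscale image
h1 = pool(F.relu(conv1(x)))            # padding=1 keeps 28x28; pooling halves it to 14x14
h2 = pool(F.relu(conv2(h1)))           # second pooling halves 14x14 to 7x7
flat = h2.flatten(start_dim=1)         # (batch, 64 * 7 * 7)

print(h1.shape, h2.shape, flat.shape)
```

Two rounds of 2x2 pooling reduce 28 to 14 and then to 7, and the second convolution has 64 output channels, which gives 64 * 7 * 7 = 3136 features per image.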

## 📊 Load Dataset

We'll use the MNIST dataset for this demonstration.

In [None]:
# Define transformation
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Load MNIST test dataset
test_dataset = torchvision.datasets.MNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)

test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=1,
    shuffle=False
)

print(f"Loaded {len(test_dataset)} test images")
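
Because the images are normalized, pixel values no longer lie in [0, 1], and a perturbation size applied to these tensors is measured in normalized units. The sketch below works out the normalized range and the conversion back to pixel units; the helper names `epsilon_to_pixel_scale` and `denormalize` are hypothetical and introduced only for this illustration.

```python
import torch

MNIST_MEAN, MNIST_STD = 0.1307, 0.3081   # values passed to transforms.Normalize

# Pixel values 0 and 1 after normalization: (p - mean) / std
norm_min = (0.0 - MNIST_MEAN) / MNIST_STD
norm_max = (1.0 - MNIST_MEAN) / MNIST_STD
print(f"normalized range: [{norm_min:.4f}, {norm_max:.4f}]")

def epsilon_to_pixel_scale(eps_normalized: float) -> float:
    """Convert a step measured in normalized units to raw [0, 1] pixel units."""
    return eps_normalized * MNIST_STD

def denormalize(x: torch.Tensor) -> torch.Tensor:
    """Map a normalized tensor back to the raw [0, 1] pixel scale."""
    return x * MNIST_STD + MNIST_MEAN
```

The normalized range comes out at roughly -0.42 to 2.82, and a step of 0.3 in normalized units corresponds to about 0.09 on the raw pixel scale.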

## 🏋️ Load Pre-trained Model

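For this demonstration, we initialize the model with random weights, so it classifies at roughly chance level and the accuracy numbers in the following sections are only illustrative. In practice, you would load pre-trained weights before running the attack.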

In [None]:
# Initialize model
model = SimpleCNN().to(device)
model.eval()

print("Model initialized with random weights and set to evaluation mode")
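
Loading trained weights usually goes through a saved `state_dict`. The following is a sketch under stated assumptions: the checkpoint file name `mnist_cnn.pt` is hypothetical, the file is created inside the sketch so that it runs on its own, and a small stand-in network takes the place of `SimpleCNN`.

```python
import os
import tempfile

import torch
import torch.nn as nn

load_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in network so the sketch is self-contained; in the notebook this would be SimpleCNN().
def make_model() -> nn.Module:
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Hypothetical checkpoint path; the file is written here only to keep the sketch runnable.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "mnist_cnn.pt")
trained = make_model()                      # pretend this model has been trained
torch.save(trained.state_dict(), checkpoint_path)

# Loading: build the same architecture, then copy the saved weights into it.
loaded_model = make_model().to(load_device)
state_dict = torch.load(checkpoint_path, map_location=load_device)
loaded_model.load_state_dict(state_dict)
loaded_model.eval()                         # disables dropout for inference
```

Passing `map_location` lets a checkpoint saved on a GPU be loaded on a CPU-only machine.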

## ⚔️ Implement FGSM Attack

Now, let's implement the FGSM attack function.

In [None]:
def fgsm_attack(image, epsilon, data_grad):
    """
    Generate adversarial example using FGSM
    
    Args:
        image: Original input image
        epsilon: Perturbation magnitude
        data_grad: Gradient of loss w.r.t. input
    
    Returns:
        perturbed_image: Adversarial example
    """
    # Get the sign of the gradient
    sign_data_grad = data_grad.sign()
    
    # Create the perturbed image
    perturbed_image = image + epsilon * sign_data_grad
    
    # Clip to the valid pixel range [0, 1], expressed in normalized units (mean 0.1307, std 0.3081)
    perturbed_image = torch.clamp(perturbed_image, (0 - 0.1307) / 0.3081, (1 - 0.1307) / 0.3081)
    
    return perturbed_image
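
An alternative worth considering is to specify epsilon on the raw [0, 1] pixel scale, which makes values easier to compare across datasets. The sketch below is one possible variant, not the notebook's method: `fgsm_attack_pixel_space` is a hypothetical helper that assumes the MNIST mean and standard deviation used in the transform above.

```python
import torch

MNIST_MEAN, MNIST_STD = 0.1307, 0.3081

def fgsm_attack_pixel_space(image_norm, epsilon_pixel, data_grad):
    """FGSM step with epsilon given in raw [0, 1] pixel units (hypothetical helper)."""
    # Undo the normalization so that epsilon and the clamp refer to real pixel values
    image_pixel = image_norm * MNIST_STD + MNIST_MEAN
    # Dividing by a positive std does not change the sign of the gradient
    perturbed_pixel = image_pixel + epsilon_pixel * data_grad.sign()
    perturbed_pixel = torch.clamp(perturbed_pixel, 0.0, 1.0)
    # Re-apply the normalization the model expects
    return (perturbed_pixel - MNIST_MEAN) / MNIST_STD

# Usage with random stand-in tensors in place of a real image and gradient
image = (torch.rand(1, 1, 28, 28) - MNIST_MEAN) / MNIST_STD
grad = torch.randn(1, 1, 28, 28)
adv = fgsm_attack_pixel_space(image, 0.1, grad)
```

With this variant, `epsilon_pixel=0.1` means that no pixel changes by more than 0.1 on the original intensity scale.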

## 🎯 Generate Adversarial Examples

Next, we run the attack over the test set and record how often the model's predictions survive it.

In [None]:
def test_attack(model, device, test_loader, epsilon):
    """
    Test FGSM attack on test dataset
    
    Args:
        model: Neural network model
        device: Device to run on (CPU/GPU)
        test_loader: Test data loader
        epsilon: Perturbation magnitude
    
    Returns:
        accuracy: Accuracy on adversarial examples
        adv_examples: List of adversarial examples
    """
    correct = 0        # originally correct samples that stay correct after the attack
    attacked = 0       # originally correct samples that were attacked
    adv_examples = []
    
    # Loop through test set
    for data, target in test_loader:
        data, target = data.to(device), target.to(device)
        
        # Track gradients with respect to the input
        data.requires_grad = True
        
        # Forward pass
        output = model(data)
        init_pred = output.max(1, keepdim=True)[1]
        
        # If initially incorrect, skip
        if init_pred.item() != target.item():
            continue
        
        # Calculate loss (the model returns raw logits, so use cross-entropy)
        loss = F.cross_entropy(output, target)
        
        # Zero gradients
        model.zero_grad()
        
        # Backward pass
        loss.backward()
        
        # Get gradient of the loss w.r.t. the input
        data_grad = data.grad.detach()
        
        # Generate adversarial example
        perturbed_data = fgsm_attack(data, epsilon, data_grad)
        
        # Re-classify
        output = model(perturbed_data)
        final_pred = output.max(1, keepdim=True)[1]
        
        attacked += 1
        if final_pred.item() == target.item():
            correct += 1
        elif len(adv_examples) < 5:
            # Save some adversarial examples for visualization
            adv_ex = perturbed_data.squeeze().detach().cpu().numpy()
            adv_examples.append((init_pred.item(), final_pred.item(), adv_ex))
        
        # Only attack the first 100 correctly classified examples for speed
        if attacked >= 100:
            break
    
    # Accuracy over the attacked samples (guard against an empty set)
    final_acc = correct / attacked if attacked > 0 else 0.0
    print(f"Epsilon: {epsilon}\tTest Accuracy = {correct}/{attacked} = {final_acc:.4f}")
    
    return final_acc, adv_examples
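
Looping with `batch_size=1` is easy to follow but slow. A batched version is sketched below as one possible alternative; `fgsm_batch_accuracy` is a hypothetical helper, and the stand-in linear model and random tensors replace the real network and MNIST data. Unlike the loop above, it reports accuracy over all samples in the batch instead of skipping those that were misclassified to begin with.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_batch_accuracy(model, images, labels, epsilon, clamp_min, clamp_max):
    """Return (clean_accuracy, adversarial_accuracy) for one batch (hypothetical helper)."""
    images = images.clone().detach().requires_grad_(True)
    logits = model(images)
    loss = F.cross_entropy(logits, labels)
    # Gradient of the batch loss w.r.t. the inputs only; parameter gradients are not touched
    grad, = torch.autograd.grad(loss, images)
    adv = torch.clamp(images.detach() + epsilon * grad.sign(), clamp_min, clamp_max)
    with torch.no_grad():
        clean_acc = (logits.argmax(dim=1) == labels).float().mean().item()
        adv_acc = (model(adv).argmax(dim=1) == labels).float().mean().item()
    return clean_acc, adv_acc

torch.manual_seed(0)
stand_in = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).eval()
images = torch.rand(16, 1, 28, 28)          # random stand-in batch on the [0, 1] scale
labels = torch.randint(0, 10, (16,))
clean_acc, adv_acc = fgsm_batch_accuracy(stand_in, images, labels, 0.1, 0.0, 1.0)
print(f"clean: {clean_acc:.2f}, adversarial: {adv_acc:.2f}")
```

Averaging the loss over the batch scales each sample's gradient by a positive constant, so the sign used by FGSM is the same as in the per-sample loop.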

## 📈 Run Attack with Different Epsilon Values

Let's test the attack with different perturbation magnitudes.

In [None]:
# Test different epsilon values
epsilons = [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3]
accuracies = []
examples = []

# Run test for each epsilon
for eps in epsilons:
    acc, ex = test_attack(model, device, test_loader, eps)
    accuracies.append(acc)
    examples.append(ex)

## 📊 Visualize Results

Let's visualize how accuracy degrades with increasing epsilon.

In [None]:
# Plot accuracy vs epsilon
plt.figure(figsize=(10, 6))
plt.plot(epsilons, accuracies, marker='o', linewidth=2, markersize=8)
plt.xlabel('Epsilon (Perturbation Magnitude)', fontsize=12)
plt.ylabel('Accuracy', fontsize=12)
plt.title('Accuracy vs Epsilon - FGSM Attack', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.ylim([0, 1.1])
plt.show()

## 🖼️ Visualize Adversarial Examples

Let's visualize some adversarial examples to see the perturbations.

In [None]:
# Plot several examples at different epsilons
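plt.figure(figsize=(15, 8))
for i in range(len(epsilons)):
    for j in range(min(len(examples[i]), 3)):
        # Row i, column j of the grid, so rows stay aligned even when
        # an epsilon yields fewer than 3 adversarial examples
        plt.subplot(len(epsilons), 3, i * 3 + j + 1)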
        plt.xticks([], [])
        plt.yticks([], [])
        if j == 0:
            plt.ylabel(f"Eps: {epsilons[i]}", fontsize=12)
        orig, adv, ex = examples[i][j]
        plt.title(f"Orig: {orig} -> Adv: {adv}")
        plt.imshow(ex, cmap="gray")
plt.tight_layout()
plt.show()

## 🎓 Key Takeaways

1. **FGSM is Simple but Effective**: With just one gradient computation, we can generate adversarial examples
2. **Epsilon Controls Attack Strength**: Larger epsilon values produce stronger perturbations, but the changes also become more visible
3. **Trade-off Between Success and Perceptibility**: There's a balance between fooling the model and keeping perturbations imperceptible
4. **Gradient-Based Attacks**: FGSM demonstrates the vulnerability of neural networks to gradient-based attacks

## 🔍 Next Steps

- Try implementing other attack methods (PGD, C&W)
- Explore defense mechanisms (adversarial training, input preprocessing)
- Test on different models and datasets
- Measure perceptibility using metrics like PSNR or SSIM
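
As a starting point for the perceptibility item, PSNR can be computed in a few lines. This is a minimal sketch that assumes both images are on the raw [0, 1] pixel scale (normalized tensors would need to be mapped back first); the function name `psnr` is chosen here for illustration.

```python
import math
import torch

def psnr(original: torch.Tensor, perturbed: torch.Tensor, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means the two images are closer."""
    mse = torch.mean((original - perturbed) ** 2).item()
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# A uniform shift of 0.1 on every pixel gives MSE = 0.01,
# so PSNR = 10 * log10(1 / 0.01) = 20 dB
clean = torch.full((1, 28, 28), 0.5)
shifted = clean + 0.1
print(f"{psnr(clean, shifted):.2f} dB")
```

An FGSM perturbation that moves every pixel by exactly epsilon has MSE equal to epsilon squared, so before clipping its PSNR depends only on epsilon.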

## 📚 References

1. Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
2. Madry, A., et al. (2017). Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.