# Evasion Attack: Fast Gradient Sign Method (FGSM)

This notebook demonstrates how to perform the Fast Gradient Sign Method (FGSM) evasion attack against our deployed MNIST digit classifier. FGSM is one of the four evasion attack algorithms mentioned in the Security and Privacy of AI Knowledge Guide.

## Overview of FGSM

FGSM, introduced by Goodfellow et al. in 2014, is a one-step attack that creates adversarial examples by nudging the input in the direction that increases the loss. It uses only the sign of the gradient of the loss with respect to the input, so every pixel changes by exactly $\pm\epsilon$ (or stays unchanged where the gradient is zero). The value of $\epsilon$ therefore bounds how far any single pixel can move.

The mathematical formulation is:

$$x_{adv} = x + \epsilon \cdot \text{sign}(\nabla_x J(\theta, x, y))$$

where:
- $x_{adv}$ is the adversarial example
- $x$ is the original input
- $\epsilon$ is the perturbation magnitude, i.e. the maximum change applied to any single pixel (an $L_\infty$ bound)
- $\nabla_x J$ is the gradient of the loss function with respect to the input
- $\theta$ represents the model parameters
- $y$ is the true label
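
To make the formula concrete, here is a minimal sketch on a toy logistic-regression model, where the input gradient has a closed form. The weights, input, and epsilon are made-up illustration values, not taken from the deployed MNIST model, and `loss_gradient_wrt_input` is a hypothetical helper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_gradient_wrt_input(w, b, x, y):
    """Gradient of the binary cross-entropy loss J with respect to the input x.

    With p = sigmoid(w.x + b), the chain rule gives dJ/dx = (p - y) * w.
    """
    p = sigmoid(np.dot(w, x) + b)
    return (p - y) * w

# Toy parameters (theta) and one input whose true label is y = 1.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.6, 0.2, 0.4])
y = 1
epsilon = 0.4

# x_adv = x + epsilon * sign(grad_x J(theta, x, y))
grad = loss_gradient_wrt_input(w, b, x, y)
x_adv = x + epsilon * np.sign(grad)

p_clean = sigmoid(np.dot(w, x) + b)    # probability of class 1 before the attack
p_adv = sigmoid(np.dot(w, x_adv) + b)  # probability of class 1 after the attack
print(f"clean p(y=1) = {p_clean:.3f}, adversarial p(y=1) = {p_adv:.3f}")
```

For this linear model the step lowers the logit by $\epsilon \lVert w \rVert_1$, which is why a per-feature change of 0.4 is enough to push the prediction across the 0.5 decision boundary here.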

## Steps in this notebook:
1. Import required libraries
2. Set up connection to the deployed model
3. Load test data
4. Implement the FGSM attack
5. Generate adversarial examples
6. Evaluate attack success rate
7. Visualize results

## 1. Import Required Libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets import mnist
import requests
from tqdm.notebook import tqdm

# Set random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

## 2. Set up connection to the deployed model

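We'll create helper functions for the three endpoints this notebook uses: `/info`, `/predict_raw`, and `/get_gradient`. FGSM is a white-box attack, so it needs the gradient of the loss with respect to the input; here the deployed model exposes that gradient through `/get_gradient`.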

In [None]:
# API endpoint of the deployed model
API_URL = "http://localhost:5000"

def get_model_info():
    """Get information about the deployed model"""
    response = requests.get(f"{API_URL}/info", timeout=10)  # fail fast if the server is unreachable
    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(f"Error fetching model info: {response.text}")

def get_prediction(pixels):
    """Get prediction for an image from the deployed model"""
    # Ensure pixels are flattened
    pixels_flat = pixels.reshape(-1).tolist()
    
    # Prepare the request data
    data = {
        'pixels': pixels_flat
    }
    
    # Send the request
    response = requests.post(f"{API_URL}/predict_raw", json=data, timeout=10)
    
    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(f"Error getting prediction: {response.text}")

def get_gradient(pixels, label):
    """Get the gradient of the loss with respect to the input from the deployed model"""
    # Ensure pixels are flattened
    pixels_flat = pixels.reshape(-1).tolist()
    
    # Prepare the request data
    data = {
        'pixels': pixels_flat,
        'label': int(label)
    }
    
    # Send the request
    response = requests.post(f"{API_URL}/get_gradient", json=data, timeout=10)
    
    if response.status_code == 200:
        result = response.json()
        gradient = np.array(result['gradient'])
        gradient_shape = result['gradient_shape']
        return gradient.reshape(gradient_shape)
    else:
        raise Exception(f"Error getting gradient: {response.text}")
        
# Test the API connection
try:
    model_info = get_model_info()
    print("Successfully connected to the model API!")
    print(f"Model: {model_info['model_name']}")
    print(f"Input shape: {model_info['input_shape']}")
    print(f"Classes: {model_info['classes']}")
    print(f"Test accuracy: {model_info['test_accuracy']}")
except Exception as e:
    print(f"Failed to connect to the model API: {e}")
    print(f"Make sure the Flask server is running at {API_URL}")
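
The request and response shapes used by `get_gradient` can be checked without a running server. The sketch below builds the request body the same way and parses a made-up response. The field names (`pixels`, `label`, `gradient`, `gradient_shape`) come from the helper functions above; the response contents are an assumption for illustration only.

```python
import json
import numpy as np

# A blank 28x28x1 image stands in for an MNIST sample.
image = np.zeros((28, 28, 1), dtype=np.float32)
label = np.uint8(7)  # MNIST labels are NumPy integers, not Python ints

# NumPy integers are not JSON serializable, hence the int(...) cast.
request_body = json.dumps({
    "pixels": image.reshape(-1).tolist(),
    "label": int(label),
})

# Hypothetical response: a flat gradient plus the shape needed to restore it.
fake_response = json.dumps({
    "gradient": [0.0] * 784,
    "gradient_shape": [28, 28, 1],
})

parsed = json.loads(fake_response)
gradient = np.array(parsed["gradient"], dtype=np.float32).reshape(parsed["gradient_shape"])
print(gradient.shape)
```

Casting the label with `int(...)` matters because `json.dumps` raises a `TypeError` on NumPy integer types.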

## 3. Load Test Data

We'll load the MNIST test dataset to use for our attack.

In [None]:
# Load MNIST dataset
print("Loading MNIST dataset...")
(_, _), (x_test, y_test) = mnist.load_data()

# Preprocess the data (normalize to 0-1)
x_test = x_test.astype('float32') / 255.0

# Reshape to include channel dimension (MNIST is grayscale, so 1 channel)
x_test = x_test.reshape(-1, 28, 28, 1)

print(f"Loaded {len(x_test)} test images")
print(f"Data shape: {x_test.shape}")
print(f"Labels shape: {y_test.shape}")

### Examine Some Images

In [None]:
# Display some test images
plt.figure(figsize=(12, 5))
for i in range(10):
    plt.subplot(2, 5, i+1)
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    plt.title(f"Label: {y_test[i]}")
    plt.axis('off')
plt.tight_layout()
plt.show()

### Verify Model Predictions

Before attempting any attack, let's verify that the deployed model is correctly classifying the original images.

In [None]:
# Test model predictions on a few original images
num_test_samples = 5
test_indices = np.random.choice(len(x_test), num_test_samples, replace=False)

print("Testing model predictions on original images...")
for i, idx in enumerate(test_indices):
    # Get the image and true label
    image = x_test[idx]
    true_label = y_test[idx]
    
    # Get model prediction
    result = get_prediction(image)
    predicted_class = result['predicted_class']
    confidence = result['confidence']
    
    print(f"Sample {i+1}: True label = {true_label}, Predicted class = {predicted_class}, Confidence = {confidence:.4f}")

## 4. Implement the FGSM Attack

Now, let's implement the Fast Gradient Sign Method (FGSM) attack. The key steps are:
1. Calculate the gradient of the loss with respect to the input image
2. Take the sign of the gradient
3. Perturb the input in the direction of the sign of the gradient, scaled by epsilon
4. Clip the perturbed image to ensure it's within the valid range (0-1)

In [None]:
def fgsm_attack(image, label, epsilon=0.1):
    """
    Generates an adversarial example using the Fast Gradient Sign Method
    
    Args:
        image: Input image to be perturbed (numpy array)
        label: True label of the image
        epsilon: Perturbation magnitude
        
    Returns:
        Adversarial example
    """
    # Step 1: Get the gradient of the loss with respect to the input
    gradient = get_gradient(image, label)
    
    # Step 2: Take the sign of the gradient
    sign_gradient = np.sign(gradient)
    
    # Step 3: Perturb the input in the direction of the sign of the gradient
    perturbed_image = image + epsilon * sign_gradient
    
    # Step 4: Clip the perturbed image to ensure it's within the valid range (0-1)
    perturbed_image = np.clip(perturbed_image, 0, 1)
    
    return perturbed_image
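
The attack logic can also be exercised offline. The following sketch assumes a hypothetical variant, `fgsm_attack_with`, that takes the gradient source as an argument, and a made-up `stub_gradient` that replaces the API call. It shows how clipping shrinks the effective perturbation on pixels that are already near 0.

```python
import numpy as np

def fgsm_attack_with(gradient_fn, image, label, epsilon=0.1):
    """Same steps as fgsm_attack, but the gradient source is passed in."""
    gradient = gradient_fn(image, label)
    perturbed = image + epsilon * np.sign(gradient)
    return np.clip(perturbed, 0, 1)

def stub_gradient(image, label):
    """Made-up gradient: positive on the left half, negative on the right half."""
    grad = np.ones_like(image)
    grad[:, 14:, :] = -1.0
    return grad

# A nearly black image, similar to the MNIST background.
image = np.full((28, 28, 1), 0.05, dtype=np.float32)
adv = fgsm_attack_with(stub_gradient, image, label=3, epsilon=0.1)
delta = adv - image

print("left-half change: ", float(delta[0, 0, 0]))   # about +0.1 (full step)
print("right-half change:", float(delta[0, 27, 0]))  # about -0.05 (clipped at 0)
```

Because most MNIST background pixels are exactly 0, any negative step on them is clipped away entirely, so the perturbation that reaches the model can be smaller than epsilon on those pixels.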

## 5. Generate Adversarial Examples

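Now let's generate adversarial examples for a random subset of the test data. Images that the model already misclassifies are skipped, since they count as errors without any perturbation.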

In [None]:
# Define the number of samples to attack and the perturbation strength
num_samples = 20
epsilon = 0.2  # Perturbation magnitude

# Randomly select samples from the test set
sample_indices = np.random.choice(len(x_test), num_samples, replace=False)

# Lists to store results
original_images = []
adversarial_images = []
original_preds = []
adversarial_preds = []
true_labels = []
perturbations = []

print(f"Generating adversarial examples using FGSM with epsilon={epsilon}...")

# Generate adversarial examples
for i, idx in enumerate(tqdm(sample_indices)):
    # Get the original image and label
    original_image = x_test[idx]
    true_label = y_test[idx]
    
    # Get original prediction
    try:
        orig_result = get_prediction(original_image)
        original_pred = orig_result['predicted_class']
        
        # Only attack correctly classified images
        if int(original_pred) == true_label:
            # Generate adversarial example
            adversarial_image = fgsm_attack(original_image, true_label, epsilon)
            
            # Calculate perturbation
            perturbation = adversarial_image - original_image
            
            # Get adversarial prediction
            adv_result = get_prediction(adversarial_image)
            adversarial_pred = adv_result['predicted_class']
            
            # Store results
            original_images.append(original_image)
            adversarial_images.append(adversarial_image)
            original_preds.append(original_pred)
            adversarial_preds.append(adversarial_pred)
            true_labels.append(true_label)
            perturbations.append(perturbation)
    except Exception as e:
        print(f"Error processing sample {idx}: {e}")
        continue
        
# Convert lists to numpy arrays for easier handling
original_images = np.array(original_images)
adversarial_images = np.array(adversarial_images)
original_preds = np.array(original_preds, dtype=int)  # cast in case the API returns class labels as strings
adversarial_preds = np.array(adversarial_preds, dtype=int)
true_labels = np.array(true_labels)
perturbations = np.array(perturbations)

print(f"Generated {len(adversarial_images)} adversarial examples")

## 6. Evaluate Attack Success Rate

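Let's evaluate how successful our FGSM attack was. An attack counts as successful when the model's prediction on the adversarial image differs from the true label. Because only correctly classified images were attacked, the success rate is the fraction of those images whose prediction changed.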

In [None]:
# Calculate attack success rate
successful_attacks = (adversarial_preds != true_labels)
attack_success_rate = np.mean(successful_attacks) * 100

print(f"Attack Success Rate: {attack_success_rate:.2f}%")
print(f"Number of successful attacks: {np.sum(successful_attacks)} out of {len(adversarial_images)}")

# Calculate average perturbation magnitude
avg_perturbation_l2 = np.mean([np.linalg.norm(p) for p in perturbations])
avg_perturbation_linf = np.mean([np.max(np.abs(p)) for p in perturbations])

print(f"Average L2 perturbation magnitude: {avg_perturbation_l2:.4f}")
print(f"Average L∞ perturbation magnitude: {avg_perturbation_linf:.4f}")
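
As a rough sanity check on the two norms printed above, the sketch below computes their worst-case values for a 28x28 image. It assumes every pixel moves by exactly epsilon, with no clipping and no zero gradients; real perturbations will usually fall below the L2 value.

```python
import numpy as np

epsilon = 0.2
num_pixels = 28 * 28

# Worst case: every pixel changes by exactly epsilon.
full_perturbation = epsilon * np.ones((28, 28, 1))

# With no axis or ord given, np.linalg.norm returns the 2-norm of the flattened array.
l2 = np.linalg.norm(full_perturbation)
linf = np.max(np.abs(full_perturbation))

# Closed form: sqrt(784 * epsilon^2) = 28 * epsilon
l2_upper_bound = epsilon * np.sqrt(num_pixels)

print(f"L2 = {l2:.4f}, upper bound = {l2_upper_bound:.4f}, Linf = {linf:.4f}")
```

An average L2 magnitude well below $28\epsilon$ suggests that many pixels were clipped or had zero gradient. The L∞ magnitude stays at $\epsilon$ as long as at least one pixel takes the full step.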

## 7. Visualize Results

Now, let's visualize the original images, perturbations, and adversarial examples to better understand the attack.

In [None]:
# Display a subset of the results
num_to_display = min(10, len(adversarial_images))

plt.figure(figsize=(15, 5 * num_to_display))

for i in range(num_to_display):
    # Original image
    plt.subplot(num_to_display, 3, 3*i + 1)
    plt.imshow(original_images[i].reshape(28, 28), cmap='gray')
    plt.title(f"Original\nTrue: {true_labels[i]}, Pred: {original_preds[i]}")
    plt.axis('off')
    
    # Perturbation
    plt.subplot(num_to_display, 3, 3*i + 2)
    plt.imshow(perturbations[i].reshape(28, 28), cmap='RdBu_r')
    plt.title(f"Perturbation\nEpsilon: {epsilon}")
    plt.colorbar(fraction=0.046, pad=0.04)
    plt.axis('off')
    
    # Adversarial image
    plt.subplot(num_to_display, 3, 3*i + 3)
    plt.imshow(adversarial_images[i].reshape(28, 28), cmap='gray')
    color = 'green' if successful_attacks[i] else 'red'
    result = 'Success' if successful_attacks[i] else 'Failure'
    plt.title(f"Adversarial\nPred: {adversarial_preds[i]}\nAttack: {result}", color=color)
    plt.axis('off')

plt.tight_layout()
plt.show()

### Effect of Epsilon on Attack Success Rate

Let's investigate how the perturbation magnitude (epsilon) affects the attack success rate.

In [None]:
# Test various epsilon values
epsilon_values = [0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3]
test_samples = 10  # Use a smaller subset for efficiency
success_rates = []

# Sample indices for testing
test_indices = np.random.choice(len(x_test), test_samples, replace=False)

print("Testing attack success rate for different epsilon values...")
for eps in tqdm(epsilon_values):
    successful = 0
    total = 0
    
    for idx in test_indices:
        original_image = x_test[idx]
        true_label = y_test[idx]
        
        try:
            # Get original prediction
            orig_result = get_prediction(original_image)
            original_pred = orig_result['predicted_class']
            
            # Only attack correctly classified images
            if int(original_pred) == true_label:
                # Generate adversarial example
                adversarial_image = fgsm_attack(original_image, true_label, eps)
                
                # Get adversarial prediction
                adv_result = get_prediction(adversarial_image)
                adversarial_pred = adv_result['predicted_class']
                
                # Count successful attacks
                if int(adversarial_pred) != true_label:
                    successful += 1
                total += 1
        except Exception as e:
            print(f"Error processing sample {idx} with epsilon={eps}: {e}")
            continue
    
    # Calculate and store success rate
    if total > 0:
        success_rate = (successful / total) * 100
        success_rates.append(success_rate)
        print(f"Epsilon: {eps}, Success Rate: {success_rate:.2f}%")
    else:
        success_rates.append(0)
        print(f"Epsilon: {eps}, No valid samples")
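
With only 10 images per epsilon value, each success rate is a coarse estimate. One way to quantify that uncertainty is a Wilson score interval for a binomial proportion. This is a standard statistical choice rather than part of the original notebook, `wilson_interval` is a hypothetical helper, and the counts below are made up.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)
    ) / denom
    return center - half_width, center + half_width

# Made-up example: 7 successful attacks out of 10 attacked images.
low, high = wilson_interval(7, 10)
print(f"Success rate 70%, 95% interval: {low * 100:.1f}% to {high * 100:.1f}%")
```

For 7 successes out of 10 the interval spans roughly 40% to 89%, so differences between neighbouring epsilon values should be read with caution unless `test_samples` is increased.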

In [None]:
# Plot the relationship between epsilon and attack success rate
plt.figure(figsize=(10, 6))
plt.plot(epsilon_values, success_rates, 'o-', linewidth=2, markersize=8)
plt.xlabel('Epsilon (Perturbation Magnitude)', fontsize=12)
plt.ylabel('Attack Success Rate (%)', fontsize=12)
plt.title('Effect of Epsilon on FGSM Attack Success Rate', fontsize=14)
plt.grid(True, linestyle='--', alpha=0.7)
plt.xticks(epsilon_values)
plt.ylim([0, 105])

# Add data labels
for i, rate in enumerate(success_rates):
    plt.text(epsilon_values[i], rate + 2, f"{rate:.1f}%", ha='center')

plt.tight_layout()
plt.show()

## Conclusion

In this notebook, we have demonstrated the Fast Gradient Sign Method (FGSM) evasion attack against a deployed MNIST digit classifier. We have shown how to:

1. Connect to a deployed model API
2. Implement the FGSM attack algorithm
3. Generate adversarial examples
4. Evaluate the attack success rate
5. Visualize the results
6. Study the impact of the perturbation magnitude (epsilon) on the attack effectiveness

Key findings:
- A single gradient-sign step with a small per-pixel perturbation can be enough to change the model's prediction
- The attack success rate generally increases with epsilon, although rates estimated from only 10 images per epsilon value are coarse
- There is a trade-off between perturbation visibility (how noticeable the changes are) and attack success rate

These results demonstrate the vulnerability of neural networks to adversarial examples, even with simple one-step attack methods like FGSM. This highlights the importance of developing robust models that can withstand such attacks.