<a href="https://colab.research.google.com/github/linhkid/GDG-DevFest-Codelab-24/blob/main/problems/02-a-FGSM-Adversarial-Attack_fill.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fast Gradient Sign Method (FGSM) Adversarial Attack Workshop

## Introduction to Adversarial Attacks
In this workshop, we'll explore how to create adversarial examples using the Fast Gradient Sign Method (FGSM). An adversarial example is an input with a carefully crafted perturbation that can cause a deep learning model to misclassify it, even though the change is nearly imperceptible to the human eye.

## Learning objectives
After completing this workshop, you'll be able to:

- Create adversarial examples using the Fast Gradient Sign Method (FGSM)
- Understand the concepts behind adversarial attacks
- Implement the FGSM attack

## Approach

- Using a pre-trained ResNet50 model, we feed in a pizza image and compute the loss for its prediction. We then compute the gradient of that loss with respect to the input image.
- Next, we take the sign of the gradient and multiply it by epsilon to generate a perturbation.
- This perturbation is added to the original image to create an adversarial image.
- Finally, the adversarial image is fed back into ResNet50, and we compare its prediction with the original one to evaluate the effectiveness of the attack.
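
The four steps above can be sketched on a much smaller model. The example below is a minimal illustration in plain Python, assuming a hypothetical logistic-regression "model" with fixed weights (`W`, `B`) in place of ResNet50; the function names are invented for this sketch and are not part of the workshop code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    # Returns -1, 0, or 1
    return (v > 0) - (v < 0)

# Hypothetical toy model: logistic regression with fixed weights (assumption).
W = [2.0, -1.0, 0.5]
B = 0.1

def predict(x):
    """Probability of class 1 for the input vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(W, x)) + B)

def loss_gradient(x, y):
    """Gradient of binary cross-entropy with respect to the input x."""
    p = predict(x)
    return [(p - y) * w for w in W]

def fgsm(x, y, epsilon):
    """One FGSM step: shift every feature by epsilon in the loss-increasing direction."""
    grad = loss_gradient(x, y)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

x = [0.2, 0.1, 0.4]  # clean input with true label 1
y = 1
x_adv = fgsm(x, y, epsilon=0.3)

p_clean = predict(x)
p_adv = predict(x_adv)
print("clean probability of class 1:", round(p_clean, 3))
print("adversarial probability of class 1:", round(p_adv, 3))
```

In this toy setting, no feature moves by more than epsilon, yet the predicted probability for the true class drops below 0.5.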


## Key Terms

- Fast Gradient Sign Method (FGSM): A method for creating adversarial examples by perturbing the input image in the direction of the gradient of the loss function with respect to the input image.
- Adversarial attacks: Techniques that deliberately craft inputs (adversarial examples) to make a machine learning model produce incorrect outputs.
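
The FGSM definition above is commonly written as a single update. The notation here is an assumption of this note rather than something defined in the workshop: $x$ is the input image, $y$ its label, $\theta$ the model parameters, $J$ the loss, and $\epsilon$ the attack strength.

```latex
x_{\text{adv}} = x + \epsilon \cdot \operatorname{sign}\left(\nabla_x J(\theta, x, y)\right)
```

Because every component of the sign vector is $-1$, $0$, or $1$, the perturbation satisfies $\lVert x_{\text{adv}} - x \rVert_\infty \le \epsilon$.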

## Application
Adversarial attacks are used to detect vulnerabilities and misclassifications in systems for:

- Facial recognition
- Sign recognition
- Self-driving cars
- etc.


### Install and Import Dependencies
Run this cell to import all required libraries.

In [None]:
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
import numpy as np
import matplotlib.pyplot as plt

print("TensorFlow version:", tf.__version__)

### Load Pre-trained Model

In [None]:
# Load Pre-trained ResNet50 Model
# This cell loads a pre-trained ResNet50 model that we'll try to fool

model = ResNet50(weights='imagenet')
print("Model loaded successfully!")

## Helper functions
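
The preprocessing helper in this section relies on the Keras `preprocess_input` function for ResNet50, which converts RGB to BGR and subtracts the ImageNet channel means without rescaling. The plain-Python sketch below mimics that effect on a single pixel; `preprocess_pixel` and `deprocess_pixel` are hypothetical names used only for illustration and are not Keras functions.

```python
# ImageNet channel means in BGR order, as used by Keras "caffe"-style preprocessing.
BGR_MEAN = (103.939, 116.779, 123.68)

def preprocess_pixel(rgb):
    """Mimic the per-pixel effect: RGB -> BGR, then subtract the channel means."""
    r, g, b = rgb
    bgr = (b, g, r)
    return tuple(v - m for v, m in zip(bgr, BGR_MEAN))

def deprocess_pixel(bgr_centered):
    """Invert the transform to recover RGB values in the [0, 255] range."""
    b, g, r = (v + m for v, m in zip(bgr_centered, BGR_MEAN))
    return (r, g, b)

pixel = (255.0, 128.0, 0.0)  # an orange RGB pixel
processed = preprocess_pixel(pixel)
restored = deprocess_pixel(processed)
print("preprocessed (BGR, mean-centered):", processed)
print("restored RGB:", restored)
```

The preprocessed values lie in neither [0, 255] nor [-1, 1]. This affects both the valid clipping range for an adversarial image and how a preprocessed array looks when displayed directly.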

In [None]:
# Image Preprocessing Function
# This function prepares images for our model

def preprocess_image(image_path):
    """
    Loads and preprocesses an image for ResNet50.

    Args:
        image_path (str): Path to the image file

    Returns:
        numpy.ndarray: Preprocessed image array
    """
    image = tf.keras.preprocessing.image.load_img(image_path, target_size=(224, 224))
    image = tf.keras.preprocessing.image.img_to_array(image)
    image = np.expand_dims(image, axis=0)
    image = preprocess_input(image)
    return image

## FGSM Attack Implementation

In [None]:
# This cell contains the core FGSM attack implementation

def fgsm_attack(image, epsilon, data_grad):
    """
    Implements the Fast Gradient Sign Method attack.

    Args:
        image: Input image
        epsilon: Attack strength parameter
        data_grad: Gradient of the loss with respect to the input image

    Returns:
        Adversarial image
    """
    sign_data_grad = tf.sign(data_grad)
    # TODO: Fill in the appropriate code
    perturbed_image = """TODO: calculate the perturbed image value with epsilon and sign_data_grad"""
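    # ResNet50's preprocess_input yields mean-centered BGR values (not [-1, 1]),
    # so clip to the per-channel range of a valid preprocessed image.
    bgr_mean = tf.constant([103.939, 116.779, 123.68])
    perturbed_image = tf.clip_by_value(perturbed_image, -bgr_mean, 255.0 - bgr_mean)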
    return perturbed_image

def generate_adversarial_example(image, epsilon):
    """
    Generates an adversarial example using FGSM.

    Args:
        image: Input image
        epsilon: Attack strength parameter

    Returns:
        Adversarial version of the input image
    """
    # TODO: Fill in the appropriate code
    image_tensor = tf.convert_to_tensor("""TODO: generate the image tensor from Pizza image""")
    with tf.GradientTape() as tape:
        tape.watch(image_tensor)
        # TODO: Fill in the appropriate code
        prediction = """TODO: generate the prediction from Pizza image"""
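        # Use the model's own top-1 class as a fixed one-hot label for the loss
        label = tf.one_hot(tf.argmax(prediction, axis=-1), depth=prediction.shape[-1])
        loss = tf.keras.losses.categorical_crossentropy(label, prediction)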

    gradient = tape.gradient(loss, image_tensor)
    # TODO: Fill in the appropriate code
    perturbed_image = """TODO: generate the adversarial image with epsilon and gradient"""
    return perturbed_image

## Generate and Test Adversarial Example

In [None]:
# Run this cell to create and test an adversarial example
# You can modify the epsilon value to control attack strength
# Download Pizza image from: https://github.com/linhkid/GDG-DevFest-Codelab-24/blob/main/data/pizza_2.jpg 
# Upload it to Files tab on Colab. Right click on it and select "copy path" 

# TODO: Fill in the appropriate code
image_path = """TODO: Fill in the appropriate code"""  # @param {type:"string"}
epsilon = 0.089  # @param {type:"slider", min:0.001, max:0.1, step:0.001}

# Load and preprocess the image
# TODO: Fill in the appropriate code
original_image = preprocess_image("""TODO: Fill in the appropriate code""")

# Generate adversarial example
# TODO: Fill in the appropriate code
adversarial_image = """TODO: Fill in the appropriate code"""

# Make predictions
# TODO: Fill in the appropriate code
original_pred = """TODO: generate the prediction from Original Pizza image"""
adversarial_pred = """TODO: generate the prediction from Perturbed Pizza image"""

# Decode predictions
original_label = decode_predictions(original_pred)[0][0]
# TODO: Fill in the appropriate code
adversarial_label = """TODO: generate the prediction label from Perturbed Pizza image"""

## Visualize Results

In [None]:
# Visualize Original vs Adversarial Images
# This cell will display the original and adversarial images side by side

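def to_display(preprocessed):
    """Undo ResNet50 preprocessing (BGR, mean-centered) to get an RGB uint8 image."""
    rgb_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)
    rgb = np.array(preprocessed, dtype=np.float32)[..., ::-1] + rgb_mean
    return np.clip(rgb, 0, 255).astype("uint8")

plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.imshow(to_display(original_image[0]))
plt.title(f"Original: {original_label[1]}\nConfidence: {original_label[2]:.2f}")
plt.axis('off')

plt.subplot(1, 2, 2)
plt.imshow(to_display(adversarial_image[0]))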
plt.title(f"Adversarial: {adversarial_label[1]}\nConfidence: {adversarial_label[2]:.2f}")
plt.axis('off')

plt.tight_layout()
plt.show()

print(f"Original prediction: {original_label[1]} ({original_label[2]:.2f})")
print(f"Adversarial prediction: {adversarial_label[1]} ({adversarial_label[2]:.2f})")

## Discussion
- Why use the sign of the gradient instead of the value?
- What does the gradient of the loss function tell us about the model?
- What systems can FGSM be used to attack?
- What are the applications of FGSM in practice?
- How can adversarial attacks be used to build adversarial defenses?
- FGSM focuses on changing the pixel values of images. Can FGSM be applied to other types of data such as text and audio?
- FGSM is a "white-box" attack method, which requires detailed information about the model. So how can we perform a "black-box" attack when we do not know the model?
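
As a starting point for the first question, the sketch below compares a step that uses the raw gradient with a step that uses only its sign. It is a plain-Python illustration with made-up gradient values; `fgsm_step`, `raw_step`, and `linf_norm` are hypothetical helper names.

```python
def sign(v):
    # Returns -1, 0, or 1
    return (v > 0) - (v < 0)

def linf_norm(vec):
    """Largest absolute change applied to any single component."""
    return max(abs(v) for v in vec)

def raw_step(grad, epsilon):
    """Perturbation proportional to the gradient values themselves."""
    return [epsilon * g for g in grad]

def fgsm_step(grad, epsilon):
    """Perturbation that keeps only the direction (sign) of each component."""
    return [epsilon * sign(g) for g in grad]

epsilon = 0.05
small_grad = [1e-6, -3e-6, 2e-6]    # made-up gradient with a tiny scale
large_grad = [40.0, -120.0, 80.0]   # made-up gradient with a large scale

for name, grad in (("small", small_grad), ("large", large_grad)):
    print(name,
          "raw step size:", linf_norm(raw_step(grad, epsilon)),
          "sign step size:", linf_norm(fgsm_step(grad, epsilon)))
```

The raw step depends on the gradient's scale, so it can be negligible or far too large. The sign step always changes each component by exactly epsilon, which keeps the perturbation within a fixed, predictable budget.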

## Extra Exercise Section
Try experimenting with:
1. Different epsilon values - how does this affect the attack's success and visibility?
2. Different input images - do some types of images work better than others?
3. Different target classes - can you modify the attack to target a specific class?
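
For the third exercise, one common variant steps against the gradient of the loss computed for the desired target class, so the input moves toward that class instead of merely away from the original one. The sketch below shows this on a hypothetical 3-class linear softmax classifier in plain Python. The weights and function names are assumptions for illustration, not the workshop's solution.

```python
import math

def sign(v):
    # Returns -1, 0, or 1
    return (v > 0) - (v < 0)

# Hypothetical 3-class linear classifier on 2 features (assumption).
W = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]
B = [0.0, 0.0, 0.0]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(x):
    """Class probabilities for the input vector x."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]
    return softmax(logits)

def grad_wrt_input(x, target):
    """Gradient of cross-entropy(target, softmax(Wx + b)) with respect to x."""
    p = predict(x)
    dz = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
    return [sum(dz[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def targeted_fgsm(x, target, epsilon):
    """Step against the gradient to lower the loss for the target class."""
    grad = grad_wrt_input(x, target)
    return [xi - epsilon * sign(g) for xi, g in zip(x, grad)]

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

x = [1.0, 0.2]
target = 1
x_adv = targeted_fgsm(x, target, epsilon=0.5)
print("clean class:", argmax(predict(x)))
print("adversarial class:", argmax(predict(x_adv)))
```

The only change from the untargeted update is the label used in the loss and the minus sign in the step.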

## Additional Notes:
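- The epsilon value controls the strength of the attack: larger values produce stronger attacks but more visible perturbations. Epsilon is expressed in the units of the preprocessed input, so its effective size depends on how the image was scaled.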
- Some images may be more resistant to adversarial attacks than others.
- The success of the attack can vary depending on the confidence of the original prediction.
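
The link between prediction confidence and the epsilon needed can be made concrete for a linear model, where the smallest flipping budget has a closed form: the absolute logit divided by the L1 norm of the weights. The plain-Python sketch below assumes a hypothetical linear classifier (`W`, `B`). Deep networks such as ResNet50 are not linear, so this only conveys the intuition.

```python
def sign(v):
    # Returns -1, 0, or 1
    return (v > 0) - (v < 0)

# Hypothetical linear classifier: predicts class 1 when the logit is positive.
W = [2.0, -1.0, 0.5]
B = 0.1

def logit(x):
    return sum(w * xi for w, xi in zip(W, x)) + B

def min_flip_epsilon(x):
    """Smallest L-infinity budget that pushes a linear model's logit to zero."""
    return abs(logit(x)) / sum(abs(w) for w in W)

def attack(x, epsilon):
    """Sign-based step that pushes the logit toward the opposite class."""
    s = sign(logit(x))
    return [xi - epsilon * s * sign(w) for xi, w in zip(x, W)]

confident = [1.0, -1.0, 1.0]   # large positive logit
borderline = [0.2, 0.1, 0.4]   # logit close to the decision boundary

for name, x in (("confident", confident), ("borderline", borderline)):
    eps = min_flip_epsilon(x)
    flipped = sign(logit(attack(x, 1.01 * eps))) != sign(logit(x))
    print(name, "needs epsilon of about", round(eps, 3), "| flipped:", flipped)
```

In this toy model the confident input needs a much larger epsilon than the borderline one, which matches the note that attack success varies with the confidence of the original prediction.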