<a href="https://colab.research.google.com/github/linhkid/GDG-DevFest-Codelab-24/blob/main/problems/02-b-PGD-Adversarial-Attack-EfficientNet-AdvProp_fill.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Adversarial Attacks on Deep Learning Models: EfficientNet and AdvProp

## Introduction
In this workshop, we'll explore how to create adversarial examples that can fool deep learning models. We'll use EfficientNet and its Adversarial Propagation (AdvProp) variant as our target models, implementing a targeted Projected Gradient Descent (PGD) attack. This demonstrates both the vulnerabilities of deep learning models and techniques to make them more robust.

### What are Adversarial Propagation and EfficientNet?

- **Adversarial Propagation (AdvProp)**: A training scheme that treats adversarial examples as additional training data and routes them through separate (auxiliary) batch normalization layers. Its primary goal is better accuracy on clean images; in this workshop we check whether the resulting weights are also harder to attack.
- **EfficientNet**: A family of convolutional neural networks that reach high accuracy with fewer parameters and FLOPs than earlier architectures of comparable accuracy, by scaling network depth, width, and input resolution together (compound scaling).
- **Projected Gradient Descent (PGD)**: An iterative method for generating adversarial examples. Each step moves the perturbation along the loss gradient and then projects it back into a small range of ±epsilon around zero, so the adversarial image stays close to the original. An untargeted attack maximizes the loss for the true class; the targeted attack in this workshop also minimizes the loss for a chosen target class.

![PGD](../img/pgd.png)
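
The step-then-project loop described above can be sketched on a toy problem. This is a minimal illustration, not the attack used later in this notebook: it assumes the common signed-gradient formulation of PGD (the notebook's implementation uses an Adam optimizer instead), and the names `pgd_linf_sketch`, `grad_fn`, and `step_size` are hypothetical.

```python
import numpy as np

def pgd_linf_sketch(x, grad_fn, eps, step_size, num_steps):
    """Untargeted L-infinity PGD sketch: ascend the loss, then project."""
    delta = np.zeros_like(x)
    for _ in range(num_steps):
        grad = grad_fn(x + delta)
        # Signed-gradient ascent step on the loss
        delta = delta + step_size * np.sign(grad)
        # Projection: clip the perturbation back into [-eps, eps]
        delta = np.clip(delta, -eps, eps)
    return delta

# Toy loss L(x) = w . x, whose gradient with respect to x is simply w
w = np.array([0.5, -2.0, 1.0])
x = np.array([0.2, 0.4, 0.6])
delta = pgd_linf_sketch(x, grad_fn=lambda _: w, eps=0.1, step_size=0.03, num_steps=10)
print(delta)  # every entry saturates at +eps or -eps, following the sign of w
```

Without the clipping step the perturbation would keep growing with every iteration; the projection is what keeps the adversarial input within the allowed distance of the original.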

### Learning Objectives
- Understand adversarial attacks and their implications for AI safety
- Implement targeted PGD attacks on image classification models
- Compare robustness between standard models and those trained with adversarial examples
- Learn about transfer learning and model fine-tuning

### Prerequisites
- Basic understanding of deep learning and computer vision
- Familiarity with TensorFlow and Keras
- GPU-enabled environment (recommended)
- Python 3.7 or later (required by the TensorFlow 2.8 release pinned below)

## 1. Initial Setup

First, we'll set up our environment with the necessary dependencies:

In [None]:
# Install required packages
!pip install protobuf==3.20.*
!pip install tensorflow==2.8.0 tensorflow-gpu==2.8.0
!pip install tensorflow-datasets


# Import necessary libraries
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
import tensorflow as tf

import tensorflow_datasets as tfds
tfds.disable_progress_bar()

import matplotlib.pyplot as plt
import numpy as np
import random
import shutil
import time
import cv2
import os

# Set random seeds for reproducibility
SEED = 666
random.seed(SEED)
tf.random.set_seed(SEED)
np.random.seed(SEED)


## 2. Dataset Preparation

We'll use the TensorFlow Flowers dataset, which provides a good balance between simplicity and real-world applicability:

In [None]:
# Load the Flowers dataset
train_ds, validation_ds = tfds.load(
    "tf_flowers",
    # TODO: Fill in the appropriate code
    split=["train[:85%]", """TODO: Fill in the appropriate code"""],
    as_supervised=True
)

# Define class labels
CLASSES = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]

# Image preprocessing function
SIZE = (224, 224)

def preprocess_image(image, label):
    image = tf.image.resize(image, SIZE)
    return (image, label)

# Prepare datasets with batching and prefetching
# TODO: Fill in the appropriate code
BATCH_SIZE = """TODO: Fill in the appropriate code"""
AUTO = tf.data.experimental.AUTOTUNE

train_ds = (
    train_ds
    .map(preprocess_image, num_parallel_calls=AUTO)
    .cache()
    .shuffle(1024)
    .batch(BATCH_SIZE)
    .prefetch(buffer_size=AUTO)
)

# TODO: Fill in the appropriate code
validation_ds = (
"""TODO: Fill in the appropriate code"""
)


## 3. Model Architecture and Training

We'll create two variants of EfficientNet: one with standard ImageNet weights and another with AdvProp weights:

In [4]:
# TODO: Fill in the appropriate code
def get_training_model(base_model):
    inputs = Input(shape=(224, 224, 3))
    x = base_model(inputs, training=False)
    x = GlobalAveragePooling2D()("""TODO: Fill in the appropriate code""")
    x = BatchNormalization()(x)
    x = Dropout(0.2)(x)
    x = """TODO: Fill in the appropriate code"""
    return Model(inputs=inputs, outputs=x)

# Custom learning rate schedule
def lrfn(epoch):
    LR_START = 1e-5
    LR_MAX = 1e-2
    LR_RAMPUP_EPOCHS = 5
    LR_SUSTAIN_EPOCHS = 0
    LR_STEP_DECAY = 0.75

    if epoch < LR_RAMPUP_EPOCHS:
        lr = (LR_MAX - LR_START) / LR_RAMPUP_EPOCHS * epoch + LR_START
    elif epoch < LR_RAMPUP_EPOCHS + LR_SUSTAIN_EPOCHS:
        lr = LR_MAX
    else:
        lr = LR_MAX * LR_STEP_DECAY**((epoch - LR_RAMPUP_EPOCHS - LR_SUSTAIN_EPOCHS)//10)
    return lr
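
To make the shape of this schedule concrete, here is a small worked example. It is a standalone re-statement of the same formula with the constants turned into keyword arguments; the name `lrfn_sketch` and its parameter names are hypothetical and only serve the illustration.

```python
def lrfn_sketch(epoch, lr_start=1e-5, lr_max=1e-2, rampup=5, sustain=0, decay=0.75):
    """Linear warm-up to lr_max, optional plateau, then a step decay every 10 epochs."""
    if epoch < rampup:
        return (lr_max - lr_start) / rampup * epoch + lr_start
    if epoch < rampup + sustain:
        return lr_max
    return lr_max * decay ** ((epoch - rampup - sustain) // 10)

for epoch in (0, 1, 5, 15, 25):
    print(epoch, lrfn_sketch(epoch))
```

The rate starts at `1e-5`, climbs linearly to `1e-2` by epoch 5, and is then multiplied by 0.75 once every 10 epochs (about `7.5e-3` at epoch 15 and `5.6e-3` at epoch 25). Because `LR_SUSTAIN_EPOCHS` is 0, the plateau branch is never taken.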

### Plot the progress

In [8]:
def plot_progress(hist):
    plt.plot(hist.history["loss"], label="train_loss")
    plt.plot(hist.history["val_loss"], label="validation_loss")
    plt.plot(hist.history["accuracy"], label="training_accuracy")
    plt.plot(hist.history["val_accuracy"], label="validation_accuracy")
    plt.title("Training Progress")
    plt.ylabel("accuracy/loss")
    plt.xlabel("epoch")
    plt.legend(loc="upper left")
    plt.show()

Fetch the [AdvProp](https://arxiv.org/abs/1911.09665) training checkpoints for EfficientNetB0 and convert the checkpoints to `.h5`.

In [9]:
!wget -q https://storage.googleapis.com/cloud-tpu-checkpoints/efficientnet/advprop/efficientnet-b0.tar.gz
!tar -xf efficientnet-b0.tar.gz

In [None]:
!wget -q https://raw.githubusercontent.com/yixingfu/tensorflow/updateweights/tensorflow/python/keras/applications/efficientnet_weight_update_util.py
!python efficientnet_weight_update_util.py --model b0 --notop --ckpt \
       efficientnet-b0/model.ckpt --o efficientnetb0_notop.h5

### Training an EfficientNetB0 initialized with adversarial propagation (AdvProp) weights

In [None]:
# Load the EfficientNetB0 model but exclude the classification layers
# Note that the model was trained using AdvProp (https://arxiv.org/abs/1911.09665)
base_model_eb0_ap = EfficientNetB0(weights="efficientnetb0_notop.h5", include_top=False)
base_model_eb0_ap.trainable = False # We are not fine-tuning at this point
get_training_model(base_model_eb0_ap).summary()

In [None]:
lr2 = tf.keras.callbacks.LearningRateScheduler(lrfn, verbose=True)

# Visualize the learning-rate schedule over 100 epochs
epochs = list(range(100))
learning_rates = [lrfn(epoch) for epoch in epochs]
plt.plot(epochs, learning_rates)
plt.xlabel("epoch", size=14)
plt.ylabel("learning rate", size=14)
plt.title("Training Schedule", size=16)
plt.show()

In [13]:
# Early stopping callback
es = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
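
The effect of `patience=5` can be illustrated with a simplified, pure-Python model of the stopping rule. This is only a sketch: the function `early_stopping_sketch` and the validation losses below are made up for illustration, and details of the real Keras callback such as `min_delta` and `baseline` are ignored.

```python
def early_stopping_sketch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch); stop_epoch is None if training runs to the end."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Made-up validation losses: the best value occurs at epoch 2
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.63, 0.64, 0.59]
stop_epoch, best_epoch = early_stopping_sketch(losses, patience=5)
print(stop_epoch, best_epoch)  # 7 2
```

Training stops at epoch 7, after five consecutive epochs without improvement, so the lower loss at epoch 8 is never seen. With `restore_best_weights=True`, the model would be rolled back to the weights from epoch 2.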

In [None]:
# Train the model
# TODO: Fill in the appropriate code
model_eb0_ap = get_training_model("""TODO: Fill in the appropriate code""")
model_eb0_ap.compile("""TODO: Fill in the appropriate code""")
start = time.time()
h = model_eb0_ap.fit("""TODO: Fill in the appropriate code""",
                     callbacks=[lr2, es])
print("Total training time (seconds): ",time.time()-start)
plot_progress(h)

### Training an EfficientNetB0 initialized with ImageNet weights

In [15]:
# Load the EfficientNetB0 model but exclude the classification layers
# This time the weights are from traditional ImageNet pre-training
base_model_eb0 = EfficientNetB0(weights="imagenet", include_top=False)
base_model_eb0.trainable = False # We are not fine-tuning at this point

In [None]:
# Train the model
# TODO: Fill in the appropriate code
model_eb0 = get_training_model("""TODO: Fill in the appropriate code""")
model_eb0.compile("""TODO: Fill in the appropriate code""")
start = time.time()
h = model_eb0.fit("""TODO: Fill in the appropriate code""",
              callbacks=[lr2, es])
print("Total training time (seconds): ",time.time()-start)
plot_progress(h)

### Plotting sample predictions

In [17]:
# Utility to plot sample predictions
def plot_predictions(images, labels, probability):
    plt.figure(figsize=(15, 15))
    for i, image in enumerate(images):
        ax = plt.subplot(4, 4, i + 1)
        plt.imshow(image.numpy().astype("uint8"))
        predicted_label = CLASSES[np.argmax(probability[i])]
        maximum_probability = "{:.3f}".format(max(probability[i]))
        text = "{} with probability: {}".format(predicted_label, maximum_probability) + \
            "\nGround-truth: {}".format(CLASSES[int(labels[i])])
        plt.title(text)
        plt.axis("off")
    plt.show()

### Inference with the regular EfficientNet model

In [None]:
# Let's run inference on a batch of images from the validation set
(batch_images, batch_labels) = next(iter(validation_ds))
# TODO: Fill in the appropriate code
predictions = """TODO: Fill in the appropriate code"""
plot_predictions(batch_images[:16], batch_labels[:16], predictions[:16])

### Inference with the AdvProp weights initialized EfficientNet model

In [None]:
# Let's run inference on a batch of images from the validation set
(batch_images, batch_labels) = next(iter(validation_ds))
# TODO: Fill in the appropriate code
predictions = """TODO: Fill in the appropriate code"""
plot_predictions(batch_images[:16], batch_labels[:16], predictions[:16])

## 4. Adversarial Attack Implementation

Here we implement our targeted PGD attack:

In [20]:
EPS = 2./255

def clip_eps(delta_tensor):
    return tf.clip_by_value(delta_tensor, clip_value_min=-EPS, clip_value_max=EPS)


In this attack, we use PGD to maximize the loss for the true class while minimizing the loss for the target class. Keeping the perturbation within the small `EPS` bound preserves the visual semantics of the input image.
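
The combined objective can be checked by hand on a pair of toy logit vectors. The sketch below is plain Python rather than TensorFlow, assumes the model outputs raw logits (matching `from_logits=True` in the attack code), and uses hypothetical helper names and made-up logits.

```python
import math

def cross_entropy_from_logits(logits, class_index):
    """Sparse categorical cross-entropy for one example, computed from raw logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[class_index]

def targeted_loss(logits, true_index, target_index):
    # Minimizing this pushes the true-class loss up and the target-class loss down
    return (-cross_entropy_from_logits(logits, true_index)
            + cross_entropy_from_logits(logits, target_index))

clean = [4.0, 0.5, 0.2, 0.1, 0.3]   # confident in class 0 (the true class)
fooled = [0.3, 0.5, 0.2, 0.1, 4.0]  # confident in class 4 (the target class)
print(targeted_loss(clean, true_index=0, target_index=4))   # approximately 3.7
print(targeted_loss(fooled, true_index=0, target_index=4))  # approximately -3.7
```

When computed from logits, the log-sum-exp terms cancel and the objective reduces to the true-class logit minus the target-class logit. Driving it down therefore means raising the target logit relative to the true one.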

In [21]:
def generate_adversaries_targeted(model, image_tensor, delta, true_index, target_index):
    scc_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.Adam(learning_rate=5e-3)

    for t in range(350):
        with tf.GradientTape() as tape:
            tape.watch(delta)
            # TODO: Fill in the appropriate code
            inp = ("""TODO: Fill in the appropriate code""")
            predictions = model("""TODO: Fill in the appropriate code""", training=False)
            loss = (- scc_loss(tf.convert_to_tensor([true_index]), predictions) +
                    scc_loss(tf.convert_to_tensor([target_index]), predictions))

            if t % 20 == 0:
                print(f"Step {t}, Loss: {loss.numpy():.4f}")
                plt.imshow(50*delta.numpy().squeeze()+0.5)
                plt.show()

        gradients = tape.gradient(loss, delta)
        optimizer.apply_gradients([(gradients, delta)])
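        # Project delta back into [-EPS, EPS]; assign_add would add the clipped
        # value on top of delta and let the perturbation exceed the bound
        delta.assign(clip_eps(delta))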

    return delta

def perturb_image(model, image, true, target):
    print("Before adversarial attack")
    # TODO: Fill in the appropriate code
    probabilities = """TODO: Fill in the appropriate code"""
    class_index = np.argmax("""TODO: Fill in the appropriate code""")
    print(f"Ground-truth label: {CLASSES[true]} predicted label: {CLASSES[class_index]}")

    image_tensor = tf.constant(image, dtype=tf.float32)
    delta = tf.Variable(tf.zeros_like(image_tensor), trainable=True)
    # TODO: Fill in the appropriate code
    delta_tensor = """TODO: Fill in the appropriate code"""

    # TODO: Fill in the appropriate code
    perturbed_image = (image_tensor + delta_tensor)
    print("\nAfter adversarial attack")
    preds = model.predict("""TODO: Fill in the appropriate code""")[0]
    pred_label = CLASSES["""TODO: Fill in the appropriate code"""]
    print(f"Predicted label: {pred_label}")

    return perturbed_image, delta_tensor

## 5. Running the Attack

Now we can run our attack and visualize the results:

In [None]:
# Select a sample image
index = 15  # Example index
sample_val_image = np.expand_dims(batch_images[index], 0)

# Run attack on standard EfficientNet
print("Attack on Standard EfficientNet:")
perturbed_standard, delta_standard = perturb_image(
    model_eb0,
    sample_val_image,
    batch_labels[index].numpy(),
    4  # Target class (tulips)
)

# Run attack on AdvProp EfficientNet
print("\nAttack on AdvProp EfficientNet:")

# TODO: Fill in the appropriate code
perturbed_advprop, delta_advprop = """TODO: Fill in the appropriate code"""

# Visualize results
plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 1)
plt.imshow(sample_val_image[0].astype("uint8"))
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 3, 2)
plt.imshow(perturbed_standard[0].numpy().astype("uint8"))
plt.title("Standard EfficientNet Attack")
plt.axis("off")

plt.subplot(1, 3, 3)
plt.imshow(perturbed_advprop[0].numpy().astype("uint8"))
plt.title("AdvProp EfficientNet Attack")
plt.axis("off")
plt.show()

## 6. Analysis and Discussion

### Key Observations
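1. The perturbations are bounded by `EPS`, so they are nearly imperceptible to the human eye, yet they can still change the model's prediction
2. AdvProp pre-training provides some robustness against adversarial attacks, though it does not make the model immune
3. Attack success varies between the two models, so compare their predictions on the same image and target class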
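
These observations can be quantified with two simple measurements: the size of the perturbation and the fraction of attacks that reach the target class. The following is a sketch on synthetic data; the helper names `linf_norm` and `targeted_success_rate` are hypothetical, and the random `delta` and the label lists are placeholders, not results from this notebook.

```python
import numpy as np

def linf_norm(delta):
    """Largest absolute per-pixel change introduced by the perturbation."""
    return float(np.max(np.abs(delta)))

def targeted_success_rate(predicted_labels, target_labels):
    """Fraction of attacked images whose prediction equals the attack target."""
    predicted_labels = np.asarray(predicted_labels)
    target_labels = np.asarray(target_labels)
    return float(np.mean(predicted_labels == target_labels))

# Synthetic stand-in for a perturbation returned by the attack
rng = np.random.default_rng(0)
delta = rng.uniform(-2 / 255, 2 / 255, size=(1, 224, 224, 3))
print(linf_norm(delta) <= 2 / 255)  # True: the perturbation respects the bound

# Made-up post-attack predictions for four images, all targeted at class 4
print(targeted_success_rate([4, 4, 1, 4], [4, 4, 4, 4]))  # 0.75
```

Computing both numbers for each model, over the same images and target classes, puts the comparison between the standard and AdvProp models on equal footing.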

### Security Implications
- Model vulnerabilities in real-world applications
- Importance of adversarial training
- Trade-offs between accuracy and robustness

## Additional Resources
1. [AdvProp Paper](https://arxiv.org/abs/1911.09665)
2. [EfficientNet Paper](https://arxiv.org/abs/1905.11946)
3. [PGD Attack Paper](https://arxiv.org/abs/1706.06083)

## Exercises for Participants
1. Try different epsilon values for the attack
2. Experiment with different target classes
3. Modify the attack algorithm parameters
4. Compare results with different model architectures