# Adversarial Attacks on Deep Learning Models: EfficientNet and AdvProp

## Introduction
In this workshop, we'll explore how to create adversarial examples that can fool deep learning models. We'll use EfficientNet and its Adversarial Propagation (AdvProp) variant as our target models and implement a targeted Projected Gradient Descent (PGD) attack against both. The comparison illustrates how vulnerable image classifiers are to small input perturbations and how training with adversarial examples affects that vulnerability.

### Learning Objectives
- Understand adversarial attacks and their implications for AI safety
- Implement targeted PGD attacks on image classification models
- Compare robustness between standard models and those trained with adversarial examples
- Learn about transfer learning and model fine-tuning

### Prerequisites
- Basic understanding of deep learning and computer vision
- Familiarity with TensorFlow and Keras
- GPU-enabled environment (recommended)
- Python 3.7 or later (TensorFlow 2.8, installed below, does not support Python 3.6)

## 1. Initial Setup

First, we'll set up our environment with the necessary dependencies:

In [None]:
# Install required packages
!pip install tensorflow-gpu==2.8.0
!pip install tensorflow-datasets

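# Import necessary libraries
import tensorflow as tf  # needed for tf.random, tf.image, tf.data and tf.keras below
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.layers import (
    BatchNormalization, Dense, Dropout, GlobalAveragePooling2D, Input
)
from tensorflow.keras.models import Model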

import tensorflow_datasets as tfds
tfds.disable_progress_bar()

import matplotlib.pyplot as plt
import numpy as np
import random
import shutil
import time
import cv2
import os

# Set random seeds for reproducibility
SEED = 666
tf.random.set_seed(SEED)
np.random.seed(SEED)
random.seed(SEED)


## 2. Dataset Preparation

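We'll use the TensorFlow Flowers dataset (`tf_flowers`), a small collection of 3,670 flower photographs in five classes. It is small enough to fine-tune quickly while still consisting of real photographs. The dataset ships with a single `train` split, so we use the first 85% for training and the remaining 15% for validation: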

In [None]:
# Load the Flowers dataset
train_ds, validation_ds = tfds.load(
    "tf_flowers",
    split=["train[:85%]", "train[85%:]"],
    as_supervised=True
)

# Define class labels
CLASSES = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]

# Image preprocessing function
SIZE = (224, 224)

def preprocess_image(image, label):
    image = tf.image.resize(image, SIZE)
    return (image, label)

# Prepare datasets with batching and prefetching
BATCH_SIZE = 64
AUTO = tf.data.AUTOTUNE

train_ds = (
    train_ds
    .map(preprocess_image, num_parallel_calls=AUTO)
    .cache()
    .shuffle(1024)
    .batch(BATCH_SIZE)
    .prefetch(buffer_size=AUTO)
)

validation_ds = (
    validation_ds
    .map(preprocess_image, num_parallel_calls=AUTO)
    .cache()
    .batch(BATCH_SIZE)
    .prefetch(buffer_size=AUTO)
)


## 3. Model Architecture and Training

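We'll create two variants of EfficientNet: one initialized with the standard ImageNet weights and another with AdvProp weights. The cell below defines the classification head shared by both variants and the learning-rate schedule used for fine-tuning. The attack in Section 5 expects the two fine-tuned models to be available as `model_eb0` (standard) and `model_eb0_ap` (AdvProp):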

In [None]:
def get_training_model(base_model):
    inputs = Input(shape=(224, 224, 3))
    x = base_model(inputs, training=False)
    x = GlobalAveragePooling2D()(x)
    x = BatchNormalization()(x)
    x = Dropout(0.2)(x)
    x = Dense(5, activation="softmax")(x)
    return Model(inputs=inputs, outputs=x)

# Custom learning rate schedule
def lrfn(epoch):
    LR_START = 1e-5
    LR_MAX = 1e-2
    LR_RAMPUP_EPOCHS = 5
    LR_SUSTAIN_EPOCHS = 0
    LR_STEP_DECAY = 0.75
    
    if epoch < LR_RAMPUP_EPOCHS:
        lr = (LR_MAX - LR_START) / LR_RAMPUP_EPOCHS * epoch + LR_START
    elif epoch < LR_RAMPUP_EPOCHS + LR_SUSTAIN_EPOCHS:
        lr = LR_MAX
    else:
        lr = LR_MAX * LR_STEP_DECAY**((epoch - LR_RAMPUP_EPOCHS - LR_SUSTAIN_EPOCHS)//10)
    return lr
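
Before training, it can help to tabulate what the schedule actually produces. The following is a minimal standalone sketch that restates the same ramp-up and step-decay rule with the constants exposed as keyword arguments. The name `lr_schedule` and its parameter names are illustrative and not part of the notebook.

```python
def lr_schedule(epoch, lr_start=1e-5, lr_max=1e-2, rampup_epochs=5,
                sustain_epochs=0, step_decay=0.75):
    """Linear warm-up to lr_max, optional plateau, then a decay step every 10 epochs."""
    if epoch < rampup_epochs:
        # Linear interpolation from lr_start (epoch 0) towards lr_max
        return (lr_max - lr_start) / rampup_epochs * epoch + lr_start
    if epoch < rampup_epochs + sustain_epochs:
        # Hold the peak rate (skipped entirely when sustain_epochs is 0)
        return lr_max
    # Multiply by step_decay once for every 10 full epochs after the warm-up
    return lr_max * step_decay ** ((epoch - rampup_epochs - sustain_epochs) // 10)


# Print the rate at a few epochs around the warm-up and decay boundaries
for epoch in (0, 1, 4, 5, 14, 15, 25):
    print(f"epoch {epoch:2d}: lr = {lr_schedule(epoch):.6f}")
```

With the constants used here, the rate starts at 1e-5, reaches 1e-2 at epoch 5, and is multiplied by 0.75 at epochs 15, 25, and so on. In Keras, a function of this shape is typically passed to `tf.keras.callbacks.LearningRateScheduler`.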

## 4. Adversarial Attack Implementation

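Here we implement our targeted PGD attack. It optimizes an additive perturbation `delta` so that the model's loss rises on the true class and falls on the target class. After every update, `delta` is projected back into the range `[-EPS, EPS]`, which bounds how much any single pixel value can change: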

In [None]:
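# L-infinity budget for the perturbation, in the same units as the model input
# (the preprocessing above leaves pixel values in [0, 255])
EPS = 2./255

def clip_eps(delta_tensor):
    # Project the perturbation back into the [-EPS, EPS] box
    return tf.clip_by_value(delta_tensor, clip_value_min=-EPS, clip_value_max=EPS)

def generate_adversaries_targeted(model, image_tensor, delta, true_index, target_index):
    # The classification head ends in a softmax, so the model outputs
    # probabilities rather than logits
    scc_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
    optimizer = tf.keras.optimizers.Adam(learning_rate=5e-3)

    for t in range(350):
        with tf.GradientTape() as tape:
            tape.watch(delta)
            inp = (image_tensor + delta)
            predictions = model(inp, training=False)
            # Minimizing this raises the loss on the true class and
            # lowers it on the target class
            loss = (- scc_loss(tf.convert_to_tensor([true_index]), predictions) +
                    scc_loss(tf.convert_to_tensor([target_index]), predictions))

        if t % 20 == 0:
            print(f"Step {t}, Loss: {loss.numpy():.4f}")
            # Scale and shift the perturbation so that it is visible when plotted
            plt.imshow(50*delta.numpy().squeeze()+0.5)
            plt.show()

        gradients = tape.gradient(loss, delta)
        optimizer.apply_gradients([(gradients, delta)])
        # Projection step: overwrite delta with its clipped value
        # (assign_add would add the clipped value on top of delta instead)
        delta.assign(clip_eps(delta))

    return delta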

def perturb_image(model, image, true, target):
    print("Before adversarial attack")
    probabilities = model.predict(image)
    class_index = np.argmax(probabilities)
    print(f"Ground-truth label: {CLASSES[true]} predicted label: {CLASSES[class_index]}")
    
    image_tensor = tf.constant(image, dtype=tf.float32)
    delta = tf.Variable(tf.zeros_like(image_tensor), trainable=True)
    delta_tensor = generate_adversaries_targeted(model, image_tensor, delta, true, target)
    
    perturbed_image = (image_tensor + delta_tensor)
    print("\nAfter adversarial attack")
    preds = model.predict(perturbed_image)[0]
    pred_label = CLASSES[np.argmax(preds)]
    print(f"Predicted label: {pred_label}")
    
    return perturbed_image, delta_tensor
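
To make the update-then-project loop easier to inspect, here is a minimal NumPy sketch of a targeted attack on a toy problem. It makes several assumptions that differ from the notebook: the model is a hypothetical three-class linear softmax classifier rather than EfficientNet, the gradient is written out by hand, and the update uses fixed-size sign-gradient steps (as in the textbook PGD formulation) rather than Adam. All names (`targeted_pgd`, `W`, `b`, `step`) are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def targeted_pgd(W, b, x, true_idx, target_idx, eps=0.2, step=0.05, n_steps=20):
    """Targeted L-infinity PGD against p = softmax(W @ x + b)."""
    delta = np.zeros_like(x)
    for _ in range(n_steps):
        p = softmax(W @ (x + delta) + b)
        # loss = CE(target) - CE(true); its gradient w.r.t. the logits is
        # (p - onehot(target)) - (p - onehot(true))
        onehot_true = np.eye(len(p))[true_idx]
        onehot_target = np.eye(len(p))[target_idx]
        grad_logits = (p - onehot_target) - (p - onehot_true)
        grad_x = W.T @ grad_logits           # chain rule through the linear layer
        delta = delta - step * np.sign(grad_x)  # descend on the targeted loss
        delta = np.clip(delta, -eps, eps)       # project back into the eps-ball
    return delta

# Toy classifier: 3 classes, 3 input features (values chosen by hand)
W = np.array([[ 2.0, -1.0, 0.5],
              [-1.0,  2.0, 0.5],
              [ 0.0,  0.0, 1.0]])
b = np.zeros(3)
x = np.array([1.0, 0.8, 0.1])

before = int(np.argmax(softmax(W @ x + b)))
delta = targeted_pgd(W, b, x, true_idx=0, target_idx=1)
after = int(np.argmax(softmax(W @ (x + delta) + b)))
print("prediction before:", before, "after:", after)
print("max |delta|:", np.max(np.abs(delta)))
```

The projection replaces `delta` with its clipped value on every iteration, so the perturbation never leaves the `eps` box regardless of how many steps are taken.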

## 5. Running the Attack

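We now attack both fine-tuned models on the same validation image, using tulips (class index 4) as the target class, and plot the original and perturbed images side by side: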

In [None]:
# Take one batch from the validation set and select a sample image
batch_images, batch_labels = next(iter(validation_ds))
index = 15  # Example index
sample_val_image = np.expand_dims(batch_images[index], 0)

# Run attack on standard EfficientNet
print("Attack on Standard EfficientNet:")
perturbed_standard, delta_standard = perturb_image(
    model_eb0, 
    sample_val_image, 
    batch_labels[index].numpy(), 
    4  # Target class (tulips)
)

# Run attack on AdvProp EfficientNet
print("\nAttack on AdvProp EfficientNet:")
perturbed_advprop, delta_advprop = perturb_image(
    model_eb0_ap, 
    sample_val_image, 
    batch_labels[index].numpy(), 
    4  # Target class (tulips)
)

# Visualize results
plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 1)
plt.imshow(sample_val_image[0].astype("uint8"))
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 3, 2)
plt.imshow(np.clip(perturbed_standard[0].numpy(), 0, 255).astype("uint8"))  # clip so out-of-range values do not wrap around
plt.title("Standard EfficientNet Attack")
plt.axis("off")

plt.subplot(1, 3, 3)
plt.imshow(np.clip(perturbed_advprop[0].numpy(), 0, 255).astype("uint8"))  # clip so out-of-range values do not wrap around
plt.title("AdvProp EfficientNet Attack")
plt.axis("off")
plt.show()

## 6. Analysis and Discussion

### Key Observations
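1. The adversarial perturbations are bounded by `EPS` per pixel value, so the perturbed images look unchanged to the human eye, yet they can change the model's prediction
2. AdvProp, which trains on adversarial examples through auxiliary batch-normalization layers, can provide some robustness against adversarial attacks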
3. The attack success rate varies between the two models
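
The claim that a perturbation is imperceptible can be backed by numbers. The following is a minimal sketch of a helper that reports a few standard distance measures between an original and a perturbed image. The function name `perturbation_stats` is hypothetical, and a random array stands in for a real image so that the block runs on its own.

```python
import numpy as np

def perturbation_stats(original, perturbed):
    """Summarize how far a perturbed image is from the original."""
    diff = (np.asarray(perturbed, dtype=np.float64)
            - np.asarray(original, dtype=np.float64))
    return {
        "linf": float(np.max(np.abs(diff))),      # largest change to any single value
        "l2": float(np.sqrt(np.sum(diff ** 2))),  # overall Euclidean distance
        "mean_abs": float(np.mean(np.abs(diff))), # average change per value
    }

# Stand-in data: a random "image" in [0, 255] and bounded random noise
rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(1, 224, 224, 3))
eps = 2.0 / 255
noise = rng.uniform(-eps, eps, size=image.shape)

stats = perturbation_stats(image, image + noise)
print(stats)
```

In the notebook, the same helper could be called as `perturbation_stats(sample_val_image, perturbed_standard.numpy())` to check that the largest per-value change stays within `EPS`.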

### Security Implications
- Model vulnerabilities in real-world applications
- Importance of adversarial training
- Trade-offs between accuracy and robustness

## Additional Resources
1. [AdvProp Paper](https://arxiv.org/abs/1911.09665)
2. [EfficientNet Paper](https://arxiv.org/abs/1905.11946)
3. [PGD Attack Paper](https://arxiv.org/abs/1706.06083)

## Exercises for Participants
1. Try different epsilon values for the attack
2. Experiment with different target classes
3. Modify the attack algorithm parameters
4. Compare results with different model architectures