# Adversarial Training Demonstration

## Core Concept 1: Why Standard Training Fails
*   **Standard training**: minimize loss on clean data
*   The model learns brittle patterns tuned to the training distribution
*   **Distribution shift**, here in the form of adversarial examples (small, worst-case perturbations of the input), causes failures
*   **Adversarial training**: include adversarially perturbed examples in every training batch
*   Model learns to be *robust* across broader input space
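
The contrast between the two objectives can be written compactly in the usual robust-optimization form. The notation is an assumption introduced here for illustration and does not appear in the original: $\theta$ are the model parameters, $f_\theta$ the model, $L$ the loss, $\mathcal{D}$ the data distribution, and $\delta$ a perturbation bounded by $\epsilon$ in the $\ell_\infty$ norm.

```latex
% Standard training: minimize the expected loss on clean inputs
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \big[ L(f_\theta(x), y) \big]

% Adversarial training: minimize the expected loss under the worst-case perturbation
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \Big[ \max_{\|\delta\|_\infty \le \epsilon} L(f_\theta(x + \delta), y) \Big]
```

Attacks such as FGSM and PGD approximate the inner maximization. Training on a mix of clean and adversarial examples, as this demonstration does, is a common variant of the second objective.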

## Adversarial Training Loop
**Standard training loop**:
1.  Forward pass on clean examples
2.  Compute loss
3.  Backpropagate
4.  Update weights

**Adversarial training loop**:
1.  Generate adversarial examples (e.g., with FGSM or PGD) from the training batch
2.  Forward pass on *both* clean AND adversarial examples
3.  Compute the combined loss over both sets of examples
4.  Backpropagate
5.  Update weights
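
The five steps above can be sketched on a tiny logistic-regression model in NumPy, where the gradients have a closed form and no autodiff framework is needed. This is a minimal sketch under stated assumptions, not the demonstration's actual training code: the toy data, the helper names `fgsm` and `adversarial_train`, and the values of `epsilon`, `lr`, and `epochs` are all hypothetical, and single-step FGSM stands in for the attack in step 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(w, b, x, y, epsilon):
    """One-step FGSM: shift each input by epsilon along the sign of the loss gradient."""
    # For logistic regression with binary cross-entropy, dLoss/dx = (p - y) * w
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]
    return x + epsilon * np.sign(grad_x)

def adversarial_train(x, y, epsilon=0.1, lr=0.5, epochs=200):
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        # 1. Generate adversarial examples from the batch
        x_adv = fgsm(w, b, x, y, epsilon)
        # 2. Forward pass on both clean and adversarial examples
        x_all = np.concatenate([x, x_adv])
        y_all = np.concatenate([y, y])
        p = sigmoid(x_all @ w + b)
        # 3-4. Gradient of the mean cross-entropy over the combined batch
        grad_w = x_all.T @ (p - y_all) / len(y_all)
        grad_b = np.mean(p - y_all)
        # 5. Update weights
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: two Gaussian blobs (hypothetical, for illustration only)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 0.5, (100, 2)), rng.normal(1.0, 0.5, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = adversarial_train(x, y)
x_adv = fgsm(w, b, x, y, epsilon=0.1)
clean_acc = np.mean((sigmoid(x @ w + b) > 0.5) == y)
adv_acc = np.mean((sigmoid(x_adv @ w + b) > 0.5) == y)
print(f"clean accuracy: {clean_acc:.2f}, accuracy under FGSM: {adv_acc:.2f}")
```

Concatenating the clean and adversarial batches weights the two losses equally. A different mixing ratio would be a separate design choice.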

## Robustness vs Accuracy Tradeoff
*   **Standard model**: 98% clean accuracy, 10% robust accuracy (accuracy under attack)
*   **Adversarially trained model**: 92% clean accuracy, 85% robust accuracy (accuracy under attack)
*   **Core insight**: robustness typically costs some clean accuracy
*   The size of the tradeoff depends on epsilon, the maximum allowed perturbation size (attack strength)
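
The role of epsilon can be illustrated with a linear classifier, for which the worst-case L-infinity attack has a closed form: every margin shrinks by `epsilon * ||w||_1`. The sketch below uses a hypothetical fixed classifier and toy data, neither of which comes from the original. It shows only one side of the tradeoff, namely how accuracy under attack falls as epsilon grows for a fixed model. It does not measure the clean-accuracy cost of adversarial training.

```python
import numpy as np

def robust_accuracy(w, b, x, y, epsilon):
    """Accuracy of a linear classifier under the worst-case L-infinity perturbation of size epsilon."""
    signed_labels = 2 * y - 1                 # map labels {0, 1} to {-1, +1}
    margin = signed_labels * (x @ w + b)      # positive when correctly classified
    # The worst-case perturbation shrinks every margin by epsilon * ||w||_1
    return np.mean(margin - epsilon * np.sum(np.abs(w)) > 0)

# Toy data: two Gaussian blobs (hypothetical, for illustration only)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 0.5, (200, 2)), rng.normal(1.0, 0.5, (200, 2))])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Hypothetical fixed classifier separating the two blobs
w, b = np.array([1.0, 1.0]), 0.0

epsilons = [0.0, 0.1, 0.3, 0.5, 1.0]
accuracies = [robust_accuracy(w, b, x, y, eps) for eps in epsilons]
for eps, acc in zip(epsilons, accuracies):
    print(f"epsilon={eps:.1f}: accuracy under attack = {acc:.2f}")
```

With `epsilon=0.0` the function reduces to ordinary clean accuracy, so the same helper covers both columns of a robustness report.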

In [None]:
import tensorflow as tf
import numpy as np

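# Helper components for the demonstration
def pgd_attack(model, images, labels, epsilon=0.03, alpha=0.01, num_steps=10):
    """
    Generates adversarial examples with projected gradient descent (PGD).
    Assumes float inputs scaled to [0, 1], one-hot labels, and a model that returns logits.
    """
    loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
    images = tf.convert_to_tensor(images, dtype=tf.float32)
    # Random start inside the L-infinity ball of radius epsilon
    adv_images = images + tf.random.uniform(tf.shape(images), -epsilon, epsilon)
    adv_images = tf.clip_by_value(adv_images, 0.0, 1.0)
    for _ in range(num_steps):
        with tf.GradientTape() as tape:
            tape.watch(adv_images)
            loss = loss_fn(labels, model(adv_images, training=False))
        # Ascend the loss along the sign of the input gradient
        adv_images = adv_images + alpha * tf.sign(tape.gradient(loss, adv_images))
        # Project back into the epsilon ball, then into the valid pixel range
        adv_images = tf.clip_by_value(adv_images, images - epsilon, images + epsilon)
        adv_images = tf.clip_by_value(adv_images, 0.0, 1.0)
    return tf.stop_gradient(adv_images)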

def concat(tensors):
    return tf.concat(tensors, axis=0)

cross_entropy = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

# --------------------------------------------------------------------------
# Code Structure for Adversarial Training
# --------------------------------------------------------------------------

def train_adversarial_demo(model, training_data, epochs=5, epsilon=0.03, optimizer=None):
    if optimizer is None:
        optimizer = tf.keras.optimizers.Adam()

    print("Starting Adversarial Training Loop...\n")
    
    for epoch in range(epochs):
        print(f"Epoch {epoch+1}/{epochs}")
        for step, (batch_data, batch_labels) in enumerate(training_data):
            # 1. Generate adversarial examples
            adv_examples = pgd_attack(model, batch_data, batch_labels, epsilon)

            # 2. Train on both clean and adversarial
            combined_data = concat([batch_data, adv_examples])
            combined_labels = concat([batch_labels, batch_labels])

            # 3. Standard training step on combined data
            with tf.GradientTape() as tape:
                predictions = model(combined_data, training=True)
                loss = cross_entropy(combined_labels, predictions)
            
            gradients = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(gradients, model.trainable_variables))
            
            if step % 10 == 0:
                print(f"  Step {step}: Loss = {loss.numpy():.4f}")

# Note: to run this, supply a Keras 'model' that returns logits and 'training_data'
# as a tf.data.Dataset yielding (images, one-hot labels) batches with pixel values in [0, 1]
