# Securing Machine Learning Models: Attacks and Defenses

This notebook investigates methods for attacking machine learning models and strategies for defending against those attacks.
The analysis draws on material from the lectures, including SVMs, neural networks, ensemble learning, decision trees, and evaluation metrics.

**Goals:**
- Demonstrate adversarial attack techniques (FGSM)
- Evaluate model robustness
- Use concepts such as PCA and ensemble methods for discussion
- Propose and implement defenses like adversarial training
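For reference, FGSM (the Fast Gradient Sign Method) builds an adversarial example from an input $x$ with label $y$ in a single step. It moves every pixel by $\epsilon$ in the direction of the sign of the gradient of the loss $L$ with respect to the input, where $\theta$ denotes the model parameters, and clips the result back to the valid pixel range. Adversarial training, as done later in this notebook, then minimizes the loss on these perturbed inputs, with $x_{\text{adv}}$ recomputed from the current parameters at every step. The notation is introduced here for this summary only.

$$
x_{\text{adv}} = \operatorname{clip}_{[0,1]}\Big(x + \epsilon \, \operatorname{sign}\big(\nabla_x L(\theta, x, y)\big)\Big)
$$

$$
\min_{\theta} \; \mathbb{E}_{(x, y)}\big[\, L(\theta, x_{\text{adv}}, y) \,\big]
$$

With $\epsilon = 0.3$, as used in this notebook, each pixel can change by at most 0.3 on the [0, 1] scale.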

In [None]:
# Imports
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from torchattacks import FGSM
import matplotlib.pyplot as plt
import numpy as np
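`torchattacks.FGSM` hides the attack behind a single call. The sketch below is a minimal from-scratch version of the same one-step method in plain PyTorch. It assumes inputs in [0, 1] and cross-entropy loss. `fgsm_attack` is a hypothetical helper name, and this is not the torchattacks source code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fgsm_attack(model: nn.Module, images: torch.Tensor, labels: torch.Tensor, eps: float) -> torch.Tensor:
    """One-step FGSM: move each pixel by eps in the direction that increases the loss."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    # Gradient of the loss with respect to the inputs only (model weights get no .grad)
    (grad,) = torch.autograd.grad(loss, images)
    adv_images = images + eps * grad.sign()
    # Keep the result a valid image in the [0, 1] pixel range
    return adv_images.clamp(0.0, 1.0).detach()


# Toy demonstration with a stand-in linear classifier and random "images"
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))
x_adv = fgsm_attack(toy_model, x, y, eps=0.3)
print((x_adv - x).abs().max().item())  # at most 0.3, up to float rounding
```

Using `torch.autograd.grad` instead of `loss.backward()` keeps the attack from touching the model's parameter gradients, which matters when the attack runs inside a training loop.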

## Data Loading and Preprocessing
The experiment uses MNIST, a dataset of handwritten digits stored as 28x28 grayscale images (60,000 for training and 10,000 for testing).

In [None]:
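# Data transform: ToTensor converts each image to a float tensor scaled to [0, 1].
# No normalization is applied, because FGSM clamps adversarial images to [0, 1].
transform = transforms.ToTensor()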

# Download datasets
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

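## CNN Model Architecture
Following the principles from Lecture 8 (Neural Networks), the notebook defines a convolutional neural network (CNN) with two convolutional layers, max pooling, dropout, and two fully connected layers.
Trained only on clean data, this model is expected to be vulnerable to adversarial inputs.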

In [None]:
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3),   # 1x28x28 -> 32x26x26
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3),  # 32x26x26 -> 64x24x24
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 64x24x24 -> 64x12x12
            nn.Dropout(0.25),
            nn.Flatten(),                      # 64 * 12 * 12 = 9216 features
            nn.Linear(9216, 128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, 10)                 # one raw logit per digit class
        )

    def forward(self, x):
        return self.network(x)
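The input size of the first fully connected layer, 9216, follows from the convolution arithmetic: each unpadded 3x3 convolution shrinks the image side by 2 (28 to 26 to 24), max pooling halves it (24 to 12), and the last convolution has 64 channels, so 64 * 12 * 12 = 9216. A dummy forward pass confirms this. The sketch copies the convolutional part of `CNN` so it runs on its own; dropout is left out because it does not change shapes.

```python
import torch
import torch.nn as nn

# Same convolutional front end as CNN.network, up to the Flatten layer
features = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
)

dummy_batch = torch.zeros(2, 1, 28, 28)  # two blank MNIST-sized images
flat = features(dummy_batch)
print(flat.shape)  # torch.Size([2, 9216])
```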

## Training and Evaluation Functions
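Training is standard supervised learning with cross-entropy loss (`nn.CrossEntropyLoss`). When `train` is called with `adversarial=True` and an attack, every batch is replaced by its adversarial version before the weight update, so the model learns only from attacked images.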
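`nn.CrossEntropyLoss` takes raw logits, which is why `CNN` ends in a plain `nn.Linear` layer with no softmax. It averages, over the batch, the negative log-softmax probability of the correct class. A small check of that equivalence with made-up logits:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # 2 samples, 3 classes (made-up values)
targets = torch.tensor([0, 2])

builtin = nn.CrossEntropyLoss()(logits, targets)

# Manual version: log-softmax, pick the true-class entry, negate, average
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(builtin.item(), manual.item())
```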

In [None]:
def train(model, device, loader, optimizer, adversarial=False, attack=None):
    model.train()
    loss_fn = nn.CrossEntropyLoss()
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        if adversarial and attack is not None:
            # Replace the clean batch with adversarial examples crafted against the current weights
            data = attack(data, target)
        optimizer.zero_grad()
        output = model(data)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()

def test(model, device, loader, attack=None):
    """Return accuracy (%) on `loader`; if `attack` is given, evaluate on adversarial inputs."""
    model.eval()
    correct = 0
    total = 0
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        if attack is not None:
            # Crafting adversarial examples needs input gradients,
            # so this step must run outside torch.no_grad()
            data = attack(data, target)
        with torch.no_grad():
            pred = model(data).argmax(dim=1)
        correct += (pred == target).sum().item()
        total += target.size(0)
    return 100 * correct / total

## Experiment: FGSM Attack vs Adversarial Training
This code trains a standard CNN, then compares its accuracy on the clean test set with its accuracy on FGSM adversarial examples (eps=0.3).

In [None]:
# Setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CNN().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Standard training
for epoch in range(3):
    train(model, device, train_loader, optimizer)

clean_acc = test(model, device, test_loader)
print(f"Accuracy on clean test set: {clean_acc:.2f}%")

# FGSM attack: pass the attack to test() so the test images are actually perturbed
attack = FGSM(model, eps=0.3)
adv_acc = test(model, device, test_loader, attack=attack)
print(f"Accuracy under FGSM attack: {adv_acc:.2f}%")

## Defense: Adversarial Training
This code trains a second CNN on FGSM examples generated against its own current weights, then measures both its clean accuracy and its accuracy under the same attack.

In [None]:
# Adversarial training
model_adv = CNN().to(device)
optimizer_adv = optim.Adam(model_adv.parameters(), lr=0.001)
attack_adv = FGSM(model_adv, eps=0.3)

for epoch in range(3):
    train(model_adv, device, train_loader, optimizer_adv, adversarial=True, attack=attack_adv)

adv_train_acc = test(model_adv, device, test_loader)
adv_train_robust_acc = test(model_adv, device, test_loader, attack=attack_adv)
print(f"Adversarially trained model, clean test set: {adv_train_acc:.2f}%")
print(f"Adversarially trained model, under FGSM attack: {adv_train_robust_acc:.2f}%")
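The `train` function trains `model_adv` on adversarial examples only. A common variant, the mixed objective proposed alongside FGSM by Goodfellow et al., instead minimizes `alpha * clean_loss + (1 - alpha) * adversarial_loss`, so the model keeps seeing clean digits. The step below is a minimal sketch of that variant, not what this notebook runs. `mixed_adversarial_step` and `alpha` are hypothetical names, and the FGSM step is written inline so the sketch does not depend on torchattacks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mixed_adversarial_step(model, optimizer, data, target, eps=0.3, alpha=0.5):
    """One optimizer step on alpha * clean loss + (1 - alpha) * FGSM loss."""
    # Craft FGSM examples against the current weights (input gradient only)
    x = data.clone().detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(F.cross_entropy(model(x), target), x)
    x_adv = (data + eps * grad.sign()).clamp(0.0, 1.0).detach()

    optimizer.zero_grad()
    clean_loss = F.cross_entropy(model(data), target)
    adv_loss = F.cross_entropy(model(x_adv), target)
    loss = alpha * clean_loss + (1 - alpha) * adv_loss
    loss.backward()
    optimizer.step()
    return loss.item()


# Toy usage with a stand-in linear model and random data (illustration only)
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_opt = torch.optim.Adam(toy_model.parameters(), lr=1e-3)
images = torch.rand(16, 1, 28, 28)
labels = torch.randint(0, 10, (16,))
losses = [mixed_adversarial_step(toy_model, toy_opt, images, labels) for _ in range(20)]
print(losses[0], losses[-1])
```

Setting `alpha=0.0` recovers the adversarial-only training that `train` performs; larger values trade some robustness for clean accuracy.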

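## Key findings
- The standard-trained CNN's accuracy drops when the clean test images are replaced with FGSM adversarial examples (eps=0.3).
- Adversarial training with FGSM examples improves the model's accuracy under the same attack.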
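These findings compare a single attack strength. Because the FGSM gradient does not depend on eps, one input-gradient computation per batch is enough to evaluate a whole range of eps values, which gives an accuracy-versus-eps curve rather than a single number. A minimal sketch, assuming a classifier over [0, 1] images; `accuracy_vs_eps` is a hypothetical helper, and the toy model and random data below only show that it runs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def accuracy_vs_eps(model, loader, eps_values, device="cpu"):
    """Return {eps: accuracy in %} under FGSM, reusing one input gradient per batch."""
    model.eval()
    correct = {eps: 0 for eps in eps_values}
    total = 0
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        x = data.clone().detach().requires_grad_(True)
        (grad,) = torch.autograd.grad(F.cross_entropy(model(x), target), x)
        direction = grad.sign()  # same attack direction for every eps
        with torch.no_grad():
            for eps in eps_values:
                x_adv = (data + eps * direction).clamp(0.0, 1.0)
                correct[eps] += (model(x_adv).argmax(dim=1) == target).sum().item()
        total += target.size(0)
    return {eps: 100 * correct[eps] / total for eps in eps_values}


# Toy demonstration: random model and data, so the numbers themselves mean nothing
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_loader = [(torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,))) for _ in range(3)]
curve = accuracy_vs_eps(toy_model, toy_loader, [0.0, 0.1, 0.2, 0.3])
print(curve)
```

With the notebook's objects, the corresponding calls would be `accuracy_vs_eps(model, test_loader, [0.0, 0.1, 0.2, 0.3], device)` and the same for `model_adv`; the eps=0.0 entry equals clean accuracy.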
