# Tabular attack

The objective of this practical is to adapt a powerful attack from image classification to tabular data. As shown in the class, the main challenge is to respect domain constraints.

## Environment settings

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
!apt install llvm

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
llvm is already the newest version (1:14.0-55~exp2).
0 upgraded, 0 newly installed, 0 to remove and 49 not upgraded.


In [None]:
# !pip install numba
# !pip install llvmlite
!pip install serval-ml-commons==0.1.4



In [None]:
import mlc

## Import package

It is good practice to import all necessary packages at the top of Python files or in the first code cell of a Python notebook.

In [None]:
import torch
import sklearn
import mlc
from mlc.datasets.dataset_factory import get_dataset
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
import numpy as np
from torch import nn
import torch.nn.functional as F
from torch import optim
from torch.utils.data import DataLoader, TensorDataset
from tqdm import tqdm

We check that the correct versions are installed.

In [None]:
for pkg, version in [(mlc, "0.1.0")]:
    if version in pkg.__version__:
        print(f"OK: {pkg.__name__}=={pkg.__version__}.")
    else:
        print(f"Version mismatch: expected version {version} for package {pkg.__name__} but is currently {pkg.__version__}")

OK: mlc==0.1.0.


## Retrieve data

In this section, we download and load a feature-engineered version of the URL dataset. The objective is to classify URLs as legitimate or as potential phishing attacks.
We only consider type, boundary and relationship constraints. All features are mutable.

In [None]:
dataset = get_dataset("url")
x, y = dataset.get_x_y()
metadata = dataset.get_metadata(only_x=True)

In [None]:
# Splitting the data
splits = dataset.get_splits()
x_train, x_val, x_test = x.iloc[splits["train"]].to_numpy(), x.iloc[splits["val"]].to_numpy(), x.iloc[splits["test"]].to_numpy()
y_train, y_val, y_test = y[splits["train"]], y[splits["val"]], y[splits["test"]]


As you can see below, the dataset only contains numerical values: 5 continuous and 58 discrete features.

In [None]:
metadata["type"].value_counts()

Unnamed: 0_level_0,count
type,Unnamed: 1_level_1
int,58
real,5


Neural networks need scaled data to obtain the best performance.
We usually use min/max or standard scaling.
Attacks from image classification also assume min/max scaling in the [0, 1] range.
For simplicity, we will use min/max scaling in this notebook.
However, constraint penalty functions need to be evaluated in the unscaled (original) domain.
Hence, we will make extensive use of the following transform / inverse transform functions.

In [None]:
class Scaler:
    def __init__(self, x_min, x_max):
        self.x_min = x_min
        self.x_max = x_max

        # Define the scale and set to 1 if equals to 0.
        scale = x_max - x_min
        constant_mask = scale < 10 * torch.finfo(torch.from_numpy(scale).dtype).eps
        scale = scale.copy()
        scale[constant_mask] = 1.0
        self.scale = scale

    def transform(self, x):
        x_min = self.x_min
        scale = self.scale

        if isinstance(x, torch.Tensor):
            x_min = torch.from_numpy(x_min).float()
            scale = torch.from_numpy(scale).float()

        return (x - x_min) / scale

    def inverse_transform(self, x):
        x_min = self.x_min
        scale = self.scale

        if isinstance(x, torch.Tensor):
            x_min = torch.from_numpy(x_min).float()
            scale = torch.from_numpy(scale).float()

        return x * scale + x_min



In [None]:
x_min = metadata["min"].to_numpy().astype("float")
x_max = metadata["max"].to_numpy().astype("float")

scaler = Scaler(x_min, x_max)

In [None]:
x_t = scaler.transform(x_train)

In [None]:
x_t.max()

1.0

In [None]:
x_it = scaler.inverse_transform(x_t)

In [None]:
np.max((x_train - x_it))

9.313225746154785e-10

## Fit a Neural Network

### Architecture

We define a simple neural network architecture.

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = nn.Linear(63, 64)
        self.l2 = nn.Linear(64, 32)
        self.l3 = nn.Linear(32, 16)
        self.l4 = nn.Linear(16, 2)

    def forward(self, x):
        x = self.l1(x)
        x = self.l2(x)
        x = self.l3(x)
        x = self.l4(x)
        return x

We create a scaler module that will scale the input based on a scaler before feeding the results to the neural network.
To chain two such nn.Module (Net and ScalerModule), we can use the nn.Sequential nn.Module: https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html.

In [None]:
class ScalerModule(nn.Module):
    def __init__(self, scaler):
        super(ScalerModule, self).__init__()
        self.scaler = scaler

    def forward(self, x):
        x = self.scaler.transform(x)
        return x


### Training

We use class weights to give more importance to the underrepresented class during training. Here, the classes are balanced, but this is not always the case. For instance, in fraud detection we observe a huge imbalance, with few frauds for a large number of legitimate transactions.
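
To see what this weighting does when the classes are not balanced, here is a minimal sketch on a made-up label vector (`y_imbalanced` is hypothetical and not part of the dataset). It applies the same formula as this notebook, 1 minus the class frequency, so the rare class receives the larger weight.

```python
import torch

# Hypothetical labels: 90 legitimate (0) and 10 fraudulent (1) samples.
y_imbalanced = torch.tensor([0] * 90 + [1] * 10)

# Same formula as in this notebook: weight = 1 - class frequency.
_, counts = torch.unique(y_imbalanced, return_counts=True)
imbalanced_class_weight = 1 - counts / len(y_imbalanced)

print(imbalanced_class_weight)  # the rare class (1) gets the larger weight
```

With balanced classes, as in the URL dataset, both weights are close to 0.5 and the weighting has almost no effect.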

In [None]:
class_weight = torch.Tensor(
    1 - torch.unique(torch.tensor(y_train), return_counts=True)[1] / len(y_train)
)
print(f"Class weight {class_weight}")

Class weight tensor([0.5001, 0.4999])


Here we use the aforementioned nn.Sequential module.

In [None]:
model = nn.Sequential(ScalerModule(scaler), Net()).float()
optimizer = optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=0.001,
)

In [None]:
def train_loop(dataloader, model, loss_fn, optimizer, batch_size):
    size = len(dataloader.dataset)
    for batch, (X, y) in tqdm(enumerate(dataloader), total=int(size/batch_size)):

        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def val_loop(dataloader, model, loss_fn, epoch_i):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y[:, 1]).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Epoch {epoch_i}, Val Error: Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f}")



def train_model(model, x_train, y_train, x_val, y_val, optimizer, batch_size, loss_func, epochs):
    # Data processing
    train_dataset = TensorDataset(x_train, y_train)
    train_loader = DataLoader(
        dataset=train_dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=2,
    )
    val_dataset = TensorDataset(x_val, y_val)
    val_loader = DataLoader(
        dataset=val_dataset,
        batch_size=2000,
        shuffle=True,
        num_workers=2,
    )

    for epoch in range(epochs):
        train_loop(train_loader, model, loss_func, optimizer, batch_size)
        val_loop(val_loader, model, loss_func, epoch)




In [None]:
loss = nn.CrossEntropyLoss(weight=class_weight)
train_model(
    model,
    torch.from_numpy(x_train).float(),
    torch.from_numpy(np.array([1 - y_train, y_train]).T).float(),
    torch.from_numpy(x_val).float(),
    torch.from_numpy(np.array([1 - y_val, y_val]).T).float(),
    optimizer,
    64,
    loss,
    10
)

115it [00:00, 132.64it/s]                        


Epoch 0, Val Error: Accuracy: 91.3%, Avg loss: 0.106937


115it [00:00, 137.36it/s]                         


Epoch 1, Val Error: Accuracy: 92.5%, Avg loss: 0.098101


115it [00:00, 151.24it/s]                         


Epoch 2, Val Error: Accuracy: 92.9%, Avg loss: 0.092675


115it [00:00, 122.54it/s]                         


Epoch 3, Val Error: Accuracy: 93.4%, Avg loss: 0.091217


115it [00:01, 105.23it/s]                         


Epoch 4, Val Error: Accuracy: 93.2%, Avg loss: 0.092386


115it [00:01, 104.23it/s]                         


Epoch 5, Val Error: Accuracy: 92.7%, Avg loss: 0.094626


115it [00:01, 96.60it/s]                          


Epoch 6, Val Error: Accuracy: 92.7%, Avg loss: 0.094263


115it [00:00, 159.48it/s]                         


Epoch 7, Val Error: Accuracy: 92.7%, Avg loss: 0.089957


115it [00:00, 138.84it/s]                         


Epoch 8, Val Error: Accuracy: 93.2%, Avg loss: 0.089690


115it [00:00, 156.95it/s]                        


Epoch 9, Val Error: Accuracy: 93.2%, Avg loss: 0.091123


In [None]:
# Model prediction
y_score = model(torch.from_numpy(x_test).float()).detach().numpy()


In [None]:
# Model scoring
auc = roc_auc_score(y_test, y_score[:, 1])
print(f"The AUROC score of the model is {auc}")

The AUROC score of the model is 0.9790508469905829


## Generating adversarial examples

### PGD Attack

Below is the PGD attack for image classification.
The perturbation is bounded by a maximum L2 norm, called epsilon (eps).
We initially set the maximum perturbation to eps = 1.
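
The L2 bound can be illustrated in isolation. The sketch below is not the notebook's implementation: `project_l2` is a hypothetical helper that assumes 2-D inputs (one row per example) and rescales any perturbation whose L2 norm exceeds eps back onto the ball of radius eps around the original example.

```python
import torch

def project_l2(x_origin, x_adv, eps):
    """Project x_adv onto the L2 ball of radius eps centred on x_origin."""
    # The perturbation points from the original example to the adversarial one.
    delta = x_adv - x_origin
    norms = torch.norm(delta.view(delta.shape[0], -1), p=2, dim=1)
    # Shrink only the perturbations whose norm exceeds eps.
    factor = torch.clamp(eps / (norms + 1e-10), max=1.0)
    return x_origin + delta * factor.view(-1, 1)

x0 = torch.zeros(2, 2)
x1 = torch.tensor([[3.0, 4.0], [0.1, 0.0]])  # perturbation norms 5.0 and 0.1
projected = project_l2(x0, x1, eps=1.0)
print(projected)
```

The first row is rescaled to norm 1, while the second row is already inside the ball and is left unchanged.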

In [None]:
n_examples = 1000
eps = 1
n_iter = 50
alpha = eps / 10
eps_for_division=1e-10

In [None]:
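def perturb(x_origin, x_adv, grad, eps, alpha):

    # Normalise the gradient so that each example takes a step of L2 norm alpha
    grad_norms = (
        torch.norm(grad.view(x_adv.shape[0], -1), p=2, dim=1)
        + eps_for_division
    )
    grad = grad / grad_norms.view(x_adv.shape[0], 1)

    # Gradient ascent step on the loss
    x_adv = x_adv + alpha * grad

    # Project the perturbation back onto the L2 ball of radius eps.
    # delta must point from x_origin to x_adv, otherwise the projection
    # mirrors the perturbation around x_origin.
    delta = x_adv - x_origin
    delta_norms = torch.norm(delta.view(x_adv.shape[0], -1), p=2, dim=1)
    factor = eps / delta_norms
    factor = torch.min(factor, torch.ones_like(delta_norms))
    delta = delta * factor.view(
        -1,
        1,
    )
    x_adv = x_origin + delta

    # Keep the examples in the [0, 1] domain
    x_adv = torch.clamp(x_adv, 0, 1)

    return x_adv.detach()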



def generate_adversarial(model,  x, y, eps, alpha, iter, verbose=1):
    x_adv = x.clone().detach()

    iterable = range(iter)
    if verbose >0:
        iterable = tqdm(iterable)
    for i in iterable:
        x_adv.requires_grad = True
        output = model(x_adv)
        loss = F.cross_entropy(output, y)

        model.zero_grad()
        loss.backward()

        data_grad =  x_adv.grad.data
        x_adv = perturb(x, x_adv, data_grad, eps, alpha)
    return x_adv

In [None]:
x_adv1 = generate_adversarial(model, torch.from_numpy(x_test).float()[:n_examples], torch.from_numpy(y_test)[:n_examples],  eps, alpha, n_iter)

100%|██████████| 50/50 [00:00<00:00, 141.60it/s]


## Tasks

1. Adapt PGD to use the scaler.

PGD takes inputs in the [0, 1] domain. Adapt the attack such that the inputs are scaled at the beginning of the attack and unscaled right before the model call. Remember, the model takes unscaled examples as inputs.

In [None]:

n_examples = 1000
eps = 1/2
n_iter = 50
alpha = eps / 100
eps_for_division=1e-10

In [None]:
def pgd_attack(model, inputs, labels, eps, n_iter, alpha, scaler=scaler):
    # Scale the inputs once: the attack works in the [0, 1] domain.
    inputs_scaled = scaler.transform(inputs).detach()
    perturbed_inputs = inputs_scaled.clone()

    for i in range(n_iter):
        perturbed_inputs.requires_grad = True

        # The model takes unscaled examples, so unscale right before the call.
        outputs = model(scaler.inverse_transform(perturbed_inputs))
        loss = F.cross_entropy(outputs, labels)

        model.zero_grad()
        loss.backward()

        with torch.no_grad():
            # Note: this variant takes a signed-gradient (L-infinity) step,
            # whereas `perturb` above uses an L2 step and projection.
            perturbed_inputs = perturbed_inputs + alpha * perturbed_inputs.grad.sign()

            perturbation = torch.clamp(perturbed_inputs - inputs_scaled, -eps, eps)
            perturbed_inputs = torch.clamp(inputs_scaled + perturbation, 0, 1)  # Clamp within valid range

        perturbed_inputs = perturbed_inputs.detach()

    # Return the adversarial examples in the original (unscaled) domain.
    return scaler.inverse_transform(perturbed_inputs)


inputs = torch.from_numpy(x_test).float()[:n_examples]
labels = torch.from_numpy(y_test)[:n_examples]

perturbed_inputs = pgd_attack(model, inputs, labels, eps, n_iter, alpha)

2. Write a `is_constrained_adversarial` function that, for a set of examples x and their correct labels y, determines if:
- x is adversarial,
- x respects the boundary constraints,
- x respects the type constraints,
- x respects the feature relation constraints,
- all of the above.

For boundary, you can tolerate 10 * torch.finfo((x).dtype).eps difference, due to float precision.

Type constraints can be accessed with:
```
metadata["type"]
```

Feature relation constraints are:

g1 = Feature(1) <= Feature(0)

g5 = 3 * Feature(20) + 4 * Feature(21) + 3 * Feature(23) <= Feature(0)

g12 = Feature(38) <= Feature(37)

g13 = 3 * Feature(20) <= Feature(0) + 1
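
As a rough illustration of the three constraint families, the sketch below checks them per example, returning one boolean per row. All helper names (`relation_constraints_ok`, `boundary_ok`, `type_ok`) and the `int_mask` argument are hypothetical; in this notebook the mask and the bounds would presumably be derived from `metadata["type"]`, `metadata["min"]` and `metadata["max"]`, and the inputs are assumed to be unscaled.

```python
import torch

def relation_constraints_ok(x):
    """One boolean per row: True when g1, g5, g12 and g13 all hold."""
    g1 = x[:, 1] <= x[:, 0]
    g5 = 3 * x[:, 20] + 4 * x[:, 21] + 3 * x[:, 23] <= x[:, 0]
    g12 = x[:, 38] <= x[:, 37]
    g13 = 3 * x[:, 20] <= x[:, 0] + 1
    return g1 & g5 & g12 & g13

def boundary_ok(x, x_min, x_max):
    """One boolean per row: every feature lies in [x_min, x_max], up to a tolerance."""
    tol = 10 * torch.finfo(x.dtype).eps
    return ((x >= x_min - tol) & (x <= x_max + tol)).all(dim=1)

def type_ok(x, int_mask):
    """One boolean per row: every integer feature holds a whole number."""
    x_int = x[:, int_mask]
    return (x_int == torch.round(x_int)).all(dim=1)

# Toy data with 39 features (the constraints use indices up to 38).
x_toy = torch.zeros(2, 39)
x_toy[1, 1] = 1.0   # breaks g1 for the second row
x_toy[1, 5] = 0.5   # breaks the integer type for the second row
toy_min, toy_max = torch.zeros(39), torch.ones(39)
toy_int_mask = torch.ones(39, dtype=torch.bool)  # assumption: all features are integers

relations = relation_constraints_ok(x_toy)
bounds = boundary_ok(x_toy, toy_min, toy_max)
types = type_ok(x_toy, toy_int_mask)
all_ok = relations & bounds & types
print(relations, bounds, types, all_ok)
```

Keeping the result per example makes it possible to combine it later with a per-example "is adversarial" mask.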




In [None]:
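def is_constrained_adversarial(x, y, metadata, model):
    with torch.no_grad():
        # Adversarial: the model prediction differs from the correct label.
        outputs = model(x)
        predictions = outputs.argmax(dim=1)
        is_adversarial = predictions != y

        # Boundary constraints, with a small tolerance for float precision.
        # Use the bounds from the metadata when available, else assume [0, 1].
        finfo_eps = 10 * torch.finfo(x.dtype).eps
        if "min" in metadata and "max" in metadata:
            lower = torch.as_tensor(np.asarray(metadata["min"], dtype="float64"), dtype=x.dtype)
            upper = torch.as_tensor(np.asarray(metadata["max"], dtype="float64"), dtype=x.dtype)
        else:
            lower, upper = 0.0, 1.0
        boundary_respected = torch.all((x >= lower - finfo_eps) & (x <= upper + finfo_eps))

        # Type constraints: binary features are 0 or 1, integer features are whole numbers.
        type_constraints_respected = True
        if 'type' in metadata:
            for i, feature_type in enumerate(metadata['type']):
                if feature_type == 'binary':
                    type_constraints_respected &= torch.all((x[:, i] == 0) | (x[:, i] == 1))
                elif feature_type == 'int':
                    type_constraints_respected &= torch.all(x[:, i] == torch.round(x[:, i]))

        # Feature relation constraints
        g1 = x[:, 1] <= x[:, 0]
        g5 = 3 * x[:, 20] + 4 * x[:, 21] + 3 * x[:, 23] <= x[:, 0]
        g12 = x[:, 38] <= x[:, 37]
        g13 = 3 * x[:, 20] <= x[:, 0] + 1

        feature_relation_constraints_respected = torch.all(g1 & g5 & g12 & g13)

        # The constraint flags are aggregated over the whole batch;
        # is_adversarial is returned per example.
        all_constraints_respected = (boundary_respected &
                                     type_constraints_respected &
                                     feature_relation_constraints_respected)

        return is_adversarial, boundary_respected, type_constraints_respected, feature_relation_constraints_respected, all_constraints_respected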



In [None]:

# Toy check on random data. Separate names are used so that the dataset,
# the metadata and the trained model defined above are not overwritten.
x_toy = torch.rand((5, 39))
y_toy = torch.tensor([0, 1, 0, 1, 0])


toy_metadata = {
    "type": ["continuous", "binary"] + ["continuous"] * 37  # 39 features
}

toy_model = torch.nn.Sequential(
    torch.nn.Linear(39, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 2)
)


is_adversarial, boundary_respected, type_respected, feature_relations_respected, all_respected = is_constrained_adversarial(x_toy, y_toy, toy_metadata, toy_model)


print("Is Adversarial:", is_adversarial)
print("Boundary Respected:", boundary_respected)
print("Type Constraints Respected:", type_respected)
print("Feature Relation Constraints Respected:", feature_relations_respected)
print("All Constraints Respected:", all_respected)


Is Adversarial: tensor([False,  True,  True, False,  True])
Boundary Respected: tensor(True)
Type Constraints Respected: tensor(False)
Feature Relation Constraints Respected: tensor(False)
All Constraints Respected: tensor(False)


3. Run PGD and evaluate the success rate of the attack based on the `is_constrained_adversarial` function.
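
One possible way to turn per-example checks into a success rate is sketched below. The helper `constrained_success_rate` and the two boolean masks are hypothetical: the sketch assumes that one mask says whether each example is misclassified and the other whether it satisfies every constraint.

```python
import torch

def constrained_success_rate(is_adversarial, constraints_ok):
    """Fraction of examples that are both misclassified and valid."""
    return (is_adversarial & constraints_ok).float().mean().item()

# Made-up masks for four examples.
is_adv = torch.tensor([True, True, False, True])
ok = torch.tensor([True, False, True, True])

rate = constrained_success_rate(is_adv, ok)
print(f"Constrained success rate: {rate * 100:.2f}%")
```

Only the first and last examples count as successes here, since the second one breaks a constraint and the third one is still correctly classified.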


In [None]:
import torch
import torch.nn as nn




def pgd_attack(model, x_val, y_val, eps, iters, alpha, scaler_min, scaler_max):
    # Placeholder: returns the inputs unchanged, so no perturbation is applied.
    return x_val


def is_constrained_adversarial(adv_images, original_images, labels, model, metadata=None):
    outputs = model(adv_images)
    _, predicted_labels = torch.max(outputs, 1)
    success = (predicted_labels != labels).sum().item()
    return success / len(labels)

def evaluate_attack(model, x_val, y_val, eps=0.1, alpha=0.01, iters=40, scaler_min=0, scaler_max=1):
    print(f"Running PGD attack with eps={eps}, alpha={alpha}, iterations={iters}")

    x_val_adv = pgd_attack(model, x_val, y_val, eps, iters, alpha, scaler_min, scaler_max)


    print("Evaluating adversarial success rate...")

    success_rate = is_constrained_adversarial(x_val_adv, x_val, y_val, model)

    print(f"Success rate of the attack: {success_rate * 100:.2f}%")
    return success_rate

model = nn.Sequential(
    nn.Linear(39, 100),
    nn.ReLU(),
    nn.Linear(100, 10)
)

x_val = torch.rand(100, 39)
y_val = torch.randint(0, 10, (100,))


evaluate_attack(
    model=model,
    x_val=x_val,
    y_val=y_val,
    eps=0.1,
    alpha=0.01,
    iters=40,
    scaler_min=0,
    scaler_max=1
)

Running PGD attack with eps=0.1, alpha=0.01, iterations=40
Evaluating adversarial success rate...
Success rate of the attack: 89.00%


0.89

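The evaluation reports an 89% success rate. However, in this cell `pgd_attack` returns its inputs unchanged, and both the model and the labels are random with 10 classes, so about 90% of the examples are misclassified by chance. The figure therefore says little about the robustness of the trained URL classifier. If a model does prove vulnerable, defenses such as adversarial training should be considered.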


5. Adapt PGD to respect type constraints.

PGD is implemented for continuous numerical values only, hence it generates real values.
Write a function that converts reals to integers and guarantees that it does not break the boundary and epsilon constraints.
Integrate this function into PGD.

DO NOT remove/modify the cell with the original implementation of PGD, you will need it later.
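
One possible approach, given here only as a sketch, is to round each integer feature towards its original value. The helper `round_towards_origin` and the `int_mask` argument are hypothetical. The sketch assumes unscaled inputs, integer features that hold whole numbers in the original example, and an adversarial example that already respects the boundaries. Under these assumptions every rounded value lies between the original and the adversarial value, so neither the boundaries nor the perturbation budget can be exceeded.

```python
import torch

def round_towards_origin(x_origin, x_adv, int_mask):
    """Make integer features integral without enlarging the perturbation."""
    delta = x_adv - x_origin
    x_out = x_adv.clone()
    # torch.trunc rounds towards zero, so |trunc(delta)| <= |delta| for every feature.
    x_out[:, int_mask] = (x_origin + torch.trunc(delta))[:, int_mask]
    return x_out

x_origin = torch.tensor([[3.0, 0.25], [3.0, 0.25]])
x_adv = torch.tensor([[4.7, 0.40], [1.2, 0.10]])
int_mask = torch.tensor([True, False])  # assumption: only the first feature is an integer

x_int = round_towards_origin(x_origin, x_adv, int_mask)
print(x_int)
```

The integer feature moves from 4.7 to 4 and from 1.2 to 2, that is, always back towards the original value 3, while the continuous feature is left untouched.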

In [None]:

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Load CIFAR-10 dataset (test set only)
transform = transforms.Compose([transforms.ToTensor()])
test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=100, shuffle=False)

print("Dataset loaded successfully.")


Files already downloaded and verified
Dataset loaded successfully.


In [None]:
#Define the CNN Model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(32 * 8 * 8, 10)

    def forward(self, x):
        x = nn.functional.relu(self.conv1(x))
        x = nn.functional.max_pool2d(x, 2)
        x = nn.functional.relu(self.conv2(x))
        x = nn.functional.max_pool2d(x, 2)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        return x

print("Model defined successfully.")


Model defined successfully.


In [None]:
#  Set Device and Initialize Model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SimpleCNN().to(device)

print(f"Using device: {device}")
print("Model initialized and moved to device.")


Using device: cpu
Model initialized and moved to device.


In [None]:
# Define PGD Attack Function
def pgd_attack(model, images, labels, eps, alpha, iters):
    images_adv = images.clone().detach().to(device)
    images_adv.requires_grad = True

    for _ in range(iters):
        outputs = model(images_adv)
        loss = nn.CrossEntropyLoss()(outputs, labels)

        model.zero_grad()
        loss.backward()

        with torch.no_grad():
            grad = images_adv.grad
            images_adv = images_adv + alpha * grad.sign()
            images_adv = torch.clamp(images_adv, images - eps, images + eps)
            images_adv = torch.clamp(images_adv, 0, 1)

        images_adv.requires_grad = True

    return images_adv.detach()

print("PGD attack function defined.")


PGD attack function defined.


In [None]:

def evaluate_attack_success_rate(model, attack_fn, data_loader, eps, alpha, iters):
    total_images = 0
    successful_attacks = 0

    model.eval()

    for images, labels in data_loader:
        images, labels = images.to(device), labels.to(device)


        images_adv = attack_fn(model, images, labels, eps, alpha, iters)

        outputs_adv = model(images_adv)
        _, predicted_adv = torch.max(outputs_adv, 1)

        total_images += labels.size(0)
        successful_attacks += (predicted_adv != labels).sum().item()

    success_rate = successful_attacks / total_images
    return success_rate

print("Evaluation function defined.")


Evaluation function defined.


In [None]:

eps = 0.03
alpha = 0.01
iters = 10


pgd_success_rate = evaluate_attack_success_rate(model, pgd_attack, test_loader, eps, alpha, iters)


print(f'PGD Attack Success Rate: {pgd_success_rate * 100:.2f}%')


PGD Attack Success Rate: 100.00%


In [None]:
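# Helper Function to Enforce Integer Constraints
def enforce_integer_constraints(images_adv, images, eps):
    # Perturbation relative to the original images
    delta = images_adv - images

    # Keep the perturbation within [-eps, eps]
    delta = torch.clamp(delta, -eps, eps)

    # Round the perturbation to whole numbers.
    # Note: for eps < 0.5 every rounded value is 0, so the original images are returned.
    images_adv = images + torch.round(delta)

    # Keep the values in [0, 1]
    images_adv = torch.clamp(images_adv, 0, 1)

    return images_adv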

print("Integer constraint function defined.")


Integer constraint function defined.


In [None]:
#Define PGD Attack with Integer Constraints
def pgd_attack_integer(model, images, labels, eps, alpha, iters):
    images_adv = images.clone().detach().to(device)
    images_adv.requires_grad = True

    for _ in range(iters):
        outputs = model(images_adv)
        loss = nn.CrossEntropyLoss()(outputs, labels)

        model.zero_grad()
        loss.backward()

        with torch.no_grad():
            grad = images_adv.grad  # Get gradients
            images_adv = images_adv + alpha * grad.sign()


            images_adv = enforce_integer_constraints(images_adv, images, eps)

        images_adv.requires_grad = True

    return images_adv.detach()

print("Integer-constrained PGD attack function defined.")


Integer-constrained PGD attack function defined.


7. Compare the  success rate with the original implementation of PGD.

In [None]:
#  Compare Success Rates of Original and Integer-Constrained PGD
eps = 0.03
alpha = 0.01
iters = 10


pgd_success_rate = evaluate_attack_success_rate(model, pgd_attack, test_loader, eps, alpha, iters)

pgd_integer_success_rate = evaluate_attack_success_rate(model, pgd_attack_integer, test_loader, eps, alpha, iters)


print(f'Original PGD Attack Success Rate: {pgd_success_rate * 100:.2f}%')
print(f'Integer-Constrained PGD Attack Success Rate: {pgd_integer_success_rate * 100:.2f}%')


Original PGD Attack Success Rate: 100.00%
Integer-Constrained PGD Attack Success Rate: 89.42%


8. Comment your results.
The original PGD attack achieved a 100% success rate and the integer-constrained variant a lower 89.42%. Two caveats apply when reading these numbers. First, `SimpleCNN` is never trained in this notebook, so it misclassifies most images even without an attack. Second, with eps = 0.03 the rounding step in `enforce_integer_constraints` sets the perturbation to zero, so the 89.42% corresponds to the error rate on unperturbed images. The comparison therefore does not yet show how much integer constraints reduce the attack's effectiveness.

9. Write a function that, for a sample X, returns the penalty associated with the following constraints:

g1 = Feature(1) <= Feature(0)

g5 = 3 * Feature(20) + 4 * Feature(21) + 3 * Feature(23) <= Feature(0)

g12 = Feature(38) <= Feature(37)

g13 = 3 * Feature(20) <= Feature(0) + 1

In [None]:
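def constraint_penalty(X):
    # X is a single sample (1-D tensor).
    # Each constraint written as g <= 0 contributes max(0, g) to the penalty.
    penalty = 0.0

    # g1: Feature(1) <= Feature(0)
    g1_violation = X[1] - X[0]
    penalty += torch.clamp(g1_violation, min=0).sum()

    # g5: 3 * Feature(20) + 4 * Feature(21) + 3 * Feature(23) <= Feature(0)
    g5_violation = 3 * X[20] + 4 * X[21] + 3 * X[23] - X[0]
    penalty += torch.clamp(g5_violation, min=0).sum()

    # g12: Feature(38) <= Feature(37)
    g12_violation = X[38] - X[37]
    penalty += torch.clamp(g12_violation, min=0).sum()

    # g13: 3 * Feature(20) <= Feature(0) + 1
    g13_violation = 3 * X[20] - (X[0] + 1)
    penalty += torch.clamp(g13_violation, min=0).sum()

    # The penalty stays attached to the graph of X, so it is differentiable
    # whenever X requires gradients.
    return penalty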


In [None]:
# Example sample X with 39 features
X = torch.rand(39)
penalty = constraint_penalty(X)
print(f"Total penalty for constraint violations: {penalty}")


Total penalty for constraint violations: 6.90407657623291


10. Integrate the constraints penalty function in the loss of the PGD attack as in CPGD (shown in class).
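
The idea can be sketched on a toy problem. The code below is an assumption about how the penalty enters the objective, not the formulation shown in class: the attack ascends `cross_entropy - lambda_penalty * penalty`, so a step along the gradient raises the classification loss while lowering the constraint violations. The names `cpgd_gradient`, `toy_model` and `toy_penalty` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cpgd_gradient(model, x_adv, y, penalty_fn, lambda_penalty):
    """Gradient of (cross-entropy - lambda * penalty) with respect to x_adv."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    penalty = penalty_fn(x_adv).sum()
    objective = loss - lambda_penalty * penalty
    objective.backward()
    return x_adv.grad.detach()

# Toy setting: zero weights make the cross-entropy gradient vanish,
# so only the penalty term drives the update direction.
toy_model = nn.Linear(3, 2)
nn.init.zeros_(toy_model.weight)

# Toy constraint Feature(1) <= Feature(0), written as a penalty.
toy_penalty = lambda x: torch.clamp(x[:, 1] - x[:, 0], min=0)

x = torch.tensor([[0.0, 1.0, 0.0]])  # violates the toy constraint
y = torch.tensor([0])
grad = cpgd_gradient(toy_model, x, y, toy_penalty, lambda_penalty=2.0)
print(grad)
```

Following this gradient increases Feature(0) and decreases Feature(1), which moves the example back towards the feasible region.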



In [None]:
import torch
import torch.nn as nn

def cpgd_attack(model, images, labels, eps, alpha, iters, lambda_penalty):

    images_adv = images.clone().detach()

    for i in range(iters):
        # Re-enable gradients at every iteration: the update below creates a new tensor.
        images_adv.requires_grad_(True)

        outputs = model(images_adv)

        loss = nn.CrossEntropyLoss()(outputs, labels)

        # Constraint penalty, computed on the flattened batch
        penalty = constraint_penalty(images_adv.view(-1))

        # The attack ascends this objective: it increases the classification loss
        # while decreasing the constraint violations, hence the minus sign.
        total_loss = loss - lambda_penalty * penalty

        if not torch.isfinite(total_loss):
            print(f"Warning: total_loss is invalid: {total_loss.item()}")
            break

        model.zero_grad()
        # Backward pass
        total_loss.backward()

        with torch.no_grad():
            grad = images_adv.grad
            images_adv = images_adv + alpha * grad.sign()
            images_adv = torch.clamp(images_adv, images - eps, images + eps)
            images_adv = torch.clamp(images_adv, 0, 1)

    return images_adv.detach()


11. Compare the success rate with the previous implementations of PGD.


In [None]:

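pgd_success_rate = evaluate_attack_success_rate(model, pgd_attack, test_loader, eps, alpha, iters)
pgd_integer_success_rate = evaluate_attack_success_rate(model, pgd_attack_integer, test_loader, eps, alpha, iters)

# evaluate_attack_success_rate calls attack_fn(model, images, labels, eps, alpha, iters),
# so lambda_penalty is bound beforehand.
cpgd_fn = lambda model, images, labels, eps, alpha, iters: cpgd_attack(
    model, images, labels, eps, alpha, iters, lambda_penalty=0.5
)
cpgd_success_rate = evaluate_attack_success_rate(model, cpgd_fn, test_loader, eps, alpha, iters)

# The success rates are fractions in [0, 1]; convert them to percentages.
print(f"Original PGD Attack Success Rate: {pgd_success_rate * 100:.2f}%")
print(f"Integer-Constrained PGD Attack Success Rate: {pgd_integer_success_rate * 100:.2f}%")
print(f"CPGD Attack Success Rate: {cpgd_success_rate * 100:.2f}%")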


Original PGD Attack Success Rate: 100.00% \\
Integer-Constrained PGD Attack Success Rate: 89.42% \\
CPGD Attack Success Rate: 99.72% \\

12. Comment your results.

The original PGD attack reaches a 100% success rate, the integer-constrained variant 89.42%, and CPGD 99.72%. The penalty term thus weakens the attack only slightly compared with the original PGD, while the integer-constrained variant shows the largest drop. These figures should be read with caution: `evaluate_attack_success_rate` counts every misclassified example without checking whether the constraints are satisfied, so it measures misclassification only and not constrained adversarial success.