<h1 align="center">Introduction to Machine Learning - 25737-2</h1>
<h4 align="center">Dr. R. Amiri</h4>
<h4 align="center">Sharif University of Technology, Summer 2024</h4>
<h4 align="center">Project Phase one</h4>



**Student Name**: Zahra Maleki & Hossain Anjidani

**Student ID**: 400110009 & 400100746

In [1]:
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score, roc_auc_score
from torch.utils.data import DataLoader, random_split
import numpy as np
import torch.nn.functional as F
from torch.utils.data import Subset
import random
import time
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import cross_val_score

In [2]:
# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

**Simulation Question 1**

In [3]:
# Step 1: Data Preparation

def prepare_data(batch_size=16):
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
    ])
    dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

    val_loader =0
    train_loader = test_dataset
    
    return train_loader, val_loader, test_loader



In [4]:
# Step 2: ResNet-18 Modification
class ModifiedResNet18(nn.Module):
    def __init__(self, num_classes=10):
        super(ModifiedResNet18, self).__init__()
        self.resnet = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
        self.resnet.fc = nn.Sequential(
            nn.Linear(self.resnet.fc.in_features, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        return self.resnet(x)

In [5]:
# Step 3: SISA Training Implementation

def train_sisa_model(train_loader, val_loader, num_classes, S, R, epochs=1):
    shard_size = len(train_loader) // S

    models = []

    for s in range(S):
        print(s)
        shard_data = torch.utils.data.Subset(train_loader, list(range(s * shard_size, (s + 1) * shard_size)))

        # shard_loader = DataLoader(shard_data, batch_size=train_loader.batch_size, shuffle=True)
        shard_loader= shard_data

        model = ModifiedResNet18(num_classes).to(device)
        optimizer = optim.Adam(model.parameters(), lr=0.0001)
        criterion = nn.CrossEntropyLoss()

        for r in range(R):
            slice_size = shard_size // R
            slice_data = torch.utils.data.Subset(shard_loader.dataset, list(range(r * slice_size, (r + 1) * slice_size)))
            slice_loader = DataLoader(slice_data, batch_size=16, shuffle=True)


            model.train()
            for epoch in range(epochs):

                for inputs, labels in slice_loader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    optimizer.zero_grad()
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    loss.backward()
                    optimizer.step()

            models.append(model)

    return models

In [6]:
def aggregate_models(trained_model,dataloader, num_classes, device='cuda'):
    all_probabilities = []
    all_predictions = []
    all_labels = []

    with torch.no_grad():
        
        for images, labels in dataloader:

            images = images.to(device)
            outputs = [model(images).cpu() for model in trained_model]
            avg_output = sum(outputs) / len(outputs)
            
            probabilities = nn.Softmax(dim=1)(avg_output)
            _, predictions = torch.max(probabilities, 1)
            all_probabilities.extend(probabilities.numpy())
            all_predictions.extend(predictions.numpy())
            all_labels.extend(labels.numpy())

    return all_probabilities, all_predictions, all_labels


def evaluate_model( probabilities, predictions,labels):

    f1 = f1_score(labels, predictions,average='macro')
    accuracy = accuracy_score(labels, predictions)
    precision = precision_score( labels, predictions, average='macro')
    recall = recall_score(labels,predictions, average='macro')

    auroc = roc_auc_score(labels,probabilities, multi_class='ovr')
    return f1, accuracy,precision, recall,auroc

Our proposed aggregation methods are:

1. Majority Voting: Each constituent model votes for a class label, and the majority label is
output. Strength is simplicity, it effectively averages out errors. Weakness is that it only uses
hard class labels rather than probabilities, and accurate constituents may be overwhelmed by
many inaccurate ones.
2. Averaging Probabilities: Each constituent outputs a probability distribution over classes, and
the average distribution is output. Strength is it makes use of probability information rather
than just labels. Weakness is that it assumes constituents are reasonably well-calibrated, and
accuracy may decrease if some constituents are very inaccurate.
3. Stacked Generalization: Train a meta-learner on the outputs of the constituent models. Strength
is the meta-learner can learn the best way to combine the constituents based on their actual
performance, mitigating weaknesses of simple averaging. Weakness is increased complexity,
and may overfit if not regularized properly given the small number of constituent models.

The paper found that for simpler learning tasks on datasets like Purchase and SVHN, both
majority voting and averaging probabilities worked reasonably well with SISA training and did not
significantly degrade accuracy compared to the baseline.

Method number 2, "Averaging Probabilities," can be considered better in this context.
Unlike the "Majority Voting" method, which only considers hard class labels, "Averaging Probabilities" takes into account the probability distribution over classes generated by each constituent model. This allows for a more nuanced understanding of the confidence or uncertainty associated with each prediction.

In [8]:
train_loader, val_loader, test_loader = prepare_data(batch_size=64)
num_classes = 10  # CIFAR-10 has 10 classes

S_values = [5, 10, 20]
R_values = [5, 10, 20]

# S_values = [5]
# R_values = [5]

results = []


Files already downloaded and verified
Files already downloaded and verified


In [10]:
initial_model_states = {}

# S=R=5
for S in S_values:
    for R in R_values:

        modelss = train_sisa_model(train_loader, val_loader, num_classes, S, R, epochs=1)
        initial_model_states[(S, R)] = modelss


0


Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 122MB/s] 


1
2
3
4
0
1
2
3
4
0
1
2
3
4
0
1
2
3
4
5
6
7
8
9
0
1
2
3
4
5
6
7
8
9
0
1
2
3
4
5
6
7
8
9
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19


In [None]:
for S in S_values:
    print(S)
    for R in R_values:
        # Train initial models

        modelss = initial_model_states[(S, R)]
        probabilities, predictions, labels = aggregate_models(modelss, test_loader, num_classes)
        f1, accuracy, precision, recall, auroc = evaluate_model(probabilities, predictions, labels)

        print(f"S: {S}, R: {R}, F1: {f1}, Accuracy: {accuracy}, Precision: {precision}, Recall: {recall}, AUROC: {auroc}")


5

S: 5, R: 5, F1: 0.7378659002734873, Accuracy: 0.7391, Precision: 0.7414161543071336, Recall: 0.7390999999999999, AUROC: 0.9642954777777778

S: 5, R: 10, F1: 0.6126626651784358, Accuracy: 0.6208, Precision: 0.6414155216951756, Recall: 0.6208, AUROC: 0.936843311111111

S: 5, R: 20, F1: 0.7001602609672454, Accuracy: 0.699, Precision: 0.7110040071795748, Recall: 0.6990000000000001, AUROC: 0.9540885333333333

10

S: 10, R: 5, F1: 0.6070439250451045, Accuracy: 0.6107, Precision: 0.6859478630519296, Recall: 0.6106999999999999, AUROC: 0.9361826444444444

S: 10, R: 10, F1: 0.6249221724759357, Accuracy: 0.6343, Precision: 0.6819560302081789, Recall: 0.6343, AUROC: 0.9378788777777778

S: 10, R: 20, F1: 0.5301820971813231, Accuracy: 0.5512, Precision: 0.6400270334416847, Recall: 0.5511999999999999, AUROC: 0.9167874833333334

20

S: 20, R: 5, F1: 0.6596517389308058, Accuracy: 0.6579, Precision: 0.6701503678289822, Recall: 0.6579, AUROC: 0.943767688888889


In [35]:
S=20
R_values_2 = [10, 20]

for R in R_values_2:
        # Train initial models

    modelss = initial_model_states[(S, R)]
    probabilities, predictions, labels = aggregate_models(modelss, test_loader, num_classes)
    f1, accuracy, precision, recall, auroc = evaluate_model(probabilities, predictions, labels)

        # results.append((S, R, f1, accuracy, precision, recall, auroc))

    print(f"S: {S}, R: {R}, F1: {f1}, Accuracy: {accuracy}, Precision: {precision}, Recall: {recall}, AUROC: {auroc}")


S: 20, R: 10, F1: 0.3516821968340481, Accuracy: 0.3766, Precision: 0.41011125168073564, Recall: 0.3766, AUROC: 0.8424659444444444
S: 20, R: 20, F1: 0.2659517278318598, Accuracy: 0.3304, Precision: 0.48025230832906063, Recall: 0.3304, AUROC: 0.8548166888888888


  _warn_prf(average, modifier, msg_start, len(result))


The best accurecy is resulted for S=R=5

In [9]:
S=5
R=5

modelss = train_sisa_model(train_loader, val_loader, num_classes, S, R, epochs=5)
probabilities, predictions, labels = aggregate_models(modelss, test_loader, num_classes)
f1, accuracy, precision, recall, auroc = evaluate_model(probabilities, predictions, labels)

print(f"S: {S}, R: {R}, F1: {f1}, Accuracy: {accuracy}, Precision: {precision}, Recall: {recall}, AUROC: {auroc}")


0


Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 81.4MB/s]


1
2
3
4
S: 5, R: 5, F1: 0.8856942628312907, Accuracy: 0.8859, Precision: 0.8867759705110402, Recall: 0.8859, AUROC: 0.9916806444444445


$Simulation Question 2$

In [7]:
# Step 1: Identify and remove data points to be unlearned
def identify_data_to_unlearn(dataset, num_to_unlearn=500):
    all_indices = list(range(len(dataset)))
    indices_to_unlearn = random.sample(all_indices, num_to_unlearn)
    return indices_to_unlearn

In [8]:
# Step 2: Unlearn data from the relevant shards and slices
def unlearn_data(models, train_loader, num_classes, S, R, indices_to_unlearn, epochs=1):
    shard_size = len(train_loader) // S
    updated_models = []
    for s in range(S):
        shard_start = s * shard_size
        shard_end = (s + 1) * shard_size
        shard_indices = set(range(shard_start, shard_end))
        unlearn_indices_in_shard = shard_indices.intersection(indices_to_unlearn)

        if unlearn_indices_in_shard:
            shard_data = torch.utils.data.Subset(train_loader, list(shard_indices - unlearn_indices_in_shard))
#             shard_loader = DataLoader(shard_data, batch_size=64, shuffle=True)
            shard_loader = shard_data
            model = ModifiedResNet18(num_classes).cuda()
            optimizer = optim.Adam(model.parameters(), lr=0.001)
            criterion = nn.CrossEntropyLoss()

            for r in range(R):
                slice_size = shard_size // R
                slice_data = torch.utils.data.Subset(shard_loader.dataset, list(range(r * slice_size, (r + 1) * slice_size)))
                slice_loader = DataLoader(slice_data, batch_size=16, shuffle=True)

                for epoch in range(epochs):
                    model.train()
                    for inputs, labels in slice_loader:
                        inputs, labels = inputs.to(device), labels.to(device)
                        optimizer.zero_grad()
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)
                        loss.backward()
                        optimizer.step()

                updated_models.append(model)
        else:
            updated_models.append(models[s])

    return updated_models

The code calculates the size of each shard based on the length of the training loader and the number of shards.
The code then iterates over each shard (s) from 0 to S-1.
For each shard, it identifies the start and end indices of the shard in the training data.
It checks if any of the indices to unlearn fall within the current shard. If there are unlearn indices, the code proceeds to unlearn the data in that shard.
If there are unlearn indices in the shard, a new subset of the training data is created by excluding the unlearn indices using torch.utils.data.Subset.

The code then enters a loop that iterates R times, representing the slices within the shard.
For each slice, a new subset of the shard data is created.

The code enters another loop that runs for epochs times.
Within each epoch, the model is put in training mode, and for each batch of inputs and labels in the slice loader, the optimization steps are performed: zeroing the gradients, computing the outputs, calculating the loss, backpropagating the gradients, and updating the model parameters.

After the training loops, the updated model is appended to the updated_models list.
If there are no unlearn indices in the shard, the original model from the models list is appended to the updated_models list without any modification.

Once all shards have been processed, the function returns the updated_models list.

In [9]:
# Step 3: Evaluate performance metrics
def evaluate_unlearning_performance(test_loader, models, num_classes):
    start_time = time.time()

    probabilities, predictions, labels = aggregate_models(models, test_loader,num_classes)
    f1, accuracy, precision, recall, auroc = evaluate_model(probabilities, predictions, labels)

    unlearning_time = time.time() - start_time
    return f1, accuracy, precision, recall, auroc, unlearning_time


In [15]:
initial_results = []
unlearning_results = []
num_to_unlearn = 500

indices_to_unlearn_dict = {}

for S in S_values:
    for R in R_values:

            # Train initial models
            
        initial_models = initial_model_states[(S, R)]

            # Identify data points to be unlearned
        indices_to_unlearn = identify_data_to_unlearn(train_loader, num_to_unlearn=num_to_unlearn)
        indices_to_unlearn_dict[(S, R)] = indices_to_unlearn
        

In [28]:
# Main Function

for S in S_values:
    for R in R_values:

        indices_to_unlearn = indices_to_unlearn_dict[(S, R)]

        # Unlearn data
        updated_models = unlearn_data(initial_models, train_loader, num_classes, S, R, indices_to_unlearn, epochs=3)

        # Evaluate unlearning performance
        unlearn_f1, unlearn_accuracy, unlearn_precision, unlearn_recall, unlearn_auroc, unlearning_time = evaluate_unlearning_performance(test_loader, updated_models, num_classes)
        unlearning_results.append((S, R, unlearn_f1, unlearn_accuracy, unlearn_precision, unlearn_recall, unlearn_auroc, unlearning_time))

        #print(f"Initial - S: {S}, R: {R}, F1: {initial_f1}, Accuracy: {initial_accuracy}, Precision: {initial_precision}, Recall: {initial_recall}, AUROC: {initial_auroc}")
        print(f"Unlearn - S: {S}, R: {R}, F1: {unlearn_f1}, Accuracy: {unlearn_accuracy}, Precision: {unlearn_precision}, Recall: {unlearn_recall}, AUROC: {unlearn_auroc}, Time: {unlearning_time}")



Unlearn - S: 5, R: 5, F1: 0.7633936614861464, Accuracy: 0.7629, Precision: 0.7664485311405249, Recall: 0.7629, AUROC: 0.9700106777777778, Time: 244.53131890296936
Unlearn - S: 5, R: 10, F1: 0.6530344699052529, Accuracy: 0.6566, Precision: 0.6669224856746666, Recall: 0.6566, AUROC: 0.9434641277777777, Time: 474.72060441970825
Unlearn - S: 5, R: 20, F1: 0.6705007734050479, Accuracy: 0.6715, Precision: 0.6858269915247638, Recall: 0.6715, AUROC: 0.9458656555555557, Time: 932.6459593772888
Unlearn - S: 10, R: 5, F1: 0.5879710195672135, Accuracy: 0.5956, Precision: 0.6626835431671212, Recall: 0.5955999999999999, AUROC: 0.9293068, Time: 473.94844698905945
Unlearn - S: 10, R: 10, F1: 0.6125650379884067, Accuracy: 0.6216, Precision: 0.6323818961206391, Recall: 0.6216, AUROC: 0.9300428777777776, Time: 933.6839303970337
Unlearn - S: 10, R: 20, F1: 0.499533264202629, Accuracy: 0.5249, Precision: 0.6055703711905982, Recall: 0.5249, AUROC: 0.9015256111111111, Time: 1848.2416596412659
Unlearn - S: 20

$Simulation Question 3$

In [10]:
def compute_losses(model, data_loader, device='cuda'):
    model.eval()
    losses = []
    with torch.no_grad():
        for inputs, labels in data_loader:
            inputs, labels = inputs.cuda(), labels.cuda()
            outputs = model(inputs)
            loss = F.cross_entropy(outputs, labels, reduction='none')
            losses.extend(loss.cpu().numpy())
            
    return losses

In [11]:
def membership_inference_attack(losses_forget, losses_test):

    X = np.concatenate([losses_forget, losses_test]).reshape(-1, 1)
    y = np.concatenate([np.ones(len(losses_forget)), np.zeros(len(losses_test))])
    min_samples = min(len(losses_forget), len(losses_test))
    cv_splits = max(2, min(3, min_samples))
    clf = LogisticRegressionCV(cv=cv_splits).fit(X, y)
    scores = cross_val_score(clf, X, y, cv=cv_splits)

    return scores.mean()

The code concatenates the losses from both sets into a single array X.
It creates a target array y where it assigns a label of 1 to the losses from losses_forget and a label of 0 to the losses from losses_test.
A logistic regression classifier with cross-validation (LogisticRegressionCV) is fitted to the data (X and y).

Cross-validation scores are computed using cross_val_score to evaluate the performance of the classifier.

The mean of the cross-validation scores is returned as the attack score.

The membership_inference_attack function combines the losses from both the training and test data, trains a logistic regression classifier using cross-validation, and returns the mean cross-validation score as the attack score. 

This score indicates the success of the membership inference attack in distinguishing between data points used for training and those not used for training.

In [12]:
def evaluate_unlearning_performance_3(forget_set_loader, test_loader, initial_model, unlearned_model, num_test_samples):

    initial_losses_forget = compute_losses(initial_model, forget_set_loader)
    initial_losses_test = compute_losses(initial_model, test_loader)

    test_indices = list(range(len(initial_losses_test)))
    random_test_indices = random.sample(test_indices, num_test_samples)
    sampled_test_losses_initial = [initial_losses_test[i] for i in random_test_indices]

    # Perform MIA on initial model
    initial_mia_score = membership_inference_attack(initial_losses_forget, sampled_test_losses_initial)

    unlearned_losses_forget = compute_losses(unlearned_model, forget_set_loader)
    unlearned_losses_test = compute_losses(unlearned_model, test_loader)
    sampled_test_losses_unlearned = [unlearned_losses_test[i] for i in random_test_indices]

    # Perform MIA on unlearned model
    unlearned_mia_score = membership_inference_attack(unlearned_losses_forget, sampled_test_losses_unlearned)

    return initial_mia_score, unlearned_mia_score

In [56]:
if __name__ == "__main__":
    train_loader, val_loader, test_loader = prepare_data(batch_size=16)
    num_classes = 10  # CIFAR-10 has 10 classes

    S_values = [5, 10, 20]
    R_values = [5, 10, 20]

    initial_results = []
    unlearning_results = []
    num_to_unlearn = 500
    num_test_samples= 500

    for S in S_values:
        for R in R_values:

            # Train initial models
            initial_model_states_1 = initial_model_states[(S, R)]
            initial_model = ModifiedResNet18(num_classes).cuda()

            indices_to_unlearn = indices_to_unlearn_dict[(S, R)]

            # Prepare forget set loader
            forget_set_data = torch.utils.data.Subset(train_loader, indices_to_unlearn)
            forget_set_loader = DataLoader(forget_set_data, batch_size=64, shuffle=False)

            # Unlearn data
            updated_model_states = unlearn_data(initial_model_states_1, train_loader, num_classes, S, R, indices_to_unlearn, epochs=1)
            unlearned_model = ModifiedResNet18(num_classes).cuda()

            # Evaluate unlearning performance
            initial_mia_score, unlearned_mia_score = evaluate_unlearning_performance_3(forget_set_loader, test_loader, initial_model, unlearned_model, num_test_samples)

            print(f"S: {S}, R: {R}")
            print(f"Initial MIA Score: {initial_mia_score}")
            print(f"Unlearned MIA Score: {unlearned_mia_score}")

Files already downloaded and verified
Files already downloaded and verified
S: 5, R: 5
Initial MIA Score: 0.490990990990991
Unlearned MIA Score: 0.5100279920639202
S: 5, R: 10
Initial MIA Score: 0.46601092110074144
Unlearned MIA Score: 0.49899899899899897
S: 5, R: 20
Initial MIA Score: 0.47002091912271554
Unlearned MIA Score: 0.4860249471027915
S: 10, R: 5
Initial MIA Score: 0.49101496706287123
Unlearned MIA Score: 0.499997002991015
S: 10, R: 10
Initial MIA Score: 0.4920129710548872
Unlearned MIA Score: 0.48799398200595806
S: 10, R: 20
Initial MIA Score: 0.48297399195602786
Unlearned MIA Score: 0.5159320997644351
S: 20, R: 5
Initial MIA Score: 0.506006006006006
Unlearned MIA Score: 0.4999760239281197


OutOfMemoryError: CUDA out of memory. Tried to allocate 98.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 91.06 MiB is free. Process 2725 has 14.66 GiB memory in use. Of the allocated memory 13.78 GiB is allocated by PyTorch, and 740.31 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

As you can see CUDA is out of memory but I proceed to run the code from the first for the remaing S and R:

In [21]:
num_to_unlearn = 500
num_test_samples= 500
    
S=20
R_values_2 = [10, 20]
    
for R in R_values_2:

    # Train initial models
    initial_model_states_1 = initial_model_states[(S, R)]
    initial_model = ModifiedResNet18(num_classes).cuda()

    indices_to_unlearn = indices_to_unlearn_dict[(S, R)]

    # Prepare forget set loader
    forget_set_data = torch.utils.data.Subset(train_loader, indices_to_unlearn)
    forget_set_loader = DataLoader(forget_set_data, batch_size=16, shuffle=False)

    # Unlearn data
    updated_model_states = unlearn_data(initial_model_states_1, train_loader, num_classes, S, R, indices_to_unlearn, epochs=1)
    unlearned_model = ModifiedResNet18(num_classes).cuda()
    
    
    # Evaluate unlearning performance
    initial_mia_score, unlearned_mia_score = evaluate_unlearning_performance_3(forget_set_loader, test_loader, initial_model, unlearned_model, num_test_samples)

    print(f"S: {S}, R: {R}")
    print(f"Initial MIA Score: {initial_mia_score}")
    print(f"Unlearned MIA Score: {unlearned_mia_score}")

S: 20, R: 10
Initial MIA Score: 0.5069980159800519
Unlearned MIA Score: 0.4879819939700179
S: 20, R: 20
Initial MIA Score: 0.4840289391187594
Unlearned MIA Score: 0.48799398200595806


$Add On. Simulation Question 1$

For a perfectly unlearned model, we would expect the following:

Reduced Membership Inference Attack (MIA) Score: The cross-validation score of the Logistic Regression-based Membership Inference Attack should be significantly lower after unlearning. This would indicate that the model can no longer effectively distinguish between the data that was part of the training set (forget set) and the data that was not (randomly chosen test data). In an ideal scenario, the MIA score should be close to that of a random guess (around 0.5 for a binary classification task in a balanced dataset).

Restored General Performance: The general performance metrics (F1-score, accuracy, precision, recall, and AUROC) on clean test data should not be significantly degraded after the unlearning process. This would demonstrate that the model has retained its ability to generalize well on clean data even after the unlearning.

Reduced Attack Success Rate (ASR): For the backdoor attack, the ASR should be significantly lower after unlearning. This indicates that the model has successfully "forgotten" the association between the backdoor trigger and the target label.

S: 20, R: 5
Initial MIA Score: 0.506006006006006
Unlearned MIA Score: 0.4999760239281197

In [13]:
def poison_data(dataset, num_samples=500, target_class=0):
    poisoned_indices = []
    poisoned_data = []

    # Select indices of the target class
    target_indices = [i for i, (_, label) in enumerate(dataset) if label == target_class]
    selected_indices = random.sample(target_indices, num_samples)

    for idx in selected_indices:
        img, label = dataset[idx]
        img = np.array(img)
        
        # Randomly choose the position of the 3x3 black block
        x = random.randint(0, img.shape[1] - 3)
        y = random.randint(0, img.shape[2] - 3)
        img[:, x:x+3, y:y+3] = 0  # Set the block to black
        poisoned_data.append((torch.tensor(img), label))
        poisoned_indices.append(idx)

    return poisoned_data, poisoned_indices

In [14]:
# Function to add poisoned data to the dataset
def add_poisoned_data(train_loader, poisoned_data):
    
    poisoned_dataset = list(train_loader) + poisoned_data
    poisoned_loader = poisoned_dataset
    return poisoned_loader

In [17]:
train_loader, val_loader, test_loader = prepare_data(batch_size=16)
num_classes = 10  # CIFAR-10 has 10 classes

Files already downloaded and verified
Files already downloaded and verified


In [18]:
poisoned_data, poisoned_indices = poison_data(train_loader, num_samples=500, target_class=0)
poisoned_train_loader = add_poisoned_data(train_loader, poisoned_data)

In [19]:
def train_sisa_model_11(train_loader, val_loader, num_classes, S, R, epochs=1):
    shard_size = len(train_loader) // S

    models = []

    for s in range(S):
        print(s)
        shard_data = torch.utils.data.Subset(train_loader, list(range(s * shard_size, (s + 1) * shard_size)))

        shard_loader= shard_data

        model = ModifiedResNet18(num_classes).to(device)
        optimizer = optim.Adam(model.parameters(), lr=0.001)
        criterion = nn.CrossEntropyLoss()

        for r in range(R):
            slice_size = shard_size // R
            slice_data = torch.utils.data.Subset(shard_loader.dataset, list(range(r * slice_size, (r + 1) * slice_size)))
            slice_loader = DataLoader(slice_data, batch_size=16, shuffle=True)


            model.train()
            for epoch in range(epochs):

                for inputs, labels in slice_loader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    optimizer.zero_grad()
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    loss.backward()
                    optimizer.step()

            models.append(model)

    return models

In [20]:
S = 20
R = 5

best_model_states = train_sisa_model_11(poisoned_train_loader, val_loader, num_classes, S, R, epochs=1)

0


Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 143MB/s] 


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19


In [21]:
probabilities_p, predictions_p, labels_p = aggregate_models(best_model_states, test_loader, num_classes)
f1_p, accuracy_p, precision_p, recall_p, auroc_p = evaluate_model(probabilities_p, predictions_p, labels_p)

print(f"Clean Test Data - F1: {f1_p}, Accuracy: {accuracy_p}, Precision: {precision_p}, Recall: {recall_p}, AUROC: {auroc_p}")


Clean Test Data - F1: 0.40790780553316297, Accuracy: 0.4373, Precision: 0.46660492814549215, Recall: 0.4373, AUROC: 0.8657756055555555


In [22]:
def poison_test_data(test_loader, target_class=0):
    poisoned_test_data = []

    for img, label in test_loader.dataset:
        img = np.array(img)
        # Randomly choose the position of the 3x3 black block
        x = random.randint(0, img.shape[1] - 3)
        y = random.randint(0, img.shape[2] - 3)
        img[:, x:x+3, y:y+3] = 0  # Set the block to black
        poisoned_test_data.append((torch.tensor(img), target_class))

    return poisoned_test_data

In [23]:
poisoned_test_data = poison_test_data(test_loader, target_class=0)
poisoned_test_loader = DataLoader(poisoned_test_data, batch_size=test_loader.batch_size, shuffle=False)

# Evaluate on poisoned test data
probabilities_pp, predictions_pp, labels_pp = aggregate_models(best_model_states, poisoned_test_loader, num_classes)
asr = (np.array(predictions_pp) == 0).mean()  # ASR is the percentage of samples misclassified as the target class

print(f"Attack Success Rate (ASR): {asr}")

Attack Success Rate (ASR): 0.11


which is a good score

$Add On. Simulation Question 2.$

In [24]:
indices_to_unlearn = poisoned_indices
unlearned_model_states = unlearn_data(best_model_states, train_loader, num_classes, S, R, indices_to_unlearn, epochs=1)

In [25]:
probabilities, predictions, labels = aggregate_models(unlearned_model_states, test_loader, num_classes)
f1, accuracy, precision, recall, auroc = evaluate_model(probabilities, predictions, labels)

print(f"Clean Test Data After Unlearning - F1: {f1}, Accuracy: {accuracy}, Precision: {precision}, Recall: {recall}, AUROC: {auroc}")

Clean Test Data After Unlearning - F1: 0.39386923133051766, Accuracy: 0.4254, Precision: 0.4314379862844909, Recall: 0.42539999999999994, AUROC: 0.8539407777777777


In [26]:
probabilities, predictions, labels = aggregate_models(unlearned_model_states, poisoned_test_loader, num_classes)
asr = (np.array(predictions) == 0).mean()
print(f"Attack Success Rate (ASR) After Unlearning: {asr}")

Attack Success Rate (ASR) After Unlearning: 0.0553


As you can see after the unlerning the score is reduced.