# Membership Inference Competition (MICO) @ IEEE SatML 2023: Purchase-100

Welcome to the MICO competition!

This notebook will walk you through the process of creating and packaging a submission to one of the challenges.

Let's start by downloading and extracting the archive for the Purchase-100 challenge.

**NOTE**: Public anonymous access to the competition data is disabled. 
Upon registering for the competition, you will be shown a URL with an embedded bearer token that you must use instead of the URL below.

## Contents

The archive was extracted under the `purchase100` folder containing 3 sub-folders, one for each of the scenarios in the challenge:

- `purchase100_lo`  : Models trained with DP-SGD and a small privacy budget ($\epsilon \approx 4$) 
- `purchase100_hi`  : Models trained with DP-SGD and a large privacy budget ($\epsilon \approx 10$) 
- `purchase100_inf` : Models trained without differential privacy guarantee ($\epsilon = \infty$)

Each of these folders contains 3 other folders:

- `train`: Models with metadata allowing to reconstruct their full training datasets. Use these to develop your attacks without having to train your own models.
- `dev`: Models with metadata allowing to reconstruct just the set of challenge examples. Membership predictions for these challenges will be used to evaluate submissions during the competition and update the live scoreboard in CodaLab. 
- `final`: Models with metadata allowing to reconstruct just the set of challenge examples. Membership predictions for these challenges will be used to evaluate submissions when the competition closes and to determine the final ranking.

Each model folder in `train`, `dev`, and `final` contains a `model.pt` file with the model weights (a serialized PyTorch `state_dict`). There are 100 models in `train`, and 50 models in each of `dev` and `final`.

Models in the `train` folder come with 3 PRNG seeds used to reconstruct the set of member and non-member challenge examples, and the rest of the examples in the training dataset of the model. Additionally (and redundantly), a `solution.csv` file reveals the membership information of the challenge examples.

Models in the `dev` and `final` folders contain just 1 PRNG seed used to reconstruct the set of challenge examples, without revealing which were included in the training dataset.

We provide utilities to reconstruct the different data splits from provided seeds and to load models as classes inheriting from `torch.nn.Module`. If you use TensorFlow, JAX, or any other framework, you can easily convert the models to the appropriate format (e.g. using ONXX).

Here's a summary of how the contents are structured:

- `purchase100_lo`
  - `train`
      - `model_0`
        - `model.pt`: Serialized model weights
        - `seed_challenge`: PRNG seed used to select a list of 100 challenge examples
        - `seed_training`: PRNG seed used to select the non-challenge examples in the training dataset
        - `seed_membership`: PRNG seed used to split the set of challenge examples into members and non-members (100 of each)
        - `solution.csv`: Membership information of the challenge examples (`1` for member, `0` for non-member)
      - ...
  - `dev`
      - `model_100`
        - `model.pt`
        - `seed_challenge`
      - ...
  - `final`
    - `model_150`
      - `model.pt`
      - `seed_challenge`
    - ...
- `purchase100_hi`
  - ...
- `purchase100_inf`
  - ...

In [None]:
url = "https://membershipinference.blob.core.windows.net/mico/purchase100.zip?si=purchase100&spr=https&sv=2021-06-08&sr=b&sig=YzJUTPoNndtIy0y2666XnPXS4WBF%2BbN7kbVM2soQNoU%3D"
filename = "purchase100.zip"
md5 = "67eba1f88d112932fe722fef85fb95fd"

try:
    download_and_extract_archive(url=url, download_root=os.curdir, extract_root=None, filename=filename, md5=md5, remove_finished=False)
except urllib.error.HTTPError as e:
    print(e)
    print("Have you replaced the URL above with the one you got after registering?")

In [1]:
import os
import urllib

from torchvision.datasets.utils import download_and_extract_archive

## Task

Your task as a competitor is to produce, for each model in `dev` and `final`, a CSV file listing your confidence scores (values between 0 and 1) for the membership of the challenge examples. You must save these scores in a `prediction.csv` file and place it in the same folder as the corresponding model. A submission to the challenge is an an archive containing just these `prediction.csv` files.

**You must submit predictions for both `dev` and `final` when you submit to CodaLab.**

In the following, we will show you how to compute predictions from a basic membership inference attack and package them as a submission archive. 

In [2]:
import numpy as np
import torch
import csv

from torch.autograd import Variable
from sklearn import metrics
from tqdm.notebook import tqdm
from torch.distributions import normal
from torch.utils.data import DataLoader, Dataset
from mico_competition import ChallengeDataset, load_purchase100, load_model

In [3]:
def normalize_preds(preds):
    # Normalize to unit interval
    min_prediction = np.min(preds)
    max_prediction = np.max(preds)
    preds = (preds - min_prediction) / (max_prediction - min_prediction)
    return preds

In [4]:
# Attack based on checking robustness in neighborhood
@torch.no_grad()
def neighborhood_robustness(model, features, epsilon: float, n_neighbors: int):
    batch_size_desired = len(features)
    noise = normal.Normal(0, epsilon)
    l2_diffs = []
    base_preds = model(features).cpu().numpy()
    n_classes = base_preds.shape[1]
    for i, feature in enumerate(features):
        neighbors = []
        for _ in range(n_neighbors):
            neighbors.append(feature + noise.sample(feature.shape).to(feature.device))
        neighbors = torch.stack(neighbors, 0)
        prediction = model(neighbors).cpu().numpy()
        # Use L2 to check robustness
        predictions_diff = np.linalg.norm(prediction - base_preds[i], axis=1)
        l2_diffs.append(np.mean(predictions_diff))
    return np.array(l2_diffs)

In [5]:
def neighborhood_and_loss(model, features, labels, as_features: bool = False):
    criterion = torch.nn.CrossEntropyLoss(reduction='none')
    features_collect = []
    n_neighbors = 20

    # Neighborhood robustness attack (near)
    epsilon = 0.01
    n_neighbors = 10
    predictions_robust = neighborhood_robustness(model, features, epsilon, n_neighbors)
    if as_features:
        features_collect.append(predictions_robust)
    predictions_robust = normalize_preds(predictions_robust)
    predictions_robust = (1 - predictions_robust)
    
    # Neighborhood robustness attack (far)
    epsilon = 0.1
    n_neighbors = 10
    predictions_robust = neighborhood_robustness(model, features, epsilon, n_neighbors)
    if as_features:
        features_collect.append(predictions_robust)
    predictions_robust = normalize_preds(predictions_robust)
    predictions_robust = (1 - predictions_robust)
    
    # Neighborhood robustness attack (further)
    epsilon = 0.3
    n_neighbors = 10
    predictions_robust = neighborhood_robustness(model, features, epsilon, n_neighbors)
    if as_features:
        features_collect.append(predictions_robust)
    predictions_robust = normalize_preds(predictions_robust)
    predictions_robust = (1 - predictions_robust)
            
    # Loss Threshold Attack
    output = model(features)
    predictions = -criterion(output, labels).detach().numpy()
    if as_features:
        features_collect.append(predictions)
        return np.array(features_collect).T
    predictions = normalize_preds(predictions)
            
    predictions += predictions_robust
    return predictions

In [6]:
def get_gradient_norm(model, features, labels):
    criterion = torch.nn.CrossEntropyLoss(reduction='none')
    features_collected = []
    for feature, label in zip(features, labels):
        model.zero_grad()
        feature_var = Variable(feature)
        output = model(feature)
        loss = criterion(torch.unsqueeze(output, 0), torch.unsqueeze(label, 0))
        loss.backward()
        features_collected.append([torch.linalg.norm(x.grad.detach()).item() for x in model.parameters()])
    features_collected = np.array(features_collected)
    return features_collected

In [28]:
def get_gradient_update_norms(model, features, labels):
    lr = 0.001
    n_steps = 10
    criterion = torch.nn.CrossEntropyLoss(reduction='none')
    features_collected = []
    for feature, label in zip(features, labels):
        features_inside = []
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0)
        for _ in range(n_steps):
            optimizer.zero_grad()
            feature_var = Variable(feature)
            output = model(feature)
            loss = criterion(torch.unsqueeze(output, 0), torch.unsqueeze(label, 0))
            loss.backward()
            optimizer.step()
            features_inside.append([torch.linalg.norm(x.grad.detach()).item() for x in model.parameters()])
        features_collected.append(np.array(features_inside))
    features_collected = np.array(features_collected)
    return features_collected.reshape(features_collected.shape[0], -1)

In [29]:
def get_activations(model, features, labels):
    layerwise_features = {}
    # Define hook
    def get_features(name):
        def hook(model, input, output):
            layerwise_features[name] = output.detach()
        return hook
    
    preds = model(features)
    return None

In [30]:
def gradient_and_robustness(model, features, labels):
    features_1 = neighborhood_and_loss(model, features, labels, as_features=True)
    features_2 = get_gradient_norm(model, features, labels)
    combined_feratures = np.concatenate((features_1, features_2), 1)
    return combined_feratures

In [22]:
# Collect "training data" using models from train split
def collect_training_data():
    CHALLENGE = "purchase100"
    LEN_TRAINING = 150000
    LEN_CHALLENGE = 100

    scenarios = os.listdir(CHALLENGE)
    phases = ['dev', 'final', 'train']

    dataset = load_purchase100(dataset_dir="/u/as9rw/work/MICO/data")

    collected_features = {x:[] for x in scenarios}
    collected_labels = {x:[] for x in scenarios}
    phase = "train"
    for scenario in tqdm(scenarios, desc="scenario"):
        root = os.path.join(CHALLENGE, scenario, phase)
        for model_folder in tqdm(sorted(os.listdir(root), key=lambda d: int(d.split('_')[1])), desc="model"):
            path = os.path.join(root, model_folder)
            challenge_dataset = ChallengeDataset.from_path(path, dataset=dataset, len_training=LEN_TRAINING)
            challenge_points = challenge_dataset.get_challenges()
            
            model = load_model('purchase100', path)
            challenge_dataloader = torch.utils.data.DataLoader(challenge_points, batch_size=2*LEN_CHALLENGE)
            features, labels = next(iter(challenge_dataloader))

            # Based on gradients
            # processed_features = get_gradient_norm(model, features, labels)

            # Based on loss + robustness
            # processed_features = neighborhood_and_loss(model, features, labels, as_features=True)

            # Based on loss + robustness + gradients
            # processed_features = gradient_and_robustness(model, features, labels)

            # Based on neighborhood robustness
            # processed_features = neighborhood_robustness(model, features, 0.1, 50)
            
            # Based on multiple gradient updates
            processed_features = get_gradient_update_norms(model, features, labels)

            # Collect features
            collected_features[scenario].append(processed_features)
            
            # Get labels for membership
            collected_labels[scenario].append(challenge_dataset.get_solutions())
            np_y = np.array(challenge_dataset.get_solutions())
            
#             import matplotlib.pyplot as plt
#             print(processed_features.shape[1])
#             plt.scatter(processed_features[np_y == 0][0], processed_features[np_y == 0][1], color='r')
#             plt.scatter(processed_features[np_y == 1][0], processed_features[np_y == 1][1], color='b')
# #             plt.plot(np.arange(np.sum(np_y == 0)), np.sort(processed_features[np_y == 0]))
# #             plt.plot(np.arange(np.sum(np_y == 1)), np.sort(processed_features[np_y == 1]))
#             return None, None
    
    for sc in scenarios:
        collected_features[sc] = np.concatenate(collected_features[sc], 0)
        collected_labels[sc] = np.concatenate(collected_labels[sc], 0)

    return collected_features, collected_labels

In [23]:
X_for_meta, Y_for_meta = collect_training_data()

Successfully loaded the Purchase-100 dataset consisting of 197324 records and 600 attributes.


scenario:   0%|          | 0/3 [00:00<?, ?it/s]

model:   0%|          | 0/100 [00:00<?, ?it/s]

model:   0%|          | 0/100 [00:00<?, ?it/s]

model:   0%|          | 0/100 [00:00<?, ?it/s]

In [27]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Train different meta-classifiers per scenario
CHALLENGE = "purchase100"
scenarios = os.listdir(CHALLENGE)
meta_clfs = {x: MLPClassifier(hidden_layer_sizes=(32, 16, 8, 4), max_iter=300) for x in X_for_meta.keys()}
# meta_clfs = {x: RandomForestClassifier(max_depth=12) for x in X_for_meta.keys()}
# meta_clfs = {x: RandomForestClassifier(max_depth=10) for x in X_for_meta.keys()}
for sc in scenarios: 
    X_train, X_test, y_train, y_test = train_test_split(X_for_meta[sc], Y_for_meta[sc], test_size=0.1)
    meta_clfs[sc].fit(X_train, y_train)
    preds = meta_clfs[sc].predict_proba(X_test)[:, 1]
    print("%s | Train: %.3f" % (sc, meta_clfs[sc].score(X_train, y_train)))
    print("%s | Validation: %.3f" % (sc, meta_clfs[sc].score(X_test, y_test)))
    print("%s | Validation (AUC) : %.3f" % (sc, metrics.roc_auc_score(y_test, preds)))
    print()

purchase100_inf | Train: 0.575
purchase100_inf | Validation: 0.587
purchase100_inf | Validation (AUC) : 0.621

purchase100_hi | Train: 0.519
purchase100_hi | Validation: 0.518
purchase100_hi | Validation (AUC) : 0.524

purchase100_lo | Train: 0.509
purchase100_lo | Validation: 0.506
purchase100_lo | Validation (AUC) : 0.529



In [None]:
# Use given datapoint, train models w and w/o that point
# Adapt KL test (from our SaTML paper) to make prediction

In [None]:
# Another idea- perform one step of GD on model with datapoint, and compare gradient updates with
# cases where member was not seen before, and use this as a feature for a meta-classifier

In [None]:
# Plain old permutation-invariant network-based meta-classifier, but also take as input
# The raw datapoint. Hope meta-classifier learns to form associations, but not sure how to
# design such a meta-classifier (modifications). 

In [None]:
def pick_top_k_points(model, data, discard_percentage=):
    criterion = torch.nn.CrossEntropyLoss(reduction='none')
    dataloader = torch.utils.data.DataLoader(data, batch_size=1000, shuffle=False)
    loss_vals = []
    for x, y in dataloader:
        loss_vals.append(criterion(model(x), y).detach())
    loss_vals = ch.cat(loss_vals).cpu().numpy()
    

In [None]:
def approximate_retraining_attack(model, x, challenge_dataset):
    LEN_TRAINING = 150000
    LEN_CHALLENGE = 100
    rest_points = challenge_dataset.get_rest()
    challenge_dataloader = torch.utils.data.DataLoader(rest_points, batch_size=LEN_TRAINING - (2*LEN_CHALLENGE))
    rest_x, rest_y = next(iter(challenge_dataloader))
    # Get loss values for these 

In [None]:
CHALLENGE = "purchase100"
LEN_TRAINING = 150000
LEN_CHALLENGE = 100

scenarios = os.listdir(CHALLENGE)
phases = ['dev', 'final', 'train']

dataset = load_purchase100(dataset_dir="/u/as9rw/work/MICO/data")

for scenario in tqdm(scenarios, desc="scenario"):
    for phase in tqdm(phases, desc="phase"):
        root = os.path.join(CHALLENGE, scenario, phase)
        for model_folder in tqdm(sorted(os.listdir(root), key=lambda d: int(d.split('_')[1])), desc="model"):
            path = os.path.join(root, model_folder)
            challenge_dataset = ChallengeDataset.from_path(path, dataset=dataset, len_training=LEN_TRAINING)
            challenge_points = challenge_dataset.get_challenges()
            
            model = load_model('purchase100', path)
            challenge_dataloader = torch.utils.data.DataLoader(challenge_points, batch_size=2*LEN_CHALLENGE)
            features, labels = next(iter(challenge_dataloader))

            # This is where you plug in your membership inference attack
            # Combine preds from both
            # Got 0.1106 score
            # predictions = neighborhood_and_loss(model, features, 0.01, 10)
            # predictions = normalize_preds(predictions)
            
            # Meta-classifier :Random Forest, directly across all data/models
            # Got 0.1121 score
            # Scenario-wise meta-clfs got 0.1231 score
            # processed_features = neighborhood_and_loss(model, features, labels, as_features=True)
            
            # Meta-classifier: Random forest, on gradient updates
            # Got 0.0716 score
            # processed_features = get_gradient_norm(model, features, labels)
            
            # Meta-classifier: Random forest, on gradient updates, loss, and robustness
            # Got 0.1224 score
#             processed_features = gradient_and_robustness(model, features, labels)
            
            # Meta-classifier on multiple gradient updates (RF)
            # Got score
            processed_features = get_gradient_update_norms(model, features, labels)
            
            predictions = meta_clfs[scenario].predict_proba(processed_features)[:, 1]

            assert np.all((0 <= predictions) & (predictions <= 1))

            with open(os.path.join(path, "prediction.csv"), "w") as f:
                 csv.writer(f).writerow(predictions)

Successfully loaded the Purchase-100 dataset consisting of 197324 records and 600 attributes.


scenario:   0%|          | 0/3 [00:00<?, ?it/s]

phase:   0%|          | 0/3 [00:00<?, ?it/s]

model:   0%|          | 0/50 [00:00<?, ?it/s]

model:   0%|          | 0/50 [00:00<?, ?it/s]

model:   0%|          | 0/100 [00:00<?, ?it/s]

phase:   0%|          | 0/3 [00:00<?, ?it/s]

model:   0%|          | 0/50 [00:00<?, ?it/s]

model:   0%|          | 0/50 [00:00<?, ?it/s]

model:   0%|          | 0/100 [00:00<?, ?it/s]

phase:   0%|          | 0/3 [00:00<?, ?it/s]

model:   0%|          | 0/50 [00:00<?, ?it/s]

model:   0%|          | 0/50 [00:00<?, ?it/s]

## Scoring

Let's see how the attack does on `train`, for which we have the ground truth. 
When preparing a submission, you can use part of `train` to develop an attack and a held-out part to evaluate your attack. 

In [None]:
from mico_competition.scoring import tpr_at_fpr, score, generate_roc, generate_table
from sklearn.metrics import roc_curve, roc_auc_score

FPR_THRESHOLD = 0.1

all_scores = {}
phases = ['train']

for scenario in tqdm(scenarios, desc="scenario"): 
    all_scores[scenario] = {}    
    for phase in tqdm(phases, desc="phase"):
        predictions = []
        solutions  = []

        root = os.path.join(CHALLENGE, scenario, phase)
        for model_folder in tqdm(sorted(os.listdir(root), key=lambda d: int(d.split('_')[1])), desc="model"):
            path = os.path.join(root, model_folder)
            predictions.append(np.loadtxt(os.path.join(path, "prediction.csv"), delimiter=","))
            solutions.append(np.loadtxt(os.path.join(path, "solution.csv"),   delimiter=","))

        predictions = np.concatenate(predictions)
        solutions = np.concatenate(solutions)
        
        scores = score(solutions, predictions)
        all_scores[scenario][phase] = scores

Let's plot the ROC curve for the attack and see how the attack performed on different metrics

In [None]:
import matplotlib.pyplot as plt
import matplotlib

for scenario in scenarios:
    fpr = all_scores[scenario]['train']['fpr']
    tpr = all_scores[scenario]['train']['tpr']
    fig = generate_roc(fpr, tpr)
    fig.suptitle(f"{scenario}", x=-0.1, y=0.5)
    fig.tight_layout(pad=1.0)

In [None]:
import pandas as pd

for scenario in scenarios:
    print(scenario)
    scores = all_scores[scenario]['train']
    scores.pop('fpr', None)
    scores.pop('tpr', None)
    display(pd.DataFrame([scores]))

## Packaging the submission

Now we can store the predictions into a zip file, which you can submit to CodaLab.

In [None]:
import zipfile

phases = ['dev', 'final']
experiment_name = "predictions_gradient_multiupdates"

with zipfile.ZipFile(f"{experiment_name}.zip", 'w') as zipf:
    for scenario in tqdm(scenarios, desc="scenario"): 
        for phase in tqdm(phases, desc="phase"):
            root = os.path.join(CHALLENGE, scenario, phase)
            for model_folder in tqdm(sorted(os.listdir(root), key=lambda d: int(d.split('_')[1])), desc="model"):
                path = os.path.join(root, model_folder)
                file = os.path.join(path, "prediction.csv")
                if os.path.exists(file):
                    zipf.write(file)
                else:
                    raise FileNotFoundError(f"`prediction.csv` not found in {path}. You need to provide predictions for all challenges")