# Membership inference attack on a binary classification model

In this notebook, we will evaluate the security of a binary classification model trained on the Adult dataset. We will use the ML-leaks method to perform a membership inference attack (MIA): we train an attacker model and then measure the target model's security against this attacker using standard classification metrics.

In [1]:
from holisticai.security.attackers import MLleaks
from holisticai.datasets import load_dataset
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn import metrics

## Loading the dataset

We will use the Adult dataset, which contains census data. The target variable is whether a person's income exceeds $50K/year, and the protected attribute we will consider is 'sex'. We first split the data in half: one half is used to train and evaluate the target model, and the other half is given to the attacker. Each half is then split again into a training set and a test set.

Following the MIA pipeline of this method, we assume that the attacker has a dataset drawn from the same distribution as the target model's training data. The attacker uses this dataset to train a shadow model that mimics the behavior of the target model.
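The four-way split performed below with holisticai's `train_test_split` can be mimicked with scikit-learn to check the key assumption of this setup: the attacker's shadow data never overlaps the target model's training data. This is a sketch on synthetic data from `make_classification` (an assumption; it does not load Adult), and holisticai's own shuffling may differ from scikit-learn's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed Adult data (assumption: not the real dataset)
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
idx = np.arange(len(X))  # split row indices so the partitions can be compared

# First split: one half for the target model, one half for the attacker's shadow model
idx_target, idx_shadow = train_test_split(idx, test_size=0.5, random_state=42)

# Second split: each half is divided into training rows ("members") and test rows ("non-members")
idx_target_train, idx_target_test = train_test_split(idx_target, test_size=0.5, random_state=42)
idx_shadow_train, idx_shadow_test = train_test_split(idx_shadow, test_size=0.5, random_state=42)

parts = {
    "target_train": idx_target_train,
    "target_test": idx_target_test,
    "shadow_train": idx_shadow_train,
    "shadow_test": idx_shadow_test,
}
for name, part in parts.items():
    print(name, len(part))

# The attacker's data never contains the target model's training rows
assert set(idx_shadow_train).isdisjoint(idx_target_train)
assert set(idx_shadow_test).isdisjoint(idx_target_train)
```

Each partition gets a quarter of the rows, which matches the roughly equal set sizes printed by the notebook below.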

In [2]:
dataset = load_dataset('adult', protected_attribute='sex', preprocessed=True)
train_test = dataset.train_test_split(test_size=0.5, random_state=42)
target = train_test['train'].train_test_split(test_size=0.5, random_state=42)
shadow = train_test['test'].train_test_split(test_size=0.5, random_state=42)

In [3]:
X_target_train = target['train']['X']
y_target_train = target['train']['y']
X_target_test = target['test']['X']
y_target_test = target['test']['y']

X_shadow_train = shadow['train']['X']
y_shadow_train = shadow['train']['y']
X_shadow_test = shadow['test']['X']
y_shadow_test = shadow['test']['y']

print("Training set size:", X_target_train.shape[0])
print("Test set size:", X_target_test.shape[0])

print("Shadow Training set size:", X_shadow_train.shape[0])
print("Shadow Test set size:", X_shadow_test.shape[0])

Training set size: 11305
Test set size: 11306
Shadow Training set size: 11305
Shadow Test set size: 11306


## Training the target model

In [4]:
target_model = RandomForestClassifier(random_state=42)
target_model.fit(X_target_train, y_target_train)
y_target_pred = target_model.predict(X_target_test)
print("Target Model Performance:")
print("Accuracy:", accuracy_score(y_target_test, y_target_pred))

Target Model Performance:
Accuracy: 0.8439766495666018
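Membership inference works because a model usually behaves differently on records it was trained on than on unseen records. A quick, rough indicator of that difference is the gap between training and test accuracy. The sketch below measures it for a default `RandomForestClassifier` on synthetic, slightly noisy data (an assumption standing in for the Adult split); fully grown trees tend to fit their training rows almost perfectly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data with 10% label noise (assumption: stands in for the target split)
X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
# A large gap means members and non-members look different to the model,
# which is the signal a membership inference attacker tries to exploit
gap = train_acc - test_acc

print(f"train accuracy:     {train_acc:.3f}")
print(f"test accuracy:      {test_acc:.3f}")
print(f"generalization gap: {gap:.3f}")
```

In this notebook, the same check for the target model would be `accuracy_score(y_target_train, target_model.predict(X_target_train))` compared with the test accuracy printed above.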


## Setting up the membership inference attack

We will use ML-leaks, the membership inference attack presented by Salem et al. in 2018. The method relaxes the assumption that the attacker needs multiple shadow models to train the attacker model. Instead, it uses a single shadow model to obtain confidence scores, builds an attacker dataset from those scores, and uses it to train the attacker model.
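The single-shadow-model pipeline described above can be sketched end to end with scikit-learn on synthetic data. This is not holisticai's `MLleaks` implementation: the synthetic data, the feature choice (posteriors sorted in descending order so the attack ignores class order), the logistic-regression attack model, and the helper names `attack_features` and `build_attack_set` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def attack_features(model, X, k=2):
    """Hypothetical helper: top-k posteriors sorted in descending order."""
    proba = model.predict_proba(X)
    return -np.sort(-proba, axis=1)[:, :k]


def build_attack_set(model, X_in, X_out):
    """Hypothetical helper: members (training rows) get label 1, non-members 0."""
    feats = np.vstack([attack_features(model, X_in), attack_features(model, X_out)])
    labels = np.concatenate([np.ones(len(X_in)), np.zeros(len(X_out))])
    return feats, labels


# Synthetic data split into disjoint target and shadow halves (assumption)
X, y = make_classification(n_samples=4000, n_features=20, flip_y=0.1, random_state=0)
X_tgt, X_sdw, y_tgt, y_sdw = train_test_split(X, y, test_size=0.5, random_state=0)
Xt_in, Xt_out, yt_in, yt_out = train_test_split(X_tgt, y_tgt, test_size=0.5, random_state=0)
Xs_in, Xs_out, ys_in, ys_out = train_test_split(X_sdw, y_sdw, test_size=0.5, random_state=0)

# Target model, plus a single shadow model trained to mimic it
target = RandomForestClassifier(random_state=42).fit(Xt_in, yt_in)
shadow = RandomForestClassifier(random_state=42).fit(Xs_in, ys_in)

# 1) Attack training set: shadow posteriors, where the attacker knows who is a member
X_att_train, y_att_train = build_attack_set(shadow, Xs_in, Xs_out)
# 2) Attack test set: target posteriors on the target's own training and test rows
X_att_test, y_att_test = build_attack_set(target, Xt_in, Xt_out)

# 3) Train the attacker on shadow data and evaluate it against the target model
attacker = LogisticRegression().fit(X_att_train, y_att_train)
acc = accuracy_score(y_att_test, attacker.predict(X_att_test))
print(f"attack accuracy: {acc:.3f}")
```

In the notebook, `train_attacker_data` and `test_attacker_data` presumably play roles analogous to (`X_att_train`, `y_att_train`) and (`X_att_test`, `y_att_test`), although the exact features holisticai builds may differ.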

In [5]:
tgt_dataset = ((X_target_train, y_target_train), (X_target_test, y_target_test))
sdw_dataset = ((X_shadow_train, y_shadow_train), (X_shadow_test, y_shadow_test))

mia_attacker = MLleaks(target_model, tgt_dataset, sdw_dataset)
train_attacker_data, test_attacker_data = mia_attacker.generate_attack_dataset()

In [6]:
X_mia_train, y_mia_train = train_attacker_data
X_mia_test, y_mia_test = test_attacker_data

In [7]:
# Train the Attacker Model
attacker_model = mia_attacker.fit()
y_mia_test_pred = attacker_model.predict(X_mia_test)
print("Attacker Model Performance:")
print("Accuracy:", accuracy_score(y_mia_test, y_mia_test_pred))

Attacker Model Performance:
Accuracy: 0.6052363893680067


In [9]:
precision = metrics.precision_score(y_mia_test, y_mia_test_pred)
recall = metrics.recall_score(y_mia_test, y_mia_test_pred)
f1 = metrics.f1_score(y_mia_test, y_mia_test_pred)
print("Attacker Precision:", precision)
print("Attacker Recall:", recall)
print("Attacker F1 Score:", f1)

Attacker Precision: 0.5752705182560274
Attacker Recall: 0.8041574524546661
Attacker F1 Score: 0.6707245093699277
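As a sanity check, the F1 score is the harmonic mean of precision $P$ and recall $R$. Plugging in the printed values, rounded to four decimals, recovers the reported score:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.5753 \times 0.8042}{0.5753 + 0.8042} \approx 0.671
$$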


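From these results, we can see that the attacker model achieves a precision of 0.575 and a recall of 0.804, giving an F1 score of 0.671. The high recall means that the attacker flags most of the target model's training records as members, while the lower precision means that many of the records it flags are in fact non-members. Together with an accuracy of 0.605, this indicates that the attacker can separate members of the target model's training set from non-members moderately well, though far from perfectly.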
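High recall on its own can be misleading, because a naive attacker that labels every record as a member gets a recall of 1.0 for free. The sketch below scores such a trivial attacker; the helper `trivial_attack_scores` and the toy labels are hypothetical. On a balanced set of members and non-members it reaches an accuracy of 0.5 and an F1 of about 0.667, so if the attack test set here is roughly balanced, the attacker's accuracy (0.605 versus 0.5) is a more telling signal than its F1.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def trivial_attack_scores(y_true):
    """Hypothetical helper: scores of an attacker that predicts 'member' (1) for every record."""
    y_true = np.asarray(y_true)
    y_pred = np.ones_like(y_true)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }


# Toy membership labels: half members (1), half non-members (0)
y_toy = np.array([1, 1, 1, 1, 0, 0, 0, 0])
print(trivial_attack_scores(y_toy))
```

In this notebook, `trivial_attack_scores(y_mia_test)` would give the reference point against which the attacker's scores above can be compared.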