# Membership Inference Attack Examples

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/privML/privacy-evaluator/blob/feat/116-testable-notebooks/notebooks/membership_inference_attack.ipynb"><img src="https://raw.githubusercontent.com/privML/privacy-evaluator/feat/116-testable-notebooks/notebooks/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/privML/privacy-evaluator/blob/feat/116-testable-notebooks/notebooks/membership_inference_attack.ipynb"><img src="https://raw.githubusercontent.com/privML/privacy-evaluator/feat/116-testable-notebooks/notebooks/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

## Overview

In this notebook, we show how to use the `privacy-evaluator` tool to perform Membership Inference Attacks (MIAs) on a PyTorch or a TensorFlow model. We conduct three different membership inference attacks: a Black Box Attack, a Black Box Rule-Based Attack and a Label Only Decision Boundary Attack.

## Setup

First, set the notebook's runtime to use a GPU (in Colab, go to ***Runtime > Change runtime type > Hardware accelerator***). Then install the `privacy-evaluator` package and import all required modules.

In [None]:
!pip3 install git+https://github.com/privML/privacy-evaluator@feat/116-testable-notebooks

In [None]:
import tensorflow as tf
import torch
import numpy as np

import tensorflow.python.ops.numpy_ops.np_config as np_config
np_config.enable_numpy_behavior()

import privacy_evaluator.models.torch.dcti.dcti as torch_dcti
import privacy_evaluator.models.tf.dcti.dcti as tf_dcti 

from privacy_evaluator.datasets.tf.cifar10 import TFCIFAR10
from privacy_evaluator.datasets.torch.cifar10 import TorchCIFAR10

from privacy_evaluator.classifiers.classifier import Classifier

from privacy_evaluator.attacks.membership_inference.black_box import MembershipInferenceBlackBoxAttack
from privacy_evaluator.attacks.membership_inference.black_box_rule_based import MembershipInferenceBlackBoxRuleBasedAttack
from privacy_evaluator.attacks.membership_inference.label_only_decision_boundary import MembershipInferenceLabelOnlyDecisionBoundaryAttack

## Conduct Membership Inference Attacks

Now we can conduct the Membership Inference Attacks. We prepared two parallel paths: one for a PyTorch model and one for a TensorFlow model. In both paths, the target model is a pre-trained network trained on the CIFAR-10 dataset that implements the Lightweight Deep Convolutional Neural Network (DCTI) architecture (for details, see the paper: https://www.scitepress.org/Papers/2018/67520/67520.pdf).

### PyTorch

We start the evaluation with the PyTorch model.

#### Load CIFAR10 Dataset

Before we can conduct the membership inference attacks, we need to load the dataset. CIFAR-10 must be preprocessed in a specific way to work with the PyTorch model, so we load it through the `TorchCIFAR10` class.

In [None]:
# Load CIFAR10 dataset as numpy array
x_train, y_train, x_test, y_test = TorchCIFAR10.numpy()
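One typical framework-specific difference is the memory layout of images: PyTorch convolutions work on channels-first tensors `(N, C, H, W)`, while Keras defaults to channels-last `(N, H, W, C)`. We have not checked exactly what `TorchCIFAR10.numpy()` does internally; we only assume that axis order (along with normalization) is part of the preprocessing, which comparing `TorchCIFAR10.INPUT_SHAPE` with `TFCIFAR10.INPUT_SHAPE` would confirm. The pure-Python sketch below, using a hypothetical helper `hwc_to_chw`, shows the reordering for a single image; for a NumPy batch the equivalent would be `x.transpose(0, 3, 1, 2)`.

```python
def hwc_to_chw(image):
    """Hypothetical helper: reorder a nested-list image from (H, W, C) to (C, H, W)."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    # chw[c][h][w] takes the value of image[h][w][c]
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# A tiny 2x2 RGB "image" whose pixel values encode their (h, w, c) position
image_hwc = [[[100 * h + 10 * w + c for c in range(3)] for w in range(2)] for h in range(2)]
image_chw = hwc_to_chw(image_hwc)
print(len(image_chw), len(image_chw[0]), len(image_chw[0][0]))  # 3 2 2
```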

#### Prepare target model

Next, we wrap our pre-trained Lightweight Deep Convolutional Neural Network (DCTI) in a generic `Classifier`. To do so, we specify the loss function used to train the model (in our case `torch.nn.CrossEntropyLoss`), the number of classes and the input shape of the CIFAR-10 data.

In [None]:
# Initialize the PyTorch model as a Classifier
target_model = Classifier(
    torch_dcti.load_dcti(), # PyTorch DCTI 
    loss=torch.nn.CrossEntropyLoss(reduction="none"), # Loss function of the PyTorch model
    nb_classes=TorchCIFAR10.N_CLASSES, # Number of classes of the CIFAR10 dataset
    input_shape=TorchCIFAR10.INPUT_SHAPE # Input shape of the CIFAR10 dataset
)
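`torch.nn.CrossEntropyLoss` takes raw logits and (in the common case) integer class indices, and combines a log-softmax with the negative log-likelihood. With `reduction="none"` it returns one loss value per sample instead of the batch mean. The pure-Python sketch below reproduces that per-sample computation for illustration only; the helper `cross_entropy_per_sample` is our own name, not part of PyTorch or `privacy-evaluator`. Averaging its result corresponds to the default `reduction="mean"`.

```python
import math


def cross_entropy_per_sample(logits_batch, targets):
    """Per-sample cross-entropy from raw logits, like CrossEntropyLoss(reduction="none")."""
    losses = []
    for logits, target in zip(logits_batch, targets):
        # Numerically stable log-sum-exp: subtract the largest logit first
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # -log softmax(logits)[target] = logsumexp(logits) - logits[target]
        losses.append(log_sum_exp - logits[target])
    return losses


logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
targets = [0, 1]
per_sample = cross_entropy_per_sample(logits, targets)  # one value per sample
mean_loss = sum(per_sample) / len(per_sample)            # what reduction="mean" would give
print(per_sample, mean_loss)
```

The confident, correct first sample gets a smaller loss than the uniform second one, and keeping the values separate preserves that per-point information.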

#### Perform Membership Inference Black Box Attack

First, we attack the target model with the Membership Inference Black Box Attack. We initialize the attack with the target model and the data used to fit the attack model. This data consists of two sets: data (`x_train`) and labels (`y_train`) that were used to train the target model, and data (`x_test`) and labels (`y_test`) that were not part of the target model's training. To keep the runtime short, we use only the first 100 samples of each set.

After initialization, we first fit the attack model with `fit()`; only then can we attack the target model. To attack specific data points, we pass them to the `attack()` method. The result is an array of inferred membership statuses, where 1 indicates a member and 0 a non-member.

In [None]:
attack = MembershipInferenceBlackBoxAttack(
    target_model, 
    x_train[:100], 
    y_train[:100], 
    x_test[:100], 
    y_test[:100]
)

attack.fit()
attack.attack(x_train[:100], y_train[:100])
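Conceptually, `fit()` trains an attack model to tell apart how the target model responds to members and to non-members, and `attack()` applies it to new points. The sketch below replaces the attack model with the simplest possible stand-in, a single threshold on the target model's confidence in the true label. The confidence values and the helpers `fit_threshold` and `infer_membership` are hypothetical; they only illustrate the fit-then-attack pattern, not the tool's actual attack model.

```python
def fit_threshold(member_scores, nonmember_scores):
    """Choose the confidence threshold that best separates members from non-members."""
    total = len(member_scores) + len(nonmember_scores)
    best_threshold, best_accuracy = None, -1.0
    for t in sorted(set(member_scores) | set(nonmember_scores)):
        # Members should score at or above t, non-members below it
        correct = sum(s >= t for s in member_scores) + sum(s < t for s in nonmember_scores)
        accuracy = correct / total
        if accuracy > best_accuracy:  # keep the first (lowest) threshold on ties
            best_threshold, best_accuracy = t, accuracy
    return best_threshold


def infer_membership(scores, threshold):
    """1 = inferred member, 0 = inferred non-member."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical confidences of the target model on the true label
members = [0.99, 0.95, 0.97, 0.80]      # training points
non_members = [0.60, 0.85, 0.40, 0.55]  # held-out points

threshold = fit_threshold(members, non_members)               # "fit"
print(threshold)                                              # 0.8
print(infer_membership([0.99, 0.80, 0.60, 0.85], threshold))  # [1, 1, 0, 1]
```

The attack works only to the extent that the target model behaves measurably differently on its training data, which is exactly what overfitting produces.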

#### Get machine-readable attack statistics

Besides the inferred membership statuses, we can compute more general statistics. To do so, we generate an attack output by again passing the attacked data points together with their true membership labels. Here, all attacked data points come from the training set, so the attack should infer membership for every one of them, which is why the labels are all ones. As a result, we get the accuracy, the train-to-test accuracy gap and the train-to-test ratio of the target model, as well as the accuracy of the attack model.

In [None]:
output = attack.attack_output(
    x_train[:100], 
    y_train[:100], 
    np.ones((100,))
)

output.to_json()
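These statistics can be reproduced by hand. The sketch below assumes that the attack model's accuracy is the fraction of inferred statuses matching the true membership labels, and that the train-to-test gap and ratio are the difference and quotient of the target model's train and test accuracies. The helper names and dictionary keys are hypothetical and need not match the keys returned by `to_json()`. The sketch also exposes a caveat of this setup: because every true label is 1, the attack accuracy equals the fraction of points flagged as members, so a trivial attack that always answers 1 would score 1.0.

```python
def attack_accuracy(inferred, true_membership):
    """Fraction of inferred membership statuses that match the true ones."""
    return sum(i == t for i, t in zip(inferred, true_membership)) / len(true_membership)


def target_gap_and_ratio(train_accuracy, test_accuracy):
    """Assumed definitions of the train-to-test accuracy gap and ratio."""
    return {
        "train_to_test_gap": train_accuracy - test_accuracy,
        "train_to_test_ratio": train_accuracy / test_accuracy,
    }


true_labels = [1] * 5          # all attacked points are training members
inferred = [1, 1, 0, 1, 1]     # hypothetical attack result
print(attack_accuracy(inferred, true_labels))   # 0.8
print(attack_accuracy([1] * 5, true_labels))    # 1.0 for an "always member" attack

stats = target_gap_and_ratio(train_accuracy=1.0, test_accuracy=0.8)
print(stats)
```

Evaluating on a mix of members and non-members (labels 1 and 0) avoids this blind spot.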

#### Perform Membership Inference Black Box Rule-Based Attack

Next, we perform a Membership Inference Black Box Rule-Based Attack. This attack does not need to be fitted because it is fully rule-based and depends only on the attacked data points and the target model: whenever the target model classifies a data point correctly, the attack infers that the point is a member; otherwise, it infers a non-member.

In [None]:
attack = MembershipInferenceBlackBoxRuleBasedAttack(
    target_model, 
    x_train, 
    y_train, 
    x_test, 
    y_test
)

attack.attack(x_train, y_train)
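The rule fits in a few lines. In the sketch below, `rule_based_membership` is a hypothetical helper that takes the target model's predicted labels and the true labels; the values are made up. It also makes one consequence visible: when only training points are attacked and all true membership labels are 1, as in the next cell, the attack's accuracy equals the target model's training accuracy.

```python
def rule_based_membership(predicted_labels, true_labels):
    """Infer 1 (member) where the target model is correct, 0 (non-member) otherwise."""
    return [1 if p == y else 0 for p, y in zip(predicted_labels, true_labels)]


predicted = [3, 1, 7, 7, 5]     # hypothetical target-model predictions on training points
true_classes = [3, 2, 7, 0, 5]  # their true class labels
inferred = rule_based_membership(predicted, true_classes)
print(inferred)  # [1, 0, 1, 0, 1]

# With all-ones membership labels, attack accuracy equals the target's training accuracy
attack_acc = sum(inferred) / len(inferred)
train_acc = sum(p == y for p, y in zip(predicted, true_classes)) / len(true_classes)
print(attack_acc, train_acc)  # 0.6 0.6
```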

#### Get pythonic attack statistics

Again, we want an overview of the attack results. This time we want them in a more pythonic form, and we are only interested in the attack model's accuracy. Therefore, we filter the output for that value and convert it into a dictionary.

In [None]:
output = attack.attack_output(
    x_train, 
    y_train,
    np.ones((len(y_train),))
)

output.to_dict(filter=["attack_model_accuracy"])

#### Perform Membership Inference Label Only Decision Boundary Attack

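Finally, we perform a Label Only Decision Boundary Attack. As with the Black Box Attack, we first need to fit the attack model. This attack is computationally very expensive, so here we fit it on a single data point with minimal settings (`max_iter=1`, `max_eval=1`, `init_eval=1`). This configuration only demonstrates the API; it is not realistic, does not produce meaningful results and should not be used in a real evaluation.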

In [None]:
attack = MembershipInferenceLabelOnlyDecisionBoundaryAttack(
    target_model, 
    x_train[:1], 
    y_train[:1], 
    x_test[:1], 
    y_test[:1]
)

attack.fit(max_iter=1, max_eval=1, init_eval=1)
attack.attack(x_train[:1], y_train[:1])
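The intuition behind a label-only attack is that, with access to predicted labels only, the attacker measures how far a point can be perturbed before its predicted label changes. Training points tend to lie farther from the decision boundary, so a large distance suggests membership, and fitting calibrates the distance threshold. We assume, without having checked the implementation, that `max_iter`, `max_eval` and `init_eval` configure an adversarial search (such as HopSkipJump) that estimates this distance. The pure-Python sketch below replaces that search with a binary search along a fixed set of directions against a hypothetical two-dimensional linear classifier; all function names are illustrative, and the threshold 0.5 is hand-picked rather than fitted.

```python
import math


def predict_label(x):
    """Hypothetical label-only target model: class 1 if x0 + x1 > 1, else class 0."""
    return 1 if x[0] + x[1] > 1.0 else 0


def flip_distance(x, direction, t_max=10.0, steps=40):
    """Smallest step t along `direction` that changes the predicted label (t_max if none)."""
    label = predict_label(x)

    def moved(t):
        return (x[0] + t * direction[0], x[1] + t * direction[1])

    if predict_label(moved(t_max)) == label:
        return t_max
    lo, hi = 0.0, t_max  # invariant: label unchanged at lo, changed at hi
    for _ in range(steps):
        mid = (lo + hi) / 2
        if predict_label(moved(mid)) == label:
            lo = mid
        else:
            hi = mid
    return hi


def boundary_distance(x, n_directions=16):
    """Estimate the distance to the decision boundary from label queries only."""
    directions = [
        (math.cos(2 * math.pi * k / n_directions), math.sin(2 * math.pi * k / n_directions))
        for k in range(n_directions)
    ]
    return min(flip_distance(x, d) for d in directions)


def predict_membership(x, threshold):
    """Points far from the boundary are inferred to be members (1)."""
    return 1 if boundary_distance(x) > threshold else 0


far_point, near_point = (2.0, 2.0), (0.6, 0.6)
print(round(boundary_distance(far_point), 4))  # 2.1213, i.e. 3 / sqrt(2)
print(predict_membership(far_point, 0.5), predict_membership(near_point, 0.5))  # 1 0
```

Every call to `predict_label` is one query to the target model; the many queries per point are what make this family of attacks so expensive.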

#### Get human-readable attack statistics

For this attack we want a human-readable output, so we convert it to a string.

In [None]:
output = attack.attack_output(
    x_train[:1], 
    y_train[:1], 
    np.ones((1,))
)

str(output)

### TensorFlow

Now we do the same with the TensorFlow model.

#### Load CIFAR10 Dataset

Again, before we can conduct the membership inference attacks, we need to load the dataset. CIFAR-10 must be preprocessed in a specific way to work with the TensorFlow model, so this time we load it through the `TFCIFAR10` class.

In [None]:
# Load CIFAR10 dataset as numpy array
x_train, y_train, x_test, y_test = TFCIFAR10.numpy()

#### Prepare target model

Next, we wrap our pre-trained Lightweight Deep Convolutional Neural Network (DCTI) in a generic `Classifier`. To do so, we specify the loss function used to train the model (in our case `tf.keras.losses.CategoricalCrossentropy`), the number of classes and the input shape of the CIFAR-10 data.

In [None]:
# Initialize the TensorFlow target model as a Classifier
target_model = Classifier(
    tf_dcti.load_dcti(), # TensorFlow DCTI
    loss=tf.keras.losses.CategoricalCrossentropy(), # Loss function of the TensorFlow target model
    nb_classes=TFCIFAR10.N_CLASSES, # Number of classes of the CIFAR10 dataset
    input_shape=TFCIFAR10.INPUT_SHAPE # Input shape of the CIFAR10 dataset
)
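Unlike the PyTorch loss, `tf.keras.losses.CategoricalCrossentropy` expects one-hot encoded targets and, with its default `from_logits=False`, probabilities rather than logits (Keras provides `SparseCategoricalCrossentropy` for integer labels). We have not checked how `TFCIFAR10` encodes its labels. The pure-Python sketch below, with hypothetical helpers `one_hot` and `categorical_crossentropy`, shows that for a one-hot target the loss reduces to the negative log-probability of the true class, ignoring the numerical safeguards Keras applies internally.

```python
import math


def one_hot(label, n_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if c == label else 0.0 for c in range(n_classes)]


def categorical_crossentropy(y_true, y_pred):
    """-sum_c y_true[c] * log(y_pred[c]) for one sample of probabilities."""
    # Terms with y_true[c] == 0 contribute nothing, so skip them (avoids log(0))
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


probs = [0.2, 0.7, 0.1]  # hypothetical softmax output for 3 classes
loss = categorical_crossentropy(one_hot(1, 3), probs)
print(round(loss, 4))  # 0.3567, equal to -log(0.7)
```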

#### Perform Membership Inference Black Box Attack

First, we attack the target model with the Membership Inference Black Box Attack. We initialize the attack with the target model and the data used to fit the attack model. This data consists of two sets: data (`x_train`) and labels (`y_train`) that were used to train the target model, and data (`x_test`) and labels (`y_test`) that were not part of the target model's training. To keep the runtime short, we use only the first 100 samples of each set.

After initialization, we first fit the attack model with `fit()`; only then can we attack the target model. To attack specific data points, we pass them to the `attack()` method. The result is an array of inferred membership statuses, where 1 indicates a member and 0 a non-member.

In [None]:
attack = MembershipInferenceBlackBoxAttack(
    target_model, 
    x_train[:100], 
    y_train[:100], 
    x_test[:100], 
    y_test[:100]
)

attack.fit()
attack.attack(x_train[:100], y_train[:100])

#### Get machine-readable attack statistics

Besides the inferred membership statuses, we can compute more general statistics. To do so, we generate an attack output by again passing the attacked data points together with their true membership labels. Here, all attacked data points come from the training set, so the attack should infer membership for every one of them, which is why the labels are all ones. As a result, we get the accuracy, the train-to-test accuracy gap and the train-to-test ratio of the target model, as well as the accuracy of the attack model.

In [None]:
output = attack.attack_output(
    x_train[:100], 
    y_train[:100], 
    np.ones((100,))
)

output.to_json()

#### Perform Membership Inference Black Box Rule-Based Attack

Next, we perform a Membership Inference Black Box Rule-Based Attack. This attack does not need to be fitted because it is fully rule-based and depends only on the attacked data points and the target model: whenever the target model classifies a data point correctly, the attack infers that the point is a member; otherwise, it infers a non-member.

In [None]:
attack = MembershipInferenceBlackBoxRuleBasedAttack(
    target_model, 
    x_train, 
    y_train, 
    x_test, 
    y_test
)

attack.attack(x_train, y_train)

#### Get pythonic attack statistics

Again, we want an overview of the attack results. This time we want them in a more pythonic form, and we are only interested in the attack model's accuracy. Therefore, we filter the output for that value and convert it into a dictionary.

In [None]:
output = attack.attack_output(
    x_train, 
    y_train, 
    np.ones((len(y_train),))
)

output.to_dict(filter=["attack_model_accuracy"])

#### Perform Membership Inference Label Only Decision Boundary Attack

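Finally, we perform a Label Only Decision Boundary Attack. As with the Black Box Attack, we first need to fit the attack model. This attack is computationally very expensive, so here we fit it on a single data point with minimal settings (`max_iter=1`, `max_eval=1`, `init_eval=1`). This configuration only demonstrates the API; it is not realistic, does not produce meaningful results and should not be used in a real evaluation.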

In [None]:
attack = MembershipInferenceLabelOnlyDecisionBoundaryAttack(
    target_model, 
    x_train[:1], 
    y_train[:1], 
    x_test[:1], 
    y_test[:1]
)

attack.fit(max_iter=1, max_eval=1, init_eval=1)
attack.attack(x_train[:1], y_train[:1])

#### Get human-readable attack statistics

For this attack we want a human-readable output, so we convert it to a string.

In [None]:
output = attack.attack_output(
    x_train[:1], 
    y_train[:1], 
    np.ones((1,))
)

str(output)