# Membership Privacy Risk Score Examples

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/privML/privacy-evaluator/blob/feat/116-testable-notebooks/notebooks/membership_privacy_risk_score.ipynb"><img src="https://raw.githubusercontent.com/privML/privacy-evaluator/feat/116-testable-notebooks/notebooks/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/privML/privacy-evaluator/blob/feat/116-testable-notebooks/notebooks/membership_privacy_risk_score.ipynb"><img src="https://raw.githubusercontent.com/privML/privacy-evaluator/feat/116-testable-notebooks/notebooks/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

## Overview
In this notebook we use two simple image classification models pre-trained on the CIFAR10 dataset. Both models share an architecture based on https://www.scitepress.org/Papers/2018/67520/67520.pdf. One is implemented in PyTorch and the other in TensorFlow. We use the models to compute each sample's probability of being part of the training set. This probability is called the membership privacy risk score, or privacy risk score for short (cf. https://arxiv.org/abs/2003.10595). The evaluation itself also runs on the original CIFAR10 dataset.
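
For orientation, here is our paraphrase of the notation in Song and Mittal. The score of a sample $z$ is the posterior probability that $z$ belongs to the training set $D_{tr}$, given the behavior $O(F, z)$ of the target model $F$ on it. Bayes' rule expands it as shown below. This is only a summary of the paper's definition; the `privacy-evaluator` implementation may estimate the individual terms differently.

$$
r(z) = P\big(z \in D_{tr} \mid O(F, z)\big)
= \frac{P(z \in D_{tr})\, P\big(O(F, z) \mid z \in D_{tr}\big)}{P(z \in D_{tr})\, P\big(O(F, z) \mid z \in D_{tr}\big) + P(z \in D_{te})\, P\big(O(F, z) \mid z \in D_{te}\big)}
$$

Assume an equal prior $P(z \in D_{tr}) = P(z \in D_{te}) = 0.5$. The priors then cancel, and the score reduces to $P(O \mid z \in D_{tr}) / \big(P(O \mid z \in D_{tr}) + P(O \mid z \in D_{te})\big)$.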

## Setup
First, set this notebook's runtime to use a GPU (in Colab: ***Runtime > Change runtime type > Hardware accelerator***). Then install the `privacy-evaluator` package and import all required modules.
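
You can check whether the runtime actually exposes a GPU with a quick sketch like the one below. It relies only on `torch.cuda.is_available()` and `tf.config.list_physical_devices("GPU")`, and it skips any framework that is not installed. The helper name `gpu_status` is our own; it is not part of `privacy-evaluator`.

```python
import importlib.util


def gpu_status():
    """Report, for each installed framework, whether it can see a GPU."""
    status = {}
    # Only query frameworks that are importable in this runtime.
    if importlib.util.find_spec("torch") is not None:
        import torch
        status["torch"] = torch.cuda.is_available()
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf
        status["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    return status


print(gpu_status())
```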

In [None]:
!pip3 install git+https://github.com/privML/privacy-evaluator@feat/116-testable-notebooks

In [None]:
import tensorflow as tf
import torch
import numpy as np

import privacy_evaluator.models.torch.dcti.dcti as torch_dcti
import privacy_evaluator.models.tf.dcti.dcti as tf_dcti 
from privacy_evaluator.datasets.cifar10 import CIFAR10
from privacy_evaluator.classifiers.classifier import Classifier

from privacy_evaluator.metrics.privacy_risk_score import compute_privacy_risk_score
from privacy_evaluator.output.user_output_privacy_score import UserOutputPrivacyScore

## Compute Membership Privacy Risk Scores

### PyTorch

We start the evaluation with the PyTorch model. 

#### Load CIFAR10 Dataset

Before we can compute the membership privacy risk scores, we need to load the dataset. CIFAR10 has to be preprocessed in a specific way to work with the PyTorch model, so we request the PyTorch variant via `model_type='torch'`.
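
One likely difference between the two variants is the memory layout. PyTorch convolutions expect channels-first batches `(N, C, H, W)`, whereas Keras defaults to channels-last `(N, H, W, C)`. The NumPy sketch below shows this conversion on random data. It only illustrates the convention; we assume, without having checked, that `CIFAR10.numpy` does something similar, and it may also apply further steps such as normalization.

```python
import numpy as np

# Random stand-in for a CIFAR10 batch in channels-last layout (N, H, W, C).
rng = np.random.default_rng(0)
x_channels_last = rng.random((4, 32, 32, 3), dtype=np.float32)

# Move the channel axis to position 1 to obtain channels-first (N, C, H, W).
x_channels_first = np.transpose(x_channels_last, (0, 3, 1, 2))

print(x_channels_last.shape)   # (4, 32, 32, 3)
print(x_channels_first.shape)  # (4, 3, 32, 32)
```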

In [None]:
# Load the CIFAR10 dataset as NumPy arrays, preprocessed for the PyTorch model
x_train, y_train, x_test, y_test = CIFAR10.numpy(model_type='torch')

# Number of classes of the CIFAR10 dataset
nb_classes = CIFAR10.N_CLASSES

# Input shape of the CIFAR10 dataset for the PyTorch model
input_shape = CIFAR10.INPUT_SHAPE["torch"]

#### Prepare target model

Next, we define the model's loss function (here `torch.nn.CrossEntropyLoss`) and wrap the model in a generic `Classifier` with the corresponding parameters.

In [None]:
# Loss function of the PyTorch model
loss = torch.nn.CrossEntropyLoss(reduction="none")

# Initialize the PyTorch model as a Classifier
target_model = Classifier(
    torch_dcti.load_dcti(), 
    loss=loss, 
    nb_classes=nb_classes, 
    input_shape=input_shape
)
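
The loss is created with `reduction="none"`, so it returns one loss value per sample instead of the batch mean. To make this difference visible, the NumPy sketch below reproduces what `torch.nn.CrossEntropyLoss` computes on raw logits: a log-softmax followed by the negative log-likelihood of the true class. It is an illustration only, not the PyTorch implementation, and the helper name `cross_entropy_per_sample` is our own.

```python
import numpy as np


def cross_entropy_per_sample(logits, targets):
    """Cross-entropy on raw logits, one value per sample (like reduction="none")."""
    # Numerically stable log-softmax: subtract the row-wise maximum first.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability of each sample's true class index.
    return -log_probs[np.arange(len(targets)), targets]


logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
targets = np.array([0, 1])

per_sample = cross_entropy_per_sample(logits, targets)
print(per_sample.shape)   # (2,): one loss per sample
print(per_sample.mean())  # what the default reduction="mean" would return
```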

#### Compute privacy risk score
Next, we compute the privacy risk scores for the first 100 samples of the train and test sets. The function returns the scores for the train set and the test set separately.

In [None]:
# Compute membership privacy risk score for the PyTorch model
(
    train_privacy_risk_score, 
    test_privacy_risk_score
) = compute_privacy_risk_score(
    target_model, 
    x_train[:100], 
    y_train[:100], 
    x_test[:100], 
    y_test[:100]
)
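
The NumPy sketch below makes the idea behind the score concrete. It implements the per-class, histogram-based estimate described by Song and Mittal, under these assumptions:

- the observed behavior is the modified prediction entropy from that paper;
- the given train and test samples act as known members and non-members;
- the prior is 0.5.

The helper names (`modified_entropy`, `risk_scores_sketch`, `synthetic_probs`) are hypothetical, and the demo data is synthetic. The actual `compute_privacy_risk_score` may differ in any of these details.

```python
import numpy as np


def modified_entropy(probs, labels, eps=1e-30):
    """Modified prediction entropy; low values mean confident, correct predictions."""
    rows = np.arange(len(labels))
    log_p = np.log(np.maximum(probs, eps))
    log_1mp = np.log(np.maximum(1.0 - probs, eps))
    p_true = probs[rows, labels]
    # True class: -(1 - p_y) * log(p_y)
    true_term = -(1.0 - p_true) * log_p[rows, labels]
    # Other classes: -p_i * log(1 - p_i), with the true class masked out
    other_terms = -probs * log_1mp
    other_terms[rows, labels] = 0.0
    return true_term + other_terms.sum(axis=1)


def risk_scores_sketch(train_probs, train_labels, test_probs, test_labels, n_bins=10):
    """Per-class histogram estimate of P(member | modified entropy) with a 0.5 prior."""
    m_train = modified_entropy(train_probs, train_labels)
    m_test = modified_entropy(test_probs, test_labels)
    train_scores = np.zeros(len(train_labels))
    test_scores = np.zeros(len(test_labels))
    for c in np.union1d(train_labels, test_labels):
        in_c, out_c = train_labels == c, test_labels == c
        edges = np.histogram_bin_edges(
            np.concatenate([m_train[in_c], m_test[out_c]]), bins=n_bins
        )
        # Empirical P(bin | member) and P(bin | non-member) for class c
        p_in = np.histogram(m_train[in_c], bins=edges)[0] / max(in_c.sum(), 1)
        p_out = np.histogram(m_test[out_c], bins=edges)[0] / max(out_c.sum(), 1)
        for mask, values, scores in ((in_c, m_train, train_scores), (out_c, m_test, test_scores)):
            idx = np.digitize(values[mask], edges[1:-1])  # bin index of each sample
            denom = p_in[idx] + p_out[idx]
            # Bayes' rule with equal priors: the 0.5 factors cancel out
            scores[mask] = np.divide(
                p_in[idx], denom, out=np.full(idx.shape, 0.5), where=denom > 0
            )
    return train_scores, test_scores


# Synthetic softmax outputs: members (train) get more confident true-class probabilities.
rng = np.random.default_rng(0)
n_classes = 3


def synthetic_probs(labels, low, high):
    conf = rng.uniform(low, high, size=len(labels))
    probs = np.repeat(((1.0 - conf) / (n_classes - 1))[:, None], n_classes, axis=1)
    probs[np.arange(len(labels)), labels] = conf
    return probs


demo_train_labels = rng.integers(0, n_classes, 300)
demo_test_labels = rng.integers(0, n_classes, 300)
demo_train_scores, demo_test_scores = risk_scores_sketch(
    synthetic_probs(demo_train_labels, 0.8, 1.0), demo_train_labels,
    synthetic_probs(demo_test_labels, 0.4, 0.9), demo_test_labels,
)
print(demo_train_scores.mean(), demo_test_scores.mean())  # members score higher on average
```

Binning per class mirrors the paper's observation that models behave differently on different classes. The number of bins is a free parameter of this sketch.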

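#### Visualize the result
Finally, we visualize the results as histograms. Each histogram shows how the top-k most vulnerable points are distributed across the classes, once as absolute counts and once as relative values. These are the samples with the highest privacy risk scores; here k = 50.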
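
The counting step behind such a histogram can be sketched in NumPy as shown below; plotting is left to the package. We assume that the relative view divides each class's top-k count by that class's number of evaluated samples. `UserOutputPrivacyScore.histogram_top_k_relative` may normalize differently, and `top_k_per_class` is a hypothetical helper.

```python
import numpy as np


def top_k_per_class(labels, scores, classes, k):
    """Count how many of the k highest-scoring samples fall into each class."""
    top_k = np.argsort(scores)[::-1][:k]  # indices of the k most vulnerable samples
    absolute = np.array([np.sum(labels[top_k] == c) for c in classes])
    # Assumed relative view: share of each class's samples that land in the top k.
    class_sizes = np.array([np.sum(labels == c) for c in classes])
    relative = np.divide(
        absolute, class_sizes, out=np.zeros(len(class_sizes)), where=class_sizes > 0
    )
    return absolute, relative


labels_demo = np.array([0, 0, 1, 1, 1, 2])
scores_demo = np.array([0.9, 0.2, 0.8, 0.7, 0.1, 0.3])
absolute_demo, relative_demo = top_k_per_class(labels_demo, scores_demo, range(3), k=3)
print(absolute_demo)  # [1 2 0]
print(relative_demo)
```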

In [None]:
# Create the user output for the train dataset (class indices and risk scores)
user_output = UserOutputPrivacyScore(
    np.argmax(y_train[:100], axis=1),
    train_privacy_risk_score, 
)

In [None]:
# Plot absolute values
labels, count = user_output.histogram_top_k(range(10), 50)

In [None]:
# Plot relative values 
labels, count = user_output.histogram_top_k_relative(range(10), 50)

In [None]:
# Create the user output for the test dataset (class indices and risk scores)
user_output = UserOutputPrivacyScore(
    np.argmax(y_test[:100], axis=1),
    test_privacy_risk_score, 
)

In [None]:
# Plot absolute values
labels, count = user_output.histogram_top_k(range(10), 50)

In [None]:
# Plot relative values 
labels, count = user_output.histogram_top_k_relative(range(10), 50)

### TensorFlow

Now we do the same with the TensorFlow model.

#### Load CIFAR10 Dataset

Again, we load the CIFAR10 dataset, this time preprocessed for the TensorFlow model via `model_type='tf'`.

In [None]:
# Load the CIFAR10 dataset as NumPy arrays, preprocessed for the TensorFlow model
x_train, y_train, x_test, y_test = CIFAR10.numpy(model_type='tf')

# Number of classes of the CIFAR10 dataset
nb_classes = CIFAR10.N_CLASSES

# Input shape of the CIFAR10 dataset for the TensorFlow model
input_shape = CIFAR10.INPUT_SHAPE["tf"]

#### Prepare target model

Then we initialize the target model, this time with `tf.keras.losses.CategoricalCrossentropy` as the loss function.

In [None]:
# Loss function of the TensorFlow target model
loss = tf.keras.losses.CategoricalCrossentropy()

# Initialize the TensorFlow target model as a Classifier
target_model = Classifier(
    tf_dcti.load_dcti(), 
    loss=loss, 
    nb_classes=nb_classes, 
    input_shape=input_shape
)
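
The Keras loss differs from the PyTorch loss above in three ways:

- `tf.keras.losses.CategoricalCrossentropy()` is created with its default reduction, so calling it returns a single batch-averaged value.
- It expects one-hot targets. The later `np.argmax(y_train, axis=1)` calls indicate that the labels are one-hot.
- Without `from_logits=True`, it assumes the model outputs probabilities.

The NumPy sketch below shows the per-sample quantity on made-up softmax outputs. It also shows that the one-hot form and the class-index form give the same values.

```python
import numpy as np

# Made-up softmax outputs for two samples over three classes.
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
# One-hot targets, the format CategoricalCrossentropy expects.
one_hot = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

# Per-sample categorical cross-entropy: -sum_i y_i * log(p_i)
ce_per_sample = -np.sum(one_hot * np.log(probs), axis=1)

# The same values via integer class indices, recovered with np.argmax
class_idx = np.argmax(one_hot, axis=1)
ce_per_sample_sparse = -np.log(probs[np.arange(len(class_idx)), class_idx])

print(ce_per_sample)         # one loss per sample
print(ce_per_sample.mean())  # the single value the default reduction returns
```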

#### Compute privacy risk score
Next, we compute the privacy risk scores for the same 100 train and test samples. Again, the function returns the scores for the train set and the test set separately.

In [None]:
# Compute privacy risk score for the TensorFlow target model
(
    train_privacy_risk_score, 
    test_privacy_risk_score
) = compute_privacy_risk_score(
    target_model, 
    x_train[:100], 
    y_train[:100], 
    x_test[:100], 
    y_test[:100]
)
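
Both models are evaluated on the same 100 train and 100 test samples. A short numeric summary can therefore complement the histograms, for example:

- the mean score per split;
- the share of samples whose score exceeds a threshold.

The helper `summarize_scores` and the 0.5 threshold are our own choices, not part of `privacy-evaluator`, and the demo uses made-up scores.

```python
import numpy as np


def summarize_scores(train_scores, test_scores, threshold=0.5):
    """Mean privacy risk score per split and share of samples above a threshold."""
    train_scores = np.asarray(train_scores, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    return {
        "train_mean": float(train_scores.mean()),
        "test_mean": float(test_scores.mean()),
        "train_above": float((train_scores > threshold).mean()),
        "test_above": float((test_scores > threshold).mean()),
    }


# Made-up scores for illustration; in the notebook one would pass
# train_privacy_risk_score and test_privacy_risk_score instead.
print(summarize_scores([0.9, 0.6, 0.4, 0.8], [0.3, 0.55, 0.2, 0.1]))
```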

#### Visualize the result
Again, we visualize the results as histograms that show how the top-k most vulnerable points are distributed across the classes, in absolute and relative terms.

In [None]:
# Create the user output for the train dataset (class indices and risk scores)
user_output = UserOutputPrivacyScore(
    np.argmax(y_train[:100], axis=1), 
    train_privacy_risk_score, 
)

In [None]:
# Plot absolute values
labels, count = user_output.histogram_top_k(range(10), 50)

In [None]:
# Plot relative values 
labels, count = user_output.histogram_top_k_relative(range(10), 50)

In [None]:
# Create the user output for the test dataset (class indices and risk scores)
user_output = UserOutputPrivacyScore(
    np.argmax(y_test[:100], axis=1), 
    test_privacy_risk_score, 
)

In [None]:
# Plot absolute values
labels, count = user_output.histogram_top_k(range(10), 50)

In [None]:
# Plot relative values
labels, count = user_output.histogram_top_k_relative(range(10), 50)