# Auditing a CNN trained on CIFAR100 using the Reference Attack

## Introduction

In this tutorial, we will see:

- How to specify the dataset and model for Privacy Meter
- How to audit a TensorFlow model
- How to use the `ReferenceMetric` to evaluate membership leakage using loss values from reference models

## Imports

```python
import numpy as np
import tensorflow as tf
```

For now, we install the Privacy Meter library from the local source. A version will be published to pip soon.

```python
import sys
!{sys.executable} -m pip install -e ../.
from privacy_meter.dataset import Dataset
from privacy_meter.model import TensorflowModel
from privacy_meter.information_source import InformationSource
from privacy_meter.audit import Audit, MetricEnum
```

```
Obtaining file:///Users/aadyaamaddi/Desktop/ML%20Privacy%20Meter/privacy_meter
  Preparing metadata (setup.py) ... done
Installing collected packages: privacy-meter
  Attempting uninstall: privacy-meter
    Found existing installation: privacy-meter 1.0
    Uninstalling privacy-meter-1.0:
      Successfully uninstalled privacy-meter-1.0
  Running setup.py develop for privacy-meter
Successfully installed privacy-meter-1.0
```


## Settings

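We set the random seeds for reproducibility:

```python
seed = 1234
np.random.seed(seed)
rng = np.random.default_rng(seed=seed)
# also seed TensorFlow's global RNG (weight initialization, dropout, shuffling)
tf.random.set_seed(seed)
```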

Hyperparameters:

```python
# for training the target and reference models
num_train_points = 5000
num_test_points = 5000
num_reference_train_points = 10000
loss_fn = tf.keras.losses.CategoricalCrossentropy()
optim_fn = 'adam'
epochs = 25
batch_size = 64
regularizer_penalty = 0.01
# passed positionally: the keyword name (`l` vs `l2`) differs across TF/Keras versions
regularizer = tf.keras.regularizers.l2(regularizer_penalty)
```
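In Keras, an L2 regularizer adds `l2 * sum(square(w))` to the training loss for each kernel it is attached to. In this tutorial, only the two `Conv2D` kernels are regularized. The NumPy sketch below computes that penalty on a toy kernel. It illustrates the formula and is not Keras's own implementation.

```python
import numpy as np

def l2_penalty(weights, l2=0.01):
    """Sum of squared weights scaled by l2: the term an L2 regularizer adds to the loss."""
    return l2 * float(np.sum(np.square(weights)))

# toy 2x2 kernel standing in for one Conv2D layer's weights
kernel = np.array([[1.0, -2.0], [0.5, 0.0]])
penalty = l2_penalty(kernel, l2=0.01)  # 0.01 * (1 + 4 + 0.25 + 0)
```

Larger values of `regularizer_penalty` push the weights toward zero more strongly. This typically reduces overfitting and, with it, the loss gap between members and non-members that the audit measures.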

```python
# for the reference metric
num_reference_models = 10
fpr_tolerance_list = [
    0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
]
```

## Dataset creation

We use the CIFAR100 dataset for this tutorial. Since TensorFlow already provides a loader for CIFAR100 (`tf.keras.datasets.cifar100`), we only need to add our preprocessing code on top of it.

```python
def preprocess_cifar100_dataset():
    input_shape, num_classes = (32, 32, 3), 100

    # split the data between train and test sets
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar100.load_data()

    # scale images to the [0, 1] range
    x_train = x_train.astype("float32") / 255
    x_test = x_test.astype("float32") / 255

    # convert labels into one hot vectors
    y_train = tf.keras.utils.to_categorical(y_train, num_classes)
    y_test = tf.keras.utils.to_categorical(y_test, num_classes)

    return x_train, y_train, x_test, y_test, input_shape, num_classes

x_train_all, y_train_all, x_test_all, y_test_all, input_shape, num_classes = preprocess_cifar100_dataset()
```
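`load_data()` returns `uint8` images and integer labels of shape `(n, 1)`. The preprocessing rescales the pixels to `[0, 1]` and turns each label into a 100-dimensional one-hot vector. The NumPy sketch below applies the same two transformations to toy arrays. It shows what `to_categorical` produces and is not meant to replace it.

```python
import numpy as np

num_classes = 100
# toy stand-ins for CIFAR100: one 1x1 RGB uint8 image and two integer labels of shape (n, 1)
images = np.array([[[[0, 128, 255]]]], dtype=np.uint8)  # shape (1, 1, 1, 3)
labels = np.array([[3], [99]])

scaled = images.astype("float32") / 255                          # pixel values now in [0, 1]
one_hot = np.eye(num_classes, dtype="float32")[labels.ravel()]   # shape (2, 100)
```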

CIFAR100 comes with predetermined train and test partitions. We further split the train partition into two sets for the audit: 'train' and 'reference'.

We will have the following sets at the end of this partitioning:

- The 'train' set will be used to train the target model. It will be used as the 'member' set for the audit.
- The 'test' set will be used as the 'non-member' set for the audit.
- The 'reference' set will be used later as the pool of data to train the reference models.

```python
x_train, y_train = x_train_all[:num_train_points], y_train_all[:num_train_points]
x_test, y_test = x_test_all[:num_test_points], y_test_all[:num_test_points]
x_reference = x_train_all[num_train_points:]
y_reference = y_train_all[num_train_points:]
```
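CIFAR100 ships 50,000 training images and 10,000 test images, so these slices yield 5,000 members, a reference pool of 45,000 images, and 5,000 non-members. The plain-Python sketch below reproduces the slicing with index ranges. It checks that the members and the reference pool never overlap, even though both come from the train partition.

```python
# CIFAR100 ships 50,000 training and 10,000 test images
n_train_all, n_test_all = 50_000, 10_000
num_train_points, num_test_points = 5000, 5000

member_idx = range(0, num_train_points)               # x_train_all[:5000]
reference_idx = range(num_train_points, n_train_all)  # x_train_all[5000:]
non_member_idx = range(0, num_test_points)            # x_test_all[:5000] (separate partition)

sizes = {
    "train (members)": len(member_idx),
    "reference pool": len(reference_idx),
    "test (non-members)": len(non_member_idx),
}
# members and the reference pool share a partition but must not share indices
overlap = set(member_idx) & set(reference_idx)
```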

We wrap the sets into a `Dataset` object, which takes the following arguments:

- `data_dict` contains the actual dataset, as a nested (two-level) dictionary. The first key is the split name (here we have two: "train" and "test"), and the second key is the feature name (here we also have two: "x" and "y").
- `default_input` is the name of the feature to use as the model's input (here "x").
- `default_output` is the name of the feature to use as the label / model's output (here "y").
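The training cell further down retrieves features with `get_feature('train', '<default_input>')`, where the placeholder resolves to the configured feature name. The class below is a hypothetical re-implementation written only to illustrate the split → feature → array layout and that resolution. `ToyDataset` is not part of Privacy Meter, and `privacy_meter.dataset.Dataset` may handle names differently.

```python
import numpy as np

class ToyDataset:
    """Hypothetical stand-in illustrating the data_dict layout; not privacy_meter's class."""

    def __init__(self, data_dict, default_input, default_output):
        self.data_dict = data_dict
        self.default_input = default_input
        self.default_output = default_output

    def get_feature(self, split_name, feature_name):
        # resolve the placeholders to the configured feature names
        if feature_name == "<default_input>":
            feature_name = self.default_input
        elif feature_name == "<default_output>":
            feature_name = self.default_output
        return self.data_dict[split_name][feature_name]

ds = ToyDataset(
    data_dict={
        "train": {"x": np.zeros((4, 32, 32, 3)), "y": np.eye(100)[:4]},
        "test": {"x": np.ones((2, 32, 32, 3)), "y": np.eye(100)[:2]},
    },
    default_input="x",
    default_output="y",
)
x_train_toy = ds.get_feature("train", "<default_input>")  # same array as data_dict["train"]["x"]
```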

```python
# create the target model's dataset
train_ds = {'x': x_train, 'y': y_train}
test_ds = {'x': x_test, 'y': y_test}
target_dataset = Dataset(
    data_dict={'train': train_ds, 'test': test_ds},
    default_input='x', default_output='y'
)
```

## Training the target and reference models

We define the TensorFlow model architecture used for both the target and the reference models:

```python
def get_tensorflow_cnn_classifier(input_shape, num_classes, regularizer):
    # TODO: change model architecture
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu',
                                     input_shape=input_shape, kernel_regularizer=regularizer))
    model.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu',
                                     kernel_regularizer=regularizer))
    model.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
    return model
```
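The output shapes and parameter counts that `model.summary()` reports in the next cell follow directly from this definition. This includes the `Dense` row, whose parameter count is cut off in the printed summary. The sketch below assumes the Keras defaults used here: stride 1 and 'valid' padding for `Conv2D`, and a pooling stride equal to the pool size for `MaxPool2D`.

```python
def conv2d_params(kernel, in_ch, out_ch):
    # one weight per kernel position per (input, output) channel pair, plus one bias per filter
    return (kernel * kernel * in_ch + 1) * out_ch

def dense_params(in_features, out_features):
    # full weight matrix plus one bias per output unit
    return (in_features + 1) * out_features

side = 32
side = side - 3 + 1          # conv2d, 'valid' padding: 30
p1 = conv2d_params(3, 3, 32)
side = side // 2             # max_pooling2d: 15
side = side - 3 + 1          # conv2d_1: 13
p2 = conv2d_params(3, 32, 64)
side = side // 2             # max_pooling2d_1: 6 (floor of 13 / 2)
flat = side * side * 64      # flatten (dropout keeps the shape): 2304
p3 = dense_params(flat, 100)
total = p1 + p2 + p3
```

The pooling and dropout layers have no trainable parameters, so almost all of the model's weights sit in the final `Dense` layer.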

We then compile the target model and train it on the 'train' split of the target dataset:

```python
x = target_dataset.get_feature('train', '<default_input>')
y = target_dataset.get_feature('train', '<default_output>')
model = get_tensorflow_cnn_classifier(input_shape, num_classes, regularizer)
model.summary()
model.compile(optimizer=optim_fn, loss=loss_fn, metrics=['accuracy'])
model.fit(x, y, batch_size=batch_size, epochs=epochs, verbose=2)
```

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 2304)              0         
_________________________________________________________________
dropout (Dropout)            (None, 2304)              0         
_________________________________________________________________
dense (Dense)                (None, 100)               2

<tensorflow.python.keras.callbacks.History at 0x7fdbb2b378d0>
```

We wrap the target model in the `TensorflowModel` object:

```python
target_model = TensorflowModel(model_obj=model, loss_fn=loss_fn)
```

Next, we train the reference models. For each one, we sample a training set from the reference pool, train a fresh model on it, and wrap it in a `TensorflowModel` object:

```python
reference_models = []
for model_idx in range(num_reference_models):
    print(f"Training reference model {model_idx}...")
    # Generator.choice samples with replacement by default, so a subset may repeat points
    indices = rng.choice(a=x_reference.shape[0], size=num_reference_train_points)
    x_subset, y_subset = x_reference[indices], y_reference[indices]
    reference_model = get_tensorflow_cnn_classifier(input_shape, num_classes, regularizer)
    reference_model.compile(optimizer=optim_fn, loss=loss_fn, metrics=['accuracy'])
    reference_model.fit(x_subset, y_subset, batch_size=batch_size, epochs=epochs, verbose=2)
    reference_models.append(
        TensorflowModel(model_obj=reference_model, loss_fn=loss_fn)
    )
```

```
Training reference model 0...
Train on 10000 samples
Epoch 1/25
10000/10000 - 11s - loss: 4.5359 - accuracy: 0.0413
Epoch 2/25
10000/10000 - 10s - loss: 3.9506 - accuracy: 0.1299
Epoch 3/25
10000/10000 - 10s - loss: 3.6933 - accuracy: 0.1767
Epoch 4/25
10000/10000 - 10s - loss: 3.5097 - accuracy: 0.2060
Epoch 5/25
10000/10000 - 10s - loss: 3.3742 - accuracy: 0.2338
Epoch 6/25
10000/10000 - 10s - loss: 3.2678 - accuracy: 0.2538
Epoch 7/25
10000/10000 - 10s - loss: 3.1645 - accuracy: 0.2746
Epoch 8/25
10000/10000 - 10s - loss: 3.0776 - accuracy: 0.2956
Epoch 9/25
10000/10000 - 10s - loss: 2.9977 - accuracy: 0.3120
Epoch 10/25
10000/10000 - 10s - loss: 2.9261 - accuracy: 0.3276
Epoch 11/25
10000/10000 - 10s - loss: 2.8550 - accuracy: 0.3455
Epoch 12/25
10000/10000 - 11s - loss: 2.8181 - accuracy: 0.3526
Epoch 13/25
10000/10000 - 11s - loss: 2.7545 - accuracy: 0.3647
Epoch 14/25
10000/10000 - 11s - loss: 2.6988 - accuracy: 0.3847
Epoch 15/25
10000/10000 - 10s - loss: 2.6491 - accuracy: 0.3

Training reference model 5...
Train on 10000 samples
Epoch 1/25
10000/10000 - 11s - loss: 4.5489 - accuracy: 0.0394
Epoch 2/25
10000/10000 - 11s - loss: 4.0579 - accuracy: 0.1065
Epoch 3/25
10000/10000 - 10s - loss: 3.7807 - accuracy: 0.1509
Epoch 4/25
10000/10000 - 10s - loss: 3.6056 - accuracy: 0.1864
Epoch 5/25
10000/10000 - 10s - loss: 3.4717 - accuracy: 0.2122
Epoch 6/25
10000/10000 - 10s - loss: 3.3647 - accuracy: 0.2329
Epoch 7/25
10000/10000 - 10s - loss: 3.2621 - accuracy: 0.2530
Epoch 8/25
10000/10000 - 10s - loss: 3.1813 - accuracy: 0.2757
Epoch 9/25
10000/10000 - 10s - loss: 3.0974 - accuracy: 0.2933
Epoch 10/25
10000/10000 - 10s - loss: 3.0357 - accuracy: 0.3009
Epoch 11/25
10000/10000 - 10s - loss: 2.9782 - accuracy: 0.3146
Epoch 12/25
10000/10000 - 10s - loss: 2.9267 - accuracy: 0.3237
Epoch 13/25
10000/10000 - 11s - loss: 2.8738 - accuracy: 0.3387
Epoch 14/25
10000/10000 - 11s - loss: 2.8223 - accuracy: 0.3530
Epoch 15/25
10000/10000 - 10s - loss: 2.7875 - accuracy: 0.3
```
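In the reference-model loop, `rng.choice` is called without `replace=False`. NumPy's `Generator.choice` then samples with replacement, so each 10,000-point training set can contain repeated images. The sketch below compares the two modes on a pool of the same size (45,000) and computes the expected number of distinct points under sampling with replacement. It only illustrates NumPy's behavior and does not change the tutorial's code.

```python
import numpy as np

pool_size = 45_000      # x_reference.shape[0]
sample_size = 10_000    # num_reference_train_points

rng = np.random.default_rng(seed=1234)
with_repl = rng.choice(a=pool_size, size=sample_size)                  # default: replace=True
without_repl = rng.choice(a=pool_size, size=sample_size, replace=False)

distinct_with = len(np.unique(with_repl))
distinct_without = len(np.unique(without_repl))

# expected number of distinct indices when drawing with replacement
expected_distinct = pool_size * (1 - (1 - 1 / pool_size) ** sample_size)
```

With these sizes, roughly 8,970 of the 10,000 drawn indices are expected to be distinct. Passing `replace=False` would give every reference model exactly 10,000 distinct training images.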

## Information Sources

We can now define two `InformationSource` objects. An information source is an abstraction representing a set of models together with their corresponding datasets. Note that for the `ReferenceMetric` we use the same dataset in both the target and the reference information sources; only the models used to query that dataset differ.

```python
target_info_source = InformationSource(
    models=[target_model],
    datasets=[target_dataset]
)

reference_info_source = InformationSource(
    models=reference_models,
    datasets=[target_dataset]
)
```
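Both information sources are queried for the same signal: the loss of each model on each point of `target_dataset`. Stacking these losses gives a `(num_models, num_points)` matrix. `tf.keras.losses.CategoricalCrossentropy()` averages over the batch by default, so one loss per point requires an unreduced computation. How `TensorflowModel` does this internally is not shown in this tutorial. The NumPy function below is a stand-in that computes per-example categorical cross-entropy, clipping probabilities away from 0 and 1 much as Keras does.

```python
import numpy as np

def per_example_cce(y_true, y_prob, eps=1e-7):
    """Per-example categorical cross-entropy: -sum_c y_c * log(p_c)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_prob), axis=-1)

# one model's softmax outputs on 2 examples with 3 classes (toy numbers)
y_true = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
y_prob = np.array([[0.2, 0.7, 0.1], [0.5, 0.25, 0.25]])
losses = per_example_cce(y_true, y_prob)  # [-log 0.7, -log 0.5]

# a second (uniform) model; stacking rows gives a (num_models, num_points) loss matrix
loss_matrix = np.stack([losses, per_example_cce(y_true, np.full((2, 3), 1 / 3))])
```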

## Metric and Audit

We now create a `Metric` object, which is an abstraction representing an algorithm that measures some property of an `InformationSource`, such as membership information leakage. In this case, we use the `ReferenceMetric` to measure the membership information leakage of `target_info_source` in a black-box setting, using the loss values that the reference models in `reference_info_source` return on the target dataset.
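The idea behind a reference attack is to calibrate a separate threshold for every point. A point is flagged as a member when the target model's loss on it is unusually low compared with the losses assigned by the reference models, none of which saw `target_dataset` during training. The NumPy sketch below assumes each per-point threshold is the `fpr_tolerance`-quantile of the reference losses on that point. It illustrates the idea only and is not the `ReferenceMetric` source, whose quantile method and tie handling may differ.

```python
import numpy as np

def reference_attack(target_losses, reference_losses, fpr_tolerance):
    """Flag point i as a member if the target model's loss on it is at most the
    fpr_tolerance-quantile of the reference models' losses on the same point."""
    thresholds = np.quantile(reference_losses, fpr_tolerance, axis=0)  # one per point
    return target_losses <= thresholds

# toy losses: 4 reference models (rows) x 3 points (columns)
reference_losses = np.array([
    [2.0, 1.0, 3.0],
    [2.5, 1.5, 3.5],
    [3.0, 2.0, 4.0],
    [3.5, 2.5, 4.5],
])
target_losses = np.array([1.0, 2.2, 4.4])
preds = reference_attack(target_losses, reference_losses, fpr_tolerance=0.5)
```

A larger `fpr_tolerance` raises every threshold. More points are then flagged, which trades a higher false positive rate for a higher true positive rate. At `fpr_tolerance=1.0`, all three toy points are flagged.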

The `Audit` object is a wrapper that runs the audit and displays the results. More visualization options will be added soon.

Since we use the default version of the `ReferenceMetric`, we pass the `MetricEnum.REFERENCE` enum value as the `metric` argument of the `Audit` object.
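Each audit result contains a confusion matrix over the 5,000 members (positives) and 5,000 non-members (negatives), and the other reported numbers follow from it. The printed values are consistent with ROC AUC being computed from hard member/non-member decisions. For a single operating point, the ROC curve consists of two segments, and its area is the average of TPR and TNR. With balanced classes this equals accuracy, which would explain why the two lines agree up to floating-point rounding. The plain-Python sketch below recomputes them from the first confusion matrix in the audit output.

```python
def summarize(tn, fp, fn, tp):
    """Accuracy, FPR, TPR, and the ROC AUC of hard 0/1 predictions."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    fpr = fp / (fp + tn)          # non-members wrongly flagged as members
    tpr = tp / (tp + fn)          # members correctly flagged
    auc = (tpr + (1 - fpr)) / 2   # area under the ROC curve through (0,0), (fpr,tpr), (1,1)
    return accuracy, fpr, tpr, auc

# TN, FP, FN, TP reported for the smallest FPR tolerance
accuracy, fpr, tpr, auc = summarize(4552, 448, 2659, 2341)
```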

```python
audit_obj = Audit(
    metric=MetricEnum.REFERENCE,
    target_info_source=target_info_source,
    reference_info_source=reference_info_source,
    fpr_tolerance_list=fpr_tolerance_list
)
audit_obj.prepare()
```

```python
audit_results = audit_obj.run()
for result in audit_results:
    print(result)
```

```
Results are stored in: /Users/aadyaamaddi/Desktop/ML Privacy Meter/privacy_meter/docs/log_2022-04-06_18-42-22
Accuracy          = 0.6893
ROC AUC Score     = 0.6892999999999999
FPR               = 0.0896
TN, FP, FN, TP    = (4552, 448, 2659, 2341)
Accuracy          = 0.6893
ROC AUC Score     = 0.6892999999999999
FPR               = 0.0896
TN, FP, FN, TP    = (4552, 448, 2659, 2341)
Accuracy          = 0.7331
ROC AUC Score     = 0.7331
FPR               = 0.159
TN, FP, FN, TP    = (4205, 795, 1874, 3126)
Accuracy          = 0.7451
ROC AUC Score     = 0.7451000000000001
FPR               = 0.2292
TN, FP, FN, TP    = (3854, 1146, 1403, 3597)
Accuracy          = 0.7422
ROC AUC Score     = 0.7421999999999999
FPR               = 0.3032
TN, FP, FN, TP    = (3484, 1516, 1062, 3938)
Accuracy          = 0.7274
ROC AUC Score     = 0.7273999999999999
FPR               = 0.382
TN, FP, FN, TP    = (3090, 1910, 816, 4184)
Accuracy          = 0.7139
ROC AUC Score     = 0.7139000000000001
FPR
```