# Attack Runner Tutorial
This notebook shows how to use the attack runner script. Its goal is to run one MIA attack and one Direct GMIA attack and to build a single attack report covering both attacks.

Each attack configuration is based on the configuration from the corresponding tutorial:

<table class="tfo-notebook-buttons" align="left">
        <td>
        <a href="https://colab.research.google.com/github/hallojs/ml-pepr/blob/master/notebooks/mia_tutorial.ipynb"><img src="https://colab.research.google.com/img/colab_favicon_256px.png" width="42" height="42" />MIA Tutorial</a>
        </td>
        <td>
        <a href="https://colab.research.google.com/github/hallojs/ml-pepr/blob/master/notebooks/gmia_tutorial.ipynb"><img src="https://colab.research.google.com/img/colab_favicon_256px.png" width="42" height="42" />Direct GMIA Tutorial</a>
        </td>
</table>

## Prepare Environment
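**Important: Restart the runtime after running this cell!**
The restart is needed because `pip install -e` installs `ml-pepr` in editable mode, and the running Python process only picks up the newly installed package after a restart.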

In [None]:
!git clone https://github.com/hallojs/ml-pepr.git
%pip install -e ml-pepr
%pip install pylatex
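After the restart, a quick way to confirm that the editable install is visible to the new Python process is to ask the import system whether it can locate the `pepr` package. The helper name `is_importable` is our own; it only wraps the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the top-level module `module_name` can be found."""
    # find_spec returns None when a top-level module cannot be located
    return importlib.util.find_spec(module_name) is not None


# After the runtime restart this should report True for the editable install
print("pepr importable:", is_importable("pepr"))
```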

## Imports
Note: These are the imports needed by the notebook itself. If a function passed to the attack runner, such as a `create_model` function, needs additional imports, place them inside the function body. This way the imports are evaluated when the attack runner calls the function during execution.

In [None]:
from pepr import attack_runner

import tensorflow as tf

import numpy as np
import logging
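The note on imports can be illustrated without pepr. We don't know exactly how the attack runner invokes the functions it receives; the sketch below only imitates a caller that does not share the notebook's global namespace, by rebuilding each function with an empty set of globals via `types.FunctionType`. The function with its own `import` statement still works, while the one relying on an outer import raises `NameError`. All function names here are hypothetical.

```python
import builtins
import types


def uses_global_json():
    # Relies on a module-level `json` import existing in the caller's namespace
    return json.dumps({"ok": True})


def uses_local_json():
    # Imports what it needs at call time, independent of outer imports
    import json
    return json.dumps({"ok": True})


def call_with_fresh_globals(func):
    """Rebuild `func` with an empty global namespace (builtins only) and call it."""
    fresh_globals = {"__builtins__": builtins}
    isolated = types.FunctionType(func.__code__, fresh_globals, func.__name__)
    return isolated()


print(call_with_fresh_globals(uses_local_json))  # {"ok": true}
try:
    call_with_fresh_globals(uses_global_json)
except NameError as err:
    print("fails:", err)
```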

## Setup Logging

In [None]:
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s', '%Y-%m-%d %H:%M:%S')

# TensorFlow Logger
file_handler_tf = logging.FileHandler('tf.log')
file_handler_tf.setLevel(logging.INFO)
file_handler_tf.setFormatter(formatter)

logger_tf = tf.get_logger()
logger_tf.setLevel(logging.INFO)
logger_tf.addHandler(file_handler_tf)

# PePR Logger
level = logging.DEBUG
stream_handler_pr = logging.StreamHandler()
stream_handler_pr.setLevel(level)
stream_handler_pr.setFormatter(formatter)

# -- Add MIA logger
file_handler_pr = logging.FileHandler('pepr.privacy.mia.log')
file_handler_pr.setLevel(level)
file_handler_pr.setFormatter(formatter)
logger_pr = logging.getLogger('pepr.privacy.mia')
logger_pr.setLevel(level)  # otherwise the level is inherited from the root logger (WARNING by default)
logger_pr.addHandler(file_handler_pr)
logger_pr.addHandler(stream_handler_pr)

# -- Add GMIA logger
file_handler_pr = logging.FileHandler('pepr.privacy.gmia.log')
file_handler_pr.setLevel(level)
file_handler_pr.setFormatter(formatter)
logger_pr = logging.getLogger('pepr.privacy.gmia')
logger_pr.setLevel(level)
logger_pr.addHandler(file_handler_pr)
logger_pr.addHandler(stream_handler_pr)

# -- Add attack runner logger
file_handler_pr = logging.FileHandler('pepr.attack_runner.log')
file_handler_pr.setLevel(level)
file_handler_pr.setFormatter(formatter)
logger_pr = logging.getLogger('pepr.attack_runner')
logger_pr.setLevel(level)
logger_pr.addHandler(file_handler_pr)
logger_pr.addHandler(stream_handler_pr)
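The three PePR sections of the logging cell repeat the same steps. A small helper, sketched below as an alternative, attaches a file handler and a shared stream handler to each named logger and also sets the logger's own level. `attach_handlers` is a hypothetical name; use it instead of the PePR part of the cell above, not in addition to it, since attaching the handlers twice duplicates every log line.

```python
import logging


def attach_handlers(logger_name, level, log_file, stream_handler, formatter):
    """Attach a file handler and a shared stream handler to a named logger."""
    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(level)
    file_handler.setFormatter(formatter)

    logger = logging.getLogger(logger_name)
    logger.setLevel(level)  # without this the level is inherited from the root logger
    logger.addHandler(file_handler)
    logger.addHandler(stream_handler)
    return logger


formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                              '%Y-%m-%d %H:%M:%S')
shared_stream_handler = logging.StreamHandler()
shared_stream_handler.setLevel(logging.DEBUG)
shared_stream_handler.setFormatter(formatter)

# One log file per logger, named after the logger as in the original cell
for name in ("pepr.privacy.mia", "pepr.privacy.gmia", "pepr.attack_runner"):
    attach_handlers(name, logging.DEBUG, name + ".log", shared_stream_handler, formatter)
```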

## Functions
Functions for creating the models and preparing the datasets.

### MIA Functions
Define functions used by the MIA configuration.

In [None]:
def get_target_model(input_shape, number_of_labels):
    target_model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3,3), activation="tanh", padding='same', input_shape=input_shape),
        tf.keras.layers.MaxPool2D((2,2)),
        tf.keras.layers.Conv2D(64, (3,3), activation="tanh", padding='same'),
        tf.keras.layers.MaxPool2D((2,2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="tanh"),
        tf.keras.layers.Dense(number_of_labels),
        tf.keras.layers.Softmax()
    ])
    return target_model

def get_attack_model(number_of_labels):
    attack_model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(number_of_labels,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    return attack_model

def create_compile_shadow_model():
    """Create compiled target/shadow model.

    Returns
    -------
    tensorflow.python.keras.engine.sequential.Sequential
        A compiled tensorflow model.
    """

    from tensorflow.keras import models
    from tensorflow.keras import optimizers

    input_shape = (32, 32, 3)
    number_classes = 100

    model = get_target_model(input_shape, number_classes)

    optimizer = optimizers.Adam(learning_rate=0.0001)
    loss = 'sparse_categorical_crossentropy'
    metrics = ["accuracy"]
    model.compile(optimizer, loss=loss, metrics=metrics)

    return model

def create_compile_attack_model():
    """Create compiled attack model.

    Returns
    -------
    tensorflow.python.keras.engine.sequential.Sequential
        A compiled tensorflow model.
    """

    from tensorflow.keras import models
    from tensorflow.keras import optimizers

    number_classes = 100

    model = get_attack_model(number_classes)

    optimizer = optimizers.Adam(learning_rate=0.0001)
    loss = 'binary_crossentropy'
    metrics = ["accuracy"]
    model.compile(optimizer, loss=loss, metrics=metrics)

    return model
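The attack model takes a prediction vector of length `number_classes` and outputs a single sigmoid probability, trained with binary cross-entropy. In the shadow-model approach, its training data consists of shadow-model predictions on records the shadow model was trained on (label 1, "in") and on records it was not trained on (label 0, "out"). The NumPy sketch below assembles such a dataset from toy prediction vectors; `build_attack_dataset` is a hypothetical name, and pepr's internal procedure (for example, whether it trains a separate attack model per class) may differ.

```python
import numpy as np


def build_attack_dataset(member_predictions, nonmember_predictions):
    """Stack shadow-model prediction vectors into an attack training set.

    Members get label 1 and non-members label 0, matching the attack
    model's sigmoid output and binary cross-entropy loss.
    """
    features = np.vstack((member_predictions, nonmember_predictions))
    labels = np.concatenate((
        np.ones(len(member_predictions), dtype=np.int32),
        np.zeros(len(nonmember_predictions), dtype=np.int32),
    ))
    return features, labels


# Toy example: 3 member and 2 non-member softmax-like vectors over 4 classes
rng = np.random.default_rng(0)
members = rng.dirichlet(np.ones(4), size=3)
nonmembers = rng.dirichlet(np.ones(4), size=2)
x_attack, y_attack = build_attack_dataset(members, nonmembers)
print(x_attack.shape, y_attack)  # (5, 4) [1 1 1 0 0]
```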

def load_cifar100():
    """Loads and preprocesses the CIFAR100 dataset.

    Returns
    -------
    tuple
        (data, labels) with the training and test splits concatenated.
    """
    train, test = tf.keras.datasets.cifar100.load_data()
    train_data, train_labels = train
    test_data, test_labels = test

    # Normalize the data to a range between 0 and 1
    train_data = np.array(train_data, dtype=np.float32) / 255
    test_data = np.array(test_data, dtype=np.float32) / 255

    # Reshape the images to (32, 32, 3)
    train_data = train_data.reshape(train_data.shape[0], 32, 32, 3)
    test_data = test_data.reshape(test_data.shape[0], 32, 32, 3)

    train_labels = np.reshape(np.array(train_labels, dtype=np.int32), (train_labels.shape[0],))
    test_labels = np.reshape(np.array(test_labels, dtype=np.int32), (test_labels.shape[0],))

    return np.vstack((train_data, test_data)), np.hstack((train_labels, test_labels))
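Both loaders return a single merged array rather than separate splits, which is why the index ranges used later refer to positions in the concatenated data: CIFAR-100 has 50,000 training and 10,000 test images, Fashion-MNIST 60,000 and 10,000. The sketch below repeats the same preprocessing steps on small random arrays so the shapes can be checked without downloading anything. `preprocess_and_merge` is a hypothetical helper; it stacks before scaling, which gives the same result.

```python
import numpy as np


def preprocess_and_merge(train, test, image_shape):
    """Scale images to [0, 1], flatten labels to 1-D and merge both splits."""
    (train_data, train_labels), (test_data, test_labels) = train, test
    data = np.vstack((train_data, test_data)).astype(np.float32) / 255
    data = data.reshape((-1,) + image_shape)
    # CIFAR-100 labels come as shape (n, 1); reshape(-1) turns them into (n,)
    labels = np.concatenate((train_labels, test_labels)).reshape(-1).astype(np.int32)
    return data, labels


rng = np.random.default_rng(0)
fake_train = (rng.integers(0, 256, size=(6, 32, 32, 3), dtype=np.uint8),
              rng.integers(0, 100, size=(6, 1)))
fake_test = (rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8),
             rng.integers(0, 100, size=(4, 1)))
demo_data, demo_labels = preprocess_and_merge(fake_train, fake_test, (32, 32, 3))
print(demo_data.shape, demo_labels.shape)  # (10, 32, 32, 3) (10,)
```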

### Direct GMIA Functions
Define functions used by the GMIA configuration.

In [None]:
def create_model(input_shape, n_categories):
    """Architecture of the target and reference models.

    Parameters
    ----------
    input_shape : tuple
        Dimensions of the input for the target/reference models.
    n_categories : int
        Number of categories for the prediction.

    Returns
    -------
    tensorflow.python.keras.engine.sequential.Sequential
        A convolutional neural network model.
    """

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import MaxPooling2D
    from tensorflow.keras.layers import Conv2D
    from tensorflow.keras.layers import Activation
    from tensorflow.keras.layers import Dropout
    from tensorflow.keras.layers import Flatten
    from tensorflow.keras.layers import Dense

    model = Sequential()

    # first convolution layer
    model.add(Conv2D(filters=32, kernel_size=(5, 5), strides=(
        1, 1), padding='same', input_shape=input_shape))
    model.add(Activation('relu'))

    # max pooling layer
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'))

    # second convolution layer
    model.add(Conv2D(filters=64, kernel_size=(
        5, 5), strides=(1, 1), padding='same'))
    model.add(Activation('relu'))

    # max pooling layer
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'))

    # fully connected layer
    model.add(Flatten())
    model.add(Dense(1024))
    model.add(Activation('relu'))

    # drop out
    model.add(Dropout(rate=0.5))

    # fully connected layer
    model.add(Dense(n_categories))
    model.add(Activation('softmax'))

    return model

def create_compile_model():
    """Create compiled model.

    At the moment pepr.gmia needs this function to train the reference models.

    Returns
    -------
    tensorflow.python.keras.engine.sequential.Sequential
        A compiled tensorflow model.
    """

    from tensorflow.keras import models
    from tensorflow.keras import optimizers

    input_shape = (28, 28, 1)
    number_classes = 10

    model = create_model(input_shape, number_classes)

    optimizer = optimizers.Adam(learning_rate=0.0001)
    loss = 'categorical_crossentropy'
    metrics = ["accuracy"]
    model.compile(optimizer, loss=loss, metrics=metrics)

    return model

def load_fashion_mnist():
    """Loads and preprocesses the Fashion-MNIST dataset.

    Returns
    -------
    tuple
        (data, labels) with the training and test splits concatenated.
    """

    train, test = tf.keras.datasets.fashion_mnist.load_data()
    train_data, train_labels = train
    test_data, test_labels = test

    # Normalize the data to a range between 0 and 1
    train_data = np.array(train_data, dtype=np.float32) / 255
    test_data = np.array(test_data, dtype=np.float32) / 255

    # Reshape the images to (28, 28, 1)
    train_data = train_data.reshape(train_data.shape[0], 28, 28, 1)
    test_data = test_data.reshape(test_data.shape[0], 28, 28, 1)

    train_labels = np.array(train_labels, dtype=np.int32)
    test_labels = np.array(test_labels, dtype=np.int32)

    return np.vstack((train_data, test_data)), np.hstack((train_labels, test_labels))

## Train Target Models
Train and save the target models for the attacks.

In [None]:
data_c100, labels_c100 = load_cifar100()
target_model = create_compile_shadow_model()
target_model.fit(data_c100[40000:50000],
                 labels_c100[40000:50000],
                 epochs=1,
                 batch_size=50,
                 verbose=0)
target_model.save('data/target_model_mia')

In [None]:
data_fmnist, labels_fmnist = load_fashion_mnist()
target_model = create_compile_model()
target_model.fit(data_fmnist[40000:50000],
                 tf.keras.utils.to_categorical(labels_fmnist[40000:50000], num_classes=10),
                 epochs=1,
                 batch_size=50,
                 verbose=0)
target_model.save('data/target_model_gmia')
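The two target models use different label encodings: the MIA target model is compiled with `sparse_categorical_crossentropy` and trained on integer labels, while the GMIA target model uses `categorical_crossentropy`, so its labels are first one-hot encoded with `tf.keras.utils.to_categorical`. The NumPy sketch below shows what that conversion produces for 1-D integer labels; `one_hot` is a hypothetical helper, not part of pepr or Keras.

```python
import numpy as np


def one_hot(labels, num_classes):
    """One-hot encode 1-D integer labels (row i has a 1 at column labels[i])."""
    return np.eye(num_classes, dtype=np.float32)[labels]


encoded = one_hot(np.array([0, 2, 1]), 3)
print(encoded)
```

Each row contains a single 1 at the position of its class, which is the target format `categorical_crossentropy` expects.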

## Attack Runner Configuration

### YAML Configuration file
The attack runner configuration is described in a YAML file. The attack runner parses it, loads the stored arrays and executes all specified attacks. More information on how to write a configuration file can be found in the [documentation](https://hallojs.github.io/ml-pepr/attack_runner.html).

Our attack runner configuration for this example looks like this:

```yaml
# Attack Parameters
attack_pars:
  - attack_type: "mia"
    attack_alias: "MIA Tutorial"
    number_shadow_models: 100
    shadow_training_set_size: 2500
    path_to_dataset_data: "datasets/cifar100_data.npy"
    path_to_dataset_labels: "datasets/cifar100_labels.npy"
    number_classes: 100
    <fn>create_compile_shadow_model: "create_compile_shadow_model"
    shadow_epochs: 100
    shadow_batch_size: 50
    <fn>create_compile_attack_model: "create_compile_attack_model"
    attack_epochs: 50
    attack_batch_size: 50
    target_model_paths:
    - "data/target_model_mia"
  - attack_type: "gmia"
    attack_alias: "GMIA Tutorial"
    number_reference_models: 100
    reference_training_set_size: 10000
    path_to_dataset_data: "datasets/fmnist_data.npy"
    path_to_dataset_labels: "datasets/fmnist_labels.npy"
    number_classes: 10
    <fn>create_compile_model: "create_compile_reference_model"
    reference_epochs: 50
    reference_batch_size: 50
    hlf_metric: "cosine"
    hlf_layer_number: 10
    number_target_records: 25
    target_model_paths:
    - "data/target_model_gmia"

# Data Configuration
data_conf:
  - <np>shadow_indices: "datasets/shadow_ref_indices.npy"
    <np>target_indices: "datasets/target_indices.npy"
    <np>evaluation_indices: "datasets/evaluation_indices.npy"
    <np>record_indices_per_target: "datasets/record_indices_per_target.npy"
  - <np>reference_indices: "datasets/shadow_ref_indices.npy"
    <np>target_indices: "datasets/target_indices.npy"
    <np>evaluation_indices: "datasets/evaluation_indices.npy"
    <np>record_indices_per_target: "datasets/record_indices_per_target.npy"
```
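Keys prefixed with `<np>` point to arrays stored with `np.save`, which the attack runner loads while parsing the file. The sketch below shows one plausible way such a prefix could be resolved; `resolve_np_entries` is a hypothetical helper and pepr's actual parsing may differ. The `<fn>` prefix is handled differently, through the function map.

```python
import os
import tempfile

import numpy as np

PREFIX_NP = "<np>"


def resolve_np_entries(conf):
    """Hypothetical resolver: load every '<np>'-prefixed value with np.load.

    The prefix is stripped from the key; other keys are copied unchanged.
    """
    resolved = {}
    for key, value in conf.items():
        if key.startswith(PREFIX_NP):
            resolved[key[len(PREFIX_NP):]] = np.load(value)
        else:
            resolved[key] = value
    return resolved


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "target_indices.npy")
    np.save(path, np.arange(40000, 50000))
    resolved_conf = resolve_np_entries({"<np>target_indices": path, "attack_alias": "demo"})

print(resolved_conf["target_indices"][:3], resolved_conf["attack_alias"])  # [40000 40001 40002] demo
```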

### Save datasets
Save arrays that are referenced by the attack runner configuration.

In [None]:
!mkdir -p datasets
np.save("datasets/shadow_ref_indices", np.arange(40000))
np.save("datasets/target_indices", np.arange(40000, 50000))
np.save("datasets/evaluation_indices", np.arange(40000, 60000))
np.save("datasets/record_indices_per_target", np.array([np.arange(10000)]))

np.save("datasets/cifar100_data", data_c100)
np.save("datasets/cifar100_labels", labels_c100)

np.save("datasets/fmnist_data", data_fmnist)
np.save("datasets/fmnist_labels", labels_fmnist)

# Optional: Free memory
del target_model, data_c100, labels_c100, data_fmnist, labels_fmnist
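The saved index arrays define how the records are split. The check below recomputes the same ranges and verifies two properties this setup appears to rely on: the shadow/reference indices do not overlap the target models' training indices, and the evaluation indices contain equal numbers of records the target models were trained on (40000 to 49999) and records they never saw (50000 to 59999). This is an illustrative sanity check, not part of the attack runner.

```python
import numpy as np

shadow_ref_indices = np.arange(40000)
target_indices = np.arange(40000, 50000)
evaluation_indices = np.arange(40000, 60000)

# Shadow/reference models never see the target models' training records
overlap = np.intersect1d(shadow_ref_indices, target_indices)
print("overlap:", overlap.size)  # overlap: 0

# Evaluation records: half members (target training data), half non-members
is_member = np.isin(evaluation_indices, target_indices)
print(is_member.sum(), (~is_member).sum())  # 10000 10000
```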

### Function Map
The attack runner can only call functions defined in the notebook if it is given references to them. These references are passed in a dictionary whose keys are the names the attack runner configuration refers to. For example, `<fn>create_compile_model: "create_compile_reference_model"` is resolved to `create_compile_model: create_compile_model` with the function mapping below.

In [None]:
functions = {
    "create_compile_reference_model": create_compile_model,
    "create_compile_shadow_model": create_compile_shadow_model,
    "create_compile_attack_model": create_compile_attack_model,
}
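To make the resolution step concrete, here is a hypothetical resolver that treats `<fn>`-prefixed keys as described above: the prefix is stripped and the string value is replaced by the function stored under that name in the map. This sketches the idea only, not pepr's implementation; `resolve_fn_entries` and `build_stub_model` are illustrative names.

```python
PREFIX_FN = "<fn>"


def resolve_fn_entries(conf, function_map):
    """Hypothetical resolver: '<fn>key: "name"' becomes key: function_map["name"]."""
    resolved = {}
    for key, value in conf.items():
        if key.startswith(PREFIX_FN):
            # Raises KeyError if the configuration names an unmapped function
            resolved[key[len(PREFIX_FN):]] = function_map[value]
        else:
            resolved[key] = value
    return resolved


def build_stub_model():
    # Stand-in for create_compile_model so the sketch runs without TensorFlow
    return "model"


demo = resolve_fn_entries(
    {"<fn>create_compile_model": "create_compile_reference_model", "reference_epochs": 50},
    {"create_compile_reference_model": build_stub_model},
)
print(demo["create_compile_model"](), demo["reference_epochs"])  # model 50
```

Because a missing name raises `KeyError`, every name used after a `<fn>` key in the configuration must appear in the map.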

## Run Attacks

In [None]:
attack_paths = attack_runner.run_attacks(
    "ml-pepr/notebooks/attack_runner_tutorial/attack_runner_config.yml",
    "attack_objects",
    functions,
)

attack_runner.create_report(attack_paths, "report")

In [None]:
# Zip the report directory if you want to download it from Google Colab
!zip -r -q report.zip report
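If the `zip` command is not available, the standard library can create the same archive. Calling `shutil.make_archive` with `root_dir` set to the parent directory and `base_dir` set to the folder name keeps the `report/` prefix inside the archive, like `zip -r`. `zip_directory` is a hypothetical helper name.

```python
import os
import shutil


def zip_directory(directory, archive_base):
    """Create '<archive_base>.zip' containing `directory`, prefixed with its name."""
    absolute = os.path.abspath(directory)
    parent, name = os.path.dirname(absolute), os.path.basename(absolute)
    return shutil.make_archive(archive_base, "zip", root_dir=parent, base_dir=name)


# Only archive the report if the attack runner has produced it
if os.path.isdir("report"):
    print(zip_directory("report", "report"))
```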