# Shadow Model Attack
## Intuition
The shadow model attack, presented by Shokri et al. ([arXiv:1610.05820](https://arxiv.org/abs/1610.05820)), was the first membership inference attack against machine learning models.
The intuition behind this attack is that models often behave differently on the data that they were trained on versus the data they "see" for the first time.
The simplest way to exploit this is to train a classifier that distinguishes model outputs computed on training data from those computed on non-training data.
But how does the attacker know whether a sample was used for training or not?


To solve this conundrum, the attacker trains **shadow models** (proxies for the real model), for which they know the membership of every sample and can easily collect the corresponding model outputs.


The attacker trains _k_ shadow models on their own data, then queries each shadow model with both training and non-training samples.
From the responses, the attacker builds a dataset containing each sample's _true label_, the shadow model's _prediction_, and a binary label indicating whether the sample was part of that shadow model's training data (_in_) or not (_out_).
The attack model, which can be any kind of binary classifier, is trained on this dataset.
Its input is the (_true label_, _prediction_) tuple and its output is _in_ or _out_ (meaning the sample is _in_ the training dataset or not).

![title](img/shadow_model_attack.png)
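The steps described above can be sketched without ART. The following is a minimal illustration, not the implementation used later in this notebook: it assumes scikit-learn is available, uses a synthetic dataset in place of the attacker's data, and uses decision trees as shadow models. The helper name `build_attack_dataset` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def build_attack_dataset(x, y, nb_shadow_models=3, shadow_size=200, seed=0):
    """Train shadow models and collect (true label, prediction, in/out) records."""
    rng = np.random.default_rng(seed)
    nb_classes = len(np.unique(y))
    features, membership = [], []
    for _ in range(nb_shadow_models):
        # Draw disjoint "in" (training) and "out" (held-out) samples for this shadow model.
        idx = rng.permutation(len(x))[: 2 * shadow_size]
        idx_in, idx_out = idx[:shadow_size], idx[shadow_size:]
        shadow = DecisionTreeClassifier(random_state=0).fit(x[idx_in], y[idx_in])
        for indices, is_member in ((idx_in, 1), (idx_out, 0)):
            one_hot = np.eye(nb_classes)[y[indices]]  # true label, one-hot encoded
            probs = shadow.predict_proba(x[indices])  # shadow model prediction
            features.append(np.hstack([one_hot, probs]))
            membership.append(np.full(len(indices), is_member))
    return np.vstack(features), np.concatenate(membership)


# Synthetic stand-in for the attacker's own data (hypothetical, not MNIST).
x, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
attack_x, attack_y = build_attack_dataset(x, y)

# The attack model maps (true label, prediction) to in (1) or out (0).
attack_model = RandomForestClassifier(n_estimators=50, random_state=0)
attack_model.fit(attack_x, attack_y)
```

To attack a target model, the attacker would query it with a sample, combine the sample's true label with the returned prediction in the same layout, and pass the result to `attack_model.predict`.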

## Overview
The following sections show how to implement the attack using the Adversarial Robustness Toolbox (ART).
#### 1. [Preliminaries](#preliminaries)
1. [Load data and target model](#load)
2. [Wrap model in ART classifier wrapper](#wrap)

#### 2. [Attack](#attack)
1. [Define shadow model](#shadow)
2. [Instantiate attack](#instantiate)
3. [Fit the attack on shadow data](#fit)
4. [Infer membership on evaluation data](#infer)

In [1]:
import torch
from torch import nn
from models.mnist import Net

<a id='preliminaries'></a>
## Preliminaries

<a id='load'></a>
### Load data and target model

In [2]:
import numpy as np
import os
import sys
sys.path.insert(0, os.path.abspath('..'))

from art.utils import load_mnist

# data
(x_train, y_train), (x_test, y_test), _min, _max = load_mnist(raw=True)

# limit training data to 50000 samples
x_train_target = np.expand_dims(x_train, axis=1).astype(np.float32)[:50000]
y_train_target = y_train[:50000]
x_test = np.expand_dims(x_test, axis=1).astype(np.float32)

# shadow data (the remaining 10 000 samples, disjoint from the target's training data)
x_train_shadow = np.expand_dims(x_train, axis=1).astype(np.float32)[50000:]
y_train_shadow = y_train[50000:]

<a id='wrap'></a>
### Wrap model in PyTorchClassifier

In [3]:
import torch.optim as optim
from art.estimators.classification.pytorch import PyTorchClassifier

model = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

art_model = PyTorchClassifier(model=model, loss=criterion, optimizer=optimizer, channels_first=True, input_shape=(1,28,28,), nb_classes=10, clip_values=(_min,_max))

### Fit model if not already pretrained

In [4]:
art_model.fit(x_train_target, y_train_target, nb_epochs=10)

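#### Training accuracy

In [7]:
# predicted class = index of the highest score per sample
pred = np.argmax(art_model.predict(x_train_target), axis=1)

# note: this is the accuracy on the target's own training data, not on held-out test data
print('Base model accuracy: ', np.mean(pred == y_train_target))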

Base model accuracy:  0.99304


<a id='attack'></a>
## Attack

<a id='shadow'></a>
### Define shadow model

We define a shadow model that mimics the behaviour of the target model. We assume a black-box attacker who knows neither the architecture nor the model and training parameters of the target model, so the shadow model does not need to share the target's architecture.

In [38]:
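import torch.nn.functional as F


class Linear_Net(nn.Module):
    """Small fully connected network used as the shadow model."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)  # 28 * 28 input pixels -> 128 hidden units
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(128, 10)  # 128 hidden units -> 10 digit classes

    def forward(self, x):
        x = torch.flatten(x, 1)  # (N, 1, 28, 28) -> (N, 784)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        # log-probabilities per class
        output = F.log_softmax(x, dim=1)
        return output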
    
shadow_model = Linear_Net()

optimizer = optim.Adam(shadow_model.parameters())

art_shadow_model = PyTorchClassifier(shadow_model, loss=criterion, optimizer=optimizer, channels_first=True, input_shape=(1,28,28,), nb_classes=10, clip_values=(_min,_max))

<a id='instantiate'></a>
### Instantiate attack

Inputs to the attack:
- target model
- shadow model
- number of shadow models to create
- number of samples that the shadow models train on
- input type ("loss" or "prediction")
- attack model type ("rf" for Random Forest, "gb" for Gradient Boosting, or "nn" for neural network)

You can also define your own custom attacker model and pass it via the `attack_model` argument; if it is provided, `attack_model_type` is ignored.
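To make the two `input_type` options listed above more concrete, here is a minimal NumPy sketch of what a "prediction" feature and a "loss" feature could look like for the same samples. It assumes the model outputs class probabilities and that the loss is the per-sample cross-entropy; how `ShadowModelAttack` computes these internally is not shown in this notebook, and the helper `per_sample_cross_entropy` is hypothetical.

```python
import numpy as np


def per_sample_cross_entropy(probs, labels, eps=1e-12):
    """Cross-entropy of each sample given predicted probabilities and integer labels."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.log(probs[np.arange(len(labels)), labels])


# Hypothetical model outputs for two samples whose true class is 0.
probs = np.array([[0.98, 0.01, 0.01],   # confident, typical for a training sample
                  [0.40, 0.35, 0.25]])  # uncertain, typical for an unseen sample
labels = np.array([0, 0])

prediction_features = probs  # one probability vector per sample
loss_features = per_sample_cross_entropy(probs, labels).reshape(-1, 1)  # one scalar per sample
```

The loss compresses each prediction into a single number, while the full prediction vector keeps the complete shape of the model's output.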

In [39]:
from art.attacks.inference.membership_inference import ShadowModelAttack

shadow_model_atk = ShadowModelAttack(
    art_model,
    shadow_model=art_shadow_model,
    nb_shadow_models=5,
    shadow_dataset_size=3000,
    input_type="prediction",
    attack_model_type="rf"
)

<a id='fit'></a>
### Fit the attack on shadow data

In [40]:
shadow_model_atk.fit(
    x_train_shadow,
    y_train_shadow,
    x_test,
    y_test,
    nb_epochs=40,
)

Accuracy of shadow model (on test set): 0.919333
Accuracy of shadow model (on test set): 0.908667
Accuracy of shadow model (on test set): 0.908667
Accuracy of shadow model (on test set): 0.924000
Accuracy of shadow model (on test set): 0.913000


<a id='infer'></a>
### Infer membership on evaluation data

We evaluate the attack on the first 10 000 training samples of the target model (members) and the 10 000 test samples (non-members).

In [41]:
# ground truth: 1 = member (training sample), 0 = non-member (test sample)
membership = [1] * 10000 + [0] * 10000

inferred_membership = shadow_model_atk.infer(
    np.concatenate([x_train_target[:10000], x_test]),
    np.concatenate([y_train_target[:10000], y_test]),
)

In [42]:
from sklearn.metrics import accuracy_score

acc = accuracy_score(membership, inferred_membership)

print("Shadow model attack accuracy: %f" % acc)

Shadow model attack accuracy: 0.695700
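
Accuracy alone does not show whether the attack errs on members or on non-members. The sketch below shows how scikit-learn's metrics can break the result down further. It uses small hypothetical arrays in place of the real ground truth and attack output, so the numbers are illustrative only and are not results of this notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical ground truth and attack output (1 = member, 0 = non-member).
membership = np.array([1, 1, 1, 1, 0, 0, 0, 0])
inferred = np.array([1, 1, 1, 1, 1, 1, 0, 0])

# Rows are true classes and columns are predicted classes, in the order [0, 1].
tn, fp, fn, tp = confusion_matrix(membership, inferred, labels=[0, 1]).ravel()

precision = precision_score(membership, inferred)  # share of "in" guesses that are correct
recall = recall_score(membership, inferred)        # share of true members that are found
false_positive_rate = fp / (fp + tn)               # share of non-members flagged as members

print("precision: %f, recall: %f, FPR: %f" % (precision, recall, false_positive_rate))
```

With the variables from the cells above, `membership` and `inferred_membership` could be passed to the same functions.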
