# Loss-based membership inference
## Intuition
This attack was presented by Yeom et al. (https://arxiv.org/abs/1709.01604).
The attack is simple: the attacker classifies a sample _x_ as a training sample (a member) if its prediction loss is smaller than the average loss over the training samples; otherwise, _x_ is classified as a non-training sample. To estimate that average loss, the attacker only needs to know a subset of the training samples.
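
The decision rule described above can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions, not ART's implementation: the helper names `cross_entropy_loss`, `fit_threshold`, and `infer_membership` are hypothetical, the loss is assumed to be cross-entropy, and the model is represented only by made-up predicted class probabilities.

```python
import numpy as np

def cross_entropy_loss(probs, labels, eps=1e-12):
    """Per-sample cross-entropy loss from predicted class probabilities."""
    probs = np.clip(probs, eps, 1.0)
    return -np.log(probs[np.arange(len(labels)), labels])

def fit_threshold(train_probs, train_labels):
    """Average loss over the training samples known to the attacker."""
    return cross_entropy_loss(train_probs, train_labels).mean()

def infer_membership(probs, labels, threshold):
    """Return 1 (member) where the loss is below the threshold, else 0."""
    return (cross_entropy_loss(probs, labels) < threshold).astype(int)

# Toy example with two classes (all values are made up for illustration).
known_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
known_labels = np.array([0, 1, 0])
threshold = fit_threshold(known_probs, known_labels)

query_probs = np.array([[0.95, 0.05], [0.5, 0.5]])
query_labels = np.array([0, 1])
membership = infer_membership(query_probs, query_labels, threshold)
```

In this toy example the first query sample is predicted confidently and correctly, so its loss falls below the threshold and it is flagged as a member; the second is not.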

## Overview
This notebook shows how to implement the attack using the Adversarial Robustness Toolbox (ART).
#### 1. [Preliminaries](#preliminaries)
1. [Load data and target model](#load)
2. [Wrap model in ART classifier wrapper](#wrap)

#### 2. [Attack](#attack)
1. [Instantiate attack](#instantiate)
2. [Fit the attack on known training samples](#fit)
3. [Infer membership on evaluation data](#infer)

In [1]:
import torch
from torch import nn
import numpy as np
from models.mnist import Net

<a id='preliminaries'></a>
## Preliminaries

<a id='load'></a>
### Load data and target model

In [2]:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))

from art.utils import load_mnist

# data
(x_train, y_train), (x_test, y_test), _min, _max = load_mnist(raw=True)

# limit training data to 1000 samples
x_train = np.expand_dims(x_train, axis=1).astype(np.float32)[:1000]
y_train = y_train[:1000]
x_test = np.expand_dims(x_test, axis=1).astype(np.float32)

<a id='wrap'></a>
### Wrap model in PyTorchClassifier

In [3]:
import torch.optim as optim
from art.estimators.classification.pytorch import PyTorchClassifier

model = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

art_model = PyTorchClassifier(
    model=model, loss=criterion, optimizer=optimizer, channels_first=True,
    input_shape=(1, 28, 28), nb_classes=10, clip_values=(_min, _max),
)

### Fit model if not already pretrained

In [4]:
art_model.fit(x_train, y_train, nb_epochs=10)

#### Test accuracy of the target model

In [5]:
pred = np.array([np.argmax(arr) for arr in art_model.predict(x_test)])

print('Base model accuracy: ', np.sum(pred == y_test) / len(y_test))

Base model accuracy:  0.8549


#### Train accuracy of the target model

In [6]:
pred = np.array([np.argmax(arr) for arr in art_model.predict(x_train)])

print('Base model accuracy: ', np.sum(pred == y_train) / len(y_train))

Base model accuracy:  0.944
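
The two accuracy cells share the same computation. The difference between the accuracy on `x_train` (0.944) and on `x_test` (0.8549) is about 0.089; Yeom et al. relate the success of membership inference to this kind of overfitting. A minimal sketch of the computation, using a hypothetical helper `accuracy` and made-up probability arrays in place of the outputs of `art_model.predict(...)`:

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of samples whose arg-max class matches the integer label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

# Made-up model outputs standing in for predictions on training and test data.
train_probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.8, 0.2], [0.1, 0.9]])
train_labels = np.array([0, 1, 0, 1])
test_probs = np.array([[0.6, 0.4], [0.7, 0.3], [0.4, 0.6], [0.2, 0.8]])
test_labels = np.array([0, 1, 0, 1])

train_acc = accuracy(train_probs, train_labels)
test_acc = accuracy(test_probs, test_labels)
gap = train_acc - test_acc  # larger gap = more overfitting
print(f"train={train_acc:.2f} test={test_acc:.2f} gap={gap:.2f}")
```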


<a id='attack'></a>
## Attack

<a id='instantiate'></a>
### Instantiate attack

Inputs to the attack:
- target model

In [7]:
from art.attacks.inference.membership_inference import MembershipInferenceBlackBoxLossBased

loss_based_attack = MembershipInferenceBlackBoxLossBased(art_model)

<a id='fit'></a>
### Fit

The attacker knows 500 of the 1000 training samples.

In [8]:
attacker_data_size = 500

x_atk = x_train[:attacker_data_size]
y_atk = y_train[:attacker_data_size]

In [9]:
loss_based_attack.fit(x_atk, y_atk)

<a id='infer'></a>
### Infer membership on evaluation data

The attack is evaluated on 500 training samples and 500 test samples, each drawn at random.

In [10]:
from numpy.random import choice
# evaluation data
n = 500
eval_train_idx = choice(len(x_train), n)
eval_test_idx = choice(len(x_test), n)
x = np.concatenate([x_train[eval_train_idx], x_test[eval_test_idx]])
y = np.concatenate([y_train[eval_train_idx], y_test[eval_test_idx]])
eval_label = np.array([1] * n + [0] * n)
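
Note that `numpy.random.choice` draws with replacement by default, and the indices drawn from `x_train` can overlap with the first 500 samples that were given to the attacker. As a possible variation (not what this notebook does), the sketch below draws evaluation indices without replacement and restricts the member indices to training samples the attacker has not seen. The sizes mirror the notebook; `rng`, the seed, and `unseen_train_idx` are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility

n_train, n_test = 1000, 10000  # training subset used here; MNIST test set size
attacker_data_size = 500
n = 500

# Members: training indices the attacker has not seen, drawn without replacement.
unseen_train_idx = np.arange(attacker_data_size, n_train)
eval_train_idx = rng.choice(unseen_train_idx, size=n, replace=False)

# Non-members: test indices, drawn without replacement.
eval_test_idx = rng.choice(n_test, size=n, replace=False)

# 1 = member (training sample), 0 = non-member (test sample).
eval_label = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
```

These index arrays can be used in place of the ones above when building `x` and `y`.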

In [11]:
pred_label = loss_based_attack.infer(x, y)

#### Attack accuracy

In [12]:
from sklearn.metrics import accuracy_score

print("Accuracy: %f" % accuracy_score(eval_label, pred_label))

Accuracy: 0.565000
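
Accuracy alone hides how the errors split between members and non-members. Yeom et al. also use the membership advantage, which is the true positive rate minus the false positive rate. On a balanced evaluation set such as the one above, accuracy equals 0.5 + advantage / 2, so an accuracy of 0.565 corresponds to an advantage of 0.13. A minimal sketch, using a hypothetical helper `membership_advantage` and made-up labels rather than the notebook's `eval_label` and `pred_label`:

```python
import numpy as np

def membership_advantage(true_label, pred_label):
    """Return (TPR, FPR, TPR - FPR) for binary membership predictions."""
    true_label = np.asarray(true_label).astype(bool)
    pred_label = np.asarray(pred_label).astype(bool)
    tpr = pred_label[true_label].mean()   # members correctly flagged
    fpr = pred_label[~true_label].mean()  # non-members wrongly flagged
    return float(tpr), float(fpr), float(tpr - fpr)

# Made-up example: 4 members followed by 4 non-members.
true_label = np.array([1, 1, 1, 1, 0, 0, 0, 0])
pred_label = np.array([1, 1, 1, 0, 1, 0, 0, 0])
tpr, fpr, adv = membership_advantage(true_label, pred_label)
print(f"TPR={tpr:.2f} FPR={fpr:.2f} advantage={adv:.2f}")
```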
