# DeepDOT: Identifying Backdoor Attack on Neural Network

## Introduction

Neural networks lack transparency by design: their internal behaviour cannot be understood simply by inspecting the network itself. A deep neural network is a multi-layered model whose neurons are joined by thousands, possibly millions, of weighted connections ('synapses').

This opacity makes such networks susceptible to backdoor attacks, in which certain connections between neurons (and the weights on them) override the normal functioning of the network to produce unexpected or unwanted classification results.

For example, an object recognition model with a backdoor always identifies the input as a 'penguin' if a specific symbol is present in the input.

This has serious security implications now that many mainstream products, including self-driving cars and biometric authentication systems, rely on neural networks.

This notebook walks through a system for detecting backdoor attacks in deep neural networks.

![Illustration of a backdoor attack](https://i.imgur.com/mSfZSYP.png)

<small>Fig: An illustration of a backdoor attack. The backdoor target is label 4, and the trigger pattern is a white square in the bottom-right corner. To inject the backdoor, part of the training set is modified so that the trigger is stamped on each sample and its label is changed to the target label. After training on the modified set, the model classifies samples carrying the trigger as the target label, while still recognizing the correct label for any sample without the trigger.</small>
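
The injection procedure described in the caption can be sketched with NumPy. This is only an illustration under stated assumptions: the function names `stamp_trigger` and `poison_dataset`, the poisoning ratio, and the toy data are hypothetical and are not taken from the attacked model's actual training code.

```python
import numpy as np


def stamp_trigger(images, size=4, value=255.0):
    """Return a copy of `images` (N, H, W, C) with a square trigger of
    side `size` stamped in the bottom-right corner."""
    stamped = images.copy()
    stamped[:, -size:, -size:, :] = value
    return stamped


def poison_dataset(images, labels, target_label, ratio=0.1, seed=0):
    """Stamp the trigger on a random `ratio` of the samples and relabel them.

    `labels` holds integer class ids with shape (N,). The ratio and seed
    are illustrative defaults, not values from the original attack.
    """
    rng = np.random.RandomState(seed)
    n_poison = int(len(images) * ratio)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    poisoned_x, poisoned_y = images.copy(), labels.copy()
    poisoned_x[idx] = stamp_trigger(images[idx])
    poisoned_y[idx] = target_label
    return poisoned_x, poisoned_y, idx


# Toy data: 20 black 32x32 RGB images with labels 0..4 (hypothetical)
x = np.zeros((20, 32, 32, 3), dtype="float32")
y = np.arange(20) % 5
px, py, idx = poison_dataset(x, y, target_label=4, ratio=0.25)
print(len(idx), py[idx])
```

A model trained on `px, py` would see the white square only alongside label 4, which is what lets the trigger override the true class at inference time.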

## Necessary Imports

In [0]:
import os
import time

In [0]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
base_path = "/content/drive/My Drive/Sem 7/DeepDOT/detection/"

Mounted at /content/drive


In [0]:
%tensorflow_version 1.x
import tensorflow as tf
tf.test.gpu_device_name()

'/device:GPU:0'

In [0]:
import numpy as np
import random
%tensorflow_version 1.x
from tensorflow import set_random_seed
random.seed(123)
np.random.seed(123)
set_random_seed(123)

In [0]:
from keras.models import load_model
from keras.preprocessing.image import ImageDataGenerator

In [0]:
from visualizer import Visualizer
import utils_backdoor

## Part 1: Define Parameters

We define a few parameters that are used in the detection process. These include the GPU device to use, the path to the model suspected of carrying a backdoor, and the parameters for optimizing the trigger (discussed below).

The comment after each line provides more details about the parameter.

In [0]:
DEVICE = '0'  # GPU Device to use

DATA_DIR = base_path + 'data'  # dataset folder
DATA_FILE = 'gtsrb_dataset_int.h5'  # dataset file
MODEL_DIR = base_path + 'models'  # model directory
MODEL_FILENAME = 'gtsrb_bottom_right_white_4_target_33.h5'  # model file
RESULT_DIR = base_path + 'results_gpu'  # directory for storing results
# image filename template for visualization results
IMG_FILENAME_TEMPLATE = 'gtsrb_visualize_%s_label_%d.png'

# input size
IMG_ROWS = 32
IMG_COLS = 32
IMG_COLOR = 3
INPUT_SHAPE = (IMG_ROWS, IMG_COLS, IMG_COLOR)

NUM_CLASSES = 43  # total number of classes in the model
# (optional) infected target label, used for prioritizing label scanning
# NOTE that this will only cause the label to be scanned in the beginning
# and not defining this will have no effect on the final results.
Y_TARGET = 33

# preprocessing method for the task, GTSRB uses raw pixel intensities
INTENSITY_RANGE = 'raw'

# parameters for optimization
BATCH_SIZE = 32  # batch size used for optimization
LR = 0.1  # learning rate
STEPS = 500  # total optimization iterations
NB_SAMPLE = 500  # number of samples in each mini batch
MINI_BATCH = NB_SAMPLE // BATCH_SIZE  # mini batch size used for early stop
INIT_COST = 1e-3  # initial weight used for balancing two objectives

REGULARIZATION = 'l1'  # reg term to control the mask's norm

ATTACK_SUCC_THRESHOLD = 0.99  # attack success threshold of the reversed attack
PATIENCE = 5  # patience for adjusting weight, number of mini batches
COST_MULTIPLIER = 2  # multiplier for auto-control of weight (COST)
SAVE_LAST = False  # whether to save the last result or best result

EARLY_STOP = True  # whether to early stop
EARLY_STOP_THRESHOLD = 1.0  # loss threshold for early stop
EARLY_STOP_PATIENCE = 5 * PATIENCE  # patience for early stop

# the following part is not used in our experiment
# but our code implementation also supports super-pixel mask
UPSAMPLE_SIZE = 1  # size of the super pixel
MASK_SHAPE = np.ceil(np.array(INPUT_SHAPE[0:2], dtype=float) / UPSAMPLE_SIZE)
MASK_SHAPE = MASK_SHAPE.astype(int)

##############################
#      END PARAMETERS        #
##############################
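
Several of the parameters above are derived from others, and the arithmetic is easy to misread. The sketch below recomputes them with the values from this cell. It assumes that `MINI_BATCH` counts the batches drawn per optimization step, which is an interpretation of the comments rather than something the `Visualizer` source confirms here. The helper `mask_shape` is a hypothetical name introduced for illustration.

```python
import numpy as np

# Values copied from the parameter cell above
BATCH_SIZE = 32
NB_SAMPLE = 500
PATIENCE = 5
INPUT_SHAPE = (32, 32, 3)


def mask_shape(input_shape, upsample_size):
    """Mask height and width when each mask cell covers one super pixel."""
    shape = np.ceil(np.array(input_shape[0:2], dtype=float) / upsample_size)
    return shape.astype(int).tolist()


mini_batch = NB_SAMPLE // BATCH_SIZE    # floor division drops the remainder
samples_seen = mini_batch * BATCH_SIZE  # samples covered by whole batches
early_stop_patience = 5 * PATIENCE

print(mini_batch, samples_seen, early_stop_patience)  # 15 480 25
print(mask_shape(INPUT_SHAPE, 1))  # [32, 32]
print(mask_shape(INPUT_SHAPE, 3))  # [11, 11]
```

Because of the floor division, 15 whole batches cover 480 samples rather than the nominal 500. With `UPSAMPLE_SIZE = 1` the mask has one cell per pixel; a larger value would shrink the mask, rounding up at the image border.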

## Part 2: Load Dataset

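We load the test split of the dataset and return the images (`X_test`) and labels (`Y_test`) as `float32` arrays. A second helper wraps these arrays in a Keras data generator that yields batches of `BATCH_SIZE` samples.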

In [0]:
def load_dataset(data_file=('%s/%s' % (DATA_DIR, DATA_FILE))):

    dataset = utils_backdoor.load_dataset(data_file, keys=['X_test', 'Y_test'])

    X_test = np.array(dataset['X_test'], dtype='float32')
    Y_test = np.array(dataset['Y_test'], dtype='float32')

    print('X_test shape %s' % str(X_test.shape))
    print('Y_test shape %s' % str(Y_test.shape))

    return X_test, Y_test


In [0]:
def build_data_loader(X, Y):

    datagen = ImageDataGenerator()
    generator = datagen.flow(
        X, Y, batch_size=BATCH_SIZE)

    return generator

## Part 3: Visualize and Optimize Trigger

Detection proceeds in three steps <small>(Wang, Bolun, et al. "Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks." IEEE Symposium on Security and Privacy, 2019)</small>:

1. Each label is treated in turn as the potential target of a backdoor attack. We use an optimization scheme (discussed later) to design a 'minimal' trigger that causes samples from all other labels to be classified as the target label. For a model with visual input, this is the smallest collection of pixels that leads to misclassification.

2. This step is repeated for every label in the model. For $N$ labels, $N$ 'minimal' triggers are generated.

3. Once a 'minimal' trigger exists for each label, the size of each trigger is measured (for an image, the number of pixels it occupies). We then perform _outlier detection_ to look for triggers that are significantly smaller than the others. A significant outlier represents a real trigger, and its label is the target label of a backdoor attack.
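
To make the notions of a trigger and its size concrete, here is a minimal sketch. It assumes, following the cited Neural Cleanse paper, that a trigger consists of a 2D mask with values in [0, 1] and a full-size pattern that is blended into the input wherever the mask is non-zero, and that size is measured as the L1 norm of the mask. The function names `apply_trigger` and `trigger_size` are hypothetical and do not come from the `Visualizer` class.

```python
import numpy as np


def apply_trigger(x, mask, pattern):
    """Blend `pattern` into images `x` wherever `mask` is non-zero.

    x: (N, H, W, C) images, mask: (H, W) with values in [0, 1],
    pattern: (H, W, C). Returns (1 - mask) * x + mask * pattern.
    """
    m = mask[np.newaxis, :, :, np.newaxis]  # broadcast over batch and channels
    return (1.0 - m) * x + m * pattern


def trigger_size(mask):
    """L1 norm of the mask; for a binary mask this is the pixel count."""
    return float(np.abs(mask).sum())


# Hypothetical trigger: a 4x4 white square in the bottom-right corner
mask = np.zeros((32, 32))
mask[-4:, -4:] = 1.0
pattern = np.full((32, 32, 3), 255.0)

x = np.zeros((2, 32, 32, 3))  # two black toy images
x_adv = apply_trigger(x, mask, pattern)
print(x_adv.shape, trigger_size(mask))
```

Keeping the mask separate from the pattern means the size of the trigger can be penalized independently of its colour, which is what makes a 'minimal' trigger a well-defined optimization target.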


## Optimization Scheme
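
As a hedged summary of the formulation in the Neural Cleanse paper cited above (the symbols are the paper's notation, not names from this notebook's code), a trigger is modelled as a mask $m$ and a pattern $\Delta$ applied to an input $x$. The 'minimal' trigger for a target label $y_t$ is found by minimizing the classification loss plus a penalty on the size of the mask:

$$
A(x, m, \Delta) = (1 - m) \odot x + m \odot \Delta
$$

$$
\min_{m, \Delta} \; \ell\big(y_t, f(A(x, m, \Delta))\big) + \lambda \cdot \lVert m \rVert_1 \quad \text{for } x \in X
$$

Here $f$ is the model, $\ell$ is the cross-entropy loss, $X$ is the set of clean samples, and $\lambda$ balances the two objectives. We assume that $\lambda$ corresponds to the `cost` value in the optimization log and $\lVert m \rVert_1$ to `reg` (with `REGULARIZATION = 'l1'`). The logged values are consistent with this reading: at step 6, $0.002346 + 0.001 \times 340.845123 \approx 0.343191$, which matches the reported `loss`.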



In [0]:
def save_pattern(pattern, mask, y_target):

    # create the result directory (and any missing parents) if needed
    os.makedirs(RESULT_DIR, exist_ok=True)

    img_filename = (
        '%s/%s' % (RESULT_DIR,
                   IMG_FILENAME_TEMPLATE % ('pattern', y_target)))
    utils_backdoor.dump_image(pattern, img_filename, 'png')

    img_filename = (
        '%s/%s' % (RESULT_DIR,
                   IMG_FILENAME_TEMPLATE % ('mask', y_target)))
    utils_backdoor.dump_image(np.expand_dims(mask, axis=2) * 255,
                              img_filename,
                              'png')

    fusion = np.multiply(pattern, np.expand_dims(mask, axis=2))
    img_filename = (
        '%s/%s' % (RESULT_DIR,
                   IMG_FILENAME_TEMPLATE % ('fusion', y_target)))
    utils_backdoor.dump_image(fusion, img_filename, 'png')



In [0]:
def gtsrb_visualize_label_scan_bottom_right_white_4():

    print('loading dataset')
    X_test, Y_test = load_dataset()
    # transform numpy arrays into data generator
    test_generator = build_data_loader(X_test, Y_test)

    print('loading model')
    model_file = '%s/%s' % (MODEL_DIR, MODEL_FILENAME)
    model = load_model(model_file)

    # initialize visualizer
    visualizer = Visualizer(
        model, intensity_range=INTENSITY_RANGE, regularization=REGULARIZATION,
        input_shape=INPUT_SHAPE,
        init_cost=INIT_COST, steps=STEPS, lr=LR, num_classes=NUM_CLASSES,
        mini_batch=MINI_BATCH,
        upsample_size=UPSAMPLE_SIZE,
        attack_succ_threshold=ATTACK_SUCC_THRESHOLD,
        patience=PATIENCE, cost_multiplier=COST_MULTIPLIER,
        img_color=IMG_COLOR, batch_size=BATCH_SIZE, verbose=2,
        save_last=SAVE_LAST,
        early_stop=EARLY_STOP, early_stop_threshold=EARLY_STOP_THRESHOLD,
        early_stop_patience=EARLY_STOP_PATIENCE)

    log_mapping = {}

    # y_label list to analyze
    y_target_list = list(range(NUM_CLASSES))
    y_target_list.remove(Y_TARGET)
    y_target_list = [Y_TARGET] + y_target_list
    for y_target in y_target_list:

        print('processing label %d' % y_target)

        _, _, logs = visualize_trigger_w_mask(
            visualizer, test_generator, y_target=y_target,
            save_pattern_flag=True)

        log_mapping[y_target] = logs



## Putting it together

We select the GPU to use and call `main`, which starts the minimal-trigger generation process on our model.

In [0]:
def main():
    os.environ["CUDA_VISIBLE_DEVICES"] = DEVICE
    utils_backdoor.fix_gpu_memory()
    gtsrb_visualize_label_scan_bottom_right_white_4()



In [0]:
start_time = time.time()
main()
elapsed_time = time.time() - start_time
print('elapsed time %s s' % elapsed_time)

loading dataset
X_test shape (12630, 32, 32, 3)
Y_test shape (12630, 43)
loading model
processing label 33
resetting state
mask_tanh -3.672258450327029 3.5073656483843307
pattern_tanh -3.9436355297972994 4.010161127999756
step:   0, cost: 0.00E+00, attack: 0.794, loss: 3.387233, ce: 3.387233, reg: 530.657349, reg_best: inf
step:   1, cost: 0.00E+00, attack: 1.000, loss: 0.000000, ce: 0.000000, reg: 540.151001, reg_best: 540.151001
step:   2, cost: 0.00E+00, attack: 1.000, loss: 0.000000, ce: 0.000000, reg: 540.162476, reg_best: 540.151001
step:   3, cost: 0.00E+00, attack: 1.000, loss: 0.000000, ce: 0.000000, reg: 540.163147, reg_best: 540.151001
step:   4, cost: 0.00E+00, attack: 1.000, loss: 0.000000, ce: 0.000000, reg: 540.164124, reg_best: 540.151001
step:   5, cost: 0.00E+00, attack: 1.000, loss: 0.000000, ce: 0.000000, reg: 540.164185, reg_best: 540.151001
initialize cost to 1.00E-03
step:   6, cost: 1.00E-03, attack: 1.000, loss: 0.343191, ce: 0.002346, reg: 340.845123, reg_best

## What's next

As discussed in 'Step 3' above, we will now use an outlier detection algorithm to check for triggers that are significantly smaller than others. The label associated with this trigger would most likely be the target label for the backdoor attack.
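
One way to carry out this step is the Median Absolute Deviation (MAD) test used in the cited Neural Cleanse paper. The sketch below assumes that approach, with the usual consistency constant 1.4826 for normally distributed data and an anomaly-index threshold of 2. The function names and the L1 norms in the example are made up for illustration and are not results from this notebook.

```python
import numpy as np


def anomaly_indices(l1_norms, consistency=1.4826):
    """Absolute deviation from the median, normalized by the scaled MAD."""
    norms = np.asarray(l1_norms, dtype=float)
    median = np.median(norms)
    mad = consistency * np.median(np.abs(norms - median))
    if mad == 0:
        raise ValueError("MAD is zero; the norms are too uniform to score")
    return np.abs(norms - median) / mad


def flag_small_outliers(l1_norms, threshold=2.0):
    """Positions whose norm is an outlier AND lies below the median."""
    norms = np.asarray(l1_norms, dtype=float)
    scores = anomaly_indices(norms)
    median = np.median(norms)
    return [i for i in range(len(norms))
            if scores[i] > threshold and norms[i] < median]


# Hypothetical mask L1 norms for seven labels; position 5 is unusually small
norms = [100.0, 95.0, 110.0, 105.0, 98.0, 20.0, 102.0]
flagged = flag_small_outliers(norms)
print(flagged)  # [5]
```

Only outliers below the median are flagged, since a backdoor shows up as a trigger that is unusually small, not unusually large.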