# Attack Utils

This helper notebook contains the functions used to attack the neural network models in the library. For the variants of popular adversarial attacks that we developed for siamese neural networks, see 'attack_variants.ipynb'.

In [6]:
%run "imports.ipynb"
%run "helper_utils.ipynb"

### Attacking standard convolutional and feed-forward neural networks

The functions in the cell below are used to attack the standard, undefended models and their traditionally adversarially trained counterparts.
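
To make the roles of `eps`, `iter_size` and `num_iters` concrete, here is a minimal NumPy sketch of a BIM-style attack under the L-infinity norm. It is not the implementation called in the cell below: the toy logistic model and the names `toy_loss_gradient` and `iterative_linf_attack` are assumptions made purely for illustration.

```python
import numpy as np

def toy_loss_gradient(w, b, x, y):
    """Gradient of binary cross-entropy w.r.t. the input x for a toy logistic model."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # predicted probability, shape (n,)
    return (p - y)[:, None] * w[None, :]      # d(loss)/dx, shape (n, d)

def iterative_linf_attack(w, b, x, y, eps=0.2, iter_size=0.05, num_iters=4):
    """Repeated signed-gradient steps, each projected back into the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(num_iters):
        grad = toy_loss_gradient(w, b, x_adv, y)
        x_adv = x_adv + iter_size * np.sign(grad)   # step in the direction that increases the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # never move more than eps from the clean input
        x_adv = np.clip(x_adv, 0.0, 1.0)            # stay inside the normalised pixel range
    return x_adv

# Random stand-in data (hypothetical, not MNIST).
rng = np.random.default_rng(0)
w, b = rng.normal(size=8), 0.0
x = rng.uniform(size=(5, 8))
y = (x @ w + b > 0).astype(float)

x_adv = iterative_linf_attack(w, b, x, y)
print("largest per-pixel change:", np.abs(x_adv - x).max())
```

With the default values used throughout this notebook, four steps of 0.05 can move a pixel by at most 0.2, so the iterative attacks can just reach the edge of the `eps` budget.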

In [7]:
"""Attacking standard convolutional and feed-forward neural networks"""

def attack_model(model, original_weights, x, y, defended_weights='', eps=0.2, iter_size=0.05, num_iters=4, lp_norm=np.inf):
    """
    Attack a CNN model with FGSM, BIM, PGD and MIM, and return an evaluation for each attack.
    The adversarial examples are always generated against the undefended weights. If weights for a
    defended version of the model are supplied, the defended version is evaluated on those examples instead.

    Params:
        Tensorflow model: model
        string: original_weights. The path to the weights of the undefended model.
        np_array: x. Clean x data e.g. MNIST images.
        np_array: y. Clean y labels.
        string: defended_weights. Optional path to alternative (defended) weights to evaluate.
        float: eps. Attack epsilon (a larger epsilon gives stronger but less subtle adversarial examples).
        float: iter_size. How much the perturbation can change in each iteration of an iterative attack.
        int: num_iters. The number of iterations/steps in the iterative attacks.
        lp_norm. The lp norm can be 1, 2 or inf.
    Returns:
        list: evals. The model evaluations (loss and accuracy), in the order FGSM, BIM, PGD, MIM.
        list: accs. Just the accuracy scores, in the same order.
    """
    
    # load the undefended weights and generate the adversarial datasets against the undefended model
    model.load_weights(original_weights)
    x_fgsm = fast_gradient_method(model, x, eps, lp_norm)
    x_bim = basic_iterative_method(model, x, eps, iter_size, num_iters, lp_norm)
    x_madry = madry_et_al(model, x, eps, iter_size, num_iters, lp_norm)
    # note: only eps is passed to MIM, so its step size, iteration count and norm fall back to that function's defaults
    x_mim = momentum_iterative_method(model, x, eps)
    
    evals = []
    accs = []
    
    # if a second set of model weights is provided to defend the model, load them and evaluate their performance
    # otherwise, evaluate the original model's performance against the adversarial datasets
    if defended_weights !='':
        model.load_weights(defended_weights)
    
    fgsm_eval = model.evaluate(x_fgsm,y)
    bim_eval = model.evaluate(x_bim, y)
    madry_eval = model.evaluate(x_madry,y)
    mim_eval = model.evaluate(x_mim, y)
    
    evals.append(fgsm_eval)
    evals.append(bim_eval)
    evals.append(madry_eval)
    evals.append(mim_eval)
    
    accs.append(fgsm_eval[1])
    accs.append(bim_eval[1])
    accs.append(madry_eval[1])
    accs.append(mim_eval[1])
    
    return evals, accs

def attack_model_variations(model,original_weights,model_weights,x,y,eps=0.2):
    """
    Attack a standard CNN model and variations of that model with different weights.
    
    Params:
        Tensorflow model: model. The model architecture.
        string: original_weights. A path to the model's weights before any adversarial training or defences.
        list: model_weights. A list of strings containing paths to the model's defended weights.
        np_array: x. The test data that will be used to generate adversarial datasets.
        np_array: y. The labels corresponding to the test data.
        float: eps. The attack epsilon value. This dictates how much each pixel's value can change by in an adversarial attack.
            0.2 is considered a large epsilon on normalised data, thus producing powerful attacks.
            
    Returns:
        list: evals. A list of the evaluation scores for the models against the attacks.
        list: accs. A list of the accuracy of each model variation against each adversarial attack.
    """
    evals = []
    accs = []
    # run the attacks on every variant of the model's weights;
    # attack_model loads the original and the defended weights itself
    for weights in model_weights:
        print(weights)
        new_evals, new_accs = attack_model(model, original_weights, x, y, defended_weights=weights, eps=eps, iter_size=0.05, num_iters=4, lp_norm=np.inf)
        evals.append(new_evals)
        accs.append(new_accs)
    return evals, accs

### Attacking siamese neural networks

The following functions attack siamese neural networks. The first, attack_model_siamese, uses our specially developed attack variants to attack a siamese network's image pairs directly. The second, attack_siamese_models, evaluates the siamese verification networks on adversarial examples that were generated for the undefended standard neural networks.
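
The difference between perturbing one image of a pair and perturbing both can be sketched with a toy example. The code below is only an illustration under stated assumptions: a linear map `W` stands in for a siamese branch, the attack simply pushes the two embeddings apart, and `embed`, `pair_distance` and `fgsm_pair` are hypothetical names. The real attack variants live in 'attack_variants.ipynb' and may differ.

```python
import numpy as np

def embed(W, x):
    """Toy linear embedding standing in for one siamese branch."""
    return x @ W.T

def pair_distance(W, a, b):
    """Euclidean distance between the two embeddings of each pair."""
    return np.linalg.norm(embed(W, a) - embed(W, b), axis=1)

def fgsm_pair(W, a, b, eps=0.2, multi=True):
    """One signed-gradient step that pushes the embeddings of each pair apart."""
    diff = embed(W, a) - embed(W, b)      # shape (n, k)
    grad_a = 2.0 * diff @ W               # d(distance^2)/da, shape (n, d)
    a_adv = a + eps * np.sign(grad_a)
    # The gradient w.r.t. b is -grad_a, so b moves in the opposite direction.
    b_adv = b - eps * np.sign(grad_a) if multi else b
    return a_adv, b_adv                   # no pixel-range clipping in this sketch

# Random stand-in pairs (hypothetical data).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
a, b = rng.uniform(size=(3, 6)), rng.uniform(size=(3, 6))

d_clean = pair_distance(W, a, b)
d_single = pair_distance(W, *fgsm_pair(W, a, b, multi=False))
d_both = pair_distance(W, *fgsm_pair(W, a, b, multi=True))
print(d_clean, d_single, d_both)
```

In this toy setting, perturbing both images moves the embeddings further apart than perturbing one, which is the intuition behind the `multi` flag.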

In [8]:
def attack_model_siamese(model, x, y, eps=0.2, iter_size=0.05, num_iters=4, lp_norm=np.inf, loss_fn=my_contrastive_loss, multi=True):
    """
    Run each attack on a given siamese model, and return the evaluation metrics from each attack.
    
    Params:
        Tensorflow model: model. The siamese model to attack.
        np_array: x. The clean image pairs.
        np_array: y. The labels for the image pairs.
        float: eps. Attack epsilon (a larger epsilon gives stronger but less subtle adversarial examples).
        float: iter_size. How much the perturbation can change in each iteration of an iterative attack.
        int: num_iters. The number of iterations/steps in the iterative attacks.
        lp_norm. The lp norm can be 1, 2 or inf.
        Loss function: loss_fn. Defaults to my_contrastive_loss, as that is what our siamese models use. Only change it if
            you use a different loss function.
        bool: multi. Set to True to apply perturbations to both images in the image pair, and False to perturb only a
            single image. multi=True produces stronger attacks.
    Returns:
        list: evals. The list of model evaluations (loss and accuracy).
        list: accs. List of just accuracy scores.
    """
    
    # generate adversarial examples for each attack, honouring the caller's `multi` setting
    x_fgsm = fgsm_siamese(model, x, eps, lp_norm, loss_fn=loss_fn, y=y, multi=multi)
    x_fgsm = process_adversarial_output(x_fgsm)
    # BIM is run as PGD with random initialisation switched off
    x_bim = pgd_siamese(model, x, eps, iter_size, num_iters, lp_norm, loss_fn=loss_fn, y=y, rand_init=0, multi=multi)
    x_bim = process_adversarial_output(x_bim)
    x_pgd = pgd_siamese(model, x, eps, iter_size, num_iters, lp_norm, loss_fn=loss_fn, y=y, multi=multi)
    x_pgd = process_adversarial_output(x_pgd)
    x_mim = mim_siamese(model, x, eps, loss_fn=loss_fn, y=y, multi=multi)
    x_mim = process_adversarial_output(x_mim)
    
    evals = []
    accs = []
    
    # evaluate the model against each attack, in the order FGSM, BIM, PGD, MIM
    # siamese_model_evaluate returns (accuracy, precision, auc, loss), as unpacked in attack_siamese_models
    for adv_x in (x_fgsm, x_bim, x_pgd, x_mim):
        accuracy, precision, auc, loss = siamese_model_evaluate(model, adv_x, y)
        evals.append([loss, accuracy])
        accs.append(accuracy)
    return evals, accs

def attack_siamese_models(model_weights,datasets,attacks=['FGSM','BIM','PGD','MIM'],threshold=0.5):
    """
    Use this function to attack multiple variations of a siamese neural network model based on our architecture.

    Params:
        list: model_weights. A list of paths to the different weights of the model that is being evaluated.
        list: datasets. A list of the adversarial datasets to evaluate the siamese models against, one (x, y) entry
            per attack and in the same order as `attacks`.
        list: attacks. List of the attacks used. Leave as default unless additional attacks are added.
        float: threshold. The threshold for the siamese similarity metric. If the model's output is lower than the
            threshold, it predicts a true match. We find that 0.4 and 0.5 work best.
    Returns:
        list: siamese_evals. A flat list of [loss, accuracy] entries, one per model variation and attack.
        list: siamese_accs. A flat list of the accuracy of each model variation against each adversarial attack.
    """
    model_architecture = get_siamese_model_architecture(np.asarray(datasets[0][0][0][0]).shape, embedding_dim=128,conv_size=32,audio=False, kernel_size=3)
    siamese_evals = []
    siamese_accs = []
    for weights in model_weights:
        model_architecture.load_weights(str(weights))
        print('\nModel: ' + str(weights))
        # pair each attack name with its dataset instead of assuming exactly four attacks
        for attack, dataset in zip(attacks, datasets):
            accuracy, precision, auc, loss = siamese_model_evaluate(model_architecture, dataset[0], dataset[1], threshold=threshold)
            print(attack, ': Accuracy ', accuracy, 'Precision ', precision, 'AUC ', auc, ' Loss ', loss)
            siamese_evals.append([loss, accuracy])
            siamese_accs.append(accuracy)
    return siamese_evals, siamese_accs
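
The `threshold` rule described in the docstring above can be sketched as follows. This is an illustration only: `predict_matches` and `accuracy_at_threshold` are hypothetical helpers (the real logic is inside `siamese_model_evaluate` in 'helper_utils.ipynb'), the distances are made-up numbers, and the sketch assumes that a label of 1 marks a true match.

```python
import numpy as np

def predict_matches(distances, threshold=0.5):
    """An output below the threshold is treated as a predicted match."""
    return np.asarray(distances) < threshold

def accuracy_at_threshold(distances, labels, threshold=0.5):
    """Fraction of pairs whose predicted match status agrees with the label."""
    preds = predict_matches(distances, threshold)
    return float(np.mean(preds == np.asarray(labels, dtype=bool)))

# Made-up model outputs for four pairs; assumed convention: label 1 = match.
distances = np.array([0.1, 0.45, 0.6, 0.9])
labels = np.array([1, 1, 0, 0])

acc_05 = accuracy_at_threshold(distances, labels, threshold=0.5)  # 1.0
acc_04 = accuracy_at_threshold(distances, labels, threshold=0.4)  # 0.75
print(acc_05, acc_04)
```

Lowering the threshold from 0.5 to 0.4 turns the pair at distance 0.45 into a predicted non-match, which is why the choice of threshold changes the reported accuracy.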

In [9]:
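def process_adversarial_output(adv_x):
    """
    After running our siamese attack variants, we have to swap the first two axes of the output to return it to its
    original dimensions, so that it is still accepted by the model it is intended for.
    
    Params:
        tensor: adv_x. The 5-dimensional adversarial examples returned by a siamese attack variant. It must support
            .numpy(), so pass the attack output directly rather than a list or numpy array.
    Returns:
        np_array: adv_x. The adversarial examples as a numpy array with the first two axes swapped.
    """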
    adv_x = adv_x.numpy()
    adv_x = np.transpose(adv_x, (1,0,2,3,4))
    return adv_x
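
The effect of the axis permutation `(1,0,2,3,4)` can be checked on a small array. The shapes below are hypothetical (two images per pair, three pairs, 4x4 single-channel images), and the sketch uses a plain NumPy array in place of the tensor that the attacks return.

```python
import numpy as np

# Hypothetical layout: axis 0 indexes the image within a pair, axis 1 indexes the pair.
stacked = np.arange(2 * 3 * 4 * 4 * 1).reshape(2, 3, 4, 4, 1)

# Swap the first two axes and leave the image axes (height, width, channels) untouched.
swapped = np.transpose(stacked, (1, 0, 2, 3, 4))

print(stacked.shape)  # (2, 3, 4, 4, 1)
print(swapped.shape)  # (3, 2, 4, 4, 1)

# Element [i, j] of the result is element [j, i] of the input.
same_image = np.array_equal(swapped[1, 0], stacked[0, 1])
# Applying the same permutation twice restores the original layout.
round_trip = np.array_equal(np.transpose(swapped, (1, 0, 2, 3, 4)), stacked)
```

Only the ordering of the first two axes changes; the image contents are left untouched.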

In [10]:
def print_attack_eval(model_eval, attacks = ['FGSM','BIM','PGD','MIM']):
    """
    A clean way to print the evaluation scores of standard CNN models.
    
    Params:
        List of tuples: model_eval. A list containing tuples for loss and accuracy, generated from a model's evaluate function.
        list: attacks. A list of the attacks that the model is evaluated against. Only change from default if new attacks are
            added.
    """
    # model_eval[i] holds (loss, accuracy) for attacks[i]
    for attack, attack_eval in zip(attacks, model_eval):
        print("Model on ",attack," adversarial examples - Accuracy: ",attack_eval[1]," -  Loss: ",attack_eval[0])
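
When several sets of weights are compared, it can help to label the nested `accs` list returned by attack_model_variations. The helper below is a hypothetical sketch and not part of the library; the paths and accuracy values are placeholders for illustration, not real results.

```python
def summarise_accuracies(weight_paths, accs, attacks=('FGSM', 'BIM', 'PGD', 'MIM')):
    """Map each weights path to a {attack name: accuracy} dictionary."""
    return {path: dict(zip(attacks, row)) for path, row in zip(weight_paths, accs)}

# Placeholder inputs: two hypothetical weight files, one accuracy per attack.
summary = summarise_accuracies(
    ['weights_a.h5', 'weights_b.h5'],
    [[0.10, 0.05, 0.04, 0.06], [0.80, 0.75, 0.74, 0.76]],
)

for path, scores in summary.items():
    print(path, scores)
```

This assumes one inner list of accuracies per weights path, in the attack order FGSM, BIM, PGD, MIM. attack_siamese_models returns a flat list instead, so its output would need regrouping first.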